From nobody Sat May 30 20:13:15 2026
Delivered-To: importer@patchew.org
Authentication-Results: mx.zohomail.com;
	spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as
 permitted sender)
  smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org
Return-Path: <qemu-devel-bounces+importer=patchew.org@nongnu.org>
Received: from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17]) by
 mx.zohomail.com
	with SMTPS id 1776422908575371.00509556960674;
 Fri, 17 Apr 2026 03:48:28 -0700 (PDT)
Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists1p.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <qemu-devel-bounces@nongnu.org>)
	id 1wDgjE-00011L-Ps; Fri, 17 Apr 2026 06:47:16 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <xiaoou@iscas.ac.cn>)
 id 1wDgjB-00010B-8U; Fri, 17 Apr 2026 06:47:14 -0400
Received: from smtp21.cstnet.cn ([159.226.251.21] helo=cstnet.cn)
 by eggs.gnu.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256)
 (Exim 4.90_1) (envelope-from <xiaoou@iscas.ac.cn>)
 id 1wDgj8-0007u8-ED; Fri, 17 Apr 2026 06:47:13 -0400
Received: from Huawei.localdomain (unknown [36.110.52.2])
 by APP-01 (Coremail) with SMTP id qwCowAB3H2ulD+JpLDmSDQ--.804S3;
 Fri, 17 Apr 2026 18:47:05 +0800 (CST)
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 01/14] target/riscv: rvp: Add option defines and dependency
 check for packed SIMD extension
Date: Fri, 17 Apr 2026 18:46:38 +0800
Message-Id: <20260417104652.17857-2-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-CM-TRANSID: qwCowAB3H2ulD+JpLDmSDQ--.804S3
X-Originating-IP: [36.110.52.2]
X-CM-SenderInfo: 50ld003x6l2u1dvotugofq/
Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17
 as permitted sender) client-ip=209.51.188.17;
 envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org;
 helo=lists1p.gnu.org;
Received-SPF: pass client-ip=159.226.251.21; envelope-from=xiaoou@iscas.ac.cn;
 helo=cstnet.cn
X-Spam_score_int: -21
X-Spam_score: -2.2
X-Spam_bar: --
X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, HK_RANDOM_ENVFROM=0.998,
 HK_RANDOM_FROM=0.998, RCVD_IN_DNSWL_MED=-2.3,
 RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,
 SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: qemu development <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org
X-ZM-MESSAGEID: 1776422910538158500
Content-Type: text/plain; charset="utf-8"

Co-authored-by: Yin Zhang <zhangyin2018@iscas.ac.cn>
Co-authored-by: Dajun Huang <djhuang_1@std.uestc.edu.cn>
Co-authored-by: Zhiyuan Yang <zhiyuan.plct@isrc.iscas.ac.cn>

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>
---
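As a rough illustration of what this patch wires up, here is a minimal
standalone sketch in C. It assumes the misa bit for a single-letter
extension is its offset from 'A' (so 'P' is bit 15), and it models the new
dependency check with a hypothetical `struct ext_cfg` and `p_deps_ok()` that
stand in for `cpu->cfg` and the check added to
`riscv_cpu_validate_set_extensions()`. `RV_BIT` is likewise a made-up name
for illustration; none of this is QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mapping: 'A' is misa bit 0, so 'P' lands on bit 15. */
#define RV_BIT(letter) ((uint32_t)1 << ((letter) - 'A'))

/* Hypothetical stand-in for the three cpu->cfg flags the check reads. */
struct ext_cfg {
    bool ext_zba;
    bool ext_zbb;
    bool ext_zbkb;
};

/*
 * Returns true when the configuration is acceptable: either P is not
 * enabled, or Zba, Zbb and Zbkb are all enabled alongside it.
 */
bool p_deps_ok(uint32_t misa_ext, const struct ext_cfg *cfg)
{
    if (!(misa_ext & RV_BIT('P'))) {
        return true;
    }
    return cfg->ext_zba && cfg->ext_zbb && cfg->ext_zbkb;
}
```

With this model, enabling P without Zbkb is rejected, while a configuration
that leaves P off passes regardless of the Zb* flags, which matches the
condition in the tcg-cpu.c hunk.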
 target/riscv/cpu.c         |  5 +++--
 target/riscv/cpu.h         |  1 +
 target/riscv/tcg/tcg-cpu.c | 16 ++++++++++++++++
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 72c6f4f0f1..c630faa892 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -41,7 +41,7 @@
 /* RISC-V CPU definitions */
 static const char riscv_single_letter_exts[] = "IEMAFDQCBPVH";
 const uint32_t misa_bits[] = {RVI, RVE, RVM, RVA, RVF, RVD, RVV,
-                              RVC, RVS, RVU, RVH, RVG, RVB, 0};
+                              RVC, RVS, RVU, RVH, RVG, RVB, RVP, 0};
 
 /*
  * From vector_helper.c
@@ -1172,7 +1172,8 @@ static const MISAExtInfo misa_ext_info_arr[] = {
     MISA_EXT_INFO(RVH, "h", "Hypervisor"),
     MISA_EXT_INFO(RVV, "v", "Vector operations"),
     MISA_EXT_INFO(RVG, "g", "General purpose (IMAFD_Zicsr_Zifencei)"),
-    MISA_EXT_INFO(RVB, "b", "Bit manipulation (Zba_Zbb_Zbs)")
+    MISA_EXT_INFO(RVB, "b", "Bit manipulation (Zba_Zbb_Zbs)"),
+    MISA_EXT_INFO(RVP, "x-p", "Packed-SIMD instructions")
 };
 
 static void riscv_cpu_validate_misa_mxl(RISCVCPUClass *mcc)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 4c0676ed53..e08f57d282 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -69,6 +69,7 @@ typedef struct CPUArchState CPURISCVState;
 #define RVH RV('H')
 #define RVG RV('G')
 #define RVB RV('B')
+#define RVP RV('P')
 
 extern const uint32_t misa_bits[];
 const char *riscv_get_misa_ext_name(uint32_t bit);
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index f3f7808895..4545ae721c 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -601,6 +601,11 @@ static void riscv_cpu_validate_b(RISCVCPU *cpu)
     }
 }
 
+static void riscv_cpu_validate_p(RISCVCPU *cpu)
+{
+    /* Enable sub-extensions here. Do nothing for now. */
+}
+
 /*
  * Check consistency between chosen extensions while setting
  * cpu->cfg accordingly.
@@ -619,6 +624,10 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
         riscv_cpu_validate_b(cpu);
     }
 
+    if (riscv_has_ext(env, RVP)) {
+        riscv_cpu_validate_p(cpu);
+    }
+
     if (riscv_has_ext(env, RVI) && riscv_has_ext(env, RVE)) {
         error_setg(errp,
                    "I and E extensions are incompatible");
@@ -683,6 +692,12 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
         return;
     }
 
+    if (riscv_has_ext(env, RVP) &&
+        !(cpu->cfg.ext_zba && cpu->cfg.ext_zbb && cpu->cfg.ext_zbkb)) {
+        error_setg(errp, "P extension requires zba, zbb and zbkb extensions");
+        return;
+    }
+
     riscv_cpu_validate_v(env, &cpu->cfg, &local_err);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
@@ -1413,6 +1428,7 @@ static const RISCVCPUMisaExtConfig misa_ext_cfgs[] = {
     MISA_CFG(RVV, false),
     MISA_CFG(RVG, false),
     MISA_CFG(RVB, false),
+    MISA_CFG(RVP, false),
 };
 
 /*
-- 
2.34.1
From nobody Sat May 30 20:13:15 2026
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 02/14] target/riscv: rvp: add arithmetic instructions,
 including saturating and non-saturating operations
Date: Fri, 17 Apr 2026 18:46:39 +0800
Message-Id: <20260417104652.17857-3-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-CM-TRANSID: qwCowAB3H2ulD+JpLDmSDQ--.804S4
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>
---
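The helper bodies live in psimd_helper.c, which is not quoted at this point,
so the following is only a sketch of the lane-wise behaviour that the
instruction names suggest, assuming a 32-bit register holding four 8-bit
lanes. `model_padd_b()` and `model_psadd_b()` are hypothetical names, and
reading `padd_b` as a wrapping add and `psadd_b` as a signed saturating add
is an assumption based on the "Non-Saturating and Saturating" comment in
insn32.decode, not a statement about the actual helpers.

```c
#include <stdint.h>

/* Clamp a widened sum into the int8_t range. */
static int8_t sat8(int32_t v)
{
    if (v > INT8_MAX) {
        return INT8_MAX;
    }
    if (v < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)v;
}

/* Assumed wrapping model: each 8-bit lane is added modulo 256. */
uint32_t model_padd_b(uint32_t rs1, uint32_t rs2)
{
    uint32_t rd = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint8_t a = (uint8_t)(rs1 >> (lane * 8));
        uint8_t b = (uint8_t)(rs2 >> (lane * 8));
        rd |= (uint32_t)(uint8_t)(a + b) << (lane * 8);
    }
    return rd;
}

/* Assumed saturating model: each lane is a signed byte, clamped on overflow. */
uint32_t model_psadd_b(uint32_t rs1, uint32_t rs2)
{
    uint32_t rd = 0;
    for (int lane = 0; lane < 4; lane++) {
        /* Narrowing to int8_t reinterprets the byte as two's complement. */
        int8_t a = (int8_t)(rs1 >> (lane * 8));
        int8_t b = (int8_t)(rs2 >> (lane * 8));
        int8_t r = sat8((int32_t)a + (int32_t)b);
        rd |= (uint32_t)(uint8_t)r << (lane * 8);
    }
    return rd;
}
```

For example, with rs1 = 0x7F01807F and rs2 = 0x0101FF01 the wrapping model
yields 0x80027F80, while the saturating model clamps the overflowing lanes
and yields 0x7F02807F.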
 target/riscv/helper.h                   |   40 +
 target/riscv/insn32.decode              |   65 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  564 ++++++++++++
 target/riscv/meson.build                |    3 +-
 target/riscv/psimd_helper.c             | 1069 +++++++++++++++++++++++
 target/riscv/translate.c                |    1 +
 6 files changed, 1741 insertions(+), 1 deletion(-)
 create mode 100644 target/riscv/insn_trans/trans_rvp.c.inc
 create mode 100644 target/riscv/psimd_helper.c

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 54d2331966..76bc6583fb 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1351,3 +1351,43 @@ DEF_HELPER_4(vsm4r_vs, void, ptr, ptr, env, i32)
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_1(ssamoswap_disabled, void, env)
 #endif
+
+/* Packed SIMD */
+DEF_HELPER_3(padd_b, tl, env, tl, tl)
+DEF_HELPER_3(padd_h, tl, env, tl, tl)
+DEF_HELPER_3(padd_w, i64, env, i64, i64)
+DEF_HELPER_3(padd_bs, tl, env, tl, tl)
+DEF_HELPER_3(padd_hs, tl, env, tl, tl)
+DEF_HELPER_3(padd_ws, i64, env, i64, i64)
+DEF_HELPER_3(psub_b, tl, env, tl, tl)
+DEF_HELPER_3(psub_h, tl, env, tl, tl)
+DEF_HELPER_3(psub_w, i64, env, i64, i64)
+DEF_HELPER_3(psh1add_h, tl, env, tl, tl)
+DEF_HELPER_3(psh1add_w, i64, env, i64, i64)
+DEF_HELPER_3(pssh1sadd_h, tl, env, tl, tl)
+DEF_HELPER_3(pssh1sadd_w, i64, env, i64, i64)
+DEF_HELPER_3(ssh1sadd, i32, env, i32, i32)
+DEF_HELPER_3(psadd_b, tl, env, tl, tl)
+DEF_HELPER_3(psadd_h, tl, env, tl, tl)
+DEF_HELPER_3(psadd_w, i64, env, i64, i64)
+DEF_HELPER_3(psaddu_b, tl, env, tl, tl)
+DEF_HELPER_3(psaddu_h, tl, env, tl, tl)
+DEF_HELPER_3(psaddu_w, i64, env, i64, i64)
+DEF_HELPER_3(sadd, i32, env, i32, i32)
+DEF_HELPER_3(saddu, i32, env, i32, i32)
+DEF_HELPER_3(pssub_b, tl, env, tl, tl)
+DEF_HELPER_3(pssub_h, tl, env, tl, tl)
+DEF_HELPER_3(pssub_w, i64, env, i64, i64)
+DEF_HELPER_3(pssubu_b, tl, env, tl, tl)
+DEF_HELPER_3(pssubu_h, tl, env, tl, tl)
+DEF_HELPER_3(pssubu_w, i64, env, i64, i64)
+DEF_HELPER_3(ssub, i32, env, i32, i32)
+DEF_HELPER_3(ssubu, i32, env, i32, i32)
+DEF_HELPER_3(psati_h, tl, env, tl, tl)
+DEF_HELPER_3(pusati_h, tl, env, tl, tl)
+DEF_HELPER_3(psati_w, i64, env, i64, i64)
+DEF_HELPER_3(pusati_w, i64, env, i64, i64)
+DEF_HELPER_3(sati_32, i32, env, i32, i32)
+DEF_HELPER_3(usati_32, i32, env, i32, i32)
+DEF_HELPER_3(sati_64, i64, env, i64, i64)
+DEF_HELPER_3(usati_64, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 6e35c4b1e6..6043eb39cf 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -40,6 +40,9 @@
 %imm_z6   26:1 15:5
 %imm_mop5 30:1 26:2 20:2
 %imm_mop3 30:1 26:2
+%imm_p_ui16 20:4
+%imm_p_ui32 20:5
+%imm_p_ui64 20:6
 
 # Argument sets:
 &empty
@@ -105,6 +108,10 @@
 @mop5 . . .. .. .... .. ..... ... ..... ....... &mop5 imm=%imm_mop5 %rd %rs1
 @mop3 . . .. .. . ..... ..... ... ..... ....... &mop3 imm=%imm_mop3 %rd %rs1 %rs2
 
+@p_ui16 ..... .... ... ..... ... ..... ....... &i imm=%imm_p_ui16 %rs1 %rd
+@p_ui32 ..... .... ... ..... ... ..... ....... &i imm=%imm_p_ui32 %rs1 %rd
+@p_ui64 ..... .... ... ..... ... ..... ....... &i imm=%imm_p_ui64 %rs1 %rd
+
 # Formats 64:
 @sh5     .......  ..... .....  ... ..... ....... &shift  shamt=%sh5      %rs1 %rd
 
@@ -1084,3 +1091,61 @@ sb_aqrl  00111 . . ..... ..... 000 ..... 0101111 @atom_st
 sh_aqrl  00111 . . ..... ..... 001 ..... 0101111 @atom_st
 sw_aqrl  00111 . . ..... ..... 010 ..... 0101111 @atom_st
 sd_aqrl  00111 . . ..... ..... 011 ..... 0101111 @atom_st
+
+
+# *** P Experimental Extension Version v018 ***
+# Arithmetic Operations (Non-Saturating and Saturating)
+padd_b     1000010 ..... ..... 000 ..... 0111011 @r
+padd_h     1000000 ..... ..... 000 ..... 0111011 @r
+padd_w     1000001 ..... ..... 000 ..... 0111011 @r
+padd_bs    1001110 ..... ..... 010 ..... 0011011 @r
+padd_hs    1001100 ..... ..... 010 ..... 0011011 @r
+padd_ws    1001101 ..... ..... 010 ..... 0011011 @r
+psub_b     1100010 ..... ..... 000 ..... 0111011 @r
+psub_h     1100000 ..... ..... 000 ..... 0111011 @r
+psub_w     1100001 ..... ..... 000 ..... 0111011 @r
+psh1add_h  1010000 ..... ..... 010 ..... 0111011 @r
+psh1add_w  1010001 ..... ..... 010 ..... 0111011 @r
+pssh1sadd_h   1011000 ..... ..... 010 ..... 0111011 @r
+{
+  ssh1sadd    1011001 ..... ..... 010 ..... 0111011 @r
+  pssh1sadd_w 1011001 ..... ..... 010 ..... 0111011 @r
+}
+psadd_b    1001010 ..... ..... 000 ..... 0111011 @r
+psadd_h    1001000 ..... ..... 000 ..... 0111011 @r
+{
+  sadd     1001001 ..... ..... 000 ..... 0111011 @r
+  psadd_w  1001001 ..... ..... 000 ..... 0111011 @r
+}
+psaddu_b   1011010 ..... ..... 000 ..... 0111011 @r
+psaddu_h   1011000 ..... ..... 000 ..... 0111011 @r
+{
+  saddu    1011001 ..... ..... 000 ..... 0111011 @r
+  psaddu_w 1011001 ..... ..... 000 ..... 0111011 @r
+}
+pssub_b    1101010 ..... ..... 000 ..... 0111011 @r
+pssub_h    1101000 ..... ..... 000 ..... 0111011 @r
+{
+  ssub     1101001 ..... ..... 000 ..... 0111011 @r
+  pssub_w  1101001 ..... ..... 000 ..... 0111011 @r
+}
+pssubu_b   1111010 ..... ..... 000 ..... 0111011 @r
+pssubu_h   1111000 ..... ..... 000 ..... 0111011 @r
+{
+  ssubu      1111001 ..... ..... 000 ..... 0111011 @r
+  pssubu_w   1111001 ..... ..... 000 ..... 0111011 @r
+}
+psati_h    11100 001.... ..... 100 ..... 0011011 @p_ui16
+pusati_h   10100 001.... ..... 100 ..... 0011011 @p_ui16
+{
+  sati_32    11100 01..... ..... 100 ..... 0011011 @p_ui32
+  psati_w      11100 01..... ..... 100 ..... 0011011 @p_ui32
+}
+{
+  usati_32   10100 01..... ..... 100 ..... 0011011 @p_ui32
+  pusati_w     10100 01..... ..... 100 ..... 0011011 @p_ui32
+}
+
+sati_64    111001 ...... ..... 100 ..... 0011011 @p_ui64
+usati_64   101001 ...... ..... 100 ..... 0011011 @p_ui64
+
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
new file mode 100644
index 0000000000..6f7246b563
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -0,0 +1,564 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* RISC-V translation routines for the P Standard Extensions. */
+/* Copyright (c) 2026 ISRC ISCAS. */
+
+#define GEN_SIMD_TRANS(NAME)                                \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+   REQUIRE_EXT(ctx, RVP);                                   \
+   TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);              \
+   TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);              \
+   TCGv dest = dest_gpr(ctx, a->rd);                        \
+   gen_helper_##NAME(dest, tcg_env, src1, src2);            \
+   return true;                                             \
+}
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_32(NAME)                             \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a)  \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    REQUIRE_EXT(ctx, RVP);                                  \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);             \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);             \
+    TCGv dest = dest_gpr(ctx, a->rd);                       \
+    gen_helper_##NAME(dest, tcg_env, src1, src2);           \
+    return true;                                            \
+}
+#else
+#define GEN_SIMD_TRANS_32(NAME)                             \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a)  \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    return true;                                            \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_64(NAME)                             \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a)  \
+{                                                           \
+   REQUIRE_64BIT(ctx);                                      \
+   return true;                                             \
+}
+#else
+#define GEN_SIMD_TRANS_64(NAME)                             \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a)  \
+{                                                           \
+    REQUIRE_64BIT(ctx);                                     \
+    REQUIRE_EXT(ctx, RVP);                                  \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);             \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);             \
+    TCGv dest = dest_gpr(ctx, a->rd);                       \
+    gen_helper_##NAME(dest, tcg_env, src1, src2);           \
+    return true;                                            \
+}
+#endif
+
+#define GEN_SIMD_TRANS_ACC(NAME)                            \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_EXT(ctx, RVP);                                  \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);             \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);             \
+    TCGv dest = dest_gpr(ctx, a->rd);                       \
+    TCGv t = tcg_temp_new();                                \
+    gen_helper_##NAME(t, tcg_env, src1, src2, dest);        \
+    gen_set_gpr(ctx, a->rd, t);                             \
+    return true;                                            \
+}
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_ACC_32(NAME)                         \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    REQUIRE_EXT(ctx, RVP);                                  \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);             \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);             \
+    TCGv dest = dest_gpr(ctx, a->rd);                       \
+    TCGv t = tcg_temp_new();                                \
+    gen_helper_##NAME(t, tcg_env, src1, src2, dest);        \
+    gen_set_gpr(ctx, a->rd, t);                             \
+    return true;                                            \
+}
+#else
+#define GEN_SIMD_TRANS_ACC_32(NAME)                         \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    return true;                                            \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_ACC_64(NAME)                         \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_64BIT(ctx);                                     \
+    return true;                                            \
+}
+#else
+#define GEN_SIMD_TRANS_ACC_64(NAME)                         \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_64BIT(ctx);                                     \
+    REQUIRE_EXT(ctx, RVP);                                  \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);             \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);             \
+    TCGv dest = dest_gpr(ctx, a->rd);                       \
+    TCGv t = tcg_temp_new();                                \
+    gen_helper_##NAME(t, tcg_env, src1, src2, dest);        \
+    gen_set_gpr(ctx, a->rd, t);                             \
+    return true;                                            \
+}
+#endif
+
+#define GEN_SIMD_TRANS_R1(NAME)                             \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_EXT(ctx, RVP);                                  \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);             \
+    TCGv dest = dest_gpr(ctx, a->rd);                       \
+    gen_helper_##NAME(dest, tcg_env, src1);                 \
+    gen_set_gpr(ctx, a->rd, dest);                          \
+    return true;                                            \
+}
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_R1_64(NAME)                          \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_64BIT(ctx);                                     \
+    return true;                                            \
+}
+#else
+#define GEN_SIMD_TRANS_R1_64(NAME)                          \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_64BIT(ctx);                                     \
+    REQUIRE_EXT(ctx, RVP);                                  \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);             \
+    TCGv dest = dest_gpr(ctx, a->rd);                       \
+    gen_helper_##NAME(dest, tcg_env, src1);                 \
+    return true;                                            \
+}
+#endif
+
+#define GEN_SIMD_TRANS_IMM(NAME)                            \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_EXT(ctx, RVP);                                  \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);             \
+    TCGv imm = tcg_constant_tl(a->imm);                     \
+    TCGv dest = dest_gpr(ctx, a->rd);                       \
+    gen_helper_##NAME(dest, tcg_env, src1, imm);            \
+    return true;                                            \
+}
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_IMM_32(NAME)                         \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    REQUIRE_EXT(ctx, RVP);                                   \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);             \
+    TCGv imm = tcg_constant_tl(a->imm);                     \
+    TCGv dest = dest_gpr(ctx, a->rd);                       \
+    gen_helper_##NAME(dest, tcg_env, src1, imm);            \
+    return true;                                            \
+}
+#else
+#define GEN_SIMD_TRANS_IMM_32(NAME)                         \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    return true;                                            \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_IMM_64(NAME)                         \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_64BIT(ctx);                                     \
+    return true;                                            \
+}
+#else
+#define GEN_SIMD_TRANS_IMM_64(NAME)                         \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_64BIT(ctx);                                     \
+    REQUIRE_EXT(ctx, RVP);                                   \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);             \
+    TCGv imm = tcg_constant_tl(a->imm);                     \
+    TCGv dest = dest_gpr(ctx, a->rd);                       \
+    gen_helper_##NAME(dest, tcg_env, src1, imm);            \
+    return true;                                            \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_1(NAME)                     \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    REQUIRE_EXT(ctx, RVP);                                  \
+    TCGv_i32 src1 = get_gpr(ctx, a->rs1, EXT_NONE);         \
+    TCGv_i32 src2 = get_gpr(ctx, a->rs2, EXT_NONE);         \
+    TCGv_i64 t = tcg_temp_new_i64();                        \
+    gen_helper_##NAME(t, tcg_env, src1, src2);              \
+    set_pair_regs(ctx, (a->rd) * 2, t);                       \
+    return true;                                            \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_1(NAME)                     \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    return true;                                            \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_2(INSN, HELPER)                \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)    \
+{                                                              \
+    REQUIRE_32BIT(ctx);                                        \
+    REQUIRE_EXT(ctx, RVP);                                     \
+    TCGv src1_0 = get_gpr(ctx, (a->rs1) * 2, EXT_NONE);          \
+    TCGv src2_0 = get_gpr(ctx, (a->rs2) * 2, EXT_NONE);          \
+    TCGv dest_0 = dest_gpr(ctx, (a->rd) * 2);                    \
+    TCGv src1_1 = get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);        \
+    TCGv src2_1 = get_gpr(ctx, (a->rs2) * 2 + 1, EXT_NONE);        \
+    TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1);                  \
+    gen_helper_##HELPER(dest_0, tcg_env, src1_0, src2_0);      \
+    gen_helper_##HELPER(dest_1, tcg_env, src1_1, src2_1);      \
+    return true;                                               \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_2(INSN, HELPER)                \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)    \
+{                                                              \
+    REQUIRE_32BIT(ctx);                                        \
+    return true;                                               \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_3(INSN, HELPER)                \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)    \
+{                                                              \
+    REQUIRE_EXT(ctx, RVP);                                     \
+    TCGv src1_0 = get_gpr(ctx, (a->rs1) * 2, EXT_NONE);          \
+    TCGv dest_0 = dest_gpr(ctx, (a->rd) * 2);                    \
+    TCGv src1_1 = get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);        \
+    TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1);                  \
+    TCGv src2   = get_gpr(ctx, a->rs2, EXT_NONE);              \
+    gen_helper_##HELPER(dest_0, tcg_env, src1_0, src2);        \
+    gen_helper_##HELPER(dest_1, tcg_env, src1_1, src2);        \
+    return true;                                               \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_3(INSN, HELPER)                \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)    \
+{                                                              \
+    REQUIRE_32BIT(ctx);                                        \
+    return true;                                               \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_DW(INSN, HELPER)               \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)     \
+{                                                              \
+    REQUIRE_32BIT(ctx);                                        \
+    REQUIRE_EXT(ctx, RVP);                                     \
+    TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE);          \
+    TCGv src2_0 =3D get_gpr(ctx, (a->rs2) * 2, EXT_NONE);          \
+    TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2);                    \
+    TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);        \
+    TCGv src2_1 =3D get_gpr(ctx, (a->rs2) * 2 + 1, EXT_NONE);        \
+    TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);                  \
+    gen_helper_##HELPER(dest_0, tcg_env, src1_0, src2_0);      \
+    gen_helper_##HELPER(dest_1, tcg_env, src1_1, src2_1);      \
+    return true;                                               \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_DW(INSN, HELPER)               \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)     \
+{                                                              \
+    REQUIRE_32BIT(ctx);                                        \
+    return true;                                               \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_DW_IMM(INSN, HELPER)           \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)     \
+{                                                              \
+    REQUIRE_32BIT(ctx);                                        \
+    REQUIRE_EXT(ctx, RVP);                                     \
+    TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE);          \
+    TCGv imm_0 =3D tcg_constant_tl(a->imm);                      \
+    TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2);                    \
+    TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);        \
+    TCGv imm_1 =3D tcg_constant_tl(a->imm);                      \
+    TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);                  \
+    gen_helper_##HELPER(dest_0, tcg_env, src1_0, imm_0);       \
+    gen_helper_##HELPER(dest_1, tcg_env, src1_1, imm_1);       \
+    return true;                                               \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_DW_IMM(INSN, HELPER)           \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)     \
+{                                                              \
+    REQUIRE_32BIT(ctx);                                        \
+    return true;                                               \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_DW_IMM_2(INSN, HELPER)         \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)     \
+{                                                              \
+    REQUIRE_32BIT(ctx);                                        \
+    REQUIRE_EXT(ctx, RVP);                                     \
+    TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE);          \
+    TCGv imm_0 =3D tcg_constant_tl(a->imm);                      \
+    TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2);                    \
+    TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);        \
+    TCGv imm_1 =3D tcg_constant_tl(a->imm);                      \
+    TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);                  \
+    gen_helper_##HELPER##_32(dest_0, tcg_env, src1_0, imm_0);  \
+    gen_helper_##HELPER##_32(dest_1, tcg_env, src1_1, imm_1);  \
+    return true;                                               \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_DW_IMM_2(INSN, HELPER)         \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)     \
+{                                                              \
+    REQUIRE_32BIT(ctx);                                        \
+    return true;                                               \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_5(INSN, HELPER)                \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)    \
+{                                                              \
+    REQUIRE_EXT(ctx, RVP);                                     \
+    TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE);          \
+    TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2);                    \
+    TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);        \
+    TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);                  \
+    gen_helper_##HELPER(dest_0, tcg_env, src1_0);              \
+    gen_helper_##HELPER(dest_1, tcg_env, src1_1);              \
+    gen_set_gpr(ctx, (a->rd) * 2, dest_0);                       \
+    gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1);                     \
+    return true;                                               \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_5(INSN, HELPER)                \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)    \
+{                                                              \
+    REQUIRE_32BIT(ctx);                                        \
+    return true;                                               \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_IMM(NAME)                   \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    REQUIRE_EXT(ctx, RVP);                                  \
+    TCGv src1 =3D get_gpr(ctx, a->rs1, EXT_NONE);             \
+    TCGv_i32 imm =3D tcg_constant_i32(a->imm);                \
+    TCGv_i64 t =3D tcg_temp_new_i64();                        \
+    gen_helper_##NAME(t, tcg_env, src1, imm);               \
+    set_pair_regs(ctx, (a->rd) * 2, t);                       \
+    return true;                                            \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_IMM(NAME)                   \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    return true;                                            \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_IMM_2(INSN, HELPER)          \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)  \
+{                                                            \
+    REQUIRE_32BIT(ctx);                                      \
+    REQUIRE_EXT(ctx, RVP);                                   \
+    TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE);        \
+    TCGv imm_0 =3D tcg_constant_tl(a->imm);                    \
+    TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2);                  \
+    TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);      \
+    TCGv imm_1 =3D tcg_constant_tl(a->imm);                    \
+    TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);                \
+    gen_helper_##HELPER(dest_0, tcg_env, src1_0, imm_0);     \
+    gen_helper_##HELPER(dest_1, tcg_env, src1_1, imm_1);     \
+    return true;                                             \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_IMM_2(INSN, HELPER)          \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)  \
+{                                                            \
+    REQUIRE_32BIT(ctx);                                      \
+    return true;                                             \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_ACC_REG_PAIR_1(NAME)                 \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    REQUIRE_EXT(ctx, RVP);                                  \
+    TCGv src1 =3D get_gpr(ctx, a->rs1, EXT_NONE);             \
+    TCGv src2 =3D get_gpr(ctx, a->rs2, EXT_NONE);             \
+    TCGv_i64 t =3D tcg_temp_new_i64();                        \
+    if (a->rd =3D=3D 0) {                                         \
+        tcg_gen_movi_i64(t, 0);                             \
+    } else {                                                  \
+        get_pair_regs(ctx, t, (a->rd) * 2);                   \
+    }                                                       \
+    gen_helper_##NAME(t, tcg_env, src1, src2, t);           \
+    set_pair_regs(ctx, (a->rd) * 2, t);                       \
+    return true;                                            \
+}
+#else
+#define GEN_SIMD_TRANS_ACC_REG_PAIR_1(NAME)                 \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    return true;                                            \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_PREDSUM(NAME)               \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    REQUIRE_EXT(ctx, RVP);                                  \
+    TCGv_i32 src1_l;                                        \
+    TCGv_i32 src1_h;                                        \
+    TCGv_i32 src2 =3D get_gpr(ctx, a->rs2, EXT_NONE);         \
+    TCGv_i32 dest =3D dest_gpr(ctx, a->rd);                   \
+    if (a->rs1 =3D=3D 0) {                                        \
+        src1_l =3D tcg_temp_new_i32();                        \
+        src1_h =3D tcg_temp_new_i32();                        \
+        tcg_gen_movi_i32(src1_l, 0);                        \
+        tcg_gen_movi_i32(src1_h, 0);                        \
+    } else {                                                  \
+        src1_l =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE);        \
+        src1_h =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);      \
+    }                                                       \
+    gen_helper_##NAME(dest, tcg_env, src1_l, src1_h, src2); \
+    return true;                                            \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_PREDSUM(NAME)               \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                           \
+    REQUIRE_32BIT(ctx);                                     \
+    return true;                                            \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_PN_OP_IMM(NAME)                     \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                          \
+    REQUIRE_32BIT(ctx);                                    \
+    REQUIRE_EXT(ctx, RVP);                                 \
+    TCGv_i64 s1 =3D tcg_temp_new_i64();                      \
+    if (a->rs1 =3D=3D 0) {                                     \
+        tcg_gen_movi_i64(s1, 0);                           \
+    } else {                                               \
+        get_pair_regs(ctx, s1, a->rs1 * 2);                \
+    }                                                      \
+    TCGv shamt =3D tcg_constant_tl(a->imm);                  \
+    TCGv_i32 dest =3D dest_gpr(ctx, a->rd);                  \
+    gen_helper_##NAME(dest, tcg_env, s1, shamt);           \
+    return true;                                           \
+}
+#else
+#define GEN_SIMD_TRANS_PN_OP_IMM(NAME)                     \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                          \
+    REQUIRE_32BIT(ctx);                                    \
+    return true;                                           \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_PN_OP_REG(NAME)                     \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                          \
+    REQUIRE_32BIT(ctx);                                    \
+    REQUIRE_EXT(ctx, RVP);                                 \
+    TCGv_i64 s1 =3D tcg_temp_new_i64();                      \
+    if (a->rs1 =3D=3D 0) {                                     \
+        tcg_gen_movi_i64(s1, 0);                           \
+    } else {                                               \
+        get_pair_regs(ctx, s1, a->rs1 * 2);                \
+    }                                                      \
+    TCGv_i32 rs2 =3D get_gpr(ctx, a->rs2, EXT_NONE);         \
+    TCGv_i32 dest =3D dest_gpr(ctx, a->rd);                  \
+    gen_helper_##NAME(dest, tcg_env, s1, rs2);             \
+    return true;                                           \
+}
+#else
+#define GEN_SIMD_TRANS_PN_OP_REG(NAME)                     \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{                                                          \
+    REQUIRE_32BIT(ctx);                                    \
+    return true;                                           \
+}
+#endif
+
+GEN_SIMD_TRANS(padd_b)
+GEN_SIMD_TRANS(padd_h)
+GEN_SIMD_TRANS_64(padd_w)
+GEN_SIMD_TRANS(padd_bs)
+GEN_SIMD_TRANS(padd_hs)
+GEN_SIMD_TRANS_64(padd_ws)
+GEN_SIMD_TRANS(psub_b)
+GEN_SIMD_TRANS(psub_h)
+GEN_SIMD_TRANS_64(psub_w)
+GEN_SIMD_TRANS(psh1add_h)
+GEN_SIMD_TRANS_64(psh1add_w)
+GEN_SIMD_TRANS(pssh1sadd_h)
+GEN_SIMD_TRANS_64(pssh1sadd_w)
+GEN_SIMD_TRANS_32(ssh1sadd)
+GEN_SIMD_TRANS(psadd_b)
+GEN_SIMD_TRANS(psadd_h)
+GEN_SIMD_TRANS_64(psadd_w)
+GEN_SIMD_TRANS(psaddu_b)
+GEN_SIMD_TRANS(psaddu_h)
+GEN_SIMD_TRANS_64(psaddu_w)
+GEN_SIMD_TRANS_32(sadd)
+GEN_SIMD_TRANS_32(saddu)
+GEN_SIMD_TRANS(pssub_b)
+GEN_SIMD_TRANS(pssub_h)
+GEN_SIMD_TRANS_64(pssub_w)
+GEN_SIMD_TRANS(pssubu_b)
+GEN_SIMD_TRANS(pssubu_h)
+GEN_SIMD_TRANS_64(pssubu_w)
+GEN_SIMD_TRANS_32(ssub)
+GEN_SIMD_TRANS_32(ssubu)
+GEN_SIMD_TRANS_IMM(psati_h)
+GEN_SIMD_TRANS_IMM(pusati_h)
+GEN_SIMD_TRANS_IMM_64(psati_w)
+GEN_SIMD_TRANS_IMM_64(pusati_w)
+GEN_SIMD_TRANS_IMM_32(sati_32)
+GEN_SIMD_TRANS_IMM_32(usati_32)
+GEN_SIMD_TRANS_IMM_64(sati_64)
+GEN_SIMD_TRANS_IMM_64(usati_64)
diff --git a/target/riscv/meson.build b/target/riscv/meson.build
index 79f36abd63..45ed7f8d8a 100644
--- a/target/riscv/meson.build
+++ b/target/riscv/meson.build
@@ -28,7 +28,8 @@ riscv_ss.add(files(
   'm128_helper.c',
   'crypto_helper.c',
   'zce_helper.c',
-  'vcrypto_helper.c'
+  'vcrypto_helper.c',
+  'psimd_helper.c'
 ))
=20
 riscv_system_ss =3D ss.source_set()
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
new file mode 100644
index 0000000000..a754ee3b5e
--- /dev/null
+++ b/target/riscv/psimd_helper.c
@@ -0,0 +1,1069 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* RISC-V Packed SIMD Extension Helpers for QEMU. */
+/* Copyright (C) 2026 ISRC ISCAS. */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "qemu/host-utils.h"
+#include "exec/helper-proto.h"
+#include "fpu/softfloat.h"
+#include "internals.h"
+
+
+/* Helper macros */
+
+/* Element count calculations */
+#define ELEMS_B(target) (sizeof(target) * 8 / 8)    /* byte elements count=
 */
+#define ELEMS_H(target) (sizeof(target) * 8 / 16)
+#define ELEMS_W(target) (sizeof(target) * 8 / 32)   /* word elements count=
 */
+
+/* Element extraction macros - unsigned to avoid sign extension */
+#define EXTRACT8(val, idx)  (((val) >> ((idx) * 8)) & 0xFF)
+#define EXTRACT16(val, idx) (((val) >> ((idx) * 16)) & 0xFFFF)
+#define EXTRACT32(val, idx) (((val) >> ((idx) * 32)) & 0xFFFFFFFF)
+
+/* Element insertion macros */
+#define INSERT8(val, res, idx) \
+    ((val) | ((target_ulong)(uint8_t)(res) << ((idx) * 8)))
+#define INSERT16(val, res, idx) \
+    ((val) | ((target_ulong)(uint16_t)(res) << ((idx) * 16)))
+#define INSERT32(val, res, idx) \
+    ((val) | ((uint64_t)(uint32_t)(res) << ((idx) * 32)))
+
+/* Saturation constants */
+static const int8_t   SAT_MAX_B =3D 127;
+static const int8_t   SAT_MIN_B =3D -128;
+static const int16_t  SAT_MAX_H =3D 32767;
+static const int16_t  SAT_MIN_H =3D -32768;
+static const int32_t  SAT_MAX_W =3D 2147483647;
+static const int32_t  SAT_MIN_W =3D -2147483648LL;
+static const uint8_t  USAT_MAX_B =3D 255;
+static const uint16_t USAT_MAX_H =3D 65535;
+static const uint32_t USAT_MAX_W =3D 4294967295U;
+
+
+/* Saturation helper functions */
+
+/**
+ * Signed saturation for 8-bit elements
+ * Returns saturated value and sets *sat if saturation occurred
+ */
+static inline int8_t signed_saturate_b(int32_t val, int *sat)
+{
+    if (val > SAT_MAX_B) {
+        *sat =3D 1;
+        return SAT_MAX_B;
+    }
+    if (val < SAT_MIN_B) {
+        *sat =3D 1;
+        return SAT_MIN_B;
+    }
+    return (int8_t)val;
+}
+
+/**
+ * Signed saturation for 16-bit elements
+ */
+static inline int16_t signed_saturate_h(int32_t val, int *sat)
+{
+    if (val > SAT_MAX_H) {
+        *sat =3D 1;
+        return SAT_MAX_H;
+    }
+    if (val < SAT_MIN_H) {
+        *sat =3D 1;
+        return SAT_MIN_H;
+    }
+    return (int16_t)val;
+}
+
+/**
+ * Signed saturation for 32-bit elements
+ */
+static inline int32_t signed_saturate_w(int64_t val, int *sat)
+{
+    if (val > SAT_MAX_W) {
+        *sat =3D 1;
+        return SAT_MAX_W;
+    }
+    if (val < SAT_MIN_W) {
+        *sat =3D 1;
+        return SAT_MIN_W;
+    }
+    return (int32_t)val;
+}
+
+/**
+ * Unsigned saturation for 8-bit elements
+ */
+static inline uint8_t unsigned_saturate_b(uint32_t val, int *sat)
+{
+    if (val > USAT_MAX_B) {
+        *sat =3D 1;
+        return USAT_MAX_B;
+    }
+    return (uint8_t)val;
+}
+
+/**
+ * Unsigned saturation for 16-bit elements
+ */
+static inline uint16_t unsigned_saturate_h(uint32_t val, int *sat)
+{
+    if (val > USAT_MAX_H) {
+        *sat =3D 1;
+        return USAT_MAX_H;
+    }
+    return (uint16_t)val;
+}
+
+/**
+ * Unsigned saturation for 32-bit elements
+ */
+static inline uint32_t unsigned_saturate_w(uint64_t val, int *sat)
+{
+    if (val > USAT_MAX_W) {
+        *sat =3D 1;
+        return USAT_MAX_W;
+    }
+    return (uint32_t)val;
+}
+
+/* Basic addition operations (non-saturating) */
+
+/**
+ * PADD.B - Packed 8-bit addition
+ * For each byte: rd[i] =3D rs1[i] + rs2[i] (modular)
+ */
+target_ulong HELPER(padd_b)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i);
+        uint8_t res =3D e1 + e2;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PADD.H - Packed 16-bit addition
+ * For each halfword: rd[i] =3D rs1[i] + rs2[i] (modular)
+ */
+target_ulong HELPER(padd_h)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D e1 + e2;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PADD.W - Packed 32-bit addition (RV64 only)
+ * For each word: rd[i] =3D rs1[i] + rs2[i] (modular)
+ */
+uint64_t HELPER(padd_w)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;  /* 2 words in 64-bit */
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t res =3D e1 + e2;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PADD.BS - Packed 8-bit addition with scalar second operand
+ * For each byte: rd[i] =3D rs1[i] + rs2[0] (modular)
+ */
+target_ulong HELPER(padd_bs)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+    uint8_t e2 =3D EXTRACT8(rs2, 0);  /* Scalar, take least significant by=
te */
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t res =3D e1 + e2;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PADD.HS - Packed 16-bit addition with scalar second operand
+ * For each halfword: rd[i] =3D rs1[i] + rs2[0] (modular)
+ */
+target_ulong HELPER(padd_hs)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    uint16_t e2 =3D EXTRACT16(rs2, 0);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t res =3D e1 + e2;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PADD.WS - Packed 32-bit addition with scalar second operand (RV64 only)
+ * For each word: rd[i] =3D rs1[i] + rs2[0] (modular)
+ */
+uint64_t HELPER(padd_ws)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    uint32_t e2 =3D EXTRACT32(rs2, 0);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t res =3D e1 + e2;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+
+/* Basic subtraction operations (non-saturating) */
+
+/**
+ * PSUB.B - Packed 8-bit subtraction
+ * For each byte: rd[i] =3D rs1[i] - rs2[i] (modular)
+ */
+target_ulong HELPER(psub_b)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i);
+        uint8_t res =3D e1 - e2;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSUB.H - Packed 16-bit subtraction
+ * For each halfword: rd[i] =3D rs1[i] - rs2[i] (modular)
+ */
+target_ulong HELPER(psub_h)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D e1 - e2;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSUB.W - Packed 32-bit subtraction (RV64 only)
+ * For each word: rd[i] =3D rs1[i] - rs2[i] (modular)
+ */
+uint64_t HELPER(psub_w)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t res =3D e1 - e2;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/* Shift-left-by-one and add operations */
+
+/**
+ * PSH1ADD.H - Shift left by 1 and add (16-bit)
+ * For each halfword: rd[i] =3D (rs1[i] << 1) + rs2[i]
+ */
+target_ulong HELPER(psh1add_h)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D (e1 << 1) + e2;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSH1ADD.W - Shift left by 1 and add (32-bit, RV64 only)
+ * For each word: rd[i] =3D (rs1[i] << 1) + rs2[i]
+ */
+uint64_t HELPER(psh1add_w)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t res =3D (e1 << 1) + e2;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSSH1SADD.H - Saturating shift left by 1 and saturating add (16-bit)
+ * For each halfword: rd[i] =3D sat16(sat16(rs1[i] << 1) + rs2[i])
+ */
+target_ulong HELPER(pssh1sadd_h)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        int32_t shifted;
+
+        /* Check if shift-left-1 would overflow */
+        if (e1 > 0x3FFF || e1 < -0x4000) {
+            shifted =3D (e1 < 0) ? SAT_MIN_H : SAT_MAX_H;
+            sat =3D 1;
+        } else {
+            shifted =3D e1 << 1;
+        }
+
+        int32_t sum =3D shifted + e2;
+        int16_t res =3D signed_saturate_h(sum, &sat);
+        rd =3D INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSH1SADD.W - Saturating shift left by 1 and saturating add
+ * (32-bit, RV64 only)
+ * For each word: rd[i] =3D sat32(sat32(rs1[i] << 1) + rs2[i])
+ */
+uint64_t HELPER(pssh1sadd_w)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        int64_t shifted;
+
+        /* Check if shift-left-1 would overflow */
+        if (e1 > 0x3FFFFFFF || e1 < -0x40000000) {
+            shifted =3D (e1 < 0) ? SAT_MIN_W : SAT_MAX_W;
+            sat =3D 1;
+        } else {
+            shifted =3D (int64_t)e1 << 1;
+        }
+
+        int64_t sum =3D shifted + e2;
+        int32_t res =3D signed_saturate_w(sum, &sat);
+        rd =3D INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * SSH1SADD - 32-bit scalar saturating shift left by 1 and saturating add
+ */
+uint32_t HELPER(ssh1sadd)(CPURISCVState *env,
+                     uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    int32_t b =3D (int32_t)rs2;
+    int64_t shifted;
+    int sat =3D 0;
+
+    /* Check if shift-left-1 would overflow */
+    if (a > 0x3FFFFFFF || a < -0x40000000) {
+        shifted =3D (a < 0) ? SAT_MIN_W : SAT_MAX_W;
+        sat =3D 1;
+    } else {
+        shifted =3D (int64_t)a << 1;
+    }
+
+    int64_t sum =3D shifted + b;
+    int32_t res =3D signed_saturate_w(sum, &sat);
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return (uint32_t)res;
+}
+
+/* Saturating addition operations */
+
+/**
+ * PSADD.B - Packed 8-bit signed saturating addition
+ * For each byte: rd[i] =3D sat8(rs1[i] + rs2[i])
+ */
+target_ulong HELPER(psadd_b)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t e1 =3D (int8_t)EXTRACT8(rs1, i);
+        int8_t e2 =3D (int8_t)EXTRACT8(rs2, i);
+        int32_t sum =3D (int32_t)e1 + (int32_t)e2;
+        int8_t res =3D signed_saturate_b(sum, &sat);
+        rd =3D INSERT8(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSADD.H - Packed 16-bit signed saturating addition
+ * For each halfword: rd[i] =3D sat16(rs1[i] + rs2[i])
+ */
+target_ulong HELPER(psadd_h)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        int32_t sum =3D (int32_t)e1 + (int32_t)e2;
+        int16_t res =3D signed_saturate_h(sum, &sat);
+        rd =3D INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSADD.W - Packed 32-bit signed saturating addition (RV64 only)
+ * For each word: rd[i] =3D sat32(rs1[i] + rs2[i])
+ */
+uint64_t HELPER(psadd_w)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        int64_t sum =3D (int64_t)e1 + (int64_t)e2;
+        int32_t res =3D signed_saturate_w(sum, &sat);
+        rd =3D INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSADDU.B - Packed 8-bit unsigned saturating addition
+ * For each byte: rd[i] =3D usat8(rs1[i] + rs2[i])
+ */
+target_ulong HELPER(psaddu_b)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i);
+        uint32_t sum =3D (uint32_t)e1 + (uint32_t)e2;
+        uint8_t res =3D unsigned_saturate_b(sum, &sat);
+        rd =3D INSERT8(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSADDU.H - Packed 16-bit unsigned saturating addition
+ * For each halfword: rd[i] =3D usat16(rs1[i] + rs2[i])
+ */
+target_ulong HELPER(psaddu_h)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint32_t sum =3D (uint32_t)e1 + (uint32_t)e2;
+        uint16_t res =3D unsigned_saturate_h(sum, &sat);
+        rd =3D INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSADDU.W - Packed 32-bit unsigned saturating addition (RV64 only)
+ * For each word: rd[i] =3D usat32(rs1[i] + rs2[i])
+ */
+uint64_t HELPER(psaddu_w)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint64_t sum =3D (uint64_t)e1 + (uint64_t)e2;
+        uint32_t res =3D unsigned_saturate_w(sum, &sat);
+        rd =3D INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * SADD - 32-bit signed saturating addition
+ */
+uint32_t HELPER(sadd)(CPURISCVState *env,
+                     uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    int32_t b =3D (int32_t)rs2;
+    int64_t sum =3D (int64_t)a + (int64_t)b;
+    int sat =3D 0;
+    int32_t res =3D signed_saturate_w(sum, &sat);
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return (uint32_t)res;
+}
+
+/**
+ * SADDU - 32-bit unsigned saturating addition
+ */
+uint32_t HELPER(saddu)(CPURISCVState *env,
+                     uint32_t rs1, uint32_t rs2)
+{
+    uint32_t a =3D rs1;
+    uint32_t b =3D rs2;
+    uint64_t sum =3D (uint64_t)a + (uint64_t)b;
+    int sat =3D 0;
+    uint32_t res =3D unsigned_saturate_w(sum, &sat);
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return res;
+}
+
+/* Saturating subtraction operations */
+
+/**
+ * PSSUB.B - Packed 8-bit signed saturating subtraction
+ * For each byte: rd[i] =3D sat8(rs1[i] - rs2[i])
+ */
+target_ulong HELPER(pssub_b)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t e1 =3D (int8_t)EXTRACT8(rs1, i);
+        int8_t e2 =3D (int8_t)EXTRACT8(rs2, i);
+        int32_t diff =3D (int32_t)e1 - (int32_t)e2;
+        int8_t res =3D signed_saturate_b(diff, &sat);
+        rd =3D INSERT8(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSUB.H - Packed 16-bit signed saturating subtraction
+ * For each halfword: rd[i] =3D sat16(rs1[i] - rs2[i])
+ */
+target_ulong HELPER(pssub_h)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        int32_t diff =3D (int32_t)e1 - (int32_t)e2;
+        int16_t res =3D signed_saturate_h(diff, &sat);
+        rd =3D INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSUB.W - Packed 32-bit signed saturating subtraction (RV64 only)
+ * For each word: rd[i] =3D sat32(rs1[i] - rs2[i])
+ */
+uint64_t HELPER(pssub_w)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        int64_t diff =3D (int64_t)e1 - (int64_t)e2;
+        int32_t res =3D signed_saturate_w(diff, &sat);
+        rd =3D INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSUBU.B - Packed 8-bit unsigned saturating subtraction
+ * For each byte: rd[i] =3D usat8(rs1[i] - rs2[i])
+ */
+target_ulong HELPER(pssubu_b)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i);
+        uint8_t res =3D (e1 >=3D e2) ? (e1 - e2) : 0;  /* clamp at 0 */
+        sat |=3D (e1 < e2);
+        rd =3D INSERT8(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSUBU.H - Packed 16-bit unsigned saturating subtraction
+ * For each halfword: rd[i] =3D usat16(rs1[i] - rs2[i])
+ */
+target_ulong HELPER(pssubu_h)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D (e1 >=3D e2) ? (e1 - e2) : 0;  /* clamp at 0 */
+        sat |=3D (e1 < e2);
+        rd =3D INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSUBU.W - Packed 32-bit unsigned saturating subtraction (RV64 only)
+ * For each word: rd[i] =3D usat32(rs1[i] - rs2[i])
+ */
+uint64_t HELPER(pssubu_w)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t res =3D (e1 >=3D e2) ? (e1 - e2) : 0;
+        if (e1 < e2) {
+            sat =3D 1;
+        }
+        rd =3D INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * SSUB - 32-bit signed saturating subtraction
+ */
+uint32_t HELPER(ssub)(CPURISCVState *env,
+                     uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    int32_t b =3D (int32_t)rs2;
+    int64_t diff =3D (int64_t)a - (int64_t)b;
+    int sat =3D 0;
+    int32_t res =3D signed_saturate_w(diff, &sat);
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return (uint32_t)res;
+}
+
+/**
+ * SSUBU - 32-bit unsigned saturating subtraction
+ */
+uint32_t HELPER(ssubu)(CPURISCVState *env,
+                     uint32_t rs1, uint32_t rs2)
+{
+    uint32_t a =3D rs1;
+    uint32_t b =3D rs2;
+    /* (uint64_t)a - b wraps on underflow, so clamp to 0 explicitly */
+    int sat =3D (a < b);
+    uint32_t res =3D sat ? 0 : (a - b);
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return res;
+}
+
+/* Saturation instructions (SAT, USAT) */
+
+/**
+ * PSATI.H - Packed 16-bit signed saturate to immediate bit-width
+ * For each halfword: rd[i] =3D sat(rs1[i], imm+1 bits)
+ */
+target_ulong HELPER(psati_h)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong imm)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int range =3D (imm & 0x0F) + 1;  /* imm specifies bits-1 */
+    int64_t max =3D (1LL << (range - 1)) - 1;
+    int64_t min =3D -(1LL << (range - 1));
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t res;
+
+        if (e1 > max) {
+            res =3D max;
+            sat =3D 1;
+        } else if (e1 < min) {
+            res =3D min;
+            sat =3D 1;
+        } else {
+            res =3D e1;
+        }
+
+        rd =3D INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PUSATI.H - Packed 16-bit unsigned saturate to immediate bit-width
+ * For each halfword: rd[i] =3D usat(rs1[i], imm bits)
+ */
+target_ulong HELPER(pusati_h)(CPURISCVState *env,
+                          target_ulong rs1, target_ulong imm)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    uint32_t max =3D (1U << imm) - 1;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t res;
+
+        if (e1 < 0) {
+            res =3D 0;
+            sat =3D 1;
+        } else if ((uint16_t)e1 > max) {
+            res =3D max;
+            sat =3D 1;
+        } else {
+            res =3D e1;
+        }
+
+        rd =3D INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSATI.W - Packed 32-bit signed saturate to imm bit-width (RV64 only)
+ * For each word: rd[i] =3D sat(rs1[i], imm+1 bits)
+ */
+uint64_t HELPER(psati_w)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t imm)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int range =3D (imm & 0x1F) + 1;
+    int64_t max =3D (1LL << (range - 1)) - 1;
+    int64_t min =3D -(1LL << (range - 1));
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t res;
+
+        if (e1 > max) {
+            res =3D max;
+            sat =3D 1;
+        } else if (e1 < min) {
+            res =3D min;
+            sat =3D 1;
+        } else {
+            res =3D e1;
+        }
+
+        rd =3D INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PUSATI.W - Packed 32-bit unsigned saturate to imm bit-width (RV64 only)
+ * For each word: rd[i] =3D usat(rs1[i], imm bits)
+ */
+uint64_t HELPER(pusati_w)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t imm)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    uint64_t max =3D (1ULL << imm) - 1;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t res;
+
+        if (e1 < 0) {
+            res =3D 0;
+            sat =3D 1;
+        } else if ((uint32_t)e1 > max) {
+            res =3D max;
+            sat =3D 1;
+        } else {
+            res =3D e1;
+        }
+
+        rd =3D INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * SATI_32 - 32-bit scalar signed saturation with immediate range
+ */
+uint32_t HELPER(sati_32)(CPURISCVState *env,
+                         uint32_t rs1, uint32_t imm)
+{
+    int32_t a =3D (int32_t)rs1;
+    int range =3D (imm & 0x1F) + 1;  /* imm specifies bits-1 */
+    int64_t max =3D (1LL << (range - 1)) - 1;
+    int64_t min =3D -(1LL << (range - 1));
+    int sat =3D 0;
+
+    if (a > max) {
+        a =3D max;
+        sat =3D 1;
+    } else if (a < min) {
+        a =3D min;
+        sat =3D 1;
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return (uint32_t)a;
+}
+
+/**
+ * USATI_32 - 32-bit scalar unsigned saturation with immediate range
+ */
+uint32_t HELPER(usati_32)(CPURISCVState *env,
+                          uint32_t rs1, uint32_t imm)
+{
+    int32_t a =3D (int32_t)rs1;
+    uint32_t max =3D (1U << imm) - 1;
+    int sat =3D 0;
+
+    if (a < 0) {
+        a =3D 0;
+        sat =3D 1;
+    } else if ((uint32_t)a > max) {
+        a =3D max;
+        sat =3D 1;
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return (uint32_t)a;
+}
+
+/**
+ * SATI_64 - 64-bit scalar signed saturation with immediate range
+ */
+uint64_t HELPER(sati_64)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t imm)
+{
+    int64_t a =3D (int64_t)rs1;
+    int range =3D (imm & 0x3F) + 1;
+    int64_t max =3D (1LL << (range - 1)) - 1;
+    int64_t min =3D -(1LL << (range - 1));
+    int sat =3D 0;
+
+    if (a > max) {
+        a =3D max;
+        sat =3D 1;
+    } else if (a < min) {
+        a =3D min;
+        sat =3D 1;
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return (uint64_t)a;
+}
+
+/**
+ * USATI_64 - 64-bit scalar unsigned saturation with immediate range
+ */
+uint64_t HELPER(usati_64)(CPURISCVState *env,
+                     uint64_t rs1, uint64_t imm)
+{
+    int64_t a =3D (int64_t)rs1;
+    uint64_t max =3D (1ULL << imm) - 1;
+    int sat =3D 0;
+
+    if (a < 0) {
+        a =3D 0;
+        sat =3D 1;
+    } else if ((uint64_t)a > max) {
+        a =3D max;
+        sat =3D 1;
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return (uint64_t)a;
+}
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 81087e0a5d..de3ec7a7ec 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -1206,6 +1206,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, t=
arget_ulong pc)
 #include "insn_trans/trans_rvh.c.inc"
 #include "insn_trans/trans_rvv.c.inc"
 #include "insn_trans/trans_rvb.c.inc"
+#include "insn_trans/trans_rvp.c.inc"
 #include "insn_trans/trans_rvzicond.c.inc"
 #include "insn_trans/trans_rvzacas.c.inc"
 #include "insn_trans/trans_rvzabha.c.inc"
--=20
2.34.1
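
For reference, the per-lane unsigned saturating subtraction used by the
PSSUBU helpers above can be sketched outside QEMU as follows. This is a
minimal standalone illustration, not the patch's code: lane8_get,
lane8_set and pssubu_b_sketch are hypothetical names standing in for the
EXTRACT8/INSERT8 macros and the helper, and the register is assumed to be
64 bits wide. The lanes are compared before subtracting, so the result
never depends on how a wrapped unsigned difference is interpreted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for EXTRACT8: read byte lane i of v. */
static uint8_t lane8_get(uint64_t v, int i)
{
    assert(i >= 0 && i < 8);
    return (uint8_t)(v >> (8 * i));
}

/* Hypothetical stand-in for INSERT8: write byte b into lane i of v. */
static uint64_t lane8_set(uint64_t v, uint8_t b, int i)
{
    assert(i >= 0 && i < 8);
    uint64_t mask = (uint64_t)0xFF << (8 * i);
    return (v & ~mask) | ((uint64_t)b << (8 * i));
}

/* Per-byte unsigned saturating subtraction; *sat is set on any underflow. */
static uint64_t pssubu_b_sketch(uint64_t rs1, uint64_t rs2, int *sat)
{
    uint64_t rd = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t e1 = lane8_get(rs1, i);
        uint8_t e2 = lane8_get(rs2, i);

        if (e1 < e2) {
            *sat = 1;   /* underflow: the lane stays clamped at 0 */
        } else {
            rd = lane8_set(rd, (uint8_t)(e1 - e2), i);
        }
    }
    return rd;
}
```

With rs1 holding 0x0510 and rs2 holding 0x0120, lane 0 underflows and
clamps to 0 while lane 1 yields 0x04, so the sketch returns 0x0400 and
sets the flag.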
From nobody Sat May 30 20:13:15 2026
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 03/14] target/riscv: rvp: add averaging operations
Date: Fri, 17 Apr 2026 18:46:40 +0800
Message-Id: <20260417104652.17857-4-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>
---
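
The averaging helpers in this patch widen each element before the add or
subtract, so the intermediate value cannot overflow and no saturation
flag is involved. A minimal standalone sketch of that idea for a single
8-bit lane is given here; aadd8_sketch and asubu8_sketch are hypothetical
names used only for illustration, and the code assumes that >> on a
negative value is an arithmetic shift (checked at run time).

```c
#include <assert.h>
#include <stdint.h>

/* Signed averaging add on one 8-bit lane: widen, add, shift right by 1. */
static int8_t aadd8_sketch(int8_t a, int8_t b)
{
    /* Assumption: the compiler implements >> on negatives arithmetically. */
    assert((-1 >> 1) == -1);

    int16_t sum = (int16_t)(a + b);     /* range -256..254, no overflow */
    return (int8_t)(sum >> 1);          /* rounds toward minus infinity */
}

/* Unsigned averaging subtract on one 8-bit lane. */
static uint8_t asubu8_sketch(uint8_t a, uint8_t b)
{
    /* The 16-bit wrap keeps the borrow, which becomes the top result bit. */
    uint16_t diff = (uint16_t)(a - b);
    return (uint8_t)(diff >> 1);
}
```

For example, averaging 127 with 127 gives 127 instead of wrapping, and
asubu8_sketch(0, 1) gives 0xFF because the borrow is shifted into bit 7.
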
 target/riscv/helper.h                   |  20 ++
 target/riscv/insn32.decode              |  28 ++-
 target/riscv/insn_trans/trans_rvp.c.inc |  20 ++
 target/riscv/psimd_helper.c             | 266 ++++++++++++++++++++++++
 4 files changed, 333 insertions(+), 1 deletion(-)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 76bc6583fb..a72e02b44c 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1353,6 +1353,7 @@ DEF_HELPER_1(ssamoswap_disabled, void, env)
 #endif
=20
 /* Packed SIMD */
+/* Packed SIMD - Arithmetic Operations (Non-Saturating and Saturating) */
 DEF_HELPER_3(padd_b, tl, env, tl, tl)
 DEF_HELPER_3(padd_h, tl, env, tl, tl)
 DEF_HELPER_3(padd_w, i64, env, i64, i64)
@@ -1391,3 +1392,22 @@ DEF_HELPER_3(sati_32, i32, env, i32, i32)
 DEF_HELPER_3(usati_32, i32, env, i32, i32)
 DEF_HELPER_3(sati_64, i64, env, i64, i64)
 DEF_HELPER_3(usati_64, i64, env, i64, i64)
+
+/* Packed SIMD - Averaging and Rounding Operations */
+DEF_HELPER_3(paadd_b, tl, env, tl, tl)
+DEF_HELPER_3(paadd_h, tl, env, tl, tl)
+DEF_HELPER_3(paadd_w, i64, env, i64, i64)
+DEF_HELPER_3(paaddu_b, tl, env, tl, tl)
+DEF_HELPER_3(paaddu_h, tl, env, tl, tl)
+DEF_HELPER_3(paaddu_w, i64, env, i64, i64)
+DEF_HELPER_3(aadd, i32, env, i32, i32)
+DEF_HELPER_3(aaddu, i32, env, i32, i32)
+DEF_HELPER_3(pasub_b, tl, env, tl, tl)
+DEF_HELPER_3(pasub_h, tl, env, tl, tl)
+DEF_HELPER_3(pasub_w, i64, env, i64, i64)
+DEF_HELPER_3(pasubu_b, tl, env, tl, tl)
+DEF_HELPER_3(pasubu_h, tl, env, tl, tl)
+DEF_HELPER_3(pasubu_w, i64, env, i64, i64)
+DEF_HELPER_3(asub, i32, env, i32, i32)
+DEF_HELPER_3(asubu, i32, env, i32, i32)
+
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 6043eb39cf..f609c38638 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1094,7 +1094,7 @@ sd_aqrl  00111 . . ..... ..... 011 ..... 0101111 @ato=
m_st
=20
=20
 # *** P Experimental Extension Version v018 ***
-# Arithmetic Operations(Non-Saturating and Saturating)
+# Packed SIMD - Arithmetic Operations (Non-Saturating and Saturating)
 padd_b     1000010 ..... ..... 000 ..... 0111011 @r
 padd_h     1000000 ..... ..... 000 ..... 0111011 @r
 padd_w     1000001 ..... ..... 000 ..... 0111011 @r
@@ -1149,3 +1149,29 @@ pusati_h   10100 001.... ..... 100 ..... 0011011 @p_=
ui16
 sati_64    111001 ...... ..... 100 ..... 0011011 @p_ui64
 usati_64   101001 ...... ..... 100 ..... 0011011 @p_ui64
=20
+# Packed SIMD - Averaging and Rounding Operations
+paadd_b    1001110 ..... ..... 000 ..... 0111011 @r
+paadd_h    1001100 ..... ..... 000 ..... 0111011 @r
+{
+  aadd     1001101 ..... ..... 000 ..... 0111011 @r
+  paadd_w  1001101 ..... ..... 000 ..... 0111011 @r
+}
+paaddu_b   1011110 ..... ..... 000 ..... 0111011 @r
+paaddu_h   1011100 ..... ..... 000 ..... 0111011 @r
+{
+  aaddu    1011101 ..... ..... 000 ..... 0111011 @r
+  paaddu_w 1011101 ..... ..... 000 ..... 0111011 @r
+}
+pasub_b    1101110 ..... ..... 000 ..... 0111011 @r
+pasub_h    1101100 ..... ..... 000 ..... 0111011 @r
+{
+  asub     1101101 ..... ..... 000 ..... 0111011 @r
+  pasub_w  1101101 ..... ..... 000 ..... 0111011 @r
+}
+pasubu_b   1111110 ..... ..... 000 ..... 0111011 @r
+pasubu_h   1111100 ..... ..... 000 ..... 0111011 @r
+{
+  asubu    1111101 ..... ..... 000 ..... 0111011 @r
+  pasubu_w 1111101 ..... ..... 000 ..... 0111011 @r
+}
+
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr=
ans/trans_rvp.c.inc
index 6f7246b563..e3abb38d18 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -524,6 +524,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME =
* a) \
 }
 #endif
=20
+/* Packed SIMD - Arithmetic Operations (Non-Saturating and Saturating) */
 GEN_SIMD_TRANS(padd_b)
 GEN_SIMD_TRANS(padd_h)
 GEN_SIMD_TRANS_64(padd_w)
@@ -562,3 +563,22 @@ GEN_SIMD_TRANS_IMM_32(sati_32)
 GEN_SIMD_TRANS_IMM_32(usati_32)
 GEN_SIMD_TRANS_IMM_64(sati_64)
 GEN_SIMD_TRANS_IMM_64(usati_64)
+
+/* Packed SIMD - Averaging and Rounding Operations */
+GEN_SIMD_TRANS(paadd_b)
+GEN_SIMD_TRANS(paadd_h)
+GEN_SIMD_TRANS_64(paadd_w)
+GEN_SIMD_TRANS(paaddu_b)
+GEN_SIMD_TRANS(paaddu_h)
+GEN_SIMD_TRANS_64(paaddu_w)
+GEN_SIMD_TRANS_32(aadd)
+GEN_SIMD_TRANS_32(aaddu)
+GEN_SIMD_TRANS(pasub_b)
+GEN_SIMD_TRANS(pasub_h)
+GEN_SIMD_TRANS_64(pasub_w)
+GEN_SIMD_TRANS(pasubu_b)
+GEN_SIMD_TRANS(pasubu_h)
+GEN_SIMD_TRANS_64(pasubu_w)
+GEN_SIMD_TRANS_32(asub)
+GEN_SIMD_TRANS_32(asubu)
+
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
index a754ee3b5e..23c0402de2 100644
--- a/target/riscv/psimd_helper.c
+++ b/target/riscv/psimd_helper.c
@@ -1067,3 +1067,269 @@ uint64_t HELPER(usati_64)(CPURISCVState *env,
     }
     return (uint64_t)a;
 }
+
+/* Averaging Operations (non-saturating) */
+
+/**
+ * PAADD.B - Packed 8-bit signed averaging addition
+ * For each byte: rd[i] =3D (rs1[i] + rs2[i]) >> 1
+ */
+target_ulong HELPER(paadd_b)(CPURISCVState *env, target_ulong rs1,
+                             target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int8_t)EXTRACT8(rs1, i);
+        int16_t e2 =3D (int8_t)EXTRACT8(rs2, i);
+        int16_t avg =3D (e1 + e2) >> 1;
+        rd =3D INSERT8(rd, (int8_t)avg, i);
+    }
+    return rd;
+}
+
+/**
+ * PAADD.H - Packed 16-bit signed averaging addition
+ * For each halfword: rd[i] =3D (rs1[i] + rs2[i]) >> 1
+ */
+target_ulong HELPER(paadd_h)(CPURISCVState *env, target_ulong rs1,
+                             target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int32_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        int32_t avg =3D (e1 + e2) >> 1;
+        rd =3D INSERT16(rd, (int16_t)avg, i);
+    }
+    return rd;
+}
+
+/**
+ * PAADD.W - Packed 32-bit signed averaging addition (RV64 only)
+ * For each word: rd[i] =3D (rs1[i] + rs2[i]) >> 1
+ */
+uint64_t HELPER(paadd_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int64_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int64_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        int64_t avg =3D (e1 + e2) >> 1;
+        rd =3D INSERT32(rd, (int32_t)avg, i);
+    }
+    return rd;
+}
+
+/**
+ * PAADDU.B - Packed 8-bit unsigned averaging addition
+ * For each byte: rd[i] =3D (rs1[i] + rs2[i]) >> 1
+ */
+target_ulong HELPER(paaddu_b)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT8(rs1, i);
+        uint16_t e2 =3D EXTRACT8(rs2, i);
+        uint16_t avg =3D (e1 + e2) >> 1;
+        rd =3D INSERT8(rd, (uint8_t)avg, i);
+    }
+    return rd;
+}
+
+/**
+ * PAADDU.H - Packed 16-bit unsigned averaging addition
+ * For each halfword: rd[i] =3D (rs1[i] + rs2[i]) >> 1
+ */
+target_ulong HELPER(paaddu_h)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT16(rs1, i);
+        uint32_t e2 =3D EXTRACT16(rs2, i);
+        uint32_t avg =3D (e1 + e2) >> 1;
+        rd =3D INSERT16(rd, (uint16_t)avg, i);
+    }
+    return rd;
+}
+
+/**
+ * PAADDU.W - Packed 32-bit unsigned averaging addition (RV64 only)
+ * For each word: rd[i] =3D (rs1[i] + rs2[i]) >> 1
+ */
+uint64_t HELPER(paaddu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint64_t e1 =3D EXTRACT32(rs1, i);
+        uint64_t e2 =3D EXTRACT32(rs2, i);
+        uint64_t avg =3D (e1 + e2) >> 1;
+        rd =3D INSERT32(rd, (uint32_t)avg, i);
+    }
+    return rd;
+}
+
+/**
+ * AADD - 32-bit signed averaging addition
+ */
+uint32_t HELPER(aadd)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int64_t a =3D (int32_t)rs1;
+    int64_t b =3D (int32_t)rs2;
+    return (uint32_t)((a + b) >> 1);
+}
+
+/**
+ * AADDU - 32-bit unsigned averaging addition
+ */
+uint32_t HELPER(aaddu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t a =3D rs1;
+    uint64_t b =3D rs2;
+    return (uint32_t)((a + b) >> 1);
+}
+
+/**
+ * PASUB.B - Packed 8-bit signed averaging subtraction
+ * For each byte: rd[i] =3D (rs1[i] - rs2[i]) >> 1
+ */
+target_ulong HELPER(pasub_b)(CPURISCVState *env, target_ulong rs1,
+                             target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int8_t)EXTRACT8(rs1, i);
+        int16_t e2 =3D (int8_t)EXTRACT8(rs2, i);
+        int16_t avg =3D (e1 - e2) >> 1;
+        rd =3D INSERT8(rd, (int8_t)avg, i);
+    }
+    return rd;
+}
+
+/**
+ * PASUB.H - Packed 16-bit signed averaging subtraction
+ * For each halfword: rd[i] =3D (rs1[i] - rs2[i]) >> 1
+ */
+target_ulong HELPER(pasub_h)(CPURISCVState *env, target_ulong rs1,
+                             target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int32_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        int32_t avg =3D (e1 - e2) >> 1;
+        rd =3D INSERT16(rd, (int16_t)avg, i);
+    }
+    return rd;
+}
+
+/**
+ * PASUB.W - Packed 32-bit signed averaging subtraction (RV64 only)
+ * For each word: rd[i] =3D (rs1[i] - rs2[i]) >> 1
+ */
+uint64_t HELPER(pasub_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int64_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int64_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        int64_t avg =3D (e1 - e2) >> 1;
+        rd =3D INSERT32(rd, (int32_t)avg, i);
+    }
+    return rd;
+}
+
+/**
+ * PASUBU.B - Packed 8-bit unsigned averaging subtraction
+ * For each byte: rd[i] =3D (rs1[i] - rs2[i]) >> 1
+ */
+target_ulong HELPER(pasubu_b)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT8(rs1, i);
+        uint16_t e2 =3D EXTRACT8(rs2, i);
+        uint16_t avg =3D (e1 - e2) >> 1;
+        rd =3D INSERT8(rd, (uint8_t)avg, i);
+    }
+    return rd;
+}
+
+/**
+ * PASUBU.H - Packed 16-bit unsigned averaging subtraction
+ * For each halfword: rd[i] =3D (rs1[i] - rs2[i]) >> 1
+ */
+target_ulong HELPER(pasubu_h)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT16(rs1, i);
+        uint32_t e2 =3D EXTRACT16(rs2, i);
+        uint32_t avg =3D (e1 - e2) >> 1;
+        rd =3D INSERT16(rd, (uint16_t)avg, i);
+    }
+    return rd;
+}
+
+/**
+ * PASUBU.W - Packed 32-bit unsigned averaging subtraction (RV64 only)
+ * For each word: rd[i] =3D (rs1[i] - rs2[i]) >> 1
+ */
+uint64_t HELPER(pasubu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint64_t e1 =3D EXTRACT32(rs1, i);
+        uint64_t e2 =3D EXTRACT32(rs2, i);
+        uint64_t avg =3D (e1 - e2) >> 1;
+        rd =3D INSERT32(rd, (uint32_t)avg, i);
+    }
+    return rd;
+}
+
+/**
+ * ASUB - 32-bit signed averaging subtraction
+ */
+uint32_t HELPER(asub)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int64_t a =3D (int32_t)rs1;
+    int64_t b =3D (int32_t)rs2;
+    return (uint32_t)((a - b) >> 1);
+}
+
+/**
+ * ASUBU - 32-bit unsigned averaging subtraction
+ */
+uint32_t HELPER(asubu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t a =3D rs1;
+    uint64_t b =3D rs2;
+    return (uint32_t)((a - b) >> 1);
+}
--=20
2.34.1
From nobody Sat May 30 20:13:15 2026
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 04/14] target/riscv: rvp: add absolute value and difference,
 comparison and mask generation operations
Date: Fri, 17 Apr 2026 18:46:41 +0800
Message-Id: <20260417104652.17857-5-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

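Add helpers, decode patterns and translators for the packed SIMD
absolute value (psabs, abs, absw), absolute difference (pabd, pabdu,
pabdsumu, pabdsumau), mask-generating compare (pmseq, pmslt, pmsltu,
mseq, mslt, msltu) and min/max (pmin, pminu, pmax, pmaxu) instructions.

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>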
---
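Note on the saturating absolute value: the PSABS.B helper in this patch
clamps INT8_MIN to INT8_MAX and records the saturation in vxsat. A minimal
standalone sketch of that per-lane logic follows, assuming a 64-bit register
(eight byte lanes). extract8(), insert8() and psabs_b_sketch() are
hypothetical stand-ins for the series' EXTRACT8/INSERT8 macros and helper,
which are defined outside this patch, and the int flag stands in for
env->vxsat.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for EXTRACT8: read byte lane i of v. */
static uint8_t extract8(uint64_t v, int i)
{
    assert(i >= 0 && i < 8);
    return (uint8_t)(v >> (i * 8));
}

/* Hypothetical stand-in for INSERT8: return v with byte lane i set to e. */
static uint64_t insert8(uint64_t v, uint8_t e, int i)
{
    assert(i >= 0 && i < 8);
    uint64_t mask = (uint64_t)0xFF << (i * 8);
    return (v & ~mask) | ((uint64_t)e << (i * 8));
}

/* Saturating per-byte absolute value; *sat is set if any lane saturated. */
static uint64_t psabs_b_sketch(uint64_t rs1, int *sat)
{
    uint64_t rd = 0;

    for (int i = 0; i < 8; i++) {
        int8_t e = (int8_t)extract8(rs1, i);
        int8_t res;

        if (e == INT8_MIN) {
            res = INT8_MAX;          /* |-128| does not fit in int8_t */
            *sat = 1;
        } else {
            res = (e < 0) ? (int8_t)-e : e;
        }
        rd = insert8(rd, (uint8_t)res, i);
    }
    return rd;
}
```

For rs1 = 0x0005FF80 the lanes 0x80, 0xFF and 0x05 map to 0x7F, 0x01 and
0x05, so the sketch returns 0x0005017F and sets the flag.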
 target/riscv/helper.h                   |  38 ++
 target/riscv/insn32.decode              |  44 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  38 ++
 target/riscv/psimd_helper.c             | 634 ++++++++++++++++++++++++
 4 files changed, 754 insertions(+)
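
Note on the sum-of-absolute-differences helpers: PABDSUMU.B and PABDSUMAU.B
in this patch differ only in the starting value of the sum (zero versus the
old rd). The sketch below folds both into one function, assuming a 64-bit
register with eight byte lanes. lane8() and sad_u8() are hypothetical names
used for illustration only; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for EXTRACT8: read byte lane i of v. */
static uint8_t lane8(uint64_t v, int i)
{
    assert(i >= 0 && i < 8);
    return (uint8_t)(v >> (i * 8));
}

/*
 * Sum of unsigned absolute differences over eight byte lanes, added to
 * an initial accumulator. acc = 0 models PABDSUMU.B; acc = old rd models
 * PABDSUMAU.B.
 */
static uint64_t sad_u8(uint64_t rs1, uint64_t rs2, uint64_t acc)
{
    for (int i = 0; i < 8; i++) {
        uint8_t a = lane8(rs1, i);
        uint8_t b = lane8(rs2, i);

        /* Subtract the smaller from the larger so nothing wraps. */
        acc += (a > b) ? (uint8_t)(a - b) : (uint8_t)(b - a);
    }
    return acc;
}
```

With rs1 = 0x10FF and rs2 = 0x2001 the two non-zero lanes contribute 254 and
16, so sad_u8(rs1, rs2, 0) is 270 and sad_u8(rs1, rs2, 30) is 300. The
largest possible sum for eight lanes is 8 * 255 = 2040.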

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a72e02b44c..f6351ecd43 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1411,3 +1411,41 @@ DEF_HELPER_3(pasubu_w, i64, env, i64, i64)
 DEF_HELPER_3(asub, i32, env, i32, i32)
 DEF_HELPER_3(asubu, i32, env, i32, i32)
=20
+/* Packed SIMD - Absolute Value and Difference Operations */
+DEF_HELPER_2(psabs_b, tl, env, tl)
+DEF_HELPER_2(psabs_h, tl, env, tl)
+DEF_HELPER_2(abs, tl, env, tl)
+DEF_HELPER_2(absw, i64, env, i64)
+DEF_HELPER_3(pabd_b, tl, env, tl, tl)
+DEF_HELPER_3(pabdu_b, tl, env, tl, tl)
+DEF_HELPER_3(pabd_h, tl, env, tl, tl)
+DEF_HELPER_3(pabdu_h, tl, env, tl, tl)
+DEF_HELPER_3(pabdsumu_b, tl, env, tl, tl)
+DEF_HELPER_4(pabdsumau_b, tl, env, tl, tl, tl)
+
+/* Packed SIMD - Comparison, Mask Generation and Min/Max Operations */
+DEF_HELPER_3(pmseq_b, tl, env, tl, tl)
+DEF_HELPER_3(pmslt_b, tl, env, tl, tl)
+DEF_HELPER_3(pmsltu_b, tl, env, tl, tl)
+DEF_HELPER_3(pmin_b, tl, env, tl, tl)
+DEF_HELPER_3(pminu_b, tl, env, tl, tl)
+DEF_HELPER_3(pmax_b, tl, env, tl, tl)
+DEF_HELPER_3(pmaxu_b, tl, env, tl, tl)
+DEF_HELPER_3(pmseq_h, tl, env, tl, tl)
+DEF_HELPER_3(pmslt_h, tl, env, tl, tl)
+DEF_HELPER_3(pmsltu_h, tl, env, tl, tl)
+DEF_HELPER_3(pmin_h, tl, env, tl, tl)
+DEF_HELPER_3(pminu_h, tl, env, tl, tl)
+DEF_HELPER_3(pmax_h, tl, env, tl, tl)
+DEF_HELPER_3(pmaxu_h, tl, env, tl, tl)
+DEF_HELPER_3(pmseq_w, i64, env, i64, i64)
+DEF_HELPER_3(pmslt_w, i64, env, i64, i64)
+DEF_HELPER_3(pmsltu_w, i64, env, i64, i64)
+DEF_HELPER_3(pmin_w, i64, env, i64, i64)
+DEF_HELPER_3(pminu_w, i64, env, i64, i64)
+DEF_HELPER_3(pmax_w, i64, env, i64, i64)
+DEF_HELPER_3(pmaxu_w, i64, env, i64, i64)
+DEF_HELPER_3(mseq, i32, env, i32, i32)
+DEF_HELPER_3(mslt, i32, env, i32, i32)
+DEF_HELPER_3(msltu, i32, env, i32, i32)
+
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index f609c38638..2034041639 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1175,3 +1175,47 @@ pasubu_h   1111100 ..... ..... 000 ..... 0111011 @r
   pasubu_w 1111101 ..... ..... 000 ..... 0111011 @r
 }
=20
+# Packed SIMD - Absolute Value and Difference Operations
+psabs_b    1110010 00111 ..... 010 ..... 0011011 @r2
+psabs_h    1110000 00111 ..... 010 ..... 0011011 @r2
+abs        01100 0000111 ..... 001 ..... 0010011 @r2
+absw       01100 0000111 ..... 001 ..... 0011011 @r2
+pabd_b     1100110 ..... ..... 000 ..... 0111011 @r
+pabdu_b    1110110 ..... ..... 000 ..... 0111011 @r
+pabd_h     1100100 ..... ..... 000 ..... 0111011 @r
+pabdu_h    1110100 ..... ..... 000 ..... 0111011 @r
+pabdsumu_b 1011010 ..... ..... 001 ..... 0111011 @r
+pabdsumau_b 1011110 ..... ..... 001 ..... 0111011 @r
+
+# Packed SIMD - Comparison, Mask Generation and Min/Max Operations
+pmseq_b    1100010 ..... ..... 110 ..... 0111011 @r
+pmslt_b    1101010 ..... ..... 110 ..... 0111011 @r
+pmsltu_b   1101110 ..... ..... 110 ..... 0111011 @r
+pmin_b     1110010 ..... ..... 110 ..... 0111011 @r
+pminu_b    1110110 ..... ..... 110 ..... 0111011 @r
+pmax_b     1111010 ..... ..... 110 ..... 0111011 @r
+pmaxu_b    1111110 ..... ..... 110 ..... 0111011 @r
+pmseq_h    1100000 ..... ..... 110 ..... 0111011 @r
+pmslt_h    1101000 ..... ..... 110 ..... 0111011 @r
+pmsltu_h   1101100 ..... ..... 110 ..... 0111011 @r
+pmin_h     1110000 ..... ..... 110 ..... 0111011 @r
+pminu_h    1110100 ..... ..... 110 ..... 0111011 @r
+pmax_h     1111000 ..... ..... 110 ..... 0111011 @r
+pmaxu_h    1111100 ..... ..... 110 ..... 0111011 @r
+{
+  mseq     1100001 ..... ..... 110 ..... 0111011 @r
+  pmseq_w  1100001 ..... ..... 110 ..... 0111011 @r
+}
+{
+  mslt     1101001 ..... ..... 110 ..... 0111011 @r
+  pmslt_w  1101001 ..... ..... 110 ..... 0111011 @r
+}
+{
+  msltu    1101101 ..... ..... 110 ..... 0111011 @r
+  pmsltu_w 1101101 ..... ..... 110 ..... 0111011 @r
+}
+pmin_w     1110001 ..... ..... 110 ..... 0111011 @r
+pminu_w    1110101 ..... ..... 110 ..... 0111011 @r
+pmax_w     1111001 ..... ..... 110 ..... 0111011 @r
+pmaxu_w    1111101 ..... ..... 110 ..... 0111011 @r
+
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr=
ans/trans_rvp.c.inc
index e3abb38d18..27d482863c 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -582,3 +582,41 @@ GEN_SIMD_TRANS_64(pasubu_w)
 GEN_SIMD_TRANS_32(asub)
 GEN_SIMD_TRANS_32(asubu)
=20
+/* Packed SIMD - Absolute Value and Difference Operations */
+GEN_SIMD_TRANS_R1(psabs_b)
+GEN_SIMD_TRANS_R1(psabs_h)
+GEN_SIMD_TRANS_R1(abs)
+GEN_SIMD_TRANS_R1_64(absw)
+GEN_SIMD_TRANS(pabd_b)
+GEN_SIMD_TRANS(pabdu_b)
+GEN_SIMD_TRANS(pabd_h)
+GEN_SIMD_TRANS(pabdu_h)
+GEN_SIMD_TRANS(pabdsumu_b)
+GEN_SIMD_TRANS_ACC(pabdsumau_b)
+
+/* Packed SIMD - Comparison, Mask Generation and Min/Max Operations */
+GEN_SIMD_TRANS(pmseq_b)
+GEN_SIMD_TRANS(pmslt_b)
+GEN_SIMD_TRANS(pmsltu_b)
+GEN_SIMD_TRANS(pmin_b)
+GEN_SIMD_TRANS(pminu_b)
+GEN_SIMD_TRANS(pmax_b)
+GEN_SIMD_TRANS(pmaxu_b)
+GEN_SIMD_TRANS(pmseq_h)
+GEN_SIMD_TRANS(pmslt_h)
+GEN_SIMD_TRANS(pmsltu_h)
+GEN_SIMD_TRANS(pmin_h)
+GEN_SIMD_TRANS(pminu_h)
+GEN_SIMD_TRANS(pmax_h)
+GEN_SIMD_TRANS(pmaxu_h)
+GEN_SIMD_TRANS_64(pmseq_w)
+GEN_SIMD_TRANS_64(pmslt_w)
+GEN_SIMD_TRANS_64(pmsltu_w)
+GEN_SIMD_TRANS_64(pmin_w)
+GEN_SIMD_TRANS_64(pminu_w)
+GEN_SIMD_TRANS_64(pmax_w)
+GEN_SIMD_TRANS_64(pmaxu_w)
+GEN_SIMD_TRANS_32(mseq)
+GEN_SIMD_TRANS_32(mslt)
+GEN_SIMD_TRANS_32(msltu)
+
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
index 23c0402de2..38207c3a39 100644
--- a/target/riscv/psimd_helper.c
+++ b/target/riscv/psimd_helper.c
@@ -1333,3 +1333,637 @@ uint32_t HELPER(asubu)(CPURISCVState *env, uint32_t=
 rs1, uint32_t rs2)
     uint64_t b =3D rs2;
     return (uint32_t)((a - b) >> 1);
 }
+
+/* Absolute value operations */
+
+/**
+ * PSABS.B - Packed 8-bit absolute value
+ * For each byte: rd[i] =3D abs(rs1[i]); INT8_MIN clamps to INT8_MAX
+ */
+target_ulong HELPER(psabs_b)(CPURISCVState *env, target_ulong rs1)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t e1 =3D (int8_t)EXTRACT8(rs1, i);
+        int8_t res;
+
+        if (e1 =3D=3D INT8_MIN) {
+            res =3D INT8_MAX;
+            sat =3D 1;
+        } else if (e1 < 0) {
+            res =3D -e1;
+        } else {
+            res =3D e1;
+        }
+
+        rd =3D INSERT8(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSABS.H - Packed 16-bit absolute value
+ * For each halfword: rd[i] =3D abs(rs1[i]); INT16_MIN clamps to INT16_MAX
+ */
+target_ulong HELPER(psabs_h)(CPURISCVState *env, target_ulong rs1)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t res;
+
+        if (e1 =3D=3D INT16_MIN) {
+            res =3D INT16_MAX;
+            sat =3D 1;
+        } else if (e1 < 0) {
+            res =3D -e1;
+        } else {
+            res =3D e1;
+        }
+
+        rd =3D INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * ABS - 32/64-bit scalar absolute value
+ */
+target_ulong HELPER(abs)(CPURISCVState *env, target_ulong rs1)
+{
+    target_long a =3D (target_long)rs1;
+    return (a < 0) ? (target_ulong)(-a) : rs1;
+}
+
+/**
+ * ABSW - Absolute value of low 32 bits (RV64)
+ */
+uint64_t HELPER(absw)(CPURISCVState *env, uint64_t rs1)
+{
+    int32_t a =3D (int32_t)EXTRACT32(rs1, 0);
+    uint32_t res;
+
+    if (a =3D=3D INT32_MIN) {
+        res =3D 0x80000000;
+    } else if (a < 0) {
+        res =3D (uint32_t)(-a);
+    } else {
+        res =3D (uint32_t)a;
+    }
+
+    return (uint64_t)res;
+}
+
+
+/* Absolute difference operations */
+
+/**
+ * PABD.B - Packed 8-bit signed absolute difference
+ * For each byte: rd[i] =3D |rs1[i] - rs2[i]|
+ */
+target_ulong HELPER(pabd_b)(CPURISCVState *env,
+                            target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t e1 =3D (int8_t)EXTRACT8(rs1, i);
+        int8_t e2 =3D (int8_t)EXTRACT8(rs2, i);
+        int16_t diff =3D (int16_t)e1 - (int16_t)e2;
+        uint8_t res =3D (diff >=3D 0) ? (uint8_t)diff : (uint8_t)(-diff);
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PABDU.B - Packed 8-bit unsigned absolute difference
+ * For each byte: rd[i] =3D |rs1[i] - rs2[i]|
+ */
+target_ulong HELPER(pabdu_b)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i);
+        uint8_t res =3D (e1 > e2) ? (e1 - e2) : (e2 - e1);
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PABD.H - Packed 16-bit signed absolute difference
+ * For each halfword: rd[i] =3D |rs1[i] - rs2[i]|
+ */
+target_ulong HELPER(pabd_h)(CPURISCVState *env,
+                            target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        int32_t diff =3D (int32_t)e1 - (int32_t)e2;
+        uint16_t res =3D (diff >=3D 0) ? (uint16_t)diff : (uint16_t)(-diff=
);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PABDU.H - Packed 16-bit unsigned absolute difference
+ * For each halfword: rd[i] =3D |rs1[i] - rs2[i]|
+ */
+target_ulong HELPER(pabdu_h)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D (e1 > e2) ? (e1 - e2) : (e2 - e1);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PABDSUMU.B - Sum of unsigned absolute differences
+ * Returns sum(|rs1[i] - rs2[i]|) for all bytes
+ */
+target_ulong HELPER(pabdsumu_b)(CPURISCVState *env,
+                                target_ulong rs1, target_ulong rs2)
+{
+    target_ulong sum =3D 0;
+    int elems =3D ELEMS_B(rs1);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i);
+        uint8_t diff =3D (e1 > e2) ? (e1 - e2) : (e2 - e1);
+        sum +=3D diff;
+    }
+
+    return sum;
+}
+
+/**
+ * PABDSUMAU.B - Accumulated sum of unsigned absolute differences
+ * rd =3D rd + sum(|rs1[i] - rs2[i]|)
+ */
+target_ulong HELPER(pabdsumau_b)(CPURISCVState *env, target_ulong rs1,
+                                 target_ulong rs2, target_ulong rd)
+{
+    target_ulong sum =3D rd;
+    int elems =3D ELEMS_B(rs1);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i);
+        uint8_t diff =3D (e1 > e2) ? (e1 - e2) : (e2 - e1);
+        sum +=3D diff;
+    }
+
+    return sum;
+}
+
+/* Comparison (mask-producing) and min/max operations */
+
+/**
+ * PMSEQ.B - Packed 8-bit equal comparison
+ * For each byte: rd[i] =3D 0xFF if rs1[i] =3D=3D rs2[i], else 0x00
+ */
+target_ulong HELPER(pmseq_b)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i);
+        uint8_t res =3D (e1 =3D=3D e2) ? 0xFF : 0x00;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMSLT.B - Packed 8-bit signed less-than comparison
+ * For each byte: rd[i] =3D 0xFF if rs1[i] < rs2[i], else 0x00
+ */
+target_ulong HELPER(pmslt_b)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t e1 =3D (int8_t)EXTRACT8(rs1, i);
+        int8_t e2 =3D (int8_t)EXTRACT8(rs2, i);
+        uint8_t res =3D (e1 < e2) ? 0xFF : 0x00;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMSLTU.B - Packed 8-bit unsigned less-than comparison
+ * For each byte: rd[i] =3D 0xFF if rs1[i] < rs2[i], else 0x00
+ */
+target_ulong HELPER(pmsltu_b)(CPURISCVState *env,
+                              target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i);
+        uint8_t res =3D (e1 < e2) ? 0xFF : 0x00;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMIN.B - Packed 8-bit signed minimum
+ * For each byte: rd[i] =3D min(rs1[i], rs2[i])
+ */
+target_ulong HELPER(pmin_b)(CPURISCVState *env,
+                            target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t e1 =3D (int8_t)EXTRACT8(rs1, i);
+        int8_t e2 =3D (int8_t)EXTRACT8(rs2, i);
+        int8_t res =3D (e1 < e2) ? e1 : e2;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMINU.B - Packed 8-bit unsigned minimum
+ * For each byte: rd[i] =3D min(rs1[i], rs2[i])
+ */
+target_ulong HELPER(pminu_b)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i);
+        uint8_t res =3D (e1 < e2) ? e1 : e2;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMAX.B - Packed 8-bit signed maximum
+ * For each byte: rd[i] =3D max(rs1[i], rs2[i])
+ */
+target_ulong HELPER(pmax_b)(CPURISCVState *env,
+                            target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t e1 =3D (int8_t)EXTRACT8(rs1, i);
+        int8_t e2 =3D (int8_t)EXTRACT8(rs2, i);
+        int8_t res =3D (e1 > e2) ? e1 : e2;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMAXU.B - Packed 8-bit unsigned maximum
+ * For each byte: rd[i] =3D max(rs1[i], rs2[i])
+ */
+target_ulong HELPER(pmaxu_b)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i);
+        uint8_t res =3D (e1 > e2) ? e1 : e2;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMSEQ.H - Packed 16-bit equal comparison
+ * For each halfword: rd[i] =3D 0xFFFF if rs1[i] =3D=3D rs2[i], else 0x0000
+ */
+target_ulong HELPER(pmseq_h)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D (e1 =3D=3D e2) ? 0xFFFF : 0x0000;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMSLT.H - Packed 16-bit signed less-than comparison
+ * For each halfword: rd[i] =3D 0xFFFF if rs1[i] < rs2[i], else 0x0000
+ */
+target_ulong HELPER(pmslt_h)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        uint16_t res =3D (e1 < e2) ? 0xFFFF : 0x0000;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMSLTU.H - Packed 16-bit unsigned less-than comparison
+ * For each halfword: rd[i] =3D 0xFFFF if rs1[i] < rs2[i], else 0x0000
+ */
+target_ulong HELPER(pmsltu_h)(CPURISCVState *env,
+                              target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D (e1 < e2) ? 0xFFFF : 0x0000;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMIN.H - Packed 16-bit signed minimum
+ * For each halfword: rd[i] =3D min(rs1[i], rs2[i])
+ */
+target_ulong HELPER(pmin_h)(CPURISCVState *env,
+                            target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        int16_t res =3D (e1 < e2) ? e1 : e2;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMINU.H - Packed 16-bit unsigned minimum
+ * For each halfword: rd[i] =3D min(rs1[i], rs2[i])
+ */
+target_ulong HELPER(pminu_h)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D (e1 < e2) ? e1 : e2;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMAX.H - Packed 16-bit signed maximum
+ * For each halfword: rd[i] =3D max(rs1[i], rs2[i])
+ */
+target_ulong HELPER(pmax_h)(CPURISCVState *env,
+                            target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        int16_t res =3D (e1 > e2) ? e1 : e2;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMAXU.H - Packed 16-bit unsigned maximum
+ * For each halfword: rd[i] =3D max(rs1[i], rs2[i])
+ */
+target_ulong HELPER(pmaxu_h)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D (e1 > e2) ? e1 : e2;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMSEQ.W - Packed 32-bit equal comparison (RV64 only)
+ * For each word: rd[i] =3D 0xFFFFFFFF if rs1[i] =3D=3D rs2[i], else 0x000=
00000
+ */
+uint64_t HELPER(pmseq_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t res =3D (e1 =3D=3D e2) ? 0xFFFFFFFFU : 0x00000000U;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMSLT.W - Packed 32-bit signed less-than comparison (RV64 only)
+ * For each word: rd[i] =3D 0xFFFFFFFF if rs1[i] < rs2[i], else 0x00000000
+ */
+uint64_t HELPER(pmslt_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        uint32_t res =3D (e1 < e2) ? 0xFFFFFFFFU : 0x00000000U;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMSLTU.W - Packed 32-bit unsigned less-than comparison (RV64 only)
+ * For each word: rd[i] =3D 0xFFFFFFFF if rs1[i] < rs2[i], else 0x00000000
+ */
+uint64_t HELPER(pmsltu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t res =3D (e1 < e2) ? 0xFFFFFFFFU : 0x00000000U;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMIN.W - Packed 32-bit signed minimum (RV64 only)
+ * For each word: rd[i] =3D min(rs1[i], rs2[i])
+ */
+uint64_t HELPER(pmin_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        int32_t res =3D (e1 < e2) ? e1 : e2;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMINU.W - Packed 32-bit unsigned minimum (RV64 only)
+ * For each word: rd[i] =3D min(rs1[i], rs2[i])
+ */
+uint64_t HELPER(pminu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t res =3D (e1 < e2) ? e1 : e2;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMAX.W - Packed 32-bit signed maximum (RV64 only)
+ * For each word: rd[i] =3D max(rs1[i], rs2[i])
+ */
+uint64_t HELPER(pmax_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        int32_t res =3D (e1 > e2) ? e1 : e2;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMAXU.W - Packed 32-bit unsigned maximum (RV64 only)
+ * For each word: rd[i] =3D max(rs1[i], rs2[i])
+ */
+uint64_t HELPER(pmaxu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t res =3D (e1 > e2) ? e1 : e2;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * MSEQ - 32-bit scalar set if equal (mask)
+ */
+uint32_t HELPER(mseq)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    return (rs1 =3D=3D rs2) ? 0xFFFFFFFFU : 0x00000000U;
+}
+
+/**
+ * MSLT - 32-bit scalar set if signed less than (mask)
+ */
+uint32_t HELPER(mslt)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    return ((int32_t)rs1 < (int32_t)rs2) ? 0xFFFFFFFFU : 0x00000000U;
+}
+
+/**
+ * MSLTU - 32-bit scalar set if unsigned less than (mask)
+ */
+uint32_t HELPER(msltu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    return (rs1 < rs2) ? 0xFFFFFFFFU : 0x00000000U;
+}
+
--=20
2.34.1
From nobody Sat May 30 20:13:15 2026
Delivered-To: importer@patchew.org
Authentication-Results: mx.zohomail.com;
	spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as
 permitted sender)
  smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org
Return-Path: <qemu-devel-bounces+importer=patchew.org@nongnu.org>
Received: from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17]) by
 mx.zohomail.com
	with SMTPS id 1776422857901955.2302875986313;
 Fri, 17 Apr 2026 03:47:37 -0700 (PDT)
Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists1p.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <qemu-devel-bounces@nongnu.org>)
	id 1wDgjS-0001Bn-1U; Fri, 17 Apr 2026 06:47:31 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <xiaoou@iscas.ac.cn>)
 id 1wDgjK-00014V-Ds; Fri, 17 Apr 2026 06:47:22 -0400
Received: from smtp21.cstnet.cn ([159.226.251.21] helo=cstnet.cn)
 by eggs.gnu.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256)
 (Exim 4.90_1) (envelope-from <xiaoou@iscas.ac.cn>)
 id 1wDgjG-0007yy-PO; Fri, 17 Apr 2026 06:47:22 -0400
Received: from Huawei.localdomain (unknown [36.110.52.2])
 by APP-01 (Coremail) with SMTP id qwCowAB3H2ulD+JpLDmSDQ--.804S7;
 Fri, 17 Apr 2026 18:47:12 +0800 (CST)
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 05/14] target/riscv: rvp: add shift operations
Date: Fri, 17 Apr 2026 18:46:42 +0800
Message-Id: <20260417104652.17857-6-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-CM-TRANSID: qwCowAB3H2ulD+JpLDmSDQ--.804S7
X-Originating-IP: [36.110.52.2]
X-CM-SenderInfo: 50ld003x6l2u1dvotugofq/
Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17
 as permitted sender) client-ip=209.51.188.17;
 envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org;
 helo=lists1p.gnu.org;
Received-SPF: pass client-ip=159.226.251.21; envelope-from=xiaoou@iscas.ac.cn;
 helo=cstnet.cn
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: qemu development <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org
X-ZM-MESSAGEID: 1776422860775158500
Content-Type: text/plain; charset="utf-8"

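Add helpers, decode patterns and translators for the packed SIMD shift
instructions: pslli/psll, psrli/psrl and psrai/psra (byte, halfword and
word forms), psslai, sslai, psrari, srari, pssha, psshar, ssha, sshar,
sha and shar. Also add the %imm_p_ui8 field and the @p_ui8 format for
3-bit shift amounts.

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>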
---
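Note on the new immediate field: "%imm_p_ui8 20:3" tells decodetree to take
3 bits starting at bit 20, which is enough for a per-byte shift amount of
0..7. The sketch below shows that extraction on a hand-built pslli_b word.
field() and encode_pslli_b() are hypothetical helpers written for this note
(they are not QEMU functions); the rs1 and rd positions are the standard
RISC-V ones (15:5 and 7:5).

```c
#include <assert.h>
#include <stdint.h>

/* Extract len bits starting at bit pos (a decodetree "pos:len" field). */
static uint32_t field(uint32_t insn, int pos, int len)
{
    assert(pos >= 0 && len > 0 && pos + len <= 32);
    uint32_t mask = (len == 32) ? 0xFFFFFFFFu : ((1u << len) - 1);
    return (insn >> pos) & mask;
}

/*
 * Hypothetical encoder for the pslli_b pattern
 *   10000 0001... ..... 010 ..... 0011011
 * used only to build a test word; it is not part of the patch.
 */
static uint32_t encode_pslli_b(uint32_t rd, uint32_t rs1, uint32_t shamt)
{
    return (0x10u << 27) |            /* bits 31:27 = 10000   */
           (0x1u << 23) |             /* bits 26:23 = 0001    */
           ((shamt & 0x7u) << 20) |   /* bits 22:20 = shamt   */
           ((rs1 & 0x1Fu) << 15) |    /* bits 19:15 = rs1     */
           (0x2u << 12) |             /* bits 14:12 = 010     */
           ((rd & 0x1Fu) << 7) |      /* bits 11:7  = rd      */
           0x1Bu;                     /* bits 6:0   = 0011011 */
}
```

With insn = encode_pslli_b(10, 11, 5), field(insn, 20, 3) yields 5,
field(insn, 15, 5) yields 11 and field(insn, 7, 5) yields 10. The fixed 1
bit directly above the immediate (bit 23 here) is what separates the byte
form from the wider halfword and word encodings.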
 target/riscv/helper.h                   |  34 ++
 target/riscv/insn32.decode              |  44 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  34 ++
 target/riscv/psimd_helper.c             | 736 ++++++++++++++++++++++++
 4 files changed, 848 insertions(+)
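
Note on the braced groups in insn32.decode: sslai/psslai_w, srari_32/psrari_w,
ssha/pssha_ws and sshar/psshar_ws share identical bit patterns, so they are
placed in overlapping groups. In a decodetree overlapping group the patterns
are tried in order, and a translator that returns false hands the
instruction to the next pattern. The sketch below models that fallback. It
assumes the scalar translator accepts only RV32 and the packed-word
translator only RV64; this excerpt does not show the GEN_SIMD_TRANS_32 and
GEN_SIMD_TRANS_64 macros, so that gating is an assumption, and Ctx,
trans_sslai(), trans_psslai_w() and decode_group() are simplified
hypothetical stand-ins rather than QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified translator context: only XLEN matters here. */
typedef struct {
    int xlen;
    const char *chosen;
} Ctx;

typedef bool (*TransFn)(Ctx *);

/* Assumed behaviour: the scalar form is accepted only on RV32. */
static bool trans_sslai(Ctx *c)
{
    if (c->xlen != 32) {
        return false;          /* reject, let the next pattern try */
    }
    c->chosen = "sslai";
    return true;
}

/* Assumed behaviour: the packed-word form is accepted only on RV64. */
static bool trans_psslai_w(Ctx *c)
{
    if (c->xlen != 64) {
        return false;
    }
    c->chosen = "psslai_w";
    return true;
}

/* Try each pattern of an overlapping group in order; first success wins. */
static bool decode_group(Ctx *c)
{
    static const TransFn group[] = { trans_sslai, trans_psslai_w };

    assert(c != NULL);
    for (size_t i = 0; i < sizeof(group) / sizeof(group[0]); i++) {
        if (group[i](c)) {
            return true;
        }
    }
    return false;              /* no pattern accepted the instruction */
}
```

Under these assumptions a context with xlen = 32 selects sslai, one with
xlen = 64 falls through to psslai_w, and any other value is rejected by
both, which a real decoder would report as an illegal instruction.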

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index f6351ecd43..d97552eb58 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1449,3 +1449,37 @@ DEF_HELPER_3(mseq, i32, env, i32, i32)
 DEF_HELPER_3(mslt, i32, env, i32, i32)
 DEF_HELPER_3(msltu, i32, env, i32, i32)
=20
+/* Packed SIMD - Shift Operations */
+DEF_HELPER_3(pslli_b, tl, env, tl, tl)
+DEF_HELPER_3(psll_bs, tl, env, tl, tl)
+DEF_HELPER_3(pslli_h, tl, env, tl, tl)
+DEF_HELPER_3(psll_hs, tl, env, tl, tl)
+DEF_HELPER_3(pslli_w, i64, env, i64, i64)
+DEF_HELPER_3(psll_ws, i64, env, i64, i64)
+DEF_HELPER_3(psrli_b, tl, env, tl, tl)
+DEF_HELPER_3(psrl_bs, tl, env, tl, tl)
+DEF_HELPER_3(psrli_h, tl, env, tl, tl)
+DEF_HELPER_3(psrl_hs, tl, env, tl, tl)
+DEF_HELPER_3(psrli_w, i64, env, i64, i64)
+DEF_HELPER_3(psrl_ws, i64, env, i64, i64)
+DEF_HELPER_3(psrai_b, tl, env, tl, tl)
+DEF_HELPER_3(psra_bs, tl, env, tl, tl)
+DEF_HELPER_3(psrai_h, tl, env, tl, tl)
+DEF_HELPER_3(psra_hs, tl, env, tl, tl)
+DEF_HELPER_3(psrai_w, i64, env, i64, i64)
+DEF_HELPER_3(psra_ws, i64, env, i64, i64)
+DEF_HELPER_3(psslai_h, tl, env, tl, tl)
+DEF_HELPER_3(psslai_w, i64, env, i64, i64)
+DEF_HELPER_3(sslai, i32, env, i32, i32)
+DEF_HELPER_3(psrari_h, tl, env, tl, tl)
+DEF_HELPER_3(psrari_w, i64, env, i64, i64)
+DEF_HELPER_3(srari_32, i32, env, i32, i32)
+DEF_HELPER_3(srari_64, i64, env, i64, i64)
+DEF_HELPER_3(pssha_hs, tl, env, tl, tl)
+DEF_HELPER_3(pssha_ws, i64, env, i64, i64)
+DEF_HELPER_3(psshar_hs, tl, env, tl, tl)
+DEF_HELPER_3(psshar_ws, i64, env, i64, i64)
+DEF_HELPER_3(ssha, i32, env, i32, i32)
+DEF_HELPER_3(sshar, i32, env, i32, i32)
+DEF_HELPER_3(sha, i64, env, i64, i64)
+DEF_HELPER_3(shar, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2034041639..69514e2cb9 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -40,6 +40,7 @@
 %imm_z6   26:1 15:5
 %imm_mop5 30:1 26:2 20:2
 %imm_mop3 30:1 26:2
+%imm_p_ui8  20:3
 %imm_p_ui16 20:4
 %imm_p_ui32 20:5
 %imm_p_ui64 20:6
@@ -108,6 +109,7 @@
 @mop5 . . .. .. .... .. ..... ... ..... ....... &mop5 imm=3D%imm_mop5 %rd =
%rs1
 @mop3 . . .. .. . ..... ..... ... ..... ....... &mop3 imm=3D%imm_mop3 %rd =
%rs1 %rs2
=20
+@p_ui8  ..... .... ... ..... ... ..... ....... &i imm=3D%imm_p_ui8  %rs1 %=
rd
 @p_ui16 ..... .... ... ..... ... ..... ....... &i imm=3D%imm_p_ui16 %rs1 %=
rd
 @p_ui32 ..... .... ... ..... ... ..... ....... &i imm=3D%imm_p_ui32 %rs1 %=
rd
 @p_ui64 ..... .... ... ..... ... ..... ....... &i imm=3D%imm_p_ui64 %rs1 %=
rd
@@ -1219,3 +1221,45 @@ pminu_w    1110101 ..... ..... 110 ..... 0111011 @r
 pmax_w     1111001 ..... ..... 110 ..... 0111011 @r
 pmaxu_w    1111101 ..... ..... 110 ..... 0111011 @r
=20
+# Packed SIMD - Shift Operations
+pslli_b    10000 0001... ..... 010 ..... 0011011 @p_ui8
+psll_bs    1000110 ..... ..... 010 ..... 0011011 @r
+pslli_h    10000 001.... ..... 010 ..... 0011011 @p_ui16
+psll_hs    1000100 ..... ..... 010 ..... 0011011 @r
+pslli_w    10000 01..... ..... 010 ..... 0011011 @p_ui32
+psll_ws    1000101 ..... ..... 010 ..... 0011011 @r
+psrli_b    10000 0001... ..... 100 ..... 0011011 @p_ui8
+psrl_bs    1000110 ..... ..... 100 ..... 0011011 @r
+psrli_h    10000 001.... ..... 100 ..... 0011011 @p_ui16
+psrl_hs    1000100 ..... ..... 100 ..... 0011011 @r
+psrli_w    10000 01..... ..... 100 ..... 0011011 @p_ui32
+psrl_ws    1000101 ..... ..... 100 ..... 0011011 @r
+psrai_b    11000 0001... ..... 100 ..... 0011011 @p_ui8
+psra_bs    1100110 ..... ..... 100 ..... 0011011 @r
+psrai_h    11000 001.... ..... 100 ..... 0011011 @p_ui16
+psra_hs    1100100 ..... ..... 100 ..... 0011011 @r
+psrai_w    11000 01..... ..... 100 ..... 0011011 @p_ui32
+psra_ws    1100101 ..... ..... 100 ..... 0011011 @r
+psslai_h   11010 001.... ..... 010 ..... 0011011 @p_ui16
+{
+  sslai    11010 01..... ..... 010 ..... 0011011 @p_ui32
+  psslai_w 11010 01..... ..... 010 ..... 0011011 @p_ui32
+}
+psrari_h   11010 001.... ..... 100 ..... 0011011 @p_ui16
+{
+  srari_32 11010 01..... ..... 100 ..... 0011011 @p_ui32
+  psrari_w 11010 01..... ..... 100 ..... 0011011 @p_ui32
+}
+srari_64   110101 ...... ..... 100 ..... 0011011 @p_ui64
+pssha_hs   1110100 ..... ..... 010 ..... 0011011 @r
+{
+  ssha     1110101 ..... ..... 010 ..... 0011011 @r
+  pssha_ws 1110101 ..... ..... 010 ..... 0011011 @r
+}
+psshar_hs  1111100 ..... ..... 010 ..... 0011011 @r
+{
+  sshar    1111101 ..... ..... 010 ..... 0011011 @r
+  psshar_ws 1111101 ..... ..... 010 ..... 0011011 @r
+}
+sha        1110111 ..... ..... 010 ..... 0011011 @r
+shar       1111111 ..... ..... 010 ..... 0011011 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr=
ans/trans_rvp.c.inc
index 27d482863c..d0b645d083 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -620,3 +620,37 @@ GEN_SIMD_TRANS_32(mseq)
 GEN_SIMD_TRANS_32(mslt)
 GEN_SIMD_TRANS_32(msltu)
=20
+/* Packed SIMD - Shift Operations */
+GEN_SIMD_TRANS_IMM(pslli_b)
+GEN_SIMD_TRANS(psll_bs)
+GEN_SIMD_TRANS_IMM(pslli_h)
+GEN_SIMD_TRANS(psll_hs)
+GEN_SIMD_TRANS_IMM_64(pslli_w)
+GEN_SIMD_TRANS_64(psll_ws)
+GEN_SIMD_TRANS_IMM(psrli_b)
+GEN_SIMD_TRANS(psrl_bs)
+GEN_SIMD_TRANS_IMM(psrli_h)
+GEN_SIMD_TRANS(psrl_hs)
+GEN_SIMD_TRANS_IMM_64(psrli_w)
+GEN_SIMD_TRANS_64(psrl_ws)
+GEN_SIMD_TRANS_IMM(psrai_b)
+GEN_SIMD_TRANS(psra_bs)
+GEN_SIMD_TRANS_IMM(psrai_h)
+GEN_SIMD_TRANS(psra_hs)
+GEN_SIMD_TRANS_IMM_64(psrai_w)
+GEN_SIMD_TRANS_64(psra_ws)
+GEN_SIMD_TRANS_IMM(psslai_h)
+GEN_SIMD_TRANS_IMM_64(psslai_w)
+GEN_SIMD_TRANS_IMM_32(sslai)
+GEN_SIMD_TRANS_IMM(psrari_h)
+GEN_SIMD_TRANS_IMM_64(psrari_w)
+GEN_SIMD_TRANS_IMM_32(srari_32)
+GEN_SIMD_TRANS_IMM_64(srari_64)
+GEN_SIMD_TRANS(pssha_hs)
+GEN_SIMD_TRANS_64(pssha_ws)
+GEN_SIMD_TRANS(psshar_hs)
+GEN_SIMD_TRANS_64(psshar_ws)
+GEN_SIMD_TRANS_32(ssha)
+GEN_SIMD_TRANS_32(sshar)
+GEN_SIMD_TRANS_64(sha)
+GEN_SIMD_TRANS_64(shar)
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
index 38207c3a39..ef556eb007 100644
--- a/target/riscv/psimd_helper.c
+++ b/target/riscv/psimd_helper.c
@@ -1967,3 +1967,739 @@ uint32_t HELPER(msltu)(CPURISCVState *env, uint32_t=
 rs1, uint32_t rs2)
     return (rs1 < rs2) ? 0xFFFFFFFFU : 0x00000000U;
 }
=20
+/* Shift operations (immediate and register) */
+
+/**
+ * PSLLI.B - Packed 8-bit logical shift left immediate
+ * For each byte: rd[i] =3D rs1[i] << imm
+ */
+target_ulong HELPER(pslli_b)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong imm)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+    uint8_t shamt =3D imm & 0x07;  /* 8-bit elements, max shift 7 */
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t res =3D e1 << shamt;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSLL.BS - Packed 8-bit logical shift left from register
+ * For each byte: rd[i] =3D rs1[i] << rs2[2:0]
+ */
+target_ulong HELPER(psll_bs)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+    uint8_t shamt =3D rs2 & 0x07;  /* rs2[2:0] for 8-bit */
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t res =3D e1 << shamt;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSLLI.H - Packed 16-bit logical shift left immediate
+ * For each halfword: rd[i] =3D rs1[i] << imm
+ */
+target_ulong HELPER(pslli_h)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong imm)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    uint8_t shamt =3D imm & 0x0F;  /* 16-bit elements, max shift 15 */
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t res =3D e1 << shamt;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSLL.HS - Packed 16-bit logical shift left from register
+ * For each halfword: rd[i] =3D rs1[i] << rs2[3:0]
+ */
+target_ulong HELPER(psll_hs)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    uint8_t shamt =3D rs2 & 0x0F;  /* rs2[3:0] for 16-bit */
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t res =3D e1 << shamt;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSLLI.W - Packed 32-bit logical shift left immediate (RV64 only)
+ * For each word: rd[i] =3D rs1[i] << imm
+ */
+uint64_t HELPER(pslli_w)(CPURISCVState *env, uint64_t rs1, uint64_t imm)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    uint8_t shamt =3D imm & 0x1F;  /* 32-bit elements, max shift 31 */
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t res =3D e1 << shamt;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSLL.WS - Packed 32-bit logical shift left from register (RV64 only)
+ * For each word: rd[i] =3D rs1[i] << rs2[4:0]
+ */
+uint64_t HELPER(psll_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    uint8_t shamt =3D rs2 & 0x1F;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t res =3D e1 << shamt;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRLI.B - Packed 8-bit logical shift right immediate
+ * For each byte: rd[i] =3D rs1[i] >> imm
+ */
+target_ulong HELPER(psrli_b)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong imm)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+    uint8_t shamt =3D imm & 0x07;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t res =3D e1 >> shamt;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRL.BS - Packed 8-bit logical shift right from register
+ * For each byte: rd[i] =3D rs1[i] >> rs2[2:0]
+ */
+target_ulong HELPER(psrl_bs)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+    uint8_t shamt =3D rs2 & 0x07;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        uint8_t res =3D e1 >> shamt;
+        rd =3D INSERT8(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRLI.H - Packed 16-bit logical shift right immediate
+ * For each halfword: rd[i] =3D rs1[i] >> imm
+ */
+target_ulong HELPER(psrli_h)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong imm)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    uint8_t shamt =3D imm & 0x0F;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t res =3D e1 >> shamt;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRL.HS - Packed 16-bit logical shift right from register
+ * For each halfword: rd[i] =3D rs1[i] >> rs2[3:0]
+ */
+target_ulong HELPER(psrl_hs)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    uint8_t shamt =3D rs2 & 0x0F;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t res =3D e1 >> shamt;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRLI.W - Packed 32-bit logical shift right immediate (RV64 only)
+ * For each word: rd[i] =3D rs1[i] >> imm
+ */
+uint64_t HELPER(psrli_w)(CPURISCVState *env, uint64_t rs1, uint64_t imm)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    uint8_t shamt =3D imm & 0x1F;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t res =3D e1 >> shamt;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRL.WS - Packed 32-bit logical shift right from register (RV64 only)
+ * For each word: rd[i] =3D rs1[i] >> rs2[4:0]
+ */
+uint64_t HELPER(psrl_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    uint8_t shamt =3D rs2 & 0x1F;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t res =3D e1 >> shamt;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRAI.B - Packed 8-bit arithmetic shift right immediate
+ * For each byte: rd[i] =3D (int8_t)rs1[i] >> imm
+ */
+target_ulong HELPER(psrai_b)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong imm)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+    uint8_t shamt =3D imm & 0x07;
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t e1 =3D (int8_t)EXTRACT8(rs1, i);
+        int8_t res =3D e1 >> shamt;  /* Arithmetic right shift */
+        rd =3D INSERT8(rd, (uint8_t)res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRA.BS - Packed 8-bit arithmetic shift right from register
+ * For each byte: rd[i] =3D (int8_t)rs1[i] >> rs2[2:0]
+ */
+target_ulong HELPER(psra_bs)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_B(rd);
+    uint8_t shamt =3D rs2 & 0x07;
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t e1 =3D (int8_t)EXTRACT8(rs1, i);
+        int8_t res =3D e1 >> shamt;
+        rd =3D INSERT8(rd, (uint8_t)res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRAI.H - Packed 16-bit arithmetic shift right immediate
+ * For each halfword: rd[i] =3D (int16_t)rs1[i] >> imm
+ */
+target_ulong HELPER(psrai_h)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong imm)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    uint8_t shamt =3D imm & 0x0F;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t res =3D e1 >> shamt;
+        rd =3D INSERT16(rd, (uint16_t)res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRA.HS - Packed 16-bit arithmetic shift right from register
+ * For each halfword: rd[i] =3D (int16_t)rs1[i] >> rs2[3:0]
+ */
+target_ulong HELPER(psra_hs)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    uint8_t shamt =3D rs2 & 0x0F;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t res =3D e1 >> shamt;
+        rd =3D INSERT16(rd, (uint16_t)res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRAI.W - Packed 32-bit arithmetic shift right immediate (RV64 only)
+ * For each word: rd[i] =3D (int32_t)rs1[i] >> imm
+ */
+uint64_t HELPER(psrai_w)(CPURISCVState *env, uint64_t rs1, uint64_t imm)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    uint8_t shamt =3D imm & 0x1F;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t res =3D e1 >> shamt;
+        rd =3D INSERT32(rd, (uint32_t)res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRA.WS - Packed 32-bit arithmetic shift right from register (RV64 only)
+ * For each word: rd[i] =3D (int32_t)rs1[i] >> rs2[4:0]
+ */
+uint64_t HELPER(psra_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    uint8_t shamt =3D rs2 & 0x1F;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t res =3D e1 >> shamt;
+        rd =3D INSERT32(rd, (uint32_t)res, i);
+    }
+    return rd;
+}
+
+/* Saturating shift operations */
+
+/**
+ * PSSLAI.H - Packed 16-bit saturating shift left immediate
+ * For each halfword: rd[i] =3D sat16(rs1[i] << imm)
+ */
+target_ulong HELPER(psslai_h)(CPURISCVState *env,
+                              target_ulong rs1, target_ulong imm)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+    uint8_t shamt =3D imm & 0x0F;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int32_t shifted =3D (int32_t)e1 << shamt;
+        int16_t res =3D signed_saturate_h(shifted, &sat);
+        rd =3D INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSLAI.W - Packed 32-bit saturating shift left immediate (RV64 only)
+ * For each word: rd[i] =3D sat32(rs1[i] << imm)
+ */
+uint64_t HELPER(psslai_w)(CPURISCVState *env, uint64_t rs1, uint64_t imm)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int sat =3D 0;
+    uint8_t shamt =3D imm & 0x1F;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int64_t shifted =3D (int64_t)e1 << shamt;
+        int32_t res =3D signed_saturate_w(shifted, &sat);
+        rd =3D INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * SSLAI - 32-bit scalar saturating shift left immediate
+ */
+uint32_t HELPER(sslai)(CPURISCVState *env, uint32_t rs1, uint32_t imm)
+{
+    int32_t a =3D (int32_t)rs1;
+    uint8_t shamt =3D imm & 0x1F;
+    int64_t shifted =3D (int64_t)a << shamt;
+    int sat =3D 0;
+    int32_t res =3D signed_saturate_w(shifted, &sat);
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return (uint32_t)res;
+}
+
+/* Rounding shift operations */
+
+/**
+ * PSRARI.H - Packed 16-bit arithmetic shift right with rounding (immediat=
e)
+ * For each halfword: rd[i] =3D round((int16_t)rs1[i] >> imm)
+ */
+target_ulong HELPER(psrari_h)(CPURISCVState *env,
+                              target_ulong rs1, target_ulong imm)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    uint8_t shamt =3D imm & 0x0F;
+
+    if (shamt =3D=3D 0) {
+        return rs1;
+    }
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int32_t rounded =3D ((e1 >> (shamt - 1)) + 1) >> 1;
+        rd =3D INSERT16(rd, (int16_t)rounded, i);
+    }
+    return rd;
+}
+
+/**
+ * PSRARI.W - Packed 32-bit arithmetic shift right
+ * with rounding (immediate) (RV64 only)
+ * For each word: rd[i] =3D round((int32_t)rs1[i] >> imm)
+ */
+uint64_t HELPER(psrari_w)(CPURISCVState *env, uint64_t rs1, uint64_t imm)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    uint8_t shamt =3D imm & 0x1F;
+
+    if (shamt =3D=3D 0) {
+        return rs1;
+    }
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int64_t rounded =3D (((int64_t)e1 >> (shamt - 1)) + 1) >> 1;
+        rd =3D INSERT32(rd, (int32_t)rounded, i);
+    }
+    return rd;
+}
+
+/**
+ * SRARI_32 - 32-bit scalar arithmetic shift right with rounding
+ */
+uint32_t HELPER(srari_32)(CPURISCVState *env, uint32_t rs1, uint32_t imm)
+{
+    int32_t a =3D (int32_t)rs1;
+    uint8_t shamt =3D imm & 0x1F;
+
+    if (shamt =3D=3D 0) {
+        return rs1;
+    }
+
+    return (uint32_t)((((int64_t)a >> (shamt - 1)) + 1) >> 1);
+}
+
+/**
+ * SRARI_64 - 64-bit scalar arithmetic shift right with rounding
+ */
+uint64_t HELPER(srari_64)(CPURISCVState *env, uint64_t rs1, uint64_t imm)
+{
+    int64_t a =3D (int64_t)rs1;
+    uint8_t shamt =3D imm & 0x3F;
+
+    if (shamt =3D=3D 0) {
+        return rs1;
+    }
+
+    return (uint64_t)((a >> shamt) + ((a >> (shamt - 1)) & 1));
+}
+
+/* Variable shift operations (with saturation and rounding) */
+
+/**
+ * PSSHA.HS - Packed 16-bit variable shift with saturation
+ * Positive shift left (saturating), negative shift right (non-saturating)
+ */
+target_ulong HELPER(pssha_hs)(CPURISCVState *env,
+                              target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+    int8_t shamt =3D (int8_t)(rs2 & 0xFF);  /* rs2[7:0] as signed */
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t res;
+
+        if (shamt >=3D 0) {
+            /* Left shift with saturation */
+            int32_t shifted =3D (int32_t)e1 << shamt;
+            res =3D signed_saturate_h(shifted, &sat);
+        } else {
+            /* Right shift (no saturation) */
+            int right =3D -shamt;
+            if (right >=3D 16) {
+                res =3D (e1 < 0) ? -1 : 0;
+            } else {
+                res =3D e1 >> right;
+            }
+        }
+
+        rd =3D INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSHA.WS - Packed 32-bit variable shift with saturation (RV64 only)
+ * Positive shift left (saturating), negative shift right (non-saturating)
+ */
+uint64_t HELPER(pssha_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int sat =3D 0;
+    int8_t shamt =3D (int8_t)(rs2 & 0xFF);
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t res;
+
+        if (shamt >=3D 0) {
+            int64_t shifted =3D (int64_t)e1 << shamt;
+            res =3D signed_saturate_w(shifted, &sat);
+        } else {
+            int right =3D -shamt;
+            if (right >=3D 32) {
+                res =3D (e1 < 0) ? -1 : 0;
+            } else {
+                res =3D e1 >> right;
+            }
+        }
+
+        rd =3D INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSHAR.HS - Packed 16-bit variable shift with rounding and saturation
+ * Positive shift left (saturating), negative shift right (rounded)
+ */
+target_ulong HELPER(psshar_hs)(CPURISCVState *env,
+                               target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+    int8_t shamt =3D (int8_t)(rs2 & 0xFF);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t res;
+
+        if (shamt >=3D 0) {
+            /* Left shift with saturation */
+            int32_t shifted =3D (int32_t)e1 << shamt;
+            res =3D signed_saturate_h(shifted, &sat);
+        } else {
+            /* Right shift with rounding */
+            int right =3D -shamt;
+            if (right >=3D 16) {
+                res =3D (e1 < 0) ? -1 : 0;
+            } else {
+                int32_t rounded =3D ((e1 >> (right - 1)) + 1) >> 1;
+                res =3D (int16_t)rounded;
+            }
+        }
+
+        rd =3D INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSHAR.WS - Packed 32-bit variable shift with
+ * rounding and saturation (RV64 only)
+ * Positive shift left (saturating), negative shift right (rounded)
+ */
+uint64_t HELPER(psshar_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int sat =3D 0;
+    int8_t shamt =3D (int8_t)(rs2 & 0xFF);
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t res;
+
+        if (shamt >=3D 0) {
+            int64_t shifted =3D (int64_t)e1 << shamt;
+            res =3D signed_saturate_w(shifted, &sat);
+        } else {
+            int right =3D -shamt;
+            if (right >=3D 32) {
+                res =3D (e1 < 0) ? -1 : 0;
+            } else {
+                int64_t wide =3D e1;
+                res =3D (int32_t)(((wide >> (right - 1)) + 1) >> 1);
+            }
+        }
+
+        rd =3D INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * SSHA - 32-bit scalar variable shift with saturation
+ */
+uint32_t HELPER(ssha)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    int8_t shamt =3D (int8_t)(rs2 & 0xFF);
+    int sat =3D 0;
+    int32_t res;
+
+    if (shamt >=3D 0) {
+        int64_t shifted =3D (int64_t)a << shamt;
+        res =3D signed_saturate_w(shifted, &sat);
+    } else {
+        int right =3D -shamt;
+        if (right >=3D 32) {
+            res =3D (a < 0) ? -1 : 0;
+        } else {
+            res =3D a >> right;
+        }
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return (uint32_t)res;
+}
+
+/**
+ * SSHAR - 32-bit scalar variable shift with rounding and saturation
+ */
+uint32_t HELPER(sshar)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    int8_t shamt =3D (int8_t)(rs2 & 0xFF);
+    int sat =3D 0;
+    int32_t res;
+
+    if (shamt >=3D 0) {
+        int64_t shifted =3D (int64_t)a << shamt;
+        res =3D signed_saturate_w(shifted, &sat);
+    } else {
+        int right =3D -shamt;
+        if (right >=3D 32) {
+            res =3D (a < 0) ? -1 : 0;
+        } else {
+            int64_t rounded =3D (((int64_t)a >> (right - 1)) + 1) >> 1;
+            res =3D (int32_t)rounded;
+        }
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return (uint32_t)res;
+}
+
+/**
+ * SHA - 64-bit scalar variable shift
+ */
+uint64_t HELPER(sha)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int64_t a =3D (int64_t)rs1;
+    int8_t shamt =3D (int8_t)(rs2 & 0xFF);
+
+    if (shamt >=3D 0) {
+        return (uint64_t)(a << shamt);
+    } else {
+        int right =3D -shamt;
+        if (right >=3D 64) {
+            return (a < 0) ? (uint64_t)-1 : 0;
+        } else {
+            return (uint64_t)(a >> right);
+        }
+    }
+}
+
+/**
+ * SHAR - 64-bit scalar variable shift with rounding
+ */
+uint64_t HELPER(shar)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int64_t a =3D (int64_t)rs1;
+    int8_t shamt =3D (int8_t)(rs2 & 0xFF);
+
+    if (shamt >=3D 0) {
+        return (uint64_t)(a << shamt);
+    } else {
+        int right =3D -shamt;
+        if (right >=3D 64) {
+            return (a < 0) ? (uint64_t)-1 : 0;
+        } else {
+            __int128_t rounded =3D ((__int128_t)a >> (right - 1)) + 1;
+            return (uint64_t)((int64_t)(rounded >> 1));
+        }
+    }
+}
--=20
2.34.1
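
The rounding helpers in this patch follow the pattern
((x >> (n - 1)) + 1) >> 1. Below is a minimal standalone sketch of two ways
to evaluate that pattern for 32-bit elements so that the +1 cannot overflow
the element type. It assumes arithmetic right shift of negative values (as
GCC and Clang provide) and a shift count n in 1..31. The names
round_shr_wide and round_shr_split are hypothetical and do not appear in
the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widened form: promote to 64 bits before the shift so that adding the
 * rounding constant cannot overflow, even for x == INT32_MAX and n == 1.
 */
static int32_t round_shr_wide(int32_t x, unsigned n)
{
    if (n == 0) {
        return x;
    }
    return (int32_t)((((int64_t)x >> (n - 1)) + 1) >> 1);
}

/*
 * Split form: floor shift plus the bit just below the cut. This stays in
 * the element type because |x >> n| is at most 2^30 when n >= 1.
 */
static int32_t round_shr_split(int32_t x, unsigned n)
{
    if (n == 0) {
        return x;
    }
    return (x >> n) + ((x >> (n - 1)) & 1);
}
```

Both forms agree for every n in 1..31. The split form also carries over to
64-bit elements, where no wider native type is available on every host.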
From nobody Sat May 30 20:13:15 2026
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 06/14] target/riscv: rvp: add exchange operations
Date: Fri, 17 Apr 2026 18:46:43 +0800
Message-Id: <20260417104652.17857-7-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>
---
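For readers following the exchange helpers below, here is a minimal
standalone sketch of the PSAS.HX lane pairing on a single 32-bit register
(two halfword lanes). The names extract16_lane, insert16_lane, sat16 and
psas_hx_model are hypothetical stand-ins, not the EXTRACT16, INSERT16 and
signed_saturate_h definitions from this series, whose exact behaviour is
assumed here rather than quoted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lane accessors for 16-bit elements in a 32-bit register. */
static inline uint16_t extract16_lane(uint32_t v, int lane)
{
    return (uint16_t)(v >> (lane * 16));
}

static inline uint32_t insert16_lane(uint32_t v, uint16_t e, int lane)
{
    uint32_t mask = (uint32_t)0xFFFF << (lane * 16);
    return (v & ~mask) | ((uint32_t)e << (lane * 16));
}

/* Clamp a 32-bit intermediate to int16_t and record whether it clipped. */
static inline int16_t sat16(int32_t x, int *sat)
{
    if (x > INT16_MAX) {
        *sat = 1;
        return INT16_MAX;
    }
    if (x < INT16_MIN) {
        *sat = 1;
        return INT16_MIN;
    }
    return (int16_t)x;
}

/*
 * Exchange pattern for one pair of lanes:
 *   low lane  = rs1.lo - rs2.hi
 *   high lane = rs1.hi + rs2.lo
 * Both results are computed in 32 bits and then saturated to 16 bits.
 */
static uint32_t psas_hx_model(uint32_t rs1, uint32_t rs2, int *sat)
{
    int16_t s1_lo = (int16_t)extract16_lane(rs1, 0);
    int16_t s1_hi = (int16_t)extract16_lane(rs1, 1);
    int16_t s2_lo = (int16_t)extract16_lane(rs2, 0);
    int16_t s2_hi = (int16_t)extract16_lane(rs2, 1);
    uint32_t rd = 0;

    rd = insert16_lane(rd, (uint16_t)sat16((int32_t)s1_lo - s2_hi, sat), 0);
    rd = insert16_lane(rd, (uint16_t)sat16((int32_t)s1_hi + s2_lo, sat), 1);
    return rd;
}
```

With rs1 of 0x7FFF0001 and rs2 of 0x00020001 the low lane becomes 1 - 2,
stored as 0xFFFF, and the high lane clips at 0x7FFF, so the model returns
0x7FFFFFFF and sets the saturation flag.
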
 target/riscv/helper.h                   |  14 ++
 target/riscv/insn32.decode              |  14 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  14 ++
 target/riscv/psimd_helper.c             | 294 ++++++++++++++++++++++++
 4 files changed, 336 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index d97552eb58..fc66712570 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1483,3 +1483,17 @@ DEF_HELPER_3(ssha, i32, env, i32, i32)
 DEF_HELPER_3(sshar, i32, env, i32, i32)
 DEF_HELPER_3(sha, i64, env, i64, i64)
 DEF_HELPER_3(shar, i64, env, i64, i64)
+
+/* Packed SIMD - Exchange Operations */
+DEF_HELPER_3(pas_hx, tl, env, tl, tl)
+DEF_HELPER_3(psa_hx, tl, env, tl, tl)
+DEF_HELPER_3(psas_hx, tl, env, tl, tl)
+DEF_HELPER_3(pssa_hx, tl, env, tl, tl)
+DEF_HELPER_3(paas_hx, tl, env, tl, tl)
+DEF_HELPER_3(pasa_hx, tl, env, tl, tl)
+DEF_HELPER_3(pas_wx, i64, env, i64, i64)
+DEF_HELPER_3(psa_wx, i64, env, i64, i64)
+DEF_HELPER_3(psas_wx, i64, env, i64, i64)
+DEF_HELPER_3(pssa_wx, i64, env, i64, i64)
+DEF_HELPER_3(paas_wx, i64, env, i64, i64)
+DEF_HELPER_3(pasa_wx, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 69514e2cb9..ba003ed513 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1263,3 +1263,17 @@ psshar_hs  1111100 ..... ..... 010 ..... 0011011 @r
 }
 sha        1110111 ..... ..... 010 ..... 0011011 @r
 shar       1111111 ..... ..... 010 ..... 0011011 @r
+
+# Packed SIMD - Exchange Operations
+pas_hx     1000000 ..... ..... 110 ..... 0111011 @r
+psa_hx     1000010 ..... ..... 110 ..... 0111011 @r
+psas_hx    1001000 ..... ..... 110 ..... 0111011 @r
+pssa_hx    1001010 ..... ..... 110 ..... 0111011 @r
+paas_hx    1001100 ..... ..... 110 ..... 0111011 @r
+pasa_hx    1001110 ..... ..... 110 ..... 0111011 @r
+pas_wx     1000001 ..... ..... 110 ..... 0111011 @r
+psa_wx     1000011 ..... ..... 110 ..... 0111011 @r
+psas_wx    1001001 ..... ..... 110 ..... 0111011 @r
+pssa_wx    1001011 ..... ..... 110 ..... 0111011 @r
+paas_wx    1001101 ..... ..... 110 ..... 0111011 @r
+pasa_wx    1001111 ..... ..... 110 ..... 0111011 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr=
ans/trans_rvp.c.inc
index d0b645d083..b24a8ef7c2 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -654,3 +654,17 @@ GEN_SIMD_TRANS_32(ssha)
 GEN_SIMD_TRANS_32(sshar)
 GEN_SIMD_TRANS_64(sha)
 GEN_SIMD_TRANS_64(shar)
+
+/* Packed SIMD - Exchange Operations */
+GEN_SIMD_TRANS(pas_hx)
+GEN_SIMD_TRANS(psa_hx)
+GEN_SIMD_TRANS(psas_hx)
+GEN_SIMD_TRANS(pssa_hx)
+GEN_SIMD_TRANS(paas_hx)
+GEN_SIMD_TRANS(pasa_hx)
+GEN_SIMD_TRANS_64(pas_wx)
+GEN_SIMD_TRANS_64(psa_wx)
+GEN_SIMD_TRANS_64(psas_wx)
+GEN_SIMD_TRANS_64(pssa_wx)
+GEN_SIMD_TRANS_64(paas_wx)
+GEN_SIMD_TRANS_64(pasa_wx)
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
index ef556eb007..e48c9897ae 100644
--- a/target/riscv/psimd_helper.c
+++ b/target/riscv/psimd_helper.c
@@ -2703,3 +2703,297 @@ uint64_t HELPER(shar)(CPURISCVState *env, uint64_t =
rs1, uint64_t rs2)
         }
     }
 }
+
+/* Exchange operations (AS/SA, SAS/SSA, AAS/ASA with X suffix) */
+
+/**
+ * PAS.HX - Packed add-subtract with exchange
+ * For each pair: {rd[2i] =3D rs1[2i] - rs2[2i+1], rd[2i+1] =3D rs1[2i+1] =
+ rs2[2i]}
+ */
+target_ulong HELPER(pas_hx)(CPURISCVState *env,
+                            target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i +=3D 2) {
+        int16_t s1_lo =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t s1_hi =3D (int16_t)EXTRACT16(rs1, i + 1);
+        int16_t s2_lo =3D (int16_t)EXTRACT16(rs2, i);
+        int16_t s2_hi =3D (int16_t)EXTRACT16(rs2, i + 1);
+        int16_t res_lo =3D s1_lo - s2_hi;
+        int16_t res_hi =3D s1_hi + s2_lo;
+        rd =3D INSERT16(rd, res_lo, i);
+        rd =3D INSERT16(rd, res_hi, i + 1);
+    }
+    return rd;
+}
+
+/**
+ * PSA.HX - Packed subtract-add with exchange
+ * For each pair: {rd[2i] =3D rs1[2i] + rs2[2i+1], rd[2i+1] =3D rs1[2i+1] =
- rs2[2i]}
+ */
+target_ulong HELPER(psa_hx)(CPURISCVState *env,
+                            target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i +=3D 2) {
+        int16_t s1_lo =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t s1_hi =3D (int16_t)EXTRACT16(rs1, i + 1);
+        int16_t s2_lo =3D (int16_t)EXTRACT16(rs2, i);
+        int16_t s2_hi =3D (int16_t)EXTRACT16(rs2, i + 1);
+        int16_t res_lo =3D s1_lo + s2_hi;
+        int16_t res_hi =3D s1_hi - s2_lo;
+        rd =3D INSERT16(rd, res_lo, i);
+        rd =3D INSERT16(rd, res_hi, i + 1);
+    }
+    return rd;
+}
+
+/**
+ * PSAS.HX - Packed saturating add-subtract with exchange
+ */
+target_ulong HELPER(psas_hx)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i +=3D 2) {
+        int16_t s1_lo =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t s1_hi =3D (int16_t)EXTRACT16(rs1, i + 1);
+        int16_t s2_lo =3D (int16_t)EXTRACT16(rs2, i);
+        int16_t s2_hi =3D (int16_t)EXTRACT16(rs2, i + 1);
+        int32_t diff =3D (int32_t)s1_lo - (int32_t)s2_hi;
+        int32_t sum =3D (int32_t)s1_hi + (int32_t)s2_lo;
+        int16_t res_lo =3D signed_saturate_h(diff, &sat);
+        int16_t res_hi =3D signed_saturate_h(sum, &sat);
+        rd =3D INSERT16(rd, res_lo, i);
+        rd =3D INSERT16(rd, res_hi, i + 1);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSA.HX - Packed saturating subtract-add with exchange
+ */
+target_ulong HELPER(pssa_hx)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i +=3D 2) {
+        int16_t s1_lo =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t s1_hi =3D (int16_t)EXTRACT16(rs1, i + 1);
+        int16_t s2_lo =3D (int16_t)EXTRACT16(rs2, i);
+        int16_t s2_hi =3D (int16_t)EXTRACT16(rs2, i + 1);
+        int32_t sum =3D (int32_t)s1_lo + (int32_t)s2_hi;
+        int32_t diff =3D (int32_t)s1_hi - (int32_t)s2_lo;
+        int16_t res_lo =3D signed_saturate_h(sum, &sat);
+        int16_t res_hi =3D signed_saturate_h(diff, &sat);
+        rd =3D INSERT16(rd, res_lo, i);
+        rd =3D INSERT16(rd, res_hi, i + 1);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PAAS.HX - Packed averaging add-subtract with exchange
+ */
+target_ulong HELPER(paas_hx)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i +=3D 2) {
+        int16_t s1_lo =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t s1_hi =3D (int16_t)EXTRACT16(rs1, i + 1);
+        int16_t s2_lo =3D (int16_t)EXTRACT16(rs2, i);
+        int16_t s2_hi =3D (int16_t)EXTRACT16(rs2, i + 1);
+        int16_t res_lo =3D (s1_lo - s2_hi) >> 1;
+        int16_t res_hi =3D (s1_hi + s2_lo) >> 1;
+        rd =3D INSERT16(rd, res_lo, i);
+        rd =3D INSERT16(rd, res_hi, i + 1);
+    }
+    return rd;
+}
+
+/**
+ * PASA.HX - Packed averaging subtract-add with exchange
+ */
+target_ulong HELPER(pasa_hx)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i +=3D 2) {
+        int16_t s1_lo =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t s1_hi =3D (int16_t)EXTRACT16(rs1, i + 1);
+        int16_t s2_lo =3D (int16_t)EXTRACT16(rs2, i);
+        int16_t s2_hi =3D (int16_t)EXTRACT16(rs2, i + 1);
+        int16_t res_lo =3D (s1_lo + s2_hi) >> 1;
+        int16_t res_hi =3D (s1_hi - s2_lo) >> 1;
+        rd =3D INSERT16(rd, res_lo, i);
+        rd =3D INSERT16(rd, res_hi, i + 1);
+    }
+    return rd;
+}
+
+/**
+ * PAS.WX - Word version of packed add-subtract with exchange (RV64 only)
+ */
+uint64_t HELPER(pas_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i +=3D 2) {
+        int32_t s1_lo =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t s1_hi =3D (int32_t)EXTRACT32(rs1, i + 1);
+        int32_t s2_lo =3D (int32_t)EXTRACT32(rs2, i);
+        int32_t s2_hi =3D (int32_t)EXTRACT32(rs2, i + 1);
+        int32_t res_lo =3D s1_lo - s2_hi;
+        int32_t res_hi =3D s1_hi + s2_lo;
+        rd =3D INSERT32(rd, res_lo, i);
+        rd =3D INSERT32(rd, res_hi, i + 1);
+    }
+    return rd;
+}
+
+/**
+ * PSA.WX - Word version of packed subtract-add with exchange (RV64 only)
+ */
+uint64_t HELPER(psa_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i +=3D 2) {
+        int32_t s1_lo =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t s1_hi =3D (int32_t)EXTRACT32(rs1, i + 1);
+        int32_t s2_lo =3D (int32_t)EXTRACT32(rs2, i);
+        int32_t s2_hi =3D (int32_t)EXTRACT32(rs2, i + 1);
+        int32_t res_lo =3D s1_lo + s2_hi;
+        int32_t res_hi =3D s1_hi - s2_lo;
+        rd =3D INSERT32(rd, res_lo, i);
+        rd =3D INSERT32(rd, res_hi, i + 1);
+    }
+    return rd;
+}
+
+/**
+ * PSAS.WX - Word version of packed saturating
+ * add-subtract with exchange (RV64 only)
+ */
+uint64_t HELPER(psas_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i +=3D 2) {
+        int32_t s1_lo =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t s1_hi =3D (int32_t)EXTRACT32(rs1, i + 1);
+        int32_t s2_lo =3D (int32_t)EXTRACT32(rs2, i);
+        int32_t s2_hi =3D (int32_t)EXTRACT32(rs2, i + 1);
+        int64_t diff =3D (int64_t)s1_lo - (int64_t)s2_hi;
+        int64_t sum =3D (int64_t)s1_hi + (int64_t)s2_lo;
+        int32_t res_lo =3D signed_saturate_w(diff, &sat);
+        int32_t res_hi =3D signed_saturate_w(sum, &sat);
+        rd =3D INSERT32(rd, res_lo, i);
+        rd =3D INSERT32(rd, res_hi, i + 1);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSA.WX - Word version of packed saturating
+ * subtract-add with exchange (RV64 only)
+ */
+uint64_t HELPER(pssa_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i +=3D 2) {
+        int32_t s1_lo =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t s1_hi =3D (int32_t)EXTRACT32(rs1, i + 1);
+        int32_t s2_lo =3D (int32_t)EXTRACT32(rs2, i);
+        int32_t s2_hi =3D (int32_t)EXTRACT32(rs2, i + 1);
+        int64_t sum =3D (int64_t)s1_lo + (int64_t)s2_hi;
+        int64_t diff =3D (int64_t)s1_hi - (int64_t)s2_lo;
+        int32_t res_lo =3D signed_saturate_w(sum, &sat);
+        int32_t res_hi =3D signed_saturate_w(diff, &sat);
+        rd =3D INSERT32(rd, res_lo, i);
+        rd =3D INSERT32(rd, res_hi, i + 1);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PAAS.WX - Word version of packed averaging
+ * add-subtract with exchange (RV64 only)
+ */
+uint64_t HELPER(paas_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i +=3D 2) {
+        int64_t s1_lo =3D (int32_t)EXTRACT32(rs1, i);
+        int64_t s1_hi =3D (int32_t)EXTRACT32(rs1, i + 1);
+        int64_t s2_lo =3D (int32_t)EXTRACT32(rs2, i);
+        int64_t s2_hi =3D (int32_t)EXTRACT32(rs2, i + 1);
+        int32_t res_lo =3D (s1_lo - s2_hi) >> 1;
+        int32_t res_hi =3D (s1_hi + s2_lo) >> 1;
+        rd =3D INSERT32(rd, res_lo, i);
+        rd =3D INSERT32(rd, res_hi, i + 1);
+    }
+    return rd;
+}
+
+/**
+ * PASA.WX - Word version of packed averaging
+ * subtract-add with exchange (RV64 only)
+ */
+uint64_t HELPER(pasa_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i +=3D 2) {
+        int64_t s1_lo =3D (int32_t)EXTRACT32(rs1, i);
+        int64_t s1_hi =3D (int32_t)EXTRACT32(rs1, i + 1);
+        int64_t s2_lo =3D (int32_t)EXTRACT32(rs2, i);
+        int64_t s2_hi =3D (int32_t)EXTRACT32(rs2, i + 1);
+        int32_t res_lo =3D (s1_lo + s2_hi) >> 1;
+        int32_t res_hi =3D (s1_hi - s2_lo) >> 1;
+        rd =3D INSERT32(rd, res_lo, i);
+        rd =3D INSERT32(rd, res_hi, i + 1);
+    }
+    return rd;
+}
--=20
2.34.1
From nobody Sat May 30 20:13:15 2026
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 07/14] target/riscv: rvp: add horizontal reduction, pack,
 merge and count leading operations
Date: Fri, 17 Apr 2026 18:46:44 +0800
Message-Id: <20260417104652.17857-8-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>
---
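Reviewer note (not part of the commit): a minimal sketch, assuming an
RV32-style 32-bit register, of mask-based reference models for the four
PPAIR*.B byte-pairing operations added by this patch. The ref_* names
are hypothetical and do not exist in QEMU; they only restate the
per-halfword loops in psimd_helper.c as whole-register masks, which may
be handy when cross-checking the helpers in a unit test.

```c
#include <assert.h>   /* only needed if checks are added by the caller */
#include <stdint.h>

/* Hypothetical reference models; the names are illustrative only. */

/* PPAIRE.B: each halfword becomes {rs2 low byte, rs1 low byte}. */
static inline uint32_t ref_ppaire_b(uint32_t rs1, uint32_t rs2)
{
    return ((rs2 & 0x00FF00FFu) << 8) | (rs1 & 0x00FF00FFu);
}

/* PPAIREO.B: each halfword becomes {rs2 high byte, rs1 low byte}. */
static inline uint32_t ref_ppaireo_b(uint32_t rs1, uint32_t rs2)
{
    return (rs2 & 0xFF00FF00u) | (rs1 & 0x00FF00FFu);
}

/* PPAIROE.B: each halfword becomes {rs2 low byte, rs1 high byte}. */
static inline uint32_t ref_ppairoe_b(uint32_t rs1, uint32_t rs2)
{
    return ((rs2 & 0x00FF00FFu) << 8) | ((rs1 >> 8) & 0x00FF00FFu);
}

/* PPAIRO.B: each halfword becomes {rs2 high byte, rs1 high byte}. */
static inline uint32_t ref_ppairo_b(uint32_t rs1, uint32_t rs2)
{
    return (rs2 & 0xFF00FF00u) | ((rs1 >> 8) & 0x00FF00FFu);
}
```

With rs1 0x11223344 and rs2 0xAABBCCDD, the four models should return
0xBB22DD44, 0xAA22CC44, 0xBB11DD33 and 0xAA11CC33 respectively.
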
 target/riscv/helper.h                   |  46 ++
 target/riscv/insn32.decode              |  44 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  44 ++
 target/riscv/psimd_helper.c             | 619 ++++++++++++++++++++++++
 4 files changed, 753 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index fc66712570..78ae034331 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1497,3 +1497,49 @@ DEF_HELPER_3(psas_wx, i64, env, i64, i64)
 DEF_HELPER_3(pssa_wx, i64, env, i64, i64)
 DEF_HELPER_3(paas_wx, i64, env, i64, i64)
 DEF_HELPER_3(pasa_wx, i64, env, i64, i64)
+
+/* Packed SIMD - Horizontal Reduction Operations */
+DEF_HELPER_3(predsum_bs, tl, env, tl, tl)
+DEF_HELPER_3(predsumu_bs, tl, env, tl, tl)
+DEF_HELPER_3(predsum_hs, tl, env, tl, tl)
+DEF_HELPER_3(predsumu_hs, tl, env, tl, tl)
+DEF_HELPER_3(predsum_ws, i64, env, i64, i64)
+DEF_HELPER_3(predsumu_ws, i64, env, i64, i64)
+
+/* Packed SIMD - Pack, Unpack, and Merge Operations */
+DEF_HELPER_3(ppaire_b, tl, env, tl, tl)
+DEF_HELPER_3(ppaireo_b, tl, env, tl, tl)
+DEF_HELPER_3(ppairoe_b, tl, env, tl, tl)
+DEF_HELPER_3(ppairo_b, tl, env, tl, tl)
+
+DEF_HELPER_3(ppaire_h, i64, env, i64, i64)
+DEF_HELPER_3(ppaireo_h, tl, env, tl, tl)
+DEF_HELPER_3(ppairoe_h, tl, env, tl, tl)
+DEF_HELPER_3(ppairo_h, tl, env, tl, tl)
+
+DEF_HELPER_3(ppaireo_w, i64, env, i64, i64)
+DEF_HELPER_3(ppairoe_w, i64, env, i64, i64)
+DEF_HELPER_3(ppairo_w, i64, env, i64, i64)
+DEF_HELPER_2(psext_h_b, tl, env, tl)
+DEF_HELPER_2(psext_w_b, i64, env, i64)
+DEF_HELPER_2(psext_w_h, i64, env, i64)
+DEF_HELPER_2(rev, tl, env, tl)
+DEF_HELPER_2(rev16, i64, env, i64)
+DEF_HELPER_3(zip8p, i64, env, i64, i64)
+DEF_HELPER_3(zip8hp, i64, env, i64, i64)
+DEF_HELPER_3(unzip8p, i64, env, i64, i64)
+DEF_HELPER_3(unzip8hp, i64, env, i64, i64)
+DEF_HELPER_3(zip16p, i64, env, i64, i64)
+DEF_HELPER_3(zip16hp, i64, env, i64, i64)
+DEF_HELPER_3(unzip16p, i64, env, i64, i64)
+DEF_HELPER_3(unzip16hp, i64, env, i64, i64)
+DEF_HELPER_4(slx, tl, env, tl, tl, tl)
+DEF_HELPER_4(srx, tl, env, tl, tl, tl)
+DEF_HELPER_4(mvm, tl, env, tl, tl, tl)
+DEF_HELPER_4(mvmn, tl, env, tl, tl, tl)
+DEF_HELPER_4(merge, tl, env, tl, tl, tl)
+
+/* Packed SIMD - Count Leading Operations */
+DEF_HELPER_2(cls, tl, env, tl)
+DEF_HELPER_2(clsw, i64, env, i64)
+
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index ba003ed513..09bb69b302 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1277,3 +1277,47 @@ psas_wx    1001001 ..... ..... 110 ..... 0111011 @r
 pssa_wx    1001011 ..... ..... 110 ..... 0111011 @r
 paas_wx    1001101 ..... ..... 110 ..... 0111011 @r
 pasa_wx    1001111 ..... ..... 110 ..... 0111011 @r
+
+# Packed SIMD - Horizontal Reduction Operations
+predsum_bs   1001110 ..... ..... 100 ..... 0011011 @r
+predsumu_bs  1011110 ..... ..... 100 ..... 0011011 @r
+predsum_hs   1001100 ..... ..... 100 ..... 0011011 @r
+predsumu_hs  1011100 ..... ..... 100 ..... 0011011 @r
+predsum_ws   1001101 ..... ..... 100 ..... 0011011 @r
+predsumu_ws  1011101 ..... ..... 100 ..... 0011011 @r
+
+# Packed SIMD - Pack, Unpack, and Merge Operations
+ppaire_b    1000000 ..... ..... 100 ..... 0111011 @r
+ppaireo_b   1001000 ..... ..... 100 ..... 0111011 @r
+ppairoe_b   1010000 ..... ..... 100 ..... 0111011 @r
+ppairo_b    1011000 ..... ..... 100 ..... 0111011 @r
+ppaireo_h   1001001 ..... ..... 100 ..... 0111011 @r
+ppairoe_h   1010001 ..... ..... 100 ..... 0111011 @r
+ppairo_h    1011001 ..... ..... 100 ..... 0111011 @r
+ppaire_h    1000001 ..... ..... 100 ..... 0111011 @r
+ppaireo_w   1001011 ..... ..... 100 ..... 0111011 @r
+ppairoe_w   1010011 ..... ..... 100 ..... 0111011 @r
+ppairo_w    1011011 ..... ..... 100 ..... 0111011 @r
+psext_h_b   1110000 00100 ..... 010 ..... 0011011 @r2
+psext_w_b   1110001 00100 ..... 010 ..... 0011011 @r2
+psext_w_h   1110001 00101 ..... 010 ..... 0011011 @r2
+rev         01101 0111111 ..... 101 ..... 0010011 @r2
+rev16       01101 0110000 ..... 101 ..... 0010011 @r2
+zip8p       1111000 ..... ..... 010 ..... 0111011 @r
+zip8hp      1111010 ..... ..... 010 ..... 0111011 @r
+unzip8p     1110000 ..... ..... 010 ..... 0111011 @r
+unzip8hp    1110010 ..... ..... 010 ..... 0111011 @r
+zip16p      1111001 ..... ..... 010 ..... 0111011 @r
+zip16hp     1111011 ..... ..... 010 ..... 0111011 @r
+unzip16p    1110001 ..... ..... 010 ..... 0111011 @r
+unzip16hp   1110011 ..... ..... 010 ..... 0111011 @r
+slx         1000111 ..... ..... 001 ..... 0111011 @r
+srx         1010111 ..... ..... 001 ..... 0111011 @r
+mvm         1010100 ..... ..... 001 ..... 0111011 @r
+mvmn        1010101 ..... ..... 001 ..... 0111011 @r
+merge       1010110 ..... ..... 001 ..... 0111011 @r
+
+# Packed SIMD - Count Leading Operations
+cls    01100 0000011 ..... 001 ..... 0010011 @r2
+clsw   01100 0000011 ..... 001 ..... 0011011 @r2
+
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr=
ans/trans_rvp.c.inc
index b24a8ef7c2..fc6254b395 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -668,3 +668,47 @@ GEN_SIMD_TRANS_64(psas_wx)
 GEN_SIMD_TRANS_64(pssa_wx)
 GEN_SIMD_TRANS_64(paas_wx)
 GEN_SIMD_TRANS_64(pasa_wx)
+
+/* Packed SIMD - Horizontal Reduction Operations */
+GEN_SIMD_TRANS(predsum_bs)
+GEN_SIMD_TRANS(predsumu_bs)
+GEN_SIMD_TRANS(predsum_hs)
+GEN_SIMD_TRANS(predsumu_hs)
+GEN_SIMD_TRANS_64(predsum_ws)
+GEN_SIMD_TRANS_64(predsumu_ws)
+
+/* Packed SIMD - Pack, Unpack, and Merge Operations */
+GEN_SIMD_TRANS(ppaire_b)
+GEN_SIMD_TRANS(ppaireo_b)
+GEN_SIMD_TRANS(ppairoe_b)
+GEN_SIMD_TRANS(ppairo_b)
+GEN_SIMD_TRANS_64(ppaire_h)
+GEN_SIMD_TRANS(ppaireo_h)
+GEN_SIMD_TRANS(ppairoe_h)
+GEN_SIMD_TRANS(ppairo_h)
+GEN_SIMD_TRANS_64(ppaireo_w)
+GEN_SIMD_TRANS_64(ppairoe_w)
+GEN_SIMD_TRANS_64(ppairo_w)
+GEN_SIMD_TRANS_R1(psext_h_b)
+GEN_SIMD_TRANS_R1_64(psext_w_b)
+GEN_SIMD_TRANS_R1_64(psext_w_h)
+GEN_SIMD_TRANS_R1(rev)
+GEN_SIMD_TRANS_R1_64(rev16)
+GEN_SIMD_TRANS_64(zip8p)
+GEN_SIMD_TRANS_64(zip8hp)
+GEN_SIMD_TRANS_64(unzip8p)
+GEN_SIMD_TRANS_64(unzip8hp)
+GEN_SIMD_TRANS_64(zip16p)
+GEN_SIMD_TRANS_64(zip16hp)
+GEN_SIMD_TRANS_64(unzip16p)
+GEN_SIMD_TRANS_64(unzip16hp)
+GEN_SIMD_TRANS_ACC(slx)
+GEN_SIMD_TRANS_ACC(srx)
+GEN_SIMD_TRANS_ACC(mvm)
+GEN_SIMD_TRANS_ACC(mvmn)
+GEN_SIMD_TRANS_ACC(merge)
+
+/* Packed SIMD - Count Leading Operations */
+GEN_SIMD_TRANS_R1(cls)
+GEN_SIMD_TRANS_R1_64(clsw)
+
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
index e48c9897ae..4080aab234 100644
--- a/target/riscv/psimd_helper.c
+++ b/target/riscv/psimd_helper.c
@@ -2997,3 +2997,622 @@ uint64_t HELPER(pasa_wx)(CPURISCVState *env, uint64=
_t rs1, uint64_t rs2)
     }
     return rd;
 }
+
+/* Horizontal sum operations */
+
+/**
+ * PREDSUM.BS - Signed reduction sum of bytes
+ * rd =3D rs2 + sum(sign_extend(rs1[i]))
+ */
+target_ulong HELPER(predsum_bs)(CPURISCVState *env,
+                                target_ulong rs1, target_ulong rs2)
+{
+    int64_t sum =3D (int64_t)(target_long)rs2;
+    int elems =3D ELEMS_B(rs1);
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t e1 =3D (int8_t)EXTRACT8(rs1, i);
+        sum +=3D e1;
+    }
+
+    return (target_ulong)sum;
+}
+
+/**
+ * PREDSUMU.BS - Unsigned reduction sum of bytes
+ * rd =3D rs2 + sum(zero_extend(rs1[i]))
+ */
+target_ulong HELPER(predsumu_bs)(CPURISCVState *env,
+                                 target_ulong rs1, target_ulong rs2)
+{
+    uint64_t sum =3D rs2;
+    int elems =3D ELEMS_B(rs1);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t e1 =3D EXTRACT8(rs1, i);
+        sum +=3D e1;
+    }
+
+    return (target_ulong)sum;
+}
+
+/**
+ * PREDSUM.HS - Signed reduction sum of halfwords
+ * rd =3D rs2 + sum(sign_extend(rs1[i]))
+ */
+target_ulong HELPER(predsum_hs)(CPURISCVState *env,
+                                target_ulong rs1, target_ulong rs2)
+{
+    int64_t sum =3D (int64_t)(target_long)rs2;
+    int elems =3D ELEMS_H(rs1);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        sum +=3D e1;
+    }
+
+    return (target_ulong)sum;
+}
+
+/**
+ * PREDSUMU.HS - Unsigned reduction sum of halfwords
+ * rd =3D rs2 + sum(zero_extend(rs1[i]))
+ */
+target_ulong HELPER(predsumu_hs)(CPURISCVState *env,
+                                 target_ulong rs1, target_ulong rs2)
+{
+    uint64_t sum =3D rs2;
+    int elems =3D ELEMS_H(rs1);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        sum +=3D e1;
+    }
+
+    return (target_ulong)sum;
+}
+
+/**
+ * PREDSUM.WS - Signed reduction sum of words (RV64 only)
+ * rd =3D rs2 + sum(sign_extend(rs1[i]))
+ */
+uint64_t HELPER(predsum_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int64_t sum =3D (int64_t)rs2;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        sum +=3D e1;
+    }
+
+    return (uint64_t)sum;
+}
+
+/**
+ * PREDSUMU.WS - Unsigned reduction sum of words (RV64 only)
+ * rd =3D rs2 + sum(zero_extend(rs1[i]))
+ */
+uint64_t HELPER(predsumu_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs=
2)
+{
+    uint64_t sum =3D rs2;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        sum +=3D e1;
+    }
+
+    return sum;
+}
+
+/* Packing/unpacking operations */
+
+/**
+ * PPAIRE.B - Pair low bytes of corresponding halfwords
+ * For each halfword: rd[i] =3D {rs2[i][7:0], rs1[i][7:0]}
+ */
+target_ulong HELPER(ppaire_b)(CPURISCVState *env,
+                               target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D ((e2 & 0x00FF) << 8) | (e1 & 0x00FF);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PPAIREO.B - Pair high byte of rs2 with low byte of rs1
+ * For each halfword: rd[i] =3D {rs2[i][15:8], rs1[i][7:0]}
+ */
+target_ulong HELPER(ppaireo_b)(CPURISCVState *env,
+                                target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D ((e2 >> 8) << 8) | (e1 & 0x00FF);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PPAIROE.B - Pair low byte of rs2 with high byte of rs1
+ * For each halfword: rd[i] =3D {rs2[i][7:0], rs1[i][15:8]}
+ */
+target_ulong HELPER(ppairoe_b)(CPURISCVState *env,
+                                target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D ((e2 & 0x00FF) << 8) | ((e1 >> 8) & 0x00FF);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PPAIRO.B - Pair high bytes of corresponding halfwords
+ * For each halfword: rd[i] =3D {rs2[i][15:8], rs1[i][15:8]}
+ */
+target_ulong HELPER(ppairo_b)(CPURISCVState *env,
+                               target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t res =3D ((e2 >> 8) << 8) | ((e1 >> 8) & 0x00FF);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PPAIRE.H - Pair low halfwords of corresponding words
+ * (RV64 only)
+ * For each word: rd[i] =3D {rs2[i][15:0], rs1[i][15:0]}
+ */
+uint64_t HELPER(ppaire_h)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t res =3D ((e2 & 0x0000FFFF) << 16) | (e1 & 0x0000FFFF);
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PPAIREO.H - Pair high halfword of rs2 with low halfword of rs1
+ */
+target_ulong HELPER(ppaireo_h)(CPURISCVState *env,
+                                target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_W(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t res =3D ((e2 >> 16) << 16) | (e1 & 0x0000FFFF);
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PPAIROE.H - Pair low halfword of rs2 with high halfword of rs1
+ */
+target_ulong HELPER(ppairoe_h)(CPURISCVState *env,
+                                target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_W(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t res =3D ((e2 & 0x0000FFFF) << 16) | ((e1 >> 16) & 0x0000F=
FFF);
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PPAIRO.H - Pair high halfwords of corresponding words
+ */
+target_ulong HELPER(ppairo_h)(CPURISCVState *env,
+                               target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_W(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t res =3D ((e2 >> 16) << 16) | ((e1 >> 16) & 0x0000FFFF);
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PPAIREO.W - Pair high word of rs2 with low word of rs1 (RV64 only)
+ */
+uint64_t HELPER(ppaireo_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    uint32_t e1 =3D EXTRACT32(rs1, 0);
+    uint32_t e2 =3D EXTRACT32(rs2, 1);
+    rd =3D ((uint64_t)e2 << 32) | e1;
+    return rd;
+}
+
+/**
+ * PPAIROE.W - Pair low word of rs2 with high word of rs1 (RV64 only)
+ */
+uint64_t HELPER(ppairoe_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    uint32_t e1 =3D EXTRACT32(rs1, 1);
+    uint32_t e2 =3D EXTRACT32(rs2, 0);
+    rd =3D ((uint64_t)e2 << 32) | e1;
+    return rd;
+}
+
+/**
+ * PPAIRO.W - Pair high word of rs2 with high word of rs1 (RV64 only)
+ */
+uint64_t HELPER(ppairo_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    uint32_t e1 =3D EXTRACT32(rs1, 1);
+    uint32_t e2 =3D EXTRACT32(rs2, 1);
+    rd =3D ((uint64_t)e2 << 32) | e1;
+    return rd;
+}
+
+/**
+ * PSEXT.H.B - Sign-extend low byte of each halfword to a halfword
+ */
+target_ulong HELPER(psext_h_b)(CPURISCVState *env, target_ulong rs1)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        int8_t b0 =3D (int8_t)(e1 & 0xFF);
+        int16_t res =3D (int16_t)b0;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PSEXT.W.B - Sign-extend low byte of each word to a word (RV64 only)
+ */
+uint64_t HELPER(psext_w_b)(CPURISCVState *env, uint64_t rs1)
+{
+    uint64_t rd =3D 0;
+    int8_t b0 =3D (int8_t)EXTRACT8(rs1, 0);
+    int8_t b4 =3D (int8_t)EXTRACT8(rs1, 4);
+    uint32_t lo =3D (uint32_t)(int32_t)b0;
+    uint32_t hi =3D (uint32_t)(int32_t)b4;
+    rd =3D ((uint64_t)hi << 32) | lo;
+    return rd;
+}
+
+/**
+ * PSEXT.W.H - Sign-extend low halfword of each word to a word (RV64 only)
+ */
+uint64_t HELPER(psext_w_h)(CPURISCVState *env, uint64_t rs1)
+{
+    uint64_t rd =3D 0;
+    int16_t h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t h2 =3D (int16_t)EXTRACT16(rs1, 2);
+    uint32_t lo =3D (uint32_t)(int32_t)h0;
+    uint32_t hi =3D (uint32_t)(int32_t)h2;
+    rd =3D ((uint64_t)hi << 32) | lo;
+    return rd;
+}
+
+/**
+ * REV - Reverse all bits
+ */
+target_ulong HELPER(rev)(CPURISCVState *env, target_ulong rs1)
+{
+    target_ulong rd =3D 0;
+
+    for (int i =3D 0; i < TARGET_LONG_BITS; i++) {
+        rd =3D (rd << 1) | (rs1 & 1);
+        rs1 >>=3D 1;
+    }
+
+    return rd;
+}
+
+/**
+ * REV16 - Reverse the order of the four halfwords (RV64 only)
+ */
+uint64_t HELPER(rev16)(CPURISCVState *env, uint64_t rs1)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t chunk =3D EXTRACT16(rs1, i);
+        rd =3D (rd << 16) | chunk;
+    }
+
+    return rd;
+}
+
+/**
+ * ZIP8P - Interleave bytes from rs2 and rs1 (RV64 only)
+ * rd =3D {rs2[31:24], rs1[31:24], rs2[23:16], rs1[23:16],
+ *       rs2[15:8], rs1[15:8], rs2[7:0], rs1[7:0]}
+ */
+uint64_t HELPER(zip8p)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint8_t b1 =3D EXTRACT8(rs1, 3 - i);
+        uint8_t b2 =3D EXTRACT8(rs2, 3 - i);
+        rd =3D (rd << 16) | ((uint16_t)b2 << 8) | b1;
+    }
+
+    return rd;
+}
+
+/**
+ * ZIP8HP - Interleave high bytes from rs2 and rs1 (RV64 only)
+ */
+uint64_t HELPER(zip8hp)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint8_t b1 =3D EXTRACT8(rs1, 7 - i);
+        uint8_t b2 =3D EXTRACT8(rs2, 7 - i);
+        rd =3D (rd << 16) | ((uint16_t)b2 << 8) | b1;
+    }
+
+    return rd;
+}
+
+/**
+ * UNZIP8P - De-interleave bytes
+ * (RV64 only)
+ */
+uint64_t HELPER(unzip8p)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint64_t b1 =3D EXTRACT8(rs1, 2 * i) << 8 * i;
+        uint64_t b2 =3D EXTRACT8(rs2, 2 * i) << (32 + 8 * i);
+        rd =3D rd | b2 | b1;
+    }
+
+    return rd;
+}
+
+/**
+ * UNZIP8HP - De-interleave high bytes
+ * (RV64 only)
+ */
+uint64_t HELPER(unzip8hp)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint64_t b1 =3D EXTRACT8(rs1, 2 * i + 1) << 8 * i;
+        uint64_t b2 =3D EXTRACT8(rs2, 2 * i + 1) << (32 + 8 * i);
+        rd =3D rd | b2 | b1;
+    }
+
+    return rd;
+}
+
+/**
+ * ZIP16P - Interleave halfwords from rs2 and rs1 (RV64 only)
+ */
+uint64_t HELPER(zip16p)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint16_t h1 =3D EXTRACT16(rs1, 1 - i);
+        uint16_t h2 =3D EXTRACT16(rs2, 1 - i);
+        rd =3D (rd << 32) | ((uint32_t)h2 << 16) | h1;
+    }
+
+    return rd;
+}
+
+/**
+ * ZIP16HP - Interleave high halfwords (RV64 only)
+ */
+uint64_t HELPER(zip16hp)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint16_t h1 =3D EXTRACT16(rs1, 3 - i);
+        uint16_t h2 =3D EXTRACT16(rs2, 3 - i);
+        rd =3D (rd << 32) | ((uint32_t)h2 << 16) | h1;
+    }
+
+    return rd;
+}
+
+/**
+ * UNZIP16P - De-interleave even-numbered halfwords (RV64 only)
+ */
+uint64_t HELPER(unzip16p)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint64_t b1 =3D EXTRACT16(rs1, 2 * i) << 16 * i;
+        uint64_t b2 =3D EXTRACT16(rs2, 2 * i) << (32 + 16 * i);
+        rd =3D rd | b2 | b1;
+    }
+
+    return rd;
+}
+
+/**
+ * UNZIP16HP - De-interleave odd-numbered halfwords (RV64 only)
+ */
+uint64_t HELPER(unzip16hp)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint64_t b1 =3D EXTRACT16(rs1, 2 * i + 1) << 16 * i;
+        uint64_t b2 =3D EXTRACT16(rs2, 2 * i + 1) << (32 + 16 * i);
+        rd =3D rd | b2 | b1;
+    }
+
+    return rd;
+}
+
+
+/* Merge and mask operations */
+
+/**
+ * SLX - Shift left extended (concatenate rd and rs1, shift left, take upp=
er)
+ */
+target_ulong HELPER(slx)(CPURISCVState *env, target_ulong rs1,
+                         target_ulong rs2, target_ulong rd)
+{
+    int shamt =3D (TARGET_LONG_BITS =3D=3D 32) ? (rs2 & 0x1F) : (rs2 & 0x3=
F);
+    target_ulong xrs1 =3D 0;
+    target_ulong xrd =3D 0;
+
+    if (shamt <=3D TARGET_LONG_BITS) {
+        xrs1 =3D shamt ? rs1 >> (TARGET_LONG_BITS - shamt) : 0;
+        xrd =3D (rd << shamt) + xrs1;
+    } else {
+        xrd =3D rs1 << (shamt - TARGET_LONG_BITS);
+    }
+
+    return xrd;
+}
+
+/**
+ * SRX - Shift right extended (concatenate rs1 and rd, shift right, take l=
ower)
+ */
+target_ulong HELPER(srx)(CPURISCVState *env, target_ulong rs1,
+                         target_ulong rs2, target_ulong rd)
+{
+    int shamt =3D (TARGET_LONG_BITS =3D=3D 32) ? (rs2 & 0x1F) : (rs2 & 0x3=
F);
+    target_ulong xrs1 =3D 0;
+    target_ulong xrd =3D 0;
+
+    if (shamt <=3D TARGET_LONG_BITS) {
+        xrs1 =3D shamt ? rs1 << (TARGET_LONG_BITS - shamt) : 0;
+        xrd =3D (rd >> shamt) + xrs1;
+    } else {
+        xrd =3D rs1 >> (shamt - TARGET_LONG_BITS);
+    }
+
+    return xrd;
+}
+
+/**
+ * MVM - Move masked
+ * For each bit: rd[i] =3D rs2[i] ? rs1[i] : rd[i]
+ */
+target_ulong HELPER(mvm)(CPURISCVState *env, target_ulong rs1,
+                         target_ulong rs2, target_ulong rd)
+{
+    return (~rs2 & rd) | (rs2 & rs1);
+}
+
+/**
+ * MVMN - Move masked not
+ * For each bit: rd[i] =3D rs2[i] ? rd[i] : rs1[i]
+ */
+target_ulong HELPER(mvmn)(CPURISCVState *env, target_ulong rs1,
+                          target_ulong rs2, target_ulong rd)
+{
+    return (~rs2 & rs1) | (rs2 & rd);
+}
+
+/**
+ * MERGE - Merge
+ * For each bit: rd[i] =3D rd[i] ? rs2[i] : rs1[i]
+ */
+target_ulong HELPER(merge)(CPURISCVState *env, target_ulong rs1,
+                           target_ulong rs2, target_ulong rd)
+{
+    return (~rd & rs1) | (rd & rs2);
+}
+
+/* Count leading operations */
+
+/**
+ * CLS - Count leading redundant sign bits
+ */
+target_ulong HELPER(cls)(CPURISCVState *env, target_ulong rs1)
+{
+    target_long a =3D (target_long)rs1;
+    target_ulong cnt =3D 0;
+
+#if TARGET_LONG_BITS =3D=3D 64
+    target_long lo_bound =3D 0xC000000000000000LL;
+    target_long hi_bound =3D 0x3FFFFFFFFFFFFFFFLL;
+#else
+    target_long lo_bound =3D 0xC0000000;
+    target_long hi_bound =3D 0x3FFFFFFF;
+#endif
+
+    while (cnt < TARGET_LONG_BITS - 1 && a >=3D lo_bound && a <=3D hi_boun=
d) {
+        cnt++;
+        a <<=3D 1;
+    }
+
+    return cnt;
+}
+
+/**
+ * CLSW - Count leading redundant sign bits of low 32 bits (RV64)
+ */
+uint64_t HELPER(clsw)(CPURISCVState *env, uint64_t rs1)
+{
+    int32_t a =3D (int32_t)(rs1 & 0xFFFFFFFF);
+    int32_t lo_bound =3D 0xC0000000;
+    int32_t hi_bound =3D 0x3FFFFFFF;
+    int c =3D 0;
+
+    while (c < 31 && a >=3D lo_bound && a <=3D hi_bound) {
+        c++;
+        a <<=3D 1;
+    }
+
+    return c;
+}
--=20
2.34.1
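
Note on SLX/SRX (not part of the patch): a minimal standalone sketch of
the 64-bit funnel shifts described in the SLX and SRX comments above.
The names slx64 and srx64 are hypothetical, and the shift amount is
assumed to be already reduced to 0..63, as the helper's 0x3F mask does
on a 64-bit target. The shamt == 0 case is handled separately because
shifting a 64-bit value by 64 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 64 bits of ((rd:rs1) << shamt), assuming 0 <= shamt < 64. */
uint64_t slx64(uint64_t rd, uint64_t rs1, unsigned shamt)
{
    assert(shamt < 64);
    if (shamt == 0) {
        return rd;              /* avoids rs1 >> 64 */
    }
    return (rd << shamt) | (rs1 >> (64 - shamt));
}

/* Lower 64 bits of ((rs1:rd) >> shamt), assuming 0 <= shamt < 64. */
uint64_t srx64(uint64_t rd, uint64_t rs1, unsigned shamt)
{
    assert(shamt < 64);
    if (shamt == 0) {
        return rd;              /* avoids rs1 << 64 */
    }
    return (rd >> shamt) | (rs1 << (64 - shamt));
}
```

With shamt == 1, slx64 moves the top bit of rs1 into bit 0 of the
result, and srx64 moves bit 0 of rs1 into bit 63.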
From nobody Sat May 30 20:13:15 2026
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 08/14] target/riscv: rvp: add pure multiplication operations
Date: Fri, 17 Apr 2026 18:46:45 +0800
Message-Id: <20260417104652.17857-9-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>
---
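Note on the signed x unsigned helpers (not part of the patch): the
PMULHSU/PMULHRSU helpers below multiply an int32_t by a uint32_t, which
C evaluates as unsigned arithmetic before the result is stored back into
an int32_t. The sketch below, using hypothetical names, puts that form
next to a widened int64_t form and compares them on a few edge values.
Both rely on the conversion and arithmetic right shift behaviour of
common compilers, which is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the mixed-sign expression: the product wraps modulo 2^32 and
 * is then reinterpreted as a signed value. */
uint16_t mulhsu16_mixed(uint16_t rs1, uint16_t rs2)
{
    int16_t e1 = (int16_t)rs1;
    int32_t prod = (int32_t)((uint32_t)(int32_t)e1 * (uint32_t)rs2);
    return (uint16_t)(prod >> 16);
}

/* Hypothetical alternative: widen both operands to int64_t first. */
uint16_t mulhsu16_wide(uint16_t rs1, uint16_t rs2)
{
    int64_t prod = (int64_t)(int16_t)rs1 * (int64_t)rs2;
    return (uint16_t)(prod >> 16);
}

/* Compare both forms on edge values; returns the number of pairs. */
int mulhsu16_selfcheck(void)
{
    static const uint16_t v[] = { 0, 1, 2, 0x7FFF, 0x8000, 0xFFFF };
    int n = 0;

    for (unsigned i = 0; i < 6; i++) {
        for (unsigned j = 0; j < 6; j++) {
            assert(mulhsu16_mixed(v[i], v[j]) ==
                   mulhsu16_wide(v[i], v[j]));
            n++;
        }
    }
    return n;
}
```

The widened form avoids reasoning about wraparound at the cost of a
64-bit multiply per lane.
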
 target/riscv/helper.h                   |   62 ++
 target/riscv/insn32.decode              |   92 ++
 target/riscv/insn_trans/trans_rvp.c.inc |   62 ++
 target/riscv/psimd_helper.c             | 1066 +++++++++++++++++++++++
 4 files changed, 1282 insertions(+)
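
Note on rounding (not part of the patch): a minimal standalone sketch of
the per-lane arithmetic behind PMULH.H and PMULHR.H, assuming a 32-bit
register that holds two 16-bit lanes. lane16, mulh_lanes, pmulh_h32 and
pmulhr_h32 are hypothetical names standing in for the EXTRACT16/INSERT16
macros and HELPER() functions, which are defined elsewhere in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for EXTRACT16: 16-bit lane i of x. */
static uint16_t lane16(uint32_t x, int i)
{
    assert(i >= 0 && i < 2);
    return (uint16_t)(x >> (16 * i));
}

/* Signed multiply-high per lane; round != 0 adds 2^15 before shifting. */
static uint32_t mulh_lanes(uint32_t rs1, uint32_t rs2, int round)
{
    uint32_t rd = 0;

    for (int i = 0; i < 2; i++) {
        int32_t e1 = (int16_t)lane16(rs1, i);
        int32_t e2 = (int16_t)lane16(rs2, i);
        int32_t prod = e1 * e2 + (round ? (1 << 15) : 0);
        /* Assumes >> on a negative value is an arithmetic shift. */
        rd |= (uint32_t)(uint16_t)(prod >> 16) << (16 * i);
    }
    return rd;
}

uint32_t pmulh_h32(uint32_t rs1, uint32_t rs2)
{
    return mulh_lanes(rs1, rs2, 0);
}

uint32_t pmulhr_h32(uint32_t rs1, uint32_t rs2)
{
    return mulh_lanes(rs1, rs2, 1);
}
```

For lanes 0x0100 * 0x0080 the product is exactly 2^15, so the truncating
form yields 0 and the rounding form yields 1.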

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 78ae034331..4b3f01f8d0 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1543,3 +1543,65 @@ DEF_HELPER_4(merge, tl, env, tl, tl, tl)
 DEF_HELPER_2(cls, tl, env, tl)
 DEF_HELPER_2(clsw, i64, env, i64)
=20
+/* Packed SIMD - Pure Multiplication Operations */
+DEF_HELPER_3(pmulh_h, tl, env, tl, tl)
+DEF_HELPER_3(pmulhsu_h, tl, env, tl, tl)
+DEF_HELPER_3(pmulhu_h, tl, env, tl, tl)
+DEF_HELPER_3(pmulhr_h, tl, env, tl, tl)
+DEF_HELPER_3(pmulhrsu_h, tl, env, tl, tl)
+DEF_HELPER_3(pmulhru_h, tl, env, tl, tl)
+DEF_HELPER_3(pmulh_w, i64, env, i64, i64)
+DEF_HELPER_3(pmulhr_w, i64, env, i64, i64)
+DEF_HELPER_3(pmulhsu_w, i64, env, i64, i64)
+DEF_HELPER_3(pmulhrsu_w, i64, env, i64, i64)
+DEF_HELPER_3(pmulhu_w, i64, env, i64, i64)
+DEF_HELPER_3(pmulhru_w, i64, env, i64, i64)
+DEF_HELPER_3(mulhr, i32, env, i32, i32)
+DEF_HELPER_3(mulhrsu, i32, env, i32, i32)
+DEF_HELPER_3(mulhru, i32, env, i32, i32)
+DEF_HELPER_3(pmulh_h_b0, tl, env, tl, tl)
+DEF_HELPER_3(pmulh_h_b1, tl, env, tl, tl)
+DEF_HELPER_3(pmulhsu_h_b0, tl, env, tl, tl)
+DEF_HELPER_3(pmulhsu_h_b1, tl, env, tl, tl)
+DEF_HELPER_3(mulh_h0, i32, env, i32, i32)
+DEF_HELPER_3(mulh_h1, i32, env, i32, i32)
+DEF_HELPER_3(mulhsu_h0, i32, env, i32, i32)
+DEF_HELPER_3(mulhsu_h1, i32, env, i32, i32)
+DEF_HELPER_3(pmulh_w_h0, i64, env, i64, i64)
+DEF_HELPER_3(pmulh_w_h1, i64, env, i64, i64)
+DEF_HELPER_3(pmulhsu_w_h0, i64, env, i64, i64)
+DEF_HELPER_3(pmulhsu_w_h1, i64, env, i64, i64)
+DEF_HELPER_3(pmul_h_b00, tl, env, tl, tl)
+DEF_HELPER_3(pmul_h_b01, tl, env, tl, tl)
+DEF_HELPER_3(pmul_h_b11, tl, env, tl, tl)
+DEF_HELPER_3(pmulsu_h_b00, tl, env, tl, tl)
+DEF_HELPER_3(pmulsu_h_b11, tl, env, tl, tl)
+DEF_HELPER_3(pmulu_h_b00, tl, env, tl, tl)
+DEF_HELPER_3(pmulu_h_b01, tl, env, tl, tl)
+DEF_HELPER_3(pmulu_h_b11, tl, env, tl, tl)
+DEF_HELPER_3(pmul_w_h00, i64, env, i64, i64)
+DEF_HELPER_3(pmul_w_h01, i64, env, i64, i64)
+DEF_HELPER_3(pmul_w_h11, i64, env, i64, i64)
+DEF_HELPER_3(pmulsu_w_h00, i64, env, i64, i64)
+DEF_HELPER_3(pmulsu_w_h11, i64, env, i64, i64)
+DEF_HELPER_3(pmulu_w_h00, i64, env, i64, i64)
+DEF_HELPER_3(pmulu_w_h01, i64, env, i64, i64)
+DEF_HELPER_3(pmulu_w_h11, i64, env, i64, i64)
+DEF_HELPER_3(pm2sadd_h, tl, env, tl, tl)
+DEF_HELPER_3(pm2sadd_hx, tl, env, tl, tl)
+DEF_HELPER_3(mul_h00, i32, env, i32, i32)
+DEF_HELPER_3(mul_h01, i32, env, i32, i32)
+DEF_HELPER_3(mul_h11, i32, env, i32, i32)
+DEF_HELPER_3(mulsu_h00, i32, env, i32, i32)
+DEF_HELPER_3(mulsu_h11, i32, env, i32, i32)
+DEF_HELPER_3(mulu_h00, i32, env, i32, i32)
+DEF_HELPER_3(mulu_h01, i32, env, i32, i32)
+DEF_HELPER_3(mulu_h11, i32, env, i32, i32)
+DEF_HELPER_3(mul_w00, i64, env, i64, i64)
+DEF_HELPER_3(mul_w01, i64, env, i64, i64)
+DEF_HELPER_3(mul_w11, i64, env, i64, i64)
+DEF_HELPER_3(mulsu_w00, i64, env, i64, i64)
+DEF_HELPER_3(mulsu_w11, i64, env, i64, i64)
+DEF_HELPER_3(mulu_w00, i64, env, i64, i64)
+DEF_HELPER_3(mulu_w01, i64, env, i64, i64)
+DEF_HELPER_3(mulu_w11, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 09bb69b302..bd3b14af5b 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1321,3 +1321,95 @@ merge       1010110 ..... ..... 001 ..... 0111011 @r
 cls    01100 0000011 ..... 001 ..... 0010011 @r2
 clsw   01100 0000011 ..... 001 ..... 0011011 @r2
=20
+# Packed SIMD - Pure Multiplication Operations
+pmulh_h     10000 00 ..... ..... 111 ..... 0111011 @r
+pmulhsu_h   11000 00 ..... ..... 111 ..... 0111011 @r
+pmulhu_h    10010 00 ..... ..... 111 ..... 0111011 @r
+pmulhr_h    10000 10 ..... ..... 111 ..... 0111011 @r
+pmulhrsu_h  11000 10 ..... ..... 111 ..... 0111011 @r
+pmulhru_h   10010 10 ..... ..... 111 ..... 0111011 @r
+pmulh_w     10000 01 ..... ..... 111 ..... 0111011 @r
+{
+  mulhr     10000 11 ..... ..... 111 ..... 0111011 @r
+  pmulhr_w  10000 11 ..... ..... 111 ..... 0111011 @r
+}
+pmulhsu_w   11000 01 ..... ..... 111 ..... 0111011 @r
+{
+  mulhrsu   11000 11 ..... ..... 111 ..... 0111011 @r
+  pmulhrsu_w    11000 11 ..... ..... 111 ..... 0111011 @r
+}
+pmulhu_w    10010 01 ..... ..... 111 ..... 0111011 @r
+{
+  mulhru    10010 11 ..... ..... 111 ..... 0111011 @r
+  pmulhru_w 10010 11 ..... ..... 111 ..... 0111011 @r
+}
+pmulh_h_b0      10100 00 ..... ..... 111 ..... 0111011 @r
+pmulh_h_b1      10110 00 ..... ..... 111 ..... 0111011 @r
+pmulhsu_h_b0    10100 10 ..... ..... 111 ..... 0111011 @r
+pmulhsu_h_b1    10110 10 ..... ..... 111 ..... 0111011 @r
+{
+  mulh_h0       10100 01 ..... ..... 111 ..... 0111011 @r
+  pmulh_w_h0    10100 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mulh_h1       10110 01 ..... ..... 111 ..... 0111011 @r
+  pmulh_w_h1    10110 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mulhsu_h0     10100 11 ..... ..... 111 ..... 0111011 @r
+  pmulhsu_w_h0  10100 11 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mulhsu_h1     10110 11 ..... ..... 111 ..... 0111011 @r
+  pmulhsu_w_h1  10110 11 ..... ..... 111 ..... 0111011 @r
+}
+pmul_h_b00      10000 00 ..... ..... 011 ..... 0111011 @r
+pmul_h_b01      10010 00 ..... ..... 001 ..... 0111011 @r
+pmul_h_b11      10010 00 ..... ..... 011 ..... 0111011 @r
+pmulsu_h_b00    11100 00 ..... ..... 011 ..... 0111011 @r
+pmulsu_h_b11    11110 00 ..... ..... 011 ..... 0111011 @r
+pmulu_h_b00     10100 00 ..... ..... 011 ..... 0111011 @r
+pmulu_h_b01     10110 00 ..... ..... 001 ..... 0111011 @r
+pmulu_h_b11     10110 00 ..... ..... 011 ..... 0111011 @r
+{
+  mul_h00       10000 01 ..... ..... 011 ..... 0111011 @r
+  pmul_w_h00    10000 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  mul_h01       10010 01 ..... ..... 001 ..... 0111011 @r
+  pmul_w_h01    10010 01 ..... ..... 001 ..... 0111011 @r
+}
+{
+  mul_h11       10010 01 ..... ..... 011 ..... 0111011 @r
+  pmul_w_h11    10010 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  mulsu_h00     11100 01 ..... ..... 011 ..... 0111011 @r
+  pmulsu_w_h00  11100 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  mulsu_h11     11110 01 ..... ..... 011 ..... 0111011 @r
+  pmulsu_w_h11  11110 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  mulu_h00      10100 01 ..... ..... 011 ..... 0111011 @r
+  pmulu_w_h00   10100 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  mulu_h01      10110 01 ..... ..... 001 ..... 0111011 @r
+  pmulu_w_h01   10110 01 ..... ..... 001 ..... 0111011 @r
+}
+{
+  mulu_h11      10110 01 ..... ..... 011 ..... 0111011 @r
+  pmulu_w_h11   10110 01 ..... ..... 011 ..... 0111011 @r
+}
+pm2sadd_h       11000 10 ..... ..... 101 ..... 0111011 @r
+pm2sadd_hx      11010 10 ..... ..... 101 ..... 0111011 @r
+mul_w00         10000 11 ..... ..... 011 ..... 0111011 @r
+mul_w01         10010 11 ..... ..... 001 ..... 0111011 @r
+mul_w11         10010 11 ..... ..... 011 ..... 0111011 @r
+mulsu_w00       11100 11 ..... ..... 011 ..... 0111011 @r
+mulsu_w11       11110 11 ..... ..... 011 ..... 0111011 @r
+mulu_w00        10100 11 ..... ..... 011 ..... 0111011 @r
+mulu_w01        10110 11 ..... ..... 001 ..... 0111011 @r
+mulu_w11        10110 11 ..... ..... 011 ..... 0111011 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr=
ans/trans_rvp.c.inc
index fc6254b395..b01656ffb0 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -712,3 +712,65 @@ GEN_SIMD_TRANS_ACC(merge)
 GEN_SIMD_TRANS_R1(cls)
 GEN_SIMD_TRANS_R1_64(clsw)
=20
+/* Packed SIMD - Pure Multiplication Operations */
+GEN_SIMD_TRANS(pmulh_h)
+GEN_SIMD_TRANS(pmulhsu_h)
+GEN_SIMD_TRANS(pmulhu_h)
+GEN_SIMD_TRANS(pmulhr_h)
+GEN_SIMD_TRANS(pmulhrsu_h)
+GEN_SIMD_TRANS(pmulhru_h)
+GEN_SIMD_TRANS_64(pmulh_w)
+GEN_SIMD_TRANS_64(pmulhr_w)
+GEN_SIMD_TRANS_64(pmulhsu_w)
+GEN_SIMD_TRANS_64(pmulhrsu_w)
+GEN_SIMD_TRANS_64(pmulhu_w)
+GEN_SIMD_TRANS_64(pmulhru_w)
+GEN_SIMD_TRANS_32(mulhr)
+GEN_SIMD_TRANS_32(mulhrsu)
+GEN_SIMD_TRANS_32(mulhru)
+GEN_SIMD_TRANS(pmulh_h_b0)
+GEN_SIMD_TRANS(pmulh_h_b1)
+GEN_SIMD_TRANS(pmulhsu_h_b0)
+GEN_SIMD_TRANS(pmulhsu_h_b1)
+GEN_SIMD_TRANS_32(mulh_h0)
+GEN_SIMD_TRANS_32(mulh_h1)
+GEN_SIMD_TRANS_32(mulhsu_h0)
+GEN_SIMD_TRANS_32(mulhsu_h1)
+GEN_SIMD_TRANS_64(pmulh_w_h0)
+GEN_SIMD_TRANS_64(pmulh_w_h1)
+GEN_SIMD_TRANS_64(pmulhsu_w_h0)
+GEN_SIMD_TRANS_64(pmulhsu_w_h1)
+GEN_SIMD_TRANS(pmul_h_b00)
+GEN_SIMD_TRANS(pmul_h_b01)
+GEN_SIMD_TRANS(pmul_h_b11)
+GEN_SIMD_TRANS(pmulsu_h_b00)
+GEN_SIMD_TRANS(pmulsu_h_b11)
+GEN_SIMD_TRANS(pmulu_h_b00)
+GEN_SIMD_TRANS(pmulu_h_b01)
+GEN_SIMD_TRANS(pmulu_h_b11)
+GEN_SIMD_TRANS_64(pmul_w_h00)
+GEN_SIMD_TRANS_64(pmul_w_h01)
+GEN_SIMD_TRANS_64(pmul_w_h11)
+GEN_SIMD_TRANS_64(pmulsu_w_h00)
+GEN_SIMD_TRANS_64(pmulsu_w_h11)
+GEN_SIMD_TRANS_64(pmulu_w_h00)
+GEN_SIMD_TRANS_64(pmulu_w_h01)
+GEN_SIMD_TRANS_64(pmulu_w_h11)
+GEN_SIMD_TRANS(pm2sadd_h)
+GEN_SIMD_TRANS(pm2sadd_hx)
+GEN_SIMD_TRANS_32(mul_h00)
+GEN_SIMD_TRANS_32(mul_h01)
+GEN_SIMD_TRANS_32(mul_h11)
+GEN_SIMD_TRANS_32(mulsu_h00)
+GEN_SIMD_TRANS_32(mulsu_h11)
+GEN_SIMD_TRANS_32(mulu_h00)
+GEN_SIMD_TRANS_32(mulu_h01)
+GEN_SIMD_TRANS_32(mulu_h11)
+GEN_SIMD_TRANS_64(mul_w00)
+GEN_SIMD_TRANS_64(mul_w01)
+GEN_SIMD_TRANS_64(mul_w11)
+GEN_SIMD_TRANS_64(mulsu_w00)
+GEN_SIMD_TRANS_64(mulsu_w11)
+GEN_SIMD_TRANS_64(mulu_w00)
+GEN_SIMD_TRANS_64(mulu_w01)
+GEN_SIMD_TRANS_64(mulu_w11)
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
index 4080aab234..b60fd3094c 100644
--- a/target/riscv/psimd_helper.c
+++ b/target/riscv/psimd_helper.c
@@ -3616,3 +3616,1069 @@ uint64_t HELPER(clsw)(CPURISCVState *env, uint64_t=
 rs1)
=20
     return c;
 }
+
+/* Pure multiplication operations */
+
+/**
+ * PMULH.H - Packed signed 16-bit multiply high
+ * For each halfword: rd[i] =3D (rs1[i] * rs2[i]) >> 16
+ */
+target_ulong HELPER(pmulh_h)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        int32_t prod =3D (int32_t)e1 * (int32_t)e2;
+        uint16_t high =3D (uint16_t)(prod >> 16);
+        rd =3D INSERT16(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHSU.H - Packed signed x unsigned 16-bit multiply high
+ */
+target_ulong HELPER(pmulhsu_h)(CPURISCVState *env,
+                               target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        int32_t prod =3D (int32_t)e1 * (uint32_t)e2;
+        uint16_t high =3D (uint16_t)(prod >> 16);
+        rd =3D INSERT16(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHU.H - Packed unsigned 16-bit multiply high
+ */
+target_ulong HELPER(pmulhu_h)(CPURISCVState *env,
+                              target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint32_t prod =3D (uint32_t)e1 * (uint32_t)e2;
+        uint16_t high =3D (uint16_t)(prod >> 16);
+        rd =3D INSERT16(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHR.H - Packed signed 16-bit multiply high with rounding
+ */
+target_ulong HELPER(pmulhr_h)(CPURISCVState *env,
+                              target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        int32_t prod =3D (int32_t)e1 * (int32_t)e2 + (1 << 15);
+        uint16_t high =3D (uint16_t)(prod >> 16);
+        rd =3D INSERT16(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHRSU.H - Packed signed x unsigned 16-bit multiply high with rounding
+ */
+target_ulong HELPER(pmulhrsu_h)(CPURISCVState *env,
+                                target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        int32_t prod =3D (int32_t)e1 * (uint32_t)e2 + (1 << 15);
+        uint16_t high =3D (uint16_t)(prod >> 16);
+        rd =3D INSERT16(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHRU.H - Packed unsigned 16-bit multiply high with rounding
+ */
+target_ulong HELPER(pmulhru_h)(CPURISCVState *env,
+                               target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint32_t prod =3D (uint32_t)e1 * (uint32_t)e2 + (1 << 15);
+        uint16_t high =3D (uint16_t)(prod >> 16);
+        rd =3D INSERT16(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULH.W - Packed signed 32-bit multiply high (RV64 only)
+ */
+uint64_t HELPER(pmulh_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2;
+        uint32_t high =3D (uint32_t)(prod >> 32);
+        rd =3D INSERT32(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHR.W - Packed signed 32-bit multiply high with rounding (RV64 only)
+ */
+uint64_t HELPER(pmulhr_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2 + (1LL << 31);
+        uint32_t high =3D (uint32_t)(prod >> 32);
+        rd =3D INSERT32(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHSU.W - Packed signed x unsigned 32-bit multiply high (RV64 only)
+ */
+uint64_t HELPER(pmulhsu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        int64_t prod =3D (int64_t)e1 * (uint64_t)e2;
+        uint32_t high =3D (uint32_t)(prod >> 32);
+        rd =3D INSERT32(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHRSU.W - Packed signed x unsigned 32-bit
+ * multiply high with rounding (RV64 only)
+ */
+uint64_t HELPER(pmulhrsu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        int64_t prod =3D (int64_t)e1 * (uint64_t)e2 + (1LL << 31);
+        uint32_t high =3D (uint32_t)(prod >> 32);
+        rd =3D INSERT32(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHU.W - Packed unsigned 32-bit multiply high (RV64 only)
+ */
+uint64_t HELPER(pmulhu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint64_t prod =3D (uint64_t)e1 * (uint64_t)e2;
+        uint32_t high =3D (uint32_t)(prod >> 32);
+        rd =3D INSERT32(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHRU.W - Packed unsigned 32-bit multiply high with rounding (RV64 on=
ly)
+ */
+uint64_t HELPER(pmulhru_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint64_t prod =3D (uint64_t)e1 * (uint64_t)e2 + (1LL << 31);
+        uint32_t high =3D (uint32_t)(prod >> 32);
+        rd =3D INSERT32(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * MULHR - 32-bit signed multiply high with rounding
+ */
+uint32_t HELPER(mulhr)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    int32_t b =3D (int32_t)rs2;
+    int64_t prod =3D (int64_t)a * (int64_t)b + (1LL << 31);
+    return (uint32_t)(prod >> 32);
+}
+
+/**
+ * MULHRSU - 32-bit signed x unsigned multiply high with rounding
+ */
+uint32_t HELPER(mulhrsu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    uint32_t b =3D rs2;
+    int64_t prod =3D (int64_t)a * (uint64_t)b + (1LL << 31);
+    return (uint32_t)(prod >> 32);
+}
+
+/**
+ * MULHRU - 32-bit unsigned multiply high with rounding
+ */
+uint32_t HELPER(mulhru)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint32_t a =3D rs1;
+    uint32_t b =3D rs2;
+    uint64_t prod =3D (uint64_t)a * (uint64_t)b + (1LL << 31);
+    return (uint32_t)(prod >> 32);
+}
+
+/**
+ * PMULH.H.B0 - Multiply halfword by low byte, result high halfword
+ */
+target_ulong HELPER(pmulh_h_b0)(CPURISCVState *env,
+                                target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int8_t e2 =3D (int8_t)EXTRACT8(rs2, i * 2);
+        int32_t prod =3D (int32_t)e1 * (int32_t)e2;
+        uint16_t high =3D (uint16_t)(prod >> 8);
+        rd =3D INSERT16(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULH.H.B1 - Multiply halfword by high byte, result high halfword
+ */
+target_ulong HELPER(pmulh_h_b1)(CPURISCVState *env,
+                                target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int8_t e2 =3D (int8_t)EXTRACT8(rs2, i * 2 + 1);
+        int32_t prod =3D (int32_t)e1 * (int32_t)e2;
+        uint16_t high =3D (uint16_t)(prod >> 8);
+        rd =3D INSERT16(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHSU.H.B0 - Multiply signed halfword by unsigned
+ * low byte, result high halfword
+ */
+target_ulong HELPER(pmulhsu_h_b0)(CPURISCVState *env,
+                                  target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i * 2);
+        int32_t prod =3D (int32_t)e1 * (uint32_t)e2;
+        uint16_t high =3D (uint16_t)(prod >> 8);
+        rd =3D INSERT16(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHSU.H.B1 - Multiply signed halfword by unsigned
+ * high byte, result high halfword
+ */
+target_ulong HELPER(pmulhsu_h_b1)(CPURISCVState *env,
+                                  target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i * 2 + 1);
+        int32_t prod =3D (int32_t)e1 * (uint32_t)e2;
+        uint16_t high =3D (uint16_t)(prod >> 8);
+        rd =3D INSERT16(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * MULH.H0 - 32-bit multiply by low halfword, result high 32 bits
+ */
+uint32_t HELPER(mulh_h0)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    int16_t b =3D (int16_t)(rs2 & 0xFFFF);
+    int64_t prod =3D (int64_t)a * (int64_t)b;
+    return (uint32_t)(prod >> 16);
+}
+
+/**
+ * MULH.H1 - 32-bit multiply by high halfword, result high 32 bits
+ */
+uint32_t HELPER(mulh_h1)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    int16_t b =3D (int16_t)((rs2 >> 16) & 0xFFFF);
+    int64_t prod =3D (int64_t)a * (int64_t)b;
+    return (uint32_t)(prod >> 16);
+}
+
+/**
+ * MULHSU.H0 - 32-bit signed multiply by unsigned
+ * low halfword, result high 32 bits of the 48-bit product
+ */
+uint32_t HELPER(mulhsu_h0)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    uint16_t b =3D (uint16_t)(rs2 & 0xFFFF);
+    int64_t prod =3D (int64_t)a * (uint64_t)b;
+    return (uint32_t)(prod >> 16);
+}
+
+/**
+ * MULHSU.H1 - 32-bit signed multiply by unsigned
+ * high halfword, result high 32 bits of the 48-bit product
+ */
+uint32_t HELPER(mulhsu_h1)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    uint16_t b =3D (uint16_t)((rs2 >> 16) & 0xFFFF);
+    int64_t prod =3D (int64_t)a * (uint64_t)b;
+    return (uint32_t)(prod >> 16);
+}
+
+/**
+ * PMULH.W.H0 - Multiply word by low halfword, result high word (RV64 only)
+ */
+uint64_t HELPER(pmulh_w_h0)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i * 2);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2;
+        uint32_t high =3D (uint32_t)(prod >> 16);
+        rd =3D INSERT32(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULH.W.H1 - Multiply word by high halfword, result high word (RV64 onl=
y)
+ */
+uint64_t HELPER(pmulh_w_h1)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2;
+        uint32_t high =3D (uint32_t)(prod >> 16);
+        rd =3D INSERT32(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHSU.W.H0 - Multiply signed word by unsigned
+ * low halfword, result high word (RV64 only)
+ */
+uint64_t HELPER(pmulhsu_w_h0)(CPURISCVState *env, uint64_t rs1, uint64_t r=
s2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i * 2);
+        int64_t prod =3D (int64_t)e1 * (uint64_t)e2;
+        uint32_t high =3D (uint32_t)(prod >> 16);
+        rd =3D INSERT32(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULHSU.W.H1 - Multiply signed word by unsigned
+ * high halfword, result high word (RV64 only)
+ */
+uint64_t HELPER(pmulhsu_w_h1)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i * 2 + 1);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2;
+        uint32_t high =3D (uint32_t)(prod >> 16);
+        rd =3D INSERT32(rd, high, i);
+    }
+    return rd;
+}
+
+/**
+ * PMUL.H.B00 - Signed multiply of the low bytes of each halfword
+ * For each halfword: rd[i] =3D rs1[i][7:0] * rs2[i][7:0]
+ */
+target_ulong HELPER(pmul_h_b00)(CPURISCVState *env,
+                                target_ulong s1, target_ulong s2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h =3D EXTRACT16(s1, i);
+        uint16_t s2_h =3D EXTRACT16(s2, i);
+        int8_t s1_b0 =3D (int8_t)(s1_h & 0xFF);
+        int8_t s2_b0 =3D (int8_t)(s2_h & 0xFF);
+        int16_t mul =3D (int16_t)s1_b0 * (int16_t)s2_b0;
+        rd =3D INSERT16(rd, (uint16_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMUL.H.B01 - Multiply halfword low byte by halfword high byte
+ * For each halfword: rd[i] =3D rs1[i][7:0] * rs2[i][15:8]
+ */
+target_ulong HELPER(pmul_h_b01)(CPURISCVState *env,
+                                target_ulong s1, target_ulong s2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h =3D EXTRACT16(s1, i);
+        uint16_t s2_h =3D EXTRACT16(s2, i);
+        int8_t s1_b0 =3D (int8_t)(s1_h & 0xFF);
+        int8_t s2_b1 =3D (int8_t)((s2_h >> 8) & 0xFF);
+        int16_t mul =3D (int16_t)s1_b0 * (int16_t)s2_b1;
+        rd =3D INSERT16(rd, (uint16_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMUL.H.B11 - Multiply halfword high byte by halfword high byte
+ * For each halfword: rd[i] =3D rs1[i][15:8] * rs2[i][15:8]
+ */
+target_ulong HELPER(pmul_h_b11)(CPURISCVState *env,
+                                target_ulong s1, target_ulong s2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h =3D EXTRACT16(s1, i);
+        uint16_t s2_h =3D EXTRACT16(s2, i);
+        int8_t s1_b1 =3D (int8_t)((s1_h >> 8) & 0xFF);
+        int8_t s2_b1 =3D (int8_t)((s2_h >> 8) & 0xFF);
+        int16_t mul =3D (int16_t)s1_b1 * (int16_t)s2_b1;
+        rd =3D INSERT16(rd, (uint16_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULSU.H.B00 - Signed x unsigned multiply, low bytes
+ * For each halfword: rd[i] =3D (signed)rs1[i][7:0] * (unsigned)rs2[i][7:0]
+ */
+target_ulong HELPER(pmulsu_h_b00)(CPURISCVState *env,
+                                  target_ulong s1, target_ulong s2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h =3D EXTRACT16(s1, i);
+        uint16_t s2_h =3D EXTRACT16(s2, i);
+        int8_t s1_b0 =3D (int8_t)(s1_h & 0xFF);
+        uint8_t s2_b0 =3D (uint8_t)(s2_h & 0xFF);
+        int16_t mul =3D (int16_t)s1_b0 * (uint16_t)s2_b0;
+        rd =3D INSERT16(rd, (uint16_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULSU.H.B11 - Signed x unsigned multiply, high bytes
+ * For each halfword: rd[i] =3D (signed)rs1[i][15:8] * (unsigned)rs2[i][15:8]
+ */
+target_ulong HELPER(pmulsu_h_b11)(CPURISCVState *env,
+                                  target_ulong s1, target_ulong s2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h =3D EXTRACT16(s1, i);
+        uint16_t s2_h =3D EXTRACT16(s2, i);
+        int8_t s1_b1 =3D (int8_t)((s1_h >> 8) & 0xFF);
+        uint8_t s2_b1 =3D (uint8_t)((s2_h >> 8) & 0xFF);
+        int16_t mul =3D (int16_t)s1_b1 * (uint16_t)s2_b1;
+        rd =3D INSERT16(rd, (uint16_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULU.H.B00 - Unsigned multiply, low bytes
+ * For each halfword: rd[i] =3D rs1[i][7:0] * rs2[i][7:0] (unsigned)
+ */
+target_ulong HELPER(pmulu_h_b00)(CPURISCVState *env,
+                                 target_ulong s1, target_ulong s2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h =3D EXTRACT16(s1, i);
+        uint16_t s2_h =3D EXTRACT16(s2, i);
+        uint8_t s1_b0 =3D (uint8_t)(s1_h & 0xFF);
+        uint8_t s2_b0 =3D (uint8_t)(s2_h & 0xFF);
+        uint16_t mul =3D (uint16_t)s1_b0 * (uint16_t)s2_b0;
+        rd =3D INSERT16(rd, mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULU.H.B01 - Unsigned multiply, rs1 low byte x rs2 high byte
+ * For each halfword: rd[i] =3D rs1[i][7:0] * rs2[i][15:8] (unsigned)
+ */
+target_ulong HELPER(pmulu_h_b01)(CPURISCVState *env,
+                                 target_ulong s1, target_ulong s2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h =3D EXTRACT16(s1, i);
+        uint16_t s2_h =3D EXTRACT16(s2, i);
+        uint8_t s1_b0 =3D (uint8_t)(s1_h & 0xFF);
+        uint8_t s2_b1 =3D (uint8_t)((s2_h >> 8) & 0xFF);
+        uint16_t mul =3D (uint16_t)s1_b0 * (uint16_t)s2_b1;
+        rd =3D INSERT16(rd, mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULU.H.B11 - Unsigned multiply, high bytes
+ * For each halfword: rd[i] =3D rs1[i][15:8] * rs2[i][15:8] (unsigned)
+ */
+target_ulong HELPER(pmulu_h_b11)(CPURISCVState *env,
+                                 target_ulong s1, target_ulong s2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h =3D EXTRACT16(s1, i);
+        uint16_t s2_h =3D EXTRACT16(s2, i);
+        uint8_t s1_b1 =3D (uint8_t)((s1_h >> 8) & 0xFF);
+        uint8_t s2_b1 =3D (uint8_t)((s2_h >> 8) & 0xFF);
+        uint16_t mul =3D (uint16_t)s1_b1 * (uint16_t)s2_b1;
+        rd =3D INSERT16(rd, mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMUL.W.H00 - Signed multiply of the low halfwords of each word
+ * For each word: rd[i] =3D rs1[i][15:0] * rs2[i][15:0]
+ */
+uint64_t HELPER(pmul_w_h00)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, i * 2);
+        int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h0;
+        rd =3D INSERT32(rd, (uint32_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMUL.W.H01 - Signed multiply, rs1 low halfword x rs2 high halfword
+ * For each word: rd[i] =3D rs1[i][15:0] * rs2[i][31:16]
+ */
+uint64_t HELPER(pmul_w_h01)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h1;
+        rd =3D INSERT32(rd, (uint32_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMUL.W.H11 - Signed multiply of the high halfwords of each word
+ * For each word: rd[i] =3D rs1[i][31:16] * rs2[i][31:16]
+ */
+uint64_t HELPER(pmul_w_h11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t mul =3D (int32_t)s1_h1 * (int32_t)s2_h1;
+        rd =3D INSERT32(rd, (uint32_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULSU.W.H00 - Signed x unsigned multiply, low halfwords
+ * For each word: rd[i] =3D (signed)rs1[i][15:0] * (unsigned)rs2[i][15:0]
+ */
+uint64_t HELPER(pmulsu_w_h00)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2);
+        uint16_t s2_h0 =3D EXTRACT16(rs2, i * 2);
+        int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h0;
+        rd =3D INSERT32(rd, (uint32_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULSU.W.H11 - Signed x unsigned multiply, high halfwords
+ * For each word: rd[i] =3D (signed)rs1[i][31:16] * (unsigned)rs2[i][31:16]
+ */
+uint64_t HELPER(pmulsu_w_h11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h1 =3D EXTRACT16(rs2, i * 2 + 1);
+        int32_t mul =3D (int32_t)s1_h1 * (int32_t)s2_h1;
+        rd =3D INSERT32(rd, (uint32_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULU.W.H00 - Unsigned multiply, low halfwords
+ * For each word: rd[i] =3D rs1[i][15:0] * rs2[i][15:0] (unsigned)
+ */
+uint64_t HELPER(pmulu_w_h00)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h0 =3D EXTRACT16(rs1, i * 2);
+        uint16_t s2_h0 =3D EXTRACT16(rs2, i * 2);
+        uint32_t mul =3D (uint32_t)s1_h0 * (uint32_t)s2_h0;
+        rd =3D INSERT32(rd, mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULU.W.H01 - Unsigned multiply, low halfword x high halfword
+ * For each word: rd[i] =3D rs1[i][15:0] * rs2[i][31:16] (unsigned)
+ */
+uint64_t HELPER(pmulu_w_h01)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h0 =3D EXTRACT16(rs1, i * 2);
+        uint16_t s2_h1 =3D EXTRACT16(rs2, i * 2 + 1);
+        uint32_t mul =3D (uint32_t)s1_h0 * (uint32_t)s2_h1;
+        rd =3D INSERT32(rd, mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULU.W.H11 - Unsigned multiply, high halfwords
+ * For each word: rd[i] =3D rs1[i][31:16] * rs2[i][31:16] (unsigned)
+ */
+uint64_t HELPER(pmulu_w_h11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h1 =3D EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h1 =3D EXTRACT16(rs2, i * 2 + 1);
+        uint32_t mul =3D (uint32_t)s1_h1 * (uint32_t)s2_h1;
+        rd =3D INSERT32(rd, mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2SADD.H - Packed saturating multiply-add (non-crossed)
+ *
+ * For each 32-bit word:
+ *   result =3D sat32(rs1[31:16] * rs2[31:16] + rs1[15:0] * rs2[15:0])
+ *
+ * Special case: if both halfwords in both sources are 0x8000 (-32768),
+ *   result saturates to 0x7FFFFFFF and sets vxsat
+ */
+target_ulong HELPER(pm2sadd_h)(CPURISCVState *env,
+                                target_ulong s1, target_ulong s2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_W(rd);  /* Number of 32-bit words */
+    int global_sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        /* Extract both halfwords from each source for this word */
+        uint32_t s1_word =3D EXTRACT32(s1, i);
+        uint32_t s2_word =3D EXTRACT32(s2, i);
+
+        int16_t s1_h0 =3D (int16_t)EXTRACT16(s1_word, 0);
+        int16_t s1_h1 =3D (int16_t)EXTRACT16(s1_word, 1);
+        int16_t s2_h0 =3D (int16_t)EXTRACT16(s2_word, 0);
+        int16_t s2_h1 =3D (int16_t)EXTRACT16(s2_word, 1);
+
+        uint32_t result;
+
+        /* Check for the special saturation case: all halfwords are -32768 */
+        if ((s1_h0 =3D=3D -32768) && (s1_h1 =3D=3D -32768) &&
+            (s2_h0 =3D=3D -32768) && (s2_h1 =3D=3D -32768)) {
+            result =3D 0x7FFFFFFF;
+            global_sat =3D 1;
+        } else {
+            /* Normal case: compute products and sum */
+            int32_t mul_00 =3D (int32_t)s1_h0 * (int32_t)s2_h0;
+            int32_t mul_11 =3D (int32_t)s1_h1 * (int32_t)s2_h1;
+
+            /* Cannot overflow: only the all-0x8000 case reaches 2^31. */
+            result =3D (uint32_t)(mul_00 + mul_11);
+        }
+
+        rd =3D INSERT32(rd, result, i);
+    }
+
+    if (global_sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PM2SADD.HX - Packed saturating multiply-add crossed
+ *
+ * For each 32-bit word:
+ *   result =3D sat32(rs1[31:16] * rs2[15:0] + rs1[15:0] * rs2[31:16])
+ *
+ * Special case: if both halfwords in both sources are 0x8000 (-32768),
+ *   result saturates to 0x7FFFFFFF and sets vxsat
+ */
+target_ulong HELPER(pm2sadd_hx)(CPURISCVState *env,
+                                 target_ulong s1, target_ulong s2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_W(rd);  /* Number of 32-bit words */
+    int global_sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        /* Extract both halfwords from each source for this word */
+        uint32_t s1_word =3D EXTRACT32(s1, i);
+        uint32_t s2_word =3D EXTRACT32(s2, i);
+
+        int16_t s1_h0 =3D (int16_t)EXTRACT16(s1_word, 0);
+        int16_t s1_h1 =3D (int16_t)EXTRACT16(s1_word, 1);
+        int16_t s2_h0 =3D (int16_t)EXTRACT16(s2_word, 0);
+        int16_t s2_h1 =3D (int16_t)EXTRACT16(s2_word, 1);
+
+        uint32_t result;
+
+        /* Check for the special saturation case: all halfwords are -32768 */
+        if ((s1_h0 =3D=3D -32768) && (s1_h1 =3D=3D -32768) &&
+            (s2_h0 =3D=3D -32768) && (s2_h1 =3D=3D -32768)) {
+            result =3D 0x7FFFFFFF;
+            global_sat =3D 1;
+        } else {
+            /* Crossed products: s1_h0 * s2_h1 and s1_h1 * s2_h0 */
+            int32_t mul_01 =3D (int32_t)s1_h0 * (int32_t)s2_h1;
+            int32_t mul_10 =3D (int32_t)s1_h1 * (int32_t)s2_h0;
+
+            /* Sum the crossed products */
+            result =3D (uint32_t)(mul_01 + mul_10);
+        }
+
+        rd =3D INSERT32(rd, result, i);
+    }
+
+    if (global_sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * MUL.H00 - 32-bit signed multiply, low halfwords
+ * Returns product of low halfwords of rs1 and rs2
+ */
+uint32_t HELPER(mul_h00)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0);
+    int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h0;
+    return (uint32_t)mul;
+}
+
+/**
+ * MUL.H01 - 32-bit signed multiply, rs1 low halfword x rs2 high halfword
+ * Returns product of low halfword of rs1 and high halfword of rs2
+ */
+uint32_t HELPER(mul_h01)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h1;
+    return (uint32_t)mul;
+}
+
+/**
+ * MUL.H11 - 32-bit signed multiply, high halfwords
+ * Returns product of high halfwords of rs1 and rs2
+ */
+uint32_t HELPER(mul_h11)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int32_t mul =3D (int32_t)s1_h1 * (int32_t)s2_h1;
+    return (uint32_t)mul;
+}
+
+/**
+ * MULSU.H00 - 32-bit signed x unsigned multiply, low halfwords
+ * Returns product of low halfword of rs1 (signed)
+ * and low halfword of rs2 (unsigned)
+ */
+uint32_t HELPER(mulsu_h00)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    uint16_t s2_h0 =3D EXTRACT16(rs2, 0);
+    int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h0;
+    return (uint32_t)mul;
+}
+
+/**
+ * MULSU.H11 - 32-bit signed x unsigned multiply, high halfwords
+ * Returns product of high halfword of rs1 (signed)
+ * and high halfword of rs2 (unsigned)
+ */
+uint32_t HELPER(mulsu_h11)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    uint16_t s2_h1 =3D EXTRACT16(rs2, 1);
+    int32_t mul =3D (int32_t)s1_h1 * (int32_t)s2_h1;
+    return (uint32_t)mul;
+}
+
+/**
+ * MULU.H00 - 32-bit unsigned multiply, low halfwords
+ * Returns product of low halfwords of rs1 and rs2 (unsigned)
+ */
+uint32_t HELPER(mulu_h00)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint16_t s1_h0 =3D EXTRACT16(rs1, 0);
+    uint16_t s2_h0 =3D EXTRACT16(rs2, 0);
+    uint32_t mul =3D (uint32_t)s1_h0 * (uint32_t)s2_h0;
+    return mul;
+}
+
+/**
+ * MULU.H01 - 32-bit unsigned multiply, rs1 low halfword x rs2 high halfword
+ * Returns product of low halfword of rs1 and high halfword of rs2 (unsigned)
+ */
+uint32_t HELPER(mulu_h01)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint16_t s1_h0 =3D EXTRACT16(rs1, 0);
+    uint16_t s2_h1 =3D EXTRACT16(rs2, 1);
+    uint32_t mul =3D (uint32_t)s1_h0 * (uint32_t)s2_h1;
+    return mul;
+}
+
+/**
+ * MULU.H11 - 32-bit unsigned multiply, high halfwords
+ * Returns product of high halfwords of rs1 and rs2 (unsigned)
+ */
+uint32_t HELPER(mulu_h11)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint16_t s1_h1 =3D EXTRACT16(rs1, 1);
+    uint16_t s2_h1 =3D EXTRACT16(rs2, 1);
+    uint32_t mul =3D (uint32_t)s1_h1 * (uint32_t)s2_h1;
+    return mul;
+}
+
+/**
+ * MUL.W00 - 64-bit signed multiply, low word x low word
+ * Returns full 64-bit product of low 32 bits of rs1 and rs2
+ */
+uint64_t HELPER(mul_w00)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0);
+    int32_t s2_w0 =3D (int32_t)EXTRACT32(rs2, 0);
+    int64_t mul =3D (int64_t)s1_w0 * (int64_t)s2_w0;
+    return (uint64_t)mul;
+}
+
+/**
+ * MUL.W01 - 64-bit signed multiply, low word x high word
+ * Returns full 64-bit product of low 32 bits of rs1 and high 32 bits of rs2
+ */
+uint64_t HELPER(mul_w01)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0);
+    int32_t s2_w1 =3D (int32_t)EXTRACT32(rs2, 1);
+    int64_t mul =3D (int64_t)s1_w0 * (int64_t)s2_w1;
+    return (uint64_t)mul;
+}
+
+/**
+ * MUL.W11 - 64-bit signed multiply, high word x high word
+ * Returns full 64-bit product of high 32 bits of rs1 and high 32 bits of rs2
+ */
+uint64_t HELPER(mul_w11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w1 =3D (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w1 =3D (int32_t)EXTRACT32(rs2, 1);
+    int64_t mul =3D (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)mul;
+}
+
+/**
+ * MULSU.W00 - 64-bit signed x unsigned multiply, low word x low word
+ * Returns full 64-bit product of low 32 bits of rs1
+ * (signed) and low 32 bits of rs2 (unsigned)
+ */
+uint64_t HELPER(mulsu_w00)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0);
+    uint32_t s2_w0 =3D EXTRACT32(rs2, 0);
+    int64_t mul =3D (int64_t)s1_w0 * (int64_t)s2_w0;
+    return (uint64_t)mul;
+}
+
+/**
+ * MULSU.W11 - 64-bit signed x unsigned multiply, high word x high word
+ * Returns full 64-bit product of high 32 bits of rs1
+ * (signed) and high 32 bits of rs2 (unsigned)
+ */
+uint64_t HELPER(mulsu_w11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w1 =3D (int32_t)EXTRACT32(rs1, 1);
+    uint32_t s2_w1 =3D EXTRACT32(rs2, 1);
+    int64_t mul =3D (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)mul;
+}
+
+/**
+ * MULU.W00 - 64-bit unsigned multiply, low word x low word
+ * Returns full 64-bit product of low 32 bits of rs1
+ * and low 32 bits of rs2 (unsigned)
+ */
+uint64_t HELPER(mulu_w00)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint32_t s1_w0 =3D EXTRACT32(rs1, 0);
+    uint32_t s2_w0 =3D EXTRACT32(rs2, 0);
+    uint64_t mul =3D (uint64_t)s1_w0 * (uint64_t)s2_w0;
+    return mul;
+}
+
+/**
+ * MULU.W01 - 64-bit unsigned multiply, low word x high word
+ * Returns full 64-bit product of low 32 bits of rs1
+ * and high 32 bits of rs2 (unsigned)
+ */
+uint64_t HELPER(mulu_w01)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint32_t s1_w0 =3D EXTRACT32(rs1, 0);
+    uint32_t s2_w1 =3D EXTRACT32(rs2, 1);
+    uint64_t mul =3D (uint64_t)s1_w0 * (uint64_t)s2_w1;
+    return mul;
+}
+
+/**
+ * MULU.W11 - 64-bit unsigned multiply, high word x high word
+ * Returns full 64-bit product of high 32 bits of rs1
+ * and high 32 bits of rs2 (unsigned)
+ */
+uint64_t HELPER(mulu_w11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint32_t s1_w1 =3D EXTRACT32(rs1, 1);
+    uint32_t s2_w1 =3D EXTRACT32(rs2, 1);
+    uint64_t mul =3D (uint64_t)s1_w1 * (uint64_t)s2_w1;
+    return mul;
+}
--=20
2.34.1
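
Note on the PM2SADD.H and PM2SADD.HX helpers in this patch: the sum of two
signed 16x16 products leaves the int32_t range for exactly one input, the one
where all four halfwords are 0x8000 (2^30 + 2^30 = 2^31). That is why a single
special-case test is enough. The following is a minimal standalone sketch for
cross-checking that claim, not the code from the patch. The names
pm2sadd_lane and pm2sadd_selfcheck and the sat out-parameter are assumptions
made for illustration; the sketch models one 32-bit lane, computes the sum in
64 bits and saturates generically.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-lane model of PM2SADD.H (illustrative names only). */
static uint32_t pm2sadd_lane(uint32_t s1, uint32_t s2, int *sat)
{
    int16_t a0 = (int16_t)(s1 & 0xFFFF);
    int16_t a1 = (int16_t)(s1 >> 16);
    int16_t b0 = (int16_t)(s2 & 0xFFFF);
    int16_t b1 = (int16_t)(s2 >> 16);

    /* Widen to 64 bits so the addition itself can never overflow. */
    int64_t sum = (int64_t)a0 * b0 + (int64_t)a1 * b1;

    /*
     * Only the upper bound needs a check: the most negative sum is
     * 2 * (-32768 * 32767), which is still above INT32_MIN.
     */
    if (sum > INT32_MAX) {
        *sat = 1;
        return (uint32_t)INT32_MAX;
    }
    return (uint32_t)(int32_t)sum;
}

/* Small self-check covering the saturating input and its nearest neighbour. */
void pm2sadd_selfcheck(void)
{
    int sat = 0;

    /* All four halfwords are 0x8000: the sum is 2^31, so it saturates. */
    assert(pm2sadd_lane(0x80008000u, 0x80008000u, &sat) == 0x7FFFFFFFu);
    assert(sat == 1);

    /* One halfword is 0x8001 instead: the sum fits, no saturation. */
    sat = 0;
    assert(pm2sadd_lane(0x80008000u, 0x80008001u, &sat) == 0x7FFF8000u);
    assert(sat == 0);
}
```

The crossed variant (PM2SADD.HX) behaves the same way with b0 and b1 swapped,
since both crossed products reach 2^30 only when all four halfwords are
0x8000.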
From nobody Sat May 30 20:13:16 2026
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 09/14] target/riscv: rvp: add multiply-accumulate operations
Date: Fri, 17 Apr 2026 18:46:46 +0800
Message-Id: <20260417104652.17857-10-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>
---
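
Note on the insn32.decode hunk: several RV32 scalar forms (e.g. mhacc) share
an encoding with an RV64 packed form (e.g. pmhacc_w) and are listed together
inside a { } overlap group. In decodetree, patterns in such a group are tried
in file order, and the next one is tried when a translate function returns
false. The sketch below models that dispatch in plain C. It is an
illustration under assumptions, not QEMU code: the Ctx type, the xlen field
and the XLEN checks inside the two trans_ functions are hypothetical stand-ins
for whatever GEN_SIMD_TRANS_ACC_32 and GEN_SIMD_TRANS_ACC_64 expand to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the translator context; only XLEN matters. */
typedef struct {
    int xlen;
    const char *chosen;
} Ctx;

/* Assumed behaviour: the RV32 scalar form declines on any other XLEN. */
static bool trans_mhacc(Ctx *ctx)
{
    if (ctx->xlen != 32) {
        return false;
    }
    ctx->chosen = "mhacc";
    return true;
}

/* Assumed behaviour: the RV64 packed form declines on any other XLEN. */
static bool trans_pmhacc_w(Ctx *ctx)
{
    if (ctx->xlen != 64) {
        return false;
    }
    ctx->chosen = "pmhacc_w";
    return true;
}

/* Models one { } group: try patterns in order, stop at the first success. */
static bool decode_group(Ctx *ctx)
{
    bool (*const pats[])(Ctx *) = { trans_mhacc, trans_pmhacc_w };

    for (size_t i = 0; i < sizeof(pats) / sizeof(pats[0]); i++) {
        if (pats[i](ctx)) {
            return true;
        }
    }
    return false; /* nothing accepted: treated as an illegal instruction */
}

/* Self-check: the same encoding resolves differently per XLEN. */
void decode_group_selfcheck(void)
{
    Ctx rv32 = { 32, NULL };
    Ctx rv64 = { 64, NULL };

    assert(decode_group(&rv32) && strcmp(rv32.chosen, "mhacc") == 0);
    assert(decode_group(&rv64) && strcmp(rv64.chosen, "pmhacc_w") == 0);
}
```

With this ordering the scalar form is always offered first, so each group only
decodes correctly if the first translator really does return false on RV64.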
 target/riscv/helper.h                   |  56 ++
 target/riscv/insn32.decode              |  92 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  56 ++
 target/riscv/psimd_helper.c             | 946 ++++++++++++++++++++++++
 4 files changed, 1150 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4b3f01f8d0..54f8591672 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1605,3 +1605,59 @@ DEF_HELPER_3(mulsu_w11, i64, env, i64, i64)
 DEF_HELPER_3(mulu_w00, i64, env, i64, i64)
 DEF_HELPER_3(mulu_w01, i64, env, i64, i64)
 DEF_HELPER_3(mulu_w11, i64, env, i64, i64)
+
+/* Packed SIMD - Multiply-Accumulate Operations */
+DEF_HELPER_4(pmhacc_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhaccsu_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhaccu_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhracc_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhraccsu_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhraccu_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhacc_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhracc_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhaccsu_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhraccsu_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhaccu_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhraccu_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(mhacc, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhracc, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhaccsu, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhraccsu, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhaccu, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhraccu, i32, env, i32, i32, i32)
+DEF_HELPER_4(pmhacc_h_b0, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhacc_h_b1, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhaccsu_h_b0, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhaccsu_h_b1, tl, env, tl, tl, tl)
+DEF_HELPER_4(mhacc_h0, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhacc_h1, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhaccsu_h0, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhaccsu_h1, i32, env, i32, i32, i32)
+DEF_HELPER_4(pmhacc_w_h0, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhacc_w_h1, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhaccsu_w_h0, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhaccsu_w_h1, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmacc_w_h00, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmacc_w_h01, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmacc_w_h11, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmaccsu_w_h00, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmaccsu_w_h11, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmaccu_w_h00, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmaccu_w_h01, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmaccu_w_h11, i64, env, i64, i64, i64)
+DEF_HELPER_4(macc_h00, i32, env, i32, i32, i32)
+DEF_HELPER_4(macc_h01, i32, env, i32, i32, i32)
+DEF_HELPER_4(macc_h11, i32, env, i32, i32, i32)
+DEF_HELPER_4(maccsu_h00, i32, env, i32, i32, i32)
+DEF_HELPER_4(maccsu_h11, i32, env, i32, i32, i32)
+DEF_HELPER_4(maccu_h00, i32, env, i32, i32, i32)
+DEF_HELPER_4(maccu_h01, i32, env, i32, i32, i32)
+DEF_HELPER_4(maccu_h11, i32, env, i32, i32, i32)
+DEF_HELPER_4(macc_w00, i64, env, i64, i64, i64)
+DEF_HELPER_4(macc_w01, i64, env, i64, i64, i64)
+DEF_HELPER_4(macc_w11, i64, env, i64, i64, i64)
+DEF_HELPER_4(maccsu_w00, i64, env, i64, i64, i64)
+DEF_HELPER_4(maccsu_w11, i64, env, i64, i64, i64)
+DEF_HELPER_4(maccu_w00, i64, env, i64, i64, i64)
+DEF_HELPER_4(maccu_w01, i64, env, i64, i64, i64)
+DEF_HELPER_4(maccu_w11, i64, env, i64, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index bd3b14af5b..9944d0b52c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1413,3 +1413,95 @@ mulsu_w11       11110 11 ..... ..... 011 ..... 0111011 @r
 mulu_w00        10100 11 ..... ..... 011 ..... 0111011 @r
 mulu_w01        10110 11 ..... ..... 001 ..... 0111011 @r
 mulu_w11        10110 11 ..... ..... 011 ..... 0111011 @r
+
+# Packed SIMD - Multiply-Accumulate Operations
+pmhacc_h    10001 00 ..... ..... 111 ..... 0111011 @r
+pmhaccsu_h  11001 00 ..... ..... 111 ..... 0111011 @r
+pmhaccu_h   10011 00 ..... ..... 111 ..... 0111011 @r
+pmhracc_h   10001 10 ..... ..... 111 ..... 0111011 @r
+pmhraccsu_h 11001 10 ..... ..... 111 ..... 0111011 @r
+pmhraccu_h  10011 10 ..... ..... 111 ..... 0111011 @r
+{
+  mhacc     10001 01 ..... ..... 111 ..... 0111011 @r
+  pmhacc_w  10001 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhracc    10001 11 ..... ..... 111 ..... 0111011 @r
+  pmhracc_w 10001 11 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhaccsu   11001 01 ..... ..... 111 ..... 0111011 @r
+  pmhaccsu_w    11001 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhraccsu  11001 11 ..... ..... 111 ..... 0111011 @r
+  pmhraccsu_w   11001 11 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhaccu    10011 01 ..... ..... 111 ..... 0111011 @r
+  pmhaccu_w 10011 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhraccu   10011 11 ..... ..... 111 ..... 0111011 @r
+  pmhraccu_w    10011 11 ..... ..... 111 ..... 0111011 @r
+}
+pmhacc_h_b0     10101 00 ..... ..... 111 ..... 0111011 @r
+pmhacc_h_b1     10111 00 ..... ..... 111 ..... 0111011 @r
+pmhaccsu_h_b0   10101 10 ..... ..... 111 ..... 0111011 @r
+pmhaccsu_h_b1   10111 10 ..... ..... 111 ..... 0111011 @r
+{
+  mhacc_h0      10101 01 ..... ..... 111 ..... 0111011 @r
+  pmhacc_w_h0   10101 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhacc_h1      10111 01 ..... ..... 111 ..... 0111011 @r
+  pmhacc_w_h1   10111 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhaccsu_h0    10101 11 ..... ..... 111 ..... 0111011 @r
+  pmhaccsu_w_h0 10101 11 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhaccsu_h1    10111 11 ..... ..... 111 ..... 0111011 @r
+  pmhaccsu_w_h1 10111 11 ..... ..... 111 ..... 0111011 @r
+}
+{
+  macc_h00      10001 01 ..... ..... 011 ..... 0111011 @r
+  pmacc_w_h00   10001 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  macc_h01      10011 01 ..... ..... 001 ..... 0111011 @r
+  pmacc_w_h01   10011 01 ..... ..... 001 ..... 0111011 @r
+}
+{
+  macc_h11      10011 01 ..... ..... 011 ..... 0111011 @r
+  pmacc_w_h11   10011 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  maccsu_h00    11101 01 ..... ..... 011 ..... 0111011 @r
+  pmaccsu_w_h00 11101 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  maccsu_h11    11111 01 ..... ..... 011 ..... 0111011 @r
+  pmaccsu_w_h11 11111 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  maccu_h00     10101 01 ..... ..... 011 ..... 0111011 @r
+  pmaccu_w_h00  10101 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  maccu_h01     10111 01 ..... ..... 001 ..... 0111011 @r
+  pmaccu_w_h01  10111 01 ..... ..... 001 ..... 0111011 @r
+}
+{
+  maccu_h11     10111 01 ..... ..... 011 ..... 0111011 @r
+  pmaccu_w_h11  10111 01 ..... ..... 011 ..... 0111011 @r
+}
+macc_w00        10001 11 ..... ..... 011 ..... 0111011 @r
+macc_w01        10011 11 ..... ..... 001 ..... 0111011 @r
+macc_w11        10011 11 ..... ..... 011 ..... 0111011 @r
+maccsu_w00      11101 11 ..... ..... 011 ..... 0111011 @r
+maccsu_w11      11111 11 ..... ..... 011 ..... 0111011 @r
+maccu_w00       10101 11 ..... ..... 011 ..... 0111011 @r
+maccu_w01       10111 11 ..... ..... 001 ..... 0111011 @r
+maccu_w11       10111 11 ..... ..... 011 ..... 0111011 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr=
ans/trans_rvp.c.inc
index b01656ffb0..b3476c26ad 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -774,3 +774,59 @@ GEN_SIMD_TRANS_64(mulsu_w11)
 GEN_SIMD_TRANS_64(mulu_w00)
 GEN_SIMD_TRANS_64(mulu_w01)
 GEN_SIMD_TRANS_64(mulu_w11)
+
+/* Packed SIMD - Multiply-Accumulate Operations */
+GEN_SIMD_TRANS_ACC(pmhacc_h)
+GEN_SIMD_TRANS_ACC(pmhaccsu_h)
+GEN_SIMD_TRANS_ACC(pmhaccu_h)
+GEN_SIMD_TRANS_ACC(pmhracc_h)
+GEN_SIMD_TRANS_ACC(pmhraccsu_h)
+GEN_SIMD_TRANS_ACC(pmhraccu_h)
+GEN_SIMD_TRANS_ACC_64(pmhacc_w)
+GEN_SIMD_TRANS_ACC_64(pmhracc_w)
+GEN_SIMD_TRANS_ACC_64(pmhaccsu_w)
+GEN_SIMD_TRANS_ACC_64(pmhraccsu_w)
+GEN_SIMD_TRANS_ACC_64(pmhaccu_w)
+GEN_SIMD_TRANS_ACC_64(pmhraccu_w)
+GEN_SIMD_TRANS_ACC_32(mhacc)
+GEN_SIMD_TRANS_ACC_32(mhracc)
+GEN_SIMD_TRANS_ACC_32(mhaccsu)
+GEN_SIMD_TRANS_ACC_32(mhraccsu)
+GEN_SIMD_TRANS_ACC_32(mhaccu)
+GEN_SIMD_TRANS_ACC_32(mhraccu)
+GEN_SIMD_TRANS_ACC(pmhacc_h_b0)
+GEN_SIMD_TRANS_ACC(pmhacc_h_b1)
+GEN_SIMD_TRANS_ACC(pmhaccsu_h_b0)
+GEN_SIMD_TRANS_ACC(pmhaccsu_h_b1)
+GEN_SIMD_TRANS_ACC_32(mhacc_h0)
+GEN_SIMD_TRANS_ACC_32(mhacc_h1)
+GEN_SIMD_TRANS_ACC_32(mhaccsu_h0)
+GEN_SIMD_TRANS_ACC_32(mhaccsu_h1)
+GEN_SIMD_TRANS_ACC_64(pmhacc_w_h0)
+GEN_SIMD_TRANS_ACC_64(pmhacc_w_h1)
+GEN_SIMD_TRANS_ACC_64(pmhaccsu_w_h0)
+GEN_SIMD_TRANS_ACC_64(pmhaccsu_w_h1)
+GEN_SIMD_TRANS_ACC_64(pmacc_w_h00)
+GEN_SIMD_TRANS_ACC_64(pmacc_w_h01)
+GEN_SIMD_TRANS_ACC_64(pmacc_w_h11)
+GEN_SIMD_TRANS_ACC_64(pmaccsu_w_h00)
+GEN_SIMD_TRANS_ACC_64(pmaccsu_w_h11)
+GEN_SIMD_TRANS_ACC_64(pmaccu_w_h00)
+GEN_SIMD_TRANS_ACC_64(pmaccu_w_h01)
+GEN_SIMD_TRANS_ACC_64(pmaccu_w_h11)
+GEN_SIMD_TRANS_ACC_32(macc_h00)
+GEN_SIMD_TRANS_ACC_32(macc_h01)
+GEN_SIMD_TRANS_ACC_32(macc_h11)
+GEN_SIMD_TRANS_ACC_32(maccsu_h00)
+GEN_SIMD_TRANS_ACC_32(maccsu_h11)
+GEN_SIMD_TRANS_ACC_32(maccu_h00)
+GEN_SIMD_TRANS_ACC_32(maccu_h01)
+GEN_SIMD_TRANS_ACC_32(maccu_h11)
+GEN_SIMD_TRANS_ACC_64(macc_w00)
+GEN_SIMD_TRANS_ACC_64(macc_w01)
+GEN_SIMD_TRANS_ACC_64(macc_w11)
+GEN_SIMD_TRANS_ACC_64(maccsu_w00)
+GEN_SIMD_TRANS_ACC_64(maccsu_w11)
+GEN_SIMD_TRANS_ACC_64(maccu_w00)
+GEN_SIMD_TRANS_ACC_64(maccu_w01)
+GEN_SIMD_TRANS_ACC_64(maccu_w11)
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
index b60fd3094c..7f32a13ba0 100644
--- a/target/riscv/psimd_helper.c
+++ b/target/riscv/psimd_helper.c
@@ -4682,3 +4682,949 @@ uint64_t HELPER(mulu_w11)(CPURISCVState *env, uint6=
4_t rs1, uint64_t rs2)
     uint64_t mul =3D (uint64_t)s1_w1 * (uint64_t)s2_w1;
     return mul;
 }
+
+/* Multiply-Accumulate Operations */
+
+/**
+ * PMHACC.H - Packed signed 16-bit multiply high with accumulate
+ */
+target_ulong HELPER(pmhacc_h)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        int16_t d =3D (int16_t)EXTRACT16(dest, i);
+        int32_t prod =3D (int32_t)e1 * (int32_t)e2;
+        int16_t high =3D (int16_t)(prod >> 16);
+        uint16_t res =3D (uint16_t)(high + d);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACCSU.H - Packed signed x unsigned 16-bit multiply high with accumul=
ate
+ */
+target_ulong HELPER(pmhaccsu_h)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        int16_t d =3D (int16_t)EXTRACT16(dest, i);
+        int32_t prod =3D (int32_t)e1 * (int32_t)e2;
+        int16_t high =3D (int16_t)(prod >> 16);
+        uint16_t res =3D (uint16_t)(high + d);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACCU.H - Packed unsigned 16-bit multiply high with accumulate
+ */
+target_ulong HELPER(pmhaccu_h)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t d =3D (uint16_t)EXTRACT16(dest, i);
+        uint32_t prod =3D (uint32_t)e1 * (uint32_t)e2;
+        uint16_t high =3D (uint16_t)(prod >> 16);
+        uint16_t res =3D high + d;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHRACC.H - Packed signed 16-bit multiply high with rounding and accumu=
late
+ */
+target_ulong HELPER(pmhracc_h)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        int16_t d =3D (int16_t)EXTRACT16(dest, i);
+        int32_t prod =3D (int32_t)e1 * (int32_t)e2 + (1 << 15);
+        int16_t high =3D (int16_t)(prod >> 16);
+        uint16_t res =3D (uint16_t)(high + d);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHRACCSU.H - Packed signed x unsigned 16-bit
+ * multiply high with rounding and accumulate
+ */
+target_ulong HELPER(pmhraccsu_h)(CPURISCVState *env, target_ulong rs1,
+                                 target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        int16_t d =3D (int16_t)EXTRACT16(dest, i);
+        int32_t prod =3D (int32_t)e1 * (int32_t)e2 + (1 << 15);
+        int16_t high =3D (int16_t)(prod >> 16);
+        uint16_t res =3D (uint16_t)(high + d);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHRACCU.H - Packed unsigned 16-bit multiply
+ * high with rounding and accumulate
+ */
+target_ulong HELPER(pmhraccu_h)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t e1 =3D EXTRACT16(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i);
+        uint16_t d =3D (uint16_t)EXTRACT16(dest, i);
+        uint32_t prod =3D (uint32_t)e1 * (uint32_t)e2 + (1u << 15);
+        uint16_t high =3D (uint16_t)(prod >> 16);
+        uint16_t res =3D high + d;
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACC.W - Packed signed 32-bit multiply high with accumulate (RV64 onl=
y)
+ */
+uint64_t HELPER(pmhacc_w)(CPURISCVState *env, uint64_t rs1,
+                          uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        int32_t d =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2;
+        int32_t high =3D (int32_t)(prod >> 32);
+        uint32_t res =3D (uint32_t)(high + d);
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHRACC.W - Packed signed 32-bit multiply high
+ * with rounding and accumulate (RV64 only)
+ */
+uint64_t HELPER(pmhracc_w)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        int32_t d =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2 + (1LL << 31);
+        int32_t high =3D (int32_t)(prod >> 32);
+        uint32_t res =3D (uint32_t)(high + d);
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACCSU.W - Packed signed x unsigned 32-bit
+ * multiply high with accumulate (RV64 only)
+ */
+uint64_t HELPER(pmhaccsu_w)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        int32_t d =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2;
+        int32_t high =3D (int32_t)(prod >> 32);
+        uint32_t res =3D (uint32_t)(high + d);
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHRACCSU.W - Packed signed x unsigned 32-bit
+ * multiply high with rounding and accumulate
+ * (RV64 only)
+ */
+uint64_t HELPER(pmhraccsu_w)(CPURISCVState *env, uint64_t rs1,
+                             uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        int32_t d =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2 + (1LL << 31);
+        int32_t high =3D (int32_t)(prod >> 32);
+        uint32_t res =3D (uint32_t)(high + d);
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACCU.W - Packed unsigned 32-bit multiply high with accumulate (RV64 =
only)
+ */
+uint64_t HELPER(pmhaccu_w)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t d =3D EXTRACT32(dest, i);
+        uint64_t prod =3D (uint64_t)e1 * (uint64_t)e2;
+        uint32_t high =3D (uint32_t)(prod >> 32);
+        uint32_t res =3D high + d;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHRACCU.W - Packed unsigned 32-bit multiply
+ * high with rounding and accumulate (RV64 only)
+ */
+uint64_t HELPER(pmhraccu_w)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint32_t e1 =3D EXTRACT32(rs1, i);
+        uint32_t e2 =3D EXTRACT32(rs2, i);
+        uint32_t d =3D EXTRACT32(dest, i);
+        uint64_t prod =3D (uint64_t)e1 * (uint64_t)e2 + (1ULL << 31);
+        uint32_t high =3D (uint32_t)(prod >> 32);
+        uint32_t res =3D high + d;
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * MHACC - 32-bit signed multiply high with accumulate
+ */
+uint32_t HELPER(mhacc)(CPURISCVState *env, uint32_t rs1,
+                        uint32_t rs2, uint32_t dest)
+{
+    int32_t a =3D (int32_t)rs1;
+    int32_t b =3D (int32_t)rs2;
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)a * (int64_t)b;
+    return (uint32_t)(d + (prod >> 32));
+}
+
+/**
+ * MHRACC - 32-bit signed multiply high with rounding and accumulate
+ */
+uint32_t HELPER(mhracc)(CPURISCVState *env, uint32_t rs1,
+                         uint32_t rs2, uint32_t dest)
+{
+    int32_t a =3D (int32_t)rs1;
+    int32_t b =3D (int32_t)rs2;
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)a * (int64_t)b + (1LL << 31);
+    return (uint32_t)(d + (prod >> 32));
+}
+
+/**
+ * MHACCSU - 32-bit signed x unsigned multiply high with accumulate
+ */
+uint32_t HELPER(mhaccsu)(CPURISCVState *env, uint32_t rs1,
+                          uint32_t rs2, uint32_t dest)
+{
+    int32_t a =3D (int32_t)rs1;
+    uint32_t b =3D rs2;
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)a * (int64_t)b;
+    return (uint32_t)(d + (prod >> 32));
+}
+
+/**
+ * MHRACCSU - 32-bit signed x unsigned multiply high
+ * with rounding and accumulate
+ */
+uint32_t HELPER(mhraccsu)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint32_t dest)
+{
+    int32_t a =3D (int32_t)rs1;
+    uint32_t b =3D rs2;
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)a * (int64_t)b + (1LL << 31);
+    return (uint32_t)(d + (prod >> 32));
+}
+
+/**
+ * MHACCU - 32-bit unsigned multiply high with accumulate
+ */
+uint32_t HELPER(mhaccu)(CPURISCVState *env, uint32_t rs1,
+                         uint32_t rs2, uint32_t dest)
+{
+    uint32_t a =3D rs1;
+    uint32_t b =3D rs2;
+    uint32_t d =3D dest;
+    uint64_t prod =3D (uint64_t)a * (uint64_t)b;
+    return (uint32_t)(d + (prod >> 32));
+}
+
+/**
+ * MHRACCU - 32-bit unsigned multiply high with rounding and accumulate
+ */
+uint32_t HELPER(mhraccu)(CPURISCVState *env, uint32_t rs1,
+                          uint32_t rs2, uint32_t dest)
+{
+    uint32_t a =3D rs1;
+    uint32_t b =3D rs2;
+    uint32_t d =3D dest;
+    uint64_t prod =3D (uint64_t)a * (uint64_t)b + (1ULL << 31);
+    return (uint32_t)(d + (prod >> 32));
+}
+
+/**
+ * PMHACC.H.B0 - Packed halfword x low byte multiply high with accumulate
+ */
+target_ulong HELPER(pmhacc_h_b0)(CPURISCVState *env, target_ulong rs1,
+                                 target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int8_t e2 =3D (int8_t)EXTRACT8(rs2, i * 2);
+        int16_t d =3D (int16_t)EXTRACT16(dest, i);
+        int32_t prod =3D (int32_t)e1 * (int32_t)e2;
+        int16_t high =3D (int16_t)(prod >> 8);
+        uint16_t res =3D (uint16_t)(high + d);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACC.H.B1 - Packed halfword x high byte multiply high with accumulate
+ */
+target_ulong HELPER(pmhacc_h_b1)(CPURISCVState *env, target_ulong rs1,
+                                 target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int8_t e2 =3D (int8_t)EXTRACT8(rs2, i * 2 + 1);
+        int16_t d =3D (int16_t)EXTRACT16(dest, i);
+        int32_t prod =3D (int32_t)e1 * (int32_t)e2;
+        int16_t high =3D (int16_t)(prod >> 8);
+        uint16_t res =3D (uint16_t)(high + d);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACCSU.H.B0 - Multiply signed halfword by unsigned low byte and accum=
ulate
+ */
+target_ulong HELPER(pmhaccsu_h_b0)(CPURISCVState *env, target_ulong rs1,
+                                   target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i * 2);
+        int16_t d =3D (int16_t)EXTRACT16(dest, i);
+        int32_t prod =3D (int32_t)e1 * (int32_t)e2;
+        int16_t high =3D (int16_t)(prod >> 8);
+        uint16_t res =3D (uint16_t)(high + d);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACCSU.H.B1 - Multiply signed halfword by unsigned high byte and accu=
mulate
+ */
+target_ulong HELPER(pmhaccsu_h_b1)(CPURISCVState *env, target_ulong rs1,
+                                   target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        uint8_t e2 =3D EXTRACT8(rs2, i * 2 + 1);
+        int16_t d =3D (int16_t)EXTRACT16(dest, i);
+        int32_t prod =3D (int32_t)e1 * (int32_t)e2;
+        int16_t high =3D (int16_t)(prod >> 8);
+        uint16_t res =3D (uint16_t)(high + d);
+        rd =3D INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * MHACC.H0 - 32-bit signed multiply high by low halfword with accumulate
+ */
+uint32_t HELPER(mhacc_h0)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint32_t dest)
+{
+    int32_t a =3D (int32_t)rs1;
+    int16_t b =3D (int16_t)(rs2 & 0xFFFF);
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)a * (int64_t)b;
+    return (uint32_t)(d + (prod >> 16));
+}
+
+/**
+ * MHACC.H1 - 32-bit signed multiply high by high halfword with accumulate
+ */
+uint32_t HELPER(mhacc_h1)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint32_t dest)
+{
+    int32_t a =3D (int32_t)rs1;
+    int16_t b =3D (int16_t)((rs2 >> 16) & 0xFFFF);
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)a * (int64_t)b;
+    return (uint32_t)(d + (prod >> 16));
+}
+
+/**
+ * MHACCSU.H0 - Signed x unsigned low halfword multiply high with accum=
ulate
+ */
+uint32_t HELPER(mhaccsu_h0)(CPURISCVState *env, uint32_t rs1,
+                             uint32_t rs2, uint32_t dest)
+{
+    int32_t a =3D (int32_t)rs1;
+    uint16_t b =3D (uint16_t)(rs2 & 0xFFFF);
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)a * (int64_t)b;
+    return (uint32_t)(d + (prod >> 16));
+}
+
+/**
+ * MHACCSU.H1 - Signed x unsigned high halfword multiply high with accu=
mulate
+ */
+uint32_t HELPER(mhaccsu_h1)(CPURISCVState *env, uint32_t rs1,
+                             uint32_t rs2, uint32_t dest)
+{
+    int32_t a =3D (int32_t)rs1;
+    uint16_t b =3D (uint16_t)((rs2 >> 16) & 0xFFFF);
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)a * (int64_t)b;
+    return (uint32_t)(d + (prod >> 16));
+}
+
+/**
+ * PMHACC.W.H0 - Word x low halfword multiply high with accumulate (RV64 =
only)
+ */
+uint64_t HELPER(pmhacc_w_h0)(CPURISCVState *env, uint64_t rs1,
+                              uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i * 2);
+        int32_t d =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2;
+        int32_t high =3D (int32_t)(prod >> 16);
+        uint32_t res =3D (uint32_t)(high + d);
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACC.W.H1 - Word x high halfword multiply high with accumulate (RV64 =
only)
+ */
+uint64_t HELPER(pmhacc_w_h1)(CPURISCVState *env, uint64_t rs1,
+                              uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2;
+        int32_t high =3D (int32_t)(prod >> 16);
+        uint32_t res =3D (uint32_t)(high + d);
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACCSU.W.H0 - Signed word x unsigned low halfword
+ * multiply high with accumulate (RV64 only)
+ */
+uint64_t HELPER(pmhaccsu_w_h0)(CPURISCVState *env, uint64_t rs1,
+                                uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i * 2);
+        int32_t d =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2;
+        int32_t high =3D (int32_t)(prod >> 16);
+        uint32_t res =3D (uint32_t)(high + d);
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACCSU.W.H1 - Signed word x unsigned high halfword
+ * multiply high with accumulate (RV64 only)
+ */
+uint64_t HELPER(pmhaccsu_w_h1)(CPURISCVState *env, uint64_t rs1,
+                                uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        uint16_t e2 =3D EXTRACT16(rs2, i * 2 + 1);
+        int32_t d =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)e1 * (int64_t)e2;
+        int32_t high =3D (int32_t)(prod >> 16);
+        uint32_t res =3D (uint32_t)(high + d);
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMACC.W.H00 - Packed multiply-accumulate, low halfwords
+ * For each word: rd[i] =3D dest[i] + (rs1[i][15:0] * rs2[i][15:0])
+ */
+uint64_t HELPER(pmacc_w_h00)(CPURISCVState *env, uint64_t rs1,
+                              uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, i * 2);
+        int32_t d_h =3D (int32_t)EXTRACT32(dest, i);
+        int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h0;
+        rd =3D INSERT32(rd, (uint32_t)(d_h + mul), i);
+    }
+    return rd;
+}
+
+/**
+ * PMACC.W.H01 - Packed multiply-accumulate, rs1 low x rs2 high
+ * For each word: rd[i] =3D dest[i] + (rs1[i][15:0] * rs2[i][31:16])
+ */
+uint64_t HELPER(pmacc_w_h01)(CPURISCVState *env, uint64_t rs1,
+                              uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d_h =3D (int32_t)EXTRACT32(dest, i);
+        int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h1;
+        rd =3D INSERT32(rd, (uint32_t)(d_h + mul), i);
+    }
+    return rd;
+}
+
+/**
+ * PMACC.W.H11 - Packed multiply-accumulate, high halfwords
+ * For each word: rd[i] =3D dest[i] + (rs1[i][31:16] * rs2[i][31:16])
+ */
+uint64_t HELPER(pmacc_w_h11)(CPURISCVState *env, uint64_t rs1,
+                              uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d_h =3D (int32_t)EXTRACT32(dest, i);
+        int32_t mul =3D (int32_t)s1_h1 * (int32_t)s2_h1;
+        rd =3D INSERT32(rd, (uint32_t)(d_h + mul), i);
+    }
+    return rd;
+}
+
+/**
+ * PMACCSU.W.H00 - Packed signed x unsigned multiply-accumulate, low halfw=
ords
+ * For each word: rd[i] =3D dest[i] +
+ * (signed)rs1[i][15:0] * (unsigned)rs2[i][15:0]
+ */
+uint64_t HELPER(pmaccsu_w_h00)(CPURISCVState *env, uint64_t rs1,
+                                uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2);
+        uint16_t s2_h0 =3D EXTRACT16(rs2, i * 2);
+        int32_t d_h =3D (int32_t)EXTRACT32(dest, i);
+        int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h0;
+        rd =3D INSERT32(rd, (uint32_t)(d_h + mul), i);
+    }
+    return rd;
+}
+
+/**
+ * PMACCSU.W.H11 - Packed signed x unsigned multiply-accumulate, high half=
words
+ * For each word: rd[i] =3D dest[i] +
+ * (signed)rs1[i][31:16] * (unsigned)rs2[i][31:16]
+ */
+uint64_t HELPER(pmaccsu_w_h11)(CPURISCVState *env, uint64_t rs1,
+                                uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h1 =3D EXTRACT16(rs2, i * 2 + 1);
+        int32_t d_h =3D (int32_t)EXTRACT32(dest, i);
+        int32_t mul =3D (int32_t)s1_h1 * (int32_t)s2_h1;
+        rd =3D INSERT32(rd, (uint32_t)(d_h + mul), i);
+    }
+    return rd;
+}
+
+/**
+ * PMACCU.W.H00 - Packed unsigned multiply-accumulate, low halfwords
+ * For each word: rd[i] =3D dest[i] + rs1[i][15:0] * rs2[i][15:0] (unsigne=
d)
+ */
+uint64_t HELPER(pmaccu_w_h00)(CPURISCVState *env, uint64_t rs1,
+                               uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h0 =3D EXTRACT16(rs1, i * 2);
+        uint16_t s2_h0 =3D EXTRACT16(rs2, i * 2);
+        uint32_t d_h =3D EXTRACT32(dest, i);
+        uint32_t mul =3D (uint32_t)s1_h0 * (uint32_t)s2_h0;
+        rd =3D INSERT32(rd, d_h + mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMACCU.W.H01 - Packed unsigned multiply-accumulate, rs1 low x rs2 high
+ * For each word: rd[i] =3D dest[i] + rs1[i][15:0] * rs2[i][31:16] (unsign=
ed)
+ */
+uint64_t HELPER(pmaccu_w_h01)(CPURISCVState *env, uint64_t rs1,
+                               uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h0 =3D EXTRACT16(rs1, i * 2);
+        uint16_t s2_h1 =3D EXTRACT16(rs2, i * 2 + 1);
+        uint32_t d_h =3D EXTRACT32(dest, i);
+        uint32_t mul =3D (uint32_t)s1_h0 * (uint32_t)s2_h1;
+        rd =3D INSERT32(rd, d_h + mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMACCU.W.H11 - Packed unsigned multiply-accumulate, high halfwords
+ * For each word: rd[i] =3D dest[i] + rs1[i][31:16] * rs2[i][31:16] (unsig=
ned)
+ */
+uint64_t HELPER(pmaccu_w_h11)(CPURISCVState *env, uint64_t rs1,
+                               uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+
+    for (int i =3D 0; i < elems; i++) {
+        uint16_t s1_h1 =3D EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h1 =3D EXTRACT16(rs2, i * 2 + 1);
+        uint32_t d_h =3D EXTRACT32(dest, i);
+        uint32_t mul =3D (uint32_t)s1_h1 * (uint32_t)s2_h1;
+        rd =3D INSERT32(rd, d_h + mul, i);
+    }
+    return rd;
+}
+
+/**
+ * MACC.H00 - 32-bit signed multiply-accumulate, low halfwords
+ * dest =3D dest + (rs1[15:0] * rs2[15:0])
+ */
+uint32_t HELPER(macc_h00)(CPURISCVState *env, uint32_t rs1,
+                          uint32_t rs2, uint32_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0);
+    int32_t d_h =3D (int32_t)dest;
+    int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h0;
+    return (uint32_t)(d_h + mul);
+}
+
+/**
+ * MACC.H01 - 32-bit signed multiply-accumulate, rs1 low x rs2 high
+ * dest =3D dest + (rs1[15:0] * rs2[31:16])
+ */
+uint32_t HELPER(macc_h01)(CPURISCVState *env, uint32_t rs1,
+                          uint32_t rs2, uint32_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int32_t d_h =3D (int32_t)dest;
+    int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h1;
+    return (uint32_t)(d_h + mul);
+}
+
+/**
+ * MACC.H11 - 32-bit signed multiply-accumulate, high halfwords
+ * dest =3D dest + (rs1[31:16] * rs2[31:16])
+ */
+uint32_t HELPER(macc_h11)(CPURISCVState *env, uint32_t rs1,
+                          uint32_t rs2, uint32_t dest)
+{
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int32_t d_h =3D (int32_t)dest;
+    int32_t mul =3D (int32_t)s1_h1 * (int32_t)s2_h1;
+    return (uint32_t)(d_h + mul);
+}
+
+/**
+ * MACCSU.H00 - 32-bit signed x unsigned multiply-accumulate, low halfwords
+ * dest =3D dest + (rs1[15:0] * rs2[15:0]) with rs2 unsigned
+ */
+uint32_t HELPER(maccsu_h00)(CPURISCVState *env, uint32_t rs1,
+                            uint32_t rs2, uint32_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    uint16_t s2_h0 =3D EXTRACT16(rs2, 0);
+    int32_t d_h =3D (int32_t)dest;
+    int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h0;
+    return (uint32_t)(d_h + mul);
+}
+
+/**
+ * MACCSU.H11 - 32-bit signed x unsigned multiply-accumulate, high halfwor=
ds
+ * dest =3D dest + (rs1[31:16] * rs2[31:16]) with rs2 unsigned
+ */
+uint32_t HELPER(maccsu_h11)(CPURISCVState *env, uint32_t rs1,
+                            uint32_t rs2, uint32_t dest)
+{
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    uint16_t s2_h1 =3D EXTRACT16(rs2, 1);
+    int32_t d_h =3D (int32_t)dest;
+    int32_t mul =3D (int32_t)s1_h1 * (int32_t)s2_h1;
+    return (uint32_t)(d_h + mul);
+}
+
+/**
+ * MACCU.H00 - 32-bit unsigned multiply-accumulate, low halfwords
+ * dest =3D dest + (rs1[15:0] * rs2[15:0]) (unsigned)
+ */
+uint32_t HELPER(maccu_h00)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint32_t dest)
+{
+    uint16_t s1_h0 =3D EXTRACT16(rs1, 0);
+    uint16_t s2_h0 =3D EXTRACT16(rs2, 0);
+    uint32_t d_h =3D dest;
+    uint32_t mul =3D (uint32_t)s1_h0 * (uint32_t)s2_h0;
+    return d_h + mul;
+}
+
+/**
+ * MACCU.H01 - 32-bit unsigned multiply-accumulate, rs1 low x rs2 high
+ * dest =3D dest + (rs1[15:0] * rs2[31:16]) (unsigned)
+ */
+uint32_t HELPER(maccu_h01)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint32_t dest)
+{
+    uint16_t s1_h0 =3D EXTRACT16(rs1, 0);
+    uint16_t s2_h1 =3D EXTRACT16(rs2, 1);
+    uint32_t d_h =3D dest;
+    uint32_t mul =3D (uint32_t)s1_h0 * (uint32_t)s2_h1;
+    return d_h + mul;
+}
+
+/**
+ * MACCU.H11 - 32-bit unsigned multiply-accumulate, high halfwords
+ * dest =3D dest + (rs1[31:16] * rs2[31:16]) (unsigned)
+ */
+uint32_t HELPER(maccu_h11)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint32_t dest)
+{
+    uint16_t s1_h1 =3D EXTRACT16(rs1, 1);
+    uint16_t s2_h1 =3D EXTRACT16(rs2, 1);
+    uint32_t d_h =3D dest;
+    uint32_t mul =3D (uint32_t)s1_h1 * (uint32_t)s2_h1;
+    return d_h + mul;
+}
+
+/**
+ * MACC.W00 - 64-bit signed multiply-accumulate, low word x low word
+ * dest =3D dest + (rs1[31:0] * rs2[31:0])
+ */
+uint64_t HELPER(macc_w00)(CPURISCVState *env, uint64_t rs1,
+                          uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0);
+    int32_t s2_w0 =3D (int32_t)EXTRACT32(rs2, 0);
+    int64_t d_w =3D (int64_t)dest;
+    int64_t mul =3D (int64_t)s1_w0 * (int64_t)s2_w0;
+    return (uint64_t)(d_w + mul);
+}
+
+/**
+ * MACC.W01 - 64-bit signed multiply-accumulate, low word x high word
+ * dest =3D dest + (rs1[31:0] * rs2[63:32])
+ */
+uint64_t HELPER(macc_w01)(CPURISCVState *env, uint64_t rs1,
+                          uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0);
+    int32_t s2_w1 =3D (int32_t)EXTRACT32(rs2, 1);
+    int64_t d_w =3D (int64_t)dest;
+    int64_t mul =3D (int64_t)s1_w0 * (int64_t)s2_w1;
+    return (uint64_t)(d_w + mul);
+}
+
+/**
+ * MACC.W11 - 64-bit signed multiply-accumulate, high word x high word
+ * dest =3D dest + (rs1[63:32] * rs2[63:32])
+ */
+uint64_t HELPER(macc_w11)(CPURISCVState *env, uint64_t rs1,
+                          uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w1 =3D (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w1 =3D (int32_t)EXTRACT32(rs2, 1);
+    int64_t d_w =3D (int64_t)dest;
+    int64_t mul =3D (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)(d_w + mul);
+}
+
+/**
+ * MACCSU.W00 - 64-bit signed x unsigned
+ * multiply-accumulate, low word x low word
+ * dest =3D dest + (rs1[31:0] * rs2[31:0]) with rs2 interpreted as unsigned
+ */
+uint64_t HELPER(maccsu_w00)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0);
+    uint32_t s2_w0 =3D EXTRACT32(rs2, 0);
+    int64_t d_w =3D (int64_t)dest;
+    int64_t mul =3D (int64_t)s1_w0 * (int64_t)s2_w0;
+    return (uint64_t)(d_w + mul);
+}
+
+/**
+ * MACCSU.W11 - 64-bit signed x unsigned
+ * multiply-accumulate, high word x high word
+ * dest =3D dest + (rs1[63:32] * rs2[63:32]) with rs2 interpreted as unsigned
+ */
+uint64_t HELPER(maccsu_w11)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w1 =3D (int32_t)EXTRACT32(rs1, 1);
+    uint32_t s2_w1 =3D EXTRACT32(rs2, 1);
+    int64_t d_w =3D (int64_t)dest;
+    int64_t mul =3D (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)(d_w + mul);
+}
+
+/**
+ * MACCU.W00 - 64-bit unsigned multiply-accumulate, low word x low word
+ * dest =3D dest + (rs1[31:0] * rs2[31:0]) (unsigned)
+ */
+uint64_t HELPER(maccu_w00)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    uint32_t s1_w0 =3D EXTRACT32(rs1, 0);
+    uint32_t s2_w0 =3D EXTRACT32(rs2, 0);
+    uint64_t d_w =3D dest;
+    uint64_t mul =3D (uint64_t)s1_w0 * (uint64_t)s2_w0;
+    return d_w + mul;
+}
+
+/**
+ * MACCU.W01 - 64-bit unsigned multiply-accumulate, low word x high word
+ * dest =3D dest + (rs1[31:0] * rs2[63:32]) (unsigned)
+ */
+uint64_t HELPER(maccu_w01)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    uint32_t s1_w0 =3D EXTRACT32(rs1, 0);
+    uint32_t s2_w1 =3D EXTRACT32(rs2, 1);
+    uint64_t d_w =3D dest;
+    uint64_t mul =3D (uint64_t)s1_w0 * (uint64_t)s2_w1;
+    return d_w + mul;
+}
+
+/**
+ * MACCU.W11 - 64-bit unsigned multiply-accumulate, high word x high word
+ * dest =3D dest + (rs1[63:32] * rs2[63:32]) (unsigned)
+ */
+uint64_t HELPER(maccu_w11)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    uint32_t s1_w1 =3D EXTRACT32(rs1, 1);
+    uint32_t s2_w1 =3D EXTRACT32(rs2, 1);
+    uint64_t d_w =3D dest;
+    uint64_t mul =3D (uint64_t)s1_w1 * (uint64_t)s2_w1;
+    return d_w + mul;
+}
--=20
2.34.1
From nobody Sat May 30 20:13:16 2026
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 10/14] target/riscv: rvp: add Q-format multiplication
 operations
Date: Fri, 17 Apr 2026 18:46:47 +0800
Message-Id: <20260417104652.17857-11-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

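Add the packed and scalar Q-format multiply instructions (PMULQ.H,
PMULQR.H, PMULQ.W, PMULQR.W, MULQ, MULQR) and the Q-format
multiply-accumulate instructions (MQACC/MQRACC .H00/.H01/.H11 and
.W00/.W01/.W11, PMQACC.W/PMQRACC.W .H00/.H01/.H11), together with
their helpers, decode patterns and translation functions.

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>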
---
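Note (not part of the commit message): the per-lane Q15 arithmetic used
by the PMULQ.H and PMULQR.H helpers in this patch can be modelled in
isolation roughly as follows. This is only a sketch under stated
assumptions: q15_mul_lane and its round/sat parameters are hypothetical
names for illustration, not QEMU APIs, and the code assumes that right
shift of a negative value is arithmetic, as GCC and Clang provide.

```c
#include <stdint.h>

/*
 * Hypothetical standalone model of one Q15 lane (not a QEMU function).
 * a and b are Q15 fractions; the result is a Q15 fraction.
 * If round is non-zero, half an LSB is added before the shift.
 * *sat is set to 1 when the result had to be clamped.
 */
static int16_t q15_mul_lane(int16_t a, int16_t b, int round, int *sat)
{
    /* -1.0 * -1.0 would be +1.0, which Q15 cannot represent: clamp. */
    if (a == INT16_MIN && b == INT16_MIN) {
        *sat = 1;
        return INT16_MAX;
    }

    int32_t prod = (int32_t)a * (int32_t)b;   /* Q30 product */
    if (round) {
        prod += 1 << 14;                      /* half of the dropped LSB */
    }
    /* Assumes arithmetic right shift for negative products. */
    return (int16_t)(prod >> 15);             /* back to Q15 */
}
```

With these assumptions, 0.5 * 0.5 (0x4000 * 0x4000) yields 0x2000, and
only the INT16_MIN * INT16_MIN pair sets the saturation flag, which
matches the single special case tested in the helpers.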
 target/riscv/helper.h                   |  28 ++
 target/riscv/insn32.decode              |  43 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  28 ++
 target/riscv/psimd_helper.c             | 446 ++++++++++++++++++++++++
 4 files changed, 545 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 54f8591672..a5ecf9b7d7 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1661,3 +1661,31 @@ DEF_HELPER_4(maccsu_w11, i64, env, i64, i64, i64)
 DEF_HELPER_4(maccu_w00, i64, env, i64, i64, i64)
 DEF_HELPER_4(maccu_w01, i64, env, i64, i64, i64)
 DEF_HELPER_4(maccu_w11, i64, env, i64, i64, i64)
+
+/* Packed SIMD - Q-Format Multiplication Operations */
+DEF_HELPER_3(pmulq_h, tl, env, tl, tl)
+DEF_HELPER_3(pmulqr_h, tl, env, tl, tl)
+DEF_HELPER_3(pmulq_w, i64, env, i64, i64)
+DEF_HELPER_3(pmulqr_w, i64, env, i64, i64)
+DEF_HELPER_3(mulq, i32, env, i32, i32)
+DEF_HELPER_3(mulqr, i32, env, i32, i32)
+
+/* Packed SIMD - Q-Format Multiply-Accumulate Operations */
+DEF_HELPER_4(mqacc_h00, i32, env, i32, i32, i32)
+DEF_HELPER_4(mqacc_h01, i32, env, i32, i32, i32)
+DEF_HELPER_4(mqacc_h11, i32, env, i32, i32, i32)
+DEF_HELPER_4(mqracc_h00, i32, env, i32, i32, i32)
+DEF_HELPER_4(mqracc_h01, i32, env, i32, i32, i32)
+DEF_HELPER_4(mqracc_h11, i32, env, i32, i32, i32)
+DEF_HELPER_4(mqacc_w00, i64, env, i64, i64, i64)
+DEF_HELPER_4(mqacc_w01, i64, env, i64, i64, i64)
+DEF_HELPER_4(mqacc_w11, i64, env, i64, i64, i64)
+DEF_HELPER_4(mqracc_w00, i64, env, i64, i64, i64)
+DEF_HELPER_4(mqracc_w01, i64, env, i64, i64, i64)
+DEF_HELPER_4(mqracc_w11, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmqacc_w_h00, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmqacc_w_h01, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmqacc_w_h11, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmqracc_w_h00, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmqracc_w_h01, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmqracc_w_h11, i64, env, i64, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 9944d0b52c..b2a89e3a1f 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1505,3 +1505,46 @@ maccsu_w11      11111 11 ..... ..... 011 ..... 0111011 @r
 maccu_w00       10101 11 ..... ..... 011 ..... 0111011 @r
 maccu_w01       10111 11 ..... ..... 001 ..... 0111011 @r
 maccu_w11       10111 11 ..... ..... 011 ..... 0111011 @r
+
+# Packed SIMD - Q-Format Multiplication Operations
+pmulq_h     11010 00 ..... ..... 111 ..... 0111011 @r
+pmulqr_h    11010 10 ..... ..... 111 ..... 0111011 @r
+{
+  mulq      11010 01 ..... ..... 111 ..... 0111011 @r
+  pmulq_w   11010 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mulqr     11010 11 ..... ..... 111 ..... 0111011 @r
+  pmulqr_w  11010 11 ..... ..... 111 ..... 0111011 @r
+}
+# Packed SIMD - Q-Format Multiply-Accumulate Operations
+{
+  mqacc_h00 11101 00 ..... ..... 111 ..... 0111011 @r
+  pmqacc_w_h00  11101 00 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mqacc_h01 11111 00 ..... ..... 101 ..... 0111011 @r
+  pmqacc_w_h01 11111 00 ..... ..... 101 ..... 0111011 @r
+}
+{
+  mqacc_h11 11111 00 ..... ..... 111 ..... 0111011 @r
+  pmqacc_w_h11 11111 00 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mqracc_h00 11101 10 ..... ..... 111 ..... 0111011 @r
+  pmqracc_w_h00 11101 10 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mqracc_h01 11111 10 ..... ..... 101 ..... 0111011 @r
+  pmqracc_w_h01 11111 10 ..... ..... 101 ..... 0111011 @r
+}
+{
+  mqracc_h11 11111 10 ..... ..... 111 ..... 0111011 @r
+  pmqracc_w_h11 11111 10 ..... ..... 111 ..... 0111011 @r
+}
+mqacc_w00       11101 01 ..... ..... 111 ..... 0111011 @r
+mqacc_w01       11111 01 ..... ..... 101 ..... 0111011 @r
+mqacc_w11       11111 01 ..... ..... 111 ..... 0111011 @r
+mqracc_w00      11101 11 ..... ..... 111 ..... 0111011 @r
+mqracc_w01      11111 11 ..... ..... 101 ..... 0111011 @r
+mqracc_w11      11111 11 ..... ..... 111 ..... 0111011 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index b3476c26ad..3310e23dce 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -830,3 +830,31 @@ GEN_SIMD_TRANS_ACC_64(maccsu_w11)
 GEN_SIMD_TRANS_ACC_64(maccu_w00)
 GEN_SIMD_TRANS_ACC_64(maccu_w01)
 GEN_SIMD_TRANS_ACC_64(maccu_w11)
+
+/* Packed SIMD - Q-Format Multiplication Operations */
+GEN_SIMD_TRANS(pmulq_h)
+GEN_SIMD_TRANS(pmulqr_h)
+GEN_SIMD_TRANS_64(pmulq_w)
+GEN_SIMD_TRANS_64(pmulqr_w)
+GEN_SIMD_TRANS_32(mulq)
+GEN_SIMD_TRANS_32(mulqr)
+
+/* Packed SIMD - Q-Format Multiply-Accumulate Operations */
+GEN_SIMD_TRANS_ACC_32(mqacc_h00)
+GEN_SIMD_TRANS_ACC_32(mqacc_h01)
+GEN_SIMD_TRANS_ACC_32(mqacc_h11)
+GEN_SIMD_TRANS_ACC_32(mqracc_h00)
+GEN_SIMD_TRANS_ACC_32(mqracc_h01)
+GEN_SIMD_TRANS_ACC_32(mqracc_h11)
+GEN_SIMD_TRANS_ACC_64(mqacc_w00)
+GEN_SIMD_TRANS_ACC_64(mqacc_w01)
+GEN_SIMD_TRANS_ACC_64(mqacc_w11)
+GEN_SIMD_TRANS_ACC_64(mqracc_w00)
+GEN_SIMD_TRANS_ACC_64(mqracc_w01)
+GEN_SIMD_TRANS_ACC_64(mqracc_w11)
+GEN_SIMD_TRANS_ACC_64(pmqacc_w_h00)
+GEN_SIMD_TRANS_ACC_64(pmqacc_w_h01)
+GEN_SIMD_TRANS_ACC_64(pmqacc_w_h11)
+GEN_SIMD_TRANS_ACC_64(pmqracc_w_h00)
+GEN_SIMD_TRANS_ACC_64(pmqracc_w_h01)
+GEN_SIMD_TRANS_ACC_64(pmqracc_w_h11)
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
index bddd24c997..d69a2f6453 100644
--- a/target/riscv/psimd_helper.c
+++ b/target/riscv/psimd_helper.c
@@ -5628,3 +5628,449 @@ uint64_t HELPER(maccu_w11)(CPURISCVState *env, uint64_t rs1,
     uint64_t mul =3D (uint64_t)s1_w1 * (uint64_t)s2_w1;
     return d_w + mul;
 }
+
+/* Q-Format Multiplication Operations */
+
+/**
+ * PMULQ.H - Packed signed Q-format multiply (fractional)
+ */
+target_ulong HELPER(pmulq_h)(CPURISCVState *env, target_ulong rs1,
+                             target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        uint16_t result;
+
+        if ((e1 =3D=3D -32768) && (e2 =3D=3D -32768)) {
+            sat =3D 1;
+            result =3D 0x7FFF;
+        } else {
+            int32_t prod =3D (int32_t)e1 * (int32_t)e2;
+            result =3D (prod >> 15) & 0xFFFF;
+        }
+        rd =3D INSERT16(rd, result, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PMULQR.H - Packed signed Q-format multiply with rounding
+ */
+target_ulong HELPER(pmulqr_h)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_H(rd);
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int16_t e1 =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 =3D (int16_t)EXTRACT16(rs2, i);
+        uint16_t result;
+
+        if ((e1 =3D=3D -32768) && (e2 =3D=3D -32768)) {
+            sat =3D 1;
+            result =3D 0x7FFF;
+        } else {
+            int32_t prod =3D (int32_t)e1 * (int32_t)e2 + (1 << 14);
+            result =3D (prod >> 15) & 0xFFFF;
+        }
+        rd =3D INSERT16(rd, result, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PMULQ.W - Packed signed 32-bit Q-format multiply (RV64 only)
+ */
+uint64_t HELPER(pmulq_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        uint32_t result;
+
+        if ((e1 =3D=3D -2147483647 - 1) && (e2 =3D=3D -2147483647 - 1)) {
+            sat =3D 1;
+            result =3D 0x7FFFFFFF;
+        } else {
+            int64_t prod =3D (int64_t)e1 * (int64_t)e2;
+            result =3D (uint32_t)(prod >> 31);
+        }
+        rd =3D INSERT32(rd, result, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PMULQR.W - Packed signed 32-bit Q-format multiply with rounding (RV64 only)
+ */
+uint64_t HELPER(pmulqr_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int elems =3D 2;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < elems; i++) {
+        int32_t e1 =3D (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 =3D (int32_t)EXTRACT32(rs2, i);
+        uint32_t result;
+
+        if ((e1 =3D=3D -2147483647 - 1) && (e2 =3D=3D -2147483647 - 1)) {
+            sat =3D 1;
+            result =3D 0x7FFFFFFF;
+        } else {
+            int64_t prod =3D (int64_t)e1 * (int64_t)e2 + (1LL << 30);
+            result =3D (uint32_t)(prod >> 31);
+        }
+        rd =3D INSERT32(rd, result, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * MULQ - 32-bit signed Q-format multiply
+ */
+uint32_t HELPER(mulq)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    int32_t b =3D (int32_t)rs2;
+
+    if ((a =3D=3D -2147483647 - 1) && (b =3D=3D -2147483647 - 1)) {
+        env->vxsat =3D 1;
+        return 0x7FFFFFFF;
+    } else {
+        int64_t prod =3D (int64_t)a * (int64_t)b;
+        return (uint32_t)(prod >> 31);
+    }
+}
+
+/**
+ * MULQR - 32-bit signed Q-format multiply with rounding
+ */
+uint32_t HELPER(mulqr)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int32_t a =3D (int32_t)rs1;
+    int32_t b =3D (int32_t)rs2;
+
+    if ((a =3D=3D -2147483647 - 1) && (b =3D=3D -2147483647 - 1)) {
+        env->vxsat =3D 1;
+        return 0x7FFFFFFF;
+    } else {
+        int64_t prod =3D (int64_t)a * (int64_t)b + (1LL << 30);
+        return (uint32_t)(prod >> 31);
+    }
+}
+
+
+/* Q-Format Multiply-Accumulate Operations */
+
+/**
+ * MQACC.H00 - Q-format multiply accumulate, both operands low halfword
+ */
+uint32_t HELPER(mqacc_h00)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint32_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)(rs1 & 0xFFFF);
+    int16_t s2_h0 =3D (int16_t)(rs2 & 0xFFFF);
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h0;
+    return (uint32_t)(d + (int32_t)(prod >> 15));
+}
+
+/**
+ * MQACC.H01 - Q-format multiply accumulate, rs1 low, rs2 high
+ */
+uint32_t HELPER(mqacc_h01)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint32_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)(rs1 & 0xFFFF);
+    int16_t s2_h1 =3D (int16_t)((rs2 >> 16) & 0xFFFF);
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h1;
+    return (uint32_t)(d + (int32_t)(prod >> 15));
+}
+
+/**
+ * MQACC.H11 - Q-format multiply accumulate, both operands high halfword
+ */
+uint32_t HELPER(mqacc_h11)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint32_t dest)
+{
+    int16_t s1_h1 =3D (int16_t)((rs1 >> 16) & 0xFFFF);
+    int16_t s2_h1 =3D (int16_t)((rs2 >> 16) & 0xFFFF);
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)s1_h1 * (int64_t)s2_h1;
+    return (uint32_t)(d + (int32_t)(prod >> 15));
+}
+
+/**
+ * MQRACC.H00 - Q-format multiply accumulate with rounding, both low halfword
+ */
+uint32_t HELPER(mqracc_h00)(CPURISCVState *env, uint32_t rs1,
+                            uint32_t rs2, uint32_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)(rs1 & 0xFFFF);
+    int16_t s2_h0 =3D (int16_t)(rs2 & 0xFFFF);
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h0 + (1LL << 14);
+    return (uint32_t)(d + (int32_t)(prod >> 15));
+}
+
+/**
+ * MQRACC.H01 - Q-format multiply accumulate with rounding, rs1 low, rs2 high
+ */
+uint32_t HELPER(mqracc_h01)(CPURISCVState *env, uint32_t rs1,
+                            uint32_t rs2, uint32_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)(rs1 & 0xFFFF);
+    int16_t s2_h1 =3D (int16_t)((rs2 >> 16) & 0xFFFF);
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h1 + (1LL << 14);
+    return (uint32_t)(d + (int32_t)(prod >> 15));
+}
+
+/**
+ * MQRACC.H11 - Q-format multiply accumulate with rounding, both high halfword
+ */
+uint32_t HELPER(mqracc_h11)(CPURISCVState *env, uint32_t rs1,
+                            uint32_t rs2, uint32_t dest)
+{
+    int16_t s1_h1 =3D (int16_t)((rs1 >> 16) & 0xFFFF);
+    int16_t s2_h1 =3D (int16_t)((rs2 >> 16) & 0xFFFF);
+    int32_t d =3D (int32_t)dest;
+    int64_t prod =3D (int64_t)s1_h1 * (int64_t)s2_h1 + (1LL << 14);
+    return (uint32_t)(d + (int32_t)(prod >> 15));
+}
+
+/**
+ * MQACC.W00 - Q-format multiply accumulate, both low word (RV64)
+ */
+uint64_t HELPER(mqacc_w00)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 =3D (int32_t)(rs1 & 0xFFFFFFFF);
+    int32_t s2_w0 =3D (int32_t)(rs2 & 0xFFFFFFFF);
+    int64_t d =3D (int64_t)dest;
+    int64_t prod =3D (int64_t)s1_w0 * (int64_t)s2_w0;
+    int64_t prod_q31 =3D prod >> 31;
+    return (uint64_t)(d + prod_q31);
+}
+
+/**
+ * MQACC.W01 - Q-format multiply accumulate, rs1 low, rs2 high (RV64)
+ */
+uint64_t HELPER(mqacc_w01)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 =3D (int32_t)(rs1 & 0xFFFFFFFF);
+    int32_t s2_w1 =3D (int32_t)((rs2 >> 32) & 0xFFFFFFFF);
+    int64_t d =3D (int64_t)dest;
+    int64_t prod =3D (int64_t)s1_w0 * (int64_t)s2_w1;
+    int64_t prod_q31 =3D prod >> 31;
+    return (uint64_t)(d + prod_q31);
+}
+
+/**
+ * MQACC.W11 - Q-format multiply accumulate, both high word (RV64)
+ */
+uint64_t HELPER(mqacc_w11)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w1 =3D (int32_t)((rs1 >> 32) & 0xFFFFFFFF);
+    int32_t s2_w1 =3D (int32_t)((rs2 >> 32) & 0xFFFFFFFF);
+    int64_t d =3D (int64_t)dest;
+    int64_t prod =3D (int64_t)s1_w1 * (int64_t)s2_w1;
+    int64_t prod_q31 =3D prod >> 31;
+    return (uint64_t)(d + prod_q31);
+}
+
+/**
+ * MQRACC.W00 - Q-format multiply accumulate with rounding,
+ * both low word (RV64)
+ */
+uint64_t HELPER(mqracc_w00)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 =3D (int32_t)(rs1 & 0xFFFFFFFF);
+    int32_t s2_w0 =3D (int32_t)(rs2 & 0xFFFFFFFF);
+    int64_t d =3D (int64_t)dest;
+    int64_t prod =3D (int64_t)s1_w0 * (int64_t)s2_w0 + (1LL << 30);
+    int64_t prod_q31 =3D prod >> 31;
+    return (uint64_t)(d + prod_q31);
+}
+
+/**
+ * MQRACC.W01 - Q-format multiply accumulate with rounding,
+ * rs1 low, rs2 high (RV64)
+ */
+uint64_t HELPER(mqracc_w01)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 =3D (int32_t)(rs1 & 0xFFFFFFFF);
+    int32_t s2_w1 =3D (int32_t)((rs2 >> 32) & 0xFFFFFFFF);
+    int64_t d =3D (int64_t)dest;
+    int64_t prod =3D (int64_t)s1_w0 * (int64_t)s2_w1 + (1LL << 30);
+    int64_t prod_q31 =3D prod >> 31;
+    return (uint64_t)(d + prod_q31);
+}
+
+/**
+ * MQRACC.W11 - Q-format multiply accumulate with rounding,
+ * both high word (RV64)
+ */
+uint64_t HELPER(mqracc_w11)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w1 =3D (int32_t)((rs1 >> 32) & 0xFFFFFFFF);
+    int32_t s2_w1 =3D (int32_t)((rs2 >> 32) & 0xFFFFFFFF);
+    int64_t d =3D (int64_t)dest;
+    int64_t prod =3D (int64_t)s1_w1 * (int64_t)s2_w1 + (1LL << 30);
+    int64_t prod_q31 =3D prod >> 31;
+    return (uint64_t)(d + prod_q31);
+}
+
+/**
+ * PMQACC.W.H00 - Packed Q-format multiply accumulate,
+ * low halfword (RV64)
+ */
+uint64_t HELPER(pmqacc_w_h00)(CPURISCVState *env, uint64_t rs1,
+                              uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, i * 2);
+        int32_t d_w =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h0;
+        uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15));
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMQACC.W.H01 - Packed Q-format multiply accumulate,
+ * rs1 low, rs2 high (RV64)
+ */
+uint64_t HELPER(pmqacc_w_h01)(CPURISCVState *env, uint64_t rs1,
+                              uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d_w =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h1;
+        uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15));
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMQACC.W.H11 - Packed Q-format multiply accumulate,
+ * both high halfword (RV64)
+ */
+uint64_t HELPER(pmqacc_w_h11)(CPURISCVState *env, uint64_t rs1,
+                              uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d_w =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)s1_h1 * (int64_t)s2_h1;
+        uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15));
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMQRACC.W.H00 - Packed Q-format multiply accumulate
+ * with rounding, low halfword (RV64)
+ */
+uint64_t HELPER(pmqracc_w_h00)(CPURISCVState *env, uint64_t rs1,
+                               uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, i * 2);
+        int32_t d_w =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h0 + (1LL << 14);
+        uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15));
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMQRACC.W.H01 - Packed Q-format multiply accumulate
+ * with rounding, rs1 low, rs2 high (RV64)
+ */
+uint64_t HELPER(pmqracc_w_h01)(CPURISCVState *env, uint64_t rs1,
+                               uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d_w =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h1 + (1LL << 14);
+        uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15));
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMQRACC.W.H11 - Packed Q-format multiply accumulate
+ * with rounding, both high halfword (RV64)
+ */
+uint64_t HELPER(pmqracc_w_h11)(CPURISCVState *env, uint64_t rs1,
+                               uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d_w =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)s1_h1 * (int64_t)s2_h1 + (1LL << 14);
+        uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15));
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
--=20
2.34.1
From nobody Sat May 30 20:13:16 2026
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 11/14] target/riscv: rvp: add two-way and four-way multiply
 and accumulate operations
Date: Fri, 17 Apr 2026 18:46:48 +0800
Message-Id: <20260417104652.17857-12-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>
---
 target/riscv/helper.h                   |  48 ++
 target/riscv/insn32.decode              |  48 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  48 ++
 target/riscv/psimd_helper.c             | 938 ++++++++++++++++++++++++
 4 files changed, 1082 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a5ecf9b7d7..663ac0e242 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1689,3 +1689,51 @@ DEF_HELPER_4(pmqacc_w_h11, i64, env, i64, i64, i64)
 DEF_HELPER_4(pmqracc_w_h00, i64, env, i64, i64, i64)
 DEF_HELPER_4(pmqracc_w_h01, i64, env, i64, i64, i64)
 DEF_HELPER_4(pmqracc_w_h11, i64, env, i64, i64, i64)
+
+/* Packed SIMD - Two-Way Multiply and Accumulate Operations */
+DEF_HELPER_3(pmq2add_h, tl, env, tl, tl)
+DEF_HELPER_3(pmqr2add_h, tl, env, tl, tl)
+DEF_HELPER_4(pmq2adda_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmqr2adda_h, tl, env, tl, tl, tl)
+DEF_HELPER_3(pmq2add_w, i64, env, i64, i64)
+DEF_HELPER_3(pmqr2add_w, i64, env, i64, i64)
+DEF_HELPER_4(pmq2adda_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmqr2adda_w, i64, env, i64, i64, i64)
+DEF_HELPER_3(pm2add_h, tl, env, tl, tl)
+DEF_HELPER_3(pm2addsu_h, tl, env, tl, tl)
+DEF_HELPER_3(pm2addu_h, tl, env, tl, tl)
+DEF_HELPER_3(pm2add_hx, tl, env, tl, tl)
+DEF_HELPER_3(pm2sub_h, tl, env, tl, tl)
+DEF_HELPER_3(pm2sub_hx, tl, env, tl, tl)
+DEF_HELPER_4(pm2adda_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pm2addasu_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pm2addau_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pm2adda_hx, tl, env, tl, tl, tl)
+DEF_HELPER_4(pm2suba_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pm2suba_hx, tl, env, tl, tl, tl)
+DEF_HELPER_3(pm2add_w, i64, env, i64, i64)
+DEF_HELPER_3(pm2addsu_w, i64, env, i64, i64)
+DEF_HELPER_3(pm2addu_w, i64, env, i64, i64)
+DEF_HELPER_3(pm2add_wx, i64, env, i64, i64)
+DEF_HELPER_3(pm2sub_w, i64, env, i64, i64)
+DEF_HELPER_3(pm2sub_wx, i64, env, i64, i64)
+DEF_HELPER_4(pm2adda_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pm2addasu_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pm2addau_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pm2adda_wx, i64, env, i64, i64, i64)
+DEF_HELPER_4(pm2suba_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pm2suba_wx, i64, env, i64, i64, i64)
+
+/* Packed SIMD - Four-Way Multiply and Accumulate Operations */
+DEF_HELPER_3(pm4add_b, tl, env, tl, tl)
+DEF_HELPER_3(pm4addsu_b, tl, env, tl, tl)
+DEF_HELPER_3(pm4addu_b, tl, env, tl, tl)
+DEF_HELPER_4(pm4adda_b, tl, env, tl, tl, tl)
+DEF_HELPER_4(pm4addasu_b, tl, env, tl, tl, tl)
+DEF_HELPER_4(pm4addau_b, tl, env, tl, tl, tl)
+DEF_HELPER_3(pm4add_h, i64, env, i64, i64)
+DEF_HELPER_3(pm4addsu_h, i64, env, i64, i64)
+DEF_HELPER_3(pm4addu_h, i64, env, i64, i64)
+DEF_HELPER_4(pm4adda_h, i64, env, i64, i64, i64)
+DEF_HELPER_4(pm4addasu_h, i64, env, i64, i64, i64)
+DEF_HELPER_4(pm4addau_h, i64, env, i64, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b2a89e3a1f..ebfbf8c799 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1548,3 +1548,51 @@ mqacc_w11       11111 01 ..... ..... 111 ..... 0111011 @r
 mqracc_w00      11101 11 ..... ..... 111 ..... 0111011 @r
 mqracc_w01      11111 11 ..... ..... 101 ..... 0111011 @r
 mqracc_w11      11111 11 ..... ..... 111 ..... 0111011 @r
+
+# Packed SIMD - Two-Way Multiply and Accumulate Operations
+pmq2add_h       10110 00 ..... ..... 101 ..... 0111011 @r
+pmqr2add_h      10110 10 ..... ..... 101 ..... 0111011 @r
+pmq2adda_h      10111 00 ..... ..... 101 ..... 0111011 @r
+pmqr2adda_h     10111 10 ..... ..... 101 ..... 0111011 @r
+pmq2add_w       10110 01 ..... ..... 101 ..... 0111011 @r
+pmqr2add_w      10110 11 ..... ..... 101 ..... 0111011 @r
+pmq2adda_w      10111 01 ..... ..... 101 ..... 0111011 @r
+pmqr2adda_w     10111 11 ..... ..... 101 ..... 0111011 @r
+pm2add_h        10000 00 ..... ..... 101 ..... 0111011 @r
+pm2addsu_h      11100 00 ..... ..... 101 ..... 0111011 @r
+pm2addu_h       10100 00 ..... ..... 101 ..... 0111011 @r
+pm2add_hx       10010 00 ..... ..... 101 ..... 0111011 @r
+pm2sub_h        11000 00 ..... ..... 101 ..... 0111011 @r
+pm2sub_hx       11010 00 ..... ..... 101 ..... 0111011 @r
+pm2adda_h       10001 00 ..... ..... 101 ..... 0111011 @r
+pm2addasu_h     11101 00 ..... ..... 101 ..... 0111011 @r
+pm2addau_h      10101 00 ..... ..... 101 ..... 0111011 @r
+pm2adda_hx      10011 00 ..... ..... 101 ..... 0111011 @r
+pm2suba_h       11001 00 ..... ..... 101 ..... 0111011 @r
+pm2suba_hx      11011 00 ..... ..... 101 ..... 0111011 @r
+pm2add_w        10000 01 ..... ..... 101 ..... 0111011 @r
+pm2addsu_w      11100 01 ..... ..... 101 ..... 0111011 @r
+pm2addu_w       10100 01 ..... ..... 101 ..... 0111011 @r
+pm2add_wx       10010 01 ..... ..... 101 ..... 0111011 @r
+pm2sub_w        11000 01 ..... ..... 101 ..... 0111011 @r
+pm2sub_wx       11010 01 ..... ..... 101 ..... 0111011 @r
+pm2adda_w       10001 01 ..... ..... 101 ..... 0111011 @r
+pm2addasu_w     11101 01 ..... ..... 101 ..... 0111011 @r
+pm2addau_w      10101 01 ..... ..... 101 ..... 0111011 @r
+pm2adda_wx      10011 01 ..... ..... 101 ..... 0111011 @r
+pm2suba_w       11001 01 ..... ..... 101 ..... 0111011 @r
+pm2suba_wx      11011 01 ..... ..... 101 ..... 0111011 @r
+
+# Packed SIMD - Four-Way Multiply and Accumulate Operations
+pm4add_b        10000 10 ..... ..... 101 ..... 0111011 @r
+pm4addsu_b      11100 10 ..... ..... 101 ..... 0111011 @r
+pm4addu_b       10100 10 ..... ..... 101 ..... 0111011 @r
+pm4adda_b       10001 10 ..... ..... 101 ..... 0111011 @r
+pm4addasu_b     11101 10 ..... ..... 101 ..... 0111011 @r
+pm4addau_b      10101 10 ..... ..... 101 ..... 0111011 @r
+pm4add_h        10000 11 ..... ..... 101 ..... 0111011 @r
+pm4addsu_h      11100 11 ..... ..... 101 ..... 0111011 @r
+pm4addu_h       10100 11 ..... ..... 101 ..... 0111011 @r
+pm4adda_h       10001 11 ..... ..... 101 ..... 0111011 @r
+pm4addasu_h     11101 11 ..... ..... 101 ..... 0111011 @r
+pm4addau_h      10101 11 ..... ..... 101 ..... 0111011 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 3310e23dce..86071d71f7 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -858,3 +858,51 @@ GEN_SIMD_TRANS_ACC_64(pmqacc_w_h11)
 GEN_SIMD_TRANS_ACC_64(pmqracc_w_h00)
 GEN_SIMD_TRANS_ACC_64(pmqracc_w_h01)
 GEN_SIMD_TRANS_ACC_64(pmqracc_w_h11)
+
+/* Packed SIMD - Two-Way Multiply and Accumulate Operations */
+GEN_SIMD_TRANS(pmq2add_h)
+GEN_SIMD_TRANS(pmqr2add_h)
+GEN_SIMD_TRANS_ACC(pmq2adda_h)
+GEN_SIMD_TRANS_ACC(pmqr2adda_h)
+GEN_SIMD_TRANS_64(pmq2add_w)
+GEN_SIMD_TRANS_64(pmqr2add_w)
+GEN_SIMD_TRANS_ACC_64(pmq2adda_w)
+GEN_SIMD_TRANS_ACC_64(pmqr2adda_w)
+GEN_SIMD_TRANS(pm2add_h)
+GEN_SIMD_TRANS(pm2addsu_h)
+GEN_SIMD_TRANS(pm2addu_h)
+GEN_SIMD_TRANS(pm2add_hx)
+GEN_SIMD_TRANS(pm2sub_h)
+GEN_SIMD_TRANS(pm2sub_hx)
+GEN_SIMD_TRANS_ACC(pm2adda_h)
+GEN_SIMD_TRANS_ACC(pm2addasu_h)
+GEN_SIMD_TRANS_ACC(pm2addau_h)
+GEN_SIMD_TRANS_ACC(pm2adda_hx)
+GEN_SIMD_TRANS_ACC(pm2suba_h)
+GEN_SIMD_TRANS_ACC(pm2suba_hx)
+GEN_SIMD_TRANS_64(pm2add_w)
+GEN_SIMD_TRANS_64(pm2addsu_w)
+GEN_SIMD_TRANS_64(pm2addu_w)
+GEN_SIMD_TRANS_64(pm2add_wx)
+GEN_SIMD_TRANS_64(pm2sub_w)
+GEN_SIMD_TRANS_64(pm2sub_wx)
+GEN_SIMD_TRANS_ACC_64(pm2adda_w)
+GEN_SIMD_TRANS_ACC_64(pm2addasu_w)
+GEN_SIMD_TRANS_ACC_64(pm2addau_w)
+GEN_SIMD_TRANS_ACC_64(pm2adda_wx)
+GEN_SIMD_TRANS_ACC_64(pm2suba_w)
+GEN_SIMD_TRANS_ACC_64(pm2suba_wx)
+
+/* Packed SIMD - Four-Way Multiply and Accumulate Operations */
+GEN_SIMD_TRANS(pm4add_b)
+GEN_SIMD_TRANS(pm4addsu_b)
+GEN_SIMD_TRANS(pm4addu_b)
+GEN_SIMD_TRANS_ACC(pm4adda_b)
+GEN_SIMD_TRANS_ACC(pm4addasu_b)
+GEN_SIMD_TRANS_ACC(pm4addau_b)
+GEN_SIMD_TRANS_64(pm4add_h)
+GEN_SIMD_TRANS_64(pm4addsu_h)
+GEN_SIMD_TRANS_64(pm4addu_h)
+GEN_SIMD_TRANS_ACC_64(pm4adda_h)
+GEN_SIMD_TRANS_ACC_64(pm4addasu_h)
+GEN_SIMD_TRANS_ACC_64(pm4addau_h)
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
index d69a2f6453..5eede48581 100644
--- a/target/riscv/psimd_helper.c
+++ b/target/riscv/psimd_helper.c
@@ -6074,3 +6074,941 @@ uint64_t HELPER(pmqracc_w_h11)(CPURISCVState *env, uint64_t rs1,
     }
     return rd;
 }
+
+/* Two-Way Multiply and Accumulate Operations */
+
+/**
+ * PMQ2ADD.H - Add two Q-format products
+ */
+target_ulong HELPER(pmq2add_h)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t prod0 = (int32_t)s1_h0 * (int32_t)s2_h0;
+        int64_t prod0_47 = ((int64_t)prod0) >> 15;
+        int32_t prod1 = (int32_t)s1_h1 * (int32_t)s2_h1;
+        int64_t prod1_47 = ((int64_t)prod1) >> 15;
+        uint32_t sum = (uint32_t)(prod0_47 + prod1_47);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PMQR2ADD.H - Add two Q-format products with rounding
+ */
+target_ulong HELPER(pmqr2add_h)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t prod0 = (int32_t)s1_h0 * (int32_t)s2_h0 + (1LL << 14);
+        int64_t prod0_47 = ((int64_t)prod0) >> 15;
+        int32_t prod1 = (int32_t)s1_h1 * (int32_t)s2_h1 + (1LL << 14);
+        int64_t prod1_47 = ((int64_t)prod1) >> 15;
+        uint32_t sum = (uint32_t)(prod0_47 + prod1_47);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PMQ2ADDA.H - Add two Q-format products with accumulate
+ */
+target_ulong HELPER(pmq2adda_h)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod0 = (int32_t)s1_h0 * (int32_t)s2_h0;
+        int64_t prod0_47 = ((int64_t)prod0) >> 15;
+        int32_t prod1 = (int32_t)s1_h1 * (int32_t)s2_h1;
+        int64_t prod1_47 = ((int64_t)prod1) >> 15;
+        uint32_t sum = (uint32_t)(d + prod0_47 + prod1_47);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PMQR2ADDA.H - Add two Q-format products with rounding and accumulate
+ */
+target_ulong HELPER(pmqr2adda_h)(CPURISCVState *env, target_ulong rs1,
+                                 target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod0 = (int32_t)s1_h0 * (int32_t)s2_h0 + (1LL << 14);
+        int64_t prod0_47 = ((int64_t)prod0) >> 15;
+        int32_t prod1 = (int32_t)s1_h1 * (int32_t)s2_h1 + (1LL << 14);
+        int64_t prod1_47 = ((int64_t)prod1) >> 15;
+        uint32_t sum = (uint32_t)(d + prod0_47 + prod1_47);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PMQ2ADD.W - Add two Q-format products (word, RV64 only)
+ */
+uint64_t HELPER(pmq2add_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t prod0 = (int64_t)s1_w0 * (int64_t)s2_w0;
+    __int128_t prod0_95 = ((__int128_t)prod0) >> 31;
+    int64_t prod1 = (int64_t)s1_w1 * (int64_t)s2_w1;
+    __int128_t prod1_95 = ((__int128_t)prod1) >> 31;
+    return (uint64_t)(prod0_95 + prod1_95);
+}
+
+/**
+ * PMQR2ADD.W - Add two Q-format products with rounding (word, RV64 only)
+ */
+uint64_t HELPER(pmqr2add_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t prod0 = (int64_t)s1_w0 * (int64_t)s2_w0 + (1LL << 30);
+    __int128_t prod0_95 = ((__int128_t)prod0) >> 31;
+    int64_t prod1 = (int64_t)s1_w1 * (int64_t)s2_w1 + (1LL << 30);
+    __int128_t prod1_95 = ((__int128_t)prod1) >> 31;
+    return (uint64_t)(prod0_95 + prod1_95);
+}
+
+/**
+ * PMQ2ADDA.W - Add two Q-format products with accumulate (word, RV64 only)
+ */
+uint64_t HELPER(pmq2adda_w)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t d = (int64_t)dest;
+    int64_t prod0 = (int64_t)s1_w0 * (int64_t)s2_w0;
+    __int128_t prod0_95 = ((__int128_t)prod0) >> 31;
+    int64_t prod1 = (int64_t)s1_w1 * (int64_t)s2_w1;
+    __int128_t prod1_95 = ((__int128_t)prod1) >> 31;
+    return (uint64_t)(d + prod0_95 + prod1_95);
+}
+
+/**
+ * PMQR2ADDA.W - Add two Q-format products with rounding
+ * and accumulate (word, RV64 only)
+ */
+uint64_t HELPER(pmqr2adda_w)(CPURISCVState *env, uint64_t rs1,
+                             uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t d = (int64_t)dest;
+    int64_t prod0 = (int64_t)s1_w0 * (int64_t)s2_w0 + (1LL << 30);
+    __int128_t prod0_95 = ((__int128_t)prod0) >> 31;
+    int64_t prod1 = (int64_t)s1_w1 * (int64_t)s2_w1 + (1LL << 30);
+    __int128_t prod1_95 = ((__int128_t)prod1) >> 31;
+    return (uint64_t)(d + prod0_95 + prod1_95);
+}
+
+/**
+ * PM2ADD.H - Add two products horizontally
+ * For each word: rd[i] = rs1[2i] * rs2[2i] + rs1[2i+1] * rs2[2i+1]
+ */
+target_ulong HELPER(pm2add_h)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t prod0 = (int32_t)s1_h0 * (int32_t)s2_h0;
+        int32_t prod1 = (int32_t)s1_h1 * (int32_t)s2_h1;
+        uint32_t sum = (uint32_t)(prod0 + prod1);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADDSU.H - Add two products horizontally (signed x unsigned)
+ */
+target_ulong HELPER(pm2addsu_h)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h0 = EXTRACT16(rs2, i * 2);
+        uint16_t s2_h1 = EXTRACT16(rs2, i * 2 + 1);
+        int32_t prod0 = (int32_t)s1_h0 * (uint32_t)s2_h0;
+        int32_t prod1 = (int32_t)s1_h1 * (uint32_t)s2_h1;
+        uint32_t sum = (uint32_t)(prod0 + prod1);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADDU.H - Add two products horizontally (unsigned)
+ */
+target_ulong HELPER(pm2addu_h)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        uint16_t s1_h0 = EXTRACT16(rs1, i * 2);
+        uint16_t s1_h1 = EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h0 = EXTRACT16(rs2, i * 2);
+        uint16_t s2_h1 = EXTRACT16(rs2, i * 2 + 1);
+        uint32_t prod0 = (uint32_t)s1_h0 * (uint32_t)s2_h0;
+        uint32_t prod1 = (uint32_t)s1_h1 * (uint32_t)s2_h1;
+        uint32_t sum = prod0 + prod1;
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADD.HX - Add cross products horizontally
+ * For each word: rd[i] = rs1[2i] * rs2[2i+1] + rs1[2i+1] * rs2[2i]
+ */
+target_ulong HELPER(pm2add_hx)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t prod01 = (int32_t)s1_h0 * (int32_t)s2_h1;
+        int32_t prod10 = (int32_t)s1_h1 * (int32_t)s2_h0;
+        uint32_t sum = (uint32_t)(prod01 + prod10);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2SUB.H - Subtract two products horizontally
+ * For each word: rd[i] = rs1[2i] * rs2[2i] - rs1[2i+1] * rs2[2i+1]
+ */
+target_ulong HELPER(pm2sub_h)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t prod0 = (int32_t)s1_h0 * (int32_t)s2_h0;
+        int32_t prod1 = (int32_t)s1_h1 * (int32_t)s2_h1;
+        uint32_t diff = (uint32_t)(prod0 - prod1);
+        rd = INSERT32(rd, diff, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2SUB.HX - Subtract cross products horizontally
+ * For each word: rd[i] = rs1[2i+1] * rs2[2i] - rs1[2i] * rs2[2i+1]
+ */
+target_ulong HELPER(pm2sub_hx)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t prod10 = (int32_t)s1_h1 * (int32_t)s2_h0;
+        int32_t prod01 = (int32_t)s1_h0 * (int32_t)s2_h1;
+        uint32_t diff = (uint32_t)(prod10 - prod01);
+        rd = INSERT32(rd, diff, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADDA.H - Add two products horizontally with accumulate
+ */
+target_ulong HELPER(pm2adda_h)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod0 = (int32_t)s1_h0 * (int32_t)s2_h0;
+        int32_t prod1 = (int32_t)s1_h1 * (int32_t)s2_h1;
+        uint32_t sum = (uint32_t)(d + prod0 + prod1);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADDASU.H - Add two products horizontally with accumulate
+ * (signed x unsigned)
+ */
+target_ulong HELPER(pm2addasu_h)(CPURISCVState *env, target_ulong rs1,
+                                 target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h0 = EXTRACT16(rs2, i * 2);
+        uint16_t s2_h1 = EXTRACT16(rs2, i * 2 + 1);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod0 = (int32_t)s1_h0 * (uint32_t)s2_h0;
+        int32_t prod1 = (int32_t)s1_h1 * (uint32_t)s2_h1;
+        uint32_t sum = (uint32_t)(d + prod0 + prod1);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADDAU.H - Add two products horizontally with accumulate (unsigned)
+ */
+target_ulong HELPER(pm2addau_h)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        uint16_t s1_h0 = EXTRACT16(rs1, i * 2);
+        uint16_t s1_h1 = EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h0 = EXTRACT16(rs2, i * 2);
+        uint16_t s2_h1 = EXTRACT16(rs2, i * 2 + 1);
+        uint32_t d = EXTRACT32(dest, i);
+        uint32_t prod0 = (uint32_t)s1_h0 * (uint32_t)s2_h0;
+        uint32_t prod1 = (uint32_t)s1_h1 * (uint32_t)s2_h1;
+        uint32_t sum = d + prod0 + prod1;
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADDA.HX - Add cross products horizontally with accumulate
+ */
+target_ulong HELPER(pm2adda_hx)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod01 = (int32_t)s1_h0 * (int32_t)s2_h1;
+        int32_t prod10 = (int32_t)s1_h1 * (int32_t)s2_h0;
+        uint32_t sum = (uint32_t)(d + prod01 + prod10);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2SUBA.H - Subtract two products horizontally with accumulate
+ */
+target_ulong HELPER(pm2suba_h)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod0 = (int32_t)s1_h0 * (int32_t)s2_h0;
+        int32_t prod1 = (int32_t)s1_h1 * (int32_t)s2_h1;
+        uint32_t diff = (uint32_t)(d + prod0 - prod1);
+        rd = INSERT32(rd, diff, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2SUBA.HX - Subtract cross products horizontally with accumulate
+ */
+target_ulong HELPER(pm2suba_hx)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod10 = (int32_t)s1_h1 * (int32_t)s2_h0;
+        int32_t prod01 = (int32_t)s1_h0 * (int32_t)s2_h1;
+        uint32_t diff = (uint32_t)(d + prod10 - prod01);
+        rd = INSERT32(rd, diff, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADD.W - Add two products horizontally (word, RV64 only)
+ */
+uint64_t HELPER(pm2add_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t prod0 = (int64_t)s1_w0 * (int64_t)s2_w0;
+    int64_t prod1 = (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)(prod0 + prod1);
+}
+
+/**
+ * PM2ADDSU.W - Add two products horizontally (signed x unsigned, RV64 only)
+ */
+uint64_t HELPER(pm2addsu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    uint32_t s2_w0 = EXTRACT32(rs2, 0);
+    uint32_t s2_w1 = EXTRACT32(rs2, 1);
+    int64_t prod0 = (int64_t)s1_w0 * (uint64_t)s2_w0;
+    int64_t prod1 = (int64_t)s1_w1 * (uint64_t)s2_w1;
+    return (uint64_t)(prod0 + prod1);
+}
+
+/**
+ * PM2ADDU.W - Add two products horizontally (unsigned, RV64 only)
+ */
+uint64_t HELPER(pm2addu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint32_t s1_w0 = EXTRACT32(rs1, 0);
+    uint32_t s1_w1 = EXTRACT32(rs1, 1);
+    uint32_t s2_w0 = EXTRACT32(rs2, 0);
+    uint32_t s2_w1 = EXTRACT32(rs2, 1);
+    uint64_t prod0 = (uint64_t)s1_w0 * (uint64_t)s2_w0;
+    uint64_t prod1 = (uint64_t)s1_w1 * (uint64_t)s2_w1;
+    return prod0 + prod1;
+}
+
+/**
+ * PM2ADD.WX - Add cross products horizontally (word, RV64 only)
+ */
+uint64_t HELPER(pm2add_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t prod01 = (int64_t)s1_w0 * (int64_t)s2_w1;
+    int64_t prod10 = (int64_t)s1_w1 * (int64_t)s2_w0;
+    return (uint64_t)(prod01 + prod10);
+}
+
+/**
+ * PM2SUB.W - Subtract two products horizontally (word, RV64 only)
+ */
+uint64_t HELPER(pm2sub_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t prod0 = (int64_t)s1_w0 * (int64_t)s2_w0;
+    int64_t prod1 = (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)(prod0 - prod1);
+}
+
+/**
+ * PM2SUB.WX - Subtract cross products horizontally (word, RV64 only)
+ */
+uint64_t HELPER(pm2sub_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t prod10 = (int64_t)s1_w1 * (int64_t)s2_w0;
+    int64_t prod01 = (int64_t)s1_w0 * (int64_t)s2_w1;
+    return (uint64_t)(prod10 - prod01);
+}
+
+/**
+ * PM2ADDA.W - Add two products horizontally with accumulate (word, RV64 only)
+ */
+uint64_t HELPER(pm2adda_w)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t d = (int64_t)dest;
+    int64_t prod0 = (int64_t)s1_w0 * (int64_t)s2_w0;
+    int64_t prod1 = (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)(d + prod0 + prod1);
+}
+
+/**
+ * PM2ADDASU.W - Add two products horizontally with accumulate
+ * (signed x unsigned, RV64 only)
+ */
+uint64_t HELPER(pm2addasu_w)(CPURISCVState *env, uint64_t rs1,
+                             uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    uint32_t s2_w0 = EXTRACT32(rs2, 0);
+    uint32_t s2_w1 = EXTRACT32(rs2, 1);
+    int64_t d = (int64_t)dest;
+    int64_t prod0 = (int64_t)s1_w0 * (uint64_t)s2_w0;
+    int64_t prod1 = (int64_t)s1_w1 * (uint64_t)s2_w1;
+    return (uint64_t)(d + prod0 + prod1);
+}
+
+/**
+ * PM2ADDAU.W - Add two products horizontally with accumulate
+ * (unsigned, RV64 only)
+ */
+uint64_t HELPER(pm2addau_w)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    uint32_t s1_w0 = EXTRACT32(rs1, 0);
+    uint32_t s1_w1 = EXTRACT32(rs1, 1);
+    uint32_t s2_w0 = EXTRACT32(rs2, 0);
+    uint32_t s2_w1 = EXTRACT32(rs2, 1);
+    uint64_t d = dest;
+    uint64_t prod0 = (uint64_t)s1_w0 * (uint64_t)s2_w0;
+    uint64_t prod1 = (uint64_t)s1_w1 * (uint64_t)s2_w1;
+    return d + prod0 + prod1;
+}
+
+/**
+ * PM2ADDA.WX - Add cross products horizontally with accumulate
+ * (word, RV64 only)
+ */
+uint64_t HELPER(pm2adda_wx)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t d = (int64_t)dest;
+    int64_t prod01 = (int64_t)s1_w0 * (int64_t)s2_w1;
+    int64_t prod10 = (int64_t)s1_w1 * (int64_t)s2_w0;
+    return (uint64_t)(d + prod01 + prod10);
+}
+
+/**
+ * PM2SUBA.W - Subtract two products horizontally with accumulate
+ * (word, RV64 only)
+ */
+uint64_t HELPER(pm2suba_w)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t d = (int64_t)dest;
+    int64_t prod0 = (int64_t)s1_w0 * (int64_t)s2_w0;
+    int64_t prod1 = (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)(d + prod0 - prod1);
+}
+
+/**
+ * PM2SUBA.WX - Subtract cross products horizontally with accumulate
+ * (word, RV64 only)
+ */
+uint64_t HELPER(pm2suba_wx)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t d = (int64_t)dest;
+    int64_t prod10 = (int64_t)s1_w1 * (int64_t)s2_w0;
+    int64_t prod01 = (int64_t)s1_w0 * (int64_t)s2_w1;
+    return (uint64_t)(d + prod10 - prod01);
+}
+
+
+/* Four-Way Multiply and Accumulate Operations */
+
+/**
+ * PM4ADD.B - Add four products horizontally (byte to word)
+ */
+target_ulong HELPER(pm4add_b)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int8_t s1_b0 = (int8_t)EXTRACT8(rs1, i * 4);
+        int8_t s1_b1 = (int8_t)EXTRACT8(rs1, i * 4 + 1);
+        int8_t s1_b2 = (int8_t)EXTRACT8(rs1, i * 4 + 2);
+        int8_t s1_b3 = (int8_t)EXTRACT8(rs1, i * 4 + 3);
+        int8_t s2_b0 = (int8_t)EXTRACT8(rs2, i * 4);
+        int8_t s2_b1 = (int8_t)EXTRACT8(rs2, i * 4 + 1);
+        int8_t s2_b2 = (int8_t)EXTRACT8(rs2, i * 4 + 2);
+        int8_t s2_b3 = (int8_t)EXTRACT8(rs2, i * 4 + 3);
+        int32_t prod0 = (int32_t)s1_b0 * (int32_t)s2_b0;
+        int32_t prod1 = (int32_t)s1_b1 * (int32_t)s2_b1;
+        int32_t prod2 = (int32_t)s1_b2 * (int32_t)s2_b2;
+        int32_t prod3 = (int32_t)s1_b3 * (int32_t)s2_b3;
+        uint32_t sum = (uint32_t)(prod0 + prod1 + prod2 + prod3);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM4ADDSU.B - Add four products horizontally (signed x unsigned)
+ */
+target_ulong HELPER(pm4addsu_b)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_W(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t s1_b0 =3D (int8_t)EXTRACT8(rs1, i * 4);
+        int8_t s1_b1 =3D (int8_t)EXTRACT8(rs1, i * 4 + 1);
+        int8_t s1_b2 =3D (int8_t)EXTRACT8(rs1, i * 4 + 2);
+        int8_t s1_b3 =3D (int8_t)EXTRACT8(rs1, i * 4 + 3);
+        uint8_t s2_b0 =3D EXTRACT8(rs2, i * 4);
+        uint8_t s2_b1 =3D EXTRACT8(rs2, i * 4 + 1);
+        uint8_t s2_b2 =3D EXTRACT8(rs2, i * 4 + 2);
+        uint8_t s2_b3 =3D EXTRACT8(rs2, i * 4 + 3);
+        int32_t prod0 =3D (int32_t)s1_b0 * (int32_t)s2_b0;
+        int32_t prod1 =3D (int32_t)s1_b1 * (int32_t)s2_b1;
+        int32_t prod2 =3D (int32_t)s1_b2 * (int32_t)s2_b2;
+        int32_t prod3 =3D (int32_t)s1_b3 * (int32_t)s2_b3;
+        uint32_t sum =3D (uint32_t)(prod0 + prod1 + prod2 + prod3);
+        rd =3D INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM4ADDU.B - Add four products horizontally (unsigned)
+ */
+target_ulong HELPER(pm4addu_b)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_W(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t s1_b0 =3D EXTRACT8(rs1, i * 4);
+        uint8_t s1_b1 =3D EXTRACT8(rs1, i * 4 + 1);
+        uint8_t s1_b2 =3D EXTRACT8(rs1, i * 4 + 2);
+        uint8_t s1_b3 =3D EXTRACT8(rs1, i * 4 + 3);
+        uint8_t s2_b0 =3D EXTRACT8(rs2, i * 4);
+        uint8_t s2_b1 =3D EXTRACT8(rs2, i * 4 + 1);
+        uint8_t s2_b2 =3D EXTRACT8(rs2, i * 4 + 2);
+        uint8_t s2_b3 =3D EXTRACT8(rs2, i * 4 + 3);
+        uint32_t prod0 =3D (uint32_t)s1_b0 * (uint32_t)s2_b0;
+        uint32_t prod1 =3D (uint32_t)s1_b1 * (uint32_t)s2_b1;
+        uint32_t prod2 =3D (uint32_t)s1_b2 * (uint32_t)s2_b2;
+        uint32_t prod3 =3D (uint32_t)s1_b3 * (uint32_t)s2_b3;
+        uint32_t sum =3D prod0 + prod1 + prod2 + prod3;
+        rd =3D INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM4ADDA.B - Add four products horizontally with accumulate
+ */
+target_ulong HELPER(pm4adda_b)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_W(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t s1_b0 =3D (int8_t)EXTRACT8(rs1, i * 4);
+        int8_t s1_b1 =3D (int8_t)EXTRACT8(rs1, i * 4 + 1);
+        int8_t s1_b2 =3D (int8_t)EXTRACT8(rs1, i * 4 + 2);
+        int8_t s1_b3 =3D (int8_t)EXTRACT8(rs1, i * 4 + 3);
+        int8_t s2_b0 =3D (int8_t)EXTRACT8(rs2, i * 4);
+        int8_t s2_b1 =3D (int8_t)EXTRACT8(rs2, i * 4 + 1);
+        int8_t s2_b2 =3D (int8_t)EXTRACT8(rs2, i * 4 + 2);
+        int8_t s2_b3 =3D (int8_t)EXTRACT8(rs2, i * 4 + 3);
+        int32_t d =3D (int32_t)EXTRACT32(dest, i);
+        int32_t prod0 =3D (int32_t)s1_b0 * (int32_t)s2_b0;
+        int32_t prod1 =3D (int32_t)s1_b1 * (int32_t)s2_b1;
+        int32_t prod2 =3D (int32_t)s1_b2 * (int32_t)s2_b2;
+        int32_t prod3 =3D (int32_t)s1_b3 * (int32_t)s2_b3;
+        uint32_t sum =3D (uint32_t)(d + prod0 + prod1 + prod2 + prod3);
+        rd =3D INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM4ADDASU.B - Add four products horizontally with accumulate
+ * (signed x unsigned)
+ */
+target_ulong HELPER(pm4addasu_b)(CPURISCVState *env, target_ulong rs1,
+                                 target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_W(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        int8_t s1_b0 =3D (int8_t)EXTRACT8(rs1, i * 4);
+        int8_t s1_b1 =3D (int8_t)EXTRACT8(rs1, i * 4 + 1);
+        int8_t s1_b2 =3D (int8_t)EXTRACT8(rs1, i * 4 + 2);
+        int8_t s1_b3 =3D (int8_t)EXTRACT8(rs1, i * 4 + 3);
+        uint8_t s2_b0 =3D EXTRACT8(rs2, i * 4);
+        uint8_t s2_b1 =3D EXTRACT8(rs2, i * 4 + 1);
+        uint8_t s2_b2 =3D EXTRACT8(rs2, i * 4 + 2);
+        uint8_t s2_b3 =3D EXTRACT8(rs2, i * 4 + 3);
+        int32_t d =3D (int32_t)EXTRACT32(dest, i);
+        int32_t prod0 =3D (int32_t)s1_b0 * (int32_t)s2_b0;
+        int32_t prod1 =3D (int32_t)s1_b1 * (int32_t)s2_b1;
+        int32_t prod2 =3D (int32_t)s1_b2 * (int32_t)s2_b2;
+        int32_t prod3 =3D (int32_t)s1_b3 * (int32_t)s2_b3;
+        uint32_t sum =3D (uint32_t)(d + prod0 + prod1 + prod2 + prod3);
+        rd =3D INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM4ADDAU.B - Add four products horizontally with accumulate (unsigned)
+ */
+target_ulong HELPER(pm4addau_b)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd =3D 0;
+    int elems =3D ELEMS_W(rd);
+
+    for (int i =3D 0; i < elems; i++) {
+        uint8_t s1_b0 =3D EXTRACT8(rs1, i * 4);
+        uint8_t s1_b1 =3D EXTRACT8(rs1, i * 4 + 1);
+        uint8_t s1_b2 =3D EXTRACT8(rs1, i * 4 + 2);
+        uint8_t s1_b3 =3D EXTRACT8(rs1, i * 4 + 3);
+        uint8_t s2_b0 =3D EXTRACT8(rs2, i * 4);
+        uint8_t s2_b1 =3D EXTRACT8(rs2, i * 4 + 1);
+        uint8_t s2_b2 =3D EXTRACT8(rs2, i * 4 + 2);
+        uint8_t s2_b3 =3D EXTRACT8(rs2, i * 4 + 3);
+        uint32_t d =3D EXTRACT32(dest, i);
+        uint32_t prod0 =3D (uint32_t)s1_b0 * (uint32_t)s2_b0;
+        uint32_t prod1 =3D (uint32_t)s1_b1 * (uint32_t)s2_b1;
+        uint32_t prod2 =3D (uint32_t)s1_b2 * (uint32_t)s2_b2;
+        uint32_t prod3 =3D (uint32_t)s1_b3 * (uint32_t)s2_b3;
+        uint32_t sum =3D d + prod0 + prod1 + prod2 + prod3;
+        rd =3D INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM4ADD.H - Add four products horizontally (halfword to doubleword, RV64=
 only)
+ */
+uint64_t HELPER(pm4add_h)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s1_h2 =3D (int16_t)EXTRACT16(rs1, 2);
+    int16_t s1_h3 =3D (int16_t)EXTRACT16(rs1, 3);
+    int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int16_t s2_h2 =3D (int16_t)EXTRACT16(rs2, 2);
+    int16_t s2_h3 =3D (int16_t)EXTRACT16(rs2, 3);
+    int64_t prod0 =3D (int64_t)s1_h0 * (int64_t)s2_h0;
+    int64_t prod1 =3D (int64_t)s1_h1 * (int64_t)s2_h1;
+    int64_t prod2 =3D (int64_t)s1_h2 * (int64_t)s2_h2;
+    int64_t prod3 =3D (int64_t)s1_h3 * (int64_t)s2_h3;
+    rd =3D (uint64_t)(prod0 + prod1 + prod2 + prod3);
+    return rd;
+}
+
+/**
+ * PM4ADDSU.H - Add four products horizontally (signed x unsigned, RV64 on=
ly)
+ */
+uint64_t HELPER(pm4addsu_h)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s1_h2 =3D (int16_t)EXTRACT16(rs1, 2);
+    int16_t s1_h3 =3D (int16_t)EXTRACT16(rs1, 3);
+    uint16_t s2_h0 =3D EXTRACT16(rs2, 0);
+    uint16_t s2_h1 =3D EXTRACT16(rs2, 1);
+    uint16_t s2_h2 =3D EXTRACT16(rs2, 2);
+    uint16_t s2_h3 =3D EXTRACT16(rs2, 3);
+    int64_t prod0 =3D (int64_t)s1_h0 * (int64_t)s2_h0;
+    int64_t prod1 =3D (int64_t)s1_h1 * (int64_t)s2_h1;
+    int64_t prod2 =3D (int64_t)s1_h2 * (int64_t)s2_h2;
+    int64_t prod3 =3D (int64_t)s1_h3 * (int64_t)s2_h3;
+    rd =3D (uint64_t)(prod0 + prod1 + prod2 + prod3);
+    return rd;
+}
+
+/**
+ * PM4ADDU.H - Add four products horizontally (unsigned, RV64 only)
+ */
+uint64_t HELPER(pm4addu_h)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd =3D 0;
+    uint16_t s1_h0 =3D EXTRACT16(rs1, 0);
+    uint16_t s1_h1 =3D EXTRACT16(rs1, 1);
+    uint16_t s1_h2 =3D EXTRACT16(rs1, 2);
+    uint16_t s1_h3 =3D EXTRACT16(rs1, 3);
+    uint16_t s2_h0 =3D EXTRACT16(rs2, 0);
+    uint16_t s2_h1 =3D EXTRACT16(rs2, 1);
+    uint16_t s2_h2 =3D EXTRACT16(rs2, 2);
+    uint16_t s2_h3 =3D EXTRACT16(rs2, 3);
+    uint64_t prod0 =3D (uint64_t)s1_h0 * (uint64_t)s2_h0;
+    uint64_t prod1 =3D (uint64_t)s1_h1 * (uint64_t)s2_h1;
+    uint64_t prod2 =3D (uint64_t)s1_h2 * (uint64_t)s2_h2;
+    uint64_t prod3 =3D (uint64_t)s1_h3 * (uint64_t)s2_h3;
+    rd =3D prod0 + prod1 + prod2 + prod3;
+    return rd;
+}
+
+/**
+ * PM4ADDA.H - Add four products horizontally with accumulate (RV64 only)
+ */
+uint64_t HELPER(pm4adda_h)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s1_h2 =3D (int16_t)EXTRACT16(rs1, 2);
+    int16_t s1_h3 =3D (int16_t)EXTRACT16(rs1, 3);
+    int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int16_t s2_h2 =3D (int16_t)EXTRACT16(rs2, 2);
+    int16_t s2_h3 =3D (int16_t)EXTRACT16(rs2, 3);
+    int64_t d =3D (int64_t)dest;
+    int64_t prod0 =3D (int64_t)s1_h0 * (int64_t)s2_h0;
+    int64_t prod1 =3D (int64_t)s1_h1 * (int64_t)s2_h1;
+    int64_t prod2 =3D (int64_t)s1_h2 * (int64_t)s2_h2;
+    int64_t prod3 =3D (int64_t)s1_h3 * (int64_t)s2_h3;
+    return (uint64_t)(d + prod0 + prod1 + prod2 + prod3);
+}
+
+/**
+ * PM4ADDASU.H - Add four products horizontally with accumulate
+ * (signed x unsigned, RV64 only)
+ */
+uint64_t HELPER(pm4addasu_h)(CPURISCVState *env, uint64_t rs1,
+                             uint64_t rs2, uint64_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s1_h2 =3D (int16_t)EXTRACT16(rs1, 2);
+    int16_t s1_h3 =3D (int16_t)EXTRACT16(rs1, 3);
+    uint16_t s2_h0 =3D EXTRACT16(rs2, 0);
+    uint16_t s2_h1 =3D EXTRACT16(rs2, 1);
+    uint16_t s2_h2 =3D EXTRACT16(rs2, 2);
+    uint16_t s2_h3 =3D EXTRACT16(rs2, 3);
+    int64_t d =3D (int64_t)dest;
+    int64_t prod0 =3D (int64_t)s1_h0 * (int64_t)s2_h0;
+    int64_t prod1 =3D (int64_t)s1_h1 * (int64_t)s2_h1;
+    int64_t prod2 =3D (int64_t)s1_h2 * (int64_t)s2_h2;
+    int64_t prod3 =3D (int64_t)s1_h3 * (int64_t)s2_h3;
+    return (uint64_t)(d + prod0 + prod1 + prod2 + prod3);
+}
+
+/**
+ * PM4ADDAU.H - Add four products horizontally with accumulate
+ * (unsigned, RV64 only)
+ */
+uint64_t HELPER(pm4addau_h)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    uint16_t s1_h0 =3D EXTRACT16(rs1, 0);
+    uint16_t s1_h1 =3D EXTRACT16(rs1, 1);
+    uint16_t s1_h2 =3D EXTRACT16(rs1, 2);
+    uint16_t s1_h3 =3D EXTRACT16(rs1, 3);
+    uint16_t s2_h0 =3D EXTRACT16(rs2, 0);
+    uint16_t s2_h1 =3D EXTRACT16(rs2, 1);
+    uint16_t s2_h2 =3D EXTRACT16(rs2, 2);
+    uint16_t s2_h3 =3D EXTRACT16(rs2, 3);
+    uint64_t d =3D dest;
+    uint64_t prod0 =3D (uint64_t)s1_h0 * (uint64_t)s2_h0;
+    uint64_t prod1 =3D (uint64_t)s1_h1 * (uint64_t)s2_h1;
+    uint64_t prod2 =3D (uint64_t)s1_h2 * (uint64_t)s2_h2;
+    uint64_t prod3 =3D (uint64_t)s1_h3 * (uint64_t)s2_h3;
+    return d + prod0 + prod1 + prod2 + prod3;
+}
--=20
2.34.1
From nobody Sat May 30 20:13:16 2026
Delivered-To: importer@patchew.org
Authentication-Results: mx.zohomail.com;
	spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as
 permitted sender)
  smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org
Return-Path: <qemu-devel-bounces+importer=patchew.org@nongnu.org>
Received: from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17]) by
 mx.zohomail.com
	with SMTPS id 1776422920325306.8366230748461;
 Fri, 17 Apr 2026 03:48:40 -0700 (PDT)
Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists1p.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <qemu-devel-bounces@nongnu.org>)
	id 1wDgjm-0001P9-A8; Fri, 17 Apr 2026 06:47:50 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <xiaoou@iscas.ac.cn>)
 id 1wDgjk-0001OZ-4O; Fri, 17 Apr 2026 06:47:48 -0400
Received: from smtp21.cstnet.cn ([159.226.251.21] helo=cstnet.cn)
 by eggs.gnu.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256)
 (Exim 4.90_1) (envelope-from <xiaoou@iscas.ac.cn>)
 id 1wDgjh-000827-W2; Fri, 17 Apr 2026 06:47:47 -0400
Received: from Huawei.localdomain (unknown [36.110.52.2])
 by APP-01 (Coremail) with SMTP id qwCowAB3H2ulD+JpLDmSDQ--.804S14;
 Fri, 17 Apr 2026 18:47:22 +0800 (CST)
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 12/14] target/riscv: rvp: add load and replicate instructions
Date: Fri, 17 Apr 2026 18:46:49 +0800
Message-Id: <20260417104652.17857-13-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

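Add the packed SIMD load and replicate instructions PLI.B, PLI.H,
PLUI.H, PLI.W and PLUI.W. Each one writes its immediate, replicated
into every element, to rd. The replicated constant is computed at
translation time, so no helper is needed. PLI.W and PLUI.W are
RV64 only.

Add EX_SH(6) and EX_SH(22) for the PLUI.H and PLUI.W immediates.

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>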
---
 target/riscv/insn32.decode              | 16 ++++++
 target/riscv/insn_trans/trans_rvp.c.inc | 67 +++++++++++++++++++++++++
 target/riscv/translate.c                |  2 +
 3 files changed, 85 insertions(+)
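
The translators in trans_rvp.c.inc below all build the same kind of
constant: the immediate replicated into every element of rd. A minimal
standalone sketch of that computation follows. It is an illustration
only, under these assumptions: replicate_imm() is a hypothetical name
that does not appear in the patch, and the register width is passed in
explicitly instead of being taken from TARGET_LONG_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): replicate the low
 * elem_bits bits of imm into every element of a reg_bits-wide value.
 * elem_bits is expected to be 8, 16 or 32; reg_bits 32 or 64.
 */
static uint64_t replicate_imm(int64_t imm, unsigned elem_bits,
                              unsigned reg_bits)
{
    uint64_t mask, elem, out = 0;

    assert(elem_bits > 0 && elem_bits < 64);
    assert(reg_bits <= 64 && reg_bits % elem_bits == 0);

    /* Keep only one element's worth of bits, dropping sign extension. */
    mask = (UINT64_C(1) << elem_bits) - 1;
    elem = (uint64_t)imm & mask;

    /* Place a copy of the element at every element position. */
    for (unsigned pos = 0; pos < reg_bits; pos += elem_bits) {
        out |= elem << pos;
    }
    return out;
}
```

With this sketch, replicate_imm(0x5A, 8, 32) gives 0x5A5A5A5A, and
replicate_imm(-2, 16, 64) gives 0xFFFEFFFEFFFEFFFE, because only the
low 16 bits of the sign-extended immediate are copied into each
halfword.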

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index ebfbf8c799..b1bde37de4 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -44,6 +44,10 @@
 %imm_p_ui16 20:4
 %imm_p_ui32 20:5
 %imm_p_ui64 20:6
+%imm_p_l1  16:8
+%imm_p_l2  15:s1 16:9
+%imm_p_l3  15:s9 24:1              !function=3Dex_shift_6
+%imm_p_l4  15:s9 24:1              !function=3Dex_shift_22
=20
 # Argument sets:
 &empty
@@ -64,6 +68,7 @@
 &k_aes     shamt rs2 rs1 rd
 &mop5 imm rd rs1
 &mop3 imm rd rs1 rs2
+&p_l  imm rd
=20
 # Formats 32:
 @r       .......   ..... ..... ... ..... ....... &r                %rs2 %r=
s1 %rd
@@ -113,6 +118,10 @@
 @p_ui16 ..... .... ... ..... ... ..... ....... &i imm=3D%imm_p_ui16 %rs1 %=
rd
 @p_ui32 ..... .... ... ..... ... ..... ....... &i imm=3D%imm_p_ui32 %rs1 %=
rd
 @p_ui64 ..... .... ... ..... ... ..... ....... &i imm=3D%imm_p_ui64 %rs1 %=
rd
+@p_l1  ........ ........ .... ..... ....... &p_l      imm=3D%imm_p_l1     =
    %rd
+@p_l2  ....... .......... ... ..... ....... &p_l      imm=3D%imm_p_l2     =
    %rd
+@p_l3  ....... .......... ... ..... ....... &p_l      imm=3D%imm_p_l3     =
    %rd
+@p_l4  ....... .......... ... ..... ....... &p_l      imm=3D%imm_p_l4     =
    %rd
=20
 # Formats 64:
 @sh5     .......  ..... .....  ... ..... ....... &shift  shamt=3D%sh5     =
 %rs1 %rd
@@ -1596,3 +1605,10 @@ pm4addu_h       10100 11 ..... ..... 101 ..... 01110=
11 @r
 pm4adda_h       10001 11 ..... ..... 101 ..... 0111011 @r
 pm4addasu_h     11101 11 ..... ..... 101 ..... 0111011 @r
 pm4addau_h      10101 11 ..... ..... 101 ..... 0111011 @r
+
+# Packed SIMD - Load and Replicate instructions
+pli_b    10110100 ........ 0010 ..... 0011011 @p_l1
+pli_h    1011000 .......... 010 ..... 0011011 @p_l2
+plui_h   1111000 .......... 010 ..... 0011011 @p_l3
+pli_w    1011001 ..... ..... 010 ..... 0011011 @p_l2
+plui_w   1111001 ..... ..... 010 ..... 0011011 @p_l4
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr=
ans/trans_rvp.c.inc
index 86071d71f7..b82774e00f 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -906,3 +906,70 @@ GEN_SIMD_TRANS_64(pm4addu_h)
 GEN_SIMD_TRANS_ACC_64(pm4adda_h)
 GEN_SIMD_TRANS_ACC_64(pm4addasu_h)
 GEN_SIMD_TRANS_ACC_64(pm4addau_h)
+
+static bool trans_pli_b(DisasContext *ctx, arg_pli_b *a)
+{
+    REQUIRE_EXT(ctx, RVP);
+    int i =3D 1;
+    target_long imm =3D a->imm;
+    while (i < TARGET_LONG_SIZE) {
+        imm =3D ((imm << 8) + a->imm);
+        i++;
+    }
+    gen_set_gpri(ctx, a->rd, imm);
+    return true;
+}
+
+static bool trans_pli_h(DisasContext *ctx, arg_pli_h *a)
+{
+    REQUIRE_EXT(ctx, RVP);
+    int i =3D 1;
+    target_long imm =3D a->imm;
+    while (i < TARGET_LONG_SIZE / 2) {
+        imm =3D (imm << 16) + (a->imm & 0xFFFF);
+        i++;
+    }
+    gen_set_gpri(ctx, a->rd, imm);
+    return true;
+}
+
+static bool trans_plui_h(DisasContext *ctx, arg_plui_h *a)
+{
+    REQUIRE_EXT(ctx, RVP);
+    int i =3D 1;
+    target_long imm =3D a->imm;
+    while (i < TARGET_LONG_SIZE / 2) {
+        imm =3D (imm << 16) + (a->imm & 0xFFFF);
+        i++;
+    }
+    gen_set_gpri(ctx, a->rd, imm);
+    return true;
+}
+
+static bool trans_pli_w(DisasContext *ctx, arg_pli_w *a)
+{
+    REQUIRE_64BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    int i =3D 1;
+    int64_t imm =3D a->imm;
+    while (i < TARGET_LONG_SIZE / 4) {
+        imm =3D (imm << 32) + (a->imm & 0xFFFFFFFF);
+        i++;
+    }
+    gen_set_gpri(ctx, a->rd, imm);
+    return true;
+}
+
+static bool trans_plui_w(DisasContext *ctx, arg_plui_w *a)
+{
+    REQUIRE_64BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    int i =3D 1;
+    int64_t imm =3D a->imm;
+    while (i < TARGET_LONG_SIZE / 4) {
+        imm =3D (imm << 32) + (a->imm & 0xFFFFFFFF);
+        i++;
+    }
+    gen_set_gpri(ctx, a->rd, imm);
+    return true;
+}
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index de3ec7a7ec..04efc7aced 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -796,7 +796,9 @@ EX_SH(1)
 EX_SH(2)
 EX_SH(3)
 EX_SH(4)
+EX_SH(6)
 EX_SH(12)
+EX_SH(22)
=20
 #define REQUIRE_EXT(ctx, ext) do { \
     if (!has_ext(ctx, ext)) {      \
--=20
2.34.1
From nobody Sat May 30 20:13:16 2026
Delivered-To: importer@patchew.org
Authentication-Results: mx.zohomail.com;
	spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as
 permitted sender)
  smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org
Return-Path: <qemu-devel-bounces+importer=patchew.org@nongnu.org>
Received: from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17]) by
 mx.zohomail.com
	with SMTPS id 1776422938295955.9618840338967;
 Fri, 17 Apr 2026 03:48:58 -0700 (PDT)
Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists1p.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <qemu-devel-bounces@nongnu.org>)
	id 1wDgjs-0001Xw-9Y; Fri, 17 Apr 2026 06:47:56 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <xiaoou@iscas.ac.cn>)
 id 1wDgjp-0001Uw-9F; Fri, 17 Apr 2026 06:47:53 -0400
Received: from smtp21.cstnet.cn ([159.226.251.21] helo=cstnet.cn)
 by eggs.gnu.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256)
 (Exim 4.90_1) (envelope-from <xiaoou@iscas.ac.cn>)
 id 1wDgjj-00083b-QA; Fri, 17 Apr 2026 06:47:52 -0400
Received: from Huawei.localdomain (unknown [36.110.52.2])
 by APP-01 (Coremail) with SMTP id qwCowAB3H2ulD+JpLDmSDQ--.804S15;
 Fri, 17 Apr 2026 18:47:23 +0800 (CST)
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 13/14] target/riscv: rvp: add rv32-only register-pair
 instructions
Date: Fri, 17 Apr 2026 18:46:50 +0800
Message-Id: <20260417104652.17857-14-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

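Add the RV32-only packed SIMD instructions that read or write a
64-bit value held in a register pair, including the widening add,
subtract, shift and multiply operations and the narrowing shift
and clip operations.

Add the 4-bit %rd_p, %rs1_p and %rs2_p register-pair fields to
insn32.decode, together with the formats that use them.

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>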
---
 target/riscv/helper.h                   |  131 ++
 target/riscv/insn32.decode              |  279 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  786 ++++++++-
 target/riscv/psimd_helper.c             | 2068 +++++++++++++++++++++++
 4 files changed, 3220 insertions(+), 44 deletions(-)
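
The insn32.decode hunk below adds three 4-bit fields, %rd_p 8:4,
%rs1_p 16:4 and %rs2_p 21:4. In decodetree notation "pos:len" is len
bits starting at bit pos, so each of these is the usual 5-bit register
field with its lowest bit dropped. A small standalone sketch of the
extraction follows. The function names are hypothetical and chosen for
illustration; how the translator turns the 4-bit value into a register
number is not assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Extract len bits of insn starting at bit pos (decodetree "pos:len"). */
static uint32_t field(uint32_t insn, unsigned pos, unsigned len)
{
    assert(len > 0 && len < 32 && pos + len <= 32);
    return (insn >> pos) & ((UINT32_C(1) << len) - 1);
}

/* Standard 5-bit destination field, %rd 7:5. */
static uint32_t rd_full(uint32_t insn)
{
    return field(insn, 7, 5);
}

/* Register-pair fields added by this patch (hypothetical names). */
static uint32_t rd_pair(uint32_t insn)
{
    return field(insn, 8, 4);
}

static uint32_t rs1_pair(uint32_t insn)
{
    return field(insn, 16, 4);
}

static uint32_t rs2_pair(uint32_t insn)
{
    return field(insn, 21, 4);
}
```

For an instruction word whose 5-bit rd field holds 12, rd_pair()
returns 6, which is the 5-bit value shifted right by one bit.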

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 663ac0e242..85d4fe1b67 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1737,3 +1737,134 @@ DEF_HELPER_3(pm4addu_h, i64, env, i64, i64)
 DEF_HELPER_4(pm4adda_h, i64, env, i64, i64, i64)
 DEF_HELPER_4(pm4addasu_h, i64, env, i64, i64, i64)
 DEF_HELPER_4(pm4addau_h, i64, env, i64, i64, i64)
+
+/* Packed SIMD - Double-Width Operations (RV32 only, register pairs) */
+DEF_HELPER_3(pwadd_b, i64, env, i32, i32)
+DEF_HELPER_4(pwadda_b, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwaddu_b, i64, env, i32, i32)
+DEF_HELPER_4(pwaddau_b, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwsub_b, i64, env, i32, i32)
+DEF_HELPER_4(pwsuba_b, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwsubu_b, i64, env, i32, i32)
+DEF_HELPER_4(pwsubau_b, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwslli_b, i64, env, i32, i32)
+DEF_HELPER_3(pwsll_bs, i64, env, i32, i32)
+DEF_HELPER_3(pwslai_b, i64, env, i32, i32)
+DEF_HELPER_3(pwsla_bs, i64, env, i32, i32)
+
+DEF_HELPER_3(pwadd_h, i64, env, i32, i32)
+DEF_HELPER_4(pwadda_h, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwaddu_h, i64, env, i32, i32)
+DEF_HELPER_4(pwaddau_h, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwsub_h, i64, env, i32, i32)
+DEF_HELPER_4(pwsuba_h, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwsubu_h, i64, env, i32, i32)
+DEF_HELPER_4(pwsubau_h, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwslli_h, i64, env, i32, i32)
+DEF_HELPER_3(pwsll_hs, i64, env, i32, i32)
+DEF_HELPER_3(pwslai_h, i64, env, i32, i32)
+DEF_HELPER_3(pwsla_hs, i64, env, i32, i32)
+
+DEF_HELPER_3(wadd, i64, env, i32, i32)
+DEF_HELPER_4(wadda, i64, env, i32, i32, i64)
+DEF_HELPER_3(waddu, i64, env, i32, i32)
+DEF_HELPER_4(waddau, i64, env, i32, i32, i64)
+DEF_HELPER_3(wsub, i64, env, i32, i32)
+DEF_HELPER_4(wsuba, i64, env, i32, i32, i64)
+DEF_HELPER_3(wsubu, i64, env, i32, i32)
+DEF_HELPER_4(wsubau, i64, env, i32, i32, i64)
+DEF_HELPER_3(wslli, i64, env, i32, i32)
+DEF_HELPER_3(wsll, i64, env, i32, i32)
+DEF_HELPER_3(wslai, i64, env, i32, i32)
+DEF_HELPER_3(wsla, i64, env, i32, i32)
+
+DEF_HELPER_3(wzip8p, i64, env, i32, i32)
+DEF_HELPER_3(wzip16p, i64, env, i32, i32)
+
+DEF_HELPER_4(predsum_dbs, i32, env, i32, i32, i32)
+DEF_HELPER_4(predsumu_dbs, i32, env, i32, i32, i32)
+DEF_HELPER_4(predsum_dhs, i32, env, i32, i32, i32)
+DEF_HELPER_4(predsumu_dhs, i32, env, i32, i32, i32)
+
+DEF_HELPER_3(pnsrli_b, i32, env, i64, i32)
+DEF_HELPER_3(pnsrai_b, i32, env, i64, i32)
+DEF_HELPER_3(pnsrari_b, i32, env, i64, i32)
+DEF_HELPER_3(pnclipi_b, i32, env, i64, i32)
+DEF_HELPER_3(pnclipri_b, i32, env, i64, i32)
+DEF_HELPER_3(pnclipiu_b, i32, env, i64, i32)
+DEF_HELPER_3(pnclipriu_b, i32, env, i64, i32)
+DEF_HELPER_3(pnsrl_bs, i32, env, i64, i32)
+DEF_HELPER_3(pnsra_bs, i32, env, i64, i32)
+DEF_HELPER_3(pnsrar_bs, i32, env, i64, i32)
+DEF_HELPER_3(pnclip_bs, i32, env, i64, i32)
+DEF_HELPER_3(pnclipr_bs, i32, env, i64, i32)
+DEF_HELPER_3(pnclipu_bs, i32, env, i64, i32)
+DEF_HELPER_3(pnclipru_bs, i32, env, i64, i32)
+
+DEF_HELPER_3(pnsrli_h, i32, env, i64, i32)
+DEF_HELPER_3(pnsrai_h, i32, env, i64, i32)
+DEF_HELPER_3(pnsrari_h, i32, env, i64, i32)
+DEF_HELPER_3(pnclipi_h, i32, env, i64, i32)
+DEF_HELPER_3(pnclipri_h, i32, env, i64, i32)
+DEF_HELPER_3(pnclipiu_h, i32, env, i64, i32)
+DEF_HELPER_3(pnclipriu_h, i32, env, i64, i32)
+DEF_HELPER_3(pnsrl_hs, i32, env, i64, i32)
+DEF_HELPER_3(pnsra_hs, i32, env, i64, i32)
+DEF_HELPER_3(pnsrar_hs, i32, env, i64, i32)
+DEF_HELPER_3(pnclip_hs, i32, env, i64, i32)
+DEF_HELPER_3(pnclipr_hs, i32, env, i64, i32)
+DEF_HELPER_3(pnclipu_hs, i32, env, i64, i32)
+DEF_HELPER_3(pnclipru_hs, i32, env, i64, i32)
+
+DEF_HELPER_3(nsrli, i32, env, i64, i32)
+DEF_HELPER_3(nsrai, i32, env, i64, i32)
+DEF_HELPER_3(nsrari, i32, env, i64, i32)
+DEF_HELPER_3(nclipi, i32, env, i64, i32)
+DEF_HELPER_3(nclipri, i32, env, i64, i32)
+DEF_HELPER_3(nclipiu, i32, env, i64, i32)
+DEF_HELPER_3(nclipriu, i32, env, i64, i32)
+DEF_HELPER_3(nsrl, i32, env, i64, i32)
+DEF_HELPER_3(nsra, i32, env, i64, i32)
+DEF_HELPER_3(nsrar, i32, env, i64, i32)
+DEF_HELPER_3(nclip, i32, env, i64, i32)
+DEF_HELPER_3(nclipr, i32, env, i64, i32)
+DEF_HELPER_3(nclipu, i32, env, i64, i32)
+DEF_HELPER_3(nclipru, i32, env, i64, i32)
+
+DEF_HELPER_4(pmqwacc_h, i64, env, i32, i32, i64)
+DEF_HELPER_4(pmqrwacc_h, i64, env, i32, i32, i64)
+DEF_HELPER_4(mqwacc, i64, env, i32, i32, i64)
+DEF_HELPER_4(mqrwacc, i64, env, i32, i32, i64)
+
+DEF_HELPER_3(pwmul_b, i64, env, i32, i32)
+DEF_HELPER_3(pwmulsu_b, i64, env, i32, i32)
+DEF_HELPER_3(pwmulu_b, i64, env, i32, i32)
+DEF_HELPER_3(pwmul_h, i64, env, i32, i32)
+DEF_HELPER_3(pwmulsu_h, i64, env, i32, i32)
+DEF_HELPER_3(pwmulu_h, i64, env, i32, i32)
+
+DEF_HELPER_4(pwmacc_h, i64, env, i32, i32, i64)
+DEF_HELPER_4(pwmaccsu_h, i64, env, i32, i32, i64)
+DEF_HELPER_4(pwmaccu_h, i64, env, i32, i32, i64)
+
+DEF_HELPER_3(wmul, i64, env, i32, i32)
+DEF_HELPER_3(wmulsu, i64, env, i32, i32)
+DEF_HELPER_3(wmulu, i64, env, i32, i32)
+
+DEF_HELPER_4(wmacc, i64, env, i32, i32, i64)
+DEF_HELPER_4(wmaccsu, i64, env, i32, i32, i64)
+DEF_HELPER_4(wmaccu, i64, env, i32, i32, i64)
+
+DEF_HELPER_3(pm2wadd_h, i64, env, i32, i32)
+DEF_HELPER_3(pm2waddsu_h, i64, env, i32, i32)
+DEF_HELPER_3(pm2waddu_h, i64, env, i32, i32)
+DEF_HELPER_3(pm2wadd_hx, i64, env, i32, i32)
+DEF_HELPER_4(pm2wadda_h, i64, env, i32, i32, i64)
+DEF_HELPER_4(pm2waddasu_h, i64, env, i32, i32, i64)
+DEF_HELPER_4(pm2waddau_h, i64, env, i32, i32, i64)
+DEF_HELPER_4(pm2wadda_hx, i64, env, i32, i32, i64)
+
+DEF_HELPER_3(pm2wsub_h, i64, env, i32, i32)
+DEF_HELPER_3(pm2wsub_hx, i64, env, i32, i32)
+DEF_HELPER_4(pm2wsuba_h, i64, env, i32, i32, i64)
+DEF_HELPER_4(pm2wsuba_hx, i64, env, i32, i32, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b1bde37de4..7be0b9e5e6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -23,6 +23,9 @@
 %rd        7:5
 %sh5       20:5
 %sh6       20:6
+%rs2_p     21:4
+%rs1_p     16:4
+%rd_p      8:4
=20
 %sh7    20:7
 %csr    20:12
@@ -69,6 +72,7 @@
 &mop5 imm rd rs1
 &mop3 imm rd rs1 rs2
 &p_l  imm rd
+&p_ui imm rs1 rd
=20
 # Formats 32:
 @r       .......   ..... ..... ... ..... ....... &r                %rs2 %r=
s1 %rd
@@ -101,6 +105,11 @@
 @r2_zimm11 . zimm:11  ..... ... ..... ....... %rs1 %rd
 @r2_zimm10 .. zimm:10  ..... ... ..... ....... %rs1 %rd
 @r2_s    .......   ..... ..... ... ..... ....... %rs2 %rs1
+@r_p_1       .......   ..... ..... ... ..... ....... &r    %rs2 %rs1 rd=3D=
%rd_p
+@r_p_2     .......   ..... ..... ... ..... ....... &r    rs2=3D%rs2_p rs1=
=3D%rs1_p rd=3D%rd_p
+@r_p_3     .......   ..... ..... ... ..... ....... &r    %rs2 rs1=3D%rs1_p=
 rd=3D%rd_p
+@r_p_4     .......   ..... ..... ... ..... ....... &r    %rs2 rs1=3D%rs1_p=
 %rd
+@r2_p      .......   ..... ..... ... ..... ....... &r2   rs1=3D%rs1_p rd=
=3D%rd_p
=20
 @hfence_gvma ....... ..... .....   ... ..... ....... %rs2 %rs1
 @hfence_vvma ....... ..... .....   ... ..... ....... %rs2 %rs1
@@ -122,6 +131,18 @@
 @p_l2  ....... .......... ... ..... ....... &p_l      imm=3D%imm_p_l2     =
    %rd
 @p_l3  ....... .......... ... ..... ....... &p_l      imm=3D%imm_p_l3     =
    %rd
 @p_l4  ....... .......... ... ..... ....... &p_l      imm=3D%imm_p_l4     =
    %rd
+@p_l1_p  ........ ........ .... ..... ....... &p_l    imm=3D%imm_p_l1     =
    rd=3D%rd_p
+@p_l2_p  ........ ........ .... ..... ....... &p_l    imm=3D%imm_p_l2     =
    rd=3D%rd_p
+@p_l3_p  ....... .......... ... ..... ....... &p_l    imm=3D%imm_p_l3     =
    rd=3D%rd_p
+@p_ui8_p ..... .... ... ..... ... ..... .......  &i imm=3D%imm_p_ui8 rs1=
=3D%rs1_p rd=3D%rd_p
+@p_ui16_p ..... .... ... ..... ... ..... ....... &p_ui imm=3D%imm_p_ui16 %=
rs1 rd=3D%rd_p
+@p_ui16_p_2 ..... .... ... ..... ... ..... ....... &p_ui imm=3D%imm_p_ui16=
 rs1=3D%rs1_p rd=3D%rd_p
+@p_ui16_p_3 ..... .... ... .... .... ..... ....... &p_ui imm=3D%imm_p_ui16=
 rs1=3D%rs1_p %rd
+@p_ui32_p ..... .... ... ..... ... ..... ....... &p_ui imm=3D%imm_p_ui32 %=
rs1 rd=3D%rd_p
+@p_ui32_p_2 ..... .... ... ..... ... ..... ....... &p_ui imm=3D%imm_p_ui32=
 rs1=3D%rs1_p rd=3D%rd_p
+@p_ui32_p_3 ..... .... ... ..... ... ..... ....... &p_ui imm=3D%imm_p_ui32=
 rs1=3D%rs1_p %rd
+@p_ui64_p ..... .... ... ..... ... ..... ....... &p_ui imm=3D%imm_p_ui64 %=
rs1 rd=3D%rd_p
+@p_ui64_p_2 ..... .... ... ..... ... ..... ....... &p_ui imm=3D%imm_p_ui64=
 rs1=3D%rs1_p %rd
=20
 # Formats 64:
 @sh5     .......  ..... .....  ... ..... ....... &shift  shamt=3D%sh5     =
 %rs1 %rd
@@ -1612,3 +1633,261 @@ pli_h    1011000 .......... 010 ..... 0011011 @p_l2
 plui_h   1111000 .......... 010 ..... 0011011 @p_l3
 pli_w    1011001 ..... ..... 010 ..... 0011011 @p_l2
 plui_w   1111001 ..... ..... 010 ..... 0011011 @p_l4
+
+# Packed SIMD - Double-Width Operations (RV32 only, register pairs)
+# register-pair destination
+pwadd_b    0000010 ..... ..... 010 .... 10011011 @r_p_1
+pwadda_b   0000110 ..... ..... 010 .... 10011011 @r_p_1
+pwaddu_b   0001010 ..... ..... 010 .... 10011011 @r_p_1
+pwaddau_b  0001110 ..... ..... 010 .... 10011011 @r_p_1
+pwsub_b    0100010 ..... ..... 010 .... 10011011 @r_p_1
+pwsuba_b   0100110 ..... ..... 010 .... 10011011 @r_p_1
+pwsubu_b   0101010 ..... ..... 010 .... 10011011 @r_p_1
+pwsubau_b  0101110 ..... ..... 010 .... 10011011 @r_p_1
+pwslli_b   00000 001.... ..... 010 .... 00011011 @p_ui16_p
+pwsll_bs   0000100 ..... ..... 010 .... 00011011 @r_p_1
+pwslai_b   01000 001.... ..... 010 .... 00011011 @p_ui16_p
+pwsla_bs   0100100 ..... ..... 010 .... 00011011 @r_p_1
+
+pwadd_h    0000000 ..... ..... 010 .... 10011011 @r_p_1
+pwadda_h   0000100 ..... ..... 010 .... 10011011 @r_p_1
+pwaddu_h   0001000 ..... ..... 010 .... 10011011 @r_p_1
+pwaddau_h  0001100 ..... ..... 010 .... 10011011 @r_p_1
+pwsub_h    0100000 ..... ..... 010 .... 10011011 @r_p_1
+pwsuba_h   0100100 ..... ..... 010 .... 10011011 @r_p_1
+pwsubu_h   0101000 ..... ..... 010 .... 10011011 @r_p_1
+pwsubau_h  0101100 ..... ..... 010 .... 10011011 @r_p_1
+pwslli_h   00000 01..... ..... 010 .... 00011011 @p_ui32_p
+pwsll_hs   0000101 ..... ..... 010 .... 00011011 @r_p_1
+pwslai_h   01000 01..... ..... 010 .... 00011011 @p_ui32_p
+pwsla_hs   0100101 ..... ..... 010 .... 00011011 @r_p_1
+
+wadd    0000001 ..... ..... 010 .... 10011011 @r_p_1
+wadda   0000101 ..... ..... 010 .... 10011011 @r_p_1
+waddu   0001001 ..... ..... 010 .... 10011011 @r_p_1
+waddau  0001101 ..... ..... 010 .... 10011011 @r_p_1
+wsub    0100001 ..... ..... 010 .... 10011011 @r_p_1
+wsuba   0100101 ..... ..... 010 .... 10011011 @r_p_1
+wsubu   0101001 ..... ..... 010 .... 10011011 @r_p_1
+wsubau  0101101 ..... ..... 010 .... 10011011 @r_p_1
+wslli   00000 1...... ..... 010 .... 00011011 @p_ui64_p
+wsll    0000111 ..... ..... 010 .... 00011011 @r_p_1
+wslai   01000 1...... ..... 010 .... 00011011 @p_ui64_p
+wsla    0100111 ..... ..... 010 .... 00011011 @r_p_1
+
+wzip8p    0111100 ..... ..... 010 .... 00011011 @r_p_1
+wzip16p   0111101 ..... ..... 010 .... 00011011 @r_p_1
+
+# register-pair operands
+pli_db    00110100 ........ 0010 .... 00011011 @p_l1_p
+padd_db   1000010 .... 0 .... 0110 .... 00011011 @r_p_2
+psub_db   1100010 .... 0 .... 0110 .... 00011011 @r_p_2
+psadd_db  1001010 .... 0 .... 0110 .... 00011011 @r_p_2
+psaddu_db    1011010 .... 0 .... 0110 .... 00011011 @r_p_2
+pssub_db     1101010 .... 0 .... 0110 .... 00011011 @r_p_2
+pssubu_db    1111010 .... 0 .... 0110 .... 00011011 @r_p_2
+paadd_db     1001110 .... 0 .... 0110 .... 00011011 @r_p_2
+paaddu_db    1011110 .... 0 .... 0110 .... 00011011 @r_p_2
+pasub_db     1101110 .... 0 .... 0110 .... 00011011 @r_p_2
+pasubu_db    1111110 .... 0 .... 0110 .... 00011011 @r_p_2
+pabd_db      1100110 .... 0 .... 0110 .... 00011011 @r_p_2
+pabdu_db     1110110 .... 0 .... 0110 .... 00011011 @r_p_2
+psabs_db     0110010 00111 .... 0110 .... 00011011 @r2_p
+pli_dh    0011000 .......... 010 .... 00011011 @p_l2_p
+plui_dh   0111000 .......... 010 .... 00011011 @p_l3_p
+padd_dh   1000000 .... 0 .... 0110 .... 00011011 @r_p_2
+psub_dh   1100000 .... 0 .... 0110 .... 00011011 @r_p_2
+psadd_dh  1001000 .... 0 .... 0110 .... 00011011 @r_p_2
+psaddu_dh 1011000 .... 0 .... 0110 .... 00011011 @r_p_2
+pssub_dh  1101000 .... 0 .... 0110 .... 00011011 @r_p_2
+pssubu_dh 1111000 .... 0 .... 0110 .... 00011011 @r_p_2
+paadd_dh  1001100 .... 0 .... 0110 .... 00011011 @r_p_2
+paaddu_dh 1011100 .... 0 .... 0110 .... 00011011 @r_p_2
+pasub_dh  1101100 .... 0 .... 0110 .... 00011011 @r_p_2
+pasubu_dh    1111100 .... 0 .... 0110 .... 00011011 @r_p_2
+psh1add_dh   1010000 .... 1 .... 0110 .... 00011011 @r_p_2
+pssh1sadd_dh 1011000 .... 1 .... 0110 .... 00011011 @r_p_2
+pas_dhx    1000000 .... 1 .... 1110 .... 00011011 @r_p_2
+psa_dhx    1000010 .... 1 .... 1110 .... 00011011 @r_p_2
+psas_dhx   1001000 .... 1 .... 1110 .... 00011011 @r_p_2
+pssa_dhx   1001010 .... 1 .... 1110 .... 00011011 @r_p_2
+paas_dhx   1001100 .... 1 .... 1110 .... 00011011 @r_p_2
+pasa_dhx   1001110 .... 1 .... 1110 .... 00011011 @r_p_2
+pabd_dh    1100100 .... 0 .... 0110 .... 00011011 @r_p_2
+pabdu_dh   1110100 .... 0 .... 0110 .... 00011011 @r_p_2
+psabs_dh   0110000 00111 .... 0110 .... 00011011 @r2_p
+padd_dw    1000001 .... 0 .... 0110 .... 00011011 @r_p_2
+psub_dw    1100001 .... 0 .... 0110 .... 00011011 @r_p_2
+psadd_dw   1001001 .... 0 .... 0110 .... 00011011 @r_p_2
+psaddu_dw  1011001 .... 0 .... 0110 .... 00011011 @r_p_2
+pssub_dw   1101001 .... 0 .... 0110 .... 00011011 @r_p_2
+pssubu_dw  1111001 .... 0 .... 0110 .... 00011011 @r_p_2
+paadd_dw   1001101 .... 0 .... 0110 .... 00011011 @r_p_2
+paaddu_dw  1011101 .... 0 .... 0110 .... 00011011 @r_p_2
+pasub_dw   1101101 .... 0 .... 0110 .... 00011011 @r_p_2
+pasubu_dw  1111101 .... 0 .... 0110 .... 00011011 @r_p_2
+psh1add_dw 1010001 .... 1 .... 0110 .... 00011011 @r_p_2
+pssh1sadd_dw  1011001 .... 1 .... 0110 .... 00011011 @r_p_2
+addd_p    1000011 .... 0 .... 0110 .... 00011011 @r_p_2
+subd_p    1100011 .... 0 .... 0110 .... 00011011 @r_p_2
+
+# register-pair first source only
+predsum_dbs    0001110 ..... .... 0100 ..... 0011011 @r_p_4
+predsumu_dbs   0011110 ..... .... 0100 ..... 0011011 @r_p_4
+predsum_dhs    0001100 ..... .... 0100 ..... 0011011 @r_p_4
+predsumu_dhs   0011100 ..... .... 0100 ..... 0011011 @r_p_4
+
+# register-pair operands
+pslli_db    00000 0001... .... 0110 .... 00011011 @p_ui8_p
+psrli_db    00000 0001... .... 1110 .... 00011011 @p_ui8_p
+psrai_db    01000 0001... .... 1110 .... 00011011 @p_ui8_p
+pmin_db     1110010 .... 1 .... 1110 .... 00011011 @r_p_2
+pminu_db    1110110 .... 1 .... 1110 .... 00011011 @r_p_2
+pmax_db     1111010 .... 1 .... 1110 .... 00011011 @r_p_2
+pmaxu_db    1111110 .... 1 .... 1110 .... 00011011 @r_p_2
+pmseq_db    1100010 .... 1 .... 1110 .... 00011011 @r_p_2
+pmslt_db    1101010 .... 1 .... 1110 .... 00011011 @r_p_2
+pmsltu_db   1101110 .... 1 .... 1110 .... 00011011 @r_p_2
+psext_dh_b  0110000 00100 .... 0110 .... 00011011 @r2_p
+psati_dh    01100 001.... .... 1110 .... 00011011 @p_ui16_p_2
+pusati_dh   00100 001.... .... 1110 .... 00011011 @p_ui16_p_2
+pslli_dh    00000 001.... .... 0110 .... 00011011 @p_ui16_p_2
+psrli_dh    00000 001.... .... 1110 .... 00011011 @p_ui16_p_2
+psrai_dh    01000 001.... .... 1110 .... 00011011 @p_ui16_p_2
+psslai_dh   01010 001.... .... 0110 .... 00011011 @p_ui16_p_2
+psrari_dh   01010 001.... .... 1110 .... 00011011 @p_ui16_p_2
+pmin_dh     1110000 .... 1 .... 1110 .... 00011011 @r_p_2
+pminu_dh    1110100 .... 1 .... 1110 .... 00011011 @r_p_2
+pmax_dh     1111000 .... 1 .... 1110 .... 00011011 @r_p_2
+pmaxu_dh    1111100 .... 1 .... 1110 .... 00011011 @r_p_2
+pmseq_dh    1100000 .... 1 .... 1110 .... 00011011 @r_p_2
+pmslt_dh    1101000 .... 1 .... 1110 .... 00011011 @r_p_2
+pmsltu_dh   1101100 .... 1 .... 1110 .... 00011011 @r_p_2
+psext_dw_b  0110001 00100 .... 0110 .... 00011011 @r2_p
+psext_dw_h  0110001 00101 .... 0110 .... 00011011 @r2_p
+psati_dw    01100 01..... .... 1110 .... 00011011 @p_ui32_p_2
+pusati_dw   00100 01..... .... 1110 .... 00011011 @p_ui32_p_2
+pslli_dw    00000 01..... .... 0110 .... 00011011 @p_ui32_p_2
+psrli_dw    00000 01..... .... 1110 .... 00011011 @p_ui32_p_2
+psrai_dw    01000 01..... .... 1110 .... 00011011 @p_ui32_p_2
+psslai_dw   01010 01..... .... 0110 .... 00011011 @p_ui32_p_2
+psrari_dw   01010 01..... .... 1110 .... 00011011 @p_ui32_p_2
+pmin_dw    1110001 .... 1 .... 1110 .... 00011011 @r_p_2
+pminu_dw   1110101 .... 1 .... 1110 .... 00011011 @r_p_2
+pmax_dw    1111001 .... 1 .... 1110 .... 00011011 @r_p_2
+pmaxu_dw   1111101 .... 1 .... 1110 .... 00011011 @r_p_2
+pmseq_dw    1100001 .... 1 .... 1110 .... 00011011 @r_p_2
+pmslt_dw    1101001 .... 1 .... 1110 .... 00011011 @r_p_2
+pmsltu_dw   1101101 .... 1 .... 1110 .... 00011011 @r_p_2
+
+# register-pair first source and dest
+padd_dbs    0001110 ..... .... 0110 .... 00011011 @r_p_3
+psll_dbs    0000110 ..... .... 0110 .... 00011011 @r_p_3
+psra_dbs    0100110 ..... .... 1110 .... 00011011 @r_p_3
+padd_dhs    0001100 ..... .... 0110 .... 00011011 @r_p_3
+psll_dhs    0000100 ..... .... 0110 .... 00011011 @r_p_3
+psrl_dhs    0000100 ..... .... 1110 .... 00011011 @r_p_3
+psra_dhs    0100100 ..... .... 1110 .... 00011011 @r_p_3
+pssha_dhs   0110100 ..... .... 0110 .... 00011011 @r_p_3
+psshar_dhs  0111100 ..... .... 0110 .... 00011011 @r_p_3
+padd_dws    0001101 ..... .... 0110 .... 00011011 @r_p_3
+psll_dws    0000101 ..... .... 0110 .... 00011011 @r_p_3
+psrl_dws    0000101 ..... .... 1110 .... 00011011 @r_p_3
+psra_dws    0100101 ..... .... 1110 .... 00011011 @r_p_3
+pssha_dws   0110101 ..... .... 0110 .... 00011011 @r_p_3
+psshar_dws  0111101 ..... .... 0110 .... 00011011 @r_p_3
+
+# register-pair operands
+ppaire_db   1000000 .... 0 .... 1110 .... 00011011 @r_p_2
+ppaireo_db  1001000 .... 0 .... 1110 .... 00011011 @r_p_2
+ppairoe_db  1010000 .... 0 .... 1110 .... 00011011 @r_p_2
+ppairo_db   1011000 .... 0 .... 1110 .... 00011011 @r_p_2
+ppaire_dh   1000001 .... 0 .... 1110 .... 00011011 @r_p_2
+ppaireo_dh  1001001 .... 0 .... 1110 .... 00011011 @r_p_2
+ppairoe_dh  1010001 .... 0 .... 1110 .... 00011011 @r_p_2
+ppairo_dh   1011001 .... 0 .... 1110 .... 00011011 @r_p_2
+
+# register-pair first source only
+pnsrli_b    00000 001.... .... 1100 ..... 0011011 @p_ui16_p_3
+pnsrai_b    01000 001.... .... 1100 ..... 0011011 @p_ui16_p_3
+pnsrari_b   01010 001.... .... 1100 ..... 0011011 @p_ui16_p_3
+pnclipi_b   01100 001.... .... 1100 ..... 0011011 @p_ui16_p_3
+pnclipri_b  01110 001.... .... 1100 ..... 0011011 @p_ui16_p_3
+pnclipiu_b  00100 001.... .... 1100 ..... 0011011 @p_ui16_p_3
+pnclipriu_b 00110 001.... .... 1100 ..... 0011011 @p_ui16_p_3
+pnsrl_bs    00001 00 ..... .... 1100 ..... 0011011 @r_p_4
+pnsra_bs    01001 00 ..... .... 1100 ..... 0011011 @r_p_4
+pnsrar_bs   01011 00 ..... .... 1100 ..... 0011011 @r_p_4
+pnclip_bs   01101 00 ..... .... 1100 ..... 0011011 @r_p_4
+pnclipr_bs  01111 00 ..... .... 1100 ..... 0011011 @r_p_4
+pnclipu_bs  00101 00 ..... .... 1100 ..... 0011011 @r_p_4
+pnclipru_bs 00111 00 ..... .... 1100 ..... 0011011 @r_p_4
+
+pnsrli_h    00000 01..... .... 1100 ..... 0011011 @p_ui32_p_3
+pnsrai_h    01000 01..... .... 1100 ..... 0011011 @p_ui32_p_3
+pnsrari_h   01010 01..... .... 1100 ..... 0011011 @p_ui32_p_3
+pnclipi_h   01100 01..... .... 1100 ..... 0011011 @p_ui32_p_3
+pnclipri_h  01110 01..... .... 1100 ..... 0011011 @p_ui32_p_3
+pnclipiu_h  00100 01..... .... 1100 ..... 0011011 @p_ui32_p_3
+pnclipriu_h 00110 01..... .... 1100 ..... 0011011 @p_ui32_p_3
+pnsrl_hs    00001 01 ..... .... 1100 ..... 0011011 @r_p_4
+pnsra_hs    01001 01 ..... .... 1100 ..... 0011011 @r_p_4
+pnsrar_hs   01011 01 ..... .... 1100 ..... 0011011 @r_p_4
+pnclip_hs   01101 01 ..... .... 1100 ..... 0011011 @r_p_4
+pnclipr_hs  01111 01 ..... .... 1100 ..... 0011011 @r_p_4
+pnclipu_hs  00101 01 ..... .... 1100 ..... 0011011 @r_p_4
+pnclipru_hs 00111 01 ..... .... 1100 ..... 0011011 @r_p_4
+
+nsrli       00000 1...... .... 1100 ..... 0011011 @p_ui64_p_2
+nsrai       01000 1...... .... 1100 ..... 0011011 @p_ui64_p_2
+nsrari      01010 1...... .... 1100 ..... 0011011 @p_ui64_p_2
+nclipi      01100 1...... .... 1100 ..... 0011011 @p_ui64_p_2
+nclipri     01110 1...... .... 1100 ..... 0011011 @p_ui64_p_2
+nclipiu     00100 1...... .... 1100 ..... 0011011 @p_ui64_p_2
+nclipriu    00110 1...... .... 1100 ..... 0011011 @p_ui64_p_2
+nsrl        00001 11 ..... .... 1100 ..... 0011011 @r_p_4
+nsra        01001 11 ..... .... 1100 ..... 0011011 @r_p_4
+nsrar       01011 11 ..... .... 1100 ..... 0011011 @r_p_4
+nclip       01101 11 ..... .... 1100 ..... 0011011 @r_p_4
+nclipr      01111 11 ..... .... 1100 ..... 0011011 @r_p_4
+nclipu      00101 11 ..... .... 1100 ..... 0011011 @r_p_4
+nclipru     00111 11 ..... .... 1100 ..... 0011011 @r_p_4
+
+# register-pair multiply
+pmqwacc_h       01111 00 ..... ..... 010 .... 10011011 @r_p_1
+pmqrwacc_h      01111 10 ..... ..... 010 .... 10011011 @r_p_1
+mqwacc          01111 01 ..... ..... 010 .... 10011011 @r_p_1
+mqrwacc         01111 11 ..... ..... 010 .... 10011011 @r_p_1
+
+pwmul_b         00100 10 ..... ..... 010 .... 10011011 @r_p_1
+pwmulsu_b       01100 10 ..... ..... 010 .... 10011011 @r_p_1
+pwmulu_b        00110 10 ..... ..... 010 .... 10011011 @r_p_1
+
+pwmul_h         00100 00 ..... ..... 010 .... 10011011 @r_p_1
+pwmulsu_h       01100 00 ..... ..... 010 .... 10011011 @r_p_1
+pwmulu_h        00110 00 ..... ..... 010 .... 10011011 @r_p_1
+pwmacc_h        00101 00 ..... ..... 010 .... 10011011 @r_p_1
+pwmaccsu_h      01101 00 ..... ..... 010 .... 10011011 @r_p_1
+pwmaccu_h       00111 00 ..... ..... 010 .... 10011011 @r_p_1
+
+wmul            00100 01 ..... ..... 010 .... 10011011 @r_p_1
+wmulsu          01100 01 ..... ..... 010 .... 10011011 @r_p_1
+wmulu           00110 01 ..... ..... 010 .... 10011011 @r_p_1
+wmacc           00101 01 ..... ..... 010 .... 10011011 @r_p_1
+wmaccsu         01101 01 ..... ..... 010 .... 10011011 @r_p_1
+wmaccu          00111 01 ..... ..... 010 .... 10011011 @r_p_1
+
+pm2wadd_h       00000 11 ..... ..... 010 .... 10011011 @r_p_1
+pm2waddsu_h     01100 11 ..... ..... 010 .... 10011011 @r_p_1
+pm2waddu_h      00100 11 ..... ..... 010 .... 10011011 @r_p_1
+pm2wadd_hx      00010 11 ..... ..... 010 .... 10011011 @r_p_1
+
+pm2wadda_h      00001 11 ..... ..... 010 .... 10011011 @r_p_1
+pm2waddasu_h    01101 11 ..... ..... 010 .... 10011011 @r_p_1
+pm2waddau_h     00101 11 ..... ..... 010 .... 10011011 @r_p_1
+pm2wadda_hx     00011 11 ..... ..... 010 .... 10011011 @r_p_1
+
+pm2wsub_h       01000 11 ..... ..... 010 .... 10011011 @r_p_1
+pm2wsub_hx      01010 11 ..... ..... 010 .... 10011011 @r_p_1
+pm2wsuba_h      01001 11 ..... ..... 010 .... 10011011 @r_p_1
+pm2wsuba_hx     01011 11 ..... ..... 010 .... 10011011 @r_p_1
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr=
ans/trans_rvp.c.inc
index b82774e00f..ca459293a3 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -2,6 +2,38 @@
 /* RISC-V translation routines for the P Standard Extensions. */
 /* Copyright (c) 2026 ISRC ISCAS. */
=20
+/* Store the 64-bit value in src to the register pair dst and dst + 1 */
+static void set_pair_regs(DisasContext *ctx, int dst, TCGv_i64 src)
+{
+#if defined(TARGET_RISCV32)
+    TCGv_i64 tl_64 =3D tcg_temp_new_i64();
+    TCGv_i64 th_64 =3D tcg_temp_new_i64();
+    TCGv_i32 tl_32 =3D tcg_temp_new_i32();
+    TCGv_i32 th_32 =3D tcg_temp_new_i32();
+    tcg_gen_extract_i64(tl_64, src, 0, 32);
+    tcg_gen_extract_i64(th_64, src, 32, 32);
+    tcg_gen_trunc_i64_tl(tl_32, tl_64);
+    tcg_gen_trunc_i64_tl(th_32, th_64);
+    gen_set_gpr(ctx, dst, tl_32);
+    gen_set_gpr(ctx, dst + 1, th_32);
+#else
+    gen_set_gpr(ctx, dst, src);
+#endif
+}
+
+/* Concatenate the two 32-bit values in src and src + 1 into dst */
+static void get_pair_regs(DisasContext *ctx, TCGv_i64 dst, int src)
+{
+#if defined(TARGET_RISCV32)
+    TCGv t1 =3D get_gpr(ctx, src, EXT_NONE);
+    TCGv t2 =3D get_gpr(ctx, src + 1, EXT_NONE);
+    tcg_gen_concat_i32_i64(dst, t1, t2);
+#else
+    TCGv t1 =3D get_gpr(ctx, src, EXT_NONE);
+    tcg_gen_mov_tl(dst, t1);
+#endif
+}
+
 #define GEN_SIMD_TRANS(NAME)                                \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
 {                                                           \
@@ -10,7 +42,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * =
a) \
    TCGv src2 =3D get_gpr(ctx, a->rs2, EXT_NONE);              \
    TCGv dest =3D dest_gpr(ctx, a->rd);                        \
    gen_helper_##NAME(dest, tcg_env, src1, src2);            \
-   return true;                                             \
+   return true;                                            \
 }
=20
 #if defined(TARGET_RISCV32)
@@ -23,14 +55,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME =
* a)  \
     TCGv src2 =3D get_gpr(ctx, a->rs2, EXT_NONE);             \
     TCGv dest =3D dest_gpr(ctx, a->rd);                       \
     gen_helper_##NAME(dest, tcg_env, src1, src2);           \
-    return true;                                            \
+    return true;                                           \
 }
 #else
 #define GEN_SIMD_TRANS_32(NAME)                             \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a)  \
 {                                                           \
     REQUIRE_32BIT(ctx);                                     \
-    return true;                                            \
+    return true;                                           \
 }
 #endif
=20
@@ -39,7 +71,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * =
a)  \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a)  \
 {                                                           \
    REQUIRE_64BIT(ctx);                                      \
-   return true;                                             \
+   return true;                                            \
 }
 #else
 #define GEN_SIMD_TRANS_64(NAME)                             \
@@ -51,7 +83,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * =
a)  \
     TCGv src2 =3D get_gpr(ctx, a->rs2, EXT_NONE);             \
     TCGv dest =3D dest_gpr(ctx, a->rd);                       \
     gen_helper_##NAME(dest, tcg_env, src1, src2);           \
-    return true;                                            \
+    return true;                                           \
 }
 #endif
=20
@@ -65,7 +97,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * =
a) \
     TCGv t =3D tcg_temp_new();                                \
     gen_helper_##NAME(t, tcg_env, src1, src2, dest);        \
     gen_set_gpr(ctx, a->rd, t);                             \
-    return true;                                            \
+    return true;                                           \
 }
=20
 #if defined(TARGET_RISCV32)
@@ -80,14 +112,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME=
 * a) \
     TCGv t =3D tcg_temp_new();                                \
     gen_helper_##NAME(t, tcg_env, src1, src2, dest);        \
     gen_set_gpr(ctx, a->rd, t);                             \
-    return true;                                            \
+    return true;                                           \
 }
 #else
 #define GEN_SIMD_TRANS_ACC_32(NAME)                         \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
 {                                                           \
     REQUIRE_32BIT(ctx);                                     \
-    return true;                                            \
+    return true;                                           \
 }
 #endif
=20
@@ -96,7 +128,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME *=
 a) \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
 {                                                           \
     REQUIRE_64BIT(ctx);                                     \
-    return true;                                            \
+    return true;                                           \
 }
 #else
 #define GEN_SIMD_TRANS_ACC_64(NAME)                         \
@@ -110,7 +142,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME =
* a) \
     TCGv t =3D tcg_temp_new();                                \
     gen_helper_##NAME(t, tcg_env, src1, src2, dest);        \
     gen_set_gpr(ctx, a->rd, t);                             \
-    return true;                                            \
+    return true;                                           \
 }
 #endif
=20
@@ -122,7 +154,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME =
* a) \
     TCGv dest =3D dest_gpr(ctx, a->rd);                       \
     gen_helper_##NAME(dest, tcg_env, src1);                 \
     gen_set_gpr(ctx, a->rd, dest);                          \
-    return true;                                            \
+    return true;                                           \
 }
=20
 #if defined(TARGET_RISCV32)
@@ -130,7 +162,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME =
* a) \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
 {                                                           \
     REQUIRE_64BIT(ctx);                                     \
-    return true;                                            \
+    return true;                                           \
 }
 #else
 #define GEN_SIMD_TRANS_R1_64(NAME)                          \
@@ -141,7 +173,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME =
* a) \
     TCGv src1 =3D get_gpr(ctx, a->rs1, EXT_NONE);             \
     TCGv dest =3D dest_gpr(ctx, a->rd);                       \
     gen_helper_##NAME(dest, tcg_env, src1);                 \
-    return true;                                            \
+    return true;                                           \
 }
 #endif
=20
@@ -153,7 +185,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME =
* a) \
     TCGv imm =3D tcg_constant_tl(a->imm);                     \
     TCGv dest =3D dest_gpr(ctx, a->rd);                       \
     gen_helper_##NAME(dest, tcg_env, src1, imm);            \
-    return true;                                            \
+    return true;                                           \
 }
=20
 #if defined(TARGET_RISCV32)
@@ -166,14 +198,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAM=
E * a) \
     TCGv imm =3D tcg_constant_tl(a->imm);                     \
     TCGv dest =3D dest_gpr(ctx, a->rd);                       \
     gen_helper_##NAME(dest, tcg_env, src1, imm);            \
-    return true;                                            \
+    return true;                                           \
 }
 #else
 #define GEN_SIMD_TRANS_IMM_32(NAME)                         \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
 {                                                           \
     REQUIRE_32BIT(ctx);                                     \
-    return true;                                            \
+    return true;                                           \
 }
 #endif
=20
@@ -182,7 +214,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME =
* a) \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
 {                                                           \
     REQUIRE_64BIT(ctx);                                     \
-    return true;                                            \
+    return true;                                           \
 }
 #else
 #define GEN_SIMD_TRANS_IMM_64(NAME)                         \
@@ -194,7 +226,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME =
* a) \
     TCGv imm =3D tcg_constant_tl(a->imm);                     \
     TCGv dest =3D dest_gpr(ctx, a->rd);                       \
     gen_helper_##NAME(dest, tcg_env, src1, imm);            \
-    return true;                                            \
+    return true;                                           \
 }
 #endif
=20
@@ -209,14 +241,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAM=
E * a) \
     TCGv_i64 t =3D tcg_temp_new_i64();                        \
     gen_helper_##NAME(t, tcg_env, src1, src2);              \
     set_pair_regs(ctx, (a->rd) * 2, t);                       \
-    return true;                                            \
+    return true;                                           \
 }
 #else
 #define GEN_SIMD_TRANS_REG_PAIR_1(NAME)                     \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
 {                                                           \
     REQUIRE_32BIT(ctx);                                     \
-    return true;                                            \
+    return true;                                           \
 }
 #endif
=20
@@ -234,14 +266,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INS=
N * a)    \
     TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);                  \
     gen_helper_##HELPER(dest_0, tcg_env, src1_0, src2_0);      \
     gen_helper_##HELPER(dest_1, tcg_env, src1_1, src2_1);      \
-    return true;                                               \
+    return true;                                              \
 }
 #else
 #define GEN_SIMD_TRANS_REG_PAIR_2(INSN, HELPER)                \
 static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)    \
 {                                                              \
     REQUIRE_32BIT(ctx);                                        \
-    return true;                                               \
+    return true;                                              \
 }
 #endif
=20
@@ -257,14 +289,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INS=
N * a)    \
     TCGv src2   =3D get_gpr(ctx, a->rs2, EXT_NONE);              \
     gen_helper_##HELPER(dest_0, tcg_env, src1_0, src2);        \
     gen_helper_##HELPER(dest_1, tcg_env, src1_1, src2);        \
-    return true;                                               \
+    return true;                                              \
 }
 #else
 #define GEN_SIMD_TRANS_REG_PAIR_3(INSN, HELPER)                \
 static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)    \
 {                                                              \
     REQUIRE_32BIT(ctx);                                        \
-    return true;                                               \
+    return true;                                              \
 }
 #endif
=20
@@ -282,14 +314,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INS=
N * a)     \
     TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);                  \
     gen_helper_##HELPER(dest_0, tcg_env, src1_0, src2_0);      \
     gen_helper_##HELPER(dest_1, tcg_env, src1_1, src2_1);      \
-    return true;                                               \
+    return true;                                              \
 }
 #else
 #define GEN_SIMD_TRANS_REG_PAIR_DW(INSN, HELPER)               \
 static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)     \
 {                                                              \
     REQUIRE_32BIT(ctx);                                        \
-    return true;                                               \
+    return true;                                              \
 }
 #endif
=20
@@ -307,14 +339,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INS=
N * a)     \
     TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);                  \
     gen_helper_##HELPER(dest_0, tcg_env, src1_0, imm_0);       \
     gen_helper_##HELPER(dest_1, tcg_env, src1_1, imm_1);       \
-    return true;                                               \
+    return true;                                              \
 }
 #else
 #define GEN_SIMD_TRANS_REG_PAIR_DW_IMM(INSN, HELPER)           \
 static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)     \
 {                                                              \
     REQUIRE_32BIT(ctx);                                        \
-    return true;                                               \
+    return true;                                              \
 }
 #endif
=20
@@ -332,14 +364,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INS=
N * a)     \
     TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);                  \
     gen_helper_##HELPER##_32(dest_0, tcg_env, src1_0, imm_0);  \
     gen_helper_##HELPER##_32(dest_1, tcg_env, src1_1, imm_1);  \
-    return true;                                               \
+    return true;                                              \
 }
 #else
 #define GEN_SIMD_TRANS_REG_PAIR_DW_IMM_2(INSN, HELPER)         \
 static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)     \
 {                                                              \
     REQUIRE_32BIT(ctx);                                        \
-    return true;                                               \
+    return true;                                              \
 }
 #endif
=20
@@ -356,14 +388,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INS=
N * a)    \
     gen_helper_##HELPER(dest_1, tcg_env, src1_1);              \
     gen_set_gpr(ctx, (a->rd) * 2, dest_0);                       \
     gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1);                     \
-    return true;                                               \
+    return true;                                              \
 }
 #else
 #define GEN_SIMD_TRANS_REG_PAIR_5(INSN, HELPER)                \
 static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)    \
 {                                                              \
     REQUIRE_32BIT(ctx);                                        \
-    return true;                                               \
+    return true;                                              \
 }
 #endif
=20
@@ -378,14 +410,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAM=
E * a) \
     TCGv_i64 t =3D tcg_temp_new_i64();                        \
     gen_helper_##NAME(t, tcg_env, src1, imm);               \
     set_pair_regs(ctx, (a->rd) * 2, t);                       \
-    return true;                                            \
+    return true;                                           \
 }
 #else
 #define GEN_SIMD_TRANS_REG_PAIR_IMM(NAME)                   \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
 {                                                           \
     REQUIRE_32BIT(ctx);                                     \
-    return true;                                            \
+    return true;                                           \
 }
 #endif
=20
@@ -403,14 +435,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INS=
N * a)  \
     TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);                \
     gen_helper_##HELPER(dest_0, tcg_env, src1_0, imm_0);     \
     gen_helper_##HELPER(dest_1, tcg_env, src1_1, imm_1);     \
-    return true;                                             \
+    return true;                                            \
 }
 #else
 #define GEN_SIMD_TRANS_REG_PAIR_IMM_2(INSN, HELPER)          \
 static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a)  \
 {                                                            \
     REQUIRE_32BIT(ctx);                                      \
-    return true;                                             \
+    return true;                                            \
 }
 #endif
=20
@@ -430,14 +462,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAM=
E * a) \
     }                                                       \
     gen_helper_##NAME(t, tcg_env, src1, src2, t);           \
     set_pair_regs(ctx, (a->rd) * 2, t);                       \
-    return true;                                            \
+    return true;                                           \
 }
 #else
 #define GEN_SIMD_TRANS_ACC_REG_PAIR_1(NAME)                 \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
 {                                                           \
     REQUIRE_32BIT(ctx);                                     \
-    return true;                                            \
+    return true;                                           \
 }
 #endif
=20
@@ -461,14 +493,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAM=
E * a) \
         src1_h =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);      \
     }                                                       \
     gen_helper_##NAME(dest, tcg_env, src1_l, src1_h, src2); \
-    return true;                                            \
+    return true;                                           \
 }
 #else
 #define GEN_SIMD_TRANS_REG_PAIR_PREDSUM(NAME)               \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
 {                                                           \
     REQUIRE_32BIT(ctx);                                     \
-    return true;                                            \
+    return true;                                           \
 }
 #endif
=20
@@ -487,14 +519,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAM=
E * a) \
     TCGv shamt =3D tcg_constant_tl(a->imm);                  \
     TCGv_i32 dest =3D dest_gpr(ctx, a->rd);                  \
     gen_helper_##NAME(dest, tcg_env, s1, shamt);           \
-    return true;                                           \
+    return true;                                          \
 }
 #else
 #define GEN_SIMD_TRANS_PN_OP_IMM(NAME)                     \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
 {                                                          \
     REQUIRE_32BIT(ctx);                                    \
-    return true;                                           \
+    return true;                                          \
 }
 #endif
=20
@@ -513,14 +545,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAM=
E * a) \
     TCGv_i32 rs2 =3D get_gpr(ctx, a->rs2, EXT_NONE);         \
     TCGv_i32 dest =3D dest_gpr(ctx, a->rd);                  \
     gen_helper_##NAME(dest, tcg_env, s1, rs2);             \
-    return true;                                           \
+    return true;                                          \
 }
 #else
 #define GEN_SIMD_TRANS_PN_OP_REG(NAME)                     \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
 {                                                          \
     REQUIRE_32BIT(ctx);                                    \
-    return true;                                           \
+    return true;                                          \
 }
 #endif
=20
@@ -907,6 +939,236 @@ GEN_SIMD_TRANS_ACC_64(pm4adda_h)
 GEN_SIMD_TRANS_ACC_64(pm4addasu_h)
 GEN_SIMD_TRANS_ACC_64(pm4addau_h)
=20
+/* Packed SIMD - Double-Width Operations (RV32 only, register pairs) */
+GEN_SIMD_TRANS_REG_PAIR_1(pwadd_b)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwadda_b)
+GEN_SIMD_TRANS_REG_PAIR_1(pwaddu_b)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwaddau_b)
+GEN_SIMD_TRANS_REG_PAIR_1(pwsub_b)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwsuba_b)
+GEN_SIMD_TRANS_REG_PAIR_1(pwsubu_b)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwsubau_b)
+GEN_SIMD_TRANS_REG_PAIR_IMM(pwslli_b)
+GEN_SIMD_TRANS_REG_PAIR_1(pwsll_bs)
+GEN_SIMD_TRANS_REG_PAIR_IMM(pwslai_b)
+GEN_SIMD_TRANS_REG_PAIR_1(pwsla_bs)
+
+GEN_SIMD_TRANS_REG_PAIR_1(pwadd_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwadda_h)
+GEN_SIMD_TRANS_REG_PAIR_1(pwaddu_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwaddau_h)
+GEN_SIMD_TRANS_REG_PAIR_1(pwsub_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwsuba_h)
+GEN_SIMD_TRANS_REG_PAIR_1(pwsubu_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwsubau_h)
+GEN_SIMD_TRANS_REG_PAIR_IMM(pwslli_h)
+GEN_SIMD_TRANS_REG_PAIR_1(pwsll_hs)
+GEN_SIMD_TRANS_REG_PAIR_IMM(pwslai_h)
+GEN_SIMD_TRANS_REG_PAIR_1(pwsla_hs)
+
+GEN_SIMD_TRANS_REG_PAIR_1(wadd)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(wadda)
+GEN_SIMD_TRANS_REG_PAIR_1(waddu)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(waddau)
+GEN_SIMD_TRANS_REG_PAIR_1(wsub)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(wsuba)
+GEN_SIMD_TRANS_REG_PAIR_1(wsubu)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(wsubau)
+
+GEN_SIMD_TRANS_REG_PAIR_IMM(wslli)
+GEN_SIMD_TRANS_REG_PAIR_1(wsll)
+GEN_SIMD_TRANS_REG_PAIR_IMM(wslai)
+GEN_SIMD_TRANS_REG_PAIR_1(wsla)
+
+GEN_SIMD_TRANS_REG_PAIR_2(padd_db, padd_b)
+GEN_SIMD_TRANS_REG_PAIR_2(psub_db, psub_b)
+GEN_SIMD_TRANS_REG_PAIR_2(psadd_db, psadd_b)
+GEN_SIMD_TRANS_REG_PAIR_2(psaddu_db, psaddu_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pssub_db, pssub_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pssubu_db, pssubu_b)
+GEN_SIMD_TRANS_REG_PAIR_2(paadd_db, paadd_b)
+GEN_SIMD_TRANS_REG_PAIR_2(paaddu_db, paaddu_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pasub_db, pasub_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pasubu_db, pasubu_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pabd_db, pabd_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pabdu_db, pabdu_b)
+GEN_SIMD_TRANS_REG_PAIR_5(psabs_db, psabs_b)
+GEN_SIMD_TRANS_REG_PAIR_2(padd_dh, padd_h)
+GEN_SIMD_TRANS_REG_PAIR_2(psub_dh, psub_h)
+GEN_SIMD_TRANS_REG_PAIR_2(psadd_dh, psadd_h)
+GEN_SIMD_TRANS_REG_PAIR_2(psaddu_dh, psaddu_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pssub_dh, pssub_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pssubu_dh, pssubu_h)
+GEN_SIMD_TRANS_REG_PAIR_2(paadd_dh, paadd_h)
+GEN_SIMD_TRANS_REG_PAIR_2(paaddu_dh, paaddu_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pasub_dh, pasub_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pasubu_dh, pasubu_h)
+GEN_SIMD_TRANS_REG_PAIR_2(psh1add_dh, psh1add_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pssh1sadd_dh, pssh1sadd_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pas_dhx, pas_hx)
+GEN_SIMD_TRANS_REG_PAIR_2(psa_dhx, psa_hx)
+GEN_SIMD_TRANS_REG_PAIR_2(psas_dhx, psas_hx)
+GEN_SIMD_TRANS_REG_PAIR_2(pssa_dhx, pssa_hx)
+GEN_SIMD_TRANS_REG_PAIR_2(paas_dhx, paas_hx)
+GEN_SIMD_TRANS_REG_PAIR_2(pasa_dhx, pasa_hx)
+GEN_SIMD_TRANS_REG_PAIR_2(pabd_dh, pabd_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pabdu_dh, pabdu_h)
+GEN_SIMD_TRANS_REG_PAIR_5(psabs_dh, psabs_h)
+GEN_SIMD_TRANS_REG_PAIR_DW(psadd_dw, sadd)
+GEN_SIMD_TRANS_REG_PAIR_DW(psaddu_dw, saddu)
+GEN_SIMD_TRANS_REG_PAIR_DW(pssub_dw, ssub)
+GEN_SIMD_TRANS_REG_PAIR_DW(pssubu_dw, ssubu)
+GEN_SIMD_TRANS_REG_PAIR_DW(paadd_dw, aadd)
+GEN_SIMD_TRANS_REG_PAIR_DW(paaddu_dw, aaddu)
+GEN_SIMD_TRANS_REG_PAIR_DW(pasub_dw, asub)
+GEN_SIMD_TRANS_REG_PAIR_DW(pasubu_dw, asubu)
+GEN_SIMD_TRANS_REG_PAIR_DW(pssh1sadd_dw, ssh1sadd)
+
+GEN_SIMD_TRANS_REG_PAIR_IMM_2(pslli_db, pslli_b)
+GEN_SIMD_TRANS_REG_PAIR_IMM_2(psrli_db, psrli_b)
+GEN_SIMD_TRANS_REG_PAIR_IMM_2(psrai_db, psrai_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pmin_db, pmin_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pminu_db, pminu_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pmax_db, pmax_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pmaxu_db, pmaxu_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pmseq_db, pmseq_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pmslt_db, pmslt_b)
+GEN_SIMD_TRANS_REG_PAIR_2(pmsltu_db, pmsltu_b)
+GEN_SIMD_TRANS_REG_PAIR_5(psext_dh_b, psext_h_b)
+GEN_SIMD_TRANS_REG_PAIR_IMM_2(psati_dh, psati_h)
+GEN_SIMD_TRANS_REG_PAIR_IMM_2(pusati_dh, pusati_h)
+GEN_SIMD_TRANS_REG_PAIR_IMM_2(pslli_dh, pslli_h)
+GEN_SIMD_TRANS_REG_PAIR_IMM_2(psrli_dh, psrli_h)
+GEN_SIMD_TRANS_REG_PAIR_IMM_2(psrai_dh, psrai_h)
+GEN_SIMD_TRANS_REG_PAIR_IMM_2(psslai_dh, psslai_h)
+GEN_SIMD_TRANS_REG_PAIR_IMM_2(psrari_dh, psrari_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pmin_dh, pmin_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pminu_dh, pminu_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pmax_dh, pmax_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pmaxu_dh, pmaxu_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pmseq_dh, pmseq_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pmslt_dh, pmslt_h)
+GEN_SIMD_TRANS_REG_PAIR_2(pmsltu_dh, pmsltu_h)
+GEN_SIMD_TRANS_REG_PAIR_DW_IMM_2(psati_dw, sati)
+GEN_SIMD_TRANS_REG_PAIR_DW_IMM_2(pusati_dw, usati)
+GEN_SIMD_TRANS_REG_PAIR_DW_IMM(psslai_dw, sslai)
+GEN_SIMD_TRANS_REG_PAIR_DW_IMM_2(psrari_dw, srari)
+GEN_SIMD_TRANS_REG_PAIR_DW(pmseq_dw, mseq)
+GEN_SIMD_TRANS_REG_PAIR_DW(pmslt_dw, mslt)
+GEN_SIMD_TRANS_REG_PAIR_DW(pmsltu_dw, msltu)
+
+GEN_SIMD_TRANS_REG_PAIR_3(padd_dbs, padd_bs)
+GEN_SIMD_TRANS_REG_PAIR_3(psll_dbs, psll_bs)
+GEN_SIMD_TRANS_REG_PAIR_3(psra_dbs, psra_bs)
+GEN_SIMD_TRANS_REG_PAIR_3(padd_dhs, padd_hs)
+GEN_SIMD_TRANS_REG_PAIR_3(psll_dhs, psll_hs)
+GEN_SIMD_TRANS_REG_PAIR_3(psrl_dhs, psrl_hs)
+GEN_SIMD_TRANS_REG_PAIR_3(psra_dhs, psra_hs)
+GEN_SIMD_TRANS_REG_PAIR_3(pssha_dhs, pssha_hs)
+GEN_SIMD_TRANS_REG_PAIR_3(psshar_dhs, psshar_hs)
+GEN_SIMD_TRANS_REG_PAIR_DW(pssha_dws, ssha)
+GEN_SIMD_TRANS_REG_PAIR_DW(psshar_dws, sshar)
+
+GEN_SIMD_TRANS_REG_PAIR_2(ppairo_db, ppairo_b)
+GEN_SIMD_TRANS_REG_PAIR_2(ppairo_dh, ppairo_h)
+GEN_SIMD_TRANS_REG_PAIR_2(ppaire_db, ppaire_b)
+GEN_SIMD_TRANS_REG_PAIR_2(ppaireo_db, ppaireo_b)
+GEN_SIMD_TRANS_REG_PAIR_2(ppaireo_dh, ppaireo_h)
+GEN_SIMD_TRANS_REG_PAIR_2(ppairoe_dh, ppairoe_h)
+GEN_SIMD_TRANS_REG_PAIR_2(ppairoe_db, ppairoe_b)
+
+GEN_SIMD_TRANS_REG_PAIR_PREDSUM(predsum_dbs)
+GEN_SIMD_TRANS_REG_PAIR_PREDSUM(predsumu_dbs)
+GEN_SIMD_TRANS_REG_PAIR_PREDSUM(predsum_dhs)
+GEN_SIMD_TRANS_REG_PAIR_PREDSUM(predsumu_dhs)
+
+GEN_SIMD_TRANS_PN_OP_IMM(pnsrli_b)
+GEN_SIMD_TRANS_PN_OP_IMM(pnsrai_b)
+GEN_SIMD_TRANS_PN_OP_IMM(pnsrari_b)
+GEN_SIMD_TRANS_PN_OP_IMM(pnclipi_b)
+GEN_SIMD_TRANS_PN_OP_IMM(pnclipri_b)
+GEN_SIMD_TRANS_PN_OP_IMM(pnclipiu_b)
+GEN_SIMD_TRANS_PN_OP_IMM(pnclipriu_b)
+
+GEN_SIMD_TRANS_PN_OP_IMM(pnsrli_h)
+GEN_SIMD_TRANS_PN_OP_IMM(pnsrai_h)
+GEN_SIMD_TRANS_PN_OP_IMM(pnsrari_h)
+GEN_SIMD_TRANS_PN_OP_IMM(pnclipi_h)
+GEN_SIMD_TRANS_PN_OP_IMM(pnclipri_h)
+GEN_SIMD_TRANS_PN_OP_IMM(pnclipiu_h)
+GEN_SIMD_TRANS_PN_OP_IMM(pnclipriu_h)
+
+GEN_SIMD_TRANS_PN_OP_IMM(nsrli)
+GEN_SIMD_TRANS_PN_OP_IMM(nsrai)
+GEN_SIMD_TRANS_PN_OP_IMM(nsrari)
+GEN_SIMD_TRANS_PN_OP_IMM(nclipi)
+GEN_SIMD_TRANS_PN_OP_IMM(nclipri)
+GEN_SIMD_TRANS_PN_OP_IMM(nclipiu)
+GEN_SIMD_TRANS_PN_OP_IMM(nclipriu)
+
+GEN_SIMD_TRANS_PN_OP_REG(pnsrl_bs)
+GEN_SIMD_TRANS_PN_OP_REG(pnsra_bs)
+GEN_SIMD_TRANS_PN_OP_REG(pnsrar_bs)
+GEN_SIMD_TRANS_PN_OP_REG(pnclip_bs)
+GEN_SIMD_TRANS_PN_OP_REG(pnclipr_bs)
+GEN_SIMD_TRANS_PN_OP_REG(pnclipu_bs)
+GEN_SIMD_TRANS_PN_OP_REG(pnclipru_bs)
+
+GEN_SIMD_TRANS_PN_OP_REG(pnsrl_hs)
+GEN_SIMD_TRANS_PN_OP_REG(pnsra_hs)
+GEN_SIMD_TRANS_PN_OP_REG(pnsrar_hs)
+GEN_SIMD_TRANS_PN_OP_REG(pnclip_hs)
+GEN_SIMD_TRANS_PN_OP_REG(pnclipr_hs)
+GEN_SIMD_TRANS_PN_OP_REG(pnclipu_hs)
+GEN_SIMD_TRANS_PN_OP_REG(pnclipru_hs)
+
+GEN_SIMD_TRANS_PN_OP_REG(nsrl)
+GEN_SIMD_TRANS_PN_OP_REG(nsra)
+GEN_SIMD_TRANS_PN_OP_REG(nsrar)
+GEN_SIMD_TRANS_PN_OP_REG(nclip)
+GEN_SIMD_TRANS_PN_OP_REG(nclipr)
+GEN_SIMD_TRANS_PN_OP_REG(nclipu)
+GEN_SIMD_TRANS_PN_OP_REG(nclipru)
+
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pmqwacc_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pmqrwacc_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(mqwacc)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(mqrwacc)
+
+GEN_SIMD_TRANS_REG_PAIR_1(pwmul_b)
+GEN_SIMD_TRANS_REG_PAIR_1(pwmulsu_b)
+GEN_SIMD_TRANS_REG_PAIR_1(pwmulu_b)
+
+GEN_SIMD_TRANS_REG_PAIR_1(pwmul_h)
+GEN_SIMD_TRANS_REG_PAIR_1(pwmulsu_h)
+GEN_SIMD_TRANS_REG_PAIR_1(pwmulu_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwmacc_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwmaccsu_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwmaccu_h)
+
+GEN_SIMD_TRANS_REG_PAIR_1(wmul)
+GEN_SIMD_TRANS_REG_PAIR_1(wmulsu)
+GEN_SIMD_TRANS_REG_PAIR_1(wmulu)
+
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(wmacc)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(wmaccsu)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(wmaccu)
+
+GEN_SIMD_TRANS_REG_PAIR_1(pm2wadd_h)
+GEN_SIMD_TRANS_REG_PAIR_1(pm2waddsu_h)
+GEN_SIMD_TRANS_REG_PAIR_1(pm2waddu_h)
+GEN_SIMD_TRANS_REG_PAIR_1(pm2wadd_hx)
+
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pm2wadda_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pm2waddasu_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pm2waddau_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pm2wadda_hx)
+
+GEN_SIMD_TRANS_REG_PAIR_1(pm2wsub_h)
+GEN_SIMD_TRANS_REG_PAIR_1(pm2wsub_hx)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pm2wsuba_h)
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pm2wsuba_hx)
+
 static bool trans_pli_b(DisasContext *ctx, arg_pli_b * a)
 {
     REQUIRE_EXT(ctx, RVP);
@@ -973,3 +1235,439 @@ static bool trans_plui_w(DisasContext *ctx, arg_plui=
_w * a)
     gen_set_gpri(ctx, a->rd, imm);
     return true;
 }
+
+static bool trans_pli_db(DisasContext *ctx, arg_pli_db * a)
+{
+    REQUIRE_EXT(ctx, RVP);
+    int i =3D 1;
+    target_long imm =3D a->imm;
+    while (i < TARGET_LONG_SIZE) {
+        imm =3D (imm << 8) + (a->imm & 0xFF);
+        i++;
+    }
+    gen_set_gpri(ctx, (a->rd) * 2, imm);
+    gen_set_gpri(ctx, (a->rd) * 2 + 1, imm);
+    return true;
+}
+
+static bool trans_pli_dh(DisasContext *ctx, arg_pli_dh * a)
+{
+    REQUIRE_EXT(ctx, RVP);
+    int i =3D 1;
+    target_long imm =3D a->imm;
+    while (i < TARGET_LONG_SIZE / 2) {
+        imm =3D (imm << 16) + (a->imm & 0xFFFF);
+        i++;
+    }
+    gen_set_gpri(ctx, (a->rd) * 2, imm);
+    gen_set_gpri(ctx, (a->rd) * 2 + 1, imm);
+    return true;
+}
+
+static bool trans_plui_dh(DisasContext *ctx, arg_plui_dh * a)
+{
+    REQUIRE_EXT(ctx, RVP);
+    int i =3D 1;
+    target_long imm =3D a->imm;
+    while (i < TARGET_LONG_SIZE / 2) {
+        imm =3D (imm << 16) + (a->imm & 0xFFFF);
+        i++;
+    }
+    gen_set_gpri(ctx, (a->rd) * 2, imm);
+    gen_set_gpri(ctx, (a->rd) * 2 + 1, imm);
+    return true;
+}
+
+static bool trans_padd_dw(DisasContext *ctx, arg_padd_dw * a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE);
+    TCGv src2_0 =3D get_gpr(ctx, (a->rs2) * 2, EXT_NONE);
+    TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2);
+    TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);
+    TCGv src2_1 =3D get_gpr(ctx, (a->rs2) * 2 + 1, EXT_NONE);
+    TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);
+    tcg_gen_add_tl(dest_0, src1_0, src2_0);
+    tcg_gen_add_tl(dest_1, src1_1, src2_1);
+    gen_set_gpr(ctx, (a->rd) * 2, dest_0);
+    gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1);
+    return true;
+}
+
+static bool trans_psub_dw(DisasContext *ctx, arg_psub_dw * a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE);
+    TCGv src2_0 =3D get_gpr(ctx, (a->rs2) * 2, EXT_NONE);
+    TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2);
+    TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);
+    TCGv src2_1 =3D get_gpr(ctx, (a->rs2) * 2 + 1, EXT_NONE);
+    TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);
+    tcg_gen_sub_tl(dest_0, src1_0, src2_0);
+    tcg_gen_sub_tl(dest_1, src1_1, src2_1);
+    gen_set_gpr(ctx, (a->rd) * 2, dest_0);
+    gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1);
+    return true;
+}
+
+static bool trans_psh1add_dw(DisasContext *ctx, arg_psh1add_dw * a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE);
+    TCGv src2_0 =3D get_gpr(ctx, (a->rs2) * 2, EXT_NONE);
+    TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2);
+    TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);
+    TCGv src2_1 =3D get_gpr(ctx, (a->rs2) * 2 + 1, EXT_NONE);
+    TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);
+    gen_sh1add(dest_0, src1_0, src2_0);
+    gen_sh1add(dest_1, src1_1, src2_1);
+    gen_set_gpr(ctx, (a->rd) * 2, dest_0);
+    gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1);
+    return true;
+}
+
+/* wzip8p/wzip16p: rd =3D=3D 0 names the zero register; emit nothing. */
+#if defined(TARGET_RISCV32)
+static bool trans_wzip8p(DisasContext *ctx, arg_wzip8p * a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    TCGv_i32 src1 =3D get_gpr(ctx, a->rs1, EXT_NONE);
+    TCGv_i32 src2 =3D get_gpr(ctx, a->rs2, EXT_NONE);
+    TCGv_i64 t =3D tcg_temp_new_i64();
+    if (a->rd =3D=3D 0) {
+        return true;
+    } else {
+        get_pair_regs(ctx, t, (a->rd) * 2);
+    }
+    gen_helper_wzip8p(t, tcg_env, src1, src2);
+    set_pair_regs(ctx, (a->rd) * 2, t);
+    return true;
+}
+#else
+static bool trans_wzip8p(DisasContext *ctx, arg_wzip8p * a)
+{
+    REQUIRE_32BIT(ctx);
+    return true;
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+static bool trans_wzip16p(DisasContext *ctx, arg_wzip16p * a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    TCGv_i32 src1 =3D get_gpr(ctx, a->rs1, EXT_NONE);
+    TCGv_i32 src2 =3D get_gpr(ctx, a->rs2, EXT_NONE);
+    TCGv_i64 t =3D tcg_temp_new_i64();
+    if (a->rd =3D=3D 0) {
+        return true;
+    } else {
+        get_pair_regs(ctx, t, (a->rd) * 2);
+    }
+    gen_helper_wzip16p(t, tcg_env, src1, src2);
+    set_pair_regs(ctx, (a->rd) * 2, t);
+    return true;
+}
+#else
+static bool trans_wzip16p(DisasContext *ctx, arg_wzip16p * a)
+{
+    REQUIRE_32BIT(ctx);
+    return true;
+}
+#endif
+
+static bool trans_addd_p(DisasContext *ctx, arg_addd_p * a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    TCGv_i64 src1 =3D tcg_temp_new_i64();
+    TCGv_i64 src2 =3D tcg_temp_new_i64();
+    TCGv_i64 dest =3D tcg_temp_new_i64();
+    get_pair_regs(ctx, src1, (a->rs1) * 2);
+    get_pair_regs(ctx, src2, (a->rs2) * 2);
+    get_pair_regs(ctx, dest, (a->rd) * 2);
+    tcg_gen_add_i64(dest, src1, src2);
+    set_pair_regs(ctx, (a->rd) * 2, dest);
+
+    return true;
+}
+
+static bool trans_subd_p(DisasContext *ctx, arg_subd_p * a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    TCGv_i64 src1 =3D tcg_temp_new_i64();
+    TCGv_i64 src2 =3D tcg_temp_new_i64();
+    TCGv_i64 dest =3D tcg_temp_new_i64();
+    get_pair_regs(ctx, src1, (a->rs1) * 2);
+    get_pair_regs(ctx, src2, (a->rs2) * 2);
+    get_pair_regs(ctx, dest, (a->rd) * 2);
+    tcg_gen_sub_i64(dest, src1, src2);
+    set_pair_regs(ctx, (a->rd) * 2, dest);
+
+    return true;
+}
+
+static bool trans_psext_dw_b(DisasContext *ctx, arg_psext_dw_b * a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2);
+    TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE);
+    TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);
+    TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);
+
+    tcg_gen_ext8s_tl(dest_0, src1_0);
+    gen_set_gpr(ctx, (a->rd) * 2, dest_0);
+
+    tcg_gen_ext8s_tl(dest_1, src1_1);
+    gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1);
+
+    return true;
+}
+
+static bool trans_psext_dw_h(DisasContext *ctx, arg_psext_dw_h * a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2);
+    TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE);
+    TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1);
+    TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE);
+
+    tcg_gen_ext16s_tl(dest_0, src1_0);
+    gen_set_gpr(ctx, (a->rd) * 2, dest_0);
+
+    tcg_gen_ext16s_tl(dest_1, src1_1);
+    gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1);
+
+    return true;
+}
+
+static bool trans_pslli_dw(DisasContext *ctx, arg_pslli_dw *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    arg_shift a0, a1;
+    a0.rd =3D (a->rd) * 2;
+    a0.rs1 =3D (a->rs1) * 2;
+    a0.shamt =3D a->imm;
+    a1.rd =3D (a->rd) * 2 + 1;
+    a1.rs1 =3D (a->rs1) * 2 + 1;
+    a1.shamt =3D a->imm;
+
+    gen_shift_imm_fn(ctx, &a0, EXT_NONE, tcg_gen_shli_tl, NULL);
+    gen_shift_imm_fn(ctx, &a1, EXT_NONE, tcg_gen_shli_tl, NULL);
+
+    return true;
+}
+
+static bool trans_psrli_dw(DisasContext *ctx, arg_psrli_dw *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    arg_shift a0, a1;
+    a0.rd =3D (a->rd) * 2;
+    a0.rs1 =3D (a->rs1) * 2;
+    a0.shamt =3D a->imm;
+    a1.rd =3D (a->rd) * 2 + 1;
+    a1.rs1 =3D (a->rs1) * 2 + 1;
+    a1.shamt =3D a->imm;
+
+    gen_shift_imm_fn_per_ol(ctx, &a0, EXT_NONE, tcg_gen_shri_tl,
+                            gen_srliw, NULL);
+    gen_shift_imm_fn_per_ol(ctx, &a1, EXT_NONE, tcg_gen_shri_tl,
+                            gen_srliw, NULL);
+
+    return true;
+}
+
+static bool trans_psrai_dw(DisasContext *ctx, arg_psrai_dw *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    arg_shift a0, a1;
+    a0.rd =3D (a->rd) * 2;
+    a0.rs1 =3D (a->rs1) * 2;
+    a0.shamt =3D a->imm;
+    a1.rd =3D (a->rd) * 2 + 1;
+    a1.rs1 =3D (a->rs1) * 2 + 1;
+    a1.shamt =3D a->imm;
+
+    gen_shift_imm_fn_per_ol(ctx, &a0, EXT_NONE, tcg_gen_sari_tl,
+                            gen_sraiw, NULL);
+    gen_shift_imm_fn_per_ol(ctx, &a1, EXT_NONE, tcg_gen_sari_tl,
+                            gen_sraiw, NULL);
+
+    return true;
+}
+
+static bool trans_pmin_dw(DisasContext *ctx, arg_pmin_dw *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    REQUIRE_ZBB(ctx);
+    arg_r a0, a1;
+    a0.rd =3D (a->rd) * 2;
+    a0.rs1 =3D (a->rs1) * 2;
+    a0.rs2 =3D (a->rs2) * 2;
+    a1.rd =3D (a->rd) * 2 + 1;
+    a1.rs1 =3D (a->rs1) * 2 + 1;
+    a1.rs2 =3D (a->rs2) * 2 + 1;
+
+    gen_arith(ctx, &a0, EXT_SIGN, tcg_gen_smin_tl, NULL);
+    gen_arith(ctx, &a1, EXT_SIGN, tcg_gen_smin_tl, NULL);
+
+    return true;
+}
+
+static bool trans_pminu_dw(DisasContext *ctx, arg_pminu_dw *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    REQUIRE_ZBB(ctx);
+    arg_r a0, a1;
+    a0.rd =3D (a->rd) * 2;
+    a0.rs1 =3D (a->rs1) * 2;
+    a0.rs2 =3D (a->rs2) * 2;
+    a1.rd =3D (a->rd) * 2 + 1;
+    a1.rs1 =3D (a->rs1) * 2 + 1;
+    a1.rs2 =3D (a->rs2) * 2 + 1;
+
+    gen_arith(ctx, &a0, EXT_SIGN, tcg_gen_umin_tl, NULL);
+    gen_arith(ctx, &a1, EXT_SIGN, tcg_gen_umin_tl, NULL);
+
+    return true;
+}
+
+static bool trans_pmax_dw(DisasContext *ctx, arg_pmax_dw *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    REQUIRE_ZBB(ctx);
+    arg_r a0, a1;
+    a0.rd =3D (a->rd) * 2;
+    a0.rs1 =3D (a->rs1) * 2;
+    a0.rs2 =3D (a->rs2) * 2;
+    a1.rd =3D (a->rd) * 2 + 1;
+    a1.rs1 =3D (a->rs1) * 2 + 1;
+    a1.rs2 =3D (a->rs2) * 2 + 1;
+
+    gen_arith(ctx, &a0, EXT_SIGN, tcg_gen_smax_tl, NULL);
+    gen_arith(ctx, &a1, EXT_SIGN, tcg_gen_smax_tl, NULL);
+
+    return true;
+}
+
+static bool trans_pmaxu_dw(DisasContext *ctx, arg_pmaxu_dw *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    REQUIRE_ZBB(ctx);
+    arg_r a0, a1;
+    a0.rd =3D (a->rd) * 2;
+    a0.rs1 =3D (a->rs1) * 2;
+    a0.rs2 =3D (a->rs2) * 2;
+    a1.rd =3D (a->rd) * 2 + 1;
+    a1.rs1 =3D (a->rs1) * 2 + 1;
+    a1.rs2 =3D (a->rs2) * 2 + 1;
+
+    gen_arith(ctx, &a0, EXT_SIGN, tcg_gen_umax_tl, NULL);
+    gen_arith(ctx, &a1, EXT_SIGN, tcg_gen_umax_tl, NULL);
+
+    return true;
+}
+
+static bool trans_padd_dws(DisasContext *ctx, arg_padd_dws *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    arg_r a0, a1;
+    a0.rd =3D (a->rd) * 2;
+    a0.rs1 =3D (a->rs1) * 2;
+    a0.rs2 =3D a->rs2;
+    a1.rd =3D (a->rd) * 2 + 1;
+    a1.rs1 =3D (a->rs1) * 2 + 1;
+    a1.rs2 =3D a->rs2;
+
+    gen_arith(ctx, &a0, EXT_NONE, tcg_gen_add_tl, NULL);
+    gen_arith(ctx, &a1, EXT_NONE, tcg_gen_add_tl, NULL);
+
+    return true;
+}
+
+static bool trans_psll_dws(DisasContext *ctx, arg_psll_dws *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    arg_r a0, a1;
+    a0.rd =3D (a->rd) * 2;
+    a0.rs1 =3D (a->rs1) * 2;
+    a0.rs2 =3D a->rs2;
+    a1.rd =3D (a->rd) * 2 + 1;
+    a1.rs1 =3D (a->rs1) * 2 + 1;
+    a1.rs2 =3D a->rs2;
+
+    gen_shift(ctx, &a0, EXT_NONE, tcg_gen_shl_tl, NULL);
+    gen_shift(ctx, &a1, EXT_NONE, tcg_gen_shl_tl, NULL);
+
+    return true;
+}
+
+static bool trans_psrl_dws(DisasContext *ctx, arg_psrl_dws *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    arg_r a0, a1;
+    a0.rd =3D (a->rd) * 2;
+    a0.rs1 =3D (a->rs1) * 2;
+    a0.rs2 =3D a->rs2;
+    a1.rd =3D (a->rd) * 2 + 1;
+    a1.rs1 =3D (a->rs1) * 2 + 1;
+    a1.rs2 =3D a->rs2;
+
+    gen_shift(ctx, &a0, EXT_ZERO, tcg_gen_shr_tl, NULL);
+    gen_shift(ctx, &a1, EXT_ZERO, tcg_gen_shr_tl, NULL);
+
+    return true;
+}
+
+static bool trans_psra_dws(DisasContext *ctx, arg_psra_dws *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    arg_r a0, a1;
+    a0.rd =3D (a->rd) * 2;
+    a0.rs1 =3D (a->rs1) * 2;
+    a0.rs2 =3D a->rs2;
+    a1.rd =3D (a->rd) * 2 + 1;
+    a1.rs1 =3D (a->rs1) * 2 + 1;
+    a1.rs2 =3D a->rs2;
+
+    gen_shift(ctx, &a0, EXT_SIGN, tcg_gen_sar_tl, NULL);
+    gen_shift(ctx, &a1, EXT_SIGN, tcg_gen_sar_tl, NULL);
+
+    return true;
+}
+
+static bool trans_ppaire_dh(DisasContext *ctx, arg_ppaire_dh *a)
+{
+    REQUIRE_32BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    REQUIRE_ZBKB(ctx);
+    arg_r a0, a1;
+    a0.rd =3D (a->rd) * 2;
+    a0.rs1 =3D (a->rs1) * 2;
+    a0.rs2 =3D (a->rs2) * 2;
+    a1.rd =3D (a->rd) * 2 + 1;
+    a1.rs1 =3D (a->rs1) * 2 + 1;
+    a1.rs2 =3D (a->rs2) * 2 + 1;
+
+    gen_arith(ctx, &a0, EXT_NONE, gen_pack, NULL);
+    gen_arith(ctx, &a1, EXT_NONE, gen_pack, NULL);
+    return true;
+}
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
index 5eede48581..4c91800128 100644
--- a/target/riscv/psimd_helper.c
+++ b/target/riscv/psimd_helper.c
@@ -7012,3 +7012,2071 @@ uint64_t HELPER(pm4addau_h)(CPURISCVState *env, ui=
nt64_t rs1,
     uint64_t prod3 =3D (uint64_t)s1_h3 * (uint64_t)s2_h3;
     return d + prod0 + prod1 + prod2 + prod3;
 }
+
+/* Double-Width Operations (RV32 only, register pairs) */
+
+/**
+ * PWADD.B - Packed widening byte to halfword addition (RV32)
+ * rd_pair =3D {rs1[31:24]+rs2[31:24], rs1[23:16]+rs2[23:16],
+ *            rs1[15:8]+rs2[15:8], rs1[7:0]+rs2[7:0]} (sign-extended)
+ */
+uint64_t HELPER(pwadd_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        int16_t e1 =3D (int8_t)((rs1 >> (i * 8)) & 0xFF);
+        int16_t e2 =3D (int8_t)((rs2 >> (i * 8)) & 0xFF);
+        int16_t res =3D e1 + e2;
+        rd |=3D ((uint64_t)(uint16_t)res) << (i * 16);
+    }
+
+    return rd;
+}
+
+/**
+ * PWADDA.B - Packed widening byte to halfword addition with accumulate (R=
V32)
+ * rd_pair +=3D {rs1[i] + rs2[i]}
+ */
+uint64_t HELPER(pwadda_b)(CPURISCVState *env, uint32_t rs1,
+                          uint32_t rs2, uint64_t rd)
+{
+    uint64_t result =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        int16_t e1 =3D (int8_t)((rs1 >> (i * 8)) & 0xFF);
+        int16_t e2 =3D (int8_t)((rs2 >> (i * 8)) & 0xFF);
+        int16_t acc =3D (int16_t)((rd >> (i * 16)) & 0xFFFF);
+        int16_t res =3D acc + e1 + e2;
+        result |=3D ((uint64_t)(uint16_t)res) << (i * 16);
+    }
+
+    return result;
+}
+
+/**
+ * PWADDU.B - Packed widening byte to halfword unsigned addition (RV32)
+ */
+uint64_t HELPER(pwaddu_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t e1 =3D (uint8_t)((rs1 >> (i * 8)) & 0xFF);
+        uint16_t e2 =3D (uint8_t)((rs2 >> (i * 8)) & 0xFF);
+        uint16_t res =3D e1 + e2;
+        rd |=3D ((uint64_t)res) << (i * 16);
+    }
+
+    return rd;
+}
+
+/**
+ * PWADDAU.B - Packed widening byte to halfword unsigned addition
+ * with accumulate (RV32)
+ */
+uint64_t HELPER(pwaddau_b)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint64_t rd)
+{
+    uint64_t result =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t e1 =3D (uint8_t)((rs1 >> (i * 8)) & 0xFF);
+        uint16_t e2 =3D (uint8_t)((rs2 >> (i * 8)) & 0xFF);
+        uint16_t acc =3D (uint16_t)((rd >> (i * 16)) & 0xFFFF);
+        uint16_t res =3D acc + e1 + e2;
+        result |=3D ((uint64_t)res) << (i * 16);
+    }
+
+    return result;
+}
+
+/**
+ * PWSUB.B - Packed widening byte to halfword subtraction (RV32)
+ */
+uint64_t HELPER(pwsub_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        int16_t e1 =3D (int8_t)((rs1 >> (i * 8)) & 0xFF);
+        int16_t e2 =3D (int8_t)((rs2 >> (i * 8)) & 0xFF);
+        int16_t res =3D e1 - e2;
+        rd |=3D ((uint64_t)(uint16_t)res) << (i * 16);
+    }
+
+    return rd;
+}
+
+/**
+ * PWSUBA.B - Packed widening byte to halfword subtraction
+ * with accumulate (RV32)
+ */
+uint64_t HELPER(pwsuba_b)(CPURISCVState *env, uint32_t rs1,
+                          uint32_t rs2, uint64_t rd)
+{
+    uint64_t result =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        int16_t e1 =3D (int8_t)((rs1 >> (i * 8)) & 0xFF);
+        int16_t e2 =3D (int8_t)((rs2 >> (i * 8)) & 0xFF);
+        int16_t acc =3D (int16_t)((rd >> (i * 16)) & 0xFFFF);
+        int16_t res =3D acc + (e1 - e2);
+        result |=3D ((uint64_t)(uint16_t)res) << (i * 16);
+    }
+
+    return result;
+}
+
+/**
+ * PWSUBU.B - Packed widening byte to halfword unsigned subtraction (RV32)
+ */
+uint64_t HELPER(pwsubu_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t e1 =3D (uint8_t)((rs1 >> (i * 8)) & 0xFF);
+        uint16_t e2 =3D (uint8_t)((rs2 >> (i * 8)) & 0xFF);
+        uint16_t res =3D e1 - e2;
+        rd |=3D ((uint64_t)res) << (i * 16);
+    }
+
+    return rd;
+}
+
+/**
+ * PWSUBAU.B - Packed widening byte to halfword unsigned subtraction
+ * with accumulate (RV32)
+ */
+uint64_t HELPER(pwsubau_b)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint64_t rd)
+{
+    uint64_t result =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t e1 =3D (uint8_t)((rs1 >> (i * 8)) & 0xFF);
+        uint16_t e2 =3D (uint8_t)((rs2 >> (i * 8)) & 0xFF);
+        uint16_t acc =3D (uint16_t)((rd >> (i * 16)) & 0xFFFF);
+        uint16_t res =3D acc + (e1 - e2);
+        result |=3D ((uint64_t)res) << (i * 16);
+    }
+
+    return result;
+}
+
+/**
+ * PWSLLI.B - Packed widening shift left immediate (byte to halfword)
+ */
+uint64_t HELPER(pwslli_b)(CPURISCVState *env, uint32_t rs1, uint32_t imm)
+{
+    uint64_t rd =3D 0;
+    uint8_t shamt =3D imm & 0x0F;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t e1 =3D (uint8_t)((rs1 >> (i * 8)) & 0xFF);
+        uint16_t res =3D e1 << shamt;
+        rd |=3D ((uint64_t)res) << (i * 16);
+    }
+
+    return rd;
+}
+
+/**
+ * PWSLL.BS - Packed widening shift left from register (byte to halfword)
+ */
+uint64_t HELPER(pwsll_bs)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+    uint8_t shamt =3D rs2 & 0x1F;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t e1 =3D (uint8_t)((rs1 >> (i * 8)) & 0xFF);
+        uint16_t res =3D e1 << shamt;
+        rd |=3D ((uint64_t)res) << (i * 16);
+    }
+
+    return rd;
+}
+
+/**
+ * PWSLAI.B - Packed widening signed shift left immediate (byte to halfwor=
d)
+ */
+uint64_t HELPER(pwslai_b)(CPURISCVState *env, uint32_t rs1, uint32_t imm)
+{
+    uint64_t rd =3D 0;
+    uint8_t shamt =3D imm & 0x0F;
+
+    for (int i =3D 0; i < 4; i++) {
+        int16_t e1 =3D (int8_t)((rs1 >> (i * 8)) & 0xFF);
+        int16_t res =3D e1 << shamt;
+        rd |=3D ((uint64_t)(uint16_t)res) << (i * 16);
+    }
+
+    return rd;
+}
+
+/**
+ * PWSLA.BS - Packed widening signed shift left from register (byte to hal=
fword)
+ */
+uint64_t HELPER(pwsla_bs)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+    uint8_t shamt =3D rs2 & 0x1F;
+
+    for (int i =3D 0; i < 4; i++) {
+        int16_t e1 =3D (int8_t)((rs1 >> (i * 8)) & 0xFF);
+        int16_t res =3D e1 << shamt;
+        rd |=3D ((uint64_t)(uint16_t)res) << (i * 16);
+    }
+
+    return rd;
+}
+
+/**
+ * PWADD.H - Packed widening halfword to word addition (RV32)
+ */
+uint64_t HELPER(pwadd_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int16_t)((rs1 >> (i * 16)) & 0xFFFF);
+        int32_t e2 =3D (int16_t)((rs2 >> (i * 16)) & 0xFFFF);
+        int32_t res =3D e1 + e2;
+        rd |=3D ((uint64_t)(uint32_t)res) << (i * 32);
+    }
+
+    return rd;
+}
+
+/**
+ * PWADDA.H - Packed widening halfword to word addition with accumulate (R=
V32)
+ */
+uint64_t HELPER(pwadda_h)(CPURISCVState *env, uint32_t rs1,
+                          uint32_t rs2, uint64_t rd)
+{
+    uint64_t result =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int16_t)((rs1 >> (i * 16)) & 0xFFFF);
+        int32_t e2 =3D (int16_t)((rs2 >> (i * 16)) & 0xFFFF);
+        int32_t acc =3D (int32_t)((rd >> (i * 32)) & 0xFFFFFFFF);
+        int32_t res =3D acc + e1 + e2;
+        result |=3D ((uint64_t)(uint32_t)res) << (i * 32);
+    }
+
+    return result;
+}
+
+/**
+ * PWADDU.H - Packed widening halfword to word unsigned addition (RV32)
+ */
+uint64_t HELPER(pwaddu_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint32_t e1 =3D (uint16_t)((rs1 >> (i * 16)) & 0xFFFF);
+        uint32_t e2 =3D (uint16_t)((rs2 >> (i * 16)) & 0xFFFF);
+        uint32_t res =3D e1 + e2;
+        rd |=3D ((uint64_t)res) << (i * 32);
+    }
+
+    return rd;
+}
+
+/**
+ * PWADDAU.H - Packed widening halfword to word unsigned addition
+ * with accumulate (RV32)
+ */
+uint64_t HELPER(pwaddau_h)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint64_t rd)
+{
+    uint64_t result =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint32_t e1 =3D (uint16_t)((rs1 >> (i * 16)) & 0xFFFF);
+        uint32_t e2 =3D (uint16_t)((rs2 >> (i * 16)) & 0xFFFF);
+        uint32_t acc =3D (uint32_t)((rd >> (i * 32)) & 0xFFFFFFFF);
+        uint32_t res =3D acc + e1 + e2;
+        result |=3D ((uint64_t)res) << (i * 32);
+    }
+
+    return result;
+}
+
+/**
+ * PWSUB.H - Packed widening halfword to word subtraction (RV32)
+ */
+uint64_t HELPER(pwsub_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int16_t)((rs1 >> (i * 16)) & 0xFFFF);
+        int32_t e2 =3D (int16_t)((rs2 >> (i * 16)) & 0xFFFF);
+        int32_t res =3D e1 - e2;
+        rd |=3D ((uint64_t)(uint32_t)res) << (i * 32);
+    }
+
+    return rd;
+}
+
+/**
+ * PWSUBA.H - Packed widening halfword to word subtraction
+ * with accumulate (RV32)
+ */
+uint64_t HELPER(pwsuba_h)(CPURISCVState *env, uint32_t rs1,
+                          uint32_t rs2, uint64_t rd)
+{
+    uint64_t result =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int16_t)((rs1 >> (i * 16)) & 0xFFFF);
+        int32_t e2 =3D (int16_t)((rs2 >> (i * 16)) & 0xFFFF);
+        int32_t acc =3D (int32_t)((rd >> (i * 32)) & 0xFFFFFFFF);
+        int32_t res =3D acc + (e1 - e2);
+        result |=3D ((uint64_t)(uint32_t)res) << (i * 32);
+    }
+
+    return result;
+}
+
+/**
+ * PWSUBU.H - Packed widening halfword to word unsigned subtraction (RV32)
+ */
+uint64_t HELPER(pwsubu_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint32_t e1 =3D (uint16_t)((rs1 >> (i * 16)) & 0xFFFF);
+        uint32_t e2 =3D (uint16_t)((rs2 >> (i * 16)) & 0xFFFF);
+        uint32_t res =3D e1 - e2;
+        rd |=3D ((uint64_t)res) << (i * 32);
+    }
+
+    return rd;
+}
+
+/**
+ * PWSUBAU.H - Packed widening halfword to word unsigned subtraction
+ * with accumulate (RV32)
+ */
+uint64_t HELPER(pwsubau_h)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint64_t rd)
+{
+    uint64_t result =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint32_t e1 =3D (uint16_t)((rs1 >> (i * 16)) & 0xFFFF);
+        uint32_t e2 =3D (uint16_t)((rs2 >> (i * 16)) & 0xFFFF);
+        uint32_t acc =3D (uint32_t)((rd >> (i * 32)) & 0xFFFFFFFF);
+        uint32_t res =3D acc + (e1 - e2);
+        result |=3D ((uint64_t)res) << (i * 32);
+    }
+
+    return result;
+}
+
+/**
+ * PWSLLI.H - Packed widening shift left immediate (halfword to word)
+ */
+uint64_t HELPER(pwslli_h)(CPURISCVState *env, uint32_t rs1, uint32_t imm)
+{
+    uint64_t rd =3D 0;
+    uint8_t shamt =3D imm & 0x1F;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint32_t e1 =3D (uint16_t)((rs1 >> (i * 16)) & 0xFFFF);
+        uint32_t res =3D e1 << shamt;
+        rd |=3D ((uint64_t)res) << (i * 32);
+    }
+
+    return rd;
+}
+
+/**
+ * PWSLL.HS - Packed widening shift left from register (halfword to word)
+ */
+uint64_t HELPER(pwsll_hs)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+    uint8_t shamt =3D rs2 & 0x1F;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint32_t e1 =3D (uint16_t)((rs1 >> (i * 16)) & 0xFFFF);
+        uint32_t res =3D e1 << shamt;
+        rd |=3D ((uint64_t)res) << (i * 32);
+    }
+
+    return rd;
+}
+
+/**
+ * PWSLAI.H - Packed widening signed shift left immediate (halfword to wor=
d)
+ */
+uint64_t HELPER(pwslai_h)(CPURISCVState *env, uint32_t rs1, uint32_t imm)
+{
+    uint64_t rd =3D 0;
+    uint8_t shamt =3D imm & 0x1F;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int16_t)((rs1 >> (i * 16)) & 0xFFFF);
+        int32_t res =3D e1 << shamt;
+        rd |=3D ((uint64_t)(uint32_t)res) << (i * 32);
+    }
+
+    return rd;
+}
+
+/**
+ * PWSLA.HS - Packed widening signed shift left from register (halfword to=
 word)
+ */
+uint64_t HELPER(pwsla_hs)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+    uint8_t shamt =3D rs2 & 0x1F;
+
+    for (int i =3D 0; i < 2; i++) {
+        int32_t e1 =3D (int16_t)((rs1 >> (i * 16)) & 0xFFFF);
+        int32_t res =3D e1 << shamt;
+        rd |=3D ((uint64_t)(uint32_t)res) << (i * 32);
+    }
+
+    return rd;
+}
+
+/**
+ * WADD - Widening signed addition (RV32)
+ */
+uint64_t HELPER(wadd)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int64_t a =3D (int32_t)rs1;
+    int64_t b =3D (int32_t)rs2;
+    return (uint64_t)(a + b);
+}
+
+/**
+ * WADDA - Widening signed addition with accumulate (RV32)
+ */
+uint64_t HELPER(wadda)(CPURISCVState *env, uint32_t rs1,
+                       uint32_t rs2, uint64_t rd)
+{
+    int64_t a =3D (int32_t)rs1;
+    int64_t b =3D (int32_t)rs2;
+    int64_t acc =3D (int64_t)rd;
+    return (uint64_t)(acc + a + b);
+}
+
+/**
+ * WADDU - Widening unsigned addition (RV32)
+ */
+uint64_t HELPER(waddu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t a =3D rs1;
+    uint64_t b =3D rs2;
+    return a + b;
+}
+
+/**
+ * WADDAU - Widening unsigned addition with accumulate (RV32)
+ */
+uint64_t HELPER(waddau)(CPURISCVState *env, uint32_t rs1,
+                        uint32_t rs2, uint64_t rd)
+{
+    uint64_t acc =3D rd;
+    return acc + rs1 + rs2;
+}
+
+/**
+ * WSUB - Widening signed subtraction (RV32)
+ */
+uint64_t HELPER(wsub)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int64_t a =3D (int32_t)rs1;
+    int64_t b =3D (int32_t)rs2;
+    return (uint64_t)(a - b);
+}
+
+/**
+ * WSUBA - Widening signed subtraction with accumulate (RV32)
+ */
+uint64_t HELPER(wsuba)(CPURISCVState *env, uint32_t rs1,
+                       uint32_t rs2, uint64_t rd)
+{
+    int64_t a =3D (int32_t)rs1;
+    int64_t b =3D (int32_t)rs2;
+    int64_t acc =3D (int64_t)rd;
+    return (uint64_t)(acc + a - b);
+}
+
+/**
+ * WSUBU - Widening unsigned subtraction (RV32)
+ */
+uint64_t HELPER(wsubu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t a =3D rs1;
+    uint64_t b =3D rs2;
+    return a - b;
+}
+
+/**
+ * WSUBAU - Widening unsigned subtraction with accumulate (RV32)
+ */
+uint64_t HELPER(wsubau)(CPURISCVState *env, uint32_t rs1,
+                        uint32_t rs2, uint64_t rd)
+{
+    uint64_t acc =3D rd;
+    return acc + rs1 - rs2;
+}
+
+/**
+ * WSLLI - Widening logical shift left immediate (RV32)
+ */
+uint64_t HELPER(wslli)(CPURISCVState *env, uint32_t rs1, uint32_t imm)
+{
+    uint64_t a =3D rs1;
+    uint8_t shamt =3D imm & 0x3F;
+    return a << shamt;
+}
+
+/**
+ * WSLL - Widening logical shift left from register (RV32)
+ */
+uint64_t HELPER(wsll)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t a =3D rs1;
+    uint8_t shamt =3D rs2 & 0x3F;
+    return a << shamt;
+}
+
+/**
+ * WSLAI - Widening signed shift left immediate (RV32)
+ */
+uint64_t HELPER(wslai)(CPURISCVState *env, uint32_t rs1, uint32_t imm)
+{
+    int64_t a =3D (int32_t)rs1;
+    uint8_t shamt =3D imm & 0x3F;
+    return (uint64_t)(a << shamt);
+}
+
+/**
+ * WSLA - Widening signed shift left from register (RV32)
+ */
+uint64_t HELPER(wsla)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int64_t a =3D (int32_t)rs1;
+    uint8_t shamt =3D rs2 & 0x3F;
+    return (uint64_t)(a << shamt);
+}
+
+/**
+ * WZIP8P - Double-width interleave bytes (RV32)
+ */
+uint64_t HELPER(wzip8p)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint64_t b1 =3D (uint64_t)EXTRACT8(rs1, i) << (16 * i);
+        uint64_t b2 =3D (uint64_t)EXTRACT8(rs2, i) << (16 * i + 8);
+        rd =3D rd | b2 | b1;
+    }
+
+    return rd;
+}
+
+/**
+ * WZIP16P - Double-width interleave halfwords (RV32)
+ */
+uint64_t HELPER(wzip16p)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint64_t h1 =3D (uint64_t)EXTRACT16(rs1, i) << (32 * i);
+        uint64_t h2 =3D (uint64_t)EXTRACT16(rs2, i) << (32 * i + 16);
+        rd =3D rd | h2 | h1;
+    }
+
+    return rd;
+}
+
+/**
+ * PREDSUM.DBS - Double-width signed reduction sum of bytes (RV32)
+ */
+uint32_t HELPER(predsum_dbs)(CPURISCVState *env, uint32_t rs1_lo,
+                             uint32_t rs1_hi, uint32_t rs2)
+{
+    int64_t sum =3D (int32_t)rs2;
+    int64_t s1 =3D ((int64_t)rs1_hi << 32) | rs1_lo;
+
+    for (int i =3D 0; i < 8; i++) {
+        int8_t b =3D (int8_t)((s1 >> (i * 8)) & 0xFF);
+        sum +=3D b;
+    }
+
+    return (uint32_t)sum;
+}
+
+/**
+ * PREDSUMU.DBS - Double-width unsigned reduction sum of bytes (RV32)
+ */
+uint32_t HELPER(predsumu_dbs)(CPURISCVState *env, uint32_t rs1_lo,
+                              uint32_t rs1_hi, uint32_t rs2)
+{
+    uint64_t sum =3D rs2;
+    uint64_t s1 =3D ((uint64_t)rs1_hi << 32) | rs1_lo;
+
+    for (int i =3D 0; i < 8; i++) {
+        uint8_t b =3D (uint8_t)((s1 >> (i * 8)) & 0xFF);
+        sum +=3D b;
+    }
+
+    return (uint32_t)sum;
+}
+
+/**
+ * PREDSUM.DHS - Double-width signed reduction sum of halfwords (RV32)
+ */
+uint32_t HELPER(predsum_dhs)(CPURISCVState *env, uint32_t rs1_lo,
+                             uint32_t rs1_hi, uint32_t rs2)
+{
+    int64_t sum =3D (int32_t)rs2;
+    int64_t s1 =3D ((int64_t)rs1_hi << 32) | rs1_lo;
+
+    for (int i =3D 0; i < 4; i++) {
+        int16_t h =3D (int16_t)((s1 >> (i * 16)) & 0xFFFF);
+        sum +=3D h;
+    }
+
+    return (uint32_t)sum;
+}
+
+/**
+ * PREDSUMU.DHS - Double-width unsigned reduction sum of halfwords (RV32)
+ */
+uint32_t HELPER(predsumu_dhs)(CPURISCVState *env, uint32_t rs1_lo,
+                              uint32_t rs1_hi, uint32_t rs2)
+{
+    uint64_t sum =3D rs2;
+    uint64_t s1 =3D ((uint64_t)rs1_hi << 32) | rs1_lo;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t h =3D (uint16_t)((s1 >> (i * 16)) & 0xFFFF);
+        sum +=3D h;
+    }
+
+    return (uint32_t)sum;
+}
+
+
+/* Narrowing Operations (RV32 only, register pair sources) */
+
+/**
+ * PNSRLI.B - Narrowing logical shift right immediate (64-bit to 32-bit)
+ */
+uint32_t HELPER(pnsrli_b)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        uint8_t result =3D (s1_h >> (shamt & 0xF)) & 0xFF;
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    return rd;
+}
+
+/**
+ * PNSRL.BS - Narrowing logical shift right from register (64-bit to 32-bi=
t)
+ */
+uint32_t HELPER(pnsrl_bs)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        uint32_t s1_h_z32 =3D (uint32_t)s1_h;
+        uint8_t result =3D (s1_h_z32 >> (shamt & 0x1F)) & 0xFF;
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    return rd;
+}
+
+/**
+ * PNSRAI.B - Narrowing arithmetic shift right immediate (64-bit to 32-bit)
+ */
+uint32_t HELPER(pnsrai_b)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        int32_t s1_h_s32 =3D (int32_t)(int16_t)s1_h;
+        int32_t s1_h_s24 =3D (s1_h_s32 << 8) >> 8;
+        uint8_t result =3D s1_h_s24 >> (shamt & 0xF) & 0xFF;
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    return rd;
+}
+
+/**
+ * PNSRA.BS - Narrowing arithmetic shift right from register (64-bit to 32=
-bit)
+ */
+uint32_t HELPER(pnsra_bs)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        int64_t s1_h_s64 =3D (int64_t)(int16_t)s1_h;
+        s1_h_s64 =3D (s1_h_s64 << 24) >> 24;
+        uint8_t result =3D s1_h_s64 >> (shamt & 0x1F) & 0xFF;
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    return rd;
+}
+
+/**
+ * PNSRARI.B - Narrowing arithmetic shift right with rounding
+ * immediate (64-bit to 32-bit)
+ */
+uint32_t HELPER(pnsrari_b)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        int32_t s1_h_s32 =3D (int32_t)(int16_t)s1_h;
+        int32_t s1_h_s24 =3D (s1_h_s32 << 8) >> 8;
+        uint32_t shx_25bit =3D ((uint32_t)s1_h_s24 << 1);
+        uint32_t shx =3D (shx_25bit >> (shamt & 0xF)) & 0x1FF;
+        uint8_t result =3D ((shx + 1) >> 1) & 0xFF;
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    return rd;
+}
+
+/**
+ * PNSRAR.BS - Narrowing arithmetic shift right with rounding
+ * from register (64-bit to 32-bit)
+ */
+uint32_t HELPER(pnsrar_bs)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        int64_t s1_h_s64 =3D (int64_t)(int16_t)s1_h;
+        int64_t s1_h_s40 =3D (s1_h_s64 << 24) >> 24;
+        uint64_t shx_41bit =3D ((uint64_t)s1_h_s40 << 1);
+        uint64_t shx =3D (shx_41bit >> (shamt & 0x1F)) & 0x1FF;
+        uint8_t result =3D ((shx + 1) >> 1) & 0xFF;
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    return rd;
+}
+
+/**
+ * PNCLIPI.B - Narrowing clip signed (64-bit to 32-bit) with immediate shi=
ft
+ */
+uint32_t HELPER(pnclipi_b)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        int32_t s1_h_s32 =3D (int32_t)(int16_t)s1_h;
+        int16_t shx =3D (int16_t)(s1_h_s32 >> (shamt & 0xF));
+        uint8_t result =3D 0;
+
+        if (shx < -128) {
+            sat =3D 1;
+            result =3D 0x80; /* -128 */
+        } else if (shx > 127) {
+            sat =3D 1;
+            result =3D 0x7F; /* 127 */
+        } else {
+            result =3D (uint8_t)shx;
+        }
+        rd |=3D ((uint32_t)result << (i * 8));
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPRI.B - Narrowing clip signed with rounding
+ * (64-bit to 32-bit) with immediate shift
+ */
+uint32_t HELPER(pnclipri_b)(CPURISCVState *env, uint64_t s1, uint32_t sham=
t)
+{
+    uint32_t rd =3D 0;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        int32_t s1_h_s32 =3D (int32_t)(int16_t)s1_h;
+        uint64_t shx_33bit =3D ((uint32_t)s1_h_s32 << 1);
+        uint32_t shx =3D (shx_33bit >> (shamt & 0xF)) & 0x1FFFF;
+        uint16_t round_shx =3D (uint16_t)((shx + 1) >> 1);
+        int16_t round_shx_s =3D (int16_t)round_shx;
+        uint8_t result =3D 0;
+
+        if (round_shx_s < -128) {
+            sat =3D 1;
+            result =3D 0x80;
+        } else if (round_shx_s > 127) {
+            sat =3D 1;
+            result =3D 0x7F;
+        } else {
+            result =3D (uint8_t)round_shx;
+        }
+
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPIU.B - Narrowing clip unsigned (64-bit to 32-bit) with immediate =
shift
+ */
+uint32_t HELPER(pnclipiu_b)(CPURISCVState *env, uint64_t s1, uint32_t sham=
t)
+{
+    uint32_t rd =3D 0;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        uint16_t shx =3D s1_h >> (shamt & 0xF);
+        uint8_t result =3D 0;
+
+        if (shx > 0x00FF) {
+            sat =3D 1;
+            result =3D 0xFF;
+        } else {
+            result =3D (uint8_t)(shx & 0xFF);
+        }
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPRIU.B - Narrowing clip unsigned with rounding
+ * (64-bit to 32-bit) with immediate shift
+ */
+uint32_t HELPER(pnclipriu_b)(CPURISCVState *env, uint64_t s1, uint32_t sha=
mt)
+{
+    uint32_t rd =3D 0;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        uint32_t shx_17bit =3D ((uint32_t)s1_h << 1);
+        uint32_t shx =3D shx_17bit >> (shamt & 0xF);
+        uint16_t round_shx =3D (uint16_t)((shx + 1) >> 1);
+        uint8_t result =3D 0;
+
+        if (round_shx > 0x00FF) {
+            sat =3D 1;
+            result =3D 0xFF;
+        } else {
+            result =3D (uint8_t)(round_shx & 0xFF);
+        }
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIP.BS - Narrowing clip signed from register (64-bit to 32-bit)
+ */
+uint32_t HELPER(pnclip_bs)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        int64_t s1_h_s64 =3D (int64_t)(int16_t)s1_h;
+        int64_t s1_h_s48 =3D (s1_h_s64 << 16) >> 16;
+        int16_t shx =3D (int16_t)(s1_h_s48 >> (shamt & 0x1F));
+        uint8_t result =3D 0;
+
+        if (shx < -128) {
+            sat =3D 1;
+            result =3D 0x80;
+        } else if (shx > 127) {
+            sat =3D 1;
+            result =3D 0x7F;
+        } else {
+            result =3D (uint8_t)shx;
+        }
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPR.BS - Narrowing clip signed with rounding
+ * from register (64-bit to 32-bit)
+ */
+uint32_t HELPER(pnclipr_bs)(CPURISCVState *env, uint64_t s1, uint32_t sham=
t)
+{
+    uint32_t rd =3D 0;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        int64_t s1_h_s64 =3D (int64_t)(int16_t)s1_h;
+        int64_t s1_h_s48 =3D (s1_h_s64 << 16) >> 16;
+        uint64_t shx_49bit =3D ((uint64_t)s1_h_s48 << 1);
+        uint32_t shx =3D (shx_49bit >> (shamt & 0x1F)) & 0x1FFFF;
+        uint16_t round_shx =3D (uint16_t)((shx + 1) >> 1);
+        int16_t round_shx_s =3D (int16_t)round_shx;
+        uint8_t result =3D 0;
+
+        if (round_shx_s < -128) {
+            sat =3D 1;
+            result =3D 0x80;
+        } else if (round_shx_s > 127) {
+            sat =3D 1;
+            result =3D 0x7F;
+        } else {
+            result =3D (uint8_t)round_shx;
+        }
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPU.BS - Narrowing clip unsigned from register (64-bit to 32-bit)
+ */
+uint32_t HELPER(pnclipu_bs)(CPURISCVState *env, uint64_t s1, uint32_t sham=
t)
+{
+    uint32_t rd =3D 0;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        uint32_t s1_h_z32 =3D (uint32_t)s1_h;
+        uint16_t shx =3D (s1_h_z32 >> (shamt & 0x1F)) & 0xFFFF;
+        uint8_t result =3D 0;
+
+        if (shx > 0x00FF) {
+            sat =3D 1;
+            result =3D 0xFF;
+        } else {
+            result =3D (uint8_t)(shx & 0xFF);
+        }
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPRU.BS - Narrowing clip unsigned with rounding
+ * from register (64-bit to 32-bit)
+ */
+uint32_t HELPER(pnclipru_bs)(CPURISCVState *env, uint64_t s1, uint32_t sha=
mt)
+{
+    uint32_t rd =3D 0;
+    int sat =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF;
+        uint32_t s1_h_z32 =3D (uint32_t)s1_h;
+        uint64_t shx_33bit =3D ((uint64_t)s1_h_z32 << 1);
+        uint32_t shx =3D (shx_33bit >> (shamt & 0x1F)) & 0x1FFFF;
+        uint16_t round_shx =3D (uint16_t)((shx + 1) >> 1);
+        uint8_t result =3D 0;
+
+        if (round_shx > 0x00FF) {
+            sat =3D 1;
+            result =3D 0xFF;
+        } else {
+            result =3D (uint8_t)(round_shx & 0xFF);
+        }
+        rd |=3D ((uint32_t)result) << (i * 8);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PNSRLI.H - Narrowing logical shift right immediate
+ * (64-bit to 32-bit, word to halfword)
+ */
+uint32_t HELPER(pnsrli_h)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+    uint32_t s1_low  =3D (uint32_t)(s1 & 0xFFFFFFFF);
+    uint32_t s1_high =3D (uint32_t)((s1 >> 32) & 0xFFFFFFFF);
+
+    uint16_t rd_low  =3D (s1_low  >> (shamt & 0x1F)) & 0xFFFF;
+    uint16_t rd_high =3D (s1_high >> (shamt & 0x1F)) & 0xFFFF;
+
+    rd =3D ((uint32_t)rd_high << 16) | rd_low;
+    return rd;
+}
+
+/**
+ * PNSRAI.H - Narrowing arithmetic shift right immediate
+ * (64-bit to 32-bit, word to halfword)
+ */
+uint32_t HELPER(pnsrai_h)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+    uint32_t s1_low  =3D (uint32_t)(s1 & 0xFFFFFFFF);
+    int64_t s1_low_s64 =3D (int64_t)(int32_t)s1_low;
+    int64_t s1_low_s48 =3D (s1_low_s64 << 16) >> 16;
+
+    uint32_t s1_high =3D (uint32_t)((s1 >> 32) & 0xFFFFFFFF);
+    int64_t s1_high_s64 =3D (int64_t)(int32_t)s1_high;
+    int64_t s1_high_s48 =3D (s1_high_s64 << 16) >> 16;
+
+    uint16_t rd_low  =3D (s1_low_s48  >> (shamt & 0x1F)) & 0xFFFF;
+    uint16_t rd_high =3D (s1_high_s48 >> (shamt & 0x1F)) & 0xFFFF;
+
+    rd =3D ((uint32_t)rd_high << 16) | rd_low;
+    return rd;
+}
+
+/**
+ * PNSRARI.H - Narrowing arithmetic shift right with rounding
+ * immediate (64-bit to 32-bit, word to halfword)
+ */
+uint32_t HELPER(pnsrari_h)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint32_t s1_w =3D (s1 >> (i * 32)) & 0xFFFFFFFF;
+        int64_t s1_w_s64 =3D (int64_t)(int32_t)s1_w;
+        int64_t s1_w_s48 =3D (s1_w_s64 << 16) >> 16;
+        uint64_t shx_49bit =3D ((uint64_t)s1_w_s48 << 1);
+        uint32_t shx =3D (shx_49bit >> (shamt & 0x1F)) & 0x1FFFF;
+        rd |=3D (uint32_t)(uint16_t)((shx + 1) >> 1) << (i * 16);
+    }
+
+    return rd;
+}
+
+/**
+ * PNSRL.HS - Narrowing logical shift right from register
+ * (64-bit to 32-bit, word to halfword)
+ */
+uint32_t HELPER(pnsrl_hs)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+    uint32_t s1_low  =3D (uint32_t)(s1 & 0xFFFFFFFF);
+    uint32_t s1_high =3D (uint32_t)((s1 >> 32) & 0xFFFFFFFF);
+
+    uint16_t rd_low  =3D (s1_low  >> (shamt & 0x1F)) & 0xFFFF;
+    uint16_t rd_high =3D (s1_high >> (shamt & 0x1F)) & 0xFFFF;
+
+    rd =3D ((uint32_t)rd_high << 16) | rd_low;
+    return rd;
+}
+
+/**
+ * PNSRA.HS - Narrowing arithmetic shift right from register
+ * (64-bit to 32-bit, word to halfword)
+ */
+uint32_t HELPER(pnsra_hs)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+    int64_t s1_low  =3D (int64_t)(int32_t)(s1 & 0xFFFFFFFF);
+    int64_t s1_high =3D (int64_t)(int32_t)((s1 >> 32) & 0xFFFFFFFF);
+
+    uint16_t rd_low  =3D (s1_low  >> (shamt & 0x1F)) & 0xFFFF;
+    uint16_t rd_high =3D (s1_high >> (shamt & 0x1F)) & 0xFFFF;
+
+    rd =3D ((uint32_t)rd_high << 16) | rd_low;
+    return rd;
+}
+
+/**
+ * PNSRAR.HS - Narrowing arithmetic shift right with rounding
+ * from register (64-bit to 32-bit, word to halfword)
+ */
+uint32_t HELPER(pnsrar_hs)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint32_t s1_w =3D (s1 >> (i * 32)) & 0xFFFFFFFF;
+        int64_t s1_w_s64 =3D (int64_t)(int32_t)s1_w;
+        int64_t s1_w_s48 =3D (s1_w_s64 << 16) >> 16;
+        uint64_t shx_49bit =3D ((uint64_t)s1_w_s48 << 1);
+        uint32_t shx =3D (shx_49bit >> (shamt & 0x1F)) & 0x1FFFF;
+        rd |=3D (uint32_t)(uint16_t)((shx + 1) >> 1) << (i * 16);
+    }
+
+    return rd;
+}
+
+/**
+ * PNCLIP.HS - Narrowing signed clip from register shift (word to halfword)
+ * For each word: arithmetic right shift, clip to signed 16-bit
+ *   shx =3D (int32_t)rs1[i] >> shamt
+ *   result =3D sat16(shx)
+ */
+uint32_t HELPER(pnclip_hs)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint32_t rd =3D 0;
+    int sat =3D 0;
+    uint8_t shift =3D shamt & 0x1F;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint32_t s1_w =3D EXTRACT32(s1, i);
+        int64_t s1_w_s64 =3D (int64_t)(int32_t)s1_w;
+        int32_t shx =3D (int32_t)(s1_w_s64 >> shift);
+        uint16_t result;
+
+        if (shx < -32768) {
+            sat =3D 1;
+            result =3D 0x8000;
+        } else if (shx > 32767) {
+            sat =3D 1;
+            result =3D 0x7FFF;
+        } else {
+            result =3D (uint16_t)shx;
+        }
+
+        rd =3D INSERT16(rd, result, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPR.HS - Narrowing signed clip with rounding
+ * from register (word to halfword)
+ * For each word: ((int32_t)rs1[i] << 1) >> shamt, round, clip to signed 1=
6-bit
+ *   shx_65bit =3D ((int64_t)rs1[i] << 1)
+ *   shx =3D (shx_65bit >> shamt) & mask
+ *   round =3D (shx + 1) >> 1
+ *   result =3D sat16(round)
+ */
+uint32_t HELPER(pnclipr_hs)(CPURISCVState *env, uint64_t s1, uint32_t sham=
t)
+{
+    uint32_t rd =3D 0;
+    int sat =3D 0;
+    uint8_t shift =3D shamt & 0x1F;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint32_t s1_w =3D EXTRACT32(s1, i);
+        int64_t s1_w_s64 =3D (int64_t)(int32_t)s1_w;
+        __uint128_t shx_65bit =3D (__uint128_t)s1_w_s64 << 1;
+        uint64_t shx =3D (uint64_t)(shx_65bit >> shift) & 0x1FFFFFFFF;
+        int32_t round_shx =3D (int32_t)((shx + 1) >> 1);
+        uint16_t result;
+
+        if (round_shx < -32768) {
+            sat =3D 1;
+            result =3D 0x8000;
+        } else if (round_shx > 32767) {
+            sat =3D 1;
+            result =3D 0x7FFF;
+        } else {
+            result =3D (uint16_t)round_shx;
+        }
+
+        rd =3D INSERT16(rd, result, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPI.H - Narrowing signed clip from immediate shift (word to halfwor=
d)
+ * For each word: rs1[i] >> imm, clip to signed 16-bit
+ */
+uint32_t HELPER(pnclipi_h)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    return HELPER(pnclip_hs)(env, s1, shamt);
+}
+
+/**
+ * PNCLIPRI.H - Narrowing signed clip with rounding
+ * from immediate shift (word to halfword)
+ * For each word: (rs1[i] << 1) >> imm, round, clip to signed 16-bit
+ */
+uint32_t HELPER(pnclipri_h)(CPURISCVState *env, uint64_t s1, uint32_t sham=
t)
+{
+    return HELPER(pnclipr_hs)(env, s1, shamt);
+}
+
+/**
+ * PNCLIPU.HS - Narrowing unsigned clip from register shift (word to halfw=
ord)
+ * For each word: shift right, clip to unsigned 16-bit
+ *   shx =3D rs1[i] >> shamt
+ *   result =3D (shx > 65535) ? 0xFFFF : shx
+ */
+uint32_t HELPER(pnclipu_hs)(CPURISCVState *env, uint64_t s1, uint32_t sham=
t)
+{
+    uint32_t rd =3D 0;
+    int sat =3D 0;
+    uint8_t shift =3D shamt & 0x1F;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint32_t s1_w =3D EXTRACT32(s1, i);
+        uint32_t shx =3D s1_w >> shift;
+        uint16_t result;
+
+        if (shx > 65535) {
+            sat =3D 1;
+            result =3D 0xFFFF;
+        } else {
+            result =3D (uint16_t)shx;
+        }
+
+        rd =3D INSERT16(rd, result, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPRU.HS - Narrowing unsigned clip with rounding
+ * from register (word to halfword)
+ * For each word: (rs1[i] << 1) >> shamt, round, clip to unsigned 16-bit
+ *   shx =3D ((rs1[i] << 1) >> shamt)
+ *   round =3D (shx + 1) >> 1
+ *   result =3D (round > 65535) ? 0xFFFF : round
+ */
+uint32_t HELPER(pnclipru_hs)(CPURISCVState *env, uint64_t s1, uint32_t sha=
mt)
+{
+    uint32_t rd =3D 0;
+    int sat =3D 0;
+    uint8_t shift =3D shamt & 0x1F;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint32_t s1_w =3D EXTRACT32(s1, i);
+        uint64_t shx_33bit =3D (uint64_t)s1_w << 1;
+        uint64_t shx =3D shx_33bit >> shift;
+        uint32_t round_shx =3D (uint32_t)((shx + 1) >> 1);
+        uint16_t result;
+
+        if (round_shx > 65535) {
+            sat =3D 1;
+            result =3D 0xFFFF;
+        } else {
+            result =3D (uint16_t)round_shx;
+        }
+
+        rd =3D INSERT16(rd, result, i);
+    }
+
+    if (sat) {
+        env->vxsat =3D 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPIU.H - Narrowing unsigned clip from immediate shift (word to half=
word)
+ * For each word: rs1[i] >> imm, clip to unsigned 16-bit
+ */
+uint32_t HELPER(pnclipiu_h)(CPURISCVState *env, uint64_t s1, uint32_t sham=
t)
+{
+    return HELPER(pnclipu_hs)(env, s1, shamt);
+}
+
+/**
+ * PNCLIPRIU.H - Narrowing unsigned clip with rounding
+ * from immediate shift (word to halfword)
+ * For each word: (rs1[i] << 1) >> imm, round, clip to unsigned 16-bit
+ */
+uint32_t HELPER(pnclipriu_h)(CPURISCVState *env, uint64_t s1, uint32_t sha=
mt)
+{
+    return HELPER(pnclipru_hs)(env, s1, shamt);
+}
+
+/**
+ * NSRLI - Narrowing logical shift right immediate (64-bit to 32-bit)
+ */
+uint32_t HELPER(nsrli)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    return (s1 >> (shamt & 0x3F)) & 0xFFFFFFFF;
+}
+
+/**
+ * NSRAI - Narrowing arithmetic shift right immediate (64-bit to 32-bit)
+ */
+uint32_t HELPER(nsrai)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    __int128_t s1_s128 =3D (__int128_t)((int64_t)s1);
+    __int128_t s1_s96 =3D (s1_s128 << 32) >> 32;
+    return (uint32_t)(s1_s96 >> (shamt & 0x3F)) & 0xFFFFFFFF;
+}
+
+/**
+ * NSRARI - Narrowing arithmetic shift right with rounding
+ * immediate (64-bit to 32-bit)
+ */
+uint32_t HELPER(nsrari)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    __int128_t s1_s128 =3D (__int128_t)((int64_t)s1);
+    __int128_t s1_s96 =3D (s1_s128 << 32) >> 32;
+    __uint128_t shx_97bit =3D ((__uint128_t)s1_s96 << 1);
+    uint64_t shx =3D (uint64_t)(shx_97bit >> (shamt & 0x3F)) & 0x1FFFFFFFF;
+    return (uint32_t)((shx + 1) >> 1);
+}
+
+/**
+ * NSRL - Narrowing logical shift right from register (64-bit to 32-bit)
+ */
+uint32_t HELPER(nsrl)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    return (s1 >> (shamt & 0x3F)) & 0xFFFFFFFF;
+}
+
+/**
+ * NSRA - Narrowing arithmetic shift right from register (64-bit to 32-bit)
+ */
+uint32_t HELPER(nsra)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    __int128_t s1_s128 =3D (__int128_t)((int64_t)s1);
+    __int128_t s1_s96 =3D (s1_s128 << 32) >> 32;
+    return (uint32_t)(s1_s96 >> (shamt & 0x3F)) & 0xFFFFFFFF;
+}
+
+/**
+ * NSRAR - Narrowing arithmetic shift right with rounding
+ * from register (64-bit to 32-bit)
+ */
+uint32_t HELPER(nsrar)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    __int128_t s1_s128 =3D (__int128_t)((int64_t)s1);
+    __int128_t s1_s96 =3D (s1_s128 << 32) >> 32;
+    __uint128_t shx_97bit =3D ((__uint128_t)s1_s96 << 1);
+    uint64_t shx =3D (uint64_t)(shx_97bit >> (shamt & 0x3F)) & 0x1FFFFFFFF;
+    return (uint32_t)((shx + 1) >> 1);
+}
+
+/**
+ * NCLIPI - Narrowing clip signed with immediate shift (64-bit to 32-bit)
+ */
+uint32_t HELPER(nclipi)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    __int128_t s1_s128 =3D (__int128_t)((int64_t)s1);
+    int64_t shx =3D (int64_t)(s1_s128 >> (shamt & 0x3F));
+
+    if (shx < -2147483648LL) {
+        env->vxsat =3D 1;
+        return 0x80000000U;
+    } else if (shx > 2147483647LL) {
+        env->vxsat =3D 1;
+        return 0x7FFFFFFFU;
+    } else {
+        return (uint32_t)(shx & 0xFFFFFFFF);
+    }
+}
+
+/**
+ * NCLIPRI - Narrowing clip signed with rounding and immediate
+ * shift (64-bit to 32-bit)
+ */
+uint32_t HELPER(nclipri)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    typedef struct {
+        __uint128_t low;
+        uint8_t high;
+    } Uint129;
+
+    Uint129 left_shift_1(__int128_t s1_s128)
+    {
+        Uint129 result;
+        __uint128_t us1 =3D (__uint128_t)s1_s128;
+        result.low =3D us1 << 1;
+        result.high =3D (us1 >> 127) & 0x1;
+        return result;
+    }
+
+    Uint129 right_shift(Uint129 val, uint32_t smt)
+    {
+        Uint129 result;
+        if (smt =3D=3D 0) {
+            return val;
+        } else if (smt >=3D 129) {
+            result.low =3D 0;
+            result.high =3D 0;
+        } else if (smt =3D=3D 128) {
+            result.low =3D val.high;
+            result.high =3D 0;
+        } else {
+            result.low =3D (val.low >> smt) |
+                         ((__uint128_t)val.high << (128 - smt));
+            result.high =3D (val.high >> smt);
+        }
+        return result;
+    }
+
+    __int128_t s1_s128 =3D (__int128_t)((int64_t)s1);
+    Uint129 shx_129bit =3D left_shift_1(s1_s128);
+    Uint129 shx =3D right_shift(shx_129bit, shamt & 0x3F);
+    int64_t round_shx =3D (int64_t)((shx.low + 1) >> 1);
+
+    if (round_shx < -2147483648LL) {
+        env->vxsat =3D 1;
+        return 0x80000000U;
+    } else if (round_shx > 2147483647LL) {
+        env->vxsat =3D 1;
+        return 0x7FFFFFFFU;
+    } else {
+        return (uint32_t)round_shx;
+    }
+}
+
+/**
+ * NCLIPIU - Narrowing clip unsigned with immediate shift (64-bit to 32-bi=
t)
+ */
+uint32_t HELPER(nclipiu)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint64_t shx =3D s1 >> (shamt & 0x3F);
+
+    if (shx > 4294967295ULL) {
+        env->vxsat =3D 1;
+        return 0xFFFFFFFFU;
+    } else {
+        return (uint32_t)(shx & 0xFFFFFFFF);
+    }
+}
+
+/**
+ * NCLIPRIU - Narrowing clip unsigned with rounding and immediate
+ * shift (64-bit to 32-bit)
+ */
+uint32_t HELPER(nclipriu)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    __uint128_t shx_65bit =3D (__uint128_t)s1 << 1;
+    __uint128_t shx =3D shx_65bit >> (shamt & 0x3F);
+    uint64_t round_shx =3D (shx + 1) >> 1;
+
+    if (round_shx > 4294967295ULL) {
+        env->vxsat =3D 1;
+        return 0xFFFFFFFFU;
+    } else {
+        return (uint32_t)(round_shx & 0xFFFFFFFF);
+    }
+}
+
+/**
+ * NCLIP - Narrowing clip signed from register (64-bit to 32-bit)
+ */
+uint32_t HELPER(nclip)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    __int128_t s1_s128 =3D (__int128_t)((int64_t)s1);
+    int64_t shx =3D (int64_t)(s1_s128 >> (shamt & 0x3F));
+
+    if (shx < -2147483648LL) {
+        env->vxsat =3D 1;
+        return 0x80000000U;
+    } else if (shx > 2147483647LL) {
+        env->vxsat =3D 1;
+        return 0x7FFFFFFFU;
+    } else {
+        return (uint32_t)(shx & 0xFFFFFFFF);
+    }
+}
+
+/**
+ * NCLIPR - Narrowing clip signed with rounding from register (64-bit to 3=
2-bit)
+ */
+uint32_t HELPER(nclipr)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    typedef struct {
+        __uint128_t low;
+        uint8_t high;
+    } Uint129;
+
+    Uint129 left_shift_1(__int128_t s1_s128)
+    {
+        Uint129 result;
+        __uint128_t us1 =3D (__uint128_t)s1_s128;
+        result.low =3D us1 << 1;
+        result.high =3D (us1 >> 127) & 0x1;
+        return result;
+    }
+
+    Uint129 right_shift(Uint129 val, uint32_t smt)
+    {
+        Uint129 result;
+        if (smt =3D=3D 0) {
+            return val;
+        } else if (smt >=3D 129) {
+            result.low =3D 0;
+            result.high =3D 0;
+        } else if (smt =3D=3D 128) {
+            result.low =3D val.high;
+            result.high =3D 0;
+        } else {
+            result.low =3D (val.low >> smt) |
+                         ((__uint128_t)val.high << (128 - smt));
+            result.high =3D (val.high >> smt);
+        }
+        return result;
+    }
+
+    __int128_t s1_s128 =3D (__int128_t)((int64_t)s1);
+    Uint129 shx_129bit =3D left_shift_1(s1_s128);
+    Uint129 shx =3D right_shift(shx_129bit, shamt & 0x3F);
+    int64_t round_shx =3D (int64_t)((shx.low + 1) >> 1);
+
+    if (round_shx < -2147483648LL) {
+        env->vxsat =3D 1;
+        return 0x80000000U;
+    } else if (round_shx > 2147483647LL) {
+        env->vxsat =3D 1;
+        return 0x7FFFFFFFU;
+    } else {
+        return (uint32_t)round_shx;
+    }
+}
+
+/**
+ * NCLIPU - Narrowing clip unsigned from register (64-bit to 32-bit)
+ */
+uint32_t HELPER(nclipu)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    uint64_t shx =3D s1 >> (shamt & 0x3F);
+
+    if (shx > 4294967295ULL) {
+        env->vxsat =3D 1;
+        return 0xFFFFFFFFU;
+    } else {
+        return (uint32_t)(shx & 0xFFFFFFFF);
+    }
+}
+
+/**
+ * NCLIPRU - Narrowing clip unsigned with rounding
+ * from register (64-bit to 32-bit)
+ */
+uint32_t HELPER(nclipru)(CPURISCVState *env, uint64_t s1, uint32_t shamt)
+{
+    __uint128_t shx_65bit =3D (__uint128_t)s1 << 1;
+    __uint128_t shx =3D shx_65bit >> (shamt & 0x3F);
+    uint64_t round_shx =3D (shx + 1) >> 1;
+
+    if (round_shx > 4294967295ULL) {
+        env->vxsat =3D 1;
+        return 0xFFFFFFFFU;
+    } else {
+        return (uint32_t)(round_shx & 0xFFFFFFFF);
+    }
+}
+
+/* Multiplication with Even-Odd Register Pairs as Destination (RV32 only) =
*/
+
+/**
+ * PMQWACC.H - Packed Q-format halfword to word multiply accumulate
+ */
+uint64_t HELPER(pmqwacc_h)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int16_t s1_h =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t s2_h =3D (int16_t)EXTRACT16(rs2, i);
+        int32_t d_w =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)s1_h * (int64_t)s2_h;
+        uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15));
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMQRWACC.H - Packed Q-format halfword to word multiply
+ * accumulate with rounding
+ */
+uint64_t HELPER(pmqrwacc_h)(CPURISCVState *env, uint32_t rs1,
+                            uint32_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int16_t s1_h =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t s2_h =3D (int16_t)EXTRACT16(rs2, i);
+        int32_t d_w =3D (int32_t)EXTRACT32(dest, i);
+        int64_t prod =3D (int64_t)s1_h * (int64_t)s2_h + (1LL << 14);
+        uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15));
+        rd =3D INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PWMUL.B - Widening byte to halfword multiplication
+ */
+uint64_t HELPER(pwmul_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        int8_t s1_b =3D (int8_t)EXTRACT8(rs1, i);
+        int8_t s2_b =3D (int8_t)EXTRACT8(rs2, i);
+        int16_t prod =3D (int16_t)s1_b * (int16_t)s2_b;
+        rd |=3D ((uint64_t)(uint16_t)prod) << (i * 16);
+    }
+    return rd;
+}
+
+/**
+ * PWMULSU.B - Widening signed x unsigned byte to halfword multiplication
+ */
+uint64_t HELPER(pwmulsu_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        int8_t s1_b =3D (int8_t)EXTRACT8(rs1, i);
+        uint8_t s2_b =3D EXTRACT8(rs2, i);
+        int16_t prod =3D (int16_t)s1_b * (uint16_t)s2_b;
+        rd |=3D ((uint64_t)(uint16_t)prod) << (i * 16);
+    }
+    return rd;
+}
+
+/**
+ * PWMULU.B - Widening unsigned byte to halfword multiplication
+ */
+uint64_t HELPER(pwmulu_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 4; i++) {
+        uint8_t s1_b =3D EXTRACT8(rs1, i);
+        uint8_t s2_b =3D EXTRACT8(rs2, i);
+        uint16_t prod =3D (uint16_t)s1_b * (uint16_t)s2_b;
+        rd |=3D ((uint64_t)prod) << (i * 16);
+    }
+    return rd;
+}
+
+/**
+ * PWMUL.H - Widening halfword to word multiplication
+ */
+uint64_t HELPER(pwmul_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int16_t s1_h =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t s2_h =3D (int16_t)EXTRACT16(rs2, i);
+        int32_t prod =3D (int32_t)s1_h * (int32_t)s2_h;
+        rd |=3D ((uint64_t)(uint32_t)prod) << (i * 32);
+    }
+    return rd;
+}
+
+/**
+ * PWMULSU.H - Widening signed x unsigned halfword to word multiplication
+ */
+uint64_t HELPER(pwmulsu_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int16_t s1_h =3D (int16_t)EXTRACT16(rs1, i);
+        uint16_t s2_h =3D EXTRACT16(rs2, i);
+        int32_t prod =3D (int32_t)s1_h * (uint32_t)s2_h;
+        rd |=3D ((uint64_t)(uint32_t)prod) << (i * 32);
+    }
+    return rd;
+}
+
+/**
+ * PWMULU.H - Widening unsigned halfword to word multiplication
+ */
+uint64_t HELPER(pwmulu_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint16_t s1_h =3D EXTRACT16(rs1, i);
+        uint16_t s2_h =3D EXTRACT16(rs2, i);
+        uint32_t prod =3D (uint32_t)s1_h * (uint32_t)s2_h;
+        rd |=3D ((uint64_t)prod) << (i * 32);
+    }
+    return rd;
+}
+
+/**
+ * PWMACC.H - Widening multiply accumulate (halfword to word)
+ */
+uint64_t HELPER(pwmacc_h)(CPURISCVState *env, uint32_t rs1,
+                          uint32_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int16_t s1_h =3D (int16_t)EXTRACT16(rs1, i);
+        int16_t s2_h =3D (int16_t)EXTRACT16(rs2, i);
+        int32_t d_w =3D (int32_t)EXTRACT32(dest, i);
+        int32_t prod =3D (int32_t)s1_h * (int32_t)s2_h;
+        uint32_t res =3D (uint32_t)(d_w + prod);
+        rd |=3D ((uint64_t)res) << (i * 32);
+    }
+    return rd;
+}
+
+/**
+ * PWMACCSU.H - Widening signed x unsigned multiply
+ * accumulate (halfword to word)
+ */
+uint64_t HELPER(pwmaccsu_h)(CPURISCVState *env, uint32_t rs1,
+                            uint32_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        int16_t s1_h =3D (int16_t)EXTRACT16(rs1, i);
+        uint16_t s2_h =3D EXTRACT16(rs2, i);
+        int32_t d_w =3D (int32_t)EXTRACT32(dest, i);
+        int32_t prod =3D (int32_t)s1_h * (uint32_t)s2_h;
+        uint32_t res =3D (uint32_t)(d_w + prod);
+        rd |=3D ((uint64_t)res) << (i * 32);
+    }
+    return rd;
+}
+
+/**
+ * PWMACCU.H - Widening unsigned multiply accumulate (halfword to word)
+ */
+uint64_t HELPER(pwmaccu_h)(CPURISCVState *env, uint32_t rs1,
+                           uint32_t rs2, uint64_t dest)
+{
+    uint64_t rd =3D 0;
+
+    for (int i =3D 0; i < 2; i++) {
+        uint16_t s1_h =3D EXTRACT16(rs1, i);
+        uint16_t s2_h =3D EXTRACT16(rs2, i);
+        uint32_t d_w =3D EXTRACT32(dest, i);
+        uint32_t prod =3D (uint32_t)s1_h * (uint32_t)s2_h;
+        uint32_t res =3D d_w + prod;
+        rd |=3D ((uint64_t)res) << (i * 32);
+    }
+    return rd;
+}
+
+/**
+ * MQWACC - Q-format word multiply accumulate
+ */
+uint64_t HELPER(mqwacc)(CPURISCVState *env, uint32_t rs1,
+                        uint32_t rs2, uint64_t dest)
+{
+    int64_t s1 =3D (int64_t)(int32_t)rs1;
+    int64_t s2 =3D (int64_t)(int32_t)rs2;
+    int64_t d =3D (int64_t)dest;
+    __int128_t prod =3D (__int128_t)s1 * (__int128_t)s2;
+    return (uint64_t)(d + (int64_t)(prod >> 31));
+}
+
+/**
+ * MQRWACC - Q-format word multiply accumulate with rounding
+ */
+uint64_t HELPER(mqrwacc)(CPURISCVState *env, uint32_t rs1,
+                         uint32_t rs2, uint64_t dest)
+{
+    int64_t s1 =3D (int64_t)(int32_t)rs1;
+    int64_t s2 =3D (int64_t)(int32_t)rs2;
+    int64_t d =3D (int64_t)dest;
+    __int128_t prod =3D (__int128_t)s1 * (__int128_t)s2 + (1LL << 30);
+    return (uint64_t)(d + (int64_t)(prod >> 31));
+}
+
+/**
+ * WMUL - Widening signed multiplication (32-bit to 64-bit, RV32)
+ */
+uint64_t HELPER(wmul)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    return (uint64_t)((int64_t)(int32_t)rs1 * (int64_t)(int32_t)rs2);
+}
+
+/**
+ * WMULSU - Widening signed x unsigned multiplication (32-bit to 64-bit, R=
V32)
+ */
+uint64_t HELPER(wmulsu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    return (uint64_t)((int64_t)(int32_t)rs1 * (uint64_t)rs2);
+}
+
+/**
+ * WMULU - Widening unsigned multiplication (32-bit to 64-bit, RV32)
+ */
+uint64_t HELPER(wmulu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    return (uint64_t)rs1 * (uint64_t)rs2;
+}
+
+/**
+ * WMACC - Widening multiply accumulate signed (32-bit to 64-bit, RV32)
+ */
+uint64_t HELPER(wmacc)(CPURISCVState *env, uint32_t rs1,
+                       uint32_t rs2, uint64_t dest)
+{
+    return (uint64_t)((int64_t)(int32_t)rs1 *
+                      (int64_t)(int32_t)rs2 + (int64_t)dest);
+}
+
+/**
+ * WMACCSU - Widening multiply accumulate signed x unsigned
+ * (32-bit to 64-bit, RV32)
+ */
+uint64_t HELPER(wmaccsu)(CPURISCVState *env, uint32_t rs1,
+                         uint32_t rs2, uint64_t dest)
+{
+    return (uint64_t)((int64_t)(int32_t)rs1 * (uint64_t)rs2 + (int64_t)des=
t);
+}
+
+/**
+ * WMACCU - Widening multiply accumulate unsigned (32-bit to 64-bit, RV32)
+ */
+uint64_t HELPER(wmaccu)(CPURISCVState *env, uint32_t rs1,
+                        uint32_t rs2, uint64_t dest)
+{
+    return (uint64_t)rs1 * (uint64_t)rs2 + (uint64_t)dest;
+}
+
+/**
+ * PM2WADD.H - Add two widening products (halfword to doubleword)
+ */
+uint64_t HELPER(pm2wadd_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int64_t prod0 =3D (int64_t)s1_h0 * (int64_t)s2_h0;
+    int64_t prod1 =3D (int64_t)s1_h1 * (int64_t)s2_h1;
+    return (uint64_t)(prod0 + prod1);
+}
+
+/**
+ * PM2WADDSU.H - Add two widening products
+ * (signed x unsigned, halfword to doubleword)
+ */
+uint64_t HELPER(pm2waddsu_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs=
2)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    uint16_t s2_h0 =3D EXTRACT16(rs2, 0);
+    uint16_t s2_h1 =3D EXTRACT16(rs2, 1);
+    int64_t prod0 =3D (int64_t)s1_h0 * (uint64_t)s2_h0;
+    int64_t prod1 =3D (int64_t)s1_h1 * (uint64_t)s2_h1;
+    return (uint64_t)(prod0 + prod1);
+}
+
+/**
+ * PM2WADDU.H - Add two widening products (unsigned, halfword to doublewor=
d)
+ */
+uint64_t HELPER(pm2waddu_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint16_t s1_h0 =3D EXTRACT16(rs1, 0);
+    uint16_t s1_h1 =3D EXTRACT16(rs1, 1);
+    uint16_t s2_h0 =3D EXTRACT16(rs2, 0);
+    uint16_t s2_h1 =3D EXTRACT16(rs2, 1);
+    uint64_t prod0 =3D (uint64_t)s1_h0 * (uint64_t)s2_h0;
+    uint64_t prod1 =3D (uint64_t)s1_h1 * (uint64_t)s2_h1;
+    return prod0 + prod1;
+}
+
+/**
+ * PM2WADDA.H - Add two widening products with accumulate
+ * (halfword to doubleword)
+ */
+uint64_t HELPER(pm2wadda_h)(CPURISCVState *env, uint32_t rs1,
+                            uint32_t rs2, uint64_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int64_t d_h =3D (int64_t)dest;
+    int64_t mul_00 =3D (int64_t)s1_h0 * (int64_t)s2_h0;
+    int64_t mul_11 =3D (int64_t)s1_h1 * (int64_t)s2_h1;
+    return (uint64_t)(d_h + mul_00 + mul_11);
+}
+
+/**
+ * PM2WADDASU.H - Add two widening products with accumulate
+ * (signed x unsigned, halfword to doubleword)
+ */
+uint64_t HELPER(pm2waddasu_h)(CPURISCVState *env, uint32_t rs1,
+                              uint32_t rs2, uint64_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    uint16_t s2_h0 =3D (uint16_t)EXTRACT16(rs2, 0);
+    uint16_t s2_h1 =3D (uint16_t)EXTRACT16(rs2, 1);
+    int64_t d_h =3D (int64_t)dest;
+    int64_t mul_00 =3D (int64_t)s1_h0 * (uint64_t)s2_h0;
+    int64_t mul_11 =3D (int64_t)s1_h1 * (uint64_t)s2_h1;
+    return (uint64_t)(d_h + mul_00 + mul_11);
+}
+
+/**
+ * PM2WADDAU.H - Add two widening products with accumulate
+ * (unsigned, halfword to doubleword)
+ */
+uint64_t HELPER(pm2waddau_h)(CPURISCVState *env, uint32_t rs1,
+                             uint32_t rs2, uint64_t dest)
+{
+    uint16_t s1_h0 =3D (uint16_t)EXTRACT16(rs1, 0);
+    uint16_t s1_h1 =3D (uint16_t)EXTRACT16(rs1, 1);
+    uint16_t s2_h0 =3D (uint16_t)EXTRACT16(rs2, 0);
+    uint16_t s2_h1 =3D (uint16_t)EXTRACT16(rs2, 1);
+    uint64_t d_h =3D (uint64_t)dest;
+    uint64_t mul_00 =3D (uint64_t)s1_h0 * (uint64_t)s2_h0;
+    uint64_t mul_11 =3D (uint64_t)s1_h1 * (uint64_t)s2_h1;
+    return (uint64_t)(d_h + mul_00 + mul_11);
+}
+
+/**
+ * PM2WADD.HX - Add two widening cross products (halfword to doubleword)
+ */
+uint64_t HELPER(pm2wadd_hx)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int64_t prod01 =3D (int64_t)s1_h0 * (int64_t)s2_h1;
+    int64_t prod10 =3D (int64_t)s1_h1 * (int64_t)s2_h0;
+    return (uint64_t)(prod01 + prod10);
+}
+
+/**
+ * PM2WADDA.HX - Add two widening cross products with accumulate
+ * (halfword to doubleword)
+ */
+uint64_t HELPER(pm2wadda_hx)(CPURISCVState *env, uint32_t rs1,
+                             uint32_t rs2, uint64_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int64_t d =3D (int64_t)dest;
+    int64_t prod01 =3D (int64_t)s1_h0 * (int64_t)s2_h1;
+    int64_t prod10 =3D (int64_t)s1_h1 * (int64_t)s2_h0;
+    return (uint64_t)(d + prod01 + prod10);
+}
+
+/**
+ * PM2WSUB.H - Subtract two widening products (halfword to doubleword)
+ */
+uint64_t HELPER(pm2wsub_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int64_t prod0 =3D (int64_t)s1_h0 * (int64_t)s2_h0;
+    int64_t prod1 =3D (int64_t)s1_h1 * (int64_t)s2_h1;
+    return (uint64_t)(prod0 - prod1);
+}
+
+/**
+ * PM2WSUB.HX - Subtract two widening cross products (halfword to doublewo=
rd)
+ */
+uint64_t HELPER(pm2wsub_hx)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int64_t prod01 =3D (int64_t)s1_h0 * (int64_t)s2_h1;
+    int64_t prod10 =3D (int64_t)s1_h1 * (int64_t)s2_h0;
+    return (uint64_t)(prod01 - prod10);
+}
+
+/**
+ * PM2WSUBA.H - Subtract two widening products with accumulate
+ * (halfword to doubleword)
+ */
+uint64_t HELPER(pm2wsuba_h)(CPURISCVState *env, uint32_t rs1,
+                            uint32_t rs2, uint64_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int64_t d =3D (int64_t)dest;
+    int64_t prod0 =3D (int64_t)s1_h0 * (int64_t)s2_h0;
+    int64_t prod1 =3D (int64_t)s1_h1 * (int64_t)s2_h1;
+    return (uint64_t)(d + prod0 - prod1);
+}
+
+/**
+ * PM2WSUBA.HX - Subtract two widening cross products with accumulate
+ * (halfword to doubleword)
+ */
+uint64_t HELPER(pm2wsuba_hx)(CPURISCVState *env, uint32_t rs1,
+                             uint32_t rs2, uint64_t dest)
+{
+    int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1);
+    int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0);
+    int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1);
+    int64_t d =3D (int64_t)dest;
+    int64_t prod01 =3D (int64_t)s1_h0 * (int64_t)s2_h1;
+    int64_t prod10 =3D (int64_t)s1_h1 * (int64_t)s2_h0;
+    return (uint64_t)(d + prod01 - prod10);
+}
--=20
2.34.1
From nobody Sat May 30 20:13:16 2026
Delivered-To: importer@patchew.org
Authentication-Results: mx.zohomail.com;
	spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as
 permitted sender)
  smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org
Return-Path: <qemu-devel-bounces+importer=patchew.org@nongnu.org>
Received: from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17]) by
 mx.zohomail.com
	with SMTPS id 1776422973305318.5674874282191;
 Fri, 17 Apr 2026 03:49:33 -0700 (PDT)
Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists1p.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <qemu-devel-bounces@nongnu.org>)
	id 1wDgjq-0001XR-Ue; Fri, 17 Apr 2026 06:47:54 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <xiaoou@iscas.ac.cn>)
 id 1wDgjo-0001Tu-Hj; Fri, 17 Apr 2026 06:47:52 -0400
Received: from smtp21.cstnet.cn ([159.226.251.21] helo=cstnet.cn)
 by eggs.gnu.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256)
 (Exim 4.90_1) (envelope-from <xiaoou@iscas.ac.cn>)
 id 1wDgjl-000846-F4; Fri, 17 Apr 2026 06:47:52 -0400
Received: from Huawei.localdomain (unknown [36.110.52.2])
 by APP-01 (Coremail) with SMTP id qwCowAB3H2ulD+JpLDmSDQ--.804S16;
 Fri, 17 Apr 2026 18:47:25 +0800 (CST)
From: Molly Chen <xiaoou@iscas.ac.cn>
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn,
	qemu-riscv@nongnu.org,
	qemu-devel@nongnu.org
Subject: [PATCH 14/14] target/riscv: rvp: update to v020,
 add SHL and PNCLIP[U]P.* instructions
Date: Fri, 17 Apr 2026 18:46:51 +0800
Message-Id: <20260417104652.17857-15-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-CM-TRANSID: qwCowAB3H2ulD+JpLDmSDQ--.804S16
X-Coremail-Antispam: 1UD129KBjvAXoWfJF4UWrWxCFyfXr4UGFyUZFb_yoW8Gw1xZo
 WrKw45Ar1fGw13u34F9w4UXr1UZr92vw1kGr48Zr42qas7Wr12gFn8J3s5AF40qrWayrW7
 XrZ3WryrtF1akr9rn29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3
 AaLaJ3UjIYCTnIWjp_UUUOb7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva
 j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l82xGYIkIc2x26280x7IE14v26r126s0DM28Irc
 Ia0xkI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l
 84ACjcxK6xIIjxv20xvE14v26ryj6F1UM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4UJV
 WxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_GcCE
 3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E2I
 x0cI8IcVAFwI0_Jrv_JF1lYx0Ex4A2jsIE14v26r4j6F4UMcvjeVCFs4IE7xkEbVWUJVW8
 JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lc7CjxVAaw2AFwI0_Jw
 0_GFyl42xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AK
 xVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r1q6r43MIIYrx
 kI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Gr0_Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v2
 6r4UJVWxJr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F
 4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjfU5Tmh
 DUUUU
X-Originating-IP: [36.110.52.2]
X-CM-SenderInfo: 50ld003x6l2u1dvotugofq/
Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17
 as permitted sender) client-ip=209.51.188.17;
 envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org;
 helo=lists1p.gnu.org;
Received-SPF: pass client-ip=159.226.251.21; envelope-from=xiaoou@iscas.ac.cn;
 helo=cstnet.cn
X-Spam_score_int: -21
X-Spam_score: -2.2
X-Spam_bar: --
X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, HK_RANDOM_ENVFROM=0.998,
 HK_RANDOM_FROM=0.998, RCVD_IN_DNSWL_MED=-2.3,
 RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001,
 SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: qemu development <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org
X-ZM-MESSAGEID: 1776422974548158500
Content-Type: text/plain; charset="utf-8"

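Add decode entries, translation hooks and helpers for the SHL-family
variable shifts (PSSHL[R].HS, PSSHL[R].WS, SSHL[R], SHL[R], plus the
register-pair forms PSSHL[R].DHS and PSSHL[R].DWS) and for the
PNCLIP[U]P.{B,H,W} instructions.

Signed-off-by: Molly Chen <xiaoou@iscas.ac.cn>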
---
 target/riscv/helper.h                   |  14 +
 target/riscv/insn32.decode              |  22 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  18 ++
 target/riscv/psimd_helper.c             | 370 ++++++++++++++++++++++++
 4 files changed, 424 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 85d4fe1b67..a9dbe53dbf 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1483,6 +1483,14 @@ DEF_HELPER_3(ssha, i32, env, i32, i32)
 DEF_HELPER_3(sshar, i32, env, i32, i32)
 DEF_HELPER_3(sha, i64, env, i64, i64)
 DEF_HELPER_3(shar, i64, env, i64, i64)
+DEF_HELPER_3(psshl_hs, tl, env, tl, tl)
+DEF_HELPER_3(psshlr_hs, tl, env, tl, tl)
+DEF_HELPER_3(psshl_ws, i64, env, i64, i64)
+DEF_HELPER_3(psshlr_ws, i64, env, i64, i64)
+DEF_HELPER_3(sshl, i32, env, i32, i32)
+DEF_HELPER_3(sshlr, i32, env, i32, i32)
+DEF_HELPER_3(shl, i64, env, i64, i64)
+DEF_HELPER_3(shlr, i64, env, i64, i64)
=20
 /* Packed SIMD - Exchange Operations */
 DEF_HELPER_3(pas_hx, tl, env, tl, tl)
@@ -1538,6 +1546,12 @@ DEF_HELPER_4(srx, tl, env, tl, tl, tl)
 DEF_HELPER_4(mvm, tl, env, tl, tl, tl)
 DEF_HELPER_4(mvmn, tl, env, tl, tl, tl)
 DEF_HELPER_4(merge, tl, env, tl, tl, tl)
+DEF_HELPER_3(pnclipp_b, i64, env, i64, i64)
+DEF_HELPER_3(pnclipup_b, i64, env, i64, i64)
+DEF_HELPER_3(pnclipp_h, i64, env, i64, i64)
+DEF_HELPER_3(pnclipup_h, i64, env, i64, i64)
+DEF_HELPER_3(pnclipp_w, i64, env, i64, i64)
+DEF_HELPER_3(pnclipup_w, i64, env, i64, i64)
=20
 /* Packed SIMD - Count Leading Operations */
 DEF_HELPER_2(cls, tl, env, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 7be0b9e5e6..d7aebd55e2 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1293,6 +1293,18 @@ psshar_hs  1111100 ..... ..... 010 ..... 0011011 @r
 }
 sha        1110111 ..... ..... 010 ..... 0011011 @r
 shar       1111111 ..... ..... 010 ..... 0011011 @r
+psshl_hs   1010100 ..... ..... 010 ..... 0011011 @r
+psshlr_hs  1011100 ..... ..... 010 ..... 0011011 @r
+{
+  psshl_ws   1010101 ..... ..... 010 ..... 0011011 @r
+  sshl       1010101 ..... ..... 010 ..... 0011011 @r
+}
+{
+  psshlr_ws  1011101 ..... ..... 010 ..... 0011011 @r
+  sshlr      1011101 ..... ..... 010 ..... 0011011 @r
+}
+shl        1010111 ..... ..... 010 ..... 0011011 @r
+shlr       1011111 ..... ..... 010 ..... 0011011 @r
=20
 # Packed SIMD - Exchange Operations
 pas_hx     1000000 ..... ..... 110 ..... 0111011 @r
@@ -1346,6 +1358,12 @@ srx         1010111 ..... ..... 001 ..... 0111011 @r
 mvm         1010100 ..... ..... 001 ..... 0111011 @r
 mvmn        1010101 ..... ..... 001 ..... 0111011 @r
 merge       1010110 ..... ..... 001 ..... 0111011 @r
+pnclipp_b   1100000 ..... ..... 010 ..... 0111011 @r
+pnclipup_b  1000000 ..... ..... 010 ..... 0111011 @r
+pnclipp_h   1100001 ..... ..... 010 ..... 0111011 @r
+pnclipup_h  1000001 ..... ..... 010 ..... 0111011 @r
+pnclipp_w   1100011 ..... ..... 010 ..... 0111011 @r
+pnclipup_w  1000011 ..... ..... 010 ..... 0111011 @r
=20
 # Packed SIMD - Count Leading Operations
 cls    01100 0000011 ..... 001 ..... 0010011 @r2
@@ -1790,12 +1808,16 @@ psrl_dhs    0000100 ..... .... 1110 .... 00011011 @=
r_p_3
 psra_dhs    0100100 ..... .... 1110 .... 00011011 @r_p_3
 pssha_dhs   0110100 ..... .... 0110 .... 00011011 @r_p_3
 psshar_dhs  0111100 ..... .... 0110 .... 00011011 @r_p_3
+psshl_dhs   0010100 ..... .... 0110 .... 00011011 @r_p_3
+psshlr_dhs  0011100 ..... .... 0110 .... 00011011 @r_p_3
 padd_dws    0001101 ..... .... 0110 .... 00011011 @r_p_3
 psll_dws    0000101 ..... .... 0110 .... 00011011 @r_p_3
 psrl_dws    0000101 ..... .... 1110 .... 00011011 @r_p_3
 psra_dws    0100101 ..... .... 1110 .... 00011011 @r_p_3
 pssha_dws   0110101 ..... .... 0110 .... 00011011 @r_p_3
 psshar_dws  0111101 ..... .... 0110 .... 00011011 @r_p_3
+psshl_dws   0010101 ..... .... 0110 .... 00011011 @r_p_3
+psshlr_dws  0011101 ..... .... 0110 .... 00011011 @r_p_3
=20
 # register-pair operands
 ppaire_db    1000000 .... 0 .... 1110 .... 00011011 @r_p_2
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr=
ans/trans_rvp.c.inc
index ca459293a3..b4142adcfb 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -686,6 +686,14 @@ GEN_SIMD_TRANS_32(ssha)
 GEN_SIMD_TRANS_32(sshar)
 GEN_SIMD_TRANS_64(sha)
 GEN_SIMD_TRANS_64(shar)
+GEN_SIMD_TRANS(psshl_hs)
+GEN_SIMD_TRANS(psshlr_hs)
+GEN_SIMD_TRANS_64(psshl_ws)
+GEN_SIMD_TRANS_64(psshlr_ws)
+GEN_SIMD_TRANS_32(sshl)
+GEN_SIMD_TRANS_32(sshlr)
+GEN_SIMD_TRANS_64(shl)
+GEN_SIMD_TRANS_64(shlr)
=20
 /* Packed SIMD - Exchange Operations */
 GEN_SIMD_TRANS(pas_hx)
@@ -739,6 +747,12 @@ GEN_SIMD_TRANS_ACC(srx)
 GEN_SIMD_TRANS_ACC(mvm)
 GEN_SIMD_TRANS_ACC(mvmn)
 GEN_SIMD_TRANS_ACC(merge)
+GEN_SIMD_TRANS_64(pnclipp_b)
+GEN_SIMD_TRANS_64(pnclipup_b)
+GEN_SIMD_TRANS_64(pnclipp_h)
+GEN_SIMD_TRANS_64(pnclipup_h)
+GEN_SIMD_TRANS_64(pnclipp_w)
+GEN_SIMD_TRANS_64(pnclipup_w)
=20
 /* Packed SIMD - Count Leading Operations */
 GEN_SIMD_TRANS_R1(cls)
@@ -1066,8 +1080,12 @@ GEN_SIMD_TRANS_REG_PAIR_3(psrl_dhs, psrl_hs)
 GEN_SIMD_TRANS_REG_PAIR_3(psra_dhs, psra_hs)
 GEN_SIMD_TRANS_REG_PAIR_3(pssha_dhs, pssha_hs)
 GEN_SIMD_TRANS_REG_PAIR_3(psshar_dhs, psshar_hs)
+GEN_SIMD_TRANS_REG_PAIR_3(psshl_dhs, psshl_hs)
+GEN_SIMD_TRANS_REG_PAIR_3(psshlr_dhs, psshlr_hs)
 GEN_SIMD_TRANS_REG_PAIR_DW(pssha_dws, ssha)
 GEN_SIMD_TRANS_REG_PAIR_DW(psshar_dws, sshar)
+GEN_SIMD_TRANS_REG_PAIR_DW(psshl_dws, sshl)
+GEN_SIMD_TRANS_REG_PAIR_DW(psshlr_dws, sshlr)
 
 GEN_SIMD_TRANS_REG_PAIR_2(ppairo_db, ppairo_b)
 GEN_SIMD_TRANS_REG_PAIR_2(ppairo_dh, ppairo_h)
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
index 4c91800128..96e016d90d 100644
--- a/target/riscv/psimd_helper.c
+++ b/target/riscv/psimd_helper.c
@@ -2704,6 +2704,242 @@ uint64_t HELPER(shar)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
     }
 }
 
+/**
+ * PSSHL.HS - Packed 16-bit variable shift with unsigned saturation
+ * Positive shift left (saturating), negative shift right (logical)
+ */
+target_ulong HELPER(psshl_hs)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_H(rd);
+    int sat = 0;
+    int8_t shamt = (int8_t)(rs2 & 0xFF);
+
+    for (int i = 0; i < elems; i++) {
+        uint16_t e1 = (uint16_t)EXTRACT16(rs1, i);
+        uint16_t res;
+
+        if (shamt >= 0) {
+            uint32_t shifted = (shamt >= 16) ? ((uint32_t)e1 << 16)
+                                             : ((uint32_t)e1 << shamt);
+            res = unsigned_saturate_h(shifted, &sat);
+        } else {
+            int right = -shamt;
+            if (right >= 16) {
+                res = 0;
+            } else {
+                res = e1 >> right;
+            }
+        }
+
+        rd = INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSHLR.HS - Packed 16-bit variable shift with rounding
+ * and unsigned saturation
+ * Positive shift left (saturating), negative shift right (logical, rounded)
+ */
+target_ulong HELPER(psshlr_hs)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_H(rd);
+    int sat = 0;
+    int8_t shamt = (int8_t)(rs2 & 0xFF);
+
+    for (int i = 0; i < elems; i++) {
+        uint16_t e1 = (uint16_t)EXTRACT16(rs1, i);
+        uint16_t res;
+
+        if (shamt >= 0) {
+            uint32_t shifted = (shamt >= 16) ? ((uint32_t)e1 << 16)
+                                             : ((uint32_t)e1 << shamt);
+            res = unsigned_saturate_h(shifted, &sat);
+        } else {
+            int right = -shamt;
+            if (right > 16) {
+                res = 0;
+            } else {
+                uint32_t rounded = ((uint32_t)e1 >> (right - 1)) + 1;
+                res = (uint16_t)(rounded >> 1);
+            }
+        }
+
+        rd = INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSHL.WS - Packed 32-bit variable shift with unsigned saturation (RV64 only)
+ * Positive shift left (saturating), negative shift right (logical)
+ */
+uint64_t HELPER(psshl_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int sat = 0;
+    int8_t shamt = (int8_t)(rs2 & 0xFF);
+
+    for (int i = 0; i < 2; i++) {
+        uint32_t e1 = (uint32_t)EXTRACT32(rs1, i);
+        uint32_t res;
+
+        if (shamt >= 0) {
+            uint64_t shifted = (shamt >= 32) ? ((uint64_t)e1 << 32)
+                                             : ((uint64_t)e1 << shamt);
+            res = unsigned_saturate_w(shifted, &sat);
+        } else {
+            int right = -shamt;
+            if (right >= 32) {
+                res = 0;
+            } else {
+                res = e1 >> right;
+            }
+        }
+
+        rd = INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSHLR.WS - Packed 32-bit variable shift with rounding
+ * and unsigned saturation (RV64 only)
+ * Positive shift left (saturating), negative shift right (logical, rounded)
+ */
+uint64_t HELPER(psshlr_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int sat = 0;
+    int8_t shamt = (int8_t)(rs2 & 0xFF);
+
+    for (int i = 0; i < 2; i++) {
+        uint32_t e1 = (uint32_t)EXTRACT32(rs1, i);
+        uint32_t res;
+
+        if (shamt >= 0) {
+            uint64_t shifted = (shamt >= 32) ? ((uint64_t)e1 << 32)
+                                             : ((uint64_t)e1 << shamt);
+            res = unsigned_saturate_w(shifted, &sat);
+        } else {
+            int right = -shamt;
+            if (right > 32) {
+                res = 0;
+            } else {
+                uint64_t rounded = ((uint64_t)e1 >> (right - 1)) + 1;
+                res = (uint32_t)(rounded >> 1);
+            }
+        }
+
+        rd = INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * SSHL - 32-bit scalar variable shift with unsigned saturation
+ */
+uint32_t HELPER(sshl)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int sat = 0;
+    int8_t shamt = (int8_t)(rs2 & 0xFF);
+
+    if (shamt < 0) {
+        int right = -shamt;
+        return (right >= 32) ? 0 : (rs1 >> right);
+    } else {
+        uint64_t shifted = (shamt >= 32) ? ((uint64_t)rs1 << 32)
+                                         : ((uint64_t)rs1 << shamt);
+        uint32_t res = unsigned_saturate_w(shifted, &sat);
+        if (sat) {
+            env->vxsat = 1;
+        }
+        return res;
+    }
+}
+
+/**
+ * SSHLR - 32-bit scalar variable shift with rounding and unsigned saturation
+ */
+uint32_t HELPER(sshlr)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int sat = 0;
+    int8_t shamt = (int8_t)(rs2 & 0xFF);
+
+    if (shamt < 0) {
+        int right = -shamt;
+        if (right > 32) {
+            return 0;
+        } else {
+            uint64_t rounded = ((uint64_t)rs1 >> (right - 1)) + 1;
+            return rounded >> 1;
+        }
+    } else {
+        uint64_t shifted = (shamt >= 32) ? ((uint64_t)rs1 << 32)
+                                         : ((uint64_t)rs1 << shamt);
+        uint32_t res = unsigned_saturate_w(shifted, &sat);
+        if (sat) {
+            env->vxsat = 1;
+        }
+        return res;
+    }
+}
+
+/**
+ * SHL - 64-bit scalar variable logical shift
+ */
+uint64_t HELPER(shl)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int8_t shamt = (int8_t)(rs2 & 0xFF);
+
+    if (shamt < 0) {
+        int right = -shamt;
+        return (right >= 64) ? 0 : (rs1 >> right);
+    } else {
+        return (shamt >= 64) ? 0 : (rs1 << shamt);
+    }
+}
+
+/**
+ * SHLR - 64-bit scalar variable logical shift with rounding
+ */
+uint64_t HELPER(shlr)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int8_t shamt = (int8_t)(rs2 & 0xFF);
+
+    if (shamt < 0) {
+        int right = -shamt;
+        if (right > 64) {
+            return 0;
+        } else {
+            uint64_t round = (rs1 >> (right - 1)) & 1;
+            return ((right == 64) ? 0 : (rs1 >> right)) + round;
+        }
+    } else {
+        return (shamt >= 64) ? 0 : (rs1 << shamt);
+    }
+}
+
 /* Exchange operations (AS/SA/AS/SA with X suffix) */
 
 /**
@@ -3573,6 +3809,140 @@ target_ulong HELPER(merge)(CPURISCVState *env, target_ulong rs1,
     return (~rd & rs1) | (rd & rs2);
 }
 
+/**
+ * PNCLIPP.B - Pack narrow clip signed halfwords to bytes (RV64 only)
+ */
+uint64_t HELPER(pnclipp_b)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int sat = 0;
+
+    for (int i = 0; i < 4; i++) {
+        int16_t lo = (int16_t)EXTRACT16(rs1, i);
+        int16_t hi = (int16_t)EXTRACT16(rs2, i);
+        int8_t res_lo = signed_saturate_b(lo, &sat);
+        int8_t res_hi = signed_saturate_b(hi, &sat);
+
+        rd = (uint64_t)INSERT8(rd, res_lo, i);
+        rd = (uint64_t)INSERT8(rd, res_hi, i + 4);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPUP.B - Pack narrow clip unsigned halfwords to bytes (RV64 only)
+ */
+uint64_t HELPER(pnclipup_b)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int sat = 0;
+
+    for (int i = 0; i < 4; i++) {
+        uint16_t lo = (uint16_t)EXTRACT16(rs1, i);
+        uint16_t hi = (uint16_t)EXTRACT16(rs2, i);
+        uint8_t res_lo = unsigned_saturate_b(lo, &sat);
+        uint8_t res_hi = unsigned_saturate_b(hi, &sat);
+
+        rd = (uint64_t)INSERT8(rd, res_lo, i);
+        rd = (uint64_t)INSERT8(rd, res_hi, i + 4);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPP.H - Pack narrow clip signed words to halfwords (RV64 only)
+ */
+uint64_t HELPER(pnclipp_h)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int sat = 0;
+
+    for (int i = 0; i < 2; i++) {
+        int32_t lo = (int32_t)EXTRACT32(rs1, i);
+        int32_t hi = (int32_t)EXTRACT32(rs2, i);
+        int16_t res_lo = signed_saturate_h(lo, &sat);
+        int16_t res_hi = signed_saturate_h(hi, &sat);
+
+        rd = (uint64_t)INSERT16(rd, res_lo, i);
+        rd = (uint64_t)INSERT16(rd, res_hi, i + 2);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPUP.H - Pack narrow clip unsigned words to halfwords (RV64 only)
+ */
+uint64_t HELPER(pnclipup_h)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int sat = 0;
+
+    for (int i = 0; i < 2; i++) {
+        uint32_t lo = (uint32_t)EXTRACT32(rs1, i);
+        uint32_t hi = (uint32_t)EXTRACT32(rs2, i);
+        uint16_t res_lo = unsigned_saturate_h(lo, &sat);
+        uint16_t res_hi = unsigned_saturate_h(hi, &sat);
+
+        rd = (uint64_t)INSERT16(rd, res_lo, i);
+        rd = (uint64_t)INSERT16(rd, res_hi, i + 2);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPP.W - Pack narrow clip signed doublewords to words (RV64 only)
+ */
+uint64_t HELPER(pnclipp_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int sat = 0;
+    int32_t res_lo = signed_saturate_w((int64_t)rs1, &sat);
+    int32_t res_hi = signed_saturate_w((int64_t)rs2, &sat);
+
+    rd = (uint64_t)(uint32_t)res_lo;
+    rd |= (uint64_t)(uint32_t)res_hi << 32;
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PNCLIPUP.W - Pack narrow clip unsigned doublewords to words (RV64 only)
+ */
+uint64_t HELPER(pnclipup_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int sat = 0;
+    uint32_t res_lo = unsigned_saturate_w(rs1, &sat);
+    uint32_t res_hi = unsigned_saturate_w(rs2, &sat);
+
+    rd = (uint64_t)res_lo;
+    rd |= (uint64_t)res_hi << 32;
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
 /* Count leading operations */
 
 /**
-- 
2.34.1
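
For anyone who wants to check the rounding and saturation arithmetic of the
shift helpers outside QEMU, a minimal stand-alone sketch follows. The names
round_shr_u64 and sat_shl_u16 are made up for illustration and are not part
of this patch or of QEMU. The sketch assumes "rounding" means adding the
last bit shifted out, and it applies that bit after the shift so that a
64-bit input of all ones cannot wrap around.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not the QEMU helper: logical right shift of a
 * 64-bit value by n bits (1 <= n <= 64), rounding half up.
 * The rounding bit is added after the shift, so the intermediate
 * value never exceeds 2^63 and cannot overflow.
 */
static uint64_t round_shr_u64(uint64_t x, unsigned n)
{
    uint64_t round = (x >> (n - 1)) & 1;      /* last bit shifted out */
    uint64_t high = (n == 64) ? 0 : (x >> n); /* avoid a shift by 64 */
    return high + round;
}

/*
 * Hypothetical model of an unsigned-saturating left shift on one
 * 16-bit lane: widen to 32 bits, shift, clamp to UINT16_MAX.
 * *sat is set to 1 when clamping happens and is left alone otherwise.
 */
static uint16_t sat_shl_u16(uint16_t x, unsigned n, int *sat)
{
    uint32_t wide = (uint32_t)x << (n >= 16 ? 16 : n);
    if (wide > UINT16_MAX) {
        *sat = 1;
        return UINT16_MAX;
    }
    return (uint16_t)wide;
}

/* Small self-check covering the edge cases discussed above. */
static void shift_selfcheck(void)
{
    int sat = 0;
    assert(round_shr_u64(UINT64_MAX, 1) == 0x8000000000000000ULL);
    assert(round_shr_u64(UINT64_MAX, 64) == 1);
    assert(sat_shl_u16(0x1000, 4, &sat) == UINT16_MAX && sat == 1);
}
```

Calling shift_selfcheck() from a test program exercises the all-ones input,
the shift-by-64 case, and a left shift that saturates.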