1 | From: Alistair Francis <alistair.francis@wdc.com> | 1 | The following changes since commit c5ea91da443b458352c1b629b490ee6631775cb4: |
---|---|---|---|
2 | 2 | ||
3 | The following changes since commit 9cc1bf1ebca550f8d90f967ccd2b6d2e00e81387: | 3 | Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into staging (2023-09-08 10:06:25 -0400) |
4 | |||
5 | Merge tag 'pull-xen-20220609' of https://xenbits.xen.org/git-http/people/aperard/qemu-dm into staging (2022-06-09 08:25:17 -0700) | ||
6 | 4 | ||
7 | are available in the Git repository at: | 5 | are available in the Git repository at: |
8 | 6 | ||
9 | git@github.com:alistair23/qemu.git tags/pull-riscv-to-apply-20220610 | 7 | https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20230911 |
10 | 8 | ||
11 | for you to fetch changes up to 07314158f6aa4d2589520c194a7531b9364a8d54: | 9 | for you to fetch changes up to e7a03409f29e2da59297d55afbaec98c96e43e3a: |
12 | 10 | ||
13 | target/riscv: trans_rvv: Avoid assert for RV32 and e64 (2022-06-10 09:42:12 +1000) | 11 | target/riscv: don't read CSR in riscv_csrrw_do64 (2023-09-11 11:45:55 +1000) |
14 | 12 | ||
15 | ---------------------------------------------------------------- | 13 | ---------------------------------------------------------------- |
16 | Fourth RISC-V PR for QEMU 7.1 | 14 | First RISC-V PR for 8.2 |
17 | 15 | ||
18 | * Update MAINTAINERS | 16 | * Remove 'host' CPU from TCG |
19 | * Add support for Zmmul extension | 17 | * riscv_htif Fixup printing on big endian hosts |
20 | * Fixup FDT errors when supplying device tree from the command line for virt machine | 18 | * Add zmmul isa string |
21 | * Avoid overflowing the addr_config buffer in the SiFive PLIC | 19 | * Add smepmp isa string |
22 | * Support -device loader addresses above 2GB | 20 | * Fix page_check_range use in fault-only-first |
23 | * Correctly wake from WFI on VS-level external interrupts | 21 | * Use existing lookup tables for MixColumns |
24 | * Fixes for RV128 support | 22 | * Add RISC-V vector cryptographic instruction set support |
25 | * Support Vector extension tail agnostic setting elements' bits to all 1s | 23 | * Implement WARL behaviour for mcountinhibit/mcounteren |
26 | * Don't expose the CPU properties on named CPUs | 24 | * Add Zihintntl extension ISA string to DTS |
27 | * Fix vector extension assert for RV32 | 25 | * Fix zfa fleq.d and fltq.d |
26 | * Fix upper/lower mtime write calculation | ||
27 | * Make rtc variable names consistent | ||
28 | * Use abi type for linux-user target_ucontext | ||
29 | * Add RISC-V KVM AIA Support | ||
30 | * Fix riscv,pmu DT node path in the virt machine | ||
31 | * Update CSR bits name for svadu extension | ||
32 | * Mark zicond non-experimental | ||
33 | * Fix satp_mode_finalize() when satp_mode.supported = 0 | ||
34 | * Fix non-KVM --enable-debug build | ||
35 | * Add new extensions to hwprobe | ||
36 | * Use accelerated helper for AES64KS1I | ||
37 | * Allocate itrigger timers only once | ||
38 | * Respect mseccfg.RLB for pmpaddrX changes | ||
39 | * Align the AIA model to v1.0 ratified spec | ||
40 | * Don't read the CSR in riscv_csrrw_do64 | ||
28 | 41 | ||
29 | ---------------------------------------------------------------- | 42 | ---------------------------------------------------------------- |
30 | Alistair Francis (4): | 43 | Akihiko Odaki (1): |
31 | MAINTAINERS: Cover hw/core/uboot_image.h within Generic Loader section | 44 | target/riscv: Allocate itrigger timers only once |
32 | hw/intc: sifive_plic: Avoid overflowing the addr_config buffer | ||
33 | target/riscv: Don't expose the CPU properties on names CPUs | ||
34 | target/riscv: trans_rvv: Avoid assert for RV32 and e64 | ||
35 | 45 | ||
36 | Andrew Bresticker (1): | 46 | Ard Biesheuvel (2): |
37 | target/riscv: Wake on VS-level external interrupts | 47 | target/riscv: Use existing lookup tables for MixColumns |
48 | target/riscv: Use accelerated helper for AES64KS1I | ||
38 | 49 | ||
39 | Atish Patra (1): | 50 | Conor Dooley (1): |
40 | hw/riscv: virt: Generate fw_cfg DT node correctly | 51 | hw/riscv: virt: Fix riscv,pmu DT node path |
41 | 52 | ||
42 | Frédéric Pétrot (1): | 53 | Daniel Henrique Barboza (6): |
43 | target/riscv/debug.c: keep experimental rv128 support working | 54 | target/riscv/cpu.c: do not run 'host' CPU with TCG |
55 | target/riscv/cpu.c: add zmmul isa string | ||
56 | target/riscv/cpu.c: add smepmp isa string | ||
57 | target/riscv: fix satp_mode_finalize() when satp_mode.supported = 0 | ||
58 | hw/riscv/virt.c: fix non-KVM --enable-debug build | ||
59 | hw/intc/riscv_aplic.c fix non-KVM --enable-debug build | ||
44 | 60 | ||
45 | Jamie Iles (1): | 61 | Dickon Hood (2): |
46 | hw/core/loader: return image sizes as ssize_t | 62 | target/riscv: Refactor translation of vector-widening instruction |
63 | target/riscv: Add Zvbb ISA extension support | ||
64 | |||
65 | Jason Chien (3): | ||
66 | target/riscv: Add Zihintntl extension ISA string to DTS | ||
67 | hw/intc: Fix upper/lower mtime write calculation | ||
68 | hw/intc: Make rtc variable names consistent | ||
69 | |||
70 | Kiran Ostrolenk (4): | ||
71 | target/riscv: Refactor some of the generic vector functionality | ||
72 | target/riscv: Refactor vector-vector translation macro | ||
73 | target/riscv: Refactor some of the generic vector functionality | ||
74 | target/riscv: Add Zvknh ISA extension support | ||
75 | |||
76 | LIU Zhiwei (3): | ||
77 | target/riscv: Fix page_check_range use in fault-only-first | ||
78 | target/riscv: Fix zfa fleq.d and fltq.d | ||
79 | linux-user/riscv: Use abi type for target_ucontext | ||
80 | |||
81 | Lawrence Hunter (2): | ||
82 | target/riscv: Add Zvbc ISA extension support | ||
83 | target/riscv: Add Zvksh ISA extension support | ||
84 | |||
85 | Leon Schuermann (1): | ||
86 | target/riscv/pmp.c: respect mseccfg.RLB for pmpaddrX changes | ||
87 | |||
88 | Max Chou (3): | ||
89 | crypto: Create sm4_subword | ||
90 | crypto: Add SM4 constant parameter CK | ||
91 | target/riscv: Add Zvksed ISA extension support | ||
92 | |||
93 | Nazar Kazakov (4): | ||
94 | target/riscv: Remove redundant "cpu_vl == 0" checks | ||
95 | target/riscv: Move vector translation checks | ||
96 | target/riscv: Add Zvkned ISA extension support | ||
97 | target/riscv: Add Zvkg ISA extension support | ||
98 | |||
99 | Nikita Shubin (1): | ||
100 | target/riscv: don't read CSR in riscv_csrrw_do64 | ||
101 | |||
102 | Rob Bradford (1): | ||
103 | target/riscv: Implement WARL behaviour for mcountinhibit/mcounteren | ||
104 | |||
105 | Robbin Ehn (1): | ||
106 | linux-user/riscv: Add new extensions to hwprobe | ||
107 | |||
108 | Thomas Huth (2): | ||
109 | hw/char/riscv_htif: Fix printing of console characters on big endian hosts | ||
110 | hw/char/riscv_htif: Fix the console syscall on big endian hosts | ||
111 | |||
112 | Tommy Wu (1): | ||
113 | target/riscv: Align the AIA model to v1.0 ratified spec | ||
114 | |||
115 | Vineet Gupta (1): | ||
116 | riscv: zicond: make non-experimental | ||
47 | 117 | ||
48 | Weiwei Li (1): | 118 | Weiwei Li (1): |
49 | target/riscv: add support for zmmul extension v0.1 | 119 | target/riscv: Update CSR bits name for svadu extension |
50 | 120 | ||
51 | eopXD (16): | 121 | Yong-Xuan Wang (5): |
52 | target/riscv: rvv: Prune redundant ESZ, DSZ parameter passed | 122 | target/riscv: support the AIA device emulation with KVM enabled |
53 | target/riscv: rvv: Prune redundant access_type parameter passed | 123 | target/riscv: check the in-kernel irqchip support |
54 | target/riscv: rvv: Rename ambiguous esz | 124 | target/riscv: Create an KVM AIA irqchip |
55 | target/riscv: rvv: Early exit when vstart >= vl | 125 | target/riscv: update APLIC and IMSIC to support KVM AIA |
56 | target/riscv: rvv: Add tail agnostic for vv instructions | 126 | target/riscv: select KVM AIA in riscv virt machine |
57 | target/riscv: rvv: Add tail agnostic for vector load / store instructions | ||
58 | target/riscv: rvv: Add tail agnostic for vx, vvm, vxm instructions | ||
59 | target/riscv: rvv: Add tail agnostic for vector integer shift instructions | ||
60 | target/riscv: rvv: Add tail agnostic for vector integer comparison instructions | ||
61 | target/riscv: rvv: Add tail agnostic for vector integer merge and move instructions | ||
62 | target/riscv: rvv: Add tail agnostic for vector fix-point arithmetic instructions | ||
63 | target/riscv: rvv: Add tail agnostic for vector floating-point instructions | ||
64 | target/riscv: rvv: Add tail agnostic for vector reduction instructions | ||
65 | target/riscv: rvv: Add tail agnostic for vector mask instructions | ||
66 | target/riscv: rvv: Add tail agnostic for vector permutation instructions | ||
67 | target/riscv: rvv: Add option 'rvv_ta_all_1s' to enable optional tail agnostic behavior | ||
68 | 127 | ||
69 | include/hw/loader.h | 55 +- | 128 | include/crypto/aes.h | 7 + |
70 | target/riscv/cpu.h | 4 + | 129 | include/crypto/sm4.h | 9 + |
71 | target/riscv/internals.h | 6 +- | 130 | target/riscv/cpu_bits.h | 8 +- |
72 | hw/arm/armv7m.c | 2 +- | 131 | target/riscv/cpu_cfg.h | 9 + |
73 | hw/arm/boot.c | 8 +- | 132 | target/riscv/debug.h | 3 +- |
74 | hw/core/generic-loader.c | 2 +- | 133 | target/riscv/helper.h | 98 +++ |
75 | hw/core/loader.c | 81 +- | 134 | target/riscv/kvm_riscv.h | 5 + |
76 | hw/i386/x86.c | 2 +- | 135 | target/riscv/vector_internals.h | 228 +++++++ |
77 | hw/intc/sifive_plic.c | 19 +- | 136 | target/riscv/insn32.decode | 58 ++ |
78 | hw/riscv/boot.c | 5 +- | 137 | crypto/aes.c | 4 +- |
79 | hw/riscv/virt.c | 28 +- | 138 | crypto/sm4.c | 10 + |
80 | target/riscv/cpu.c | 68 +- | 139 | hw/char/riscv_htif.c | 12 +- |
81 | target/riscv/cpu_helper.c | 4 +- | 140 | hw/intc/riscv_aclint.c | 11 +- |
82 | target/riscv/debug.c | 2 + | 141 | hw/intc/riscv_aplic.c | 52 +- |
83 | target/riscv/translate.c | 4 + | 142 | hw/intc/riscv_imsic.c | 25 +- |
84 | target/riscv/vector_helper.c | 1588 +++++++++++++++++++------------ | 143 | hw/riscv/virt.c | 374 ++++++------ |
85 | target/riscv/insn_trans/trans_rvm.c.inc | 18 +- | 144 | linux-user/riscv/signal.c | 4 +- |
86 | target/riscv/insn_trans/trans_rvv.c.inc | 106 ++- | 145 | linux-user/syscall.c | 14 +- |
87 | MAINTAINERS | 1 + | 146 | target/arm/tcg/crypto_helper.c | 10 +- |
88 | 19 files changed, 1244 insertions(+), 759 deletions(-) | 147 | target/riscv/cpu.c | 83 ++- |
148 | target/riscv/cpu_helper.c | 6 +- | ||
149 | target/riscv/crypto_helper.c | 51 +- | ||
150 | target/riscv/csr.c | 54 +- | ||
151 | target/riscv/debug.c | 15 +- | ||
152 | target/riscv/kvm.c | 201 ++++++- | ||
153 | target/riscv/pmp.c | 4 + | ||
154 | target/riscv/translate.c | 1 + | ||
155 | target/riscv/vcrypto_helper.c | 970 ++++++++++++++++++++++++++++++ | ||
156 | target/riscv/vector_helper.c | 245 +------- | ||
157 | target/riscv/vector_internals.c | 81 +++ | ||
158 | target/riscv/insn_trans/trans_rvv.c.inc | 171 +++--- | ||
159 | target/riscv/insn_trans/trans_rvvk.c.inc | 606 +++++++++++++++++++ | ||
160 | target/riscv/insn_trans/trans_rvzfa.c.inc | 4 +- | ||
161 | target/riscv/meson.build | 4 +- | ||
162 | 34 files changed, 2785 insertions(+), 652 deletions(-) | ||
163 | create mode 100644 target/riscv/vector_internals.h | ||
164 | create mode 100644 target/riscv/vcrypto_helper.c | ||
165 | create mode 100644 target/riscv/vector_internals.c | ||
166 | create mode 100644 target/riscv/insn_trans/trans_rvvk.c.inc | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
1 | 2 | ||
3 | The 'host' CPU is available in a CONFIG_KVM build and it's currently | ||
4 | available for all accels, but is a KVM only CPU. This means that in a | ||
5 | RISC-V KVM capable host we can do things like this: | ||
6 | |||
7 | $ ./build/qemu-system-riscv64 -M virt,accel=tcg -cpu host --nographic | ||
8 | qemu-system-riscv64: H extension requires priv spec 1.12.0 | ||
9 | |||
10 | This CPU does not have a priv spec because we don't filter its extensions | ||
11 | via priv spec. We shouldn't be reaching riscv_cpu_realize_tcg() at all | ||
12 | with the 'host' CPU. | ||
13 | |||
14 | We don't have a way to filter the 'host' CPU out of the available CPU | ||
15 | options (-cpu help) if the build includes both KVM and TCG. What we can | ||
16 | do is to error out during riscv_cpu_realize_tcg() if the user chooses | ||
17 | the 'host' CPU with accel=tcg: | ||
18 | |||
19 | $ ./build/qemu-system-riscv64 -M virt,accel=tcg -cpu host --nographic | ||
20 | qemu-system-riscv64: 'host' CPU is not compatible with TCG acceleration | ||
21 | |||
22 | Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
23 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | ||
24 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
25 | Message-Id: <20230721133411.474105-1-dbarboza@ventanamicro.com> | ||
26 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
27 | --- | ||
28 | target/riscv/cpu.c | 5 +++++ | ||
29 | 1 file changed, 5 insertions(+) | ||
30 | |||
31 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | ||
32 | index XXXXXXX..XXXXXXX 100644 | ||
33 | --- a/target/riscv/cpu.c | ||
34 | +++ b/target/riscv/cpu.c | ||
35 | @@ -XXX,XX +XXX,XX @@ static void riscv_cpu_realize_tcg(DeviceState *dev, Error **errp) | ||
36 | CPURISCVState *env = &cpu->env; | ||
37 | Error *local_err = NULL; | ||
38 | |||
39 | + if (object_dynamic_cast(OBJECT(dev), TYPE_RISCV_CPU_HOST)) { | ||
40 | + error_setg(errp, "'host' CPU is not compatible with TCG acceleration"); | ||
41 | + return; | ||
42 | + } | ||
43 | + | ||
44 | riscv_cpu_validate_misa_mxl(cpu, &local_err); | ||
45 | if (local_err != NULL) { | ||
46 | error_propagate(errp, local_err); | ||
47 | -- | ||
48 | 2.41.0 | ||
49 | |||
50 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Thomas Huth <thuth@redhat.com> | ||
1 | 2 | ||
3 | The character that should be printed is stored in the 64 bit "payload" | ||
4 | variable. The code currently tries to print it by taking the address | ||
5 | of the variable and passing this pointer to qemu_chr_fe_write(). However, | ||
6 | this only works on little endian hosts where the least significant bits | ||
7 | are stored on the lowest address. To do this in a portable way, we have | ||
8 | to store the value in an uint8_t variable instead. | ||
9 | |||
10 | Fixes: 5033606780 ("RISC-V HTIF Console") | ||
11 | Signed-off-by: Thomas Huth <thuth@redhat.com> | ||
12 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | ||
13 | Reviewed-by: Bin Meng <bmeng@tinylab.org> | ||
14 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
15 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
16 | Message-Id: <20230721094720.902454-2-thuth@redhat.com> | ||
17 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
18 | --- | ||
19 | hw/char/riscv_htif.c | 3 ++- | ||
20 | 1 file changed, 2 insertions(+), 1 deletion(-) | ||
21 | |||
22 | diff --git a/hw/char/riscv_htif.c b/hw/char/riscv_htif.c | ||
23 | index XXXXXXX..XXXXXXX 100644 | ||
24 | --- a/hw/char/riscv_htif.c | ||
25 | +++ b/hw/char/riscv_htif.c | ||
26 | @@ -XXX,XX +XXX,XX @@ static void htif_handle_tohost_write(HTIFState *s, uint64_t val_written) | ||
27 | s->tohost = 0; /* clear to indicate we read */ | ||
28 | return; | ||
29 | } else if (cmd == HTIF_CONSOLE_CMD_PUTC) { | ||
30 | - qemu_chr_fe_write(&s->chr, (uint8_t *)&payload, 1); | ||
31 | + uint8_t ch = (uint8_t)payload; | ||
32 | + qemu_chr_fe_write(&s->chr, &ch, 1); | ||
33 | resp = 0x100 | (uint8_t)payload; | ||
34 | } else { | ||
35 | qemu_log("HTIF device %d: unknown command\n", device); | ||
36 | -- | ||
37 | 2.41.0 | ||
38 | |||
39 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Thomas Huth <thuth@redhat.com> | ||
1 | 2 | ||
3 | Values that have been read via cpu_physical_memory_read() from the | ||
4 | guest's memory have to be swapped in case the host endianess differs | ||
5 | from the guest. | ||
6 | |||
7 | Fixes: a6e13e31d5 ("riscv_htif: Support console output via proxy syscall") | ||
8 | Signed-off-by: Thomas Huth <thuth@redhat.com> | ||
9 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | ||
10 | Reviewed-by: Bin Meng <bmeng@tinylab.org> | ||
11 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
12 | Message-Id: <20230721094720.902454-3-thuth@redhat.com> | ||
13 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
14 | --- | ||
15 | hw/char/riscv_htif.c | 9 +++++---- | ||
16 | 1 file changed, 5 insertions(+), 4 deletions(-) | ||
17 | |||
18 | diff --git a/hw/char/riscv_htif.c b/hw/char/riscv_htif.c | ||
19 | index XXXXXXX..XXXXXXX 100644 | ||
20 | --- a/hw/char/riscv_htif.c | ||
21 | +++ b/hw/char/riscv_htif.c | ||
22 | @@ -XXX,XX +XXX,XX @@ | ||
23 | #include "qemu/timer.h" | ||
24 | #include "qemu/error-report.h" | ||
25 | #include "exec/address-spaces.h" | ||
26 | +#include "exec/tswap.h" | ||
27 | #include "sysemu/dma.h" | ||
28 | |||
29 | #define RISCV_DEBUG_HTIF 0 | ||
30 | @@ -XXX,XX +XXX,XX @@ static void htif_handle_tohost_write(HTIFState *s, uint64_t val_written) | ||
31 | } else { | ||
32 | uint64_t syscall[8]; | ||
33 | cpu_physical_memory_read(payload, syscall, sizeof(syscall)); | ||
34 | - if (syscall[0] == PK_SYS_WRITE && | ||
35 | - syscall[1] == HTIF_DEV_CONSOLE && | ||
36 | - syscall[3] == HTIF_CONSOLE_CMD_PUTC) { | ||
37 | + if (tswap64(syscall[0]) == PK_SYS_WRITE && | ||
38 | + tswap64(syscall[1]) == HTIF_DEV_CONSOLE && | ||
39 | + tswap64(syscall[3]) == HTIF_CONSOLE_CMD_PUTC) { | ||
40 | uint8_t ch; | ||
41 | - cpu_physical_memory_read(syscall[2], &ch, 1); | ||
42 | + cpu_physical_memory_read(tswap64(syscall[2]), &ch, 1); | ||
43 | qemu_chr_fe_write(&s->chr, &ch, 1); | ||
44 | resp = 0x100 | (uint8_t)payload; | ||
45 | } else { | ||
46 | -- | ||
47 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Daniel Henrique Barboza <dbarboza@ventanamicro.com> |
---|---|---|---|
2 | 2 | ||
3 | No functional change intended in this commit. | 3 | zmmul was promoted from experimental to ratified in commit 6d00ffad4e95. |
4 | Add a riscv,isa string for it. | ||
4 | 5 | ||
5 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 6 | Fixes: 6d00ffad4e95 ("target/riscv: move zmmul out of the experimental properties") |
6 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | 7 | Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> |
7 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 8 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> |
8 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | 9 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> |
9 | Message-Id: <165449614532.19704.7000832880482980398-3@git.sr.ht> | 10 | Message-Id: <20230720132424.371132-2-dbarboza@ventanamicro.com> |
10 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 11 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
11 | --- | 12 | --- |
12 | target/riscv/vector_helper.c | 76 ++++++++++++++++++------------------ | 13 | target/riscv/cpu.c | 1 + |
13 | 1 file changed, 38 insertions(+), 38 deletions(-) | 14 | 1 file changed, 1 insertion(+) |
14 | 15 | ||
15 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 16 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c |
16 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/target/riscv/vector_helper.c | 18 | --- a/target/riscv/cpu.c |
18 | +++ b/target/riscv/vector_helper.c | 19 | +++ b/target/riscv/cpu.c |
19 | @@ -XXX,XX +XXX,XX @@ static inline int32_t vext_lmul(uint32_t desc) | 20 | @@ -XXX,XX +XXX,XX @@ static const struct isa_ext_data isa_edata_arr[] = { |
20 | /* | 21 | ISA_EXT_DATA_ENTRY(zicsr, PRIV_VERSION_1_10_0, ext_icsr), |
21 | * Get the maximum number of elements can be operated. | 22 | ISA_EXT_DATA_ENTRY(zifencei, PRIV_VERSION_1_10_0, ext_ifencei), |
22 | * | 23 | ISA_EXT_DATA_ENTRY(zihintpause, PRIV_VERSION_1_10_0, ext_zihintpause), |
23 | - * esz: log2 of element size in bytes. | 24 | + ISA_EXT_DATA_ENTRY(zmmul, PRIV_VERSION_1_12_0, ext_zmmul), |
24 | + * log2_esz: log2 of element size in bytes. | 25 | ISA_EXT_DATA_ENTRY(zawrs, PRIV_VERSION_1_12_0, ext_zawrs), |
25 | */ | 26 | ISA_EXT_DATA_ENTRY(zfa, PRIV_VERSION_1_12_0, ext_zfa), |
26 | -static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz) | 27 | ISA_EXT_DATA_ENTRY(zfbfmin, PRIV_VERSION_1_12_0, ext_zfbfmin), |
27 | +static inline uint32_t vext_max_elems(uint32_t desc, uint32_t log2_esz) | ||
28 | { | ||
29 | /* | ||
30 | * As simd_desc support at most 2048 bytes, the max vlen is 1024 bits. | ||
31 | @@ -XXX,XX +XXX,XX @@ static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz) | ||
32 | uint32_t vlenb = simd_maxsz(desc); | ||
33 | |||
34 | /* Return VLMAX */ | ||
35 | - int scale = vext_lmul(desc) - esz; | ||
36 | + int scale = vext_lmul(desc) - log2_esz; | ||
37 | return scale < 0 ? vlenb >> -scale : vlenb << scale; | ||
38 | } | ||
39 | |||
40 | @@ -XXX,XX +XXX,XX @@ vext_ldst_stride(void *vd, void *v0, target_ulong base, | ||
41 | target_ulong stride, CPURISCVState *env, | ||
42 | uint32_t desc, uint32_t vm, | ||
43 | vext_ldst_elem_fn *ldst_elem, | ||
44 | - uint32_t esz, uintptr_t ra) | ||
45 | + uint32_t log2_esz, uintptr_t ra) | ||
46 | { | ||
47 | uint32_t i, k; | ||
48 | uint32_t nf = vext_nf(desc); | ||
49 | - uint32_t max_elems = vext_max_elems(desc, esz); | ||
50 | + uint32_t max_elems = vext_max_elems(desc, log2_esz); | ||
51 | |||
52 | for (i = env->vstart; i < env->vl; i++, env->vstart++) { | ||
53 | if (!vm && !vext_elem_mask(v0, i)) { | ||
54 | @@ -XXX,XX +XXX,XX @@ vext_ldst_stride(void *vd, void *v0, target_ulong base, | ||
55 | |||
56 | k = 0; | ||
57 | while (k < nf) { | ||
58 | - target_ulong addr = base + stride * i + (k << esz); | ||
59 | + target_ulong addr = base + stride * i + (k << log2_esz); | ||
60 | ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra); | ||
61 | k++; | ||
62 | } | ||
63 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_ST_STRIDE(vsse64_v, int64_t, ste_d) | ||
64 | /* unmasked unit-stride load and store operation*/ | ||
65 | static void | ||
66 | vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, | ||
67 | - vext_ldst_elem_fn *ldst_elem, uint32_t esz, uint32_t evl, | ||
68 | + vext_ldst_elem_fn *ldst_elem, uint32_t log2_esz, uint32_t evl, | ||
69 | uintptr_t ra) | ||
70 | { | ||
71 | uint32_t i, k; | ||
72 | uint32_t nf = vext_nf(desc); | ||
73 | - uint32_t max_elems = vext_max_elems(desc, esz); | ||
74 | + uint32_t max_elems = vext_max_elems(desc, log2_esz); | ||
75 | |||
76 | /* load bytes from guest memory */ | ||
77 | for (i = env->vstart; i < evl; i++, env->vstart++) { | ||
78 | k = 0; | ||
79 | while (k < nf) { | ||
80 | - target_ulong addr = base + ((i * nf + k) << esz); | ||
81 | + target_ulong addr = base + ((i * nf + k) << log2_esz); | ||
82 | ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra); | ||
83 | k++; | ||
84 | } | ||
85 | @@ -XXX,XX +XXX,XX @@ vext_ldst_index(void *vd, void *v0, target_ulong base, | ||
86 | void *vs2, CPURISCVState *env, uint32_t desc, | ||
87 | vext_get_index_addr get_index_addr, | ||
88 | vext_ldst_elem_fn *ldst_elem, | ||
89 | - uint32_t esz, uintptr_t ra) | ||
90 | + uint32_t log2_esz, uintptr_t ra) | ||
91 | { | ||
92 | uint32_t i, k; | ||
93 | uint32_t nf = vext_nf(desc); | ||
94 | uint32_t vm = vext_vm(desc); | ||
95 | - uint32_t max_elems = vext_max_elems(desc, esz); | ||
96 | + uint32_t max_elems = vext_max_elems(desc, log2_esz); | ||
97 | |||
98 | /* load bytes from guest memory */ | ||
99 | for (i = env->vstart; i < env->vl; i++, env->vstart++) { | ||
100 | @@ -XXX,XX +XXX,XX @@ vext_ldst_index(void *vd, void *v0, target_ulong base, | ||
101 | |||
102 | k = 0; | ||
103 | while (k < nf) { | ||
104 | - abi_ptr addr = get_index_addr(base, i, vs2) + (k << esz); | ||
105 | + abi_ptr addr = get_index_addr(base, i, vs2) + (k << log2_esz); | ||
106 | ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra); | ||
107 | k++; | ||
108 | } | ||
109 | @@ -XXX,XX +XXX,XX @@ static inline void | ||
110 | vext_ldff(void *vd, void *v0, target_ulong base, | ||
111 | CPURISCVState *env, uint32_t desc, | ||
112 | vext_ldst_elem_fn *ldst_elem, | ||
113 | - uint32_t esz, uintptr_t ra) | ||
114 | + uint32_t log2_esz, uintptr_t ra) | ||
115 | { | ||
116 | void *host; | ||
117 | uint32_t i, k, vl = 0; | ||
118 | uint32_t nf = vext_nf(desc); | ||
119 | uint32_t vm = vext_vm(desc); | ||
120 | - uint32_t max_elems = vext_max_elems(desc, esz); | ||
121 | + uint32_t max_elems = vext_max_elems(desc, log2_esz); | ||
122 | target_ulong addr, offset, remain; | ||
123 | |||
124 | /* probe every access*/ | ||
125 | @@ -XXX,XX +XXX,XX @@ vext_ldff(void *vd, void *v0, target_ulong base, | ||
126 | if (!vm && !vext_elem_mask(v0, i)) { | ||
127 | continue; | ||
128 | } | ||
129 | - addr = adjust_addr(env, base + i * (nf << esz)); | ||
130 | + addr = adjust_addr(env, base + i * (nf << log2_esz)); | ||
131 | if (i == 0) { | ||
132 | - probe_pages(env, addr, nf << esz, ra, MMU_DATA_LOAD); | ||
133 | + probe_pages(env, addr, nf << log2_esz, ra, MMU_DATA_LOAD); | ||
134 | } else { | ||
135 | /* if it triggers an exception, no need to check watchpoint */ | ||
136 | - remain = nf << esz; | ||
137 | + remain = nf << log2_esz; | ||
138 | while (remain > 0) { | ||
139 | offset = -(addr | TARGET_PAGE_MASK); | ||
140 | host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, | ||
141 | @@ -XXX,XX +XXX,XX @@ ProbeSuccess: | ||
142 | continue; | ||
143 | } | ||
144 | while (k < nf) { | ||
145 | - target_ulong addr = base + ((i * nf + k) << esz); | ||
146 | + target_ulong addr = base + ((i * nf + k) << log2_esz); | ||
147 | ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra); | ||
148 | k++; | ||
149 | } | ||
150 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_LDFF(vle64ff_v, int64_t, lde_d) | ||
151 | */ | ||
152 | static void | ||
153 | vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, | ||
154 | - vext_ldst_elem_fn *ldst_elem, uint32_t esz, uintptr_t ra) | ||
155 | + vext_ldst_elem_fn *ldst_elem, uint32_t log2_esz, uintptr_t ra) | ||
156 | { | ||
157 | uint32_t i, k, off, pos; | ||
158 | uint32_t nf = vext_nf(desc); | ||
159 | uint32_t vlenb = env_archcpu(env)->cfg.vlen >> 3; | ||
160 | - uint32_t max_elems = vlenb >> esz; | ||
161 | + uint32_t max_elems = vlenb >> log2_esz; | ||
162 | |||
163 | k = env->vstart / max_elems; | ||
164 | off = env->vstart % max_elems; | ||
165 | @@ -XXX,XX +XXX,XX @@ vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, | ||
166 | if (off) { | ||
167 | /* load/store rest of elements of current segment pointed by vstart */ | ||
168 | for (pos = off; pos < max_elems; pos++, env->vstart++) { | ||
169 | - target_ulong addr = base + ((pos + k * max_elems) << esz); | ||
170 | + target_ulong addr = base + ((pos + k * max_elems) << log2_esz); | ||
171 | ldst_elem(env, adjust_addr(env, addr), pos + k * max_elems, vd, ra); | ||
172 | } | ||
173 | k++; | ||
174 | @@ -XXX,XX +XXX,XX @@ vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, | ||
175 | /* load/store elements for rest of segments */ | ||
176 | for (; k < nf; k++) { | ||
177 | for (i = 0; i < max_elems; i++, env->vstart++) { | ||
178 | - target_ulong addr = base + ((i + k * max_elems) << esz); | ||
179 | + target_ulong addr = base + ((i + k * max_elems) << log2_esz); | ||
180 | ldst_elem(env, adjust_addr(env, addr), i + k * max_elems, vd, ra); | ||
181 | } | ||
182 | } | ||
183 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_VSLIDEDOWN_VX(vslidedown_vx_h, uint16_t, H2) | ||
184 | GEN_VEXT_VSLIDEDOWN_VX(vslidedown_vx_w, uint32_t, H4) | ||
185 | GEN_VEXT_VSLIDEDOWN_VX(vslidedown_vx_d, uint64_t, H8) | ||
186 | |||
187 | -#define GEN_VEXT_VSLIE1UP(ESZ, H) \ | ||
188 | -static void vslide1up_##ESZ(void *vd, void *v0, target_ulong s1, void *vs2, \ | ||
189 | - CPURISCVState *env, uint32_t desc) \ | ||
190 | +#define GEN_VEXT_VSLIE1UP(BITWIDTH, H) \ | ||
191 | +static void vslide1up_##BITWIDTH(void *vd, void *v0, target_ulong s1, \ | ||
192 | + void *vs2, CPURISCVState *env, uint32_t desc) \ | ||
193 | { \ | ||
194 | - typedef uint##ESZ##_t ETYPE; \ | ||
195 | + typedef uint##BITWIDTH##_t ETYPE; \ | ||
196 | uint32_t vm = vext_vm(desc); \ | ||
197 | uint32_t vl = env->vl; \ | ||
198 | uint32_t i; \ | ||
199 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_VSLIE1UP(16, H2) | ||
200 | GEN_VEXT_VSLIE1UP(32, H4) | ||
201 | GEN_VEXT_VSLIE1UP(64, H8) | ||
202 | |||
203 | -#define GEN_VEXT_VSLIDE1UP_VX(NAME, ESZ) \ | ||
204 | +#define GEN_VEXT_VSLIDE1UP_VX(NAME, BITWIDTH) \ | ||
205 | void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \ | ||
206 | CPURISCVState *env, uint32_t desc) \ | ||
207 | { \ | ||
208 | - vslide1up_##ESZ(vd, v0, s1, vs2, env, desc); \ | ||
209 | + vslide1up_##BITWIDTH(vd, v0, s1, vs2, env, desc); \ | ||
210 | } | ||
211 | |||
212 | /* vslide1up.vx vd, vs2, rs1, vm # vd[0]=x[rs1], vd[i+1] = vs2[i] */ | ||
213 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_VSLIDE1UP_VX(vslide1up_vx_h, 16) | ||
214 | GEN_VEXT_VSLIDE1UP_VX(vslide1up_vx_w, 32) | ||
215 | GEN_VEXT_VSLIDE1UP_VX(vslide1up_vx_d, 64) | ||
216 | |||
217 | -#define GEN_VEXT_VSLIDE1DOWN(ESZ, H) \ | ||
218 | -static void vslide1down_##ESZ(void *vd, void *v0, target_ulong s1, void *vs2, \ | ||
219 | - CPURISCVState *env, uint32_t desc) \ | ||
220 | +#define GEN_VEXT_VSLIDE1DOWN(BITWIDTH, H) \ | ||
221 | +static void vslide1down_##BITWIDTH(void *vd, void *v0, target_ulong s1, \ | ||
222 | + void *vs2, CPURISCVState *env, uint32_t desc) \ | ||
223 | { \ | ||
224 | - typedef uint##ESZ##_t ETYPE; \ | ||
225 | + typedef uint##BITWIDTH##_t ETYPE; \ | ||
226 | uint32_t vm = vext_vm(desc); \ | ||
227 | uint32_t vl = env->vl; \ | ||
228 | uint32_t i; \ | ||
229 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_VSLIDE1DOWN(16, H2) | ||
230 | GEN_VEXT_VSLIDE1DOWN(32, H4) | ||
231 | GEN_VEXT_VSLIDE1DOWN(64, H8) | ||
232 | |||
233 | -#define GEN_VEXT_VSLIDE1DOWN_VX(NAME, ESZ) \ | ||
234 | +#define GEN_VEXT_VSLIDE1DOWN_VX(NAME, BITWIDTH) \ | ||
235 | void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \ | ||
236 | CPURISCVState *env, uint32_t desc) \ | ||
237 | { \ | ||
238 | - vslide1down_##ESZ(vd, v0, s1, vs2, env, desc); \ | ||
239 | + vslide1down_##BITWIDTH(vd, v0, s1, vs2, env, desc); \ | ||
240 | } | ||
241 | |||
242 | /* vslide1down.vx vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=x[rs1] */ | ||
243 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_w, 32) | ||
244 | GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_d, 64) | ||
245 | |||
246 | /* Vector Floating-Point Slide Instructions */ | ||
247 | -#define GEN_VEXT_VFSLIDE1UP_VF(NAME, ESZ) \ | ||
248 | +#define GEN_VEXT_VFSLIDE1UP_VF(NAME, BITWIDTH) \ | ||
249 | void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void *vs2, \ | ||
250 | CPURISCVState *env, uint32_t desc) \ | ||
251 | { \ | ||
252 | - vslide1up_##ESZ(vd, v0, s1, vs2, env, desc); \ | ||
253 | + vslide1up_##BITWIDTH(vd, v0, s1, vs2, env, desc); \ | ||
254 | } | ||
255 | |||
256 | /* vfslide1up.vf vd, vs2, rs1, vm # vd[0]=f[rs1], vd[i+1] = vs2[i] */ | ||
257 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_VFSLIDE1UP_VF(vfslide1up_vf_h, 16) | ||
258 | GEN_VEXT_VFSLIDE1UP_VF(vfslide1up_vf_w, 32) | ||
259 | GEN_VEXT_VFSLIDE1UP_VF(vfslide1up_vf_d, 64) | ||
260 | |||
261 | -#define GEN_VEXT_VFSLIDE1DOWN_VF(NAME, ESZ) \ | ||
262 | +#define GEN_VEXT_VFSLIDE1DOWN_VF(NAME, BITWIDTH) \ | ||
263 | void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void *vs2, \ | ||
264 | CPURISCVState *env, uint32_t desc) \ | ||
265 | { \ | ||
266 | - vslide1down_##ESZ(vd, v0, s1, vs2, env, desc); \ | ||
267 | + vslide1down_##BITWIDTH(vd, v0, s1, vs2, env, desc); \ | ||
268 | } | ||
269 | |||
270 | /* vfslide1down.vf vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=f[rs1] */ | ||
271 | -- | 28 | -- |
272 | 2.36.1 | 29 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <eop.chen@sifive.com> | 1 | From: Daniel Henrique Barboza <dbarboza@ventanamicro.com> |
---|---|---|---|
2 | 2 | ||
3 | According to v-spec, tail agnostic behavior can be either kept as | 3 | The cpu->cfg.epmp extension is still experimental, but it already has a |
4 | undisturbed or set elements' bits to all 1s. To distinguish the | 4 | 'smepmp' riscv,isa string. Add it. |
5 | difference of tail policies, QEMU should be able to simulate the tail | ||
6 | agnostic behavior as "set tail elements' bits to all 1s". | ||
7 | 5 | ||
8 | There are multiple possibility for agnostic elements according to | 6 | Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> |
9 | v-spec. The main intent of this patch-set tries to add option that | ||
10 | can distinguish between tail policies. Setting agnostic elements to | ||
11 | all 1s allows QEMU to express this. | ||
12 | |||
13 | This commit adds option 'rvv_ta_all_1s' is added to enable the | ||
14 | behavior, it is default as disabled. | ||
15 | |||
16 | Signed-off-by: eop Chen <eop.chen@sifive.com> | ||
17 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | ||
18 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 7 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> |
19 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | 8 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> |
20 | Message-Id: <165449614532.19704.7000832880482980398-16@git.sr.ht> | 9 | Message-Id: <20230720132424.371132-3-dbarboza@ventanamicro.com> |
21 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 10 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
22 | --- | 11 | --- |
23 | target/riscv/cpu.c | 2 ++ | 12 | target/riscv/cpu.c | 1 + |
24 | 1 file changed, 2 insertions(+) | 13 | 1 file changed, 1 insertion(+) |
25 | 14 | ||
26 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | 15 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c |
27 | index XXXXXXX..XXXXXXX 100644 | 16 | index XXXXXXX..XXXXXXX 100644 |
28 | --- a/target/riscv/cpu.c | 17 | --- a/target/riscv/cpu.c |
29 | +++ b/target/riscv/cpu.c | 18 | +++ b/target/riscv/cpu.c |
30 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_properties[] = { | 19 | @@ -XXX,XX +XXX,XX @@ static const struct isa_ext_data isa_edata_arr[] = { |
31 | DEFINE_PROP_UINT64("resetvec", RISCVCPU, cfg.resetvec, DEFAULT_RSTVEC), | 20 | ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx), |
32 | 21 | ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin), | |
33 | DEFINE_PROP_BOOL("short-isa-string", RISCVCPU, cfg.short_isa_string, false), | 22 | ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia), |
34 | + | 23 | + ISA_EXT_DATA_ENTRY(smepmp, PRIV_VERSION_1_12_0, epmp), |
35 | + DEFINE_PROP_BOOL("rvv_ta_all_1s", RISCVCPU, cfg.rvv_ta_all_1s, false), | 24 | ISA_EXT_DATA_ENTRY(smstateen, PRIV_VERSION_1_12_0, ext_smstateen), |
36 | DEFINE_PROP_END_OF_LIST(), | 25 | ISA_EXT_DATA_ENTRY(ssaia, PRIV_VERSION_1_12_0, ext_ssaia), |
37 | }; | 26 | ISA_EXT_DATA_ENTRY(sscofpmf, PRIV_VERSION_1_12_0, ext_sscofpmf), |
38 | |||
39 | -- | 27 | -- |
40 | 2.36.1 | 28 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> |
---|---|---|---|
2 | 2 | ||
3 | The tail elements in the destination mask register are updated under | 3 | Commit bef6f008b98(accel/tcg: Return bool from page_check_range) converts |
4 | a tail-agnostic policy. | 4 | integer return value to bool type. However, it wrongly converted the use |
5 | of the API in riscv fault-only-first, where page_check_range < = 0, should | ||
6 | be converted to !page_check_range. | ||
5 | 7 | ||
6 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 8 | Signed-off-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> |
7 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | 9 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> |
8 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 10 | Message-ID: <20230729031618.821-1-zhiwei_liu@linux.alibaba.com> |
9 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | ||
10 | Message-Id: <165449614532.19704.7000832880482980398-14@git.sr.ht> | ||
11 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 11 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
12 | --- | 12 | --- |
13 | target/riscv/vector_helper.c | 30 +++++++++++++++++++++++++ | 13 | target/riscv/vector_helper.c | 2 +- |
14 | target/riscv/insn_trans/trans_rvv.c.inc | 6 +++++ | 14 | 1 file changed, 1 insertion(+), 1 deletion(-) |
15 | 2 files changed, 36 insertions(+) | ||
16 | 15 | ||
17 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 16 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c |
18 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
19 | --- a/target/riscv/vector_helper.c | 18 | --- a/target/riscv/vector_helper.c |
20 | +++ b/target/riscv/vector_helper.c | 19 | +++ b/target/riscv/vector_helper.c |
21 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | 20 | @@ -XXX,XX +XXX,XX @@ vext_ldff(void *vd, void *v0, target_ulong base, |
22 | uint32_t desc) \ | 21 | cpu_mmu_index(env, false)); |
23 | { \ | 22 | if (host) { |
24 | uint32_t vl = env->vl; \ | 23 | #ifdef CONFIG_USER_ONLY |
25 | + uint32_t total_elems = env_archcpu(env)->cfg.vlen; \ | 24 | - if (page_check_range(addr, offset, PAGE_READ)) { |
26 | + uint32_t vta_all_1s = vext_vta_all_1s(desc); \ | 25 | + if (!page_check_range(addr, offset, PAGE_READ)) { |
27 | uint32_t i; \ | 26 | vl = i; |
28 | int a, b; \ | 27 | goto ProbeSuccess; |
29 | \ | 28 | } |
30 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | ||
31 | vext_set_elem_mask(vd, i, OP(b, a)); \ | ||
32 | } \ | ||
33 | env->vstart = 0; \ | ||
34 | + /* mask destination register are always tail- \ | ||
35 | + * agnostic \ | ||
36 | + */ \ | ||
37 | + /* set tail elements to 1s */ \ | ||
38 | + if (vta_all_1s) { \ | ||
39 | + for (; i < total_elems; i++) { \ | ||
40 | + vext_set_elem_mask(vd, i, 1); \ | ||
41 | + } \ | ||
42 | + } \ | ||
43 | } | ||
44 | |||
45 | #define DO_NAND(N, M) (!(N & M)) | ||
46 | @@ -XXX,XX +XXX,XX @@ static void vmsetm(void *vd, void *v0, void *vs2, CPURISCVState *env, | ||
47 | { | ||
48 | uint32_t vm = vext_vm(desc); | ||
49 | uint32_t vl = env->vl; | ||
50 | + uint32_t total_elems = env_archcpu(env)->cfg.vlen; | ||
51 | + uint32_t vta_all_1s = vext_vta_all_1s(desc); | ||
52 | int i; | ||
53 | bool first_mask_bit = false; | ||
54 | |||
55 | @@ -XXX,XX +XXX,XX @@ static void vmsetm(void *vd, void *v0, void *vs2, CPURISCVState *env, | ||
56 | } | ||
57 | } | ||
58 | env->vstart = 0; | ||
59 | + /* mask destination register are always tail-agnostic */ | ||
60 | + /* set tail elements to 1s */ | ||
61 | + if (vta_all_1s) { | ||
62 | + for (; i < total_elems; i++) { | ||
63 | + vext_set_elem_mask(vd, i, 1); | ||
64 | + } | ||
65 | + } | ||
66 | } | ||
67 | |||
68 | void HELPER(vmsbf_m)(void *vd, void *v0, void *vs2, CPURISCVState *env, | ||
69 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, CPURISCVState *env, \ | ||
70 | { \ | ||
71 | uint32_t vm = vext_vm(desc); \ | ||
72 | uint32_t vl = env->vl; \ | ||
73 | + uint32_t esz = sizeof(ETYPE); \ | ||
74 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
75 | + uint32_t vta = vext_vta(desc); \ | ||
76 | uint32_t sum = 0; \ | ||
77 | int i; \ | ||
78 | \ | ||
79 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, CPURISCVState *env, \ | ||
80 | } \ | ||
81 | } \ | ||
82 | env->vstart = 0; \ | ||
83 | + /* set tail elements to 1s */ \ | ||
84 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
85 | } | ||
86 | |||
87 | GEN_VEXT_VIOTA_M(viota_m_b, uint8_t, H1) | ||
88 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, CPURISCVState *env, uint32_t desc) \ | ||
89 | { \ | ||
90 | uint32_t vm = vext_vm(desc); \ | ||
91 | uint32_t vl = env->vl; \ | ||
92 | + uint32_t esz = sizeof(ETYPE); \ | ||
93 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
94 | + uint32_t vta = vext_vta(desc); \ | ||
95 | int i; \ | ||
96 | \ | ||
97 | for (i = env->vstart; i < vl; i++) { \ | ||
98 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, CPURISCVState *env, uint32_t desc) \ | ||
99 | *((ETYPE *)vd + H(i)) = i; \ | ||
100 | } \ | ||
101 | env->vstart = 0; \ | ||
102 | + /* set tail elements to 1s */ \ | ||
103 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
104 | } | ||
105 | |||
106 | GEN_VEXT_VID_V(vid_v_b, uint8_t, H1) | ||
107 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc | ||
108 | index XXXXXXX..XXXXXXX 100644 | ||
109 | --- a/target/riscv/insn_trans/trans_rvv.c.inc | ||
110 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc | ||
111 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_r *a) \ | ||
112 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
113 | \ | ||
114 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
115 | + data = \ | ||
116 | + FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s);\ | ||
117 | tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
118 | vreg_ofs(s, a->rs1), \ | ||
119 | vreg_ofs(s, a->rs2), cpu_env, \ | ||
120 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | ||
121 | \ | ||
122 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
123 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
124 | + data = \ | ||
125 | + FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s);\ | ||
126 | tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), \ | ||
127 | vreg_ofs(s, 0), vreg_ofs(s, a->rs2), \ | ||
128 | cpu_env, s->cfg_ptr->vlen / 8, \ | ||
129 | @@ -XXX,XX +XXX,XX @@ static bool trans_viota_m(DisasContext *s, arg_viota_m *a) | ||
130 | |||
131 | data = FIELD_DP32(data, VDATA, VM, a->vm); | ||
132 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | ||
133 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
134 | static gen_helper_gvec_3_ptr * const fns[4] = { | ||
135 | gen_helper_viota_m_b, gen_helper_viota_m_h, | ||
136 | gen_helper_viota_m_w, gen_helper_viota_m_d, | ||
137 | @@ -XXX,XX +XXX,XX @@ static bool trans_vid_v(DisasContext *s, arg_vid_v *a) | ||
138 | |||
139 | data = FIELD_DP32(data, VDATA, VM, a->vm); | ||
140 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | ||
141 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
142 | static gen_helper_gvec_2_ptr * const fns[4] = { | ||
143 | gen_helper_vid_v_b, gen_helper_vid_v_h, | ||
144 | gen_helper_vid_v_w, gen_helper_vid_v_d, | ||
145 | -- | 29 | -- |
146 | 2.36.1 | 30 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Ard Biesheuvel <ardb@kernel.org> | ||
1 | 2 | ||
3 | The AES MixColumns and InvMixColumns operations are relatively | ||
4 | expensive 4x4 matrix multiplications in GF(2^8), which is why C | ||
5 | implementations usually rely on precomputed lookup tables rather than | ||
6 | performing the calculations on demand. | ||
7 | |||
8 | Given that we already carry those tables in QEMU, we can just grab the | ||
9 | right value in the implementation of the RISC-V AES32 instructions. Note | ||
10 | that the tables in question are permuted according to the respective | ||
11 | Sbox, so we can omit the Sbox lookup as well in this case. | ||
12 | |||
13 | Cc: Richard Henderson <richard.henderson@linaro.org> | ||
14 | Cc: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
15 | Cc: Zewen Ye <lustrew@foxmail.com> | ||
16 | Cc: Weiwei Li <liweiwei@iscas.ac.cn> | ||
17 | Cc: Junqiang Wang <wangjunqiang@iscas.ac.cn> | ||
18 | Signed-off-by: Ard Biesheuvel <ardb@kernel.org> | ||
19 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
20 | Message-ID: <20230731084043.1791984-1-ardb@kernel.org> | ||
21 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
22 | --- | ||
23 | include/crypto/aes.h | 7 +++++++ | ||
24 | crypto/aes.c | 4 ++-- | ||
25 | target/riscv/crypto_helper.c | 34 ++++------------------------------ | ||
26 | 3 files changed, 13 insertions(+), 32 deletions(-) | ||
27 | |||
28 | diff --git a/include/crypto/aes.h b/include/crypto/aes.h | ||
29 | index XXXXXXX..XXXXXXX 100644 | ||
30 | --- a/include/crypto/aes.h | ||
31 | +++ b/include/crypto/aes.h | ||
32 | @@ -XXX,XX +XXX,XX @@ void AES_decrypt(const unsigned char *in, unsigned char *out, | ||
33 | extern const uint8_t AES_sbox[256]; | ||
34 | extern const uint8_t AES_isbox[256]; | ||
35 | |||
36 | +/* | ||
37 | +AES_Te0[x] = S [x].[02, 01, 01, 03]; | ||
38 | +AES_Td0[x] = Si[x].[0e, 09, 0d, 0b]; | ||
39 | +*/ | ||
40 | + | ||
41 | +extern const uint32_t AES_Te0[256], AES_Td0[256]; | ||
42 | + | ||
43 | #endif | ||
44 | diff --git a/crypto/aes.c b/crypto/aes.c | ||
45 | index XXXXXXX..XXXXXXX 100644 | ||
46 | --- a/crypto/aes.c | ||
47 | +++ b/crypto/aes.c | ||
48 | @@ -XXX,XX +XXX,XX @@ AES_Td3[x] = Si[x].[09, 0d, 0b, 0e]; | ||
49 | AES_Td4[x] = Si[x].[01, 01, 01, 01]; | ||
50 | */ | ||
51 | |||
52 | -static const uint32_t AES_Te0[256] = { | ||
53 | +const uint32_t AES_Te0[256] = { | ||
54 | 0xc66363a5U, 0xf87c7c84U, 0xee777799U, 0xf67b7b8dU, | ||
55 | 0xfff2f20dU, 0xd66b6bbdU, 0xde6f6fb1U, 0x91c5c554U, | ||
56 | 0x60303050U, 0x02010103U, 0xce6767a9U, 0x562b2b7dU, | ||
57 | @@ -XXX,XX +XXX,XX @@ static const uint32_t AES_Te4[256] = { | ||
58 | 0xb0b0b0b0U, 0x54545454U, 0xbbbbbbbbU, 0x16161616U, | ||
59 | }; | ||
60 | |||
61 | -static const uint32_t AES_Td0[256] = { | ||
62 | +const uint32_t AES_Td0[256] = { | ||
63 | 0x51f4a750U, 0x7e416553U, 0x1a17a4c3U, 0x3a275e96U, | ||
64 | 0x3bab6bcbU, 0x1f9d45f1U, 0xacfa58abU, 0x4be30393U, | ||
65 | 0x2030fa55U, 0xad766df6U, 0x88cc7691U, 0xf5024c25U, | ||
66 | diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c | ||
67 | index XXXXXXX..XXXXXXX 100644 | ||
68 | --- a/target/riscv/crypto_helper.c | ||
69 | +++ b/target/riscv/crypto_helper.c | ||
70 | @@ -XXX,XX +XXX,XX @@ | ||
71 | #include "crypto/aes-round.h" | ||
72 | #include "crypto/sm4.h" | ||
73 | |||
74 | -#define AES_XTIME(a) \ | ||
75 | - ((a << 1) ^ ((a & 0x80) ? 0x1b : 0)) | ||
76 | - | ||
77 | -#define AES_GFMUL(a, b) (( \ | ||
78 | - (((b) & 0x1) ? (a) : 0) ^ \ | ||
79 | - (((b) & 0x2) ? AES_XTIME(a) : 0) ^ \ | ||
80 | - (((b) & 0x4) ? AES_XTIME(AES_XTIME(a)) : 0) ^ \ | ||
81 | - (((b) & 0x8) ? AES_XTIME(AES_XTIME(AES_XTIME(a))) : 0)) & 0xFF) | ||
82 | - | ||
83 | -static inline uint32_t aes_mixcolumn_byte(uint8_t x, bool fwd) | ||
84 | -{ | ||
85 | - uint32_t u; | ||
86 | - | ||
87 | - if (fwd) { | ||
88 | - u = (AES_GFMUL(x, 3) << 24) | (x << 16) | (x << 8) | | ||
89 | - (AES_GFMUL(x, 2) << 0); | ||
90 | - } else { | ||
91 | - u = (AES_GFMUL(x, 0xb) << 24) | (AES_GFMUL(x, 0xd) << 16) | | ||
92 | - (AES_GFMUL(x, 0x9) << 8) | (AES_GFMUL(x, 0xe) << 0); | ||
93 | - } | ||
94 | - return u; | ||
95 | -} | ||
96 | - | ||
97 | #define sext32_xlen(x) (target_ulong)(int32_t)(x) | ||
98 | |||
99 | static inline target_ulong aes32_operation(target_ulong shamt, | ||
100 | @@ -XXX,XX +XXX,XX @@ static inline target_ulong aes32_operation(target_ulong shamt, | ||
101 | bool enc, bool mix) | ||
102 | { | ||
103 | uint8_t si = rs2 >> shamt; | ||
104 | - uint8_t so; | ||
105 | uint32_t mixed; | ||
106 | target_ulong res; | ||
107 | |||
108 | if (enc) { | ||
109 | - so = AES_sbox[si]; | ||
110 | if (mix) { | ||
111 | - mixed = aes_mixcolumn_byte(so, true); | ||
112 | + mixed = be32_to_cpu(AES_Te0[si]); | ||
113 | } else { | ||
114 | - mixed = so; | ||
115 | + mixed = AES_sbox[si]; | ||
116 | } | ||
117 | } else { | ||
118 | - so = AES_isbox[si]; | ||
119 | if (mix) { | ||
120 | - mixed = aes_mixcolumn_byte(so, false); | ||
121 | + mixed = be32_to_cpu(AES_Td0[si]); | ||
122 | } else { | ||
123 | - mixed = so; | ||
124 | + mixed = AES_isbox[si]; | ||
125 | } | ||
126 | } | ||
127 | mixed = rol32(mixed, shamt); | ||
128 | -- | ||
129 | 2.41.0 | ||
130 | |||
131 | diff view generated by jsdifflib |
1 | From: eopXD <eop.chen@sifive.com> | 1 | From: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk> |
---|---|---|---|
2 | 2 | ||
3 | According to v-spec, tail agnostic behavior can be either kept as | 3 | Take some functions/macros out of `vector_helper` and put them in a new |
4 | undisturbed or set elements' bits to all 1s. To distinguish the | 4 | module called `vector_internals`. This ensures they can be used by both |
5 | difference of tail policies, QEMU should be able to simulate the tail | 5 | vector and vector-crypto helpers (latter implemented in proceeding |
6 | agnostic behavior as "set tail elements' bits to all 1s". | 6 | commits). |
7 | 7 | ||
8 | There are multiple possibility for agnostic elements according to | 8 | Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk> |
9 | v-spec. The main intent of this patch-set tries to add option that | ||
10 | can distinguish between tail policies. Setting agnostic elements to | ||
11 | all 1s allows QEMU to express this. | ||
12 | |||
13 | This is the first commit regarding the optional tail agnostic | ||
14 | behavior. Follow-up commits will add this optional behavior | ||
15 | for all rvv instructions. | ||
16 | |||
17 | Signed-off-by: eop Chen <eop.chen@sifive.com> | ||
18 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | ||
19 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 9 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> |
10 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
20 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | 11 | Acked-by: Alistair Francis <alistair.francis@wdc.com> |
21 | Message-Id: <165449614532.19704.7000832880482980398-5@git.sr.ht> | 12 | Message-ID: <20230711165917.2629866-2-max.chou@sifive.com> |
22 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 13 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
23 | --- | 14 | --- |
24 | target/riscv/cpu.h | 2 + | 15 | target/riscv/vector_internals.h | 182 +++++++++++++++++++++++++++++ |
25 | target/riscv/internals.h | 5 +- | 16 | target/riscv/vector_helper.c | 201 +------------------------------- |
26 | target/riscv/cpu_helper.c | 2 + | 17 | target/riscv/vector_internals.c | 81 +++++++++++++ |
27 | target/riscv/translate.c | 2 + | 18 | target/riscv/meson.build | 1 + |
28 | target/riscv/vector_helper.c | 296 +++++++++++++----------- | 19 | 4 files changed, 265 insertions(+), 200 deletions(-) |
29 | target/riscv/insn_trans/trans_rvv.c.inc | 3 +- | 20 | create mode 100644 target/riscv/vector_internals.h |
30 | 6 files changed, 178 insertions(+), 132 deletions(-) | 21 | create mode 100644 target/riscv/vector_internals.c |
31 | 22 | ||
32 | diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h | 23 | diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h |
33 | index XXXXXXX..XXXXXXX 100644 | 24 | new file mode 100644 |
34 | --- a/target/riscv/cpu.h | 25 | index XXXXXXX..XXXXXXX |
35 | +++ b/target/riscv/cpu.h | 26 | --- /dev/null |
36 | @@ -XXX,XX +XXX,XX @@ struct RISCVCPUConfig { | 27 | +++ b/target/riscv/vector_internals.h |
37 | bool ext_zve32f; | ||
38 | bool ext_zve64f; | ||
39 | bool ext_zmmul; | ||
40 | + bool rvv_ta_all_1s; | ||
41 | |||
42 | uint32_t mvendorid; | ||
43 | uint64_t marchid; | ||
44 | @@ -XXX,XX +XXX,XX @@ FIELD(TB_FLAGS, XL, 20, 2) | ||
45 | /* If PointerMasking should be applied */ | ||
46 | FIELD(TB_FLAGS, PM_MASK_ENABLED, 22, 1) | ||
47 | FIELD(TB_FLAGS, PM_BASE_ENABLED, 23, 1) | ||
48 | +FIELD(TB_FLAGS, VTA, 24, 1) | ||
49 | |||
50 | #ifdef TARGET_RISCV32 | ||
51 | #define riscv_cpu_mxl(env) ((void)(env), MXL_RV32) | ||
52 | diff --git a/target/riscv/internals.h b/target/riscv/internals.h | ||
53 | index XXXXXXX..XXXXXXX 100644 | ||
54 | --- a/target/riscv/internals.h | ||
55 | +++ b/target/riscv/internals.h | ||
56 | @@ -XXX,XX +XXX,XX @@ | 28 | @@ -XXX,XX +XXX,XX @@ |
57 | /* share data between vector helpers and decode code */ | 29 | +/* |
58 | FIELD(VDATA, VM, 0, 1) | 30 | + * RISC-V Vector Extension Internals |
59 | FIELD(VDATA, LMUL, 1, 3) | 31 | + * |
60 | -FIELD(VDATA, NF, 4, 4) | 32 | + * Copyright (c) 2020 T-Head Semiconductor Co., Ltd. All rights reserved. |
61 | -FIELD(VDATA, WD, 4, 1) | 33 | + * |
62 | +FIELD(VDATA, VTA, 4, 1) | 34 | + * This program is free software; you can redistribute it and/or modify it |
63 | +FIELD(VDATA, NF, 5, 4) | 35 | + * under the terms and conditions of the GNU General Public License, |
64 | +FIELD(VDATA, WD, 5, 1) | 36 | + * version 2 or later, as published by the Free Software Foundation. |
65 | 37 | + * | |
66 | /* float point classify helpers */ | 38 | + * This program is distributed in the hope it will be useful, but WITHOUT |
67 | target_ulong fclass_h(uint64_t frs1); | 39 | + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
68 | diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c | 40 | + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for |
69 | index XXXXXXX..XXXXXXX 100644 | 41 | + * more details. |
70 | --- a/target/riscv/cpu_helper.c | 42 | + * |
71 | +++ b/target/riscv/cpu_helper.c | 43 | + * You should have received a copy of the GNU General Public License along with |
72 | @@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc, | 44 | + * this program. If not, see <http://www.gnu.org/licenses/>. |
73 | flags = FIELD_DP32(flags, TB_FLAGS, LMUL, | 45 | + */ |
74 | FIELD_EX64(env->vtype, VTYPE, VLMUL)); | 46 | + |
75 | flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax); | 47 | +#ifndef TARGET_RISCV_VECTOR_INTERNALS_H |
76 | + flags = FIELD_DP32(flags, TB_FLAGS, VTA, | 48 | +#define TARGET_RISCV_VECTOR_INTERNALS_H |
77 | + FIELD_EX64(env->vtype, VTYPE, VTA)); | 49 | + |
78 | } else { | 50 | +#include "qemu/osdep.h" |
79 | flags = FIELD_DP32(flags, TB_FLAGS, VILL, 1); | 51 | +#include "qemu/bitops.h" |
80 | } | 52 | +#include "cpu.h" |
81 | diff --git a/target/riscv/translate.c b/target/riscv/translate.c | 53 | +#include "tcg/tcg-gvec-desc.h" |
82 | index XXXXXXX..XXXXXXX 100644 | 54 | +#include "internals.h" |
83 | --- a/target/riscv/translate.c | 55 | + |
84 | +++ b/target/riscv/translate.c | 56 | +static inline uint32_t vext_nf(uint32_t desc) |
85 | @@ -XXX,XX +XXX,XX @@ typedef struct DisasContext { | 57 | +{ |
86 | */ | 58 | + return FIELD_EX32(simd_data(desc), VDATA, NF); |
87 | int8_t lmul; | 59 | +} |
88 | uint8_t sew; | 60 | + |
89 | + uint8_t vta; | 61 | +/* |
90 | target_ulong vstart; | 62 | + * Note that vector data is stored in host-endian 64-bit chunks, |
91 | bool vl_eq_vlmax; | 63 | + * so addressing units smaller than that needs a host-endian fixup. |
92 | uint8_t ntemp; | 64 | + */ |
93 | @@ -XXX,XX +XXX,XX @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs) | 65 | +#if HOST_BIG_ENDIAN |
94 | ctx->vill = FIELD_EX32(tb_flags, TB_FLAGS, VILL); | 66 | +#define H1(x) ((x) ^ 7) |
95 | ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW); | 67 | +#define H1_2(x) ((x) ^ 6) |
96 | ctx->lmul = sextract32(FIELD_EX32(tb_flags, TB_FLAGS, LMUL), 0, 3); | 68 | +#define H1_4(x) ((x) ^ 4) |
97 | + ctx->vta = FIELD_EX32(tb_flags, TB_FLAGS, VTA) && cpu->cfg.rvv_ta_all_1s; | 69 | +#define H2(x) ((x) ^ 3) |
98 | ctx->vstart = env->vstart; | 70 | +#define H4(x) ((x) ^ 1) |
99 | ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX); | 71 | +#define H8(x) ((x)) |
100 | ctx->misa_mxl_max = env->misa_mxl_max; | 72 | +#else |
73 | +#define H1(x) (x) | ||
74 | +#define H1_2(x) (x) | ||
75 | +#define H1_4(x) (x) | ||
76 | +#define H2(x) (x) | ||
77 | +#define H4(x) (x) | ||
78 | +#define H8(x) (x) | ||
79 | +#endif | ||
80 | + | ||
81 | +/* | ||
82 | + * Encode LMUL to lmul as following: | ||
83 | + * LMUL vlmul lmul | ||
84 | + * 1 000 0 | ||
85 | + * 2 001 1 | ||
86 | + * 4 010 2 | ||
87 | + * 8 011 3 | ||
88 | + * - 100 - | ||
89 | + * 1/8 101 -3 | ||
90 | + * 1/4 110 -2 | ||
91 | + * 1/2 111 -1 | ||
92 | + */ | ||
93 | +static inline int32_t vext_lmul(uint32_t desc) | ||
94 | +{ | ||
95 | + return sextract32(FIELD_EX32(simd_data(desc), VDATA, LMUL), 0, 3); | ||
96 | +} | ||
97 | + | ||
98 | +static inline uint32_t vext_vm(uint32_t desc) | ||
99 | +{ | ||
100 | + return FIELD_EX32(simd_data(desc), VDATA, VM); | ||
101 | +} | ||
102 | + | ||
103 | +static inline uint32_t vext_vma(uint32_t desc) | ||
104 | +{ | ||
105 | + return FIELD_EX32(simd_data(desc), VDATA, VMA); | ||
106 | +} | ||
107 | + | ||
108 | +static inline uint32_t vext_vta(uint32_t desc) | ||
109 | +{ | ||
110 | + return FIELD_EX32(simd_data(desc), VDATA, VTA); | ||
111 | +} | ||
112 | + | ||
113 | +static inline uint32_t vext_vta_all_1s(uint32_t desc) | ||
114 | +{ | ||
115 | + return FIELD_EX32(simd_data(desc), VDATA, VTA_ALL_1S); | ||
116 | +} | ||
117 | + | ||
118 | +/* | ||
119 | + * Earlier designs (pre-0.9) had a varying number of bits | ||
120 | + * per mask value (MLEN). In the 0.9 design, MLEN=1. | ||
121 | + * (Section 4.5) | ||
122 | + */ | ||
123 | +static inline int vext_elem_mask(void *v0, int index) | ||
124 | +{ | ||
125 | + int idx = index / 64; | ||
126 | + int pos = index % 64; | ||
127 | + return (((uint64_t *)v0)[idx] >> pos) & 1; | ||
128 | +} | ||
129 | + | ||
130 | +/* | ||
131 | + * Get number of total elements, including prestart, body and tail elements. | ||
132 | + * Note that when LMUL < 1, the tail includes the elements past VLMAX that | ||
133 | + * are held in the same vector register. | ||
134 | + */ | ||
135 | +static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc, | ||
136 | + uint32_t esz) | ||
137 | +{ | ||
138 | + uint32_t vlenb = simd_maxsz(desc); | ||
139 | + uint32_t sew = 1 << FIELD_EX64(env->vtype, VTYPE, VSEW); | ||
140 | + int8_t emul = ctzl(esz) - ctzl(sew) + vext_lmul(desc) < 0 ? 0 : | ||
141 | + ctzl(esz) - ctzl(sew) + vext_lmul(desc); | ||
142 | + return (vlenb << emul) / esz; | ||
143 | +} | ||
144 | + | ||
145 | +/* set agnostic elements to 1s */ | ||
146 | +void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt, | ||
147 | + uint32_t tot); | ||
148 | + | ||
149 | +/* expand macro args before macro */ | ||
150 | +#define RVVCALL(macro, ...) macro(__VA_ARGS__) | ||
151 | + | ||
152 | +/* (TD, T1, T2, TX1, TX2) */ | ||
153 | +#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t | ||
154 | +#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t | ||
155 | +#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t | ||
156 | +#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t | ||
157 | + | ||
158 | +/* operation of two vector elements */ | ||
159 | +typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i); | ||
160 | + | ||
161 | +#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP) \ | ||
162 | +static void do_##NAME(void *vd, void *vs1, void *vs2, int i) \ | ||
163 | +{ \ | ||
164 | + TX1 s1 = *((T1 *)vs1 + HS1(i)); \ | ||
165 | + TX2 s2 = *((T2 *)vs2 + HS2(i)); \ | ||
166 | + *((TD *)vd + HD(i)) = OP(s2, s1); \ | ||
167 | +} | ||
168 | + | ||
169 | +void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2, | ||
170 | + CPURISCVState *env, uint32_t desc, | ||
171 | + opivv2_fn *fn, uint32_t esz); | ||
172 | + | ||
173 | +/* generate the helpers for OPIVV */ | ||
174 | +#define GEN_VEXT_VV(NAME, ESZ) \ | ||
175 | +void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | ||
176 | + void *vs2, CPURISCVState *env, \ | ||
177 | + uint32_t desc) \ | ||
178 | +{ \ | ||
179 | + do_vext_vv(vd, v0, vs1, vs2, env, desc, \ | ||
180 | + do_##NAME, ESZ); \ | ||
181 | +} | ||
182 | + | ||
183 | +typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i); | ||
184 | + | ||
185 | +/* | ||
186 | + * (T1)s1 gives the real operator type. | ||
187 | + * (TX1)(T1)s1 expands the operator type of widen or narrow operations. | ||
188 | + */ | ||
189 | +#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP) \ | ||
190 | +static void do_##NAME(void *vd, target_long s1, void *vs2, int i) \ | ||
191 | +{ \ | ||
192 | + TX2 s2 = *((T2 *)vs2 + HS2(i)); \ | ||
193 | + *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1); \ | ||
194 | +} | ||
195 | + | ||
196 | +void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2, | ||
197 | + CPURISCVState *env, uint32_t desc, | ||
198 | + opivx2_fn fn, uint32_t esz); | ||
199 | + | ||
200 | +/* generate the helpers for OPIVX */ | ||
201 | +#define GEN_VEXT_VX(NAME, ESZ) \ | ||
202 | +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | ||
203 | + void *vs2, CPURISCVState *env, \ | ||
204 | + uint32_t desc) \ | ||
205 | +{ \ | ||
206 | + do_vext_vx(vd, v0, s1, vs2, env, desc, \ | ||
207 | + do_##NAME, ESZ); \ | ||
208 | +} | ||
209 | + | ||
210 | +#endif /* TARGET_RISCV_VECTOR_INTERNALS_H */ | ||
101 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 211 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c |
102 | index XXXXXXX..XXXXXXX 100644 | 212 | index XXXXXXX..XXXXXXX 100644 |
103 | --- a/target/riscv/vector_helper.c | 213 | --- a/target/riscv/vector_helper.c |
104 | +++ b/target/riscv/vector_helper.c | 214 | +++ b/target/riscv/vector_helper.c |
105 | @@ -XXX,XX +XXX,XX @@ static inline int32_t vext_lmul(uint32_t desc) | 215 | @@ -XXX,XX +XXX,XX @@ |
106 | return sextract32(FIELD_EX32(simd_data(desc), VDATA, LMUL), 0, 3); | 216 | #include "fpu/softfloat.h" |
217 | #include "tcg/tcg-gvec-desc.h" | ||
218 | #include "internals.h" | ||
219 | +#include "vector_internals.h" | ||
220 | #include <math.h> | ||
221 | |||
222 | target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1, | ||
223 | @@ -XXX,XX +XXX,XX @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1, | ||
224 | return vl; | ||
107 | } | 225 | } |
108 | 226 | ||
109 | +static inline uint32_t vext_vta(uint32_t desc) | 227 | -/* |
110 | +{ | 228 | - * Note that vector data is stored in host-endian 64-bit chunks, |
111 | + return FIELD_EX32(simd_data(desc), VDATA, VTA); | 229 | - * so addressing units smaller than that needs a host-endian fixup. |
112 | +} | 230 | - */ |
113 | + | 231 | -#if HOST_BIG_ENDIAN |
232 | -#define H1(x) ((x) ^ 7) | ||
233 | -#define H1_2(x) ((x) ^ 6) | ||
234 | -#define H1_4(x) ((x) ^ 4) | ||
235 | -#define H2(x) ((x) ^ 3) | ||
236 | -#define H4(x) ((x) ^ 1) | ||
237 | -#define H8(x) ((x)) | ||
238 | -#else | ||
239 | -#define H1(x) (x) | ||
240 | -#define H1_2(x) (x) | ||
241 | -#define H1_4(x) (x) | ||
242 | -#define H2(x) (x) | ||
243 | -#define H4(x) (x) | ||
244 | -#define H8(x) (x) | ||
245 | -#endif | ||
246 | - | ||
247 | -static inline uint32_t vext_nf(uint32_t desc) | ||
248 | -{ | ||
249 | - return FIELD_EX32(simd_data(desc), VDATA, NF); | ||
250 | -} | ||
251 | - | ||
252 | -static inline uint32_t vext_vm(uint32_t desc) | ||
253 | -{ | ||
254 | - return FIELD_EX32(simd_data(desc), VDATA, VM); | ||
255 | -} | ||
256 | - | ||
257 | -/* | ||
258 | - * Encode LMUL to lmul as following: | ||
259 | - * LMUL vlmul lmul | ||
260 | - * 1 000 0 | ||
261 | - * 2 001 1 | ||
262 | - * 4 010 2 | ||
263 | - * 8 011 3 | ||
264 | - * - 100 - | ||
265 | - * 1/8 101 -3 | ||
266 | - * 1/4 110 -2 | ||
267 | - * 1/2 111 -1 | ||
268 | - */ | ||
269 | -static inline int32_t vext_lmul(uint32_t desc) | ||
270 | -{ | ||
271 | - return sextract32(FIELD_EX32(simd_data(desc), VDATA, LMUL), 0, 3); | ||
272 | -} | ||
273 | - | ||
274 | -static inline uint32_t vext_vta(uint32_t desc) | ||
275 | -{ | ||
276 | - return FIELD_EX32(simd_data(desc), VDATA, VTA); | ||
277 | -} | ||
278 | - | ||
279 | -static inline uint32_t vext_vma(uint32_t desc) | ||
280 | -{ | ||
281 | - return FIELD_EX32(simd_data(desc), VDATA, VMA); | ||
282 | -} | ||
283 | - | ||
284 | -static inline uint32_t vext_vta_all_1s(uint32_t desc) | ||
285 | -{ | ||
286 | - return FIELD_EX32(simd_data(desc), VDATA, VTA_ALL_1S); | ||
287 | -} | ||
288 | - | ||
114 | /* | 289 | /* |
115 | * Get the maximum number of elements can be operated. | 290 | * Get the maximum number of elements can be operated. |
116 | * | 291 | * |
117 | @@ -XXX,XX +XXX,XX @@ static inline uint32_t vext_max_elems(uint32_t desc, uint32_t log2_esz) | 292 | @@ -XXX,XX +XXX,XX @@ static inline uint32_t vext_max_elems(uint32_t desc, uint32_t log2_esz) |
118 | return scale < 0 ? vlenb >> -scale : vlenb << scale; | 293 | return scale < 0 ? vlenb >> -scale : vlenb << scale; |
119 | } | 294 | } |
120 | 295 | ||
121 | +/* | 296 | -/* |
122 | + * Get number of total elements, including prestart, body and tail elements. | 297 | - * Get number of total elements, including prestart, body and tail elements. |
123 | + * Note that when LMUL < 1, the tail includes the elements past VLMAX that | 298 | - * Note that when LMUL < 1, the tail includes the elements past VLMAX that |
124 | + * are held in the same vector register. | 299 | - * are held in the same vector register. |
125 | + */ | 300 | - */ |
126 | +static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc, | 301 | -static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc, |
127 | + uint32_t esz) | 302 | - uint32_t esz) |
128 | +{ | 303 | -{ |
129 | + uint32_t vlenb = simd_maxsz(desc); | 304 | - uint32_t vlenb = simd_maxsz(desc); |
130 | + uint32_t sew = 1 << FIELD_EX64(env->vtype, VTYPE, VSEW); | 305 | - uint32_t sew = 1 << FIELD_EX64(env->vtype, VTYPE, VSEW); |
131 | + int8_t emul = ctzl(esz) - ctzl(sew) + vext_lmul(desc) < 0 ? 0 : | 306 | - int8_t emul = ctzl(esz) - ctzl(sew) + vext_lmul(desc) < 0 ? 0 : |
132 | + ctzl(esz) - ctzl(sew) + vext_lmul(desc); | 307 | - ctzl(esz) - ctzl(sew) + vext_lmul(desc); |
133 | + return (vlenb << emul) / esz; | 308 | - return (vlenb << emul) / esz; |
134 | +} | 309 | -} |
135 | + | 310 | - |
136 | static inline target_ulong adjust_addr(CPURISCVState *env, target_ulong addr) | 311 | static inline target_ulong adjust_addr(CPURISCVState *env, target_ulong addr) |
137 | { | 312 | { |
138 | return (addr & env->cur_pmmask) | env->cur_pmbase; | 313 | return (addr & ~env->cur_pmmask) | env->cur_pmbase; |
139 | @@ -XXX,XX +XXX,XX @@ static void probe_pages(CPURISCVState *env, target_ulong addr, | 314 | @@ -XXX,XX +XXX,XX @@ static void probe_pages(CPURISCVState *env, target_ulong addr, |
140 | } | 315 | } |
141 | } | 316 | } |
142 | 317 | ||
318 | -/* set agnostic elements to 1s */ | ||
319 | -static void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt, | ||
320 | - uint32_t tot) | ||
321 | -{ | ||
322 | - if (is_agnostic == 0) { | ||
323 | - /* policy undisturbed */ | ||
324 | - return; | ||
325 | - } | ||
326 | - if (tot - cnt == 0) { | ||
327 | - return; | ||
328 | - } | ||
329 | - memset(base + cnt, -1, tot - cnt); | ||
330 | -} | ||
331 | - | ||
332 | static inline void vext_set_elem_mask(void *v0, int index, | ||
333 | uint8_t value) | ||
334 | { | ||
335 | @@ -XXX,XX +XXX,XX @@ static inline void vext_set_elem_mask(void *v0, int index, | ||
336 | ((uint64_t *)v0)[idx] = deposit64(old, pos, 1, value); | ||
337 | } | ||
338 | |||
339 | -/* | ||
340 | - * Earlier designs (pre-0.9) had a varying number of bits | ||
341 | - * per mask value (MLEN). In the 0.9 design, MLEN=1. | ||
342 | - * (Section 4.5) | ||
343 | - */ | ||
344 | -static inline int vext_elem_mask(void *v0, int index) | ||
345 | -{ | ||
346 | - int idx = index / 64; | ||
347 | - int pos = index % 64; | ||
348 | - return (((uint64_t *)v0)[idx] >> pos) & 1; | ||
349 | -} | ||
350 | - | ||
351 | /* elements operations for load and store */ | ||
352 | typedef void vext_ldst_elem_fn(CPURISCVState *env, abi_ptr addr, | ||
353 | uint32_t idx, void *vd, uintptr_t retaddr); | ||
354 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b) | ||
355 | * Vector Integer Arithmetic Instructions | ||
356 | */ | ||
357 | |||
358 | -/* expand macro args before macro */ | ||
359 | -#define RVVCALL(macro, ...) macro(__VA_ARGS__) | ||
360 | - | ||
361 | /* (TD, T1, T2, TX1, TX2) */ | ||
362 | #define OP_SSS_B int8_t, int8_t, int8_t, int8_t, int8_t | ||
363 | #define OP_SSS_H int16_t, int16_t, int16_t, int16_t, int16_t | ||
364 | #define OP_SSS_W int32_t, int32_t, int32_t, int32_t, int32_t | ||
365 | #define OP_SSS_D int64_t, int64_t, int64_t, int64_t, int64_t | ||
366 | -#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t | ||
367 | -#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t | ||
368 | -#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t | ||
369 | -#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t | ||
370 | #define OP_SUS_B int8_t, uint8_t, int8_t, uint8_t, int8_t | ||
371 | #define OP_SUS_H int16_t, uint16_t, int16_t, uint16_t, int16_t | ||
372 | #define OP_SUS_W int32_t, uint32_t, int32_t, uint32_t, int32_t | ||
373 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b) | ||
374 | #define NOP_UUU_H uint16_t, uint16_t, uint32_t, uint16_t, uint32_t | ||
375 | #define NOP_UUU_W uint32_t, uint32_t, uint64_t, uint32_t, uint64_t | ||
376 | |||
377 | -/* operation of two vector elements */ | ||
378 | -typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i); | ||
379 | - | ||
380 | -#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP) \ | ||
381 | -static void do_##NAME(void *vd, void *vs1, void *vs2, int i) \ | ||
382 | -{ \ | ||
383 | - TX1 s1 = *((T1 *)vs1 + HS1(i)); \ | ||
384 | - TX2 s2 = *((T2 *)vs2 + HS2(i)); \ | ||
385 | - *((TD *)vd + HD(i)) = OP(s2, s1); \ | ||
386 | -} | ||
387 | #define DO_SUB(N, M) (N - M) | ||
388 | #define DO_RSUB(N, M) (M - N) | ||
389 | |||
390 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vsub_vv_h, OP_SSS_H, H2, H2, H2, DO_SUB) | ||
391 | RVVCALL(OPIVV2, vsub_vv_w, OP_SSS_W, H4, H4, H4, DO_SUB) | ||
392 | RVVCALL(OPIVV2, vsub_vv_d, OP_SSS_D, H8, H8, H8, DO_SUB) | ||
393 | |||
394 | -static void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2, | ||
395 | - CPURISCVState *env, uint32_t desc, | ||
396 | - opivv2_fn *fn, uint32_t esz) | ||
397 | -{ | ||
398 | - uint32_t vm = vext_vm(desc); | ||
399 | - uint32_t vl = env->vl; | ||
400 | - uint32_t total_elems = vext_get_total_elems(env, desc, esz); | ||
401 | - uint32_t vta = vext_vta(desc); | ||
402 | - uint32_t vma = vext_vma(desc); | ||
403 | - uint32_t i; | ||
404 | - | ||
405 | - for (i = env->vstart; i < vl; i++) { | ||
406 | - if (!vm && !vext_elem_mask(v0, i)) { | ||
407 | - /* set masked-off elements to 1s */ | ||
408 | - vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz); | ||
409 | - continue; | ||
410 | - } | ||
411 | - fn(vd, vs1, vs2, i); | ||
412 | - } | ||
413 | - env->vstart = 0; | ||
414 | - /* set tail elements to 1s */ | ||
415 | - vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); | ||
416 | -} | ||
417 | - | ||
418 | -/* generate the helpers for OPIVV */ | ||
419 | -#define GEN_VEXT_VV(NAME, ESZ) \ | ||
420 | -void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | ||
421 | - void *vs2, CPURISCVState *env, \ | ||
422 | - uint32_t desc) \ | ||
423 | -{ \ | ||
424 | - do_vext_vv(vd, v0, vs1, vs2, env, desc, \ | ||
425 | - do_##NAME, ESZ); \ | ||
426 | -} | ||
427 | - | ||
428 | GEN_VEXT_VV(vadd_vv_b, 1) | ||
429 | GEN_VEXT_VV(vadd_vv_h, 2) | ||
430 | GEN_VEXT_VV(vadd_vv_w, 4) | ||
431 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_VV(vsub_vv_h, 2) | ||
432 | GEN_VEXT_VV(vsub_vv_w, 4) | ||
433 | GEN_VEXT_VV(vsub_vv_d, 8) | ||
434 | |||
435 | -typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i); | ||
436 | - | ||
437 | -/* | ||
438 | - * (T1)s1 gives the real operator type. | ||
439 | - * (TX1)(T1)s1 expands the operator type of widen or narrow operations. | ||
440 | - */ | ||
441 | -#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP) \ | ||
442 | -static void do_##NAME(void *vd, target_long s1, void *vs2, int i) \ | ||
443 | -{ \ | ||
444 | - TX2 s2 = *((T2 *)vs2 + HS2(i)); \ | ||
445 | - *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1); \ | ||
446 | -} | ||
447 | |||
448 | RVVCALL(OPIVX2, vadd_vx_b, OP_SSS_B, H1, H1, DO_ADD) | ||
449 | RVVCALL(OPIVX2, vadd_vx_h, OP_SSS_H, H2, H2, DO_ADD) | ||
450 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vrsub_vx_h, OP_SSS_H, H2, H2, DO_RSUB) | ||
451 | RVVCALL(OPIVX2, vrsub_vx_w, OP_SSS_W, H4, H4, DO_RSUB) | ||
452 | RVVCALL(OPIVX2, vrsub_vx_d, OP_SSS_D, H8, H8, DO_RSUB) | ||
453 | |||
454 | -static void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2, | ||
455 | - CPURISCVState *env, uint32_t desc, | ||
456 | - opivx2_fn fn, uint32_t esz) | ||
457 | -{ | ||
458 | - uint32_t vm = vext_vm(desc); | ||
459 | - uint32_t vl = env->vl; | ||
460 | - uint32_t total_elems = vext_get_total_elems(env, desc, esz); | ||
461 | - uint32_t vta = vext_vta(desc); | ||
462 | - uint32_t vma = vext_vma(desc); | ||
463 | - uint32_t i; | ||
464 | - | ||
465 | - for (i = env->vstart; i < vl; i++) { | ||
466 | - if (!vm && !vext_elem_mask(v0, i)) { | ||
467 | - /* set masked-off elements to 1s */ | ||
468 | - vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz); | ||
469 | - continue; | ||
470 | - } | ||
471 | - fn(vd, s1, vs2, i); | ||
472 | - } | ||
473 | - env->vstart = 0; | ||
474 | - /* set tail elements to 1s */ | ||
475 | - vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); | ||
476 | -} | ||
477 | - | ||
478 | -/* generate the helpers for OPIVX */ | ||
479 | -#define GEN_VEXT_VX(NAME, ESZ) \ | ||
480 | -void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | ||
481 | - void *vs2, CPURISCVState *env, \ | ||
482 | - uint32_t desc) \ | ||
483 | -{ \ | ||
484 | - do_vext_vx(vd, v0, s1, vs2, env, desc, \ | ||
485 | - do_##NAME, ESZ); \ | ||
486 | -} | ||
487 | - | ||
488 | GEN_VEXT_VX(vadd_vx_b, 1) | ||
489 | GEN_VEXT_VX(vadd_vx_h, 2) | ||
490 | GEN_VEXT_VX(vadd_vx_w, 4) | ||
491 | diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c | ||
492 | new file mode 100644 | ||
493 | index XXXXXXX..XXXXXXX | ||
494 | --- /dev/null | ||
495 | +++ b/target/riscv/vector_internals.c | ||
496 | @@ -XXX,XX +XXX,XX @@ | ||
497 | +/* | ||
498 | + * RISC-V Vector Extension Internals | ||
499 | + * | ||
500 | + * Copyright (c) 2020 T-Head Semiconductor Co., Ltd. All rights reserved. | ||
501 | + * | ||
502 | + * This program is free software; you can redistribute it and/or modify it | ||
503 | + * under the terms and conditions of the GNU General Public License, | ||
504 | + * version 2 or later, as published by the Free Software Foundation. | ||
505 | + * | ||
506 | + * This program is distributed in the hope it will be useful, but WITHOUT | ||
507 | + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or | ||
508 | + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for | ||
509 | + * more details. | ||
510 | + * | ||
511 | + * You should have received a copy of the GNU General Public License along with | ||
512 | + * this program. If not, see <http://www.gnu.org/licenses/>. | ||
513 | + */ | ||
514 | + | ||
515 | +#include "vector_internals.h" | ||
516 | + | ||
143 | +/* set agnostic elements to 1s */ | 517 | +/* set agnostic elements to 1s */ |
144 | +static void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt, | 518 | +void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt, |
145 | + uint32_t tot) | 519 | + uint32_t tot) |
146 | +{ | 520 | +{ |
147 | + if (is_agnostic == 0) { | 521 | + if (is_agnostic == 0) { |
148 | + /* policy undisturbed */ | 522 | + /* policy undisturbed */ |
149 | + return; | 523 | + return; |
150 | + } | 524 | + } |
151 | + if (tot - cnt == 0) { | 525 | + if (tot - cnt == 0) { |
152 | + return ; | 526 | + return ; |
153 | + } | 527 | + } |
154 | + memset(base + cnt, -1, tot - cnt); | 528 | + memset(base + cnt, -1, tot - cnt); |
155 | +} | 529 | +} |
156 | + | 530 | + |
157 | static inline void vext_set_elem_mask(void *v0, int index, | 531 | +void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2, |
158 | uint8_t value) | 532 | + CPURISCVState *env, uint32_t desc, |
159 | { | 533 | + opivv2_fn *fn, uint32_t esz) |
160 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vsub_vv_d, OP_SSS_D, H8, H8, H8, DO_SUB) | 534 | +{ |
161 | 535 | + uint32_t vm = vext_vm(desc); | |
162 | static void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2, | 536 | + uint32_t vl = env->vl; |
163 | CPURISCVState *env, uint32_t desc, | ||
164 | - opivv2_fn *fn) | ||
165 | + opivv2_fn *fn, uint32_t esz) | ||
166 | { | ||
167 | uint32_t vm = vext_vm(desc); | ||
168 | uint32_t vl = env->vl; | ||
169 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); | 537 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); |
170 | + uint32_t vta = vext_vta(desc); | 538 | + uint32_t vta = vext_vta(desc); |
171 | uint32_t i; | 539 | + uint32_t vma = vext_vma(desc); |
172 | 540 | + uint32_t i; | |
173 | for (i = env->vstart; i < vl; i++) { | 541 | + |
174 | @@ -XXX,XX +XXX,XX @@ static void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2, | 542 | + for (i = env->vstart; i < vl; i++) { |
175 | fn(vd, vs1, vs2, i); | 543 | + if (!vm && !vext_elem_mask(v0, i)) { |
176 | } | 544 | + /* set masked-off elements to 1s */ |
177 | env->vstart = 0; | 545 | + vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz); |
546 | + continue; | ||
547 | + } | ||
548 | + fn(vd, vs1, vs2, i); | ||
549 | + } | ||
550 | + env->vstart = 0; | ||
178 | + /* set tail elements to 1s */ | 551 | + /* set tail elements to 1s */ |
179 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); | 552 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); |
180 | } | 553 | +} |
181 | 554 | + | |
182 | /* generate the helpers for OPIVV */ | 555 | +void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2, |
183 | -#define GEN_VEXT_VV(NAME) \ | 556 | + CPURISCVState *env, uint32_t desc, |
184 | +#define GEN_VEXT_VV(NAME, ESZ) \ | 557 | + opivx2_fn fn, uint32_t esz) |
185 | void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | 558 | +{ |
186 | void *vs2, CPURISCVState *env, \ | 559 | + uint32_t vm = vext_vm(desc); |
187 | uint32_t desc) \ | 560 | + uint32_t vl = env->vl; |
188 | { \ | 561 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); |
189 | do_vext_vv(vd, v0, vs1, vs2, env, desc, \ | 562 | + uint32_t vta = vext_vta(desc); |
190 | - do_##NAME); \ | 563 | + uint32_t vma = vext_vma(desc); |
191 | + do_##NAME, ESZ); \ | 564 | + uint32_t i; |
192 | } | 565 | + |
193 | 566 | + for (i = env->vstart; i < vl; i++) { | |
194 | -GEN_VEXT_VV(vadd_vv_b) | 567 | + if (!vm && !vext_elem_mask(v0, i)) { |
195 | -GEN_VEXT_VV(vadd_vv_h) | 568 | + /* set masked-off elements to 1s */ |
196 | -GEN_VEXT_VV(vadd_vv_w) | 569 | + vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz); |
197 | -GEN_VEXT_VV(vadd_vv_d) | 570 | + continue; |
198 | -GEN_VEXT_VV(vsub_vv_b) | 571 | + } |
199 | -GEN_VEXT_VV(vsub_vv_h) | 572 | + fn(vd, s1, vs2, i); |
200 | -GEN_VEXT_VV(vsub_vv_w) | 573 | + } |
201 | -GEN_VEXT_VV(vsub_vv_d) | 574 | + env->vstart = 0; |
202 | +GEN_VEXT_VV(vadd_vv_b, 1) | 575 | + /* set tail elements to 1s */ |
203 | +GEN_VEXT_VV(vadd_vv_h, 2) | 576 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); |
204 | +GEN_VEXT_VV(vadd_vv_w, 4) | 577 | +} |
205 | +GEN_VEXT_VV(vadd_vv_d, 8) | 578 | diff --git a/target/riscv/meson.build b/target/riscv/meson.build |
206 | +GEN_VEXT_VV(vsub_vv_b, 1) | ||
207 | +GEN_VEXT_VV(vsub_vv_h, 2) | ||
208 | +GEN_VEXT_VV(vsub_vv_w, 4) | ||
209 | +GEN_VEXT_VV(vsub_vv_d, 8) | ||
210 | |||
211 | typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i); | ||
212 | |||
213 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vwadd_wv_w, WOP_WSSS_W, H8, H4, H4, DO_ADD) | ||
214 | RVVCALL(OPIVV2, vwsub_wv_b, WOP_WSSS_B, H2, H1, H1, DO_SUB) | ||
215 | RVVCALL(OPIVV2, vwsub_wv_h, WOP_WSSS_H, H4, H2, H2, DO_SUB) | ||
216 | RVVCALL(OPIVV2, vwsub_wv_w, WOP_WSSS_W, H8, H4, H4, DO_SUB) | ||
217 | -GEN_VEXT_VV(vwaddu_vv_b) | ||
218 | -GEN_VEXT_VV(vwaddu_vv_h) | ||
219 | -GEN_VEXT_VV(vwaddu_vv_w) | ||
220 | -GEN_VEXT_VV(vwsubu_vv_b) | ||
221 | -GEN_VEXT_VV(vwsubu_vv_h) | ||
222 | -GEN_VEXT_VV(vwsubu_vv_w) | ||
223 | -GEN_VEXT_VV(vwadd_vv_b) | ||
224 | -GEN_VEXT_VV(vwadd_vv_h) | ||
225 | -GEN_VEXT_VV(vwadd_vv_w) | ||
226 | -GEN_VEXT_VV(vwsub_vv_b) | ||
227 | -GEN_VEXT_VV(vwsub_vv_h) | ||
228 | -GEN_VEXT_VV(vwsub_vv_w) | ||
229 | -GEN_VEXT_VV(vwaddu_wv_b) | ||
230 | -GEN_VEXT_VV(vwaddu_wv_h) | ||
231 | -GEN_VEXT_VV(vwaddu_wv_w) | ||
232 | -GEN_VEXT_VV(vwsubu_wv_b) | ||
233 | -GEN_VEXT_VV(vwsubu_wv_h) | ||
234 | -GEN_VEXT_VV(vwsubu_wv_w) | ||
235 | -GEN_VEXT_VV(vwadd_wv_b) | ||
236 | -GEN_VEXT_VV(vwadd_wv_h) | ||
237 | -GEN_VEXT_VV(vwadd_wv_w) | ||
238 | -GEN_VEXT_VV(vwsub_wv_b) | ||
239 | -GEN_VEXT_VV(vwsub_wv_h) | ||
240 | -GEN_VEXT_VV(vwsub_wv_w) | ||
241 | +GEN_VEXT_VV(vwaddu_vv_b, 2) | ||
242 | +GEN_VEXT_VV(vwaddu_vv_h, 4) | ||
243 | +GEN_VEXT_VV(vwaddu_vv_w, 8) | ||
244 | +GEN_VEXT_VV(vwsubu_vv_b, 2) | ||
245 | +GEN_VEXT_VV(vwsubu_vv_h, 4) | ||
246 | +GEN_VEXT_VV(vwsubu_vv_w, 8) | ||
247 | +GEN_VEXT_VV(vwadd_vv_b, 2) | ||
248 | +GEN_VEXT_VV(vwadd_vv_h, 4) | ||
249 | +GEN_VEXT_VV(vwadd_vv_w, 8) | ||
250 | +GEN_VEXT_VV(vwsub_vv_b, 2) | ||
251 | +GEN_VEXT_VV(vwsub_vv_h, 4) | ||
252 | +GEN_VEXT_VV(vwsub_vv_w, 8) | ||
253 | +GEN_VEXT_VV(vwaddu_wv_b, 2) | ||
254 | +GEN_VEXT_VV(vwaddu_wv_h, 4) | ||
255 | +GEN_VEXT_VV(vwaddu_wv_w, 8) | ||
256 | +GEN_VEXT_VV(vwsubu_wv_b, 2) | ||
257 | +GEN_VEXT_VV(vwsubu_wv_h, 4) | ||
258 | +GEN_VEXT_VV(vwsubu_wv_w, 8) | ||
259 | +GEN_VEXT_VV(vwadd_wv_b, 2) | ||
260 | +GEN_VEXT_VV(vwadd_wv_h, 4) | ||
261 | +GEN_VEXT_VV(vwadd_wv_w, 8) | ||
262 | +GEN_VEXT_VV(vwsub_wv_b, 2) | ||
263 | +GEN_VEXT_VV(vwsub_wv_h, 4) | ||
264 | +GEN_VEXT_VV(vwsub_wv_w, 8) | ||
265 | |||
266 | RVVCALL(OPIVX2, vwaddu_vx_b, WOP_UUU_B, H2, H1, DO_ADD) | ||
267 | RVVCALL(OPIVX2, vwaddu_vx_h, WOP_UUU_H, H4, H2, DO_ADD) | ||
268 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vxor_vv_b, OP_SSS_B, H1, H1, H1, DO_XOR) | ||
269 | RVVCALL(OPIVV2, vxor_vv_h, OP_SSS_H, H2, H2, H2, DO_XOR) | ||
270 | RVVCALL(OPIVV2, vxor_vv_w, OP_SSS_W, H4, H4, H4, DO_XOR) | ||
271 | RVVCALL(OPIVV2, vxor_vv_d, OP_SSS_D, H8, H8, H8, DO_XOR) | ||
272 | -GEN_VEXT_VV(vand_vv_b) | ||
273 | -GEN_VEXT_VV(vand_vv_h) | ||
274 | -GEN_VEXT_VV(vand_vv_w) | ||
275 | -GEN_VEXT_VV(vand_vv_d) | ||
276 | -GEN_VEXT_VV(vor_vv_b) | ||
277 | -GEN_VEXT_VV(vor_vv_h) | ||
278 | -GEN_VEXT_VV(vor_vv_w) | ||
279 | -GEN_VEXT_VV(vor_vv_d) | ||
280 | -GEN_VEXT_VV(vxor_vv_b) | ||
281 | -GEN_VEXT_VV(vxor_vv_h) | ||
282 | -GEN_VEXT_VV(vxor_vv_w) | ||
283 | -GEN_VEXT_VV(vxor_vv_d) | ||
284 | +GEN_VEXT_VV(vand_vv_b, 1) | ||
285 | +GEN_VEXT_VV(vand_vv_h, 2) | ||
286 | +GEN_VEXT_VV(vand_vv_w, 4) | ||
287 | +GEN_VEXT_VV(vand_vv_d, 8) | ||
288 | +GEN_VEXT_VV(vor_vv_b, 1) | ||
289 | +GEN_VEXT_VV(vor_vv_h, 2) | ||
290 | +GEN_VEXT_VV(vor_vv_w, 4) | ||
291 | +GEN_VEXT_VV(vor_vv_d, 8) | ||
292 | +GEN_VEXT_VV(vxor_vv_b, 1) | ||
293 | +GEN_VEXT_VV(vxor_vv_h, 2) | ||
294 | +GEN_VEXT_VV(vxor_vv_w, 4) | ||
295 | +GEN_VEXT_VV(vxor_vv_d, 8) | ||
296 | |||
297 | RVVCALL(OPIVX2, vand_vx_b, OP_SSS_B, H1, H1, DO_AND) | ||
298 | RVVCALL(OPIVX2, vand_vx_h, OP_SSS_H, H2, H2, DO_AND) | ||
299 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vmax_vv_b, OP_SSS_B, H1, H1, H1, DO_MAX) | ||
300 | RVVCALL(OPIVV2, vmax_vv_h, OP_SSS_H, H2, H2, H2, DO_MAX) | ||
301 | RVVCALL(OPIVV2, vmax_vv_w, OP_SSS_W, H4, H4, H4, DO_MAX) | ||
302 | RVVCALL(OPIVV2, vmax_vv_d, OP_SSS_D, H8, H8, H8, DO_MAX) | ||
303 | -GEN_VEXT_VV(vminu_vv_b) | ||
304 | -GEN_VEXT_VV(vminu_vv_h) | ||
305 | -GEN_VEXT_VV(vminu_vv_w) | ||
306 | -GEN_VEXT_VV(vminu_vv_d) | ||
307 | -GEN_VEXT_VV(vmin_vv_b) | ||
308 | -GEN_VEXT_VV(vmin_vv_h) | ||
309 | -GEN_VEXT_VV(vmin_vv_w) | ||
310 | -GEN_VEXT_VV(vmin_vv_d) | ||
311 | -GEN_VEXT_VV(vmaxu_vv_b) | ||
312 | -GEN_VEXT_VV(vmaxu_vv_h) | ||
313 | -GEN_VEXT_VV(vmaxu_vv_w) | ||
314 | -GEN_VEXT_VV(vmaxu_vv_d) | ||
315 | -GEN_VEXT_VV(vmax_vv_b) | ||
316 | -GEN_VEXT_VV(vmax_vv_h) | ||
317 | -GEN_VEXT_VV(vmax_vv_w) | ||
318 | -GEN_VEXT_VV(vmax_vv_d) | ||
319 | +GEN_VEXT_VV(vminu_vv_b, 1) | ||
320 | +GEN_VEXT_VV(vminu_vv_h, 2) | ||
321 | +GEN_VEXT_VV(vminu_vv_w, 4) | ||
322 | +GEN_VEXT_VV(vminu_vv_d, 8) | ||
323 | +GEN_VEXT_VV(vmin_vv_b, 1) | ||
324 | +GEN_VEXT_VV(vmin_vv_h, 2) | ||
325 | +GEN_VEXT_VV(vmin_vv_w, 4) | ||
326 | +GEN_VEXT_VV(vmin_vv_d, 8) | ||
327 | +GEN_VEXT_VV(vmaxu_vv_b, 1) | ||
328 | +GEN_VEXT_VV(vmaxu_vv_h, 2) | ||
329 | +GEN_VEXT_VV(vmaxu_vv_w, 4) | ||
330 | +GEN_VEXT_VV(vmaxu_vv_d, 8) | ||
331 | +GEN_VEXT_VV(vmax_vv_b, 1) | ||
332 | +GEN_VEXT_VV(vmax_vv_h, 2) | ||
333 | +GEN_VEXT_VV(vmax_vv_w, 4) | ||
334 | +GEN_VEXT_VV(vmax_vv_d, 8) | ||
335 | |||
336 | RVVCALL(OPIVX2, vminu_vx_b, OP_UUU_B, H1, H1, DO_MIN) | ||
337 | RVVCALL(OPIVX2, vminu_vx_h, OP_UUU_H, H2, H2, DO_MIN) | ||
338 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vmul_vv_b, OP_SSS_B, H1, H1, H1, DO_MUL) | ||
339 | RVVCALL(OPIVV2, vmul_vv_h, OP_SSS_H, H2, H2, H2, DO_MUL) | ||
340 | RVVCALL(OPIVV2, vmul_vv_w, OP_SSS_W, H4, H4, H4, DO_MUL) | ||
341 | RVVCALL(OPIVV2, vmul_vv_d, OP_SSS_D, H8, H8, H8, DO_MUL) | ||
342 | -GEN_VEXT_VV(vmul_vv_b) | ||
343 | -GEN_VEXT_VV(vmul_vv_h) | ||
344 | -GEN_VEXT_VV(vmul_vv_w) | ||
345 | -GEN_VEXT_VV(vmul_vv_d) | ||
346 | +GEN_VEXT_VV(vmul_vv_b, 1) | ||
347 | +GEN_VEXT_VV(vmul_vv_h, 2) | ||
348 | +GEN_VEXT_VV(vmul_vv_w, 4) | ||
349 | +GEN_VEXT_VV(vmul_vv_d, 8) | ||
350 | |||
351 | static int8_t do_mulh_b(int8_t s2, int8_t s1) | ||
352 | { | ||
353 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vmulhsu_vv_b, OP_SUS_B, H1, H1, H1, do_mulhsu_b) | ||
354 | RVVCALL(OPIVV2, vmulhsu_vv_h, OP_SUS_H, H2, H2, H2, do_mulhsu_h) | ||
355 | RVVCALL(OPIVV2, vmulhsu_vv_w, OP_SUS_W, H4, H4, H4, do_mulhsu_w) | ||
356 | RVVCALL(OPIVV2, vmulhsu_vv_d, OP_SUS_D, H8, H8, H8, do_mulhsu_d) | ||
357 | -GEN_VEXT_VV(vmulh_vv_b) | ||
358 | -GEN_VEXT_VV(vmulh_vv_h) | ||
359 | -GEN_VEXT_VV(vmulh_vv_w) | ||
360 | -GEN_VEXT_VV(vmulh_vv_d) | ||
361 | -GEN_VEXT_VV(vmulhu_vv_b) | ||
362 | -GEN_VEXT_VV(vmulhu_vv_h) | ||
363 | -GEN_VEXT_VV(vmulhu_vv_w) | ||
364 | -GEN_VEXT_VV(vmulhu_vv_d) | ||
365 | -GEN_VEXT_VV(vmulhsu_vv_b) | ||
366 | -GEN_VEXT_VV(vmulhsu_vv_h) | ||
367 | -GEN_VEXT_VV(vmulhsu_vv_w) | ||
368 | -GEN_VEXT_VV(vmulhsu_vv_d) | ||
369 | +GEN_VEXT_VV(vmulh_vv_b, 1) | ||
370 | +GEN_VEXT_VV(vmulh_vv_h, 2) | ||
371 | +GEN_VEXT_VV(vmulh_vv_w, 4) | ||
372 | +GEN_VEXT_VV(vmulh_vv_d, 8) | ||
373 | +GEN_VEXT_VV(vmulhu_vv_b, 1) | ||
374 | +GEN_VEXT_VV(vmulhu_vv_h, 2) | ||
375 | +GEN_VEXT_VV(vmulhu_vv_w, 4) | ||
376 | +GEN_VEXT_VV(vmulhu_vv_d, 8) | ||
377 | +GEN_VEXT_VV(vmulhsu_vv_b, 1) | ||
378 | +GEN_VEXT_VV(vmulhsu_vv_h, 2) | ||
379 | +GEN_VEXT_VV(vmulhsu_vv_w, 4) | ||
380 | +GEN_VEXT_VV(vmulhsu_vv_d, 8) | ||
381 | |||
382 | RVVCALL(OPIVX2, vmul_vx_b, OP_SSS_B, H1, H1, DO_MUL) | ||
383 | RVVCALL(OPIVX2, vmul_vx_h, OP_SSS_H, H2, H2, DO_MUL) | ||
384 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vrem_vv_b, OP_SSS_B, H1, H1, H1, DO_REM) | ||
385 | RVVCALL(OPIVV2, vrem_vv_h, OP_SSS_H, H2, H2, H2, DO_REM) | ||
386 | RVVCALL(OPIVV2, vrem_vv_w, OP_SSS_W, H4, H4, H4, DO_REM) | ||
387 | RVVCALL(OPIVV2, vrem_vv_d, OP_SSS_D, H8, H8, H8, DO_REM) | ||
388 | -GEN_VEXT_VV(vdivu_vv_b) | ||
389 | -GEN_VEXT_VV(vdivu_vv_h) | ||
390 | -GEN_VEXT_VV(vdivu_vv_w) | ||
391 | -GEN_VEXT_VV(vdivu_vv_d) | ||
392 | -GEN_VEXT_VV(vdiv_vv_b) | ||
393 | -GEN_VEXT_VV(vdiv_vv_h) | ||
394 | -GEN_VEXT_VV(vdiv_vv_w) | ||
395 | -GEN_VEXT_VV(vdiv_vv_d) | ||
396 | -GEN_VEXT_VV(vremu_vv_b) | ||
397 | -GEN_VEXT_VV(vremu_vv_h) | ||
398 | -GEN_VEXT_VV(vremu_vv_w) | ||
399 | -GEN_VEXT_VV(vremu_vv_d) | ||
400 | -GEN_VEXT_VV(vrem_vv_b) | ||
401 | -GEN_VEXT_VV(vrem_vv_h) | ||
402 | -GEN_VEXT_VV(vrem_vv_w) | ||
403 | -GEN_VEXT_VV(vrem_vv_d) | ||
404 | +GEN_VEXT_VV(vdivu_vv_b, 1) | ||
405 | +GEN_VEXT_VV(vdivu_vv_h, 2) | ||
406 | +GEN_VEXT_VV(vdivu_vv_w, 4) | ||
407 | +GEN_VEXT_VV(vdivu_vv_d, 8) | ||
408 | +GEN_VEXT_VV(vdiv_vv_b, 1) | ||
409 | +GEN_VEXT_VV(vdiv_vv_h, 2) | ||
410 | +GEN_VEXT_VV(vdiv_vv_w, 4) | ||
411 | +GEN_VEXT_VV(vdiv_vv_d, 8) | ||
412 | +GEN_VEXT_VV(vremu_vv_b, 1) | ||
413 | +GEN_VEXT_VV(vremu_vv_h, 2) | ||
414 | +GEN_VEXT_VV(vremu_vv_w, 4) | ||
415 | +GEN_VEXT_VV(vremu_vv_d, 8) | ||
416 | +GEN_VEXT_VV(vrem_vv_b, 1) | ||
417 | +GEN_VEXT_VV(vrem_vv_h, 2) | ||
418 | +GEN_VEXT_VV(vrem_vv_w, 4) | ||
419 | +GEN_VEXT_VV(vrem_vv_d, 8) | ||
420 | |||
421 | RVVCALL(OPIVX2, vdivu_vx_b, OP_UUU_B, H1, H1, DO_DIVU) | ||
422 | RVVCALL(OPIVX2, vdivu_vx_h, OP_UUU_H, H2, H2, DO_DIVU) | ||
423 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vwmulu_vv_w, WOP_UUU_W, H8, H4, H4, DO_MUL) | ||
424 | RVVCALL(OPIVV2, vwmulsu_vv_b, WOP_SUS_B, H2, H1, H1, DO_MUL) | ||
425 | RVVCALL(OPIVV2, vwmulsu_vv_h, WOP_SUS_H, H4, H2, H2, DO_MUL) | ||
426 | RVVCALL(OPIVV2, vwmulsu_vv_w, WOP_SUS_W, H8, H4, H4, DO_MUL) | ||
427 | -GEN_VEXT_VV(vwmul_vv_b) | ||
428 | -GEN_VEXT_VV(vwmul_vv_h) | ||
429 | -GEN_VEXT_VV(vwmul_vv_w) | ||
430 | -GEN_VEXT_VV(vwmulu_vv_b) | ||
431 | -GEN_VEXT_VV(vwmulu_vv_h) | ||
432 | -GEN_VEXT_VV(vwmulu_vv_w) | ||
433 | -GEN_VEXT_VV(vwmulsu_vv_b) | ||
434 | -GEN_VEXT_VV(vwmulsu_vv_h) | ||
435 | -GEN_VEXT_VV(vwmulsu_vv_w) | ||
436 | +GEN_VEXT_VV(vwmul_vv_b, 2) | ||
437 | +GEN_VEXT_VV(vwmul_vv_h, 4) | ||
438 | +GEN_VEXT_VV(vwmul_vv_w, 8) | ||
439 | +GEN_VEXT_VV(vwmulu_vv_b, 2) | ||
440 | +GEN_VEXT_VV(vwmulu_vv_h, 4) | ||
441 | +GEN_VEXT_VV(vwmulu_vv_w, 8) | ||
442 | +GEN_VEXT_VV(vwmulsu_vv_b, 2) | ||
443 | +GEN_VEXT_VV(vwmulsu_vv_h, 4) | ||
444 | +GEN_VEXT_VV(vwmulsu_vv_w, 8) | ||
445 | |||
446 | RVVCALL(OPIVX2, vwmul_vx_b, WOP_SSS_B, H2, H1, DO_MUL) | ||
447 | RVVCALL(OPIVX2, vwmul_vx_h, WOP_SSS_H, H4, H2, DO_MUL) | ||
448 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV3, vnmsub_vv_b, OP_SSS_B, H1, H1, H1, DO_NMSUB) | ||
449 | RVVCALL(OPIVV3, vnmsub_vv_h, OP_SSS_H, H2, H2, H2, DO_NMSUB) | ||
450 | RVVCALL(OPIVV3, vnmsub_vv_w, OP_SSS_W, H4, H4, H4, DO_NMSUB) | ||
451 | RVVCALL(OPIVV3, vnmsub_vv_d, OP_SSS_D, H8, H8, H8, DO_NMSUB) | ||
452 | -GEN_VEXT_VV(vmacc_vv_b) | ||
453 | -GEN_VEXT_VV(vmacc_vv_h) | ||
454 | -GEN_VEXT_VV(vmacc_vv_w) | ||
455 | -GEN_VEXT_VV(vmacc_vv_d) | ||
456 | -GEN_VEXT_VV(vnmsac_vv_b) | ||
457 | -GEN_VEXT_VV(vnmsac_vv_h) | ||
458 | -GEN_VEXT_VV(vnmsac_vv_w) | ||
459 | -GEN_VEXT_VV(vnmsac_vv_d) | ||
460 | -GEN_VEXT_VV(vmadd_vv_b) | ||
461 | -GEN_VEXT_VV(vmadd_vv_h) | ||
462 | -GEN_VEXT_VV(vmadd_vv_w) | ||
463 | -GEN_VEXT_VV(vmadd_vv_d) | ||
464 | -GEN_VEXT_VV(vnmsub_vv_b) | ||
465 | -GEN_VEXT_VV(vnmsub_vv_h) | ||
466 | -GEN_VEXT_VV(vnmsub_vv_w) | ||
467 | -GEN_VEXT_VV(vnmsub_vv_d) | ||
468 | +GEN_VEXT_VV(vmacc_vv_b, 1) | ||
469 | +GEN_VEXT_VV(vmacc_vv_h, 2) | ||
470 | +GEN_VEXT_VV(vmacc_vv_w, 4) | ||
471 | +GEN_VEXT_VV(vmacc_vv_d, 8) | ||
472 | +GEN_VEXT_VV(vnmsac_vv_b, 1) | ||
473 | +GEN_VEXT_VV(vnmsac_vv_h, 2) | ||
474 | +GEN_VEXT_VV(vnmsac_vv_w, 4) | ||
475 | +GEN_VEXT_VV(vnmsac_vv_d, 8) | ||
476 | +GEN_VEXT_VV(vmadd_vv_b, 1) | ||
477 | +GEN_VEXT_VV(vmadd_vv_h, 2) | ||
478 | +GEN_VEXT_VV(vmadd_vv_w, 4) | ||
479 | +GEN_VEXT_VV(vmadd_vv_d, 8) | ||
480 | +GEN_VEXT_VV(vnmsub_vv_b, 1) | ||
481 | +GEN_VEXT_VV(vnmsub_vv_h, 2) | ||
482 | +GEN_VEXT_VV(vnmsub_vv_w, 4) | ||
483 | +GEN_VEXT_VV(vnmsub_vv_d, 8) | ||
484 | |||
485 | #define OPIVX3(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP) \ | ||
486 | static void do_##NAME(void *vd, target_long s1, void *vs2, int i) \ | ||
487 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV3, vwmacc_vv_w, WOP_SSS_W, H8, H4, H4, DO_MACC) | ||
488 | RVVCALL(OPIVV3, vwmaccsu_vv_b, WOP_SSU_B, H2, H1, H1, DO_MACC) | ||
489 | RVVCALL(OPIVV3, vwmaccsu_vv_h, WOP_SSU_H, H4, H2, H2, DO_MACC) | ||
490 | RVVCALL(OPIVV3, vwmaccsu_vv_w, WOP_SSU_W, H8, H4, H4, DO_MACC) | ||
491 | -GEN_VEXT_VV(vwmaccu_vv_b) | ||
492 | -GEN_VEXT_VV(vwmaccu_vv_h) | ||
493 | -GEN_VEXT_VV(vwmaccu_vv_w) | ||
494 | -GEN_VEXT_VV(vwmacc_vv_b) | ||
495 | -GEN_VEXT_VV(vwmacc_vv_h) | ||
496 | -GEN_VEXT_VV(vwmacc_vv_w) | ||
497 | -GEN_VEXT_VV(vwmaccsu_vv_b) | ||
498 | -GEN_VEXT_VV(vwmaccsu_vv_h) | ||
499 | -GEN_VEXT_VV(vwmaccsu_vv_w) | ||
500 | +GEN_VEXT_VV(vwmaccu_vv_b, 2) | ||
501 | +GEN_VEXT_VV(vwmaccu_vv_h, 4) | ||
502 | +GEN_VEXT_VV(vwmaccu_vv_w, 8) | ||
503 | +GEN_VEXT_VV(vwmacc_vv_b, 2) | ||
504 | +GEN_VEXT_VV(vwmacc_vv_h, 4) | ||
505 | +GEN_VEXT_VV(vwmacc_vv_w, 8) | ||
506 | +GEN_VEXT_VV(vwmaccsu_vv_b, 2) | ||
507 | +GEN_VEXT_VV(vwmaccsu_vv_h, 4) | ||
508 | +GEN_VEXT_VV(vwmaccsu_vv_w, 8) | ||
509 | |||
510 | RVVCALL(OPIVX3, vwmaccu_vx_b, WOP_UUU_B, H2, H1, DO_MACC) | ||
511 | RVVCALL(OPIVX3, vwmaccu_vx_h, WOP_UUU_H, H4, H2, DO_MACC) | ||
512 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc | ||
513 | index XXXXXXX..XXXXXXX 100644 | 579 | index XXXXXXX..XXXXXXX 100644 |
514 | --- a/target/riscv/insn_trans/trans_rvv.c.inc | 580 | --- a/target/riscv/meson.build |
515 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc | 581 | +++ b/target/riscv/meson.build |
516 | @@ -XXX,XX +XXX,XX @@ do_opivv_gvec(DisasContext *s, arg_rmrr *a, GVecGen3Fn *gvec_fn, | 582 | @@ -XXX,XX +XXX,XX @@ riscv_ss.add(files( |
517 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 583 | 'gdbstub.c', |
518 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 584 | 'op_helper.c', |
519 | 585 | 'vector_helper.c', | |
520 | - if (a->vm && s->vl_eq_vlmax) { | 586 | + 'vector_internals.c', |
521 | + if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | 587 | 'bitmanip_helper.c', |
522 | gvec_fn(s->sew, vreg_ofs(s, a->rd), | 588 | 'translate.c', |
523 | vreg_ofs(s, a->rs2), vreg_ofs(s, a->rs1), | 589 | 'm128_helper.c', |
524 | MAXSZ(s), MAXSZ(s)); | ||
525 | @@ -XXX,XX +XXX,XX @@ do_opivv_gvec(DisasContext *s, arg_rmrr *a, GVecGen3Fn *gvec_fn, | ||
526 | |||
527 | data = FIELD_DP32(data, VDATA, VM, a->vm); | ||
528 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | ||
529 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
530 | tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), | ||
531 | vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2), | ||
532 | cpu_env, s->cfg_ptr->vlen / 8, | ||
533 | -- | 590 | -- |
534 | 2.36.1 | 591 | 2.41.0 | diff view generated by jsdifflib |
1 | From: Alistair Francis <alistair.francis@wdc.com> | 1 | From: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk> |
---|---|---|---|
2 | 2 | ||
3 | When running a 32-bit guest, with a e64 vmv.v.x and vl_eq_vlmax set to | 3 | Refactor the non SEW-specific stuff out of `GEN_OPIVV_TRANS` into |
4 | true the `tcg_debug_assert(vece <= MO_32)` will be triggered inside | 4 | function `opivv_trans` (similar to `opivi_trans`). `opivv_trans` will be |
5 | tcg_gen_gvec_dup_i32(). | 5 | used in proceeding vector-crypto commits. |
6 | 6 | ||
7 | This patch checks that condition and instead uses tcg_gen_gvec_dup_i64() | 7 | Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk> |
8 | is required. | ||
9 | |||
10 | Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1028 | ||
11 | Suggested-by: Robert Bu <robert.bu@gmail.com> | ||
12 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
13 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | 8 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> |
14 | Message-Id: <20220608234701.369536-1-alistair.francis@opensource.wdc.com> | 9 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> |
10 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | ||
11 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
12 | Message-ID: <20230711165917.2629866-3-max.chou@sifive.com> | ||
15 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 13 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
16 | --- | 14 | --- |
17 | target/riscv/insn_trans/trans_rvv.c.inc | 12 ++++++++++-- | 15 | target/riscv/insn_trans/trans_rvv.c.inc | 62 +++++++++++++------------ |
18 | 1 file changed, 10 insertions(+), 2 deletions(-) | 16 | 1 file changed, 32 insertions(+), 30 deletions(-) |
19 | 17 | ||
20 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc | 18 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc |
21 | index XXXXXXX..XXXXXXX 100644 | 19 | index XXXXXXX..XXXXXXX 100644 |
22 | --- a/target/riscv/insn_trans/trans_rvv.c.inc | 20 | --- a/target/riscv/insn_trans/trans_rvv.c.inc |
23 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc | 21 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc |
24 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_v_x(DisasContext *s, arg_vmv_v_x *a) | 22 | @@ -XXX,XX +XXX,XX @@ GEN_OPIWX_WIDEN_TRANS(vwadd_wx) |
25 | s1 = get_gpr(s, a->rs1, EXT_SIGN); | 23 | GEN_OPIWX_WIDEN_TRANS(vwsubu_wx) |
26 | 24 | GEN_OPIWX_WIDEN_TRANS(vwsub_wx) | |
27 | if (s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | 25 | |
28 | - tcg_gen_gvec_dup_tl(s->sew, vreg_ofs(s, a->rd), | 26 | +static bool opivv_trans(uint32_t vd, uint32_t vs1, uint32_t vs2, uint32_t vm, |
29 | - MAXSZ(s), MAXSZ(s), s1); | 27 | + gen_helper_gvec_4_ptr *fn, DisasContext *s) |
30 | + if (get_xl(s) == MXL_RV32 && s->sew == MO_64) { | 28 | +{ |
31 | + TCGv_i64 s1_i64 = tcg_temp_new_i64(); | 29 | + uint32_t data = 0; |
32 | + tcg_gen_ext_tl_i64(s1_i64, s1); | 30 | + TCGLabel *over = gen_new_label(); |
33 | + tcg_gen_gvec_dup_i64(s->sew, vreg_ofs(s, a->rd), | 31 | + tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
34 | + MAXSZ(s), MAXSZ(s), s1_i64); | 32 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); |
35 | + tcg_temp_free_i64(s1_i64); | 33 | + |
36 | + } else { | 34 | + data = FIELD_DP32(data, VDATA, VM, vm); |
37 | + tcg_gen_gvec_dup_tl(s->sew, vreg_ofs(s, a->rd), | 35 | + data = FIELD_DP32(data, VDATA, LMUL, s->lmul); |
38 | + MAXSZ(s), MAXSZ(s), s1); | 36 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); |
39 | + } | 37 | + data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); |
40 | } else { | 38 | + data = FIELD_DP32(data, VDATA, VMA, s->vma); |
41 | TCGv_i32 desc; | 39 | + tcg_gen_gvec_4_ptr(vreg_ofs(s, vd), vreg_ofs(s, 0), vreg_ofs(s, vs1), |
42 | TCGv_i64 s1_i64 = tcg_temp_new_i64(); | 40 | + vreg_ofs(s, vs2), cpu_env, s->cfg_ptr->vlen / 8, |
41 | + s->cfg_ptr->vlen / 8, data, fn); | ||
42 | + mark_vs_dirty(s); | ||
43 | + gen_set_label(over); | ||
44 | + return true; | ||
45 | +} | ||
46 | + | ||
47 | /* Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions */ | ||
48 | /* OPIVV without GVEC IR */ | ||
49 | -#define GEN_OPIVV_TRANS(NAME, CHECK) \ | ||
50 | -static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
51 | -{ \ | ||
52 | - if (CHECK(s, a)) { \ | ||
53 | - uint32_t data = 0; \ | ||
54 | - static gen_helper_gvec_4_ptr * const fns[4] = { \ | ||
55 | - gen_helper_##NAME##_b, gen_helper_##NAME##_h, \ | ||
56 | - gen_helper_##NAME##_w, gen_helper_##NAME##_d, \ | ||
57 | - }; \ | ||
58 | - TCGLabel *over = gen_new_label(); \ | ||
59 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | ||
60 | - tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
61 | - \ | ||
62 | - data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
63 | - data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
64 | - data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
65 | - data = \ | ||
66 | - FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s);\ | ||
67 | - data = FIELD_DP32(data, VDATA, VMA, s->vma); \ | ||
68 | - tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
69 | - vreg_ofs(s, a->rs1), \ | ||
70 | - vreg_ofs(s, a->rs2), cpu_env, \ | ||
71 | - s->cfg_ptr->vlen / 8, \ | ||
72 | - s->cfg_ptr->vlen / 8, data, \ | ||
73 | - fns[s->sew]); \ | ||
74 | - mark_vs_dirty(s); \ | ||
75 | - gen_set_label(over); \ | ||
76 | - return true; \ | ||
77 | - } \ | ||
78 | - return false; \ | ||
79 | +#define GEN_OPIVV_TRANS(NAME, CHECK) \ | ||
80 | +static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
81 | +{ \ | ||
82 | + if (CHECK(s, a)) { \ | ||
83 | + static gen_helper_gvec_4_ptr * const fns[4] = { \ | ||
84 | + gen_helper_##NAME##_b, gen_helper_##NAME##_h, \ | ||
85 | + gen_helper_##NAME##_w, gen_helper_##NAME##_d, \ | ||
86 | + }; \ | ||
87 | + return opivv_trans(a->rd, a->rs1, a->rs2, a->vm, fns[s->sew], s);\ | ||
88 | + } \ | ||
89 | + return false; \ | ||
90 | } | ||
91 | |||
92 | /* | ||
43 | -- | 93 | -- |
44 | 2.36.1 | 94 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Nazar Kazakov <nazar.kazakov@codethink.co.uk> |
---|---|---|---|
2 | 2 | ||
3 | According to v-spec (section 5.4): | 3 | Remove the redundant "vl == 0" check which is already included within the vstart >= vl check, when vl == 0. |
4 | When vstart ≥ vl, there are no body elements, and no elements are | ||
5 | updated in any destination vector register group, including that | ||
6 | no tail elements are updated with agnostic values. | ||
7 | 4 | ||
8 | vmsbf.m, vmsif.m, vmsof.m, viota.m, vcompress instructions themselves | 5 | Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk> |
9 | require vstart to be zero. So they don't need the early exit. | ||
10 | |||
11 | Signed-off-by: eop Chen <eop.chen@sifive.com> | ||
12 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | ||
13 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 6 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> |
7 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
14 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | 8 | Acked-by: Alistair Francis <alistair.francis@wdc.com> |
15 | Message-Id: <165449614532.19704.7000832880482980398-4@git.sr.ht> | 9 | Message-ID: <20230711165917.2629866-4-max.chou@sifive.com> |
16 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 10 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
17 | --- | 11 | --- |
18 | target/riscv/insn_trans/trans_rvv.c.inc | 27 +++++++++++++++++++++++++ | 12 | target/riscv/insn_trans/trans_rvv.c.inc | 31 +------------------------ |
19 | 1 file changed, 27 insertions(+) | 13 | 1 file changed, 1 insertion(+), 30 deletions(-) |
20 | 14 | ||
21 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc | 15 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc |
22 | index XXXXXXX..XXXXXXX 100644 | 16 | index XXXXXXX..XXXXXXX 100644 |
23 | --- a/target/riscv/insn_trans/trans_rvv.c.inc | 17 | --- a/target/riscv/insn_trans/trans_rvv.c.inc |
24 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc | 18 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc |
25 | @@ -XXX,XX +XXX,XX @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data, | 19 | @@ -XXX,XX +XXX,XX @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data, |
26 | 20 | TCGv_i32 desc; | |
27 | TCGLabel *over = gen_new_label(); | 21 | |
28 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 22 | TCGLabel *over = gen_new_label(); |
29 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 23 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
30 | 24 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | |
31 | dest = tcg_temp_new_ptr(); | 25 | |
32 | mask = tcg_temp_new_ptr(); | 26 | dest = tcg_temp_new_ptr(); |
33 | @@ -XXX,XX +XXX,XX @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2, | 27 | @@ -XXX,XX +XXX,XX @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2, |
34 | 28 | TCGv_i32 desc; | |
35 | TCGLabel *over = gen_new_label(); | 29 | |
36 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 30 | TCGLabel *over = gen_new_label(); |
37 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 31 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
38 | 32 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | |
39 | dest = tcg_temp_new_ptr(); | 33 | |
40 | mask = tcg_temp_new_ptr(); | 34 | dest = tcg_temp_new_ptr(); |
41 | @@ -XXX,XX +XXX,XX @@ static bool ldst_index_trans(uint32_t vd, uint32_t rs1, uint32_t vs2, | 35 | @@ -XXX,XX +XXX,XX @@ static bool ldst_index_trans(uint32_t vd, uint32_t rs1, uint32_t vs2, |
42 | 36 | TCGv_i32 desc; | |
43 | TCGLabel *over = gen_new_label(); | 37 | |
44 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 38 | TCGLabel *over = gen_new_label(); |
45 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 39 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
46 | 40 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | |
47 | dest = tcg_temp_new_ptr(); | 41 | |
48 | mask = tcg_temp_new_ptr(); | 42 | dest = tcg_temp_new_ptr(); |
49 | @@ -XXX,XX +XXX,XX @@ static bool ldff_trans(uint32_t vd, uint32_t rs1, uint32_t data, | 43 | @@ -XXX,XX +XXX,XX @@ static bool ldff_trans(uint32_t vd, uint32_t rs1, uint32_t data, |
50 | 44 | TCGv_i32 desc; | |
51 | TCGLabel *over = gen_new_label(); | 45 | |
52 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 46 | TCGLabel *over = gen_new_label(); |
53 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 47 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
54 | 48 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | |
55 | dest = tcg_temp_new_ptr(); | 49 | |
56 | mask = tcg_temp_new_ptr(); | 50 | dest = tcg_temp_new_ptr(); |
57 | @@ -XXX,XX +XXX,XX @@ do_opivv_gvec(DisasContext *s, arg_rmrr *a, GVecGen3Fn *gvec_fn, | 51 | @@ -XXX,XX +XXX,XX @@ do_opivv_gvec(DisasContext *s, arg_rmrr *a, GVecGen3Fn *gvec_fn, |
52 | return false; | ||
58 | } | 53 | } |
59 | 54 | ||
60 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 55 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
61 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 56 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); |
62 | 57 | ||
63 | if (a->vm && s->vl_eq_vlmax) { | 58 | if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { |
64 | gvec_fn(s->sew, vreg_ofs(s, a->rd), | ||
65 | @@ -XXX,XX +XXX,XX @@ static bool opivx_trans(uint32_t vd, uint32_t rs1, uint32_t vs2, uint32_t vm, | 59 | @@ -XXX,XX +XXX,XX @@ static bool opivx_trans(uint32_t vd, uint32_t rs1, uint32_t vs2, uint32_t vm, |
66 | 60 | uint32_t data = 0; | |
67 | TCGLabel *over = gen_new_label(); | 61 | |
68 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 62 | TCGLabel *over = gen_new_label(); |
69 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 63 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
70 | 64 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | |
71 | dest = tcg_temp_new_ptr(); | 65 | |
72 | mask = tcg_temp_new_ptr(); | 66 | dest = tcg_temp_new_ptr(); |
73 | @@ -XXX,XX +XXX,XX @@ static bool opivi_trans(uint32_t vd, uint32_t imm, uint32_t vs2, uint32_t vm, | 67 | @@ -XXX,XX +XXX,XX @@ static bool opivi_trans(uint32_t vd, uint32_t imm, uint32_t vs2, uint32_t vm, |
74 | 68 | uint32_t data = 0; | |
75 | TCGLabel *over = gen_new_label(); | 69 | |
76 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 70 | TCGLabel *over = gen_new_label(); |
77 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 71 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
78 | 72 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | |
79 | dest = tcg_temp_new_ptr(); | 73 | |
80 | mask = tcg_temp_new_ptr(); | 74 | dest = tcg_temp_new_ptr(); |
81 | @@ -XXX,XX +XXX,XX @@ static bool do_opivv_widen(DisasContext *s, arg_rmrr *a, | 75 | @@ -XXX,XX +XXX,XX @@ static bool do_opivv_widen(DisasContext *s, arg_rmrr *a, |
82 | uint32_t data = 0; | 76 | if (checkfn(s, a)) { |
83 | TCGLabel *over = gen_new_label(); | 77 | uint32_t data = 0; |
84 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 78 | TCGLabel *over = gen_new_label(); |
85 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 79 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
86 | 80 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | |
87 | data = FIELD_DP32(data, VDATA, VM, a->vm); | 81 | |
88 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | 82 | data = FIELD_DP32(data, VDATA, VM, a->vm); |
89 | @@ -XXX,XX +XXX,XX @@ static bool do_opiwv_widen(DisasContext *s, arg_rmrr *a, | 83 | @@ -XXX,XX +XXX,XX @@ static bool do_opiwv_widen(DisasContext *s, arg_rmrr *a, |
90 | uint32_t data = 0; | 84 | if (opiwv_widen_check(s, a)) { |
91 | TCGLabel *over = gen_new_label(); | 85 | uint32_t data = 0; |
92 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 86 | TCGLabel *over = gen_new_label(); |
93 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 87 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
94 | 88 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | |
95 | data = FIELD_DP32(data, VDATA, VM, a->vm); | 89 | |
96 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | 90 | data = FIELD_DP32(data, VDATA, VM, a->vm); |
91 | @@ -XXX,XX +XXX,XX @@ static bool opivv_trans(uint32_t vd, uint32_t vs1, uint32_t vs2, uint32_t vm, | ||
92 | { | ||
93 | uint32_t data = 0; | ||
94 | TCGLabel *over = gen_new_label(); | ||
95 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | ||
96 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | ||
97 | |||
98 | data = FIELD_DP32(data, VDATA, VM, vm); | ||
97 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | 99 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ |
98 | }; \ | 100 | gen_helper_##NAME##_w, \ |
99 | TCGLabel *over = gen_new_label(); \ | 101 | }; \ |
100 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | 102 | TCGLabel *over = gen_new_label(); \ |
101 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | 103 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ |
102 | \ | 104 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ |
103 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | 105 | \ |
104 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | 106 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ |
105 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
106 | }; \ | ||
107 | TCGLabel *over = gen_new_label(); \ | ||
108 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | ||
109 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
110 | \ | ||
111 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
112 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
113 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_v_v(DisasContext *s, arg_vmv_v_v *a) | 107 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_v_v(DisasContext *s, arg_vmv_v_v *a) |
108 | gen_helper_vmv_v_v_w, gen_helper_vmv_v_v_d, | ||
114 | }; | 109 | }; |
115 | TCGLabel *over = gen_new_label(); | 110 | TCGLabel *over = gen_new_label(); |
116 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 111 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
117 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 112 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); |
118 | 113 | ||
119 | tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs1), | 114 | tcg_gen_gvec_2_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs1), |
120 | cpu_env, s->cfg_ptr->vlen / 8, | ||
121 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_v_x(DisasContext *s, arg_vmv_v_x *a) | 115 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_v_x(DisasContext *s, arg_vmv_v_x *a) |
116 | vext_check_ss(s, a->rd, 0, 1)) { | ||
122 | TCGv s1; | 117 | TCGv s1; |
123 | TCGLabel *over = gen_new_label(); | 118 | TCGLabel *over = gen_new_label(); |
124 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 119 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
125 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 120 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); |
126 | 121 | ||
127 | s1 = get_gpr(s, a->rs1, EXT_SIGN); | 122 | s1 = get_gpr(s, a->rs1, EXT_SIGN); |
128 | |||
129 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a) | 123 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a) |
124 | gen_helper_vmv_v_x_w, gen_helper_vmv_v_x_d, | ||
130 | }; | 125 | }; |
131 | TCGLabel *over = gen_new_label(); | 126 | TCGLabel *over = gen_new_label(); |
132 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 127 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
133 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 128 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); |
134 | 129 | ||
135 | s1 = tcg_constant_i64(simm); | 130 | s1 = tcg_constant_i64(simm); |
136 | dest = tcg_temp_new_ptr(); | ||
137 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | 131 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ |
132 | }; \ | ||
138 | TCGLabel *over = gen_new_label(); \ | 133 | TCGLabel *over = gen_new_label(); \ |
139 | gen_set_rm(s, RISCV_FRM_DYN); \ | 134 | gen_set_rm(s, RISCV_FRM_DYN); \ |
140 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | 135 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ |
141 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | 136 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ |
142 | \ | 137 | \ |
143 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | 138 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ |
144 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
145 | @@ -XXX,XX +XXX,XX @@ static bool opfvf_trans(uint32_t vd, uint32_t rs1, uint32_t vs2, | 139 | @@ -XXX,XX +XXX,XX @@ static bool opfvf_trans(uint32_t vd, uint32_t rs1, uint32_t vs2, |
146 | 140 | TCGv_i64 t1; | |
147 | TCGLabel *over = gen_new_label(); | 141 | |
148 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 142 | TCGLabel *over = gen_new_label(); |
149 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 143 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
150 | 144 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | |
151 | dest = tcg_temp_new_ptr(); | 145 | |
152 | mask = tcg_temp_new_ptr(); | 146 | dest = tcg_temp_new_ptr(); |
153 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | 147 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ |
148 | }; \ | ||
154 | TCGLabel *over = gen_new_label(); \ | 149 | TCGLabel *over = gen_new_label(); \ |
155 | gen_set_rm(s, RISCV_FRM_DYN); \ | 150 | gen_set_rm(s, RISCV_FRM_DYN); \ |
156 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | 151 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ |
157 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);\ | 152 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);\ |
158 | \ | 153 | \ |
159 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | 154 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ |
160 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
161 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | 155 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ |
156 | }; \ | ||
162 | TCGLabel *over = gen_new_label(); \ | 157 | TCGLabel *over = gen_new_label(); \ |
163 | gen_set_rm(s, RISCV_FRM_DYN); \ | 158 | gen_set_rm(s, RISCV_FRM_DYN); \ |
164 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | 159 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ |
165 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | 160 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ |
166 | \ | 161 | \ |
167 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | 162 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ |
168 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
169 | @@ -XXX,XX +XXX,XX @@ static bool do_opfv(DisasContext *s, arg_rmr *a, | 163 | @@ -XXX,XX +XXX,XX @@ static bool do_opfv(DisasContext *s, arg_rmr *a, |
170 | TCGLabel *over = gen_new_label(); | 164 | uint32_t data = 0; |
171 | gen_set_rm(s, rm); | 165 | TCGLabel *over = gen_new_label(); |
172 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 166 | gen_set_rm_chkfrm(s, rm); |
173 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 167 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
174 | 168 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | |
175 | data = FIELD_DP32(data, VDATA, VM, a->vm); | 169 | |
176 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | 170 | data = FIELD_DP32(data, VDATA, VM, a->vm); |
177 | @@ -XXX,XX +XXX,XX @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f *a) | 171 | @@ -XXX,XX +XXX,XX @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f *a) |
172 | gen_helper_vmv_v_x_d, | ||
178 | }; | 173 | }; |
179 | TCGLabel *over = gen_new_label(); | 174 | TCGLabel *over = gen_new_label(); |
180 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 175 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
181 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 176 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); |
182 | 177 | ||
183 | t1 = tcg_temp_new_i64(); | 178 | t1 = tcg_temp_new_i64(); |
184 | /* NaN-box f[rs1] */ | 179 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ |
185 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | 180 | }; \ |
186 | TCGLabel *over = gen_new_label(); \ | 181 | TCGLabel *over = gen_new_label(); \ |
187 | gen_set_rm(s, FRM); \ | 182 | gen_set_rm_chkfrm(s, FRM); \ |
188 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | 183 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ |
189 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | 184 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ |
190 | \ | 185 | \ |
191 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | 186 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ |
187 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | ||
188 | }; \ | ||
189 | TCGLabel *over = gen_new_label(); \ | ||
190 | gen_set_rm(s, RISCV_FRM_DYN); \ | ||
191 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | ||
192 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
193 | \ | ||
194 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
195 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | ||
196 | }; \ | ||
197 | TCGLabel *over = gen_new_label(); \ | ||
198 | gen_set_rm_chkfrm(s, FRM); \ | ||
199 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | ||
200 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
201 | \ | ||
202 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
203 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | ||
204 | }; \ | ||
205 | TCGLabel *over = gen_new_label(); \ | ||
206 | gen_set_rm_chkfrm(s, FRM); \ | ||
207 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | ||
208 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
209 | \ | ||
210 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
211 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_r *a) \ | ||
212 | uint32_t data = 0; \ | ||
213 | gen_helper_gvec_4_ptr *fn = gen_helper_##NAME; \ | ||
214 | TCGLabel *over = gen_new_label(); \ | ||
215 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | ||
216 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
217 | \ | ||
192 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | 218 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ |
193 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | ||
194 | TCGLabel *over = gen_new_label(); \ | ||
195 | gen_set_rm(s, RISCV_FRM_DYN); \ | ||
196 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | ||
197 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
198 | \ | ||
199 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
200 | tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
201 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | ||
202 | TCGLabel *over = gen_new_label(); \ | ||
203 | gen_set_rm(s, FRM); \ | ||
204 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | ||
205 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
206 | \ | ||
207 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
208 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
209 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | ||
210 | TCGLabel *over = gen_new_label(); \ | ||
211 | gen_set_rm(s, FRM); \ | ||
212 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | ||
213 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
214 | \ | ||
215 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
216 | tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
217 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_r *a) \ | ||
218 | gen_helper_gvec_4_ptr *fn = gen_helper_##NAME; \ | ||
219 | TCGLabel *over = gen_new_label(); \ | ||
220 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); \ | ||
221 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
222 | \ | ||
223 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
224 | tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
225 | @@ -XXX,XX +XXX,XX @@ static bool trans_vid_v(DisasContext *s, arg_vid_v *a) | 219 | @@ -XXX,XX +XXX,XX @@ static bool trans_vid_v(DisasContext *s, arg_vid_v *a) |
226 | uint32_t data = 0; | 220 | require_vm(a->vm, a->rd)) { |
227 | TCGLabel *over = gen_new_label(); | 221 | uint32_t data = 0; |
228 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 222 | TCGLabel *over = gen_new_label(); |
229 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 223 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
230 | 224 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | |
231 | data = FIELD_DP32(data, VDATA, VM, a->vm); | 225 | |
232 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | 226 | data = FIELD_DP32(data, VDATA, VM, a->vm); |
227 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a) | ||
228 | TCGv s1; | ||
229 | TCGLabel *over = gen_new_label(); | ||
230 | |||
231 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | ||
232 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | ||
233 | |||
234 | t1 = tcg_temp_new_i64(); | ||
235 | @@ -XXX,XX +XXX,XX @@ static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f *a) | ||
236 | TCGv_i64 t1; | ||
237 | TCGLabel *over = gen_new_label(); | ||
238 | |||
239 | - /* if vl == 0 or vstart >= vl, skip vector register write back */ | ||
240 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | ||
241 | + /* if vstart >= vl, skip vector register write back */ | ||
242 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | ||
243 | |||
244 | /* NaN-box f[rs1] */ | ||
233 | @@ -XXX,XX +XXX,XX @@ static bool int_ext_op(DisasContext *s, arg_rmr *a, uint8_t seq) | 245 | @@ -XXX,XX +XXX,XX @@ static bool int_ext_op(DisasContext *s, arg_rmr *a, uint8_t seq) |
246 | uint32_t data = 0; | ||
234 | gen_helper_gvec_3_ptr *fn; | 247 | gen_helper_gvec_3_ptr *fn; |
235 | TCGLabel *over = gen_new_label(); | 248 | TCGLabel *over = gen_new_label(); |
236 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | 249 | - tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); |
237 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | 250 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); |
238 | 251 | ||
239 | static gen_helper_gvec_3_ptr * const fns[6][4] = { | 252 | static gen_helper_gvec_3_ptr * const fns[6][4] = { |
240 | { | ||
241 | -- | 253 | -- |
242 | 2.36.1 | 254 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Lawrence Hunter <lawrence.hunter@codethink.co.uk> |
---|---|---|---|
2 | 2 | ||
3 | Destination registers of unit-stride mask load and store instructions are | 3 | This commit adds support for the Zvbc vector-crypto extension, which
4 | always written with a tail-agnostic policy. | 4 | consists of the following instructions: |
5 | 5 | ||
6 | A vector segment load / store instruction may contain fractional lmul | 6 | * vclmulh.[vx,vv] |
7 | with nf * lmul > 1. The rest of the elements in the last register should | 7 | * vclmul.[vx,vv] |
8 | be treated as tail elements. | 8 | |
9 | 9 | Translation functions are defined in | |
10 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 10 | `target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in |
11 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | 11 | `target/riscv/vcrypto_helper.c`. |
12 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 12 | |
13 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | 13 | Co-authored-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk> |
14 | Message-Id: <165449614532.19704.7000832880482980398-6@git.sr.ht> | 14 | Co-authored-by: Max Chou <max.chou@sifive.com> |
15 | Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk> | ||
16 | Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk> | ||
17 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
18 | [max.chou@sifive.com: Exposed x-zvbc property] | ||
19 | Message-ID: <20230711165917.2629866-5-max.chou@sifive.com> | ||
15 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 20 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
16 | --- | 21 | --- |
17 | target/riscv/translate.c | 2 + | 22 | target/riscv/cpu_cfg.h | 1 + |
18 | target/riscv/vector_helper.c | 60 +++++++++++++++++++++++++ | 23 | target/riscv/helper.h | 6 +++ |
19 | target/riscv/insn_trans/trans_rvv.c.inc | 6 +++ | 24 | target/riscv/insn32.decode | 6 +++ |
20 | 3 files changed, 68 insertions(+) | 25 | target/riscv/cpu.c | 9 ++++ |
21 | 26 | target/riscv/translate.c | 1 + | |
27 | target/riscv/vcrypto_helper.c | 59 ++++++++++++++++++++++ | ||
28 | target/riscv/insn_trans/trans_rvvk.c.inc | 62 ++++++++++++++++++++++++ | ||
29 | target/riscv/meson.build | 3 +- | ||
30 | 8 files changed, 146 insertions(+), 1 deletion(-) | ||
31 | create mode 100644 target/riscv/vcrypto_helper.c | ||
32 | create mode 100644 target/riscv/insn_trans/trans_rvvk.c.inc | ||
33 | |||
34 | diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h | ||
35 | index XXXXXXX..XXXXXXX 100644 | ||
36 | --- a/target/riscv/cpu_cfg.h | ||
37 | +++ b/target/riscv/cpu_cfg.h | ||
38 | @@ -XXX,XX +XXX,XX @@ struct RISCVCPUConfig { | ||
39 | bool ext_zve32f; | ||
40 | bool ext_zve64f; | ||
41 | bool ext_zve64d; | ||
42 | + bool ext_zvbc; | ||
43 | bool ext_zmmul; | ||
44 | bool ext_zvfbfmin; | ||
45 | bool ext_zvfbfwma; | ||
46 | diff --git a/target/riscv/helper.h b/target/riscv/helper.h | ||
47 | index XXXXXXX..XXXXXXX 100644 | ||
48 | --- a/target/riscv/helper.h | ||
49 | +++ b/target/riscv/helper.h | ||
50 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_5(vfwcvtbf16_f_f_v, void, ptr, ptr, ptr, env, i32) | ||
51 | |||
52 | DEF_HELPER_6(vfwmaccbf16_vv, void, ptr, ptr, ptr, ptr, env, i32) | ||
53 | DEF_HELPER_6(vfwmaccbf16_vf, void, ptr, ptr, i64, ptr, env, i32) | ||
54 | + | ||
55 | +/* Vector crypto functions */ | ||
56 | +DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32) | ||
57 | +DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32) | ||
58 | +DEF_HELPER_6(vclmulh_vv, void, ptr, ptr, ptr, ptr, env, i32) | ||
59 | +DEF_HELPER_6(vclmulh_vx, void, ptr, ptr, tl, ptr, env, i32) | ||
60 | diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode | ||
61 | index XXXXXXX..XXXXXXX 100644 | ||
62 | --- a/target/riscv/insn32.decode | ||
63 | +++ b/target/riscv/insn32.decode | ||
64 | @@ -XXX,XX +XXX,XX @@ vfwcvtbf16_f_f_v 010010 . ..... 01101 001 ..... 1010111 @r2_vm | ||
65 | # *** Zvfbfwma Standard Extension *** | ||
66 | vfwmaccbf16_vv 111011 . ..... ..... 001 ..... 1010111 @r_vm | ||
67 | vfwmaccbf16_vf 111011 . ..... ..... 101 ..... 1010111 @r_vm | ||
68 | + | ||
69 | +# *** Zvbc vector crypto extension *** | ||
70 | +vclmul_vv 001100 . ..... ..... 010 ..... 1010111 @r_vm | ||
71 | +vclmul_vx 001100 . ..... ..... 110 ..... 1010111 @r_vm | ||
72 | +vclmulh_vv 001101 . ..... ..... 010 ..... 1010111 @r_vm | ||
73 | +vclmulh_vx 001101 . ..... ..... 110 ..... 1010111 @r_vm | ||
74 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | ||
75 | index XXXXXXX..XXXXXXX 100644 | ||
76 | --- a/target/riscv/cpu.c | ||
77 | +++ b/target/riscv/cpu.c | ||
78 | @@ -XXX,XX +XXX,XX @@ static const struct isa_ext_data isa_edata_arr[] = { | ||
79 | ISA_EXT_DATA_ENTRY(zksed, PRIV_VERSION_1_12_0, ext_zksed), | ||
80 | ISA_EXT_DATA_ENTRY(zksh, PRIV_VERSION_1_12_0, ext_zksh), | ||
81 | ISA_EXT_DATA_ENTRY(zkt, PRIV_VERSION_1_12_0, ext_zkt), | ||
82 | + ISA_EXT_DATA_ENTRY(zvbc, PRIV_VERSION_1_12_0, ext_zvbc), | ||
83 | ISA_EXT_DATA_ENTRY(zve32f, PRIV_VERSION_1_10_0, ext_zve32f), | ||
84 | ISA_EXT_DATA_ENTRY(zve64f, PRIV_VERSION_1_10_0, ext_zve64f), | ||
85 | ISA_EXT_DATA_ENTRY(zve64d, PRIV_VERSION_1_10_0, ext_zve64d), | ||
86 | @@ -XXX,XX +XXX,XX @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp) | ||
87 | return; | ||
88 | } | ||
89 | |||
90 | + if (cpu->cfg.ext_zvbc && !cpu->cfg.ext_zve64f) { | ||
91 | + error_setg(errp, "Zvbc extension requires V or Zve64{f,d} extensions"); | ||
92 | + return; | ||
93 | + } | ||
94 | + | ||
95 | if (cpu->cfg.ext_zk) { | ||
96 | cpu->cfg.ext_zkn = true; | ||
97 | cpu->cfg.ext_zkr = true; | ||
98 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_extensions[] = { | ||
99 | DEFINE_PROP_BOOL("x-zvfbfmin", RISCVCPU, cfg.ext_zvfbfmin, false), | ||
100 | DEFINE_PROP_BOOL("x-zvfbfwma", RISCVCPU, cfg.ext_zvfbfwma, false), | ||
101 | |||
102 | + /* Vector cryptography extensions */ | ||
103 | + DEFINE_PROP_BOOL("x-zvbc", RISCVCPU, cfg.ext_zvbc, false), | ||
104 | + | ||
105 | DEFINE_PROP_END_OF_LIST(), | ||
106 | }; | ||
107 | |||
22 | diff --git a/target/riscv/translate.c b/target/riscv/translate.c | 108 | diff --git a/target/riscv/translate.c b/target/riscv/translate.c |
23 | index XXXXXXX..XXXXXXX 100644 | 109 | index XXXXXXX..XXXXXXX 100644 |
24 | --- a/target/riscv/translate.c | 110 | --- a/target/riscv/translate.c |
25 | +++ b/target/riscv/translate.c | 111 | +++ b/target/riscv/translate.c |
26 | @@ -XXX,XX +XXX,XX @@ typedef struct DisasContext { | 112 | @@ -XXX,XX +XXX,XX @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc) |
27 | int8_t lmul; | 113 | #include "insn_trans/trans_rvzfa.c.inc" |
28 | uint8_t sew; | 114 | #include "insn_trans/trans_rvzfh.c.inc" |
29 | uint8_t vta; | 115 | #include "insn_trans/trans_rvk.c.inc" |
30 | + bool cfg_vta_all_1s; | 116 | +#include "insn_trans/trans_rvvk.c.inc" |
31 | target_ulong vstart; | 117 | #include "insn_trans/trans_privileged.c.inc" |
32 | bool vl_eq_vlmax; | 118 | #include "insn_trans/trans_svinval.c.inc" |
33 | uint8_t ntemp; | 119 | #include "insn_trans/trans_rvbf16.c.inc" |
34 | @@ -XXX,XX +XXX,XX @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs) | 120 | diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c |
35 | ctx->sew = FIELD_EX32(tb_flags, TB_FLAGS, SEW); | 121 | new file mode 100644 |
36 | ctx->lmul = sextract32(FIELD_EX32(tb_flags, TB_FLAGS, LMUL), 0, 3); | 122 | index XXXXXXX..XXXXXXX |
37 | ctx->vta = FIELD_EX32(tb_flags, TB_FLAGS, VTA) && cpu->cfg.rvv_ta_all_1s; | 123 | --- /dev/null |
38 | + ctx->cfg_vta_all_1s = cpu->cfg.rvv_ta_all_1s; | 124 | +++ b/target/riscv/vcrypto_helper.c |
39 | ctx->vstart = env->vstart; | 125 | @@ -XXX,XX +XXX,XX @@ |
40 | ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX); | 126 | +/* |
41 | ctx->misa_mxl_max = env->misa_mxl_max; | 127 | + * RISC-V Vector Crypto Extension Helpers for QEMU. |
42 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 128 | + * |
43 | index XXXXXXX..XXXXXXX 100644 | 129 | + * Copyright (C) 2023 SiFive, Inc. |
44 | --- a/target/riscv/vector_helper.c | 130 | + * Written by Codethink Ltd and SiFive. |
45 | +++ b/target/riscv/vector_helper.c | 131 | + * |
46 | @@ -XXX,XX +XXX,XX @@ vext_ldst_stride(void *vd, void *v0, target_ulong base, | 132 | + * This program is free software; you can redistribute it and/or modify it |
47 | uint32_t i, k; | 133 | + * under the terms and conditions of the GNU General Public License, |
48 | uint32_t nf = vext_nf(desc); | 134 | + * version 2 or later, as published by the Free Software Foundation. |
49 | uint32_t max_elems = vext_max_elems(desc, log2_esz); | 135 | + * |
50 | + uint32_t esz = 1 << log2_esz; | 136 | + * This program is distributed in the hope it will be useful, but WITHOUT |
51 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); | 137 | + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
52 | + uint32_t vta = vext_vta(desc); | 138 | + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for |
53 | 139 | + * more details. | |
54 | for (i = env->vstart; i < env->vl; i++, env->vstart++) { | 140 | + * |
55 | if (!vm && !vext_elem_mask(v0, i)) { | 141 | + * You should have received a copy of the GNU General Public License along with |
56 | @@ -XXX,XX +XXX,XX @@ vext_ldst_stride(void *vd, void *v0, target_ulong base, | 142 | + * this program. If not, see <http://www.gnu.org/licenses/>. |
57 | } | 143 | + */ |
58 | } | 144 | + |
59 | env->vstart = 0; | 145 | +#include "qemu/osdep.h" |
60 | + /* set tail elements to 1s */ | 146 | +#include "qemu/host-utils.h" |
61 | + for (k = 0; k < nf; ++k) { | 147 | +#include "qemu/bitops.h" |
62 | + vext_set_elems_1s(vd, vta, (k * max_elems + env->vl) * esz, | 148 | +#include "cpu.h" |
63 | + (k * max_elems + max_elems) * esz); | 149 | +#include "exec/memop.h" |
64 | + } | 150 | +#include "exec/exec-all.h" |
65 | + if (nf * max_elems % total_elems != 0) { | 151 | +#include "exec/helper-proto.h" |
66 | + uint32_t vlenb = env_archcpu(env)->cfg.vlen >> 3; | 152 | +#include "internals.h" |
67 | + uint32_t registers_used = | 153 | +#include "vector_internals.h" |
68 | + ((nf * max_elems) * esz + (vlenb - 1)) / vlenb; | 154 | + |
69 | + vext_set_elems_1s(vd, vta, (nf * max_elems) * esz, | 155 | +static uint64_t clmul64(uint64_t y, uint64_t x) |
70 | + registers_used * vlenb); | 156 | +{ |
71 | + } | 157 | + uint64_t result = 0; |
72 | } | 158 | + for (int j = 63; j >= 0; j--) { |
73 | 159 | + if ((y >> j) & 1) { | |
74 | #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN) \ | 160 | + result ^= (x << j); |
75 | @@ -XXX,XX +XXX,XX @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, | 161 | + } |
76 | uint32_t i, k; | 162 | + } |
77 | uint32_t nf = vext_nf(desc); | 163 | + return result; |
78 | uint32_t max_elems = vext_max_elems(desc, log2_esz); | 164 | +} |
79 | + uint32_t esz = 1 << log2_esz; | 165 | + |
80 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); | 166 | +static uint64_t clmulh64(uint64_t y, uint64_t x) |
81 | + uint32_t vta = vext_vta(desc); | 167 | +{ |
82 | 168 | + uint64_t result = 0; | |
83 | /* load bytes from guest memory */ | 169 | + for (int j = 63; j >= 1; j--) { |
84 | for (i = env->vstart; i < evl; i++, env->vstart++) { | 170 | + if ((y >> j) & 1) { |
85 | @@ -XXX,XX +XXX,XX @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, | 171 | + result ^= (x >> (64 - j)); |
86 | } | 172 | + } |
87 | } | 173 | + } |
88 | env->vstart = 0; | 174 | + return result; |
89 | + /* set tail elements to 1s */ | 175 | +} |
90 | + for (k = 0; k < nf; ++k) { | 176 | + |
91 | + vext_set_elems_1s(vd, vta, (k * max_elems + evl) * esz, | 177 | +RVVCALL(OPIVV2, vclmul_vv, OP_UUU_D, H8, H8, H8, clmul64) |
92 | + (k * max_elems + max_elems) * esz); | 178 | +GEN_VEXT_VV(vclmul_vv, 8) |
93 | + } | 179 | +RVVCALL(OPIVX2, vclmul_vx, OP_UUU_D, H8, H8, clmul64) |
94 | + if (nf * max_elems % total_elems != 0) { | 180 | +GEN_VEXT_VX(vclmul_vx, 8) |
95 | + uint32_t vlenb = env_archcpu(env)->cfg.vlen >> 3; | 181 | +RVVCALL(OPIVV2, vclmulh_vv, OP_UUU_D, H8, H8, H8, clmulh64) |
96 | + uint32_t registers_used = | 182 | +GEN_VEXT_VV(vclmulh_vv, 8) |
97 | + ((nf * max_elems) * esz + (vlenb - 1)) / vlenb; | 183 | +RVVCALL(OPIVX2, vclmulh_vx, OP_UUU_D, H8, H8, clmulh64) |
98 | + vext_set_elems_1s(vd, vta, (nf * max_elems) * esz, | 184 | +GEN_VEXT_VX(vclmulh_vx, 8) |
99 | + registers_used * vlenb); | 185 | diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc |
100 | + } | 186 | new file mode 100644 |
101 | } | 187 | index XXXXXXX..XXXXXXX |
102 | 188 | --- /dev/null | |
103 | /* | 189 | +++ b/target/riscv/insn_trans/trans_rvvk.c.inc |
104 | @@ -XXX,XX +XXX,XX @@ vext_ldst_index(void *vd, void *v0, target_ulong base, | 190 | @@ -XXX,XX +XXX,XX @@ |
105 | uint32_t nf = vext_nf(desc); | 191 | +/* |
106 | uint32_t vm = vext_vm(desc); | 192 | + * RISC-V translation routines for the vector crypto extension. |
107 | uint32_t max_elems = vext_max_elems(desc, log2_esz); | 193 | + * |
108 | + uint32_t esz = 1 << log2_esz; | 194 | + * Copyright (C) 2023 SiFive, Inc. |
109 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); | 195 | + * Written by Codethink Ltd and SiFive. |
110 | + uint32_t vta = vext_vta(desc); | 196 | + * |
111 | 197 | + * This program is free software; you can redistribute it and/or modify it | |
112 | /* load bytes from guest memory */ | 198 | + * under the terms and conditions of the GNU General Public License, |
113 | for (i = env->vstart; i < env->vl; i++, env->vstart++) { | 199 | + * version 2 or later, as published by the Free Software Foundation. |
114 | @@ -XXX,XX +XXX,XX @@ vext_ldst_index(void *vd, void *v0, target_ulong base, | 200 | + * |
115 | } | 201 | + * This program is distributed in the hope it will be useful, but WITHOUT |
116 | } | 202 | + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
117 | env->vstart = 0; | 203 | + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for |
118 | + /* set tail elements to 1s */ | 204 | + * more details. |
119 | + for (k = 0; k < nf; ++k) { | 205 | + * |
120 | + vext_set_elems_1s(vd, vta, (k * max_elems + env->vl) * esz, | 206 | + * You should have received a copy of the GNU General Public License along with |
121 | + (k * max_elems + max_elems) * esz); | 207 | + * this program. If not, see <http://www.gnu.org/licenses/>. |
122 | + } | 208 | + */ |
123 | + if (nf * max_elems % total_elems != 0) { | 209 | + |
124 | + uint32_t vlenb = env_archcpu(env)->cfg.vlen >> 3; | 210 | +/* |
125 | + uint32_t registers_used = | 211 | + * Zvbc |
126 | + ((nf * max_elems) * esz + (vlenb - 1)) / vlenb; | 212 | + */ |
127 | + vext_set_elems_1s(vd, vta, (nf * max_elems) * esz, | 213 | + |
128 | + registers_used * vlenb); | 214 | +#define GEN_VV_MASKED_TRANS(NAME, CHECK) \ |
129 | + } | 215 | + static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ |
130 | } | 216 | + { \ |
131 | 217 | + if (CHECK(s, a)) { \ | |
132 | #define GEN_VEXT_LD_INDEX(NAME, ETYPE, INDEX_FN, LOAD_FN) \ | 218 | + return opivv_trans(a->rd, a->rs1, a->rs2, a->vm, \ |
133 | @@ -XXX,XX +XXX,XX @@ vext_ldff(void *vd, void *v0, target_ulong base, | 219 | + gen_helper_##NAME, s); \ |
134 | uint32_t nf = vext_nf(desc); | 220 | + } \ |
135 | uint32_t vm = vext_vm(desc); | 221 | + return false; \ |
136 | uint32_t max_elems = vext_max_elems(desc, log2_esz); | 222 | + } |
137 | + uint32_t esz = 1 << log2_esz; | 223 | + |
138 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); | 224 | +static bool vclmul_vv_check(DisasContext *s, arg_rmrr *a) |
139 | + uint32_t vta = vext_vta(desc); | 225 | +{ |
140 | target_ulong addr, offset, remain; | 226 | + return opivv_check(s, a) && |
141 | 227 | + s->cfg_ptr->ext_zvbc == true && | |
142 | /* probe every access*/ | 228 | + s->sew == MO_64; |
143 | @@ -XXX,XX +XXX,XX @@ ProbeSuccess: | 229 | +} |
144 | } | 230 | + |
145 | } | 231 | +GEN_VV_MASKED_TRANS(vclmul_vv, vclmul_vv_check) |
146 | env->vstart = 0; | 232 | +GEN_VV_MASKED_TRANS(vclmulh_vv, vclmul_vv_check) |
147 | + /* set tail elements to 1s */ | 233 | + |
148 | + for (k = 0; k < nf; ++k) { | 234 | +#define GEN_VX_MASKED_TRANS(NAME, CHECK) \ |
149 | + vext_set_elems_1s(vd, vta, (k * max_elems + env->vl) * esz, | 235 | + static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ |
150 | + (k * max_elems + max_elems) * esz); | 236 | + { \ |
151 | + } | 237 | + if (CHECK(s, a)) { \ |
152 | + if (nf * max_elems % total_elems != 0) { | 238 | + return opivx_trans(a->rd, a->rs1, a->rs2, a->vm, \ |
153 | + uint32_t vlenb = env_archcpu(env)->cfg.vlen >> 3; | 239 | + gen_helper_##NAME, s); \ |
154 | + uint32_t registers_used = | 240 | + } \ |
155 | + ((nf * max_elems) * esz + (vlenb - 1)) / vlenb; | 241 | + return false; \ |
156 | + vext_set_elems_1s(vd, vta, (nf * max_elems) * esz, | 242 | + } |
157 | + registers_used * vlenb); | 243 | + |
158 | + } | 244 | +static bool vclmul_vx_check(DisasContext *s, arg_rmrr *a) |
159 | } | 245 | +{ |
160 | 246 | + return opivx_check(s, a) && | |
161 | #define GEN_VEXT_LDFF(NAME, ETYPE, LOAD_FN) \ | 247 | + s->cfg_ptr->ext_zvbc == true && |
162 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc | 248 | + s->sew == MO_64; |
163 | index XXXXXXX..XXXXXXX 100644 | 249 | +} |
164 | --- a/target/riscv/insn_trans/trans_rvv.c.inc | 250 | + |
165 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc | 251 | +GEN_VX_MASKED_TRANS(vclmul_vx, vclmul_vx_check) |
166 | @@ -XXX,XX +XXX,XX @@ static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t eew) | 252 | +GEN_VX_MASKED_TRANS(vclmulh_vx, vclmul_vx_check) |
167 | data = FIELD_DP32(data, VDATA, VM, a->vm); | 253 | diff --git a/target/riscv/meson.build b/target/riscv/meson.build |
168 | data = FIELD_DP32(data, VDATA, LMUL, emul); | 254 | index XXXXXXX..XXXXXXX 100644 |
169 | data = FIELD_DP32(data, VDATA, NF, a->nf); | 255 | --- a/target/riscv/meson.build |
170 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | 256 | +++ b/target/riscv/meson.build |
171 | return ldst_us_trans(a->rd, a->rs1, data, fn, s, false); | 257 | @@ -XXX,XX +XXX,XX @@ riscv_ss.add(files( |
172 | } | 258 | 'translate.c', |
173 | 259 | 'm128_helper.c', | |
174 | @@ -XXX,XX +XXX,XX @@ static bool ld_us_mask_op(DisasContext *s, arg_vlm_v *a, uint8_t eew) | 260 | 'crypto_helper.c', |
175 | /* EMUL = 1, NFIELDS = 1 */ | 261 | - 'zce_helper.c' |
176 | data = FIELD_DP32(data, VDATA, LMUL, 0); | 262 | + 'zce_helper.c', |
177 | data = FIELD_DP32(data, VDATA, NF, 1); | 263 | + 'vcrypto_helper.c' |
178 | + /* Mask destination register are always tail-agnostic */ | 264 | )) |
179 | + data = FIELD_DP32(data, VDATA, VTA, s->cfg_vta_all_1s); | 265 | riscv_ss.add(when: 'CONFIG_KVM', if_true: files('kvm.c'), if_false: files('kvm-stub.c')) |
180 | return ldst_us_trans(a->rd, a->rs1, data, fn, s, false); | ||
181 | } | ||
182 | |||
183 | @@ -XXX,XX +XXX,XX @@ static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t eew) | ||
184 | data = FIELD_DP32(data, VDATA, VM, a->vm); | ||
185 | data = FIELD_DP32(data, VDATA, LMUL, emul); | ||
186 | data = FIELD_DP32(data, VDATA, NF, a->nf); | ||
187 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
188 | return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s, false); | ||
189 | } | ||
190 | |||
191 | @@ -XXX,XX +XXX,XX @@ static bool ld_index_op(DisasContext *s, arg_rnfvm *a, uint8_t eew) | ||
192 | data = FIELD_DP32(data, VDATA, VM, a->vm); | ||
193 | data = FIELD_DP32(data, VDATA, LMUL, emul); | ||
194 | data = FIELD_DP32(data, VDATA, NF, a->nf); | ||
195 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
196 | return ldst_index_trans(a->rd, a->rs1, a->rs2, data, fn, s, false); | ||
197 | } | ||
198 | |||
199 | @@ -XXX,XX +XXX,XX @@ static bool ldff_op(DisasContext *s, arg_r2nfvm *a, uint8_t eew) | ||
200 | data = FIELD_DP32(data, VDATA, VM, a->vm); | ||
201 | data = FIELD_DP32(data, VDATA, LMUL, emul); | ||
202 | data = FIELD_DP32(data, VDATA, NF, a->nf); | ||
203 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
204 | return ldff_trans(a->rd, a->rs1, data, fn, s); | ||
205 | } | ||
206 | 266 | ||
207 | -- | 267 | -- |
208 | 2.36.1 | 268 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Nazar Kazakov <nazar.kazakov@codethink.co.uk> |
---|---|---|---|
2 | 2 | ||
3 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 3 | Move the checks out of `do_opiv{v,x,i}_gvec{,_shift}` functions |
4 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | 4 | and into the corresponding macros. This enables the functions to be |
5 | reused in proceeding commits without check duplication. | ||
6 | |||
7 | Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk> | ||
8 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
5 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 9 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> |
6 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | 10 | Signed-off-by: Max Chou <max.chou@sifive.com> |
7 | Message-Id: <165449614532.19704.7000832880482980398-8@git.sr.ht> | 11 | Message-ID: <20230711165917.2629866-6-max.chou@sifive.com> |
8 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 12 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
9 | --- | 13 | --- |
10 | target/riscv/vector_helper.c | 11 +++++++++++ | 14 | target/riscv/insn_trans/trans_rvv.c.inc | 28 +++++++++++-------------- |
11 | target/riscv/insn_trans/trans_rvv.c.inc | 3 ++- | 15 | 1 file changed, 12 insertions(+), 16 deletions(-) |
12 | 2 files changed, 13 insertions(+), 1 deletion(-) | ||
13 | 16 | ||
14 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | ||
15 | index XXXXXXX..XXXXXXX 100644 | ||
16 | --- a/target/riscv/vector_helper.c | ||
17 | +++ b/target/riscv/vector_helper.c | ||
18 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | ||
19 | { \ | ||
20 | uint32_t vm = vext_vm(desc); \ | ||
21 | uint32_t vl = env->vl; \ | ||
22 | + uint32_t esz = sizeof(TS1); \ | ||
23 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
24 | + uint32_t vta = vext_vta(desc); \ | ||
25 | uint32_t i; \ | ||
26 | \ | ||
27 | for (i = env->vstart; i < vl; i++) { \ | ||
28 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | ||
29 | *((TS1 *)vd + HS1(i)) = OP(s2, s1 & MASK); \ | ||
30 | } \ | ||
31 | env->vstart = 0; \ | ||
32 | + /* set tail elements to 1s */ \ | ||
33 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
34 | } | ||
35 | |||
36 | GEN_VEXT_SHIFT_VV(vsll_vv_b, uint8_t, uint8_t, H1, H1, DO_SLL, 0x7) | ||
37 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | ||
38 | { \ | ||
39 | uint32_t vm = vext_vm(desc); \ | ||
40 | uint32_t vl = env->vl; \ | ||
41 | + uint32_t esz = sizeof(TD); \ | ||
42 | + uint32_t total_elems = \ | ||
43 | + vext_get_total_elems(env, desc, esz); \ | ||
44 | + uint32_t vta = vext_vta(desc); \ | ||
45 | uint32_t i; \ | ||
46 | \ | ||
47 | for (i = env->vstart; i < vl; i++) { \ | ||
48 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | ||
49 | *((TD *)vd + HD(i)) = OP(s2, s1 & MASK); \ | ||
50 | } \ | ||
51 | env->vstart = 0; \ | ||
52 | + /* set tail elements to 1s */ \ | ||
53 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);\ | ||
54 | } | ||
55 | |||
56 | GEN_VEXT_SHIFT_VX(vsll_vx_b, uint8_t, int8_t, H1, H1, DO_SLL, 0x7) | ||
57 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc | 17 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc |
58 | index XXXXXXX..XXXXXXX 100644 | 18 | index XXXXXXX..XXXXXXX 100644 |
59 | --- a/target/riscv/insn_trans/trans_rvv.c.inc | 19 | --- a/target/riscv/insn_trans/trans_rvv.c.inc |
60 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc | 20 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc |
61 | @@ -XXX,XX +XXX,XX @@ do_opivx_gvec_shift(DisasContext *s, arg_rmrr *a, GVecGen2sFn32 *gvec_fn, | 21 | @@ -XXX,XX +XXX,XX @@ do_opivv_gvec(DisasContext *s, arg_rmrr *a, GVecGen3Fn *gvec_fn, |
62 | return false; | 22 | gen_helper_gvec_4_ptr *fn) |
63 | } | 23 | { |
64 | 24 | TCGLabel *over = gen_new_label(); | |
65 | - if (a->vm && s->vl_eq_vlmax) { | 25 | - if (!opivv_check(s, a)) { |
66 | + if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | 26 | - return false; |
27 | - } | ||
28 | |||
29 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | ||
30 | |||
31 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
32 | gen_helper_##NAME##_b, gen_helper_##NAME##_h, \ | ||
33 | gen_helper_##NAME##_w, gen_helper_##NAME##_d, \ | ||
34 | }; \ | ||
35 | + if (!opivv_check(s, a)) { \ | ||
36 | + return false; \ | ||
37 | + } \ | ||
38 | return do_opivv_gvec(s, a, tcg_gen_gvec_##SUF, fns[s->sew]); \ | ||
39 | } | ||
40 | |||
41 | @@ -XXX,XX +XXX,XX @@ static inline bool | ||
42 | do_opivx_gvec(DisasContext *s, arg_rmrr *a, GVecGen2sFn *gvec_fn, | ||
43 | gen_helper_opivx *fn) | ||
44 | { | ||
45 | - if (!opivx_check(s, a)) { | ||
46 | - return false; | ||
47 | - } | ||
48 | - | ||
49 | if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | ||
50 | TCGv_i64 src1 = tcg_temp_new_i64(); | ||
51 | |||
52 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
53 | gen_helper_##NAME##_b, gen_helper_##NAME##_h, \ | ||
54 | gen_helper_##NAME##_w, gen_helper_##NAME##_d, \ | ||
55 | }; \ | ||
56 | + if (!opivx_check(s, a)) { \ | ||
57 | + return false; \ | ||
58 | + } \ | ||
59 | return do_opivx_gvec(s, a, tcg_gen_gvec_##SUF, fns[s->sew]); \ | ||
60 | } | ||
61 | |||
62 | @@ -XXX,XX +XXX,XX @@ static inline bool | ||
63 | do_opivi_gvec(DisasContext *s, arg_rmrr *a, GVecGen2iFn *gvec_fn, | ||
64 | gen_helper_opivx *fn, imm_mode_t imm_mode) | ||
65 | { | ||
66 | - if (!opivx_check(s, a)) { | ||
67 | - return false; | ||
68 | - } | ||
69 | - | ||
70 | if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | ||
71 | gvec_fn(s->sew, vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2), | ||
72 | extract_imm(s, a->rs1, imm_mode), MAXSZ(s), MAXSZ(s)); | ||
73 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
74 | gen_helper_##OPIVX##_b, gen_helper_##OPIVX##_h, \ | ||
75 | gen_helper_##OPIVX##_w, gen_helper_##OPIVX##_d, \ | ||
76 | }; \ | ||
77 | + if (!opivx_check(s, a)) { \ | ||
78 | + return false; \ | ||
79 | + } \ | ||
80 | return do_opivi_gvec(s, a, tcg_gen_gvec_##SUF, \ | ||
81 | fns[s->sew], IMM_MODE); \ | ||
82 | } | ||
83 | @@ -XXX,XX +XXX,XX @@ static inline bool | ||
84 | do_opivx_gvec_shift(DisasContext *s, arg_rmrr *a, GVecGen2sFn32 *gvec_fn, | ||
85 | gen_helper_opivx *fn) | ||
86 | { | ||
87 | - if (!opivx_check(s, a)) { | ||
88 | - return false; | ||
89 | - } | ||
90 | - | ||
91 | if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | ||
67 | TCGv_i32 src1 = tcg_temp_new_i32(); | 92 | TCGv_i32 src1 = tcg_temp_new_i32(); |
68 | 93 | ||
69 | tcg_gen_trunc_tl_i32(src1, get_gpr(s, a->rs1, EXT_NONE)); | 94 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ |
70 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | 95 | gen_helper_##NAME##_b, gen_helper_##NAME##_h, \ |
71 | \ | 96 | gen_helper_##NAME##_w, gen_helper_##NAME##_d, \ |
72 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | 97 | }; \ |
73 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | 98 | - \ |
74 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | 99 | + if (!opivx_check(s, a)) { \ |
75 | tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | 100 | + return false; \ |
76 | vreg_ofs(s, a->rs1), \ | 101 | + } \ |
77 | vreg_ofs(s, a->rs2), cpu_env, \ | 102 | return do_opivx_gvec_shift(s, a, tcg_gen_gvec_##SUF, fns[s->sew]); \ |
103 | } | ||
104 | |||
78 | -- | 105 | -- |
79 | 2.36.1 | 106 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Dickon Hood <dickon.hood@codethink.co.uk> |
---|---|---|---|
2 | 2 | ||
3 | No functional change intended in this commit. | 3 | Zvbb (implemented in later commit) has a widening instruction, which |
4 | requires an extra check on the enabled extensions. Refactor | ||
5 | GEN_OPIVX_WIDEN_TRANS() to take a check function to avoid reimplementing | ||
6 | it. | ||
4 | 7 | ||
5 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 8 | Signed-off-by: Dickon Hood <dickon.hood@codethink.co.uk> |
6 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | 9 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> |
7 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 10 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> |
8 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | 11 | Signed-off-by: Max Chou <max.chou@sifive.com> |
9 | Message-Id: <165449614532.19704.7000832880482980398-1@git.sr.ht> | 12 | Message-ID: <20230711165917.2629866-7-max.chou@sifive.com> |
10 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 13 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
11 | --- | 14 | --- |
12 | target/riscv/vector_helper.c | 1132 +++++++++++++++++----------------- | 15 | target/riscv/insn_trans/trans_rvv.c.inc | 52 +++++++++++-------------- |
13 | 1 file changed, 565 insertions(+), 567 deletions(-) | 16 | 1 file changed, 23 insertions(+), 29 deletions(-) |
14 | 17 | ||
15 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 18 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc |
16 | index XXXXXXX..XXXXXXX 100644 | 19 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/target/riscv/vector_helper.c | 20 | --- a/target/riscv/insn_trans/trans_rvv.c.inc |
18 | +++ b/target/riscv/vector_helper.c | 21 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc |
19 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vsub_vv_d, OP_SSS_D, H8, H8, H8, DO_SUB) | 22 | @@ -XXX,XX +XXX,XX @@ static bool opivx_widen_check(DisasContext *s, arg_rmrr *a) |
20 | 23 | vext_check_ds(s, a->rd, a->rs2, a->vm); | |
21 | static void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2, | ||
22 | CPURISCVState *env, uint32_t desc, | ||
23 | - uint32_t esz, uint32_t dsz, | ||
24 | opivv2_fn *fn) | ||
25 | { | ||
26 | uint32_t vm = vext_vm(desc); | ||
27 | @@ -XXX,XX +XXX,XX @@ static void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2, | ||
28 | } | 24 | } |
29 | 25 | ||
30 | /* generate the helpers for OPIVV */ | 26 | -static bool do_opivx_widen(DisasContext *s, arg_rmrr *a, |
31 | -#define GEN_VEXT_VV(NAME, ESZ, DSZ) \ | 27 | - gen_helper_opivx *fn) |
32 | +#define GEN_VEXT_VV(NAME) \ | 28 | -{ |
33 | void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | 29 | - if (opivx_widen_check(s, a)) { |
34 | void *vs2, CPURISCVState *env, \ | 30 | - return opivx_trans(a->rd, a->rs1, a->rs2, a->vm, fn, s); |
35 | uint32_t desc) \ | 31 | - } |
36 | { \ | 32 | - return false; |
37 | - do_vext_vv(vd, v0, vs1, vs2, env, desc, ESZ, DSZ, \ | 33 | -} |
38 | + do_vext_vv(vd, v0, vs1, vs2, env, desc, \ | 34 | - |
39 | do_##NAME); \ | 35 | -#define GEN_OPIVX_WIDEN_TRANS(NAME) \ |
36 | -static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
37 | -{ \ | ||
38 | - static gen_helper_opivx * const fns[3] = { \ | ||
39 | - gen_helper_##NAME##_b, \ | ||
40 | - gen_helper_##NAME##_h, \ | ||
41 | - gen_helper_##NAME##_w \ | ||
42 | - }; \ | ||
43 | - return do_opivx_widen(s, a, fns[s->sew]); \ | ||
44 | +#define GEN_OPIVX_WIDEN_TRANS(NAME, CHECK) \ | ||
45 | +static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
46 | +{ \ | ||
47 | + if (CHECK(s, a)) { \ | ||
48 | + static gen_helper_opivx * const fns[3] = { \ | ||
49 | + gen_helper_##NAME##_b, \ | ||
50 | + gen_helper_##NAME##_h, \ | ||
51 | + gen_helper_##NAME##_w \ | ||
52 | + }; \ | ||
53 | + return opivx_trans(a->rd, a->rs1, a->rs2, a->vm, fns[s->sew], s); \ | ||
54 | + } \ | ||
55 | + return false; \ | ||
40 | } | 56 | } |
41 | 57 | ||
42 | -GEN_VEXT_VV(vadd_vv_b, 1, 1) | 58 | -GEN_OPIVX_WIDEN_TRANS(vwaddu_vx) |
43 | -GEN_VEXT_VV(vadd_vv_h, 2, 2) | 59 | -GEN_OPIVX_WIDEN_TRANS(vwadd_vx) |
44 | -GEN_VEXT_VV(vadd_vv_w, 4, 4) | 60 | -GEN_OPIVX_WIDEN_TRANS(vwsubu_vx) |
45 | -GEN_VEXT_VV(vadd_vv_d, 8, 8) | 61 | -GEN_OPIVX_WIDEN_TRANS(vwsub_vx) |
46 | -GEN_VEXT_VV(vsub_vv_b, 1, 1) | 62 | +GEN_OPIVX_WIDEN_TRANS(vwaddu_vx, opivx_widen_check) |
47 | -GEN_VEXT_VV(vsub_vv_h, 2, 2) | 63 | +GEN_OPIVX_WIDEN_TRANS(vwadd_vx, opivx_widen_check) |
48 | -GEN_VEXT_VV(vsub_vv_w, 4, 4) | 64 | +GEN_OPIVX_WIDEN_TRANS(vwsubu_vx, opivx_widen_check) |
49 | -GEN_VEXT_VV(vsub_vv_d, 8, 8) | 65 | +GEN_OPIVX_WIDEN_TRANS(vwsub_vx, opivx_widen_check) |
50 | +GEN_VEXT_VV(vadd_vv_b) | 66 | |
51 | +GEN_VEXT_VV(vadd_vv_h) | 67 | /* WIDEN OPIVV with WIDEN */ |
52 | +GEN_VEXT_VV(vadd_vv_w) | 68 | static bool opiwv_widen_check(DisasContext *s, arg_rmrr *a) |
53 | +GEN_VEXT_VV(vadd_vv_d) | 69 | @@ -XXX,XX +XXX,XX @@ GEN_OPIVX_TRANS(vrem_vx, opivx_check) |
54 | +GEN_VEXT_VV(vsub_vv_b) | 70 | GEN_OPIVV_WIDEN_TRANS(vwmul_vv, opivv_widen_check) |
55 | +GEN_VEXT_VV(vsub_vv_h) | 71 | GEN_OPIVV_WIDEN_TRANS(vwmulu_vv, opivv_widen_check) |
56 | +GEN_VEXT_VV(vsub_vv_w) | 72 | GEN_OPIVV_WIDEN_TRANS(vwmulsu_vv, opivv_widen_check) |
57 | +GEN_VEXT_VV(vsub_vv_d) | 73 | -GEN_OPIVX_WIDEN_TRANS(vwmul_vx) |
58 | 74 | -GEN_OPIVX_WIDEN_TRANS(vwmulu_vx) | |
59 | typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i); | 75 | -GEN_OPIVX_WIDEN_TRANS(vwmulsu_vx) |
60 | 76 | +GEN_OPIVX_WIDEN_TRANS(vwmul_vx, opivx_widen_check) | |
61 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vrsub_vx_d, OP_SSS_D, H8, H8, DO_RSUB) | 77 | +GEN_OPIVX_WIDEN_TRANS(vwmulu_vx, opivx_widen_check) |
62 | 78 | +GEN_OPIVX_WIDEN_TRANS(vwmulsu_vx, opivx_widen_check) | |
63 | static void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2, | ||
64 | CPURISCVState *env, uint32_t desc, | ||
65 | - uint32_t esz, uint32_t dsz, | ||
66 | opivx2_fn fn) | ||
67 | { | ||
68 | uint32_t vm = vext_vm(desc); | ||
69 | @@ -XXX,XX +XXX,XX @@ static void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2, | ||
70 | } | ||
71 | |||
72 | /* generate the helpers for OPIVX */ | ||
73 | -#define GEN_VEXT_VX(NAME, ESZ, DSZ) \ | ||
74 | +#define GEN_VEXT_VX(NAME) \ | ||
75 | void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | ||
76 | void *vs2, CPURISCVState *env, \ | ||
77 | uint32_t desc) \ | ||
78 | { \ | ||
79 | - do_vext_vx(vd, v0, s1, vs2, env, desc, ESZ, DSZ, \ | ||
80 | + do_vext_vx(vd, v0, s1, vs2, env, desc, \ | ||
81 | do_##NAME); \ | ||
82 | } | ||
83 | |||
84 | -GEN_VEXT_VX(vadd_vx_b, 1, 1) | ||
85 | -GEN_VEXT_VX(vadd_vx_h, 2, 2) | ||
86 | -GEN_VEXT_VX(vadd_vx_w, 4, 4) | ||
87 | -GEN_VEXT_VX(vadd_vx_d, 8, 8) | ||
88 | -GEN_VEXT_VX(vsub_vx_b, 1, 1) | ||
89 | -GEN_VEXT_VX(vsub_vx_h, 2, 2) | ||
90 | -GEN_VEXT_VX(vsub_vx_w, 4, 4) | ||
91 | -GEN_VEXT_VX(vsub_vx_d, 8, 8) | ||
92 | -GEN_VEXT_VX(vrsub_vx_b, 1, 1) | ||
93 | -GEN_VEXT_VX(vrsub_vx_h, 2, 2) | ||
94 | -GEN_VEXT_VX(vrsub_vx_w, 4, 4) | ||
95 | -GEN_VEXT_VX(vrsub_vx_d, 8, 8) | ||
96 | +GEN_VEXT_VX(vadd_vx_b) | ||
97 | +GEN_VEXT_VX(vadd_vx_h) | ||
98 | +GEN_VEXT_VX(vadd_vx_w) | ||
99 | +GEN_VEXT_VX(vadd_vx_d) | ||
100 | +GEN_VEXT_VX(vsub_vx_b) | ||
101 | +GEN_VEXT_VX(vsub_vx_h) | ||
102 | +GEN_VEXT_VX(vsub_vx_w) | ||
103 | +GEN_VEXT_VX(vsub_vx_d) | ||
104 | +GEN_VEXT_VX(vrsub_vx_b) | ||
105 | +GEN_VEXT_VX(vrsub_vx_h) | ||
106 | +GEN_VEXT_VX(vrsub_vx_w) | ||
107 | +GEN_VEXT_VX(vrsub_vx_d) | ||
108 | |||
109 | void HELPER(vec_rsubs8)(void *d, void *a, uint64_t b, uint32_t desc) | ||
110 | { | ||
111 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vwadd_wv_w, WOP_WSSS_W, H8, H4, H4, DO_ADD) | ||
112 | RVVCALL(OPIVV2, vwsub_wv_b, WOP_WSSS_B, H2, H1, H1, DO_SUB) | ||
113 | RVVCALL(OPIVV2, vwsub_wv_h, WOP_WSSS_H, H4, H2, H2, DO_SUB) | ||
114 | RVVCALL(OPIVV2, vwsub_wv_w, WOP_WSSS_W, H8, H4, H4, DO_SUB) | ||
115 | -GEN_VEXT_VV(vwaddu_vv_b, 1, 2) | ||
116 | -GEN_VEXT_VV(vwaddu_vv_h, 2, 4) | ||
117 | -GEN_VEXT_VV(vwaddu_vv_w, 4, 8) | ||
118 | -GEN_VEXT_VV(vwsubu_vv_b, 1, 2) | ||
119 | -GEN_VEXT_VV(vwsubu_vv_h, 2, 4) | ||
120 | -GEN_VEXT_VV(vwsubu_vv_w, 4, 8) | ||
121 | -GEN_VEXT_VV(vwadd_vv_b, 1, 2) | ||
122 | -GEN_VEXT_VV(vwadd_vv_h, 2, 4) | ||
123 | -GEN_VEXT_VV(vwadd_vv_w, 4, 8) | ||
124 | -GEN_VEXT_VV(vwsub_vv_b, 1, 2) | ||
125 | -GEN_VEXT_VV(vwsub_vv_h, 2, 4) | ||
126 | -GEN_VEXT_VV(vwsub_vv_w, 4, 8) | ||
127 | -GEN_VEXT_VV(vwaddu_wv_b, 1, 2) | ||
128 | -GEN_VEXT_VV(vwaddu_wv_h, 2, 4) | ||
129 | -GEN_VEXT_VV(vwaddu_wv_w, 4, 8) | ||
130 | -GEN_VEXT_VV(vwsubu_wv_b, 1, 2) | ||
131 | -GEN_VEXT_VV(vwsubu_wv_h, 2, 4) | ||
132 | -GEN_VEXT_VV(vwsubu_wv_w, 4, 8) | ||
133 | -GEN_VEXT_VV(vwadd_wv_b, 1, 2) | ||
134 | -GEN_VEXT_VV(vwadd_wv_h, 2, 4) | ||
135 | -GEN_VEXT_VV(vwadd_wv_w, 4, 8) | ||
136 | -GEN_VEXT_VV(vwsub_wv_b, 1, 2) | ||
137 | -GEN_VEXT_VV(vwsub_wv_h, 2, 4) | ||
138 | -GEN_VEXT_VV(vwsub_wv_w, 4, 8) | ||
139 | +GEN_VEXT_VV(vwaddu_vv_b) | ||
140 | +GEN_VEXT_VV(vwaddu_vv_h) | ||
141 | +GEN_VEXT_VV(vwaddu_vv_w) | ||
142 | +GEN_VEXT_VV(vwsubu_vv_b) | ||
143 | +GEN_VEXT_VV(vwsubu_vv_h) | ||
144 | +GEN_VEXT_VV(vwsubu_vv_w) | ||
145 | +GEN_VEXT_VV(vwadd_vv_b) | ||
146 | +GEN_VEXT_VV(vwadd_vv_h) | ||
147 | +GEN_VEXT_VV(vwadd_vv_w) | ||
148 | +GEN_VEXT_VV(vwsub_vv_b) | ||
149 | +GEN_VEXT_VV(vwsub_vv_h) | ||
150 | +GEN_VEXT_VV(vwsub_vv_w) | ||
151 | +GEN_VEXT_VV(vwaddu_wv_b) | ||
152 | +GEN_VEXT_VV(vwaddu_wv_h) | ||
153 | +GEN_VEXT_VV(vwaddu_wv_w) | ||
154 | +GEN_VEXT_VV(vwsubu_wv_b) | ||
155 | +GEN_VEXT_VV(vwsubu_wv_h) | ||
156 | +GEN_VEXT_VV(vwsubu_wv_w) | ||
157 | +GEN_VEXT_VV(vwadd_wv_b) | ||
158 | +GEN_VEXT_VV(vwadd_wv_h) | ||
159 | +GEN_VEXT_VV(vwadd_wv_w) | ||
160 | +GEN_VEXT_VV(vwsub_wv_b) | ||
161 | +GEN_VEXT_VV(vwsub_wv_h) | ||
162 | +GEN_VEXT_VV(vwsub_wv_w) | ||
163 | |||
164 | RVVCALL(OPIVX2, vwaddu_vx_b, WOP_UUU_B, H2, H1, DO_ADD) | ||
165 | RVVCALL(OPIVX2, vwaddu_vx_h, WOP_UUU_H, H4, H2, DO_ADD) | ||
166 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vwadd_wx_w, WOP_WSSS_W, H8, H4, DO_ADD) | ||
167 | RVVCALL(OPIVX2, vwsub_wx_b, WOP_WSSS_B, H2, H1, DO_SUB) | ||
168 | RVVCALL(OPIVX2, vwsub_wx_h, WOP_WSSS_H, H4, H2, DO_SUB) | ||
169 | RVVCALL(OPIVX2, vwsub_wx_w, WOP_WSSS_W, H8, H4, DO_SUB) | ||
170 | -GEN_VEXT_VX(vwaddu_vx_b, 1, 2) | ||
171 | -GEN_VEXT_VX(vwaddu_vx_h, 2, 4) | ||
172 | -GEN_VEXT_VX(vwaddu_vx_w, 4, 8) | ||
173 | -GEN_VEXT_VX(vwsubu_vx_b, 1, 2) | ||
174 | -GEN_VEXT_VX(vwsubu_vx_h, 2, 4) | ||
175 | -GEN_VEXT_VX(vwsubu_vx_w, 4, 8) | ||
176 | -GEN_VEXT_VX(vwadd_vx_b, 1, 2) | ||
177 | -GEN_VEXT_VX(vwadd_vx_h, 2, 4) | ||
178 | -GEN_VEXT_VX(vwadd_vx_w, 4, 8) | ||
179 | -GEN_VEXT_VX(vwsub_vx_b, 1, 2) | ||
180 | -GEN_VEXT_VX(vwsub_vx_h, 2, 4) | ||
181 | -GEN_VEXT_VX(vwsub_vx_w, 4, 8) | ||
182 | -GEN_VEXT_VX(vwaddu_wx_b, 1, 2) | ||
183 | -GEN_VEXT_VX(vwaddu_wx_h, 2, 4) | ||
184 | -GEN_VEXT_VX(vwaddu_wx_w, 4, 8) | ||
185 | -GEN_VEXT_VX(vwsubu_wx_b, 1, 2) | ||
186 | -GEN_VEXT_VX(vwsubu_wx_h, 2, 4) | ||
187 | -GEN_VEXT_VX(vwsubu_wx_w, 4, 8) | ||
188 | -GEN_VEXT_VX(vwadd_wx_b, 1, 2) | ||
189 | -GEN_VEXT_VX(vwadd_wx_h, 2, 4) | ||
190 | -GEN_VEXT_VX(vwadd_wx_w, 4, 8) | ||
191 | -GEN_VEXT_VX(vwsub_wx_b, 1, 2) | ||
192 | -GEN_VEXT_VX(vwsub_wx_h, 2, 4) | ||
193 | -GEN_VEXT_VX(vwsub_wx_w, 4, 8) | ||
194 | +GEN_VEXT_VX(vwaddu_vx_b) | ||
195 | +GEN_VEXT_VX(vwaddu_vx_h) | ||
196 | +GEN_VEXT_VX(vwaddu_vx_w) | ||
197 | +GEN_VEXT_VX(vwsubu_vx_b) | ||
198 | +GEN_VEXT_VX(vwsubu_vx_h) | ||
199 | +GEN_VEXT_VX(vwsubu_vx_w) | ||
200 | +GEN_VEXT_VX(vwadd_vx_b) | ||
201 | +GEN_VEXT_VX(vwadd_vx_h) | ||
202 | +GEN_VEXT_VX(vwadd_vx_w) | ||
203 | +GEN_VEXT_VX(vwsub_vx_b) | ||
204 | +GEN_VEXT_VX(vwsub_vx_h) | ||
205 | +GEN_VEXT_VX(vwsub_vx_w) | ||
206 | +GEN_VEXT_VX(vwaddu_wx_b) | ||
207 | +GEN_VEXT_VX(vwaddu_wx_h) | ||
208 | +GEN_VEXT_VX(vwaddu_wx_w) | ||
209 | +GEN_VEXT_VX(vwsubu_wx_b) | ||
210 | +GEN_VEXT_VX(vwsubu_wx_h) | ||
211 | +GEN_VEXT_VX(vwsubu_wx_w) | ||
212 | +GEN_VEXT_VX(vwadd_wx_b) | ||
213 | +GEN_VEXT_VX(vwadd_wx_h) | ||
214 | +GEN_VEXT_VX(vwadd_wx_w) | ||
215 | +GEN_VEXT_VX(vwsub_wx_b) | ||
216 | +GEN_VEXT_VX(vwsub_wx_h) | ||
217 | +GEN_VEXT_VX(vwsub_wx_w) | ||
218 | |||
219 | /* Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions */ | ||
220 | #define DO_VADC(N, M, C) (N + M + C) | ||
221 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vxor_vv_b, OP_SSS_B, H1, H1, H1, DO_XOR) | ||
222 | RVVCALL(OPIVV2, vxor_vv_h, OP_SSS_H, H2, H2, H2, DO_XOR) | ||
223 | RVVCALL(OPIVV2, vxor_vv_w, OP_SSS_W, H4, H4, H4, DO_XOR) | ||
224 | RVVCALL(OPIVV2, vxor_vv_d, OP_SSS_D, H8, H8, H8, DO_XOR) | ||
225 | -GEN_VEXT_VV(vand_vv_b, 1, 1) | ||
226 | -GEN_VEXT_VV(vand_vv_h, 2, 2) | ||
227 | -GEN_VEXT_VV(vand_vv_w, 4, 4) | ||
228 | -GEN_VEXT_VV(vand_vv_d, 8, 8) | ||
229 | -GEN_VEXT_VV(vor_vv_b, 1, 1) | ||
230 | -GEN_VEXT_VV(vor_vv_h, 2, 2) | ||
231 | -GEN_VEXT_VV(vor_vv_w, 4, 4) | ||
232 | -GEN_VEXT_VV(vor_vv_d, 8, 8) | ||
233 | -GEN_VEXT_VV(vxor_vv_b, 1, 1) | ||
234 | -GEN_VEXT_VV(vxor_vv_h, 2, 2) | ||
235 | -GEN_VEXT_VV(vxor_vv_w, 4, 4) | ||
236 | -GEN_VEXT_VV(vxor_vv_d, 8, 8) | ||
237 | +GEN_VEXT_VV(vand_vv_b) | ||
238 | +GEN_VEXT_VV(vand_vv_h) | ||
239 | +GEN_VEXT_VV(vand_vv_w) | ||
240 | +GEN_VEXT_VV(vand_vv_d) | ||
241 | +GEN_VEXT_VV(vor_vv_b) | ||
242 | +GEN_VEXT_VV(vor_vv_h) | ||
243 | +GEN_VEXT_VV(vor_vv_w) | ||
244 | +GEN_VEXT_VV(vor_vv_d) | ||
245 | +GEN_VEXT_VV(vxor_vv_b) | ||
246 | +GEN_VEXT_VV(vxor_vv_h) | ||
247 | +GEN_VEXT_VV(vxor_vv_w) | ||
248 | +GEN_VEXT_VV(vxor_vv_d) | ||
249 | |||
250 | RVVCALL(OPIVX2, vand_vx_b, OP_SSS_B, H1, H1, DO_AND) | ||
251 | RVVCALL(OPIVX2, vand_vx_h, OP_SSS_H, H2, H2, DO_AND) | ||
252 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vxor_vx_b, OP_SSS_B, H1, H1, DO_XOR) | ||
253 | RVVCALL(OPIVX2, vxor_vx_h, OP_SSS_H, H2, H2, DO_XOR) | ||
254 | RVVCALL(OPIVX2, vxor_vx_w, OP_SSS_W, H4, H4, DO_XOR) | ||
255 | RVVCALL(OPIVX2, vxor_vx_d, OP_SSS_D, H8, H8, DO_XOR) | ||
256 | -GEN_VEXT_VX(vand_vx_b, 1, 1) | ||
257 | -GEN_VEXT_VX(vand_vx_h, 2, 2) | ||
258 | -GEN_VEXT_VX(vand_vx_w, 4, 4) | ||
259 | -GEN_VEXT_VX(vand_vx_d, 8, 8) | ||
260 | -GEN_VEXT_VX(vor_vx_b, 1, 1) | ||
261 | -GEN_VEXT_VX(vor_vx_h, 2, 2) | ||
262 | -GEN_VEXT_VX(vor_vx_w, 4, 4) | ||
263 | -GEN_VEXT_VX(vor_vx_d, 8, 8) | ||
264 | -GEN_VEXT_VX(vxor_vx_b, 1, 1) | ||
265 | -GEN_VEXT_VX(vxor_vx_h, 2, 2) | ||
266 | -GEN_VEXT_VX(vxor_vx_w, 4, 4) | ||
267 | -GEN_VEXT_VX(vxor_vx_d, 8, 8) | ||
268 | +GEN_VEXT_VX(vand_vx_b) | ||
269 | +GEN_VEXT_VX(vand_vx_h) | ||
270 | +GEN_VEXT_VX(vand_vx_w) | ||
271 | +GEN_VEXT_VX(vand_vx_d) | ||
272 | +GEN_VEXT_VX(vor_vx_b) | ||
273 | +GEN_VEXT_VX(vor_vx_h) | ||
274 | +GEN_VEXT_VX(vor_vx_w) | ||
275 | +GEN_VEXT_VX(vor_vx_d) | ||
276 | +GEN_VEXT_VX(vxor_vx_b) | ||
277 | +GEN_VEXT_VX(vxor_vx_h) | ||
278 | +GEN_VEXT_VX(vxor_vx_w) | ||
279 | +GEN_VEXT_VX(vxor_vx_d) | ||
280 | |||
281 | /* Vector Single-Width Bit Shift Instructions */ | ||
282 | #define DO_SLL(N, M) (N << (M)) | ||
283 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vmax_vv_b, OP_SSS_B, H1, H1, H1, DO_MAX) | ||
284 | RVVCALL(OPIVV2, vmax_vv_h, OP_SSS_H, H2, H2, H2, DO_MAX) | ||
285 | RVVCALL(OPIVV2, vmax_vv_w, OP_SSS_W, H4, H4, H4, DO_MAX) | ||
286 | RVVCALL(OPIVV2, vmax_vv_d, OP_SSS_D, H8, H8, H8, DO_MAX) | ||
287 | -GEN_VEXT_VV(vminu_vv_b, 1, 1) | ||
288 | -GEN_VEXT_VV(vminu_vv_h, 2, 2) | ||
289 | -GEN_VEXT_VV(vminu_vv_w, 4, 4) | ||
290 | -GEN_VEXT_VV(vminu_vv_d, 8, 8) | ||
291 | -GEN_VEXT_VV(vmin_vv_b, 1, 1) | ||
292 | -GEN_VEXT_VV(vmin_vv_h, 2, 2) | ||
293 | -GEN_VEXT_VV(vmin_vv_w, 4, 4) | ||
294 | -GEN_VEXT_VV(vmin_vv_d, 8, 8) | ||
295 | -GEN_VEXT_VV(vmaxu_vv_b, 1, 1) | ||
296 | -GEN_VEXT_VV(vmaxu_vv_h, 2, 2) | ||
297 | -GEN_VEXT_VV(vmaxu_vv_w, 4, 4) | ||
298 | -GEN_VEXT_VV(vmaxu_vv_d, 8, 8) | ||
299 | -GEN_VEXT_VV(vmax_vv_b, 1, 1) | ||
300 | -GEN_VEXT_VV(vmax_vv_h, 2, 2) | ||
301 | -GEN_VEXT_VV(vmax_vv_w, 4, 4) | ||
302 | -GEN_VEXT_VV(vmax_vv_d, 8, 8) | ||
303 | +GEN_VEXT_VV(vminu_vv_b) | ||
304 | +GEN_VEXT_VV(vminu_vv_h) | ||
305 | +GEN_VEXT_VV(vminu_vv_w) | ||
306 | +GEN_VEXT_VV(vminu_vv_d) | ||
307 | +GEN_VEXT_VV(vmin_vv_b) | ||
308 | +GEN_VEXT_VV(vmin_vv_h) | ||
309 | +GEN_VEXT_VV(vmin_vv_w) | ||
310 | +GEN_VEXT_VV(vmin_vv_d) | ||
311 | +GEN_VEXT_VV(vmaxu_vv_b) | ||
312 | +GEN_VEXT_VV(vmaxu_vv_h) | ||
313 | +GEN_VEXT_VV(vmaxu_vv_w) | ||
314 | +GEN_VEXT_VV(vmaxu_vv_d) | ||
315 | +GEN_VEXT_VV(vmax_vv_b) | ||
316 | +GEN_VEXT_VV(vmax_vv_h) | ||
317 | +GEN_VEXT_VV(vmax_vv_w) | ||
318 | +GEN_VEXT_VV(vmax_vv_d) | ||
319 | |||
320 | RVVCALL(OPIVX2, vminu_vx_b, OP_UUU_B, H1, H1, DO_MIN) | ||
321 | RVVCALL(OPIVX2, vminu_vx_h, OP_UUU_H, H2, H2, DO_MIN) | ||
322 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vmax_vx_b, OP_SSS_B, H1, H1, DO_MAX) | ||
323 | RVVCALL(OPIVX2, vmax_vx_h, OP_SSS_H, H2, H2, DO_MAX) | ||
324 | RVVCALL(OPIVX2, vmax_vx_w, OP_SSS_W, H4, H4, DO_MAX) | ||
325 | RVVCALL(OPIVX2, vmax_vx_d, OP_SSS_D, H8, H8, DO_MAX) | ||
326 | -GEN_VEXT_VX(vminu_vx_b, 1, 1) | ||
327 | -GEN_VEXT_VX(vminu_vx_h, 2, 2) | ||
328 | -GEN_VEXT_VX(vminu_vx_w, 4, 4) | ||
329 | -GEN_VEXT_VX(vminu_vx_d, 8, 8) | ||
330 | -GEN_VEXT_VX(vmin_vx_b, 1, 1) | ||
331 | -GEN_VEXT_VX(vmin_vx_h, 2, 2) | ||
332 | -GEN_VEXT_VX(vmin_vx_w, 4, 4) | ||
333 | -GEN_VEXT_VX(vmin_vx_d, 8, 8) | ||
334 | -GEN_VEXT_VX(vmaxu_vx_b, 1, 1) | ||
335 | -GEN_VEXT_VX(vmaxu_vx_h, 2, 2) | ||
336 | -GEN_VEXT_VX(vmaxu_vx_w, 4, 4) | ||
337 | -GEN_VEXT_VX(vmaxu_vx_d, 8, 8) | ||
338 | -GEN_VEXT_VX(vmax_vx_b, 1, 1) | ||
339 | -GEN_VEXT_VX(vmax_vx_h, 2, 2) | ||
340 | -GEN_VEXT_VX(vmax_vx_w, 4, 4) | ||
341 | -GEN_VEXT_VX(vmax_vx_d, 8, 8) | ||
342 | +GEN_VEXT_VX(vminu_vx_b) | ||
343 | +GEN_VEXT_VX(vminu_vx_h) | ||
344 | +GEN_VEXT_VX(vminu_vx_w) | ||
345 | +GEN_VEXT_VX(vminu_vx_d) | ||
346 | +GEN_VEXT_VX(vmin_vx_b) | ||
347 | +GEN_VEXT_VX(vmin_vx_h) | ||
348 | +GEN_VEXT_VX(vmin_vx_w) | ||
349 | +GEN_VEXT_VX(vmin_vx_d) | ||
350 | +GEN_VEXT_VX(vmaxu_vx_b) | ||
351 | +GEN_VEXT_VX(vmaxu_vx_h) | ||
352 | +GEN_VEXT_VX(vmaxu_vx_w) | ||
353 | +GEN_VEXT_VX(vmaxu_vx_d) | ||
354 | +GEN_VEXT_VX(vmax_vx_b) | ||
355 | +GEN_VEXT_VX(vmax_vx_h) | ||
356 | +GEN_VEXT_VX(vmax_vx_w) | ||
357 | +GEN_VEXT_VX(vmax_vx_d) | ||
358 | |||
359 | /* Vector Single-Width Integer Multiply Instructions */ | ||
360 | #define DO_MUL(N, M) (N * M) | ||
361 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vmul_vv_b, OP_SSS_B, H1, H1, H1, DO_MUL) | ||
362 | RVVCALL(OPIVV2, vmul_vv_h, OP_SSS_H, H2, H2, H2, DO_MUL) | ||
363 | RVVCALL(OPIVV2, vmul_vv_w, OP_SSS_W, H4, H4, H4, DO_MUL) | ||
364 | RVVCALL(OPIVV2, vmul_vv_d, OP_SSS_D, H8, H8, H8, DO_MUL) | ||
365 | -GEN_VEXT_VV(vmul_vv_b, 1, 1) | ||
366 | -GEN_VEXT_VV(vmul_vv_h, 2, 2) | ||
367 | -GEN_VEXT_VV(vmul_vv_w, 4, 4) | ||
368 | -GEN_VEXT_VV(vmul_vv_d, 8, 8) | ||
369 | +GEN_VEXT_VV(vmul_vv_b) | ||
370 | +GEN_VEXT_VV(vmul_vv_h) | ||
371 | +GEN_VEXT_VV(vmul_vv_w) | ||
372 | +GEN_VEXT_VV(vmul_vv_d) | ||
373 | |||
374 | static int8_t do_mulh_b(int8_t s2, int8_t s1) | ||
375 | { | ||
376 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vmulhsu_vv_b, OP_SUS_B, H1, H1, H1, do_mulhsu_b) | ||
377 | RVVCALL(OPIVV2, vmulhsu_vv_h, OP_SUS_H, H2, H2, H2, do_mulhsu_h) | ||
378 | RVVCALL(OPIVV2, vmulhsu_vv_w, OP_SUS_W, H4, H4, H4, do_mulhsu_w) | ||
379 | RVVCALL(OPIVV2, vmulhsu_vv_d, OP_SUS_D, H8, H8, H8, do_mulhsu_d) | ||
380 | -GEN_VEXT_VV(vmulh_vv_b, 1, 1) | ||
381 | -GEN_VEXT_VV(vmulh_vv_h, 2, 2) | ||
382 | -GEN_VEXT_VV(vmulh_vv_w, 4, 4) | ||
383 | -GEN_VEXT_VV(vmulh_vv_d, 8, 8) | ||
384 | -GEN_VEXT_VV(vmulhu_vv_b, 1, 1) | ||
385 | -GEN_VEXT_VV(vmulhu_vv_h, 2, 2) | ||
386 | -GEN_VEXT_VV(vmulhu_vv_w, 4, 4) | ||
387 | -GEN_VEXT_VV(vmulhu_vv_d, 8, 8) | ||
388 | -GEN_VEXT_VV(vmulhsu_vv_b, 1, 1) | ||
389 | -GEN_VEXT_VV(vmulhsu_vv_h, 2, 2) | ||
390 | -GEN_VEXT_VV(vmulhsu_vv_w, 4, 4) | ||
391 | -GEN_VEXT_VV(vmulhsu_vv_d, 8, 8) | ||
392 | +GEN_VEXT_VV(vmulh_vv_b) | ||
393 | +GEN_VEXT_VV(vmulh_vv_h) | ||
394 | +GEN_VEXT_VV(vmulh_vv_w) | ||
395 | +GEN_VEXT_VV(vmulh_vv_d) | ||
396 | +GEN_VEXT_VV(vmulhu_vv_b) | ||
397 | +GEN_VEXT_VV(vmulhu_vv_h) | ||
398 | +GEN_VEXT_VV(vmulhu_vv_w) | ||
399 | +GEN_VEXT_VV(vmulhu_vv_d) | ||
400 | +GEN_VEXT_VV(vmulhsu_vv_b) | ||
401 | +GEN_VEXT_VV(vmulhsu_vv_h) | ||
402 | +GEN_VEXT_VV(vmulhsu_vv_w) | ||
403 | +GEN_VEXT_VV(vmulhsu_vv_d) | ||
404 | |||
405 | RVVCALL(OPIVX2, vmul_vx_b, OP_SSS_B, H1, H1, DO_MUL) | ||
406 | RVVCALL(OPIVX2, vmul_vx_h, OP_SSS_H, H2, H2, DO_MUL) | ||
407 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vmulhsu_vx_b, OP_SUS_B, H1, H1, do_mulhsu_b) | ||
408 | RVVCALL(OPIVX2, vmulhsu_vx_h, OP_SUS_H, H2, H2, do_mulhsu_h) | ||
409 | RVVCALL(OPIVX2, vmulhsu_vx_w, OP_SUS_W, H4, H4, do_mulhsu_w) | ||
410 | RVVCALL(OPIVX2, vmulhsu_vx_d, OP_SUS_D, H8, H8, do_mulhsu_d) | ||
411 | -GEN_VEXT_VX(vmul_vx_b, 1, 1) | ||
412 | -GEN_VEXT_VX(vmul_vx_h, 2, 2) | ||
413 | -GEN_VEXT_VX(vmul_vx_w, 4, 4) | ||
414 | -GEN_VEXT_VX(vmul_vx_d, 8, 8) | ||
415 | -GEN_VEXT_VX(vmulh_vx_b, 1, 1) | ||
416 | -GEN_VEXT_VX(vmulh_vx_h, 2, 2) | ||
417 | -GEN_VEXT_VX(vmulh_vx_w, 4, 4) | ||
418 | -GEN_VEXT_VX(vmulh_vx_d, 8, 8) | ||
419 | -GEN_VEXT_VX(vmulhu_vx_b, 1, 1) | ||
420 | -GEN_VEXT_VX(vmulhu_vx_h, 2, 2) | ||
421 | -GEN_VEXT_VX(vmulhu_vx_w, 4, 4) | ||
422 | -GEN_VEXT_VX(vmulhu_vx_d, 8, 8) | ||
423 | -GEN_VEXT_VX(vmulhsu_vx_b, 1, 1) | ||
424 | -GEN_VEXT_VX(vmulhsu_vx_h, 2, 2) | ||
425 | -GEN_VEXT_VX(vmulhsu_vx_w, 4, 4) | ||
426 | -GEN_VEXT_VX(vmulhsu_vx_d, 8, 8) | ||
427 | +GEN_VEXT_VX(vmul_vx_b) | ||
428 | +GEN_VEXT_VX(vmul_vx_h) | ||
429 | +GEN_VEXT_VX(vmul_vx_w) | ||
430 | +GEN_VEXT_VX(vmul_vx_d) | ||
431 | +GEN_VEXT_VX(vmulh_vx_b) | ||
432 | +GEN_VEXT_VX(vmulh_vx_h) | ||
433 | +GEN_VEXT_VX(vmulh_vx_w) | ||
434 | +GEN_VEXT_VX(vmulh_vx_d) | ||
435 | +GEN_VEXT_VX(vmulhu_vx_b) | ||
436 | +GEN_VEXT_VX(vmulhu_vx_h) | ||
437 | +GEN_VEXT_VX(vmulhu_vx_w) | ||
438 | +GEN_VEXT_VX(vmulhu_vx_d) | ||
439 | +GEN_VEXT_VX(vmulhsu_vx_b) | ||
440 | +GEN_VEXT_VX(vmulhsu_vx_h) | ||
441 | +GEN_VEXT_VX(vmulhsu_vx_w) | ||
442 | +GEN_VEXT_VX(vmulhsu_vx_d) | ||
443 | |||
444 | /* Vector Integer Divide Instructions */ | ||
445 | #define DO_DIVU(N, M) (unlikely(M == 0) ? (__typeof(N))(-1) : N / M) | ||
446 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vrem_vv_b, OP_SSS_B, H1, H1, H1, DO_REM) | ||
447 | RVVCALL(OPIVV2, vrem_vv_h, OP_SSS_H, H2, H2, H2, DO_REM) | ||
448 | RVVCALL(OPIVV2, vrem_vv_w, OP_SSS_W, H4, H4, H4, DO_REM) | ||
449 | RVVCALL(OPIVV2, vrem_vv_d, OP_SSS_D, H8, H8, H8, DO_REM) | ||
450 | -GEN_VEXT_VV(vdivu_vv_b, 1, 1) | ||
451 | -GEN_VEXT_VV(vdivu_vv_h, 2, 2) | ||
452 | -GEN_VEXT_VV(vdivu_vv_w, 4, 4) | ||
453 | -GEN_VEXT_VV(vdivu_vv_d, 8, 8) | ||
454 | -GEN_VEXT_VV(vdiv_vv_b, 1, 1) | ||
455 | -GEN_VEXT_VV(vdiv_vv_h, 2, 2) | ||
456 | -GEN_VEXT_VV(vdiv_vv_w, 4, 4) | ||
457 | -GEN_VEXT_VV(vdiv_vv_d, 8, 8) | ||
458 | -GEN_VEXT_VV(vremu_vv_b, 1, 1) | ||
459 | -GEN_VEXT_VV(vremu_vv_h, 2, 2) | ||
460 | -GEN_VEXT_VV(vremu_vv_w, 4, 4) | ||
461 | -GEN_VEXT_VV(vremu_vv_d, 8, 8) | ||
462 | -GEN_VEXT_VV(vrem_vv_b, 1, 1) | ||
463 | -GEN_VEXT_VV(vrem_vv_h, 2, 2) | ||
464 | -GEN_VEXT_VV(vrem_vv_w, 4, 4) | ||
465 | -GEN_VEXT_VV(vrem_vv_d, 8, 8) | ||
466 | +GEN_VEXT_VV(vdivu_vv_b) | ||
467 | +GEN_VEXT_VV(vdivu_vv_h) | ||
468 | +GEN_VEXT_VV(vdivu_vv_w) | ||
469 | +GEN_VEXT_VV(vdivu_vv_d) | ||
470 | +GEN_VEXT_VV(vdiv_vv_b) | ||
471 | +GEN_VEXT_VV(vdiv_vv_h) | ||
472 | +GEN_VEXT_VV(vdiv_vv_w) | ||
473 | +GEN_VEXT_VV(vdiv_vv_d) | ||
474 | +GEN_VEXT_VV(vremu_vv_b) | ||
475 | +GEN_VEXT_VV(vremu_vv_h) | ||
476 | +GEN_VEXT_VV(vremu_vv_w) | ||
477 | +GEN_VEXT_VV(vremu_vv_d) | ||
478 | +GEN_VEXT_VV(vrem_vv_b) | ||
479 | +GEN_VEXT_VV(vrem_vv_h) | ||
480 | +GEN_VEXT_VV(vrem_vv_w) | ||
481 | +GEN_VEXT_VV(vrem_vv_d) | ||
482 | |||
483 | RVVCALL(OPIVX2, vdivu_vx_b, OP_UUU_B, H1, H1, DO_DIVU) | ||
484 | RVVCALL(OPIVX2, vdivu_vx_h, OP_UUU_H, H2, H2, DO_DIVU) | ||
485 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vrem_vx_b, OP_SSS_B, H1, H1, DO_REM) | ||
486 | RVVCALL(OPIVX2, vrem_vx_h, OP_SSS_H, H2, H2, DO_REM) | ||
487 | RVVCALL(OPIVX2, vrem_vx_w, OP_SSS_W, H4, H4, DO_REM) | ||
488 | RVVCALL(OPIVX2, vrem_vx_d, OP_SSS_D, H8, H8, DO_REM) | ||
489 | -GEN_VEXT_VX(vdivu_vx_b, 1, 1) | ||
490 | -GEN_VEXT_VX(vdivu_vx_h, 2, 2) | ||
491 | -GEN_VEXT_VX(vdivu_vx_w, 4, 4) | ||
492 | -GEN_VEXT_VX(vdivu_vx_d, 8, 8) | ||
493 | -GEN_VEXT_VX(vdiv_vx_b, 1, 1) | ||
494 | -GEN_VEXT_VX(vdiv_vx_h, 2, 2) | ||
495 | -GEN_VEXT_VX(vdiv_vx_w, 4, 4) | ||
496 | -GEN_VEXT_VX(vdiv_vx_d, 8, 8) | ||
497 | -GEN_VEXT_VX(vremu_vx_b, 1, 1) | ||
498 | -GEN_VEXT_VX(vremu_vx_h, 2, 2) | ||
499 | -GEN_VEXT_VX(vremu_vx_w, 4, 4) | ||
500 | -GEN_VEXT_VX(vremu_vx_d, 8, 8) | ||
501 | -GEN_VEXT_VX(vrem_vx_b, 1, 1) | ||
502 | -GEN_VEXT_VX(vrem_vx_h, 2, 2) | ||
503 | -GEN_VEXT_VX(vrem_vx_w, 4, 4) | ||
504 | -GEN_VEXT_VX(vrem_vx_d, 8, 8) | ||
505 | +GEN_VEXT_VX(vdivu_vx_b) | ||
506 | +GEN_VEXT_VX(vdivu_vx_h) | ||
507 | +GEN_VEXT_VX(vdivu_vx_w) | ||
508 | +GEN_VEXT_VX(vdivu_vx_d) | ||
509 | +GEN_VEXT_VX(vdiv_vx_b) | ||
510 | +GEN_VEXT_VX(vdiv_vx_h) | ||
511 | +GEN_VEXT_VX(vdiv_vx_w) | ||
512 | +GEN_VEXT_VX(vdiv_vx_d) | ||
513 | +GEN_VEXT_VX(vremu_vx_b) | ||
514 | +GEN_VEXT_VX(vremu_vx_h) | ||
515 | +GEN_VEXT_VX(vremu_vx_w) | ||
516 | +GEN_VEXT_VX(vremu_vx_d) | ||
517 | +GEN_VEXT_VX(vrem_vx_b) | ||
518 | +GEN_VEXT_VX(vrem_vx_h) | ||
519 | +GEN_VEXT_VX(vrem_vx_w) | ||
520 | +GEN_VEXT_VX(vrem_vx_d) | ||
521 | |||
522 | /* Vector Widening Integer Multiply Instructions */ | ||
523 | RVVCALL(OPIVV2, vwmul_vv_b, WOP_SSS_B, H2, H1, H1, DO_MUL) | ||
524 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vwmulu_vv_w, WOP_UUU_W, H8, H4, H4, DO_MUL) | ||
525 | RVVCALL(OPIVV2, vwmulsu_vv_b, WOP_SUS_B, H2, H1, H1, DO_MUL) | ||
526 | RVVCALL(OPIVV2, vwmulsu_vv_h, WOP_SUS_H, H4, H2, H2, DO_MUL) | ||
527 | RVVCALL(OPIVV2, vwmulsu_vv_w, WOP_SUS_W, H8, H4, H4, DO_MUL) | ||
528 | -GEN_VEXT_VV(vwmul_vv_b, 1, 2) | ||
529 | -GEN_VEXT_VV(vwmul_vv_h, 2, 4) | ||
530 | -GEN_VEXT_VV(vwmul_vv_w, 4, 8) | ||
531 | -GEN_VEXT_VV(vwmulu_vv_b, 1, 2) | ||
532 | -GEN_VEXT_VV(vwmulu_vv_h, 2, 4) | ||
533 | -GEN_VEXT_VV(vwmulu_vv_w, 4, 8) | ||
534 | -GEN_VEXT_VV(vwmulsu_vv_b, 1, 2) | ||
535 | -GEN_VEXT_VV(vwmulsu_vv_h, 2, 4) | ||
536 | -GEN_VEXT_VV(vwmulsu_vv_w, 4, 8) | ||
537 | +GEN_VEXT_VV(vwmul_vv_b) | ||
538 | +GEN_VEXT_VV(vwmul_vv_h) | ||
539 | +GEN_VEXT_VV(vwmul_vv_w) | ||
540 | +GEN_VEXT_VV(vwmulu_vv_b) | ||
541 | +GEN_VEXT_VV(vwmulu_vv_h) | ||
542 | +GEN_VEXT_VV(vwmulu_vv_w) | ||
543 | +GEN_VEXT_VV(vwmulsu_vv_b) | ||
544 | +GEN_VEXT_VV(vwmulsu_vv_h) | ||
545 | +GEN_VEXT_VV(vwmulsu_vv_w) | ||
546 | |||
547 | RVVCALL(OPIVX2, vwmul_vx_b, WOP_SSS_B, H2, H1, DO_MUL) | ||
548 | RVVCALL(OPIVX2, vwmul_vx_h, WOP_SSS_H, H4, H2, DO_MUL) | ||
549 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vwmulu_vx_w, WOP_UUU_W, H8, H4, DO_MUL) | ||
550 | RVVCALL(OPIVX2, vwmulsu_vx_b, WOP_SUS_B, H2, H1, DO_MUL) | ||
551 | RVVCALL(OPIVX2, vwmulsu_vx_h, WOP_SUS_H, H4, H2, DO_MUL) | ||
552 | RVVCALL(OPIVX2, vwmulsu_vx_w, WOP_SUS_W, H8, H4, DO_MUL) | ||
553 | -GEN_VEXT_VX(vwmul_vx_b, 1, 2) | ||
554 | -GEN_VEXT_VX(vwmul_vx_h, 2, 4) | ||
555 | -GEN_VEXT_VX(vwmul_vx_w, 4, 8) | ||
556 | -GEN_VEXT_VX(vwmulu_vx_b, 1, 2) | ||
557 | -GEN_VEXT_VX(vwmulu_vx_h, 2, 4) | ||
558 | -GEN_VEXT_VX(vwmulu_vx_w, 4, 8) | ||
559 | -GEN_VEXT_VX(vwmulsu_vx_b, 1, 2) | ||
560 | -GEN_VEXT_VX(vwmulsu_vx_h, 2, 4) | ||
561 | -GEN_VEXT_VX(vwmulsu_vx_w, 4, 8) | ||
562 | +GEN_VEXT_VX(vwmul_vx_b) | ||
563 | +GEN_VEXT_VX(vwmul_vx_h) | ||
564 | +GEN_VEXT_VX(vwmul_vx_w) | ||
565 | +GEN_VEXT_VX(vwmulu_vx_b) | ||
566 | +GEN_VEXT_VX(vwmulu_vx_h) | ||
567 | +GEN_VEXT_VX(vwmulu_vx_w) | ||
568 | +GEN_VEXT_VX(vwmulsu_vx_b) | ||
569 | +GEN_VEXT_VX(vwmulsu_vx_h) | ||
570 | +GEN_VEXT_VX(vwmulsu_vx_w) | ||
571 | 79 | ||
572 | /* Vector Single-Width Integer Multiply-Add Instructions */ | 80 | /* Vector Single-Width Integer Multiply-Add Instructions */ |
573 | #define OPIVV3(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP) \ | 81 | GEN_OPIVV_TRANS(vmacc_vv, opivv_check) |
574 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV3, vnmsub_vv_b, OP_SSS_B, H1, H1, H1, DO_NMSUB) | 82 | @@ -XXX,XX +XXX,XX @@ GEN_OPIVX_TRANS(vnmsub_vx, opivx_check) |
575 | RVVCALL(OPIVV3, vnmsub_vv_h, OP_SSS_H, H2, H2, H2, DO_NMSUB) | 83 | GEN_OPIVV_WIDEN_TRANS(vwmaccu_vv, opivv_widen_check) |
576 | RVVCALL(OPIVV3, vnmsub_vv_w, OP_SSS_W, H4, H4, H4, DO_NMSUB) | 84 | GEN_OPIVV_WIDEN_TRANS(vwmacc_vv, opivv_widen_check) |
577 | RVVCALL(OPIVV3, vnmsub_vv_d, OP_SSS_D, H8, H8, H8, DO_NMSUB) | 85 | GEN_OPIVV_WIDEN_TRANS(vwmaccsu_vv, opivv_widen_check) |
578 | -GEN_VEXT_VV(vmacc_vv_b, 1, 1) | 86 | -GEN_OPIVX_WIDEN_TRANS(vwmaccu_vx) |
579 | -GEN_VEXT_VV(vmacc_vv_h, 2, 2) | 87 | -GEN_OPIVX_WIDEN_TRANS(vwmacc_vx) |
580 | -GEN_VEXT_VV(vmacc_vv_w, 4, 4) | 88 | -GEN_OPIVX_WIDEN_TRANS(vwmaccsu_vx) |
581 | -GEN_VEXT_VV(vmacc_vv_d, 8, 8) | 89 | -GEN_OPIVX_WIDEN_TRANS(vwmaccus_vx) |
582 | -GEN_VEXT_VV(vnmsac_vv_b, 1, 1) | 90 | +GEN_OPIVX_WIDEN_TRANS(vwmaccu_vx, opivx_widen_check) |
583 | -GEN_VEXT_VV(vnmsac_vv_h, 2, 2) | 91 | +GEN_OPIVX_WIDEN_TRANS(vwmacc_vx, opivx_widen_check) |
584 | -GEN_VEXT_VV(vnmsac_vv_w, 4, 4) | 92 | +GEN_OPIVX_WIDEN_TRANS(vwmaccsu_vx, opivx_widen_check) |
585 | -GEN_VEXT_VV(vnmsac_vv_d, 8, 8) | 93 | +GEN_OPIVX_WIDEN_TRANS(vwmaccus_vx, opivx_widen_check) |
586 | -GEN_VEXT_VV(vmadd_vv_b, 1, 1) | ||
587 | -GEN_VEXT_VV(vmadd_vv_h, 2, 2) | ||
588 | -GEN_VEXT_VV(vmadd_vv_w, 4, 4) | ||
589 | -GEN_VEXT_VV(vmadd_vv_d, 8, 8) | ||
590 | -GEN_VEXT_VV(vnmsub_vv_b, 1, 1) | ||
591 | -GEN_VEXT_VV(vnmsub_vv_h, 2, 2) | ||
592 | -GEN_VEXT_VV(vnmsub_vv_w, 4, 4) | ||
593 | -GEN_VEXT_VV(vnmsub_vv_d, 8, 8) | ||
594 | +GEN_VEXT_VV(vmacc_vv_b) | ||
595 | +GEN_VEXT_VV(vmacc_vv_h) | ||
596 | +GEN_VEXT_VV(vmacc_vv_w) | ||
597 | +GEN_VEXT_VV(vmacc_vv_d) | ||
598 | +GEN_VEXT_VV(vnmsac_vv_b) | ||
599 | +GEN_VEXT_VV(vnmsac_vv_h) | ||
600 | +GEN_VEXT_VV(vnmsac_vv_w) | ||
601 | +GEN_VEXT_VV(vnmsac_vv_d) | ||
602 | +GEN_VEXT_VV(vmadd_vv_b) | ||
603 | +GEN_VEXT_VV(vmadd_vv_h) | ||
604 | +GEN_VEXT_VV(vmadd_vv_w) | ||
605 | +GEN_VEXT_VV(vmadd_vv_d) | ||
606 | +GEN_VEXT_VV(vnmsub_vv_b) | ||
607 | +GEN_VEXT_VV(vnmsub_vv_h) | ||
608 | +GEN_VEXT_VV(vnmsub_vv_w) | ||
609 | +GEN_VEXT_VV(vnmsub_vv_d) | ||
610 | |||
611 | #define OPIVX3(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP) \ | ||
612 | static void do_##NAME(void *vd, target_long s1, void *vs2, int i) \ | ||
613 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX3, vnmsub_vx_b, OP_SSS_B, H1, H1, DO_NMSUB) | ||
614 | RVVCALL(OPIVX3, vnmsub_vx_h, OP_SSS_H, H2, H2, DO_NMSUB) | ||
615 | RVVCALL(OPIVX3, vnmsub_vx_w, OP_SSS_W, H4, H4, DO_NMSUB) | ||
616 | RVVCALL(OPIVX3, vnmsub_vx_d, OP_SSS_D, H8, H8, DO_NMSUB) | ||
617 | -GEN_VEXT_VX(vmacc_vx_b, 1, 1) | ||
618 | -GEN_VEXT_VX(vmacc_vx_h, 2, 2) | ||
619 | -GEN_VEXT_VX(vmacc_vx_w, 4, 4) | ||
620 | -GEN_VEXT_VX(vmacc_vx_d, 8, 8) | ||
621 | -GEN_VEXT_VX(vnmsac_vx_b, 1, 1) | ||
622 | -GEN_VEXT_VX(vnmsac_vx_h, 2, 2) | ||
623 | -GEN_VEXT_VX(vnmsac_vx_w, 4, 4) | ||
624 | -GEN_VEXT_VX(vnmsac_vx_d, 8, 8) | ||
625 | -GEN_VEXT_VX(vmadd_vx_b, 1, 1) | ||
626 | -GEN_VEXT_VX(vmadd_vx_h, 2, 2) | ||
627 | -GEN_VEXT_VX(vmadd_vx_w, 4, 4) | ||
628 | -GEN_VEXT_VX(vmadd_vx_d, 8, 8) | ||
629 | -GEN_VEXT_VX(vnmsub_vx_b, 1, 1) | ||
630 | -GEN_VEXT_VX(vnmsub_vx_h, 2, 2) | ||
631 | -GEN_VEXT_VX(vnmsub_vx_w, 4, 4) | ||
632 | -GEN_VEXT_VX(vnmsub_vx_d, 8, 8) | ||
633 | +GEN_VEXT_VX(vmacc_vx_b) | ||
634 | +GEN_VEXT_VX(vmacc_vx_h) | ||
635 | +GEN_VEXT_VX(vmacc_vx_w) | ||
636 | +GEN_VEXT_VX(vmacc_vx_d) | ||
637 | +GEN_VEXT_VX(vnmsac_vx_b) | ||
638 | +GEN_VEXT_VX(vnmsac_vx_h) | ||
639 | +GEN_VEXT_VX(vnmsac_vx_w) | ||
640 | +GEN_VEXT_VX(vnmsac_vx_d) | ||
641 | +GEN_VEXT_VX(vmadd_vx_b) | ||
642 | +GEN_VEXT_VX(vmadd_vx_h) | ||
643 | +GEN_VEXT_VX(vmadd_vx_w) | ||
644 | +GEN_VEXT_VX(vmadd_vx_d) | ||
645 | +GEN_VEXT_VX(vnmsub_vx_b) | ||
646 | +GEN_VEXT_VX(vnmsub_vx_h) | ||
647 | +GEN_VEXT_VX(vnmsub_vx_w) | ||
648 | +GEN_VEXT_VX(vnmsub_vx_d) | ||
649 | |||
650 | /* Vector Widening Integer Multiply-Add Instructions */ | ||
651 | RVVCALL(OPIVV3, vwmaccu_vv_b, WOP_UUU_B, H2, H1, H1, DO_MACC) | ||
652 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV3, vwmacc_vv_w, WOP_SSS_W, H8, H4, H4, DO_MACC) | ||
653 | RVVCALL(OPIVV3, vwmaccsu_vv_b, WOP_SSU_B, H2, H1, H1, DO_MACC) | ||
654 | RVVCALL(OPIVV3, vwmaccsu_vv_h, WOP_SSU_H, H4, H2, H2, DO_MACC) | ||
655 | RVVCALL(OPIVV3, vwmaccsu_vv_w, WOP_SSU_W, H8, H4, H4, DO_MACC) | ||
656 | -GEN_VEXT_VV(vwmaccu_vv_b, 1, 2) | ||
657 | -GEN_VEXT_VV(vwmaccu_vv_h, 2, 4) | ||
658 | -GEN_VEXT_VV(vwmaccu_vv_w, 4, 8) | ||
659 | -GEN_VEXT_VV(vwmacc_vv_b, 1, 2) | ||
660 | -GEN_VEXT_VV(vwmacc_vv_h, 2, 4) | ||
661 | -GEN_VEXT_VV(vwmacc_vv_w, 4, 8) | ||
662 | -GEN_VEXT_VV(vwmaccsu_vv_b, 1, 2) | ||
663 | -GEN_VEXT_VV(vwmaccsu_vv_h, 2, 4) | ||
664 | -GEN_VEXT_VV(vwmaccsu_vv_w, 4, 8) | ||
665 | +GEN_VEXT_VV(vwmaccu_vv_b) | ||
666 | +GEN_VEXT_VV(vwmaccu_vv_h) | ||
667 | +GEN_VEXT_VV(vwmaccu_vv_w) | ||
668 | +GEN_VEXT_VV(vwmacc_vv_b) | ||
669 | +GEN_VEXT_VV(vwmacc_vv_h) | ||
670 | +GEN_VEXT_VV(vwmacc_vv_w) | ||
671 | +GEN_VEXT_VV(vwmaccsu_vv_b) | ||
672 | +GEN_VEXT_VV(vwmaccsu_vv_h) | ||
673 | +GEN_VEXT_VV(vwmaccsu_vv_w) | ||
674 | |||
675 | RVVCALL(OPIVX3, vwmaccu_vx_b, WOP_UUU_B, H2, H1, DO_MACC) | ||
676 | RVVCALL(OPIVX3, vwmaccu_vx_h, WOP_UUU_H, H4, H2, DO_MACC) | ||
677 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX3, vwmaccsu_vx_w, WOP_SSU_W, H8, H4, DO_MACC) | ||
678 | RVVCALL(OPIVX3, vwmaccus_vx_b, WOP_SUS_B, H2, H1, DO_MACC) | ||
679 | RVVCALL(OPIVX3, vwmaccus_vx_h, WOP_SUS_H, H4, H2, DO_MACC) | ||
680 | RVVCALL(OPIVX3, vwmaccus_vx_w, WOP_SUS_W, H8, H4, DO_MACC) | ||
681 | -GEN_VEXT_VX(vwmaccu_vx_b, 1, 2) | ||
682 | -GEN_VEXT_VX(vwmaccu_vx_h, 2, 4) | ||
683 | -GEN_VEXT_VX(vwmaccu_vx_w, 4, 8) | ||
684 | -GEN_VEXT_VX(vwmacc_vx_b, 1, 2) | ||
685 | -GEN_VEXT_VX(vwmacc_vx_h, 2, 4) | ||
686 | -GEN_VEXT_VX(vwmacc_vx_w, 4, 8) | ||
687 | -GEN_VEXT_VX(vwmaccsu_vx_b, 1, 2) | ||
688 | -GEN_VEXT_VX(vwmaccsu_vx_h, 2, 4) | ||
689 | -GEN_VEXT_VX(vwmaccsu_vx_w, 4, 8) | ||
690 | -GEN_VEXT_VX(vwmaccus_vx_b, 1, 2) | ||
691 | -GEN_VEXT_VX(vwmaccus_vx_h, 2, 4) | ||
692 | -GEN_VEXT_VX(vwmaccus_vx_w, 4, 8) | ||
693 | +GEN_VEXT_VX(vwmaccu_vx_b) | ||
694 | +GEN_VEXT_VX(vwmaccu_vx_h) | ||
695 | +GEN_VEXT_VX(vwmaccu_vx_w) | ||
696 | +GEN_VEXT_VX(vwmacc_vx_b) | ||
697 | +GEN_VEXT_VX(vwmacc_vx_h) | ||
698 | +GEN_VEXT_VX(vwmacc_vx_w) | ||
699 | +GEN_VEXT_VX(vwmaccsu_vx_b) | ||
700 | +GEN_VEXT_VX(vwmaccsu_vx_h) | ||
701 | +GEN_VEXT_VX(vwmaccsu_vx_w) | ||
702 | +GEN_VEXT_VX(vwmaccus_vx_b) | ||
703 | +GEN_VEXT_VX(vwmaccus_vx_h) | ||
704 | +GEN_VEXT_VX(vwmaccus_vx_w) | ||
705 | 94 | ||
706 | /* Vector Integer Merge and Move Instructions */ | 95 | /* Vector Integer Merge and Move Instructions */ |
707 | #define GEN_VEXT_VMV_VV(NAME, ETYPE, H) \ | 96 | static bool trans_vmv_v_v(DisasContext *s, arg_vmv_v_v *a) |
708 | @@ -XXX,XX +XXX,XX @@ vext_vv_rm_1(void *vd, void *v0, void *vs1, void *vs2, | ||
709 | static inline void | ||
710 | vext_vv_rm_2(void *vd, void *v0, void *vs1, void *vs2, | ||
711 | CPURISCVState *env, | ||
712 | - uint32_t desc, uint32_t esz, uint32_t dsz, | ||
713 | + uint32_t desc, | ||
714 | opivv2_rm_fn *fn) | ||
715 | { | ||
716 | uint32_t vm = vext_vm(desc); | ||
717 | @@ -XXX,XX +XXX,XX @@ vext_vv_rm_2(void *vd, void *v0, void *vs1, void *vs2, | ||
718 | } | ||
719 | |||
720 | /* generate helpers for fixed point instructions with OPIVV format */ | ||
721 | -#define GEN_VEXT_VV_RM(NAME, ESZ, DSZ) \ | ||
722 | +#define GEN_VEXT_VV_RM(NAME) \ | ||
723 | void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | ||
724 | CPURISCVState *env, uint32_t desc) \ | ||
725 | { \ | ||
726 | - vext_vv_rm_2(vd, v0, vs1, vs2, env, desc, ESZ, DSZ, \ | ||
727 | + vext_vv_rm_2(vd, v0, vs1, vs2, env, desc, \ | ||
728 | do_##NAME); \ | ||
729 | } | ||
730 | |||
731 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vsaddu_vv_b, OP_UUU_B, H1, H1, H1, saddu8) | ||
732 | RVVCALL(OPIVV2_RM, vsaddu_vv_h, OP_UUU_H, H2, H2, H2, saddu16) | ||
733 | RVVCALL(OPIVV2_RM, vsaddu_vv_w, OP_UUU_W, H4, H4, H4, saddu32) | ||
734 | RVVCALL(OPIVV2_RM, vsaddu_vv_d, OP_UUU_D, H8, H8, H8, saddu64) | ||
735 | -GEN_VEXT_VV_RM(vsaddu_vv_b, 1, 1) | ||
736 | -GEN_VEXT_VV_RM(vsaddu_vv_h, 2, 2) | ||
737 | -GEN_VEXT_VV_RM(vsaddu_vv_w, 4, 4) | ||
738 | -GEN_VEXT_VV_RM(vsaddu_vv_d, 8, 8) | ||
739 | +GEN_VEXT_VV_RM(vsaddu_vv_b) | ||
740 | +GEN_VEXT_VV_RM(vsaddu_vv_h) | ||
741 | +GEN_VEXT_VV_RM(vsaddu_vv_w) | ||
742 | +GEN_VEXT_VV_RM(vsaddu_vv_d) | ||
743 | |||
744 | typedef void opivx2_rm_fn(void *vd, target_long s1, void *vs2, int i, | ||
745 | CPURISCVState *env, int vxrm); | ||
746 | @@ -XXX,XX +XXX,XX @@ vext_vx_rm_1(void *vd, void *v0, target_long s1, void *vs2, | ||
747 | static inline void | ||
748 | vext_vx_rm_2(void *vd, void *v0, target_long s1, void *vs2, | ||
749 | CPURISCVState *env, | ||
750 | - uint32_t desc, uint32_t esz, uint32_t dsz, | ||
751 | + uint32_t desc, | ||
752 | opivx2_rm_fn *fn) | ||
753 | { | ||
754 | uint32_t vm = vext_vm(desc); | ||
755 | @@ -XXX,XX +XXX,XX @@ vext_vx_rm_2(void *vd, void *v0, target_long s1, void *vs2, | ||
756 | } | ||
757 | |||
758 | /* generate helpers for fixed point instructions with OPIVX format */ | ||
759 | -#define GEN_VEXT_VX_RM(NAME, ESZ, DSZ) \ | ||
760 | +#define GEN_VEXT_VX_RM(NAME) \ | ||
761 | void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | ||
762 | void *vs2, CPURISCVState *env, uint32_t desc) \ | ||
763 | { \ | ||
764 | - vext_vx_rm_2(vd, v0, s1, vs2, env, desc, ESZ, DSZ, \ | ||
765 | + vext_vx_rm_2(vd, v0, s1, vs2, env, desc, \ | ||
766 | do_##NAME); \ | ||
767 | } | ||
768 | |||
769 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2_RM, vsaddu_vx_b, OP_UUU_B, H1, H1, saddu8) | ||
770 | RVVCALL(OPIVX2_RM, vsaddu_vx_h, OP_UUU_H, H2, H2, saddu16) | ||
771 | RVVCALL(OPIVX2_RM, vsaddu_vx_w, OP_UUU_W, H4, H4, saddu32) | ||
772 | RVVCALL(OPIVX2_RM, vsaddu_vx_d, OP_UUU_D, H8, H8, saddu64) | ||
773 | -GEN_VEXT_VX_RM(vsaddu_vx_b, 1, 1) | ||
774 | -GEN_VEXT_VX_RM(vsaddu_vx_h, 2, 2) | ||
775 | -GEN_VEXT_VX_RM(vsaddu_vx_w, 4, 4) | ||
776 | -GEN_VEXT_VX_RM(vsaddu_vx_d, 8, 8) | ||
777 | +GEN_VEXT_VX_RM(vsaddu_vx_b) | ||
778 | +GEN_VEXT_VX_RM(vsaddu_vx_h) | ||
779 | +GEN_VEXT_VX_RM(vsaddu_vx_w) | ||
780 | +GEN_VEXT_VX_RM(vsaddu_vx_d) | ||
781 | |||
782 | static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b) | ||
783 | { | ||
784 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vsadd_vv_b, OP_SSS_B, H1, H1, H1, sadd8) | ||
785 | RVVCALL(OPIVV2_RM, vsadd_vv_h, OP_SSS_H, H2, H2, H2, sadd16) | ||
786 | RVVCALL(OPIVV2_RM, vsadd_vv_w, OP_SSS_W, H4, H4, H4, sadd32) | ||
787 | RVVCALL(OPIVV2_RM, vsadd_vv_d, OP_SSS_D, H8, H8, H8, sadd64) | ||
788 | -GEN_VEXT_VV_RM(vsadd_vv_b, 1, 1) | ||
789 | -GEN_VEXT_VV_RM(vsadd_vv_h, 2, 2) | ||
790 | -GEN_VEXT_VV_RM(vsadd_vv_w, 4, 4) | ||
791 | -GEN_VEXT_VV_RM(vsadd_vv_d, 8, 8) | ||
792 | +GEN_VEXT_VV_RM(vsadd_vv_b) | ||
793 | +GEN_VEXT_VV_RM(vsadd_vv_h) | ||
794 | +GEN_VEXT_VV_RM(vsadd_vv_w) | ||
795 | +GEN_VEXT_VV_RM(vsadd_vv_d) | ||
796 | |||
797 | RVVCALL(OPIVX2_RM, vsadd_vx_b, OP_SSS_B, H1, H1, sadd8) | ||
798 | RVVCALL(OPIVX2_RM, vsadd_vx_h, OP_SSS_H, H2, H2, sadd16) | ||
799 | RVVCALL(OPIVX2_RM, vsadd_vx_w, OP_SSS_W, H4, H4, sadd32) | ||
800 | RVVCALL(OPIVX2_RM, vsadd_vx_d, OP_SSS_D, H8, H8, sadd64) | ||
801 | -GEN_VEXT_VX_RM(vsadd_vx_b, 1, 1) | ||
802 | -GEN_VEXT_VX_RM(vsadd_vx_h, 2, 2) | ||
803 | -GEN_VEXT_VX_RM(vsadd_vx_w, 4, 4) | ||
804 | -GEN_VEXT_VX_RM(vsadd_vx_d, 8, 8) | ||
805 | +GEN_VEXT_VX_RM(vsadd_vx_b) | ||
806 | +GEN_VEXT_VX_RM(vsadd_vx_h) | ||
807 | +GEN_VEXT_VX_RM(vsadd_vx_w) | ||
808 | +GEN_VEXT_VX_RM(vsadd_vx_d) | ||
809 | |||
810 | static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b) | ||
811 | { | ||
812 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vssubu_vv_b, OP_UUU_B, H1, H1, H1, ssubu8) | ||
813 | RVVCALL(OPIVV2_RM, vssubu_vv_h, OP_UUU_H, H2, H2, H2, ssubu16) | ||
814 | RVVCALL(OPIVV2_RM, vssubu_vv_w, OP_UUU_W, H4, H4, H4, ssubu32) | ||
815 | RVVCALL(OPIVV2_RM, vssubu_vv_d, OP_UUU_D, H8, H8, H8, ssubu64) | ||
816 | -GEN_VEXT_VV_RM(vssubu_vv_b, 1, 1) | ||
817 | -GEN_VEXT_VV_RM(vssubu_vv_h, 2, 2) | ||
818 | -GEN_VEXT_VV_RM(vssubu_vv_w, 4, 4) | ||
819 | -GEN_VEXT_VV_RM(vssubu_vv_d, 8, 8) | ||
820 | +GEN_VEXT_VV_RM(vssubu_vv_b) | ||
821 | +GEN_VEXT_VV_RM(vssubu_vv_h) | ||
822 | +GEN_VEXT_VV_RM(vssubu_vv_w) | ||
823 | +GEN_VEXT_VV_RM(vssubu_vv_d) | ||
824 | |||
825 | RVVCALL(OPIVX2_RM, vssubu_vx_b, OP_UUU_B, H1, H1, ssubu8) | ||
826 | RVVCALL(OPIVX2_RM, vssubu_vx_h, OP_UUU_H, H2, H2, ssubu16) | ||
827 | RVVCALL(OPIVX2_RM, vssubu_vx_w, OP_UUU_W, H4, H4, ssubu32) | ||
828 | RVVCALL(OPIVX2_RM, vssubu_vx_d, OP_UUU_D, H8, H8, ssubu64) | ||
829 | -GEN_VEXT_VX_RM(vssubu_vx_b, 1, 1) | ||
830 | -GEN_VEXT_VX_RM(vssubu_vx_h, 2, 2) | ||
831 | -GEN_VEXT_VX_RM(vssubu_vx_w, 4, 4) | ||
832 | -GEN_VEXT_VX_RM(vssubu_vx_d, 8, 8) | ||
833 | +GEN_VEXT_VX_RM(vssubu_vx_b) | ||
834 | +GEN_VEXT_VX_RM(vssubu_vx_h) | ||
835 | +GEN_VEXT_VX_RM(vssubu_vx_w) | ||
836 | +GEN_VEXT_VX_RM(vssubu_vx_d) | ||
837 | |||
838 | static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b) | ||
839 | { | ||
840 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vssub_vv_b, OP_SSS_B, H1, H1, H1, ssub8) | ||
841 | RVVCALL(OPIVV2_RM, vssub_vv_h, OP_SSS_H, H2, H2, H2, ssub16) | ||
842 | RVVCALL(OPIVV2_RM, vssub_vv_w, OP_SSS_W, H4, H4, H4, ssub32) | ||
843 | RVVCALL(OPIVV2_RM, vssub_vv_d, OP_SSS_D, H8, H8, H8, ssub64) | ||
844 | -GEN_VEXT_VV_RM(vssub_vv_b, 1, 1) | ||
845 | -GEN_VEXT_VV_RM(vssub_vv_h, 2, 2) | ||
846 | -GEN_VEXT_VV_RM(vssub_vv_w, 4, 4) | ||
847 | -GEN_VEXT_VV_RM(vssub_vv_d, 8, 8) | ||
848 | +GEN_VEXT_VV_RM(vssub_vv_b) | ||
849 | +GEN_VEXT_VV_RM(vssub_vv_h) | ||
850 | +GEN_VEXT_VV_RM(vssub_vv_w) | ||
851 | +GEN_VEXT_VV_RM(vssub_vv_d) | ||
852 | |||
853 | RVVCALL(OPIVX2_RM, vssub_vx_b, OP_SSS_B, H1, H1, ssub8) | ||
854 | RVVCALL(OPIVX2_RM, vssub_vx_h, OP_SSS_H, H2, H2, ssub16) | ||
855 | RVVCALL(OPIVX2_RM, vssub_vx_w, OP_SSS_W, H4, H4, ssub32) | ||
856 | RVVCALL(OPIVX2_RM, vssub_vx_d, OP_SSS_D, H8, H8, ssub64) | ||
857 | -GEN_VEXT_VX_RM(vssub_vx_b, 1, 1) | ||
858 | -GEN_VEXT_VX_RM(vssub_vx_h, 2, 2) | ||
859 | -GEN_VEXT_VX_RM(vssub_vx_w, 4, 4) | ||
860 | -GEN_VEXT_VX_RM(vssub_vx_d, 8, 8) | ||
861 | +GEN_VEXT_VX_RM(vssub_vx_b) | ||
862 | +GEN_VEXT_VX_RM(vssub_vx_h) | ||
863 | +GEN_VEXT_VX_RM(vssub_vx_w) | ||
864 | +GEN_VEXT_VX_RM(vssub_vx_d) | ||
865 | |||
866 | /* Vector Single-Width Averaging Add and Subtract */ | ||
867 | static inline uint8_t get_round(int vxrm, uint64_t v, uint8_t shift) | ||
868 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd32) | ||
869 | RVVCALL(OPIVV2_RM, vaadd_vv_h, OP_SSS_H, H2, H2, H2, aadd32) | ||
870 | RVVCALL(OPIVV2_RM, vaadd_vv_w, OP_SSS_W, H4, H4, H4, aadd32) | ||
871 | RVVCALL(OPIVV2_RM, vaadd_vv_d, OP_SSS_D, H8, H8, H8, aadd64) | ||
872 | -GEN_VEXT_VV_RM(vaadd_vv_b, 1, 1) | ||
873 | -GEN_VEXT_VV_RM(vaadd_vv_h, 2, 2) | ||
874 | -GEN_VEXT_VV_RM(vaadd_vv_w, 4, 4) | ||
875 | -GEN_VEXT_VV_RM(vaadd_vv_d, 8, 8) | ||
876 | +GEN_VEXT_VV_RM(vaadd_vv_b) | ||
877 | +GEN_VEXT_VV_RM(vaadd_vv_h) | ||
878 | +GEN_VEXT_VV_RM(vaadd_vv_w) | ||
879 | +GEN_VEXT_VV_RM(vaadd_vv_d) | ||
880 | |||
881 | RVVCALL(OPIVX2_RM, vaadd_vx_b, OP_SSS_B, H1, H1, aadd32) | ||
882 | RVVCALL(OPIVX2_RM, vaadd_vx_h, OP_SSS_H, H2, H2, aadd32) | ||
883 | RVVCALL(OPIVX2_RM, vaadd_vx_w, OP_SSS_W, H4, H4, aadd32) | ||
884 | RVVCALL(OPIVX2_RM, vaadd_vx_d, OP_SSS_D, H8, H8, aadd64) | ||
885 | -GEN_VEXT_VX_RM(vaadd_vx_b, 1, 1) | ||
886 | -GEN_VEXT_VX_RM(vaadd_vx_h, 2, 2) | ||
887 | -GEN_VEXT_VX_RM(vaadd_vx_w, 4, 4) | ||
888 | -GEN_VEXT_VX_RM(vaadd_vx_d, 8, 8) | ||
889 | +GEN_VEXT_VX_RM(vaadd_vx_b) | ||
890 | +GEN_VEXT_VX_RM(vaadd_vx_h) | ||
891 | +GEN_VEXT_VX_RM(vaadd_vx_w) | ||
892 | +GEN_VEXT_VX_RM(vaadd_vx_d) | ||
893 | |||
894 | static inline uint32_t aaddu32(CPURISCVState *env, int vxrm, | ||
895 | uint32_t a, uint32_t b) | ||
896 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vaaddu_vv_b, OP_UUU_B, H1, H1, H1, aaddu32) | ||
897 | RVVCALL(OPIVV2_RM, vaaddu_vv_h, OP_UUU_H, H2, H2, H2, aaddu32) | ||
898 | RVVCALL(OPIVV2_RM, vaaddu_vv_w, OP_UUU_W, H4, H4, H4, aaddu32) | ||
899 | RVVCALL(OPIVV2_RM, vaaddu_vv_d, OP_UUU_D, H8, H8, H8, aaddu64) | ||
900 | -GEN_VEXT_VV_RM(vaaddu_vv_b, 1, 1) | ||
901 | -GEN_VEXT_VV_RM(vaaddu_vv_h, 2, 2) | ||
902 | -GEN_VEXT_VV_RM(vaaddu_vv_w, 4, 4) | ||
903 | -GEN_VEXT_VV_RM(vaaddu_vv_d, 8, 8) | ||
904 | +GEN_VEXT_VV_RM(vaaddu_vv_b) | ||
905 | +GEN_VEXT_VV_RM(vaaddu_vv_h) | ||
906 | +GEN_VEXT_VV_RM(vaaddu_vv_w) | ||
907 | +GEN_VEXT_VV_RM(vaaddu_vv_d) | ||
908 | |||
909 | RVVCALL(OPIVX2_RM, vaaddu_vx_b, OP_UUU_B, H1, H1, aaddu32) | ||
910 | RVVCALL(OPIVX2_RM, vaaddu_vx_h, OP_UUU_H, H2, H2, aaddu32) | ||
911 | RVVCALL(OPIVX2_RM, vaaddu_vx_w, OP_UUU_W, H4, H4, aaddu32) | ||
912 | RVVCALL(OPIVX2_RM, vaaddu_vx_d, OP_UUU_D, H8, H8, aaddu64) | ||
913 | -GEN_VEXT_VX_RM(vaaddu_vx_b, 1, 1) | ||
914 | -GEN_VEXT_VX_RM(vaaddu_vx_h, 2, 2) | ||
915 | -GEN_VEXT_VX_RM(vaaddu_vx_w, 4, 4) | ||
916 | -GEN_VEXT_VX_RM(vaaddu_vx_d, 8, 8) | ||
917 | +GEN_VEXT_VX_RM(vaaddu_vx_b) | ||
918 | +GEN_VEXT_VX_RM(vaaddu_vx_h) | ||
919 | +GEN_VEXT_VX_RM(vaaddu_vx_w) | ||
920 | +GEN_VEXT_VX_RM(vaaddu_vx_d) | ||
921 | |||
922 | static inline int32_t asub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b) | ||
923 | { | ||
924 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vasub_vv_b, OP_SSS_B, H1, H1, H1, asub32) | ||
925 | RVVCALL(OPIVV2_RM, vasub_vv_h, OP_SSS_H, H2, H2, H2, asub32) | ||
926 | RVVCALL(OPIVV2_RM, vasub_vv_w, OP_SSS_W, H4, H4, H4, asub32) | ||
927 | RVVCALL(OPIVV2_RM, vasub_vv_d, OP_SSS_D, H8, H8, H8, asub64) | ||
928 | -GEN_VEXT_VV_RM(vasub_vv_b, 1, 1) | ||
929 | -GEN_VEXT_VV_RM(vasub_vv_h, 2, 2) | ||
930 | -GEN_VEXT_VV_RM(vasub_vv_w, 4, 4) | ||
931 | -GEN_VEXT_VV_RM(vasub_vv_d, 8, 8) | ||
932 | +GEN_VEXT_VV_RM(vasub_vv_b) | ||
933 | +GEN_VEXT_VV_RM(vasub_vv_h) | ||
934 | +GEN_VEXT_VV_RM(vasub_vv_w) | ||
935 | +GEN_VEXT_VV_RM(vasub_vv_d) | ||
936 | |||
937 | RVVCALL(OPIVX2_RM, vasub_vx_b, OP_SSS_B, H1, H1, asub32) | ||
938 | RVVCALL(OPIVX2_RM, vasub_vx_h, OP_SSS_H, H2, H2, asub32) | ||
939 | RVVCALL(OPIVX2_RM, vasub_vx_w, OP_SSS_W, H4, H4, asub32) | ||
940 | RVVCALL(OPIVX2_RM, vasub_vx_d, OP_SSS_D, H8, H8, asub64) | ||
941 | -GEN_VEXT_VX_RM(vasub_vx_b, 1, 1) | ||
942 | -GEN_VEXT_VX_RM(vasub_vx_h, 2, 2) | ||
943 | -GEN_VEXT_VX_RM(vasub_vx_w, 4, 4) | ||
944 | -GEN_VEXT_VX_RM(vasub_vx_d, 8, 8) | ||
945 | +GEN_VEXT_VX_RM(vasub_vx_b) | ||
946 | +GEN_VEXT_VX_RM(vasub_vx_h) | ||
947 | +GEN_VEXT_VX_RM(vasub_vx_w) | ||
948 | +GEN_VEXT_VX_RM(vasub_vx_d) | ||
949 | |||
950 | static inline uint32_t asubu32(CPURISCVState *env, int vxrm, | ||
951 | uint32_t a, uint32_t b) | ||
952 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vasubu_vv_b, OP_UUU_B, H1, H1, H1, asubu32) | ||
953 | RVVCALL(OPIVV2_RM, vasubu_vv_h, OP_UUU_H, H2, H2, H2, asubu32) | ||
954 | RVVCALL(OPIVV2_RM, vasubu_vv_w, OP_UUU_W, H4, H4, H4, asubu32) | ||
955 | RVVCALL(OPIVV2_RM, vasubu_vv_d, OP_UUU_D, H8, H8, H8, asubu64) | ||
956 | -GEN_VEXT_VV_RM(vasubu_vv_b, 1, 1) | ||
957 | -GEN_VEXT_VV_RM(vasubu_vv_h, 2, 2) | ||
958 | -GEN_VEXT_VV_RM(vasubu_vv_w, 4, 4) | ||
959 | -GEN_VEXT_VV_RM(vasubu_vv_d, 8, 8) | ||
960 | +GEN_VEXT_VV_RM(vasubu_vv_b) | ||
961 | +GEN_VEXT_VV_RM(vasubu_vv_h) | ||
962 | +GEN_VEXT_VV_RM(vasubu_vv_w) | ||
963 | +GEN_VEXT_VV_RM(vasubu_vv_d) | ||
964 | |||
965 | RVVCALL(OPIVX2_RM, vasubu_vx_b, OP_UUU_B, H1, H1, asubu32) | ||
966 | RVVCALL(OPIVX2_RM, vasubu_vx_h, OP_UUU_H, H2, H2, asubu32) | ||
967 | RVVCALL(OPIVX2_RM, vasubu_vx_w, OP_UUU_W, H4, H4, asubu32) | ||
968 | RVVCALL(OPIVX2_RM, vasubu_vx_d, OP_UUU_D, H8, H8, asubu64) | ||
969 | -GEN_VEXT_VX_RM(vasubu_vx_b, 1, 1) | ||
970 | -GEN_VEXT_VX_RM(vasubu_vx_h, 2, 2) | ||
971 | -GEN_VEXT_VX_RM(vasubu_vx_w, 4, 4) | ||
972 | -GEN_VEXT_VX_RM(vasubu_vx_d, 8, 8) | ||
973 | +GEN_VEXT_VX_RM(vasubu_vx_b) | ||
974 | +GEN_VEXT_VX_RM(vasubu_vx_h) | ||
975 | +GEN_VEXT_VX_RM(vasubu_vx_w) | ||
976 | +GEN_VEXT_VX_RM(vasubu_vx_d) | ||
977 | |||
978 | /* Vector Single-Width Fractional Multiply with Rounding and Saturation */ | ||
979 | static inline int8_t vsmul8(CPURISCVState *env, int vxrm, int8_t a, int8_t b) | ||
980 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vsmul_vv_b, OP_SSS_B, H1, H1, H1, vsmul8) | ||
981 | RVVCALL(OPIVV2_RM, vsmul_vv_h, OP_SSS_H, H2, H2, H2, vsmul16) | ||
982 | RVVCALL(OPIVV2_RM, vsmul_vv_w, OP_SSS_W, H4, H4, H4, vsmul32) | ||
983 | RVVCALL(OPIVV2_RM, vsmul_vv_d, OP_SSS_D, H8, H8, H8, vsmul64) | ||
984 | -GEN_VEXT_VV_RM(vsmul_vv_b, 1, 1) | ||
985 | -GEN_VEXT_VV_RM(vsmul_vv_h, 2, 2) | ||
986 | -GEN_VEXT_VV_RM(vsmul_vv_w, 4, 4) | ||
987 | -GEN_VEXT_VV_RM(vsmul_vv_d, 8, 8) | ||
988 | +GEN_VEXT_VV_RM(vsmul_vv_b) | ||
989 | +GEN_VEXT_VV_RM(vsmul_vv_h) | ||
990 | +GEN_VEXT_VV_RM(vsmul_vv_w) | ||
991 | +GEN_VEXT_VV_RM(vsmul_vv_d) | ||
992 | |||
993 | RVVCALL(OPIVX2_RM, vsmul_vx_b, OP_SSS_B, H1, H1, vsmul8) | ||
994 | RVVCALL(OPIVX2_RM, vsmul_vx_h, OP_SSS_H, H2, H2, vsmul16) | ||
995 | RVVCALL(OPIVX2_RM, vsmul_vx_w, OP_SSS_W, H4, H4, vsmul32) | ||
996 | RVVCALL(OPIVX2_RM, vsmul_vx_d, OP_SSS_D, H8, H8, vsmul64) | ||
997 | -GEN_VEXT_VX_RM(vsmul_vx_b, 1, 1) | ||
998 | -GEN_VEXT_VX_RM(vsmul_vx_h, 2, 2) | ||
999 | -GEN_VEXT_VX_RM(vsmul_vx_w, 4, 4) | ||
1000 | -GEN_VEXT_VX_RM(vsmul_vx_d, 8, 8) | ||
1001 | +GEN_VEXT_VX_RM(vsmul_vx_b) | ||
1002 | +GEN_VEXT_VX_RM(vsmul_vx_h) | ||
1003 | +GEN_VEXT_VX_RM(vsmul_vx_w) | ||
1004 | +GEN_VEXT_VX_RM(vsmul_vx_d) | ||
1005 | |||
1006 | /* Vector Single-Width Scaling Shift Instructions */ | ||
1007 | static inline uint8_t | ||
1008 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vssrl_vv_b, OP_UUU_B, H1, H1, H1, vssrl8) | ||
1009 | RVVCALL(OPIVV2_RM, vssrl_vv_h, OP_UUU_H, H2, H2, H2, vssrl16) | ||
1010 | RVVCALL(OPIVV2_RM, vssrl_vv_w, OP_UUU_W, H4, H4, H4, vssrl32) | ||
1011 | RVVCALL(OPIVV2_RM, vssrl_vv_d, OP_UUU_D, H8, H8, H8, vssrl64) | ||
1012 | -GEN_VEXT_VV_RM(vssrl_vv_b, 1, 1) | ||
1013 | -GEN_VEXT_VV_RM(vssrl_vv_h, 2, 2) | ||
1014 | -GEN_VEXT_VV_RM(vssrl_vv_w, 4, 4) | ||
1015 | -GEN_VEXT_VV_RM(vssrl_vv_d, 8, 8) | ||
1016 | +GEN_VEXT_VV_RM(vssrl_vv_b) | ||
1017 | +GEN_VEXT_VV_RM(vssrl_vv_h) | ||
1018 | +GEN_VEXT_VV_RM(vssrl_vv_w) | ||
1019 | +GEN_VEXT_VV_RM(vssrl_vv_d) | ||
1020 | |||
1021 | RVVCALL(OPIVX2_RM, vssrl_vx_b, OP_UUU_B, H1, H1, vssrl8) | ||
1022 | RVVCALL(OPIVX2_RM, vssrl_vx_h, OP_UUU_H, H2, H2, vssrl16) | ||
1023 | RVVCALL(OPIVX2_RM, vssrl_vx_w, OP_UUU_W, H4, H4, vssrl32) | ||
1024 | RVVCALL(OPIVX2_RM, vssrl_vx_d, OP_UUU_D, H8, H8, vssrl64) | ||
1025 | -GEN_VEXT_VX_RM(vssrl_vx_b, 1, 1) | ||
1026 | -GEN_VEXT_VX_RM(vssrl_vx_h, 2, 2) | ||
1027 | -GEN_VEXT_VX_RM(vssrl_vx_w, 4, 4) | ||
1028 | -GEN_VEXT_VX_RM(vssrl_vx_d, 8, 8) | ||
1029 | +GEN_VEXT_VX_RM(vssrl_vx_b) | ||
1030 | +GEN_VEXT_VX_RM(vssrl_vx_h) | ||
1031 | +GEN_VEXT_VX_RM(vssrl_vx_w) | ||
1032 | +GEN_VEXT_VX_RM(vssrl_vx_d) | ||
1033 | |||
1034 | static inline int8_t | ||
1035 | vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b) | ||
1036 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vssra_vv_b, OP_SSS_B, H1, H1, H1, vssra8) | ||
1037 | RVVCALL(OPIVV2_RM, vssra_vv_h, OP_SSS_H, H2, H2, H2, vssra16) | ||
1038 | RVVCALL(OPIVV2_RM, vssra_vv_w, OP_SSS_W, H4, H4, H4, vssra32) | ||
1039 | RVVCALL(OPIVV2_RM, vssra_vv_d, OP_SSS_D, H8, H8, H8, vssra64) | ||
1040 | -GEN_VEXT_VV_RM(vssra_vv_b, 1, 1) | ||
1041 | -GEN_VEXT_VV_RM(vssra_vv_h, 2, 2) | ||
1042 | -GEN_VEXT_VV_RM(vssra_vv_w, 4, 4) | ||
1043 | -GEN_VEXT_VV_RM(vssra_vv_d, 8, 8) | ||
1044 | +GEN_VEXT_VV_RM(vssra_vv_b) | ||
1045 | +GEN_VEXT_VV_RM(vssra_vv_h) | ||
1046 | +GEN_VEXT_VV_RM(vssra_vv_w) | ||
1047 | +GEN_VEXT_VV_RM(vssra_vv_d) | ||
1048 | |||
1049 | RVVCALL(OPIVX2_RM, vssra_vx_b, OP_SSS_B, H1, H1, vssra8) | ||
1050 | RVVCALL(OPIVX2_RM, vssra_vx_h, OP_SSS_H, H2, H2, vssra16) | ||
1051 | RVVCALL(OPIVX2_RM, vssra_vx_w, OP_SSS_W, H4, H4, vssra32) | ||
1052 | RVVCALL(OPIVX2_RM, vssra_vx_d, OP_SSS_D, H8, H8, vssra64) | ||
1053 | -GEN_VEXT_VX_RM(vssra_vx_b, 1, 1) | ||
1054 | -GEN_VEXT_VX_RM(vssra_vx_h, 2, 2) | ||
1055 | -GEN_VEXT_VX_RM(vssra_vx_w, 4, 4) | ||
1056 | -GEN_VEXT_VX_RM(vssra_vx_d, 8, 8) | ||
1057 | +GEN_VEXT_VX_RM(vssra_vx_b) | ||
1058 | +GEN_VEXT_VX_RM(vssra_vx_h) | ||
1059 | +GEN_VEXT_VX_RM(vssra_vx_w) | ||
1060 | +GEN_VEXT_VX_RM(vssra_vx_d) | ||
1061 | |||
1062 | /* Vector Narrowing Fixed-Point Clip Instructions */ | ||
1063 | static inline int8_t | ||
1064 | @@ -XXX,XX +XXX,XX @@ vnclip32(CPURISCVState *env, int vxrm, int64_t a, int32_t b) | ||
1065 | RVVCALL(OPIVV2_RM, vnclip_wv_b, NOP_SSS_B, H1, H2, H1, vnclip8) | ||
1066 | RVVCALL(OPIVV2_RM, vnclip_wv_h, NOP_SSS_H, H2, H4, H2, vnclip16) | ||
1067 | RVVCALL(OPIVV2_RM, vnclip_wv_w, NOP_SSS_W, H4, H8, H4, vnclip32) | ||
1068 | -GEN_VEXT_VV_RM(vnclip_wv_b, 1, 1) | ||
1069 | -GEN_VEXT_VV_RM(vnclip_wv_h, 2, 2) | ||
1070 | -GEN_VEXT_VV_RM(vnclip_wv_w, 4, 4) | ||
1071 | +GEN_VEXT_VV_RM(vnclip_wv_b) | ||
1072 | +GEN_VEXT_VV_RM(vnclip_wv_h) | ||
1073 | +GEN_VEXT_VV_RM(vnclip_wv_w) | ||
1074 | |||
1075 | RVVCALL(OPIVX2_RM, vnclip_wx_b, NOP_SSS_B, H1, H2, vnclip8) | ||
1076 | RVVCALL(OPIVX2_RM, vnclip_wx_h, NOP_SSS_H, H2, H4, vnclip16) | ||
1077 | RVVCALL(OPIVX2_RM, vnclip_wx_w, NOP_SSS_W, H4, H8, vnclip32) | ||
1078 | -GEN_VEXT_VX_RM(vnclip_wx_b, 1, 1) | ||
1079 | -GEN_VEXT_VX_RM(vnclip_wx_h, 2, 2) | ||
1080 | -GEN_VEXT_VX_RM(vnclip_wx_w, 4, 4) | ||
1081 | +GEN_VEXT_VX_RM(vnclip_wx_b) | ||
1082 | +GEN_VEXT_VX_RM(vnclip_wx_h) | ||
1083 | +GEN_VEXT_VX_RM(vnclip_wx_w) | ||
1084 | |||
1085 | static inline uint8_t | ||
1086 | vnclipu8(CPURISCVState *env, int vxrm, uint16_t a, uint8_t b) | ||
1087 | @@ -XXX,XX +XXX,XX @@ vnclipu32(CPURISCVState *env, int vxrm, uint64_t a, uint32_t b) | ||
1088 | RVVCALL(OPIVV2_RM, vnclipu_wv_b, NOP_UUU_B, H1, H2, H1, vnclipu8) | ||
1089 | RVVCALL(OPIVV2_RM, vnclipu_wv_h, NOP_UUU_H, H2, H4, H2, vnclipu16) | ||
1090 | RVVCALL(OPIVV2_RM, vnclipu_wv_w, NOP_UUU_W, H4, H8, H4, vnclipu32) | ||
1091 | -GEN_VEXT_VV_RM(vnclipu_wv_b, 1, 1) | ||
1092 | -GEN_VEXT_VV_RM(vnclipu_wv_h, 2, 2) | ||
1093 | -GEN_VEXT_VV_RM(vnclipu_wv_w, 4, 4) | ||
1094 | +GEN_VEXT_VV_RM(vnclipu_wv_b) | ||
1095 | +GEN_VEXT_VV_RM(vnclipu_wv_h) | ||
1096 | +GEN_VEXT_VV_RM(vnclipu_wv_w) | ||
1097 | |||
1098 | RVVCALL(OPIVX2_RM, vnclipu_wx_b, NOP_UUU_B, H1, H2, vnclipu8) | ||
1099 | RVVCALL(OPIVX2_RM, vnclipu_wx_h, NOP_UUU_H, H2, H4, vnclipu16) | ||
1100 | RVVCALL(OPIVX2_RM, vnclipu_wx_w, NOP_UUU_W, H4, H8, vnclipu32) | ||
1101 | -GEN_VEXT_VX_RM(vnclipu_wx_b, 1, 1) | ||
1102 | -GEN_VEXT_VX_RM(vnclipu_wx_h, 2, 2) | ||
1103 | -GEN_VEXT_VX_RM(vnclipu_wx_w, 4, 4) | ||
1104 | +GEN_VEXT_VX_RM(vnclipu_wx_b) | ||
1105 | +GEN_VEXT_VX_RM(vnclipu_wx_h) | ||
1106 | +GEN_VEXT_VX_RM(vnclipu_wx_w) | ||
1107 | |||
1108 | /* | ||
1109 | *** Vector Float Point Arithmetic Instructions | ||
1110 | @@ -XXX,XX +XXX,XX @@ static void do_##NAME(void *vd, void *vs1, void *vs2, int i, \ | ||
1111 | *((TD *)vd + HD(i)) = OP(s2, s1, &env->fp_status); \ | ||
1112 | } | ||
1113 | |||
1114 | -#define GEN_VEXT_VV_ENV(NAME, ESZ, DSZ) \ | ||
1115 | +#define GEN_VEXT_VV_ENV(NAME) \ | ||
1116 | void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | ||
1117 | void *vs2, CPURISCVState *env, \ | ||
1118 | uint32_t desc) \ | ||
1119 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | ||
1120 | RVVCALL(OPFVV2, vfadd_vv_h, OP_UUU_H, H2, H2, H2, float16_add) | ||
1121 | RVVCALL(OPFVV2, vfadd_vv_w, OP_UUU_W, H4, H4, H4, float32_add) | ||
1122 | RVVCALL(OPFVV2, vfadd_vv_d, OP_UUU_D, H8, H8, H8, float64_add) | ||
1123 | -GEN_VEXT_VV_ENV(vfadd_vv_h, 2, 2) | ||
1124 | -GEN_VEXT_VV_ENV(vfadd_vv_w, 4, 4) | ||
1125 | -GEN_VEXT_VV_ENV(vfadd_vv_d, 8, 8) | ||
1126 | +GEN_VEXT_VV_ENV(vfadd_vv_h) | ||
1127 | +GEN_VEXT_VV_ENV(vfadd_vv_w) | ||
1128 | +GEN_VEXT_VV_ENV(vfadd_vv_d) | ||
1129 | |||
1130 | #define OPFVF2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP) \ | ||
1131 | static void do_##NAME(void *vd, uint64_t s1, void *vs2, int i, \ | ||
1132 | @@ -XXX,XX +XXX,XX @@ static void do_##NAME(void *vd, uint64_t s1, void *vs2, int i, \ | ||
1133 | *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1, &env->fp_status);\ | ||
1134 | } | ||
1135 | |||
1136 | -#define GEN_VEXT_VF(NAME, ESZ, DSZ) \ | ||
1137 | +#define GEN_VEXT_VF(NAME) \ | ||
1138 | void HELPER(NAME)(void *vd, void *v0, uint64_t s1, \ | ||
1139 | void *vs2, CPURISCVState *env, \ | ||
1140 | uint32_t desc) \ | ||
1141 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1, \ | ||
1142 | RVVCALL(OPFVF2, vfadd_vf_h, OP_UUU_H, H2, H2, float16_add) | ||
1143 | RVVCALL(OPFVF2, vfadd_vf_w, OP_UUU_W, H4, H4, float32_add) | ||
1144 | RVVCALL(OPFVF2, vfadd_vf_d, OP_UUU_D, H8, H8, float64_add) | ||
1145 | -GEN_VEXT_VF(vfadd_vf_h, 2, 2) | ||
1146 | -GEN_VEXT_VF(vfadd_vf_w, 4, 4) | ||
1147 | -GEN_VEXT_VF(vfadd_vf_d, 8, 8) | ||
1148 | +GEN_VEXT_VF(vfadd_vf_h) | ||
1149 | +GEN_VEXT_VF(vfadd_vf_w) | ||
1150 | +GEN_VEXT_VF(vfadd_vf_d) | ||
1151 | |||
1152 | RVVCALL(OPFVV2, vfsub_vv_h, OP_UUU_H, H2, H2, H2, float16_sub) | ||
1153 | RVVCALL(OPFVV2, vfsub_vv_w, OP_UUU_W, H4, H4, H4, float32_sub) | ||
1154 | RVVCALL(OPFVV2, vfsub_vv_d, OP_UUU_D, H8, H8, H8, float64_sub) | ||
1155 | -GEN_VEXT_VV_ENV(vfsub_vv_h, 2, 2) | ||
1156 | -GEN_VEXT_VV_ENV(vfsub_vv_w, 4, 4) | ||
1157 | -GEN_VEXT_VV_ENV(vfsub_vv_d, 8, 8) | ||
1158 | +GEN_VEXT_VV_ENV(vfsub_vv_h) | ||
1159 | +GEN_VEXT_VV_ENV(vfsub_vv_w) | ||
1160 | +GEN_VEXT_VV_ENV(vfsub_vv_d) | ||
1161 | RVVCALL(OPFVF2, vfsub_vf_h, OP_UUU_H, H2, H2, float16_sub) | ||
1162 | RVVCALL(OPFVF2, vfsub_vf_w, OP_UUU_W, H4, H4, float32_sub) | ||
1163 | RVVCALL(OPFVF2, vfsub_vf_d, OP_UUU_D, H8, H8, float64_sub) | ||
1164 | -GEN_VEXT_VF(vfsub_vf_h, 2, 2) | ||
1165 | -GEN_VEXT_VF(vfsub_vf_w, 4, 4) | ||
1166 | -GEN_VEXT_VF(vfsub_vf_d, 8, 8) | ||
1167 | +GEN_VEXT_VF(vfsub_vf_h) | ||
1168 | +GEN_VEXT_VF(vfsub_vf_w) | ||
1169 | +GEN_VEXT_VF(vfsub_vf_d) | ||
1170 | |||
1171 | static uint16_t float16_rsub(uint16_t a, uint16_t b, float_status *s) | ||
1172 | { | ||
1173 | @@ -XXX,XX +XXX,XX @@ static uint64_t float64_rsub(uint64_t a, uint64_t b, float_status *s) | ||
1174 | RVVCALL(OPFVF2, vfrsub_vf_h, OP_UUU_H, H2, H2, float16_rsub) | ||
1175 | RVVCALL(OPFVF2, vfrsub_vf_w, OP_UUU_W, H4, H4, float32_rsub) | ||
1176 | RVVCALL(OPFVF2, vfrsub_vf_d, OP_UUU_D, H8, H8, float64_rsub) | ||
1177 | -GEN_VEXT_VF(vfrsub_vf_h, 2, 2) | ||
1178 | -GEN_VEXT_VF(vfrsub_vf_w, 4, 4) | ||
1179 | -GEN_VEXT_VF(vfrsub_vf_d, 8, 8) | ||
1180 | +GEN_VEXT_VF(vfrsub_vf_h) | ||
1181 | +GEN_VEXT_VF(vfrsub_vf_w) | ||
1182 | +GEN_VEXT_VF(vfrsub_vf_d) | ||
1183 | |||
1184 | /* Vector Widening Floating-Point Add/Subtract Instructions */ | ||
1185 | static uint32_t vfwadd16(uint16_t a, uint16_t b, float_status *s) | ||
1186 | @@ -XXX,XX +XXX,XX @@ static uint64_t vfwadd32(uint32_t a, uint32_t b, float_status *s) | ||
1187 | |||
1188 | RVVCALL(OPFVV2, vfwadd_vv_h, WOP_UUU_H, H4, H2, H2, vfwadd16) | ||
1189 | RVVCALL(OPFVV2, vfwadd_vv_w, WOP_UUU_W, H8, H4, H4, vfwadd32) | ||
1190 | -GEN_VEXT_VV_ENV(vfwadd_vv_h, 2, 4) | ||
1191 | -GEN_VEXT_VV_ENV(vfwadd_vv_w, 4, 8) | ||
1192 | +GEN_VEXT_VV_ENV(vfwadd_vv_h) | ||
1193 | +GEN_VEXT_VV_ENV(vfwadd_vv_w) | ||
1194 | RVVCALL(OPFVF2, vfwadd_vf_h, WOP_UUU_H, H4, H2, vfwadd16) | ||
1195 | RVVCALL(OPFVF2, vfwadd_vf_w, WOP_UUU_W, H8, H4, vfwadd32) | ||
1196 | -GEN_VEXT_VF(vfwadd_vf_h, 2, 4) | ||
1197 | -GEN_VEXT_VF(vfwadd_vf_w, 4, 8) | ||
1198 | +GEN_VEXT_VF(vfwadd_vf_h) | ||
1199 | +GEN_VEXT_VF(vfwadd_vf_w) | ||
1200 | |||
1201 | static uint32_t vfwsub16(uint16_t a, uint16_t b, float_status *s) | ||
1202 | { | ||
1203 | @@ -XXX,XX +XXX,XX @@ static uint64_t vfwsub32(uint32_t a, uint32_t b, float_status *s) | ||
1204 | |||
1205 | RVVCALL(OPFVV2, vfwsub_vv_h, WOP_UUU_H, H4, H2, H2, vfwsub16) | ||
1206 | RVVCALL(OPFVV2, vfwsub_vv_w, WOP_UUU_W, H8, H4, H4, vfwsub32) | ||
1207 | -GEN_VEXT_VV_ENV(vfwsub_vv_h, 2, 4) | ||
1208 | -GEN_VEXT_VV_ENV(vfwsub_vv_w, 4, 8) | ||
1209 | +GEN_VEXT_VV_ENV(vfwsub_vv_h) | ||
1210 | +GEN_VEXT_VV_ENV(vfwsub_vv_w) | ||
1211 | RVVCALL(OPFVF2, vfwsub_vf_h, WOP_UUU_H, H4, H2, vfwsub16) | ||
1212 | RVVCALL(OPFVF2, vfwsub_vf_w, WOP_UUU_W, H8, H4, vfwsub32) | ||
1213 | -GEN_VEXT_VF(vfwsub_vf_h, 2, 4) | ||
1214 | -GEN_VEXT_VF(vfwsub_vf_w, 4, 8) | ||
1215 | +GEN_VEXT_VF(vfwsub_vf_h) | ||
1216 | +GEN_VEXT_VF(vfwsub_vf_w) | ||
1217 | |||
1218 | static uint32_t vfwaddw16(uint32_t a, uint16_t b, float_status *s) | ||
1219 | { | ||
1220 | @@ -XXX,XX +XXX,XX @@ static uint64_t vfwaddw32(uint64_t a, uint32_t b, float_status *s) | ||
1221 | |||
1222 | RVVCALL(OPFVV2, vfwadd_wv_h, WOP_WUUU_H, H4, H2, H2, vfwaddw16) | ||
1223 | RVVCALL(OPFVV2, vfwadd_wv_w, WOP_WUUU_W, H8, H4, H4, vfwaddw32) | ||
1224 | -GEN_VEXT_VV_ENV(vfwadd_wv_h, 2, 4) | ||
1225 | -GEN_VEXT_VV_ENV(vfwadd_wv_w, 4, 8) | ||
1226 | +GEN_VEXT_VV_ENV(vfwadd_wv_h) | ||
1227 | +GEN_VEXT_VV_ENV(vfwadd_wv_w) | ||
1228 | RVVCALL(OPFVF2, vfwadd_wf_h, WOP_WUUU_H, H4, H2, vfwaddw16) | ||
1229 | RVVCALL(OPFVF2, vfwadd_wf_w, WOP_WUUU_W, H8, H4, vfwaddw32) | ||
1230 | -GEN_VEXT_VF(vfwadd_wf_h, 2, 4) | ||
1231 | -GEN_VEXT_VF(vfwadd_wf_w, 4, 8) | ||
1232 | +GEN_VEXT_VF(vfwadd_wf_h) | ||
1233 | +GEN_VEXT_VF(vfwadd_wf_w) | ||
1234 | |||
1235 | static uint32_t vfwsubw16(uint32_t a, uint16_t b, float_status *s) | ||
1236 | { | ||
1237 | @@ -XXX,XX +XXX,XX @@ static uint64_t vfwsubw32(uint64_t a, uint32_t b, float_status *s) | ||
1238 | |||
1239 | RVVCALL(OPFVV2, vfwsub_wv_h, WOP_WUUU_H, H4, H2, H2, vfwsubw16) | ||
1240 | RVVCALL(OPFVV2, vfwsub_wv_w, WOP_WUUU_W, H8, H4, H4, vfwsubw32) | ||
1241 | -GEN_VEXT_VV_ENV(vfwsub_wv_h, 2, 4) | ||
1242 | -GEN_VEXT_VV_ENV(vfwsub_wv_w, 4, 8) | ||
1243 | +GEN_VEXT_VV_ENV(vfwsub_wv_h) | ||
1244 | +GEN_VEXT_VV_ENV(vfwsub_wv_w) | ||
1245 | RVVCALL(OPFVF2, vfwsub_wf_h, WOP_WUUU_H, H4, H2, vfwsubw16) | ||
1246 | RVVCALL(OPFVF2, vfwsub_wf_w, WOP_WUUU_W, H8, H4, vfwsubw32) | ||
1247 | -GEN_VEXT_VF(vfwsub_wf_h, 2, 4) | ||
1248 | -GEN_VEXT_VF(vfwsub_wf_w, 4, 8) | ||
1249 | +GEN_VEXT_VF(vfwsub_wf_h) | ||
1250 | +GEN_VEXT_VF(vfwsub_wf_w) | ||
1251 | |||
1252 | /* Vector Single-Width Floating-Point Multiply/Divide Instructions */ | ||
1253 | RVVCALL(OPFVV2, vfmul_vv_h, OP_UUU_H, H2, H2, H2, float16_mul) | ||
1254 | RVVCALL(OPFVV2, vfmul_vv_w, OP_UUU_W, H4, H4, H4, float32_mul) | ||
1255 | RVVCALL(OPFVV2, vfmul_vv_d, OP_UUU_D, H8, H8, H8, float64_mul) | ||
1256 | -GEN_VEXT_VV_ENV(vfmul_vv_h, 2, 2) | ||
1257 | -GEN_VEXT_VV_ENV(vfmul_vv_w, 4, 4) | ||
1258 | -GEN_VEXT_VV_ENV(vfmul_vv_d, 8, 8) | ||
1259 | +GEN_VEXT_VV_ENV(vfmul_vv_h) | ||
1260 | +GEN_VEXT_VV_ENV(vfmul_vv_w) | ||
1261 | +GEN_VEXT_VV_ENV(vfmul_vv_d) | ||
1262 | RVVCALL(OPFVF2, vfmul_vf_h, OP_UUU_H, H2, H2, float16_mul) | ||
1263 | RVVCALL(OPFVF2, vfmul_vf_w, OP_UUU_W, H4, H4, float32_mul) | ||
1264 | RVVCALL(OPFVF2, vfmul_vf_d, OP_UUU_D, H8, H8, float64_mul) | ||
1265 | -GEN_VEXT_VF(vfmul_vf_h, 2, 2) | ||
1266 | -GEN_VEXT_VF(vfmul_vf_w, 4, 4) | ||
1267 | -GEN_VEXT_VF(vfmul_vf_d, 8, 8) | ||
1268 | +GEN_VEXT_VF(vfmul_vf_h) | ||
1269 | +GEN_VEXT_VF(vfmul_vf_w) | ||
1270 | +GEN_VEXT_VF(vfmul_vf_d) | ||
1271 | |||
1272 | RVVCALL(OPFVV2, vfdiv_vv_h, OP_UUU_H, H2, H2, H2, float16_div) | ||
1273 | RVVCALL(OPFVV2, vfdiv_vv_w, OP_UUU_W, H4, H4, H4, float32_div) | ||
1274 | RVVCALL(OPFVV2, vfdiv_vv_d, OP_UUU_D, H8, H8, H8, float64_div) | ||
1275 | -GEN_VEXT_VV_ENV(vfdiv_vv_h, 2, 2) | ||
1276 | -GEN_VEXT_VV_ENV(vfdiv_vv_w, 4, 4) | ||
1277 | -GEN_VEXT_VV_ENV(vfdiv_vv_d, 8, 8) | ||
1278 | +GEN_VEXT_VV_ENV(vfdiv_vv_h) | ||
1279 | +GEN_VEXT_VV_ENV(vfdiv_vv_w) | ||
1280 | +GEN_VEXT_VV_ENV(vfdiv_vv_d) | ||
1281 | RVVCALL(OPFVF2, vfdiv_vf_h, OP_UUU_H, H2, H2, float16_div) | ||
1282 | RVVCALL(OPFVF2, vfdiv_vf_w, OP_UUU_W, H4, H4, float32_div) | ||
1283 | RVVCALL(OPFVF2, vfdiv_vf_d, OP_UUU_D, H8, H8, float64_div) | ||
1284 | -GEN_VEXT_VF(vfdiv_vf_h, 2, 2) | ||
1285 | -GEN_VEXT_VF(vfdiv_vf_w, 4, 4) | ||
1286 | -GEN_VEXT_VF(vfdiv_vf_d, 8, 8) | ||
1287 | +GEN_VEXT_VF(vfdiv_vf_h) | ||
1288 | +GEN_VEXT_VF(vfdiv_vf_w) | ||
1289 | +GEN_VEXT_VF(vfdiv_vf_d) | ||
1290 | |||
1291 | static uint16_t float16_rdiv(uint16_t a, uint16_t b, float_status *s) | ||
1292 | { | ||
1293 | @@ -XXX,XX +XXX,XX @@ static uint64_t float64_rdiv(uint64_t a, uint64_t b, float_status *s) | ||
1294 | RVVCALL(OPFVF2, vfrdiv_vf_h, OP_UUU_H, H2, H2, float16_rdiv) | ||
1295 | RVVCALL(OPFVF2, vfrdiv_vf_w, OP_UUU_W, H4, H4, float32_rdiv) | ||
1296 | RVVCALL(OPFVF2, vfrdiv_vf_d, OP_UUU_D, H8, H8, float64_rdiv) | ||
1297 | -GEN_VEXT_VF(vfrdiv_vf_h, 2, 2) | ||
1298 | -GEN_VEXT_VF(vfrdiv_vf_w, 4, 4) | ||
1299 | -GEN_VEXT_VF(vfrdiv_vf_d, 8, 8) | ||
1300 | +GEN_VEXT_VF(vfrdiv_vf_h) | ||
1301 | +GEN_VEXT_VF(vfrdiv_vf_w) | ||
1302 | +GEN_VEXT_VF(vfrdiv_vf_d) | ||
1303 | |||
1304 | /* Vector Widening Floating-Point Multiply */ | ||
1305 | static uint32_t vfwmul16(uint16_t a, uint16_t b, float_status *s) | ||
1306 | @@ -XXX,XX +XXX,XX @@ static uint64_t vfwmul32(uint32_t a, uint32_t b, float_status *s) | ||
1307 | } | ||
1308 | RVVCALL(OPFVV2, vfwmul_vv_h, WOP_UUU_H, H4, H2, H2, vfwmul16) | ||
1309 | RVVCALL(OPFVV2, vfwmul_vv_w, WOP_UUU_W, H8, H4, H4, vfwmul32) | ||
1310 | -GEN_VEXT_VV_ENV(vfwmul_vv_h, 2, 4) | ||
1311 | -GEN_VEXT_VV_ENV(vfwmul_vv_w, 4, 8) | ||
1312 | +GEN_VEXT_VV_ENV(vfwmul_vv_h) | ||
1313 | +GEN_VEXT_VV_ENV(vfwmul_vv_w) | ||
1314 | RVVCALL(OPFVF2, vfwmul_vf_h, WOP_UUU_H, H4, H2, vfwmul16) | ||
1315 | RVVCALL(OPFVF2, vfwmul_vf_w, WOP_UUU_W, H8, H4, vfwmul32) | ||
1316 | -GEN_VEXT_VF(vfwmul_vf_h, 2, 4) | ||
1317 | -GEN_VEXT_VF(vfwmul_vf_w, 4, 8) | ||
1318 | +GEN_VEXT_VF(vfwmul_vf_h) | ||
1319 | +GEN_VEXT_VF(vfwmul_vf_w) | ||
1320 | |||
1321 | /* Vector Single-Width Floating-Point Fused Multiply-Add Instructions */ | ||
1322 | #define OPFVV3(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP) \ | ||
1323 | @@ -XXX,XX +XXX,XX @@ static uint64_t fmacc64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
1324 | RVVCALL(OPFVV3, vfmacc_vv_h, OP_UUU_H, H2, H2, H2, fmacc16) | ||
1325 | RVVCALL(OPFVV3, vfmacc_vv_w, OP_UUU_W, H4, H4, H4, fmacc32) | ||
1326 | RVVCALL(OPFVV3, vfmacc_vv_d, OP_UUU_D, H8, H8, H8, fmacc64) | ||
1327 | -GEN_VEXT_VV_ENV(vfmacc_vv_h, 2, 2) | ||
1328 | -GEN_VEXT_VV_ENV(vfmacc_vv_w, 4, 4) | ||
1329 | -GEN_VEXT_VV_ENV(vfmacc_vv_d, 8, 8) | ||
1330 | +GEN_VEXT_VV_ENV(vfmacc_vv_h) | ||
1331 | +GEN_VEXT_VV_ENV(vfmacc_vv_w) | ||
1332 | +GEN_VEXT_VV_ENV(vfmacc_vv_d) | ||
1333 | |||
1334 | #define OPFVF3(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP) \ | ||
1335 | static void do_##NAME(void *vd, uint64_t s1, void *vs2, int i, \ | ||
1336 | @@ -XXX,XX +XXX,XX @@ static void do_##NAME(void *vd, uint64_t s1, void *vs2, int i, \ | ||
1337 | RVVCALL(OPFVF3, vfmacc_vf_h, OP_UUU_H, H2, H2, fmacc16) | ||
1338 | RVVCALL(OPFVF3, vfmacc_vf_w, OP_UUU_W, H4, H4, fmacc32) | ||
1339 | RVVCALL(OPFVF3, vfmacc_vf_d, OP_UUU_D, H8, H8, fmacc64) | ||
1340 | -GEN_VEXT_VF(vfmacc_vf_h, 2, 2) | ||
1341 | -GEN_VEXT_VF(vfmacc_vf_w, 4, 4) | ||
1342 | -GEN_VEXT_VF(vfmacc_vf_d, 8, 8) | ||
1343 | +GEN_VEXT_VF(vfmacc_vf_h) | ||
1344 | +GEN_VEXT_VF(vfmacc_vf_w) | ||
1345 | +GEN_VEXT_VF(vfmacc_vf_d) | ||
1346 | |||
1347 | static uint16_t fnmacc16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
1348 | { | ||
1349 | @@ -XXX,XX +XXX,XX @@ static uint64_t fnmacc64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
1350 | RVVCALL(OPFVV3, vfnmacc_vv_h, OP_UUU_H, H2, H2, H2, fnmacc16) | ||
1351 | RVVCALL(OPFVV3, vfnmacc_vv_w, OP_UUU_W, H4, H4, H4, fnmacc32) | ||
1352 | RVVCALL(OPFVV3, vfnmacc_vv_d, OP_UUU_D, H8, H8, H8, fnmacc64) | ||
1353 | -GEN_VEXT_VV_ENV(vfnmacc_vv_h, 2, 2) | ||
1354 | -GEN_VEXT_VV_ENV(vfnmacc_vv_w, 4, 4) | ||
1355 | -GEN_VEXT_VV_ENV(vfnmacc_vv_d, 8, 8) | ||
1356 | +GEN_VEXT_VV_ENV(vfnmacc_vv_h) | ||
1357 | +GEN_VEXT_VV_ENV(vfnmacc_vv_w) | ||
1358 | +GEN_VEXT_VV_ENV(vfnmacc_vv_d) | ||
1359 | RVVCALL(OPFVF3, vfnmacc_vf_h, OP_UUU_H, H2, H2, fnmacc16) | ||
1360 | RVVCALL(OPFVF3, vfnmacc_vf_w, OP_UUU_W, H4, H4, fnmacc32) | ||
1361 | RVVCALL(OPFVF3, vfnmacc_vf_d, OP_UUU_D, H8, H8, fnmacc64) | ||
1362 | -GEN_VEXT_VF(vfnmacc_vf_h, 2, 2) | ||
1363 | -GEN_VEXT_VF(vfnmacc_vf_w, 4, 4) | ||
1364 | -GEN_VEXT_VF(vfnmacc_vf_d, 8, 8) | ||
1365 | +GEN_VEXT_VF(vfnmacc_vf_h) | ||
1366 | +GEN_VEXT_VF(vfnmacc_vf_w) | ||
1367 | +GEN_VEXT_VF(vfnmacc_vf_d) | ||
1368 | |||
1369 | static uint16_t fmsac16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
1370 | { | ||
1371 | @@ -XXX,XX +XXX,XX @@ static uint64_t fmsac64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
1372 | RVVCALL(OPFVV3, vfmsac_vv_h, OP_UUU_H, H2, H2, H2, fmsac16) | ||
1373 | RVVCALL(OPFVV3, vfmsac_vv_w, OP_UUU_W, H4, H4, H4, fmsac32) | ||
1374 | RVVCALL(OPFVV3, vfmsac_vv_d, OP_UUU_D, H8, H8, H8, fmsac64) | ||
1375 | -GEN_VEXT_VV_ENV(vfmsac_vv_h, 2, 2) | ||
1376 | -GEN_VEXT_VV_ENV(vfmsac_vv_w, 4, 4) | ||
1377 | -GEN_VEXT_VV_ENV(vfmsac_vv_d, 8, 8) | ||
1378 | +GEN_VEXT_VV_ENV(vfmsac_vv_h) | ||
1379 | +GEN_VEXT_VV_ENV(vfmsac_vv_w) | ||
1380 | +GEN_VEXT_VV_ENV(vfmsac_vv_d) | ||
1381 | RVVCALL(OPFVF3, vfmsac_vf_h, OP_UUU_H, H2, H2, fmsac16) | ||
1382 | RVVCALL(OPFVF3, vfmsac_vf_w, OP_UUU_W, H4, H4, fmsac32) | ||
1383 | RVVCALL(OPFVF3, vfmsac_vf_d, OP_UUU_D, H8, H8, fmsac64) | ||
1384 | -GEN_VEXT_VF(vfmsac_vf_h, 2, 2) | ||
1385 | -GEN_VEXT_VF(vfmsac_vf_w, 4, 4) | ||
1386 | -GEN_VEXT_VF(vfmsac_vf_d, 8, 8) | ||
1387 | +GEN_VEXT_VF(vfmsac_vf_h) | ||
1388 | +GEN_VEXT_VF(vfmsac_vf_w) | ||
1389 | +GEN_VEXT_VF(vfmsac_vf_d) | ||
1390 | |||
1391 | static uint16_t fnmsac16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
1392 | { | ||
1393 | @@ -XXX,XX +XXX,XX @@ static uint64_t fnmsac64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
1394 | RVVCALL(OPFVV3, vfnmsac_vv_h, OP_UUU_H, H2, H2, H2, fnmsac16) | ||
1395 | RVVCALL(OPFVV3, vfnmsac_vv_w, OP_UUU_W, H4, H4, H4, fnmsac32) | ||
1396 | RVVCALL(OPFVV3, vfnmsac_vv_d, OP_UUU_D, H8, H8, H8, fnmsac64) | ||
1397 | -GEN_VEXT_VV_ENV(vfnmsac_vv_h, 2, 2) | ||
1398 | -GEN_VEXT_VV_ENV(vfnmsac_vv_w, 4, 4) | ||
1399 | -GEN_VEXT_VV_ENV(vfnmsac_vv_d, 8, 8) | ||
1400 | +GEN_VEXT_VV_ENV(vfnmsac_vv_h) | ||
1401 | +GEN_VEXT_VV_ENV(vfnmsac_vv_w) | ||
1402 | +GEN_VEXT_VV_ENV(vfnmsac_vv_d) | ||
1403 | RVVCALL(OPFVF3, vfnmsac_vf_h, OP_UUU_H, H2, H2, fnmsac16) | ||
1404 | RVVCALL(OPFVF3, vfnmsac_vf_w, OP_UUU_W, H4, H4, fnmsac32) | ||
1405 | RVVCALL(OPFVF3, vfnmsac_vf_d, OP_UUU_D, H8, H8, fnmsac64) | ||
1406 | -GEN_VEXT_VF(vfnmsac_vf_h, 2, 2) | ||
1407 | -GEN_VEXT_VF(vfnmsac_vf_w, 4, 4) | ||
1408 | -GEN_VEXT_VF(vfnmsac_vf_d, 8, 8) | ||
1409 | +GEN_VEXT_VF(vfnmsac_vf_h) | ||
1410 | +GEN_VEXT_VF(vfnmsac_vf_w) | ||
1411 | +GEN_VEXT_VF(vfnmsac_vf_d) | ||
1412 | |||
1413 | static uint16_t fmadd16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
1414 | { | ||
1415 | @@ -XXX,XX +XXX,XX @@ static uint64_t fmadd64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
1416 | RVVCALL(OPFVV3, vfmadd_vv_h, OP_UUU_H, H2, H2, H2, fmadd16) | ||
1417 | RVVCALL(OPFVV3, vfmadd_vv_w, OP_UUU_W, H4, H4, H4, fmadd32) | ||
1418 | RVVCALL(OPFVV3, vfmadd_vv_d, OP_UUU_D, H8, H8, H8, fmadd64) | ||
1419 | -GEN_VEXT_VV_ENV(vfmadd_vv_h, 2, 2) | ||
1420 | -GEN_VEXT_VV_ENV(vfmadd_vv_w, 4, 4) | ||
1421 | -GEN_VEXT_VV_ENV(vfmadd_vv_d, 8, 8) | ||
1422 | +GEN_VEXT_VV_ENV(vfmadd_vv_h) | ||
1423 | +GEN_VEXT_VV_ENV(vfmadd_vv_w) | ||
1424 | +GEN_VEXT_VV_ENV(vfmadd_vv_d) | ||
1425 | RVVCALL(OPFVF3, vfmadd_vf_h, OP_UUU_H, H2, H2, fmadd16) | ||
1426 | RVVCALL(OPFVF3, vfmadd_vf_w, OP_UUU_W, H4, H4, fmadd32) | ||
1427 | RVVCALL(OPFVF3, vfmadd_vf_d, OP_UUU_D, H8, H8, fmadd64) | ||
1428 | -GEN_VEXT_VF(vfmadd_vf_h, 2, 2) | ||
1429 | -GEN_VEXT_VF(vfmadd_vf_w, 4, 4) | ||
1430 | -GEN_VEXT_VF(vfmadd_vf_d, 8, 8) | ||
1431 | +GEN_VEXT_VF(vfmadd_vf_h) | ||
1432 | +GEN_VEXT_VF(vfmadd_vf_w) | ||
1433 | +GEN_VEXT_VF(vfmadd_vf_d) | ||
1434 | |||
1435 | static uint16_t fnmadd16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
1436 | { | ||
1437 | @@ -XXX,XX +XXX,XX @@ static uint64_t fnmadd64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
1438 | RVVCALL(OPFVV3, vfnmadd_vv_h, OP_UUU_H, H2, H2, H2, fnmadd16) | ||
1439 | RVVCALL(OPFVV3, vfnmadd_vv_w, OP_UUU_W, H4, H4, H4, fnmadd32) | ||
1440 | RVVCALL(OPFVV3, vfnmadd_vv_d, OP_UUU_D, H8, H8, H8, fnmadd64) | ||
1441 | -GEN_VEXT_VV_ENV(vfnmadd_vv_h, 2, 2) | ||
1442 | -GEN_VEXT_VV_ENV(vfnmadd_vv_w, 4, 4) | ||
1443 | -GEN_VEXT_VV_ENV(vfnmadd_vv_d, 8, 8) | ||
1444 | +GEN_VEXT_VV_ENV(vfnmadd_vv_h) | ||
1445 | +GEN_VEXT_VV_ENV(vfnmadd_vv_w) | ||
1446 | +GEN_VEXT_VV_ENV(vfnmadd_vv_d) | ||
1447 | RVVCALL(OPFVF3, vfnmadd_vf_h, OP_UUU_H, H2, H2, fnmadd16) | ||
1448 | RVVCALL(OPFVF3, vfnmadd_vf_w, OP_UUU_W, H4, H4, fnmadd32) | ||
1449 | RVVCALL(OPFVF3, vfnmadd_vf_d, OP_UUU_D, H8, H8, fnmadd64) | ||
1450 | -GEN_VEXT_VF(vfnmadd_vf_h, 2, 2) | ||
1451 | -GEN_VEXT_VF(vfnmadd_vf_w, 4, 4) | ||
1452 | -GEN_VEXT_VF(vfnmadd_vf_d, 8, 8) | ||
1453 | +GEN_VEXT_VF(vfnmadd_vf_h) | ||
1454 | +GEN_VEXT_VF(vfnmadd_vf_w) | ||
1455 | +GEN_VEXT_VF(vfnmadd_vf_d) | ||
1456 | |||
1457 | static uint16_t fmsub16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
1458 | { | ||
1459 | @@ -XXX,XX +XXX,XX @@ static uint64_t fmsub64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
1460 | RVVCALL(OPFVV3, vfmsub_vv_h, OP_UUU_H, H2, H2, H2, fmsub16) | ||
1461 | RVVCALL(OPFVV3, vfmsub_vv_w, OP_UUU_W, H4, H4, H4, fmsub32) | ||
1462 | RVVCALL(OPFVV3, vfmsub_vv_d, OP_UUU_D, H8, H8, H8, fmsub64) | ||
1463 | -GEN_VEXT_VV_ENV(vfmsub_vv_h, 2, 2) | ||
1464 | -GEN_VEXT_VV_ENV(vfmsub_vv_w, 4, 4) | ||
1465 | -GEN_VEXT_VV_ENV(vfmsub_vv_d, 8, 8) | ||
1466 | +GEN_VEXT_VV_ENV(vfmsub_vv_h) | ||
1467 | +GEN_VEXT_VV_ENV(vfmsub_vv_w) | ||
1468 | +GEN_VEXT_VV_ENV(vfmsub_vv_d) | ||
1469 | RVVCALL(OPFVF3, vfmsub_vf_h, OP_UUU_H, H2, H2, fmsub16) | ||
1470 | RVVCALL(OPFVF3, vfmsub_vf_w, OP_UUU_W, H4, H4, fmsub32) | ||
1471 | RVVCALL(OPFVF3, vfmsub_vf_d, OP_UUU_D, H8, H8, fmsub64) | ||
1472 | -GEN_VEXT_VF(vfmsub_vf_h, 2, 2) | ||
1473 | -GEN_VEXT_VF(vfmsub_vf_w, 4, 4) | ||
1474 | -GEN_VEXT_VF(vfmsub_vf_d, 8, 8) | ||
1475 | +GEN_VEXT_VF(vfmsub_vf_h) | ||
1476 | +GEN_VEXT_VF(vfmsub_vf_w) | ||
1477 | +GEN_VEXT_VF(vfmsub_vf_d) | ||
1478 | |||
1479 | static uint16_t fnmsub16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
1480 | { | ||
1481 | @@ -XXX,XX +XXX,XX @@ static uint64_t fnmsub64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
1482 | RVVCALL(OPFVV3, vfnmsub_vv_h, OP_UUU_H, H2, H2, H2, fnmsub16) | ||
1483 | RVVCALL(OPFVV3, vfnmsub_vv_w, OP_UUU_W, H4, H4, H4, fnmsub32) | ||
1484 | RVVCALL(OPFVV3, vfnmsub_vv_d, OP_UUU_D, H8, H8, H8, fnmsub64) | ||
1485 | -GEN_VEXT_VV_ENV(vfnmsub_vv_h, 2, 2) | ||
1486 | -GEN_VEXT_VV_ENV(vfnmsub_vv_w, 4, 4) | ||
1487 | -GEN_VEXT_VV_ENV(vfnmsub_vv_d, 8, 8) | ||
1488 | +GEN_VEXT_VV_ENV(vfnmsub_vv_h) | ||
1489 | +GEN_VEXT_VV_ENV(vfnmsub_vv_w) | ||
1490 | +GEN_VEXT_VV_ENV(vfnmsub_vv_d) | ||
1491 | RVVCALL(OPFVF3, vfnmsub_vf_h, OP_UUU_H, H2, H2, fnmsub16) | ||
1492 | RVVCALL(OPFVF3, vfnmsub_vf_w, OP_UUU_W, H4, H4, fnmsub32) | ||
1493 | RVVCALL(OPFVF3, vfnmsub_vf_d, OP_UUU_D, H8, H8, fnmsub64) | ||
1494 | -GEN_VEXT_VF(vfnmsub_vf_h, 2, 2) | ||
1495 | -GEN_VEXT_VF(vfnmsub_vf_w, 4, 4) | ||
1496 | -GEN_VEXT_VF(vfnmsub_vf_d, 8, 8) | ||
1497 | +GEN_VEXT_VF(vfnmsub_vf_h) | ||
1498 | +GEN_VEXT_VF(vfnmsub_vf_w) | ||
1499 | +GEN_VEXT_VF(vfnmsub_vf_d) | ||
1500 | |||
1501 | /* Vector Widening Floating-Point Fused Multiply-Add Instructions */ | ||
1502 | static uint32_t fwmacc16(uint16_t a, uint16_t b, uint32_t d, float_status *s) | ||
1503 | @@ -XXX,XX +XXX,XX @@ static uint64_t fwmacc32(uint32_t a, uint32_t b, uint64_t d, float_status *s) | ||
1504 | |||
1505 | RVVCALL(OPFVV3, vfwmacc_vv_h, WOP_UUU_H, H4, H2, H2, fwmacc16) | ||
1506 | RVVCALL(OPFVV3, vfwmacc_vv_w, WOP_UUU_W, H8, H4, H4, fwmacc32) | ||
1507 | -GEN_VEXT_VV_ENV(vfwmacc_vv_h, 2, 4) | ||
1508 | -GEN_VEXT_VV_ENV(vfwmacc_vv_w, 4, 8) | ||
1509 | +GEN_VEXT_VV_ENV(vfwmacc_vv_h) | ||
1510 | +GEN_VEXT_VV_ENV(vfwmacc_vv_w) | ||
1511 | RVVCALL(OPFVF3, vfwmacc_vf_h, WOP_UUU_H, H4, H2, fwmacc16) | ||
1512 | RVVCALL(OPFVF3, vfwmacc_vf_w, WOP_UUU_W, H8, H4, fwmacc32) | ||
1513 | -GEN_VEXT_VF(vfwmacc_vf_h, 2, 4) | ||
1514 | -GEN_VEXT_VF(vfwmacc_vf_w, 4, 8) | ||
1515 | +GEN_VEXT_VF(vfwmacc_vf_h) | ||
1516 | +GEN_VEXT_VF(vfwmacc_vf_w) | ||
1517 | |||
1518 | static uint32_t fwnmacc16(uint16_t a, uint16_t b, uint32_t d, float_status *s) | ||
1519 | { | ||
1520 | @@ -XXX,XX +XXX,XX @@ static uint64_t fwnmacc32(uint32_t a, uint32_t b, uint64_t d, float_status *s) | ||
1521 | |||
1522 | RVVCALL(OPFVV3, vfwnmacc_vv_h, WOP_UUU_H, H4, H2, H2, fwnmacc16) | ||
1523 | RVVCALL(OPFVV3, vfwnmacc_vv_w, WOP_UUU_W, H8, H4, H4, fwnmacc32) | ||
1524 | -GEN_VEXT_VV_ENV(vfwnmacc_vv_h, 2, 4) | ||
1525 | -GEN_VEXT_VV_ENV(vfwnmacc_vv_w, 4, 8) | ||
1526 | +GEN_VEXT_VV_ENV(vfwnmacc_vv_h) | ||
1527 | +GEN_VEXT_VV_ENV(vfwnmacc_vv_w) | ||
1528 | RVVCALL(OPFVF3, vfwnmacc_vf_h, WOP_UUU_H, H4, H2, fwnmacc16) | ||
1529 | RVVCALL(OPFVF3, vfwnmacc_vf_w, WOP_UUU_W, H8, H4, fwnmacc32) | ||
1530 | -GEN_VEXT_VF(vfwnmacc_vf_h, 2, 4) | ||
1531 | -GEN_VEXT_VF(vfwnmacc_vf_w, 4, 8) | ||
1532 | +GEN_VEXT_VF(vfwnmacc_vf_h) | ||
1533 | +GEN_VEXT_VF(vfwnmacc_vf_w) | ||
1534 | |||
1535 | static uint32_t fwmsac16(uint16_t a, uint16_t b, uint32_t d, float_status *s) | ||
1536 | { | ||
1537 | @@ -XXX,XX +XXX,XX @@ static uint64_t fwmsac32(uint32_t a, uint32_t b, uint64_t d, float_status *s) | ||
1538 | |||
1539 | RVVCALL(OPFVV3, vfwmsac_vv_h, WOP_UUU_H, H4, H2, H2, fwmsac16) | ||
1540 | RVVCALL(OPFVV3, vfwmsac_vv_w, WOP_UUU_W, H8, H4, H4, fwmsac32) | ||
1541 | -GEN_VEXT_VV_ENV(vfwmsac_vv_h, 2, 4) | ||
1542 | -GEN_VEXT_VV_ENV(vfwmsac_vv_w, 4, 8) | ||
1543 | +GEN_VEXT_VV_ENV(vfwmsac_vv_h) | ||
1544 | +GEN_VEXT_VV_ENV(vfwmsac_vv_w) | ||
1545 | RVVCALL(OPFVF3, vfwmsac_vf_h, WOP_UUU_H, H4, H2, fwmsac16) | ||
1546 | RVVCALL(OPFVF3, vfwmsac_vf_w, WOP_UUU_W, H8, H4, fwmsac32) | ||
1547 | -GEN_VEXT_VF(vfwmsac_vf_h, 2, 4) | ||
1548 | -GEN_VEXT_VF(vfwmsac_vf_w, 4, 8) | ||
1549 | +GEN_VEXT_VF(vfwmsac_vf_h) | ||
1550 | +GEN_VEXT_VF(vfwmsac_vf_w) | ||
1551 | |||
1552 | static uint32_t fwnmsac16(uint16_t a, uint16_t b, uint32_t d, float_status *s) | ||
1553 | { | ||
1554 | @@ -XXX,XX +XXX,XX @@ static uint64_t fwnmsac32(uint32_t a, uint32_t b, uint64_t d, float_status *s) | ||
1555 | |||
1556 | RVVCALL(OPFVV3, vfwnmsac_vv_h, WOP_UUU_H, H4, H2, H2, fwnmsac16) | ||
1557 | RVVCALL(OPFVV3, vfwnmsac_vv_w, WOP_UUU_W, H8, H4, H4, fwnmsac32) | ||
1558 | -GEN_VEXT_VV_ENV(vfwnmsac_vv_h, 2, 4) | ||
1559 | -GEN_VEXT_VV_ENV(vfwnmsac_vv_w, 4, 8) | ||
1560 | +GEN_VEXT_VV_ENV(vfwnmsac_vv_h) | ||
1561 | +GEN_VEXT_VV_ENV(vfwnmsac_vv_w) | ||
1562 | RVVCALL(OPFVF3, vfwnmsac_vf_h, WOP_UUU_H, H4, H2, fwnmsac16) | ||
1563 | RVVCALL(OPFVF3, vfwnmsac_vf_w, WOP_UUU_W, H8, H4, fwnmsac32) | ||
1564 | -GEN_VEXT_VF(vfwnmsac_vf_h, 2, 4) | ||
1565 | -GEN_VEXT_VF(vfwnmsac_vf_w, 4, 8) | ||
1566 | +GEN_VEXT_VF(vfwnmsac_vf_h) | ||
1567 | +GEN_VEXT_VF(vfwnmsac_vf_w) | ||
1568 | |||
1569 | /* Vector Floating-Point Square-Root Instruction */ | ||
1570 | /* (TD, T2, TX2) */ | ||
1571 | @@ -XXX,XX +XXX,XX @@ static void do_##NAME(void *vd, void *vs2, int i, \ | ||
1572 | *((TD *)vd + HD(i)) = OP(s2, &env->fp_status); \ | ||
1573 | } | ||
1574 | |||
1575 | -#define GEN_VEXT_V_ENV(NAME, ESZ, DSZ) \ | ||
1576 | +#define GEN_VEXT_V_ENV(NAME) \ | ||
1577 | void HELPER(NAME)(void *vd, void *v0, void *vs2, \ | ||
1578 | CPURISCVState *env, uint32_t desc) \ | ||
1579 | { \ | ||
1580 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, \ | ||
1581 | RVVCALL(OPFVV1, vfsqrt_v_h, OP_UU_H, H2, H2, float16_sqrt) | ||
1582 | RVVCALL(OPFVV1, vfsqrt_v_w, OP_UU_W, H4, H4, float32_sqrt) | ||
1583 | RVVCALL(OPFVV1, vfsqrt_v_d, OP_UU_D, H8, H8, float64_sqrt) | ||
1584 | -GEN_VEXT_V_ENV(vfsqrt_v_h, 2, 2) | ||
1585 | -GEN_VEXT_V_ENV(vfsqrt_v_w, 4, 4) | ||
1586 | -GEN_VEXT_V_ENV(vfsqrt_v_d, 8, 8) | ||
1587 | +GEN_VEXT_V_ENV(vfsqrt_v_h) | ||
1588 | +GEN_VEXT_V_ENV(vfsqrt_v_w) | ||
1589 | +GEN_VEXT_V_ENV(vfsqrt_v_d) | ||
1590 | |||
1591 | /* | ||
1592 | * Vector Floating-Point Reciprocal Square-Root Estimate Instruction | ||
1593 | @@ -XXX,XX +XXX,XX @@ static float64 frsqrt7_d(float64 f, float_status *s) | ||
1594 | RVVCALL(OPFVV1, vfrsqrt7_v_h, OP_UU_H, H2, H2, frsqrt7_h) | ||
1595 | RVVCALL(OPFVV1, vfrsqrt7_v_w, OP_UU_W, H4, H4, frsqrt7_s) | ||
1596 | RVVCALL(OPFVV1, vfrsqrt7_v_d, OP_UU_D, H8, H8, frsqrt7_d) | ||
1597 | -GEN_VEXT_V_ENV(vfrsqrt7_v_h, 2, 2) | ||
1598 | -GEN_VEXT_V_ENV(vfrsqrt7_v_w, 4, 4) | ||
1599 | -GEN_VEXT_V_ENV(vfrsqrt7_v_d, 8, 8) | ||
1600 | +GEN_VEXT_V_ENV(vfrsqrt7_v_h) | ||
1601 | +GEN_VEXT_V_ENV(vfrsqrt7_v_w) | ||
1602 | +GEN_VEXT_V_ENV(vfrsqrt7_v_d) | ||
1603 | |||
1604 | /* | ||
1605 | * Vector Floating-Point Reciprocal Estimate Instruction | ||
1606 | @@ -XXX,XX +XXX,XX @@ static float64 frec7_d(float64 f, float_status *s) | ||
1607 | RVVCALL(OPFVV1, vfrec7_v_h, OP_UU_H, H2, H2, frec7_h) | ||
1608 | RVVCALL(OPFVV1, vfrec7_v_w, OP_UU_W, H4, H4, frec7_s) | ||
1609 | RVVCALL(OPFVV1, vfrec7_v_d, OP_UU_D, H8, H8, frec7_d) | ||
1610 | -GEN_VEXT_V_ENV(vfrec7_v_h, 2, 2) | ||
1611 | -GEN_VEXT_V_ENV(vfrec7_v_w, 4, 4) | ||
1612 | -GEN_VEXT_V_ENV(vfrec7_v_d, 8, 8) | ||
1613 | +GEN_VEXT_V_ENV(vfrec7_v_h) | ||
1614 | +GEN_VEXT_V_ENV(vfrec7_v_w) | ||
1615 | +GEN_VEXT_V_ENV(vfrec7_v_d) | ||
1616 | |||
1617 | /* Vector Floating-Point MIN/MAX Instructions */ | ||
1618 | RVVCALL(OPFVV2, vfmin_vv_h, OP_UUU_H, H2, H2, H2, float16_minimum_number) | ||
1619 | RVVCALL(OPFVV2, vfmin_vv_w, OP_UUU_W, H4, H4, H4, float32_minimum_number) | ||
1620 | RVVCALL(OPFVV2, vfmin_vv_d, OP_UUU_D, H8, H8, H8, float64_minimum_number) | ||
1621 | -GEN_VEXT_VV_ENV(vfmin_vv_h, 2, 2) | ||
1622 | -GEN_VEXT_VV_ENV(vfmin_vv_w, 4, 4) | ||
1623 | -GEN_VEXT_VV_ENV(vfmin_vv_d, 8, 8) | ||
1624 | +GEN_VEXT_VV_ENV(vfmin_vv_h) | ||
1625 | +GEN_VEXT_VV_ENV(vfmin_vv_w) | ||
1626 | +GEN_VEXT_VV_ENV(vfmin_vv_d) | ||
1627 | RVVCALL(OPFVF2, vfmin_vf_h, OP_UUU_H, H2, H2, float16_minimum_number) | ||
1628 | RVVCALL(OPFVF2, vfmin_vf_w, OP_UUU_W, H4, H4, float32_minimum_number) | ||
1629 | RVVCALL(OPFVF2, vfmin_vf_d, OP_UUU_D, H8, H8, float64_minimum_number) | ||
1630 | -GEN_VEXT_VF(vfmin_vf_h, 2, 2) | ||
1631 | -GEN_VEXT_VF(vfmin_vf_w, 4, 4) | ||
1632 | -GEN_VEXT_VF(vfmin_vf_d, 8, 8) | ||
1633 | +GEN_VEXT_VF(vfmin_vf_h) | ||
1634 | +GEN_VEXT_VF(vfmin_vf_w) | ||
1635 | +GEN_VEXT_VF(vfmin_vf_d) | ||
1636 | |||
1637 | RVVCALL(OPFVV2, vfmax_vv_h, OP_UUU_H, H2, H2, H2, float16_maximum_number) | ||
1638 | RVVCALL(OPFVV2, vfmax_vv_w, OP_UUU_W, H4, H4, H4, float32_maximum_number) | ||
1639 | RVVCALL(OPFVV2, vfmax_vv_d, OP_UUU_D, H8, H8, H8, float64_maximum_number) | ||
1640 | -GEN_VEXT_VV_ENV(vfmax_vv_h, 2, 2) | ||
1641 | -GEN_VEXT_VV_ENV(vfmax_vv_w, 4, 4) | ||
1642 | -GEN_VEXT_VV_ENV(vfmax_vv_d, 8, 8) | ||
1643 | +GEN_VEXT_VV_ENV(vfmax_vv_h) | ||
1644 | +GEN_VEXT_VV_ENV(vfmax_vv_w) | ||
1645 | +GEN_VEXT_VV_ENV(vfmax_vv_d) | ||
1646 | RVVCALL(OPFVF2, vfmax_vf_h, OP_UUU_H, H2, H2, float16_maximum_number) | ||
1647 | RVVCALL(OPFVF2, vfmax_vf_w, OP_UUU_W, H4, H4, float32_maximum_number) | ||
1648 | RVVCALL(OPFVF2, vfmax_vf_d, OP_UUU_D, H8, H8, float64_maximum_number) | ||
1649 | -GEN_VEXT_VF(vfmax_vf_h, 2, 2) | ||
1650 | -GEN_VEXT_VF(vfmax_vf_w, 4, 4) | ||
1651 | -GEN_VEXT_VF(vfmax_vf_d, 8, 8) | ||
1652 | +GEN_VEXT_VF(vfmax_vf_h) | ||
1653 | +GEN_VEXT_VF(vfmax_vf_w) | ||
1654 | +GEN_VEXT_VF(vfmax_vf_d) | ||
1655 | |||
1656 | /* Vector Floating-Point Sign-Injection Instructions */ | ||
1657 | static uint16_t fsgnj16(uint16_t a, uint16_t b, float_status *s) | ||
1658 | @@ -XXX,XX +XXX,XX @@ static uint64_t fsgnj64(uint64_t a, uint64_t b, float_status *s) | ||
1659 | RVVCALL(OPFVV2, vfsgnj_vv_h, OP_UUU_H, H2, H2, H2, fsgnj16) | ||
1660 | RVVCALL(OPFVV2, vfsgnj_vv_w, OP_UUU_W, H4, H4, H4, fsgnj32) | ||
1661 | RVVCALL(OPFVV2, vfsgnj_vv_d, OP_UUU_D, H8, H8, H8, fsgnj64) | ||
1662 | -GEN_VEXT_VV_ENV(vfsgnj_vv_h, 2, 2) | ||
1663 | -GEN_VEXT_VV_ENV(vfsgnj_vv_w, 4, 4) | ||
1664 | -GEN_VEXT_VV_ENV(vfsgnj_vv_d, 8, 8) | ||
1665 | +GEN_VEXT_VV_ENV(vfsgnj_vv_h) | ||
1666 | +GEN_VEXT_VV_ENV(vfsgnj_vv_w) | ||
1667 | +GEN_VEXT_VV_ENV(vfsgnj_vv_d) | ||
1668 | RVVCALL(OPFVF2, vfsgnj_vf_h, OP_UUU_H, H2, H2, fsgnj16) | ||
1669 | RVVCALL(OPFVF2, vfsgnj_vf_w, OP_UUU_W, H4, H4, fsgnj32) | ||
1670 | RVVCALL(OPFVF2, vfsgnj_vf_d, OP_UUU_D, H8, H8, fsgnj64) | ||
1671 | -GEN_VEXT_VF(vfsgnj_vf_h, 2, 2) | ||
1672 | -GEN_VEXT_VF(vfsgnj_vf_w, 4, 4) | ||
1673 | -GEN_VEXT_VF(vfsgnj_vf_d, 8, 8) | ||
1674 | +GEN_VEXT_VF(vfsgnj_vf_h) | ||
1675 | +GEN_VEXT_VF(vfsgnj_vf_w) | ||
1676 | +GEN_VEXT_VF(vfsgnj_vf_d) | ||
1677 | |||
1678 | static uint16_t fsgnjn16(uint16_t a, uint16_t b, float_status *s) | ||
1679 | { | ||
1680 | @@ -XXX,XX +XXX,XX @@ static uint64_t fsgnjn64(uint64_t a, uint64_t b, float_status *s) | ||
1681 | RVVCALL(OPFVV2, vfsgnjn_vv_h, OP_UUU_H, H2, H2, H2, fsgnjn16) | ||
1682 | RVVCALL(OPFVV2, vfsgnjn_vv_w, OP_UUU_W, H4, H4, H4, fsgnjn32) | ||
1683 | RVVCALL(OPFVV2, vfsgnjn_vv_d, OP_UUU_D, H8, H8, H8, fsgnjn64) | ||
1684 | -GEN_VEXT_VV_ENV(vfsgnjn_vv_h, 2, 2) | ||
1685 | -GEN_VEXT_VV_ENV(vfsgnjn_vv_w, 4, 4) | ||
1686 | -GEN_VEXT_VV_ENV(vfsgnjn_vv_d, 8, 8) | ||
1687 | +GEN_VEXT_VV_ENV(vfsgnjn_vv_h) | ||
1688 | +GEN_VEXT_VV_ENV(vfsgnjn_vv_w) | ||
1689 | +GEN_VEXT_VV_ENV(vfsgnjn_vv_d) | ||
1690 | RVVCALL(OPFVF2, vfsgnjn_vf_h, OP_UUU_H, H2, H2, fsgnjn16) | ||
1691 | RVVCALL(OPFVF2, vfsgnjn_vf_w, OP_UUU_W, H4, H4, fsgnjn32) | ||
1692 | RVVCALL(OPFVF2, vfsgnjn_vf_d, OP_UUU_D, H8, H8, fsgnjn64) | ||
1693 | -GEN_VEXT_VF(vfsgnjn_vf_h, 2, 2) | ||
1694 | -GEN_VEXT_VF(vfsgnjn_vf_w, 4, 4) | ||
1695 | -GEN_VEXT_VF(vfsgnjn_vf_d, 8, 8) | ||
1696 | +GEN_VEXT_VF(vfsgnjn_vf_h) | ||
1697 | +GEN_VEXT_VF(vfsgnjn_vf_w) | ||
1698 | +GEN_VEXT_VF(vfsgnjn_vf_d) | ||
1699 | |||
1700 | static uint16_t fsgnjx16(uint16_t a, uint16_t b, float_status *s) | ||
1701 | { | ||
1702 | @@ -XXX,XX +XXX,XX @@ static uint64_t fsgnjx64(uint64_t a, uint64_t b, float_status *s) | ||
1703 | RVVCALL(OPFVV2, vfsgnjx_vv_h, OP_UUU_H, H2, H2, H2, fsgnjx16) | ||
1704 | RVVCALL(OPFVV2, vfsgnjx_vv_w, OP_UUU_W, H4, H4, H4, fsgnjx32) | ||
1705 | RVVCALL(OPFVV2, vfsgnjx_vv_d, OP_UUU_D, H8, H8, H8, fsgnjx64) | ||
1706 | -GEN_VEXT_VV_ENV(vfsgnjx_vv_h, 2, 2) | ||
1707 | -GEN_VEXT_VV_ENV(vfsgnjx_vv_w, 4, 4) | ||
1708 | -GEN_VEXT_VV_ENV(vfsgnjx_vv_d, 8, 8) | ||
1709 | +GEN_VEXT_VV_ENV(vfsgnjx_vv_h) | ||
1710 | +GEN_VEXT_VV_ENV(vfsgnjx_vv_w) | ||
1711 | +GEN_VEXT_VV_ENV(vfsgnjx_vv_d) | ||
1712 | RVVCALL(OPFVF2, vfsgnjx_vf_h, OP_UUU_H, H2, H2, fsgnjx16) | ||
1713 | RVVCALL(OPFVF2, vfsgnjx_vf_w, OP_UUU_W, H4, H4, fsgnjx32) | ||
1714 | RVVCALL(OPFVF2, vfsgnjx_vf_d, OP_UUU_D, H8, H8, fsgnjx64) | ||
1715 | -GEN_VEXT_VF(vfsgnjx_vf_h, 2, 2) | ||
1716 | -GEN_VEXT_VF(vfsgnjx_vf_w, 4, 4) | ||
1717 | -GEN_VEXT_VF(vfsgnjx_vf_d, 8, 8) | ||
1718 | +GEN_VEXT_VF(vfsgnjx_vf_h) | ||
1719 | +GEN_VEXT_VF(vfsgnjx_vf_w) | ||
1720 | +GEN_VEXT_VF(vfsgnjx_vf_d) | ||
1721 | |||
1722 | /* Vector Floating-Point Compare Instructions */ | ||
1723 | #define GEN_VEXT_CMP_VV_ENV(NAME, ETYPE, H, DO_OP) \ | ||
1724 | @@ -XXX,XX +XXX,XX @@ static void do_##NAME(void *vd, void *vs2, int i) \ | ||
1725 | *((TD *)vd + HD(i)) = OP(s2); \ | ||
1726 | } | ||
1727 | |||
1728 | -#define GEN_VEXT_V(NAME, ESZ, DSZ) \ | ||
1729 | +#define GEN_VEXT_V(NAME) \ | ||
1730 | void HELPER(NAME)(void *vd, void *v0, void *vs2, \ | ||
1731 | CPURISCVState *env, uint32_t desc) \ | ||
1732 | { \ | ||
1733 | @@ -XXX,XX +XXX,XX @@ target_ulong fclass_d(uint64_t frs1) | ||
1734 | RVVCALL(OPIVV1, vfclass_v_h, OP_UU_H, H2, H2, fclass_h) | ||
1735 | RVVCALL(OPIVV1, vfclass_v_w, OP_UU_W, H4, H4, fclass_s) | ||
1736 | RVVCALL(OPIVV1, vfclass_v_d, OP_UU_D, H8, H8, fclass_d) | ||
1737 | -GEN_VEXT_V(vfclass_v_h, 2, 2) | ||
1738 | -GEN_VEXT_V(vfclass_v_w, 4, 4) | ||
1739 | -GEN_VEXT_V(vfclass_v_d, 8, 8) | ||
1740 | +GEN_VEXT_V(vfclass_v_h) | ||
1741 | +GEN_VEXT_V(vfclass_v_w) | ||
1742 | +GEN_VEXT_V(vfclass_v_d) | ||
1743 | |||
1744 | /* Vector Floating-Point Merge Instruction */ | ||
1745 | #define GEN_VFMERGE_VF(NAME, ETYPE, H) \ | ||
1746 | @@ -XXX,XX +XXX,XX @@ GEN_VFMERGE_VF(vfmerge_vfm_d, int64_t, H8) | ||
1747 | RVVCALL(OPFVV1, vfcvt_xu_f_v_h, OP_UU_H, H2, H2, float16_to_uint16) | ||
1748 | RVVCALL(OPFVV1, vfcvt_xu_f_v_w, OP_UU_W, H4, H4, float32_to_uint32) | ||
1749 | RVVCALL(OPFVV1, vfcvt_xu_f_v_d, OP_UU_D, H8, H8, float64_to_uint64) | ||
1750 | -GEN_VEXT_V_ENV(vfcvt_xu_f_v_h, 2, 2) | ||
1751 | -GEN_VEXT_V_ENV(vfcvt_xu_f_v_w, 4, 4) | ||
1752 | -GEN_VEXT_V_ENV(vfcvt_xu_f_v_d, 8, 8) | ||
1753 | +GEN_VEXT_V_ENV(vfcvt_xu_f_v_h) | ||
1754 | +GEN_VEXT_V_ENV(vfcvt_xu_f_v_w) | ||
1755 | +GEN_VEXT_V_ENV(vfcvt_xu_f_v_d) | ||
1756 | |||
1757 | /* vfcvt.x.f.v vd, vs2, vm # Convert float to signed integer. */ | ||
1758 | RVVCALL(OPFVV1, vfcvt_x_f_v_h, OP_UU_H, H2, H2, float16_to_int16) | ||
1759 | RVVCALL(OPFVV1, vfcvt_x_f_v_w, OP_UU_W, H4, H4, float32_to_int32) | ||
1760 | RVVCALL(OPFVV1, vfcvt_x_f_v_d, OP_UU_D, H8, H8, float64_to_int64) | ||
1761 | -GEN_VEXT_V_ENV(vfcvt_x_f_v_h, 2, 2) | ||
1762 | -GEN_VEXT_V_ENV(vfcvt_x_f_v_w, 4, 4) | ||
1763 | -GEN_VEXT_V_ENV(vfcvt_x_f_v_d, 8, 8) | ||
1764 | +GEN_VEXT_V_ENV(vfcvt_x_f_v_h) | ||
1765 | +GEN_VEXT_V_ENV(vfcvt_x_f_v_w) | ||
1766 | +GEN_VEXT_V_ENV(vfcvt_x_f_v_d) | ||
1767 | |||
1768 | /* vfcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to float. */ | ||
1769 | RVVCALL(OPFVV1, vfcvt_f_xu_v_h, OP_UU_H, H2, H2, uint16_to_float16) | ||
1770 | RVVCALL(OPFVV1, vfcvt_f_xu_v_w, OP_UU_W, H4, H4, uint32_to_float32) | ||
1771 | RVVCALL(OPFVV1, vfcvt_f_xu_v_d, OP_UU_D, H8, H8, uint64_to_float64) | ||
1772 | -GEN_VEXT_V_ENV(vfcvt_f_xu_v_h, 2, 2) | ||
1773 | -GEN_VEXT_V_ENV(vfcvt_f_xu_v_w, 4, 4) | ||
1774 | -GEN_VEXT_V_ENV(vfcvt_f_xu_v_d, 8, 8) | ||
1775 | +GEN_VEXT_V_ENV(vfcvt_f_xu_v_h) | ||
1776 | +GEN_VEXT_V_ENV(vfcvt_f_xu_v_w) | ||
1777 | +GEN_VEXT_V_ENV(vfcvt_f_xu_v_d) | ||
1778 | |||
1779 | /* vfcvt.f.x.v vd, vs2, vm # Convert integer to float. */ | ||
1780 | RVVCALL(OPFVV1, vfcvt_f_x_v_h, OP_UU_H, H2, H2, int16_to_float16) | ||
1781 | RVVCALL(OPFVV1, vfcvt_f_x_v_w, OP_UU_W, H4, H4, int32_to_float32) | ||
1782 | RVVCALL(OPFVV1, vfcvt_f_x_v_d, OP_UU_D, H8, H8, int64_to_float64) | ||
1783 | -GEN_VEXT_V_ENV(vfcvt_f_x_v_h, 2, 2) | ||
1784 | -GEN_VEXT_V_ENV(vfcvt_f_x_v_w, 4, 4) | ||
1785 | -GEN_VEXT_V_ENV(vfcvt_f_x_v_d, 8, 8) | ||
1786 | +GEN_VEXT_V_ENV(vfcvt_f_x_v_h) | ||
1787 | +GEN_VEXT_V_ENV(vfcvt_f_x_v_w) | ||
1788 | +GEN_VEXT_V_ENV(vfcvt_f_x_v_d) | ||
1789 | |||
1790 | /* Widening Floating-Point/Integer Type-Convert Instructions */ | ||
1791 | /* (TD, T2, TX2) */ | ||
1792 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_V_ENV(vfcvt_f_x_v_d, 8, 8) | ||
1793 | /* vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer.*/ | ||
1794 | RVVCALL(OPFVV1, vfwcvt_xu_f_v_h, WOP_UU_H, H4, H2, float16_to_uint32) | ||
1795 | RVVCALL(OPFVV1, vfwcvt_xu_f_v_w, WOP_UU_W, H8, H4, float32_to_uint64) | ||
1796 | -GEN_VEXT_V_ENV(vfwcvt_xu_f_v_h, 2, 4) | ||
1797 | -GEN_VEXT_V_ENV(vfwcvt_xu_f_v_w, 4, 8) | ||
1798 | +GEN_VEXT_V_ENV(vfwcvt_xu_f_v_h) | ||
1799 | +GEN_VEXT_V_ENV(vfwcvt_xu_f_v_w) | ||
1800 | |||
1801 | /* vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. */ | ||
1802 | RVVCALL(OPFVV1, vfwcvt_x_f_v_h, WOP_UU_H, H4, H2, float16_to_int32) | ||
1803 | RVVCALL(OPFVV1, vfwcvt_x_f_v_w, WOP_UU_W, H8, H4, float32_to_int64) | ||
1804 | -GEN_VEXT_V_ENV(vfwcvt_x_f_v_h, 2, 4) | ||
1805 | -GEN_VEXT_V_ENV(vfwcvt_x_f_v_w, 4, 8) | ||
1806 | +GEN_VEXT_V_ENV(vfwcvt_x_f_v_h) | ||
1807 | +GEN_VEXT_V_ENV(vfwcvt_x_f_v_w) | ||
1808 | |||
1809 | /* vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float */ | ||
1810 | RVVCALL(OPFVV1, vfwcvt_f_xu_v_b, WOP_UU_B, H2, H1, uint8_to_float16) | ||
1811 | RVVCALL(OPFVV1, vfwcvt_f_xu_v_h, WOP_UU_H, H4, H2, uint16_to_float32) | ||
1812 | RVVCALL(OPFVV1, vfwcvt_f_xu_v_w, WOP_UU_W, H8, H4, uint32_to_float64) | ||
1813 | -GEN_VEXT_V_ENV(vfwcvt_f_xu_v_b, 1, 2) | ||
1814 | -GEN_VEXT_V_ENV(vfwcvt_f_xu_v_h, 2, 4) | ||
1815 | -GEN_VEXT_V_ENV(vfwcvt_f_xu_v_w, 4, 8) | ||
1816 | +GEN_VEXT_V_ENV(vfwcvt_f_xu_v_b) | ||
1817 | +GEN_VEXT_V_ENV(vfwcvt_f_xu_v_h) | ||
1818 | +GEN_VEXT_V_ENV(vfwcvt_f_xu_v_w) | ||
1819 | |||
1820 | /* vfwcvt.f.x.v vd, vs2, vm # Convert integer to double-width float. */ | ||
1821 | RVVCALL(OPFVV1, vfwcvt_f_x_v_b, WOP_UU_B, H2, H1, int8_to_float16) | ||
1822 | RVVCALL(OPFVV1, vfwcvt_f_x_v_h, WOP_UU_H, H4, H2, int16_to_float32) | ||
1823 | RVVCALL(OPFVV1, vfwcvt_f_x_v_w, WOP_UU_W, H8, H4, int32_to_float64) | ||
1824 | -GEN_VEXT_V_ENV(vfwcvt_f_x_v_b, 1, 2) | ||
1825 | -GEN_VEXT_V_ENV(vfwcvt_f_x_v_h, 2, 4) | ||
1826 | -GEN_VEXT_V_ENV(vfwcvt_f_x_v_w, 4, 8) | ||
1827 | +GEN_VEXT_V_ENV(vfwcvt_f_x_v_b) | ||
1828 | +GEN_VEXT_V_ENV(vfwcvt_f_x_v_h) | ||
1829 | +GEN_VEXT_V_ENV(vfwcvt_f_x_v_w) | ||
1830 | |||
1831 | /* | ||
1832 | * vfwcvt.f.f.v vd, vs2, vm | ||
1833 | @@ -XXX,XX +XXX,XX @@ static uint32_t vfwcvtffv16(uint16_t a, float_status *s) | ||
1834 | |||
1835 | RVVCALL(OPFVV1, vfwcvt_f_f_v_h, WOP_UU_H, H4, H2, vfwcvtffv16) | ||
1836 | RVVCALL(OPFVV1, vfwcvt_f_f_v_w, WOP_UU_W, H8, H4, float32_to_float64) | ||
1837 | -GEN_VEXT_V_ENV(vfwcvt_f_f_v_h, 2, 4) | ||
1838 | -GEN_VEXT_V_ENV(vfwcvt_f_f_v_w, 4, 8) | ||
1839 | +GEN_VEXT_V_ENV(vfwcvt_f_f_v_h) | ||
1840 | +GEN_VEXT_V_ENV(vfwcvt_f_f_v_w) | ||
1841 | |||
1842 | /* Narrowing Floating-Point/Integer Type-Convert Instructions */ | ||
1843 | /* (TD, T2, TX2) */ | ||
1844 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_V_ENV(vfwcvt_f_f_v_w, 4, 8) | ||
1845 | RVVCALL(OPFVV1, vfncvt_xu_f_w_b, NOP_UU_B, H1, H2, float16_to_uint8) | ||
1846 | RVVCALL(OPFVV1, vfncvt_xu_f_w_h, NOP_UU_H, H2, H4, float32_to_uint16) | ||
1847 | RVVCALL(OPFVV1, vfncvt_xu_f_w_w, NOP_UU_W, H4, H8, float64_to_uint32) | ||
1848 | -GEN_VEXT_V_ENV(vfncvt_xu_f_w_b, 1, 1) | ||
1849 | -GEN_VEXT_V_ENV(vfncvt_xu_f_w_h, 2, 2) | ||
1850 | -GEN_VEXT_V_ENV(vfncvt_xu_f_w_w, 4, 4) | ||
1851 | +GEN_VEXT_V_ENV(vfncvt_xu_f_w_b) | ||
1852 | +GEN_VEXT_V_ENV(vfncvt_xu_f_w_h) | ||
1853 | +GEN_VEXT_V_ENV(vfncvt_xu_f_w_w) | ||
1854 | |||
1855 | /* vfncvt.x.f.v vd, vs2, vm # Convert double-width float to signed integer. */ | ||
1856 | RVVCALL(OPFVV1, vfncvt_x_f_w_b, NOP_UU_B, H1, H2, float16_to_int8) | ||
1857 | RVVCALL(OPFVV1, vfncvt_x_f_w_h, NOP_UU_H, H2, H4, float32_to_int16) | ||
1858 | RVVCALL(OPFVV1, vfncvt_x_f_w_w, NOP_UU_W, H4, H8, float64_to_int32) | ||
1859 | -GEN_VEXT_V_ENV(vfncvt_x_f_w_b, 1, 1) | ||
1860 | -GEN_VEXT_V_ENV(vfncvt_x_f_w_h, 2, 2) | ||
1861 | -GEN_VEXT_V_ENV(vfncvt_x_f_w_w, 4, 4) | ||
1862 | +GEN_VEXT_V_ENV(vfncvt_x_f_w_b) | ||
1863 | +GEN_VEXT_V_ENV(vfncvt_x_f_w_h) | ||
1864 | +GEN_VEXT_V_ENV(vfncvt_x_f_w_w) | ||
1865 | |||
1866 | /* vfncvt.f.xu.v vd, vs2, vm # Convert double-width unsigned integer to float */ | ||
1867 | RVVCALL(OPFVV1, vfncvt_f_xu_w_h, NOP_UU_H, H2, H4, uint32_to_float16) | ||
1868 | RVVCALL(OPFVV1, vfncvt_f_xu_w_w, NOP_UU_W, H4, H8, uint64_to_float32) | ||
1869 | -GEN_VEXT_V_ENV(vfncvt_f_xu_w_h, 2, 2) | ||
1870 | -GEN_VEXT_V_ENV(vfncvt_f_xu_w_w, 4, 4) | ||
1871 | +GEN_VEXT_V_ENV(vfncvt_f_xu_w_h) | ||
1872 | +GEN_VEXT_V_ENV(vfncvt_f_xu_w_w) | ||
1873 | |||
1874 | /* vfncvt.f.x.v vd, vs2, vm # Convert double-width integer to float. */ | ||
1875 | RVVCALL(OPFVV1, vfncvt_f_x_w_h, NOP_UU_H, H2, H4, int32_to_float16) | ||
1876 | RVVCALL(OPFVV1, vfncvt_f_x_w_w, NOP_UU_W, H4, H8, int64_to_float32) | ||
1877 | -GEN_VEXT_V_ENV(vfncvt_f_x_w_h, 2, 2) | ||
1878 | -GEN_VEXT_V_ENV(vfncvt_f_x_w_w, 4, 4) | ||
1879 | +GEN_VEXT_V_ENV(vfncvt_f_x_w_h) | ||
1880 | +GEN_VEXT_V_ENV(vfncvt_f_x_w_w) | ||
1881 | |||
1882 | /* vfncvt.f.f.v vd, vs2, vm # Convert double float to single-width float. */ | ||
1883 | static uint16_t vfncvtffv16(uint32_t a, float_status *s) | ||
1884 | @@ -XXX,XX +XXX,XX @@ static uint16_t vfncvtffv16(uint32_t a, float_status *s) | ||
1885 | |||
1886 | RVVCALL(OPFVV1, vfncvt_f_f_w_h, NOP_UU_H, H2, H4, vfncvtffv16) | ||
1887 | RVVCALL(OPFVV1, vfncvt_f_f_w_w, NOP_UU_W, H4, H8, float64_to_float32) | ||
1888 | -GEN_VEXT_V_ENV(vfncvt_f_f_w_h, 2, 2) | ||
1889 | -GEN_VEXT_V_ENV(vfncvt_f_f_w_w, 4, 4) | ||
1890 | +GEN_VEXT_V_ENV(vfncvt_f_f_w_h) | ||
1891 | +GEN_VEXT_V_ENV(vfncvt_f_f_w_w) | ||
1892 | |||
1893 | /* | ||
1894 | *** Vector Reduction Operations | ||
1895 | -- | 97 | -- |
1896 | 2.36.1 | 98 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk> |
---|---|---|---|
2 | 2 | ||
3 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 3 | Move some macros out of `vector_helper` and into `vector_internals`. |
4 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | 4 | This ensures they can be used by both vector and vector-crypto helpers |
5 | (latter implemented in proceeding commits). | ||
6 | |||
7 | Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk> | ||
5 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 8 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> |
6 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | 9 | Signed-off-by: Max Chou <max.chou@sifive.com> |
7 | Message-Id: <165449614532.19704.7000832880482980398-13@git.sr.ht> | 10 | Message-ID: <20230711165917.2629866-8-max.chou@sifive.com> |
8 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 11 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
9 | --- | 12 | --- |
10 | target/riscv/vector_helper.c | 20 ++++++++++++++++++++ | 13 | target/riscv/vector_internals.h | 46 +++++++++++++++++++++++++++++++++ |
11 | 1 file changed, 20 insertions(+) | 14 | target/riscv/vector_helper.c | 42 ------------------------------ |
15 | 2 files changed, 46 insertions(+), 42 deletions(-) | ||
12 | 16 | ||
17 | diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h | ||
18 | index XXXXXXX..XXXXXXX 100644 | ||
19 | --- a/target/riscv/vector_internals.h | ||
20 | +++ b/target/riscv/vector_internals.h | ||
21 | @@ -XXX,XX +XXX,XX @@ void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt, | ||
22 | /* expand macro args before macro */ | ||
23 | #define RVVCALL(macro, ...) macro(__VA_ARGS__) | ||
24 | |||
25 | +/* (TD, T2, TX2) */ | ||
26 | +#define OP_UU_B uint8_t, uint8_t, uint8_t | ||
27 | +#define OP_UU_H uint16_t, uint16_t, uint16_t | ||
28 | +#define OP_UU_W uint32_t, uint32_t, uint32_t | ||
29 | +#define OP_UU_D uint64_t, uint64_t, uint64_t | ||
30 | + | ||
31 | /* (TD, T1, T2, TX1, TX2) */ | ||
32 | #define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t | ||
33 | #define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t | ||
34 | #define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t | ||
35 | #define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t | ||
36 | |||
37 | +#define OPIVV1(NAME, TD, T2, TX2, HD, HS2, OP) \ | ||
38 | +static void do_##NAME(void *vd, void *vs2, int i) \ | ||
39 | +{ \ | ||
40 | + TX2 s2 = *((T2 *)vs2 + HS2(i)); \ | ||
41 | + *((TD *)vd + HD(i)) = OP(s2); \ | ||
42 | +} | ||
43 | + | ||
44 | +#define GEN_VEXT_V(NAME, ESZ) \ | ||
45 | +void HELPER(NAME)(void *vd, void *v0, void *vs2, \ | ||
46 | + CPURISCVState *env, uint32_t desc) \ | ||
47 | +{ \ | ||
48 | + uint32_t vm = vext_vm(desc); \ | ||
49 | + uint32_t vl = env->vl; \ | ||
50 | + uint32_t total_elems = \ | ||
51 | + vext_get_total_elems(env, desc, ESZ); \ | ||
52 | + uint32_t vta = vext_vta(desc); \ | ||
53 | + uint32_t vma = vext_vma(desc); \ | ||
54 | + uint32_t i; \ | ||
55 | + \ | ||
56 | + for (i = env->vstart; i < vl; i++) { \ | ||
57 | + if (!vm && !vext_elem_mask(v0, i)) { \ | ||
58 | + /* set masked-off elements to 1s */ \ | ||
59 | + vext_set_elems_1s(vd, vma, i * ESZ, \ | ||
60 | + (i + 1) * ESZ); \ | ||
61 | + continue; \ | ||
62 | + } \ | ||
63 | + do_##NAME(vd, vs2, i); \ | ||
64 | + } \ | ||
65 | + env->vstart = 0; \ | ||
66 | + /* set tail elements to 1s */ \ | ||
67 | + vext_set_elems_1s(vd, vta, vl * ESZ, \ | ||
68 | + total_elems * ESZ); \ | ||
69 | +} | ||
70 | + | ||
71 | /* operation of two vector elements */ | ||
72 | typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i); | ||
73 | |||
74 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | ||
75 | do_##NAME, ESZ); \ | ||
76 | } | ||
77 | |||
78 | +/* Three of the widening shortening macros: */ | ||
79 | +/* (TD, T1, T2, TX1, TX2) */ | ||
80 | +#define WOP_UUU_B uint16_t, uint8_t, uint8_t, uint16_t, uint16_t | ||
81 | +#define WOP_UUU_H uint32_t, uint16_t, uint16_t, uint32_t, uint32_t | ||
82 | +#define WOP_UUU_W uint64_t, uint32_t, uint32_t, uint64_t, uint64_t | ||
83 | + | ||
84 | #endif /* TARGET_RISCV_VECTOR_INTERNALS_H */ | ||
13 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 85 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c |
14 | index XXXXXXX..XXXXXXX 100644 | 86 | index XXXXXXX..XXXXXXX 100644 |
15 | --- a/target/riscv/vector_helper.c | 87 | --- a/target/riscv/vector_helper.c |
16 | +++ b/target/riscv/vector_helper.c | 88 | +++ b/target/riscv/vector_helper.c |
17 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | 89 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b) |
18 | { \ | 90 | #define OP_SUS_H int16_t, uint16_t, int16_t, uint16_t, int16_t |
19 | uint32_t vm = vext_vm(desc); \ | 91 | #define OP_SUS_W int32_t, uint32_t, int32_t, uint32_t, int32_t |
20 | uint32_t vl = env->vl; \ | 92 | #define OP_SUS_D int64_t, uint64_t, int64_t, uint64_t, int64_t |
21 | + uint32_t esz = sizeof(TD); \ | 93 | -#define WOP_UUU_B uint16_t, uint8_t, uint8_t, uint16_t, uint16_t |
22 | + uint32_t vlenb = simd_maxsz(desc); \ | 94 | -#define WOP_UUU_H uint32_t, uint16_t, uint16_t, uint32_t, uint32_t |
23 | + uint32_t vta = vext_vta(desc); \ | 95 | -#define WOP_UUU_W uint64_t, uint32_t, uint32_t, uint64_t, uint64_t |
24 | uint32_t i; \ | 96 | #define WOP_SSS_B int16_t, int8_t, int8_t, int16_t, int16_t |
25 | TD s1 = *((TD *)vs1 + HD(0)); \ | 97 | #define WOP_SSS_H int32_t, int16_t, int16_t, int32_t, int32_t |
26 | \ | 98 | #define WOP_SSS_W int64_t, int32_t, int32_t, int64_t, int64_t |
27 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | 99 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_VF(vfwnmsac_vf_h, 4) |
28 | } \ | 100 | GEN_VEXT_VF(vfwnmsac_vf_w, 8) |
29 | *((TD *)vd + HD(0)) = s1; \ | 101 | |
30 | env->vstart = 0; \ | 102 | /* Vector Floating-Point Square-Root Instruction */ |
31 | + /* set tail elements to 1s */ \ | 103 | -/* (TD, T2, TX2) */ |
32 | + vext_set_elems_1s(vd, vta, esz, vlenb); \ | 104 | -#define OP_UU_H uint16_t, uint16_t, uint16_t |
33 | } | 105 | -#define OP_UU_W uint32_t, uint32_t, uint32_t |
34 | 106 | -#define OP_UU_D uint64_t, uint64_t, uint64_t | |
35 | /* vd[0] = sum(vs1[0], vs2[*]) */ | 107 | - |
36 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | 108 | #define OPFVV1(NAME, TD, T2, TX2, HD, HS2, OP) \ |
37 | { \ | 109 | static void do_##NAME(void *vd, void *vs2, int i, \ |
38 | uint32_t vm = vext_vm(desc); \ | 110 | CPURISCVState *env) \ |
39 | uint32_t vl = env->vl; \ | 111 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_CMP_VF(vmfge_vf_w, uint32_t, H4, vmfge32) |
40 | + uint32_t esz = sizeof(TD); \ | 112 | GEN_VEXT_CMP_VF(vmfge_vf_d, uint64_t, H8, vmfge64) |
41 | + uint32_t vlenb = simd_maxsz(desc); \ | 113 | |
42 | + uint32_t vta = vext_vta(desc); \ | 114 | /* Vector Floating-Point Classify Instruction */ |
43 | uint32_t i; \ | 115 | -#define OPIVV1(NAME, TD, T2, TX2, HD, HS2, OP) \ |
44 | TD s1 = *((TD *)vs1 + HD(0)); \ | 116 | -static void do_##NAME(void *vd, void *vs2, int i) \ |
45 | \ | 117 | -{ \ |
46 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | 118 | - TX2 s2 = *((T2 *)vs2 + HS2(i)); \ |
47 | } \ | 119 | - *((TD *)vd + HD(i)) = OP(s2); \ |
48 | *((TD *)vd + HD(0)) = s1; \ | 120 | -} |
49 | env->vstart = 0; \ | 121 | - |
50 | + /* set tail elements to 1s */ \ | 122 | -#define GEN_VEXT_V(NAME, ESZ) \ |
51 | + vext_set_elems_1s(vd, vta, esz, vlenb); \ | 123 | -void HELPER(NAME)(void *vd, void *v0, void *vs2, \ |
52 | } | 124 | - CPURISCVState *env, uint32_t desc) \ |
53 | 125 | -{ \ | |
54 | /* Unordered sum */ | 126 | - uint32_t vm = vext_vm(desc); \ |
55 | @@ -XXX,XX +XXX,XX @@ void HELPER(vfwredsum_vs_h)(void *vd, void *v0, void *vs1, | 127 | - uint32_t vl = env->vl; \ |
128 | - uint32_t total_elems = \ | ||
129 | - vext_get_total_elems(env, desc, ESZ); \ | ||
130 | - uint32_t vta = vext_vta(desc); \ | ||
131 | - uint32_t vma = vext_vma(desc); \ | ||
132 | - uint32_t i; \ | ||
133 | - \ | ||
134 | - for (i = env->vstart; i < vl; i++) { \ | ||
135 | - if (!vm && !vext_elem_mask(v0, i)) { \ | ||
136 | - /* set masked-off elements to 1s */ \ | ||
137 | - vext_set_elems_1s(vd, vma, i * ESZ, \ | ||
138 | - (i + 1) * ESZ); \ | ||
139 | - continue; \ | ||
140 | - } \ | ||
141 | - do_##NAME(vd, vs2, i); \ | ||
142 | - } \ | ||
143 | - env->vstart = 0; \ | ||
144 | - /* set tail elements to 1s */ \ | ||
145 | - vext_set_elems_1s(vd, vta, vl * ESZ, \ | ||
146 | - total_elems * ESZ); \ | ||
147 | -} | ||
148 | - | ||
149 | target_ulong fclass_h(uint64_t frs1) | ||
56 | { | 150 | { |
57 | uint32_t vm = vext_vm(desc); | 151 | float16 f = frs1; |
58 | uint32_t vl = env->vl; | ||
59 | + uint32_t esz = sizeof(uint32_t); | ||
60 | + uint32_t vlenb = simd_maxsz(desc); | ||
61 | + uint32_t vta = vext_vta(desc); | ||
62 | uint32_t i; | ||
63 | uint32_t s1 = *((uint32_t *)vs1 + H4(0)); | ||
64 | |||
65 | @@ -XXX,XX +XXX,XX @@ void HELPER(vfwredsum_vs_h)(void *vd, void *v0, void *vs1, | ||
66 | } | ||
67 | *((uint32_t *)vd + H4(0)) = s1; | ||
68 | env->vstart = 0; | ||
69 | + /* set tail elements to 1s */ | ||
70 | + vext_set_elems_1s(vd, vta, esz, vlenb); | ||
71 | } | ||
72 | |||
73 | void HELPER(vfwredsum_vs_w)(void *vd, void *v0, void *vs1, | ||
74 | @@ -XXX,XX +XXX,XX @@ void HELPER(vfwredsum_vs_w)(void *vd, void *v0, void *vs1, | ||
75 | { | ||
76 | uint32_t vm = vext_vm(desc); | ||
77 | uint32_t vl = env->vl; | ||
78 | + uint32_t esz = sizeof(uint64_t); | ||
79 | + uint32_t vlenb = simd_maxsz(desc); | ||
80 | + uint32_t vta = vext_vta(desc); | ||
81 | uint32_t i; | ||
82 | uint64_t s1 = *((uint64_t *)vs1); | ||
83 | |||
84 | @@ -XXX,XX +XXX,XX @@ void HELPER(vfwredsum_vs_w)(void *vd, void *v0, void *vs1, | ||
85 | } | ||
86 | *((uint64_t *)vd) = s1; | ||
87 | env->vstart = 0; | ||
88 | + /* set tail elements to 1s */ | ||
89 | + vext_set_elems_1s(vd, vta, esz, vlenb); | ||
90 | } | ||
91 | |||
92 | /* | ||
93 | -- | 152 | -- |
94 | 2.36.1 | 153 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Dickon Hood <dickon.hood@codethink.co.uk> | ||
1 | 2 | ||
3 | This commit adds support for the Zvbb vector-crypto extension, which | ||
4 | consists of the following instructions: | ||
5 | |||
6 | * vrol.[vv,vx] | ||
7 | * vror.[vv,vx,vi] | ||
8 | * vbrev8.v | ||
9 | * vrev8.v | ||
10 | * vandn.[vv,vx] | ||
11 | * vbrev.v | ||
12 | * vclz.v | ||
13 | * vctz.v | ||
14 | * vcpop.v | ||
15 | * vwsll.[vv,vx,vi] | ||
16 | |||
17 | Translation functions are defined in | ||
18 | `target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in | ||
19 | `target/riscv/vcrypto_helper.c`. | ||
20 | |||
21 | Co-authored-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk> | ||
22 | Co-authored-by: William Salmon <will.salmon@codethink.co.uk> | ||
23 | Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk> | ||
24 | [max.chou@sifive.com: Fix imm mode of vror.vi] | ||
25 | Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk> | ||
26 | Signed-off-by: William Salmon <will.salmon@codethink.co.uk> | ||
27 | Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk> | ||
28 | Signed-off-by: Dickon Hood <dickon.hood@codethink.co.uk> | ||
29 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
30 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
31 | [max.chou@sifive.com: Exposed x-zvbb property] | ||
32 | Message-ID: <20230711165917.2629866-9-max.chou@sifive.com> | ||
33 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
34 | --- | ||
35 | target/riscv/cpu_cfg.h | 1 + | ||
36 | target/riscv/helper.h | 62 +++++++++ | ||
37 | target/riscv/insn32.decode | 20 +++ | ||
38 | target/riscv/cpu.c | 12 ++ | ||
39 | target/riscv/vcrypto_helper.c | 138 +++++++++++++++++++ | ||
40 | target/riscv/insn_trans/trans_rvvk.c.inc | 164 +++++++++++++++++++++++ | ||
41 | 6 files changed, 397 insertions(+) | ||
42 | |||
43 | diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h | ||
44 | index XXXXXXX..XXXXXXX 100644 | ||
45 | --- a/target/riscv/cpu_cfg.h | ||
46 | +++ b/target/riscv/cpu_cfg.h | ||
47 | @@ -XXX,XX +XXX,XX @@ struct RISCVCPUConfig { | ||
48 | bool ext_zve32f; | ||
49 | bool ext_zve64f; | ||
50 | bool ext_zve64d; | ||
51 | + bool ext_zvbb; | ||
52 | bool ext_zvbc; | ||
53 | bool ext_zmmul; | ||
54 | bool ext_zvfbfmin; | ||
55 | diff --git a/target/riscv/helper.h b/target/riscv/helper.h | ||
56 | index XXXXXXX..XXXXXXX 100644 | ||
57 | --- a/target/riscv/helper.h | ||
58 | +++ b/target/riscv/helper.h | ||
59 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32) | ||
60 | DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32) | ||
61 | DEF_HELPER_6(vclmulh_vv, void, ptr, ptr, ptr, ptr, env, i32) | ||
62 | DEF_HELPER_6(vclmulh_vx, void, ptr, ptr, tl, ptr, env, i32) | ||
63 | + | ||
64 | +DEF_HELPER_6(vror_vv_b, void, ptr, ptr, ptr, ptr, env, i32) | ||
65 | +DEF_HELPER_6(vror_vv_h, void, ptr, ptr, ptr, ptr, env, i32) | ||
66 | +DEF_HELPER_6(vror_vv_w, void, ptr, ptr, ptr, ptr, env, i32) | ||
67 | +DEF_HELPER_6(vror_vv_d, void, ptr, ptr, ptr, ptr, env, i32) | ||
68 | + | ||
69 | +DEF_HELPER_6(vror_vx_b, void, ptr, ptr, tl, ptr, env, i32) | ||
70 | +DEF_HELPER_6(vror_vx_h, void, ptr, ptr, tl, ptr, env, i32) | ||
71 | +DEF_HELPER_6(vror_vx_w, void, ptr, ptr, tl, ptr, env, i32) | ||
72 | +DEF_HELPER_6(vror_vx_d, void, ptr, ptr, tl, ptr, env, i32) | ||
73 | + | ||
74 | +DEF_HELPER_6(vrol_vv_b, void, ptr, ptr, ptr, ptr, env, i32) | ||
75 | +DEF_HELPER_6(vrol_vv_h, void, ptr, ptr, ptr, ptr, env, i32) | ||
76 | +DEF_HELPER_6(vrol_vv_w, void, ptr, ptr, ptr, ptr, env, i32) | ||
77 | +DEF_HELPER_6(vrol_vv_d, void, ptr, ptr, ptr, ptr, env, i32) | ||
78 | + | ||
79 | +DEF_HELPER_6(vrol_vx_b, void, ptr, ptr, tl, ptr, env, i32) | ||
80 | +DEF_HELPER_6(vrol_vx_h, void, ptr, ptr, tl, ptr, env, i32) | ||
81 | +DEF_HELPER_6(vrol_vx_w, void, ptr, ptr, tl, ptr, env, i32) | ||
82 | +DEF_HELPER_6(vrol_vx_d, void, ptr, ptr, tl, ptr, env, i32) | ||
83 | + | ||
84 | +DEF_HELPER_5(vrev8_v_b, void, ptr, ptr, ptr, env, i32) | ||
85 | +DEF_HELPER_5(vrev8_v_h, void, ptr, ptr, ptr, env, i32) | ||
86 | +DEF_HELPER_5(vrev8_v_w, void, ptr, ptr, ptr, env, i32) | ||
87 | +DEF_HELPER_5(vrev8_v_d, void, ptr, ptr, ptr, env, i32) | ||
88 | +DEF_HELPER_5(vbrev8_v_b, void, ptr, ptr, ptr, env, i32) | ||
89 | +DEF_HELPER_5(vbrev8_v_h, void, ptr, ptr, ptr, env, i32) | ||
90 | +DEF_HELPER_5(vbrev8_v_w, void, ptr, ptr, ptr, env, i32) | ||
91 | +DEF_HELPER_5(vbrev8_v_d, void, ptr, ptr, ptr, env, i32) | ||
92 | +DEF_HELPER_5(vbrev_v_b, void, ptr, ptr, ptr, env, i32) | ||
93 | +DEF_HELPER_5(vbrev_v_h, void, ptr, ptr, ptr, env, i32) | ||
94 | +DEF_HELPER_5(vbrev_v_w, void, ptr, ptr, ptr, env, i32) | ||
95 | +DEF_HELPER_5(vbrev_v_d, void, ptr, ptr, ptr, env, i32) | ||
96 | + | ||
97 | +DEF_HELPER_5(vclz_v_b, void, ptr, ptr, ptr, env, i32) | ||
98 | +DEF_HELPER_5(vclz_v_h, void, ptr, ptr, ptr, env, i32) | ||
99 | +DEF_HELPER_5(vclz_v_w, void, ptr, ptr, ptr, env, i32) | ||
100 | +DEF_HELPER_5(vclz_v_d, void, ptr, ptr, ptr, env, i32) | ||
101 | +DEF_HELPER_5(vctz_v_b, void, ptr, ptr, ptr, env, i32) | ||
102 | +DEF_HELPER_5(vctz_v_h, void, ptr, ptr, ptr, env, i32) | ||
103 | +DEF_HELPER_5(vctz_v_w, void, ptr, ptr, ptr, env, i32) | ||
104 | +DEF_HELPER_5(vctz_v_d, void, ptr, ptr, ptr, env, i32) | ||
105 | +DEF_HELPER_5(vcpop_v_b, void, ptr, ptr, ptr, env, i32) | ||
106 | +DEF_HELPER_5(vcpop_v_h, void, ptr, ptr, ptr, env, i32) | ||
107 | +DEF_HELPER_5(vcpop_v_w, void, ptr, ptr, ptr, env, i32) | ||
108 | +DEF_HELPER_5(vcpop_v_d, void, ptr, ptr, ptr, env, i32) | ||
109 | + | ||
110 | +DEF_HELPER_6(vwsll_vv_b, void, ptr, ptr, ptr, ptr, env, i32) | ||
111 | +DEF_HELPER_6(vwsll_vv_h, void, ptr, ptr, ptr, ptr, env, i32) | ||
112 | +DEF_HELPER_6(vwsll_vv_w, void, ptr, ptr, ptr, ptr, env, i32) | ||
113 | +DEF_HELPER_6(vwsll_vx_b, void, ptr, ptr, tl, ptr, env, i32) | ||
114 | +DEF_HELPER_6(vwsll_vx_h, void, ptr, ptr, tl, ptr, env, i32) | ||
115 | +DEF_HELPER_6(vwsll_vx_w, void, ptr, ptr, tl, ptr, env, i32) | ||
116 | + | ||
117 | +DEF_HELPER_6(vandn_vv_b, void, ptr, ptr, ptr, ptr, env, i32) | ||
118 | +DEF_HELPER_6(vandn_vv_h, void, ptr, ptr, ptr, ptr, env, i32) | ||
119 | +DEF_HELPER_6(vandn_vv_w, void, ptr, ptr, ptr, ptr, env, i32) | ||
120 | +DEF_HELPER_6(vandn_vv_d, void, ptr, ptr, ptr, ptr, env, i32) | ||
121 | +DEF_HELPER_6(vandn_vx_b, void, ptr, ptr, tl, ptr, env, i32) | ||
122 | +DEF_HELPER_6(vandn_vx_h, void, ptr, ptr, tl, ptr, env, i32) | ||
123 | +DEF_HELPER_6(vandn_vx_w, void, ptr, ptr, tl, ptr, env, i32) | ||
124 | +DEF_HELPER_6(vandn_vx_d, void, ptr, ptr, tl, ptr, env, i32) | ||
125 | diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode | ||
126 | index XXXXXXX..XXXXXXX 100644 | ||
127 | --- a/target/riscv/insn32.decode | ||
128 | +++ b/target/riscv/insn32.decode | ||
129 | @@ -XXX,XX +XXX,XX @@ | ||
130 | %imm_u 12:s20 !function=ex_shift_12 | ||
131 | %imm_bs 30:2 !function=ex_shift_3 | ||
132 | %imm_rnum 20:4 | ||
133 | +%imm_z6 26:1 15:5 | ||
134 | |||
135 | # Argument sets: | ||
136 | &empty | ||
137 | @@ -XXX,XX +XXX,XX @@ | ||
138 | @r_vm ...... vm:1 ..... ..... ... ..... ....... &rmrr %rs2 %rs1 %rd | ||
139 | @r_vm_1 ...... . ..... ..... ... ..... ....... &rmrr vm=1 %rs2 %rs1 %rd | ||
140 | @r_vm_0 ...... . ..... ..... ... ..... ....... &rmrr vm=0 %rs2 %rs1 %rd | ||
141 | +@r2_zimm6 ..... . vm:1 ..... ..... ... ..... ....... &rmrr %rs2 rs1=%imm_z6 %rd | ||
142 | @r2_zimm11 . zimm:11 ..... ... ..... ....... %rs1 %rd | ||
143 | @r2_zimm10 .. zimm:10 ..... ... ..... ....... %rs1 %rd | ||
144 | @r2_s ....... ..... ..... ... ..... ....... %rs2 %rs1 | ||
145 | @@ -XXX,XX +XXX,XX @@ vclmul_vv 001100 . ..... ..... 010 ..... 1010111 @r_vm | ||
146 | vclmul_vx 001100 . ..... ..... 110 ..... 1010111 @r_vm | ||
147 | vclmulh_vv 001101 . ..... ..... 010 ..... 1010111 @r_vm | ||
148 | vclmulh_vx 001101 . ..... ..... 110 ..... 1010111 @r_vm | ||
149 | + | ||
150 | +# *** Zvbb vector crypto extension *** | ||
151 | +vrol_vv 010101 . ..... ..... 000 ..... 1010111 @r_vm | ||
152 | +vrol_vx 010101 . ..... ..... 100 ..... 1010111 @r_vm | ||
153 | +vror_vv 010100 . ..... ..... 000 ..... 1010111 @r_vm | ||
154 | +vror_vx 010100 . ..... ..... 100 ..... 1010111 @r_vm | ||
155 | +vror_vi 01010. . ..... ..... 011 ..... 1010111 @r2_zimm6 | ||
156 | +vbrev8_v 010010 . ..... 01000 010 ..... 1010111 @r2_vm | ||
157 | +vrev8_v 010010 . ..... 01001 010 ..... 1010111 @r2_vm | ||
158 | +vandn_vv 000001 . ..... ..... 000 ..... 1010111 @r_vm | ||
159 | +vandn_vx 000001 . ..... ..... 100 ..... 1010111 @r_vm | ||
160 | +vbrev_v 010010 . ..... 01010 010 ..... 1010111 @r2_vm | ||
161 | +vclz_v 010010 . ..... 01100 010 ..... 1010111 @r2_vm | ||
162 | +vctz_v 010010 . ..... 01101 010 ..... 1010111 @r2_vm | ||
163 | +vcpop_v 010010 . ..... 01110 010 ..... 1010111 @r2_vm | ||
164 | +vwsll_vv 110101 . ..... ..... 000 ..... 1010111 @r_vm | ||
165 | +vwsll_vx 110101 . ..... ..... 100 ..... 1010111 @r_vm | ||
166 | +vwsll_vi 110101 . ..... ..... 011 ..... 1010111 @r_vm | ||
167 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | ||
168 | index XXXXXXX..XXXXXXX 100644 | ||
169 | --- a/target/riscv/cpu.c | ||
170 | +++ b/target/riscv/cpu.c | ||
171 | @@ -XXX,XX +XXX,XX @@ static const struct isa_ext_data isa_edata_arr[] = { | ||
172 | ISA_EXT_DATA_ENTRY(zksed, PRIV_VERSION_1_12_0, ext_zksed), | ||
173 | ISA_EXT_DATA_ENTRY(zksh, PRIV_VERSION_1_12_0, ext_zksh), | ||
174 | ISA_EXT_DATA_ENTRY(zkt, PRIV_VERSION_1_12_0, ext_zkt), | ||
175 | + ISA_EXT_DATA_ENTRY(zvbb, PRIV_VERSION_1_12_0, ext_zvbb), | ||
176 | ISA_EXT_DATA_ENTRY(zvbc, PRIV_VERSION_1_12_0, ext_zvbc), | ||
177 | ISA_EXT_DATA_ENTRY(zve32f, PRIV_VERSION_1_10_0, ext_zve32f), | ||
178 | ISA_EXT_DATA_ENTRY(zve64f, PRIV_VERSION_1_10_0, ext_zve64f), | ||
179 | @@ -XXX,XX +XXX,XX @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp) | ||
180 | return; | ||
181 | } | ||
182 | |||
183 | + /* | ||
184 | + * In principle Zve*x would also suffice here, were they supported | ||
185 | + * in qemu | ||
186 | + */ | ||
187 | + if (cpu->cfg.ext_zvbb && !cpu->cfg.ext_zve32f) { | ||
188 | + error_setg(errp, | ||
189 | + "Vector crypto extensions require V or Zve* extensions"); | ||
190 | + return; | ||
191 | + } | ||
192 | + | ||
193 | if (cpu->cfg.ext_zvbc && !cpu->cfg.ext_zve64f) { | ||
194 | error_setg(errp, "Zvbc extension requires V or Zve64{f,d} extensions"); | ||
195 | return; | ||
196 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_extensions[] = { | ||
197 | DEFINE_PROP_BOOL("x-zvfbfwma", RISCVCPU, cfg.ext_zvfbfwma, false), | ||
198 | |||
199 | /* Vector cryptography extensions */ | ||
200 | + DEFINE_PROP_BOOL("x-zvbb", RISCVCPU, cfg.ext_zvbb, false), | ||
201 | DEFINE_PROP_BOOL("x-zvbc", RISCVCPU, cfg.ext_zvbc, false), | ||
202 | |||
203 | DEFINE_PROP_END_OF_LIST(), | ||
204 | diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c | ||
205 | index XXXXXXX..XXXXXXX 100644 | ||
206 | --- a/target/riscv/vcrypto_helper.c | ||
207 | +++ b/target/riscv/vcrypto_helper.c | ||
208 | @@ -XXX,XX +XXX,XX @@ | ||
209 | #include "qemu/osdep.h" | ||
210 | #include "qemu/host-utils.h" | ||
211 | #include "qemu/bitops.h" | ||
212 | +#include "qemu/bswap.h" | ||
213 | #include "cpu.h" | ||
214 | #include "exec/memop.h" | ||
215 | #include "exec/exec-all.h" | ||
216 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2, vclmulh_vv, OP_UUU_D, H8, H8, H8, clmulh64) | ||
217 | GEN_VEXT_VV(vclmulh_vv, 8) | ||
218 | RVVCALL(OPIVX2, vclmulh_vx, OP_UUU_D, H8, H8, clmulh64) | ||
219 | GEN_VEXT_VX(vclmulh_vx, 8) | ||
220 | + | ||
221 | +RVVCALL(OPIVV2, vror_vv_b, OP_UUU_B, H1, H1, H1, ror8) | ||
222 | +RVVCALL(OPIVV2, vror_vv_h, OP_UUU_H, H2, H2, H2, ror16) | ||
223 | +RVVCALL(OPIVV2, vror_vv_w, OP_UUU_W, H4, H4, H4, ror32) | ||
224 | +RVVCALL(OPIVV2, vror_vv_d, OP_UUU_D, H8, H8, H8, ror64) | ||
225 | +GEN_VEXT_VV(vror_vv_b, 1) | ||
226 | +GEN_VEXT_VV(vror_vv_h, 2) | ||
227 | +GEN_VEXT_VV(vror_vv_w, 4) | ||
228 | +GEN_VEXT_VV(vror_vv_d, 8) | ||
229 | + | ||
230 | +RVVCALL(OPIVX2, vror_vx_b, OP_UUU_B, H1, H1, ror8) | ||
231 | +RVVCALL(OPIVX2, vror_vx_h, OP_UUU_H, H2, H2, ror16) | ||
232 | +RVVCALL(OPIVX2, vror_vx_w, OP_UUU_W, H4, H4, ror32) | ||
233 | +RVVCALL(OPIVX2, vror_vx_d, OP_UUU_D, H8, H8, ror64) | ||
234 | +GEN_VEXT_VX(vror_vx_b, 1) | ||
235 | +GEN_VEXT_VX(vror_vx_h, 2) | ||
236 | +GEN_VEXT_VX(vror_vx_w, 4) | ||
237 | +GEN_VEXT_VX(vror_vx_d, 8) | ||
238 | + | ||
239 | +RVVCALL(OPIVV2, vrol_vv_b, OP_UUU_B, H1, H1, H1, rol8) | ||
240 | +RVVCALL(OPIVV2, vrol_vv_h, OP_UUU_H, H2, H2, H2, rol16) | ||
241 | +RVVCALL(OPIVV2, vrol_vv_w, OP_UUU_W, H4, H4, H4, rol32) | ||
242 | +RVVCALL(OPIVV2, vrol_vv_d, OP_UUU_D, H8, H8, H8, rol64) | ||
243 | +GEN_VEXT_VV(vrol_vv_b, 1) | ||
244 | +GEN_VEXT_VV(vrol_vv_h, 2) | ||
245 | +GEN_VEXT_VV(vrol_vv_w, 4) | ||
246 | +GEN_VEXT_VV(vrol_vv_d, 8) | ||
247 | + | ||
248 | +RVVCALL(OPIVX2, vrol_vx_b, OP_UUU_B, H1, H1, rol8) | ||
249 | +RVVCALL(OPIVX2, vrol_vx_h, OP_UUU_H, H2, H2, rol16) | ||
250 | +RVVCALL(OPIVX2, vrol_vx_w, OP_UUU_W, H4, H4, rol32) | ||
251 | +RVVCALL(OPIVX2, vrol_vx_d, OP_UUU_D, H8, H8, rol64) | ||
252 | +GEN_VEXT_VX(vrol_vx_b, 1) | ||
253 | +GEN_VEXT_VX(vrol_vx_h, 2) | ||
254 | +GEN_VEXT_VX(vrol_vx_w, 4) | ||
255 | +GEN_VEXT_VX(vrol_vx_d, 8) | ||
256 | + | ||
257 | +static uint64_t brev8(uint64_t val) | ||
258 | +{ | ||
259 | + val = ((val & 0x5555555555555555ull) << 1) | | ||
260 | + ((val & 0xAAAAAAAAAAAAAAAAull) >> 1); | ||
261 | + val = ((val & 0x3333333333333333ull) << 2) | | ||
262 | + ((val & 0xCCCCCCCCCCCCCCCCull) >> 2); | ||
263 | + val = ((val & 0x0F0F0F0F0F0F0F0Full) << 4) | | ||
264 | + ((val & 0xF0F0F0F0F0F0F0F0ull) >> 4); | ||
265 | + | ||
266 | + return val; | ||
267 | +} | ||
268 | + | ||
269 | +RVVCALL(OPIVV1, vbrev8_v_b, OP_UU_B, H1, H1, brev8) | ||
270 | +RVVCALL(OPIVV1, vbrev8_v_h, OP_UU_H, H2, H2, brev8) | ||
271 | +RVVCALL(OPIVV1, vbrev8_v_w, OP_UU_W, H4, H4, brev8) | ||
272 | +RVVCALL(OPIVV1, vbrev8_v_d, OP_UU_D, H8, H8, brev8) | ||
273 | +GEN_VEXT_V(vbrev8_v_b, 1) | ||
274 | +GEN_VEXT_V(vbrev8_v_h, 2) | ||
275 | +GEN_VEXT_V(vbrev8_v_w, 4) | ||
276 | +GEN_VEXT_V(vbrev8_v_d, 8) | ||
277 | + | ||
278 | +#define DO_IDENTITY(a) (a) | ||
279 | +RVVCALL(OPIVV1, vrev8_v_b, OP_UU_B, H1, H1, DO_IDENTITY) | ||
280 | +RVVCALL(OPIVV1, vrev8_v_h, OP_UU_H, H2, H2, bswap16) | ||
281 | +RVVCALL(OPIVV1, vrev8_v_w, OP_UU_W, H4, H4, bswap32) | ||
282 | +RVVCALL(OPIVV1, vrev8_v_d, OP_UU_D, H8, H8, bswap64) | ||
283 | +GEN_VEXT_V(vrev8_v_b, 1) | ||
284 | +GEN_VEXT_V(vrev8_v_h, 2) | ||
285 | +GEN_VEXT_V(vrev8_v_w, 4) | ||
286 | +GEN_VEXT_V(vrev8_v_d, 8) | ||
287 | + | ||
288 | +#define DO_ANDN(a, b) ((a) & ~(b)) | ||
289 | +RVVCALL(OPIVV2, vandn_vv_b, OP_UUU_B, H1, H1, H1, DO_ANDN) | ||
290 | +RVVCALL(OPIVV2, vandn_vv_h, OP_UUU_H, H2, H2, H2, DO_ANDN) | ||
291 | +RVVCALL(OPIVV2, vandn_vv_w, OP_UUU_W, H4, H4, H4, DO_ANDN) | ||
292 | +RVVCALL(OPIVV2, vandn_vv_d, OP_UUU_D, H8, H8, H8, DO_ANDN) | ||
293 | +GEN_VEXT_VV(vandn_vv_b, 1) | ||
294 | +GEN_VEXT_VV(vandn_vv_h, 2) | ||
295 | +GEN_VEXT_VV(vandn_vv_w, 4) | ||
296 | +GEN_VEXT_VV(vandn_vv_d, 8) | ||
297 | + | ||
298 | +RVVCALL(OPIVX2, vandn_vx_b, OP_UUU_B, H1, H1, DO_ANDN) | ||
299 | +RVVCALL(OPIVX2, vandn_vx_h, OP_UUU_H, H2, H2, DO_ANDN) | ||
300 | +RVVCALL(OPIVX2, vandn_vx_w, OP_UUU_W, H4, H4, DO_ANDN) | ||
301 | +RVVCALL(OPIVX2, vandn_vx_d, OP_UUU_D, H8, H8, DO_ANDN) | ||
302 | +GEN_VEXT_VX(vandn_vx_b, 1) | ||
303 | +GEN_VEXT_VX(vandn_vx_h, 2) | ||
304 | +GEN_VEXT_VX(vandn_vx_w, 4) | ||
305 | +GEN_VEXT_VX(vandn_vx_d, 8) | ||
306 | + | ||
307 | +RVVCALL(OPIVV1, vbrev_v_b, OP_UU_B, H1, H1, revbit8) | ||
308 | +RVVCALL(OPIVV1, vbrev_v_h, OP_UU_H, H2, H2, revbit16) | ||
309 | +RVVCALL(OPIVV1, vbrev_v_w, OP_UU_W, H4, H4, revbit32) | ||
310 | +RVVCALL(OPIVV1, vbrev_v_d, OP_UU_D, H8, H8, revbit64) | ||
311 | +GEN_VEXT_V(vbrev_v_b, 1) | ||
312 | +GEN_VEXT_V(vbrev_v_h, 2) | ||
313 | +GEN_VEXT_V(vbrev_v_w, 4) | ||
314 | +GEN_VEXT_V(vbrev_v_d, 8) | ||
315 | + | ||
316 | +RVVCALL(OPIVV1, vclz_v_b, OP_UU_B, H1, H1, clz8) | ||
317 | +RVVCALL(OPIVV1, vclz_v_h, OP_UU_H, H2, H2, clz16) | ||
318 | +RVVCALL(OPIVV1, vclz_v_w, OP_UU_W, H4, H4, clz32) | ||
319 | +RVVCALL(OPIVV1, vclz_v_d, OP_UU_D, H8, H8, clz64) | ||
320 | +GEN_VEXT_V(vclz_v_b, 1) | ||
321 | +GEN_VEXT_V(vclz_v_h, 2) | ||
322 | +GEN_VEXT_V(vclz_v_w, 4) | ||
323 | +GEN_VEXT_V(vclz_v_d, 8) | ||
324 | + | ||
325 | +RVVCALL(OPIVV1, vctz_v_b, OP_UU_B, H1, H1, ctz8) | ||
326 | +RVVCALL(OPIVV1, vctz_v_h, OP_UU_H, H2, H2, ctz16) | ||
327 | +RVVCALL(OPIVV1, vctz_v_w, OP_UU_W, H4, H4, ctz32) | ||
328 | +RVVCALL(OPIVV1, vctz_v_d, OP_UU_D, H8, H8, ctz64) | ||
329 | +GEN_VEXT_V(vctz_v_b, 1) | ||
330 | +GEN_VEXT_V(vctz_v_h, 2) | ||
331 | +GEN_VEXT_V(vctz_v_w, 4) | ||
332 | +GEN_VEXT_V(vctz_v_d, 8) | ||
333 | + | ||
334 | +RVVCALL(OPIVV1, vcpop_v_b, OP_UU_B, H1, H1, ctpop8) | ||
335 | +RVVCALL(OPIVV1, vcpop_v_h, OP_UU_H, H2, H2, ctpop16) | ||
336 | +RVVCALL(OPIVV1, vcpop_v_w, OP_UU_W, H4, H4, ctpop32) | ||
337 | +RVVCALL(OPIVV1, vcpop_v_d, OP_UU_D, H8, H8, ctpop64) | ||
338 | +GEN_VEXT_V(vcpop_v_b, 1) | ||
339 | +GEN_VEXT_V(vcpop_v_h, 2) | ||
340 | +GEN_VEXT_V(vcpop_v_w, 4) | ||
341 | +GEN_VEXT_V(vcpop_v_d, 8) | ||
342 | + | ||
343 | +#define DO_SLL(N, M) (N << (M & (sizeof(N) * 8 - 1))) | ||
344 | +RVVCALL(OPIVV2, vwsll_vv_b, WOP_UUU_B, H2, H1, H1, DO_SLL) | ||
345 | +RVVCALL(OPIVV2, vwsll_vv_h, WOP_UUU_H, H4, H2, H2, DO_SLL) | ||
346 | +RVVCALL(OPIVV2, vwsll_vv_w, WOP_UUU_W, H8, H4, H4, DO_SLL) | ||
347 | +GEN_VEXT_VV(vwsll_vv_b, 2) | ||
348 | +GEN_VEXT_VV(vwsll_vv_h, 4) | ||
349 | +GEN_VEXT_VV(vwsll_vv_w, 8) | ||
350 | + | ||
351 | +RVVCALL(OPIVX2, vwsll_vx_b, WOP_UUU_B, H2, H1, DO_SLL) | ||
352 | +RVVCALL(OPIVX2, vwsll_vx_h, WOP_UUU_H, H4, H2, DO_SLL) | ||
353 | +RVVCALL(OPIVX2, vwsll_vx_w, WOP_UUU_W, H8, H4, DO_SLL) | ||
354 | +GEN_VEXT_VX(vwsll_vx_b, 2) | ||
355 | +GEN_VEXT_VX(vwsll_vx_h, 4) | ||
356 | +GEN_VEXT_VX(vwsll_vx_w, 8) | ||
357 | diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc | ||
358 | index XXXXXXX..XXXXXXX 100644 | ||
359 | --- a/target/riscv/insn_trans/trans_rvvk.c.inc | ||
360 | +++ b/target/riscv/insn_trans/trans_rvvk.c.inc | ||
361 | @@ -XXX,XX +XXX,XX @@ static bool vclmul_vx_check(DisasContext *s, arg_rmrr *a) | ||
362 | |||
363 | GEN_VX_MASKED_TRANS(vclmul_vx, vclmul_vx_check) | ||
364 | GEN_VX_MASKED_TRANS(vclmulh_vx, vclmul_vx_check) | ||
365 | + | ||
366 | +/* | ||
367 | + * Zvbb | ||
368 | + */ | ||
369 | + | ||
370 | +#define GEN_OPIVI_GVEC_TRANS_CHECK(NAME, IMM_MODE, OPIVX, SUF, CHECK) \ | ||
371 | + static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
372 | + { \ | ||
373 | + if (CHECK(s, a)) { \ | ||
374 | + static gen_helper_opivx *const fns[4] = { \ | ||
375 | + gen_helper_##OPIVX##_b, \ | ||
376 | + gen_helper_##OPIVX##_h, \ | ||
377 | + gen_helper_##OPIVX##_w, \ | ||
378 | + gen_helper_##OPIVX##_d, \ | ||
379 | + }; \ | ||
380 | + return do_opivi_gvec(s, a, tcg_gen_gvec_##SUF, fns[s->sew], \ | ||
381 | + IMM_MODE); \ | ||
382 | + } \ | ||
383 | + return false; \ | ||
384 | + } | ||
385 | + | ||
386 | +#define GEN_OPIVV_GVEC_TRANS_CHECK(NAME, SUF, CHECK) \ | ||
387 | + static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
388 | + { \ | ||
389 | + if (CHECK(s, a)) { \ | ||
390 | + static gen_helper_gvec_4_ptr *const fns[4] = { \ | ||
391 | + gen_helper_##NAME##_b, \ | ||
392 | + gen_helper_##NAME##_h, \ | ||
393 | + gen_helper_##NAME##_w, \ | ||
394 | + gen_helper_##NAME##_d, \ | ||
395 | + }; \ | ||
396 | + return do_opivv_gvec(s, a, tcg_gen_gvec_##SUF, fns[s->sew]); \ | ||
397 | + } \ | ||
398 | + return false; \ | ||
399 | + } | ||
400 | + | ||
401 | +#define GEN_OPIVX_GVEC_SHIFT_TRANS_CHECK(NAME, SUF, CHECK) \ | ||
402 | + static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
403 | + { \ | ||
404 | + if (CHECK(s, a)) { \ | ||
405 | + static gen_helper_opivx *const fns[4] = { \ | ||
406 | + gen_helper_##NAME##_b, \ | ||
407 | + gen_helper_##NAME##_h, \ | ||
408 | + gen_helper_##NAME##_w, \ | ||
409 | + gen_helper_##NAME##_d, \ | ||
410 | + }; \ | ||
411 | + return do_opivx_gvec_shift(s, a, tcg_gen_gvec_##SUF, \ | ||
412 | + fns[s->sew]); \ | ||
413 | + } \ | ||
414 | + return false; \ | ||
415 | + } | ||
416 | + | ||
417 | +static bool zvbb_vv_check(DisasContext *s, arg_rmrr *a) | ||
418 | +{ | ||
419 | + return opivv_check(s, a) && s->cfg_ptr->ext_zvbb == true; | ||
420 | +} | ||
421 | + | ||
422 | +static bool zvbb_vx_check(DisasContext *s, arg_rmrr *a) | ||
423 | +{ | ||
424 | + return opivx_check(s, a) && s->cfg_ptr->ext_zvbb == true; | ||
425 | +} | ||
426 | + | ||
427 | +/* vrol.v[vx] */ | ||
428 | +GEN_OPIVV_GVEC_TRANS_CHECK(vrol_vv, rotlv, zvbb_vv_check) | ||
429 | +GEN_OPIVX_GVEC_SHIFT_TRANS_CHECK(vrol_vx, rotls, zvbb_vx_check) | ||
430 | + | ||
431 | +/* vror.v[vxi] */ | ||
432 | +GEN_OPIVV_GVEC_TRANS_CHECK(vror_vv, rotrv, zvbb_vv_check) | ||
433 | +GEN_OPIVX_GVEC_SHIFT_TRANS_CHECK(vror_vx, rotrs, zvbb_vx_check) | ||
434 | +GEN_OPIVI_GVEC_TRANS_CHECK(vror_vi, IMM_TRUNC_SEW, vror_vx, rotri, zvbb_vx_check) | ||
435 | + | ||
436 | +#define GEN_OPIVX_GVEC_TRANS_CHECK(NAME, SUF, CHECK) \ | ||
437 | + static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
438 | + { \ | ||
439 | + if (CHECK(s, a)) { \ | ||
440 | + static gen_helper_opivx *const fns[4] = { \ | ||
441 | + gen_helper_##NAME##_b, \ | ||
442 | + gen_helper_##NAME##_h, \ | ||
443 | + gen_helper_##NAME##_w, \ | ||
444 | + gen_helper_##NAME##_d, \ | ||
445 | + }; \ | ||
446 | + return do_opivx_gvec(s, a, tcg_gen_gvec_##SUF, fns[s->sew]); \ | ||
447 | + } \ | ||
448 | + return false; \ | ||
449 | + } | ||
450 | + | ||
451 | +/* vandn.v[vx] */ | ||
452 | +GEN_OPIVV_GVEC_TRANS_CHECK(vandn_vv, andc, zvbb_vv_check) | ||
453 | +GEN_OPIVX_GVEC_TRANS_CHECK(vandn_vx, andcs, zvbb_vx_check) | ||
454 | + | ||
455 | +#define GEN_OPIV_TRANS(NAME, CHECK) \ | ||
456 | + static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | ||
457 | + { \ | ||
458 | + if (CHECK(s, a)) { \ | ||
459 | + uint32_t data = 0; \ | ||
460 | + static gen_helper_gvec_3_ptr *const fns[4] = { \ | ||
461 | + gen_helper_##NAME##_b, \ | ||
462 | + gen_helper_##NAME##_h, \ | ||
463 | + gen_helper_##NAME##_w, \ | ||
464 | + gen_helper_##NAME##_d, \ | ||
465 | + }; \ | ||
466 | + TCGLabel *over = gen_new_label(); \ | ||
467 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
468 | + \ | ||
469 | + data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
470 | + data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
471 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
472 | + data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); \ | ||
473 | + data = FIELD_DP32(data, VDATA, VMA, s->vma); \ | ||
474 | + tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
475 | + vreg_ofs(s, a->rs2), cpu_env, \ | ||
476 | + s->cfg_ptr->vlen / 8, s->cfg_ptr->vlen / 8, \ | ||
477 | + data, fns[s->sew]); \ | ||
478 | + mark_vs_dirty(s); \ | ||
479 | + gen_set_label(over); \ | ||
480 | + return true; \ | ||
481 | + } \ | ||
482 | + return false; \ | ||
483 | + } | ||
484 | + | ||
485 | +static bool zvbb_opiv_check(DisasContext *s, arg_rmr *a) | ||
486 | +{ | ||
487 | + return s->cfg_ptr->ext_zvbb == true && | ||
488 | + require_rvv(s) && | ||
489 | + vext_check_isa_ill(s) && | ||
490 | + vext_check_ss(s, a->rd, a->rs2, a->vm); | ||
491 | +} | ||
492 | + | ||
493 | +GEN_OPIV_TRANS(vbrev8_v, zvbb_opiv_check) | ||
494 | +GEN_OPIV_TRANS(vrev8_v, zvbb_opiv_check) | ||
495 | +GEN_OPIV_TRANS(vbrev_v, zvbb_opiv_check) | ||
496 | +GEN_OPIV_TRANS(vclz_v, zvbb_opiv_check) | ||
497 | +GEN_OPIV_TRANS(vctz_v, zvbb_opiv_check) | ||
498 | +GEN_OPIV_TRANS(vcpop_v, zvbb_opiv_check) | ||
499 | + | ||
500 | +static bool vwsll_vv_check(DisasContext *s, arg_rmrr *a) | ||
501 | +{ | ||
502 | + return s->cfg_ptr->ext_zvbb && opivv_widen_check(s, a); | ||
503 | +} | ||
504 | + | ||
505 | +static bool vwsll_vx_check(DisasContext *s, arg_rmrr *a) | ||
506 | +{ | ||
507 | + return s->cfg_ptr->ext_zvbb && opivx_widen_check(s, a); | ||
508 | +} | ||
509 | + | ||
510 | +/* OPIVI without GVEC IR */ | ||
511 | +#define GEN_OPIVI_WIDEN_TRANS(NAME, IMM_MODE, OPIVX, CHECK) \ | ||
512 | + static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
513 | + { \ | ||
514 | + if (CHECK(s, a)) { \ | ||
515 | + static gen_helper_opivx *const fns[3] = { \ | ||
516 | + gen_helper_##OPIVX##_b, \ | ||
517 | + gen_helper_##OPIVX##_h, \ | ||
518 | + gen_helper_##OPIVX##_w, \ | ||
519 | + }; \ | ||
520 | + return opivi_trans(a->rd, a->rs1, a->rs2, a->vm, fns[s->sew], s, \ | ||
521 | + IMM_MODE); \ | ||
522 | + } \ | ||
523 | + return false; \ | ||
524 | + } | ||
525 | + | ||
526 | +GEN_OPIVV_WIDEN_TRANS(vwsll_vv, vwsll_vv_check) | ||
527 | +GEN_OPIVX_WIDEN_TRANS(vwsll_vx, vwsll_vx_check) | ||
528 | +GEN_OPIVI_WIDEN_TRANS(vwsll_vi, IMM_ZX, vwsll_vx, vwsll_vx_check) | ||
529 | -- | ||
530 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Nazar Kazakov <nazar.kazakov@codethink.co.uk> | ||
1 | 2 | ||
3 | This commit adds support for the Zvkned vector-crypto extension, which | ||
4 | consists of the following instructions: | ||
5 | |||
6 | * vaesef.[vv,vs] | ||
7 | * vaesdf.[vv,vs] | ||
8 | * vaesdm.[vv,vs] | ||
9 | * vaesz.vs | ||
10 | * vaesem.[vv,vs] | ||
11 | * vaeskf1.vi | ||
12 | * vaeskf2.vi | ||
13 | |||
14 | Translation functions are defined in | ||
15 | `target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in | ||
16 | `target/riscv/vcrypto_helper.c`. | ||
17 | |||
18 | Co-authored-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk> | ||
19 | Co-authored-by: William Salmon <will.salmon@codethink.co.uk> | ||
20 | [max.chou@sifive.com: Replaced vstart checking by TCG op] | ||
21 | Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk> | ||
22 | Signed-off-by: William Salmon <will.salmon@codethink.co.uk> | ||
23 | Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk> | ||
24 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
25 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
26 | [max.chou@sifive.com: Imported aes-round.h and exposed x-zvkned | ||
27 | property] | ||
28 | [max.chou@sifive.com: Fixed endian issues and replaced the vstart & vl | ||
29 | egs checking by helper function] | ||
30 | [max.chou@sifive.com: Replaced bswap32 calls in aes key expanding] | ||
31 | Message-ID: <20230711165917.2629866-10-max.chou@sifive.com> | ||
32 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
33 | --- | ||
34 | target/riscv/cpu_cfg.h | 1 + | ||
35 | target/riscv/helper.h | 14 ++ | ||
36 | target/riscv/insn32.decode | 14 ++ | ||
37 | target/riscv/cpu.c | 4 +- | ||
38 | target/riscv/vcrypto_helper.c | 202 +++++++++++++++++++++++ | ||
39 | target/riscv/insn_trans/trans_rvvk.c.inc | 147 +++++++++++++++++ | ||
40 | 6 files changed, 381 insertions(+), 1 deletion(-) | ||
41 | |||
42 | diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h | ||
43 | index XXXXXXX..XXXXXXX 100644 | ||
44 | --- a/target/riscv/cpu_cfg.h | ||
45 | +++ b/target/riscv/cpu_cfg.h | ||
46 | @@ -XXX,XX +XXX,XX @@ struct RISCVCPUConfig { | ||
47 | bool ext_zve64d; | ||
48 | bool ext_zvbb; | ||
49 | bool ext_zvbc; | ||
50 | + bool ext_zvkned; | ||
51 | bool ext_zmmul; | ||
52 | bool ext_zvfbfmin; | ||
53 | bool ext_zvfbfwma; | ||
54 | diff --git a/target/riscv/helper.h b/target/riscv/helper.h | ||
55 | index XXXXXXX..XXXXXXX 100644 | ||
56 | --- a/target/riscv/helper.h | ||
57 | +++ b/target/riscv/helper.h | ||
58 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_6(vandn_vx_b, void, ptr, ptr, tl, ptr, env, i32) | ||
59 | DEF_HELPER_6(vandn_vx_h, void, ptr, ptr, tl, ptr, env, i32) | ||
60 | DEF_HELPER_6(vandn_vx_w, void, ptr, ptr, tl, ptr, env, i32) | ||
61 | DEF_HELPER_6(vandn_vx_d, void, ptr, ptr, tl, ptr, env, i32) | ||
62 | + | ||
63 | +DEF_HELPER_2(egs_check, void, i32, env) | ||
64 | + | ||
65 | +DEF_HELPER_4(vaesef_vv, void, ptr, ptr, env, i32) | ||
66 | +DEF_HELPER_4(vaesef_vs, void, ptr, ptr, env, i32) | ||
67 | +DEF_HELPER_4(vaesdf_vv, void, ptr, ptr, env, i32) | ||
68 | +DEF_HELPER_4(vaesdf_vs, void, ptr, ptr, env, i32) | ||
69 | +DEF_HELPER_4(vaesem_vv, void, ptr, ptr, env, i32) | ||
70 | +DEF_HELPER_4(vaesem_vs, void, ptr, ptr, env, i32) | ||
71 | +DEF_HELPER_4(vaesdm_vv, void, ptr, ptr, env, i32) | ||
72 | +DEF_HELPER_4(vaesdm_vs, void, ptr, ptr, env, i32) | ||
73 | +DEF_HELPER_4(vaesz_vs, void, ptr, ptr, env, i32) | ||
74 | +DEF_HELPER_5(vaeskf1_vi, void, ptr, ptr, i32, env, i32) | ||
75 | +DEF_HELPER_5(vaeskf2_vi, void, ptr, ptr, i32, env, i32) | ||
76 | diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode | ||
77 | index XXXXXXX..XXXXXXX 100644 | ||
78 | --- a/target/riscv/insn32.decode | ||
79 | +++ b/target/riscv/insn32.decode | ||
80 | @@ -XXX,XX +XXX,XX @@ | ||
81 | @r_rm ....... ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd | ||
82 | @r2_rm ....... ..... ..... ... ..... ....... %rs1 %rm %rd | ||
83 | @r2 ....... ..... ..... ... ..... ....... &r2 %rs1 %rd | ||
84 | +@r2_vm_1 ...... . ..... ..... ... ..... ....... &rmr vm=1 %rs2 %rd | ||
85 | @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd | ||
86 | @r2_vm ...... vm:1 ..... ..... ... ..... ....... &rmr %rs2 %rd | ||
87 | @r1_vm ...... vm:1 ..... ..... ... ..... ....... %rd | ||
88 | @@ -XXX,XX +XXX,XX @@ vcpop_v 010010 . ..... 01110 010 ..... 1010111 @r2_vm | ||
89 | vwsll_vv 110101 . ..... ..... 000 ..... 1010111 @r_vm | ||
90 | vwsll_vx 110101 . ..... ..... 100 ..... 1010111 @r_vm | ||
91 | vwsll_vi 110101 . ..... ..... 011 ..... 1010111 @r_vm | ||
92 | + | ||
93 | +# *** Zvkned vector crypto extension *** | ||
94 | +vaesef_vv 101000 1 ..... 00011 010 ..... 1110111 @r2_vm_1 | ||
95 | +vaesef_vs 101001 1 ..... 00011 010 ..... 1110111 @r2_vm_1 | ||
96 | +vaesdf_vv 101000 1 ..... 00001 010 ..... 1110111 @r2_vm_1 | ||
97 | +vaesdf_vs 101001 1 ..... 00001 010 ..... 1110111 @r2_vm_1 | ||
98 | +vaesem_vv 101000 1 ..... 00010 010 ..... 1110111 @r2_vm_1 | ||
99 | +vaesem_vs 101001 1 ..... 00010 010 ..... 1110111 @r2_vm_1 | ||
100 | +vaesdm_vv 101000 1 ..... 00000 010 ..... 1110111 @r2_vm_1 | ||
101 | +vaesdm_vs 101001 1 ..... 00000 010 ..... 1110111 @r2_vm_1 | ||
102 | +vaesz_vs 101001 1 ..... 00111 010 ..... 1110111 @r2_vm_1 | ||
103 | +vaeskf1_vi 100010 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
104 | +vaeskf2_vi 101010 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
105 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | ||
106 | index XXXXXXX..XXXXXXX 100644 | ||
107 | --- a/target/riscv/cpu.c | ||
108 | +++ b/target/riscv/cpu.c | ||
109 | @@ -XXX,XX +XXX,XX @@ static const struct isa_ext_data isa_edata_arr[] = { | ||
110 | ISA_EXT_DATA_ENTRY(zvfbfwma, PRIV_VERSION_1_12_0, ext_zvfbfwma), | ||
111 | ISA_EXT_DATA_ENTRY(zvfh, PRIV_VERSION_1_12_0, ext_zvfh), | ||
112 | ISA_EXT_DATA_ENTRY(zvfhmin, PRIV_VERSION_1_12_0, ext_zvfhmin), | ||
113 | + ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned), | ||
114 | ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx), | ||
115 | ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin), | ||
116 | ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia), | ||
117 | @@ -XXX,XX +XXX,XX @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp) | ||
118 | * In principle Zve*x would also suffice here, were they supported | ||
119 | * in qemu | ||
120 | */ | ||
121 | - if (cpu->cfg.ext_zvbb && !cpu->cfg.ext_zve32f) { | ||
122 | + if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned) && !cpu->cfg.ext_zve32f) { | ||
123 | error_setg(errp, | ||
124 | "Vector crypto extensions require V or Zve* extensions"); | ||
125 | return; | ||
126 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_extensions[] = { | ||
127 | /* Vector cryptography extensions */ | ||
128 | DEFINE_PROP_BOOL("x-zvbb", RISCVCPU, cfg.ext_zvbb, false), | ||
129 | DEFINE_PROP_BOOL("x-zvbc", RISCVCPU, cfg.ext_zvbc, false), | ||
130 | + DEFINE_PROP_BOOL("x-zvkned", RISCVCPU, cfg.ext_zvkned, false), | ||
131 | |||
132 | DEFINE_PROP_END_OF_LIST(), | ||
133 | }; | ||
134 | diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c | ||
135 | index XXXXXXX..XXXXXXX 100644 | ||
136 | --- a/target/riscv/vcrypto_helper.c | ||
137 | +++ b/target/riscv/vcrypto_helper.c | ||
138 | @@ -XXX,XX +XXX,XX @@ | ||
139 | #include "qemu/bitops.h" | ||
140 | #include "qemu/bswap.h" | ||
141 | #include "cpu.h" | ||
142 | +#include "crypto/aes.h" | ||
143 | +#include "crypto/aes-round.h" | ||
144 | #include "exec/memop.h" | ||
145 | #include "exec/exec-all.h" | ||
146 | #include "exec/helper-proto.h" | ||
147 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vwsll_vx_w, WOP_UUU_W, H8, H4, DO_SLL) | ||
148 | GEN_VEXT_VX(vwsll_vx_b, 2) | ||
149 | GEN_VEXT_VX(vwsll_vx_h, 4) | ||
150 | GEN_VEXT_VX(vwsll_vx_w, 8) | ||
151 | + | ||
152 | +void HELPER(egs_check)(uint32_t egs, CPURISCVState *env) | ||
153 | +{ | ||
154 | + uint32_t vl = env->vl; | ||
155 | + uint32_t vstart = env->vstart; | ||
156 | + | ||
157 | + if (vl % egs != 0 || vstart % egs != 0) { | ||
158 | + riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC()); | ||
159 | + } | ||
160 | +} | ||
161 | + | ||
162 | +static inline void xor_round_key(AESState *round_state, AESState *round_key) | ||
163 | +{ | ||
164 | + round_state->v = round_state->v ^ round_key->v; | ||
165 | +} | ||
166 | + | ||
167 | +#define GEN_ZVKNED_HELPER_VV(NAME, ...) \ | ||
168 | + void HELPER(NAME)(void *vd, void *vs2, CPURISCVState *env, \ | ||
169 | + uint32_t desc) \ | ||
170 | + { \ | ||
171 | + uint32_t vl = env->vl; \ | ||
172 | + uint32_t total_elems = vext_get_total_elems(env, desc, 4); \ | ||
173 | + uint32_t vta = vext_vta(desc); \ | ||
174 | + \ | ||
175 | + for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { \ | ||
176 | + AESState round_key; \ | ||
177 | + round_key.d[0] = *((uint64_t *)vs2 + H8(i * 2 + 0)); \ | ||
178 | + round_key.d[1] = *((uint64_t *)vs2 + H8(i * 2 + 1)); \ | ||
179 | + AESState round_state; \ | ||
180 | + round_state.d[0] = *((uint64_t *)vd + H8(i * 2 + 0)); \ | ||
181 | + round_state.d[1] = *((uint64_t *)vd + H8(i * 2 + 1)); \ | ||
182 | + __VA_ARGS__; \ | ||
183 | + *((uint64_t *)vd + H8(i * 2 + 0)) = round_state.d[0]; \ | ||
184 | + *((uint64_t *)vd + H8(i * 2 + 1)) = round_state.d[1]; \ | ||
185 | + } \ | ||
186 | + env->vstart = 0; \ | ||
187 | + /* set tail elements to 1s */ \ | ||
188 | + vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4); \ | ||
189 | + } | ||
190 | + | ||
191 | +#define GEN_ZVKNED_HELPER_VS(NAME, ...) \ | ||
192 | + void HELPER(NAME)(void *vd, void *vs2, CPURISCVState *env, \ | ||
193 | + uint32_t desc) \ | ||
194 | + { \ | ||
195 | + uint32_t vl = env->vl; \ | ||
196 | + uint32_t total_elems = vext_get_total_elems(env, desc, 4); \ | ||
197 | + uint32_t vta = vext_vta(desc); \ | ||
198 | + \ | ||
199 | + for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { \ | ||
200 | + AESState round_key; \ | ||
201 | + round_key.d[0] = *((uint64_t *)vs2 + H8(0)); \ | ||
202 | + round_key.d[1] = *((uint64_t *)vs2 + H8(1)); \ | ||
203 | + AESState round_state; \ | ||
204 | + round_state.d[0] = *((uint64_t *)vd + H8(i * 2 + 0)); \ | ||
205 | + round_state.d[1] = *((uint64_t *)vd + H8(i * 2 + 1)); \ | ||
206 | + __VA_ARGS__; \ | ||
207 | + *((uint64_t *)vd + H8(i * 2 + 0)) = round_state.d[0]; \ | ||
208 | + *((uint64_t *)vd + H8(i * 2 + 1)) = round_state.d[1]; \ | ||
209 | + } \ | ||
210 | + env->vstart = 0; \ | ||
211 | + /* set tail elements to 1s */ \ | ||
212 | + vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4); \ | ||
213 | + } | ||
214 | + | ||
215 | +GEN_ZVKNED_HELPER_VV(vaesef_vv, aesenc_SB_SR_AK(&round_state, | ||
216 | + &round_state, | ||
217 | + &round_key, | ||
218 | + false);) | ||
219 | +GEN_ZVKNED_HELPER_VS(vaesef_vs, aesenc_SB_SR_AK(&round_state, | ||
220 | + &round_state, | ||
221 | + &round_key, | ||
222 | + false);) | ||
223 | +GEN_ZVKNED_HELPER_VV(vaesdf_vv, aesdec_ISB_ISR_AK(&round_state, | ||
224 | + &round_state, | ||
225 | + &round_key, | ||
226 | + false);) | ||
227 | +GEN_ZVKNED_HELPER_VS(vaesdf_vs, aesdec_ISB_ISR_AK(&round_state, | ||
228 | + &round_state, | ||
229 | + &round_key, | ||
230 | + false);) | ||
231 | +GEN_ZVKNED_HELPER_VV(vaesem_vv, aesenc_SB_SR_MC_AK(&round_state, | ||
232 | + &round_state, | ||
233 | + &round_key, | ||
234 | + false);) | ||
235 | +GEN_ZVKNED_HELPER_VS(vaesem_vs, aesenc_SB_SR_MC_AK(&round_state, | ||
236 | + &round_state, | ||
237 | + &round_key, | ||
238 | + false);) | ||
239 | +GEN_ZVKNED_HELPER_VV(vaesdm_vv, aesdec_ISB_ISR_AK_IMC(&round_state, | ||
240 | + &round_state, | ||
241 | + &round_key, | ||
242 | + false);) | ||
243 | +GEN_ZVKNED_HELPER_VS(vaesdm_vs, aesdec_ISB_ISR_AK_IMC(&round_state, | ||
244 | + &round_state, | ||
245 | + &round_key, | ||
246 | + false);) | ||
247 | +GEN_ZVKNED_HELPER_VS(vaesz_vs, xor_round_key(&round_state, &round_key);) | ||
248 | + | ||
249 | +void HELPER(vaeskf1_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm, | ||
250 | + CPURISCVState *env, uint32_t desc) | ||
251 | +{ | ||
252 | + uint32_t *vd = vd_vptr; | ||
253 | + uint32_t *vs2 = vs2_vptr; | ||
254 | + uint32_t vl = env->vl; | ||
255 | + uint32_t total_elems = vext_get_total_elems(env, desc, 4); | ||
256 | + uint32_t vta = vext_vta(desc); | ||
257 | + | ||
258 | + uimm &= 0b1111; | ||
259 | + if (uimm > 10 || uimm == 0) { | ||
260 | + uimm ^= 0b1000; | ||
261 | + } | ||
262 | + | ||
263 | + for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { | ||
264 | + uint32_t rk[8], tmp; | ||
265 | + static const uint32_t rcon[] = { | ||
266 | + 0x00000001, 0x00000002, 0x00000004, 0x00000008, 0x00000010, | ||
267 | + 0x00000020, 0x00000040, 0x00000080, 0x0000001B, 0x00000036, | ||
268 | + }; | ||
269 | + | ||
270 | + rk[0] = vs2[i * 4 + H4(0)]; | ||
271 | + rk[1] = vs2[i * 4 + H4(1)]; | ||
272 | + rk[2] = vs2[i * 4 + H4(2)]; | ||
273 | + rk[3] = vs2[i * 4 + H4(3)]; | ||
274 | + tmp = ror32(rk[3], 8); | ||
275 | + | ||
276 | + rk[4] = rk[0] ^ (((uint32_t)AES_sbox[(tmp >> 24) & 0xff] << 24) | | ||
277 | + ((uint32_t)AES_sbox[(tmp >> 16) & 0xff] << 16) | | ||
278 | + ((uint32_t)AES_sbox[(tmp >> 8) & 0xff] << 8) | | ||
279 | + ((uint32_t)AES_sbox[(tmp >> 0) & 0xff] << 0)) | ||
280 | + ^ rcon[uimm - 1]; | ||
281 | + rk[5] = rk[1] ^ rk[4]; | ||
282 | + rk[6] = rk[2] ^ rk[5]; | ||
283 | + rk[7] = rk[3] ^ rk[6]; | ||
284 | + | ||
285 | + vd[i * 4 + H4(0)] = rk[4]; | ||
286 | + vd[i * 4 + H4(1)] = rk[5]; | ||
287 | + vd[i * 4 + H4(2)] = rk[6]; | ||
288 | + vd[i * 4 + H4(3)] = rk[7]; | ||
289 | + } | ||
290 | + env->vstart = 0; | ||
291 | + /* set tail elements to 1s */ | ||
292 | + vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4); | ||
293 | +} | ||
294 | + | ||
295 | +void HELPER(vaeskf2_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm, | ||
296 | + CPURISCVState *env, uint32_t desc) | ||
297 | +{ | ||
298 | + uint32_t *vd = vd_vptr; | ||
299 | + uint32_t *vs2 = vs2_vptr; | ||
300 | + uint32_t vl = env->vl; | ||
301 | + uint32_t total_elems = vext_get_total_elems(env, desc, 4); | ||
302 | + uint32_t vta = vext_vta(desc); | ||
303 | + | ||
304 | + uimm &= 0b1111; | ||
305 | + if (uimm > 14 || uimm < 2) { | ||
306 | + uimm ^= 0b1000; | ||
307 | + } | ||
308 | + | ||
309 | + for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { | ||
310 | + uint32_t rk[12], tmp; | ||
311 | + static const uint32_t rcon[] = { | ||
312 | + 0x00000001, 0x00000002, 0x00000004, 0x00000008, 0x00000010, | ||
313 | + 0x00000020, 0x00000040, 0x00000080, 0x0000001B, 0x00000036, | ||
314 | + }; | ||
315 | + | ||
316 | + rk[0] = vd[i * 4 + H4(0)]; | ||
317 | + rk[1] = vd[i * 4 + H4(1)]; | ||
318 | + rk[2] = vd[i * 4 + H4(2)]; | ||
319 | + rk[3] = vd[i * 4 + H4(3)]; | ||
320 | + rk[4] = vs2[i * 4 + H4(0)]; | ||
321 | + rk[5] = vs2[i * 4 + H4(1)]; | ||
322 | + rk[6] = vs2[i * 4 + H4(2)]; | ||
323 | + rk[7] = vs2[i * 4 + H4(3)]; | ||
324 | + | ||
325 | + if (uimm % 2 == 0) { | ||
326 | + tmp = ror32(rk[7], 8); | ||
327 | + rk[8] = rk[0] ^ (((uint32_t)AES_sbox[(tmp >> 24) & 0xff] << 24) | | ||
328 | + ((uint32_t)AES_sbox[(tmp >> 16) & 0xff] << 16) | | ||
329 | + ((uint32_t)AES_sbox[(tmp >> 8) & 0xff] << 8) | | ||
330 | + ((uint32_t)AES_sbox[(tmp >> 0) & 0xff] << 0)) | ||
331 | + ^ rcon[(uimm - 1) / 2]; | ||
332 | + } else { | ||
333 | + rk[8] = rk[0] ^ (((uint32_t)AES_sbox[(rk[7] >> 24) & 0xff] << 24) | | ||
334 | + ((uint32_t)AES_sbox[(rk[7] >> 16) & 0xff] << 16) | | ||
335 | + ((uint32_t)AES_sbox[(rk[7] >> 8) & 0xff] << 8) | | ||
336 | + ((uint32_t)AES_sbox[(rk[7] >> 0) & 0xff] << 0)); | ||
337 | + } | ||
338 | + rk[9] = rk[1] ^ rk[8]; | ||
339 | + rk[10] = rk[2] ^ rk[9]; | ||
340 | + rk[11] = rk[3] ^ rk[10]; | ||
341 | + | ||
342 | + vd[i * 4 + H4(0)] = rk[8]; | ||
343 | + vd[i * 4 + H4(1)] = rk[9]; | ||
344 | + vd[i * 4 + H4(2)] = rk[10]; | ||
345 | + vd[i * 4 + H4(3)] = rk[11]; | ||
346 | + } | ||
347 | + env->vstart = 0; | ||
348 | + /* set tail elements to 1s */ | ||
349 | + vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4); | ||
350 | +} | ||
351 | diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc | ||
352 | index XXXXXXX..XXXXXXX 100644 | ||
353 | --- a/target/riscv/insn_trans/trans_rvvk.c.inc | ||
354 | +++ b/target/riscv/insn_trans/trans_rvvk.c.inc | ||
355 | @@ -XXX,XX +XXX,XX @@ static bool vwsll_vx_check(DisasContext *s, arg_rmrr *a) | ||
356 | GEN_OPIVV_WIDEN_TRANS(vwsll_vv, vwsll_vv_check) | ||
357 | GEN_OPIVX_WIDEN_TRANS(vwsll_vx, vwsll_vx_check) | ||
358 | GEN_OPIVI_WIDEN_TRANS(vwsll_vi, IMM_ZX, vwsll_vx, vwsll_vx_check) | ||
359 | + | ||
360 | +/* | ||
361 | + * Zvkned | ||
362 | + */ | ||
363 | + | ||
364 | +#define ZVKNED_EGS 4 | ||
365 | + | ||
366 | +#define GEN_V_UNMASKED_TRANS(NAME, CHECK, EGS) \ | ||
367 | + static bool trans_##NAME(DisasContext *s, arg_##NAME *a) \ | ||
368 | + { \ | ||
369 | + if (CHECK(s, a)) { \ | ||
370 | + TCGv_ptr rd_v, rs2_v; \ | ||
371 | + TCGv_i32 desc, egs; \ | ||
372 | + uint32_t data = 0; \ | ||
373 | + TCGLabel *over = gen_new_label(); \ | ||
374 | + \ | ||
375 | + if (!s->vstart_eq_zero || !s->vl_eq_vlmax) { \ | ||
376 | + /* save opcode for unwinding in case we throw an exception */ \ | ||
377 | + decode_save_opc(s); \ | ||
378 | + egs = tcg_constant_i32(EGS); \ | ||
379 | + gen_helper_egs_check(egs, cpu_env); \ | ||
380 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
381 | + } \ | ||
382 | + \ | ||
383 | + data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
384 | + data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
385 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
386 | + data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); \ | ||
387 | + data = FIELD_DP32(data, VDATA, VMA, s->vma); \ | ||
388 | + rd_v = tcg_temp_new_ptr(); \ | ||
389 | + rs2_v = tcg_temp_new_ptr(); \ | ||
390 | + desc = tcg_constant_i32( \ | ||
391 | + simd_desc(s->cfg_ptr->vlen / 8, s->cfg_ptr->vlen / 8, data)); \ | ||
392 | + tcg_gen_addi_ptr(rd_v, cpu_env, vreg_ofs(s, a->rd)); \ | ||
393 | + tcg_gen_addi_ptr(rs2_v, cpu_env, vreg_ofs(s, a->rs2)); \ | ||
394 | + gen_helper_##NAME(rd_v, rs2_v, cpu_env, desc); \ | ||
395 | + mark_vs_dirty(s); \ | ||
396 | + gen_set_label(over); \ | ||
397 | + return true; \ | ||
398 | + } \ | ||
399 | + return false; \ | ||
400 | + } | ||
401 | + | ||
402 | +static bool vaes_check_vv(DisasContext *s, arg_rmr *a) | ||
403 | +{ | ||
404 | + int egw_bytes = ZVKNED_EGS << s->sew; | ||
405 | + return s->cfg_ptr->ext_zvkned == true && | ||
406 | + require_rvv(s) && | ||
407 | + vext_check_isa_ill(s) && | ||
408 | + MAXSZ(s) >= egw_bytes && | ||
409 | + require_align(a->rd, s->lmul) && | ||
410 | + require_align(a->rs2, s->lmul) && | ||
411 | + s->sew == MO_32; | ||
412 | +} | ||
413 | + | ||
414 | +static bool vaes_check_overlap(DisasContext *s, int vd, int vs2) | ||
415 | +{ | ||
416 | + int8_t op_size = s->lmul <= 0 ? 1 : 1 << s->lmul; | ||
417 | + return !is_overlapped(vd, op_size, vs2, 1); | ||
418 | +} | ||
419 | + | ||
420 | +static bool vaes_check_vs(DisasContext *s, arg_rmr *a) | ||
421 | +{ | ||
422 | + int egw_bytes = ZVKNED_EGS << s->sew; | ||
423 | + return vaes_check_overlap(s, a->rd, a->rs2) && | ||
424 | + MAXSZ(s) >= egw_bytes && | ||
425 | + s->cfg_ptr->ext_zvkned == true && | ||
426 | + require_rvv(s) && | ||
427 | + vext_check_isa_ill(s) && | ||
428 | + require_align(a->rd, s->lmul) && | ||
429 | + s->sew == MO_32; | ||
430 | +} | ||
431 | + | ||
432 | +GEN_V_UNMASKED_TRANS(vaesef_vv, vaes_check_vv, ZVKNED_EGS) | ||
433 | +GEN_V_UNMASKED_TRANS(vaesef_vs, vaes_check_vs, ZVKNED_EGS) | ||
434 | +GEN_V_UNMASKED_TRANS(vaesdf_vv, vaes_check_vv, ZVKNED_EGS) | ||
435 | +GEN_V_UNMASKED_TRANS(vaesdf_vs, vaes_check_vs, ZVKNED_EGS) | ||
436 | +GEN_V_UNMASKED_TRANS(vaesdm_vv, vaes_check_vv, ZVKNED_EGS) | ||
437 | +GEN_V_UNMASKED_TRANS(vaesdm_vs, vaes_check_vs, ZVKNED_EGS) | ||
438 | +GEN_V_UNMASKED_TRANS(vaesz_vs, vaes_check_vs, ZVKNED_EGS) | ||
439 | +GEN_V_UNMASKED_TRANS(vaesem_vv, vaes_check_vv, ZVKNED_EGS) | ||
440 | +GEN_V_UNMASKED_TRANS(vaesem_vs, vaes_check_vs, ZVKNED_EGS) | ||
441 | + | ||
442 | +#define GEN_VI_UNMASKED_TRANS(NAME, CHECK, EGS) \ | ||
443 | + static bool trans_##NAME(DisasContext *s, arg_##NAME *a) \ | ||
444 | + { \ | ||
445 | + if (CHECK(s, a)) { \ | ||
446 | + TCGv_ptr rd_v, rs2_v; \ | ||
447 | + TCGv_i32 uimm_v, desc, egs; \ | ||
448 | + uint32_t data = 0; \ | ||
449 | + TCGLabel *over = gen_new_label(); \ | ||
450 | + \ | ||
451 | + if (!s->vstart_eq_zero || !s->vl_eq_vlmax) { \ | ||
452 | + /* save opcode for unwinding in case we throw an exception */ \ | ||
453 | + decode_save_opc(s); \ | ||
454 | + egs = tcg_constant_i32(EGS); \ | ||
455 | + gen_helper_egs_check(egs, cpu_env); \ | ||
456 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
457 | + } \ | ||
458 | + \ | ||
459 | + data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
460 | + data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
461 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
462 | + data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); \ | ||
463 | + data = FIELD_DP32(data, VDATA, VMA, s->vma); \ | ||
464 | + \ | ||
465 | + rd_v = tcg_temp_new_ptr(); \ | ||
466 | + rs2_v = tcg_temp_new_ptr(); \ | ||
467 | + uimm_v = tcg_constant_i32(a->rs1); \ | ||
468 | + desc = tcg_constant_i32( \ | ||
469 | + simd_desc(s->cfg_ptr->vlen / 8, s->cfg_ptr->vlen / 8, data)); \ | ||
470 | + tcg_gen_addi_ptr(rd_v, cpu_env, vreg_ofs(s, a->rd)); \ | ||
471 | + tcg_gen_addi_ptr(rs2_v, cpu_env, vreg_ofs(s, a->rs2)); \ | ||
472 | + gen_helper_##NAME(rd_v, rs2_v, uimm_v, cpu_env, desc); \ | ||
473 | + mark_vs_dirty(s); \ | ||
474 | + gen_set_label(over); \ | ||
475 | + return true; \ | ||
476 | + } \ | ||
477 | + return false; \ | ||
478 | + } | ||
479 | + | ||
480 | +static bool vaeskf1_check(DisasContext *s, arg_vaeskf1_vi *a) | ||
481 | +{ | ||
482 | + int egw_bytes = ZVKNED_EGS << s->sew; | ||
483 | + return s->cfg_ptr->ext_zvkned == true && | ||
484 | + require_rvv(s) && | ||
485 | + vext_check_isa_ill(s) && | ||
486 | + MAXSZ(s) >= egw_bytes && | ||
487 | + s->sew == MO_32 && | ||
488 | + require_align(a->rd, s->lmul) && | ||
489 | + require_align(a->rs2, s->lmul); | ||
490 | +} | ||
491 | + | ||
492 | +static bool vaeskf2_check(DisasContext *s, arg_vaeskf2_vi *a) | ||
493 | +{ | ||
494 | + int egw_bytes = ZVKNED_EGS << s->sew; | ||
495 | + return s->cfg_ptr->ext_zvkned == true && | ||
496 | + require_rvv(s) && | ||
497 | + vext_check_isa_ill(s) && | ||
498 | + MAXSZ(s) >= egw_bytes && | ||
499 | + s->sew == MO_32 && | ||
500 | + require_align(a->rd, s->lmul) && | ||
501 | + require_align(a->rs2, s->lmul); | ||
502 | +} | ||
503 | + | ||
504 | +GEN_VI_UNMASKED_TRANS(vaeskf1_vi, vaeskf1_check, ZVKNED_EGS) | ||
505 | +GEN_VI_UNMASKED_TRANS(vaeskf2_vi, vaeskf2_check, ZVKNED_EGS) | ||
506 | -- | ||
507 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk> | ||
1 | 2 | ||
3 | This commit adds support for the Zvknh vector-crypto extension, which | ||
4 | consists of the following instructions: | ||
5 | |||
6 | * vsha2ms.vv | ||
7 | * vsha2c[hl].vv | ||
8 | |||
9 | Translation functions are defined in | ||
10 | `target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in | ||
11 | `target/riscv/vcrypto_helper.c`. | ||
12 | |||
13 | Co-authored-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk> | ||
14 | Co-authored-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk> | ||
15 | [max.chou@sifive.com: Replaced vstart checking by TCG op] | ||
16 | Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk> | ||
17 | Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk> | ||
18 | Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk> | ||
19 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
20 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
21 | [max.chou@sifive.com: Exposed x-zvknha & x-zvknhb properties] | ||
22 | [max.chou@sifive.com: Replaced SEW selection to happened during | ||
23 | translation] | ||
24 | Message-ID: <20230711165917.2629866-11-max.chou@sifive.com> | ||
25 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
26 | --- | ||
27 | target/riscv/cpu_cfg.h | 2 + | ||
28 | target/riscv/helper.h | 6 + | ||
29 | target/riscv/insn32.decode | 5 + | ||
30 | target/riscv/cpu.c | 13 +- | ||
31 | target/riscv/vcrypto_helper.c | 238 +++++++++++++++++++++++ | ||
32 | target/riscv/insn_trans/trans_rvvk.c.inc | 129 ++++++++++++ | ||
33 | 6 files changed, 390 insertions(+), 3 deletions(-) | ||
34 | |||
35 | diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h | ||
36 | index XXXXXXX..XXXXXXX 100644 | ||
37 | --- a/target/riscv/cpu_cfg.h | ||
38 | +++ b/target/riscv/cpu_cfg.h | ||
39 | @@ -XXX,XX +XXX,XX @@ struct RISCVCPUConfig { | ||
40 | bool ext_zvbb; | ||
41 | bool ext_zvbc; | ||
42 | bool ext_zvkned; | ||
43 | + bool ext_zvknha; | ||
44 | + bool ext_zvknhb; | ||
45 | bool ext_zmmul; | ||
46 | bool ext_zvfbfmin; | ||
47 | bool ext_zvfbfwma; | ||
48 | diff --git a/target/riscv/helper.h b/target/riscv/helper.h | ||
49 | index XXXXXXX..XXXXXXX 100644 | ||
50 | --- a/target/riscv/helper.h | ||
51 | +++ b/target/riscv/helper.h | ||
52 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vaesdm_vs, void, ptr, ptr, env, i32) | ||
53 | DEF_HELPER_4(vaesz_vs, void, ptr, ptr, env, i32) | ||
54 | DEF_HELPER_5(vaeskf1_vi, void, ptr, ptr, i32, env, i32) | ||
55 | DEF_HELPER_5(vaeskf2_vi, void, ptr, ptr, i32, env, i32) | ||
56 | + | ||
57 | +DEF_HELPER_5(vsha2ms_vv, void, ptr, ptr, ptr, env, i32) | ||
58 | +DEF_HELPER_5(vsha2ch32_vv, void, ptr, ptr, ptr, env, i32) | ||
59 | +DEF_HELPER_5(vsha2ch64_vv, void, ptr, ptr, ptr, env, i32) | ||
60 | +DEF_HELPER_5(vsha2cl32_vv, void, ptr, ptr, ptr, env, i32) | ||
61 | +DEF_HELPER_5(vsha2cl64_vv, void, ptr, ptr, ptr, env, i32) | ||
62 | diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode | ||
63 | index XXXXXXX..XXXXXXX 100644 | ||
64 | --- a/target/riscv/insn32.decode | ||
65 | +++ b/target/riscv/insn32.decode | ||
66 | @@ -XXX,XX +XXX,XX @@ vaesdm_vs 101001 1 ..... 00000 010 ..... 1110111 @r2_vm_1 | ||
67 | vaesz_vs 101001 1 ..... 00111 010 ..... 1110111 @r2_vm_1 | ||
68 | vaeskf1_vi 100010 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
69 | vaeskf2_vi 101010 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
70 | + | ||
71 | +# *** Zvknh vector crypto extension *** | ||
72 | +vsha2ms_vv 101101 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
73 | +vsha2ch_vv 101110 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
74 | +vsha2cl_vv 101111 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
75 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | ||
76 | index XXXXXXX..XXXXXXX 100644 | ||
77 | --- a/target/riscv/cpu.c | ||
78 | +++ b/target/riscv/cpu.c | ||
79 | @@ -XXX,XX +XXX,XX @@ static const struct isa_ext_data isa_edata_arr[] = { | ||
80 | ISA_EXT_DATA_ENTRY(zvfh, PRIV_VERSION_1_12_0, ext_zvfh), | ||
81 | ISA_EXT_DATA_ENTRY(zvfhmin, PRIV_VERSION_1_12_0, ext_zvfhmin), | ||
82 | ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned), | ||
83 | + ISA_EXT_DATA_ENTRY(zvknha, PRIV_VERSION_1_12_0, ext_zvknha), | ||
84 | + ISA_EXT_DATA_ENTRY(zvknhb, PRIV_VERSION_1_12_0, ext_zvknhb), | ||
85 | ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx), | ||
86 | ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin), | ||
87 | ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia), | ||
88 | @@ -XXX,XX +XXX,XX @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp) | ||
89 | * In principle Zve*x would also suffice here, were they supported | ||
90 | * in qemu | ||
91 | */ | ||
92 | - if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned) && !cpu->cfg.ext_zve32f) { | ||
93 | + if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned || cpu->cfg.ext_zvknha) && | ||
94 | + !cpu->cfg.ext_zve32f) { | ||
95 | error_setg(errp, | ||
96 | "Vector crypto extensions require V or Zve* extensions"); | ||
97 | return; | ||
98 | } | ||
99 | |||
100 | - if (cpu->cfg.ext_zvbc && !cpu->cfg.ext_zve64f) { | ||
101 | - error_setg(errp, "Zvbc extension requires V or Zve64{f,d} extensions"); | ||
102 | + if ((cpu->cfg.ext_zvbc || cpu->cfg.ext_zvknhb) && !cpu->cfg.ext_zve64f) { | ||
103 | + error_setg( | ||
104 | + errp, | ||
105 | + "Zvbc and Zvknhb extensions require V or Zve64{f,d} extensions"); | ||
106 | return; | ||
107 | } | ||
108 | |||
109 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_extensions[] = { | ||
110 | DEFINE_PROP_BOOL("x-zvbb", RISCVCPU, cfg.ext_zvbb, false), | ||
111 | DEFINE_PROP_BOOL("x-zvbc", RISCVCPU, cfg.ext_zvbc, false), | ||
112 | DEFINE_PROP_BOOL("x-zvkned", RISCVCPU, cfg.ext_zvkned, false), | ||
113 | + DEFINE_PROP_BOOL("x-zvknha", RISCVCPU, cfg.ext_zvknha, false), | ||
114 | + DEFINE_PROP_BOOL("x-zvknhb", RISCVCPU, cfg.ext_zvknhb, false), | ||
115 | |||
116 | DEFINE_PROP_END_OF_LIST(), | ||
117 | }; | ||
118 | diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c | ||
119 | index XXXXXXX..XXXXXXX 100644 | ||
120 | --- a/target/riscv/vcrypto_helper.c | ||
121 | +++ b/target/riscv/vcrypto_helper.c | ||
122 | @@ -XXX,XX +XXX,XX @@ void HELPER(vaeskf2_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm, | ||
123 | /* set tail elements to 1s */ | ||
124 | vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4); | ||
125 | } | ||
126 | + | ||
127 | +static inline uint32_t sig0_sha256(uint32_t x) | ||
128 | +{ | ||
129 | + return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3); | ||
130 | +} | ||
131 | + | ||
132 | +static inline uint32_t sig1_sha256(uint32_t x) | ||
133 | +{ | ||
134 | + return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10); | ||
135 | +} | ||
136 | + | ||
137 | +static inline uint64_t sig0_sha512(uint64_t x) | ||
138 | +{ | ||
139 | + return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7); | ||
140 | +} | ||
141 | + | ||
142 | +static inline uint64_t sig1_sha512(uint64_t x) | ||
143 | +{ | ||
144 | + return ror64(x, 19) ^ ror64(x, 61) ^ (x >> 6); | ||
145 | +} | ||
146 | + | ||
147 | +static inline void vsha2ms_e32(uint32_t *vd, uint32_t *vs1, uint32_t *vs2) | ||
148 | +{ | ||
149 | + uint32_t res[4]; | ||
150 | + res[0] = sig1_sha256(vs1[H4(2)]) + vs2[H4(1)] + sig0_sha256(vd[H4(1)]) + | ||
151 | + vd[H4(0)]; | ||
152 | + res[1] = sig1_sha256(vs1[H4(3)]) + vs2[H4(2)] + sig0_sha256(vd[H4(2)]) + | ||
153 | + vd[H4(1)]; | ||
154 | + res[2] = | ||
155 | + sig1_sha256(res[0]) + vs2[H4(3)] + sig0_sha256(vd[H4(3)]) + vd[H4(2)]; | ||
156 | + res[3] = | ||
157 | + sig1_sha256(res[1]) + vs1[H4(0)] + sig0_sha256(vs2[H4(0)]) + vd[H4(3)]; | ||
158 | + vd[H4(3)] = res[3]; | ||
159 | + vd[H4(2)] = res[2]; | ||
160 | + vd[H4(1)] = res[1]; | ||
161 | + vd[H4(0)] = res[0]; | ||
162 | +} | ||
163 | + | ||
164 | +static inline void vsha2ms_e64(uint64_t *vd, uint64_t *vs1, uint64_t *vs2) | ||
165 | +{ | ||
166 | + uint64_t res[4]; | ||
167 | + res[0] = sig1_sha512(vs1[2]) + vs2[1] + sig0_sha512(vd[1]) + vd[0]; | ||
168 | + res[1] = sig1_sha512(vs1[3]) + vs2[2] + sig0_sha512(vd[2]) + vd[1]; | ||
169 | + res[2] = sig1_sha512(res[0]) + vs2[3] + sig0_sha512(vd[3]) + vd[2]; | ||
170 | + res[3] = sig1_sha512(res[1]) + vs1[0] + sig0_sha512(vs2[0]) + vd[3]; | ||
171 | + vd[3] = res[3]; | ||
172 | + vd[2] = res[2]; | ||
173 | + vd[1] = res[1]; | ||
174 | + vd[0] = res[0]; | ||
175 | +} | ||
176 | + | ||
177 | +void HELPER(vsha2ms_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env, | ||
178 | + uint32_t desc) | ||
179 | +{ | ||
180 | + uint32_t sew = FIELD_EX64(env->vtype, VTYPE, VSEW); | ||
181 | + uint32_t esz = sew == MO_32 ? 4 : 8; | ||
182 | + uint32_t total_elems; | ||
183 | + uint32_t vta = vext_vta(desc); | ||
184 | + | ||
185 | + for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { | ||
186 | + if (sew == MO_32) { | ||
187 | + vsha2ms_e32(((uint32_t *)vd) + i * 4, ((uint32_t *)vs1) + i * 4, | ||
188 | + ((uint32_t *)vs2) + i * 4); | ||
189 | + } else { | ||
190 | + /* If not 32 then SEW should be 64 */ | ||
191 | + vsha2ms_e64(((uint64_t *)vd) + i * 4, ((uint64_t *)vs1) + i * 4, | ||
192 | + ((uint64_t *)vs2) + i * 4); | ||
193 | + } | ||
194 | + } | ||
195 | + /* set tail elements to 1s */ | ||
196 | + total_elems = vext_get_total_elems(env, desc, esz); | ||
197 | + vext_set_elems_1s(vd, vta, env->vl * esz, total_elems * esz); | ||
198 | + env->vstart = 0; | ||
199 | +} | ||
200 | + | ||
201 | +static inline uint64_t sum0_64(uint64_t x) | ||
202 | +{ | ||
203 | + return ror64(x, 28) ^ ror64(x, 34) ^ ror64(x, 39); | ||
204 | +} | ||
205 | + | ||
206 | +static inline uint32_t sum0_32(uint32_t x) | ||
207 | +{ | ||
208 | + return ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22); | ||
209 | +} | ||
210 | + | ||
211 | +static inline uint64_t sum1_64(uint64_t x) | ||
212 | +{ | ||
213 | + return ror64(x, 14) ^ ror64(x, 18) ^ ror64(x, 41); | ||
214 | +} | ||
215 | + | ||
216 | +static inline uint32_t sum1_32(uint32_t x) | ||
217 | +{ | ||
218 | + return ror32(x, 6) ^ ror32(x, 11) ^ ror32(x, 25); | ||
219 | +} | ||
220 | + | ||
221 | +#define ch(x, y, z) ((x & y) ^ ((~x) & z)) | ||
222 | + | ||
223 | +#define maj(x, y, z) ((x & y) ^ (x & z) ^ (y & z)) | ||
224 | + | ||
225 | +static void vsha2c_64(uint64_t *vs2, uint64_t *vd, uint64_t *vs1) | ||
226 | +{ | ||
227 | + uint64_t a = vs2[3], b = vs2[2], e = vs2[1], f = vs2[0]; | ||
228 | + uint64_t c = vd[3], d = vd[2], g = vd[1], h = vd[0]; | ||
229 | + uint64_t W0 = vs1[0], W1 = vs1[1]; | ||
230 | + uint64_t T1 = h + sum1_64(e) + ch(e, f, g) + W0; | ||
231 | + uint64_t T2 = sum0_64(a) + maj(a, b, c); | ||
232 | + | ||
233 | + h = g; | ||
234 | + g = f; | ||
235 | + f = e; | ||
236 | + e = d + T1; | ||
237 | + d = c; | ||
238 | + c = b; | ||
239 | + b = a; | ||
240 | + a = T1 + T2; | ||
241 | + | ||
242 | + T1 = h + sum1_64(e) + ch(e, f, g) + W1; | ||
243 | + T2 = sum0_64(a) + maj(a, b, c); | ||
244 | + h = g; | ||
245 | + g = f; | ||
246 | + f = e; | ||
247 | + e = d + T1; | ||
248 | + d = c; | ||
249 | + c = b; | ||
250 | + b = a; | ||
251 | + a = T1 + T2; | ||
252 | + | ||
253 | + vd[0] = f; | ||
254 | + vd[1] = e; | ||
255 | + vd[2] = b; | ||
256 | + vd[3] = a; | ||
257 | +} | ||
258 | + | ||
259 | +static void vsha2c_32(uint32_t *vs2, uint32_t *vd, uint32_t *vs1) | ||
260 | +{ | ||
261 | + uint32_t a = vs2[H4(3)], b = vs2[H4(2)], e = vs2[H4(1)], f = vs2[H4(0)]; | ||
262 | + uint32_t c = vd[H4(3)], d = vd[H4(2)], g = vd[H4(1)], h = vd[H4(0)]; | ||
263 | + uint32_t W0 = vs1[H4(0)], W1 = vs1[H4(1)]; | ||
264 | + uint32_t T1 = h + sum1_32(e) + ch(e, f, g) + W0; | ||
265 | + uint32_t T2 = sum0_32(a) + maj(a, b, c); | ||
266 | + | ||
267 | + h = g; | ||
268 | + g = f; | ||
269 | + f = e; | ||
270 | + e = d + T1; | ||
271 | + d = c; | ||
272 | + c = b; | ||
273 | + b = a; | ||
274 | + a = T1 + T2; | ||
275 | + | ||
276 | + T1 = h + sum1_32(e) + ch(e, f, g) + W1; | ||
277 | + T2 = sum0_32(a) + maj(a, b, c); | ||
278 | + h = g; | ||
279 | + g = f; | ||
280 | + f = e; | ||
281 | + e = d + T1; | ||
282 | + d = c; | ||
283 | + c = b; | ||
284 | + b = a; | ||
285 | + a = T1 + T2; | ||
286 | + | ||
287 | + vd[H4(0)] = f; | ||
288 | + vd[H4(1)] = e; | ||
289 | + vd[H4(2)] = b; | ||
290 | + vd[H4(3)] = a; | ||
291 | +} | ||
292 | + | ||
293 | +void HELPER(vsha2ch32_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env, | ||
294 | + uint32_t desc) | ||
295 | +{ | ||
296 | + const uint32_t esz = 4; | ||
297 | + uint32_t total_elems; | ||
298 | + uint32_t vta = vext_vta(desc); | ||
299 | + | ||
300 | + for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { | ||
301 | + vsha2c_32(((uint32_t *)vs2) + 4 * i, ((uint32_t *)vd) + 4 * i, | ||
302 | + ((uint32_t *)vs1) + 4 * i + 2); | ||
303 | + } | ||
304 | + | ||
305 | + /* set tail elements to 1s */ | ||
306 | + total_elems = vext_get_total_elems(env, desc, esz); | ||
307 | + vext_set_elems_1s(vd, vta, env->vl * esz, total_elems * esz); | ||
308 | + env->vstart = 0; | ||
309 | +} | ||
310 | + | ||
311 | +void HELPER(vsha2ch64_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env, | ||
312 | + uint32_t desc) | ||
313 | +{ | ||
314 | + const uint32_t esz = 8; | ||
315 | + uint32_t total_elems; | ||
316 | + uint32_t vta = vext_vta(desc); | ||
317 | + | ||
318 | + for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { | ||
319 | + vsha2c_64(((uint64_t *)vs2) + 4 * i, ((uint64_t *)vd) + 4 * i, | ||
320 | + ((uint64_t *)vs1) + 4 * i + 2); | ||
321 | + } | ||
322 | + | ||
323 | + /* set tail elements to 1s */ | ||
324 | + total_elems = vext_get_total_elems(env, desc, esz); | ||
325 | + vext_set_elems_1s(vd, vta, env->vl * esz, total_elems * esz); | ||
326 | + env->vstart = 0; | ||
327 | +} | ||
328 | + | ||
329 | +void HELPER(vsha2cl32_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env, | ||
330 | + uint32_t desc) | ||
331 | +{ | ||
332 | + const uint32_t esz = 4; | ||
333 | + uint32_t total_elems; | ||
334 | + uint32_t vta = vext_vta(desc); | ||
335 | + | ||
336 | + for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { | ||
337 | + vsha2c_32(((uint32_t *)vs2) + 4 * i, ((uint32_t *)vd) + 4 * i, | ||
338 | + (((uint32_t *)vs1) + 4 * i)); | ||
339 | + } | ||
340 | + | ||
341 | + /* set tail elements to 1s */ | ||
342 | + total_elems = vext_get_total_elems(env, desc, esz); | ||
343 | + vext_set_elems_1s(vd, vta, env->vl * esz, total_elems * esz); | ||
344 | + env->vstart = 0; | ||
345 | +} | ||
346 | + | ||
347 | +void HELPER(vsha2cl64_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env, | ||
348 | + uint32_t desc) | ||
349 | +{ | ||
350 | + uint32_t esz = 8; | ||
351 | + uint32_t total_elems; | ||
352 | + uint32_t vta = vext_vta(desc); | ||
353 | + | ||
354 | + for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { | ||
355 | + vsha2c_64(((uint64_t *)vs2) + 4 * i, ((uint64_t *)vd) + 4 * i, | ||
356 | + (((uint64_t *)vs1) + 4 * i)); | ||
357 | + } | ||
358 | + | ||
359 | + /* set tail elements to 1s */ | ||
360 | + total_elems = vext_get_total_elems(env, desc, esz); | ||
361 | + vext_set_elems_1s(vd, vta, env->vl * esz, total_elems * esz); | ||
362 | + env->vstart = 0; | ||
363 | +} | ||
364 | diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc | ||
365 | index XXXXXXX..XXXXXXX 100644 | ||
366 | --- a/target/riscv/insn_trans/trans_rvvk.c.inc | ||
367 | +++ b/target/riscv/insn_trans/trans_rvvk.c.inc | ||
368 | @@ -XXX,XX +XXX,XX @@ static bool vaeskf2_check(DisasContext *s, arg_vaeskf2_vi *a) | ||
369 | |||
370 | GEN_VI_UNMASKED_TRANS(vaeskf1_vi, vaeskf1_check, ZVKNED_EGS) | ||
371 | GEN_VI_UNMASKED_TRANS(vaeskf2_vi, vaeskf2_check, ZVKNED_EGS) | ||
372 | + | ||
373 | +/* | ||
374 | + * Zvknh | ||
375 | + */ | ||
376 | + | ||
377 | +#define ZVKNH_EGS 4 | ||
378 | + | ||
379 | +#define GEN_VV_UNMASKED_TRANS(NAME, CHECK, EGS) \ | ||
380 | + static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
381 | + { \ | ||
382 | + if (CHECK(s, a)) { \ | ||
383 | + uint32_t data = 0; \ | ||
384 | + TCGLabel *over = gen_new_label(); \ | ||
385 | + TCGv_i32 egs; \ | ||
386 | + \ | ||
387 | + if (!s->vstart_eq_zero || !s->vl_eq_vlmax) { \ | ||
388 | + /* save opcode for unwinding in case we throw an exception */ \ | ||
389 | + decode_save_opc(s); \ | ||
390 | + egs = tcg_constant_i32(EGS); \ | ||
391 | + gen_helper_egs_check(egs, cpu_env); \ | ||
392 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
393 | + } \ | ||
394 | + \ | ||
395 | + data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
396 | + data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
397 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
398 | + data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); \ | ||
399 | + data = FIELD_DP32(data, VDATA, VMA, s->vma); \ | ||
400 | + \ | ||
401 | + tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs1), \ | ||
402 | + vreg_ofs(s, a->rs2), cpu_env, \ | ||
403 | + s->cfg_ptr->vlen / 8, s->cfg_ptr->vlen / 8, \ | ||
404 | + data, gen_helper_##NAME); \ | ||
405 | + \ | ||
406 | + mark_vs_dirty(s); \ | ||
407 | + gen_set_label(over); \ | ||
408 | + return true; \ | ||
409 | + } \ | ||
410 | + return false; \ | ||
411 | + } | ||
412 | + | ||
413 | +static bool vsha_check_sew(DisasContext *s) | ||
414 | +{ | ||
415 | + return (s->cfg_ptr->ext_zvknha == true && s->sew == MO_32) || | ||
416 | + (s->cfg_ptr->ext_zvknhb == true && | ||
417 | + (s->sew == MO_32 || s->sew == MO_64)); | ||
418 | +} | ||
419 | + | ||
420 | +static bool vsha_check(DisasContext *s, arg_rmrr *a) | ||
421 | +{ | ||
422 | + int egw_bytes = ZVKNH_EGS << s->sew; | ||
423 | + int mult = 1 << MAX(s->lmul, 0); | ||
424 | + return opivv_check(s, a) && | ||
425 | + vsha_check_sew(s) && | ||
426 | + MAXSZ(s) >= egw_bytes && | ||
427 | + !is_overlapped(a->rd, mult, a->rs1, mult) && | ||
428 | + !is_overlapped(a->rd, mult, a->rs2, mult) && | ||
429 | + s->lmul >= 0; | ||
430 | +} | ||
431 | + | ||
432 | +GEN_VV_UNMASKED_TRANS(vsha2ms_vv, vsha_check, ZVKNH_EGS) | ||
433 | + | ||
434 | +static bool trans_vsha2cl_vv(DisasContext *s, arg_rmrr *a) | ||
435 | +{ | ||
436 | + if (vsha_check(s, a)) { | ||
437 | + uint32_t data = 0; | ||
438 | + TCGLabel *over = gen_new_label(); | ||
439 | + TCGv_i32 egs; | ||
440 | + | ||
441 | + if (!s->vstart_eq_zero || !s->vl_eq_vlmax) { | ||
442 | + /* save opcode for unwinding in case we throw an exception */ | ||
443 | + decode_save_opc(s); | ||
444 | + egs = tcg_constant_i32(ZVKNH_EGS); | ||
445 | + gen_helper_egs_check(egs, cpu_env); | ||
446 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | ||
447 | + } | ||
448 | + | ||
449 | + data = FIELD_DP32(data, VDATA, VM, a->vm); | ||
450 | + data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | ||
451 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
452 | + data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); | ||
453 | + data = FIELD_DP32(data, VDATA, VMA, s->vma); | ||
454 | + | ||
455 | + tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs1), | ||
456 | + vreg_ofs(s, a->rs2), cpu_env, s->cfg_ptr->vlen / 8, | ||
457 | + s->cfg_ptr->vlen / 8, data, | ||
458 | + s->sew == MO_32 ? | ||
459 | + gen_helper_vsha2cl32_vv : gen_helper_vsha2cl64_vv); | ||
460 | + | ||
461 | + mark_vs_dirty(s); | ||
462 | + gen_set_label(over); | ||
463 | + return true; | ||
464 | + } | ||
465 | + return false; | ||
466 | +} | ||
467 | + | ||
468 | +static bool trans_vsha2ch_vv(DisasContext *s, arg_rmrr *a) | ||
469 | +{ | ||
470 | + if (vsha_check(s, a)) { | ||
471 | + uint32_t data = 0; | ||
472 | + TCGLabel *over = gen_new_label(); | ||
473 | + TCGv_i32 egs; | ||
474 | + | ||
475 | + if (!s->vstart_eq_zero || !s->vl_eq_vlmax) { | ||
476 | + /* save opcode for unwinding in case we throw an exception */ | ||
477 | + decode_save_opc(s); | ||
478 | + egs = tcg_constant_i32(ZVKNH_EGS); | ||
479 | + gen_helper_egs_check(egs, cpu_env); | ||
480 | + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); | ||
481 | + } | ||
482 | + | ||
483 | + data = FIELD_DP32(data, VDATA, VM, a->vm); | ||
484 | + data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | ||
485 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
486 | + data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); | ||
487 | + data = FIELD_DP32(data, VDATA, VMA, s->vma); | ||
488 | + | ||
489 | + tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs1), | ||
490 | + vreg_ofs(s, a->rs2), cpu_env, s->cfg_ptr->vlen / 8, | ||
491 | + s->cfg_ptr->vlen / 8, data, | ||
492 | + s->sew == MO_32 ? | ||
493 | + gen_helper_vsha2ch32_vv : gen_helper_vsha2ch64_vv); | ||
494 | + | ||
495 | + mark_vs_dirty(s); | ||
496 | + gen_set_label(over); | ||
497 | + return true; | ||
498 | + } | ||
499 | + return false; | ||
500 | +} | ||
501 | -- | ||
502 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Lawrence Hunter <lawrence.hunter@codethink.co.uk> |
---|---|---|---|
2 | 2 | ||
3 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 3 | This commit adds support for the Zvksh vector-crypto extension, which |
4 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | 4 | consists of the following instructions: |
5 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 5 | |
6 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | 6 | * vsm3me.vv |
7 | Message-Id: <165449614532.19704.7000832880482980398-11@git.sr.ht> | 7 | * vsm3c.vi |
8 | |||
9 | Translation functions are defined in | ||
10 | `target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in | ||
11 | `target/riscv/vcrypto_helper.c`. | ||
12 | |||
13 | Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk> | ||
14 | [max.chou@sifive.com: Replaced vstart checking by TCG op] | ||
15 | Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk> | ||
16 | Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk> | ||
17 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
18 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
19 | [max.chou@sifive.com: Exposed x-zvksh property] | ||
20 | Message-ID: <20230711165917.2629866-12-max.chou@sifive.com> | ||
8 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 21 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
9 | --- | 22 | --- |
10 | target/riscv/vector_helper.c | 220 ++++++++++++++++++----------------- | 23 | target/riscv/cpu_cfg.h | 1 + |
11 | 1 file changed, 114 insertions(+), 106 deletions(-) | 24 | target/riscv/helper.h | 3 + |
12 | 25 | target/riscv/insn32.decode | 4 + | |
13 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 26 | target/riscv/cpu.c | 6 +- |
14 | index XXXXXXX..XXXXXXX 100644 | 27 | target/riscv/vcrypto_helper.c | 134 +++++++++++++++++++++++ |
15 | --- a/target/riscv/vector_helper.c | 28 | target/riscv/insn_trans/trans_rvvk.c.inc | 31 ++++++ |
16 | +++ b/target/riscv/vector_helper.c | 29 | 6 files changed, 177 insertions(+), 2 deletions(-) |
17 | @@ -XXX,XX +XXX,XX @@ static inline void | 30 | |
18 | vext_vv_rm_2(void *vd, void *v0, void *vs1, void *vs2, | 31 | diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h |
19 | CPURISCVState *env, | 32 | index XXXXXXX..XXXXXXX 100644 |
20 | uint32_t desc, | 33 | --- a/target/riscv/cpu_cfg.h |
21 | - opivv2_rm_fn *fn) | 34 | +++ b/target/riscv/cpu_cfg.h |
22 | + opivv2_rm_fn *fn, uint32_t esz) | 35 | @@ -XXX,XX +XXX,XX @@ struct RISCVCPUConfig { |
23 | { | 36 | bool ext_zvkned; |
24 | uint32_t vm = vext_vm(desc); | 37 | bool ext_zvknha; |
25 | uint32_t vl = env->vl; | 38 | bool ext_zvknhb; |
39 | + bool ext_zvksh; | ||
40 | bool ext_zmmul; | ||
41 | bool ext_zvfbfmin; | ||
42 | bool ext_zvfbfwma; | ||
43 | diff --git a/target/riscv/helper.h b/target/riscv/helper.h | ||
44 | index XXXXXXX..XXXXXXX 100644 | ||
45 | --- a/target/riscv/helper.h | ||
46 | +++ b/target/riscv/helper.h | ||
47 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_5(vsha2ch32_vv, void, ptr, ptr, ptr, env, i32) | ||
48 | DEF_HELPER_5(vsha2ch64_vv, void, ptr, ptr, ptr, env, i32) | ||
49 | DEF_HELPER_5(vsha2cl32_vv, void, ptr, ptr, ptr, env, i32) | ||
50 | DEF_HELPER_5(vsha2cl64_vv, void, ptr, ptr, ptr, env, i32) | ||
51 | + | ||
52 | +DEF_HELPER_5(vsm3me_vv, void, ptr, ptr, ptr, env, i32) | ||
53 | +DEF_HELPER_5(vsm3c_vi, void, ptr, ptr, i32, env, i32) | ||
54 | diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode | ||
55 | index XXXXXXX..XXXXXXX 100644 | ||
56 | --- a/target/riscv/insn32.decode | ||
57 | +++ b/target/riscv/insn32.decode | ||
58 | @@ -XXX,XX +XXX,XX @@ vaeskf2_vi 101010 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
59 | vsha2ms_vv 101101 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
60 | vsha2ch_vv 101110 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
61 | vsha2cl_vv 101111 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
62 | + | ||
63 | +# *** Zvksh vector crypto extension *** | ||
64 | +vsm3me_vv 100000 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
65 | +vsm3c_vi 101011 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
66 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | ||
67 | index XXXXXXX..XXXXXXX 100644 | ||
68 | --- a/target/riscv/cpu.c | ||
69 | +++ b/target/riscv/cpu.c | ||
70 | @@ -XXX,XX +XXX,XX @@ static const struct isa_ext_data isa_edata_arr[] = { | ||
71 | ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned), | ||
72 | ISA_EXT_DATA_ENTRY(zvknha, PRIV_VERSION_1_12_0, ext_zvknha), | ||
73 | ISA_EXT_DATA_ENTRY(zvknhb, PRIV_VERSION_1_12_0, ext_zvknhb), | ||
74 | + ISA_EXT_DATA_ENTRY(zvksh, PRIV_VERSION_1_12_0, ext_zvksh), | ||
75 | ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx), | ||
76 | ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin), | ||
77 | ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia), | ||
78 | @@ -XXX,XX +XXX,XX @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp) | ||
79 | * In principle Zve*x would also suffice here, were they supported | ||
80 | * in qemu | ||
81 | */ | ||
82 | - if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned || cpu->cfg.ext_zvknha) && | ||
83 | - !cpu->cfg.ext_zve32f) { | ||
84 | + if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned || cpu->cfg.ext_zvknha || | ||
85 | + cpu->cfg.ext_zvksh) && !cpu->cfg.ext_zve32f) { | ||
86 | error_setg(errp, | ||
87 | "Vector crypto extensions require V or Zve* extensions"); | ||
88 | return; | ||
89 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_extensions[] = { | ||
90 | DEFINE_PROP_BOOL("x-zvkned", RISCVCPU, cfg.ext_zvkned, false), | ||
91 | DEFINE_PROP_BOOL("x-zvknha", RISCVCPU, cfg.ext_zvknha, false), | ||
92 | DEFINE_PROP_BOOL("x-zvknhb", RISCVCPU, cfg.ext_zvknhb, false), | ||
93 | + DEFINE_PROP_BOOL("x-zvksh", RISCVCPU, cfg.ext_zvksh, false), | ||
94 | |||
95 | DEFINE_PROP_END_OF_LIST(), | ||
96 | }; | ||
97 | diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c | ||
98 | index XXXXXXX..XXXXXXX 100644 | ||
99 | --- a/target/riscv/vcrypto_helper.c | ||
100 | +++ b/target/riscv/vcrypto_helper.c | ||
101 | @@ -XXX,XX +XXX,XX @@ void HELPER(vsha2cl64_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env, | ||
102 | vext_set_elems_1s(vd, vta, env->vl * esz, total_elems * esz); | ||
103 | env->vstart = 0; | ||
104 | } | ||
105 | + | ||
106 | +static inline uint32_t p1(uint32_t x) | ||
107 | +{ | ||
108 | + return x ^ rol32(x, 15) ^ rol32(x, 23); | ||
109 | +} | ||
110 | + | ||
111 | +static inline uint32_t zvksh_w(uint32_t m16, uint32_t m9, uint32_t m3, | ||
112 | + uint32_t m13, uint32_t m6) | ||
113 | +{ | ||
114 | + return p1(m16 ^ m9 ^ rol32(m3, 15)) ^ rol32(m13, 7) ^ m6; | ||
115 | +} | ||
116 | + | ||
117 | +void HELPER(vsm3me_vv)(void *vd_vptr, void *vs1_vptr, void *vs2_vptr, | ||
118 | + CPURISCVState *env, uint32_t desc) | ||
119 | +{ | ||
120 | + uint32_t esz = memop_size(FIELD_EX64(env->vtype, VTYPE, VSEW)); | ||
26 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); | 121 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); |
27 | + uint32_t vta = vext_vta(desc); | 122 | + uint32_t vta = vext_vta(desc); |
28 | 123 | + uint32_t *vd = vd_vptr; | |
29 | switch (env->vxrm) { | 124 | + uint32_t *vs1 = vs1_vptr; |
30 | case 0: /* rnu */ | 125 | + uint32_t *vs2 = vs2_vptr; |
31 | @@ -XXX,XX +XXX,XX @@ vext_vv_rm_2(void *vd, void *v0, void *vs1, void *vs2, | 126 | + |
32 | env, vl, vm, 3, fn); | 127 | + for (int i = env->vstart / 8; i < env->vl / 8; i++) { |
33 | break; | 128 | + uint32_t w[24]; |
34 | } | 129 | + for (int j = 0; j < 8; j++) { |
35 | + /* set tail elements to 1s */ | 130 | + w[j] = bswap32(vs1[H4((i * 8) + j)]); |
36 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); | 131 | + w[j + 8] = bswap32(vs2[H4((i * 8) + j)]); |
37 | } | 132 | + } |
38 | 133 | + for (int j = 0; j < 8; j++) { | |
39 | /* generate helpers for fixed point instructions with OPIVV format */ | 134 | + w[j + 16] = |
40 | -#define GEN_VEXT_VV_RM(NAME) \ | 135 | + zvksh_w(w[j], w[j + 7], w[j + 13], w[j + 3], w[j + 10]); |
41 | +#define GEN_VEXT_VV_RM(NAME, ESZ) \ | 136 | + } |
42 | void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | 137 | + for (int j = 0; j < 8; j++) { |
43 | CPURISCVState *env, uint32_t desc) \ | 138 | + vd[(i * 8) + j] = bswap32(w[H4(j + 16)]); |
44 | { \ | 139 | + } |
45 | vext_vv_rm_2(vd, v0, vs1, vs2, env, desc, \ | 140 | + } |
46 | - do_##NAME); \ | 141 | + vext_set_elems_1s(vd_vptr, vta, env->vl * esz, total_elems * esz); |
47 | + do_##NAME, ESZ); \ | 142 | + env->vstart = 0; |
48 | } | 143 | +} |
49 | 144 | + | |
50 | static inline uint8_t saddu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b) | 145 | +static inline uint32_t ff1(uint32_t x, uint32_t y, uint32_t z) |
51 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vsaddu_vv_b, OP_UUU_B, H1, H1, H1, saddu8) | 146 | +{ |
52 | RVVCALL(OPIVV2_RM, vsaddu_vv_h, OP_UUU_H, H2, H2, H2, saddu16) | 147 | + return x ^ y ^ z; |
53 | RVVCALL(OPIVV2_RM, vsaddu_vv_w, OP_UUU_W, H4, H4, H4, saddu32) | 148 | +} |
54 | RVVCALL(OPIVV2_RM, vsaddu_vv_d, OP_UUU_D, H8, H8, H8, saddu64) | 149 | + |
55 | -GEN_VEXT_VV_RM(vsaddu_vv_b) | 150 | +static inline uint32_t ff2(uint32_t x, uint32_t y, uint32_t z) |
56 | -GEN_VEXT_VV_RM(vsaddu_vv_h) | 151 | +{ |
57 | -GEN_VEXT_VV_RM(vsaddu_vv_w) | 152 | + return (x & y) | (x & z) | (y & z); |
58 | -GEN_VEXT_VV_RM(vsaddu_vv_d) | 153 | +} |
59 | +GEN_VEXT_VV_RM(vsaddu_vv_b, 1) | 154 | + |
60 | +GEN_VEXT_VV_RM(vsaddu_vv_h, 2) | 155 | +static inline uint32_t ff_j(uint32_t x, uint32_t y, uint32_t z, uint32_t j) |
61 | +GEN_VEXT_VV_RM(vsaddu_vv_w, 4) | 156 | +{ |
62 | +GEN_VEXT_VV_RM(vsaddu_vv_d, 8) | 157 | + return (j <= 15) ? ff1(x, y, z) : ff2(x, y, z); |
63 | 158 | +} | |
64 | typedef void opivx2_rm_fn(void *vd, target_long s1, void *vs2, int i, | 159 | + |
65 | CPURISCVState *env, int vxrm); | 160 | +static inline uint32_t gg1(uint32_t x, uint32_t y, uint32_t z) |
66 | @@ -XXX,XX +XXX,XX @@ static inline void | 161 | +{ |
67 | vext_vx_rm_2(void *vd, void *v0, target_long s1, void *vs2, | 162 | + return x ^ y ^ z; |
68 | CPURISCVState *env, | 163 | +} |
69 | uint32_t desc, | 164 | + |
70 | - opivx2_rm_fn *fn) | 165 | +static inline uint32_t gg2(uint32_t x, uint32_t y, uint32_t z) |
71 | + opivx2_rm_fn *fn, uint32_t esz) | 166 | +{ |
72 | { | 167 | + return (x & y) | (~x & z); |
73 | uint32_t vm = vext_vm(desc); | 168 | +} |
74 | uint32_t vl = env->vl; | 169 | + |
170 | +static inline uint32_t gg_j(uint32_t x, uint32_t y, uint32_t z, uint32_t j) | ||
171 | +{ | ||
172 | + return (j <= 15) ? gg1(x, y, z) : gg2(x, y, z); | ||
173 | +} | ||
174 | + | ||
175 | +static inline uint32_t t_j(uint32_t j) | ||
176 | +{ | ||
177 | + return (j <= 15) ? 0x79cc4519 : 0x7a879d8a; | ||
178 | +} | ||
179 | + | ||
180 | +static inline uint32_t p_0(uint32_t x) | ||
181 | +{ | ||
182 | + return x ^ rol32(x, 9) ^ rol32(x, 17); | ||
183 | +} | ||
184 | + | ||
185 | +static void sm3c(uint32_t *vd, uint32_t *vs1, uint32_t *vs2, uint32_t uimm) | ||
186 | +{ | ||
187 | + uint32_t x0, x1; | ||
188 | + uint32_t j; | ||
189 | + uint32_t ss1, ss2, tt1, tt2; | ||
190 | + x0 = vs2[0] ^ vs2[4]; | ||
191 | + x1 = vs2[1] ^ vs2[5]; | ||
192 | + j = 2 * uimm; | ||
193 | + ss1 = rol32(rol32(vs1[0], 12) + vs1[4] + rol32(t_j(j), j % 32), 7); | ||
194 | + ss2 = ss1 ^ rol32(vs1[0], 12); | ||
195 | + tt1 = ff_j(vs1[0], vs1[1], vs1[2], j) + vs1[3] + ss2 + x0; | ||
196 | + tt2 = gg_j(vs1[4], vs1[5], vs1[6], j) + vs1[7] + ss1 + vs2[0]; | ||
197 | + vs1[3] = vs1[2]; | ||
198 | + vd[3] = rol32(vs1[1], 9); | ||
199 | + vs1[1] = vs1[0]; | ||
200 | + vd[1] = tt1; | ||
201 | + vs1[7] = vs1[6]; | ||
202 | + vd[7] = rol32(vs1[5], 19); | ||
203 | + vs1[5] = vs1[4]; | ||
204 | + vd[5] = p_0(tt2); | ||
205 | + j = 2 * uimm + 1; | ||
206 | + ss1 = rol32(rol32(vd[1], 12) + vd[5] + rol32(t_j(j), j % 32), 7); | ||
207 | + ss2 = ss1 ^ rol32(vd[1], 12); | ||
208 | + tt1 = ff_j(vd[1], vs1[1], vd[3], j) + vs1[3] + ss2 + x1; | ||
209 | + tt2 = gg_j(vd[5], vs1[5], vd[7], j) + vs1[7] + ss1 + vs2[1]; | ||
210 | + vd[2] = rol32(vs1[1], 9); | ||
211 | + vd[0] = tt1; | ||
212 | + vd[6] = rol32(vs1[5], 19); | ||
213 | + vd[4] = p_0(tt2); | ||
214 | +} | ||
215 | + | ||
216 | +void HELPER(vsm3c_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm, | ||
217 | + CPURISCVState *env, uint32_t desc) | ||
218 | +{ | ||
219 | + uint32_t esz = memop_size(FIELD_EX64(env->vtype, VTYPE, VSEW)); | ||
75 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); | 220 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); |
76 | + uint32_t vta = vext_vta(desc); | 221 | + uint32_t vta = vext_vta(desc); |
77 | 222 | + uint32_t *vd = vd_vptr; | |
78 | switch (env->vxrm) { | 223 | + uint32_t *vs2 = vs2_vptr; |
79 | case 0: /* rnu */ | 224 | + uint32_t v1[8], v2[8], v3[8]; |
80 | @@ -XXX,XX +XXX,XX @@ vext_vx_rm_2(void *vd, void *v0, target_long s1, void *vs2, | 225 | + |
81 | env, vl, vm, 3, fn); | 226 | + for (int i = env->vstart / 8; i < env->vl / 8; i++) { |
82 | break; | 227 | + for (int k = 0; k < 8; k++) { |
228 | + v2[k] = bswap32(vd[H4(i * 8 + k)]); | ||
229 | + v3[k] = bswap32(vs2[H4(i * 8 + k)]); | ||
230 | + } | ||
231 | + sm3c(v1, v2, v3, uimm); | ||
232 | + for (int k = 0; k < 8; k++) { | ||
233 | + vd[i * 8 + k] = bswap32(v1[H4(k)]); | ||
234 | + } | ||
235 | + } | ||
236 | + vext_set_elems_1s(vd_vptr, vta, env->vl * esz, total_elems * esz); | ||
237 | + env->vstart = 0; | ||
238 | +} | ||
239 | diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc | ||
240 | index XXXXXXX..XXXXXXX 100644 | ||
241 | --- a/target/riscv/insn_trans/trans_rvvk.c.inc | ||
242 | +++ b/target/riscv/insn_trans/trans_rvvk.c.inc | ||
243 | @@ -XXX,XX +XXX,XX @@ static bool trans_vsha2ch_vv(DisasContext *s, arg_rmrr *a) | ||
83 | } | 244 | } |
84 | + /* set tail elements to 1s */ | 245 | return false; |
85 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); | ||
86 | } | 246 | } |
87 | 247 | + | |
88 | /* generate helpers for fixed point instructions with OPIVX format */ | 248 | +/* |
89 | -#define GEN_VEXT_VX_RM(NAME) \ | 249 | + * Zvksh |
90 | +#define GEN_VEXT_VX_RM(NAME, ESZ) \ | 250 | + */ |
91 | void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | 251 | + |
92 | void *vs2, CPURISCVState *env, uint32_t desc) \ | 252 | +#define ZVKSH_EGS 8 |
93 | { \ | 253 | + |
94 | vext_vx_rm_2(vd, v0, s1, vs2, env, desc, \ | 254 | +static inline bool vsm3_check(DisasContext *s, arg_rmrr *a) |
95 | - do_##NAME); \ | 255 | +{ |
96 | + do_##NAME, ESZ); \ | 256 | + int egw_bytes = ZVKSH_EGS << s->sew; |
97 | } | 257 | + int mult = 1 << MAX(s->lmul, 0); |
98 | 258 | + return s->cfg_ptr->ext_zvksh == true && | |
99 | RVVCALL(OPIVX2_RM, vsaddu_vx_b, OP_UUU_B, H1, H1, saddu8) | 259 | + require_rvv(s) && |
100 | RVVCALL(OPIVX2_RM, vsaddu_vx_h, OP_UUU_H, H2, H2, saddu16) | 260 | + vext_check_isa_ill(s) && |
101 | RVVCALL(OPIVX2_RM, vsaddu_vx_w, OP_UUU_W, H4, H4, saddu32) | 261 | + !is_overlapped(a->rd, mult, a->rs2, mult) && |
102 | RVVCALL(OPIVX2_RM, vsaddu_vx_d, OP_UUU_D, H8, H8, saddu64) | 262 | + MAXSZ(s) >= egw_bytes && |
103 | -GEN_VEXT_VX_RM(vsaddu_vx_b) | 263 | + s->sew == MO_32; |
104 | -GEN_VEXT_VX_RM(vsaddu_vx_h) | 264 | +} |
105 | -GEN_VEXT_VX_RM(vsaddu_vx_w) | 265 | + |
106 | -GEN_VEXT_VX_RM(vsaddu_vx_d) | 266 | +static inline bool vsm3me_check(DisasContext *s, arg_rmrr *a) |
107 | +GEN_VEXT_VX_RM(vsaddu_vx_b, 1) | 267 | +{ |
108 | +GEN_VEXT_VX_RM(vsaddu_vx_h, 2) | 268 | + return vsm3_check(s, a) && vext_check_sss(s, a->rd, a->rs1, a->rs2, a->vm); |
109 | +GEN_VEXT_VX_RM(vsaddu_vx_w, 4) | 269 | +} |
110 | +GEN_VEXT_VX_RM(vsaddu_vx_d, 8) | 270 | + |
111 | 271 | +static inline bool vsm3c_check(DisasContext *s, arg_rmrr *a) | |
112 | static inline int8_t sadd8(CPURISCVState *env, int vxrm, int8_t a, int8_t b) | 272 | +{ |
113 | { | 273 | + return vsm3_check(s, a) && vext_check_ss(s, a->rd, a->rs2, a->vm); |
114 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vsadd_vv_b, OP_SSS_B, H1, H1, H1, sadd8) | 274 | +} |
115 | RVVCALL(OPIVV2_RM, vsadd_vv_h, OP_SSS_H, H2, H2, H2, sadd16) | 275 | + |
116 | RVVCALL(OPIVV2_RM, vsadd_vv_w, OP_SSS_W, H4, H4, H4, sadd32) | 276 | +GEN_VV_UNMASKED_TRANS(vsm3me_vv, vsm3me_check, ZVKSH_EGS) |
117 | RVVCALL(OPIVV2_RM, vsadd_vv_d, OP_SSS_D, H8, H8, H8, sadd64) | 277 | +GEN_VI_UNMASKED_TRANS(vsm3c_vi, vsm3c_check, ZVKSH_EGS) |
118 | -GEN_VEXT_VV_RM(vsadd_vv_b) | ||
119 | -GEN_VEXT_VV_RM(vsadd_vv_h) | ||
120 | -GEN_VEXT_VV_RM(vsadd_vv_w) | ||
121 | -GEN_VEXT_VV_RM(vsadd_vv_d) | ||
122 | +GEN_VEXT_VV_RM(vsadd_vv_b, 1) | ||
123 | +GEN_VEXT_VV_RM(vsadd_vv_h, 2) | ||
124 | +GEN_VEXT_VV_RM(vsadd_vv_w, 4) | ||
125 | +GEN_VEXT_VV_RM(vsadd_vv_d, 8) | ||
126 | |||
127 | RVVCALL(OPIVX2_RM, vsadd_vx_b, OP_SSS_B, H1, H1, sadd8) | ||
128 | RVVCALL(OPIVX2_RM, vsadd_vx_h, OP_SSS_H, H2, H2, sadd16) | ||
129 | RVVCALL(OPIVX2_RM, vsadd_vx_w, OP_SSS_W, H4, H4, sadd32) | ||
130 | RVVCALL(OPIVX2_RM, vsadd_vx_d, OP_SSS_D, H8, H8, sadd64) | ||
131 | -GEN_VEXT_VX_RM(vsadd_vx_b) | ||
132 | -GEN_VEXT_VX_RM(vsadd_vx_h) | ||
133 | -GEN_VEXT_VX_RM(vsadd_vx_w) | ||
134 | -GEN_VEXT_VX_RM(vsadd_vx_d) | ||
135 | +GEN_VEXT_VX_RM(vsadd_vx_b, 1) | ||
136 | +GEN_VEXT_VX_RM(vsadd_vx_h, 2) | ||
137 | +GEN_VEXT_VX_RM(vsadd_vx_w, 4) | ||
138 | +GEN_VEXT_VX_RM(vsadd_vx_d, 8) | ||
139 | |||
140 | static inline uint8_t ssubu8(CPURISCVState *env, int vxrm, uint8_t a, uint8_t b) | ||
141 | { | ||
142 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vssubu_vv_b, OP_UUU_B, H1, H1, H1, ssubu8) | ||
143 | RVVCALL(OPIVV2_RM, vssubu_vv_h, OP_UUU_H, H2, H2, H2, ssubu16) | ||
144 | RVVCALL(OPIVV2_RM, vssubu_vv_w, OP_UUU_W, H4, H4, H4, ssubu32) | ||
145 | RVVCALL(OPIVV2_RM, vssubu_vv_d, OP_UUU_D, H8, H8, H8, ssubu64) | ||
146 | -GEN_VEXT_VV_RM(vssubu_vv_b) | ||
147 | -GEN_VEXT_VV_RM(vssubu_vv_h) | ||
148 | -GEN_VEXT_VV_RM(vssubu_vv_w) | ||
149 | -GEN_VEXT_VV_RM(vssubu_vv_d) | ||
150 | +GEN_VEXT_VV_RM(vssubu_vv_b, 1) | ||
151 | +GEN_VEXT_VV_RM(vssubu_vv_h, 2) | ||
152 | +GEN_VEXT_VV_RM(vssubu_vv_w, 4) | ||
153 | +GEN_VEXT_VV_RM(vssubu_vv_d, 8) | ||
154 | |||
155 | RVVCALL(OPIVX2_RM, vssubu_vx_b, OP_UUU_B, H1, H1, ssubu8) | ||
156 | RVVCALL(OPIVX2_RM, vssubu_vx_h, OP_UUU_H, H2, H2, ssubu16) | ||
157 | RVVCALL(OPIVX2_RM, vssubu_vx_w, OP_UUU_W, H4, H4, ssubu32) | ||
158 | RVVCALL(OPIVX2_RM, vssubu_vx_d, OP_UUU_D, H8, H8, ssubu64) | ||
159 | -GEN_VEXT_VX_RM(vssubu_vx_b) | ||
160 | -GEN_VEXT_VX_RM(vssubu_vx_h) | ||
161 | -GEN_VEXT_VX_RM(vssubu_vx_w) | ||
162 | -GEN_VEXT_VX_RM(vssubu_vx_d) | ||
163 | +GEN_VEXT_VX_RM(vssubu_vx_b, 1) | ||
164 | +GEN_VEXT_VX_RM(vssubu_vx_h, 2) | ||
165 | +GEN_VEXT_VX_RM(vssubu_vx_w, 4) | ||
166 | +GEN_VEXT_VX_RM(vssubu_vx_d, 8) | ||
167 | |||
168 | static inline int8_t ssub8(CPURISCVState *env, int vxrm, int8_t a, int8_t b) | ||
169 | { | ||
170 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vssub_vv_b, OP_SSS_B, H1, H1, H1, ssub8) | ||
171 | RVVCALL(OPIVV2_RM, vssub_vv_h, OP_SSS_H, H2, H2, H2, ssub16) | ||
172 | RVVCALL(OPIVV2_RM, vssub_vv_w, OP_SSS_W, H4, H4, H4, ssub32) | ||
173 | RVVCALL(OPIVV2_RM, vssub_vv_d, OP_SSS_D, H8, H8, H8, ssub64) | ||
174 | -GEN_VEXT_VV_RM(vssub_vv_b) | ||
175 | -GEN_VEXT_VV_RM(vssub_vv_h) | ||
176 | -GEN_VEXT_VV_RM(vssub_vv_w) | ||
177 | -GEN_VEXT_VV_RM(vssub_vv_d) | ||
178 | +GEN_VEXT_VV_RM(vssub_vv_b, 1) | ||
179 | +GEN_VEXT_VV_RM(vssub_vv_h, 2) | ||
180 | +GEN_VEXT_VV_RM(vssub_vv_w, 4) | ||
181 | +GEN_VEXT_VV_RM(vssub_vv_d, 8) | ||
182 | |||
183 | RVVCALL(OPIVX2_RM, vssub_vx_b, OP_SSS_B, H1, H1, ssub8) | ||
184 | RVVCALL(OPIVX2_RM, vssub_vx_h, OP_SSS_H, H2, H2, ssub16) | ||
185 | RVVCALL(OPIVX2_RM, vssub_vx_w, OP_SSS_W, H4, H4, ssub32) | ||
186 | RVVCALL(OPIVX2_RM, vssub_vx_d, OP_SSS_D, H8, H8, ssub64) | ||
187 | -GEN_VEXT_VX_RM(vssub_vx_b) | ||
188 | -GEN_VEXT_VX_RM(vssub_vx_h) | ||
189 | -GEN_VEXT_VX_RM(vssub_vx_w) | ||
190 | -GEN_VEXT_VX_RM(vssub_vx_d) | ||
191 | +GEN_VEXT_VX_RM(vssub_vx_b, 1) | ||
192 | +GEN_VEXT_VX_RM(vssub_vx_h, 2) | ||
193 | +GEN_VEXT_VX_RM(vssub_vx_w, 4) | ||
194 | +GEN_VEXT_VX_RM(vssub_vx_d, 8) | ||
195 | |||
196 | /* Vector Single-Width Averaging Add and Subtract */ | ||
197 | static inline uint8_t get_round(int vxrm, uint64_t v, uint8_t shift) | ||
198 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vaadd_vv_b, OP_SSS_B, H1, H1, H1, aadd32) | ||
199 | RVVCALL(OPIVV2_RM, vaadd_vv_h, OP_SSS_H, H2, H2, H2, aadd32) | ||
200 | RVVCALL(OPIVV2_RM, vaadd_vv_w, OP_SSS_W, H4, H4, H4, aadd32) | ||
201 | RVVCALL(OPIVV2_RM, vaadd_vv_d, OP_SSS_D, H8, H8, H8, aadd64) | ||
202 | -GEN_VEXT_VV_RM(vaadd_vv_b) | ||
203 | -GEN_VEXT_VV_RM(vaadd_vv_h) | ||
204 | -GEN_VEXT_VV_RM(vaadd_vv_w) | ||
205 | -GEN_VEXT_VV_RM(vaadd_vv_d) | ||
206 | +GEN_VEXT_VV_RM(vaadd_vv_b, 1) | ||
207 | +GEN_VEXT_VV_RM(vaadd_vv_h, 2) | ||
208 | +GEN_VEXT_VV_RM(vaadd_vv_w, 4) | ||
209 | +GEN_VEXT_VV_RM(vaadd_vv_d, 8) | ||
210 | |||
211 | RVVCALL(OPIVX2_RM, vaadd_vx_b, OP_SSS_B, H1, H1, aadd32) | ||
212 | RVVCALL(OPIVX2_RM, vaadd_vx_h, OP_SSS_H, H2, H2, aadd32) | ||
213 | RVVCALL(OPIVX2_RM, vaadd_vx_w, OP_SSS_W, H4, H4, aadd32) | ||
214 | RVVCALL(OPIVX2_RM, vaadd_vx_d, OP_SSS_D, H8, H8, aadd64) | ||
215 | -GEN_VEXT_VX_RM(vaadd_vx_b) | ||
216 | -GEN_VEXT_VX_RM(vaadd_vx_h) | ||
217 | -GEN_VEXT_VX_RM(vaadd_vx_w) | ||
218 | -GEN_VEXT_VX_RM(vaadd_vx_d) | ||
219 | +GEN_VEXT_VX_RM(vaadd_vx_b, 1) | ||
220 | +GEN_VEXT_VX_RM(vaadd_vx_h, 2) | ||
221 | +GEN_VEXT_VX_RM(vaadd_vx_w, 4) | ||
222 | +GEN_VEXT_VX_RM(vaadd_vx_d, 8) | ||
223 | |||
224 | static inline uint32_t aaddu32(CPURISCVState *env, int vxrm, | ||
225 | uint32_t a, uint32_t b) | ||
226 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vaaddu_vv_b, OP_UUU_B, H1, H1, H1, aaddu32) | ||
227 | RVVCALL(OPIVV2_RM, vaaddu_vv_h, OP_UUU_H, H2, H2, H2, aaddu32) | ||
228 | RVVCALL(OPIVV2_RM, vaaddu_vv_w, OP_UUU_W, H4, H4, H4, aaddu32) | ||
229 | RVVCALL(OPIVV2_RM, vaaddu_vv_d, OP_UUU_D, H8, H8, H8, aaddu64) | ||
230 | -GEN_VEXT_VV_RM(vaaddu_vv_b) | ||
231 | -GEN_VEXT_VV_RM(vaaddu_vv_h) | ||
232 | -GEN_VEXT_VV_RM(vaaddu_vv_w) | ||
233 | -GEN_VEXT_VV_RM(vaaddu_vv_d) | ||
234 | +GEN_VEXT_VV_RM(vaaddu_vv_b, 1) | ||
235 | +GEN_VEXT_VV_RM(vaaddu_vv_h, 2) | ||
236 | +GEN_VEXT_VV_RM(vaaddu_vv_w, 4) | ||
237 | +GEN_VEXT_VV_RM(vaaddu_vv_d, 8) | ||
238 | |||
239 | RVVCALL(OPIVX2_RM, vaaddu_vx_b, OP_UUU_B, H1, H1, aaddu32) | ||
240 | RVVCALL(OPIVX2_RM, vaaddu_vx_h, OP_UUU_H, H2, H2, aaddu32) | ||
241 | RVVCALL(OPIVX2_RM, vaaddu_vx_w, OP_UUU_W, H4, H4, aaddu32) | ||
242 | RVVCALL(OPIVX2_RM, vaaddu_vx_d, OP_UUU_D, H8, H8, aaddu64) | ||
243 | -GEN_VEXT_VX_RM(vaaddu_vx_b) | ||
244 | -GEN_VEXT_VX_RM(vaaddu_vx_h) | ||
245 | -GEN_VEXT_VX_RM(vaaddu_vx_w) | ||
246 | -GEN_VEXT_VX_RM(vaaddu_vx_d) | ||
247 | +GEN_VEXT_VX_RM(vaaddu_vx_b, 1) | ||
248 | +GEN_VEXT_VX_RM(vaaddu_vx_h, 2) | ||
249 | +GEN_VEXT_VX_RM(vaaddu_vx_w, 4) | ||
250 | +GEN_VEXT_VX_RM(vaaddu_vx_d, 8) | ||
251 | |||
252 | static inline int32_t asub32(CPURISCVState *env, int vxrm, int32_t a, int32_t b) | ||
253 | { | ||
254 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vasub_vv_b, OP_SSS_B, H1, H1, H1, asub32) | ||
255 | RVVCALL(OPIVV2_RM, vasub_vv_h, OP_SSS_H, H2, H2, H2, asub32) | ||
256 | RVVCALL(OPIVV2_RM, vasub_vv_w, OP_SSS_W, H4, H4, H4, asub32) | ||
257 | RVVCALL(OPIVV2_RM, vasub_vv_d, OP_SSS_D, H8, H8, H8, asub64) | ||
258 | -GEN_VEXT_VV_RM(vasub_vv_b) | ||
259 | -GEN_VEXT_VV_RM(vasub_vv_h) | ||
260 | -GEN_VEXT_VV_RM(vasub_vv_w) | ||
261 | -GEN_VEXT_VV_RM(vasub_vv_d) | ||
262 | +GEN_VEXT_VV_RM(vasub_vv_b, 1) | ||
263 | +GEN_VEXT_VV_RM(vasub_vv_h, 2) | ||
264 | +GEN_VEXT_VV_RM(vasub_vv_w, 4) | ||
265 | +GEN_VEXT_VV_RM(vasub_vv_d, 8) | ||
266 | |||
267 | RVVCALL(OPIVX2_RM, vasub_vx_b, OP_SSS_B, H1, H1, asub32) | ||
268 | RVVCALL(OPIVX2_RM, vasub_vx_h, OP_SSS_H, H2, H2, asub32) | ||
269 | RVVCALL(OPIVX2_RM, vasub_vx_w, OP_SSS_W, H4, H4, asub32) | ||
270 | RVVCALL(OPIVX2_RM, vasub_vx_d, OP_SSS_D, H8, H8, asub64) | ||
271 | -GEN_VEXT_VX_RM(vasub_vx_b) | ||
272 | -GEN_VEXT_VX_RM(vasub_vx_h) | ||
273 | -GEN_VEXT_VX_RM(vasub_vx_w) | ||
274 | -GEN_VEXT_VX_RM(vasub_vx_d) | ||
275 | +GEN_VEXT_VX_RM(vasub_vx_b, 1) | ||
276 | +GEN_VEXT_VX_RM(vasub_vx_h, 2) | ||
277 | +GEN_VEXT_VX_RM(vasub_vx_w, 4) | ||
278 | +GEN_VEXT_VX_RM(vasub_vx_d, 8) | ||
279 | |||
280 | static inline uint32_t asubu32(CPURISCVState *env, int vxrm, | ||
281 | uint32_t a, uint32_t b) | ||
282 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vasubu_vv_b, OP_UUU_B, H1, H1, H1, asubu32) | ||
283 | RVVCALL(OPIVV2_RM, vasubu_vv_h, OP_UUU_H, H2, H2, H2, asubu32) | ||
284 | RVVCALL(OPIVV2_RM, vasubu_vv_w, OP_UUU_W, H4, H4, H4, asubu32) | ||
285 | RVVCALL(OPIVV2_RM, vasubu_vv_d, OP_UUU_D, H8, H8, H8, asubu64) | ||
286 | -GEN_VEXT_VV_RM(vasubu_vv_b) | ||
287 | -GEN_VEXT_VV_RM(vasubu_vv_h) | ||
288 | -GEN_VEXT_VV_RM(vasubu_vv_w) | ||
289 | -GEN_VEXT_VV_RM(vasubu_vv_d) | ||
290 | +GEN_VEXT_VV_RM(vasubu_vv_b, 1) | ||
291 | +GEN_VEXT_VV_RM(vasubu_vv_h, 2) | ||
292 | +GEN_VEXT_VV_RM(vasubu_vv_w, 4) | ||
293 | +GEN_VEXT_VV_RM(vasubu_vv_d, 8) | ||
294 | |||
295 | RVVCALL(OPIVX2_RM, vasubu_vx_b, OP_UUU_B, H1, H1, asubu32) | ||
296 | RVVCALL(OPIVX2_RM, vasubu_vx_h, OP_UUU_H, H2, H2, asubu32) | ||
297 | RVVCALL(OPIVX2_RM, vasubu_vx_w, OP_UUU_W, H4, H4, asubu32) | ||
298 | RVVCALL(OPIVX2_RM, vasubu_vx_d, OP_UUU_D, H8, H8, asubu64) | ||
299 | -GEN_VEXT_VX_RM(vasubu_vx_b) | ||
300 | -GEN_VEXT_VX_RM(vasubu_vx_h) | ||
301 | -GEN_VEXT_VX_RM(vasubu_vx_w) | ||
302 | -GEN_VEXT_VX_RM(vasubu_vx_d) | ||
303 | +GEN_VEXT_VX_RM(vasubu_vx_b, 1) | ||
304 | +GEN_VEXT_VX_RM(vasubu_vx_h, 2) | ||
305 | +GEN_VEXT_VX_RM(vasubu_vx_w, 4) | ||
306 | +GEN_VEXT_VX_RM(vasubu_vx_d, 8) | ||
307 | |||
308 | /* Vector Single-Width Fractional Multiply with Rounding and Saturation */ | ||
309 | static inline int8_t vsmul8(CPURISCVState *env, int vxrm, int8_t a, int8_t b) | ||
310 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vsmul_vv_b, OP_SSS_B, H1, H1, H1, vsmul8) | ||
311 | RVVCALL(OPIVV2_RM, vsmul_vv_h, OP_SSS_H, H2, H2, H2, vsmul16) | ||
312 | RVVCALL(OPIVV2_RM, vsmul_vv_w, OP_SSS_W, H4, H4, H4, vsmul32) | ||
313 | RVVCALL(OPIVV2_RM, vsmul_vv_d, OP_SSS_D, H8, H8, H8, vsmul64) | ||
314 | -GEN_VEXT_VV_RM(vsmul_vv_b) | ||
315 | -GEN_VEXT_VV_RM(vsmul_vv_h) | ||
316 | -GEN_VEXT_VV_RM(vsmul_vv_w) | ||
317 | -GEN_VEXT_VV_RM(vsmul_vv_d) | ||
318 | +GEN_VEXT_VV_RM(vsmul_vv_b, 1) | ||
319 | +GEN_VEXT_VV_RM(vsmul_vv_h, 2) | ||
320 | +GEN_VEXT_VV_RM(vsmul_vv_w, 4) | ||
321 | +GEN_VEXT_VV_RM(vsmul_vv_d, 8) | ||
322 | |||
323 | RVVCALL(OPIVX2_RM, vsmul_vx_b, OP_SSS_B, H1, H1, vsmul8) | ||
324 | RVVCALL(OPIVX2_RM, vsmul_vx_h, OP_SSS_H, H2, H2, vsmul16) | ||
325 | RVVCALL(OPIVX2_RM, vsmul_vx_w, OP_SSS_W, H4, H4, vsmul32) | ||
326 | RVVCALL(OPIVX2_RM, vsmul_vx_d, OP_SSS_D, H8, H8, vsmul64) | ||
327 | -GEN_VEXT_VX_RM(vsmul_vx_b) | ||
328 | -GEN_VEXT_VX_RM(vsmul_vx_h) | ||
329 | -GEN_VEXT_VX_RM(vsmul_vx_w) | ||
330 | -GEN_VEXT_VX_RM(vsmul_vx_d) | ||
331 | +GEN_VEXT_VX_RM(vsmul_vx_b, 1) | ||
332 | +GEN_VEXT_VX_RM(vsmul_vx_h, 2) | ||
333 | +GEN_VEXT_VX_RM(vsmul_vx_w, 4) | ||
334 | +GEN_VEXT_VX_RM(vsmul_vx_d, 8) | ||
335 | |||
336 | /* Vector Single-Width Scaling Shift Instructions */ | ||
337 | static inline uint8_t | ||
338 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vssrl_vv_b, OP_UUU_B, H1, H1, H1, vssrl8) | ||
339 | RVVCALL(OPIVV2_RM, vssrl_vv_h, OP_UUU_H, H2, H2, H2, vssrl16) | ||
340 | RVVCALL(OPIVV2_RM, vssrl_vv_w, OP_UUU_W, H4, H4, H4, vssrl32) | ||
341 | RVVCALL(OPIVV2_RM, vssrl_vv_d, OP_UUU_D, H8, H8, H8, vssrl64) | ||
342 | -GEN_VEXT_VV_RM(vssrl_vv_b) | ||
343 | -GEN_VEXT_VV_RM(vssrl_vv_h) | ||
344 | -GEN_VEXT_VV_RM(vssrl_vv_w) | ||
345 | -GEN_VEXT_VV_RM(vssrl_vv_d) | ||
346 | +GEN_VEXT_VV_RM(vssrl_vv_b, 1) | ||
347 | +GEN_VEXT_VV_RM(vssrl_vv_h, 2) | ||
348 | +GEN_VEXT_VV_RM(vssrl_vv_w, 4) | ||
349 | +GEN_VEXT_VV_RM(vssrl_vv_d, 8) | ||
350 | |||
351 | RVVCALL(OPIVX2_RM, vssrl_vx_b, OP_UUU_B, H1, H1, vssrl8) | ||
352 | RVVCALL(OPIVX2_RM, vssrl_vx_h, OP_UUU_H, H2, H2, vssrl16) | ||
353 | RVVCALL(OPIVX2_RM, vssrl_vx_w, OP_UUU_W, H4, H4, vssrl32) | ||
354 | RVVCALL(OPIVX2_RM, vssrl_vx_d, OP_UUU_D, H8, H8, vssrl64) | ||
355 | -GEN_VEXT_VX_RM(vssrl_vx_b) | ||
356 | -GEN_VEXT_VX_RM(vssrl_vx_h) | ||
357 | -GEN_VEXT_VX_RM(vssrl_vx_w) | ||
358 | -GEN_VEXT_VX_RM(vssrl_vx_d) | ||
359 | +GEN_VEXT_VX_RM(vssrl_vx_b, 1) | ||
360 | +GEN_VEXT_VX_RM(vssrl_vx_h, 2) | ||
361 | +GEN_VEXT_VX_RM(vssrl_vx_w, 4) | ||
362 | +GEN_VEXT_VX_RM(vssrl_vx_d, 8) | ||
363 | |||
364 | static inline int8_t | ||
365 | vssra8(CPURISCVState *env, int vxrm, int8_t a, int8_t b) | ||
366 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVV2_RM, vssra_vv_b, OP_SSS_B, H1, H1, H1, vssra8) | ||
367 | RVVCALL(OPIVV2_RM, vssra_vv_h, OP_SSS_H, H2, H2, H2, vssra16) | ||
368 | RVVCALL(OPIVV2_RM, vssra_vv_w, OP_SSS_W, H4, H4, H4, vssra32) | ||
369 | RVVCALL(OPIVV2_RM, vssra_vv_d, OP_SSS_D, H8, H8, H8, vssra64) | ||
370 | -GEN_VEXT_VV_RM(vssra_vv_b) | ||
371 | -GEN_VEXT_VV_RM(vssra_vv_h) | ||
372 | -GEN_VEXT_VV_RM(vssra_vv_w) | ||
373 | -GEN_VEXT_VV_RM(vssra_vv_d) | ||
374 | +GEN_VEXT_VV_RM(vssra_vv_b, 1) | ||
375 | +GEN_VEXT_VV_RM(vssra_vv_h, 2) | ||
376 | +GEN_VEXT_VV_RM(vssra_vv_w, 4) | ||
377 | +GEN_VEXT_VV_RM(vssra_vv_d, 8) | ||
378 | |||
379 | RVVCALL(OPIVX2_RM, vssra_vx_b, OP_SSS_B, H1, H1, vssra8) | ||
380 | RVVCALL(OPIVX2_RM, vssra_vx_h, OP_SSS_H, H2, H2, vssra16) | ||
381 | RVVCALL(OPIVX2_RM, vssra_vx_w, OP_SSS_W, H4, H4, vssra32) | ||
382 | RVVCALL(OPIVX2_RM, vssra_vx_d, OP_SSS_D, H8, H8, vssra64) | ||
383 | -GEN_VEXT_VX_RM(vssra_vx_b) | ||
384 | -GEN_VEXT_VX_RM(vssra_vx_h) | ||
385 | -GEN_VEXT_VX_RM(vssra_vx_w) | ||
386 | -GEN_VEXT_VX_RM(vssra_vx_d) | ||
387 | +GEN_VEXT_VX_RM(vssra_vx_b, 1) | ||
388 | +GEN_VEXT_VX_RM(vssra_vx_h, 2) | ||
389 | +GEN_VEXT_VX_RM(vssra_vx_w, 4) | ||
390 | +GEN_VEXT_VX_RM(vssra_vx_d, 8) | ||
391 | |||
392 | /* Vector Narrowing Fixed-Point Clip Instructions */ | ||
393 | static inline int8_t | ||
394 | @@ -XXX,XX +XXX,XX @@ vnclip32(CPURISCVState *env, int vxrm, int64_t a, int32_t b) | ||
395 | RVVCALL(OPIVV2_RM, vnclip_wv_b, NOP_SSS_B, H1, H2, H1, vnclip8) | ||
396 | RVVCALL(OPIVV2_RM, vnclip_wv_h, NOP_SSS_H, H2, H4, H2, vnclip16) | ||
397 | RVVCALL(OPIVV2_RM, vnclip_wv_w, NOP_SSS_W, H4, H8, H4, vnclip32) | ||
398 | -GEN_VEXT_VV_RM(vnclip_wv_b) | ||
399 | -GEN_VEXT_VV_RM(vnclip_wv_h) | ||
400 | -GEN_VEXT_VV_RM(vnclip_wv_w) | ||
401 | +GEN_VEXT_VV_RM(vnclip_wv_b, 1) | ||
402 | +GEN_VEXT_VV_RM(vnclip_wv_h, 2) | ||
403 | +GEN_VEXT_VV_RM(vnclip_wv_w, 4) | ||
404 | |||
405 | RVVCALL(OPIVX2_RM, vnclip_wx_b, NOP_SSS_B, H1, H2, vnclip8) | ||
406 | RVVCALL(OPIVX2_RM, vnclip_wx_h, NOP_SSS_H, H2, H4, vnclip16) | ||
407 | RVVCALL(OPIVX2_RM, vnclip_wx_w, NOP_SSS_W, H4, H8, vnclip32) | ||
408 | -GEN_VEXT_VX_RM(vnclip_wx_b) | ||
409 | -GEN_VEXT_VX_RM(vnclip_wx_h) | ||
410 | -GEN_VEXT_VX_RM(vnclip_wx_w) | ||
411 | +GEN_VEXT_VX_RM(vnclip_wx_b, 1) | ||
412 | +GEN_VEXT_VX_RM(vnclip_wx_h, 2) | ||
413 | +GEN_VEXT_VX_RM(vnclip_wx_w, 4) | ||
414 | |||
415 | static inline uint8_t | ||
416 | vnclipu8(CPURISCVState *env, int vxrm, uint16_t a, uint8_t b) | ||
417 | @@ -XXX,XX +XXX,XX @@ vnclipu32(CPURISCVState *env, int vxrm, uint64_t a, uint32_t b) | ||
418 | RVVCALL(OPIVV2_RM, vnclipu_wv_b, NOP_UUU_B, H1, H2, H1, vnclipu8) | ||
419 | RVVCALL(OPIVV2_RM, vnclipu_wv_h, NOP_UUU_H, H2, H4, H2, vnclipu16) | ||
420 | RVVCALL(OPIVV2_RM, vnclipu_wv_w, NOP_UUU_W, H4, H8, H4, vnclipu32) | ||
421 | -GEN_VEXT_VV_RM(vnclipu_wv_b) | ||
422 | -GEN_VEXT_VV_RM(vnclipu_wv_h) | ||
423 | -GEN_VEXT_VV_RM(vnclipu_wv_w) | ||
424 | +GEN_VEXT_VV_RM(vnclipu_wv_b, 1) | ||
425 | +GEN_VEXT_VV_RM(vnclipu_wv_h, 2) | ||
426 | +GEN_VEXT_VV_RM(vnclipu_wv_w, 4) | ||
427 | |||
428 | RVVCALL(OPIVX2_RM, vnclipu_wx_b, NOP_UUU_B, H1, H2, vnclipu8) | ||
429 | RVVCALL(OPIVX2_RM, vnclipu_wx_h, NOP_UUU_H, H2, H4, vnclipu16) | ||
430 | RVVCALL(OPIVX2_RM, vnclipu_wx_w, NOP_UUU_W, H4, H8, vnclipu32) | ||
431 | -GEN_VEXT_VX_RM(vnclipu_wx_b) | ||
432 | -GEN_VEXT_VX_RM(vnclipu_wx_h) | ||
433 | -GEN_VEXT_VX_RM(vnclipu_wx_w) | ||
434 | +GEN_VEXT_VX_RM(vnclipu_wx_b, 1) | ||
435 | +GEN_VEXT_VX_RM(vnclipu_wx_h, 2) | ||
436 | +GEN_VEXT_VX_RM(vnclipu_wx_w, 4) | ||
437 | |||
438 | /* | ||
439 | *** Vector Float Point Arithmetic Instructions | ||
440 | -- | 278 | -- |
441 | 2.36.1 | 279 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | 1 | From: Nazar Kazakov <nazar.kazakov@codethink.co.uk> | |
2 | |||
3 | This commit adds support for the Zvkg vector-crypto extension, which | ||
4 | consists of the following instructions: | ||
5 | |||
6 | * vgmul.vv | ||
7 | * vghsh.vv | ||
8 | |||
9 | Translation functions are defined in | ||
10 | `target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in | ||
11 | `target/riscv/vcrypto_helper.c`. | ||
12 | |||
13 | Co-authored-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk> | ||
14 | [max.chou@sifive.com: Replaced vstart checking by TCG op] | ||
15 | Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk> | ||
16 | Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk> | ||
17 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
18 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
19 | [max.chou@sifive.com: Exposed x-zvkg property] | ||
20 | [max.chou@sifive.com: Replaced uint by int for cross win32 build] | ||
21 | Message-ID: <20230711165917.2629866-13-max.chou@sifive.com> | ||
22 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
23 | --- | ||
24 | target/riscv/cpu_cfg.h | 1 + | ||
25 | target/riscv/helper.h | 3 + | ||
26 | target/riscv/insn32.decode | 4 ++ | ||
27 | target/riscv/cpu.c | 6 +- | ||
28 | target/riscv/vcrypto_helper.c | 72 ++++++++++++++++++++++++ | ||
29 | target/riscv/insn_trans/trans_rvvk.c.inc | 30 ++++++++++ | ||
30 | 6 files changed, 114 insertions(+), 2 deletions(-) | ||
31 | |||
32 | diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h | ||
33 | index XXXXXXX..XXXXXXX 100644 | ||
34 | --- a/target/riscv/cpu_cfg.h | ||
35 | +++ b/target/riscv/cpu_cfg.h | ||
36 | @@ -XXX,XX +XXX,XX @@ struct RISCVCPUConfig { | ||
37 | bool ext_zve64d; | ||
38 | bool ext_zvbb; | ||
39 | bool ext_zvbc; | ||
40 | + bool ext_zvkg; | ||
41 | bool ext_zvkned; | ||
42 | bool ext_zvknha; | ||
43 | bool ext_zvknhb; | ||
44 | diff --git a/target/riscv/helper.h b/target/riscv/helper.h | ||
45 | index XXXXXXX..XXXXXXX 100644 | ||
46 | --- a/target/riscv/helper.h | ||
47 | +++ b/target/riscv/helper.h | ||
48 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_5(vsha2cl64_vv, void, ptr, ptr, ptr, env, i32) | ||
49 | |||
50 | DEF_HELPER_5(vsm3me_vv, void, ptr, ptr, ptr, env, i32) | ||
51 | DEF_HELPER_5(vsm3c_vi, void, ptr, ptr, i32, env, i32) | ||
52 | + | ||
53 | +DEF_HELPER_5(vghsh_vv, void, ptr, ptr, ptr, env, i32) | ||
54 | +DEF_HELPER_4(vgmul_vv, void, ptr, ptr, env, i32) | ||
55 | diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode | ||
56 | index XXXXXXX..XXXXXXX 100644 | ||
57 | --- a/target/riscv/insn32.decode | ||
58 | +++ b/target/riscv/insn32.decode | ||
59 | @@ -XXX,XX +XXX,XX @@ vsha2cl_vv 101111 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
60 | # *** Zvksh vector crypto extension *** | ||
61 | vsm3me_vv 100000 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
62 | vsm3c_vi 101011 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
63 | + | ||
64 | +# *** Zvkg vector crypto extension *** | ||
65 | +vghsh_vv 101100 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
66 | +vgmul_vv 101000 1 ..... 10001 010 ..... 1110111 @r2_vm_1 | ||
67 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | ||
68 | index XXXXXXX..XXXXXXX 100644 | ||
69 | --- a/target/riscv/cpu.c | ||
70 | +++ b/target/riscv/cpu.c | ||
71 | @@ -XXX,XX +XXX,XX @@ static const struct isa_ext_data isa_edata_arr[] = { | ||
72 | ISA_EXT_DATA_ENTRY(zvfbfwma, PRIV_VERSION_1_12_0, ext_zvfbfwma), | ||
73 | ISA_EXT_DATA_ENTRY(zvfh, PRIV_VERSION_1_12_0, ext_zvfh), | ||
74 | ISA_EXT_DATA_ENTRY(zvfhmin, PRIV_VERSION_1_12_0, ext_zvfhmin), | ||
75 | + ISA_EXT_DATA_ENTRY(zvkg, PRIV_VERSION_1_12_0, ext_zvkg), | ||
76 | ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned), | ||
77 | ISA_EXT_DATA_ENTRY(zvknha, PRIV_VERSION_1_12_0, ext_zvknha), | ||
78 | ISA_EXT_DATA_ENTRY(zvknhb, PRIV_VERSION_1_12_0, ext_zvknhb), | ||
79 | @@ -XXX,XX +XXX,XX @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp) | ||
80 | * In principle Zve*x would also suffice here, were they supported | ||
81 | * in qemu | ||
82 | */ | ||
83 | - if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned || cpu->cfg.ext_zvknha || | ||
84 | - cpu->cfg.ext_zvksh) && !cpu->cfg.ext_zve32f) { | ||
85 | + if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkg || cpu->cfg.ext_zvkned || | ||
86 | + cpu->cfg.ext_zvknha || cpu->cfg.ext_zvksh) && !cpu->cfg.ext_zve32f) { | ||
87 | error_setg(errp, | ||
88 | "Vector crypto extensions require V or Zve* extensions"); | ||
89 | return; | ||
90 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_extensions[] = { | ||
91 | /* Vector cryptography extensions */ | ||
92 | DEFINE_PROP_BOOL("x-zvbb", RISCVCPU, cfg.ext_zvbb, false), | ||
93 | DEFINE_PROP_BOOL("x-zvbc", RISCVCPU, cfg.ext_zvbc, false), | ||
94 | + DEFINE_PROP_BOOL("x-zvkg", RISCVCPU, cfg.ext_zvkg, false), | ||
95 | DEFINE_PROP_BOOL("x-zvkned", RISCVCPU, cfg.ext_zvkned, false), | ||
96 | DEFINE_PROP_BOOL("x-zvknha", RISCVCPU, cfg.ext_zvknha, false), | ||
97 | DEFINE_PROP_BOOL("x-zvknhb", RISCVCPU, cfg.ext_zvknhb, false), | ||
98 | diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c | ||
99 | index XXXXXXX..XXXXXXX 100644 | ||
100 | --- a/target/riscv/vcrypto_helper.c | ||
101 | +++ b/target/riscv/vcrypto_helper.c | ||
102 | @@ -XXX,XX +XXX,XX @@ void HELPER(vsm3c_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm, | ||
103 | vext_set_elems_1s(vd_vptr, vta, env->vl * esz, total_elems * esz); | ||
104 | env->vstart = 0; | ||
105 | } | ||
106 | + | ||
107 | +void HELPER(vghsh_vv)(void *vd_vptr, void *vs1_vptr, void *vs2_vptr, | ||
108 | + CPURISCVState *env, uint32_t desc) | ||
109 | +{ | ||
110 | + uint64_t *vd = vd_vptr; | ||
111 | + uint64_t *vs1 = vs1_vptr; | ||
112 | + uint64_t *vs2 = vs2_vptr; | ||
113 | + uint32_t vta = vext_vta(desc); | ||
114 | + uint32_t total_elems = vext_get_total_elems(env, desc, 4); | ||
115 | + | ||
116 | + for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { | ||
117 | + uint64_t Y[2] = {vd[i * 2 + 0], vd[i * 2 + 1]}; | ||
118 | + uint64_t H[2] = {brev8(vs2[i * 2 + 0]), brev8(vs2[i * 2 + 1])}; | ||
119 | + uint64_t X[2] = {vs1[i * 2 + 0], vs1[i * 2 + 1]}; | ||
120 | + uint64_t Z[2] = {0, 0}; | ||
121 | + | ||
122 | + uint64_t S[2] = {brev8(Y[0] ^ X[0]), brev8(Y[1] ^ X[1])}; | ||
123 | + | ||
124 | + for (int j = 0; j < 128; j++) { | ||
125 | + if ((S[j / 64] >> (j % 64)) & 1) { | ||
126 | + Z[0] ^= H[0]; | ||
127 | + Z[1] ^= H[1]; | ||
128 | + } | ||
129 | + bool reduce = ((H[1] >> 63) & 1); | ||
130 | + H[1] = H[1] << 1 | H[0] >> 63; | ||
131 | + H[0] = H[0] << 1; | ||
132 | + if (reduce) { | ||
133 | + H[0] ^= 0x87; | ||
134 | + } | ||
135 | + } | ||
136 | + | ||
137 | + vd[i * 2 + 0] = brev8(Z[0]); | ||
138 | + vd[i * 2 + 1] = brev8(Z[1]); | ||
139 | + } | ||
140 | + /* set tail elements to 1s */ | ||
141 | + vext_set_elems_1s(vd, vta, env->vl * 4, total_elems * 4); | ||
142 | + env->vstart = 0; | ||
143 | +} | ||
144 | + | ||
145 | +void HELPER(vgmul_vv)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env, | ||
146 | + uint32_t desc) | ||
147 | +{ | ||
148 | + uint64_t *vd = vd_vptr; | ||
149 | + uint64_t *vs2 = vs2_vptr; | ||
150 | + uint32_t vta = vext_vta(desc); | ||
151 | + uint32_t total_elems = vext_get_total_elems(env, desc, 4); | ||
152 | + | ||
153 | + for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) { | ||
154 | + uint64_t Y[2] = {brev8(vd[i * 2 + 0]), brev8(vd[i * 2 + 1])}; | ||
155 | + uint64_t H[2] = {brev8(vs2[i * 2 + 0]), brev8(vs2[i * 2 + 1])}; | ||
156 | + uint64_t Z[2] = {0, 0}; | ||
157 | + | ||
158 | + for (int j = 0; j < 128; j++) { | ||
159 | + if ((Y[j / 64] >> (j % 64)) & 1) { | ||
160 | + Z[0] ^= H[0]; | ||
161 | + Z[1] ^= H[1]; | ||
162 | + } | ||
163 | + bool reduce = ((H[1] >> 63) & 1); | ||
164 | + H[1] = H[1] << 1 | H[0] >> 63; | ||
165 | + H[0] = H[0] << 1; | ||
166 | + if (reduce) { | ||
167 | + H[0] ^= 0x87; | ||
168 | + } | ||
169 | + } | ||
170 | + | ||
171 | + vd[i * 2 + 0] = brev8(Z[0]); | ||
172 | + vd[i * 2 + 1] = brev8(Z[1]); | ||
173 | + } | ||
174 | + /* set tail elements to 1s */ | ||
175 | + vext_set_elems_1s(vd, vta, env->vl * 4, total_elems * 4); | ||
176 | + env->vstart = 0; | ||
177 | +} | ||
178 | diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc | ||
179 | index XXXXXXX..XXXXXXX 100644 | ||
180 | --- a/target/riscv/insn_trans/trans_rvvk.c.inc | ||
181 | +++ b/target/riscv/insn_trans/trans_rvvk.c.inc | ||
182 | @@ -XXX,XX +XXX,XX @@ static inline bool vsm3c_check(DisasContext *s, arg_rmrr *a) | ||
183 | |||
184 | GEN_VV_UNMASKED_TRANS(vsm3me_vv, vsm3me_check, ZVKSH_EGS) | ||
185 | GEN_VI_UNMASKED_TRANS(vsm3c_vi, vsm3c_check, ZVKSH_EGS) | ||
186 | + | ||
187 | +/* | ||
188 | + * Zvkg | ||
189 | + */ | ||
190 | + | ||
191 | +#define ZVKG_EGS 4 | ||
192 | + | ||
193 | +static bool vgmul_check(DisasContext *s, arg_rmr *a) | ||
194 | +{ | ||
195 | + int egw_bytes = ZVKG_EGS << s->sew; | ||
196 | + return s->cfg_ptr->ext_zvkg == true && | ||
197 | + vext_check_isa_ill(s) && | ||
198 | + require_rvv(s) && | ||
199 | + MAXSZ(s) >= egw_bytes && | ||
200 | + vext_check_ss(s, a->rd, a->rs2, a->vm) && | ||
201 | + s->sew == MO_32; | ||
202 | +} | ||
203 | + | ||
204 | +GEN_V_UNMASKED_TRANS(vgmul_vv, vgmul_check, ZVKG_EGS) | ||
205 | + | ||
206 | +static bool vghsh_check(DisasContext *s, arg_rmrr *a) | ||
207 | +{ | ||
208 | + int egw_bytes = ZVKG_EGS << s->sew; | ||
209 | + return s->cfg_ptr->ext_zvkg == true && | ||
210 | + opivv_check(s, a) && | ||
211 | + MAXSZ(s) >= egw_bytes && | ||
212 | + s->sew == MO_32; | ||
213 | +} | ||
214 | + | ||
215 | +GEN_VV_UNMASKED_TRANS(vghsh_vv, vghsh_check, ZVKG_EGS) | ||
216 | -- | ||
217 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Max Chou <max.chou@sifive.com> | ||
1 | 2 | ||
3 | Allows sharing of sm4_subword between different targets. | ||
4 | |||
5 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
6 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | ||
7 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
8 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
9 | Message-ID: <20230711165917.2629866-14-max.chou@sifive.com> | ||
10 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
11 | --- | ||
12 | include/crypto/sm4.h | 8 ++++++++ | ||
13 | target/arm/tcg/crypto_helper.c | 10 ++-------- | ||
14 | 2 files changed, 10 insertions(+), 8 deletions(-) | ||
15 | |||
16 | diff --git a/include/crypto/sm4.h b/include/crypto/sm4.h | ||
17 | index XXXXXXX..XXXXXXX 100644 | ||
18 | --- a/include/crypto/sm4.h | ||
19 | +++ b/include/crypto/sm4.h | ||
20 | @@ -XXX,XX +XXX,XX @@ | ||
21 | |||
22 | extern const uint8_t sm4_sbox[256]; | ||
23 | |||
24 | +static inline uint32_t sm4_subword(uint32_t word) | ||
25 | +{ | ||
26 | + return sm4_sbox[word & 0xff] | | ||
27 | + sm4_sbox[(word >> 8) & 0xff] << 8 | | ||
28 | + sm4_sbox[(word >> 16) & 0xff] << 16 | | ||
29 | + sm4_sbox[(word >> 24) & 0xff] << 24; | ||
30 | +} | ||
31 | + | ||
32 | #endif | ||
33 | diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c | ||
34 | index XXXXXXX..XXXXXXX 100644 | ||
35 | --- a/target/arm/tcg/crypto_helper.c | ||
36 | +++ b/target/arm/tcg/crypto_helper.c | ||
37 | @@ -XXX,XX +XXX,XX @@ static void do_crypto_sm4e(uint64_t *rd, uint64_t *rn, uint64_t *rm) | ||
38 | CR_ST_WORD(d, (i + 3) % 4) ^ | ||
39 | CR_ST_WORD(n, i); | ||
40 | |||
41 | - t = sm4_sbox[t & 0xff] | | ||
42 | - sm4_sbox[(t >> 8) & 0xff] << 8 | | ||
43 | - sm4_sbox[(t >> 16) & 0xff] << 16 | | ||
44 | - sm4_sbox[(t >> 24) & 0xff] << 24; | ||
45 | + t = sm4_subword(t); | ||
46 | |||
47 | CR_ST_WORD(d, i) ^= t ^ rol32(t, 2) ^ rol32(t, 10) ^ rol32(t, 18) ^ | ||
48 | rol32(t, 24); | ||
49 | @@ -XXX,XX +XXX,XX @@ static void do_crypto_sm4ekey(uint64_t *rd, uint64_t *rn, uint64_t *rm) | ||
50 | CR_ST_WORD(d, (i + 3) % 4) ^ | ||
51 | CR_ST_WORD(m, i); | ||
52 | |||
53 | - t = sm4_sbox[t & 0xff] | | ||
54 | - sm4_sbox[(t >> 8) & 0xff] << 8 | | ||
55 | - sm4_sbox[(t >> 16) & 0xff] << 16 | | ||
56 | - sm4_sbox[(t >> 24) & 0xff] << 24; | ||
57 | + t = sm4_subword(t); | ||
58 | |||
59 | CR_ST_WORD(d, i) ^= t ^ rol32(t, 13) ^ rol32(t, 23); | ||
60 | } | ||
61 | -- | ||
62 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Max Chou <max.chou@sifive.com> | ||
1 | 2 | ||
3 | Adds sm4_ck constant for use in sm4 cryptography across different targets. | ||
4 | |||
5 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
6 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | ||
7 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
8 | Message-ID: <20230711165917.2629866-15-max.chou@sifive.com> | ||
9 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
10 | --- | ||
11 | include/crypto/sm4.h | 1 + | ||
12 | crypto/sm4.c | 10 ++++++++++ | ||
13 | 2 files changed, 11 insertions(+) | ||
14 | |||
15 | diff --git a/include/crypto/sm4.h b/include/crypto/sm4.h | ||
16 | index XXXXXXX..XXXXXXX 100644 | ||
17 | --- a/include/crypto/sm4.h | ||
18 | +++ b/include/crypto/sm4.h | ||
19 | @@ -XXX,XX +XXX,XX @@ | ||
20 | #define QEMU_SM4_H | ||
21 | |||
22 | extern const uint8_t sm4_sbox[256]; | ||
23 | +extern const uint32_t sm4_ck[32]; | ||
24 | |||
25 | static inline uint32_t sm4_subword(uint32_t word) | ||
26 | { | ||
27 | diff --git a/crypto/sm4.c b/crypto/sm4.c | ||
28 | index XXXXXXX..XXXXXXX 100644 | ||
29 | --- a/crypto/sm4.c | ||
30 | +++ b/crypto/sm4.c | ||
31 | @@ -XXX,XX +XXX,XX @@ uint8_t const sm4_sbox[] = { | ||
32 | 0x79, 0xee, 0x5f, 0x3e, 0xd7, 0xcb, 0x39, 0x48, | ||
33 | }; | ||
34 | |||
35 | +uint32_t const sm4_ck[] = { | ||
36 | + 0x00070e15, 0x1c232a31, 0x383f464d, 0x545b6269, | ||
37 | + 0x70777e85, 0x8c939aa1, 0xa8afb6bd, 0xc4cbd2d9, | ||
38 | + 0xe0e7eef5, 0xfc030a11, 0x181f262d, 0x343b4249, | ||
39 | + 0x50575e65, 0x6c737a81, 0x888f969d, 0xa4abb2b9, | ||
40 | + 0xc0c7ced5, 0xdce3eaf1, 0xf8ff060d, 0x141b2229, | ||
41 | + 0x30373e45, 0x4c535a61, 0x686f767d, 0x848b9299, | ||
42 | + 0xa0a7aeb5, 0xbcc3cad1, 0xd8dfe6ed, 0xf4fb0209, | ||
43 | + 0x10171e25, 0x2c333a41, 0x484f565d, 0x646b7279 | ||
44 | +}; | ||
45 | -- | ||
46 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Max Chou <max.chou@sifive.com> |
---|---|---|---|
2 | 2 | ||
3 | Compares write mask registers, and so always operate under a tail- | 3 | This commit adds support for the Zvksed vector-crypto extension, which |
4 | agnostic policy. | 4 | consists of the following instructions: |
5 | 5 | ||
6 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 6 | * vsm4k.vi |
7 | * vsm4r.[vv,vs] | ||
8 | |||
9 | Translation functions are defined in | ||
10 | `target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in | ||
11 | `target/riscv/vcrypto_helper.c`. | ||
12 | |||
13 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
7 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | 14 | Reviewed-by: Frank Chang <frank.chang@sifive.com> |
8 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 15 | [lawrence.hunter@codethink.co.uk: Moved SM4 functions from |
9 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | 16 | crypto_helper.c to vcrypto_helper.c] |
10 | Message-Id: <165449614532.19704.7000832880482980398-12@git.sr.ht> | 17 | [nazar.kazakov@codethink.co.uk: Added alignment checks, refactored code to |
18 | use macros, and minor style changes] | ||
19 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
20 | Message-ID: <20230711165917.2629866-16-max.chou@sifive.com> | ||
11 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 21 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
12 | --- | 22 | --- |
13 | target/riscv/vector_helper.c | 440 +++++++++++++----------- | 23 | target/riscv/cpu_cfg.h | 1 + |
14 | target/riscv/insn_trans/trans_rvv.c.inc | 17 + | 24 | target/riscv/helper.h | 4 + |
15 | 2 files changed, 261 insertions(+), 196 deletions(-) | 25 | target/riscv/insn32.decode | 5 + |
16 | 26 | target/riscv/cpu.c | 5 +- | |
17 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 27 | target/riscv/vcrypto_helper.c | 127 +++++++++++++++++++++++ |
18 | index XXXXXXX..XXXXXXX 100644 | 28 | target/riscv/insn_trans/trans_rvvk.c.inc | 43 ++++++++ |
19 | --- a/target/riscv/vector_helper.c | 29 | 6 files changed, 184 insertions(+), 1 deletion(-) |
20 | +++ b/target/riscv/vector_helper.c | 30 | |
21 | @@ -XXX,XX +XXX,XX @@ static void do_##NAME(void *vd, void *vs1, void *vs2, int i, \ | 31 | diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h |
22 | *((TD *)vd + HD(i)) = OP(s2, s1, &env->fp_status); \ | 32 | index XXXXXXX..XXXXXXX 100644 |
33 | --- a/target/riscv/cpu_cfg.h | ||
34 | +++ b/target/riscv/cpu_cfg.h | ||
35 | @@ -XXX,XX +XXX,XX @@ struct RISCVCPUConfig { | ||
36 | bool ext_zvkned; | ||
37 | bool ext_zvknha; | ||
38 | bool ext_zvknhb; | ||
39 | + bool ext_zvksed; | ||
40 | bool ext_zvksh; | ||
41 | bool ext_zmmul; | ||
42 | bool ext_zvfbfmin; | ||
43 | diff --git a/target/riscv/helper.h b/target/riscv/helper.h | ||
44 | index XXXXXXX..XXXXXXX 100644 | ||
45 | --- a/target/riscv/helper.h | ||
46 | +++ b/target/riscv/helper.h | ||
47 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_5(vsm3c_vi, void, ptr, ptr, i32, env, i32) | ||
48 | |||
49 | DEF_HELPER_5(vghsh_vv, void, ptr, ptr, ptr, env, i32) | ||
50 | DEF_HELPER_4(vgmul_vv, void, ptr, ptr, env, i32) | ||
51 | + | ||
52 | +DEF_HELPER_5(vsm4k_vi, void, ptr, ptr, i32, env, i32) | ||
53 | +DEF_HELPER_4(vsm4r_vv, void, ptr, ptr, env, i32) | ||
54 | +DEF_HELPER_4(vsm4r_vs, void, ptr, ptr, env, i32) | ||
55 | diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode | ||
56 | index XXXXXXX..XXXXXXX 100644 | ||
57 | --- a/target/riscv/insn32.decode | ||
58 | +++ b/target/riscv/insn32.decode | ||
59 | @@ -XXX,XX +XXX,XX @@ vsm3c_vi 101011 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
60 | # *** Zvkg vector crypto extension *** | ||
61 | vghsh_vv 101100 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
62 | vgmul_vv 101000 1 ..... 10001 010 ..... 1110111 @r2_vm_1 | ||
63 | + | ||
64 | +# *** Zvksed vector crypto extension *** | ||
65 | +vsm4k_vi 100001 1 ..... ..... 010 ..... 1110111 @r_vm_1 | ||
66 | +vsm4r_vv 101000 1 ..... 10000 010 ..... 1110111 @r2_vm_1 | ||
67 | +vsm4r_vs 101001 1 ..... 10000 010 ..... 1110111 @r2_vm_1 | ||
68 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | ||
69 | index XXXXXXX..XXXXXXX 100644 | ||
70 | --- a/target/riscv/cpu.c | ||
71 | +++ b/target/riscv/cpu.c | ||
72 | @@ -XXX,XX +XXX,XX @@ static const struct isa_ext_data isa_edata_arr[] = { | ||
73 | ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned), | ||
74 | ISA_EXT_DATA_ENTRY(zvknha, PRIV_VERSION_1_12_0, ext_zvknha), | ||
75 | ISA_EXT_DATA_ENTRY(zvknhb, PRIV_VERSION_1_12_0, ext_zvknhb), | ||
76 | + ISA_EXT_DATA_ENTRY(zvksed, PRIV_VERSION_1_12_0, ext_zvksed), | ||
77 | ISA_EXT_DATA_ENTRY(zvksh, PRIV_VERSION_1_12_0, ext_zvksh), | ||
78 | ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx), | ||
79 | ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin), | ||
80 | @@ -XXX,XX +XXX,XX @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp) | ||
81 | * in qemu | ||
82 | */ | ||
83 | if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkg || cpu->cfg.ext_zvkned || | ||
84 | - cpu->cfg.ext_zvknha || cpu->cfg.ext_zvksh) && !cpu->cfg.ext_zve32f) { | ||
85 | + cpu->cfg.ext_zvknha || cpu->cfg.ext_zvksed || cpu->cfg.ext_zvksh) && | ||
86 | + !cpu->cfg.ext_zve32f) { | ||
87 | error_setg(errp, | ||
88 | "Vector crypto extensions require V or Zve* extensions"); | ||
89 | return; | ||
90 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_extensions[] = { | ||
91 | DEFINE_PROP_BOOL("x-zvkned", RISCVCPU, cfg.ext_zvkned, false), | ||
92 | DEFINE_PROP_BOOL("x-zvknha", RISCVCPU, cfg.ext_zvknha, false), | ||
93 | DEFINE_PROP_BOOL("x-zvknhb", RISCVCPU, cfg.ext_zvknhb, false), | ||
94 | + DEFINE_PROP_BOOL("x-zvksed", RISCVCPU, cfg.ext_zvksed, false), | ||
95 | DEFINE_PROP_BOOL("x-zvksh", RISCVCPU, cfg.ext_zvksh, false), | ||
96 | |||
97 | DEFINE_PROP_END_OF_LIST(), | ||
98 | diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c | ||
99 | index XXXXXXX..XXXXXXX 100644 | ||
100 | --- a/target/riscv/vcrypto_helper.c | ||
101 | +++ b/target/riscv/vcrypto_helper.c | ||
102 | @@ -XXX,XX +XXX,XX @@ | ||
103 | #include "cpu.h" | ||
104 | #include "crypto/aes.h" | ||
105 | #include "crypto/aes-round.h" | ||
106 | +#include "crypto/sm4.h" | ||
107 | #include "exec/memop.h" | ||
108 | #include "exec/exec-all.h" | ||
109 | #include "exec/helper-proto.h" | ||
110 | @@ -XXX,XX +XXX,XX @@ void HELPER(vgmul_vv)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env, | ||
111 | vext_set_elems_1s(vd, vta, env->vl * 4, total_elems * 4); | ||
112 | env->vstart = 0; | ||
23 | } | 113 | } |
24 | 114 | + | |
25 | -#define GEN_VEXT_VV_ENV(NAME) \ | 115 | +void HELPER(vsm4k_vi)(void *vd, void *vs2, uint32_t uimm5, CPURISCVState *env, |
26 | +#define GEN_VEXT_VV_ENV(NAME, ESZ) \ | 116 | + uint32_t desc) |
27 | void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | 117 | +{ |
28 | void *vs2, CPURISCVState *env, \ | 118 | + const uint32_t egs = 4; |
29 | uint32_t desc) \ | 119 | + uint32_t rnd = uimm5 & 0x7; |
30 | { \ | 120 | + uint32_t group_start = env->vstart / egs; |
31 | uint32_t vm = vext_vm(desc); \ | 121 | + uint32_t group_end = env->vl / egs; |
32 | uint32_t vl = env->vl; \ | 122 | + uint32_t esz = sizeof(uint32_t); |
33 | + uint32_t total_elems = \ | 123 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); |
34 | + vext_get_total_elems(env, desc, ESZ); \ | 124 | + |
35 | + uint32_t vta = vext_vta(desc); \ | 125 | + for (uint32_t i = group_start; i < group_end; ++i) { |
36 | uint32_t i; \ | 126 | + uint32_t vstart = i * egs; |
37 | \ | 127 | + uint32_t vend = (i + 1) * egs; |
38 | for (i = env->vstart; i < vl; i++) { \ | 128 | + uint32_t rk[4] = {0}; |
39 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, \ | 129 | + uint32_t tmp[8] = {0}; |
40 | do_##NAME(vd, vs1, vs2, i, env); \ | 130 | + |
41 | } \ | 131 | + for (uint32_t j = vstart; j < vend; ++j) { |
42 | env->vstart = 0; \ | 132 | + rk[j - vstart] = *((uint32_t *)vs2 + H4(j)); |
43 | + /* set tail elements to 1s */ \ | 133 | + } |
44 | + vext_set_elems_1s(vd, vta, vl * ESZ, \ | 134 | + |
45 | + total_elems * ESZ); \ | 135 | + for (uint32_t j = 0; j < egs; ++j) { |
136 | + tmp[j] = rk[j]; | ||
137 | + } | ||
138 | + | ||
139 | + for (uint32_t j = 0; j < egs; ++j) { | ||
140 | + uint32_t b, s; | ||
141 | + b = tmp[j + 1] ^ tmp[j + 2] ^ tmp[j + 3] ^ sm4_ck[rnd * 4 + j]; | ||
142 | + | ||
143 | + s = sm4_subword(b); | ||
144 | + | ||
145 | + tmp[j + 4] = tmp[j] ^ (s ^ rol32(s, 13) ^ rol32(s, 23)); | ||
146 | + } | ||
147 | + | ||
148 | + for (uint32_t j = vstart; j < vend; ++j) { | ||
149 | + *((uint32_t *)vd + H4(j)) = tmp[egs + (j - vstart)]; | ||
150 | + } | ||
151 | + } | ||
152 | + | ||
153 | + env->vstart = 0; | ||
154 | + /* set tail elements to 1s */ | ||
155 | + vext_set_elems_1s(vd, vext_vta(desc), env->vl * esz, total_elems * esz); | ||
156 | +} | ||
157 | + | ||
158 | +static void do_sm4_round(uint32_t *rk, uint32_t *buf) | ||
159 | +{ | ||
160 | + const uint32_t egs = 4; | ||
161 | + uint32_t s, b; | ||
162 | + | ||
163 | + for (uint32_t j = egs; j < egs * 2; ++j) { | ||
164 | + b = buf[j - 3] ^ buf[j - 2] ^ buf[j - 1] ^ rk[j - 4]; | ||
165 | + | ||
166 | + s = sm4_subword(b); | ||
167 | + | ||
168 | + buf[j] = buf[j - 4] ^ (s ^ rol32(s, 2) ^ rol32(s, 10) ^ rol32(s, 18) ^ | ||
169 | + rol32(s, 24)); | ||
170 | + } | ||
171 | +} | ||
172 | + | ||
173 | +void HELPER(vsm4r_vv)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc) | ||
174 | +{ | ||
175 | + const uint32_t egs = 4; | ||
176 | + uint32_t group_start = env->vstart / egs; | ||
177 | + uint32_t group_end = env->vl / egs; | ||
178 | + uint32_t esz = sizeof(uint32_t); | ||
179 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); | ||
180 | + | ||
181 | + for (uint32_t i = group_start; i < group_end; ++i) { | ||
182 | + uint32_t vstart = i * egs; | ||
183 | + uint32_t vend = (i + 1) * egs; | ||
184 | + uint32_t rk[4] = {0}; | ||
185 | + uint32_t tmp[8] = {0}; | ||
186 | + | ||
187 | + for (uint32_t j = vstart; j < vend; ++j) { | ||
188 | + rk[j - vstart] = *((uint32_t *)vs2 + H4(j)); | ||
189 | + } | ||
190 | + | ||
191 | + for (uint32_t j = vstart; j < vend; ++j) { | ||
192 | + tmp[j - vstart] = *((uint32_t *)vd + H4(j)); | ||
193 | + } | ||
194 | + | ||
195 | + do_sm4_round(rk, tmp); | ||
196 | + | ||
197 | + for (uint32_t j = vstart; j < vend; ++j) { | ||
198 | + *((uint32_t *)vd + H4(j)) = tmp[egs + (j - vstart)]; | ||
199 | + } | ||
200 | + } | ||
201 | + | ||
202 | + env->vstart = 0; | ||
203 | + /* set tail elements to 1s */ | ||
204 | + vext_set_elems_1s(vd, vext_vta(desc), env->vl * esz, total_elems * esz); | ||
205 | +} | ||
206 | + | ||
207 | +void HELPER(vsm4r_vs)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc) | ||
208 | +{ | ||
209 | + const uint32_t egs = 4; | ||
210 | + uint32_t group_start = env->vstart / egs; | ||
211 | + uint32_t group_end = env->vl / egs; | ||
212 | + uint32_t esz = sizeof(uint32_t); | ||
213 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); | ||
214 | + | ||
215 | + for (uint32_t i = group_start; i < group_end; ++i) { | ||
216 | + uint32_t vstart = i * egs; | ||
217 | + uint32_t vend = (i + 1) * egs; | ||
218 | + uint32_t rk[4] = {0}; | ||
219 | + uint32_t tmp[8] = {0}; | ||
220 | + | ||
221 | + for (uint32_t j = 0; j < egs; ++j) { | ||
222 | + rk[j] = *((uint32_t *)vs2 + H4(j)); | ||
223 | + } | ||
224 | + | ||
225 | + for (uint32_t j = vstart; j < vend; ++j) { | ||
226 | + tmp[j - vstart] = *((uint32_t *)vd + H4(j)); | ||
227 | + } | ||
228 | + | ||
229 | + do_sm4_round(rk, tmp); | ||
230 | + | ||
231 | + for (uint32_t j = vstart; j < vend; ++j) { | ||
232 | + *((uint32_t *)vd + H4(j)) = tmp[egs + (j - vstart)]; | ||
233 | + } | ||
234 | + } | ||
235 | + | ||
236 | + env->vstart = 0; | ||
237 | + /* set tail elements to 1s */ | ||
238 | + vext_set_elems_1s(vd, vext_vta(desc), env->vl * esz, total_elems * esz); | ||
239 | +} | ||
240 | diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc | ||
241 | index XXXXXXX..XXXXXXX 100644 | ||
242 | --- a/target/riscv/insn_trans/trans_rvvk.c.inc | ||
243 | +++ b/target/riscv/insn_trans/trans_rvvk.c.inc | ||
244 | @@ -XXX,XX +XXX,XX @@ static bool vghsh_check(DisasContext *s, arg_rmrr *a) | ||
46 | } | 245 | } |
47 | 246 | ||
48 | RVVCALL(OPFVV2, vfadd_vv_h, OP_UUU_H, H2, H2, H2, float16_add) | 247 | GEN_VV_UNMASKED_TRANS(vghsh_vv, vghsh_check, ZVKG_EGS) |
49 | RVVCALL(OPFVV2, vfadd_vv_w, OP_UUU_W, H4, H4, H4, float32_add) | 248 | + |
50 | RVVCALL(OPFVV2, vfadd_vv_d, OP_UUU_D, H8, H8, H8, float64_add) | 249 | +/* |
51 | -GEN_VEXT_VV_ENV(vfadd_vv_h) | 250 | + * Zvksed |
52 | -GEN_VEXT_VV_ENV(vfadd_vv_w) | 251 | + */ |
53 | -GEN_VEXT_VV_ENV(vfadd_vv_d) | 252 | + |
54 | +GEN_VEXT_VV_ENV(vfadd_vv_h, 2) | 253 | +#define ZVKSED_EGS 4 |
55 | +GEN_VEXT_VV_ENV(vfadd_vv_w, 4) | 254 | + |
56 | +GEN_VEXT_VV_ENV(vfadd_vv_d, 8) | 255 | +static bool zvksed_check(DisasContext *s) |
57 | 256 | +{ | |
58 | #define OPFVF2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP) \ | 257 | + int egw_bytes = ZVKSED_EGS << s->sew; |
59 | static void do_##NAME(void *vd, uint64_t s1, void *vs2, int i, \ | 258 | + return s->cfg_ptr->ext_zvksed == true && |
60 | @@ -XXX,XX +XXX,XX @@ static void do_##NAME(void *vd, uint64_t s1, void *vs2, int i, \ | 259 | + require_rvv(s) && |
61 | *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1, &env->fp_status);\ | 260 | + vext_check_isa_ill(s) && |
62 | } | 261 | + MAXSZ(s) >= egw_bytes && |
63 | 262 | + s->sew == MO_32; | |
64 | -#define GEN_VEXT_VF(NAME) \ | 263 | +} |
65 | +#define GEN_VEXT_VF(NAME, ESZ) \ | 264 | + |
66 | void HELPER(NAME)(void *vd, void *v0, uint64_t s1, \ | 265 | +static bool vsm4k_vi_check(DisasContext *s, arg_rmrr *a) |
67 | void *vs2, CPURISCVState *env, \ | 266 | +{ |
68 | uint32_t desc) \ | 267 | + return zvksed_check(s) && |
69 | { \ | 268 | + require_align(a->rd, s->lmul) && |
70 | uint32_t vm = vext_vm(desc); \ | 269 | + require_align(a->rs2, s->lmul); |
71 | uint32_t vl = env->vl; \ | 270 | +} |
72 | + uint32_t total_elems = \ | 271 | + |
73 | + vext_get_total_elems(env, desc, ESZ); \ | 272 | +GEN_VI_UNMASKED_TRANS(vsm4k_vi, vsm4k_vi_check, ZVKSED_EGS) |
74 | + uint32_t vta = vext_vta(desc); \ | 273 | + |
75 | uint32_t i; \ | 274 | +static bool vsm4r_vv_check(DisasContext *s, arg_rmr *a) |
76 | \ | 275 | +{ |
77 | for (i = env->vstart; i < vl; i++) { \ | 276 | + return zvksed_check(s) && |
78 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1, \ | 277 | + require_align(a->rd, s->lmul) && |
79 | do_##NAME(vd, s1, vs2, i, env); \ | 278 | + require_align(a->rs2, s->lmul); |
80 | } \ | 279 | +} |
81 | env->vstart = 0; \ | 280 | + |
82 | + /* set tail elements to 1s */ \ | 281 | +GEN_V_UNMASKED_TRANS(vsm4r_vv, vsm4r_vv_check, ZVKSED_EGS) |
83 | + vext_set_elems_1s(vd, vta, vl * ESZ, \ | 282 | + |
84 | + total_elems * ESZ); \ | 283 | +static bool vsm4r_vs_check(DisasContext *s, arg_rmr *a) |
85 | } | 284 | +{ |
86 | 285 | + return zvksed_check(s) && | |
87 | RVVCALL(OPFVF2, vfadd_vf_h, OP_UUU_H, H2, H2, float16_add) | 286 | + !is_overlapped(a->rd, 1 << MAX(s->lmul, 0), a->rs2, 1) && |
88 | RVVCALL(OPFVF2, vfadd_vf_w, OP_UUU_W, H4, H4, float32_add) | 287 | + require_align(a->rd, s->lmul); |
89 | RVVCALL(OPFVF2, vfadd_vf_d, OP_UUU_D, H8, H8, float64_add) | 288 | +} |
90 | -GEN_VEXT_VF(vfadd_vf_h) | 289 | + |
91 | -GEN_VEXT_VF(vfadd_vf_w) | 290 | +GEN_V_UNMASKED_TRANS(vsm4r_vs, vsm4r_vs_check, ZVKSED_EGS) |
92 | -GEN_VEXT_VF(vfadd_vf_d) | ||
93 | +GEN_VEXT_VF(vfadd_vf_h, 2) | ||
94 | +GEN_VEXT_VF(vfadd_vf_w, 4) | ||
95 | +GEN_VEXT_VF(vfadd_vf_d, 8) | ||
96 | |||
97 | RVVCALL(OPFVV2, vfsub_vv_h, OP_UUU_H, H2, H2, H2, float16_sub) | ||
98 | RVVCALL(OPFVV2, vfsub_vv_w, OP_UUU_W, H4, H4, H4, float32_sub) | ||
99 | RVVCALL(OPFVV2, vfsub_vv_d, OP_UUU_D, H8, H8, H8, float64_sub) | ||
100 | -GEN_VEXT_VV_ENV(vfsub_vv_h) | ||
101 | -GEN_VEXT_VV_ENV(vfsub_vv_w) | ||
102 | -GEN_VEXT_VV_ENV(vfsub_vv_d) | ||
103 | +GEN_VEXT_VV_ENV(vfsub_vv_h, 2) | ||
104 | +GEN_VEXT_VV_ENV(vfsub_vv_w, 4) | ||
105 | +GEN_VEXT_VV_ENV(vfsub_vv_d, 8) | ||
106 | RVVCALL(OPFVF2, vfsub_vf_h, OP_UUU_H, H2, H2, float16_sub) | ||
107 | RVVCALL(OPFVF2, vfsub_vf_w, OP_UUU_W, H4, H4, float32_sub) | ||
108 | RVVCALL(OPFVF2, vfsub_vf_d, OP_UUU_D, H8, H8, float64_sub) | ||
109 | -GEN_VEXT_VF(vfsub_vf_h) | ||
110 | -GEN_VEXT_VF(vfsub_vf_w) | ||
111 | -GEN_VEXT_VF(vfsub_vf_d) | ||
112 | +GEN_VEXT_VF(vfsub_vf_h, 2) | ||
113 | +GEN_VEXT_VF(vfsub_vf_w, 4) | ||
114 | +GEN_VEXT_VF(vfsub_vf_d, 8) | ||
115 | |||
116 | static uint16_t float16_rsub(uint16_t a, uint16_t b, float_status *s) | ||
117 | { | ||
118 | @@ -XXX,XX +XXX,XX @@ static uint64_t float64_rsub(uint64_t a, uint64_t b, float_status *s) | ||
119 | RVVCALL(OPFVF2, vfrsub_vf_h, OP_UUU_H, H2, H2, float16_rsub) | ||
120 | RVVCALL(OPFVF2, vfrsub_vf_w, OP_UUU_W, H4, H4, float32_rsub) | ||
121 | RVVCALL(OPFVF2, vfrsub_vf_d, OP_UUU_D, H8, H8, float64_rsub) | ||
122 | -GEN_VEXT_VF(vfrsub_vf_h) | ||
123 | -GEN_VEXT_VF(vfrsub_vf_w) | ||
124 | -GEN_VEXT_VF(vfrsub_vf_d) | ||
125 | +GEN_VEXT_VF(vfrsub_vf_h, 2) | ||
126 | +GEN_VEXT_VF(vfrsub_vf_w, 4) | ||
127 | +GEN_VEXT_VF(vfrsub_vf_d, 8) | ||
128 | |||
129 | /* Vector Widening Floating-Point Add/Subtract Instructions */ | ||
130 | static uint32_t vfwadd16(uint16_t a, uint16_t b, float_status *s) | ||
131 | @@ -XXX,XX +XXX,XX @@ static uint64_t vfwadd32(uint32_t a, uint32_t b, float_status *s) | ||
132 | |||
133 | RVVCALL(OPFVV2, vfwadd_vv_h, WOP_UUU_H, H4, H2, H2, vfwadd16) | ||
134 | RVVCALL(OPFVV2, vfwadd_vv_w, WOP_UUU_W, H8, H4, H4, vfwadd32) | ||
135 | -GEN_VEXT_VV_ENV(vfwadd_vv_h) | ||
136 | -GEN_VEXT_VV_ENV(vfwadd_vv_w) | ||
137 | +GEN_VEXT_VV_ENV(vfwadd_vv_h, 4) | ||
138 | +GEN_VEXT_VV_ENV(vfwadd_vv_w, 8) | ||
139 | RVVCALL(OPFVF2, vfwadd_vf_h, WOP_UUU_H, H4, H2, vfwadd16) | ||
140 | RVVCALL(OPFVF2, vfwadd_vf_w, WOP_UUU_W, H8, H4, vfwadd32) | ||
141 | -GEN_VEXT_VF(vfwadd_vf_h) | ||
142 | -GEN_VEXT_VF(vfwadd_vf_w) | ||
143 | +GEN_VEXT_VF(vfwadd_vf_h, 4) | ||
144 | +GEN_VEXT_VF(vfwadd_vf_w, 8) | ||
145 | |||
146 | static uint32_t vfwsub16(uint16_t a, uint16_t b, float_status *s) | ||
147 | { | ||
148 | @@ -XXX,XX +XXX,XX @@ static uint64_t vfwsub32(uint32_t a, uint32_t b, float_status *s) | ||
149 | |||
150 | RVVCALL(OPFVV2, vfwsub_vv_h, WOP_UUU_H, H4, H2, H2, vfwsub16) | ||
151 | RVVCALL(OPFVV2, vfwsub_vv_w, WOP_UUU_W, H8, H4, H4, vfwsub32) | ||
152 | -GEN_VEXT_VV_ENV(vfwsub_vv_h) | ||
153 | -GEN_VEXT_VV_ENV(vfwsub_vv_w) | ||
154 | +GEN_VEXT_VV_ENV(vfwsub_vv_h, 4) | ||
155 | +GEN_VEXT_VV_ENV(vfwsub_vv_w, 8) | ||
156 | RVVCALL(OPFVF2, vfwsub_vf_h, WOP_UUU_H, H4, H2, vfwsub16) | ||
157 | RVVCALL(OPFVF2, vfwsub_vf_w, WOP_UUU_W, H8, H4, vfwsub32) | ||
158 | -GEN_VEXT_VF(vfwsub_vf_h) | ||
159 | -GEN_VEXT_VF(vfwsub_vf_w) | ||
160 | +GEN_VEXT_VF(vfwsub_vf_h, 4) | ||
161 | +GEN_VEXT_VF(vfwsub_vf_w, 8) | ||
162 | |||
163 | static uint32_t vfwaddw16(uint32_t a, uint16_t b, float_status *s) | ||
164 | { | ||
165 | @@ -XXX,XX +XXX,XX @@ static uint64_t vfwaddw32(uint64_t a, uint32_t b, float_status *s) | ||
166 | |||
167 | RVVCALL(OPFVV2, vfwadd_wv_h, WOP_WUUU_H, H4, H2, H2, vfwaddw16) | ||
168 | RVVCALL(OPFVV2, vfwadd_wv_w, WOP_WUUU_W, H8, H4, H4, vfwaddw32) | ||
169 | -GEN_VEXT_VV_ENV(vfwadd_wv_h) | ||
170 | -GEN_VEXT_VV_ENV(vfwadd_wv_w) | ||
171 | +GEN_VEXT_VV_ENV(vfwadd_wv_h, 4) | ||
172 | +GEN_VEXT_VV_ENV(vfwadd_wv_w, 8) | ||
173 | RVVCALL(OPFVF2, vfwadd_wf_h, WOP_WUUU_H, H4, H2, vfwaddw16) | ||
174 | RVVCALL(OPFVF2, vfwadd_wf_w, WOP_WUUU_W, H8, H4, vfwaddw32) | ||
175 | -GEN_VEXT_VF(vfwadd_wf_h) | ||
176 | -GEN_VEXT_VF(vfwadd_wf_w) | ||
177 | +GEN_VEXT_VF(vfwadd_wf_h, 4) | ||
178 | +GEN_VEXT_VF(vfwadd_wf_w, 8) | ||
179 | |||
180 | static uint32_t vfwsubw16(uint32_t a, uint16_t b, float_status *s) | ||
181 | { | ||
182 | @@ -XXX,XX +XXX,XX @@ static uint64_t vfwsubw32(uint64_t a, uint32_t b, float_status *s) | ||
183 | |||
184 | RVVCALL(OPFVV2, vfwsub_wv_h, WOP_WUUU_H, H4, H2, H2, vfwsubw16) | ||
185 | RVVCALL(OPFVV2, vfwsub_wv_w, WOP_WUUU_W, H8, H4, H4, vfwsubw32) | ||
186 | -GEN_VEXT_VV_ENV(vfwsub_wv_h) | ||
187 | -GEN_VEXT_VV_ENV(vfwsub_wv_w) | ||
188 | +GEN_VEXT_VV_ENV(vfwsub_wv_h, 4) | ||
189 | +GEN_VEXT_VV_ENV(vfwsub_wv_w, 8) | ||
190 | RVVCALL(OPFVF2, vfwsub_wf_h, WOP_WUUU_H, H4, H2, vfwsubw16) | ||
191 | RVVCALL(OPFVF2, vfwsub_wf_w, WOP_WUUU_W, H8, H4, vfwsubw32) | ||
192 | -GEN_VEXT_VF(vfwsub_wf_h) | ||
193 | -GEN_VEXT_VF(vfwsub_wf_w) | ||
194 | +GEN_VEXT_VF(vfwsub_wf_h, 4) | ||
195 | +GEN_VEXT_VF(vfwsub_wf_w, 8) | ||
196 | |||
197 | /* Vector Single-Width Floating-Point Multiply/Divide Instructions */ | ||
198 | RVVCALL(OPFVV2, vfmul_vv_h, OP_UUU_H, H2, H2, H2, float16_mul) | ||
199 | RVVCALL(OPFVV2, vfmul_vv_w, OP_UUU_W, H4, H4, H4, float32_mul) | ||
200 | RVVCALL(OPFVV2, vfmul_vv_d, OP_UUU_D, H8, H8, H8, float64_mul) | ||
201 | -GEN_VEXT_VV_ENV(vfmul_vv_h) | ||
202 | -GEN_VEXT_VV_ENV(vfmul_vv_w) | ||
203 | -GEN_VEXT_VV_ENV(vfmul_vv_d) | ||
204 | +GEN_VEXT_VV_ENV(vfmul_vv_h, 2) | ||
205 | +GEN_VEXT_VV_ENV(vfmul_vv_w, 4) | ||
206 | +GEN_VEXT_VV_ENV(vfmul_vv_d, 8) | ||
207 | RVVCALL(OPFVF2, vfmul_vf_h, OP_UUU_H, H2, H2, float16_mul) | ||
208 | RVVCALL(OPFVF2, vfmul_vf_w, OP_UUU_W, H4, H4, float32_mul) | ||
209 | RVVCALL(OPFVF2, vfmul_vf_d, OP_UUU_D, H8, H8, float64_mul) | ||
210 | -GEN_VEXT_VF(vfmul_vf_h) | ||
211 | -GEN_VEXT_VF(vfmul_vf_w) | ||
212 | -GEN_VEXT_VF(vfmul_vf_d) | ||
213 | +GEN_VEXT_VF(vfmul_vf_h, 2) | ||
214 | +GEN_VEXT_VF(vfmul_vf_w, 4) | ||
215 | +GEN_VEXT_VF(vfmul_vf_d, 8) | ||
216 | |||
217 | RVVCALL(OPFVV2, vfdiv_vv_h, OP_UUU_H, H2, H2, H2, float16_div) | ||
218 | RVVCALL(OPFVV2, vfdiv_vv_w, OP_UUU_W, H4, H4, H4, float32_div) | ||
219 | RVVCALL(OPFVV2, vfdiv_vv_d, OP_UUU_D, H8, H8, H8, float64_div) | ||
220 | -GEN_VEXT_VV_ENV(vfdiv_vv_h) | ||
221 | -GEN_VEXT_VV_ENV(vfdiv_vv_w) | ||
222 | -GEN_VEXT_VV_ENV(vfdiv_vv_d) | ||
223 | +GEN_VEXT_VV_ENV(vfdiv_vv_h, 2) | ||
224 | +GEN_VEXT_VV_ENV(vfdiv_vv_w, 4) | ||
225 | +GEN_VEXT_VV_ENV(vfdiv_vv_d, 8) | ||
226 | RVVCALL(OPFVF2, vfdiv_vf_h, OP_UUU_H, H2, H2, float16_div) | ||
227 | RVVCALL(OPFVF2, vfdiv_vf_w, OP_UUU_W, H4, H4, float32_div) | ||
228 | RVVCALL(OPFVF2, vfdiv_vf_d, OP_UUU_D, H8, H8, float64_div) | ||
229 | -GEN_VEXT_VF(vfdiv_vf_h) | ||
230 | -GEN_VEXT_VF(vfdiv_vf_w) | ||
231 | -GEN_VEXT_VF(vfdiv_vf_d) | ||
232 | +GEN_VEXT_VF(vfdiv_vf_h, 2) | ||
233 | +GEN_VEXT_VF(vfdiv_vf_w, 4) | ||
234 | +GEN_VEXT_VF(vfdiv_vf_d, 8) | ||
235 | |||
236 | static uint16_t float16_rdiv(uint16_t a, uint16_t b, float_status *s) | ||
237 | { | ||
238 | @@ -XXX,XX +XXX,XX @@ static uint64_t float64_rdiv(uint64_t a, uint64_t b, float_status *s) | ||
239 | RVVCALL(OPFVF2, vfrdiv_vf_h, OP_UUU_H, H2, H2, float16_rdiv) | ||
240 | RVVCALL(OPFVF2, vfrdiv_vf_w, OP_UUU_W, H4, H4, float32_rdiv) | ||
241 | RVVCALL(OPFVF2, vfrdiv_vf_d, OP_UUU_D, H8, H8, float64_rdiv) | ||
242 | -GEN_VEXT_VF(vfrdiv_vf_h) | ||
243 | -GEN_VEXT_VF(vfrdiv_vf_w) | ||
244 | -GEN_VEXT_VF(vfrdiv_vf_d) | ||
245 | +GEN_VEXT_VF(vfrdiv_vf_h, 2) | ||
246 | +GEN_VEXT_VF(vfrdiv_vf_w, 4) | ||
247 | +GEN_VEXT_VF(vfrdiv_vf_d, 8) | ||
248 | |||
249 | /* Vector Widening Floating-Point Multiply */ | ||
250 | static uint32_t vfwmul16(uint16_t a, uint16_t b, float_status *s) | ||
251 | @@ -XXX,XX +XXX,XX @@ static uint64_t vfwmul32(uint32_t a, uint32_t b, float_status *s) | ||
252 | } | ||
253 | RVVCALL(OPFVV2, vfwmul_vv_h, WOP_UUU_H, H4, H2, H2, vfwmul16) | ||
254 | RVVCALL(OPFVV2, vfwmul_vv_w, WOP_UUU_W, H8, H4, H4, vfwmul32) | ||
255 | -GEN_VEXT_VV_ENV(vfwmul_vv_h) | ||
256 | -GEN_VEXT_VV_ENV(vfwmul_vv_w) | ||
257 | +GEN_VEXT_VV_ENV(vfwmul_vv_h, 4) | ||
258 | +GEN_VEXT_VV_ENV(vfwmul_vv_w, 8) | ||
259 | RVVCALL(OPFVF2, vfwmul_vf_h, WOP_UUU_H, H4, H2, vfwmul16) | ||
260 | RVVCALL(OPFVF2, vfwmul_vf_w, WOP_UUU_W, H8, H4, vfwmul32) | ||
261 | -GEN_VEXT_VF(vfwmul_vf_h) | ||
262 | -GEN_VEXT_VF(vfwmul_vf_w) | ||
263 | +GEN_VEXT_VF(vfwmul_vf_h, 4) | ||
264 | +GEN_VEXT_VF(vfwmul_vf_w, 8) | ||
265 | |||
266 | /* Vector Single-Width Floating-Point Fused Multiply-Add Instructions */ | ||
267 | #define OPFVV3(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP) \ | ||
268 | @@ -XXX,XX +XXX,XX @@ static uint64_t fmacc64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
269 | RVVCALL(OPFVV3, vfmacc_vv_h, OP_UUU_H, H2, H2, H2, fmacc16) | ||
270 | RVVCALL(OPFVV3, vfmacc_vv_w, OP_UUU_W, H4, H4, H4, fmacc32) | ||
271 | RVVCALL(OPFVV3, vfmacc_vv_d, OP_UUU_D, H8, H8, H8, fmacc64) | ||
272 | -GEN_VEXT_VV_ENV(vfmacc_vv_h) | ||
273 | -GEN_VEXT_VV_ENV(vfmacc_vv_w) | ||
274 | -GEN_VEXT_VV_ENV(vfmacc_vv_d) | ||
275 | +GEN_VEXT_VV_ENV(vfmacc_vv_h, 2) | ||
276 | +GEN_VEXT_VV_ENV(vfmacc_vv_w, 4) | ||
277 | +GEN_VEXT_VV_ENV(vfmacc_vv_d, 8) | ||
278 | |||
279 | #define OPFVF3(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP) \ | ||
280 | static void do_##NAME(void *vd, uint64_t s1, void *vs2, int i, \ | ||
281 | @@ -XXX,XX +XXX,XX @@ static void do_##NAME(void *vd, uint64_t s1, void *vs2, int i, \ | ||
282 | RVVCALL(OPFVF3, vfmacc_vf_h, OP_UUU_H, H2, H2, fmacc16) | ||
283 | RVVCALL(OPFVF3, vfmacc_vf_w, OP_UUU_W, H4, H4, fmacc32) | ||
284 | RVVCALL(OPFVF3, vfmacc_vf_d, OP_UUU_D, H8, H8, fmacc64) | ||
285 | -GEN_VEXT_VF(vfmacc_vf_h) | ||
286 | -GEN_VEXT_VF(vfmacc_vf_w) | ||
287 | -GEN_VEXT_VF(vfmacc_vf_d) | ||
288 | +GEN_VEXT_VF(vfmacc_vf_h, 2) | ||
289 | +GEN_VEXT_VF(vfmacc_vf_w, 4) | ||
290 | +GEN_VEXT_VF(vfmacc_vf_d, 8) | ||
291 | |||
292 | static uint16_t fnmacc16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
293 | { | ||
294 | @@ -XXX,XX +XXX,XX @@ static uint64_t fnmacc64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
295 | RVVCALL(OPFVV3, vfnmacc_vv_h, OP_UUU_H, H2, H2, H2, fnmacc16) | ||
296 | RVVCALL(OPFVV3, vfnmacc_vv_w, OP_UUU_W, H4, H4, H4, fnmacc32) | ||
297 | RVVCALL(OPFVV3, vfnmacc_vv_d, OP_UUU_D, H8, H8, H8, fnmacc64) | ||
298 | -GEN_VEXT_VV_ENV(vfnmacc_vv_h) | ||
299 | -GEN_VEXT_VV_ENV(vfnmacc_vv_w) | ||
300 | -GEN_VEXT_VV_ENV(vfnmacc_vv_d) | ||
301 | +GEN_VEXT_VV_ENV(vfnmacc_vv_h, 2) | ||
302 | +GEN_VEXT_VV_ENV(vfnmacc_vv_w, 4) | ||
303 | +GEN_VEXT_VV_ENV(vfnmacc_vv_d, 8) | ||
304 | RVVCALL(OPFVF3, vfnmacc_vf_h, OP_UUU_H, H2, H2, fnmacc16) | ||
305 | RVVCALL(OPFVF3, vfnmacc_vf_w, OP_UUU_W, H4, H4, fnmacc32) | ||
306 | RVVCALL(OPFVF3, vfnmacc_vf_d, OP_UUU_D, H8, H8, fnmacc64) | ||
307 | -GEN_VEXT_VF(vfnmacc_vf_h) | ||
308 | -GEN_VEXT_VF(vfnmacc_vf_w) | ||
309 | -GEN_VEXT_VF(vfnmacc_vf_d) | ||
310 | +GEN_VEXT_VF(vfnmacc_vf_h, 2) | ||
311 | +GEN_VEXT_VF(vfnmacc_vf_w, 4) | ||
312 | +GEN_VEXT_VF(vfnmacc_vf_d, 8) | ||
313 | |||
314 | static uint16_t fmsac16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
315 | { | ||
316 | @@ -XXX,XX +XXX,XX @@ static uint64_t fmsac64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
317 | RVVCALL(OPFVV3, vfmsac_vv_h, OP_UUU_H, H2, H2, H2, fmsac16) | ||
318 | RVVCALL(OPFVV3, vfmsac_vv_w, OP_UUU_W, H4, H4, H4, fmsac32) | ||
319 | RVVCALL(OPFVV3, vfmsac_vv_d, OP_UUU_D, H8, H8, H8, fmsac64) | ||
320 | -GEN_VEXT_VV_ENV(vfmsac_vv_h) | ||
321 | -GEN_VEXT_VV_ENV(vfmsac_vv_w) | ||
322 | -GEN_VEXT_VV_ENV(vfmsac_vv_d) | ||
323 | +GEN_VEXT_VV_ENV(vfmsac_vv_h, 2) | ||
324 | +GEN_VEXT_VV_ENV(vfmsac_vv_w, 4) | ||
325 | +GEN_VEXT_VV_ENV(vfmsac_vv_d, 8) | ||
326 | RVVCALL(OPFVF3, vfmsac_vf_h, OP_UUU_H, H2, H2, fmsac16) | ||
327 | RVVCALL(OPFVF3, vfmsac_vf_w, OP_UUU_W, H4, H4, fmsac32) | ||
328 | RVVCALL(OPFVF3, vfmsac_vf_d, OP_UUU_D, H8, H8, fmsac64) | ||
329 | -GEN_VEXT_VF(vfmsac_vf_h) | ||
330 | -GEN_VEXT_VF(vfmsac_vf_w) | ||
331 | -GEN_VEXT_VF(vfmsac_vf_d) | ||
332 | +GEN_VEXT_VF(vfmsac_vf_h, 2) | ||
333 | +GEN_VEXT_VF(vfmsac_vf_w, 4) | ||
334 | +GEN_VEXT_VF(vfmsac_vf_d, 8) | ||
335 | |||
336 | static uint16_t fnmsac16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
337 | { | ||
338 | @@ -XXX,XX +XXX,XX @@ static uint64_t fnmsac64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
339 | RVVCALL(OPFVV3, vfnmsac_vv_h, OP_UUU_H, H2, H2, H2, fnmsac16) | ||
340 | RVVCALL(OPFVV3, vfnmsac_vv_w, OP_UUU_W, H4, H4, H4, fnmsac32) | ||
341 | RVVCALL(OPFVV3, vfnmsac_vv_d, OP_UUU_D, H8, H8, H8, fnmsac64) | ||
342 | -GEN_VEXT_VV_ENV(vfnmsac_vv_h) | ||
343 | -GEN_VEXT_VV_ENV(vfnmsac_vv_w) | ||
344 | -GEN_VEXT_VV_ENV(vfnmsac_vv_d) | ||
345 | +GEN_VEXT_VV_ENV(vfnmsac_vv_h, 2) | ||
346 | +GEN_VEXT_VV_ENV(vfnmsac_vv_w, 4) | ||
347 | +GEN_VEXT_VV_ENV(vfnmsac_vv_d, 8) | ||
348 | RVVCALL(OPFVF3, vfnmsac_vf_h, OP_UUU_H, H2, H2, fnmsac16) | ||
349 | RVVCALL(OPFVF3, vfnmsac_vf_w, OP_UUU_W, H4, H4, fnmsac32) | ||
350 | RVVCALL(OPFVF3, vfnmsac_vf_d, OP_UUU_D, H8, H8, fnmsac64) | ||
351 | -GEN_VEXT_VF(vfnmsac_vf_h) | ||
352 | -GEN_VEXT_VF(vfnmsac_vf_w) | ||
353 | -GEN_VEXT_VF(vfnmsac_vf_d) | ||
354 | +GEN_VEXT_VF(vfnmsac_vf_h, 2) | ||
355 | +GEN_VEXT_VF(vfnmsac_vf_w, 4) | ||
356 | +GEN_VEXT_VF(vfnmsac_vf_d, 8) | ||
357 | |||
358 | static uint16_t fmadd16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
359 | { | ||
360 | @@ -XXX,XX +XXX,XX @@ static uint64_t fmadd64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
361 | RVVCALL(OPFVV3, vfmadd_vv_h, OP_UUU_H, H2, H2, H2, fmadd16) | ||
362 | RVVCALL(OPFVV3, vfmadd_vv_w, OP_UUU_W, H4, H4, H4, fmadd32) | ||
363 | RVVCALL(OPFVV3, vfmadd_vv_d, OP_UUU_D, H8, H8, H8, fmadd64) | ||
364 | -GEN_VEXT_VV_ENV(vfmadd_vv_h) | ||
365 | -GEN_VEXT_VV_ENV(vfmadd_vv_w) | ||
366 | -GEN_VEXT_VV_ENV(vfmadd_vv_d) | ||
367 | +GEN_VEXT_VV_ENV(vfmadd_vv_h, 2) | ||
368 | +GEN_VEXT_VV_ENV(vfmadd_vv_w, 4) | ||
369 | +GEN_VEXT_VV_ENV(vfmadd_vv_d, 8) | ||
370 | RVVCALL(OPFVF3, vfmadd_vf_h, OP_UUU_H, H2, H2, fmadd16) | ||
371 | RVVCALL(OPFVF3, vfmadd_vf_w, OP_UUU_W, H4, H4, fmadd32) | ||
372 | RVVCALL(OPFVF3, vfmadd_vf_d, OP_UUU_D, H8, H8, fmadd64) | ||
373 | -GEN_VEXT_VF(vfmadd_vf_h) | ||
374 | -GEN_VEXT_VF(vfmadd_vf_w) | ||
375 | -GEN_VEXT_VF(vfmadd_vf_d) | ||
376 | +GEN_VEXT_VF(vfmadd_vf_h, 2) | ||
377 | +GEN_VEXT_VF(vfmadd_vf_w, 4) | ||
378 | +GEN_VEXT_VF(vfmadd_vf_d, 8) | ||
379 | |||
380 | static uint16_t fnmadd16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
381 | { | ||
382 | @@ -XXX,XX +XXX,XX @@ static uint64_t fnmadd64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
383 | RVVCALL(OPFVV3, vfnmadd_vv_h, OP_UUU_H, H2, H2, H2, fnmadd16) | ||
384 | RVVCALL(OPFVV3, vfnmadd_vv_w, OP_UUU_W, H4, H4, H4, fnmadd32) | ||
385 | RVVCALL(OPFVV3, vfnmadd_vv_d, OP_UUU_D, H8, H8, H8, fnmadd64) | ||
386 | -GEN_VEXT_VV_ENV(vfnmadd_vv_h) | ||
387 | -GEN_VEXT_VV_ENV(vfnmadd_vv_w) | ||
388 | -GEN_VEXT_VV_ENV(vfnmadd_vv_d) | ||
389 | +GEN_VEXT_VV_ENV(vfnmadd_vv_h, 2) | ||
390 | +GEN_VEXT_VV_ENV(vfnmadd_vv_w, 4) | ||
391 | +GEN_VEXT_VV_ENV(vfnmadd_vv_d, 8) | ||
392 | RVVCALL(OPFVF3, vfnmadd_vf_h, OP_UUU_H, H2, H2, fnmadd16) | ||
393 | RVVCALL(OPFVF3, vfnmadd_vf_w, OP_UUU_W, H4, H4, fnmadd32) | ||
394 | RVVCALL(OPFVF3, vfnmadd_vf_d, OP_UUU_D, H8, H8, fnmadd64) | ||
395 | -GEN_VEXT_VF(vfnmadd_vf_h) | ||
396 | -GEN_VEXT_VF(vfnmadd_vf_w) | ||
397 | -GEN_VEXT_VF(vfnmadd_vf_d) | ||
398 | +GEN_VEXT_VF(vfnmadd_vf_h, 2) | ||
399 | +GEN_VEXT_VF(vfnmadd_vf_w, 4) | ||
400 | +GEN_VEXT_VF(vfnmadd_vf_d, 8) | ||
401 | |||
402 | static uint16_t fmsub16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
403 | { | ||
404 | @@ -XXX,XX +XXX,XX @@ static uint64_t fmsub64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
405 | RVVCALL(OPFVV3, vfmsub_vv_h, OP_UUU_H, H2, H2, H2, fmsub16) | ||
406 | RVVCALL(OPFVV3, vfmsub_vv_w, OP_UUU_W, H4, H4, H4, fmsub32) | ||
407 | RVVCALL(OPFVV3, vfmsub_vv_d, OP_UUU_D, H8, H8, H8, fmsub64) | ||
408 | -GEN_VEXT_VV_ENV(vfmsub_vv_h) | ||
409 | -GEN_VEXT_VV_ENV(vfmsub_vv_w) | ||
410 | -GEN_VEXT_VV_ENV(vfmsub_vv_d) | ||
411 | +GEN_VEXT_VV_ENV(vfmsub_vv_h, 2) | ||
412 | +GEN_VEXT_VV_ENV(vfmsub_vv_w, 4) | ||
413 | +GEN_VEXT_VV_ENV(vfmsub_vv_d, 8) | ||
414 | RVVCALL(OPFVF3, vfmsub_vf_h, OP_UUU_H, H2, H2, fmsub16) | ||
415 | RVVCALL(OPFVF3, vfmsub_vf_w, OP_UUU_W, H4, H4, fmsub32) | ||
416 | RVVCALL(OPFVF3, vfmsub_vf_d, OP_UUU_D, H8, H8, fmsub64) | ||
417 | -GEN_VEXT_VF(vfmsub_vf_h) | ||
418 | -GEN_VEXT_VF(vfmsub_vf_w) | ||
419 | -GEN_VEXT_VF(vfmsub_vf_d) | ||
420 | +GEN_VEXT_VF(vfmsub_vf_h, 2) | ||
421 | +GEN_VEXT_VF(vfmsub_vf_w, 4) | ||
422 | +GEN_VEXT_VF(vfmsub_vf_d, 8) | ||
423 | |||
424 | static uint16_t fnmsub16(uint16_t a, uint16_t b, uint16_t d, float_status *s) | ||
425 | { | ||
426 | @@ -XXX,XX +XXX,XX @@ static uint64_t fnmsub64(uint64_t a, uint64_t b, uint64_t d, float_status *s) | ||
427 | RVVCALL(OPFVV3, vfnmsub_vv_h, OP_UUU_H, H2, H2, H2, fnmsub16) | ||
428 | RVVCALL(OPFVV3, vfnmsub_vv_w, OP_UUU_W, H4, H4, H4, fnmsub32) | ||
429 | RVVCALL(OPFVV3, vfnmsub_vv_d, OP_UUU_D, H8, H8, H8, fnmsub64) | ||
430 | -GEN_VEXT_VV_ENV(vfnmsub_vv_h) | ||
431 | -GEN_VEXT_VV_ENV(vfnmsub_vv_w) | ||
432 | -GEN_VEXT_VV_ENV(vfnmsub_vv_d) | ||
433 | +GEN_VEXT_VV_ENV(vfnmsub_vv_h, 2) | ||
434 | +GEN_VEXT_VV_ENV(vfnmsub_vv_w, 4) | ||
435 | +GEN_VEXT_VV_ENV(vfnmsub_vv_d, 8) | ||
436 | RVVCALL(OPFVF3, vfnmsub_vf_h, OP_UUU_H, H2, H2, fnmsub16) | ||
437 | RVVCALL(OPFVF3, vfnmsub_vf_w, OP_UUU_W, H4, H4, fnmsub32) | ||
438 | RVVCALL(OPFVF3, vfnmsub_vf_d, OP_UUU_D, H8, H8, fnmsub64) | ||
439 | -GEN_VEXT_VF(vfnmsub_vf_h) | ||
440 | -GEN_VEXT_VF(vfnmsub_vf_w) | ||
441 | -GEN_VEXT_VF(vfnmsub_vf_d) | ||
442 | +GEN_VEXT_VF(vfnmsub_vf_h, 2) | ||
443 | +GEN_VEXT_VF(vfnmsub_vf_w, 4) | ||
444 | +GEN_VEXT_VF(vfnmsub_vf_d, 8) | ||
445 | |||
446 | /* Vector Widening Floating-Point Fused Multiply-Add Instructions */ | ||
447 | static uint32_t fwmacc16(uint16_t a, uint16_t b, uint32_t d, float_status *s) | ||
448 | @@ -XXX,XX +XXX,XX @@ static uint64_t fwmacc32(uint32_t a, uint32_t b, uint64_t d, float_status *s) | ||
449 | |||
450 | RVVCALL(OPFVV3, vfwmacc_vv_h, WOP_UUU_H, H4, H2, H2, fwmacc16) | ||
451 | RVVCALL(OPFVV3, vfwmacc_vv_w, WOP_UUU_W, H8, H4, H4, fwmacc32) | ||
452 | -GEN_VEXT_VV_ENV(vfwmacc_vv_h) | ||
453 | -GEN_VEXT_VV_ENV(vfwmacc_vv_w) | ||
454 | +GEN_VEXT_VV_ENV(vfwmacc_vv_h, 4) | ||
455 | +GEN_VEXT_VV_ENV(vfwmacc_vv_w, 8) | ||
456 | RVVCALL(OPFVF3, vfwmacc_vf_h, WOP_UUU_H, H4, H2, fwmacc16) | ||
457 | RVVCALL(OPFVF3, vfwmacc_vf_w, WOP_UUU_W, H8, H4, fwmacc32) | ||
458 | -GEN_VEXT_VF(vfwmacc_vf_h) | ||
459 | -GEN_VEXT_VF(vfwmacc_vf_w) | ||
460 | +GEN_VEXT_VF(vfwmacc_vf_h, 4) | ||
461 | +GEN_VEXT_VF(vfwmacc_vf_w, 8) | ||
462 | |||
463 | static uint32_t fwnmacc16(uint16_t a, uint16_t b, uint32_t d, float_status *s) | ||
464 | { | ||
465 | @@ -XXX,XX +XXX,XX @@ static uint64_t fwnmacc32(uint32_t a, uint32_t b, uint64_t d, float_status *s) | ||
466 | |||
467 | RVVCALL(OPFVV3, vfwnmacc_vv_h, WOP_UUU_H, H4, H2, H2, fwnmacc16) | ||
468 | RVVCALL(OPFVV3, vfwnmacc_vv_w, WOP_UUU_W, H8, H4, H4, fwnmacc32) | ||
469 | -GEN_VEXT_VV_ENV(vfwnmacc_vv_h) | ||
470 | -GEN_VEXT_VV_ENV(vfwnmacc_vv_w) | ||
471 | +GEN_VEXT_VV_ENV(vfwnmacc_vv_h, 4) | ||
472 | +GEN_VEXT_VV_ENV(vfwnmacc_vv_w, 8) | ||
473 | RVVCALL(OPFVF3, vfwnmacc_vf_h, WOP_UUU_H, H4, H2, fwnmacc16) | ||
474 | RVVCALL(OPFVF3, vfwnmacc_vf_w, WOP_UUU_W, H8, H4, fwnmacc32) | ||
475 | -GEN_VEXT_VF(vfwnmacc_vf_h) | ||
476 | -GEN_VEXT_VF(vfwnmacc_vf_w) | ||
477 | +GEN_VEXT_VF(vfwnmacc_vf_h, 4) | ||
478 | +GEN_VEXT_VF(vfwnmacc_vf_w, 8) | ||
479 | |||
480 | static uint32_t fwmsac16(uint16_t a, uint16_t b, uint32_t d, float_status *s) | ||
481 | { | ||
482 | @@ -XXX,XX +XXX,XX @@ static uint64_t fwmsac32(uint32_t a, uint32_t b, uint64_t d, float_status *s) | ||
483 | |||
484 | RVVCALL(OPFVV3, vfwmsac_vv_h, WOP_UUU_H, H4, H2, H2, fwmsac16) | ||
485 | RVVCALL(OPFVV3, vfwmsac_vv_w, WOP_UUU_W, H8, H4, H4, fwmsac32) | ||
486 | -GEN_VEXT_VV_ENV(vfwmsac_vv_h) | ||
487 | -GEN_VEXT_VV_ENV(vfwmsac_vv_w) | ||
488 | +GEN_VEXT_VV_ENV(vfwmsac_vv_h, 4) | ||
489 | +GEN_VEXT_VV_ENV(vfwmsac_vv_w, 8) | ||
490 | RVVCALL(OPFVF3, vfwmsac_vf_h, WOP_UUU_H, H4, H2, fwmsac16) | ||
491 | RVVCALL(OPFVF3, vfwmsac_vf_w, WOP_UUU_W, H8, H4, fwmsac32) | ||
492 | -GEN_VEXT_VF(vfwmsac_vf_h) | ||
493 | -GEN_VEXT_VF(vfwmsac_vf_w) | ||
494 | +GEN_VEXT_VF(vfwmsac_vf_h, 4) | ||
495 | +GEN_VEXT_VF(vfwmsac_vf_w, 8) | ||
496 | |||
497 | static uint32_t fwnmsac16(uint16_t a, uint16_t b, uint32_t d, float_status *s) | ||
498 | { | ||
499 | @@ -XXX,XX +XXX,XX @@ static uint64_t fwnmsac32(uint32_t a, uint32_t b, uint64_t d, float_status *s) | ||
500 | |||
501 | RVVCALL(OPFVV3, vfwnmsac_vv_h, WOP_UUU_H, H4, H2, H2, fwnmsac16) | ||
502 | RVVCALL(OPFVV3, vfwnmsac_vv_w, WOP_UUU_W, H8, H4, H4, fwnmsac32) | ||
503 | -GEN_VEXT_VV_ENV(vfwnmsac_vv_h) | ||
504 | -GEN_VEXT_VV_ENV(vfwnmsac_vv_w) | ||
505 | +GEN_VEXT_VV_ENV(vfwnmsac_vv_h, 4) | ||
506 | +GEN_VEXT_VV_ENV(vfwnmsac_vv_w, 8) | ||
507 | RVVCALL(OPFVF3, vfwnmsac_vf_h, WOP_UUU_H, H4, H2, fwnmsac16) | ||
508 | RVVCALL(OPFVF3, vfwnmsac_vf_w, WOP_UUU_W, H8, H4, fwnmsac32) | ||
509 | -GEN_VEXT_VF(vfwnmsac_vf_h) | ||
510 | -GEN_VEXT_VF(vfwnmsac_vf_w) | ||
511 | +GEN_VEXT_VF(vfwnmsac_vf_h, 4) | ||
512 | +GEN_VEXT_VF(vfwnmsac_vf_w, 8) | ||
513 | |||
514 | /* Vector Floating-Point Square-Root Instruction */ | ||
515 | /* (TD, T2, TX2) */ | ||
516 | @@ -XXX,XX +XXX,XX @@ static void do_##NAME(void *vd, void *vs2, int i, \ | ||
517 | *((TD *)vd + HD(i)) = OP(s2, &env->fp_status); \ | ||
518 | } | ||
519 | |||
520 | -#define GEN_VEXT_V_ENV(NAME) \ | ||
521 | +#define GEN_VEXT_V_ENV(NAME, ESZ) \ | ||
522 | void HELPER(NAME)(void *vd, void *v0, void *vs2, \ | ||
523 | CPURISCVState *env, uint32_t desc) \ | ||
524 | { \ | ||
525 | uint32_t vm = vext_vm(desc); \ | ||
526 | uint32_t vl = env->vl; \ | ||
527 | + uint32_t total_elems = \ | ||
528 | + vext_get_total_elems(env, desc, ESZ); \ | ||
529 | + uint32_t vta = vext_vta(desc); \ | ||
530 | uint32_t i; \ | ||
531 | \ | ||
532 | if (vl == 0) { \ | ||
533 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, \ | ||
534 | do_##NAME(vd, vs2, i, env); \ | ||
535 | } \ | ||
536 | env->vstart = 0; \ | ||
537 | + vext_set_elems_1s(vd, vta, vl * ESZ, \ | ||
538 | + total_elems * ESZ); \ | ||
539 | } | ||
540 | |||
541 | RVVCALL(OPFVV1, vfsqrt_v_h, OP_UU_H, H2, H2, float16_sqrt) | ||
542 | RVVCALL(OPFVV1, vfsqrt_v_w, OP_UU_W, H4, H4, float32_sqrt) | ||
543 | RVVCALL(OPFVV1, vfsqrt_v_d, OP_UU_D, H8, H8, float64_sqrt) | ||
544 | -GEN_VEXT_V_ENV(vfsqrt_v_h) | ||
545 | -GEN_VEXT_V_ENV(vfsqrt_v_w) | ||
546 | -GEN_VEXT_V_ENV(vfsqrt_v_d) | ||
547 | +GEN_VEXT_V_ENV(vfsqrt_v_h, 2) | ||
548 | +GEN_VEXT_V_ENV(vfsqrt_v_w, 4) | ||
549 | +GEN_VEXT_V_ENV(vfsqrt_v_d, 8) | ||
550 | |||
551 | /* | ||
552 | * Vector Floating-Point Reciprocal Square-Root Estimate Instruction | ||
553 | @@ -XXX,XX +XXX,XX @@ static float64 frsqrt7_d(float64 f, float_status *s) | ||
554 | RVVCALL(OPFVV1, vfrsqrt7_v_h, OP_UU_H, H2, H2, frsqrt7_h) | ||
555 | RVVCALL(OPFVV1, vfrsqrt7_v_w, OP_UU_W, H4, H4, frsqrt7_s) | ||
556 | RVVCALL(OPFVV1, vfrsqrt7_v_d, OP_UU_D, H8, H8, frsqrt7_d) | ||
557 | -GEN_VEXT_V_ENV(vfrsqrt7_v_h) | ||
558 | -GEN_VEXT_V_ENV(vfrsqrt7_v_w) | ||
559 | -GEN_VEXT_V_ENV(vfrsqrt7_v_d) | ||
560 | +GEN_VEXT_V_ENV(vfrsqrt7_v_h, 2) | ||
561 | +GEN_VEXT_V_ENV(vfrsqrt7_v_w, 4) | ||
562 | +GEN_VEXT_V_ENV(vfrsqrt7_v_d, 8) | ||
563 | |||
564 | /* | ||
565 | * Vector Floating-Point Reciprocal Estimate Instruction | ||
566 | @@ -XXX,XX +XXX,XX @@ static float64 frec7_d(float64 f, float_status *s) | ||
567 | RVVCALL(OPFVV1, vfrec7_v_h, OP_UU_H, H2, H2, frec7_h) | ||
568 | RVVCALL(OPFVV1, vfrec7_v_w, OP_UU_W, H4, H4, frec7_s) | ||
569 | RVVCALL(OPFVV1, vfrec7_v_d, OP_UU_D, H8, H8, frec7_d) | ||
570 | -GEN_VEXT_V_ENV(vfrec7_v_h) | ||
571 | -GEN_VEXT_V_ENV(vfrec7_v_w) | ||
572 | -GEN_VEXT_V_ENV(vfrec7_v_d) | ||
573 | +GEN_VEXT_V_ENV(vfrec7_v_h, 2) | ||
574 | +GEN_VEXT_V_ENV(vfrec7_v_w, 4) | ||
575 | +GEN_VEXT_V_ENV(vfrec7_v_d, 8) | ||
576 | |||
577 | /* Vector Floating-Point MIN/MAX Instructions */ | ||
578 | RVVCALL(OPFVV2, vfmin_vv_h, OP_UUU_H, H2, H2, H2, float16_minimum_number) | ||
579 | RVVCALL(OPFVV2, vfmin_vv_w, OP_UUU_W, H4, H4, H4, float32_minimum_number) | ||
580 | RVVCALL(OPFVV2, vfmin_vv_d, OP_UUU_D, H8, H8, H8, float64_minimum_number) | ||
581 | -GEN_VEXT_VV_ENV(vfmin_vv_h) | ||
582 | -GEN_VEXT_VV_ENV(vfmin_vv_w) | ||
583 | -GEN_VEXT_VV_ENV(vfmin_vv_d) | ||
584 | +GEN_VEXT_VV_ENV(vfmin_vv_h, 2) | ||
585 | +GEN_VEXT_VV_ENV(vfmin_vv_w, 4) | ||
586 | +GEN_VEXT_VV_ENV(vfmin_vv_d, 8) | ||
587 | RVVCALL(OPFVF2, vfmin_vf_h, OP_UUU_H, H2, H2, float16_minimum_number) | ||
588 | RVVCALL(OPFVF2, vfmin_vf_w, OP_UUU_W, H4, H4, float32_minimum_number) | ||
589 | RVVCALL(OPFVF2, vfmin_vf_d, OP_UUU_D, H8, H8, float64_minimum_number) | ||
590 | -GEN_VEXT_VF(vfmin_vf_h) | ||
591 | -GEN_VEXT_VF(vfmin_vf_w) | ||
592 | -GEN_VEXT_VF(vfmin_vf_d) | ||
593 | +GEN_VEXT_VF(vfmin_vf_h, 2) | ||
594 | +GEN_VEXT_VF(vfmin_vf_w, 4) | ||
595 | +GEN_VEXT_VF(vfmin_vf_d, 8) | ||
596 | |||
597 | RVVCALL(OPFVV2, vfmax_vv_h, OP_UUU_H, H2, H2, H2, float16_maximum_number) | ||
598 | RVVCALL(OPFVV2, vfmax_vv_w, OP_UUU_W, H4, H4, H4, float32_maximum_number) | ||
599 | RVVCALL(OPFVV2, vfmax_vv_d, OP_UUU_D, H8, H8, H8, float64_maximum_number) | ||
600 | -GEN_VEXT_VV_ENV(vfmax_vv_h) | ||
601 | -GEN_VEXT_VV_ENV(vfmax_vv_w) | ||
602 | -GEN_VEXT_VV_ENV(vfmax_vv_d) | ||
603 | +GEN_VEXT_VV_ENV(vfmax_vv_h, 2) | ||
604 | +GEN_VEXT_VV_ENV(vfmax_vv_w, 4) | ||
605 | +GEN_VEXT_VV_ENV(vfmax_vv_d, 8) | ||
606 | RVVCALL(OPFVF2, vfmax_vf_h, OP_UUU_H, H2, H2, float16_maximum_number) | ||
607 | RVVCALL(OPFVF2, vfmax_vf_w, OP_UUU_W, H4, H4, float32_maximum_number) | ||
608 | RVVCALL(OPFVF2, vfmax_vf_d, OP_UUU_D, H8, H8, float64_maximum_number) | ||
609 | -GEN_VEXT_VF(vfmax_vf_h) | ||
610 | -GEN_VEXT_VF(vfmax_vf_w) | ||
611 | -GEN_VEXT_VF(vfmax_vf_d) | ||
612 | +GEN_VEXT_VF(vfmax_vf_h, 2) | ||
613 | +GEN_VEXT_VF(vfmax_vf_w, 4) | ||
614 | +GEN_VEXT_VF(vfmax_vf_d, 8) | ||
615 | |||
616 | /* Vector Floating-Point Sign-Injection Instructions */ | ||
617 | static uint16_t fsgnj16(uint16_t a, uint16_t b, float_status *s) | ||
618 | @@ -XXX,XX +XXX,XX @@ static uint64_t fsgnj64(uint64_t a, uint64_t b, float_status *s) | ||
619 | RVVCALL(OPFVV2, vfsgnj_vv_h, OP_UUU_H, H2, H2, H2, fsgnj16) | ||
620 | RVVCALL(OPFVV2, vfsgnj_vv_w, OP_UUU_W, H4, H4, H4, fsgnj32) | ||
621 | RVVCALL(OPFVV2, vfsgnj_vv_d, OP_UUU_D, H8, H8, H8, fsgnj64) | ||
622 | -GEN_VEXT_VV_ENV(vfsgnj_vv_h) | ||
623 | -GEN_VEXT_VV_ENV(vfsgnj_vv_w) | ||
624 | -GEN_VEXT_VV_ENV(vfsgnj_vv_d) | ||
625 | +GEN_VEXT_VV_ENV(vfsgnj_vv_h, 2) | ||
626 | +GEN_VEXT_VV_ENV(vfsgnj_vv_w, 4) | ||
627 | +GEN_VEXT_VV_ENV(vfsgnj_vv_d, 8) | ||
628 | RVVCALL(OPFVF2, vfsgnj_vf_h, OP_UUU_H, H2, H2, fsgnj16) | ||
629 | RVVCALL(OPFVF2, vfsgnj_vf_w, OP_UUU_W, H4, H4, fsgnj32) | ||
630 | RVVCALL(OPFVF2, vfsgnj_vf_d, OP_UUU_D, H8, H8, fsgnj64) | ||
631 | -GEN_VEXT_VF(vfsgnj_vf_h) | ||
632 | -GEN_VEXT_VF(vfsgnj_vf_w) | ||
633 | -GEN_VEXT_VF(vfsgnj_vf_d) | ||
634 | +GEN_VEXT_VF(vfsgnj_vf_h, 2) | ||
635 | +GEN_VEXT_VF(vfsgnj_vf_w, 4) | ||
636 | +GEN_VEXT_VF(vfsgnj_vf_d, 8) | ||
637 | |||
638 | static uint16_t fsgnjn16(uint16_t a, uint16_t b, float_status *s) | ||
639 | { | ||
640 | @@ -XXX,XX +XXX,XX @@ static uint64_t fsgnjn64(uint64_t a, uint64_t b, float_status *s) | ||
641 | RVVCALL(OPFVV2, vfsgnjn_vv_h, OP_UUU_H, H2, H2, H2, fsgnjn16) | ||
642 | RVVCALL(OPFVV2, vfsgnjn_vv_w, OP_UUU_W, H4, H4, H4, fsgnjn32) | ||
643 | RVVCALL(OPFVV2, vfsgnjn_vv_d, OP_UUU_D, H8, H8, H8, fsgnjn64) | ||
644 | -GEN_VEXT_VV_ENV(vfsgnjn_vv_h) | ||
645 | -GEN_VEXT_VV_ENV(vfsgnjn_vv_w) | ||
646 | -GEN_VEXT_VV_ENV(vfsgnjn_vv_d) | ||
647 | +GEN_VEXT_VV_ENV(vfsgnjn_vv_h, 2) | ||
648 | +GEN_VEXT_VV_ENV(vfsgnjn_vv_w, 4) | ||
649 | +GEN_VEXT_VV_ENV(vfsgnjn_vv_d, 8) | ||
650 | RVVCALL(OPFVF2, vfsgnjn_vf_h, OP_UUU_H, H2, H2, fsgnjn16) | ||
651 | RVVCALL(OPFVF2, vfsgnjn_vf_w, OP_UUU_W, H4, H4, fsgnjn32) | ||
652 | RVVCALL(OPFVF2, vfsgnjn_vf_d, OP_UUU_D, H8, H8, fsgnjn64) | ||
653 | -GEN_VEXT_VF(vfsgnjn_vf_h) | ||
654 | -GEN_VEXT_VF(vfsgnjn_vf_w) | ||
655 | -GEN_VEXT_VF(vfsgnjn_vf_d) | ||
656 | +GEN_VEXT_VF(vfsgnjn_vf_h, 2) | ||
657 | +GEN_VEXT_VF(vfsgnjn_vf_w, 4) | ||
658 | +GEN_VEXT_VF(vfsgnjn_vf_d, 8) | ||
659 | |||
660 | static uint16_t fsgnjx16(uint16_t a, uint16_t b, float_status *s) | ||
661 | { | ||
662 | @@ -XXX,XX +XXX,XX @@ static uint64_t fsgnjx64(uint64_t a, uint64_t b, float_status *s) | ||
663 | RVVCALL(OPFVV2, vfsgnjx_vv_h, OP_UUU_H, H2, H2, H2, fsgnjx16) | ||
664 | RVVCALL(OPFVV2, vfsgnjx_vv_w, OP_UUU_W, H4, H4, H4, fsgnjx32) | ||
665 | RVVCALL(OPFVV2, vfsgnjx_vv_d, OP_UUU_D, H8, H8, H8, fsgnjx64) | ||
666 | -GEN_VEXT_VV_ENV(vfsgnjx_vv_h) | ||
667 | -GEN_VEXT_VV_ENV(vfsgnjx_vv_w) | ||
668 | -GEN_VEXT_VV_ENV(vfsgnjx_vv_d) | ||
669 | +GEN_VEXT_VV_ENV(vfsgnjx_vv_h, 2) | ||
670 | +GEN_VEXT_VV_ENV(vfsgnjx_vv_w, 4) | ||
671 | +GEN_VEXT_VV_ENV(vfsgnjx_vv_d, 8) | ||
672 | RVVCALL(OPFVF2, vfsgnjx_vf_h, OP_UUU_H, H2, H2, fsgnjx16) | ||
673 | RVVCALL(OPFVF2, vfsgnjx_vf_w, OP_UUU_W, H4, H4, fsgnjx32) | ||
674 | RVVCALL(OPFVF2, vfsgnjx_vf_d, OP_UUU_D, H8, H8, fsgnjx64) | ||
675 | -GEN_VEXT_VF(vfsgnjx_vf_h) | ||
676 | -GEN_VEXT_VF(vfsgnjx_vf_w) | ||
677 | -GEN_VEXT_VF(vfsgnjx_vf_d) | ||
678 | +GEN_VEXT_VF(vfsgnjx_vf_h, 2) | ||
679 | +GEN_VEXT_VF(vfsgnjx_vf_w, 4) | ||
680 | +GEN_VEXT_VF(vfsgnjx_vf_d, 8) | ||
681 | |||
682 | /* Vector Floating-Point Compare Instructions */ | ||
683 | #define GEN_VEXT_CMP_VV_ENV(NAME, ETYPE, H, DO_OP) \ | ||
684 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | ||
685 | { \ | ||
686 | uint32_t vm = vext_vm(desc); \ | ||
687 | uint32_t vl = env->vl; \ | ||
688 | + uint32_t total_elems = env_archcpu(env)->cfg.vlen; \ | ||
689 | + uint32_t vta_all_1s = vext_vta_all_1s(desc); \ | ||
690 | uint32_t i; \ | ||
691 | \ | ||
692 | for (i = env->vstart; i < vl; i++) { \ | ||
693 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | ||
694 | DO_OP(s2, s1, &env->fp_status)); \ | ||
695 | } \ | ||
696 | env->vstart = 0; \ | ||
697 | + /* mask destination register are always tail-agnostic */ \ | ||
698 | + /* set tail elements to 1s */ \ | ||
699 | + if (vta_all_1s) { \ | ||
700 | + for (; i < total_elems; i++) { \ | ||
701 | + vext_set_elem_mask(vd, i, 1); \ | ||
702 | + } \ | ||
703 | + } \ | ||
704 | } | ||
705 | |||
706 | GEN_VEXT_CMP_VV_ENV(vmfeq_vv_h, uint16_t, H2, float16_eq_quiet) | ||
707 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void *vs2, \ | ||
708 | { \ | ||
709 | uint32_t vm = vext_vm(desc); \ | ||
710 | uint32_t vl = env->vl; \ | ||
711 | + uint32_t total_elems = env_archcpu(env)->cfg.vlen; \ | ||
712 | + uint32_t vta_all_1s = vext_vta_all_1s(desc); \ | ||
713 | uint32_t i; \ | ||
714 | \ | ||
715 | for (i = env->vstart; i < vl; i++) { \ | ||
716 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void *vs2, \ | ||
717 | DO_OP(s2, (ETYPE)s1, &env->fp_status)); \ | ||
718 | } \ | ||
719 | env->vstart = 0; \ | ||
720 | + /* mask destination register are always tail-agnostic */ \ | ||
721 | + /* set tail elements to 1s */ \ | ||
722 | + if (vta_all_1s) { \ | ||
723 | + for (; i < total_elems; i++) { \ | ||
724 | + vext_set_elem_mask(vd, i, 1); \ | ||
725 | + } \ | ||
726 | + } \ | ||
727 | } | ||
728 | |||
729 | GEN_VEXT_CMP_VF(vmfeq_vf_h, uint16_t, H2, float16_eq_quiet) | ||
730 | @@ -XXX,XX +XXX,XX @@ static void do_##NAME(void *vd, void *vs2, int i) \ | ||
731 | *((TD *)vd + HD(i)) = OP(s2); \ | ||
732 | } | ||
733 | |||
734 | -#define GEN_VEXT_V(NAME) \ | ||
735 | +#define GEN_VEXT_V(NAME, ESZ) \ | ||
736 | void HELPER(NAME)(void *vd, void *v0, void *vs2, \ | ||
737 | CPURISCVState *env, uint32_t desc) \ | ||
738 | { \ | ||
739 | uint32_t vm = vext_vm(desc); \ | ||
740 | uint32_t vl = env->vl; \ | ||
741 | + uint32_t total_elems = \ | ||
742 | + vext_get_total_elems(env, desc, ESZ); \ | ||
743 | + uint32_t vta = vext_vta(desc); \ | ||
744 | uint32_t i; \ | ||
745 | \ | ||
746 | for (i = env->vstart; i < vl; i++) { \ | ||
747 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, \ | ||
748 | do_##NAME(vd, vs2, i); \ | ||
749 | } \ | ||
750 | env->vstart = 0; \ | ||
751 | + /* set tail elements to 1s */ \ | ||
752 | + vext_set_elems_1s(vd, vta, vl * ESZ, \ | ||
753 | + total_elems * ESZ); \ | ||
754 | } | ||
755 | |||
756 | target_ulong fclass_h(uint64_t frs1) | ||
757 | @@ -XXX,XX +XXX,XX @@ target_ulong fclass_d(uint64_t frs1) | ||
758 | RVVCALL(OPIVV1, vfclass_v_h, OP_UU_H, H2, H2, fclass_h) | ||
759 | RVVCALL(OPIVV1, vfclass_v_w, OP_UU_W, H4, H4, fclass_s) | ||
760 | RVVCALL(OPIVV1, vfclass_v_d, OP_UU_D, H8, H8, fclass_d) | ||
761 | -GEN_VEXT_V(vfclass_v_h) | ||
762 | -GEN_VEXT_V(vfclass_v_w) | ||
763 | -GEN_VEXT_V(vfclass_v_d) | ||
764 | +GEN_VEXT_V(vfclass_v_h, 2) | ||
765 | +GEN_VEXT_V(vfclass_v_w, 4) | ||
766 | +GEN_VEXT_V(vfclass_v_d, 8) | ||
767 | |||
768 | /* Vector Floating-Point Merge Instruction */ | ||
769 | + | ||
770 | #define GEN_VFMERGE_VF(NAME, ETYPE, H) \ | ||
771 | void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void *vs2, \ | ||
772 | CPURISCVState *env, uint32_t desc) \ | ||
773 | { \ | ||
774 | uint32_t vm = vext_vm(desc); \ | ||
775 | uint32_t vl = env->vl; \ | ||
776 | + uint32_t esz = sizeof(ETYPE); \ | ||
777 | + uint32_t total_elems = \ | ||
778 | + vext_get_total_elems(env, desc, esz); \ | ||
779 | + uint32_t vta = vext_vta(desc); \ | ||
780 | uint32_t i; \ | ||
781 | \ | ||
782 | for (i = env->vstart; i < vl; i++) { \ | ||
783 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void *vs2, \ | ||
784 | = (!vm && !vext_elem_mask(v0, i) ? s2 : s1); \ | ||
785 | } \ | ||
786 | env->vstart = 0; \ | ||
787 | + /* set tail elements to 1s */ \ | ||
788 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
789 | } | ||
790 | |||
791 | GEN_VFMERGE_VF(vfmerge_vfm_h, int16_t, H2) | ||
792 | @@ -XXX,XX +XXX,XX @@ GEN_VFMERGE_VF(vfmerge_vfm_d, int64_t, H8) | ||
793 | RVVCALL(OPFVV1, vfcvt_xu_f_v_h, OP_UU_H, H2, H2, float16_to_uint16) | ||
794 | RVVCALL(OPFVV1, vfcvt_xu_f_v_w, OP_UU_W, H4, H4, float32_to_uint32) | ||
795 | RVVCALL(OPFVV1, vfcvt_xu_f_v_d, OP_UU_D, H8, H8, float64_to_uint64) | ||
796 | -GEN_VEXT_V_ENV(vfcvt_xu_f_v_h) | ||
797 | -GEN_VEXT_V_ENV(vfcvt_xu_f_v_w) | ||
798 | -GEN_VEXT_V_ENV(vfcvt_xu_f_v_d) | ||
799 | +GEN_VEXT_V_ENV(vfcvt_xu_f_v_h, 2) | ||
800 | +GEN_VEXT_V_ENV(vfcvt_xu_f_v_w, 4) | ||
801 | +GEN_VEXT_V_ENV(vfcvt_xu_f_v_d, 8) | ||
802 | |||
803 | /* vfcvt.x.f.v vd, vs2, vm # Convert float to signed integer. */ | ||
804 | RVVCALL(OPFVV1, vfcvt_x_f_v_h, OP_UU_H, H2, H2, float16_to_int16) | ||
805 | RVVCALL(OPFVV1, vfcvt_x_f_v_w, OP_UU_W, H4, H4, float32_to_int32) | ||
806 | RVVCALL(OPFVV1, vfcvt_x_f_v_d, OP_UU_D, H8, H8, float64_to_int64) | ||
807 | -GEN_VEXT_V_ENV(vfcvt_x_f_v_h) | ||
808 | -GEN_VEXT_V_ENV(vfcvt_x_f_v_w) | ||
809 | -GEN_VEXT_V_ENV(vfcvt_x_f_v_d) | ||
810 | +GEN_VEXT_V_ENV(vfcvt_x_f_v_h, 2) | ||
811 | +GEN_VEXT_V_ENV(vfcvt_x_f_v_w, 4) | ||
812 | +GEN_VEXT_V_ENV(vfcvt_x_f_v_d, 8) | ||
813 | |||
814 | /* vfcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to float. */ | ||
815 | RVVCALL(OPFVV1, vfcvt_f_xu_v_h, OP_UU_H, H2, H2, uint16_to_float16) | ||
816 | RVVCALL(OPFVV1, vfcvt_f_xu_v_w, OP_UU_W, H4, H4, uint32_to_float32) | ||
817 | RVVCALL(OPFVV1, vfcvt_f_xu_v_d, OP_UU_D, H8, H8, uint64_to_float64) | ||
818 | -GEN_VEXT_V_ENV(vfcvt_f_xu_v_h) | ||
819 | -GEN_VEXT_V_ENV(vfcvt_f_xu_v_w) | ||
820 | -GEN_VEXT_V_ENV(vfcvt_f_xu_v_d) | ||
821 | +GEN_VEXT_V_ENV(vfcvt_f_xu_v_h, 2) | ||
822 | +GEN_VEXT_V_ENV(vfcvt_f_xu_v_w, 4) | ||
823 | +GEN_VEXT_V_ENV(vfcvt_f_xu_v_d, 8) | ||
824 | |||
825 | /* vfcvt.f.x.v vd, vs2, vm # Convert integer to float. */ | ||
826 | RVVCALL(OPFVV1, vfcvt_f_x_v_h, OP_UU_H, H2, H2, int16_to_float16) | ||
827 | RVVCALL(OPFVV1, vfcvt_f_x_v_w, OP_UU_W, H4, H4, int32_to_float32) | ||
828 | RVVCALL(OPFVV1, vfcvt_f_x_v_d, OP_UU_D, H8, H8, int64_to_float64) | ||
829 | -GEN_VEXT_V_ENV(vfcvt_f_x_v_h) | ||
830 | -GEN_VEXT_V_ENV(vfcvt_f_x_v_w) | ||
831 | -GEN_VEXT_V_ENV(vfcvt_f_x_v_d) | ||
832 | +GEN_VEXT_V_ENV(vfcvt_f_x_v_h, 2) | ||
833 | +GEN_VEXT_V_ENV(vfcvt_f_x_v_w, 4) | ||
834 | +GEN_VEXT_V_ENV(vfcvt_f_x_v_d, 8) | ||
835 | |||
836 | /* Widening Floating-Point/Integer Type-Convert Instructions */ | ||
837 | /* (TD, T2, TX2) */ | ||
838 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_V_ENV(vfcvt_f_x_v_d) | ||
839 | /* vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer.*/ | ||
840 | RVVCALL(OPFVV1, vfwcvt_xu_f_v_h, WOP_UU_H, H4, H2, float16_to_uint32) | ||
841 | RVVCALL(OPFVV1, vfwcvt_xu_f_v_w, WOP_UU_W, H8, H4, float32_to_uint64) | ||
842 | -GEN_VEXT_V_ENV(vfwcvt_xu_f_v_h) | ||
843 | -GEN_VEXT_V_ENV(vfwcvt_xu_f_v_w) | ||
844 | +GEN_VEXT_V_ENV(vfwcvt_xu_f_v_h, 4) | ||
845 | +GEN_VEXT_V_ENV(vfwcvt_xu_f_v_w, 8) | ||
846 | |||
847 | /* vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. */ | ||
848 | RVVCALL(OPFVV1, vfwcvt_x_f_v_h, WOP_UU_H, H4, H2, float16_to_int32) | ||
849 | RVVCALL(OPFVV1, vfwcvt_x_f_v_w, WOP_UU_W, H8, H4, float32_to_int64) | ||
850 | -GEN_VEXT_V_ENV(vfwcvt_x_f_v_h) | ||
851 | -GEN_VEXT_V_ENV(vfwcvt_x_f_v_w) | ||
852 | +GEN_VEXT_V_ENV(vfwcvt_x_f_v_h, 4) | ||
853 | +GEN_VEXT_V_ENV(vfwcvt_x_f_v_w, 8) | ||
854 | |||
855 | /* vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float */ | ||
856 | RVVCALL(OPFVV1, vfwcvt_f_xu_v_b, WOP_UU_B, H2, H1, uint8_to_float16) | ||
857 | RVVCALL(OPFVV1, vfwcvt_f_xu_v_h, WOP_UU_H, H4, H2, uint16_to_float32) | ||
858 | RVVCALL(OPFVV1, vfwcvt_f_xu_v_w, WOP_UU_W, H8, H4, uint32_to_float64) | ||
859 | -GEN_VEXT_V_ENV(vfwcvt_f_xu_v_b) | ||
860 | -GEN_VEXT_V_ENV(vfwcvt_f_xu_v_h) | ||
861 | -GEN_VEXT_V_ENV(vfwcvt_f_xu_v_w) | ||
862 | +GEN_VEXT_V_ENV(vfwcvt_f_xu_v_b, 2) | ||
863 | +GEN_VEXT_V_ENV(vfwcvt_f_xu_v_h, 4) | ||
864 | +GEN_VEXT_V_ENV(vfwcvt_f_xu_v_w, 8) | ||
865 | |||
866 | /* vfwcvt.f.x.v vd, vs2, vm # Convert integer to double-width float. */ | ||
867 | RVVCALL(OPFVV1, vfwcvt_f_x_v_b, WOP_UU_B, H2, H1, int8_to_float16) | ||
868 | RVVCALL(OPFVV1, vfwcvt_f_x_v_h, WOP_UU_H, H4, H2, int16_to_float32) | ||
869 | RVVCALL(OPFVV1, vfwcvt_f_x_v_w, WOP_UU_W, H8, H4, int32_to_float64) | ||
870 | -GEN_VEXT_V_ENV(vfwcvt_f_x_v_b) | ||
871 | -GEN_VEXT_V_ENV(vfwcvt_f_x_v_h) | ||
872 | -GEN_VEXT_V_ENV(vfwcvt_f_x_v_w) | ||
873 | +GEN_VEXT_V_ENV(vfwcvt_f_x_v_b, 2) | ||
874 | +GEN_VEXT_V_ENV(vfwcvt_f_x_v_h, 4) | ||
875 | +GEN_VEXT_V_ENV(vfwcvt_f_x_v_w, 8) | ||
876 | |||
877 | /* | ||
878 | * vfwcvt.f.f.v vd, vs2, vm | ||
879 | @@ -XXX,XX +XXX,XX @@ static uint32_t vfwcvtffv16(uint16_t a, float_status *s) | ||
880 | |||
881 | RVVCALL(OPFVV1, vfwcvt_f_f_v_h, WOP_UU_H, H4, H2, vfwcvtffv16) | ||
882 | RVVCALL(OPFVV1, vfwcvt_f_f_v_w, WOP_UU_W, H8, H4, float32_to_float64) | ||
883 | -GEN_VEXT_V_ENV(vfwcvt_f_f_v_h) | ||
884 | -GEN_VEXT_V_ENV(vfwcvt_f_f_v_w) | ||
885 | +GEN_VEXT_V_ENV(vfwcvt_f_f_v_h, 4) | ||
886 | +GEN_VEXT_V_ENV(vfwcvt_f_f_v_w, 8) | ||
887 | |||
888 | /* Narrowing Floating-Point/Integer Type-Convert Instructions */ | ||
889 | /* (TD, T2, TX2) */ | ||
890 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_V_ENV(vfwcvt_f_f_v_w) | ||
891 | RVVCALL(OPFVV1, vfncvt_xu_f_w_b, NOP_UU_B, H1, H2, float16_to_uint8) | ||
892 | RVVCALL(OPFVV1, vfncvt_xu_f_w_h, NOP_UU_H, H2, H4, float32_to_uint16) | ||
893 | RVVCALL(OPFVV1, vfncvt_xu_f_w_w, NOP_UU_W, H4, H8, float64_to_uint32) | ||
894 | -GEN_VEXT_V_ENV(vfncvt_xu_f_w_b) | ||
895 | -GEN_VEXT_V_ENV(vfncvt_xu_f_w_h) | ||
896 | -GEN_VEXT_V_ENV(vfncvt_xu_f_w_w) | ||
897 | +GEN_VEXT_V_ENV(vfncvt_xu_f_w_b, 1) | ||
898 | +GEN_VEXT_V_ENV(vfncvt_xu_f_w_h, 2) | ||
899 | +GEN_VEXT_V_ENV(vfncvt_xu_f_w_w, 4) | ||
900 | |||
901 | /* vfncvt.x.f.v vd, vs2, vm # Convert double-width float to signed integer. */ | ||
902 | RVVCALL(OPFVV1, vfncvt_x_f_w_b, NOP_UU_B, H1, H2, float16_to_int8) | ||
903 | RVVCALL(OPFVV1, vfncvt_x_f_w_h, NOP_UU_H, H2, H4, float32_to_int16) | ||
904 | RVVCALL(OPFVV1, vfncvt_x_f_w_w, NOP_UU_W, H4, H8, float64_to_int32) | ||
905 | -GEN_VEXT_V_ENV(vfncvt_x_f_w_b) | ||
906 | -GEN_VEXT_V_ENV(vfncvt_x_f_w_h) | ||
907 | -GEN_VEXT_V_ENV(vfncvt_x_f_w_w) | ||
908 | +GEN_VEXT_V_ENV(vfncvt_x_f_w_b, 1) | ||
909 | +GEN_VEXT_V_ENV(vfncvt_x_f_w_h, 2) | ||
910 | +GEN_VEXT_V_ENV(vfncvt_x_f_w_w, 4) | ||
911 | |||
912 | /* vfncvt.f.xu.v vd, vs2, vm # Convert double-width unsigned integer to float */ | ||
913 | RVVCALL(OPFVV1, vfncvt_f_xu_w_h, NOP_UU_H, H2, H4, uint32_to_float16) | ||
914 | RVVCALL(OPFVV1, vfncvt_f_xu_w_w, NOP_UU_W, H4, H8, uint64_to_float32) | ||
915 | -GEN_VEXT_V_ENV(vfncvt_f_xu_w_h) | ||
916 | -GEN_VEXT_V_ENV(vfncvt_f_xu_w_w) | ||
917 | +GEN_VEXT_V_ENV(vfncvt_f_xu_w_h, 2) | ||
918 | +GEN_VEXT_V_ENV(vfncvt_f_xu_w_w, 4) | ||
919 | |||
920 | /* vfncvt.f.x.v vd, vs2, vm # Convert double-width integer to float. */ | ||
921 | RVVCALL(OPFVV1, vfncvt_f_x_w_h, NOP_UU_H, H2, H4, int32_to_float16) | ||
922 | RVVCALL(OPFVV1, vfncvt_f_x_w_w, NOP_UU_W, H4, H8, int64_to_float32) | ||
923 | -GEN_VEXT_V_ENV(vfncvt_f_x_w_h) | ||
924 | -GEN_VEXT_V_ENV(vfncvt_f_x_w_w) | ||
925 | +GEN_VEXT_V_ENV(vfncvt_f_x_w_h, 2) | ||
926 | +GEN_VEXT_V_ENV(vfncvt_f_x_w_w, 4) | ||
927 | |||
928 | /* vfncvt.f.f.v vd, vs2, vm # Convert double float to single-width float. */ | ||
929 | static uint16_t vfncvtffv16(uint32_t a, float_status *s) | ||
930 | @@ -XXX,XX +XXX,XX @@ static uint16_t vfncvtffv16(uint32_t a, float_status *s) | ||
931 | |||
932 | RVVCALL(OPFVV1, vfncvt_f_f_w_h, NOP_UU_H, H2, H4, vfncvtffv16) | ||
933 | RVVCALL(OPFVV1, vfncvt_f_f_w_w, NOP_UU_W, H4, H8, float64_to_float32) | ||
934 | -GEN_VEXT_V_ENV(vfncvt_f_f_w_h) | ||
935 | -GEN_VEXT_V_ENV(vfncvt_f_f_w_w) | ||
936 | +GEN_VEXT_V_ENV(vfncvt_f_f_w_h, 2) | ||
937 | +GEN_VEXT_V_ENV(vfncvt_f_f_w_w, 4) | ||
938 | |||
939 | /* | ||
940 | *** Vector Reduction Operations | ||
941 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc | ||
942 | index XXXXXXX..XXXXXXX 100644 | ||
943 | --- a/target/riscv/insn_trans/trans_rvv.c.inc | ||
944 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc | ||
945 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
946 | \ | ||
947 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
948 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
949 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
950 | + data = \ | ||
951 | + FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s);\ | ||
952 | tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
953 | vreg_ofs(s, a->rs1), \ | ||
954 | vreg_ofs(s, a->rs2), cpu_env, \ | ||
955 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
956 | gen_set_rm(s, RISCV_FRM_DYN); \ | ||
957 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
958 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
959 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
960 | + data = FIELD_DP32(data, VDATA, VTA_ALL_1S, \ | ||
961 | + s->cfg_vta_all_1s); \ | ||
962 | return opfvf_trans(a->rd, a->rs1, a->rs2, data, \ | ||
963 | fns[s->sew - 1], s); \ | ||
964 | } \ | ||
965 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
966 | \ | ||
967 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
968 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
969 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
970 | tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
971 | vreg_ofs(s, a->rs1), \ | ||
972 | vreg_ofs(s, a->rs2), cpu_env, \ | ||
973 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
974 | gen_set_rm(s, RISCV_FRM_DYN); \ | ||
975 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
976 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
977 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
978 | return opfvf_trans(a->rd, a->rs1, a->rs2, data, \ | ||
979 | fns[s->sew - 1], s); \ | ||
980 | } \ | ||
981 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
982 | \ | ||
983 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
984 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
985 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
986 | tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
987 | vreg_ofs(s, a->rs1), \ | ||
988 | vreg_ofs(s, a->rs2), cpu_env, \ | ||
989 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
990 | gen_set_rm(s, RISCV_FRM_DYN); \ | ||
991 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
992 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
993 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
994 | return opfvf_trans(a->rd, a->rs1, a->rs2, data, \ | ||
995 | fns[s->sew - 1], s); \ | ||
996 | } \ | ||
997 | @@ -XXX,XX +XXX,XX @@ static bool do_opfv(DisasContext *s, arg_rmr *a, | ||
998 | |||
999 | data = FIELD_DP32(data, VDATA, VM, a->vm); | ||
1000 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | ||
1001 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
1002 | tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), | ||
1003 | vreg_ofs(s, a->rs2), cpu_env, | ||
1004 | s->cfg_ptr->vlen / 8, | ||
1005 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | ||
1006 | \ | ||
1007 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
1008 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
1009 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
1010 | tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
1011 | vreg_ofs(s, a->rs2), cpu_env, \ | ||
1012 | s->cfg_ptr->vlen / 8, \ | ||
1013 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | ||
1014 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
1015 | \ | ||
1016 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
1017 | + data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
1018 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
1019 | tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
1020 | vreg_ofs(s, a->rs2), cpu_env, \ | ||
1021 | s->cfg_ptr->vlen / 8, \ | ||
1022 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | ||
1023 | \ | ||
1024 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
1025 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
1026 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
1027 | tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
1028 | vreg_ofs(s, a->rs2), cpu_env, \ | ||
1029 | s->cfg_ptr->vlen / 8, \ | ||
1030 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a) \ | ||
1031 | tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); \ | ||
1032 | \ | ||
1033 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
1034 | + data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
1035 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
1036 | tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
1037 | vreg_ofs(s, a->rs2), cpu_env, \ | ||
1038 | s->cfg_ptr->vlen / 8, \ | ||
1039 | -- | 291 | -- |
1040 | 2.36.1 | 292 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Rob Bradford <rbradford@rivosinc.com> | ||
1 | 2 | ||
3 | These are WARL fields - zero out the bits for unavailable counters and | ||
4 | special case the TM bit in mcountinhibit which is hardwired to zero. | ||
5 | This patch achieves this by modifying the value written so that any use | ||
6 | of the field will see the correctly masked bits. | ||
7 | |||
8 | Tested by modifying OpenSBI to write max value to these CSRs and upon | ||
9 | subsequent read the appropriate number of bits for number of PMUs is | ||
10 | enabled and the TM bit is zero in mcountinhibit. | ||
11 | |||
12 | Signed-off-by: Rob Bradford <rbradford@rivosinc.com> | ||
13 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | ||
14 | Reviewed-by: Atish Patra <atishp@rivosinc.com> | ||
15 | Message-ID: <20230802124906.24197-1-rbradford@rivosinc.com> | ||
16 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
17 | --- | ||
18 | target/riscv/csr.c | 11 +++++++++-- | ||
19 | 1 file changed, 9 insertions(+), 2 deletions(-) | ||
20 | |||
21 | diff --git a/target/riscv/csr.c b/target/riscv/csr.c | ||
22 | index XXXXXXX..XXXXXXX 100644 | ||
23 | --- a/target/riscv/csr.c | ||
24 | +++ b/target/riscv/csr.c | ||
25 | @@ -XXX,XX +XXX,XX @@ static RISCVException write_mcountinhibit(CPURISCVState *env, int csrno, | ||
26 | { | ||
27 | int cidx; | ||
28 | PMUCTRState *counter; | ||
29 | + RISCVCPU *cpu = env_archcpu(env); | ||
30 | |||
31 | - env->mcountinhibit = val; | ||
32 | + /* WARL register - disable unavailable counters; TM bit is always 0 */ | ||
33 | + env->mcountinhibit = | ||
34 | + val & (cpu->pmu_avail_ctrs | COUNTEREN_CY | COUNTEREN_IR); | ||
35 | |||
36 | /* Check if any other counter is also monitoring cycles/instructions */ | ||
37 | for (cidx = 0; cidx < RV_MAX_MHPMCOUNTERS; cidx++) { | ||
38 | @@ -XXX,XX +XXX,XX @@ static RISCVException read_mcounteren(CPURISCVState *env, int csrno, | ||
39 | static RISCVException write_mcounteren(CPURISCVState *env, int csrno, | ||
40 | target_ulong val) | ||
41 | { | ||
42 | - env->mcounteren = val; | ||
43 | + RISCVCPU *cpu = env_archcpu(env); | ||
44 | + | ||
45 | + /* WARL register - disable unavailable counters */ | ||
46 | + env->mcounteren = val & (cpu->pmu_avail_ctrs | COUNTEREN_CY | COUNTEREN_TM | | ||
47 | + COUNTEREN_IR); | ||
48 | return RISCV_EXCP_NONE; | ||
49 | } | ||
50 | |||
51 | -- | ||
52 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Jason Chien <jason.chien@sifive.com> | ||
1 | 2 | ||
3 | RVA23 Profiles states: | ||
4 | The RVA23 profiles are intended to be used for 64-bit application | ||
5 | processors that will run rich OS stacks from standard binary OS | ||
6 | distributions and with a substantial number of third-party binary user | ||
7 | applications that will be supported over a considerable length of time | ||
8 | in the field. | ||
9 | |||
10 | The chapter 4 of the unprivileged spec introduces the Zihintntl extension | ||
11 | and Zihintntl is a mandatory extension presented in RVA23 Profiles, whose | ||
12 | purpose is to enable application and operating system portability across | ||
13 | different implementations. Thus the DTS should contain the Zihintntl ISA | ||
14 | string in order to pass to software. | ||
15 | |||
16 | The unprivileged spec states: | ||
17 | Like any HINTs, these instructions may be freely ignored. Hence, although | ||
18 | they are described in terms of cache-based memory hierarchies, they do not | ||
19 | mandate the provision of caches. | ||
20 | |||
21 | These instructions are encoded with non-used opcode, e.g. ADD x0, x0, x2, | ||
22 | which QEMU already supports, and QEMU does not emulate cache. Therefore | ||
23 | these instructions can be considered as a no-op, and we only need to add | ||
24 | a new property for the Zihintntl extension. | ||
25 | |||
26 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | ||
27 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | ||
28 | Signed-off-by: Jason Chien <jason.chien@sifive.com> | ||
29 | Message-ID: <20230726074049.19505-2-jason.chien@sifive.com> | ||
30 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
31 | --- | ||
32 | target/riscv/cpu_cfg.h | 1 + | ||
33 | target/riscv/cpu.c | 2 ++ | ||
34 | 2 files changed, 3 insertions(+) | ||
35 | |||
36 | diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h | ||
37 | index XXXXXXX..XXXXXXX 100644 | ||
38 | --- a/target/riscv/cpu_cfg.h | ||
39 | +++ b/target/riscv/cpu_cfg.h | ||
40 | @@ -XXX,XX +XXX,XX @@ struct RISCVCPUConfig { | ||
41 | bool ext_icbom; | ||
42 | bool ext_icboz; | ||
43 | bool ext_zicond; | ||
44 | + bool ext_zihintntl; | ||
45 | bool ext_zihintpause; | ||
46 | bool ext_smstateen; | ||
47 | bool ext_sstc; | ||
48 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | ||
49 | index XXXXXXX..XXXXXXX 100644 | ||
50 | --- a/target/riscv/cpu.c | ||
51 | +++ b/target/riscv/cpu.c | ||
52 | @@ -XXX,XX +XXX,XX @@ static const struct isa_ext_data isa_edata_arr[] = { | ||
53 | ISA_EXT_DATA_ENTRY(zicond, PRIV_VERSION_1_12_0, ext_zicond), | ||
54 | ISA_EXT_DATA_ENTRY(zicsr, PRIV_VERSION_1_10_0, ext_icsr), | ||
55 | ISA_EXT_DATA_ENTRY(zifencei, PRIV_VERSION_1_10_0, ext_ifencei), | ||
56 | + ISA_EXT_DATA_ENTRY(zihintntl, PRIV_VERSION_1_10_0, ext_zihintntl), | ||
57 | ISA_EXT_DATA_ENTRY(zihintpause, PRIV_VERSION_1_10_0, ext_zihintpause), | ||
58 | ISA_EXT_DATA_ENTRY(zmmul, PRIV_VERSION_1_12_0, ext_zmmul), | ||
59 | ISA_EXT_DATA_ENTRY(zawrs, PRIV_VERSION_1_12_0, ext_zawrs), | ||
60 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_extensions[] = { | ||
61 | DEFINE_PROP_BOOL("sscofpmf", RISCVCPU, cfg.ext_sscofpmf, false), | ||
62 | DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true), | ||
63 | DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true), | ||
64 | + DEFINE_PROP_BOOL("Zihintntl", RISCVCPU, cfg.ext_zihintntl, true), | ||
65 | DEFINE_PROP_BOOL("Zihintpause", RISCVCPU, cfg.ext_zihintpause, true), | ||
66 | DEFINE_PROP_BOOL("Zawrs", RISCVCPU, cfg.ext_zawrs, true), | ||
67 | DEFINE_PROP_BOOL("Zfa", RISCVCPU, cfg.ext_zfa, true), | ||
68 | -- | ||
69 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> | ||
1 | 2 | ||
3 | Commit a47842d ("riscv: Add support for the Zfa extension") implemented the zfa extension. | ||
4 | However, it has some typos for fleq.d and fltq.d. Both of them misused the fltq.s | ||
5 | helper function. | ||
6 | |||
7 | Fixes: a47842d ("riscv: Add support for the Zfa extension") | ||
8 | Signed-off-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> | ||
9 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
10 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | ||
11 | Message-ID: <20230728003906.768-1-zhiwei_liu@linux.alibaba.com> | ||
12 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
13 | --- | ||
14 | target/riscv/insn_trans/trans_rvzfa.c.inc | 4 ++-- | ||
15 | 1 file changed, 2 insertions(+), 2 deletions(-) | ||
16 | |||
17 | diff --git a/target/riscv/insn_trans/trans_rvzfa.c.inc b/target/riscv/insn_trans/trans_rvzfa.c.inc | ||
18 | index XXXXXXX..XXXXXXX 100644 | ||
19 | --- a/target/riscv/insn_trans/trans_rvzfa.c.inc | ||
20 | +++ b/target/riscv/insn_trans/trans_rvzfa.c.inc | ||
21 | @@ -XXX,XX +XXX,XX @@ bool trans_fleq_d(DisasContext *ctx, arg_fleq_d *a) | ||
22 | TCGv_i64 src1 = get_fpr_hs(ctx, a->rs1); | ||
23 | TCGv_i64 src2 = get_fpr_hs(ctx, a->rs2); | ||
24 | |||
25 | - gen_helper_fltq_s(dest, cpu_env, src1, src2); | ||
26 | + gen_helper_fleq_d(dest, cpu_env, src1, src2); | ||
27 | gen_set_gpr(ctx, a->rd, dest); | ||
28 | return true; | ||
29 | } | ||
30 | @@ -XXX,XX +XXX,XX @@ bool trans_fltq_d(DisasContext *ctx, arg_fltq_d *a) | ||
31 | TCGv_i64 src1 = get_fpr_hs(ctx, a->rs1); | ||
32 | TCGv_i64 src2 = get_fpr_hs(ctx, a->rs2); | ||
33 | |||
34 | - gen_helper_fltq_s(dest, cpu_env, src1, src2); | ||
35 | + gen_helper_fltq_d(dest, cpu_env, src1, src2); | ||
36 | gen_set_gpr(ctx, a->rd, dest); | ||
37 | return true; | ||
38 | } | ||
39 | -- | ||
40 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Jason Chien <jason.chien@sifive.com> | ||
1 | 2 | ||
3 | When writing the upper mtime, we should keep the original lower mtime | ||
4 | whose value is given by cpu_riscv_read_rtc() instead of | ||
5 | cpu_riscv_read_rtc_raw(). The same logic applies to writes to lower mtime. | ||
6 | |||
7 | Signed-off-by: Jason Chien <jason.chien@sifive.com> | ||
8 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | ||
9 | Message-ID: <20230728082502.26439-1-jason.chien@sifive.com> | ||
10 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
11 | --- | ||
12 | hw/intc/riscv_aclint.c | 5 +++-- | ||
13 | 1 file changed, 3 insertions(+), 2 deletions(-) | ||
14 | |||
15 | diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c | ||
16 | index XXXXXXX..XXXXXXX 100644 | ||
17 | --- a/hw/intc/riscv_aclint.c | ||
18 | +++ b/hw/intc/riscv_aclint.c | ||
19 | @@ -XXX,XX +XXX,XX @@ static void riscv_aclint_mtimer_write(void *opaque, hwaddr addr, | ||
20 | return; | ||
21 | } else if (addr == mtimer->time_base || addr == mtimer->time_base + 4) { | ||
22 | uint64_t rtc_r = cpu_riscv_read_rtc_raw(mtimer->timebase_freq); | ||
23 | + uint64_t rtc = cpu_riscv_read_rtc(mtimer); | ||
24 | |||
25 | if (addr == mtimer->time_base) { | ||
26 | if (size == 4) { | ||
27 | /* time_lo for RV32/RV64 */ | ||
28 | - mtimer->time_delta = ((rtc_r & ~0xFFFFFFFFULL) | value) - rtc_r; | ||
29 | + mtimer->time_delta = ((rtc & ~0xFFFFFFFFULL) | value) - rtc_r; | ||
30 | } else { | ||
31 | /* time for RV64 */ | ||
32 | mtimer->time_delta = value - rtc_r; | ||
33 | @@ -XXX,XX +XXX,XX @@ static void riscv_aclint_mtimer_write(void *opaque, hwaddr addr, | ||
34 | } else { | ||
35 | if (size == 4) { | ||
36 | /* time_hi for RV32/RV64 */ | ||
37 | - mtimer->time_delta = (value << 32 | (rtc_r & 0xFFFFFFFF)) - rtc_r; | ||
38 | + mtimer->time_delta = (value << 32 | (rtc & 0xFFFFFFFF)) - rtc_r; | ||
39 | } else { | ||
40 | qemu_log_mask(LOG_GUEST_ERROR, | ||
41 | "aclint-mtimer: invalid time_hi write: %08x", | ||
42 | -- | ||
43 | 2.41.0 | diff view generated by jsdifflib |
1 | From: Alistair Francis <alistair.francis@wdc.com> | 1 | From: Jason Chien <jason.chien@sifive.com> |
---|---|---|---|
2 | 2 | ||
3 | The variables whose values are given by cpu_riscv_read_rtc() should be named | ||
4 | "rtc". The variables whose value are given by cpu_riscv_read_rtc_raw() | ||
5 | should be named "rtc_r". | ||
6 | |||
7 | Signed-off-by: Jason Chien <jason.chien@sifive.com> | ||
8 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | ||
9 | Message-ID: <20230728082502.26439-2-jason.chien@sifive.com> | ||
3 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 10 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
4 | Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> | ||
5 | Message-Id: <20220509091339.26016-1-alistair.francis@wdc.com> | ||
6 | --- | 11 | --- |
7 | MAINTAINERS | 1 + | 12 | hw/intc/riscv_aclint.c | 6 +++--- |
8 | 1 file changed, 1 insertion(+) | 13 | 1 file changed, 3 insertions(+), 3 deletions(-) |
9 | 14 | ||
10 | diff --git a/MAINTAINERS b/MAINTAINERS | 15 | diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c |
11 | index XXXXXXX..XXXXXXX 100644 | 16 | index XXXXXXX..XXXXXXX 100644 |
12 | --- a/MAINTAINERS | 17 | --- a/hw/intc/riscv_aclint.c |
13 | +++ b/MAINTAINERS | 18 | +++ b/hw/intc/riscv_aclint.c |
14 | @@ -XXX,XX +XXX,XX @@ Generic Loader | 19 | @@ -XXX,XX +XXX,XX @@ static void riscv_aclint_mtimer_write_timecmp(RISCVAclintMTimerState *mtimer, |
15 | M: Alistair Francis <alistair@alistair23.me> | 20 | uint64_t next; |
16 | S: Maintained | 21 | uint64_t diff; |
17 | F: hw/core/generic-loader.c | 22 | |
18 | +F: hw/core/uboot_image.h | 23 | - uint64_t rtc_r = cpu_riscv_read_rtc(mtimer); |
19 | F: include/hw/core/generic-loader.h | 24 | + uint64_t rtc = cpu_riscv_read_rtc(mtimer); |
20 | F: docs/system/generic-loader.rst | 25 | |
26 | /* Compute the relative hartid w.r.t the socket */ | ||
27 | hartid = hartid - mtimer->hartid_base; | ||
28 | |||
29 | mtimer->timecmp[hartid] = value; | ||
30 | - if (mtimer->timecmp[hartid] <= rtc_r) { | ||
31 | + if (mtimer->timecmp[hartid] <= rtc) { | ||
32 | /* | ||
33 | * If we're setting an MTIMECMP value in the "past", | ||
34 | * immediately raise the timer interrupt | ||
35 | @@ -XXX,XX +XXX,XX @@ static void riscv_aclint_mtimer_write_timecmp(RISCVAclintMTimerState *mtimer, | ||
36 | |||
37 | /* otherwise, set up the future timer interrupt */ | ||
38 | qemu_irq_lower(mtimer->timer_irqs[hartid]); | ||
39 | - diff = mtimer->timecmp[hartid] - rtc_r; | ||
40 | + diff = mtimer->timecmp[hartid] - rtc; | ||
41 | /* back to ns (note args switched in muldiv64) */ | ||
42 | uint64_t ns_diff = muldiv64(diff, NANOSECONDS_PER_SECOND, timebase_freq); | ||
21 | 43 | ||
22 | -- | 44 | -- |
23 | 2.36.1 | 45 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> | ||
1 | 2 | ||
3 | We should not use types dependend on host arch for target_ucontext. | ||
4 | This bug is found when run rv32 applications. | ||
5 | |||
6 | Signed-off-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> | ||
7 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
8 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
9 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
10 | Message-ID: <20230811055438.1945-1-zhiwei_liu@linux.alibaba.com> | ||
11 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
12 | --- | ||
13 | linux-user/riscv/signal.c | 4 ++-- | ||
14 | 1 file changed, 2 insertions(+), 2 deletions(-) | ||
15 | |||
16 | diff --git a/linux-user/riscv/signal.c b/linux-user/riscv/signal.c | ||
17 | index XXXXXXX..XXXXXXX 100644 | ||
18 | --- a/linux-user/riscv/signal.c | ||
19 | +++ b/linux-user/riscv/signal.c | ||
20 | @@ -XXX,XX +XXX,XX @@ struct target_sigcontext { | ||
21 | }; /* cf. riscv-linux:arch/riscv/include/uapi/asm/ptrace.h */ | ||
22 | |||
23 | struct target_ucontext { | ||
24 | - unsigned long uc_flags; | ||
25 | - struct target_ucontext *uc_link; | ||
26 | + abi_ulong uc_flags; | ||
27 | + abi_ptr uc_link; | ||
28 | target_stack_t uc_stack; | ||
29 | target_sigset_t uc_sigmask; | ||
30 | uint8_t __unused[1024 / 8 - sizeof(target_sigset_t)]; | ||
31 | -- | ||
32 | 2.41.0 | ||
33 | |||
34 | diff view generated by jsdifflib |
1 | From: Jamie Iles <jamie@nuviainc.com> | 1 | From: Yong-Xuan Wang <yongxuan.wang@sifive.com> |
---|---|---|---|
2 | 2 | ||
3 | Various loader functions return an int which limits images to 2GB which | 3 | In this patch, we create the APLIC and IMSIC FDT helper functions and |
4 | is fine for things like a BIOS/kernel image, but if we want to be able | 4 | remove M mode AIA devices when using KVM acceleration. |
5 | to load memory images or large ramdisks then any file over 2GB would | ||
6 | silently fail to load. | ||
7 | 5 | ||
8 | Cc: Luc Michel <lmichel@kalray.eu> | 6 | Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> |
9 | Signed-off-by: Jamie Iles <jamie@nuviainc.com> | 7 | Reviewed-by: Jim Shu <jim.shu@sifive.com> |
10 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | 8 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> |
11 | Reviewed-by: Luc Michel <lmichel@kalray.eu> | 9 | Reviewed-by: Andrew Jones <ajones@ventanamicro.com> |
12 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | 10 | Message-ID: <20230727102439.22554-2-yongxuan.wang@sifive.com> |
13 | Message-Id: <20211111141141.3295094-2-jamie@nuviainc.com> | ||
14 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 11 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
15 | --- | 12 | --- |
16 | include/hw/loader.h | 55 +++++++++++++-------------- | 13 | hw/riscv/virt.c | 290 +++++++++++++++++++++++------------------------- |
17 | hw/arm/armv7m.c | 2 +- | 14 | 1 file changed, 137 insertions(+), 153 deletions(-) |
18 | hw/arm/boot.c | 8 ++-- | ||
19 | hw/core/generic-loader.c | 2 +- | ||
20 | hw/core/loader.c | 81 +++++++++++++++++++++------------------- | ||
21 | hw/i386/x86.c | 2 +- | ||
22 | hw/riscv/boot.c | 5 ++- | ||
23 | 7 files changed, 80 insertions(+), 75 deletions(-) | ||
24 | 15 | ||
25 | diff --git a/include/hw/loader.h b/include/hw/loader.h | 16 | diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c |
26 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
27 | --- a/include/hw/loader.h | 18 | --- a/hw/riscv/virt.c |
28 | +++ b/include/hw/loader.h | 19 | +++ b/hw/riscv/virt.c |
29 | @@ -XXX,XX +XXX,XX @@ ssize_t load_image_size(const char *filename, void *addr, size_t size); | 20 | @@ -XXX,XX +XXX,XX @@ static uint32_t imsic_num_bits(uint32_t count) |
30 | * | ||
31 | * Returns the size of the loaded image on success, -1 otherwise. | ||
32 | */ | ||
33 | -int load_image_targphys_as(const char *filename, | ||
34 | - hwaddr addr, uint64_t max_sz, AddressSpace *as); | ||
35 | +ssize_t load_image_targphys_as(const char *filename, | ||
36 | + hwaddr addr, uint64_t max_sz, AddressSpace *as); | ||
37 | |||
38 | /**load_targphys_hex_as: | ||
39 | * @filename: Path to the .hex file | ||
40 | @@ -XXX,XX +XXX,XX @@ int load_image_targphys_as(const char *filename, | ||
41 | * | ||
42 | * Returns the size of the loaded .hex file on success, -1 otherwise. | ||
43 | */ | ||
44 | -int load_targphys_hex_as(const char *filename, hwaddr *entry, AddressSpace *as); | ||
45 | +ssize_t load_targphys_hex_as(const char *filename, hwaddr *entry, | ||
46 | + AddressSpace *as); | ||
47 | |||
48 | /** load_image_targphys: | ||
49 | * Same as load_image_targphys_as(), but doesn't allow the caller to specify | ||
50 | * an AddressSpace. | ||
51 | */ | ||
52 | -int load_image_targphys(const char *filename, hwaddr, | ||
53 | - uint64_t max_sz); | ||
54 | +ssize_t load_image_targphys(const char *filename, hwaddr, | ||
55 | + uint64_t max_sz); | ||
56 | |||
57 | /** | ||
58 | * load_image_mr: load an image into a memory region | ||
59 | @@ -XXX,XX +XXX,XX @@ int load_image_targphys(const char *filename, hwaddr, | ||
60 | * If the file is larger than the memory region's size the call will fail. | ||
61 | * Returns -1 on failure, or the size of the file. | ||
62 | */ | ||
63 | -int load_image_mr(const char *filename, MemoryRegion *mr); | ||
64 | +ssize_t load_image_mr(const char *filename, MemoryRegion *mr); | ||
65 | |||
66 | /* This is the limit on the maximum uncompressed image size that | ||
67 | * load_image_gzipped_buffer() and load_image_gzipped() will read. It prevents | ||
68 | @@ -XXX,XX +XXX,XX @@ int load_image_mr(const char *filename, MemoryRegion *mr); | ||
69 | */ | ||
70 | #define LOAD_IMAGE_MAX_GUNZIP_BYTES (256 << 20) | ||
71 | |||
72 | -int load_image_gzipped_buffer(const char *filename, uint64_t max_sz, | ||
73 | - uint8_t **buffer); | ||
74 | -int load_image_gzipped(const char *filename, hwaddr addr, uint64_t max_sz); | ||
75 | +ssize_t load_image_gzipped_buffer(const char *filename, uint64_t max_sz, | ||
76 | + uint8_t **buffer); | ||
77 | +ssize_t load_image_gzipped(const char *filename, hwaddr addr, uint64_t max_sz); | ||
78 | |||
79 | #define ELF_LOAD_FAILED -1 | ||
80 | #define ELF_LOAD_NOT_ELF -2 | ||
81 | @@ -XXX,XX +XXX,XX @@ ssize_t load_elf(const char *filename, | ||
82 | */ | ||
83 | void load_elf_hdr(const char *filename, void *hdr, bool *is64, Error **errp); | ||
84 | |||
85 | -int load_aout(const char *filename, hwaddr addr, int max_sz, | ||
86 | - int bswap_needed, hwaddr target_page_size); | ||
87 | +ssize_t load_aout(const char *filename, hwaddr addr, int max_sz, | ||
88 | + int bswap_needed, hwaddr target_page_size); | ||
89 | |||
90 | #define LOAD_UIMAGE_LOADADDR_INVALID (-1) | ||
91 | |||
92 | @@ -XXX,XX +XXX,XX @@ int load_aout(const char *filename, hwaddr addr, int max_sz, | ||
93 | * | ||
94 | * Returns the size of the loaded image on success, -1 otherwise. | ||
95 | */ | ||
96 | -int load_uimage_as(const char *filename, hwaddr *ep, | ||
97 | - hwaddr *loadaddr, int *is_linux, | ||
98 | - uint64_t (*translate_fn)(void *, uint64_t), | ||
99 | - void *translate_opaque, AddressSpace *as); | ||
100 | +ssize_t load_uimage_as(const char *filename, hwaddr *ep, | ||
101 | + hwaddr *loadaddr, int *is_linux, | ||
102 | + uint64_t (*translate_fn)(void *, uint64_t), | ||
103 | + void *translate_opaque, AddressSpace *as); | ||
104 | |||
105 | /** load_uimage: | ||
106 | * Same as load_uimage_as(), but doesn't allow the caller to specify an | ||
107 | * AddressSpace. | ||
108 | */ | ||
109 | -int load_uimage(const char *filename, hwaddr *ep, | ||
110 | - hwaddr *loadaddr, int *is_linux, | ||
111 | - uint64_t (*translate_fn)(void *, uint64_t), | ||
112 | - void *translate_opaque); | ||
113 | +ssize_t load_uimage(const char *filename, hwaddr *ep, | ||
114 | + hwaddr *loadaddr, int *is_linux, | ||
115 | + uint64_t (*translate_fn)(void *, uint64_t), | ||
116 | + void *translate_opaque); | ||
117 | |||
118 | /** | ||
119 | * load_ramdisk_as: | ||
120 | @@ -XXX,XX +XXX,XX @@ int load_uimage(const char *filename, hwaddr *ep, | ||
121 | * | ||
122 | * Returns the size of the loaded image on success, -1 otherwise. | ||
123 | */ | ||
124 | -int load_ramdisk_as(const char *filename, hwaddr addr, uint64_t max_sz, | ||
125 | - AddressSpace *as); | ||
126 | +ssize_t load_ramdisk_as(const char *filename, hwaddr addr, uint64_t max_sz, | ||
127 | + AddressSpace *as); | ||
128 | |||
129 | /** | ||
130 | * load_ramdisk: | ||
131 | * Same as load_ramdisk_as(), but doesn't allow the caller to specify | ||
132 | * an AddressSpace. | ||
133 | */ | ||
134 | -int load_ramdisk(const char *filename, hwaddr addr, uint64_t max_sz); | ||
135 | +ssize_t load_ramdisk(const char *filename, hwaddr addr, uint64_t max_sz); | ||
136 | |||
137 | ssize_t gunzip(void *dst, size_t dstlen, uint8_t *src, size_t srclen); | ||
138 | |||
139 | @@ -XXX,XX +XXX,XX @@ void pstrcpy_targphys(const char *name, | ||
140 | extern bool option_rom_has_mr; | ||
141 | extern bool rom_file_has_mr; | ||
142 | |||
143 | -int rom_add_file(const char *file, const char *fw_dir, | ||
144 | - hwaddr addr, int32_t bootindex, | ||
145 | - bool option_rom, MemoryRegion *mr, AddressSpace *as); | ||
146 | +ssize_t rom_add_file(const char *file, const char *fw_dir, | ||
147 | + hwaddr addr, int32_t bootindex, | ||
148 | + bool option_rom, MemoryRegion *mr, AddressSpace *as); | ||
149 | MemoryRegion *rom_add_blob(const char *name, const void *blob, size_t len, | ||
150 | size_t max_len, hwaddr addr, | ||
151 | const char *fw_file_name, | ||
152 | @@ -XXX,XX +XXX,XX @@ void hmp_info_roms(Monitor *mon, const QDict *qdict); | ||
153 | #define rom_add_blob_fixed_as(_f, _b, _l, _a, _as) \ | ||
154 | rom_add_blob(_f, _b, _l, _l, _a, NULL, NULL, NULL, _as, true) | ||
155 | |||
156 | -int rom_add_vga(const char *file); | ||
157 | -int rom_add_option(const char *file, int32_t bootindex); | ||
158 | +ssize_t rom_add_vga(const char *file); | ||
159 | +ssize_t rom_add_option(const char *file, int32_t bootindex); | ||
160 | |||
161 | /* This is the usual maximum in uboot, so if a uImage overflows this, it would | ||
162 | * overflow on real hardware too. */ | ||
163 | diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c | ||
164 | index XXXXXXX..XXXXXXX 100644 | ||
165 | --- a/hw/arm/armv7m.c | ||
166 | +++ b/hw/arm/armv7m.c | ||
167 | @@ -XXX,XX +XXX,XX @@ static void armv7m_reset(void *opaque) | ||
168 | |||
169 | void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size) | ||
170 | { | ||
171 | - int image_size; | ||
172 | + ssize_t image_size; | ||
173 | uint64_t entry; | ||
174 | int big_endian; | ||
175 | AddressSpace *as; | ||
176 | diff --git a/hw/arm/boot.c b/hw/arm/boot.c | ||
177 | index XXXXXXX..XXXXXXX 100644 | ||
178 | --- a/hw/arm/boot.c | ||
179 | +++ b/hw/arm/boot.c | ||
180 | @@ -XXX,XX +XXX,XX @@ static int do_arm_linux_init(Object *obj, void *opaque) | ||
181 | return 0; | ||
182 | } | ||
183 | |||
184 | -static int64_t arm_load_elf(struct arm_boot_info *info, uint64_t *pentry, | ||
185 | +static ssize_t arm_load_elf(struct arm_boot_info *info, uint64_t *pentry, | ||
186 | uint64_t *lowaddr, uint64_t *highaddr, | ||
187 | int elf_machine, AddressSpace *as) | ||
188 | { | ||
189 | @@ -XXX,XX +XXX,XX @@ static int64_t arm_load_elf(struct arm_boot_info *info, uint64_t *pentry, | ||
190 | } elf_header; | ||
191 | int data_swab = 0; | ||
192 | bool big_endian; | ||
193 | - int64_t ret = -1; | ||
194 | + ssize_t ret = -1; | ||
195 | Error *err = NULL; | ||
196 | |||
197 | |||
198 | @@ -XXX,XX +XXX,XX @@ static void arm_setup_direct_kernel_boot(ARMCPU *cpu, | ||
199 | /* Set up for a direct boot of a kernel image file. */ | ||
200 | CPUState *cs; | ||
201 | AddressSpace *as = arm_boot_address_space(cpu, info); | ||
202 | - int kernel_size; | ||
203 | + ssize_t kernel_size; | ||
204 | int initrd_size; | ||
205 | int is_linux = 0; | ||
206 | uint64_t elf_entry; | ||
207 | @@ -XXX,XX +XXX,XX @@ static void arm_setup_direct_kernel_boot(ARMCPU *cpu, | ||
208 | |||
209 | if (kernel_size > info->ram_size) { | ||
210 | error_report("kernel '%s' is too large to fit in RAM " | ||
211 | - "(kernel size %d, RAM size %" PRId64 ")", | ||
212 | + "(kernel size %zd, RAM size %" PRId64 ")", | ||
213 | info->kernel_filename, kernel_size, info->ram_size); | ||
214 | exit(1); | ||
215 | } | ||
216 | diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c | ||
217 | index XXXXXXX..XXXXXXX 100644 | ||
218 | --- a/hw/core/generic-loader.c | ||
219 | +++ b/hw/core/generic-loader.c | ||
220 | @@ -XXX,XX +XXX,XX @@ static void generic_loader_realize(DeviceState *dev, Error **errp) | ||
221 | GenericLoaderState *s = GENERIC_LOADER(dev); | ||
222 | hwaddr entry; | ||
223 | int big_endian; | ||
224 | - int size = 0; | ||
225 | + ssize_t size = 0; | ||
226 | |||
227 | s->set_pc = false; | ||
228 | |||
229 | diff --git a/hw/core/loader.c b/hw/core/loader.c | ||
230 | index XXXXXXX..XXXXXXX 100644 | ||
231 | --- a/hw/core/loader.c | ||
232 | +++ b/hw/core/loader.c | ||
233 | @@ -XXX,XX +XXX,XX @@ ssize_t read_targphys(const char *name, | ||
234 | return did; | ||
235 | } | ||
236 | |||
237 | -int load_image_targphys(const char *filename, | ||
238 | - hwaddr addr, uint64_t max_sz) | ||
239 | +ssize_t load_image_targphys(const char *filename, | ||
240 | + hwaddr addr, uint64_t max_sz) | ||
241 | { | ||
242 | return load_image_targphys_as(filename, addr, max_sz, NULL); | ||
243 | } | ||
244 | |||
245 | /* return the size or -1 if error */ | ||
246 | -int load_image_targphys_as(const char *filename, | ||
247 | - hwaddr addr, uint64_t max_sz, AddressSpace *as) | ||
248 | +ssize_t load_image_targphys_as(const char *filename, | ||
249 | + hwaddr addr, uint64_t max_sz, AddressSpace *as) | ||
250 | { | ||
251 | - int size; | ||
252 | + ssize_t size; | ||
253 | |||
254 | size = get_image_size(filename); | ||
255 | if (size < 0 || size > max_sz) { | ||
256 | @@ -XXX,XX +XXX,XX @@ int load_image_targphys_as(const char *filename, | ||
257 | return size; | ||
258 | } | ||
259 | |||
260 | -int load_image_mr(const char *filename, MemoryRegion *mr) | ||
261 | +ssize_t load_image_mr(const char *filename, MemoryRegion *mr) | ||
262 | { | ||
263 | - int size; | ||
264 | + ssize_t size; | ||
265 | |||
266 | if (!memory_access_is_direct(mr, false)) { | ||
267 | /* Can only load an image into RAM or ROM */ | ||
268 | @@ -XXX,XX +XXX,XX @@ static void bswap_ahdr(struct exec *e) | ||
269 | : (_N_SEGMENT_ROUND (_N_TXTENDADDR(x, target_page_size), target_page_size))) | ||
270 | |||
271 | |||
272 | -int load_aout(const char *filename, hwaddr addr, int max_sz, | ||
273 | - int bswap_needed, hwaddr target_page_size) | ||
274 | +ssize_t load_aout(const char *filename, hwaddr addr, int max_sz, | ||
275 | + int bswap_needed, hwaddr target_page_size) | ||
276 | { | ||
277 | int fd; | ||
278 | ssize_t size, ret; | ||
279 | @@ -XXX,XX +XXX,XX @@ toosmall: | ||
280 | } | ||
281 | |||
282 | /* Load a U-Boot image. */ | ||
283 | -static int load_uboot_image(const char *filename, hwaddr *ep, hwaddr *loadaddr, | ||
284 | - int *is_linux, uint8_t image_type, | ||
285 | - uint64_t (*translate_fn)(void *, uint64_t), | ||
286 | - void *translate_opaque, AddressSpace *as) | ||
287 | +static ssize_t load_uboot_image(const char *filename, hwaddr *ep, | ||
288 | + hwaddr *loadaddr, int *is_linux, | ||
289 | + uint8_t image_type, | ||
290 | + uint64_t (*translate_fn)(void *, uint64_t), | ||
291 | + void *translate_opaque, AddressSpace *as) | ||
292 | { | ||
293 | int fd; | ||
294 | - int size; | ||
295 | + ssize_t size; | ||
296 | hwaddr address; | ||
297 | uboot_image_header_t h; | ||
298 | uboot_image_header_t *hdr = &h; | ||
299 | @@ -XXX,XX +XXX,XX @@ out: | ||
300 | return ret; | 21 | return ret; |
301 | } | 22 | } |
302 | 23 | ||
303 | -int load_uimage(const char *filename, hwaddr *ep, hwaddr *loadaddr, | 24 | -static void create_fdt_imsic(RISCVVirtState *s, const MemMapEntry *memmap, |
304 | - int *is_linux, | 25 | - uint32_t *phandle, uint32_t *intc_phandles, |
305 | - uint64_t (*translate_fn)(void *, uint64_t), | 26 | - uint32_t *msi_m_phandle, uint32_t *msi_s_phandle) |
306 | - void *translate_opaque) | 27 | +static void create_fdt_one_imsic(RISCVVirtState *s, hwaddr base_addr, |
307 | +ssize_t load_uimage(const char *filename, hwaddr *ep, hwaddr *loadaddr, | 28 | + uint32_t *intc_phandles, uint32_t msi_phandle, |
308 | + int *is_linux, | 29 | + bool m_mode, uint32_t imsic_guest_bits) |
309 | + uint64_t (*translate_fn)(void *, uint64_t), | ||
310 | + void *translate_opaque) | ||
311 | { | 30 | { |
312 | return load_uboot_image(filename, ep, loadaddr, is_linux, IH_TYPE_KERNEL, | 31 | int cpu, socket; |
313 | translate_fn, translate_opaque, NULL); | 32 | char *imsic_name; |
33 | MachineState *ms = MACHINE(s); | ||
34 | int socket_count = riscv_socket_count(ms); | ||
35 | - uint32_t imsic_max_hart_per_socket, imsic_guest_bits; | ||
36 | + uint32_t imsic_max_hart_per_socket; | ||
37 | uint32_t *imsic_cells, *imsic_regs, imsic_addr, imsic_size; | ||
38 | |||
39 | - *msi_m_phandle = (*phandle)++; | ||
40 | - *msi_s_phandle = (*phandle)++; | ||
41 | imsic_cells = g_new0(uint32_t, ms->smp.cpus * 2); | ||
42 | imsic_regs = g_new0(uint32_t, socket_count * 4); | ||
43 | |||
44 | - /* M-level IMSIC node */ | ||
45 | for (cpu = 0; cpu < ms->smp.cpus; cpu++) { | ||
46 | imsic_cells[cpu * 2 + 0] = cpu_to_be32(intc_phandles[cpu]); | ||
47 | - imsic_cells[cpu * 2 + 1] = cpu_to_be32(IRQ_M_EXT); | ||
48 | + imsic_cells[cpu * 2 + 1] = cpu_to_be32(m_mode ? IRQ_M_EXT : IRQ_S_EXT); | ||
49 | } | ||
50 | - imsic_max_hart_per_socket = 0; | ||
51 | - for (socket = 0; socket < socket_count; socket++) { | ||
52 | - imsic_addr = memmap[VIRT_IMSIC_M].base + | ||
53 | - socket * VIRT_IMSIC_GROUP_MAX_SIZE; | ||
54 | - imsic_size = IMSIC_HART_SIZE(0) * s->soc[socket].num_harts; | ||
55 | - imsic_regs[socket * 4 + 0] = 0; | ||
56 | - imsic_regs[socket * 4 + 1] = cpu_to_be32(imsic_addr); | ||
57 | - imsic_regs[socket * 4 + 2] = 0; | ||
58 | - imsic_regs[socket * 4 + 3] = cpu_to_be32(imsic_size); | ||
59 | - if (imsic_max_hart_per_socket < s->soc[socket].num_harts) { | ||
60 | - imsic_max_hart_per_socket = s->soc[socket].num_harts; | ||
61 | - } | ||
62 | - } | ||
63 | - imsic_name = g_strdup_printf("/soc/imsics@%lx", | ||
64 | - (unsigned long)memmap[VIRT_IMSIC_M].base); | ||
65 | - qemu_fdt_add_subnode(ms->fdt, imsic_name); | ||
66 | - qemu_fdt_setprop_string(ms->fdt, imsic_name, "compatible", | ||
67 | - "riscv,imsics"); | ||
68 | - qemu_fdt_setprop_cell(ms->fdt, imsic_name, "#interrupt-cells", | ||
69 | - FDT_IMSIC_INT_CELLS); | ||
70 | - qemu_fdt_setprop(ms->fdt, imsic_name, "interrupt-controller", | ||
71 | - NULL, 0); | ||
72 | - qemu_fdt_setprop(ms->fdt, imsic_name, "msi-controller", | ||
73 | - NULL, 0); | ||
74 | - qemu_fdt_setprop(ms->fdt, imsic_name, "interrupts-extended", | ||
75 | - imsic_cells, ms->smp.cpus * sizeof(uint32_t) * 2); | ||
76 | - qemu_fdt_setprop(ms->fdt, imsic_name, "reg", imsic_regs, | ||
77 | - socket_count * sizeof(uint32_t) * 4); | ||
78 | - qemu_fdt_setprop_cell(ms->fdt, imsic_name, "riscv,num-ids", | ||
79 | - VIRT_IRQCHIP_NUM_MSIS); | ||
80 | - if (socket_count > 1) { | ||
81 | - qemu_fdt_setprop_cell(ms->fdt, imsic_name, "riscv,hart-index-bits", | ||
82 | - imsic_num_bits(imsic_max_hart_per_socket)); | ||
83 | - qemu_fdt_setprop_cell(ms->fdt, imsic_name, "riscv,group-index-bits", | ||
84 | - imsic_num_bits(socket_count)); | ||
85 | - qemu_fdt_setprop_cell(ms->fdt, imsic_name, "riscv,group-index-shift", | ||
86 | - IMSIC_MMIO_GROUP_MIN_SHIFT); | ||
87 | - } | ||
88 | - qemu_fdt_setprop_cell(ms->fdt, imsic_name, "phandle", *msi_m_phandle); | ||
89 | - | ||
90 | - g_free(imsic_name); | ||
91 | |||
92 | - /* S-level IMSIC node */ | ||
93 | - for (cpu = 0; cpu < ms->smp.cpus; cpu++) { | ||
94 | - imsic_cells[cpu * 2 + 0] = cpu_to_be32(intc_phandles[cpu]); | ||
95 | - imsic_cells[cpu * 2 + 1] = cpu_to_be32(IRQ_S_EXT); | ||
96 | - } | ||
97 | - imsic_guest_bits = imsic_num_bits(s->aia_guests + 1); | ||
98 | imsic_max_hart_per_socket = 0; | ||
99 | for (socket = 0; socket < socket_count; socket++) { | ||
100 | - imsic_addr = memmap[VIRT_IMSIC_S].base + | ||
101 | - socket * VIRT_IMSIC_GROUP_MAX_SIZE; | ||
102 | + imsic_addr = base_addr + socket * VIRT_IMSIC_GROUP_MAX_SIZE; | ||
103 | imsic_size = IMSIC_HART_SIZE(imsic_guest_bits) * | ||
104 | s->soc[socket].num_harts; | ||
105 | imsic_regs[socket * 4 + 0] = 0; | ||
106 | @@ -XXX,XX +XXX,XX @@ static void create_fdt_imsic(RISCVVirtState *s, const MemMapEntry *memmap, | ||
107 | imsic_max_hart_per_socket = s->soc[socket].num_harts; | ||
108 | } | ||
109 | } | ||
110 | - imsic_name = g_strdup_printf("/soc/imsics@%lx", | ||
111 | - (unsigned long)memmap[VIRT_IMSIC_S].base); | ||
112 | + | ||
113 | + imsic_name = g_strdup_printf("/soc/imsics@%lx", (unsigned long)base_addr); | ||
114 | qemu_fdt_add_subnode(ms->fdt, imsic_name); | ||
115 | - qemu_fdt_setprop_string(ms->fdt, imsic_name, "compatible", | ||
116 | - "riscv,imsics"); | ||
117 | + qemu_fdt_setprop_string(ms->fdt, imsic_name, "compatible", "riscv,imsics"); | ||
118 | qemu_fdt_setprop_cell(ms->fdt, imsic_name, "#interrupt-cells", | ||
119 | - FDT_IMSIC_INT_CELLS); | ||
120 | - qemu_fdt_setprop(ms->fdt, imsic_name, "interrupt-controller", | ||
121 | - NULL, 0); | ||
122 | - qemu_fdt_setprop(ms->fdt, imsic_name, "msi-controller", | ||
123 | - NULL, 0); | ||
124 | + FDT_IMSIC_INT_CELLS); | ||
125 | + qemu_fdt_setprop(ms->fdt, imsic_name, "interrupt-controller", NULL, 0); | ||
126 | + qemu_fdt_setprop(ms->fdt, imsic_name, "msi-controller", NULL, 0); | ||
127 | qemu_fdt_setprop(ms->fdt, imsic_name, "interrupts-extended", | ||
128 | - imsic_cells, ms->smp.cpus * sizeof(uint32_t) * 2); | ||
129 | + imsic_cells, ms->smp.cpus * sizeof(uint32_t) * 2); | ||
130 | qemu_fdt_setprop(ms->fdt, imsic_name, "reg", imsic_regs, | ||
131 | - socket_count * sizeof(uint32_t) * 4); | ||
132 | + socket_count * sizeof(uint32_t) * 4); | ||
133 | qemu_fdt_setprop_cell(ms->fdt, imsic_name, "riscv,num-ids", | ||
134 | - VIRT_IRQCHIP_NUM_MSIS); | ||
135 | + VIRT_IRQCHIP_NUM_MSIS); | ||
136 | + | ||
137 | if (imsic_guest_bits) { | ||
138 | qemu_fdt_setprop_cell(ms->fdt, imsic_name, "riscv,guest-index-bits", | ||
139 | - imsic_guest_bits); | ||
140 | + imsic_guest_bits); | ||
141 | } | ||
142 | + | ||
143 | if (socket_count > 1) { | ||
144 | qemu_fdt_setprop_cell(ms->fdt, imsic_name, "riscv,hart-index-bits", | ||
145 | - imsic_num_bits(imsic_max_hart_per_socket)); | ||
146 | + imsic_num_bits(imsic_max_hart_per_socket)); | ||
147 | qemu_fdt_setprop_cell(ms->fdt, imsic_name, "riscv,group-index-bits", | ||
148 | - imsic_num_bits(socket_count)); | ||
149 | + imsic_num_bits(socket_count)); | ||
150 | qemu_fdt_setprop_cell(ms->fdt, imsic_name, "riscv,group-index-shift", | ||
151 | - IMSIC_MMIO_GROUP_MIN_SHIFT); | ||
152 | + IMSIC_MMIO_GROUP_MIN_SHIFT); | ||
153 | } | ||
154 | - qemu_fdt_setprop_cell(ms->fdt, imsic_name, "phandle", *msi_s_phandle); | ||
155 | - g_free(imsic_name); | ||
156 | + qemu_fdt_setprop_cell(ms->fdt, imsic_name, "phandle", msi_phandle); | ||
157 | |||
158 | + g_free(imsic_name); | ||
159 | g_free(imsic_regs); | ||
160 | g_free(imsic_cells); | ||
314 | } | 161 | } |
315 | 162 | ||
316 | -int load_uimage_as(const char *filename, hwaddr *ep, hwaddr *loadaddr, | 163 | -static void create_fdt_socket_aplic(RISCVVirtState *s, |
317 | - int *is_linux, | 164 | - const MemMapEntry *memmap, int socket, |
318 | - uint64_t (*translate_fn)(void *, uint64_t), | 165 | - uint32_t msi_m_phandle, |
319 | - void *translate_opaque, AddressSpace *as) | 166 | - uint32_t msi_s_phandle, |
320 | +ssize_t load_uimage_as(const char *filename, hwaddr *ep, hwaddr *loadaddr, | 167 | - uint32_t *phandle, |
321 | + int *is_linux, | 168 | - uint32_t *intc_phandles, |
322 | + uint64_t (*translate_fn)(void *, uint64_t), | 169 | - uint32_t *aplic_phandles) |
323 | + void *translate_opaque, AddressSpace *as) | 170 | +static void create_fdt_imsic(RISCVVirtState *s, const MemMapEntry *memmap, |
171 | + uint32_t *phandle, uint32_t *intc_phandles, | ||
172 | + uint32_t *msi_m_phandle, uint32_t *msi_s_phandle) | ||
173 | +{ | ||
174 | + *msi_m_phandle = (*phandle)++; | ||
175 | + *msi_s_phandle = (*phandle)++; | ||
176 | + | ||
177 | + if (!kvm_enabled()) { | ||
178 | + /* M-level IMSIC node */ | ||
179 | + create_fdt_one_imsic(s, memmap[VIRT_IMSIC_M].base, intc_phandles, | ||
180 | + *msi_m_phandle, true, 0); | ||
181 | + } | ||
182 | + | ||
183 | + /* S-level IMSIC node */ | ||
184 | + create_fdt_one_imsic(s, memmap[VIRT_IMSIC_S].base, intc_phandles, | ||
185 | + *msi_s_phandle, false, | ||
186 | + imsic_num_bits(s->aia_guests + 1)); | ||
187 | + | ||
188 | +} | ||
189 | + | ||
190 | +static void create_fdt_one_aplic(RISCVVirtState *s, int socket, | ||
191 | + unsigned long aplic_addr, uint32_t aplic_size, | ||
192 | + uint32_t msi_phandle, | ||
193 | + uint32_t *intc_phandles, | ||
194 | + uint32_t aplic_phandle, | ||
195 | + uint32_t aplic_child_phandle, | ||
196 | + bool m_mode) | ||
324 | { | 197 | { |
325 | return load_uboot_image(filename, ep, loadaddr, is_linux, IH_TYPE_KERNEL, | 198 | int cpu; |
326 | translate_fn, translate_opaque, as); | 199 | char *aplic_name; |
200 | uint32_t *aplic_cells; | ||
201 | - unsigned long aplic_addr; | ||
202 | MachineState *ms = MACHINE(s); | ||
203 | - uint32_t aplic_m_phandle, aplic_s_phandle; | ||
204 | |||
205 | - aplic_m_phandle = (*phandle)++; | ||
206 | - aplic_s_phandle = (*phandle)++; | ||
207 | aplic_cells = g_new0(uint32_t, s->soc[socket].num_harts * 2); | ||
208 | |||
209 | - /* M-level APLIC node */ | ||
210 | for (cpu = 0; cpu < s->soc[socket].num_harts; cpu++) { | ||
211 | aplic_cells[cpu * 2 + 0] = cpu_to_be32(intc_phandles[cpu]); | ||
212 | - aplic_cells[cpu * 2 + 1] = cpu_to_be32(IRQ_M_EXT); | ||
213 | + aplic_cells[cpu * 2 + 1] = cpu_to_be32(m_mode ? IRQ_M_EXT : IRQ_S_EXT); | ||
214 | } | ||
215 | - aplic_addr = memmap[VIRT_APLIC_M].base + | ||
216 | - (memmap[VIRT_APLIC_M].size * socket); | ||
217 | + | ||
218 | aplic_name = g_strdup_printf("/soc/aplic@%lx", aplic_addr); | ||
219 | qemu_fdt_add_subnode(ms->fdt, aplic_name); | ||
220 | qemu_fdt_setprop_string(ms->fdt, aplic_name, "compatible", "riscv,aplic"); | ||
221 | qemu_fdt_setprop_cell(ms->fdt, aplic_name, | ||
222 | - "#interrupt-cells", FDT_APLIC_INT_CELLS); | ||
223 | + "#interrupt-cells", FDT_APLIC_INT_CELLS); | ||
224 | qemu_fdt_setprop(ms->fdt, aplic_name, "interrupt-controller", NULL, 0); | ||
225 | + | ||
226 | if (s->aia_type == VIRT_AIA_TYPE_APLIC) { | ||
227 | qemu_fdt_setprop(ms->fdt, aplic_name, "interrupts-extended", | ||
228 | - aplic_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 2); | ||
229 | + aplic_cells, | ||
230 | + s->soc[socket].num_harts * sizeof(uint32_t) * 2); | ||
231 | } else { | ||
232 | - qemu_fdt_setprop_cell(ms->fdt, aplic_name, "msi-parent", | ||
233 | - msi_m_phandle); | ||
234 | + qemu_fdt_setprop_cell(ms->fdt, aplic_name, "msi-parent", msi_phandle); | ||
235 | } | ||
236 | + | ||
237 | qemu_fdt_setprop_cells(ms->fdt, aplic_name, "reg", | ||
238 | - 0x0, aplic_addr, 0x0, memmap[VIRT_APLIC_M].size); | ||
239 | + 0x0, aplic_addr, 0x0, aplic_size); | ||
240 | qemu_fdt_setprop_cell(ms->fdt, aplic_name, "riscv,num-sources", | ||
241 | - VIRT_IRQCHIP_NUM_SOURCES); | ||
242 | - qemu_fdt_setprop_cell(ms->fdt, aplic_name, "riscv,children", | ||
243 | - aplic_s_phandle); | ||
244 | - qemu_fdt_setprop_cells(ms->fdt, aplic_name, "riscv,delegate", | ||
245 | - aplic_s_phandle, 0x1, VIRT_IRQCHIP_NUM_SOURCES); | ||
246 | + VIRT_IRQCHIP_NUM_SOURCES); | ||
247 | + | ||
248 | + if (aplic_child_phandle) { | ||
249 | + qemu_fdt_setprop_cell(ms->fdt, aplic_name, "riscv,children", | ||
250 | + aplic_child_phandle); | ||
251 | + qemu_fdt_setprop_cells(ms->fdt, aplic_name, "riscv,delegate", | ||
252 | + aplic_child_phandle, 0x1, | ||
253 | + VIRT_IRQCHIP_NUM_SOURCES); | ||
254 | + } | ||
255 | + | ||
256 | riscv_socket_fdt_write_id(ms, aplic_name, socket); | ||
257 | - qemu_fdt_setprop_cell(ms->fdt, aplic_name, "phandle", aplic_m_phandle); | ||
258 | + qemu_fdt_setprop_cell(ms->fdt, aplic_name, "phandle", aplic_phandle); | ||
259 | + | ||
260 | g_free(aplic_name); | ||
261 | + g_free(aplic_cells); | ||
262 | +} | ||
263 | |||
264 | - /* S-level APLIC node */ | ||
265 | - for (cpu = 0; cpu < s->soc[socket].num_harts; cpu++) { | ||
266 | - aplic_cells[cpu * 2 + 0] = cpu_to_be32(intc_phandles[cpu]); | ||
267 | - aplic_cells[cpu * 2 + 1] = cpu_to_be32(IRQ_S_EXT); | ||
268 | +static void create_fdt_socket_aplic(RISCVVirtState *s, | ||
269 | + const MemMapEntry *memmap, int socket, | ||
270 | + uint32_t msi_m_phandle, | ||
271 | + uint32_t msi_s_phandle, | ||
272 | + uint32_t *phandle, | ||
273 | + uint32_t *intc_phandles, | ||
274 | + uint32_t *aplic_phandles) | ||
275 | +{ | ||
276 | + char *aplic_name; | ||
277 | + unsigned long aplic_addr; | ||
278 | + MachineState *ms = MACHINE(s); | ||
279 | + uint32_t aplic_m_phandle, aplic_s_phandle; | ||
280 | + | ||
281 | + aplic_m_phandle = (*phandle)++; | ||
282 | + aplic_s_phandle = (*phandle)++; | ||
283 | + | ||
284 | + if (!kvm_enabled()) { | ||
285 | + /* M-level APLIC node */ | ||
286 | + aplic_addr = memmap[VIRT_APLIC_M].base + | ||
287 | + (memmap[VIRT_APLIC_M].size * socket); | ||
288 | + create_fdt_one_aplic(s, socket, aplic_addr, memmap[VIRT_APLIC_M].size, | ||
289 | + msi_m_phandle, intc_phandles, | ||
290 | + aplic_m_phandle, aplic_s_phandle, | ||
291 | + true); | ||
292 | } | ||
293 | + | ||
294 | + /* S-level APLIC node */ | ||
295 | aplic_addr = memmap[VIRT_APLIC_S].base + | ||
296 | (memmap[VIRT_APLIC_S].size * socket); | ||
297 | + create_fdt_one_aplic(s, socket, aplic_addr, memmap[VIRT_APLIC_S].size, | ||
298 | + msi_s_phandle, intc_phandles, | ||
299 | + aplic_s_phandle, 0, | ||
300 | + false); | ||
301 | + | ||
302 | aplic_name = g_strdup_printf("/soc/aplic@%lx", aplic_addr); | ||
303 | - qemu_fdt_add_subnode(ms->fdt, aplic_name); | ||
304 | - qemu_fdt_setprop_string(ms->fdt, aplic_name, "compatible", "riscv,aplic"); | ||
305 | - qemu_fdt_setprop_cell(ms->fdt, aplic_name, | ||
306 | - "#interrupt-cells", FDT_APLIC_INT_CELLS); | ||
307 | - qemu_fdt_setprop(ms->fdt, aplic_name, "interrupt-controller", NULL, 0); | ||
308 | - if (s->aia_type == VIRT_AIA_TYPE_APLIC) { | ||
309 | - qemu_fdt_setprop(ms->fdt, aplic_name, "interrupts-extended", | ||
310 | - aplic_cells, s->soc[socket].num_harts * sizeof(uint32_t) * 2); | ||
311 | - } else { | ||
312 | - qemu_fdt_setprop_cell(ms->fdt, aplic_name, "msi-parent", | ||
313 | - msi_s_phandle); | ||
314 | - } | ||
315 | - qemu_fdt_setprop_cells(ms->fdt, aplic_name, "reg", | ||
316 | - 0x0, aplic_addr, 0x0, memmap[VIRT_APLIC_S].size); | ||
317 | - qemu_fdt_setprop_cell(ms->fdt, aplic_name, "riscv,num-sources", | ||
318 | - VIRT_IRQCHIP_NUM_SOURCES); | ||
319 | - riscv_socket_fdt_write_id(ms, aplic_name, socket); | ||
320 | - qemu_fdt_setprop_cell(ms->fdt, aplic_name, "phandle", aplic_s_phandle); | ||
321 | |||
322 | if (!socket) { | ||
323 | platform_bus_add_all_fdt_nodes(ms->fdt, aplic_name, | ||
324 | @@ -XXX,XX +XXX,XX @@ static void create_fdt_socket_aplic(RISCVVirtState *s, | ||
325 | |||
326 | g_free(aplic_name); | ||
327 | |||
328 | - g_free(aplic_cells); | ||
329 | aplic_phandles[socket] = aplic_s_phandle; | ||
327 | } | 330 | } |
328 | 331 | ||
329 | /* Load a ramdisk. */ | 332 | @@ -XXX,XX +XXX,XX @@ static DeviceState *virt_create_aia(RISCVVirtAIAType aia_type, int aia_guests, |
330 | -int load_ramdisk(const char *filename, hwaddr addr, uint64_t max_sz) | 333 | int i; |
331 | +ssize_t load_ramdisk(const char *filename, hwaddr addr, uint64_t max_sz) | 334 | hwaddr addr; |
332 | { | 335 | uint32_t guest_bits; |
333 | return load_ramdisk_as(filename, addr, max_sz, NULL); | 336 | - DeviceState *aplic_m; |
337 | - bool msimode = (aia_type == VIRT_AIA_TYPE_APLIC_IMSIC) ? true : false; | ||
338 | + DeviceState *aplic_s = NULL; | ||
339 | + DeviceState *aplic_m = NULL; | ||
340 | + bool msimode = aia_type == VIRT_AIA_TYPE_APLIC_IMSIC; | ||
341 | |||
342 | if (msimode) { | ||
343 | - /* Per-socket M-level IMSICs */ | ||
344 | - addr = memmap[VIRT_IMSIC_M].base + socket * VIRT_IMSIC_GROUP_MAX_SIZE; | ||
345 | - for (i = 0; i < hart_count; i++) { | ||
346 | - riscv_imsic_create(addr + i * IMSIC_HART_SIZE(0), | ||
347 | - base_hartid + i, true, 1, | ||
348 | - VIRT_IRQCHIP_NUM_MSIS); | ||
349 | + if (!kvm_enabled()) { | ||
350 | + /* Per-socket M-level IMSICs */ | ||
351 | + addr = memmap[VIRT_IMSIC_M].base + | ||
352 | + socket * VIRT_IMSIC_GROUP_MAX_SIZE; | ||
353 | + for (i = 0; i < hart_count; i++) { | ||
354 | + riscv_imsic_create(addr + i * IMSIC_HART_SIZE(0), | ||
355 | + base_hartid + i, true, 1, | ||
356 | + VIRT_IRQCHIP_NUM_MSIS); | ||
357 | + } | ||
358 | } | ||
359 | |||
360 | /* Per-socket S-level IMSICs */ | ||
361 | @@ -XXX,XX +XXX,XX @@ static DeviceState *virt_create_aia(RISCVVirtAIAType aia_type, int aia_guests, | ||
362 | } | ||
363 | } | ||
364 | |||
365 | - /* Per-socket M-level APLIC */ | ||
366 | - aplic_m = riscv_aplic_create( | ||
367 | - memmap[VIRT_APLIC_M].base + socket * memmap[VIRT_APLIC_M].size, | ||
368 | - memmap[VIRT_APLIC_M].size, | ||
369 | - (msimode) ? 0 : base_hartid, | ||
370 | - (msimode) ? 0 : hart_count, | ||
371 | - VIRT_IRQCHIP_NUM_SOURCES, | ||
372 | - VIRT_IRQCHIP_NUM_PRIO_BITS, | ||
373 | - msimode, true, NULL); | ||
374 | - | ||
375 | - if (aplic_m) { | ||
376 | - /* Per-socket S-level APLIC */ | ||
377 | - riscv_aplic_create( | ||
378 | - memmap[VIRT_APLIC_S].base + socket * memmap[VIRT_APLIC_S].size, | ||
379 | - memmap[VIRT_APLIC_S].size, | ||
380 | - (msimode) ? 0 : base_hartid, | ||
381 | - (msimode) ? 0 : hart_count, | ||
382 | - VIRT_IRQCHIP_NUM_SOURCES, | ||
383 | - VIRT_IRQCHIP_NUM_PRIO_BITS, | ||
384 | - msimode, false, aplic_m); | ||
385 | + if (!kvm_enabled()) { | ||
386 | + /* Per-socket M-level APLIC */ | ||
387 | + aplic_m = riscv_aplic_create(memmap[VIRT_APLIC_M].base + | ||
388 | + socket * memmap[VIRT_APLIC_M].size, | ||
389 | + memmap[VIRT_APLIC_M].size, | ||
390 | + (msimode) ? 0 : base_hartid, | ||
391 | + (msimode) ? 0 : hart_count, | ||
392 | + VIRT_IRQCHIP_NUM_SOURCES, | ||
393 | + VIRT_IRQCHIP_NUM_PRIO_BITS, | ||
394 | + msimode, true, NULL); | ||
395 | } | ||
396 | |||
397 | - return aplic_m; | ||
398 | + /* Per-socket S-level APLIC */ | ||
399 | + aplic_s = riscv_aplic_create(memmap[VIRT_APLIC_S].base + | ||
400 | + socket * memmap[VIRT_APLIC_S].size, | ||
401 | + memmap[VIRT_APLIC_S].size, | ||
402 | + (msimode) ? 0 : base_hartid, | ||
403 | + (msimode) ? 0 : hart_count, | ||
404 | + VIRT_IRQCHIP_NUM_SOURCES, | ||
405 | + VIRT_IRQCHIP_NUM_PRIO_BITS, | ||
406 | + msimode, false, aplic_m); | ||
407 | + | ||
408 | + return kvm_enabled() ? aplic_s : aplic_m; | ||
334 | } | 409 | } |
335 | 410 | ||
336 | -int load_ramdisk_as(const char *filename, hwaddr addr, uint64_t max_sz, | 411 | static void create_platform_bus(RISCVVirtState *s, DeviceState *irqchip) |
337 | - AddressSpace *as) | ||
338 | +ssize_t load_ramdisk_as(const char *filename, hwaddr addr, uint64_t max_sz, | ||
339 | + AddressSpace *as) | ||
340 | { | ||
341 | return load_uboot_image(filename, NULL, &addr, NULL, IH_TYPE_RAMDISK, | ||
342 | NULL, NULL, as); | ||
343 | } | ||
344 | |||
345 | /* Load a gzip-compressed kernel to a dynamically allocated buffer. */ | ||
346 | -int load_image_gzipped_buffer(const char *filename, uint64_t max_sz, | ||
347 | - uint8_t **buffer) | ||
348 | +ssize_t load_image_gzipped_buffer(const char *filename, uint64_t max_sz, | ||
349 | + uint8_t **buffer) | ||
350 | { | ||
351 | uint8_t *compressed_data = NULL; | ||
352 | uint8_t *data = NULL; | ||
353 | @@ -XXX,XX +XXX,XX @@ int load_image_gzipped_buffer(const char *filename, uint64_t max_sz, | ||
354 | } | ||
355 | |||
356 | /* Load a gzip-compressed kernel. */ | ||
357 | -int load_image_gzipped(const char *filename, hwaddr addr, uint64_t max_sz) | ||
358 | +ssize_t load_image_gzipped(const char *filename, hwaddr addr, uint64_t max_sz) | ||
359 | { | ||
360 | - int bytes; | ||
361 | + ssize_t bytes; | ||
362 | uint8_t *data; | ||
363 | |||
364 | bytes = load_image_gzipped_buffer(filename, max_sz, &data); | ||
365 | @@ -XXX,XX +XXX,XX @@ static void *rom_set_mr(Rom *rom, Object *owner, const char *name, bool ro) | ||
366 | return data; | ||
367 | } | ||
368 | |||
369 | -int rom_add_file(const char *file, const char *fw_dir, | ||
370 | - hwaddr addr, int32_t bootindex, | ||
371 | - bool option_rom, MemoryRegion *mr, | ||
372 | - AddressSpace *as) | ||
373 | +ssize_t rom_add_file(const char *file, const char *fw_dir, | ||
374 | + hwaddr addr, int32_t bootindex, | ||
375 | + bool option_rom, MemoryRegion *mr, | ||
376 | + AddressSpace *as) | ||
377 | { | ||
378 | MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine()); | ||
379 | Rom *rom; | ||
380 | - int rc, fd = -1; | ||
381 | + ssize_t rc; | ||
382 | + int fd = -1; | ||
383 | char devpath[100]; | ||
384 | |||
385 | if (as && mr) { | ||
386 | @@ -XXX,XX +XXX,XX @@ int rom_add_file(const char *file, const char *fw_dir, | ||
387 | lseek(fd, 0, SEEK_SET); | ||
388 | rc = read(fd, rom->data, rom->datasize); | ||
389 | if (rc != rom->datasize) { | ||
390 | - fprintf(stderr, "rom: file %-20s: read error: rc=%d (expected %zd)\n", | ||
391 | + fprintf(stderr, "rom: file %-20s: read error: rc=%zd (expected %zd)\n", | ||
392 | rom->name, rc, rom->datasize); | ||
393 | goto err; | ||
394 | } | ||
395 | @@ -XXX,XX +XXX,XX @@ int rom_add_elf_program(const char *name, GMappedFile *mapped_file, void *data, | ||
396 | return 0; | ||
397 | } | ||
398 | |||
399 | -int rom_add_vga(const char *file) | ||
400 | +ssize_t rom_add_vga(const char *file) | ||
401 | { | ||
402 | return rom_add_file(file, "vgaroms", 0, -1, true, NULL, NULL); | ||
403 | } | ||
404 | |||
405 | -int rom_add_option(const char *file, int32_t bootindex) | ||
406 | +ssize_t rom_add_option(const char *file, int32_t bootindex) | ||
407 | { | ||
408 | return rom_add_file(file, "genroms", 0, bootindex, true, NULL, NULL); | ||
409 | } | ||
410 | @@ -XXX,XX +XXX,XX @@ out: | ||
411 | } | ||
412 | |||
413 | /* return size or -1 if error */ | ||
414 | -int load_targphys_hex_as(const char *filename, hwaddr *entry, AddressSpace *as) | ||
415 | +ssize_t load_targphys_hex_as(const char *filename, hwaddr *entry, | ||
416 | + AddressSpace *as) | ||
417 | { | ||
418 | gsize hex_blob_size; | ||
419 | gchar *hex_blob; | ||
420 | - int total_size = 0; | ||
421 | + ssize_t total_size = 0; | ||
422 | |||
423 | if (!g_file_get_contents(filename, &hex_blob, &hex_blob_size, NULL)) { | ||
424 | return -1; | ||
425 | diff --git a/hw/i386/x86.c b/hw/i386/x86.c | ||
426 | index XXXXXXX..XXXXXXX 100644 | ||
427 | --- a/hw/i386/x86.c | ||
428 | +++ b/hw/i386/x86.c | ||
429 | @@ -XXX,XX +XXX,XX @@ void x86_bios_rom_init(MachineState *ms, const char *default_firmware, | ||
430 | char *filename; | ||
431 | MemoryRegion *bios, *isa_bios; | ||
432 | int bios_size, isa_bios_size; | ||
433 | - int ret; | ||
434 | + ssize_t ret; | ||
435 | |||
436 | /* BIOS load */ | ||
437 | bios_name = ms->firmware ?: default_firmware; | ||
438 | diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c | ||
439 | index XXXXXXX..XXXXXXX 100644 | ||
440 | --- a/hw/riscv/boot.c | ||
441 | +++ b/hw/riscv/boot.c | ||
442 | @@ -XXX,XX +XXX,XX @@ target_ulong riscv_load_firmware(const char *firmware_filename, | ||
443 | hwaddr firmware_load_addr, | ||
444 | symbol_fn_t sym_cb) | ||
445 | { | ||
446 | - uint64_t firmware_entry, firmware_size, firmware_end; | ||
447 | + uint64_t firmware_entry, firmware_end; | ||
448 | + ssize_t firmware_size; | ||
449 | |||
450 | if (load_elf_ram_sym(firmware_filename, NULL, NULL, NULL, | ||
451 | &firmware_entry, NULL, &firmware_end, NULL, | ||
452 | @@ -XXX,XX +XXX,XX @@ target_ulong riscv_load_kernel(const char *kernel_filename, | ||
453 | hwaddr riscv_load_initrd(const char *filename, uint64_t mem_size, | ||
454 | uint64_t kernel_entry, hwaddr *start) | ||
455 | { | ||
456 | - int size; | ||
457 | + ssize_t size; | ||
458 | |||
459 | /* | ||
460 | * We want to put the initrd far enough into RAM that when the | ||
461 | -- | 412 | -- |
462 | 2.36.1 | 413 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Yong-Xuan Wang <yongxuan.wang@sifive.com> | ||
1 | 2 | ||
3 | We check the in-kernel irqchip support when using KVM acceleration. | ||
4 | |||
5 | Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> | ||
6 | Reviewed-by: Jim Shu <jim.shu@sifive.com> | ||
7 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
8 | Reviewed-by: Andrew Jones <ajones@ventanamicro.com> | ||
9 | Message-ID: <20230727102439.22554-3-yongxuan.wang@sifive.com> | ||
10 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
11 | --- | ||
12 | target/riscv/kvm.c | 10 +++++++++- | ||
13 | 1 file changed, 9 insertions(+), 1 deletion(-) | ||
14 | |||
15 | diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c | ||
16 | index XXXXXXX..XXXXXXX 100644 | ||
17 | --- a/target/riscv/kvm.c | ||
18 | +++ b/target/riscv/kvm.c | ||
19 | @@ -XXX,XX +XXX,XX @@ int kvm_arch_init(MachineState *ms, KVMState *s) | ||
20 | |||
21 | int kvm_arch_irqchip_create(KVMState *s) | ||
22 | { | ||
23 | - return 0; | ||
24 | + if (kvm_kernel_irqchip_split()) { | ||
25 | + error_report("-machine kernel_irqchip=split is not supported on RISC-V."); | ||
26 | + exit(1); | ||
27 | + } | ||
28 | + | ||
29 | + /* | ||
30 | + * We can create the VAIA using the newer device control API. | ||
31 | + */ | ||
32 | + return kvm_check_extension(s, KVM_CAP_DEVICE_CTRL); | ||
33 | } | ||
34 | |||
35 | int kvm_arch_process_async_events(CPUState *cs) | ||
36 | -- | ||
37 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Yong-Xuan Wang <yongxuan.wang@sifive.com> |
---|---|---|---|
2 | 2 | ||
3 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 3 | We create a vAIA chip by using the KVM_DEV_TYPE_RISCV_AIA and then set up |
4 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | 4 | the chip with the KVM_DEV_RISCV_AIA_GRP_* APIs. |
5 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 5 | We also extend KVM accelerator to specify the KVM AIA mode. The "riscv-aia" |
6 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | 6 | parameter is passed along with --accel in QEMU command-line. |
7 | Message-Id: <165449614532.19704.7000832880482980398-10@git.sr.ht> | 7 | 1) "riscv-aia=emul": IMSIC is emulated by hypervisor |
8 | 2) "riscv-aia=hwaccel": use hardware guest IMSIC | ||
9 | 3) "riscv-aia=auto": use the hardware guest IMSICs whenever available | ||
10 | otherwise we fallback to software emulation. | ||
11 | |||
12 | Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> | ||
13 | Reviewed-by: Jim Shu <jim.shu@sifive.com> | ||
14 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
15 | Reviewed-by: Andrew Jones <ajones@ventanamicro.com> | ||
16 | Message-ID: <20230727102439.22554-4-yongxuan.wang@sifive.com> | ||
8 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 17 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
9 | --- | 18 | --- |
10 | target/riscv/vector_helper.c | 20 ++++++++++++++++++++ | 19 | target/riscv/kvm_riscv.h | 4 + |
11 | target/riscv/insn_trans/trans_rvv.c.inc | 12 ++++++++---- | 20 | target/riscv/kvm.c | 186 +++++++++++++++++++++++++++++++++++++++ |
12 | 2 files changed, 28 insertions(+), 4 deletions(-) | 21 | 2 files changed, 190 insertions(+) |
13 | 22 | ||
14 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 23 | diff --git a/target/riscv/kvm_riscv.h b/target/riscv/kvm_riscv.h |
15 | index XXXXXXX..XXXXXXX 100644 | 24 | index XXXXXXX..XXXXXXX 100644 |
16 | --- a/target/riscv/vector_helper.c | 25 | --- a/target/riscv/kvm_riscv.h |
17 | +++ b/target/riscv/vector_helper.c | 26 | +++ b/target/riscv/kvm_riscv.h |
18 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vs1, CPURISCVState *env, \ | 27 | @@ -XXX,XX +XXX,XX @@ |
19 | uint32_t desc) \ | 28 | void kvm_riscv_init_user_properties(Object *cpu_obj); |
20 | { \ | 29 | void kvm_riscv_reset_vcpu(RISCVCPU *cpu); |
21 | uint32_t vl = env->vl; \ | 30 | void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level); |
22 | + uint32_t esz = sizeof(ETYPE); \ | 31 | +void kvm_riscv_aia_create(MachineState *machine, uint64_t group_shift, |
23 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | 32 | + uint64_t aia_irq_num, uint64_t aia_msi_num, |
24 | + uint32_t vta = vext_vta(desc); \ | 33 | + uint64_t aplic_base, uint64_t imsic_base, |
25 | uint32_t i; \ | 34 | + uint64_t guest_num); |
26 | \ | 35 | |
27 | for (i = env->vstart; i < vl; i++) { \ | 36 | #endif |
28 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vs1, CPURISCVState *env, \ | 37 | diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c |
29 | *((ETYPE *)vd + H(i)) = s1; \ | 38 | index XXXXXXX..XXXXXXX 100644 |
30 | } \ | 39 | --- a/target/riscv/kvm.c |
31 | env->vstart = 0; \ | 40 | +++ b/target/riscv/kvm.c |
32 | + /* set tail elements to 1s */ \ | 41 | @@ -XXX,XX +XXX,XX @@ |
33 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | 42 | #include "exec/address-spaces.h" |
43 | #include "hw/boards.h" | ||
44 | #include "hw/irq.h" | ||
45 | +#include "hw/intc/riscv_imsic.h" | ||
46 | #include "qemu/log.h" | ||
47 | #include "hw/loader.h" | ||
48 | #include "kvm_riscv.h" | ||
49 | @@ -XXX,XX +XXX,XX @@ | ||
50 | #include "chardev/char-fe.h" | ||
51 | #include "migration/migration.h" | ||
52 | #include "sysemu/runstate.h" | ||
53 | +#include "hw/riscv/numa.h" | ||
54 | |||
55 | static uint64_t kvm_riscv_reg_id(CPURISCVState *env, uint64_t type, | ||
56 | uint64_t idx) | ||
57 | @@ -XXX,XX +XXX,XX @@ bool kvm_arch_cpu_check_are_resettable(void) | ||
58 | return true; | ||
34 | } | 59 | } |
35 | 60 | ||
36 | GEN_VEXT_VMV_VV(vmv_v_v_b, int8_t, H1) | 61 | +static int aia_mode; |
37 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, uint64_t s1, CPURISCVState *env, \ | 62 | + |
38 | uint32_t desc) \ | 63 | +static const char *kvm_aia_mode_str(uint64_t mode) |
39 | { \ | 64 | +{ |
40 | uint32_t vl = env->vl; \ | 65 | + switch (mode) { |
41 | + uint32_t esz = sizeof(ETYPE); \ | 66 | + case KVM_DEV_RISCV_AIA_MODE_EMUL: |
42 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | 67 | + return "emul"; |
43 | + uint32_t vta = vext_vta(desc); \ | 68 | + case KVM_DEV_RISCV_AIA_MODE_HWACCEL: |
44 | uint32_t i; \ | 69 | + return "hwaccel"; |
45 | \ | 70 | + case KVM_DEV_RISCV_AIA_MODE_AUTO: |
46 | for (i = env->vstart; i < vl; i++) { \ | 71 | + default: |
47 | *((ETYPE *)vd + H(i)) = (ETYPE)s1; \ | 72 | + return "auto"; |
48 | } \ | 73 | + }; |
49 | env->vstart = 0; \ | 74 | +} |
50 | + /* set tail elements to 1s */ \ | 75 | + |
51 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | 76 | +static char *riscv_get_kvm_aia(Object *obj, Error **errp) |
77 | +{ | ||
78 | + return g_strdup(kvm_aia_mode_str(aia_mode)); | ||
79 | +} | ||
80 | + | ||
81 | +static void riscv_set_kvm_aia(Object *obj, const char *val, Error **errp) | ||
82 | +{ | ||
83 | + if (!strcmp(val, "emul")) { | ||
84 | + aia_mode = KVM_DEV_RISCV_AIA_MODE_EMUL; | ||
85 | + } else if (!strcmp(val, "hwaccel")) { | ||
86 | + aia_mode = KVM_DEV_RISCV_AIA_MODE_HWACCEL; | ||
87 | + } else if (!strcmp(val, "auto")) { | ||
88 | + aia_mode = KVM_DEV_RISCV_AIA_MODE_AUTO; | ||
89 | + } else { | ||
90 | + error_setg(errp, "Invalid KVM AIA mode"); | ||
91 | + error_append_hint(errp, "Valid values are emul, hwaccel, and auto.\n"); | ||
92 | + } | ||
93 | +} | ||
94 | + | ||
95 | void kvm_arch_accel_class_init(ObjectClass *oc) | ||
96 | { | ||
97 | + object_class_property_add_str(oc, "riscv-aia", riscv_get_kvm_aia, | ||
98 | + riscv_set_kvm_aia); | ||
99 | + object_class_property_set_description(oc, "riscv-aia", | ||
100 | + "Set KVM AIA mode. Valid values are " | ||
101 | + "emul, hwaccel, and auto. Default " | ||
102 | + "is auto."); | ||
103 | + object_property_set_default_str(object_class_property_find(oc, "riscv-aia"), | ||
104 | + "auto"); | ||
105 | +} | ||
106 | + | ||
107 | +void kvm_riscv_aia_create(MachineState *machine, uint64_t group_shift, | ||
108 | + uint64_t aia_irq_num, uint64_t aia_msi_num, | ||
109 | + uint64_t aplic_base, uint64_t imsic_base, | ||
110 | + uint64_t guest_num) | ||
111 | +{ | ||
112 | + int ret, i; | ||
113 | + int aia_fd = -1; | ||
114 | + uint64_t default_aia_mode; | ||
115 | + uint64_t socket_count = riscv_socket_count(machine); | ||
116 | + uint64_t max_hart_per_socket = 0; | ||
117 | + uint64_t socket, base_hart, hart_count, socket_imsic_base, imsic_addr; | ||
118 | + uint64_t socket_bits, hart_bits, guest_bits; | ||
119 | + | ||
120 | + aia_fd = kvm_create_device(kvm_state, KVM_DEV_TYPE_RISCV_AIA, false); | ||
121 | + | ||
122 | + if (aia_fd < 0) { | ||
123 | + error_report("Unable to create in-kernel irqchip"); | ||
124 | + exit(1); | ||
125 | + } | ||
126 | + | ||
127 | + ret = kvm_device_access(aia_fd, KVM_DEV_RISCV_AIA_GRP_CONFIG, | ||
128 | + KVM_DEV_RISCV_AIA_CONFIG_MODE, | ||
129 | + &default_aia_mode, false, NULL); | ||
130 | + if (ret < 0) { | ||
131 | + error_report("KVM AIA: failed to get current KVM AIA mode"); | ||
132 | + exit(1); | ||
133 | + } | ||
134 | + qemu_log("KVM AIA: default mode is %s\n", | ||
135 | + kvm_aia_mode_str(default_aia_mode)); | ||
136 | + | ||
137 | + if (default_aia_mode != aia_mode) { | ||
138 | + ret = kvm_device_access(aia_fd, KVM_DEV_RISCV_AIA_GRP_CONFIG, | ||
139 | + KVM_DEV_RISCV_AIA_CONFIG_MODE, | ||
140 | + &aia_mode, true, NULL); | ||
141 | + if (ret < 0) | ||
142 | + warn_report("KVM AIA: failed to set KVM AIA mode"); | ||
143 | + else | ||
144 | + qemu_log("KVM AIA: set current mode to %s\n", | ||
145 | + kvm_aia_mode_str(aia_mode)); | ||
146 | + } | ||
147 | + | ||
148 | + ret = kvm_device_access(aia_fd, KVM_DEV_RISCV_AIA_GRP_CONFIG, | ||
149 | + KVM_DEV_RISCV_AIA_CONFIG_SRCS, | ||
150 | + &aia_irq_num, true, NULL); | ||
151 | + if (ret < 0) { | ||
152 | + error_report("KVM AIA: failed to set number of input irq lines"); | ||
153 | + exit(1); | ||
154 | + } | ||
155 | + | ||
156 | + ret = kvm_device_access(aia_fd, KVM_DEV_RISCV_AIA_GRP_CONFIG, | ||
157 | + KVM_DEV_RISCV_AIA_CONFIG_IDS, | ||
158 | + &aia_msi_num, true, NULL); | ||
159 | + if (ret < 0) { | ||
160 | + error_report("KVM AIA: failed to set number of msi"); | ||
161 | + exit(1); | ||
162 | + } | ||
163 | + | ||
164 | + socket_bits = find_last_bit(&socket_count, BITS_PER_LONG) + 1; | ||
165 | + ret = kvm_device_access(aia_fd, KVM_DEV_RISCV_AIA_GRP_CONFIG, | ||
166 | + KVM_DEV_RISCV_AIA_CONFIG_GROUP_BITS, | ||
167 | + &socket_bits, true, NULL); | ||
168 | + if (ret < 0) { | ||
169 | + error_report("KVM AIA: failed to set group_bits"); | ||
170 | + exit(1); | ||
171 | + } | ||
172 | + | ||
173 | + ret = kvm_device_access(aia_fd, KVM_DEV_RISCV_AIA_GRP_CONFIG, | ||
174 | + KVM_DEV_RISCV_AIA_CONFIG_GROUP_SHIFT, | ||
175 | + &group_shift, true, NULL); | ||
176 | + if (ret < 0) { | ||
177 | + error_report("KVM AIA: failed to set group_shift"); | ||
178 | + exit(1); | ||
179 | + } | ||
180 | + | ||
181 | + guest_bits = guest_num == 0 ? 0 : | ||
182 | + find_last_bit(&guest_num, BITS_PER_LONG) + 1; | ||
183 | + ret = kvm_device_access(aia_fd, KVM_DEV_RISCV_AIA_GRP_CONFIG, | ||
184 | + KVM_DEV_RISCV_AIA_CONFIG_GUEST_BITS, | ||
185 | + &guest_bits, true, NULL); | ||
186 | + if (ret < 0) { | ||
187 | + error_report("KVM AIA: failed to set guest_bits"); | ||
188 | + exit(1); | ||
189 | + } | ||
190 | + | ||
191 | + ret = kvm_device_access(aia_fd, KVM_DEV_RISCV_AIA_GRP_ADDR, | ||
192 | + KVM_DEV_RISCV_AIA_ADDR_APLIC, | ||
193 | + &aplic_base, true, NULL); | ||
194 | + if (ret < 0) { | ||
195 | + error_report("KVM AIA: failed to set the base address of APLIC"); | ||
196 | + exit(1); | ||
197 | + } | ||
198 | + | ||
199 | + for (socket = 0; socket < socket_count; socket++) { | ||
200 | + socket_imsic_base = imsic_base + socket * (1U << group_shift); | ||
201 | + hart_count = riscv_socket_hart_count(machine, socket); | ||
202 | + base_hart = riscv_socket_first_hartid(machine, socket); | ||
203 | + | ||
204 | + if (max_hart_per_socket < hart_count) { | ||
205 | + max_hart_per_socket = hart_count; | ||
206 | + } | ||
207 | + | ||
208 | + for (i = 0; i < hart_count; i++) { | ||
209 | + imsic_addr = socket_imsic_base + i * IMSIC_HART_SIZE(guest_bits); | ||
210 | + ret = kvm_device_access(aia_fd, KVM_DEV_RISCV_AIA_GRP_ADDR, | ||
211 | + KVM_DEV_RISCV_AIA_ADDR_IMSIC(i + base_hart), | ||
212 | + &imsic_addr, true, NULL); | ||
213 | + if (ret < 0) { | ||
214 | + error_report("KVM AIA: failed to set the IMSIC address for hart %d", i); | ||
215 | + exit(1); | ||
216 | + } | ||
217 | + } | ||
218 | + } | ||
219 | + | ||
220 | + hart_bits = find_last_bit(&max_hart_per_socket, BITS_PER_LONG) + 1; | ||
221 | + ret = kvm_device_access(aia_fd, KVM_DEV_RISCV_AIA_GRP_CONFIG, | ||
222 | + KVM_DEV_RISCV_AIA_CONFIG_HART_BITS, | ||
223 | + &hart_bits, true, NULL); | ||
224 | + if (ret < 0) { | ||
225 | + error_report("KVM AIA: failed to set hart_bits"); | ||
226 | + exit(1); | ||
227 | + } | ||
228 | + | ||
229 | + if (kvm_has_gsi_routing()) { | ||
230 | + for (uint64_t idx = 0; idx < aia_irq_num + 1; ++idx) { | ||
231 | + /* KVM AIA only has one APLIC instance */ | ||
232 | + kvm_irqchip_add_irq_route(kvm_state, idx, 0, idx); | ||
233 | + } | ||
234 | + kvm_gsi_routing_allowed = true; | ||
235 | + kvm_irqchip_commit_routes(kvm_state); | ||
236 | + } | ||
237 | + | ||
238 | + ret = kvm_device_access(aia_fd, KVM_DEV_RISCV_AIA_GRP_CTRL, | ||
239 | + KVM_DEV_RISCV_AIA_CTRL_INIT, | ||
240 | + NULL, true, NULL); | ||
241 | + if (ret < 0) { | ||
242 | + error_report("KVM AIA: initialized fail"); | ||
243 | + exit(1); | ||
244 | + } | ||
245 | + | ||
246 | + kvm_msi_via_irqfd_allowed = kvm_irqfds_enabled(); | ||
52 | } | 247 | } |
53 | |||
54 | GEN_VEXT_VMV_VX(vmv_v_x_b, int8_t, H1) | ||
55 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | ||
56 | CPURISCVState *env, uint32_t desc) \ | ||
57 | { \ | ||
58 | uint32_t vl = env->vl; \ | ||
59 | + uint32_t esz = sizeof(ETYPE); \ | ||
60 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
61 | + uint32_t vta = vext_vta(desc); \ | ||
62 | uint32_t i; \ | ||
63 | \ | ||
64 | for (i = env->vstart; i < vl; i++) { \ | ||
65 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | ||
66 | *((ETYPE *)vd + H(i)) = *(vt + H(i)); \ | ||
67 | } \ | ||
68 | env->vstart = 0; \ | ||
69 | + /* set tail elements to 1s */ \ | ||
70 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
71 | } | ||
72 | |||
73 | GEN_VEXT_VMERGE_VV(vmerge_vvm_b, int8_t, H1) | ||
74 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | ||
75 | void *vs2, CPURISCVState *env, uint32_t desc) \ | ||
76 | { \ | ||
77 | uint32_t vl = env->vl; \ | ||
78 | + uint32_t esz = sizeof(ETYPE); \ | ||
79 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
80 | + uint32_t vta = vext_vta(desc); \ | ||
81 | uint32_t i; \ | ||
82 | \ | ||
83 | for (i = env->vstart; i < vl; i++) { \ | ||
84 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | ||
85 | *((ETYPE *)vd + H(i)) = d; \ | ||
86 | } \ | ||
87 | env->vstart = 0; \ | ||
88 | + /* set tail elements to 1s */ \ | ||
89 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
90 | } | ||
91 | |||
92 | GEN_VEXT_VMERGE_VX(vmerge_vxm_b, int8_t, H1) | ||
93 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc | ||
94 | index XXXXXXX..XXXXXXX 100644 | ||
95 | --- a/target/riscv/insn_trans/trans_rvv.c.inc | ||
96 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc | ||
97 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_v_v(DisasContext *s, arg_vmv_v_v *a) | ||
98 | vext_check_isa_ill(s) && | ||
99 | /* vmv.v.v has rs2 = 0 and vm = 1 */ | ||
100 | vext_check_sss(s, a->rd, a->rs1, 0, 1)) { | ||
101 | - if (s->vl_eq_vlmax) { | ||
102 | + if (s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | ||
103 | tcg_gen_gvec_mov(s->sew, vreg_ofs(s, a->rd), | ||
104 | vreg_ofs(s, a->rs1), | ||
105 | MAXSZ(s), MAXSZ(s)); | ||
106 | } else { | ||
107 | uint32_t data = FIELD_DP32(0, VDATA, LMUL, s->lmul); | ||
108 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
109 | static gen_helper_gvec_2_ptr * const fns[4] = { | ||
110 | gen_helper_vmv_v_v_b, gen_helper_vmv_v_v_h, | ||
111 | gen_helper_vmv_v_v_w, gen_helper_vmv_v_v_d, | ||
112 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_v_x(DisasContext *s, arg_vmv_v_x *a) | ||
113 | |||
114 | s1 = get_gpr(s, a->rs1, EXT_SIGN); | ||
115 | |||
116 | - if (s->vl_eq_vlmax) { | ||
117 | + if (s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | ||
118 | tcg_gen_gvec_dup_tl(s->sew, vreg_ofs(s, a->rd), | ||
119 | MAXSZ(s), MAXSZ(s), s1); | ||
120 | } else { | ||
121 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_v_x(DisasContext *s, arg_vmv_v_x *a) | ||
122 | TCGv_i64 s1_i64 = tcg_temp_new_i64(); | ||
123 | TCGv_ptr dest = tcg_temp_new_ptr(); | ||
124 | uint32_t data = FIELD_DP32(0, VDATA, LMUL, s->lmul); | ||
125 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
126 | static gen_helper_vmv_vx * const fns[4] = { | ||
127 | gen_helper_vmv_v_x_b, gen_helper_vmv_v_x_h, | ||
128 | gen_helper_vmv_v_x_w, gen_helper_vmv_v_x_d, | ||
129 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a) | ||
130 | /* vmv.v.i has rs2 = 0 and vm = 1 */ | ||
131 | vext_check_ss(s, a->rd, 0, 1)) { | ||
132 | int64_t simm = sextract64(a->rs1, 0, 5); | ||
133 | - if (s->vl_eq_vlmax) { | ||
134 | + if (s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | ||
135 | tcg_gen_gvec_dup_imm(s->sew, vreg_ofs(s, a->rd), | ||
136 | MAXSZ(s), MAXSZ(s), simm); | ||
137 | mark_vs_dirty(s); | ||
138 | @@ -XXX,XX +XXX,XX @@ static bool trans_vmv_v_i(DisasContext *s, arg_vmv_v_i *a) | ||
139 | TCGv_i64 s1; | ||
140 | TCGv_ptr dest; | ||
141 | uint32_t data = FIELD_DP32(0, VDATA, LMUL, s->lmul); | ||
142 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
143 | static gen_helper_vmv_vx * const fns[4] = { | ||
144 | gen_helper_vmv_v_x_b, gen_helper_vmv_v_x_h, | ||
145 | gen_helper_vmv_v_x_w, gen_helper_vmv_v_x_d, | ||
146 | @@ -XXX,XX +XXX,XX @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f *a) | ||
147 | |||
148 | TCGv_i64 t1; | ||
149 | |||
150 | - if (s->vl_eq_vlmax) { | ||
151 | + if (s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | ||
152 | t1 = tcg_temp_new_i64(); | ||
153 | /* NaN-box f[rs1] */ | ||
154 | do_nanbox(s, t1, cpu_fpr[a->rs1]); | ||
155 | @@ -XXX,XX +XXX,XX @@ static bool trans_vfmv_v_f(DisasContext *s, arg_vfmv_v_f *a) | ||
156 | TCGv_ptr dest; | ||
157 | TCGv_i32 desc; | ||
158 | uint32_t data = FIELD_DP32(0, VDATA, LMUL, s->lmul); | ||
159 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
160 | static gen_helper_vmv_vx * const fns[3] = { | ||
161 | gen_helper_vmv_v_x_h, | ||
162 | gen_helper_vmv_v_x_w, | ||
163 | -- | 248 | -- |
164 | 2.36.1 | 249 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Yong-Xuan Wang <yongxuan.wang@sifive.com> |
---|---|---|---|
2 | 2 | ||
3 | `vmadc` and `vmsbc` produces a mask value, they always operate with | 3 | KVM AIA can't emulate APLIC only. When "aia=aplic" parameter is passed, |
4 | a tail agnostic policy. | 4 | APLIC devices is emulated by QEMU. For "aia=aplic-imsic", remove the |
5 | mmio operations of APLIC when using KVM AIA and send wired interrupt | ||
6 | signal via KVM_IRQ_LINE API. | ||
7 | After KVM AIA enabled, MSI messages are delivered by KVM_SIGNAL_MSI API | ||
8 | when the IMSICs receive mmio write requests. | ||
5 | 9 | ||
6 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 10 | Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> |
7 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | 11 | Reviewed-by: Jim Shu <jim.shu@sifive.com> |
8 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 12 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> |
9 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | 13 | Reviewed-by: Andrew Jones <ajones@ventanamicro.com> |
10 | Message-Id: <165449614532.19704.7000832880482980398-7@git.sr.ht> | 14 | Message-ID: <20230727102439.22554-5-yongxuan.wang@sifive.com> |
11 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 15 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
12 | --- | 16 | --- |
13 | target/riscv/internals.h | 5 +- | 17 | hw/intc/riscv_aplic.c | 56 ++++++++++++++++++++++++++++++------------- |
14 | target/riscv/vector_helper.c | 314 +++++++++++++----------- | 18 | hw/intc/riscv_imsic.c | 25 +++++++++++++++---- |
15 | target/riscv/insn_trans/trans_rvv.c.inc | 13 +- | 19 | 2 files changed, 61 insertions(+), 20 deletions(-) |
16 | 3 files changed, 190 insertions(+), 142 deletions(-) | ||
17 | 20 | ||
18 | diff --git a/target/riscv/internals.h b/target/riscv/internals.h | 21 | diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c |
19 | index XXXXXXX..XXXXXXX 100644 | 22 | index XXXXXXX..XXXXXXX 100644 |
20 | --- a/target/riscv/internals.h | 23 | --- a/hw/intc/riscv_aplic.c |
21 | +++ b/target/riscv/internals.h | 24 | +++ b/hw/intc/riscv_aplic.c |
22 | @@ -XXX,XX +XXX,XX @@ | 25 | @@ -XXX,XX +XXX,XX @@ |
23 | FIELD(VDATA, VM, 0, 1) | 26 | #include "hw/irq.h" |
24 | FIELD(VDATA, LMUL, 1, 3) | 27 | #include "target/riscv/cpu.h" |
25 | FIELD(VDATA, VTA, 4, 1) | 28 | #include "sysemu/sysemu.h" |
26 | -FIELD(VDATA, NF, 5, 4) | 29 | +#include "sysemu/kvm.h" |
27 | -FIELD(VDATA, WD, 5, 1) | 30 | #include "migration/vmstate.h" |
28 | +FIELD(VDATA, VTA_ALL_1S, 5, 1) | 31 | |
29 | +FIELD(VDATA, NF, 6, 4) | 32 | #define APLIC_MAX_IDC (1UL << 14) |
30 | +FIELD(VDATA, WD, 6, 1) | 33 | @@ -XXX,XX +XXX,XX @@ |
31 | 34 | ||
32 | /* float point classify helpers */ | 35 | #define APLIC_IDC_CLAIMI 0x1c |
33 | target_ulong fclass_h(uint64_t frs1); | 36 | |
34 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 37 | +/* |
35 | index XXXXXXX..XXXXXXX 100644 | 38 | + * KVM AIA only supports APLIC MSI, fallback to QEMU emulation if we want to use |
36 | --- a/target/riscv/vector_helper.c | 39 | + * APLIC Wired. |
37 | +++ b/target/riscv/vector_helper.c | 40 | + */ |
38 | @@ -XXX,XX +XXX,XX @@ static inline uint32_t vext_vta(uint32_t desc) | 41 | +static bool is_kvm_aia(bool msimode) |
39 | return FIELD_EX32(simd_data(desc), VDATA, VTA); | ||
40 | } | ||
41 | |||
42 | +static inline uint32_t vext_vta_all_1s(uint32_t desc) | ||
43 | +{ | 42 | +{ |
44 | + return FIELD_EX32(simd_data(desc), VDATA, VTA_ALL_1S); | 43 | + return kvm_irqchip_in_kernel() && msimode; |
45 | +} | 44 | +} |
46 | + | 45 | + |
47 | /* | 46 | static uint32_t riscv_aplic_read_input_word(RISCVAPLICState *aplic, |
48 | * Get the maximum number of elements can be operated. | 47 | uint32_t word) |
49 | * | ||
50 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vrsub_vx_d, OP_SSS_D, H8, H8, DO_RSUB) | ||
51 | |||
52 | static void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2, | ||
53 | CPURISCVState *env, uint32_t desc, | ||
54 | - opivx2_fn fn) | ||
55 | + opivx2_fn fn, uint32_t esz) | ||
56 | { | 48 | { |
57 | uint32_t vm = vext_vm(desc); | 49 | @@ -XXX,XX +XXX,XX @@ static uint32_t riscv_aplic_idc_claimi(RISCVAPLICState *aplic, uint32_t idc) |
58 | uint32_t vl = env->vl; | 50 | return topi; |
59 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); | ||
60 | + uint32_t vta = vext_vta(desc); | ||
61 | uint32_t i; | ||
62 | |||
63 | for (i = env->vstart; i < vl; i++) { | ||
64 | @@ -XXX,XX +XXX,XX @@ static void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2, | ||
65 | fn(vd, s1, vs2, i); | ||
66 | } | ||
67 | env->vstart = 0; | ||
68 | + /* set tail elements to 1s */ | ||
69 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); | ||
70 | } | 51 | } |
71 | 52 | ||
72 | /* generate the helpers for OPIVX */ | 53 | +static void riscv_kvm_aplic_request(void *opaque, int irq, int level) |
73 | -#define GEN_VEXT_VX(NAME) \ | 54 | +{ |
74 | +#define GEN_VEXT_VX(NAME, ESZ) \ | 55 | + kvm_set_irq(kvm_state, irq, !!level); |
75 | void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | ||
76 | void *vs2, CPURISCVState *env, \ | ||
77 | uint32_t desc) \ | ||
78 | { \ | ||
79 | do_vext_vx(vd, v0, s1, vs2, env, desc, \ | ||
80 | - do_##NAME); \ | ||
81 | -} | ||
82 | - | ||
83 | -GEN_VEXT_VX(vadd_vx_b) | ||
84 | -GEN_VEXT_VX(vadd_vx_h) | ||
85 | -GEN_VEXT_VX(vadd_vx_w) | ||
86 | -GEN_VEXT_VX(vadd_vx_d) | ||
87 | -GEN_VEXT_VX(vsub_vx_b) | ||
88 | -GEN_VEXT_VX(vsub_vx_h) | ||
89 | -GEN_VEXT_VX(vsub_vx_w) | ||
90 | -GEN_VEXT_VX(vsub_vx_d) | ||
91 | -GEN_VEXT_VX(vrsub_vx_b) | ||
92 | -GEN_VEXT_VX(vrsub_vx_h) | ||
93 | -GEN_VEXT_VX(vrsub_vx_w) | ||
94 | -GEN_VEXT_VX(vrsub_vx_d) | ||
95 | + do_##NAME, ESZ); \ | ||
96 | +} | 56 | +} |
97 | + | 57 | + |
98 | +GEN_VEXT_VX(vadd_vx_b, 1) | 58 | static void riscv_aplic_request(void *opaque, int irq, int level) |
99 | +GEN_VEXT_VX(vadd_vx_h, 2) | ||
100 | +GEN_VEXT_VX(vadd_vx_w, 4) | ||
101 | +GEN_VEXT_VX(vadd_vx_d, 8) | ||
102 | +GEN_VEXT_VX(vsub_vx_b, 1) | ||
103 | +GEN_VEXT_VX(vsub_vx_h, 2) | ||
104 | +GEN_VEXT_VX(vsub_vx_w, 4) | ||
105 | +GEN_VEXT_VX(vsub_vx_d, 8) | ||
106 | +GEN_VEXT_VX(vrsub_vx_b, 1) | ||
107 | +GEN_VEXT_VX(vrsub_vx_h, 2) | ||
108 | +GEN_VEXT_VX(vrsub_vx_w, 4) | ||
109 | +GEN_VEXT_VX(vrsub_vx_d, 8) | ||
110 | |||
111 | void HELPER(vec_rsubs8)(void *d, void *a, uint64_t b, uint32_t desc) | ||
112 | { | 59 | { |
113 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vwadd_wx_w, WOP_WSSS_W, H8, H4, DO_ADD) | 60 | bool update = false; |
114 | RVVCALL(OPIVX2, vwsub_wx_b, WOP_WSSS_B, H2, H1, DO_SUB) | 61 | @@ -XXX,XX +XXX,XX @@ static void riscv_aplic_realize(DeviceState *dev, Error **errp) |
115 | RVVCALL(OPIVX2, vwsub_wx_h, WOP_WSSS_H, H4, H2, DO_SUB) | 62 | uint32_t i; |
116 | RVVCALL(OPIVX2, vwsub_wx_w, WOP_WSSS_W, H8, H4, DO_SUB) | 63 | RISCVAPLICState *aplic = RISCV_APLIC(dev); |
117 | -GEN_VEXT_VX(vwaddu_vx_b) | 64 | |
118 | -GEN_VEXT_VX(vwaddu_vx_h) | 65 | - aplic->bitfield_words = (aplic->num_irqs + 31) >> 5; |
119 | -GEN_VEXT_VX(vwaddu_vx_w) | 66 | - aplic->sourcecfg = g_new0(uint32_t, aplic->num_irqs); |
120 | -GEN_VEXT_VX(vwsubu_vx_b) | 67 | - aplic->state = g_new0(uint32_t, aplic->num_irqs); |
121 | -GEN_VEXT_VX(vwsubu_vx_h) | 68 | - aplic->target = g_new0(uint32_t, aplic->num_irqs); |
122 | -GEN_VEXT_VX(vwsubu_vx_w) | 69 | - if (!aplic->msimode) { |
123 | -GEN_VEXT_VX(vwadd_vx_b) | 70 | - for (i = 0; i < aplic->num_irqs; i++) { |
124 | -GEN_VEXT_VX(vwadd_vx_h) | 71 | - aplic->target[i] = 1; |
125 | -GEN_VEXT_VX(vwadd_vx_w) | 72 | + if (!is_kvm_aia(aplic->msimode)) { |
126 | -GEN_VEXT_VX(vwsub_vx_b) | 73 | + aplic->bitfield_words = (aplic->num_irqs + 31) >> 5; |
127 | -GEN_VEXT_VX(vwsub_vx_h) | 74 | + aplic->sourcecfg = g_new0(uint32_t, aplic->num_irqs); |
128 | -GEN_VEXT_VX(vwsub_vx_w) | 75 | + aplic->state = g_new0(uint32_t, aplic->num_irqs); |
129 | -GEN_VEXT_VX(vwaddu_wx_b) | 76 | + aplic->target = g_new0(uint32_t, aplic->num_irqs); |
130 | -GEN_VEXT_VX(vwaddu_wx_h) | 77 | + if (!aplic->msimode) { |
131 | -GEN_VEXT_VX(vwaddu_wx_w) | 78 | + for (i = 0; i < aplic->num_irqs; i++) { |
132 | -GEN_VEXT_VX(vwsubu_wx_b) | 79 | + aplic->target[i] = 1; |
133 | -GEN_VEXT_VX(vwsubu_wx_h) | 80 | + } |
134 | -GEN_VEXT_VX(vwsubu_wx_w) | 81 | } |
135 | -GEN_VEXT_VX(vwadd_wx_b) | 82 | - } |
136 | -GEN_VEXT_VX(vwadd_wx_h) | 83 | - aplic->idelivery = g_new0(uint32_t, aplic->num_harts); |
137 | -GEN_VEXT_VX(vwadd_wx_w) | 84 | - aplic->iforce = g_new0(uint32_t, aplic->num_harts); |
138 | -GEN_VEXT_VX(vwsub_wx_b) | 85 | - aplic->ithreshold = g_new0(uint32_t, aplic->num_harts); |
139 | -GEN_VEXT_VX(vwsub_wx_h) | 86 | + aplic->idelivery = g_new0(uint32_t, aplic->num_harts); |
140 | -GEN_VEXT_VX(vwsub_wx_w) | 87 | + aplic->iforce = g_new0(uint32_t, aplic->num_harts); |
141 | +GEN_VEXT_VX(vwaddu_vx_b, 2) | 88 | + aplic->ithreshold = g_new0(uint32_t, aplic->num_harts); |
142 | +GEN_VEXT_VX(vwaddu_vx_h, 4) | 89 | |
143 | +GEN_VEXT_VX(vwaddu_vx_w, 8) | 90 | - memory_region_init_io(&aplic->mmio, OBJECT(dev), &riscv_aplic_ops, aplic, |
144 | +GEN_VEXT_VX(vwsubu_vx_b, 2) | 91 | - TYPE_RISCV_APLIC, aplic->aperture_size); |
145 | +GEN_VEXT_VX(vwsubu_vx_h, 4) | 92 | - sysbus_init_mmio(SYS_BUS_DEVICE(dev), &aplic->mmio); |
146 | +GEN_VEXT_VX(vwsubu_vx_w, 8) | 93 | + memory_region_init_io(&aplic->mmio, OBJECT(dev), &riscv_aplic_ops, |
147 | +GEN_VEXT_VX(vwadd_vx_b, 2) | 94 | + aplic, TYPE_RISCV_APLIC, aplic->aperture_size); |
148 | +GEN_VEXT_VX(vwadd_vx_h, 4) | 95 | + sysbus_init_mmio(SYS_BUS_DEVICE(dev), &aplic->mmio); |
149 | +GEN_VEXT_VX(vwadd_vx_w, 8) | 96 | + } |
150 | +GEN_VEXT_VX(vwsub_vx_b, 2) | 97 | |
151 | +GEN_VEXT_VX(vwsub_vx_h, 4) | 98 | /* |
152 | +GEN_VEXT_VX(vwsub_vx_w, 8) | 99 | * Only root APLICs have hardware IRQ lines. All non-root APLICs |
153 | +GEN_VEXT_VX(vwaddu_wx_b, 2) | 100 | * have IRQ lines delegated by their parent APLIC. |
154 | +GEN_VEXT_VX(vwaddu_wx_h, 4) | 101 | */ |
155 | +GEN_VEXT_VX(vwaddu_wx_w, 8) | 102 | if (!aplic->parent) { |
156 | +GEN_VEXT_VX(vwsubu_wx_b, 2) | 103 | - qdev_init_gpio_in(dev, riscv_aplic_request, aplic->num_irqs); |
157 | +GEN_VEXT_VX(vwsubu_wx_h, 4) | 104 | + if (is_kvm_aia(aplic->msimode)) { |
158 | +GEN_VEXT_VX(vwsubu_wx_w, 8) | 105 | + qdev_init_gpio_in(dev, riscv_kvm_aplic_request, aplic->num_irqs); |
159 | +GEN_VEXT_VX(vwadd_wx_b, 2) | 106 | + } else { |
160 | +GEN_VEXT_VX(vwadd_wx_h, 4) | 107 | + qdev_init_gpio_in(dev, riscv_aplic_request, aplic->num_irqs); |
161 | +GEN_VEXT_VX(vwadd_wx_w, 8) | 108 | + } |
162 | +GEN_VEXT_VX(vwsub_wx_b, 2) | 109 | } |
163 | +GEN_VEXT_VX(vwsub_wx_h, 4) | 110 | |
164 | +GEN_VEXT_VX(vwsub_wx_w, 8) | 111 | /* Create output IRQ lines for non-MSI mode */ |
165 | 112 | @@ -XXX,XX +XXX,XX @@ DeviceState *riscv_aplic_create(hwaddr addr, hwaddr size, | |
166 | /* Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions */ | 113 | qdev_prop_set_bit(dev, "mmode", mmode); |
167 | #define DO_VADC(N, M, C) (N + M + C) | 114 | |
168 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | 115 | sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal); |
169 | CPURISCVState *env, uint32_t desc) \ | 116 | - sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, addr); |
170 | { \ | 117 | + |
171 | uint32_t vl = env->vl; \ | 118 | + if (!is_kvm_aia(msimode)) { |
172 | + uint32_t esz = sizeof(ETYPE); \ | 119 | + sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, addr); |
173 | + uint32_t total_elems = \ | 120 | + } |
174 | + vext_get_total_elems(env, desc, esz); \ | 121 | |
175 | + uint32_t vta = vext_vta(desc); \ | 122 | if (parent) { |
176 | uint32_t i; \ | 123 | riscv_aplic_add_child(parent, dev); |
177 | \ | 124 | diff --git a/hw/intc/riscv_imsic.c b/hw/intc/riscv_imsic.c |
178 | for (i = env->vstart; i < vl; i++) { \ | ||
179 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | ||
180 | *((ETYPE *)vd + H(i)) = DO_OP(s2, s1, carry); \ | ||
181 | } \ | ||
182 | env->vstart = 0; \ | ||
183 | + /* set tail elements to 1s */ \ | ||
184 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
185 | } | ||
186 | |||
187 | GEN_VEXT_VADC_VVM(vadc_vvm_b, uint8_t, H1, DO_VADC) | ||
188 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \ | ||
189 | CPURISCVState *env, uint32_t desc) \ | ||
190 | { \ | ||
191 | uint32_t vl = env->vl; \ | ||
192 | + uint32_t esz = sizeof(ETYPE); \ | ||
193 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
194 | + uint32_t vta = vext_vta(desc); \ | ||
195 | uint32_t i; \ | ||
196 | \ | ||
197 | for (i = env->vstart; i < vl; i++) { \ | ||
198 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \ | ||
199 | *((ETYPE *)vd + H(i)) = DO_OP(s2, (ETYPE)(target_long)s1, carry);\ | ||
200 | } \ | ||
201 | env->vstart = 0; \ | ||
202 | + /* set tail elements to 1s */ \ | ||
203 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
204 | } | ||
205 | |||
206 | GEN_VEXT_VADC_VXM(vadc_vxm_b, uint8_t, H1, DO_VADC) | ||
207 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | ||
208 | { \ | ||
209 | uint32_t vl = env->vl; \ | ||
210 | uint32_t vm = vext_vm(desc); \ | ||
211 | + uint32_t total_elems = env_archcpu(env)->cfg.vlen; \ | ||
212 | + uint32_t vta_all_1s = vext_vta_all_1s(desc); \ | ||
213 | uint32_t i; \ | ||
214 | \ | ||
215 | for (i = env->vstart; i < vl; i++) { \ | ||
216 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | ||
217 | vext_set_elem_mask(vd, i, DO_OP(s2, s1, carry)); \ | ||
218 | } \ | ||
219 | env->vstart = 0; \ | ||
220 | + /* mask destination register are always tail-agnostic */ \ | ||
221 | + /* set tail elements to 1s */ \ | ||
222 | + if (vta_all_1s) { \ | ||
223 | + for (; i < total_elems; i++) { \ | ||
224 | + vext_set_elem_mask(vd, i, 1); \ | ||
225 | + } \ | ||
226 | + } \ | ||
227 | } | ||
228 | |||
229 | GEN_VEXT_VMADC_VVM(vmadc_vvm_b, uint8_t, H1, DO_MADC) | ||
230 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | ||
231 | { \ | ||
232 | uint32_t vl = env->vl; \ | ||
233 | uint32_t vm = vext_vm(desc); \ | ||
234 | + uint32_t total_elems = env_archcpu(env)->cfg.vlen; \ | ||
235 | + uint32_t vta_all_1s = vext_vta_all_1s(desc); \ | ||
236 | uint32_t i; \ | ||
237 | \ | ||
238 | for (i = env->vstart; i < vl; i++) { \ | ||
239 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, \ | ||
240 | DO_OP(s2, (ETYPE)(target_long)s1, carry)); \ | ||
241 | } \ | ||
242 | env->vstart = 0; \ | ||
243 | + /* mask destination register are always tail-agnostic */ \ | ||
244 | + /* set tail elements to 1s */ \ | ||
245 | + if (vta_all_1s) { \ | ||
246 | + for (; i < total_elems; i++) { \ | ||
247 | + vext_set_elem_mask(vd, i, 1); \ | ||
248 | + } \ | ||
249 | + } \ | ||
250 | } | ||
251 | |||
252 | GEN_VEXT_VMADC_VXM(vmadc_vxm_b, uint8_t, H1, DO_MADC) | ||
253 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vxor_vx_b, OP_SSS_B, H1, H1, DO_XOR) | ||
254 | RVVCALL(OPIVX2, vxor_vx_h, OP_SSS_H, H2, H2, DO_XOR) | ||
255 | RVVCALL(OPIVX2, vxor_vx_w, OP_SSS_W, H4, H4, DO_XOR) | ||
256 | RVVCALL(OPIVX2, vxor_vx_d, OP_SSS_D, H8, H8, DO_XOR) | ||
257 | -GEN_VEXT_VX(vand_vx_b) | ||
258 | -GEN_VEXT_VX(vand_vx_h) | ||
259 | -GEN_VEXT_VX(vand_vx_w) | ||
260 | -GEN_VEXT_VX(vand_vx_d) | ||
261 | -GEN_VEXT_VX(vor_vx_b) | ||
262 | -GEN_VEXT_VX(vor_vx_h) | ||
263 | -GEN_VEXT_VX(vor_vx_w) | ||
264 | -GEN_VEXT_VX(vor_vx_d) | ||
265 | -GEN_VEXT_VX(vxor_vx_b) | ||
266 | -GEN_VEXT_VX(vxor_vx_h) | ||
267 | -GEN_VEXT_VX(vxor_vx_w) | ||
268 | -GEN_VEXT_VX(vxor_vx_d) | ||
269 | +GEN_VEXT_VX(vand_vx_b, 1) | ||
270 | +GEN_VEXT_VX(vand_vx_h, 2) | ||
271 | +GEN_VEXT_VX(vand_vx_w, 4) | ||
272 | +GEN_VEXT_VX(vand_vx_d, 8) | ||
273 | +GEN_VEXT_VX(vor_vx_b, 1) | ||
274 | +GEN_VEXT_VX(vor_vx_h, 2) | ||
275 | +GEN_VEXT_VX(vor_vx_w, 4) | ||
276 | +GEN_VEXT_VX(vor_vx_d, 8) | ||
277 | +GEN_VEXT_VX(vxor_vx_b, 1) | ||
278 | +GEN_VEXT_VX(vxor_vx_h, 2) | ||
279 | +GEN_VEXT_VX(vxor_vx_w, 4) | ||
280 | +GEN_VEXT_VX(vxor_vx_d, 8) | ||
281 | |||
282 | /* Vector Single-Width Bit Shift Instructions */ | ||
283 | #define DO_SLL(N, M) (N << (M)) | ||
284 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vmax_vx_b, OP_SSS_B, H1, H1, DO_MAX) | ||
285 | RVVCALL(OPIVX2, vmax_vx_h, OP_SSS_H, H2, H2, DO_MAX) | ||
286 | RVVCALL(OPIVX2, vmax_vx_w, OP_SSS_W, H4, H4, DO_MAX) | ||
287 | RVVCALL(OPIVX2, vmax_vx_d, OP_SSS_D, H8, H8, DO_MAX) | ||
288 | -GEN_VEXT_VX(vminu_vx_b) | ||
289 | -GEN_VEXT_VX(vminu_vx_h) | ||
290 | -GEN_VEXT_VX(vminu_vx_w) | ||
291 | -GEN_VEXT_VX(vminu_vx_d) | ||
292 | -GEN_VEXT_VX(vmin_vx_b) | ||
293 | -GEN_VEXT_VX(vmin_vx_h) | ||
294 | -GEN_VEXT_VX(vmin_vx_w) | ||
295 | -GEN_VEXT_VX(vmin_vx_d) | ||
296 | -GEN_VEXT_VX(vmaxu_vx_b) | ||
297 | -GEN_VEXT_VX(vmaxu_vx_h) | ||
298 | -GEN_VEXT_VX(vmaxu_vx_w) | ||
299 | -GEN_VEXT_VX(vmaxu_vx_d) | ||
300 | -GEN_VEXT_VX(vmax_vx_b) | ||
301 | -GEN_VEXT_VX(vmax_vx_h) | ||
302 | -GEN_VEXT_VX(vmax_vx_w) | ||
303 | -GEN_VEXT_VX(vmax_vx_d) | ||
304 | +GEN_VEXT_VX(vminu_vx_b, 1) | ||
305 | +GEN_VEXT_VX(vminu_vx_h, 2) | ||
306 | +GEN_VEXT_VX(vminu_vx_w, 4) | ||
307 | +GEN_VEXT_VX(vminu_vx_d, 8) | ||
308 | +GEN_VEXT_VX(vmin_vx_b, 1) | ||
309 | +GEN_VEXT_VX(vmin_vx_h, 2) | ||
310 | +GEN_VEXT_VX(vmin_vx_w, 4) | ||
311 | +GEN_VEXT_VX(vmin_vx_d, 8) | ||
312 | +GEN_VEXT_VX(vmaxu_vx_b, 1) | ||
313 | +GEN_VEXT_VX(vmaxu_vx_h, 2) | ||
314 | +GEN_VEXT_VX(vmaxu_vx_w, 4) | ||
315 | +GEN_VEXT_VX(vmaxu_vx_d, 8) | ||
316 | +GEN_VEXT_VX(vmax_vx_b, 1) | ||
317 | +GEN_VEXT_VX(vmax_vx_h, 2) | ||
318 | +GEN_VEXT_VX(vmax_vx_w, 4) | ||
319 | +GEN_VEXT_VX(vmax_vx_d, 8) | ||
320 | |||
321 | /* Vector Single-Width Integer Multiply Instructions */ | ||
322 | #define DO_MUL(N, M) (N * M) | ||
323 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vmulhsu_vx_b, OP_SUS_B, H1, H1, do_mulhsu_b) | ||
324 | RVVCALL(OPIVX2, vmulhsu_vx_h, OP_SUS_H, H2, H2, do_mulhsu_h) | ||
325 | RVVCALL(OPIVX2, vmulhsu_vx_w, OP_SUS_W, H4, H4, do_mulhsu_w) | ||
326 | RVVCALL(OPIVX2, vmulhsu_vx_d, OP_SUS_D, H8, H8, do_mulhsu_d) | ||
327 | -GEN_VEXT_VX(vmul_vx_b) | ||
328 | -GEN_VEXT_VX(vmul_vx_h) | ||
329 | -GEN_VEXT_VX(vmul_vx_w) | ||
330 | -GEN_VEXT_VX(vmul_vx_d) | ||
331 | -GEN_VEXT_VX(vmulh_vx_b) | ||
332 | -GEN_VEXT_VX(vmulh_vx_h) | ||
333 | -GEN_VEXT_VX(vmulh_vx_w) | ||
334 | -GEN_VEXT_VX(vmulh_vx_d) | ||
335 | -GEN_VEXT_VX(vmulhu_vx_b) | ||
336 | -GEN_VEXT_VX(vmulhu_vx_h) | ||
337 | -GEN_VEXT_VX(vmulhu_vx_w) | ||
338 | -GEN_VEXT_VX(vmulhu_vx_d) | ||
339 | -GEN_VEXT_VX(vmulhsu_vx_b) | ||
340 | -GEN_VEXT_VX(vmulhsu_vx_h) | ||
341 | -GEN_VEXT_VX(vmulhsu_vx_w) | ||
342 | -GEN_VEXT_VX(vmulhsu_vx_d) | ||
343 | +GEN_VEXT_VX(vmul_vx_b, 1) | ||
344 | +GEN_VEXT_VX(vmul_vx_h, 2) | ||
345 | +GEN_VEXT_VX(vmul_vx_w, 4) | ||
346 | +GEN_VEXT_VX(vmul_vx_d, 8) | ||
347 | +GEN_VEXT_VX(vmulh_vx_b, 1) | ||
348 | +GEN_VEXT_VX(vmulh_vx_h, 2) | ||
349 | +GEN_VEXT_VX(vmulh_vx_w, 4) | ||
350 | +GEN_VEXT_VX(vmulh_vx_d, 8) | ||
351 | +GEN_VEXT_VX(vmulhu_vx_b, 1) | ||
352 | +GEN_VEXT_VX(vmulhu_vx_h, 2) | ||
353 | +GEN_VEXT_VX(vmulhu_vx_w, 4) | ||
354 | +GEN_VEXT_VX(vmulhu_vx_d, 8) | ||
355 | +GEN_VEXT_VX(vmulhsu_vx_b, 1) | ||
356 | +GEN_VEXT_VX(vmulhsu_vx_h, 2) | ||
357 | +GEN_VEXT_VX(vmulhsu_vx_w, 4) | ||
358 | +GEN_VEXT_VX(vmulhsu_vx_d, 8) | ||
359 | |||
360 | /* Vector Integer Divide Instructions */ | ||
361 | #define DO_DIVU(N, M) (unlikely(M == 0) ? (__typeof(N))(-1) : N / M) | ||
362 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vrem_vx_b, OP_SSS_B, H1, H1, DO_REM) | ||
363 | RVVCALL(OPIVX2, vrem_vx_h, OP_SSS_H, H2, H2, DO_REM) | ||
364 | RVVCALL(OPIVX2, vrem_vx_w, OP_SSS_W, H4, H4, DO_REM) | ||
365 | RVVCALL(OPIVX2, vrem_vx_d, OP_SSS_D, H8, H8, DO_REM) | ||
366 | -GEN_VEXT_VX(vdivu_vx_b) | ||
367 | -GEN_VEXT_VX(vdivu_vx_h) | ||
368 | -GEN_VEXT_VX(vdivu_vx_w) | ||
369 | -GEN_VEXT_VX(vdivu_vx_d) | ||
370 | -GEN_VEXT_VX(vdiv_vx_b) | ||
371 | -GEN_VEXT_VX(vdiv_vx_h) | ||
372 | -GEN_VEXT_VX(vdiv_vx_w) | ||
373 | -GEN_VEXT_VX(vdiv_vx_d) | ||
374 | -GEN_VEXT_VX(vremu_vx_b) | ||
375 | -GEN_VEXT_VX(vremu_vx_h) | ||
376 | -GEN_VEXT_VX(vremu_vx_w) | ||
377 | -GEN_VEXT_VX(vremu_vx_d) | ||
378 | -GEN_VEXT_VX(vrem_vx_b) | ||
379 | -GEN_VEXT_VX(vrem_vx_h) | ||
380 | -GEN_VEXT_VX(vrem_vx_w) | ||
381 | -GEN_VEXT_VX(vrem_vx_d) | ||
382 | +GEN_VEXT_VX(vdivu_vx_b, 1) | ||
383 | +GEN_VEXT_VX(vdivu_vx_h, 2) | ||
384 | +GEN_VEXT_VX(vdivu_vx_w, 4) | ||
385 | +GEN_VEXT_VX(vdivu_vx_d, 8) | ||
386 | +GEN_VEXT_VX(vdiv_vx_b, 1) | ||
387 | +GEN_VEXT_VX(vdiv_vx_h, 2) | ||
388 | +GEN_VEXT_VX(vdiv_vx_w, 4) | ||
389 | +GEN_VEXT_VX(vdiv_vx_d, 8) | ||
390 | +GEN_VEXT_VX(vremu_vx_b, 1) | ||
391 | +GEN_VEXT_VX(vremu_vx_h, 2) | ||
392 | +GEN_VEXT_VX(vremu_vx_w, 4) | ||
393 | +GEN_VEXT_VX(vremu_vx_d, 8) | ||
394 | +GEN_VEXT_VX(vrem_vx_b, 1) | ||
395 | +GEN_VEXT_VX(vrem_vx_h, 2) | ||
396 | +GEN_VEXT_VX(vrem_vx_w, 4) | ||
397 | +GEN_VEXT_VX(vrem_vx_d, 8) | ||
398 | |||
399 | /* Vector Widening Integer Multiply Instructions */ | ||
400 | RVVCALL(OPIVV2, vwmul_vv_b, WOP_SSS_B, H2, H1, H1, DO_MUL) | ||
401 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX2, vwmulu_vx_w, WOP_UUU_W, H8, H4, DO_MUL) | ||
402 | RVVCALL(OPIVX2, vwmulsu_vx_b, WOP_SUS_B, H2, H1, DO_MUL) | ||
403 | RVVCALL(OPIVX2, vwmulsu_vx_h, WOP_SUS_H, H4, H2, DO_MUL) | ||
404 | RVVCALL(OPIVX2, vwmulsu_vx_w, WOP_SUS_W, H8, H4, DO_MUL) | ||
405 | -GEN_VEXT_VX(vwmul_vx_b) | ||
406 | -GEN_VEXT_VX(vwmul_vx_h) | ||
407 | -GEN_VEXT_VX(vwmul_vx_w) | ||
408 | -GEN_VEXT_VX(vwmulu_vx_b) | ||
409 | -GEN_VEXT_VX(vwmulu_vx_h) | ||
410 | -GEN_VEXT_VX(vwmulu_vx_w) | ||
411 | -GEN_VEXT_VX(vwmulsu_vx_b) | ||
412 | -GEN_VEXT_VX(vwmulsu_vx_h) | ||
413 | -GEN_VEXT_VX(vwmulsu_vx_w) | ||
414 | +GEN_VEXT_VX(vwmul_vx_b, 2) | ||
415 | +GEN_VEXT_VX(vwmul_vx_h, 4) | ||
416 | +GEN_VEXT_VX(vwmul_vx_w, 8) | ||
417 | +GEN_VEXT_VX(vwmulu_vx_b, 2) | ||
418 | +GEN_VEXT_VX(vwmulu_vx_h, 4) | ||
419 | +GEN_VEXT_VX(vwmulu_vx_w, 8) | ||
420 | +GEN_VEXT_VX(vwmulsu_vx_b, 2) | ||
421 | +GEN_VEXT_VX(vwmulsu_vx_h, 4) | ||
422 | +GEN_VEXT_VX(vwmulsu_vx_w, 8) | ||
423 | |||
424 | /* Vector Single-Width Integer Multiply-Add Instructions */ | ||
425 | #define OPIVV3(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP) \ | ||
426 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX3, vnmsub_vx_b, OP_SSS_B, H1, H1, DO_NMSUB) | ||
427 | RVVCALL(OPIVX3, vnmsub_vx_h, OP_SSS_H, H2, H2, DO_NMSUB) | ||
428 | RVVCALL(OPIVX3, vnmsub_vx_w, OP_SSS_W, H4, H4, DO_NMSUB) | ||
429 | RVVCALL(OPIVX3, vnmsub_vx_d, OP_SSS_D, H8, H8, DO_NMSUB) | ||
430 | -GEN_VEXT_VX(vmacc_vx_b) | ||
431 | -GEN_VEXT_VX(vmacc_vx_h) | ||
432 | -GEN_VEXT_VX(vmacc_vx_w) | ||
433 | -GEN_VEXT_VX(vmacc_vx_d) | ||
434 | -GEN_VEXT_VX(vnmsac_vx_b) | ||
435 | -GEN_VEXT_VX(vnmsac_vx_h) | ||
436 | -GEN_VEXT_VX(vnmsac_vx_w) | ||
437 | -GEN_VEXT_VX(vnmsac_vx_d) | ||
438 | -GEN_VEXT_VX(vmadd_vx_b) | ||
439 | -GEN_VEXT_VX(vmadd_vx_h) | ||
440 | -GEN_VEXT_VX(vmadd_vx_w) | ||
441 | -GEN_VEXT_VX(vmadd_vx_d) | ||
442 | -GEN_VEXT_VX(vnmsub_vx_b) | ||
443 | -GEN_VEXT_VX(vnmsub_vx_h) | ||
444 | -GEN_VEXT_VX(vnmsub_vx_w) | ||
445 | -GEN_VEXT_VX(vnmsub_vx_d) | ||
446 | +GEN_VEXT_VX(vmacc_vx_b, 1) | ||
447 | +GEN_VEXT_VX(vmacc_vx_h, 2) | ||
448 | +GEN_VEXT_VX(vmacc_vx_w, 4) | ||
449 | +GEN_VEXT_VX(vmacc_vx_d, 8) | ||
450 | +GEN_VEXT_VX(vnmsac_vx_b, 1) | ||
451 | +GEN_VEXT_VX(vnmsac_vx_h, 2) | ||
452 | +GEN_VEXT_VX(vnmsac_vx_w, 4) | ||
453 | +GEN_VEXT_VX(vnmsac_vx_d, 8) | ||
454 | +GEN_VEXT_VX(vmadd_vx_b, 1) | ||
455 | +GEN_VEXT_VX(vmadd_vx_h, 2) | ||
456 | +GEN_VEXT_VX(vmadd_vx_w, 4) | ||
457 | +GEN_VEXT_VX(vmadd_vx_d, 8) | ||
458 | +GEN_VEXT_VX(vnmsub_vx_b, 1) | ||
459 | +GEN_VEXT_VX(vnmsub_vx_h, 2) | ||
460 | +GEN_VEXT_VX(vnmsub_vx_w, 4) | ||
461 | +GEN_VEXT_VX(vnmsub_vx_d, 8) | ||
462 | |||
463 | /* Vector Widening Integer Multiply-Add Instructions */ | ||
464 | RVVCALL(OPIVV3, vwmaccu_vv_b, WOP_UUU_B, H2, H1, H1, DO_MACC) | ||
465 | @@ -XXX,XX +XXX,XX @@ RVVCALL(OPIVX3, vwmaccsu_vx_w, WOP_SSU_W, H8, H4, DO_MACC) | ||
466 | RVVCALL(OPIVX3, vwmaccus_vx_b, WOP_SUS_B, H2, H1, DO_MACC) | ||
467 | RVVCALL(OPIVX3, vwmaccus_vx_h, WOP_SUS_H, H4, H2, DO_MACC) | ||
468 | RVVCALL(OPIVX3, vwmaccus_vx_w, WOP_SUS_W, H8, H4, DO_MACC) | ||
469 | -GEN_VEXT_VX(vwmaccu_vx_b) | ||
470 | -GEN_VEXT_VX(vwmaccu_vx_h) | ||
471 | -GEN_VEXT_VX(vwmaccu_vx_w) | ||
472 | -GEN_VEXT_VX(vwmacc_vx_b) | ||
473 | -GEN_VEXT_VX(vwmacc_vx_h) | ||
474 | -GEN_VEXT_VX(vwmacc_vx_w) | ||
475 | -GEN_VEXT_VX(vwmaccsu_vx_b) | ||
476 | -GEN_VEXT_VX(vwmaccsu_vx_h) | ||
477 | -GEN_VEXT_VX(vwmaccsu_vx_w) | ||
478 | -GEN_VEXT_VX(vwmaccus_vx_b) | ||
479 | -GEN_VEXT_VX(vwmaccus_vx_h) | ||
480 | -GEN_VEXT_VX(vwmaccus_vx_w) | ||
481 | +GEN_VEXT_VX(vwmaccu_vx_b, 2) | ||
482 | +GEN_VEXT_VX(vwmaccu_vx_h, 4) | ||
483 | +GEN_VEXT_VX(vwmaccu_vx_w, 8) | ||
484 | +GEN_VEXT_VX(vwmacc_vx_b, 2) | ||
485 | +GEN_VEXT_VX(vwmacc_vx_h, 4) | ||
486 | +GEN_VEXT_VX(vwmacc_vx_w, 8) | ||
487 | +GEN_VEXT_VX(vwmaccsu_vx_b, 2) | ||
488 | +GEN_VEXT_VX(vwmaccsu_vx_h, 4) | ||
489 | +GEN_VEXT_VX(vwmaccsu_vx_w, 8) | ||
490 | +GEN_VEXT_VX(vwmaccus_vx_b, 2) | ||
491 | +GEN_VEXT_VX(vwmaccus_vx_h, 4) | ||
492 | +GEN_VEXT_VX(vwmaccus_vx_w, 8) | ||
493 | |||
494 | /* Vector Integer Merge and Move Instructions */ | ||
495 | #define GEN_VEXT_VMV_VV(NAME, ETYPE, H) \ | ||
496 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc | ||
497 | index XXXXXXX..XXXXXXX 100644 | 125 | index XXXXXXX..XXXXXXX 100644 |
498 | --- a/target/riscv/insn_trans/trans_rvv.c.inc | 126 | --- a/hw/intc/riscv_imsic.c |
499 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc | 127 | +++ b/hw/intc/riscv_imsic.c |
500 | @@ -XXX,XX +XXX,XX @@ static bool opivx_trans(uint32_t vd, uint32_t rs1, uint32_t vs2, uint32_t vm, | 128 | @@ -XXX,XX +XXX,XX @@ |
501 | 129 | #include "target/riscv/cpu.h" | |
502 | data = FIELD_DP32(data, VDATA, VM, vm); | 130 | #include "target/riscv/cpu_bits.h" |
503 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | 131 | #include "sysemu/sysemu.h" |
504 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | 132 | +#include "sysemu/kvm.h" |
505 | + data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); | 133 | #include "migration/vmstate.h" |
506 | desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlen / 8, | 134 | |
507 | s->cfg_ptr->vlen / 8, data)); | 135 | #define IMSIC_MMIO_PAGE_LE 0x00 |
508 | 136 | @@ -XXX,XX +XXX,XX @@ static void riscv_imsic_write(void *opaque, hwaddr addr, uint64_t value, | |
509 | @@ -XXX,XX +XXX,XX @@ do_opivx_gvec(DisasContext *s, arg_rmrr *a, GVecGen2sFn *gvec_fn, | 137 | goto err; |
510 | return false; | ||
511 | } | 138 | } |
512 | 139 | ||
513 | - if (a->vm && s->vl_eq_vlmax) { | 140 | +#if defined(CONFIG_KVM) |
514 | + if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | 141 | + if (kvm_irqchip_in_kernel()) { |
515 | TCGv_i64 src1 = tcg_temp_new_i64(); | 142 | + struct kvm_msi msi; |
516 | 143 | + | |
517 | tcg_gen_ext_tl_i64(src1, get_gpr(s, a->rs1, EXT_SIGN)); | 144 | + msi.address_lo = extract64(imsic->mmio.addr + addr, 0, 32); |
518 | @@ -XXX,XX +XXX,XX @@ static bool opivi_trans(uint32_t vd, uint32_t imm, uint32_t vs2, uint32_t vm, | 145 | + msi.address_hi = extract64(imsic->mmio.addr + addr, 32, 32); |
519 | 146 | + msi.data = le32_to_cpu(value); | |
520 | data = FIELD_DP32(data, VDATA, VM, vm); | 147 | + |
521 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | 148 | + kvm_vm_ioctl(kvm_state, KVM_SIGNAL_MSI, &msi); |
522 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | 149 | + |
523 | + data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); | 150 | + return; |
524 | desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlen / 8, | 151 | + } |
525 | s->cfg_ptr->vlen / 8, data)); | 152 | +#endif |
526 | 153 | + | |
527 | @@ -XXX,XX +XXX,XX @@ do_opivi_gvec(DisasContext *s, arg_rmrr *a, GVecGen2iFn *gvec_fn, | 154 | /* Writes only supported for MSI little-endian registers */ |
528 | return false; | 155 | page = addr >> IMSIC_MMIO_PAGE_SHIFT; |
529 | } | 156 | if ((addr & (IMSIC_MMIO_PAGE_SZ - 1)) == IMSIC_MMIO_PAGE_LE) { |
530 | 157 | @@ -XXX,XX +XXX,XX @@ static void riscv_imsic_realize(DeviceState *dev, Error **errp) | |
531 | - if (a->vm && s->vl_eq_vlmax) { | 158 | CPUState *cpu = cpu_by_arch_id(imsic->hartid); |
532 | + if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | 159 | CPURISCVState *env = cpu ? cpu->env_ptr : NULL; |
533 | gvec_fn(s->sew, vreg_ofs(s, a->rd), vreg_ofs(s, a->rs2), | 160 | |
534 | extract_imm(s, a->rs1, imm_mode), MAXSZ(s), MAXSZ(s)); | 161 | - imsic->num_eistate = imsic->num_pages * imsic->num_irqs; |
535 | mark_vs_dirty(s); | 162 | - imsic->eidelivery = g_new0(uint32_t, imsic->num_pages); |
536 | @@ -XXX,XX +XXX,XX @@ static bool do_opivv_widen(DisasContext *s, arg_rmrr *a, | 163 | - imsic->eithreshold = g_new0(uint32_t, imsic->num_pages); |
537 | 164 | - imsic->eistate = g_new0(uint32_t, imsic->num_eistate); | |
538 | data = FIELD_DP32(data, VDATA, VM, a->vm); | 165 | + if (!kvm_irqchip_in_kernel()) { |
539 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | 166 | + imsic->num_eistate = imsic->num_pages * imsic->num_irqs; |
540 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | 167 | + imsic->eidelivery = g_new0(uint32_t, imsic->num_pages); |
541 | tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), | 168 | + imsic->eithreshold = g_new0(uint32_t, imsic->num_pages); |
542 | vreg_ofs(s, a->rs1), | 169 | + imsic->eistate = g_new0(uint32_t, imsic->num_eistate); |
543 | vreg_ofs(s, a->rs2), | 170 | + } |
544 | @@ -XXX,XX +XXX,XX @@ static bool do_opiwv_widen(DisasContext *s, arg_rmrr *a, | 171 | |
545 | 172 | memory_region_init_io(&imsic->mmio, OBJECT(dev), &riscv_imsic_ops, | |
546 | data = FIELD_DP32(data, VDATA, VM, a->vm); | 173 | imsic, TYPE_RISCV_IMSIC, |
547 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | ||
548 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
549 | tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), | ||
550 | vreg_ofs(s, a->rs1), | ||
551 | vreg_ofs(s, a->rs2), | ||
552 | @@ -XXX,XX +XXX,XX @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \ | ||
553 | \ | ||
554 | data = FIELD_DP32(data, VDATA, VM, a->vm); \ | ||
555 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \ | ||
556 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); \ | ||
557 | + data = \ | ||
558 | + FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s);\ | ||
559 | tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \ | ||
560 | vreg_ofs(s, a->rs1), \ | ||
561 | vreg_ofs(s, a->rs2), cpu_env, \ | ||
562 | -- | 174 | -- |
563 | 2.36.1 | 175 | 2.41.0 | diff view generated by jsdifflib |
1 | From: Atish Patra <atishp@rivosinc.com> | 1 | From: Yong-Xuan Wang <yongxuan.wang@sifive.com> |
---|---|---|---|
2 | 2 | ||
3 | fw_cfg DT node is generated after the create_fdt without any check | 3 | Select KVM AIA when the host kernel has in-kernel AIA chip support. |
4 | if the DT is being loaded from the commandline. This results in | 4 | Since KVM AIA only has one APLIC instance, we map the QEMU APLIC |
5 | FDT_ERR_EXISTS error if dtb is loaded from the commandline. | 5 | devices to KVM APLIC. |
6 | 6 | ||
7 | Generate fw_cfg node only if the DT is not loaded from the commandline. | 7 | Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> |
8 | 8 | Reviewed-by: Jim Shu <jim.shu@sifive.com> | |
9 | Signed-off-by: Atish Patra <atishp@rivosinc.com> | 9 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> |
10 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | 10 | Reviewed-by: Andrew Jones <ajones@ventanamicro.com> |
11 | Message-Id: <20220526203500.847165-1-atishp@rivosinc.com> | 11 | Message-ID: <20230727102439.22554-6-yongxuan.wang@sifive.com> |
12 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 12 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
13 | --- | 13 | --- |
14 | hw/riscv/virt.c | 28 ++++++++++++++++++---------- | 14 | hw/riscv/virt.c | 94 +++++++++++++++++++++++++++++++++---------------- |
15 | 1 file changed, 18 insertions(+), 10 deletions(-) | 15 | 1 file changed, 63 insertions(+), 31 deletions(-) |
16 | 16 | ||
17 | diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c | 17 | diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c |
18 | index XXXXXXX..XXXXXXX 100644 | 18 | index XXXXXXX..XXXXXXX 100644 |
19 | --- a/hw/riscv/virt.c | 19 | --- a/hw/riscv/virt.c |
20 | +++ b/hw/riscv/virt.c | 20 | +++ b/hw/riscv/virt.c |
21 | @@ -XXX,XX +XXX,XX @@ static void create_fdt_flash(RISCVVirtState *s, const MemMapEntry *memmap) | 21 | @@ -XXX,XX +XXX,XX @@ |
22 | g_free(name); | 22 | #include "hw/riscv/virt.h" |
23 | } | 23 | #include "hw/riscv/boot.h" |
24 | 24 | #include "hw/riscv/numa.h" | |
25 | +static void create_fdt_fw_cfg(RISCVVirtState *s, const MemMapEntry *memmap) | 25 | +#include "kvm_riscv.h" |
26 | #include "hw/intc/riscv_aclint.h" | ||
27 | #include "hw/intc/riscv_aplic.h" | ||
28 | #include "hw/intc/riscv_imsic.h" | ||
29 | @@ -XXX,XX +XXX,XX @@ | ||
30 | #error "Can't accommodate all IMSIC groups in address space" | ||
31 | #endif | ||
32 | |||
33 | +/* KVM AIA only supports APLIC MSI. APLIC Wired is always emulated by QEMU. */ | ||
34 | +static bool virt_use_kvm_aia(RISCVVirtState *s) | ||
26 | +{ | 35 | +{ |
27 | + char *nodename; | 36 | + return kvm_irqchip_in_kernel() && s->aia_type == VIRT_AIA_TYPE_APLIC_IMSIC; |
28 | + MachineState *mc = MACHINE(s); | ||
29 | + hwaddr base = memmap[VIRT_FW_CFG].base; | ||
30 | + hwaddr size = memmap[VIRT_FW_CFG].size; | ||
31 | + | ||
32 | + nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base); | ||
33 | + qemu_fdt_add_subnode(mc->fdt, nodename); | ||
34 | + qemu_fdt_setprop_string(mc->fdt, nodename, | ||
35 | + "compatible", "qemu,fw-cfg-mmio"); | ||
36 | + qemu_fdt_setprop_sized_cells(mc->fdt, nodename, "reg", | ||
37 | + 2, base, 2, size); | ||
38 | + qemu_fdt_setprop(mc->fdt, nodename, "dma-coherent", NULL, 0); | ||
39 | + g_free(nodename); | ||
40 | +} | 37 | +} |
41 | + | 38 | + |
42 | static void create_fdt(RISCVVirtState *s, const MemMapEntry *memmap, | 39 | static const MemMapEntry virt_memmap[] = { |
43 | uint64_t mem_size, const char *cmdline, bool is_32_bit) | 40 | [VIRT_DEBUG] = { 0x0, 0x100 }, |
41 | [VIRT_MROM] = { 0x1000, 0xf000 }, | ||
42 | @@ -XXX,XX +XXX,XX @@ static void create_fdt_one_aplic(RISCVVirtState *s, int socket, | ||
43 | uint32_t *intc_phandles, | ||
44 | uint32_t aplic_phandle, | ||
45 | uint32_t aplic_child_phandle, | ||
46 | - bool m_mode) | ||
47 | + bool m_mode, int num_harts) | ||
44 | { | 48 | { |
45 | @@ -XXX,XX +XXX,XX @@ static void create_fdt(RISCVVirtState *s, const MemMapEntry *memmap, | 49 | int cpu; |
46 | create_fdt_rtc(s, memmap, irq_mmio_phandle); | 50 | char *aplic_name; |
47 | 51 | uint32_t *aplic_cells; | |
48 | create_fdt_flash(s, memmap); | 52 | MachineState *ms = MACHINE(s); |
49 | + create_fdt_fw_cfg(s, memmap); | 53 | |
50 | 54 | - aplic_cells = g_new0(uint32_t, s->soc[socket].num_harts * 2); | |
51 | update_bootargs: | 55 | + aplic_cells = g_new0(uint32_t, num_harts * 2); |
52 | if (cmdline && *cmdline) { | 56 | |
53 | @@ -XXX,XX +XXX,XX @@ static inline DeviceState *gpex_pcie_init(MemoryRegion *sys_mem, | 57 | - for (cpu = 0; cpu < s->soc[socket].num_harts; cpu++) { |
54 | static FWCfgState *create_fw_cfg(const MachineState *mc) | 58 | + for (cpu = 0; cpu < num_harts; cpu++) { |
59 | aplic_cells[cpu * 2 + 0] = cpu_to_be32(intc_phandles[cpu]); | ||
60 | aplic_cells[cpu * 2 + 1] = cpu_to_be32(m_mode ? IRQ_M_EXT : IRQ_S_EXT); | ||
61 | } | ||
62 | @@ -XXX,XX +XXX,XX @@ static void create_fdt_one_aplic(RISCVVirtState *s, int socket, | ||
63 | |||
64 | if (s->aia_type == VIRT_AIA_TYPE_APLIC) { | ||
65 | qemu_fdt_setprop(ms->fdt, aplic_name, "interrupts-extended", | ||
66 | - aplic_cells, | ||
67 | - s->soc[socket].num_harts * sizeof(uint32_t) * 2); | ||
68 | + aplic_cells, num_harts * sizeof(uint32_t) * 2); | ||
69 | } else { | ||
70 | qemu_fdt_setprop_cell(ms->fdt, aplic_name, "msi-parent", msi_phandle); | ||
71 | } | ||
72 | @@ -XXX,XX +XXX,XX @@ static void create_fdt_socket_aplic(RISCVVirtState *s, | ||
73 | uint32_t msi_s_phandle, | ||
74 | uint32_t *phandle, | ||
75 | uint32_t *intc_phandles, | ||
76 | - uint32_t *aplic_phandles) | ||
77 | + uint32_t *aplic_phandles, | ||
78 | + int num_harts) | ||
55 | { | 79 | { |
56 | hwaddr base = virt_memmap[VIRT_FW_CFG].base; | 80 | char *aplic_name; |
57 | - hwaddr size = virt_memmap[VIRT_FW_CFG].size; | 81 | unsigned long aplic_addr; |
58 | FWCfgState *fw_cfg; | 82 | @@ -XXX,XX +XXX,XX @@ static void create_fdt_socket_aplic(RISCVVirtState *s, |
59 | - char *nodename; | 83 | create_fdt_one_aplic(s, socket, aplic_addr, memmap[VIRT_APLIC_M].size, |
60 | 84 | msi_m_phandle, intc_phandles, | |
61 | fw_cfg = fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, | 85 | aplic_m_phandle, aplic_s_phandle, |
62 | &address_space_memory); | 86 | - true); |
63 | fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)mc->smp.cpus); | 87 | + true, num_harts); |
64 | 88 | } | |
65 | - nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base); | 89 | |
66 | - qemu_fdt_add_subnode(mc->fdt, nodename); | 90 | /* S-level APLIC node */ |
67 | - qemu_fdt_setprop_string(mc->fdt, nodename, | 91 | @@ -XXX,XX +XXX,XX @@ static void create_fdt_socket_aplic(RISCVVirtState *s, |
68 | - "compatible", "qemu,fw-cfg-mmio"); | 92 | create_fdt_one_aplic(s, socket, aplic_addr, memmap[VIRT_APLIC_S].size, |
69 | - qemu_fdt_setprop_sized_cells(mc->fdt, nodename, "reg", | 93 | msi_s_phandle, intc_phandles, |
70 | - 2, base, 2, size); | 94 | aplic_s_phandle, 0, |
71 | - qemu_fdt_setprop(mc->fdt, nodename, "dma-coherent", NULL, 0); | 95 | - false); |
72 | - g_free(nodename); | 96 | + false, num_harts); |
73 | return fw_cfg; | 97 | |
74 | } | 98 | aplic_name = g_strdup_printf("/soc/aplic@%lx", aplic_addr); |
75 | 99 | ||
100 | @@ -XXX,XX +XXX,XX @@ static void create_fdt_sockets(RISCVVirtState *s, const MemMapEntry *memmap, | ||
101 | *msi_pcie_phandle = msi_s_phandle; | ||
102 | } | ||
103 | |||
104 | - phandle_pos = ms->smp.cpus; | ||
105 | - for (socket = (socket_count - 1); socket >= 0; socket--) { | ||
106 | - phandle_pos -= s->soc[socket].num_harts; | ||
107 | - | ||
108 | - if (s->aia_type == VIRT_AIA_TYPE_NONE) { | ||
109 | - create_fdt_socket_plic(s, memmap, socket, phandle, | ||
110 | - &intc_phandles[phandle_pos], xplic_phandles); | ||
111 | - } else { | ||
112 | - create_fdt_socket_aplic(s, memmap, socket, | ||
113 | - msi_m_phandle, msi_s_phandle, phandle, | ||
114 | - &intc_phandles[phandle_pos], xplic_phandles); | ||
115 | + /* KVM AIA only has one APLIC instance */ | ||
116 | + if (virt_use_kvm_aia(s)) { | ||
117 | + create_fdt_socket_aplic(s, memmap, 0, | ||
118 | + msi_m_phandle, msi_s_phandle, phandle, | ||
119 | + &intc_phandles[0], xplic_phandles, | ||
120 | + ms->smp.cpus); | ||
121 | + } else { | ||
122 | + phandle_pos = ms->smp.cpus; | ||
123 | + for (socket = (socket_count - 1); socket >= 0; socket--) { | ||
124 | + phandle_pos -= s->soc[socket].num_harts; | ||
125 | + | ||
126 | + if (s->aia_type == VIRT_AIA_TYPE_NONE) { | ||
127 | + create_fdt_socket_plic(s, memmap, socket, phandle, | ||
128 | + &intc_phandles[phandle_pos], | ||
129 | + xplic_phandles); | ||
130 | + } else { | ||
131 | + create_fdt_socket_aplic(s, memmap, socket, | ||
132 | + msi_m_phandle, msi_s_phandle, phandle, | ||
133 | + &intc_phandles[phandle_pos], | ||
134 | + xplic_phandles, | ||
135 | + s->soc[socket].num_harts); | ||
136 | + } | ||
137 | } | ||
138 | } | ||
139 | |||
140 | g_free(intc_phandles); | ||
141 | |||
142 | - for (socket = 0; socket < socket_count; socket++) { | ||
143 | - if (socket == 0) { | ||
144 | - *irq_mmio_phandle = xplic_phandles[socket]; | ||
145 | - *irq_virtio_phandle = xplic_phandles[socket]; | ||
146 | - *irq_pcie_phandle = xplic_phandles[socket]; | ||
147 | - } | ||
148 | - if (socket == 1) { | ||
149 | - *irq_virtio_phandle = xplic_phandles[socket]; | ||
150 | - *irq_pcie_phandle = xplic_phandles[socket]; | ||
151 | - } | ||
152 | - if (socket == 2) { | ||
153 | - *irq_pcie_phandle = xplic_phandles[socket]; | ||
154 | + if (virt_use_kvm_aia(s)) { | ||
155 | + *irq_mmio_phandle = xplic_phandles[0]; | ||
156 | + *irq_virtio_phandle = xplic_phandles[0]; | ||
157 | + *irq_pcie_phandle = xplic_phandles[0]; | ||
158 | + } else { | ||
159 | + for (socket = 0; socket < socket_count; socket++) { | ||
160 | + if (socket == 0) { | ||
161 | + *irq_mmio_phandle = xplic_phandles[socket]; | ||
162 | + *irq_virtio_phandle = xplic_phandles[socket]; | ||
163 | + *irq_pcie_phandle = xplic_phandles[socket]; | ||
164 | + } | ||
165 | + if (socket == 1) { | ||
166 | + *irq_virtio_phandle = xplic_phandles[socket]; | ||
167 | + *irq_pcie_phandle = xplic_phandles[socket]; | ||
168 | + } | ||
169 | + if (socket == 2) { | ||
170 | + *irq_pcie_phandle = xplic_phandles[socket]; | ||
171 | + } | ||
172 | } | ||
173 | } | ||
174 | |||
175 | @@ -XXX,XX +XXX,XX @@ static void virt_machine_init(MachineState *machine) | ||
176 | } | ||
177 | } | ||
178 | |||
179 | + if (virt_use_kvm_aia(s)) { | ||
180 | + kvm_riscv_aia_create(machine, IMSIC_MMIO_GROUP_MIN_SHIFT, | ||
181 | + VIRT_IRQCHIP_NUM_SOURCES, VIRT_IRQCHIP_NUM_MSIS, | ||
182 | + memmap[VIRT_APLIC_S].base, | ||
183 | + memmap[VIRT_IMSIC_S].base, | ||
184 | + s->aia_guests); | ||
185 | + } | ||
186 | + | ||
187 | if (riscv_is_32bit(&s->soc[0])) { | ||
188 | #if HOST_LONG_BITS == 64 | ||
189 | /* limit RAM size in a 32-bit system */ | ||
76 | -- | 190 | -- |
77 | 2.36.1 | 191 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Conor Dooley <conor.dooley@microchip.com> | ||
1 | 2 | ||
3 | On a dtb dumped from the virt machine, dt-validate complains: | ||
4 | soc: pmu: {'riscv,event-to-mhpmcounters': [[1, 1, 524281], [2, 2, 524284], [65561, 65561, 524280], [65563, 65563, 524280], [65569, 65569, 524280]], 'compatible': ['riscv,pmu']} should not be valid under {'type': 'object'} | ||
5 | from schema $id: http://devicetree.org/schemas/simple-bus.yaml# | ||
6 | That's pretty cryptic, but running the dtb back through dtc produces | ||
7 | something a lot more reasonable: | ||
8 | Warning (simple_bus_reg): /soc/pmu: missing or empty reg/ranges property | ||
9 | |||
10 | Moving the riscv,pmu node out of the soc bus solves the problem. | ||
11 | |||
12 | Signed-off-by: Conor Dooley <conor.dooley@microchip.com> | ||
13 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | ||
14 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
15 | Message-ID: <20230727-groom-decline-2c57ce42841c@spud> | ||
16 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
17 | --- | ||
18 | hw/riscv/virt.c | 2 +- | ||
19 | 1 file changed, 1 insertion(+), 1 deletion(-) | ||
20 | |||
21 | diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c | ||
22 | index XXXXXXX..XXXXXXX 100644 | ||
23 | --- a/hw/riscv/virt.c | ||
24 | +++ b/hw/riscv/virt.c | ||
25 | @@ -XXX,XX +XXX,XX @@ static void create_fdt_pmu(RISCVVirtState *s) | ||
26 | MachineState *ms = MACHINE(s); | ||
27 | RISCVCPU hart = s->soc[0].harts[0]; | ||
28 | |||
29 | - pmu_name = g_strdup_printf("/soc/pmu"); | ||
30 | + pmu_name = g_strdup_printf("/pmu"); | ||
31 | qemu_fdt_add_subnode(ms->fdt, pmu_name); | ||
32 | qemu_fdt_setprop_string(ms->fdt, pmu_name, "compatible", "riscv,pmu"); | ||
33 | riscv_pmu_generate_fdt_node(ms->fdt, hart.cfg.pmu_num, pmu_name); | ||
34 | -- | ||
35 | 2.41.0 | diff view generated by jsdifflib |
1 | From: Andrew Bresticker <abrestic@rivosinc.com> | 1 | From: Weiwei Li <liweiwei@iscas.ac.cn> |
---|---|---|---|
2 | 2 | ||
3 | Whether or not VSEIP is pending isn't reflected in env->mip and must | 3 | The Svadu specification updated the name of the *envcfg bit from |
4 | instead be determined from hstatus.vgein and hgeip. As a result a | 4 | HADE to ADUE. |
5 | CPU in WFI won't wake on a VSEIP, which violates the WFI behavior as | ||
6 | specified in the privileged ISA. Just use riscv_cpu_all_pending() | ||
7 | instead, which already accounts for VSEIP. | ||
8 | 5 | ||
9 | Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> | 6 | Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn> |
10 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | 7 | Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn> |
11 | Message-Id: <20220531210544.181322-1-abrestic@rivosinc.com> | 8 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> |
9 | Message-ID: <20230816141916.66898-1-liweiwei@iscas.ac.cn> | ||
12 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 10 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
13 | --- | 11 | --- |
14 | target/riscv/cpu.h | 1 + | 12 | target/riscv/cpu_bits.h | 8 ++++---- |
15 | target/riscv/cpu.c | 2 +- | 13 | target/riscv/cpu.c | 4 ++-- |
16 | target/riscv/cpu_helper.c | 2 +- | 14 | target/riscv/cpu_helper.c | 6 +++--- |
17 | 3 files changed, 3 insertions(+), 2 deletions(-) | 15 | target/riscv/csr.c | 12 ++++++------ |
16 | 4 files changed, 15 insertions(+), 15 deletions(-) | ||
18 | 17 | ||
19 | diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h | 18 | diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h |
20 | index XXXXXXX..XXXXXXX 100644 | 19 | index XXXXXXX..XXXXXXX 100644 |
21 | --- a/target/riscv/cpu.h | 20 | --- a/target/riscv/cpu_bits.h |
22 | +++ b/target/riscv/cpu.h | 21 | +++ b/target/riscv/cpu_bits.h |
23 | @@ -XXX,XX +XXX,XX @@ int riscv_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg); | 22 | @@ -XXX,XX +XXX,XX @@ typedef enum RISCVException { |
24 | int riscv_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg); | 23 | #define MENVCFG_CBIE (3UL << 4) |
25 | int riscv_cpu_hviprio_index2irq(int index, int *out_irq, int *out_rdzero); | 24 | #define MENVCFG_CBCFE BIT(6) |
26 | uint8_t riscv_cpu_default_priority(int irq); | 25 | #define MENVCFG_CBZE BIT(7) |
27 | +uint64_t riscv_cpu_all_pending(CPURISCVState *env); | 26 | -#define MENVCFG_HADE (1ULL << 61) |
28 | int riscv_cpu_mirq_pending(CPURISCVState *env); | 27 | +#define MENVCFG_ADUE (1ULL << 61) |
29 | int riscv_cpu_sirq_pending(CPURISCVState *env); | 28 | #define MENVCFG_PBMTE (1ULL << 62) |
30 | int riscv_cpu_vsirq_pending(CPURISCVState *env); | 29 | #define MENVCFG_STCE (1ULL << 63) |
30 | |||
31 | /* For RV32 */ | ||
32 | -#define MENVCFGH_HADE BIT(29) | ||
33 | +#define MENVCFGH_ADUE BIT(29) | ||
34 | #define MENVCFGH_PBMTE BIT(30) | ||
35 | #define MENVCFGH_STCE BIT(31) | ||
36 | |||
37 | @@ -XXX,XX +XXX,XX @@ typedef enum RISCVException { | ||
38 | #define HENVCFG_CBIE MENVCFG_CBIE | ||
39 | #define HENVCFG_CBCFE MENVCFG_CBCFE | ||
40 | #define HENVCFG_CBZE MENVCFG_CBZE | ||
41 | -#define HENVCFG_HADE MENVCFG_HADE | ||
42 | +#define HENVCFG_ADUE MENVCFG_ADUE | ||
43 | #define HENVCFG_PBMTE MENVCFG_PBMTE | ||
44 | #define HENVCFG_STCE MENVCFG_STCE | ||
45 | |||
46 | /* For RV32 */ | ||
47 | -#define HENVCFGH_HADE MENVCFGH_HADE | ||
48 | +#define HENVCFGH_ADUE MENVCFGH_ADUE | ||
49 | #define HENVCFGH_PBMTE MENVCFGH_PBMTE | ||
50 | #define HENVCFGH_STCE MENVCFGH_STCE | ||
51 | |||
31 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | 52 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c |
32 | index XXXXXXX..XXXXXXX 100644 | 53 | index XXXXXXX..XXXXXXX 100644 |
33 | --- a/target/riscv/cpu.c | 54 | --- a/target/riscv/cpu.c |
34 | +++ b/target/riscv/cpu.c | 55 | +++ b/target/riscv/cpu.c |
35 | @@ -XXX,XX +XXX,XX @@ static bool riscv_cpu_has_work(CPUState *cs) | 56 | @@ -XXX,XX +XXX,XX @@ static void riscv_cpu_reset_hold(Object *obj) |
36 | * Definition of the WFI instruction requires it to ignore the privilege | 57 | env->two_stage_lookup = false; |
37 | * mode and delegation registers, but respect individual enables | 58 | |
38 | */ | 59 | env->menvcfg = (cpu->cfg.ext_svpbmt ? MENVCFG_PBMTE : 0) | |
39 | - return (env->mip & env->mie) != 0; | 60 | - (cpu->cfg.ext_svadu ? MENVCFG_HADE : 0); |
40 | + return riscv_cpu_all_pending(env) != 0; | 61 | + (cpu->cfg.ext_svadu ? MENVCFG_ADUE : 0); |
41 | #else | 62 | env->henvcfg = (cpu->cfg.ext_svpbmt ? HENVCFG_PBMTE : 0) | |
42 | return true; | 63 | - (cpu->cfg.ext_svadu ? HENVCFG_HADE : 0); |
43 | #endif | 64 | + (cpu->cfg.ext_svadu ? HENVCFG_ADUE : 0); |
65 | |||
66 | /* Initialized default priorities of local interrupts. */ | ||
67 | for (i = 0; i < ARRAY_SIZE(env->miprio); i++) { | ||
44 | diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c | 68 | diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c |
45 | index XXXXXXX..XXXXXXX 100644 | 69 | index XXXXXXX..XXXXXXX 100644 |
46 | --- a/target/riscv/cpu_helper.c | 70 | --- a/target/riscv/cpu_helper.c |
47 | +++ b/target/riscv/cpu_helper.c | 71 | +++ b/target/riscv/cpu_helper.c |
48 | @@ -XXX,XX +XXX,XX @@ static int riscv_cpu_pending_to_irq(CPURISCVState *env, | 72 | @@ -XXX,XX +XXX,XX @@ static int get_physical_address(CPURISCVState *env, hwaddr *physical, |
49 | return best_irq; | 73 | } |
74 | |||
75 | bool pbmte = env->menvcfg & MENVCFG_PBMTE; | ||
76 | - bool hade = env->menvcfg & MENVCFG_HADE; | ||
77 | + bool adue = env->menvcfg & MENVCFG_ADUE; | ||
78 | |||
79 | if (first_stage && two_stage && env->virt_enabled) { | ||
80 | pbmte = pbmte && (env->henvcfg & HENVCFG_PBMTE); | ||
81 | - hade = hade && (env->henvcfg & HENVCFG_HADE); | ||
82 | + adue = adue && (env->henvcfg & HENVCFG_ADUE); | ||
83 | } | ||
84 | |||
85 | int ptshift = (levels - 1) * ptidxbits; | ||
86 | @@ -XXX,XX +XXX,XX @@ restart: | ||
87 | |||
88 | /* Page table updates need to be atomic with MTTCG enabled */ | ||
89 | if (updated_pte != pte && !is_debug) { | ||
90 | - if (!hade) { | ||
91 | + if (!adue) { | ||
92 | return TRANSLATE_FAIL; | ||
93 | } | ||
94 | |||
95 | diff --git a/target/riscv/csr.c b/target/riscv/csr.c | ||
96 | index XXXXXXX..XXXXXXX 100644 | ||
97 | --- a/target/riscv/csr.c | ||
98 | +++ b/target/riscv/csr.c | ||
99 | @@ -XXX,XX +XXX,XX @@ static RISCVException write_menvcfg(CPURISCVState *env, int csrno, | ||
100 | if (riscv_cpu_mxl(env) == MXL_RV64) { | ||
101 | mask |= (cfg->ext_svpbmt ? MENVCFG_PBMTE : 0) | | ||
102 | (cfg->ext_sstc ? MENVCFG_STCE : 0) | | ||
103 | - (cfg->ext_svadu ? MENVCFG_HADE : 0); | ||
104 | + (cfg->ext_svadu ? MENVCFG_ADUE : 0); | ||
105 | } | ||
106 | env->menvcfg = (env->menvcfg & ~mask) | (val & mask); | ||
107 | |||
108 | @@ -XXX,XX +XXX,XX @@ static RISCVException write_menvcfgh(CPURISCVState *env, int csrno, | ||
109 | const RISCVCPUConfig *cfg = riscv_cpu_cfg(env); | ||
110 | uint64_t mask = (cfg->ext_svpbmt ? MENVCFG_PBMTE : 0) | | ||
111 | (cfg->ext_sstc ? MENVCFG_STCE : 0) | | ||
112 | - (cfg->ext_svadu ? MENVCFG_HADE : 0); | ||
113 | + (cfg->ext_svadu ? MENVCFG_ADUE : 0); | ||
114 | uint64_t valh = (uint64_t)val << 32; | ||
115 | |||
116 | env->menvcfg = (env->menvcfg & ~mask) | (valh & mask); | ||
117 | @@ -XXX,XX +XXX,XX @@ static RISCVException read_henvcfg(CPURISCVState *env, int csrno, | ||
118 | * henvcfg.stce is read_only 0 when menvcfg.stce = 0 | ||
119 | * henvcfg.hade is read_only 0 when menvcfg.hade = 0 | ||
120 | */ | ||
121 | - *val = env->henvcfg & (~(HENVCFG_PBMTE | HENVCFG_STCE | HENVCFG_HADE) | | ||
122 | + *val = env->henvcfg & (~(HENVCFG_PBMTE | HENVCFG_STCE | HENVCFG_ADUE) | | ||
123 | env->menvcfg); | ||
124 | return RISCV_EXCP_NONE; | ||
50 | } | 125 | } |
51 | 126 | @@ -XXX,XX +XXX,XX @@ static RISCVException write_henvcfg(CPURISCVState *env, int csrno, | |
52 | -static uint64_t riscv_cpu_all_pending(CPURISCVState *env) | 127 | } |
53 | +uint64_t riscv_cpu_all_pending(CPURISCVState *env) | 128 | |
129 | if (riscv_cpu_mxl(env) == MXL_RV64) { | ||
130 | - mask |= env->menvcfg & (HENVCFG_PBMTE | HENVCFG_STCE | HENVCFG_HADE); | ||
131 | + mask |= env->menvcfg & (HENVCFG_PBMTE | HENVCFG_STCE | HENVCFG_ADUE); | ||
132 | } | ||
133 | |||
134 | env->henvcfg = (env->henvcfg & ~mask) | (val & mask); | ||
135 | @@ -XXX,XX +XXX,XX @@ static RISCVException read_henvcfgh(CPURISCVState *env, int csrno, | ||
136 | return ret; | ||
137 | } | ||
138 | |||
139 | - *val = (env->henvcfg & (~(HENVCFG_PBMTE | HENVCFG_STCE | HENVCFG_HADE) | | ||
140 | + *val = (env->henvcfg & (~(HENVCFG_PBMTE | HENVCFG_STCE | HENVCFG_ADUE) | | ||
141 | env->menvcfg)) >> 32; | ||
142 | return RISCV_EXCP_NONE; | ||
143 | } | ||
144 | @@ -XXX,XX +XXX,XX @@ static RISCVException write_henvcfgh(CPURISCVState *env, int csrno, | ||
145 | target_ulong val) | ||
54 | { | 146 | { |
55 | uint32_t gein = get_field(env->hstatus, HSTATUS_VGEIN); | 147 | uint64_t mask = env->menvcfg & (HENVCFG_PBMTE | HENVCFG_STCE | |
56 | uint64_t vsgein = (env->hgeip & (1ULL << gein)) ? MIP_VSEIP : 0; | 148 | - HENVCFG_HADE); |
149 | + HENVCFG_ADUE); | ||
150 | uint64_t valh = (uint64_t)val << 32; | ||
151 | RISCVException ret; | ||
152 | |||
57 | -- | 153 | -- |
58 | 2.36.1 | 154 | 2.41.0 | diff view generated by jsdifflib |
1 | From: Alistair Francis <alistair.francis@wdc.com> | 1 | From: Daniel Henrique Barboza <dbarboza@ventanamicro.com> |
---|---|---|---|
2 | 2 | ||
3 | There are currently two types of RISC-V CPUs: | 3 | In the same emulated RISC-V host, the 'host' KVM CPU takes 4 times |
4 | - Generic CPUs (base or any) that allow complete custimisation | 4 | longer to boot than the 'rv64' KVM CPU. |
5 | - "Named" CPUs that match existing hardware | ||
6 | 5 | ||
7 | Users can use the base CPUs to custimise the extensions that they want, for | 6 | The reason is an unintended behavior of riscv_cpu_satp_mode_finalize() |
8 | example -cpu rv64,v=true. | 7 | when satp_mode.supported = 0, i.e. when cpu_init() does not set |
8 | satp_mode_max_supported(). satp_mode_max_from_map(map) does: | ||
9 | 9 | ||
10 | We originally exposed these as part of the named CPUs as well, but that was | 10 | 31 - __builtin_clz(map) |
11 | by accident. | ||
12 | 11 | ||
13 | Exposing the CPU properties to named CPUs means that we accidently | 12 | This means that, if satp_mode.supported = 0, satp_mode_supported_max |
14 | enable extensions that don't exist on the CPUs by default. For example | 13 | wil be '31 - 32'. But this is C, so satp_mode_supported_max will gladly |
15 | the SiFive E CPU currently support the zba extension, which is a bug. | 14 | set it to UINT_MAX (4294967295). After that, if the user didn't set a |
15 | satp_mode, set_satp_mode_default_map(cpu) will make | ||
16 | 16 | ||
17 | This patch instead only exposes the CPU extensions to the generic CPUs. | 17 | cfg.satp_mode.map = cfg.satp_mode.supported |
18 | 18 | ||
19 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 19 | So satp_mode.map = 0. And then satp_mode_map_max will be set to |
20 | Reviewed-by: Bin Meng <bmeng.cn@gmail.com> | 20 | satp_mode_max_from_map(cpu->cfg.satp_mode.map), i.e. also UINT_MAX. The |
21 | Message-Id: <20220608061437.314434-1-alistair.francis@opensource.wdc.com> | 21 | guard "satp_mode_map_max > satp_mode_supported_max" doesn't protect us |
22 | here since both are UINT_MAX. | ||
23 | |||
24 | And finally we have 2 loops: | ||
25 | |||
26 | for (int i = satp_mode_map_max - 1; i >= 0; --i) { | ||
27 | |||
28 | Which are, in fact, 2 loops from UINT_MAX -1 to -1. This is where the | ||
29 | extra delay when booting the 'host' CPU is coming from. | ||
30 | |||
31 | Commit 43d1de32f8 already set a precedence for satp_mode.supported = 0 | ||
32 | in a different manner. We're doing the same here. If supported == 0, | ||
33 | interpret as 'the CPU wants the OS to handle satp mode alone' and skip | ||
34 | satp_mode_finalize(). | ||
35 | |||
36 | We'll also put a guard in satp_mode_max_from_map() to assert out if map | ||
37 | is 0 since the function is not ready to deal with it. | ||
38 | |||
39 | Cc: Alexandre Ghiti <alexghiti@rivosinc.com> | ||
40 | Fixes: 6f23aaeb9b ("riscv: Allow user to set the satp mode") | ||
41 | Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
42 | Reviewed-by: Andrew Jones <ajones@ventanamicro.com> | ||
43 | Message-ID: <20230817152903.694926-1-dbarboza@ventanamicro.com> | ||
22 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 44 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
23 | --- | 45 | --- |
24 | target/riscv/cpu.c | 57 +++++++++++++++++++++++++++++++++++++--------- | 46 | target/riscv/cpu.c | 23 ++++++++++++++++++++--- |
25 | 1 file changed, 46 insertions(+), 11 deletions(-) | 47 | 1 file changed, 20 insertions(+), 3 deletions(-) |
26 | 48 | ||
27 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | 49 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c |
28 | index XXXXXXX..XXXXXXX 100644 | 50 | index XXXXXXX..XXXXXXX 100644 |
29 | --- a/target/riscv/cpu.c | 51 | --- a/target/riscv/cpu.c |
30 | +++ b/target/riscv/cpu.c | 52 | +++ b/target/riscv/cpu.c |
31 | @@ -XXX,XX +XXX,XX @@ static const char * const riscv_intr_names[] = { | 53 | @@ -XXX,XX +XXX,XX @@ static uint8_t satp_mode_from_str(const char *satp_mode_str) |
32 | "reserved" | 54 | |
33 | }; | 55 | uint8_t satp_mode_max_from_map(uint32_t map) |
34 | 56 | { | |
35 | +static void register_cpu_props(DeviceState *dev); | 57 | + /* |
58 | + * 'map = 0' will make us return (31 - 32), which C will | ||
59 | + * happily overflow to UINT_MAX. There's no good result to | ||
60 | + * return if 'map = 0' (e.g. returning 0 will be ambiguous | ||
61 | + * with the result for 'map = 1'). | ||
62 | + * | ||
63 | + * Assert out if map = 0. Callers will have to deal with | ||
64 | + * it outside of this function. | ||
65 | + */ | ||
66 | + g_assert(map > 0); | ||
36 | + | 67 | + |
37 | const char *riscv_cpu_get_trap_name(target_ulong cause, bool async) | 68 | /* map here has at least one bit set, so no problem with clz */ |
69 | return 31 - __builtin_clz(map); | ||
70 | } | ||
71 | @@ -XXX,XX +XXX,XX @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp) | ||
72 | static void riscv_cpu_satp_mode_finalize(RISCVCPU *cpu, Error **errp) | ||
38 | { | 73 | { |
39 | if (async) { | 74 | bool rv32 = riscv_cpu_mxl(&cpu->env) == MXL_RV32; |
40 | @@ -XXX,XX +XXX,XX @@ static void riscv_any_cpu_init(Object *obj) | 75 | - uint8_t satp_mode_map_max; |
41 | set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVU); | 76 | - uint8_t satp_mode_supported_max = |
42 | #endif | 77 | - satp_mode_max_from_map(cpu->cfg.satp_mode.supported); |
43 | set_priv_version(env, PRIV_VERSION_1_12_0); | 78 | + uint8_t satp_mode_map_max, satp_mode_supported_max; |
44 | + register_cpu_props(DEVICE(obj)); | ||
45 | } | ||
46 | |||
47 | #if defined(TARGET_RISCV64) | ||
48 | @@ -XXX,XX +XXX,XX @@ static void rv64_base_cpu_init(Object *obj) | ||
49 | CPURISCVState *env = &RISCV_CPU(obj)->env; | ||
50 | /* We set this in the realise function */ | ||
51 | set_misa(env, MXL_RV64, 0); | ||
52 | + register_cpu_props(DEVICE(obj)); | ||
53 | } | ||
54 | |||
55 | static void rv64_sifive_u_cpu_init(Object *obj) | ||
56 | @@ -XXX,XX +XXX,XX @@ static void rv64_sifive_u_cpu_init(Object *obj) | ||
57 | static void rv64_sifive_e_cpu_init(Object *obj) | ||
58 | { | ||
59 | CPURISCVState *env = &RISCV_CPU(obj)->env; | ||
60 | + RISCVCPU *cpu = RISCV_CPU(obj); | ||
61 | + | 79 | + |
62 | set_misa(env, MXL_RV64, RVI | RVM | RVA | RVC | RVU); | 80 | + /* The CPU wants the OS to decide which satp mode to use */ |
63 | set_priv_version(env, PRIV_VERSION_1_10_0); | 81 | + if (cpu->cfg.satp_mode.supported == 0) { |
64 | - qdev_prop_set_bit(DEVICE(obj), "mmu", false); | 82 | + return; |
65 | + cpu->cfg.mmu = false; | 83 | + } |
66 | } | ||
67 | |||
68 | static void rv128_base_cpu_init(Object *obj) | ||
69 | @@ -XXX,XX +XXX,XX @@ static void rv128_base_cpu_init(Object *obj) | ||
70 | CPURISCVState *env = &RISCV_CPU(obj)->env; | ||
71 | /* We set this in the realise function */ | ||
72 | set_misa(env, MXL_RV128, 0); | ||
73 | + register_cpu_props(DEVICE(obj)); | ||
74 | } | ||
75 | #else | ||
76 | static void rv32_base_cpu_init(Object *obj) | ||
77 | @@ -XXX,XX +XXX,XX @@ static void rv32_base_cpu_init(Object *obj) | ||
78 | CPURISCVState *env = &RISCV_CPU(obj)->env; | ||
79 | /* We set this in the realise function */ | ||
80 | set_misa(env, MXL_RV32, 0); | ||
81 | + register_cpu_props(DEVICE(obj)); | ||
82 | } | ||
83 | |||
84 | static void rv32_sifive_u_cpu_init(Object *obj) | ||
85 | @@ -XXX,XX +XXX,XX @@ static void rv32_sifive_u_cpu_init(Object *obj) | ||
86 | static void rv32_sifive_e_cpu_init(Object *obj) | ||
87 | { | ||
88 | CPURISCVState *env = &RISCV_CPU(obj)->env; | ||
89 | + RISCVCPU *cpu = RISCV_CPU(obj); | ||
90 | + | 84 | + |
91 | set_misa(env, MXL_RV32, RVI | RVM | RVA | RVC | RVU); | 85 | + satp_mode_supported_max = |
92 | set_priv_version(env, PRIV_VERSION_1_10_0); | 86 | + satp_mode_max_from_map(cpu->cfg.satp_mode.supported); |
93 | - qdev_prop_set_bit(DEVICE(obj), "mmu", false); | 87 | |
94 | + cpu->cfg.mmu = false; | 88 | if (cpu->cfg.satp_mode.map == 0) { |
95 | } | 89 | if (cpu->cfg.satp_mode.init == 0) { |
96 | |||
97 | static void rv32_ibex_cpu_init(Object *obj) | ||
98 | { | ||
99 | CPURISCVState *env = &RISCV_CPU(obj)->env; | ||
100 | + RISCVCPU *cpu = RISCV_CPU(obj); | ||
101 | + | ||
102 | set_misa(env, MXL_RV32, RVI | RVM | RVC | RVU); | ||
103 | set_priv_version(env, PRIV_VERSION_1_10_0); | ||
104 | - qdev_prop_set_bit(DEVICE(obj), "mmu", false); | ||
105 | - qdev_prop_set_bit(DEVICE(obj), "x-epmp", true); | ||
106 | + cpu->cfg.mmu = false; | ||
107 | + cpu->cfg.epmp = true; | ||
108 | } | ||
109 | |||
110 | static void rv32_imafcu_nommu_cpu_init(Object *obj) | ||
111 | { | ||
112 | CPURISCVState *env = &RISCV_CPU(obj)->env; | ||
113 | + RISCVCPU *cpu = RISCV_CPU(obj); | ||
114 | + | ||
115 | set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVC | RVU); | ||
116 | set_priv_version(env, PRIV_VERSION_1_10_0); | ||
117 | set_resetvec(env, DEFAULT_RSTVEC); | ||
118 | - qdev_prop_set_bit(DEVICE(obj), "mmu", false); | ||
119 | + cpu->cfg.mmu = false; | ||
120 | } | ||
121 | #endif | ||
122 | |||
123 | @@ -XXX,XX +XXX,XX @@ static void riscv_host_cpu_init(Object *obj) | ||
124 | #elif defined(TARGET_RISCV64) | ||
125 | set_misa(env, MXL_RV64, 0); | ||
126 | #endif | ||
127 | + register_cpu_props(DEVICE(obj)); | ||
128 | } | ||
129 | #endif | ||
130 | |||
131 | @@ -XXX,XX +XXX,XX @@ static void riscv_cpu_init(Object *obj) | ||
132 | { | ||
133 | RISCVCPU *cpu = RISCV_CPU(obj); | ||
134 | |||
135 | + cpu->cfg.ext_counters = true; | ||
136 | + cpu->cfg.ext_ifencei = true; | ||
137 | + cpu->cfg.ext_icsr = true; | ||
138 | + cpu->cfg.mmu = true; | ||
139 | + cpu->cfg.pmp = true; | ||
140 | + | ||
141 | cpu_set_cpustate_pointers(cpu); | ||
142 | |||
143 | #ifndef CONFIG_USER_ONLY | ||
144 | @@ -XXX,XX +XXX,XX @@ static void riscv_cpu_init(Object *obj) | ||
145 | #endif /* CONFIG_USER_ONLY */ | ||
146 | } | ||
147 | |||
148 | -static Property riscv_cpu_properties[] = { | ||
149 | +static Property riscv_cpu_extensions[] = { | ||
150 | /* Defaults for standard extensions */ | ||
151 | DEFINE_PROP_BOOL("i", RISCVCPU, cfg.ext_i, true), | ||
152 | DEFINE_PROP_BOOL("e", RISCVCPU, cfg.ext_e, false), | ||
153 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_properties[] = { | ||
154 | DEFINE_PROP_BOOL("Zve64f", RISCVCPU, cfg.ext_zve64f, false), | ||
155 | DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true), | ||
156 | DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true), | ||
157 | - DEFINE_PROP_BOOL("debug", RISCVCPU, cfg.debug, true), | ||
158 | |||
159 | DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec), | ||
160 | DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec), | ||
161 | DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128), | ||
162 | DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64), | ||
163 | |||
164 | - DEFINE_PROP_UINT32("mvendorid", RISCVCPU, cfg.mvendorid, 0), | ||
165 | - DEFINE_PROP_UINT64("marchid", RISCVCPU, cfg.marchid, RISCV_CPU_MARCHID), | ||
166 | - DEFINE_PROP_UINT64("mimpid", RISCVCPU, cfg.mimpid, RISCV_CPU_MIMPID), | ||
167 | - | ||
168 | DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false), | ||
169 | DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false), | ||
170 | DEFINE_PROP_BOOL("svpbmt", RISCVCPU, cfg.ext_svpbmt, false), | ||
171 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_properties[] = { | ||
172 | DEFINE_PROP_BOOL("x-epmp", RISCVCPU, cfg.epmp, false), | ||
173 | DEFINE_PROP_BOOL("x-aia", RISCVCPU, cfg.aia, false), | ||
174 | |||
175 | + DEFINE_PROP_END_OF_LIST(), | ||
176 | +}; | ||
177 | + | ||
178 | +static void register_cpu_props(DeviceState *dev) | ||
179 | +{ | ||
180 | + Property *prop; | ||
181 | + | ||
182 | + for (prop = riscv_cpu_extensions; prop && prop->name; prop++) { | ||
183 | + qdev_property_add_static(dev, prop); | ||
184 | + } | ||
185 | +} | ||
186 | + | ||
187 | +static Property riscv_cpu_properties[] = { | ||
188 | + DEFINE_PROP_BOOL("debug", RISCVCPU, cfg.debug, true), | ||
189 | + | ||
190 | + DEFINE_PROP_UINT32("mvendorid", RISCVCPU, cfg.mvendorid, 0), | ||
191 | + DEFINE_PROP_UINT64("marchid", RISCVCPU, cfg.marchid, RISCV_CPU_MARCHID), | ||
192 | + DEFINE_PROP_UINT64("mimpid", RISCVCPU, cfg.mimpid, RISCV_CPU_MIMPID), | ||
193 | + | ||
194 | DEFINE_PROP_UINT64("resetvec", RISCVCPU, cfg.resetvec, DEFAULT_RSTVEC), | ||
195 | |||
196 | DEFINE_PROP_BOOL("short-isa-string", RISCVCPU, cfg.short_isa_string, false), | ||
197 | -- | 90 | -- |
198 | 2.36.1 | 91 | 2.41.0 | diff view generated by jsdifflib |
1 | From: Weiwei Li <liweiwei@iscas.ac.cn> | 1 | From: Vineet Gupta <vineetg@rivosinc.com> |
---|---|---|---|
2 | 2 | ||
3 | Add support for the zmmul extension v0.1. This extension includes all | 3 | zicond is now codegen supported in both llvm and gcc. |
4 | multiplication operations from the M extension but not the divide ops. | ||
5 | 4 | ||
6 | Signed-off-by: Weiwei Li <liweiwei@iscas.ac.cn> | 5 | This change allows seamless enabling/testing of zicond in downstream |
7 | Signed-off-by: Junqiang Wang <wangjunqiang@iscas.ac.cn> | 6 | projects. e.g. currently riscv-gnu-toolchain parses elf attributes |
8 | Reviewed-by: Víctor Colombo <victor.colombo@eldorado.org.br> | 7 | to create a cmdline for qemu but fails short of enabling it because of |
8 | the "x-" prefix. | ||
9 | |||
10 | Signed-off-by: Vineet Gupta <vineetg@rivosinc.com> | ||
11 | Message-ID: <20230808181715.436395-1-vineetg@rivosinc.com> | ||
9 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | 12 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> |
10 | Message-Id: <20220531030732.3850-1-liweiwei@iscas.ac.cn> | ||
11 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 13 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
12 | --- | 14 | --- |
13 | target/riscv/cpu.h | 1 + | 15 | target/riscv/cpu.c | 2 +- |
14 | target/riscv/cpu.c | 7 +++++++ | 16 | 1 file changed, 1 insertion(+), 1 deletion(-) |
15 | target/riscv/insn_trans/trans_rvm.c.inc | 18 ++++++++++++------ | ||
16 | 3 files changed, 20 insertions(+), 6 deletions(-) | ||
17 | 17 | ||
18 | diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h | ||
19 | index XXXXXXX..XXXXXXX 100644 | ||
20 | --- a/target/riscv/cpu.h | ||
21 | +++ b/target/riscv/cpu.h | ||
22 | @@ -XXX,XX +XXX,XX @@ struct RISCVCPUConfig { | ||
23 | bool ext_zhinxmin; | ||
24 | bool ext_zve32f; | ||
25 | bool ext_zve64f; | ||
26 | + bool ext_zmmul; | ||
27 | |||
28 | uint32_t mvendorid; | ||
29 | uint64_t marchid; | ||
30 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | 18 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c |
31 | index XXXXXXX..XXXXXXX 100644 | 19 | index XXXXXXX..XXXXXXX 100644 |
32 | --- a/target/riscv/cpu.c | 20 | --- a/target/riscv/cpu.c |
33 | +++ b/target/riscv/cpu.c | 21 | +++ b/target/riscv/cpu.c |
34 | @@ -XXX,XX +XXX,XX @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp) | 22 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_extensions[] = { |
35 | cpu->cfg.ext_ifencei = true; | 23 | DEFINE_PROP_BOOL("zcf", RISCVCPU, cfg.ext_zcf, false), |
36 | } | 24 | DEFINE_PROP_BOOL("zcmp", RISCVCPU, cfg.ext_zcmp, false), |
37 | 25 | DEFINE_PROP_BOOL("zcmt", RISCVCPU, cfg.ext_zcmt, false), | |
38 | + if (cpu->cfg.ext_m && cpu->cfg.ext_zmmul) { | 26 | + DEFINE_PROP_BOOL("zicond", RISCVCPU, cfg.ext_zicond, false), |
39 | + warn_report("Zmmul will override M"); | 27 | |
40 | + cpu->cfg.ext_m = false; | 28 | /* Vendor-specific custom extensions */ |
41 | + } | 29 | DEFINE_PROP_BOOL("xtheadba", RISCVCPU, cfg.ext_xtheadba, false), |
42 | + | 30 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_extensions[] = { |
43 | if (cpu->cfg.ext_i && cpu->cfg.ext_e) { | 31 | DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, false), |
44 | error_setg(errp, | ||
45 | "I and E extensions are incompatible"); | ||
46 | @@ -XXX,XX +XXX,XX @@ static Property riscv_cpu_properties[] = { | ||
47 | 32 | ||
48 | /* These are experimental so mark with 'x-' */ | 33 | /* These are experimental so mark with 'x-' */ |
49 | DEFINE_PROP_BOOL("x-j", RISCVCPU, cfg.ext_j, false), | 34 | - DEFINE_PROP_BOOL("x-zicond", RISCVCPU, cfg.ext_zicond, false), |
50 | + DEFINE_PROP_BOOL("x-zmmul", RISCVCPU, cfg.ext_zmmul, false), | 35 | |
51 | /* ePMP 0.9.3 */ | 36 | /* ePMP 0.9.3 */ |
52 | DEFINE_PROP_BOOL("x-epmp", RISCVCPU, cfg.epmp, false), | 37 | DEFINE_PROP_BOOL("x-epmp", RISCVCPU, cfg.epmp, false), |
53 | DEFINE_PROP_BOOL("x-aia", RISCVCPU, cfg.aia, false), | ||
54 | @@ -XXX,XX +XXX,XX @@ static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str, int max_str_len) | ||
55 | struct isa_ext_data isa_edata_arr[] = { | ||
56 | ISA_EDATA_ENTRY(zicsr, ext_icsr), | ||
57 | ISA_EDATA_ENTRY(zifencei, ext_ifencei), | ||
58 | + ISA_EDATA_ENTRY(zmmul, ext_zmmul), | ||
59 | ISA_EDATA_ENTRY(zfh, ext_zfh), | ||
60 | ISA_EDATA_ENTRY(zfhmin, ext_zfhmin), | ||
61 | ISA_EDATA_ENTRY(zfinx, ext_zfinx), | ||
62 | diff --git a/target/riscv/insn_trans/trans_rvm.c.inc b/target/riscv/insn_trans/trans_rvm.c.inc | ||
63 | index XXXXXXX..XXXXXXX 100644 | ||
64 | --- a/target/riscv/insn_trans/trans_rvm.c.inc | ||
65 | +++ b/target/riscv/insn_trans/trans_rvm.c.inc | ||
66 | @@ -XXX,XX +XXX,XX @@ | ||
67 | * this program. If not, see <http://www.gnu.org/licenses/>. | ||
68 | */ | ||
69 | |||
70 | +#define REQUIRE_M_OR_ZMMUL(ctx) do { \ | ||
71 | + if (!ctx->cfg_ptr->ext_zmmul && !has_ext(ctx, RVM)) { \ | ||
72 | + return false; \ | ||
73 | + } \ | ||
74 | +} while (0) | ||
75 | + | ||
76 | static void gen_mulhu_i128(TCGv r2, TCGv r3, TCGv al, TCGv ah, TCGv bl, TCGv bh) | ||
77 | { | ||
78 | TCGv tmpl = tcg_temp_new(); | ||
79 | @@ -XXX,XX +XXX,XX @@ static void gen_mul_i128(TCGv rl, TCGv rh, | ||
80 | |||
81 | static bool trans_mul(DisasContext *ctx, arg_mul *a) | ||
82 | { | ||
83 | - REQUIRE_EXT(ctx, RVM); | ||
84 | + REQUIRE_M_OR_ZMMUL(ctx); | ||
85 | return gen_arith(ctx, a, EXT_NONE, tcg_gen_mul_tl, gen_mul_i128); | ||
86 | } | ||
87 | |||
88 | @@ -XXX,XX +XXX,XX @@ static void gen_mulh_w(TCGv ret, TCGv s1, TCGv s2) | ||
89 | |||
90 | static bool trans_mulh(DisasContext *ctx, arg_mulh *a) | ||
91 | { | ||
92 | - REQUIRE_EXT(ctx, RVM); | ||
93 | + REQUIRE_M_OR_ZMMUL(ctx); | ||
94 | return gen_arith_per_ol(ctx, a, EXT_SIGN, gen_mulh, gen_mulh_w, | ||
95 | gen_mulh_i128); | ||
96 | } | ||
97 | @@ -XXX,XX +XXX,XX @@ static void gen_mulhsu_w(TCGv ret, TCGv arg1, TCGv arg2) | ||
98 | |||
99 | static bool trans_mulhsu(DisasContext *ctx, arg_mulhsu *a) | ||
100 | { | ||
101 | - REQUIRE_EXT(ctx, RVM); | ||
102 | + REQUIRE_M_OR_ZMMUL(ctx); | ||
103 | return gen_arith_per_ol(ctx, a, EXT_NONE, gen_mulhsu, gen_mulhsu_w, | ||
104 | gen_mulhsu_i128); | ||
105 | } | ||
106 | @@ -XXX,XX +XXX,XX @@ static void gen_mulhu(TCGv ret, TCGv s1, TCGv s2) | ||
107 | |||
108 | static bool trans_mulhu(DisasContext *ctx, arg_mulhu *a) | ||
109 | { | ||
110 | - REQUIRE_EXT(ctx, RVM); | ||
111 | + REQUIRE_M_OR_ZMMUL(ctx); | ||
112 | /* gen_mulh_w works for either sign as input. */ | ||
113 | return gen_arith_per_ol(ctx, a, EXT_ZERO, gen_mulhu, gen_mulh_w, | ||
114 | gen_mulhu_i128); | ||
115 | @@ -XXX,XX +XXX,XX @@ static bool trans_remu(DisasContext *ctx, arg_remu *a) | ||
116 | static bool trans_mulw(DisasContext *ctx, arg_mulw *a) | ||
117 | { | ||
118 | REQUIRE_64_OR_128BIT(ctx); | ||
119 | - REQUIRE_EXT(ctx, RVM); | ||
120 | + REQUIRE_M_OR_ZMMUL(ctx); | ||
121 | ctx->ol = MXL_RV32; | ||
122 | return gen_arith(ctx, a, EXT_NONE, tcg_gen_mul_tl, NULL); | ||
123 | } | ||
124 | @@ -XXX,XX +XXX,XX @@ static bool trans_remuw(DisasContext *ctx, arg_remuw *a) | ||
125 | static bool trans_muld(DisasContext *ctx, arg_muld *a) | ||
126 | { | ||
127 | REQUIRE_128BIT(ctx); | ||
128 | - REQUIRE_EXT(ctx, RVM); | ||
129 | + REQUIRE_M_OR_ZMMUL(ctx); | ||
130 | ctx->ol = MXL_RV64; | ||
131 | return gen_arith(ctx, a, EXT_SIGN, tcg_gen_mul_tl, NULL); | ||
132 | } | ||
133 | -- | 38 | -- |
134 | 2.36.1 | 39 | 2.41.0 | diff view generated by jsdifflib |
1 | From: Alistair Francis <alistair.francis@wdc.com> | 1 | From: Daniel Henrique Barboza <dbarboza@ventanamicro.com> |
---|---|---|---|
2 | 2 | ||
3 | Since commit ad40be27 "target/riscv: Support start kernel directly by | 3 | A build with --enable-debug and without KVM will fail as follows: |
4 | KVM" we have been overflowing the addr_config on "M,MS..." | ||
5 | configurations, as reported https://gitlab.com/qemu-project/qemu/-/issues/1050. | ||
6 | 4 | ||
7 | This commit changes the loop in sifive_plic_create() from iterating over | 5 | /usr/bin/ld: libqemu-riscv64-softmmu.fa.p/hw_riscv_virt.c.o: in function `virt_machine_init': |
8 | the number of harts to just iterating over the addr_config. The | 6 | ./qemu/build/../hw/riscv/virt.c:1465: undefined reference to `kvm_riscv_aia_create' |
9 | addr_config is based on the hart_config, and will contain interrup details | ||
10 | for all harts. This way we can't iterate past the end of addr_config. | ||
11 | 7 | ||
12 | Fixes: ad40be27084536 ("target/riscv: Support start kernel directly by KVM") | 8 | This happens because the code block with "if virt_use_kvm_aia(s)" isn't |
13 | Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1050 | 9 | being ignored by the debug build, resulting in an undefined reference to |
10 | a KVM only function. | ||
11 | |||
12 | Add a 'kvm_enabled()' conditional together with virt_use_kvm_aia() will | ||
13 | make the compiler crop the kvm_riscv_aia_create() call entirely from a | ||
14 | non-KVM build. Note that adding the 'kvm_enabled()' conditional inside | ||
15 | virt_use_kvm_aia() won't fix the build because this function would need | ||
16 | to be inlined multiple times to make the compiler zero out the entire | ||
17 | block. | ||
18 | |||
19 | While we're at it, use kvm_enabled() in all instances where | ||
20 | virt_use_kvm_aia() is checked to allow the compiler to elide these other | ||
21 | kvm-only instances as well. | ||
22 | |||
23 | Suggested-by: Richard Henderson <richard.henderson@linaro.org> | ||
24 | Fixes: dbdb99948e ("target/riscv: select KVM AIA in riscv virt machine") | ||
25 | Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
26 | Reviewed-by: Andrew Jones <ajones@ventanamicro.com> | ||
27 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
28 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
29 | Message-ID: <20230830133503.711138-2-dbarboza@ventanamicro.com> | ||
14 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 30 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
15 | Reviewed-by: Mingwang Li <limingwang@huawei.com> | ||
16 | Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> | ||
17 | Message-Id: <20220601013631.196854-1-alistair.francis@opensource.wdc.com> | ||
18 | --- | 31 | --- |
19 | hw/intc/sifive_plic.c | 19 +++++++++---------- | 32 | hw/riscv/virt.c | 6 +++--- |
20 | 1 file changed, 9 insertions(+), 10 deletions(-) | 33 | 1 file changed, 3 insertions(+), 3 deletions(-) |
21 | 34 | ||
22 | diff --git a/hw/intc/sifive_plic.c b/hw/intc/sifive_plic.c | 35 | diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c |
23 | index XXXXXXX..XXXXXXX 100644 | 36 | index XXXXXXX..XXXXXXX 100644 |
24 | --- a/hw/intc/sifive_plic.c | 37 | --- a/hw/riscv/virt.c |
25 | +++ b/hw/intc/sifive_plic.c | 38 | +++ b/hw/riscv/virt.c |
26 | @@ -XXX,XX +XXX,XX @@ DeviceState *sifive_plic_create(hwaddr addr, char *hart_config, | 39 | @@ -XXX,XX +XXX,XX @@ static void create_fdt_sockets(RISCVVirtState *s, const MemMapEntry *memmap, |
27 | uint32_t context_stride, uint32_t aperture_size) | 40 | } |
28 | { | 41 | |
29 | DeviceState *dev = qdev_new(TYPE_SIFIVE_PLIC); | 42 | /* KVM AIA only has one APLIC instance */ |
30 | - int i, j = 0; | 43 | - if (virt_use_kvm_aia(s)) { |
31 | + int i; | 44 | + if (kvm_enabled() && virt_use_kvm_aia(s)) { |
32 | SiFivePLICState *plic; | 45 | create_fdt_socket_aplic(s, memmap, 0, |
33 | 46 | msi_m_phandle, msi_s_phandle, phandle, | |
34 | assert(enable_stride == (enable_stride & -enable_stride)); | 47 | &intc_phandles[0], xplic_phandles, |
35 | @@ -XXX,XX +XXX,XX @@ DeviceState *sifive_plic_create(hwaddr addr, char *hart_config, | 48 | @@ -XXX,XX +XXX,XX @@ static void create_fdt_sockets(RISCVVirtState *s, const MemMapEntry *memmap, |
36 | sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, addr); | 49 | |
37 | 50 | g_free(intc_phandles); | |
38 | plic = SIFIVE_PLIC(dev); | 51 | |
39 | - for (i = 0; i < num_harts; i++) { | 52 | - if (virt_use_kvm_aia(s)) { |
40 | - CPUState *cpu = qemu_get_cpu(hartid_base + i); | 53 | + if (kvm_enabled() && virt_use_kvm_aia(s)) { |
41 | 54 | *irq_mmio_phandle = xplic_phandles[0]; | |
42 | - if (plic->addr_config[j].mode == PLICMode_M) { | 55 | *irq_virtio_phandle = xplic_phandles[0]; |
43 | - j++; | 56 | *irq_pcie_phandle = xplic_phandles[0]; |
44 | - qdev_connect_gpio_out(dev, num_harts + i, | 57 | @@ -XXX,XX +XXX,XX @@ static void virt_machine_init(MachineState *machine) |
45 | + for (i = 0; i < plic->num_addrs; i++) { | ||
46 | + int cpu_num = plic->addr_config[i].hartid; | ||
47 | + CPUState *cpu = qemu_get_cpu(hartid_base + cpu_num); | ||
48 | + | ||
49 | + if (plic->addr_config[i].mode == PLICMode_M) { | ||
50 | + qdev_connect_gpio_out(dev, num_harts + cpu_num, | ||
51 | qdev_get_gpio_in(DEVICE(cpu), IRQ_M_EXT)); | ||
52 | } | ||
53 | - | ||
54 | - if (plic->addr_config[j].mode == PLICMode_S) { | ||
55 | - j++; | ||
56 | - qdev_connect_gpio_out(dev, i, | ||
57 | + if (plic->addr_config[i].mode == PLICMode_S) { | ||
58 | + qdev_connect_gpio_out(dev, cpu_num, | ||
59 | qdev_get_gpio_in(DEVICE(cpu), IRQ_S_EXT)); | ||
60 | } | 58 | } |
61 | } | 59 | } |
60 | |||
61 | - if (virt_use_kvm_aia(s)) { | ||
62 | + if (kvm_enabled() && virt_use_kvm_aia(s)) { | ||
63 | kvm_riscv_aia_create(machine, IMSIC_MMIO_GROUP_MIN_SHIFT, | ||
64 | VIRT_IRQCHIP_NUM_SOURCES, VIRT_IRQCHIP_NUM_MSIS, | ||
65 | memmap[VIRT_APLIC_S].base, | ||
62 | -- | 66 | -- |
63 | 2.36.1 | 67 | 2.41.0 |
68 | |||
69 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Daniel Henrique Barboza <dbarboza@ventanamicro.com> |
---|---|---|---|
2 | 2 | ||
3 | Compares write mask registers, and so always operate under a tail- | 3 | Commit 6df0b37e2ab breaks a --enable-debug build in a non-KVM |
4 | agnostic policy. | 4 | environment with the following error: |
5 | 5 | ||
6 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 6 | /usr/bin/ld: libqemu-riscv64-softmmu.fa.p/hw_intc_riscv_aplic.c.o: in function `riscv_kvm_aplic_request': |
7 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | 7 | ./qemu/build/../hw/intc/riscv_aplic.c:486: undefined reference to `kvm_set_irq' |
8 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 8 | collect2: error: ld returned 1 exit status |
9 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | 9 | |
10 | Message-Id: <165449614532.19704.7000832880482980398-9@git.sr.ht> | 10 | This happens because the debug build will poke into the |
11 | 'if (is_kvm_aia(aplic->msimode))' block and fail to find a reference to | ||
12 | the KVM only function riscv_kvm_aplic_request(). | ||
13 | |||
14 | There are multiple solutions to fix this. We'll go with the same | ||
15 | solution from the previous patch, i.e. add a kvm_enabled() conditional | ||
16 | to filter out the block. But there's a catch: riscv_kvm_aplic_request() | ||
17 | is a local function that would end up being used if the compiler crops | ||
18 | the block, and this won't work. Quoting Richard Henderson's explanation | ||
19 | in [1]: | ||
20 | |||
21 | "(...) the compiler won't eliminate entire unused functions with -O0" | ||
22 | |||
23 | We'll solve it by moving riscv_kvm_aplic_request() to kvm.c and add its | ||
24 | declaration in kvm_riscv.h, where all other KVM specific public | ||
25 | functions are already declared. Other archs handles KVM specific code in | ||
26 | this manner and we expect to do the same from now on. | ||
27 | |||
28 | [1] https://lore.kernel.org/qemu-riscv/d2f1ad02-eb03-138f-9d08-db676deeed05@linaro.org/ | ||
29 | |||
30 | Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
31 | Reviewed-by: Andrew Jones <ajones@ventanamicro.com> | ||
32 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
33 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
34 | Message-ID: <20230830133503.711138-3-dbarboza@ventanamicro.com> | ||
11 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 35 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
12 | --- | 36 | --- |
13 | target/riscv/vector_helper.c | 18 ++++++++++++++++++ | 37 | target/riscv/kvm_riscv.h | 1 + |
14 | 1 file changed, 18 insertions(+) | 38 | hw/intc/riscv_aplic.c | 8 ++------ |
39 | target/riscv/kvm.c | 5 +++++ | ||
40 | 3 files changed, 8 insertions(+), 6 deletions(-) | ||
15 | 41 | ||
16 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 42 | diff --git a/target/riscv/kvm_riscv.h b/target/riscv/kvm_riscv.h |
17 | index XXXXXXX..XXXXXXX 100644 | 43 | index XXXXXXX..XXXXXXX 100644 |
18 | --- a/target/riscv/vector_helper.c | 44 | --- a/target/riscv/kvm_riscv.h |
19 | +++ b/target/riscv/vector_helper.c | 45 | +++ b/target/riscv/kvm_riscv.h |
20 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | 46 | @@ -XXX,XX +XXX,XX @@ void kvm_riscv_aia_create(MachineState *machine, uint64_t group_shift, |
21 | { \ | 47 | uint64_t aia_irq_num, uint64_t aia_msi_num, |
22 | uint32_t vm = vext_vm(desc); \ | 48 | uint64_t aplic_base, uint64_t imsic_base, |
23 | uint32_t vl = env->vl; \ | 49 | uint64_t guest_num); |
24 | + uint32_t total_elems = env_archcpu(env)->cfg.vlen; \ | 50 | +void riscv_kvm_aplic_request(void *opaque, int irq, int level); |
25 | + uint32_t vta_all_1s = vext_vta_all_1s(desc); \ | 51 | |
26 | uint32_t i; \ | 52 | #endif |
27 | \ | 53 | diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c |
28 | for (i = env->vstart; i < vl; i++) { \ | 54 | index XXXXXXX..XXXXXXX 100644 |
29 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | 55 | --- a/hw/intc/riscv_aplic.c |
30 | vext_set_elem_mask(vd, i, DO_OP(s2, s1)); \ | 56 | +++ b/hw/intc/riscv_aplic.c |
31 | } \ | 57 | @@ -XXX,XX +XXX,XX @@ |
32 | env->vstart = 0; \ | 58 | #include "target/riscv/cpu.h" |
33 | + /* mask destination register are always tail-agnostic */ \ | 59 | #include "sysemu/sysemu.h" |
34 | + /* set tail elements to 1s */ \ | 60 | #include "sysemu/kvm.h" |
35 | + if (vta_all_1s) { \ | 61 | +#include "kvm_riscv.h" |
36 | + for (; i < total_elems; i++) { \ | 62 | #include "migration/vmstate.h" |
37 | + vext_set_elem_mask(vd, i, 1); \ | 63 | |
38 | + } \ | 64 | #define APLIC_MAX_IDC (1UL << 14) |
39 | + } \ | 65 | @@ -XXX,XX +XXX,XX @@ static uint32_t riscv_aplic_idc_claimi(RISCVAPLICState *aplic, uint32_t idc) |
66 | return topi; | ||
40 | } | 67 | } |
41 | 68 | ||
42 | GEN_VEXT_CMP_VV(vmseq_vv_b, uint8_t, H1, DO_MSEQ) | 69 | -static void riscv_kvm_aplic_request(void *opaque, int irq, int level) |
43 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \ | 70 | -{ |
44 | { \ | 71 | - kvm_set_irq(kvm_state, irq, !!level); |
45 | uint32_t vm = vext_vm(desc); \ | 72 | -} |
46 | uint32_t vl = env->vl; \ | 73 | - |
47 | + uint32_t total_elems = env_archcpu(env)->cfg.vlen; \ | 74 | static void riscv_aplic_request(void *opaque, int irq, int level) |
48 | + uint32_t vta_all_1s = vext_vta_all_1s(desc); \ | 75 | { |
49 | uint32_t i; \ | 76 | bool update = false; |
50 | \ | 77 | @@ -XXX,XX +XXX,XX @@ static void riscv_aplic_realize(DeviceState *dev, Error **errp) |
51 | for (i = env->vstart; i < vl; i++) { \ | 78 | * have IRQ lines delegated by their parent APLIC. |
52 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \ | 79 | */ |
53 | DO_OP(s2, (ETYPE)(target_long)s1)); \ | 80 | if (!aplic->parent) { |
54 | } \ | 81 | - if (is_kvm_aia(aplic->msimode)) { |
55 | env->vstart = 0; \ | 82 | + if (kvm_enabled() && is_kvm_aia(aplic->msimode)) { |
56 | + /* mask destination register are always tail-agnostic */ \ | 83 | qdev_init_gpio_in(dev, riscv_kvm_aplic_request, aplic->num_irqs); |
57 | + /* set tail elements to 1s */ \ | 84 | } else { |
58 | + if (vta_all_1s) { \ | 85 | qdev_init_gpio_in(dev, riscv_aplic_request, aplic->num_irqs); |
59 | + for (; i < total_elems; i++) { \ | 86 | diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c |
60 | + vext_set_elem_mask(vd, i, 1); \ | 87 | index XXXXXXX..XXXXXXX 100644 |
61 | + } \ | 88 | --- a/target/riscv/kvm.c |
62 | + } \ | 89 | +++ b/target/riscv/kvm.c |
63 | } | 90 | @@ -XXX,XX +XXX,XX @@ |
64 | 91 | #include "sysemu/runstate.h" | |
65 | GEN_VEXT_CMP_VX(vmseq_vx_b, uint8_t, H1, DO_MSEQ) | 92 | #include "hw/riscv/numa.h" |
93 | |||
94 | +void riscv_kvm_aplic_request(void *opaque, int irq, int level) | ||
95 | +{ | ||
96 | + kvm_set_irq(kvm_state, irq, !!level); | ||
97 | +} | ||
98 | + | ||
99 | static uint64_t kvm_riscv_reg_id(CPURISCVState *env, uint64_t type, | ||
100 | uint64_t idx) | ||
101 | { | ||
66 | -- | 102 | -- |
67 | 2.36.1 | 103 | 2.41.0 |
104 | |||
105 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Robbin Ehn <rehn@rivosinc.com> | ||
1 | 2 | ||
3 | This patch adds the new extensions in | ||
4 | linux 6.5 to the hwprobe syscall. | ||
5 | |||
6 | And fixes RVC check to OR with correct value. | ||
7 | The previous variable contains 0 therefore it | ||
8 | did work. | ||
9 | |||
10 | Signed-off-by: Robbin Ehn <rehn@rivosinc.com> | ||
11 | Acked-by: Richard Henderson <richard.henderson@linaro.org> | ||
12 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | ||
13 | Message-ID: <bc82203b72d7efb30f1b4a8f9eb3d94699799dc8.camel@rivosinc.com> | ||
14 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
15 | --- | ||
16 | linux-user/syscall.c | 14 +++++++++++++- | ||
17 | 1 file changed, 13 insertions(+), 1 deletion(-) | ||
18 | |||
19 | diff --git a/linux-user/syscall.c b/linux-user/syscall.c | ||
20 | index XXXXXXX..XXXXXXX 100644 | ||
21 | --- a/linux-user/syscall.c | ||
22 | +++ b/linux-user/syscall.c | ||
23 | @@ -XXX,XX +XXX,XX @@ static int do_getdents64(abi_long dirfd, abi_long arg2, abi_long count) | ||
24 | #define RISCV_HWPROBE_KEY_IMA_EXT_0 4 | ||
25 | #define RISCV_HWPROBE_IMA_FD (1 << 0) | ||
26 | #define RISCV_HWPROBE_IMA_C (1 << 1) | ||
27 | +#define RISCV_HWPROBE_IMA_V (1 << 2) | ||
28 | +#define RISCV_HWPROBE_EXT_ZBA (1 << 3) | ||
29 | +#define RISCV_HWPROBE_EXT_ZBB (1 << 4) | ||
30 | +#define RISCV_HWPROBE_EXT_ZBS (1 << 5) | ||
31 | |||
32 | #define RISCV_HWPROBE_KEY_CPUPERF_0 5 | ||
33 | #define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0) | ||
34 | @@ -XXX,XX +XXX,XX @@ static void risc_hwprobe_fill_pairs(CPURISCVState *env, | ||
35 | riscv_has_ext(env, RVD) ? | ||
36 | RISCV_HWPROBE_IMA_FD : 0; | ||
37 | value |= riscv_has_ext(env, RVC) ? | ||
38 | - RISCV_HWPROBE_IMA_C : pair->value; | ||
39 | + RISCV_HWPROBE_IMA_C : 0; | ||
40 | + value |= riscv_has_ext(env, RVV) ? | ||
41 | + RISCV_HWPROBE_IMA_V : 0; | ||
42 | + value |= cfg->ext_zba ? | ||
43 | + RISCV_HWPROBE_EXT_ZBA : 0; | ||
44 | + value |= cfg->ext_zbb ? | ||
45 | + RISCV_HWPROBE_EXT_ZBB : 0; | ||
46 | + value |= cfg->ext_zbs ? | ||
47 | + RISCV_HWPROBE_EXT_ZBS : 0; | ||
48 | __put_user(value, &pair->value); | ||
49 | break; | ||
50 | case RISCV_HWPROBE_KEY_CPUPERF_0: | ||
51 | -- | ||
52 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Ard Biesheuvel <ardb@kernel.org> |
---|---|---|---|
2 | 2 | ||
3 | No functional change intended in this commit. | 3 | Use the accelerated SubBytes/ShiftRows/AddRoundKey AES helper to |
4 | implement the first half of the key schedule derivation. This does not | ||
5 | actually involve shifting rows, so clone the same value into all four | ||
6 | columns of the AES vector to counter that operation. | ||
4 | 7 | ||
5 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 8 | Cc: Richard Henderson <richard.henderson@linaro.org> |
6 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | 9 | Cc: Philippe Mathieu-Daudé <philmd@linaro.org> |
7 | Message-Id: <165449614532.19704.7000832880482980398-2@git.sr.ht> | 10 | Cc: Palmer Dabbelt <palmer@dabbelt.com> |
11 | Cc: Alistair Francis <alistair.francis@wdc.com> | ||
12 | Signed-off-by: Ard Biesheuvel <ardb@kernel.org> | ||
13 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
14 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
15 | Message-ID: <20230831154118.138727-1-ardb@kernel.org> | ||
8 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 16 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
9 | --- | 17 | --- |
10 | target/riscv/vector_helper.c | 35 ++++++++++++++++------------------- | 18 | target/riscv/crypto_helper.c | 17 +++++------------ |
11 | 1 file changed, 16 insertions(+), 19 deletions(-) | 19 | 1 file changed, 5 insertions(+), 12 deletions(-) |
12 | 20 | ||
13 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 21 | diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c |
14 | index XXXXXXX..XXXXXXX 100644 | 22 | index XXXXXXX..XXXXXXX 100644 |
15 | --- a/target/riscv/vector_helper.c | 23 | --- a/target/riscv/crypto_helper.c |
16 | +++ b/target/riscv/vector_helper.c | 24 | +++ b/target/riscv/crypto_helper.c |
17 | @@ -XXX,XX +XXX,XX @@ vext_ldst_stride(void *vd, void *v0, target_ulong base, | 25 | @@ -XXX,XX +XXX,XX @@ target_ulong HELPER(aes64ks1i)(target_ulong rs1, target_ulong rnum) |
18 | target_ulong stride, CPURISCVState *env, | 26 | |
19 | uint32_t desc, uint32_t vm, | 27 | uint8_t enc_rnum = rnum; |
20 | vext_ldst_elem_fn *ldst_elem, | 28 | uint32_t temp = (RS1 >> 32) & 0xFFFFFFFF; |
21 | - uint32_t esz, uintptr_t ra, MMUAccessType access_type) | 29 | - uint8_t rcon_ = 0; |
22 | + uint32_t esz, uintptr_t ra) | 30 | - target_ulong result; |
23 | { | 31 | + AESState t, rc = {}; |
24 | uint32_t i, k; | 32 | |
25 | uint32_t nf = vext_nf(desc); | 33 | if (enc_rnum != 0xA) { |
26 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void * v0, target_ulong base, \ | 34 | temp = ror32(temp, 8); /* Rotate right by 8 */ |
27 | { \ | 35 | - rcon_ = round_consts[enc_rnum]; |
28 | uint32_t vm = vext_vm(desc); \ | 36 | + rc.w[0] = rc.w[1] = round_consts[enc_rnum]; |
29 | vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN, \ | 37 | } |
30 | - ctzl(sizeof(ETYPE)), GETPC(), MMU_DATA_LOAD); \ | 38 | |
31 | + ctzl(sizeof(ETYPE)), GETPC()); \ | 39 | - temp = ((uint32_t)AES_sbox[(temp >> 24) & 0xFF] << 24) | |
40 | - ((uint32_t)AES_sbox[(temp >> 16) & 0xFF] << 16) | | ||
41 | - ((uint32_t)AES_sbox[(temp >> 8) & 0xFF] << 8) | | ||
42 | - ((uint32_t)AES_sbox[(temp >> 0) & 0xFF] << 0); | ||
43 | + t.w[0] = t.w[1] = t.w[2] = t.w[3] = temp; | ||
44 | + aesenc_SB_SR_AK(&t, &t, &rc, false); | ||
45 | |||
46 | - temp ^= rcon_; | ||
47 | - | ||
48 | - result = ((uint64_t)temp << 32) | temp; | ||
49 | - | ||
50 | - return result; | ||
51 | + return t.d[0]; | ||
32 | } | 52 | } |
33 | 53 | ||
34 | GEN_VEXT_LD_STRIDE(vlse8_v, int8_t, lde_b) | 54 | target_ulong HELPER(aes64im)(target_ulong rs1) |
35 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong base, \ | ||
36 | { \ | ||
37 | uint32_t vm = vext_vm(desc); \ | ||
38 | vext_ldst_stride(vd, v0, base, stride, env, desc, vm, STORE_FN, \ | ||
39 | - ctzl(sizeof(ETYPE)), GETPC(), MMU_DATA_STORE); \ | ||
40 | + ctzl(sizeof(ETYPE)), GETPC()); \ | ||
41 | } | ||
42 | |||
43 | GEN_VEXT_ST_STRIDE(vsse8_v, int8_t, ste_b) | ||
44 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_ST_STRIDE(vsse64_v, int64_t, ste_d) | ||
45 | static void | ||
46 | vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, | ||
47 | vext_ldst_elem_fn *ldst_elem, uint32_t esz, uint32_t evl, | ||
48 | - uintptr_t ra, MMUAccessType access_type) | ||
49 | + uintptr_t ra) | ||
50 | { | ||
51 | uint32_t i, k; | ||
52 | uint32_t nf = vext_nf(desc); | ||
53 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base, \ | ||
54 | { \ | ||
55 | uint32_t stride = vext_nf(desc) << ctzl(sizeof(ETYPE)); \ | ||
56 | vext_ldst_stride(vd, v0, base, stride, env, desc, false, LOAD_FN, \ | ||
57 | - ctzl(sizeof(ETYPE)), GETPC(), MMU_DATA_LOAD); \ | ||
58 | + ctzl(sizeof(ETYPE)), GETPC()); \ | ||
59 | } \ | ||
60 | \ | ||
61 | void HELPER(NAME)(void *vd, void *v0, target_ulong base, \ | ||
62 | CPURISCVState *env, uint32_t desc) \ | ||
63 | { \ | ||
64 | vext_ldst_us(vd, base, env, desc, LOAD_FN, \ | ||
65 | - ctzl(sizeof(ETYPE)), env->vl, GETPC(), MMU_DATA_LOAD); \ | ||
66 | + ctzl(sizeof(ETYPE)), env->vl, GETPC()); \ | ||
67 | } | ||
68 | |||
69 | GEN_VEXT_LD_US(vle8_v, int8_t, lde_b) | ||
70 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base, \ | ||
71 | { \ | ||
72 | uint32_t stride = vext_nf(desc) << ctzl(sizeof(ETYPE)); \ | ||
73 | vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN, \ | ||
74 | - ctzl(sizeof(ETYPE)), GETPC(), MMU_DATA_STORE); \ | ||
75 | + ctzl(sizeof(ETYPE)), GETPC()); \ | ||
76 | } \ | ||
77 | \ | ||
78 | void HELPER(NAME)(void *vd, void *v0, target_ulong base, \ | ||
79 | CPURISCVState *env, uint32_t desc) \ | ||
80 | { \ | ||
81 | vext_ldst_us(vd, base, env, desc, STORE_FN, \ | ||
82 | - ctzl(sizeof(ETYPE)), env->vl, GETPC(), MMU_DATA_STORE); \ | ||
83 | + ctzl(sizeof(ETYPE)), env->vl, GETPC()); \ | ||
84 | } | ||
85 | |||
86 | GEN_VEXT_ST_US(vse8_v, int8_t, ste_b) | ||
87 | @@ -XXX,XX +XXX,XX @@ void HELPER(vlm_v)(void *vd, void *v0, target_ulong base, | ||
88 | /* evl = ceil(vl/8) */ | ||
89 | uint8_t evl = (env->vl + 7) >> 3; | ||
90 | vext_ldst_us(vd, base, env, desc, lde_b, | ||
91 | - 0, evl, GETPC(), MMU_DATA_LOAD); | ||
92 | + 0, evl, GETPC()); | ||
93 | } | ||
94 | |||
95 | void HELPER(vsm_v)(void *vd, void *v0, target_ulong base, | ||
96 | @@ -XXX,XX +XXX,XX @@ void HELPER(vsm_v)(void *vd, void *v0, target_ulong base, | ||
97 | /* evl = ceil(vl/8) */ | ||
98 | uint8_t evl = (env->vl + 7) >> 3; | ||
99 | vext_ldst_us(vd, base, env, desc, ste_b, | ||
100 | - 0, evl, GETPC(), MMU_DATA_STORE); | ||
101 | + 0, evl, GETPC()); | ||
102 | } | ||
103 | |||
104 | /* | ||
105 | @@ -XXX,XX +XXX,XX @@ vext_ldst_index(void *vd, void *v0, target_ulong base, | ||
106 | void *vs2, CPURISCVState *env, uint32_t desc, | ||
107 | vext_get_index_addr get_index_addr, | ||
108 | vext_ldst_elem_fn *ldst_elem, | ||
109 | - uint32_t esz, uintptr_t ra, MMUAccessType access_type) | ||
110 | + uint32_t esz, uintptr_t ra) | ||
111 | { | ||
112 | uint32_t i, k; | ||
113 | uint32_t nf = vext_nf(desc); | ||
114 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong base, \ | ||
115 | void *vs2, CPURISCVState *env, uint32_t desc) \ | ||
116 | { \ | ||
117 | vext_ldst_index(vd, v0, base, vs2, env, desc, INDEX_FN, \ | ||
118 | - LOAD_FN, ctzl(sizeof(ETYPE)), GETPC(), MMU_DATA_LOAD); \ | ||
119 | + LOAD_FN, ctzl(sizeof(ETYPE)), GETPC()); \ | ||
120 | } | ||
121 | |||
122 | GEN_VEXT_LD_INDEX(vlxei8_8_v, int8_t, idx_b, lde_b) | ||
123 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong base, \ | ||
124 | { \ | ||
125 | vext_ldst_index(vd, v0, base, vs2, env, desc, INDEX_FN, \ | ||
126 | STORE_FN, ctzl(sizeof(ETYPE)), \ | ||
127 | - GETPC(), MMU_DATA_STORE); \ | ||
128 | + GETPC()); \ | ||
129 | } | ||
130 | |||
131 | GEN_VEXT_ST_INDEX(vsxei8_8_v, int8_t, idx_b, ste_b) | ||
132 | @@ -XXX,XX +XXX,XX @@ GEN_VEXT_LDFF(vle64ff_v, int64_t, lde_d) | ||
133 | */ | ||
134 | static void | ||
135 | vext_ldst_whole(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, | ||
136 | - vext_ldst_elem_fn *ldst_elem, uint32_t esz, uintptr_t ra, | ||
137 | - MMUAccessType access_type) | ||
138 | + vext_ldst_elem_fn *ldst_elem, uint32_t esz, uintptr_t ra) | ||
139 | { | ||
140 | uint32_t i, k, off, pos; | ||
141 | uint32_t nf = vext_nf(desc); | ||
142 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, target_ulong base, \ | ||
143 | CPURISCVState *env, uint32_t desc) \ | ||
144 | { \ | ||
145 | vext_ldst_whole(vd, base, env, desc, LOAD_FN, \ | ||
146 | - ctzl(sizeof(ETYPE)), GETPC(), \ | ||
147 | - MMU_DATA_LOAD); \ | ||
148 | + ctzl(sizeof(ETYPE)), GETPC()); \ | ||
149 | } | ||
150 | |||
151 | GEN_VEXT_LD_WHOLE(vl1re8_v, int8_t, lde_b) | ||
152 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, target_ulong base, \ | ||
153 | CPURISCVState *env, uint32_t desc) \ | ||
154 | { \ | ||
155 | vext_ldst_whole(vd, base, env, desc, STORE_FN, \ | ||
156 | - ctzl(sizeof(ETYPE)), GETPC(), \ | ||
157 | - MMU_DATA_STORE); \ | ||
158 | + ctzl(sizeof(ETYPE)), GETPC()); \ | ||
159 | } | ||
160 | |||
161 | GEN_VEXT_ST_WHOLE(vs1r_v, int8_t, ste_b) | ||
162 | -- | 55 | -- |
163 | 2.36.1 | 56 | 2.41.0 |
57 | |||
58 | diff view generated by jsdifflib |
1 | From: Frédéric Pétrot <frederic.petrot@univ-grenoble-alpes.fr> | 1 | From: Akihiko Odaki <akihiko.odaki@daynix.com> |
---|---|---|---|
2 | 2 | ||
3 | Add an MXL_RV128 case in two switches so that no error is triggered when | 3 | riscv_trigger_init() had been called on reset events that can happen |
4 | using the -cpu x-rv128 option. | 4 | several times for a CPU and it allocated timers for itrigger. If old |
5 | timers were present, they were simply overwritten by the new timers, | ||
6 | resulting in a memory leak. | ||
5 | 7 | ||
6 | Signed-off-by: Frédéric Pétrot <frederic.petrot@univ-grenoble-alpes.fr> | 8 | Divide riscv_trigger_init() into two functions, namely |
7 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | 9 | riscv_trigger_realize() and riscv_trigger_reset() and call them in |
8 | Reviewed-by: Bin Meng <bmeng.cn@gmail.com> | 10 | appropriate timing. The timer allocation will happen only once for a |
9 | Message-Id: <20220602155246.38837-1-frederic.petrot@univ-grenoble-alpes.fr> | 11 | CPU in riscv_trigger_realize(). |
12 | |||
13 | Fixes: 5a4ae64cac ("target/riscv: Add itrigger support when icount is enabled") | ||
14 | Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> | ||
15 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
16 | Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> | ||
17 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | ||
18 | Message-ID: <20230818034059.9146-1-akihiko.odaki@daynix.com> | ||
10 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 19 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
11 | --- | 20 | --- |
12 | target/riscv/debug.c | 2 ++ | 21 | target/riscv/debug.h | 3 ++- |
13 | 1 file changed, 2 insertions(+) | 22 | target/riscv/cpu.c | 8 +++++++- |
23 | target/riscv/debug.c | 15 ++++++++++++--- | ||
24 | 3 files changed, 21 insertions(+), 5 deletions(-) | ||
14 | 25 | ||
26 | diff --git a/target/riscv/debug.h b/target/riscv/debug.h | ||
27 | index XXXXXXX..XXXXXXX 100644 | ||
28 | --- a/target/riscv/debug.h | ||
29 | +++ b/target/riscv/debug.h | ||
30 | @@ -XXX,XX +XXX,XX @@ void riscv_cpu_debug_excp_handler(CPUState *cs); | ||
31 | bool riscv_cpu_debug_check_breakpoint(CPUState *cs); | ||
32 | bool riscv_cpu_debug_check_watchpoint(CPUState *cs, CPUWatchpoint *wp); | ||
33 | |||
34 | -void riscv_trigger_init(CPURISCVState *env); | ||
35 | +void riscv_trigger_realize(CPURISCVState *env); | ||
36 | +void riscv_trigger_reset_hold(CPURISCVState *env); | ||
37 | |||
38 | bool riscv_itrigger_enabled(CPURISCVState *env); | ||
39 | void riscv_itrigger_update_priv(CPURISCVState *env); | ||
40 | diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c | ||
41 | index XXXXXXX..XXXXXXX 100644 | ||
42 | --- a/target/riscv/cpu.c | ||
43 | +++ b/target/riscv/cpu.c | ||
44 | @@ -XXX,XX +XXX,XX @@ static void riscv_cpu_reset_hold(Object *obj) | ||
45 | |||
46 | #ifndef CONFIG_USER_ONLY | ||
47 | if (cpu->cfg.debug) { | ||
48 | - riscv_trigger_init(env); | ||
49 | + riscv_trigger_reset_hold(env); | ||
50 | } | ||
51 | |||
52 | if (kvm_enabled()) { | ||
53 | @@ -XXX,XX +XXX,XX @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp) | ||
54 | |||
55 | riscv_cpu_register_gdb_regs_for_features(cs); | ||
56 | |||
57 | +#ifndef CONFIG_USER_ONLY | ||
58 | + if (cpu->cfg.debug) { | ||
59 | + riscv_trigger_realize(&cpu->env); | ||
60 | + } | ||
61 | +#endif | ||
62 | + | ||
63 | qemu_init_vcpu(cs); | ||
64 | cpu_reset(cs); | ||
65 | |||
15 | diff --git a/target/riscv/debug.c b/target/riscv/debug.c | 66 | diff --git a/target/riscv/debug.c b/target/riscv/debug.c |
16 | index XXXXXXX..XXXXXXX 100644 | 67 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/target/riscv/debug.c | 68 | --- a/target/riscv/debug.c |
18 | +++ b/target/riscv/debug.c | 69 | +++ b/target/riscv/debug.c |
19 | @@ -XXX,XX +XXX,XX @@ static inline target_ulong trigger_type(CPURISCVState *env, | 70 | @@ -XXX,XX +XXX,XX @@ bool riscv_cpu_debug_check_watchpoint(CPUState *cs, CPUWatchpoint *wp) |
20 | tdata1 = RV32_TYPE(type); | 71 | return false; |
21 | break; | 72 | } |
22 | case MXL_RV64: | 73 | |
23 | + case MXL_RV128: | 74 | -void riscv_trigger_init(CPURISCVState *env) |
24 | tdata1 = RV64_TYPE(type); | 75 | +void riscv_trigger_realize(CPURISCVState *env) |
25 | break; | 76 | +{ |
26 | default: | 77 | + int i; |
27 | @@ -XXX,XX +XXX,XX @@ static target_ulong tdata1_validate(CPURISCVState *env, target_ulong val, | 78 | + |
28 | tdata1 = RV32_TYPE(t); | 79 | + for (i = 0; i < RV_MAX_TRIGGERS; i++) { |
29 | break; | 80 | + env->itrigger_timer[i] = timer_new_ns(QEMU_CLOCK_VIRTUAL, |
30 | case MXL_RV64: | 81 | + riscv_itrigger_timer_cb, env); |
31 | + case MXL_RV128: | 82 | + } |
32 | type = extract64(val, 60, 4); | 83 | +} |
33 | dmode = extract64(val, 59, 1); | 84 | + |
34 | tdata1 = RV64_TYPE(t); | 85 | +void riscv_trigger_reset_hold(CPURISCVState *env) |
86 | { | ||
87 | target_ulong tdata1 = build_tdata1(env, TRIGGER_TYPE_AD_MATCH, 0, 0); | ||
88 | int i; | ||
89 | @@ -XXX,XX +XXX,XX @@ void riscv_trigger_init(CPURISCVState *env) | ||
90 | env->tdata3[i] = 0; | ||
91 | env->cpu_breakpoint[i] = NULL; | ||
92 | env->cpu_watchpoint[i] = NULL; | ||
93 | - env->itrigger_timer[i] = timer_new_ns(QEMU_CLOCK_VIRTUAL, | ||
94 | - riscv_itrigger_timer_cb, env); | ||
95 | + timer_del(env->itrigger_timer[i]); | ||
96 | } | ||
97 | } | ||
35 | -- | 98 | -- |
36 | 2.36.1 | 99 | 2.41.0 |
100 | |||
101 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Leon Schuermann <leons@opentitan.org> | ||
1 | 2 | ||
3 | When the rule-lock bypass (RLB) bit is set in the mseccfg CSR, the PMP | ||
4 | configuration lock bits must not apply. While this behavior is | ||
5 | implemented for the pmpcfgX CSRs, this bit is not respected for | ||
6 | changes to the pmpaddrX CSRs. This patch ensures that pmpaddrX CSR | ||
7 | writes work even on locked regions when the global rule-lock bypass is | ||
8 | enabled. | ||
9 | |||
10 | Signed-off-by: Leon Schuermann <leons@opentitan.org> | ||
11 | Reviewed-by: Mayuresh Chitale <mchitale@ventanamicro.com> | ||
12 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | ||
13 | Message-ID: <20230829215046.1430463-1-leon@is.currently.online> | ||
14 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
15 | --- | ||
16 | target/riscv/pmp.c | 4 ++++ | ||
17 | 1 file changed, 4 insertions(+) | ||
18 | |||
19 | diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c | ||
20 | index XXXXXXX..XXXXXXX 100644 | ||
21 | --- a/target/riscv/pmp.c | ||
22 | +++ b/target/riscv/pmp.c | ||
23 | @@ -XXX,XX +XXX,XX @@ static inline uint8_t pmp_get_a_field(uint8_t cfg) | ||
24 | */ | ||
25 | static inline int pmp_is_locked(CPURISCVState *env, uint32_t pmp_index) | ||
26 | { | ||
27 | + /* mseccfg.RLB is set */ | ||
28 | + if (MSECCFG_RLB_ISSET(env)) { | ||
29 | + return 0; | ||
30 | + } | ||
31 | |||
32 | if (env->pmp_state.pmp[pmp_index].cfg_reg & PMP_LOCK) { | ||
33 | return 1; | ||
34 | -- | ||
35 | 2.41.0 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | From: Tommy Wu <tommy.wu@sifive.com> | ||
1 | 2 | ||
3 | According to the new spec, when vsiselect has a reserved value, attempts | ||
4 | from M-mode or HS-mode to access vsireg, or from VS-mode to access | ||
5 | sireg, should preferably raise an illegal instruction exception. | ||
6 | |||
7 | Signed-off-by: Tommy Wu <tommy.wu@sifive.com> | ||
8 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | ||
9 | Message-ID: <20230816061647.600672-1-tommy.wu@sifive.com> | ||
10 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | ||
11 | --- | ||
12 | target/riscv/csr.c | 7 +++++-- | ||
13 | 1 file changed, 5 insertions(+), 2 deletions(-) | ||
14 | |||
15 | diff --git a/target/riscv/csr.c b/target/riscv/csr.c | ||
16 | index XXXXXXX..XXXXXXX 100644 | ||
17 | --- a/target/riscv/csr.c | ||
18 | +++ b/target/riscv/csr.c | ||
19 | @@ -XXX,XX +XXX,XX @@ static int rmw_iprio(target_ulong xlen, | ||
20 | static int rmw_xireg(CPURISCVState *env, int csrno, target_ulong *val, | ||
21 | target_ulong new_val, target_ulong wr_mask) | ||
22 | { | ||
23 | - bool virt; | ||
24 | + bool virt, isel_reserved; | ||
25 | uint8_t *iprio; | ||
26 | int ret = -EINVAL; | ||
27 | target_ulong priv, isel, vgein; | ||
28 | @@ -XXX,XX +XXX,XX @@ static int rmw_xireg(CPURISCVState *env, int csrno, target_ulong *val, | ||
29 | |||
30 | /* Decode register details from CSR number */ | ||
31 | virt = false; | ||
32 | + isel_reserved = false; | ||
33 | switch (csrno) { | ||
34 | case CSR_MIREG: | ||
35 | iprio = env->miprio; | ||
36 | @@ -XXX,XX +XXX,XX @@ static int rmw_xireg(CPURISCVState *env, int csrno, target_ulong *val, | ||
37 | riscv_cpu_mxl_bits(env)), | ||
38 | val, new_val, wr_mask); | ||
39 | } | ||
40 | + } else { | ||
41 | + isel_reserved = true; | ||
42 | } | ||
43 | |||
44 | done: | ||
45 | if (ret) { | ||
46 | - return (env->virt_enabled && virt) ? | ||
47 | + return (env->virt_enabled && virt && !isel_reserved) ? | ||
48 | RISCV_EXCP_VIRT_INSTRUCTION_FAULT : RISCV_EXCP_ILLEGAL_INST; | ||
49 | } | ||
50 | return RISCV_EXCP_NONE; | ||
51 | -- | ||
52 | 2.41.0 | diff view generated by jsdifflib |
1 | From: eopXD <yueh.ting.chen@gmail.com> | 1 | From: Nikita Shubin <n.shubin@yadro.com> |
---|---|---|---|
2 | 2 | ||
3 | Signed-off-by: eop Chen <eop.chen@sifive.com> | 3 | As per ISA: |
4 | Reviewed-by: Frank Chang <frank.chang@sifive.com> | 4 | |
5 | Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn> | 5 | "For CSRRWI, if rd=x0, then the instruction shall not read the CSR and |
6 | Acked-by: Alistair Francis <alistair.francis@wdc.com> | 6 | shall not cause any of the side effects that might occur on a CSR read." |
7 | Message-Id: <165449614532.19704.7000832880482980398-15@git.sr.ht> | 7 | |
8 | trans_csrrwi() and trans_csrrw() call do_csrw() if rd=x0, do_csrw() calls | ||
9 | riscv_csrrw_do64(), via helper_csrw() passing NULL as *ret_value. | ||
10 | |||
11 | Signed-off-by: Nikita Shubin <n.shubin@yadro.com> | ||
12 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | ||
13 | Message-ID: <20230808090914.17634-1-nikita.shubin@maquefel.me> | ||
8 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> | 14 | Signed-off-by: Alistair Francis <alistair.francis@wdc.com> |
9 | --- | 15 | --- |
10 | target/riscv/vector_helper.c | 40 +++++++++++++++++++++++++ | 16 | target/riscv/csr.c | 24 +++++++++++++++--------- |
11 | target/riscv/insn_trans/trans_rvv.c.inc | 7 +++-- | 17 | 1 file changed, 15 insertions(+), 9 deletions(-) |
12 | 2 files changed, 45 insertions(+), 2 deletions(-) | ||
13 | 18 | ||
14 | diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c | 19 | diff --git a/target/riscv/csr.c b/target/riscv/csr.c |
15 | index XXXXXXX..XXXXXXX 100644 | 20 | index XXXXXXX..XXXXXXX 100644 |
16 | --- a/target/riscv/vector_helper.c | 21 | --- a/target/riscv/csr.c |
17 | +++ b/target/riscv/vector_helper.c | 22 | +++ b/target/riscv/csr.c |
18 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \ | 23 | @@ -XXX,XX +XXX,XX @@ static RISCVException riscv_csrrw_do64(CPURISCVState *env, int csrno, |
19 | { \ | 24 | target_ulong write_mask) |
20 | uint32_t vm = vext_vm(desc); \ | 25 | { |
21 | uint32_t vl = env->vl; \ | 26 | RISCVException ret; |
22 | + uint32_t esz = sizeof(ETYPE); \ | 27 | - target_ulong old_value; |
23 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | 28 | + target_ulong old_value = 0; |
24 | + uint32_t vta = vext_vta(desc); \ | 29 | |
25 | target_ulong offset = s1, i_min, i; \ | 30 | /* execute combined read/write operation if it exists */ |
26 | \ | 31 | if (csr_ops[csrno].op) { |
27 | i_min = MAX(env->vstart, offset); \ | 32 | return csr_ops[csrno].op(env, csrno, ret_value, new_value, write_mask); |
28 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \ | ||
29 | } \ | ||
30 | *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset)); \ | ||
31 | } \ | ||
32 | + /* set tail elements to 1s */ \ | ||
33 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
34 | } | ||
35 | |||
36 | /* vslideup.vx vd, vs2, rs1, vm # vd[i+rs1] = vs2[i] */ | ||
37 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \ | ||
38 | uint32_t vlmax = vext_max_elems(desc, ctzl(sizeof(ETYPE))); \ | ||
39 | uint32_t vm = vext_vm(desc); \ | ||
40 | uint32_t vl = env->vl; \ | ||
41 | + uint32_t esz = sizeof(ETYPE); \ | ||
42 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
43 | + uint32_t vta = vext_vta(desc); \ | ||
44 | target_ulong i_max, i; \ | ||
45 | \ | ||
46 | i_max = MAX(MIN(s1 < vlmax ? vlmax - s1 : 0, vl), env->vstart); \ | ||
47 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \ | ||
48 | } \ | ||
49 | \ | ||
50 | env->vstart = 0; \ | ||
51 | + /* set tail elements to 1s */ \ | ||
52 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
53 | } | ||
54 | |||
55 | /* vslidedown.vx vd, vs2, rs1, vm # vd[i] = vs2[i+rs1] */ | ||
56 | @@ -XXX,XX +XXX,XX @@ static void vslide1up_##BITWIDTH(void *vd, void *v0, target_ulong s1, \ | ||
57 | typedef uint##BITWIDTH##_t ETYPE; \ | ||
58 | uint32_t vm = vext_vm(desc); \ | ||
59 | uint32_t vl = env->vl; \ | ||
60 | + uint32_t esz = sizeof(ETYPE); \ | ||
61 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
62 | + uint32_t vta = vext_vta(desc); \ | ||
63 | uint32_t i; \ | ||
64 | \ | ||
65 | for (i = env->vstart; i < vl; i++) { \ | ||
66 | @@ -XXX,XX +XXX,XX @@ static void vslide1up_##BITWIDTH(void *vd, void *v0, target_ulong s1, \ | ||
67 | } \ | ||
68 | } \ | ||
69 | env->vstart = 0; \ | ||
70 | + /* set tail elements to 1s */ \ | ||
71 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
72 | } | ||
73 | |||
74 | GEN_VEXT_VSLIE1UP(8, H1) | ||
75 | @@ -XXX,XX +XXX,XX @@ static void vslide1down_##BITWIDTH(void *vd, void *v0, target_ulong s1, \ | ||
76 | typedef uint##BITWIDTH##_t ETYPE; \ | ||
77 | uint32_t vm = vext_vm(desc); \ | ||
78 | uint32_t vl = env->vl; \ | ||
79 | + uint32_t esz = sizeof(ETYPE); \ | ||
80 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
81 | + uint32_t vta = vext_vta(desc); \ | ||
82 | uint32_t i; \ | ||
83 | \ | ||
84 | for (i = env->vstart; i < vl; i++) { \ | ||
85 | @@ -XXX,XX +XXX,XX @@ static void vslide1down_##BITWIDTH(void *vd, void *v0, target_ulong s1, \ | ||
86 | } \ | ||
87 | } \ | ||
88 | env->vstart = 0; \ | ||
89 | + /* set tail elements to 1s */ \ | ||
90 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
91 | } | ||
92 | |||
93 | GEN_VEXT_VSLIDE1DOWN(8, H1) | ||
94 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | ||
95 | uint32_t vlmax = vext_max_elems(desc, ctzl(sizeof(TS2))); \ | ||
96 | uint32_t vm = vext_vm(desc); \ | ||
97 | uint32_t vl = env->vl; \ | ||
98 | + uint32_t esz = sizeof(TS2); \ | ||
99 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
100 | + uint32_t vta = vext_vta(desc); \ | ||
101 | uint64_t index; \ | ||
102 | uint32_t i; \ | ||
103 | \ | ||
104 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | ||
105 | } \ | ||
106 | } \ | ||
107 | env->vstart = 0; \ | ||
108 | + /* set tail elements to 1s */ \ | ||
109 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
110 | } | ||
111 | |||
112 | /* vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]; */ | ||
113 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \ | ||
114 | uint32_t vlmax = vext_max_elems(desc, ctzl(sizeof(ETYPE))); \ | ||
115 | uint32_t vm = vext_vm(desc); \ | ||
116 | uint32_t vl = env->vl; \ | ||
117 | + uint32_t esz = sizeof(ETYPE); \ | ||
118 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
119 | + uint32_t vta = vext_vta(desc); \ | ||
120 | uint64_t index = s1; \ | ||
121 | uint32_t i; \ | ||
122 | \ | ||
123 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \ | ||
124 | } \ | ||
125 | } \ | ||
126 | env->vstart = 0; \ | ||
127 | + /* set tail elements to 1s */ \ | ||
128 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
129 | } | ||
130 | |||
131 | /* vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[rs1] */ | ||
132 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | ||
133 | CPURISCVState *env, uint32_t desc) \ | ||
134 | { \ | ||
135 | uint32_t vl = env->vl; \ | ||
136 | + uint32_t esz = sizeof(ETYPE); \ | ||
137 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
138 | + uint32_t vta = vext_vta(desc); \ | ||
139 | uint32_t num = 0, i; \ | ||
140 | \ | ||
141 | for (i = env->vstart; i < vl; i++) { \ | ||
142 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, \ | ||
143 | num++; \ | ||
144 | } \ | ||
145 | env->vstart = 0; \ | ||
146 | + /* set tail elements to 1s */ \ | ||
147 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
148 | } | ||
149 | |||
150 | /* Compress into vd elements of vs2 where vs1 is enabled */ | ||
151 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, \ | ||
152 | { \ | ||
153 | uint32_t vl = env->vl; \ | ||
154 | uint32_t vm = vext_vm(desc); \ | ||
155 | + uint32_t esz = sizeof(ETYPE); \ | ||
156 | + uint32_t total_elems = vext_get_total_elems(env, desc, esz); \ | ||
157 | + uint32_t vta = vext_vta(desc); \ | ||
158 | uint32_t i; \ | ||
159 | \ | ||
160 | for (i = env->vstart; i < vl; i++) { \ | ||
161 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *v0, void *vs2, \ | ||
162 | *((ETYPE *)vd + HD(i)) = *((DTYPE *)vs2 + HS1(i)); \ | ||
163 | } \ | ||
164 | env->vstart = 0; \ | ||
165 | + /* set tail elements to 1s */ \ | ||
166 | + vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz); \ | ||
167 | } | ||
168 | |||
169 | GEN_VEXT_INT_EXT(vzext_vf2_h, uint16_t, uint8_t, H2, H1) | ||
170 | diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc | ||
171 | index XXXXXXX..XXXXXXX 100644 | ||
172 | --- a/target/riscv/insn_trans/trans_rvv.c.inc | ||
173 | +++ b/target/riscv/insn_trans/trans_rvv.c.inc | ||
174 | @@ -XXX,XX +XXX,XX @@ static bool trans_vrgather_vx(DisasContext *s, arg_rmrr *a) | ||
175 | return false; | ||
176 | } | 33 | } |
177 | 34 | ||
178 | - if (a->vm && s->vl_eq_vlmax) { | 35 | - /* if no accessor exists then return failure */ |
179 | + if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | 36 | - if (!csr_ops[csrno].read) { |
180 | int scale = s->lmul - (s->sew + 3); | 37 | - return RISCV_EXCP_ILLEGAL_INST; |
181 | int vlmax = s->cfg_ptr->vlen >> -scale; | 38 | - } |
182 | TCGv_i64 dest = tcg_temp_new_i64(); | 39 | - /* read old value */ |
183 | @@ -XXX,XX +XXX,XX @@ static bool trans_vrgather_vi(DisasContext *s, arg_rmrr *a) | 40 | - ret = csr_ops[csrno].read(env, csrno, &old_value); |
184 | return false; | 41 | - if (ret != RISCV_EXCP_NONE) { |
42 | - return ret; | ||
43 | + /* | ||
44 | + * ret_value == NULL means that rd=x0 and we're coming from helper_csrw() | ||
45 | + * and we can't throw side effects caused by CSR reads. | ||
46 | + */ | ||
47 | + if (ret_value) { | ||
48 | + /* if no accessor exists then return failure */ | ||
49 | + if (!csr_ops[csrno].read) { | ||
50 | + return RISCV_EXCP_ILLEGAL_INST; | ||
51 | + } | ||
52 | + /* read old value */ | ||
53 | + ret = csr_ops[csrno].read(env, csrno, &old_value); | ||
54 | + if (ret != RISCV_EXCP_NONE) { | ||
55 | + return ret; | ||
56 | + } | ||
185 | } | 57 | } |
186 | 58 | ||
187 | - if (a->vm && s->vl_eq_vlmax) { | 59 | /* write value if writable and write mask set, otherwise drop writes */ |
188 | + if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) { | ||
189 | int scale = s->lmul - (s->sew + 3); | ||
190 | int vlmax = s->cfg_ptr->vlen >> -scale; | ||
191 | if (a->rs1 >= vlmax) { | ||
192 | @@ -XXX,XX +XXX,XX @@ static bool trans_vcompress_vm(DisasContext *s, arg_r *a) | ||
193 | tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over); | ||
194 | |||
195 | data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | ||
196 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
197 | tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), | ||
198 | vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2), | ||
199 | cpu_env, s->cfg_ptr->vlen / 8, | ||
200 | @@ -XXX,XX +XXX,XX @@ static bool int_ext_op(DisasContext *s, arg_rmr *a, uint8_t seq) | ||
201 | } | ||
202 | |||
203 | data = FIELD_DP32(data, VDATA, VM, a->vm); | ||
204 | + data = FIELD_DP32(data, VDATA, LMUL, s->lmul); | ||
205 | + data = FIELD_DP32(data, VDATA, VTA, s->vta); | ||
206 | |||
207 | tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), | ||
208 | vreg_ofs(s, a->rs2), cpu_env, | ||
209 | -- | 60 | -- |
210 | 2.36.1 | 61 | 2.41.0 | diff view generated by jsdifflib |