From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: qemu-arm@nongnu.org, peter.maydell@linaro.org
Subject: [PATCH v4 093/108] target/arm: Implement SME2 counted predicate register load/store
Date: Fri, 4 Jul 2025 08:20:56 -0600
Message-ID: <20250704142112.1018902-94-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250704142112.1018902-1-richard.henderson@linaro.org>
References: <20250704142112.1018902-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Implement the SVE2p1 consecutive register LD1/ST1, and the SME2
strided register LD1/ST1.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    |  16 ++
 target/arm/tcg/sve_helper.c    | 493 +++++++++++++++++++++++++++++++++
 target/arm/tcg/translate-sve.c | 103 +++++++
 target/arm/tcg/sve.decode     |  50 ++++
 4 files changed, 662 insertions(+)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index 5f4b4aa036..c4736d7510 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -3048,3 +3048,19 @@ DEF_HELPER_FLAGS_3(pmov_pv_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(pmov_vp_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(pmov_vp_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(pmov_vp_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve2p1_ld1bb_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_ld1hh_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_ld1hh_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_ld1ss_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_ld1ss_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_ld1dd_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_ld1dd_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+
+DEF_HELPER_FLAGS_5(sve2p1_st1bb_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_st1hh_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_st1hh_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_st1ss_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_st1ss_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_st1dd_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_st1dd_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 42b05756a9..e6342990fa 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -7586,6 +7586,499 @@ DO_ST1_ZPZ_D(dd_be, zd, MO_64)
 #undef DO_ST1_ZPZ_S
 #undef DO_ST1_ZPZ_D
 
+/*
+ * SVE2.1 consecutive register load/store
+ */
+
+static unsigned sve2p1_cont_ldst_elements(SVEContLdSt *info, vaddr addr,
+                                          uint32_t png, intptr_t reg_max,
+                                          int N, int v_esz)
+{
+    const int esize = 1 << v_esz;
+    intptr_t reg_off_first = -1, reg_off_last = -1, reg_off_split;
+    DecodeCounter p = decode_counter(png, reg_max, v_esz);
+    unsigned b_count = p.count << v_esz;
+    unsigned b_stride = 1 << (v_esz + p.lg2_stride);
+    intptr_t page_split;
+
+    /* Set all of the element indices to -1, and the TLB data to 0. */
+    memset(info, -1, offsetof(SVEContLdSt, page));
+    memset(info->page, 0, sizeof(info->page));
+
+    if (p.invert) {
+        if (b_count >= reg_max * N) {
+            return 0;
+        }
+        reg_off_first = b_count;
+        reg_off_last = reg_max * N - b_stride;
+    } else {
+        if (b_count == 0) {
+            return 0;
+        }
+        reg_off_first = 0;
+        reg_off_last = MIN(b_count - esize, reg_max * N - b_stride);
+    }
+
+    info->reg_off_first[0] = reg_off_first;
+    info->mem_off_first[0] = reg_off_first;
+
+    page_split = -(addr | TARGET_PAGE_MASK);
+    if (reg_off_last + esize <= page_split || reg_off_first >= page_split) {
+        /* The entire operation fits within a single page. */
+        info->reg_off_last[0] = reg_off_last;
+        return b_stride;
+    }
+
+    info->page_split = page_split;
+    reg_off_split = ROUND_DOWN(page_split, esize);
+
+    /*
+     * This is the last full element on the first page, but it is not
+     * necessarily active.  If there is no full element, i.e. the first
+     * active element is the one that's split, this value remains -1.
+     * It is useful as iteration bounds.
+     */
+    if (reg_off_split != 0) {
+        info->reg_off_last[0] = ROUND_DOWN(reg_off_split - esize, b_stride);
+    }
+
+    /* Determine if an unaligned element spans the pages. */
+    if (page_split & (esize - 1)) {
+        /* It is helpful to know if the split element is active. */
+        if ((reg_off_split & (b_stride - 1)) == 0) {
+            info->reg_off_split = reg_off_split;
+            info->mem_off_split = reg_off_split;
+        }
+        reg_off_split += esize;
+    }
+
+    /*
+     * We do want the first active element on the second page, because
+     * this may affect the address reported in an exception.
+     */
+    reg_off_split = ROUND_UP(reg_off_split, b_stride);
+    if (reg_off_split <= reg_off_last) {
+        info->reg_off_first[1] = reg_off_split;
+        info->mem_off_first[1] = reg_off_split;
+        info->reg_off_last[1] = reg_off_last;
+    }
+    return b_stride;
+}
+
+static void sve2p1_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
+                                         target_ulong addr, unsigned estride,
+                                         int esize, int wp_access, uintptr_t ra)
+{
+#ifndef CONFIG_USER_ONLY
+    intptr_t count_off, count_last;
+    int flags0 = info->page[0].flags;
+    int flags1 = info->page[1].flags;
+
+    if (likely(!((flags0 | flags1) & TLB_WATCHPOINT))) {
+        return;
+    }
+
+    /* Indicate that watchpoints are handled. */
+    info->page[0].flags = flags0 & ~TLB_WATCHPOINT;
+    info->page[1].flags = flags1 & ~TLB_WATCHPOINT;
+
+    if (flags0 & TLB_WATCHPOINT) {
+        count_off = info->reg_off_first[0];
+        count_last = info->reg_off_split;
+        if (count_last < 0) {
+            count_last = info->reg_off_last[0];
+        }
+        do {
+            cpu_check_watchpoint(env_cpu(env), addr + count_off,
+                                 esize, info->page[0].attrs, wp_access, ra);
+            count_off += estride;
+        } while (count_off <= count_last);
+    }
+
+    count_off = info->reg_off_first[1];
+    if ((flags1 & TLB_WATCHPOINT) && count_off >= 0) {
+        count_last = info->reg_off_last[1];
+        do {
+            cpu_check_watchpoint(env_cpu(env), addr + count_off,
+                                 esize, info->page[1].attrs,
+                                 wp_access, ra);
+            count_off += estride;
+        } while (count_off <= count_last);
+    }
+#endif
+}
+
+static void sve2p1_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env,
+                                       target_ulong addr, unsigned estride,
+                                       int esize, uint32_t mtedesc,
+                                       uintptr_t ra)
+{
+    intptr_t count_off, count_last;
+
+    /*
+     * TODO: estride is always a small power of two, <= 8.
+     * Manipulate the stride within the loops such that
+     *   - first iteration hits addr + off, as required,
+     *   - second iteration hits ALIGN_UP(addr, 16),
+     *   - other iterations advance addr by 16.
+     * This will minimize the probing to once per MTE granule.
+     */
+
+    /* Process the page only if MemAttr == Tagged. */
+    if (info->page[0].tagged) {
+        count_off = info->reg_off_first[0];
+        count_last = info->reg_off_split;
+        if (count_last < 0) {
+            count_last = info->reg_off_last[0];
+        }
+
+        do {
+            mte_check(env, mtedesc, addr + count_off, ra);
+            count_off += estride;
+        } while (count_off <= count_last);
+    }
+
+    count_off = info->reg_off_first[1];
+    if (count_off >= 0 && info->page[1].tagged) {
+        count_last = info->reg_off_last[1];
+        do {
+            mte_check(env, mtedesc, addr + count_off, ra);
+            count_off += estride;
+        } while (count_off <= count_last);
+    }
+}
+
+static inline QEMU_ALWAYS_INLINE
+void sve2p1_ld1_c(CPUARMState *env, ARMVectorReg *zd, const vaddr addr,
+                  uint32_t png, uint32_t desc,
+                  const uintptr_t ra, const MemOp esz,
+                  sve_ldst1_host_fn *host_fn,
+                  sve_ldst1_tlb_fn *tlb_fn)
+{
+    const unsigned N = (desc >> SIMD_DATA_SHIFT) & 1 ? 4 : 2;
+    const unsigned rstride = 1 << ((desc >> (SIMD_DATA_SHIFT + 1)) % 4);
+    uint32_t mtedesc = desc >> (SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
+    const intptr_t reg_max = simd_oprsz(desc);
+    const unsigned esize = 1 << esz;
+    intptr_t count_off, count_last;
+    intptr_t reg_off, reg_last, reg_n;
+    SVEContLdSt info;
+    unsigned estride, flags;
+    void *host;
+
+    estride = sve2p1_cont_ldst_elements(&info, addr, png, reg_max, N, esz);
+    if (estride == 0) {
+        /* The entire predicate was false; no load occurs. */
+        for (unsigned n = 0; n < N; n++) {
+            memset(zd + n * rstride, 0, reg_max);
+        }
+        return;
+    }
+
+    /* Probe the page(s).  Exit with exception for any invalid page. */
+    sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_LOAD, ra);
+
+    /* Handle watchpoints for all active elements. */
+    sve2p1_cont_ldst_watchpoints(&info, env, addr, estride,
+                                 esize, BP_MEM_READ, ra);
+
+    /*
+     * Handle mte checks for all active elements.
+     * Since TBI must be set for MTE, !mtedesc => !mte_active.
+     */
+    if (mtedesc) {
+        sve2p1_cont_ldst_mte_check(&info, env, addr, estride,
+                                   esize, mtedesc, ra);
+    }
+
+    flags = info.page[0].flags | info.page[1].flags;
+    if (unlikely(flags != 0)) {
+        /*
+         * At least one page includes MMIO.
+         * Any bus operation can fail with cpu_transaction_failed,
+         * which for ARM will raise SyncExternal.  Perform the load
+         * into scratch memory to preserve register state until the end.
+         */
+        ARMVectorReg scratch[4] = { };
+
+        count_off = info.reg_off_first[0];
+        count_last = info.reg_off_last[1];
+        if (count_last < 0) {
+            count_last = info.reg_off_split;
+            if (count_last < 0) {
+                count_last = info.reg_off_last[0];
+            }
+        }
+        reg_off = count_off % reg_max;
+        reg_n = count_off / reg_max;
+
+        do {
+            reg_last = MIN(count_last - count_off, reg_max - esize);
+            do {
+                tlb_fn(env, &scratch[reg_n], reg_off, addr + count_off, ra);
+                reg_off += estride;
+                count_off += estride;
+            } while (reg_off <= reg_last);
+            reg_off = 0;
+            reg_n++;
+        } while (count_off <= count_last);
+
+        for (unsigned n = 0; n < N; ++n) {
+            memcpy(&zd[n * rstride], &scratch[n], reg_max);
+        }
+        return;
+    }
+
+    /* The entire operation is in RAM, on valid pages. */
+
+    for (unsigned n = 0; n < N; ++n) {
+        memset(&zd[n * rstride], 0, reg_max);
+    }
+
+    count_off = info.reg_off_first[0];
+    count_last = info.reg_off_last[0];
+    reg_off = count_off % reg_max;
+    reg_n = count_off / reg_max;
+    host = info.page[0].host;
+
+    set_helper_retaddr(ra);
+
+    do {
+        reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize);
+        do {
+            host_fn(&zd[reg_n * rstride], reg_off, host + count_off);
+            reg_off += estride;
+            count_off += estride;
+        } while (reg_off <= reg_last);
+        reg_off = 0;
+        reg_n++;
+    } while (count_off <= count_last);
+
+    clear_helper_retaddr();
+
+    /*
+     * Use the slow path to manage the cross-page misalignment.
+     * But we know this is RAM and cannot trap.
+     */
+    count_off = info.reg_off_split;
+    if (unlikely(count_off >= 0)) {
+        reg_off = count_off % reg_max;
+        reg_n = count_off / reg_max;
+        tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra);
+    }
+
+    count_off = info.reg_off_first[1];
+    if (unlikely(count_off >= 0)) {
+        count_last = info.reg_off_last[1];
+        reg_off = count_off % reg_max;
+        reg_n = count_off / reg_max;
+        host = info.page[1].host;
+
+        set_helper_retaddr(ra);
+
+        do {
+            reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize);
+            do {
+                host_fn(&zd[reg_n * rstride], reg_off, host + count_off);
+                reg_off += estride;
+                count_off += estride;
+            } while (reg_off <= reg_last);
+            reg_off = 0;
+            reg_n++;
+        } while (count_off <= count_last);
+
+        clear_helper_retaddr();
+    }
+}
+
+void HELPER(sve2p1_ld1bb_c)(CPUARMState *env, void *vd, target_ulong addr,
+                            uint32_t png, uint32_t desc)
+{
+    sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), MO_8,
+                 sve_ld1bb_host, sve_ld1bb_tlb);
+}
+
+#define DO_LD1_2(NAME, ESZ)                                             \
+void HELPER(sve2p1_##NAME##_le_c)(CPUARMState *env, void *vd,           \
+                                  target_ulong addr, uint32_t png,      \
+                                  uint32_t desc)                        \
+{                                                                       \
+    sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), ESZ,                \
+                 sve_##NAME##_le_host, sve_##NAME##_le_tlb);            \
+}                                                                       \
+void HELPER(sve2p1_##NAME##_be_c)(CPUARMState *env, void *vd,           \
+                                  target_ulong addr, uint32_t png,      \
+                                  uint32_t desc)                        \
+{                                                                       \
+    sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), ESZ,                \
+                 sve_##NAME##_be_host, sve_##NAME##_be_tlb);            \
+}
+
+DO_LD1_2(ld1hh, MO_16)
+DO_LD1_2(ld1ss, MO_32)
+DO_LD1_2(ld1dd, MO_64)
+
+#undef DO_LD1_2
+
+static inline QEMU_ALWAYS_INLINE
+void sve2p1_st1_c(CPUARMState *env, ARMVectorReg *zd, const vaddr addr,
+                  uint32_t png, uint32_t desc,
+                  const uintptr_t ra, const int esz,
+                  sve_ldst1_host_fn *host_fn,
+                  sve_ldst1_tlb_fn *tlb_fn)
+{
+    const unsigned N = (desc >> SIMD_DATA_SHIFT) & 1 ? 4 : 2;
+    const unsigned rstride = 1 << ((desc >> (SIMD_DATA_SHIFT + 1)) % 4);
+    uint32_t mtedesc = desc >> (SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
+    const intptr_t reg_max = simd_oprsz(desc);
+    const unsigned esize = 1 << esz;
+    intptr_t count_off, count_last;
+    intptr_t reg_off, reg_last, reg_n;
+    SVEContLdSt info;
+    unsigned estride, flags;
+    void *host;
+
+    estride = sve2p1_cont_ldst_elements(&info, addr, png, reg_max, N, esz);
+    if (estride == 0) {
+        /* The entire predicate was false; no store occurs. */
+        return;
+    }
+
+    /* Probe the page(s).  Exit with exception for any invalid page. */
+    sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_STORE, ra);
+
+    /* Handle watchpoints for all active elements. */
+    sve2p1_cont_ldst_watchpoints(&info, env, addr, estride,
+                                 esize, BP_MEM_WRITE, ra);
+
+    /*
+     * Handle mte checks for all active elements.
+     * Since TBI must be set for MTE, !mtedesc => !mte_active.
+     */
+    if (mtedesc) {
+        sve2p1_cont_ldst_mte_check(&info, env, addr, estride,
+                                   esize, mtedesc, ra);
+    }
+
+    flags = info.page[0].flags | info.page[1].flags;
+    if (unlikely(flags != 0)) {
+        /*
+         * At least one page includes MMIO.
+         * Any bus operation can fail with cpu_transaction_failed,
+         * which for ARM will raise SyncExternal; perform the entire
+         * store via the slow tlb path.
+         */
+        count_off = info.reg_off_first[0];
+        count_last = info.reg_off_last[1];
+        if (count_last < 0) {
+            count_last = info.reg_off_split;
+            if (count_last < 0) {
+                count_last = info.reg_off_last[0];
+            }
+        }
+        reg_off = count_off % reg_max;
+        reg_n = count_off / reg_max;
+
+        do {
+            reg_last = MIN(count_last - count_off, reg_max - esize);
+            do {
+                tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra);
+                reg_off += estride;
+                count_off += estride;
+            } while (reg_off <= reg_last);
+            reg_off = 0;
+            reg_n++;
+        } while (count_off <= count_last);
+        return;
+    }
+
+    /* The entire operation is in RAM, on valid pages. */
+
+    count_off = info.reg_off_first[0];
+    count_last = info.reg_off_last[0];
+    reg_off = count_off % reg_max;
+    reg_n = count_off / reg_max;
+    host = info.page[0].host;
+
+    set_helper_retaddr(ra);
+
+    do {
+        reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize);
+        do {
+            host_fn(&zd[reg_n * rstride], reg_off, host + count_off);
+            reg_off += estride;
+            count_off += estride;
+        } while (reg_off <= reg_last);
+        reg_off = 0;
+        reg_n++;
+    } while (count_off <= count_last);
+
+    clear_helper_retaddr();
+
+    /*
+     * Use the slow path to manage the cross-page misalignment.
+     * But we know this is RAM and cannot trap.
+     */
+    count_off = info.reg_off_split;
+    if (unlikely(count_off >= 0)) {
+        reg_off = count_off % reg_max;
+        reg_n = count_off / reg_max;
+        tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra);
+    }
+
+    count_off = info.reg_off_first[1];
+    if (unlikely(count_off >= 0)) {
+        count_last = info.reg_off_last[1];
+        reg_off = count_off % reg_max;
+        reg_n = count_off / reg_max;
+        host = info.page[1].host;
+
+        set_helper_retaddr(ra);
+
+        do {
+            reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize);
+            do {
+                host_fn(&zd[reg_n * rstride], reg_off, host + count_off);
+                reg_off += estride;
+                count_off += estride;
+            } while (reg_off <= reg_last);
+            reg_off = 0;
+            reg_n++;
+        } while (count_off <= count_last);
+
+        clear_helper_retaddr();
+    }
+}
+
+void HELPER(sve2p1_st1bb_c)(CPUARMState *env, void *vd, target_ulong addr,
+                            uint32_t png, uint32_t desc)
+{
+    sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), MO_8,
+                 sve_st1bb_host, sve_st1bb_tlb);
+}
+
+#define DO_ST1_2(NAME, ESZ)                                             \
+void HELPER(sve2p1_##NAME##_le_c)(CPUARMState *env, void *vd,           \
+                                  target_ulong addr, uint32_t png,      \
+                                  uint32_t desc)                        \
+{                                                                       \
+    sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), ESZ,                \
+                 sve_##NAME##_le_host, sve_##NAME##_le_tlb);            \
+}                                                                       \
+void HELPER(sve2p1_##NAME##_be_c)(CPUARMState *env, void *vd,           \
+                                  target_ulong addr, uint32_t png,      \
+                                  uint32_t desc)                        \
+{                                                                       \
+    sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), ESZ,                \
+                 sve_##NAME##_be_host, sve_##NAME##_be_tlb);            \
+}
+
+DO_ST1_2(st1hh, MO_16)
+DO_ST1_2(st1ss, MO_32)
+DO_ST1_2(st1dd, MO_64)
+
+#undef DO_ST1_2
+
 void HELPER(sve2_eor3)(void *vd, void *vn, void *vm, void *vk, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc) / 8;
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 02f885dd48..dfb53e4bf4 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -7863,3 +7863,106 @@ TRANS_FEAT(UQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz,
            gen_helper_sme2_uqcvtn_sh, a->rd, a->rn, 0)
 TRANS_FEAT(SQCVTUN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz,
            gen_helper_sme2_sqcvtun_sh, a->rd, a->rn, 0)
+
+static bool gen_ldst_c(DisasContext *s, TCGv_i64 addr, int zd, int png,
+                       MemOp esz, bool is_write, int n, bool strided)
+{
+    typedef void ldst_c_fn(TCGv_env, TCGv_ptr, TCGv_i64,
+                           TCGv_i32, TCGv_i32);
+    static ldst_c_fn * const f_ldst[2][2][4] = {
+        { { gen_helper_sve2p1_ld1bb_c,
+            gen_helper_sve2p1_ld1hh_le_c,
+            gen_helper_sve2p1_ld1ss_le_c,
+            gen_helper_sve2p1_ld1dd_le_c, },
+          { gen_helper_sve2p1_ld1bb_c,
+            gen_helper_sve2p1_ld1hh_be_c,
+            gen_helper_sve2p1_ld1ss_be_c,
+            gen_helper_sve2p1_ld1dd_be_c, } },
+
+        { { gen_helper_sve2p1_st1bb_c,
+            gen_helper_sve2p1_st1hh_le_c,
+            gen_helper_sve2p1_st1ss_le_c,
+            gen_helper_sve2p1_st1dd_le_c, },
+          { gen_helper_sve2p1_st1bb_c,
+            gen_helper_sve2p1_st1hh_be_c,
+            gen_helper_sve2p1_st1ss_be_c,
+            gen_helper_sve2p1_st1dd_be_c, } }
+    };
+
+    TCGv_i32 t_png, t_desc;
+    TCGv_ptr t_zd;
+    uint32_t desc, lg2_rstride = 0;
+    bool be = s->be_data == MO_BE;
+
+    assert(n == 2 || n == 4);
+    if (strided) {
+        lg2_rstride = 3;
+        if (n == 4) {
+            /* Validate ZD alignment. */
+            if (zd & 4) {
+                return false;
+            }
+            lg2_rstride = 2;
+        }
+        /* Ignore non-temporal bit */
+        zd &= ~8;
+    }
+
+    if (strided || !dc_isar_feature(aa64_sve2p1, s)
+        ? !sme_sm_enabled_check(s)
+        : !sve_access_check(s)) {
+        return true;
+    }
+
+    if (!s->mte_active[0]) {
+        addr = clean_data_tbi(s, addr);
+    }
+
+    desc = n == 2 ? 0 : 1;
+    desc = desc | (lg2_rstride << 1);
+    desc = make_svemte_desc(s, vec_full_reg_size(s), 1, esz, is_write, desc);
+    t_desc = tcg_constant_i32(desc);
+
+    t_png = tcg_temp_new_i32();
+    tcg_gen_ld16u_i32(t_png, tcg_env,
+                      pred_full_reg_offset(s, png) ^
+                      (HOST_BIG_ENDIAN ? 6 : 0));
+
+    t_zd = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(t_zd, tcg_env, vec_full_reg_offset(s, zd));
+
+    f_ldst[is_write][be][esz](tcg_env, t_zd, addr, t_png, t_desc);
+    return true;
+}
+
+static bool gen_ldst_zcrr_c(DisasContext *s, arg_zcrr_ldst *a,
+                            bool is_write, bool strided)
+{
+    TCGv_i64 addr = tcg_temp_new_i64();
+
+    tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz);
+    tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
+    return gen_ldst_c(s, addr, a->rd, a->png, a->esz, is_write,
+                      a->nreg, strided);
+}
+
+static bool gen_ldst_zcri_c(DisasContext *s, arg_zcri_ldst *a,
+                            bool is_write, bool strided)
+{
+    TCGv_i64 addr = tcg_temp_new_i64();
+
+    tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn),
+                     a->imm * a->nreg * vec_full_reg_size(s));
+    return gen_ldst_c(s, addr, a->rd, a->png, a->esz, is_write,
+                      a->nreg, strided);
+}
+
+TRANS_FEAT(LD1_zcrr, aa64_sme2_or_sve2p1, gen_ldst_zcrr_c, a, false, false)
+TRANS_FEAT(LD1_zcri, aa64_sme2_or_sve2p1, gen_ldst_zcri_c, a, false, false)
+TRANS_FEAT(ST1_zcrr, aa64_sme2_or_sve2p1, gen_ldst_zcrr_c, a, true, false)
+TRANS_FEAT(ST1_zcri, aa64_sme2_or_sve2p1, gen_ldst_zcri_c, a, true, false)
+
+TRANS_FEAT(LD1_zcrr_stride, aa64_sme2, gen_ldst_zcrr_c, a, false, true)
+TRANS_FEAT(LD1_zcri_stride, aa64_sme2, gen_ldst_zcri_c, a, false, true)
+TRANS_FEAT(ST1_zcrr_stride, aa64_sme2, gen_ldst_zcrr_c, a, true, true)
+TRANS_FEAT(ST1_zcri_stride, aa64_sme2, gen_ldst_zcri_c, a, true, true)
diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode
index 52a56d3341..bf3d4f4853 100644
--- a/target/arm/tcg/sve.decode
+++ b/target/arm/tcg/sve.decode
@@ -1812,3 +1812,53 @@ SCLAMP          01000100 .. 0 ..... 110000 ..... ..... @rda_rn_rm
 UCLAMP          01000100 .. 0 ..... 110001 ..... ..... @rda_rn_rm
 
 FCLAMP          01100100 .. 1 ..... 001001 ..... ..... @rda_rn_rm
+
+### SVE2p1 multi-vec contiguous load
+
+&zcrr_ldst      rd png rn rm esz nreg
+&zcri_ldst      rd png rn imm esz nreg
+%png            10:3 !function=plus_8
+%zd_ax2         1:4 !function=times_2
+%zd_ax4         2:3 !function=times_4
+
+LD1_zcrr        10100000000 rm:5 0 esz:2 ... rn:5 .... -  \
+                &zcrr_ldst %png rd=%zd_ax2 nreg=2
+LD1_zcrr        10100000000 rm:5 1 esz:2 ... rn:5 ... 0-  \
+                &zcrr_ldst %png rd=%zd_ax4 nreg=4
+
+ST1_zcrr        10100000001 rm:5 0 esz:2 ... rn:5 .... -  \
+                &zcrr_ldst %png rd=%zd_ax2 nreg=2
+ST1_zcrr        10100000001 rm:5 1 esz:2 ... rn:5 ... 0-  \
+                &zcrr_ldst %png rd=%zd_ax4 nreg=4
+
+LD1_zcri        101000000100 imm:s4 0 esz:2 ... rn:5 .... -  \
+                &zcri_ldst %png rd=%zd_ax2 nreg=2
+LD1_zcri        101000000100 imm:s4 1 esz:2 ... rn:5 ... 0-  \
+                &zcri_ldst %png rd=%zd_ax4 nreg=4
+
+ST1_zcri        101000000110 imm:s4 0 esz:2 ... rn:5 .... -  \
+                &zcri_ldst %png rd=%zd_ax2 nreg=2
+ST1_zcri        101000000110 imm:s4 1 esz:2 ... rn:5 ... 0-  \
+                &zcri_ldst %png rd=%zd_ax4 nreg=4
+
+# Note: N bit and 0 bit (for nreg4) still mashed in rd.
+# This is handled within gen_ldst_c().
+LD1_zcrr_stride 10100001000 rm:5 0 esz:2 ... rn:5 rd:5  \
+                &zcrr_ldst %png nreg=2
+LD1_zcrr_stride 10100001000 rm:5 1 esz:2 ... rn:5 rd:5  \
+                &zcrr_ldst %png nreg=4
+
+ST1_zcrr_stride 10100001001 rm:5 0 esz:2 ... rn:5 rd:5  \
+                &zcrr_ldst %png nreg=2
+ST1_zcrr_stride 10100001001 rm:5 1 esz:2 ... rn:5 rd:5  \
+                &zcrr_ldst %png nreg=4
+
+LD1_zcri_stride 101000010100 imm:s4 0 esz:2 ... rn:5 rd:5  \
+                &zcri_ldst %png nreg=2
+LD1_zcri_stride 101000010100 imm:s4 1 esz:2 ... rn:5 rd:5  \
+                &zcri_ldst %png nreg=4
+
+ST1_zcri_stride 101000010110 imm:s4 0 esz:2 ... rn:5 rd:5  \
+                &zcri_ldst %png nreg=2
+ST1_zcri_stride 101000010110 imm:s4 1 esz:2 ... rn:5 rd:5  \
+                &zcri_ldst %png nreg=4
-- 
2.43.0