From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: qemu-arm@nongnu.org, peter.maydell@linaro.org
Subject: [PATCH v4 093/108] target/arm: Implement SME2 counted predicate register load/store
Date: Fri, 4 Jul 2025 08:20:56 -0600
Message-ID: <20250704142112.1018902-94-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250704142112.1018902-1-richard.henderson@linaro.org>
References: <20250704142112.1018902-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Implement the SVE2p1 consecutive register LD1/ST1, and the SME2
strided register LD1/ST1.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sve.h    |  16 ++
 target/arm/tcg/sve_helper.c    | 493 +++++++++++++++++++++++++++++++++
 target/arm/tcg/translate-sve.c | 103 +++++++
 target/arm/tcg/sve.decode     |  50 ++++
 4 files changed, 662 insertions(+)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index 5f4b4aa036..c4736d7510 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -3048,3 +3048,19 @@ DEF_HELPER_FLAGS_3(pmov_pv_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(pmov_vp_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(pmov_vp_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(pmov_vp_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve2p1_ld1bb_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_ld1hh_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_ld1hh_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_ld1ss_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_ld1ss_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_ld1dd_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_ld1dd_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+
+DEF_HELPER_FLAGS_5(sve2p1_st1bb_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_st1hh_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_st1hh_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_st1ss_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_st1ss_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_st1dd_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
+DEF_HELPER_FLAGS_5(sve2p1_st1dd_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32)
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 42b05756a9..e6342990fa 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -7586,6 +7586,499 @@ DO_ST1_ZPZ_D(dd_be, zd, MO_64)
 #undef DO_ST1_ZPZ_S
 #undef DO_ST1_ZPZ_D
 
+/*
+ * SVE2.1 consecutive register load/store
+ */
+
+static unsigned sve2p1_cont_ldst_elements(SVEContLdSt *info, vaddr addr,
+                                          uint32_t png, intptr_t reg_max,
+                                          int N, int v_esz)
+{
+    const int esize = 1 << v_esz;
+    intptr_t reg_off_first = -1, reg_off_last = -1, reg_off_split;
+    DecodeCounter p = decode_counter(png, reg_max, v_esz);
+    unsigned b_count = p.count << v_esz;
+    unsigned b_stride = 1 << (v_esz + p.lg2_stride);
+    intptr_t page_split;
+
+    /* Set all of the element indices to -1, and the TLB data to 0. */
+    memset(info, -1, offsetof(SVEContLdSt, page));
+    memset(info->page, 0, sizeof(info->page));
+
+    if (p.invert) {
+        if (b_count >= reg_max * N) {
+            return 0;
+        }
+        reg_off_first = b_count;
+        reg_off_last = reg_max * N - b_stride;
+    } else {
+        if (b_count == 0) {
+            return 0;
+        }
+        reg_off_first = 0;
+        reg_off_last = MIN(b_count - esize, reg_max * N - b_stride);
+    }
+
+    info->reg_off_first[0] = reg_off_first;
+    info->mem_off_first[0] = reg_off_first;
+
+    page_split = -(addr | TARGET_PAGE_MASK);
+    if (reg_off_last + esize <= page_split || reg_off_first >= page_split) {
+        /* The entire operation fits within a single page. */
+        info->reg_off_last[0] = reg_off_last;
+        return b_stride;
+    }
+
+    info->page_split = page_split;
+    reg_off_split = ROUND_DOWN(page_split, esize);
+
+    /*
+     * This is the last full element on the first page, but it is not
+     * necessarily active.  If there is no full element, i.e. the first
+     * active element is the one that's split, this value remains -1.
+     * It is useful as iteration bounds.
+     */
+    if (reg_off_split != 0) {
+        info->reg_off_last[0] = ROUND_DOWN(reg_off_split - esize, b_stride);
+    }
+
+    /* Determine if an unaligned element spans the pages. */
+    if (page_split & (esize - 1)) {
+        /* It is helpful to know if the split element is active. */
+        if ((reg_off_split & (b_stride - 1)) == 0) {
+            info->reg_off_split = reg_off_split;
+            info->mem_off_split = reg_off_split;
+        }
+        reg_off_split += esize;
+    }
+
+    /*
+     * We do want the first active element on the second page, because
+     * this may affect the address reported in an exception.
+     */
+    reg_off_split = ROUND_UP(reg_off_split, b_stride);
+    if (reg_off_split <= reg_off_last) {
+        info->reg_off_first[1] = reg_off_split;
+        info->mem_off_first[1] = reg_off_split;
+        info->reg_off_last[1] = reg_off_last;
+    }
+    return b_stride;
+}
+
+static void sve2p1_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
+                                         target_ulong addr, unsigned estride,
+                                         int esize, int wp_access, uintptr_t ra)
+{
+#ifndef CONFIG_USER_ONLY
+    intptr_t count_off, count_last;
+    int flags0 = info->page[0].flags;
+    int flags1 = info->page[1].flags;
+
+    if (likely(!((flags0 | flags1) & TLB_WATCHPOINT))) {
+        return;
+    }
+
+    /* Indicate that watchpoints are handled. */
+    info->page[0].flags = flags0 & ~TLB_WATCHPOINT;
+    info->page[1].flags = flags1 & ~TLB_WATCHPOINT;
+
+    if (flags0 & TLB_WATCHPOINT) {
+        count_off = info->reg_off_first[0];
+        count_last = info->reg_off_split;
+        if (count_last < 0) {
+            count_last = info->reg_off_last[0];
+        }
+        do {
+            cpu_check_watchpoint(env_cpu(env), addr + count_off,
+                                 esize, info->page[0].attrs, wp_access, ra);
+            count_off += estride;
+        } while (count_off <= count_last);
+    }
+
+    count_off = info->reg_off_first[1];
+    if ((flags1 & TLB_WATCHPOINT) && count_off >= 0) {
+        count_last = info->reg_off_last[1];
+        do {
+            cpu_check_watchpoint(env_cpu(env), addr + count_off,
+                                 esize, info->page[1].attrs,
+                                 wp_access, ra);
+            count_off += estride;
+        } while (count_off <= count_last);
+    }
+#endif
+}
+
+static void sve2p1_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env,
+                                       target_ulong addr, unsigned estride,
+                                       int esize, uint32_t mtedesc,
+                                       uintptr_t ra)
+{
+    intptr_t count_off, count_last;
+
+    /*
+     * TODO: estride is always a small power of two, <= 8.
+     * Manipulate the stride within the loops such that
+     *   - first iteration hits addr + off, as required,
+     *   - second iteration hits ALIGN_UP(addr, 16),
+     *   - other iterations advance addr by 16.
+     * This will minimize the probing to once per MTE granule.
+     */
+
+    /* Process the page only if MemAttr == Tagged. */
+    if (info->page[0].tagged) {
+        count_off = info->reg_off_first[0];
+        count_last = info->reg_off_split;
+        if (count_last < 0) {
+            count_last = info->reg_off_last[0];
+        }
+
+        do {
+            mte_check(env, mtedesc, addr + count_off, ra);
+            count_off += estride;
+        } while (count_off <= count_last);
+    }
+
+    count_off = info->reg_off_first[1];
+    if (count_off >= 0 && info->page[1].tagged) {
+        count_last = info->reg_off_last[1];
+        do {
+            mte_check(env, mtedesc, addr + count_off, ra);
+            count_off += estride;
+        } while (count_off <= count_last);
+    }
+}
+
+static inline QEMU_ALWAYS_INLINE
+void sve2p1_ld1_c(CPUARMState *env, ARMVectorReg *zd, const vaddr addr,
+                  uint32_t png, uint32_t desc,
+                  const uintptr_t ra, const MemOp esz,
+                  sve_ldst1_host_fn *host_fn,
+                  sve_ldst1_tlb_fn *tlb_fn)
+{
+    const unsigned N = (desc >> SIMD_DATA_SHIFT) & 1 ? 4 : 2;
+    const unsigned rstride = 1 << ((desc >> (SIMD_DATA_SHIFT + 1)) % 4);
+    uint32_t mtedesc = desc >> (SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
+    const intptr_t reg_max = simd_oprsz(desc);
+    const unsigned esize = 1 << esz;
+    intptr_t count_off, count_last;
+    intptr_t reg_off, reg_last, reg_n;
+    SVEContLdSt info;
+    unsigned estride, flags;
+    void *host;
+
+    estride = sve2p1_cont_ldst_elements(&info, addr, png, reg_max, N, esz);
+    if (estride == 0) {
+        /* The entire predicate was false; no load occurs. */
+        for (unsigned n = 0; n < N; n++) {
+            memset(zd + n * rstride, 0, reg_max);
+        }
+        return;
+    }
+
+    /* Probe the page(s).  Exit with exception for any invalid page. */
+    sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_LOAD, ra);
+
+    /* Handle watchpoints for all active elements. */
+    sve2p1_cont_ldst_watchpoints(&info, env, addr, estride,
+                                 esize, BP_MEM_READ, ra);
+
+    /*
+     * Handle mte checks for all active elements.
+     * Since TBI must be set for MTE, !mtedesc => !mte_active.
+     */
+    if (mtedesc) {
+        sve2p1_cont_ldst_mte_check(&info, env, addr, estride,
+                                   esize, mtedesc, ra);
+    }
+
+    flags = info.page[0].flags | info.page[1].flags;
+    if (unlikely(flags != 0)) {
+        /*
+         * At least one page includes MMIO.
+         * Any bus operation can fail with cpu_transaction_failed,
+         * which for ARM will raise SyncExternal.  Perform the load
+         * into scratch memory to preserve register state until the end.
+         */
+        ARMVectorReg scratch[4] = { };
+
+        count_off = info.reg_off_first[0];
+        count_last = info.reg_off_last[1];
+        if (count_last < 0) {
+            count_last = info.reg_off_split;
+            if (count_last < 0) {
+                count_last = info.reg_off_last[0];
+            }
+        }
+        reg_off = count_off % reg_max;
+        reg_n = count_off / reg_max;
+
+        do {
+            reg_last = MIN(count_last - count_off, reg_max - esize);
+            do {
+                tlb_fn(env, &scratch[reg_n], reg_off, addr + count_off, ra);
+                reg_off += estride;
+                count_off += estride;
+            } while (reg_off <= reg_last);
+            reg_off = 0;
+            reg_n++;
+        } while (count_off <= count_last);
+
+        for (unsigned n = 0; n < N; ++n) {
+            memcpy(&zd[n * rstride], &scratch[n], reg_max);
+        }
+        return;
+    }
+
+    /* The entire operation is in RAM, on valid pages. */
+
+    for (unsigned n = 0; n < N; ++n) {
+        memset(&zd[n * rstride], 0, reg_max);
+    }
+
+    count_off = info.reg_off_first[0];
+    count_last = info.reg_off_last[0];
+    reg_off = count_off % reg_max;
+    reg_n = count_off / reg_max;
+    host = info.page[0].host;
+
+    set_helper_retaddr(ra);
+
+    do {
+        reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize);
+        do {
+            host_fn(&zd[reg_n * rstride], reg_off, host + count_off);
+            reg_off += estride;
+            count_off += estride;
+        } while (reg_off <= reg_last);
+        reg_off = 0;
+        reg_n++;
+    } while (count_off <= count_last);
+
+    clear_helper_retaddr();
+
+    /*
+     * Use the slow path to manage the cross-page misalignment.
+     * But we know this is RAM and cannot trap.
+     */
+    count_off = info.reg_off_split;
+    if (unlikely(count_off >= 0)) {
+        reg_off = count_off % reg_max;
+        reg_n = count_off / reg_max;
+        tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra);
+    }
+
+    count_off = info.reg_off_first[1];
+    if (unlikely(count_off >= 0)) {
+        count_last = info.reg_off_last[1];
+        reg_off = count_off % reg_max;
+        reg_n = count_off / reg_max;
+        host = info.page[1].host;
+
+        set_helper_retaddr(ra);
+
+        do {
+            reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize);
+            do {
+                host_fn(&zd[reg_n * rstride], reg_off, host + count_off);
+                reg_off += estride;
+                count_off += estride;
+            } while (reg_off <= reg_last);
+            reg_off = 0;
+            reg_n++;
+        } while (count_off <= count_last);
+
+        clear_helper_retaddr();
+    }
+}
+
+void HELPER(sve2p1_ld1bb_c)(CPUARMState *env, void *vd, target_ulong addr,
+                            uint32_t png, uint32_t desc)
+{
+    sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), MO_8,
+                 sve_ld1bb_host, sve_ld1bb_tlb);
+}
+
+#define DO_LD1_2(NAME, ESZ)                                             \
+void HELPER(sve2p1_##NAME##_le_c)(CPUARMState *env, void *vd,           \
+                                  target_ulong addr, uint32_t png,      \
+                                  uint32_t desc)                        \
+{                                                                       \
+    sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), ESZ,                \
+                 sve_##NAME##_le_host, sve_##NAME##_le_tlb);            \
+}                                                                       \
+void HELPER(sve2p1_##NAME##_be_c)(CPUARMState *env, void *vd,           \
+                                  target_ulong addr, uint32_t png,      \
+                                  uint32_t desc)                        \
+{                                                                       \
+    sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), ESZ,                \
+                 sve_##NAME##_be_host, sve_##NAME##_be_tlb);            \
+}
+
+DO_LD1_2(ld1hh, MO_16)
+DO_LD1_2(ld1ss, MO_32)
+DO_LD1_2(ld1dd, MO_64)
+
+#undef DO_LD1_2
+
+static inline QEMU_ALWAYS_INLINE
+void sve2p1_st1_c(CPUARMState *env, ARMVectorReg *zd, const vaddr addr,
+                  uint32_t png, uint32_t desc,
+                  const uintptr_t ra, const int esz,
+                  sve_ldst1_host_fn *host_fn,
+                  sve_ldst1_tlb_fn *tlb_fn)
+{
+    const unsigned N = (desc >> SIMD_DATA_SHIFT) & 1 ? 4 : 2;
+    const unsigned rstride = 1 << ((desc >> (SIMD_DATA_SHIFT + 1)) % 4);
+    uint32_t mtedesc = desc >> (SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
+    const intptr_t reg_max = simd_oprsz(desc);
+    const unsigned esize = 1 << esz;
+    intptr_t count_off, count_last;
+    intptr_t reg_off, reg_last, reg_n;
+    SVEContLdSt info;
+    unsigned estride, flags;
+    void *host;
+
+    estride = sve2p1_cont_ldst_elements(&info, addr, png, reg_max, N, esz);
+    if (estride == 0) {
+        /* The entire predicate was false; no store occurs. */
+        return;
+    }
+
+    /* Probe the page(s).  Exit with exception for any invalid page. */
+    sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_STORE, ra);
+
+    /* Handle watchpoints for all active elements. */
+    sve2p1_cont_ldst_watchpoints(&info, env, addr, estride,
+                                 esize, BP_MEM_WRITE, ra);
+
+    /*
+     * Handle mte checks for all active elements.
+     * Since TBI must be set for MTE, !mtedesc => !mte_active.
+     */
+    if (mtedesc) {
+        sve2p1_cont_ldst_mte_check(&info, env, addr, estride,
+                                   esize, mtedesc, ra);
+    }
+
+    flags = info.page[0].flags | info.page[1].flags;
+    if (unlikely(flags != 0)) {
+        /*
+         * At least one page includes MMIO.
+         * Any bus operation can fail with cpu_transaction_failed,
+         * which for ARM will raise SyncExternal; perform the entire
+         * store via the slow tlb path.
+         */
+        count_off = info.reg_off_first[0];
+        count_last = info.reg_off_last[1];
+        if (count_last < 0) {
+            count_last = info.reg_off_split;
+            if (count_last < 0) {
+                count_last = info.reg_off_last[0];
+            }
+        }
+        reg_off = count_off % reg_max;
+        reg_n = count_off / reg_max;
+
+        do {
+            reg_last = MIN(count_last - count_off, reg_max - esize);
+            do {
+                tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra);
+                reg_off += estride;
+                count_off += estride;
+            } while (reg_off <= reg_last);
+            reg_off = 0;
+            reg_n++;
+        } while (count_off <= count_last);
+        return;
+    }
+
+    /* The entire operation is in RAM, on valid pages. */
+
+    count_off = info.reg_off_first[0];
+    count_last = info.reg_off_last[0];
+    reg_off = count_off % reg_max;
+    reg_n = count_off / reg_max;
+    host = info.page[0].host;
+
+    set_helper_retaddr(ra);
+
+    do {
+        reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize);
+        do {
+            host_fn(&zd[reg_n * rstride], reg_off, host + count_off);
+            reg_off += estride;
+            count_off += estride;
+        } while (reg_off <= reg_last);
+        reg_off = 0;
+        reg_n++;
+    } while (count_off <= count_last);
+
+    clear_helper_retaddr();
+
+    /*
+     * Use the slow path to manage the cross-page misalignment.
+     * But we know this is RAM and cannot trap.
+     */
+    count_off = info.reg_off_split;
+    if (unlikely(count_off >= 0)) {
+        reg_off = count_off % reg_max;
+        reg_n = count_off / reg_max;
+        tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra);
+    }
+
+    count_off = info.reg_off_first[1];
+    if (unlikely(count_off >= 0)) {
+        count_last = info.reg_off_last[1];
+        reg_off = count_off % reg_max;
+        reg_n = count_off / reg_max;
+        host = info.page[1].host;
+
+        set_helper_retaddr(ra);
+
+        do {
+            reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize);
+            do {
+                host_fn(&zd[reg_n * rstride], reg_off, host + count_off);
+                reg_off += estride;
+                count_off += estride;
+            } while (reg_off <= reg_last);
+            reg_off = 0;
+            reg_n++;
+        } while (count_off <= count_last);
+
+        clear_helper_retaddr();
+    }
+}
+
+void HELPER(sve2p1_st1bb_c)(CPUARMState *env, void *vd, target_ulong addr,
+                            uint32_t png, uint32_t desc)
+{
+    sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), MO_8,
+                 sve_st1bb_host, sve_st1bb_tlb);
+}
+
+#define DO_ST1_2(NAME, ESZ)                                             \
+void HELPER(sve2p1_##NAME##_le_c)(CPUARMState *env, void *vd,           \
+                                  target_ulong addr, uint32_t png,      \
+                                  uint32_t desc)                        \
+{                                                                       \
+    sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), ESZ,                \
+                 sve_##NAME##_le_host, sve_##NAME##_le_tlb);            \
+}                                                                       \
+void HELPER(sve2p1_##NAME##_be_c)(CPUARMState *env, void *vd,           \
+                                  target_ulong addr, uint32_t png,      \
+                                  uint32_t desc)                        \
+{                                                                       \
+    sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), ESZ,                \
+                 sve_##NAME##_be_host, sve_##NAME##_be_tlb);            \
+}
+
+DO_ST1_2(st1hh, MO_16)
+DO_ST1_2(st1ss, MO_32)
+DO_ST1_2(st1dd, MO_64)
+
+#undef DO_ST1_2
+
 void HELPER(sve2_eor3)(void *vd, void *vn, void *vm, void *vk, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc) / 8;
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 02f885dd48..dfb53e4bf4 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -7863,3 +7863,106 @@ TRANS_FEAT(UQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz,
            gen_helper_sme2_uqcvtn_sh, a->rd, a->rn, 0)
 TRANS_FEAT(SQCVTUN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz,
            gen_helper_sme2_sqcvtun_sh, a->rd, a->rn, 0)
+
+static bool gen_ldst_c(DisasContext *s, TCGv_i64 addr, int zd, int png,
+                       MemOp esz, bool is_write, int n, bool strided)
+{
+    typedef void ldst_c_fn(TCGv_env, TCGv_ptr, TCGv_i64,
+                           TCGv_i32, TCGv_i32);
+    static ldst_c_fn * const f_ldst[2][2][4] = {
+        { { gen_helper_sve2p1_ld1bb_c,
+            gen_helper_sve2p1_ld1hh_le_c,
+            gen_helper_sve2p1_ld1ss_le_c,
+            gen_helper_sve2p1_ld1dd_le_c, },
+          { gen_helper_sve2p1_ld1bb_c,
+            gen_helper_sve2p1_ld1hh_be_c,
+            gen_helper_sve2p1_ld1ss_be_c,
+            gen_helper_sve2p1_ld1dd_be_c, } },
+
+        { { gen_helper_sve2p1_st1bb_c,
+            gen_helper_sve2p1_st1hh_le_c,
+            gen_helper_sve2p1_st1ss_le_c,
+            gen_helper_sve2p1_st1dd_le_c, },
+          { gen_helper_sve2p1_st1bb_c,
+            gen_helper_sve2p1_st1hh_be_c,
+            gen_helper_sve2p1_st1ss_be_c,
+            gen_helper_sve2p1_st1dd_be_c, } }
+    };
+
+    TCGv_i32 t_png, t_desc;
+    TCGv_ptr t_zd;
+    uint32_t desc, lg2_rstride = 0;
+    bool be = s->be_data == MO_BE;
+
+    assert(n == 2 || n == 4);
+    if (strided) {
+        lg2_rstride = 3;
+        if (n == 4) {
+            /* Validate ZD alignment. */
+            if (zd & 4) {
+                return false;
+            }
+            lg2_rstride = 2;
+        }
+        /* Ignore non-temporal bit */
+        zd &= ~8;
+    }
+
+    if (strided || !dc_isar_feature(aa64_sve2p1, s)
+        ? !sme_sm_enabled_check(s)
+        : !sve_access_check(s)) {
+        return true;
+    }
+
+    if (!s->mte_active[0]) {
+        addr = clean_data_tbi(s, addr);
+    }
+
+    desc = n == 2 ? 0 : 1;
+    desc = desc | (lg2_rstride << 1);
+    desc = make_svemte_desc(s, vec_full_reg_size(s), 1, esz, is_write, desc);
+    t_desc = tcg_constant_i32(desc);
+
+    t_png = tcg_temp_new_i32();
+    tcg_gen_ld16u_i32(t_png, tcg_env,
+                      pred_full_reg_offset(s, png) ^
+                      (HOST_BIG_ENDIAN ? 6 : 0));
+
+    t_zd = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(t_zd, tcg_env, vec_full_reg_offset(s, zd));
+
+    f_ldst[is_write][be][esz](tcg_env, t_zd, addr, t_png, t_desc);
+    return true;
+}
+
+static bool gen_ldst_zcrr_c(DisasContext *s, arg_zcrr_ldst *a,
+                            bool is_write, bool strided)
+{
+    TCGv_i64 addr = tcg_temp_new_i64();
+
+    tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz);
+    tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
+    return gen_ldst_c(s, addr, a->rd, a->png, a->esz, is_write,
+                      a->nreg, strided);
+}
+
+static bool gen_ldst_zcri_c(DisasContext *s, arg_zcri_ldst *a,
+                            bool is_write, bool strided)
+{
+    TCGv_i64 addr = tcg_temp_new_i64();
+
+    tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn),
+                     a->imm * a->nreg * vec_full_reg_size(s));
+    return gen_ldst_c(s, addr, a->rd, a->png, a->esz, is_write,
+                      a->nreg, strided);
+}
+
+TRANS_FEAT(LD1_zcrr, aa64_sme2_or_sve2p1, gen_ldst_zcrr_c, a, false, false)
+TRANS_FEAT(LD1_zcri, aa64_sme2_or_sve2p1, gen_ldst_zcri_c, a, false, false)
+TRANS_FEAT(ST1_zcrr, aa64_sme2_or_sve2p1, gen_ldst_zcrr_c, a, true, false)
+TRANS_FEAT(ST1_zcri, aa64_sme2_or_sve2p1, gen_ldst_zcri_c, a, true, false)
+
+TRANS_FEAT(LD1_zcrr_stride, aa64_sme2, gen_ldst_zcrr_c, a, false, true)
+TRANS_FEAT(LD1_zcri_stride, aa64_sme2, gen_ldst_zcri_c, a, false, true)
+TRANS_FEAT(ST1_zcrr_stride, aa64_sme2, gen_ldst_zcrr_c, a, true, true)
+TRANS_FEAT(ST1_zcri_stride, aa64_sme2, gen_ldst_zcri_c, a, true, true)
diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode
index 52a56d3341..bf3d4f4853 100644
--- a/target/arm/tcg/sve.decode
+++ b/target/arm/tcg/sve.decode
@@ -1812,3 +1812,53 @@ SCLAMP          01000100 .. 0 ..... 110000 ..... ..... @rda_rn_rm
 UCLAMP          01000100 .. 0 ..... 110001 ..... ..... @rda_rn_rm
 
 FCLAMP          01100100 .. 1 ..... 001001 ..... ..... @rda_rn_rm
+
+### SVE2p1 multi-vec contiguous load
+
+&zcrr_ldst      rd png rn rm esz nreg
+&zcri_ldst      rd png rn imm esz nreg
+%png            10:3 !function=plus_8
+%zd_ax2         1:4 !function=times_2
+%zd_ax4         2:3 !function=times_4
+
+LD1_zcrr        10100000000 rm:5 0 esz:2 ... rn:5 .... -  \
+                &zcrr_ldst %png rd=%zd_ax2 nreg=2
+LD1_zcrr        10100000000 rm:5 1 esz:2 ... rn:5 ... 0-  \
+                &zcrr_ldst %png rd=%zd_ax4 nreg=4
+
+ST1_zcrr        10100000001 rm:5 0 esz:2 ... rn:5 .... -  \
+                &zcrr_ldst %png rd=%zd_ax2 nreg=2
+ST1_zcrr        10100000001 rm:5 1 esz:2 ... rn:5 ... 0-  \
+                &zcrr_ldst %png rd=%zd_ax4 nreg=4
+
+LD1_zcri        101000000100 imm:s4 0 esz:2 ... rn:5 .... -  \
+                &zcri_ldst %png rd=%zd_ax2 nreg=2
+LD1_zcri        101000000100 imm:s4 1 esz:2 ... rn:5 ... 0-  \
+                &zcri_ldst %png rd=%zd_ax4 nreg=4
+
+ST1_zcri        101000000110 imm:s4 0 esz:2 ... rn:5 .... -  \
+                &zcri_ldst %png rd=%zd_ax2 nreg=2
+ST1_zcri        101000000110 imm:s4 1 esz:2 ... rn:5 ... 0-  \
+                &zcri_ldst %png rd=%zd_ax4 nreg=4
+
+# Note: N bit and 0 bit (for nreg4) still mashed in rd.
+# This is handled within gen_ldst_c().
+LD1_zcrr_stride 10100001000 rm:5 0 esz:2 ... rn:5 rd:5  \
+                &zcrr_ldst %png nreg=2
+LD1_zcrr_stride 10100001000 rm:5 1 esz:2 ... rn:5 rd:5  \
+                &zcrr_ldst %png nreg=4
+
+ST1_zcrr_stride 10100001001 rm:5 0 esz:2 ... rn:5 rd:5  \
+                &zcrr_ldst %png nreg=2
+ST1_zcrr_stride 10100001001 rm:5 1 esz:2 ... rn:5 rd:5  \
+                &zcrr_ldst %png nreg=4
+
+LD1_zcri_stride 101000010100 imm:s4 0 esz:2 ... rn:5 rd:5  \
+                &zcri_ldst %png nreg=2
+LD1_zcri_stride 101000010100 imm:s4 1 esz:2 ... rn:5 rd:5  \
+                &zcri_ldst %png nreg=4
+
+ST1_zcri_stride 101000010110 imm:s4 0 esz:2 ... rn:5 rd:5  \
+                &zcri_ldst %png nreg=2
+ST1_zcri_stride 101000010110 imm:s4 1 esz:2 ... rn:5 rd:5  \
+                &zcri_ldst %png nreg=4
-- 
2.43.0