From: William Kosasih <kosasihwilliam4@gmail.com>
To: qemu-devel@nongnu.org
Cc: qemu-arm@nongnu.org, Peter Maydell, William Kosasih
Subject: [PATCH] target/arm: Fix M-profile helper loads/stores alignment checks
Date: Wed, 25 Jun 2025 20:28:32 +0930
Message-ID: <20250625105832.1277378-1-kosasihwilliam4@gmail.com>
X-Mailer: git-send-email 2.48.1
Historically, M-profile helper functions in m_helper.c and mve_helper.c
used the unaligned cpu_*_data_ra() routines to perform guest memory
accesses. This meant we had no way to enforce alignment constraints
when executing helper-based loads/stores.

With the addition of the cpu_*_mmu() APIs, we can now combine the
current MMU index (arm_to_core_mmu_idx(arm_mmu_idx(env))) with the
MO_ALIGN flag to build a MemOpIdx that enforces alignment at the
helper level.

This patch:

- Replaces all calls to cpu_ldl_data_ra(), cpu_stl_data_ra(), etc.,
  in the M-profile helpers (m_helper.c) and the MVE helpers
  (mve_helper.c) with their cpu_*_mmu() equivalents.

- Leaves the SME and SVE helper code untouched, as those extensions
  support unaligned accesses by design.

- Retains the manual alignment checks in the vlldm/vlstm helpers,
  because those instructions enforce an 8-byte alignment requirement
  (instead of the 4-byte alignment for ordinary long loads/stores).
  Their cpu_*_data_*() calls are still replaced with cpu_*_mmu(), so
  that the individual word accesses also perform the standard
  alignment checks, in keeping with the Arm pseudocode.

With this change, all M-profile and MVE helper-based loads and stores
now correctly honor their alignment requirements.

Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154
---
 target/arm/tcg/m_helper.c   |  33 +--
 target/arm/tcg/mve_helper.c | 408 ++++++++++++++++++++----------------
 2 files changed, 254 insertions(+), 187 deletions(-)

diff --git a/target/arm/tcg/m_helper.c b/target/arm/tcg/m_helper.c
index 6614719832..28307b5615 100644
--- a/target/arm/tcg/m_helper.c
+++ b/target/arm/tcg/m_helper.c
@@ -632,8 +632,11 @@ void HELPER(v7m_blxns)(CPUARMState *env, uint32_t dest)
     }
 
     /* Note that these stores can throw exceptions on MPU faults */
-    cpu_stl_data_ra(env, sp, nextinst, GETPC());
-    cpu_stl_data_ra(env, sp + 4, saved_psr, GETPC());
+    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
+    MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN,
+                                 arm_to_core_mmu_idx(mmu_idx));
+    cpu_stl_mmu(env, sp, nextinst, oi, GETPC());
+    cpu_stl_mmu(env, sp + 4, saved_psr, oi, GETPC());
 
     env->regs[13] = sp;
     env->regs[14] = 0xfeffffff;
@@ -1048,6 +1051,9 @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
     bool s = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
     bool lspact = env->v7m.fpccr[s] & R_V7M_FPCCR_LSPACT_MASK;
     uintptr_t ra = GETPC();
+    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
+    MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN,
+                                 arm_to_core_mmu_idx(mmu_idx));
 
     assert(env->v7m.secure);
 
@@ -1073,7 +1079,7 @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
      * Note that we do not use v7m_stack_write() here, because the
      * accesses should not set the FSR bits for stacking errors if they
      * fail. (In pseudocode terms, they are AccType_NORMAL, not AccType_STACK
-     * or AccType_LAZYFP). Faults in cpu_stl_data_ra() will throw exceptions
+     * or AccType_LAZYFP). Faults in cpu_stl_mmu() will throw exceptions
      * and longjmp out.
      */
     if (!(env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPEN_MASK)) {
@@ -1089,12 +1095,12 @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
         if (i >= 16) {
             faddr += 8; /* skip the slot for the FPSCR */
         }
-        cpu_stl_data_ra(env, faddr, slo, ra);
-        cpu_stl_data_ra(env, faddr + 4, shi, ra);
+        cpu_stl_mmu(env, faddr, slo, oi, ra);
+        cpu_stl_mmu(env, faddr + 4, shi, oi, ra);
     }
-    cpu_stl_data_ra(env, fptr + 0x40, vfp_get_fpscr(env), ra);
+    cpu_stl_mmu(env, fptr + 0x40, vfp_get_fpscr(env), oi, ra);
     if (cpu_isar_feature(aa32_mve, cpu)) {
-        cpu_stl_data_ra(env, fptr + 0x44, env->v7m.vpr, ra);
+        cpu_stl_mmu(env, fptr + 0x44, env->v7m.vpr, oi, ra);
     }
 
     /*
@@ -1121,6 +1127,9 @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
 {
     ARMCPU *cpu = env_archcpu(env);
     uintptr_t ra = GETPC();
+    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
+    MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN,
+                                 arm_to_core_mmu_idx(mmu_idx));
 
     /* fptr is the value of Rn, the frame pointer we load the FP regs from */
     assert(env->v7m.secure);
@@ -1155,16 +1164,16 @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
             faddr += 8; /* skip the slot for the FPSCR and VPR */
         }
 
-        slo = cpu_ldl_data_ra(env, faddr, ra);
-        shi = cpu_ldl_data_ra(env, faddr + 4, ra);
+        slo = cpu_ldl_mmu(env, faddr, oi, ra);
+        shi = cpu_ldl_mmu(env, faddr + 4, oi, ra);
 
         dn = (uint64_t) shi << 32 | slo;
         *aa32_vfp_dreg(env, i / 2) = dn;
     }
-    fpscr = cpu_ldl_data_ra(env, fptr + 0x40, ra);
+    fpscr = cpu_ldl_mmu(env, fptr + 0x40, oi, ra);
     vfp_set_fpscr(env, fpscr);
     if (cpu_isar_feature(aa32_mve, cpu)) {
-        env->v7m.vpr = cpu_ldl_data_ra(env, fptr + 0x44, ra);
+        env->v7m.vpr = cpu_ldl_mmu(env, fptr + 0x44, oi, ra);
     }
 }
 
@@ -1937,7 +1946,7 @@ static bool do_v7m_function_return(ARMCPU *cpu)
          * do them as secure, so work out what MMU index that is.
          */
         mmu_idx = arm_v7m_mmu_idx_for_secstate(env, true);
-        oi = make_memop_idx(MO_LEUL, arm_to_core_mmu_idx(mmu_idx));
+        oi = make_memop_idx(MO_LEUL | MO_ALIGN, arm_to_core_mmu_idx(mmu_idx));
         newpc = cpu_ldl_mmu(env, frameptr, oi, 0);
         newpsr = cpu_ldl_mmu(env, frameptr + 4, oi, 0);
 
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index 506d1c3475..2a4521e1fa 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -147,62 +147,85 @@ static void mve_advance_vpt(CPUARMState *env)
     env->v7m.vpr = vpr;
 }
 
+/* Mapping of LDTYPE/STTYPE to the number of bytes accessed */
+#define MSIZE_b 1
+#define MSIZE_w 2
+#define MSIZE_l 4
+
+/* Mapping of LDTYPE/STTYPE to MemOp flag */
+#define MFLAG_b MO_UB
+#define MFLAG_w MO_TEUW
+#define MFLAG_l MO_TEUL
+
+#define MSIZE(t) MSIZE_##t
+#define MFLAG(t) MFLAG_##t
+
+#define SIGN_EXT(v, T, B) \
+    ((T)(v) << (sizeof(T) * 8 - (B))) >> (sizeof(T) * 8 - (B))
+
 /* For loads, predicated lanes are zeroed instead of keeping their old values */
-#define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE)                         \
-    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
-    {                                                                   \
-        TYPE *d = vd;                                                   \
-        uint16_t mask = mve_element_mask(env);                          \
-        uint16_t eci_mask = mve_eci_mask(env);                          \
-        unsigned b, e;                                                  \
-        /*                                                              \
-         * R_SXTM allows the dest reg to become UNKNOWN for abandoned   \
-         * beats so we don't care if we update part of the dest and     \
-         * then take an exception.                                      \
-         */                                                             \
-        for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                   \
-            if (eci_mask & (1 << b)) {                                  \
-                d[H##ESIZE(e)] = (mask & (1 << b)) ?                    \
-                    cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0;     \
-            }                                                           \
-            addr += MSIZE;                                              \
-        }                                                               \
-        mve_advance_vpt(env);                                           \
+#define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE)                           \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)      \
+    {                                                                     \
+        TYPE *d = vd;                                                     \
+        uint16_t mask = mve_element_mask(env);                            \
+        uint16_t eci_mask = mve_eci_mask(env);                            \
+        unsigned b, e;                                                    \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));              \
+        MemOpIdx oi = make_memop_idx(MFLAG(LDTYPE) | MO_ALIGN, mmu_idx);  \
+        /*                                                                \
+         * R_SXTM allows the dest reg to become UNKNOWN for abandoned     \
+         * beats so we don't care if we update part of the dest and       \
+         * then take an exception.                                        \
+         */                                                               \
+        for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                     \
+            if (eci_mask & (1 << b)) {                                    \
+                d[H##ESIZE(e)] = (mask & (1 << b)) ?                      \
+                    SIGN_EXT(cpu_ld##LDTYPE##_mmu(env, addr, oi, GETPC()),\
+                             TYPE,                                        \
+                             MSIZE * 8)                                   \
+                    : 0;                                                  \
+            }                                                             \
+            addr += MSIZE;                                                \
+        }                                                                 \
+        mve_advance_vpt(env);                                             \
     }
 
-#define DO_VSTR(OP, MSIZE, STTYPE, ESIZE, TYPE)                         \
-    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
-    {                                                                   \
-        TYPE *d = vd;                                                   \
-        uint16_t mask = mve_element_mask(env);                          \
-        unsigned b, e;                                                  \
-        for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                   \
-            if (mask & (1 << b)) {                                      \
-                cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
-            }                                                           \
-            addr += MSIZE;                                              \
-        }                                                               \
-        mve_advance_vpt(env);                                           \
+#define DO_VSTR(OP, MSIZE, STTYPE, ESIZE, TYPE)                           \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)      \
+    {                                                                     \
+        TYPE *d = vd;                                                     \
+        uint16_t mask = mve_element_mask(env);                            \
+        unsigned b, e;                                                    \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));              \
+        MemOpIdx oi = make_memop_idx(MFLAG(STTYPE) | MO_ALIGN, mmu_idx);  \
+        for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                     \
+            if (mask & (1 << b)) {                                        \
+                cpu_st##STTYPE##_mmu(env, addr, d[H##ESIZE(e)], oi, GETPC()); \
+            }                                                             \
+            addr += MSIZE;                                                \
+        }                                                                 \
+        mve_advance_vpt(env);                                             \
     }
 
-DO_VLDR(vldrb, 1, ldub, 1, uint8_t)
-DO_VLDR(vldrh, 2, lduw, 2, uint16_t)
-DO_VLDR(vldrw, 4, ldl, 4, uint32_t)
+DO_VLDR(vldrb, 1, b, 1, uint8_t)
+DO_VLDR(vldrh, 2, w, 2, uint16_t)
+DO_VLDR(vldrw, 4, l, 4, uint32_t)
 
-DO_VSTR(vstrb, 1, stb, 1, uint8_t)
-DO_VSTR(vstrh, 2, stw, 2, uint16_t)
-DO_VSTR(vstrw, 4, stl, 4, uint32_t)
+DO_VSTR(vstrb, 1, b, 1, uint8_t)
+DO_VSTR(vstrh, 2, w, 2, uint16_t)
+DO_VSTR(vstrw, 4, l, 4, uint32_t)
 
-DO_VLDR(vldrb_sh, 1, ldsb, 2, int16_t)
-DO_VLDR(vldrb_sw, 1, ldsb, 4, int32_t)
-DO_VLDR(vldrb_uh, 1, ldub, 2, uint16_t)
-DO_VLDR(vldrb_uw, 1, ldub, 4, uint32_t)
-DO_VLDR(vldrh_sw, 2, ldsw, 4, int32_t)
-DO_VLDR(vldrh_uw, 2, lduw, 4, uint32_t)
+DO_VLDR(vldrb_sh, 1, b, 2, int16_t)
+DO_VLDR(vldrb_sw, 1, b, 4, int32_t)
+DO_VLDR(vldrb_uh, 1, b, 2, uint16_t)
+DO_VLDR(vldrb_uw, 1, b, 4, uint32_t)
+DO_VLDR(vldrh_sw, 2, w, 4, int32_t)
+DO_VLDR(vldrh_uw, 2, w, 4, uint32_t)
 
-DO_VSTR(vstrb_h, 1, stb, 2, int16_t)
-DO_VSTR(vstrb_w, 1, stb, 4, int32_t)
-DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
+DO_VSTR(vstrb_h, 1, b, 2, int16_t)
+DO_VSTR(vstrb_w, 1, b, 4, int32_t)
+DO_VSTR(vstrh_w, 2, w, 4, int32_t)
 
 #undef DO_VLDR
 #undef DO_VSTR
@@ -214,54 +237,61 @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
  * For loads, predicated lanes are zeroed instead of retaining
  * their previous values.
  */
-#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN, WB)        \
-    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
-                          uint32_t base)                                \
-    {                                                                   \
-        TYPE *d = vd;                                                   \
-        OFFTYPE *m = vm;                                                \
-        uint16_t mask = mve_element_mask(env);                          \
-        uint16_t eci_mask = mve_eci_mask(env);                          \
-        unsigned e;                                                     \
-        uint32_t addr;                                                  \
-        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \
-            if (!(eci_mask & 1)) {                                      \
-                continue;                                               \
-            }                                                           \
-            addr = ADDRFN(base, m[H##ESIZE(e)]);                        \
-            d[H##ESIZE(e)] = (mask & 1) ?                               \
-                cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0;         \
-            if (WB) {                                                   \
-                m[H##ESIZE(e)] = addr;                                  \
-            }                                                           \
-        }                                                               \
-        mve_advance_vpt(env);                                           \
+#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN, WB)          \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,           \
+                          uint32_t base)                                  \
+    {                                                                     \
+        TYPE *d = vd;                                                     \
+        OFFTYPE *m = vm;                                                  \
+        uint16_t mask = mve_element_mask(env);                            \
+        uint16_t eci_mask = mve_eci_mask(env);                            \
+        unsigned e;                                                       \
+        uint32_t addr;                                                    \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));              \
+        MemOpIdx oi = make_memop_idx(MFLAG(LDTYPE) | MO_ALIGN, mmu_idx);  \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) {\
+            if (!(eci_mask & 1)) {                                        \
+                continue;                                                 \
+            }                                                             \
+            addr = ADDRFN(base, m[H##ESIZE(e)]);                          \
+            d[H##ESIZE(e)] = (mask & 1) ?                                 \
+                SIGN_EXT(cpu_ld##LDTYPE##_mmu(env, addr, oi, GETPC()),    \
+                         TYPE,                                            \
+                         MSIZE(LDTYPE) * 8)                               \
+                : 0;                                                      \
+            if (WB) {                                                     \
+                m[H##ESIZE(e)] = addr;                                    \
+            }                                                             \
+        }                                                                 \
+        mve_advance_vpt(env);                                             \
     }
 
 /* We know here TYPE is unsigned so always the same as the offset type */
-#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN, WB)                 \
-    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
-                          uint32_t base)                                \
-    {                                                                   \
-        TYPE *d = vd;                                                   \
-        TYPE *m = vm;                                                   \
-        uint16_t mask = mve_element_mask(env);                          \
-        uint16_t eci_mask = mve_eci_mask(env);                          \
-        unsigned e;                                                     \
-        uint32_t addr;                                                  \
-        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \
-            if (!(eci_mask & 1)) {                                      \
-                continue;                                               \
-            }                                                           \
-            addr = ADDRFN(base, m[H##ESIZE(e)]);                        \
-            if (mask & 1) {                                             \
-                cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
-            }                                                           \
-            if (WB) {                                                   \
-                m[H##ESIZE(e)] = addr;                                  \
-            }                                                           \
-        }                                                               \
-        mve_advance_vpt(env);                                           \
+#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN, WB)                   \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,           \
+                          uint32_t base)                                  \
+    {                                                                     \
+        TYPE *d = vd;                                                     \
+        TYPE *m = vm;                                                     \
+        uint16_t mask = mve_element_mask(env);                            \
+        uint16_t eci_mask = mve_eci_mask(env);                            \
+        unsigned e;                                                       \
+        uint32_t addr;                                                    \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));              \
+        MemOpIdx oi = make_memop_idx(MFLAG(STTYPE) | MO_ALIGN, mmu_idx);  \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) {\
+            if (!(eci_mask & 1)) {                                        \
+                continue;                                                 \
+            }                                                             \
+            addr = ADDRFN(base, m[H##ESIZE(e)]);                          \
+            if (mask & 1) {                                               \
+                cpu_st##STTYPE##_mmu(env, addr, d[H##ESIZE(e)], oi, GETPC()); \
+            }                                                             \
+            if (WB) {                                                     \
+                m[H##ESIZE(e)] = addr;                                    \
+            }                                                             \
+        }                                                                 \
+        mve_advance_vpt(env);                                             \
     }
 
 /*
@@ -272,54 +302,58 @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
  * Address writeback happens on the odd beats and updates the address
  * stored in the even-beat element.
  */
-#define DO_VLDR64_SG(OP, ADDRFN, WB)                                    \
-    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
-                          uint32_t base)                                \
-    {                                                                   \
-        uint32_t *d = vd;                                               \
-        uint32_t *m = vm;                                               \
-        uint16_t mask = mve_element_mask(env);                          \
-        uint16_t eci_mask = mve_eci_mask(env);                          \
-        unsigned e;                                                     \
-        uint32_t addr;                                                  \
-        for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) {      \
-            if (!(eci_mask & 1)) {                                      \
-                continue;                                               \
-            }                                                           \
-            addr = ADDRFN(base, m[H4(e & ~1)]);                         \
-            addr += 4 * (e & 1);                                        \
-            d[H4(e)] = (mask & 1) ? cpu_ldl_data_ra(env, addr, GETPC()) : 0; \
-            if (WB && (e & 1)) {                                        \
-                m[H4(e & ~1)] = addr - 4;                               \
-            }                                                           \
-        }                                                               \
-        mve_advance_vpt(env);                                           \
+#define DO_VLDR64_SG(OP, ADDRFN, WB)                                      \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,           \
+                          uint32_t base)                                  \
+    {                                                                     \
+        uint32_t *d = vd;                                                 \
+        uint32_t *m = vm;                                                 \
+        uint16_t mask = mve_element_mask(env);                            \
+        uint16_t eci_mask = mve_eci_mask(env);                            \
+        unsigned e;                                                       \
+        uint32_t addr;                                                    \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));              \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);        \
+        for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) {        \
+            if (!(eci_mask & 1)) {                                        \
+                continue;                                                 \
+            }                                                             \
+            addr = ADDRFN(base, m[H4(e & ~1)]);                           \
+            addr += 4 * (e & 1);                                          \
+            d[H4(e)] = (mask & 1) ? cpu_ldl_mmu(env, addr, oi, GETPC()) : 0; \
+            if (WB && (e & 1)) {                                          \
+                m[H4(e & ~1)] = addr - 4;                                 \
+            }                                                             \
+        }                                                                 \
+        mve_advance_vpt(env);                                             \
     }
 
-#define DO_VSTR64_SG(OP, ADDRFN, WB)                                    \
-    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
-                          uint32_t base)                                \
-    {                                                                   \
-        uint32_t *d = vd;                                               \
-        uint32_t *m = vm;                                               \
-        uint16_t mask = mve_element_mask(env);                          \
-        uint16_t eci_mask = mve_eci_mask(env);                          \
-        unsigned e;                                                     \
-        uint32_t addr;                                                  \
-        for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) {      \
-            if (!(eci_mask & 1)) {                                      \
-                continue;                                               \
-            }                                                           \
-            addr = ADDRFN(base, m[H4(e & ~1)]);                         \
-            addr += 4 * (e & 1);                                        \
-            if (mask & 1) {                                             \
-                cpu_stl_data_ra(env, addr, d[H4(e)], GETPC());          \
-            }                                                           \
-            if (WB && (e & 1)) {                                        \
-                m[H4(e & ~1)] = addr - 4;                               \
-            }                                                           \
-        }                                                               \
-        mve_advance_vpt(env);                                           \
+#define DO_VSTR64_SG(OP, ADDRFN, WB)                                      \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,           \
+                          uint32_t base)                                  \
+    {                                                                     \
+        uint32_t *d = vd;                                                 \
+        uint32_t *m = vm;                                                 \
+        uint16_t mask = mve_element_mask(env);                            \
+        uint16_t eci_mask = mve_eci_mask(env);                            \
+        unsigned e;                                                       \
+        uint32_t addr;                                                    \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));              \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);        \
+        for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) {        \
+            if (!(eci_mask & 1)) {                                        \
+                continue;                                                 \
+            }                                                             \
+            addr = ADDRFN(base, m[H4(e & ~1)]);                           \
+            addr += 4 * (e & 1);                                          \
+            if (mask & 1) {                                               \
+                cpu_stl_mmu(env, addr, d[H4(e)], oi, GETPC());            \
+            }                                                             \
+            if (WB && (e & 1)) {                                          \
+                m[H4(e & ~1)] = addr - 4;                                 \
+            }                                                             \
+        }                                                                 \
+        mve_advance_vpt(env);                                             \
     }
 
 #define ADDR_ADD(BASE, OFFSET) ((BASE) + (OFFSET))
@@ -327,40 +361,40 @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
 #define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2))
 #define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3))
 
-DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD, false)
-DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD, false)
-DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD, false)
+DO_VLDR_SG(vldrb_sg_sh, b, 2, int16_t, uint16_t, ADDR_ADD, false)
+DO_VLDR_SG(vldrb_sg_sw, b, 4, int32_t, uint32_t, ADDR_ADD, false)
+DO_VLDR_SG(vldrh_sg_sw, w, 4, int32_t, uint32_t, ADDR_ADD, false)
 
-DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD, false)
-DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD, false)
-DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD, false)
-DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD, false)
-DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD, false)
-DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, false)
+DO_VLDR_SG(vldrb_sg_ub, b, 1, uint8_t, uint8_t, ADDR_ADD, false)
+DO_VLDR_SG(vldrb_sg_uh, b, 2, uint16_t, uint16_t, ADDR_ADD, false)
+DO_VLDR_SG(vldrb_sg_uw, b, 4, uint32_t, uint32_t, ADDR_ADD, false)
+DO_VLDR_SG(vldrh_sg_uh, w, 2, uint16_t, uint16_t, ADDR_ADD, false)
+DO_VLDR_SG(vldrh_sg_uw, w, 4, uint32_t, uint32_t, ADDR_ADD, false)
+DO_VLDR_SG(vldrw_sg_uw, l, 4, uint32_t, uint32_t, ADDR_ADD, false)
 DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD, false)
 
-DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH, false)
-DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH, false)
-DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH, false)
-DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW, false)
+DO_VLDR_SG(vldrh_sg_os_sw, w, 4, int32_t, uint32_t, ADDR_ADD_OSH, false)
+DO_VLDR_SG(vldrh_sg_os_uh, w, 2, uint16_t, uint16_t, ADDR_ADD_OSH, false)
+DO_VLDR_SG(vldrh_sg_os_uw, w, 4, uint32_t, uint32_t, ADDR_ADD_OSH, false)
+DO_VLDR_SG(vldrw_sg_os_uw, l, 4, uint32_t, uint32_t, ADDR_ADD_OSW, false)
 DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD, false)
 
-DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD, false)
-DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD, false)
-DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD, false)
-DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD, false)
-DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD, false)
-DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD, false)
+DO_VSTR_SG(vstrb_sg_ub, b, 1, uint8_t, ADDR_ADD, false)
+DO_VSTR_SG(vstrb_sg_uh, b, 2, uint16_t, ADDR_ADD, false)
+DO_VSTR_SG(vstrb_sg_uw, b, 4, uint32_t, ADDR_ADD, false)
+DO_VSTR_SG(vstrh_sg_uh, w, 2, uint16_t, ADDR_ADD, false)
+DO_VSTR_SG(vstrh_sg_uw, w, 4, uint32_t, ADDR_ADD, false)
+DO_VSTR_SG(vstrw_sg_uw, l, 4, uint32_t, ADDR_ADD, false)
 DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD, false)
 
-DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH, false)
-DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH, false)
-DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW, false)
+DO_VSTR_SG(vstrh_sg_os_uh, w, 2, uint16_t, ADDR_ADD_OSH, false)
+DO_VSTR_SG(vstrh_sg_os_uw, w, 4, uint32_t, ADDR_ADD_OSH, false)
+DO_VSTR_SG(vstrw_sg_os_uw, l, 4, uint32_t, ADDR_ADD_OSW, false)
 DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD, false)
 
-DO_VLDR_SG(vldrw_sg_wb_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true)
+DO_VLDR_SG(vldrw_sg_wb_uw, l, 4, uint32_t, uint32_t, ADDR_ADD, true)
 DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true)
-DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true)
+DO_VSTR_SG(vstrw_sg_wb_uw, l, 4, uint32_t, ADDR_ADD, true)
 DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
 
 /*
@@ -387,13 +421,15 @@ DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
         uint16_t mask = mve_eci_mask(env);                              \
         static const uint8_t off[4] = { O1, O2, O3, O4 };               \
         uint32_t addr, data;                                            \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));            \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);      \
         for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
             if ((mask & 1) == 0) {                                      \
                 /* ECI says skip this beat */                           \
                 continue;                                               \
             }                                                           \
             addr = base + off[beat] * 4;                                \
-            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
+            data = cpu_ldl_mmu(env, addr, oi, GETPC());                 \
             for (e = 0; e < 4; e++, data >>= 8) {                       \
                 uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \
                 qd[H1(off[beat])] = data;                               \
@@ -411,13 +447,15 @@ DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
         uint32_t addr, data;                                            \
         int y; /* y counts 0 2 0 2 */                                   \
         uint16_t *qd;                                                   \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));            \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);      \
         for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) {   \
             if ((mask & 1) == 0) {                                      \
                 /* ECI says skip this beat */                           \
                 continue;                                               \
             }                                                           \
             addr = base + off[beat] * 8 + (beat & 1) * 4;               \
-            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
+            data = cpu_ldl_mmu(env, addr, oi, GETPC());                 \
             qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y);             \
             qd[H2(off[beat])] = data;                                   \
             data >>= 16;                                                \
@@ -436,13 +474,15 @@ DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
         uint32_t addr, data;                                            \
         uint32_t *qd;                                                   \
         int y;                                                          \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));            \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);      \
         for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
             if ((mask & 1) == 0) {                                      \
                 /* ECI says skip this beat */                           \
                 continue;                                               \
             }                                                           \
             addr = base + off[beat] * 4;                                \
-            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
+            data = cpu_ldl_mmu(env, addr, oi, GETPC());                 \
             y = (beat + (O1 & 2)) & 3;                                  \
             qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y);             \
             qd[H4(off[beat] >> 2)] = data;                              \
@@ -473,13 +513,15 @@ DO_VLD4W(vld43w, 6, 7, 8, 9)
         static const uint8_t off[4] = { O1, O2, O3, O4 };               \
         uint32_t addr, data;                                            \
         uint8_t *qd;                                                    \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));            \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);      \
         for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
             if ((mask & 1) == 0) {                                      \
                 /* ECI says skip this beat */                           \
                 continue;                                               \
             }                                                           \
             addr = base + off[beat] * 2;                                \
-            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
+            data = cpu_ldl_mmu(env, addr, oi, GETPC());                 \
             for (e = 0; e < 4; e++, data >>= 8) {                       \
                 qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1));    \
                 qd[H1(off[beat] + (e >> 1))] = data;                    \
@@ -497,13 +539,15 @@ DO_VLD4W(vld43w, 6, 7, 8, 9)
         uint32_t addr, data;                                            \
         int e;                                                          \
         uint16_t *qd;                                                   \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));            \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);      \
         for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
             if ((mask & 1) == 0) {                                      \
                 /* ECI says skip this beat */                           \
                 continue;                                               \
             }                                                           \
             addr = base + off[beat] * 4;                                \
-            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
+            data = cpu_ldl_mmu(env, addr, oi, GETPC());                 \
             for (e = 0; e < 2; e++, data >>= 16) {                      \
                 qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e);         \
                 qd[H2(off[beat])] = data;                               \
@@ -520,13 +564,15 @@ DO_VLD4W(vld43w, 6, 7, 8, 9)
         static const uint8_t off[4] = { O1, O2, O3, O4 };               \
         uint32_t addr, data;                                            \
         uint32_t *qd;                                                   \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));            \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);      \
        for (beat = 0; beat < 4; beat++, mask >>= 4) {                   \
             if ((mask & 1) == 0) {                                      \
                 /* ECI says skip this beat */                           \
                 continue;                                               \
             }                                                           \
             addr = base + off[beat];                                    \
-            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
+            data = cpu_ldl_mmu(env, addr, oi, GETPC());                 \
             qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1));    \
             qd[H4(off[beat] >> 3)] = data;                              \
         }                                                               \
@@ -549,6 +595,8 @@ DO_VLD2W(vld21w, 8, 12, 16, 20)
         uint16_t mask = mve_eci_mask(env);                              \
         static const uint8_t off[4] = { O1, O2, O3, O4 };               \
         uint32_t addr, data;                                            \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));            \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);      \
         for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
             if ((mask & 1) == 0) {                                      \
                 /* ECI says skip this beat */                           \
@@ -560,7 +608,7 @@ DO_VLD2W(vld21w, 8, 12, 16, 20)
                 uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \
                 data = (data << 8) | qd[H1(off[beat])];                 \
             }                                                           \
-            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
+            cpu_stl_mmu(env, addr, data, oi, GETPC());                  \
         }                                                               \
     }
 
@@ -574,6 +622,8 @@ DO_VLD2W(vld21w, 8, 12, 16, 20)
         uint32_t addr, data;                                            \
         int y; /* y counts 0 2 0 2 */                                   \
         uint16_t *qd;                                                   \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));            \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);      \
         for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) {   \
             if ((mask & 1) == 0) {                                      \
                 /* ECI says skip this beat */                           \
@@ -584,7 +634,7 @@ DO_VLD2W(vld21w, 8, 12, 16, 20)
             data = qd[H2(off[beat])];                                   \
             qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1);         \
             data |= qd[H2(off[beat])] << 16;                            \
-            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
+            cpu_stl_mmu(env, addr, data, oi, GETPC());                  \
         }                                                               \
     }
 
@@ -598,6 +648,8 @@ DO_VLD2W(vld21w, 8, 12, 16, 20)
         uint32_t addr, data;                                            \
         uint32_t *qd;                                                   \
         int y;                                                          \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));            \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);      \
         for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
             if ((mask & 1) == 0) {                                      \
                 /* ECI says skip this beat */                           \
@@ -607,7 +659,7 @@ DO_VLD2W(vld21w, 8, 12, 16, 20)
             y = (beat + (O1 & 2)) & 3;                                  \
             qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y);             \
             data = qd[H4(off[beat] >> 2)];                              \
-            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
+            cpu_stl_mmu(env, addr, data, oi, GETPC());                  \
         }                                                               \
     }
 
@@ -635,6 +687,8 @@ DO_VST4W(vst43w, 6, 7, 8, 9)
         static const uint8_t off[4] = { O1, O2, O3, O4 };               \
         uint32_t addr, data;                                            \
         uint8_t *qd;                                                    \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));            \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);      \
         for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
             if ((mask & 1) == 0) {                                      \
                 /* ECI says skip this beat */                           \
@@ -646,7 +700,7 @@ DO_VST4W(vst43w, 6, 7, 8, 9)
                 qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1));    \
                 data = (data << 8) | qd[H1(off[beat] + (e >> 1))];      \
             }                                                           \
-            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
+            cpu_stl_mmu(env, addr, data, oi, GETPC());                  \
        }                                                                \
     }
 
@@ -660,6 +714,8 @@ DO_VST4W(vst43w, 6, 7, 8, 9)
         uint32_t addr, data;                                            \
         int e;                                                          \
         uint16_t *qd;                                                   \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));            \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);      \
         for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
             if ((mask & 1) == 0) {                                      \
                 /* ECI says skip this beat */                           \
@@ -671,7 +727,7 @@ DO_VST4W(vst43w, 6, 7, 8, 9)
                 qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e);         \
                 data = (data << 16) | qd[H2(off[beat])];                \
             }                                                           \
-            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
+            cpu_stl_mmu(env, addr, data, oi, GETPC());                  \
         }                                                               \
     }
 
@@ -684,6 +740,8 @@ DO_VST4W(vst43w, 6, 7, 8, 9)
         static const uint8_t off[4] = { O1, O2, O3, O4 };               \
         uint32_t addr, data;                                            \
         uint32_t *qd;                                                   \
+        int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));            \
+        MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);      \
         for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
             if ((mask & 1) == 0) {                                      \
                 /* ECI says skip this beat */                           \
@@ -692,7 +750,7 @@ DO_VST4W(vst43w, 6, 7, 8, 9)
             addr = base + off[beat];                                    \
             qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1));    \
             data = qd[H4(off[beat] >> 3)];                              \
-            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
+            cpu_stl_mmu(env, addr, data, oi, GETPC());                  \
         }                                                               \
     }
 
-- 
2.48.1
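
For reference, every conversion in this patch reduces to the same
pattern. Below is a minimal standalone sketch of it; do_aligned_ldl()
is a hypothetical name used purely for illustration, not a helper
added by the patch:

/*
 * Sketch of the aligned-access pattern applied throughout this patch.
 * do_aligned_ldl() is a hypothetical illustration, not patch code.
 */
static uint32_t do_aligned_ldl(CPUARMState *env, uint32_t addr, uintptr_t ra)
{
    /* Resolve the current translation regime to a core MMU index. */
    int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env));
    /* Request a target-endian 32-bit load with natural alignment. */
    MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx);
    /*
     * Unlike cpu_ldl_data_ra(), cpu_ldl_mmu() honours MO_ALIGN: an
     * unaligned addr raises an alignment fault, with ra used to
     * attribute the fault to the guest instruction.
     */
    return cpu_ldl_mmu(env, addr, oi, ra);
}

A store is symmetrical: cpu_stl_mmu(env, addr, val, oi, ra) with the
same MemOpIdx.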