Series comparison

-[Qemu-devel] [PULL 00/42] target-arm queue
+[PULL 00/44] target-arm queue
-First pullreq for arm of the 4.1 series, since I'm back from
+First set of arm patches for 6.2. I have a lot more in my
-holiday now. This is mostly my M-profile FPU series and Philippe's
+to-review queue still...
 devices.h cleanup. I have a pile of other patchsets to work through
 in my to-review folder, but 42 patches is definitely quite
 big enough to send now...
-thanks
 -- PMM
-The following changes since commit 413a99a92c13ec408dcf2adaa87918dc81e890c8:
+The following changes since commit d42685765653ec155fdf60910662f8830bdb2cef:
-  Add Nios II semihosting support. (2019-04-29 16:09:51 +0100)
+  Open 6.2 development tree (2021-08-25 10:25:12 +0100)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20190429
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210825
-for you to fetch changes up to 437cc27ddfded3bbab6afd5ac1761e0e195edba7:
+for you to fetch changes up to 24b1a6aa43615be22c7ee66bd68ec5675f6a6a9a:
-  hw/devices: Move SMSC 91C111 declaration into a new header (2019-04-29 17:57:21 +0100)
+  docs: Document how to use gdb with unix sockets (2021-08-25 10:48:51 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * remove "bag of random stuff" hw/devices.h header
+ * More MVE emulation work
- * implement FPU for Cortex-M and enable it for Cortex-M4 and -M33
+ * Implement M-profile trapping on division by zero
- * hw/dma: Compile the bcm2835_dma device as common object
+ * kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
- * configure: Remove --source-path option
+ * hw/char/pl011: add support for sending break
- * hw/ssi/xilinx_spips: Avoid variable length array
+ * fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
- * hw/arm/smmuv3: Remove SMMUNotifierNode
+ * hw/dma/pl330: Add memory region to replace default
  * sbsa-ref: Rename SBSA_GWDT enum value
  * fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices
  * docs: Document how to use gdb with unix sockets
 ----------------------------------------------------------------
-Eric Auger (1):
+Eduardo Habkost (1):
-      hw/arm/smmuv3: Remove SMMUNotifierNode
+      sbsa-ref: Rename SBSA_GWDT enum value
-Peter Maydell (28):
+Guenter Roeck (2):
-      hw/ssi/xilinx_spips: Avoid variable length array
+      fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
-      configure: Remove --source-path option
+      fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices
       target/arm: Make sure M-profile FPSCR RES0 bits are not settable
       hw/intc/armv7m_nvic: Allow reading of M-profile MVFR* registers
       target/arm: Implement dummy versions of M-profile FP-related registers
       target/arm: Disable most VFP sysregs for M-profile
       target/arm: Honour M-profile FP enable bits
       target/arm: Decode FP instructions for M profile
       target/arm: Clear CONTROL_S.SFPA in SG insn if FPU present
       target/arm: Handle SFPA and FPCA bits in reads and writes of CONTROL
       target/arm/helper: don't return early for STKOF faults during stacking
       target/arm: Handle floating point registers in exception entry
       target/arm: Implement v7m_update_fpccr()
       target/arm: Clear CONTROL.SFPA in BXNS and BLXNS
       target/arm: Clean excReturn bits when tail chaining
       target/arm: Allow for floating point in callee stack integrity check
       target/arm: Handle floating point registers in exception return
       target/arm: Move NS TBFLAG from bit 19 to bit 6
       target/arm: Overlap VECSTRIDE and XSCALE_CPAR TB flags
       target/arm: Set FPCCR.S when executing M-profile floating point insns
       target/arm: Activate M-profile floating point context when FPCCR.ASPEN is set
       target/arm: New helper function arm_v7m_mmu_idx_all()
       target/arm: New function armv7m_nvic_set_pending_lazyfp()
       target/arm: Add lazy-FP-stacking support to v7m_stack_write()
       target/arm: Implement M-profile lazy FP state preservation
       target/arm: Implement VLSTM for v7M CPUs with an FPU
       target/arm: Implement VLLDM for v7M CPUs with an FPU
       target/arm: Enable FPU for Cortex-M4 and Cortex-M33
-Philippe Mathieu-Daudé (13):
+Hamza Mahfooz (1):
-      hw/dma: Compile the bcm2835_dma device as common object
+      target/arm: kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
       hw/arm/aspeed: Use TYPE_TMP105/TYPE_PCA9552 instead of hardcoded string
       hw/arm/nseries: Use TYPE_TMP105 instead of hardcoded string
       hw/display/tc6393xb: Remove unused functions
       hw/devices: Move TC6393XB declarations into a new header
       hw/devices: Move Blizzard declarations into a new header
       hw/devices: Move CBus declarations into a new header
       hw/devices: Move Gamepad declarations into a new header
       hw/devices: Move TI touchscreen declarations into a new header
       hw/devices: Move LAN9118 declarations into a new header
       hw/net/ne2000-isa: Add guards to the header
       hw/net/lan9118: Export TYPE_LAN9118 and use it instead of hardcoded string
       hw/devices: Move SMSC 91C111 declaration into a new header
- configure                     |  10 +-
+Jan Luebbe (1):
- hw/dma/Makefile.objs          |   2 +-
+      hw/char/pl011: add support for sending break
  include/hw/arm/omap.h         |   6 +-
  include/hw/arm/smmu-common.h  |   8 +-
  include/hw/devices.h          |  62 ---
  include/hw/display/blizzard.h |  22 ++
  include/hw/display/tc6393xb.h |  24 ++
  include/hw/input/gamepad.h    |  19 +
  include/hw/input/tsc2xxx.h    |  36 ++
  include/hw/misc/cbus.h        |  32 ++
  include/hw/net/lan9118.h      |  21 +
  include/hw/net/ne2000-isa.h   |   6 +
  include/hw/net/smc91c111.h    |  19 +
  include/qemu/typedefs.h       |   1 -
  target/arm/cpu.h              |  95 ++++-
  target/arm/helper.h           |   5 +
  target/arm/translate.h        |   3 +
  hw/arm/aspeed.c               |  13 +-
  hw/arm/exynos4_boards.c       |   3 +-
  hw/arm/gumstix.c              |   2 +-
  hw/arm/integratorcp.c         |   2 +-
  hw/arm/kzm.c                  |   2 +-
  hw/arm/mainstone.c            |   2 +-
  hw/arm/mps2-tz.c              |   3 +-
  hw/arm/mps2.c                 |   2 +-
  hw/arm/nseries.c              |   7 +-
  hw/arm/palm.c                 |   2 +-
  hw/arm/realview.c             |   3 +-
  hw/arm/smmu-common.c          |   6 +-
  hw/arm/smmuv3.c               |  28 +-
  hw/arm/stellaris.c            |   2 +-
  hw/arm/tosa.c                 |   2 +-
  hw/arm/versatilepb.c          |   2 +-
  hw/arm/vexpress.c             |   2 +-
  hw/display/blizzard.c         |   2 +-
  hw/display/tc6393xb.c         |  18 +-
  hw/input/stellaris_input.c    |   2 +-
  hw/input/tsc2005.c            |   2 +-
  hw/input/tsc210x.c            |   4 +-
  hw/intc/armv7m_nvic.c         | 261 +++++++++++++
  hw/misc/cbus.c                |   2 +-
  hw/net/lan9118.c              |   3 +-
  hw/net/smc91c111.c            |   2 +-
  hw/ssi/xilinx_spips.c         |   6 +-
  target/arm/cpu.c              |  20 +
  target/arm/helper.c           | 873 +++++++++++++++++++++++++++++++++++++++---
  target/arm/machine.c          |  16 +
  target/arm/translate.c        | 150 +++++++-
  target/arm/vfp_helper.c       |   8 +
  MAINTAINERS                   |   7 +
 files changed, 1595 insertions(+), 235 deletions(-)
  delete mode 100644 include/hw/devices.h
  create mode 100644 include/hw/display/blizzard.h
  create mode 100644 include/hw/display/tc6393xb.h
  create mode 100644 include/hw/input/gamepad.h
  create mode 100644 include/hw/input/tsc2xxx.h
  create mode 100644 include/hw/misc/cbus.h
  create mode 100644 include/hw/net/lan9118.h
  create mode 100644 include/hw/net/smc91c111.h
+Peter Maydell (37):
+      target/arm: Note that we handle VMOVL as a special case of VSHLL
+      target/arm: Print MVE VPR in CPU dumps
+      target/arm: Fix MVE VSLI by 0 and VSRI by <dt>
+      target/arm: Fix signed VADDV
+      target/arm: Fix mask handling for MVE narrowing operations
+      target/arm: Fix 48-bit saturating shifts
+      target/arm: Fix MVE 48-bit SQRSHRL for small right shifts
+      target/arm: Fix calculation of LTP mask when LR is 0
+      target/arm: Factor out mve_eci_mask()
+      target/arm: Fix VPT advance when ECI is non-zero
+      target/arm: Fix VLDRB/H/W for predicated elements
+      target/arm: Implement MVE VMULL (polynomial)
+      target/arm: Implement MVE incrementing/decrementing dup insns
+      target/arm: Factor out gen_vpst()
+      target/arm: Implement MVE integer vector comparisons
+      target/arm: Implement MVE integer vector-vs-scalar comparisons
+      target/arm: Implement MVE VPSEL
+      target/arm: Implement MVE VMLAS
+      target/arm: Implement MVE shift-by-scalar
+      target/arm: Move 'x' and 'a' bit definitions into vmlaldav formats
+      target/arm: Implement MVE integer min/max across vector
+      target/arm: Implement MVE VABAV
+      target/arm: Implement MVE narrowing moves
+      target/arm: Rename MVEGenDualAccOpFn to MVEGenLongDualAccOpFn
+      target/arm: Implement MVE VMLADAV and VMLSLDAV
+      target/arm: Implement MVE VMLA
+      target/arm: Implement MVE saturating doubling multiply accumulates
+      target/arm: Implement MVE VQABS, VQNEG
+      target/arm: Implement MVE VMAXA, VMINA
+      target/arm: Implement MVE VMOV to/from 2 general-purpose registers
+      target/arm: Implement MVE VPNOT
+      target/arm: Implement MVE VCTP
+      target/arm: Implement MVE scatter-gather insns
+      target/arm: Implement MVE scatter-gather immediate forms
+      target/arm: Implement MVE interleaving loads/stores
+      target/arm: Re-indent sdiv and udiv helpers
+      target/arm: Implement M-profile trapping on division by zero
+Sebastian Meyer (1):
+      docs: Document how to use gdb with unix sockets
+Wen, Jianxian (1):
+      hw/dma/pl330: Add memory region to replace default
+ docs/system/gdb.rst        |   26 +-
+ include/hw/arm/fsl-imx7.h  |    5 +
+ target/arm/cpu.h           |    1 +
+ target/arm/helper-mve.h    |  283 ++++++++++
+ target/arm/helper.h        |    4 +-
+ target/arm/translate-a32.h |    2 +
+ target/arm/vec_internal.h  |   11 +
+ target/arm/mve.decode      |  226 +++++++-
+ target/arm/t32.decode      |    1 +
+ hw/arm/exynos4210.c        |    3 +
+ hw/arm/fsl-imx6ul.c        |   12 +
+ hw/arm/fsl-imx7.c          |    7 +
+ hw/arm/sbsa-ref.c          |    6 +-
+ hw/arm/xilinx_zynq.c       |    3 +
+ hw/char/pl011.c            |    6 +
+ hw/dma/pl330.c             |   26 +-
+ target/arm/cpu.c           |    3 +
+ target/arm/helper.c        |   34 +-
+ target/arm/kvm.c           |   17 +-
+ target/arm/m_helper.c      |    4 +
+ target/arm/mve_helper.c    | 1254 ++++++++++++++++++++++++++++++++++++++++++--
+ target/arm/translate-mve.c |  877 ++++++++++++++++++++++++++++++-
+ target/arm/translate-vfp.c |    2 +-
+ target/arm/translate.c     |   37 +-
+ target/arm/vec_helper.c    |   14 +-
+files changed, 2746 insertions(+), 118 deletions(-)

-[Qemu-devel] [PULL 39/42] hw/devices: Move LAN9118 declarations into a new header
+[PULL 01/44] target/arm: Note that we handle VMOVL as a special case of VSHLL
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+Although the architecture doesn't define it as an alias, VMOVL
 (vector move long) is encoded as a VSHLL with a zero shift.
 Add a comment in the decode file noting that we handle VMOVL
 as part of VSHLL.
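The following is a minimal per-lane C model showing why a separate VMOVL decode is not needed. The function name vshll_s8() is hypothetical and is not QEMU's helper. It illustrates that a signed byte VSHLL with a shift count of 0 produces exactly the sign-extended 16-bit value that VMOVL would.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of one signed byte lane of VSHLL (T1 encoding,
 * shift 0..7): widen to 16 bits, then shift left. Multiplying rather
 * than shifting avoids left-shifting a negative value in C.
 */
static int16_t vshll_s8(int8_t x, unsigned shift)
{
    assert(shift < 8);
    return (int16_t)(x * (1 << shift));
}

/* VMOVL.S8 is then simply VSHLL.S8 with a zero shift count. */
static int16_t vmovl_s8(int8_t x)
{
    return vshll_s8(x, 0);
}
```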
-Reviewed-by: Markus Armbruster <armbru@redhat.com>
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20190412165416.7977-10-philmd@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/hw/devices.h     |  3 ---
+ target/arm/mve.decode | 2 ++
- include/hw/net/lan9118.h | 19 +++++++++++++++++++
+file changed, 2 insertions(+)
  hw/arm/kzm.c             |  2 +-
  hw/arm/mps2.c            |  2 +-
  hw/arm/realview.c        |  1 +
  hw/arm/vexpress.c        |  2 +-
  hw/net/lan9118.c         |  2 +-
 files changed, 24 insertions(+), 7 deletions(-)
  create mode 100644 include/hw/net/lan9118.h
-diff --git a/include/hw/devices.h b/include/hw/devices.h
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/devices.h
+--- a/target/arm/mve.decode
-+++ b/include/hw/devices.h
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
- /* smc91c111.c */
+ VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
- void smc91c111_init(NICInfo *, uint32_t, qemu_irq);
+ # VSHLL T1 encoding; the T2 VSHLL encoding is elsewhere in this file
--/* lan9118.c */
++# Note that VMOVL is encoded as "VSHLL with a zero shift count"; we
--void lan9118_init(NICInfo *, uint32_t, qemu_irq);
++# implement it that way rather than special-casing it in the decode.
--
+ VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
- #endif
+ VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
-diff --git a/include/hw/net/lan9118.h b/include/hw/net/lan9118.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/net/lan9118.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * SMSC LAN9118 Ethernet interface emulation
 + *
 + * Copyright (c) 2009 CodeSourcery, LLC.
 + * Written by Paul Brook
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
 +#ifndef HW_NET_LAN9118_H
 +#define HW_NET_LAN9118_H
 +
 +#include "hw/irq.h"
 +#include "net/net.h"
 +
 +void lan9118_init(NICInfo *, uint32_t, qemu_irq);
 +
 +#endif
 diff --git a/hw/arm/kzm.c b/hw/arm/kzm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/kzm.c
 +++ b/hw/arm/kzm.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/error-report.h"
  #include "exec/address-spaces.h"
  #include "net/net.h"
 -#include "hw/devices.h"
 +#include "hw/net/lan9118.h"
  #include "hw/char/serial.h"
  #include "sysemu/qtest.h"
 diff --git a/hw/arm/mps2.c b/hw/arm/mps2.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/mps2.c
 +++ b/hw/arm/mps2.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/timer/cmsdk-apb-timer.h"
  #include "hw/timer/cmsdk-apb-dualtimer.h"
  #include "hw/misc/mps2-scc.h"
 -#include "hw/devices.h"
 +#include "hw/net/lan9118.h"
  #include "net/net.h"
  typedef enum MPS2FPGAType {
 diff --git a/hw/arm/realview.c b/hw/arm/realview.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/realview.c
 +++ b/hw/arm/realview.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/arm/arm.h"
  #include "hw/arm/primecell.h"
  #include "hw/devices.h"
 +#include "hw/net/lan9118.h"
  #include "hw/pci/pci.h"
  #include "net/net.h"
  #include "sysemu/sysemu.h"
 diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/vexpress.c
 +++ b/hw/arm/vexpress.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/sysbus.h"
  #include "hw/arm/arm.h"
  #include "hw/arm/primecell.h"
 -#include "hw/devices.h"
 +#include "hw/net/lan9118.h"
  #include "hw/i2c/i2c.h"
  #include "net/net.h"
  #include "sysemu/sysemu.h"
 diff --git a/hw/net/lan9118.c b/hw/net/lan9118.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/net/lan9118.c
 +++ b/hw/net/lan9118.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/sysbus.h"
  #include "net/net.h"
  #include "net/eth.h"
 -#include "hw/devices.h"
 +#include "hw/net/lan9118.h"
  #include "sysemu/sysemu.h"
  #include "hw/ptimer.h"
  #include "qemu/log.h"
 --
 .20.1

-[Qemu-devel] [PULL 29/42] target/arm: Enable FPU for Cortex-M4 and Cortex-M33
+[PULL 02/44] target/arm: Print MVE VPR in CPU dumps
-Enable the FPU by default for the Cortex-M4 and Cortex-M33.
+Include the MVE VPR register value in the CPU dumps produced by
 arm_cpu_dump_state() if we are printing FPU information. This
 makes it easier to interpret debug logs when predication is
 active.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-27-peter.maydell@linaro.org
 ---
- target/arm/cpu.c | 8 ++++++++
+ target/arm/cpu.c | 3 +++
-file changed, 8 insertions(+)
+file changed, 3 insertions(+)
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void cortex_m4_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_dump_state(CPUState *cs, FILE *f, int flags)
-     set_feature(&cpu->env, ARM_FEATURE_M);
+                          i, v);
-     set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
+         }
-     set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
+         qemu_fprintf(f, "FPSCR: %08x\n", vfp_get_fpscr(env));
-+    set_feature(&cpu->env, ARM_FEATURE_VFP4);
++        if (cpu_isar_feature(aa32_mve, cpu)) {
-     cpu->midr = 0x410fc240; /* r0p0 */
++            qemu_fprintf(f, "VPR: %08x\n", env->v7m.vpr);
-     cpu->pmsav7_dregion = 8;
++        }
-+    cpu->isar.mvfr0 = 0x10110021;
+     }
-+    cpu->isar.mvfr1 = 0x11000011;
+ }
-+    cpu->isar.mvfr2 = 0x00000000;
      cpu->id_pfr0 = 0x00000030;
      cpu->id_pfr1 = 0x00000200;
      cpu->id_dfr0 = 0x00100000;
@@ -XXX,XX +XXX,XX @@ static void cortex_m33_initfn(Object *obj)
      set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
      set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
      set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
 +    set_feature(&cpu->env, ARM_FEATURE_VFP4);
      cpu->midr = 0x410fd213; /* r0p3 */
      cpu->pmsav7_dregion = 16;
      cpu->sau_sregion = 8;
 +    cpu->isar.mvfr0 = 0x10110021;
 +    cpu->isar.mvfr1 = 0x11000011;
 +    cpu->isar.mvfr2 = 0x00000040;
      cpu->id_pfr0 = 0x00000030;
      cpu->id_pfr1 = 0x00000210;
      cpu->id_dfr0 = 0x00200000;
 --
 .20.1

-New patch
+[PULL 03/44] target/arm: Fix MVE VSLI by 0 and VSRI by <dt>
+In the MVE shift-and-insert insns, we special-case VSLI by 0
+and VSRI by <dt>. VSRI by <dt> means "don't update the destination",
+which is what we've implemented. However VSLI by 0 is "set
+destination to the input", so we don't want to use the same
+special-casing that we do for VSRI by <dt>.
+Since the generic logic gives the right answer for a shift
+by 0, just use that.
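The per-lane semantics behind this fix can be sketched in C for 8-bit lanes. This is a simplified model under the assumption of one byte-sized element at a time, not QEMU's DO_2SHIFT_INSERT macro, and the names vsli8() and vsri8() are hypothetical. VSLI by 0 falls out of the generic formula (nothing of the destination is kept), while VSRI by the element size is special-cased to leave the destination unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical VSLI model for one 8-bit lane; valid shifts are 0..7. */
static uint8_t vsli8(uint8_t d, uint8_t m, unsigned shift)
{
    assert(shift < 8);
    /* Keep the low 'shift' bits of d; for shift == 0 nothing is kept. */
    uint8_t keep = (uint8_t)((1u << shift) - 1);
    return (uint8_t)((m << shift) | (d & keep));
}

/* Hypothetical VSRI model for one 8-bit lane; valid shifts are 1..8. */
static uint8_t vsri8(uint8_t d, uint8_t m, unsigned shift)
{
    assert(shift >= 1 && shift <= 8);
    if (shift == 8) {
        /* Shift by <dt>: the destination is left unchanged. */
        return d;
    }
    /* Keep the high 'shift' bits of d. */
    uint8_t keep = (uint8_t)~(0xffu >> shift);
    return (uint8_t)((m >> shift) | (d & keep));
}
```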
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/mve_helper.c | 9 +++++----
+file changed, 5 insertions(+), 4 deletions(-)
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mve_helper.c
++++ b/target/arm/mve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
+         uint16_t mask;                                                  \
+         uint64_t shiftmask;                                             \
+         unsigned e;                                                     \
+-        if (shift == 0 || shift == ESIZE * 8) {                         \
++        if (shift == ESIZE * 8) {                                       \
+             /*                                                          \
+-             * Only VSLI can shift by 0; only VSRI can shift by <dt>.   \
+-             * The generic logic would give the right answer for 0 but  \
+-             * fails for <dt>.                                          \
++             * Only VSRI can shift by <dt>; it should mean "don't       \
++             * update the destination". The generic logic can't handle  \
++             * this because it would try to shift by an out-of-range    \
++             * amount, so special case it here.                         \
+              */                                                         \
+             goto done;                                                  \
+         }                                                               \
+--
+.20.1

-New patch
+[PULL 04/44] target/arm: Fix signed VADDV
+A cut-and-paste error meant we handled signed VADDV like
+unsigned VADDV; fix the type used.
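A small C sketch shows why the element type matters. It models VADDV over byte lanes with a 32-bit accumulator; the helper names are hypothetical and only mirror the idea of passing int8_t rather than uint8_t to DO_VADDV. With two lanes of 0xFF, the unsigned sum is 510 while the signed sum is -2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VADDV.U8 model: each byte is zero-extended. */
static uint32_t vaddv_u8(uint32_t acc, const uint8_t *lanes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        acc += lanes[i];
    }
    return acc;
}

/* Hypothetical VADDV.S8 model: each byte is sign-extended via int8_t. */
static uint32_t vaddv_s8(uint32_t acc, const uint8_t *lanes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        acc += (uint32_t)(int32_t)(int8_t)lanes[i];
    }
    return acc;
}
```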
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ target/arm/mve_helper.c | 6 +++---
+file changed, 3 insertions(+), 3 deletions(-)
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mve_helper.c
++++ b/target/arm/mve_helper.c
+@@ -XXX,XX +XXX,XX @@ DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
+         return ra;                                              \
+     }                                                           \
+-DO_VADDV(vaddvsb, 1, uint8_t)
+-DO_VADDV(vaddvsh, 2, uint16_t)
+-DO_VADDV(vaddvsw, 4, uint32_t)
++DO_VADDV(vaddvsb, 1, int8_t)
++DO_VADDV(vaddvsh, 2, int16_t)
++DO_VADDV(vaddvsw, 4, int32_t)
+ DO_VADDV(vaddvub, 1, uint8_t)
+ DO_VADDV(vaddvuh, 2, uint16_t)
+ DO_VADDV(vaddvuw, 4, uint32_t)
+--
+.20.1

-[Qemu-devel] [PULL 42/42] hw/devices: Move SMSC 91C111 declaration into a new header
+[PULL 05/44] target/arm: Fix mask handling for MVE narrowing operations
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+In the MVE helpers for the narrowing operations (DO_VSHRN and
 DO_VSHRN_SAT) we were using the wrong bits of the predicate mask for
 the 'top' versions of the insn.  This is because the loop works over
 the double-sized input elements and shifts the predicate mask by that
 many bits each time, but when we write out the half-sized output we
 must look at the mask bits for whichever half of the element we are
 writing to.
-This commit finally deletes "hw/devices.h".
+Correct this by shifting the whole mask right by ESIZE bits for the
 'top' insns.  This allows us also to simplify the saturation bit
 checking (where we had noticed that we needed to look at a different
 mask bit for the 'top' insn).
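The corrected predicate indexing can be sketched as follows. The predicate mask has one bit per byte of the 128-bit vector; an output element at index le * 2 + TOP starts at byte (le * 2 + TOP) * ESIZE, which is le * LESIZE + TOP * ESIZE, so pre-shifting the mask by ESIZE * TOP and then stepping by LESIZE lands on the right bit. The function narrow_written() is hypothetical: it only reports which output elements would be written, and it checks just the lowest predicate bit of each element, whereas QEMU's mergemask() consults every byte's bit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: return a bitmap of output element indices that a
 * narrowing op would write, for output element size esize (1 or 2 bytes)
 * and the bottom (top == 0) or top (top == 1) variant.
 */
static uint16_t narrow_written(uint16_t mask, unsigned esize, unsigned top)
{
    unsigned lesize = 2 * esize;   /* input elements are double width */
    uint16_t written = 0;

    assert((esize == 1 || esize == 2) && top <= 1);
    mask >>= esize * top;          /* select the half being written */
    for (unsigned le = 0; le < 16 / lesize; le++, mask >>= lesize) {
        if (mask & 1) {
            written |= (uint16_t)(1u << (le * 2 + top));
        }
    }
    return written;
}
```

With only the odd bytes predicated (0xAAAA) and byte-sized outputs, the 'top' variant writes every odd output element and the 'bottom' variant writes none; without the pre-shift, the 'top' variant would also have written none.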
-Reviewed-by: Markus Armbruster <armbru@redhat.com>
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20190412165416.7977-13-philmd@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/hw/devices.h       | 11 -----------
+ target/arm/mve_helper.c | 4 +++-
- include/hw/net/smc91c111.h | 19 +++++++++++++++++++
+file changed, 3 insertions(+), 1 deletion(-)
  hw/arm/gumstix.c           |  2 +-
  hw/arm/integratorcp.c      |  2 +-
  hw/arm/mainstone.c         |  2 +-
  hw/arm/realview.c          |  2 +-
  hw/arm/versatilepb.c       |  2 +-
  hw/net/smc91c111.c         |  2 +-
 files changed, 25 insertions(+), 17 deletions(-)
  delete mode 100644 include/hw/devices.h
  create mode 100644 include/hw/net/smc91c111.h
-diff --git a/include/hw/devices.h b/include/hw/devices.h
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 deleted file mode 100644
 index XXXXXXX..XXXXXXX
 --- a/include/hw/devices.h
 +++ /dev/null
@@ -XXX,XX +XXX,XX @@
 -#ifndef QEMU_DEVICES_H
 -#define QEMU_DEVICES_H
 -
 -/* Devices that have nowhere better to go.  */
 -
 -#include "hw/hw.h"
 -
 -/* smc91c111.c */
 -void smc91c111_init(NICInfo *, uint32_t, qemu_irq);
 -
 -#endif
 diff --git a/include/hw/net/smc91c111.h b/include/hw/net/smc91c111.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/net/smc91c111.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * SMSC 91C111 Ethernet interface emulation
 + *
 + * Copyright (c) 2005 CodeSourcery, LLC.
 + * Written by Paul Brook
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
 +#ifndef HW_NET_SMC91C111_H
 +#define HW_NET_SMC91C111_H
 +
 +#include "hw/irq.h"
 +#include "net/net.h"
 +
 +void smc91c111_init(NICInfo *, uint32_t, qemu_irq);
 +
 +#endif
 diff --git a/hw/arm/gumstix.c b/hw/arm/gumstix.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/gumstix.c
+--- a/target/arm/mve_helper.c
-+++ b/hw/arm/gumstix.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DO_VSHLL_ALL(vshllt, true)
- #include "hw/arm/pxa.h"
+         TYPE *d = vd;                                           \
- #include "net/net.h"
+         uint16_t mask = mve_element_mask(env);                  \
- #include "hw/block/flash.h"
+         unsigned le;                                            \
--#include "hw/devices.h"
++        mask >>= ESIZE * TOP;                                   \
-+#include "hw/net/smc91c111.h"
+         for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
- #include "hw/boards.h"
+             TYPE r = FN(m[H##LESIZE(le)], shift);               \
- #include "exec/address-spaces.h"
+             mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
- #include "sysemu/qtest.h"
+@@ -XXX,XX +XXX,XX @@ static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max,
-diff --git a/hw/arm/integratorcp.c b/hw/arm/integratorcp.c
+         uint16_t mask = mve_element_mask(env);                  \
-index XXXXXXX..XXXXXXX 100644
+         bool qc = false;                                        \
---- a/hw/arm/integratorcp.c
+         unsigned le;                                            \
-+++ b/hw/arm/integratorcp.c
++        mask >>= ESIZE * TOP;                                   \
-@@ -XXX,XX +XXX,XX @@
+         for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
- #include "qemu-common.h"
+             bool sat = false;                                   \
- #include "cpu.h"
+             TYPE r = FN(m[H##LESIZE(le)], shift, &sat);         \
- #include "hw/sysbus.h"
+             mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
--#include "hw/devices.h"
+-            qc |= sat && (mask & 1 << (TOP * ESIZE));           \
- #include "hw/boards.h"
++            qc |= sat & mask & 1;                               \
- #include "hw/arm/arm.h"
+         }                                                       \
- #include "hw/misc/arm_integrator_debug.h"
+         if (qc) {                                               \
-+#include "hw/net/smc91c111.h"
+             env->vfp.qc[0] = qc;                                \
  #include "net/net.h"
  #include "exec/address-spaces.h"
  #include "sysemu/sysemu.h"
 diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/mainstone.c
 +++ b/hw/arm/mainstone.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/arm/pxa.h"
  #include "hw/arm/arm.h"
  #include "net/net.h"
 -#include "hw/devices.h"
 +#include "hw/net/smc91c111.h"
  #include "hw/boards.h"
  #include "hw/block/flash.h"
  #include "hw/sysbus.h"
 diff --git a/hw/arm/realview.c b/hw/arm/realview.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/realview.c
 +++ b/hw/arm/realview.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/sysbus.h"
  #include "hw/arm/arm.h"
  #include "hw/arm/primecell.h"
 -#include "hw/devices.h"
  #include "hw/net/lan9118.h"
 +#include "hw/net/smc91c111.h"
  #include "hw/pci/pci.h"
  #include "net/net.h"
  #include "sysemu/sysemu.h"
 diff --git a/hw/arm/versatilepb.c b/hw/arm/versatilepb.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/versatilepb.c
 +++ b/hw/arm/versatilepb.c
@@ -XXX,XX +XXX,XX @@
  #include "cpu.h"
  #include "hw/sysbus.h"
  #include "hw/arm/arm.h"
 -#include "hw/devices.h"
 +#include "hw/net/smc91c111.h"
  #include "net/net.h"
  #include "sysemu/sysemu.h"
  #include "hw/pci/pci.h"
 diff --git a/hw/net/smc91c111.c b/hw/net/smc91c111.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/net/smc91c111.c
 +++ b/hw/net/smc91c111.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/osdep.h"
  #include "hw/sysbus.h"
  #include "net/net.h"
 -#include "hw/devices.h"
 +#include "hw/net/smc91c111.h"
  #include "qemu/log.h"
  /* For crc32 */
  #include <zlib.h>
 --
 .20.1

-[Qemu-devel] [PULL 20/42] target/arm: Overlap VECSTRIDE and XSCALE_CPAR TB flags
+[PULL 06/44] target/arm: Fix 48-bit saturating shifts
-We are close to running out of TB flags for AArch32; we could
+In do_sqrshl48_d() and do_uqrshl48_d() we got some of the edge
-start using the cs_base word, but before we do that we can
+cases wrong and failed to saturate correctly:
-economise on our usage by sharing the same bits for the VFP
-VECSTRIDE field and the XScale XSCALE_CPAR field. This
+(1) In do_sqrshl48_d() we used the same code that do_sqrshl_bhs()
-works because no XScale CPU ever had VFP.
+does to obtain the saturated most-negative and most-positive 48-bit
 signed values for the large-shift-left case.  This gives (1 << 47)
 for saturate-to-most-negative, but we weren't sign-extending this
 value to the 64-bit output as the pseudocode requires.
 (2) For left shifts by less than 48, we copied the "8/16 bit" code
 from do_sqrshl_bhs() and do_uqrshl_bhs().  This doesn't do the right
 thing because it assumes the C type we're working with is at least
 twice the number of bits we're saturating to (so that a shift left by
 bits-1 can't shift anything off the top of the value).  This isn't
 true for bits == 48, so we would incorrectly return 0 rather than the
 most-positive value for situations like "shift (1 << 44) right by
 ".  Instead check for saturation by doing the shift and signextend
 and then testing whether shifting back left again gives the original
 value.
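The saturation check described above can be sketched in C as follows. This is a simplified model, not the QEMU function itself: sext48() and sqshl48() are hypothetical names, only the left-shift-by-less-than-48 path is modelled, and the saturated results are the 48-bit most-positive value and the most-negative value sign-extended to 64 bits (the same values as MAKE_64BIT_MASK(0, 47) and MAKE_64BIT_MASK(47, 17) in the patch). It assumes the usual two's-complement conversions and arithmetic right shift of negative values, as GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 48 bits of x to 64 bits. */
static int64_t sext48(uint64_t x)
{
    return (int64_t)(x << 16) >> 16;
}

/* Hypothetical signed saturating left shift by 0..47, saturating at 48 bits. */
static int64_t sqshl48(int64_t src, unsigned shift, int *sat)
{
    assert(shift < 48);
    int64_t extval = sext48((uint64_t)src << shift);
    if (src == (extval >> shift)) {
        /* Shifting back recovers src, so no significant bits were lost. */
        return extval;
    }
    *sat = 1;
    /* Most-positive 48-bit value, or most-negative sign-extended to 64 bits. */
    return src >= 0 ? (INT64_C(1) << 47) - 1 : -(INT64_C(1) << 47);
}
```

For example, (1 << 44) shifted left by 2 fits in 48 signed bits and is returned unchanged, while a shift by 3 reaches bit 47 and saturates to the most-positive value.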
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-18-peter.maydell@linaro.org
 ---
- target/arm/cpu.h       | 10 ++++++----
+ target/arm/mve_helper.c | 12 +++++-------
- target/arm/cpu.c       |  7 +++++++
+file changed, 5 insertions(+), 7 deletions(-)
  target/arm/helper.c    |  6 +++++-
  target/arm/translate.c |  9 +++++++--
 files changed, 25 insertions(+), 7 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/mve_helper.c
-+++ b/target/arm/cpu.h
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
+@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
- FIELD(TBFLAG_A32, THUMB, 0, 1)
+         }
- FIELD(TBFLAG_A32, VECLEN, 1, 3)
+         return src >> -shift;
- FIELD(TBFLAG_A32, VECSTRIDE, 4, 2)
+     } else if (shift < 48) {
-+/*
+-        int64_t val = src << shift;
-+ * We store the bottom two bits of the CPAR as TB flags and handle
+-        int64_t extval = sextract64(val, 0, 48);
-+ * checks on the other bits at runtime. This shares the same bits as
+-        if (!sat || val == extval) {
-+ * VECSTRIDE, which is OK as no XScale CPU has VFP.
++        int64_t extval = sextract64(src << shift, 0, 48);
-+ */
++        if (!sat || src == (extval >> shift)) {
-+FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
+             return extval;
- /*
+         }
-  * Indicates whether cp register reads and writes by guest code should access
+     } else if (!sat || src == 0) {
-  * the secure or nonsecure bank of banked registers; note that this is not
+@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, NS, 6, 1)
  FIELD(TBFLAG_A32, VFPEN, 7, 1)
  FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
  FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
 -/* We store the bottom two bits of the CPAR as TB flags and handle
 - * checks on the other bits at runtime
 - */
 -FIELD(TBFLAG_A32, XSCALE_CPAR, 17, 2)
  /* For M profile only, Handler (ie not Thread) mode */
  FIELD(TBFLAG_A32, HANDLER, 21, 1)
  /* For M profile only, whether we should generate stack-limit checks */
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
          set_feature(env, ARM_FEATURE_THUMB_DSP);
      }
-+    /*
+     *sat = 1;
-+     * We rely on no XScale CPU having VFP so we can use the same bits in the
+-    return (1ULL << 47) - (src >= 0);
-+     * TB flags field for VECSTRIDE and XSCALE_CPAR.
++    return src >= 0 ? MAKE_64BIT_MASK(0, 47) : MAKE_64BIT_MASK(47, 17);
-+     */
+ }
-+    assert(!(arm_feature(env, ARM_FEATURE_VFP) &&
-+             arm_feature(env, ARM_FEATURE_XSCALE)));
+ /* Operate on 64-bit values, but saturate at 48 bits */
-+
+@@ -XXX,XX +XXX,XX @@ static inline uint64_t do_uqrshl48_d(uint64_t src, int64_t shift,
-     if (arm_feature(env, ARM_FEATURE_V7) &&
+             return extval;
          !arm_feature(env, ARM_FEATURE_M) &&
          !arm_feature(env, ARM_FEATURE_PMSA)) {
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
              || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
              flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
          }
--        flags = FIELD_DP32(flags, TBFLAG_A32, XSCALE_CPAR, env->cp15.c15_cpar);
+     } else if (shift < 48) {
-+        /* Note that XSCALE_CPAR shares bits with VECSTRIDE */
+-        uint64_t val = src << shift;
-+        if (arm_feature(env, ARM_FEATURE_XSCALE)) {
+-        uint64_t extval = extract64(val, 0, 48);
-+            flags = FIELD_DP32(flags, TBFLAG_A32,
+-        if (!sat || val == extval) {
-+                               XSCALE_CPAR, env->cp15.c15_cpar);
++        uint64_t extval = extract64(src << shift, 0, 48);
-+        }
++        if (!sat || src == (extval >> shift)) {
-     }
+             return extval;
+         }
-     flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX, arm_to_core_mmu_idx(mmu_idx));
+     } else if (!sat || src == 0) {
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
      dc->fp_excp_el = FIELD_EX32(tb_flags, TBFLAG_ANY, FPEXC_EL);
      dc->vfp_enabled = FIELD_EX32(tb_flags, TBFLAG_A32, VFPEN);
      dc->vec_len = FIELD_EX32(tb_flags, TBFLAG_A32, VECLEN);
 -    dc->vec_stride = FIELD_EX32(tb_flags, TBFLAG_A32, VECSTRIDE);
 -    dc->c15_cpar = FIELD_EX32(tb_flags, TBFLAG_A32, XSCALE_CPAR);
 +    if (arm_feature(env, ARM_FEATURE_XSCALE)) {
 +        dc->c15_cpar = FIELD_EX32(tb_flags, TBFLAG_A32, XSCALE_CPAR);
 +        dc->vec_stride = 0;
 +    } else {
 +        dc->vec_stride = FIELD_EX32(tb_flags, TBFLAG_A32, VECSTRIDE);
 +        dc->c15_cpar = 0;
 +    }
      dc->v7m_handler_mode = FIELD_EX32(tb_flags, TBFLAG_A32, HANDLER);
      dc->v8m_secure = arm_feature(env, ARM_FEATURE_M_SECURITY) &&
          regime_is_secure(env, dc->mmu_idx);
 --
 .20.1

-[Qemu-devel] [PULL 38/42] hw/devices: Move TI touchscreen declarations into a new header
+[PULL 07/44] target/arm: Fix MVE 48-bit SQRSHRL for small right shifts
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+We got an edge case wrong in the 48-bit SQRSHRL implementation: if
 the shift is to the right, although it always makes the result
 smaller than the input value it might not be within the 48-bit range
 the result is supposed to be if the input had some bits in [63..48]
 set and the shift didn't bring all of those within the [47..0] range.
-Since uWireSlave is only used in this new header, there is no
+Handle this similarly to the way we already do for this case in
-need to expose it via "qemu/typedefs.h".
+do_uqrshl48_d(): extend the calculated result from 48 bits,
 and return that if not saturating or if it doesn't change the
 result; otherwise fall through to return a saturated value.
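The edge case above can be illustrated with a small standalone model of the fixed right-shift path. This is a sketch, not QEMU's code. `sext_low()` is a hypothetical stand-in for QEMU's `sextract64(value, 0, len)`, and `shr_sat48()` is a hypothetical helper covering only right shifts of 1 to 47 bits. It assumes the usual arithmetic right shift of negative values, as GCC and Clang provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for sextract64(value, 0, len): sign-extend the low 'len' bits. */
static int64_t sext_low(int64_t value, unsigned len)
{
    return (int64_t)((uint64_t)value << (64 - len)) >> (64 - len);
}

/*
 * Hypothetical model of the fixed path: right-shift 'src' by n (1..47),
 * extend the result from 48 bits, and return it if we are not saturating
 * (sat == NULL) or if the extension did not change it; otherwise saturate
 * to the signed 48-bit range and flag saturation.
 */
static int64_t shr_sat48(int64_t src, unsigned n, bool *sat)
{
    int64_t val = src >> n;
    int64_t extval = sext_low(val, 48);

    if (!sat || val == extval) {
        return extval;
    }
    *sat = true;
    /* Largest positive / most negative signed 48-bit values */
    return src >= 0 ? (INT64_C(1) << 47) - 1 : -(INT64_C(1) << 47);
}
```

For example, shifting 1 << 50 right by one leaves bit 49 set. That is outside the 48-bit range, so the saturating form returns (1 << 47) - 1 instead of a value that is merely "smaller than the input".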
-Reviewed-by: Markus Armbruster <armbru@redhat.com>
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20190412165416.7977-9-philmd@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/hw/arm/omap.h      |  6 +-----
+ target/arm/mve_helper.c | 11 +++++++++--
- include/hw/devices.h       | 15 ---------------
+file changed, 9 insertions(+), 2 deletions(-)
  include/hw/input/tsc2xxx.h | 36 ++++++++++++++++++++++++++++++++++++
  include/qemu/typedefs.h    |  1 -
  hw/arm/nseries.c           |  2 +-
  hw/arm/palm.c              |  2 +-
  hw/input/tsc2005.c         |  2 +-
  hw/input/tsc210x.c         |  4 ++--
  MAINTAINERS                |  2 ++
 files changed, 44 insertions(+), 26 deletions(-)
  create mode 100644 include/hw/input/tsc2xxx.h
-diff --git a/include/hw/arm/omap.h b/include/hw/arm/omap.h
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/omap.h
+--- a/target/arm/mve_helper.c
-+++ b/include/hw/arm/omap.h
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqrshll)(CPUARMState *env, uint64_t n, uint32_t shift)
- #include "exec/memory.h"
+ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
- # define hw_omap_h        "omap.h"
+                                     bool round, uint32_t *sat)
- #include "hw/irq.h"
+ {
-+#include "hw/input/tsc2xxx.h"
++    int64_t val, extval;
  #include "target/arm/cpu-qom.h"
  #include "qemu/log.h"
@@ -XXX,XX +XXX,XX @@ qemu_irq *omap_mpuio_in_get(struct omap_mpuio_s *s);
  void omap_mpuio_out_set(struct omap_mpuio_s *s, int line, qemu_irq handler);
  void omap_mpuio_key(struct omap_mpuio_s *s, int row, int col, int down);
 -struct uWireSlave {
 -    uint16_t (*receive)(void *opaque);
 -    void (*send)(void *opaque, uint16_t data);
 -    void *opaque;
 -};
  struct omap_uwire_s;
  void omap_uwire_attach(struct omap_uwire_s *s,
                  uWireSlave *slave, int chipselect);
 diff --git a/include/hw/devices.h b/include/hw/devices.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/devices.h
 +++ b/include/hw/devices.h
@@ -XXX,XX +XXX,XX @@
  /* Devices that have nowhere better to go.  */
  #include "hw/hw.h"
 -#include "ui/console.h"
  /* smc91c111.c */
  void smc91c111_init(NICInfo *, uint32_t, qemu_irq);
@@ -XXX,XX +XXX,XX @@ void smc91c111_init(NICInfo *, uint32_t, qemu_irq);
  /* lan9118.c */
  void lan9118_init(NICInfo *, uint32_t, qemu_irq);
 -/* tsc210x.c */
 -uWireSlave *tsc2102_init(qemu_irq pint);
 -uWireSlave *tsc2301_init(qemu_irq penirq, qemu_irq kbirq, qemu_irq dav);
 -I2SCodec *tsc210x_codec(uWireSlave *chip);
 -uint32_t tsc210x_txrx(void *opaque, uint32_t value, int len);
 -void tsc210x_set_transform(uWireSlave *chip,
 -                MouseTransformInfo *info);
 -void tsc210x_key_event(uWireSlave *chip, int key, int down);
 -
 -/* tsc2005.c */
 -void *tsc2005_init(qemu_irq pintdav);
 -uint32_t tsc2005_txrx(void *opaque, uint32_t value, int len);
 -void tsc2005_set_transform(void *opaque, MouseTransformInfo *info);
 -
  #endif
 diff --git a/include/hw/input/tsc2xxx.h b/include/hw/input/tsc2xxx.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/input/tsc2xxx.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * TI touchscreen controller
 + *
 + * Copyright (c) 2006 Andrzej Zaborowski
 + * Copyright (C) 2008 Nokia Corporation
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
-+#ifndef HW_INPUT_TSC2XXX_H
+     if (shift <= -48) {
-+#define HW_INPUT_TSC2XXX_H
+         /* Rounding the sign bit always produces 0. */
-+
+         if (round) {
-+#include "hw/irq.h"
+@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
-+#include "ui/console.h"
+     } else if (shift < 0) {
-+
+         if (round) {
-+typedef struct uWireSlave {
+             src >>= -shift - 1;
-+    uint16_t (*receive)(void *opaque);
+-            return (src >> 1) + (src & 1);
-+    void (*send)(void *opaque, uint16_t data);
++            val = (src >> 1) + (src & 1);
-+    void *opaque;
++        } else {
-+} uWireSlave;
++            val = src >> -shift;
-+
++        }
-+/* tsc210x.c */
++        extval = sextract64(val, 0, 48);
-+uWireSlave *tsc2102_init(qemu_irq pint);
++        if (!sat || val == extval) {
-+uWireSlave *tsc2301_init(qemu_irq penirq, qemu_irq kbirq, qemu_irq dav);
++            return extval;
-+I2SCodec *tsc210x_codec(uWireSlave *chip);
+         }
-+uint32_t tsc210x_txrx(void *opaque, uint32_t value, int len);
+-        return src >> -shift;
-+void tsc210x_set_transform(uWireSlave *chip, MouseTransformInfo *info);
+     } else if (shift < 48) {
-+void tsc210x_key_event(uWireSlave *chip, int key, int down);
+         int64_t extval = sextract64(src << shift, 0, 48);
-+
+         if (!sat || src == (extval >> shift)) {
 +/* tsc2005.c */
 +void *tsc2005_init(qemu_irq pintdav);
 +uint32_t tsc2005_txrx(void *opaque, uint32_t value, int len);
 +void tsc2005_set_transform(void *opaque, MouseTransformInfo *info);
 +
 +#endif
 diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/qemu/typedefs.h
 +++ b/include/qemu/typedefs.h
@@ -XXX,XX +XXX,XX @@ typedef struct RAMBlock RAMBlock;
  typedef struct Range Range;
  typedef struct SHPCDevice SHPCDevice;
  typedef struct SSIBus SSIBus;
 -typedef struct uWireSlave uWireSlave;
  typedef struct VirtIODevice VirtIODevice;
  typedef struct Visitor Visitor;
  typedef void SaveStateHandler(QEMUFile *f, void *opaque);
 diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/nseries.c
 +++ b/hw/arm/nseries.c
@@ -XXX,XX +XXX,XX @@
  #include "ui/console.h"
  #include "hw/boards.h"
  #include "hw/i2c/i2c.h"
 -#include "hw/devices.h"
  #include "hw/display/blizzard.h"
 +#include "hw/input/tsc2xxx.h"
  #include "hw/misc/cbus.h"
  #include "hw/misc/tmp105.h"
  #include "hw/block/flash.h"
 diff --git a/hw/arm/palm.c b/hw/arm/palm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/palm.c
 +++ b/hw/arm/palm.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/arm/omap.h"
  #include "hw/boards.h"
  #include "hw/arm/arm.h"
 -#include "hw/devices.h"
 +#include "hw/input/tsc2xxx.h"
  #include "hw/loader.h"
  #include "exec/address-spaces.h"
  #include "cpu.h"
 diff --git a/hw/input/tsc2005.c b/hw/input/tsc2005.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/input/tsc2005.c
 +++ b/hw/input/tsc2005.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/hw.h"
  #include "qemu/timer.h"
  #include "ui/console.h"
 -#include "hw/devices.h"
 +#include "hw/input/tsc2xxx.h"
  #include "trace.h"
  #define TSC_CUT_RESOLUTION(value, p)    ((value) >> (16 - (p ? 12 : 10)))
 diff --git a/hw/input/tsc210x.c b/hw/input/tsc210x.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/input/tsc210x.c
 +++ b/hw/input/tsc210x.c
@@ -XXX,XX +XXX,XX @@
  #include "audio/audio.h"
  #include "qemu/timer.h"
  #include "ui/console.h"
 -#include "hw/arm/omap.h"    /* For I2SCodec and uWireSlave */
 -#include "hw/devices.h"
 +#include "hw/arm/omap.h"            /* For I2SCodec */
 +#include "hw/input/tsc2xxx.h"
  #define TSC_DATA_REGISTERS_PAGE        0x0
  #define TSC_CONTROL_REGISTERS_PAGE    0x1
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: hw/input/tsc2005.c
  F: hw/misc/cbus.c
  F: hw/timer/twl92230.c
  F: include/hw/display/blizzard.h
 +F: include/hw/input/tsc2xxx.h
  F: include/hw/misc/cbus.h
  Palm
@@ -XXX,XX +XXX,XX @@ L: qemu-arm@nongnu.org
  S: Odd Fixes
  F: hw/arm/palm.c
  F: hw/input/tsc210x.c
 +F: include/hw/input/tsc2xxx.h
  Raspberry Pi
  M: Peter Maydell <peter.maydell@linaro.org>
 --
 .20.1

-[Qemu-devel] [PULL 37/42] hw/devices: Move Gamepad declarations into a new header
+[PULL 08/44] target/arm: Fix calculation of LTP mask when LR is 0
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+In mve_element_mask(), we calculate a mask for tail predication which
 should have a number of 1 bits based on the value of LR.  However,
 our MAKE_64BIT_MASK() macro has undefined behaviour when passed a
 zero length.  Special case this to give the all-zeroes mask we
 require.
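A minimal sketch of the special case follows. It uses a local macro modelled on QEMU's MAKE_64BIT_MASK(); the upstream definition may differ in detail. With a length of 0 it would shift a 64-bit value right by 64 bits, which C leaves undefined. `ltp_mask()` is a hypothetical helper that takes LR and LTPSIZE directly, and it assumes the caller has already ensured the mask length is at most 16.

```c
#include <assert.h>
#include <stdint.h>

/* Local model of QEMU's MAKE_64BIT_MASK(): undefined behaviour if length == 0 */
#define MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/*
 * Hypothetical helper: tail-predication byte mask for 'lr' remaining
 * elements of size (1 << ltpsize) bytes. A zero length is special-cased
 * to the all-zeroes mask instead of being passed to the macro.
 */
static uint16_t ltp_mask(unsigned lr, unsigned ltpsize)
{
    unsigned masklen = lr << ltpsize;

    assert(masklen <= 16);
    return masklen ? (uint16_t)MAKE_64BIT_MASK(0, masklen) : 0;
}
```

For example, three remaining 32-bit elements (ltpsize 2) give a 12-bit mask of 0x0fff, while LR == 0 now yields 0 without evaluating the shift by 64.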
-Reviewed-by: Markus Armbruster <armbru@redhat.com>
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20190412165416.7977-8-philmd@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/hw/devices.h       |  3 ---
+ target/arm/mve_helper.c | 3 ++-
- include/hw/input/gamepad.h | 19 +++++++++++++++++++
+file changed, 2 insertions(+), 1 deletion(-)
  hw/arm/stellaris.c         |  2 +-
  hw/input/stellaris_input.c |  2 +-
  MAINTAINERS                |  1 +
 files changed, 22 insertions(+), 5 deletions(-)
  create mode 100644 include/hw/input/gamepad.h
-diff --git a/include/hw/devices.h b/include/hw/devices.h
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/devices.h
+--- a/target/arm/mve_helper.c
-+++ b/include/hw/devices.h
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ void *tsc2005_init(qemu_irq pintdav);
+@@ -XXX,XX +XXX,XX @@ static uint16_t mve_element_mask(CPUARMState *env)
- uint32_t tsc2005_txrx(void *opaque, uint32_t value, int len);
+          */
- void tsc2005_set_transform(void *opaque, MouseTransformInfo *info);
+         int masklen = env->regs[14] << env->v7m.ltpsize;
+         assert(masklen <= 16);
--/* stellaris_input.c */
+-        mask &= MAKE_64BIT_MASK(0, masklen);
--void stellaris_gamepad_init(int n, qemu_irq *irq, const int *keycode);
++        uint16_t ltpmask = masklen ? MAKE_64BIT_MASK(0, masklen) : 0;
--
++        mask &= ltpmask;
- #endif
+     }
-diff --git a/include/hw/input/gamepad.h b/include/hw/input/gamepad.h
-new file mode 100644
+     if ((env->condexec_bits & 0xf) == 0) {
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/input/gamepad.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * Gamepad style buttons connected to IRQ/GPIO lines
 + *
 + * Copyright (c) 2007 CodeSourcery.
 + * Written by Paul Brook
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
 +#ifndef HW_INPUT_GAMEPAD_H
 +#define HW_INPUT_GAMEPAD_H
 +
 +#include "hw/irq.h"
 +
 +/* stellaris_input.c */
 +void stellaris_gamepad_init(int n, qemu_irq *irq, const int *keycode);
 +
 +#endif
 diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/stellaris.c
 +++ b/hw/arm/stellaris.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/sysbus.h"
  #include "hw/ssi/ssi.h"
  #include "hw/arm/arm.h"
 -#include "hw/devices.h"
  #include "qemu/timer.h"
  #include "hw/i2c/i2c.h"
  #include "net/net.h"
@@ -XXX,XX +XXX,XX @@
  #include "sysemu/sysemu.h"
  #include "hw/arm/armv7m.h"
  #include "hw/char/pl011.h"
 +#include "hw/input/gamepad.h"
  #include "hw/watchdog/cmsdk-apb-watchdog.h"
  #include "hw/misc/unimp.h"
  #include "cpu.h"
 diff --git a/hw/input/stellaris_input.c b/hw/input/stellaris_input.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/input/stellaris_input.c
 +++ b/hw/input/stellaris_input.c
@@ -XXX,XX +XXX,XX @@
   */
  #include "qemu/osdep.h"
  #include "hw/hw.h"
 -#include "hw/devices.h"
 +#include "hw/input/gamepad.h"
  #include "ui/console.h"
  typedef struct {
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ M: Peter Maydell <peter.maydell@linaro.org>
  L: qemu-arm@nongnu.org
  S: Maintained
  F: hw/*/stellaris*
 +F: include/hw/input/gamepad.h
  Versatile Express
  M: Peter Maydell <peter.maydell@linaro.org>
 --
 .20.1

-[Qemu-devel] [PULL 08/42] target/arm: Honour M-profile FP enable bits
+[PULL 09/44] target/arm: Factor out mve_eci_mask()
-Like AArch64, M-profile floating point has no FPEXC enable
+In some situations we need a mask telling us which parts of the
-bit to gate floating point; so always set the VFPEN TB flag.
+vector correspond to beats that are not being executed because of
+ECI, separately from the combined "which bytes are predicated away"
-M-profile also has CPACR and NSACR similar to A-profile;
+mask.  Factor this mask calculation out of mve_element_mask() into
-they behave slightly differently:
+its own function.
  * the CPACR is banked between Secure and Non-Secure
  * if the NSACR forces a trap then this is taken to
    the Secure state, not the Non-Secure state
 Honour the CPACR and NSACR settings. The NSACR handling
 requires us to borrow the exception.target_el field
 (usually meaningless for M profile) to distinguish the
 NOCP UsageFault taken to Secure state from the more
 usual fault taken to the current security state.
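The factored-out ECI mask can be modelled on its own as a function of the condexec bits. This is a sketch under assumptions. The ECI_* numeric values are assumed here to match QEMU's (with 3 reserved), and `eci_mask()` is a hypothetical standalone version of mve_eci_mask() that takes the raw bits rather than the CPU state. Each 4-byte beat maps to 4 bits of the 16-bit per-byte mask.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings of the ECI field (condexec_bits[7:4]); 3 is reserved */
enum { ECI_NONE = 0, ECI_A0 = 1, ECI_A0A1 = 2, ECI_A0A1A2 = 4, ECI_A0A1A2B0 = 5 };

/*
 * Hypothetical model of mve_eci_mask(): 1 bits for bytes in beats that
 * will be executed, 0 bits for beats ECI says were already executed.
 */
static uint16_t eci_mask(uint8_t condexec_bits)
{
    if ((condexec_bits & 0xf) != 0) {
        return 0xffff;              /* low bits hold IT state, not ECI */
    }
    switch (condexec_bits >> 4) {
    case ECI_NONE:
        return 0xffff;              /* no beats done yet */
    case ECI_A0:
        return 0xfff0;              /* beat 0 done */
    case ECI_A0A1:
        return 0xff00;              /* beats 0 and 1 done */
    case ECI_A0A1A2:
    case ECI_A0A1A2B0:
        return 0xf000;              /* only beat 3 of this insn remains */
    default:
        assert(!"reserved ECI value");
        return 0xffff;
    }
}
```

Keeping this separate from the predication mask lets callers tell "beat skipped because of ECI" apart from "beat executed but predicated out".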
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-6-peter.maydell@linaro.org
 ---
- target/arm/helper.c    | 55 +++++++++++++++++++++++++++++++++++++++---
+ target/arm/mve_helper.c | 58 ++++++++++++++++++++++++-----------------
- target/arm/translate.c | 10 ++++++--
+file changed, 34 insertions(+), 24 deletions(-)
 files changed, 60 insertions(+), 5 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/mve_helper.c
-+++ b/target/arm/helper.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ uint32_t arm_phys_excp_target_el(CPUState *cs, uint32_t excp_idx,
+@@ -XXX,XX +XXX,XX @@
-     return target_el;
+ #include "exec/exec-all.h"
- }
+ #include "tcg/tcg.h"
-+/*
++static uint16_t mve_eci_mask(CPUARMState *env)
 + * Return true if the v7M CPACR permits access to the FPU for the specified
 + * security state and privilege level.
 + */
 +static bool v7m_cpacr_pass(CPUARMState *env, bool is_secure, bool is_priv)
 +{
-+    switch (extract32(env->v7m.cpacr[is_secure], 20, 2)) {
++    /*
-+    case 0:
++     * Return the mask of which elements in the MVE vector correspond
-+    case 2: /* UNPREDICTABLE: we treat like 0 */
++     * to beats being executed. The mask has 1 bits for executed lanes
-+        return false;
++     * and 0 bits where ECI says this beat was already executed.
-+    case 1:
++     */
-+        return is_priv;
++    int eci;
-+    case 3:
++
-+        return true;
++    if ((env->condexec_bits & 0xf) != 0) {
 +        return 0xffff;
 +    }
 +
 +    eci = env->condexec_bits >> 4;
 +    switch (eci) {
 +    case ECI_NONE:
 +        return 0xffff;
 +    case ECI_A0:
 +        return 0xfff0;
 +    case ECI_A0A1:
 +        return 0xff00;
 +    case ECI_A0A1A2:
 +    case ECI_A0A1A2B0:
 +        return 0xf000;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
- static bool v7m_stack_write(ARMCPU *cpu, uint32_t addr, uint32_t value,
+ static uint16_t mve_element_mask(CPUARMState *env)
                              ARMMMUIdx mmu_idx, bool ignfault)
  {
-@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
+     /*
-         env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_UNDEFINSTR_MASK;
+@@ -XXX,XX +XXX,XX @@ static uint16_t mve_element_mask(CPUARMState *env)
-         break;
+         mask &= ltpmask;
      case EXCP_NOCP:
 -        armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
 -        env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_NOCP_MASK;
 +    {
 +        /*
 +         * NOCP might be directed to something other than the current
 +         * security state if this fault is because of NSACR; we indicate
 +         * the target security state using exception.target_el.
 +         */
 +        int target_secstate;
 +
 +        if (env->exception.target_el == 3) {
 +            target_secstate = M_REG_S;
 +        } else {
 +            target_secstate = env->v7m.secure;
 +        }
 +        armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, target_secstate);
 +        env->v7m.cfsr[target_secstate] |= R_V7M_CFSR_NOCP_MASK;
          break;
 +    }
      case EXCP_INVSTATE:
          armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
          env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_INVSTATE_MASK;
@@ -XXX,XX +XXX,XX @@ int fp_exception_el(CPUARMState *env, int cur_el)
          return 0;
      }
-+    if (arm_feature(env, ARM_FEATURE_M)) {
+-    if ((env->condexec_bits & 0xf) == 0) {
-+        /* CPACR can cause a NOCP UsageFault taken to current security state */
+-        /*
-+        if (!v7m_cpacr_pass(env, env->v7m.secure, cur_el != 0)) {
+-         * ECI bits indicate which beats are already executed;
-+            return 1;
+-         * we handle this by effectively predicating them out.
-+        }
+-         */
-+
+-        int eci = env->condexec_bits >> 4;
-+        if (arm_feature(env, ARM_FEATURE_M_SECURITY) && !env->v7m.secure) {
+-        switch (eci) {
-+            if (!extract32(env->v7m.nsacr, 10, 1)) {
+-        case ECI_NONE:
-+                /* FP insns cause a NOCP UsageFault taken to Secure */
+-            break;
-+                return 3;
+-        case ECI_A0:
-+            }
+-            mask &= 0xfff0;
-+        }
+-            break;
-+
+-        case ECI_A0A1:
-+        return 0;
+-            mask &= 0xff00;
-+    }
+-            break;
-+
+-        case ECI_A0A1A2:
-     /* The CPACR controls traps to EL1, or PL1 if we're 32 bit:
+-        case ECI_A0A1A2B0:
-      * 0, 2 : trap EL0 and EL1/PL1 accesses
+-            mask &= 0xf000;
-      * 1    : trap only EL0 accesses
+-            break;
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
+-        default:
-         flags = FIELD_DP32(flags, TBFLAG_A32, SCTLR_B, arm_sctlr_b(env));
+-            g_assert_not_reached();
-         flags = FIELD_DP32(flags, TBFLAG_A32, NS, !access_secure_reg(env));
+-        }
-         if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)
+-    }
--            || arm_el_is_aa64(env, 1)) {
+-
-+            || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
++    /*
-             flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
++     * ECI bits indicate which beats are already executed;
-         }
++     * we handle this by effectively predicating them out.
-         flags = FIELD_DP32(flags, TBFLAG_A32, XSCALE_CPAR, env->cp15.c15_cpar);
++     */
-diff --git a/target/arm/translate.c b/target/arm/translate.c
++    mask &= mve_eci_mask(env);
-index XXXXXXX..XXXXXXX 100644
+     return mask;
---- a/target/arm/translate.c
+ }
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
       * for attempts to execute invalid vfp/neon encodings with FP disabled.
       */
      if (s->fp_excp_el) {
 -        gen_exception_insn(s, 4, EXCP_UDEF,
 -                           syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
 +        if (arm_dc_feature(s, ARM_FEATURE_M)) {
 +            gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
 +                               s->fp_excp_el);
 +        } else {
 +            gen_exception_insn(s, 4, EXCP_UDEF,
 +                               syn_fp_access_trap(1, 0xe, false),
 +                               s->fp_excp_el);
 +        }
          return 0;
      }
 --
 .20.1

-[Qemu-devel] [PULL 12/42] target/arm/helper: don't return early for STKOF faults during stacking
+[PULL 10/44] target/arm: Fix VPT advance when ECI is non-zero
-Currently the code in v7m_push_stack() which detects a violation
+We were not paying attention to the ECI state when advancing the VPT
-of the v8M stack limit simply returns early if it does so. This
+state.  Architecturally, VPT state advance happens for every beat
-is OK for the current integer-only code, but won't work for the
+(see the pseudocode VPTAdvance()), so on every beat the 4 bits of
-floating point handling we're about to add. We need to continue
+VPR.P0 corresponding to the current beat are inverted if required,
-executing the rest of the function so that we check for other
+and at the end of beats 1 and 3 the VPR MASK fields are updated.
-exceptions like not having permission to use the FPU and so
+This means that if the ECI state says we should not be executing all
-that we correctly set the FPCCR state if we are doing lazy
+beats then we need to skip some of the updating of the VPR that we
-stacking. Refactor to avoid the early return.
+currently do in mve_advance_vpt().
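The ECI-aware VPR update described above can be sketched as a pure function on the VPR value. It assumes the VPR layout P0 in bits [15:0], MASK01 in [19:16] and MASK23 in [23:20]. The helpers `get_field()`, `set_field()` and `advance_vpr()` are hypothetical stand-ins for QEMU's FIELD_EX32/FIELD_DP32 usage. Writing a shifted mask keeps only its low 4 bits, as a 4-bit field deposit would.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-bit field accessors for an assumed VPR layout */
static unsigned get_field(uint32_t v, unsigned shift)
{
    return (v >> shift) & 0xf;
}

static uint32_t set_field(uint32_t v, unsigned shift, unsigned f)
{
    return (v & ~(UINT32_C(0xf) << shift)) | ((uint32_t)(f & 0xf) << shift);
}

/* Advance VPR.P0/MASK01/MASK23, skipping updates for beats ECI says were done */
static uint32_t advance_vpr(uint32_t vpr, uint16_t eci_mask)
{
    unsigned mask01 = get_field(vpr, 16);
    unsigned mask23 = get_field(vpr, 20);
    uint16_t inv_mask = eci_mask;   /* only invert P0 bits for executed beats */

    if (mask01 <= 8) {
        inv_mask &= ~0xff;          /* MASK01 says: don't invert low half */
    }
    if (mask23 <= 8) {
        inv_mask &= ~0xff00;        /* MASK23 says: don't invert high half */
    }
    vpr ^= inv_mask;
    if (eci_mask & 0xf0) {
        vpr = set_field(vpr, 16, mask01 << 1);  /* beat 1 executed */
    }
    vpr = set_field(vpr, 20, mask23 << 1);      /* beat 3 always executes */
    return vpr;
}
```

With MASK01 = 0b1100 and ECI saying beats 0 and 1 are already done (eci_mask 0xff00), neither P0 nor MASK01 changes. Without ECI, the low half of P0 is inverted and MASK01 shifts to 0b1000.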
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-10-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 23 ++++++++++++++++++-----
+ target/arm/mve_helper.c | 24 +++++++++++++++++-------
-file changed, 18 insertions(+), 5 deletions(-)
+file changed, 17 insertions(+), 7 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/mve_helper.c
-+++ b/target/arm/helper.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
+@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
-      * should ignore further stack faults trying to process
+     /* Advance the VPT and ECI state if necessary */
-      * that derived exception.)
+     uint32_t vpr = env->v7m.vpr;
-      */
+     unsigned mask01, mask23;
--    bool stacked_ok;
++    uint16_t inv_mask;
-+    bool stacked_ok = true, limitviol = false;
++    uint16_t eci_mask = mve_eci_mask(env);
-     CPUARMState *env = &cpu->env;
-     uint32_t xpsr = xpsr_read(env);
+     if ((env->condexec_bits & 0xf) == 0) {
-     uint32_t frameptr = env->regs[13];
+         env->condexec_bits = (env->condexec_bits == (ECI_A0A1A2B0 << 4)) ?
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
+@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
-             armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE,
+         return;
                                      env->v7m.secure);
              env->regs[13] = limit;
 -            return true;
 +            /*
 +             * We won't try to perform any further memory accesses but
 +             * we must continue through the following code to check for
 +             * permission faults during FPU state preservation, and we
 +             * must update FPCCR if lazy stacking is enabled.
 +             */
 +            limitviol = true;
 +            stacked_ok = false;
          }
      }
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
++    /* Invert P0 bits if needed, but only for beats we actually executed */
-      * (which may be taken in preference to the one we started with
+     mask01 = FIELD_EX32(vpr, V7M_VPR, MASK01);
-      * if it has higher priority).
+     mask23 = FIELD_EX32(vpr, V7M_VPR, MASK23);
-      */
+-    if (mask01 > 8) {
--    stacked_ok =
+-        /* high bit set, but not 0b1000: invert the relevant half of P0 */
-+    stacked_ok = stacked_ok &&
+-        vpr ^= 0xff;
-         v7m_stack_write(cpu, frameptr, env->regs[0], mmu_idx, false) &&
++    /* Start by assuming we invert all bits corresponding to executed beats */
-         v7m_stack_write(cpu, frameptr + 4, env->regs[1], mmu_idx, false) &&
++    inv_mask = eci_mask;
-         v7m_stack_write(cpu, frameptr + 8, env->regs[2], mmu_idx, false) &&
++    if (mask01 <= 8) {
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
++        /* MASK01 says don't invert low half of P0 */
-         v7m_stack_write(cpu, frameptr + 24, env->regs[15], mmu_idx, false) &&
++        inv_mask &= ~0xff;
-         v7m_stack_write(cpu, frameptr + 28, xpsr, mmu_idx, false);
+     }
+-    if (mask23 > 8) {
--    /* Update SP regardless of whether any of the stack accesses failed. */
+-        /* high bit set, but not 0b1000: invert the relevant half of P0 */
--    env->regs[13] = frameptr;
+-        vpr ^= 0xff00;
-+    /*
++    if (mask23 <= 8) {
-+     * If we broke a stack limit then SP was already updated earlier;
++        /* MASK23 says don't invert high half of P0 */
-+     * otherwise we update SP regardless of whether any of the stack
++        inv_mask &= ~0xff00;
-+     * accesses failed or we took some other kind of fault.
+     }
-+     */
+-    vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
-+    if (!limitviol) {
++    vpr ^= inv_mask;
-+        env->regs[13] = frameptr;
++    /* Only update MASK01 if beat 1 executed */
 +    if (eci_mask & 0xf0) {
 +        vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
 +    }
++    /* Beat 3 always executes, so update MASK23 */
-     return !stacked_ok;
+     vpr = FIELD_DP32(vpr, V7M_VPR, MASK23, mask23 << 1);
      env->v7m.vpr = vpr;
  }
 --
 .20.1

-[Qemu-devel] [PULL 32/42] hw/arm/nseries: Use TYPE_TMP105 instead of hardcoded string
+[PULL 11/44] target/arm: Fix VLDRB/H/W for predicated elements
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+For vector loads, predicated elements are zeroed, instead of
 retaining their previous values (as happens for most data
 processing operations). This means we need to distinguish
 "beat not executed due to ECI" (don't touch destination
 element) from "beat executed but predicated out" (zero
 destination element).
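The distinction between the two cases can be shown with a hypothetical model of a VLDRW-style load of four 32-bit elements. In the sketch, `mask` is the combined predicate mask and `eci_mask` is the per-byte mask of executed beats. `vldrw_model()` is an illustrative name, not a QEMU helper. For simplicity it reads from a plain array and ignores QEMU's H4() host-endian element indexing and fault handling.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: load 4 x 32-bit elements into d[].
 * - beat not executed (ECI): destination element left untouched
 * - beat executed but predicated out: destination element zeroed
 */
static void vldrw_model(uint32_t d[4], const uint32_t *mem,
                        uint16_t mask, uint16_t eci_mask)
{
    for (unsigned b = 0, e = 0; b < 16; b += 4, e++) {
        if (eci_mask & (1u << b)) {
            d[e] = (mask & (1u << b)) ? mem[e] : 0;
        }
    }
}
```

For example, with d = {9, 9, 9, 9}, ECI marking beats 0 and 1 as done (0xff00) and only element 3 active (0xf000), the result is {9, 9, 0, 4} for memory contents {1, 2, 3, 4}.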
-Suggested-by: Markus Armbruster <armbru@redhat.com>
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20190412165416.7977-3-philmd@redhat.com
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- hw/arm/nseries.c | 3 ++-
+ target/arm/mve_helper.c | 8 +++++---
-file changed, 2 insertions(+), 1 deletion(-)
+file changed, 5 insertions(+), 3 deletions(-)
-diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/nseries.c
+--- a/target/arm/mve_helper.c
-+++ b/hw/arm/nseries.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
- #include "hw/boards.h"
+     env->v7m.vpr = vpr;
  #include "hw/i2c/i2c.h"
  #include "hw/devices.h"
 +#include "hw/misc/tmp105.h"
  #include "hw/block/flash.h"
  #include "hw/hw.h"
  #include "hw/bt.h"
@@ -XXX,XX +XXX,XX @@ static void n8x0_i2c_setup(struct n800_s *s)
      qemu_register_powerdown_notifier(&n8x0_system_powerdown_notifier);
      /* Attach a TMP105 PM chip (A0 wired to ground) */
 -    dev = i2c_create_slave(i2c, "tmp105", N8X0_TMP105_ADDR);
 +    dev = i2c_create_slave(i2c, TYPE_TMP105, N8X0_TMP105_ADDR);
      qdev_connect_gpio_out(dev, 0, tmp_irq);
  }
+-
++/* For loads, predicated lanes are zeroed instead of keeping their old values */
+ #define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE)                         \
+     void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
+     {                                                                   \
+         TYPE *d = vd;                                                   \
+         uint16_t mask = mve_element_mask(env);                          \
++        uint16_t eci_mask = mve_eci_mask(env);                          \
+         unsigned b, e;                                                  \
+         /*                                                              \
+          * R_SXTM allows the dest reg to become UNKNOWN for abandoned   \
+@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
+          * then take an exception.                                      \
+          */                                                             \
+         for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                   \
+-            if (mask & (1 << b)) {                                      \
+-                d[H##ESIZE(e)] = cpu_##LDTYPE##_data_ra(env, addr, GETPC()); \
++            if (eci_mask & (1 << b)) {                                  \
++                d[H##ESIZE(e)] = (mask & (1 << b)) ?                    \
++                    cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0;     \
+             }                                                           \
+             addr += MSIZE;                                              \
+         }                                                               \
 --
 .20.1
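The DO_VLDR change above distinguishes two masks: lanes in beats that ECI says were already executed keep their old contents, while lanes that do execute but are predicated off are now zeroed rather than left unchanged. A minimal sketch of that rule follows, assuming a 16-byte Q register modelled as a little-endian byte array and a non-widening load (MSIZE == ESIZE). The function name and parameters are hypothetical and not part of QEMU; pred_mask and eci_mask stand in for the values mve_element_mask() and mve_eci_mask() return.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of a predicated, non-widening MVE VLDR.
 *   esize:     element size in bytes (1, 2 or 4)
 *   pred_mask: VPT predicate, one bit per byte of the Q register
 *   eci_mask:  one bit per byte for beats that actually execute
 * Like the DO_VLDR macro, it tests the mask bit of each element's
 * lowest byte.
 */
static void vldr_sketch(uint8_t qd[16], const uint8_t *mem, unsigned esize,
                        uint16_t pred_mask, uint16_t eci_mask)
{
    assert(esize == 1 || esize == 2 || esize == 4);
    for (unsigned b = 0; b < 16; b += esize) {
        if (!(eci_mask & (1u << b))) {
            continue;                       /* beat already done: keep old value */
        }
        if (pred_mask & (1u << b)) {
            memcpy(&qd[b], &mem[b], esize); /* active lane: load from memory */
        } else {
            memset(&qd[b], 0, esize);       /* predicated-off lane: zero it */
        }
    }
}
```

For example, with 32-bit elements, pred_mask = 0x00FF and eci_mask = 0xFFF0 (beat 0 already completed), lane 0 is untouched, lane 1 is loaded, and lanes 2 and 3 become zero.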

-[Qemu-devel] [PULL 17/42] target/arm: Allow for floating point in callee stack integrity check
+[PULL 12/44] target/arm: Implement MVE VMULL (polynomial)
-The magic value pushed onto the callee stack as an integrity
+Implement the MVE VMULL (polynomial) insn.  Unlike Neon, this comes
-check is different if floating point is present.
+in two flavours: 8x8->16 and 16x16->32.  Also unlike Neon, the
 inputs are in either the low or the high half of each double-width
 element.
 The assembler for this insn indicates the size with "P8" or "P16",
 encoded into bit 28 as size = 0 or 1. We choose to follow the
 same encoding as VQDMULL and decode this into a->size as MO_16
 or MO_32 indicating the size of the result elements. This then
 carries through to the helper function names where it then
 matches up with the existing pmull_h() which does an 8x8->16
 operation and a new pmull_w() which does the 16x16->32.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-15-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 22 +++++++++++++++++++---
+ target/arm/helper-mve.h    |  5 +++++
-file changed, 19 insertions(+), 3 deletions(-)
+ target/arm/vec_internal.h  | 11 +++++++++++
  target/arm/mve.decode      | 14 ++++++++++----
  target/arm/mve_helper.c    | 16 ++++++++++++++++
  target/arm/translate-mve.c | 28 ++++++++++++++++++++++++++++
  target/arm/vec_helper.c    | 14 +++++++++++++-
 files changed, 83 insertions(+), 5 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ load_fail:
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     return false;
+ DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
  DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vmullpbh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vmullpth, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vmullpbw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +DEF_HELPER_FLAGS_4(mve_vmullptw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 +
  DEF_HELPER_FLAGS_4(mve_vqdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
  DEF_HELPER_FLAGS_4(mve_vqdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
  DEF_HELPER_FLAGS_4(mve_vqdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_internal.h
 +++ b/target/arm/vec_internal.h
@@ -XXX,XX +XXX,XX @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, bool, uint32_t *);
  int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *);
  int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool);
 +/*
 + * 8 x 8 -> 16 vector polynomial multiply where the inputs are
 + * in the low 8 bits of each 16-bit element
 + */
 +uint64_t pmull_h(uint64_t op1, uint64_t op2);
 +/*
 + * 16 x 16 -> 32 vector polynomial multiply where the inputs are
 + * in the low 16 bits of each 32-bit element
 + */
 +uint64_t pmull_w(uint64_t op1, uint64_t op2);
 +
  #endif /* TARGET_ARM_VEC_INTERNALS_H */
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VHADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
  VHSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
  VHSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
 -VMULL_BS         111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
 -VMULL_BU         111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
 -VMULL_TS         111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
 -VMULL_TU         111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
 +{
 +  VMULLP_B       111 . 1110 0 . 11 ... 1 ... 0 1110 . 0 . 0 ... 0 @2op_sz28
 +  VMULL_BS       111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
 +  VMULL_BU       111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
 +}
 +{
 +  VMULLP_T       111 . 1110 0 . 11 ... 1 ... 1 1110 . 0 . 0 ... 0 @2op_sz28
 +  VMULL_TS       111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
 +  VMULL_TU       111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
 +}
  VQDMULH          1110 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
  VQRDMULH         1111 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_L(vmulltub, 1, 1, uint8_t, 2, uint16_t, DO_MUL)
  DO_2OP_L(vmulltuh, 1, 2, uint16_t, 4, uint32_t, DO_MUL)
  DO_2OP_L(vmulltuw, 1, 4, uint32_t, 8, uint64_t, DO_MUL)
 +/*
 + * Polynomial multiply. We can always do this by generating 64 bits
 + * of the result at a time, so we don't need to use DO_2OP_L.
 + */
 +#define VMULLPH_MASK 0x00ff00ff00ff00ffULL
 +#define VMULLPW_MASK 0x0000ffff0000ffffULL
 +#define DO_VMULLPBH(N, M) pmull_h((N) & VMULLPH_MASK, (M) & VMULLPH_MASK)
 +#define DO_VMULLPTH(N, M) DO_VMULLPBH((N) >> 8, (M) >> 8)
 +#define DO_VMULLPBW(N, M) pmull_w((N) & VMULLPW_MASK, (M) & VMULLPW_MASK)
 +#define DO_VMULLPTW(N, M) DO_VMULLPBW((N) >> 16, (M) >> 16)
 +
 +DO_2OP(vmullpbh, 8, uint64_t, DO_VMULLPBH)
 +DO_2OP(vmullpth, 8, uint64_t, DO_VMULLPTH)
 +DO_2OP(vmullpbw, 8, uint64_t, DO_VMULLPBW)
 +DO_2OP(vmullptw, 8, uint64_t, DO_VMULLPTW)
 +
  /*
   * Because the computation type is at least twice as large as required,
   * these work for both signed and unsigned source types.
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMULLT(DisasContext *s, arg_2op *a)
      return do_2op(s, a, fns[a->size]);
  }
-+static uint32_t v7m_integrity_sig(CPUARMState *env, uint32_t lr)
++static bool trans_VMULLP_B(DisasContext *s, arg_2op *a)
 +{
 +    /*
-+     * Return the integrity signature value for the callee-saves
++     * Note that a->size indicates the output size, ie VMULL.P8
-+     * stack frame section. @lr is the exception return payload/LR value
++     * is the 8x8->16 operation and a->size is MO_16; VMULL.P16
-+     * whose FType bit forms bit 0 of the signature if FP is present.
++     * is the 16x16->32 operation and a->size is MO_32.
 +     */
-+    uint32_t sig = 0xfefa125a;
++    static MVEGenTwoOpFn * const fns[] = {
-+
++        NULL,
-+    if (!arm_feature(env, ARM_FEATURE_VFP) || (lr & R_V7M_EXCRET_FTYPE_MASK)) {
++        gen_helper_mve_vmullpbh,
-+        sig |= 1;
++        gen_helper_mve_vmullpbw,
-+    }
++        NULL,
-+    return sig;
++    };
 +    return do_2op(s, a, fns[a->size]);
 +}
 +
- static bool v7m_push_callee_stack(ARMCPU *cpu, uint32_t lr, bool dotailchain,
++static bool trans_VMULLP_T(DisasContext *s, arg_2op *a)
-                                   bool ignore_faults)
++{
 +    /* a->size is as for trans_VMULLP_B */
 +    static MVEGenTwoOpFn * const fns[] = {
 +        NULL,
 +        gen_helper_mve_vmullpth,
 +        gen_helper_mve_vmullptw,
 +        NULL,
 +    };
 +    return do_2op(s, a, fns[a->size]);
 +}
 +
  /*
   * VADC and VSBC: these perform an add-with-carry or subtract-with-carry
   * of the 32-bit elements in each lane of the input vectors, where the
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t expand_byte_to_half(uint64_t x)
           | ((x & 0xff000000) << 24);
  }
 -static uint64_t pmull_h(uint64_t op1, uint64_t op2)
 +uint64_t pmull_w(uint64_t op1, uint64_t op2)
  {
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_callee_stack(ARMCPU *cpu, uint32_t lr, bool dotailchain,
+     uint64_t result = 0;
-     bool stacked_ok;
+     int i;
-     uint32_t limit;
++    for (i = 0; i < 16; ++i) {
-     bool want_psp;
++        uint64_t mask = (op1 & 0x0000000100000001ull) * 0xffffffff;
-+    uint32_t sig;
++        result ^= op2 & mask;
++        op1 >>= 1;
-     if (dotailchain) {
++        op2 <<= 1;
-         bool mode = lr & R_V7M_EXCRET_MODE_MASK;
++    }
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_callee_stack(ARMCPU *cpu, uint32_t lr, bool dotailchain,
++    return result;
-     /* Write as much of the stack frame as we can. A write failure may
++}
-      * cause us to pend a derived exception.
-      */
++uint64_t pmull_h(uint64_t op1, uint64_t op2)
-+    sig = v7m_integrity_sig(env, lr);
++{
-     stacked_ok =
++    uint64_t result = 0;
--        v7m_stack_write(cpu, frameptr, 0xfefa125b, mmu_idx, ignore_faults) &&
++    int i;
-+        v7m_stack_write(cpu, frameptr, sig, mmu_idx, ignore_faults) &&
+     for (i = 0; i < 8; ++i) {
-         v7m_stack_write(cpu, frameptr + 0x8, env->regs[4], mmu_idx,
+         uint64_t mask = (op1 & 0x0001000100010001ull) * 0xffff;
-                         ignore_faults) &&
+         result ^= op2 & mask;
          v7m_stack_write(cpu, frameptr + 0xc, env->regs[5], mmu_idx,
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
          if (return_to_secure &&
              ((excret & R_V7M_EXCRET_ES_MASK) == 0 ||
               (excret & R_V7M_EXCRET_DCRS_MASK) == 0)) {
 -            uint32_t expected_sig = 0xfefa125b;
              uint32_t actual_sig;
              pop_ok = v7m_stack_read(cpu, &actual_sig, frameptr, mmu_idx);
 -            if (pop_ok && expected_sig != actual_sig) {
 +            if (pop_ok && v7m_integrity_sig(env, excret) != actual_sig) {
                  /* Take a SecureFault on the current stack */
                  env->v7m.sfsr |= R_V7M_SFSR_INVIS_MASK;
                  armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SECURE, false);
 --
 .20.1
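The VMULL (polynomial) patch relies on pmull_w() doing a 16x16->32 carry-less multiply in two 32-bit lanes of a uint64_t, with the "bottom" and "top" variants choosing inputs by masking or by shifting right 16 before masking. The sketch below mirrors that loop and the DO_VMULLPBW/DO_VMULLPTW macros in standalone C; the *_sketch names are illustrative only and are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PW_MASK 0x0000ffff0000ffffULL  /* low 16 bits of each 32-bit lane */

/*
 * Carry-less (GF(2)) 16x16->32 multiply on two 32-bit lanes at once.
 * Inputs must already be confined to the low 16 bits of each lane.
 */
static uint64_t pmull_w_sketch(uint64_t op1, uint64_t op2)
{
    uint64_t result = 0;
    for (int i = 0; i < 16; ++i) {
        /* All-ones in each lane whose current bit of op1 is set */
        uint64_t mask = (op1 & 0x0000000100000001ull) * 0xffffffff;
        result ^= op2 & mask;   /* XOR rather than add: no carries */
        op1 >>= 1;
        op2 <<= 1;              /* at most 31 bits used: no lane overflow */
    }
    return result;
}

/* VMULLB.P16: use the low (bottom) half of each 32-bit lane */
static uint64_t vmullp_bottom_w(uint64_t n, uint64_t m)
{
    return pmull_w_sketch(n & SKETCH_PW_MASK, m & SKETCH_PW_MASK);
}

/* VMULLT.P16: shift the high (top) halves down, then proceed as above */
static uint64_t vmullp_top_w(uint64_t n, uint64_t m)
{
    return vmullp_bottom_w(n >> 16, m >> 16);
}
```

As a check, 3 x 3 in GF(2)[x] is (x + 1)^2 = x^2 + 1, so pmull_w_sketch(3, 3) yields 5 rather than 9.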

-[Qemu-devel] [PULL 19/42] target/arm: Move NS TBFLAG from bit 19 to bit 6
+[PULL 13/44] target/arm: Implement MVE incrementing/decrementing dup insns
-Move the NS TBFLAG down from bit 19 to bit 6, which has not
+Implement the MVE incrementing/decrementing dup insns VIDUP, VDDUP,
-been used since commit c1e3781090b9d36c60 in 2015, when we
+VIWDUP and VDWDUP.  These fill the elements of a vector with
-started passing the entire MMU index in the TB flags rather
+successively incrementing or decrementing values, starting at the offset specified in
-than just a 'privilege level' bit.
+a general purpose register.  The final value of the offset is written
+back to this register.  The wrapping variants take a second general
-This rearrangement is not strictly necessary, but means that
+purpose register which specifies the point where the count should
-we can put M-profile-only bits next to each other rather
+wrap back to 0.
-than scattered across the flag word.
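The non-wrapping VIDUP/VDDUP behaviour described in the new commit message can be sketched as follows. This is a hypothetical model, not QEMU code: it assumes byte elements, no predication and no ECI. It follows trans_VDDUP() in treating VDDUP as VIDUP with a negated immediate, and relies on uint32_t arithmetic wrapping modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of VIDUP.8 / VDDUP.8 with all lanes active.
 *   immbits:   the immh:imml field; the step is 1 << immbits (1, 2, 4, 8)
 *   decrement: nonzero for VDDUP
 * Returns the final offset, which the insn writes back to Rn.
 */
static uint32_t vidup_b_sketch(uint8_t d[16], uint32_t offset,
                               unsigned immbits, int decrement)
{
    assert(immbits < 4);
    uint32_t imm = 1u << immbits;
    if (decrement) {
        imm = -imm;             /* VDDUP: add the two's complement negation */
    }
    for (int e = 0; e < 16; e++) {
        d[e] = (uint8_t)offset; /* each lane gets the low 8 bits of the count */
        offset += imm;
    }
    return offset;
}
```

For instance, VIDUP.8 starting at 10 with step 2 fills the lanes with 10, 12, ..., 40 and writes back 42.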
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-17-peter.maydell@linaro.org
 ---
- target/arm/cpu.h | 11 ++++++-----
+ target/arm/helper-mve.h    |  12 ++++
-file changed, 6 insertions(+), 5 deletions(-)
+ target/arm/mve.decode      |  25 ++++++++
  target/arm/mve_helper.c    |  63 +++++++++++++++++++
  target/arm/translate-mve.c | 120 +++++++++++++++++++++++++++++++++++++
 files changed, 220 insertions(+)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/cpu.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
- FIELD(TBFLAG_A32, THUMB, 0, 1)
- FIELD(TBFLAG_A32, VECLEN, 1, 3)
+ DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
- FIELD(TBFLAG_A32, VECSTRIDE, 4, 2)
-+/*
++DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
-+ * Indicates whether cp register reads and writes by guest code should access
++DEF_HELPER_FLAGS_4(mve_viduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
-+ * the secure or nonsecure bank of banked registers; note that this is not
++DEF_HELPER_FLAGS_4(mve_vidupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
-+ * the same thing as the current security state of the processor!
++
-+ */
++DEF_HELPER_FLAGS_5(mve_viwdupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
-+FIELD(TBFLAG_A32, NS, 6, 1)
++DEF_HELPER_FLAGS_5(mve_viwduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
- FIELD(TBFLAG_A32, VFPEN, 7, 1)
++DEF_HELPER_FLAGS_5(mve_viwdupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
- FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
++
- FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
++DEF_HELPER_FLAGS_5(mve_vdwdupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
++DEF_HELPER_FLAGS_5(mve_vdwduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
-  * checks on the other bits at runtime
++DEF_HELPER_FLAGS_5(mve_vdwdupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
-  */
++
- FIELD(TBFLAG_A32, XSCALE_CPAR, 17, 2)
+ DEF_HELPER_FLAGS_3(mve_vclsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
--/* Indicates whether cp register reads and writes by guest code should access
+ DEF_HELPER_FLAGS_3(mve_vclsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-- * the secure or nonsecure bank of banked registers; note that this is not
+ DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-- * the same thing as the current security state of the processor!
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-- */
+index XXXXXXX..XXXXXXX 100644
--FIELD(TBFLAG_A32, NS, 19, 1)
+--- a/target/arm/mve.decode
- /* For M profile only, Handler (ie not Thread) mode */
++++ b/target/arm/mve.decode
- FIELD(TBFLAG_A32, HANDLER, 21, 1)
+@@ -XXX,XX +XXX,XX @@
- /* For M profile only, whether we should generate stack-limit checks */
+ &2scalar qd qn rm size
  &1imm qd imm cmode op
  &2shift qd qm shift size
 +&vidup qd rn size imm
 +&viwdup qd rn rm size imm
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@ VDUP             1110 1110 1 1 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=0
  VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 1 1 0000 @vdup size=1
  VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
 +# Incrementing and decrementing dup
 +
 +# VIDUP, VDDUP format immediate: 1 << (immh:imml)
 +%imm_vidup 7:1 0:1 !function=vidup_imm
 +
 +# VIDUP, VDDUP registers: Rm bits [3:1] from insn, bit 0 is 1;
 +# Rn bits [3:1] from insn, bit 0 is 0
 +%vidup_rm 1:3 !function=times_2_plus_1
 +%vidup_rn 17:3 !function=times_2
 +
 +@vidup           .... .... . . size:2 .... .... .... .... .... \
 +                 qd=%qd imm=%imm_vidup rn=%vidup_rn &vidup
 +@viwdup          .... .... . . size:2 .... .... .... .... .... \
 +                 qd=%qd imm=%imm_vidup rm=%vidup_rm rn=%vidup_rn &viwdup
 +{
 +  VIDUP          1110 1110 0 . .. ... 1 ... 0 1111 . 110 111 . @vidup
 +  VIWDUP         1110 1110 0 . .. ... 1 ... 0 1111 . 110 ... . @viwdup
 +}
 +{
 +  VDDUP          1110 1110 0 . .. ... 1 ... 1 1111 . 110 111 . @vidup
 +  VDWDUP         1110 1110 0 . .. ... 1 ... 1 1111 . 110 ... . @viwdup
 +}
 +
  # multiply-add long dual accumulate
  # rdahi: bits [3:1] from insn, bit 0 is 1
  # rdalo: bits [3:1] from insn, bit 0 is 0
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_sqrshr)(CPUARMState *env, uint32_t n, uint32_t shift)
  {
      return do_sqrshl_bhs(n, -(int8_t)shift, 32, true, &env->QF);
  }
 +
 +#define DO_VIDUP(OP, ESIZE, TYPE, FN)                           \
 +    uint32_t HELPER(mve_##OP)(CPUARMState *env, void *vd,       \
 +                           uint32_t offset, uint32_t imm)       \
 +    {                                                           \
 +        TYPE *d = vd;                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
 +            mergemask(&d[H##ESIZE(e)], offset, mask);           \
 +            offset = FN(offset, imm);                           \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +        return offset;                                          \
 +    }
 +
 +#define DO_VIWDUP(OP, ESIZE, TYPE, FN)                          \
 +    uint32_t HELPER(mve_##OP)(CPUARMState *env, void *vd,       \
 +                              uint32_t offset, uint32_t wrap,   \
 +                              uint32_t imm)                     \
 +    {                                                           \
 +        TYPE *d = vd;                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
 +            mergemask(&d[H##ESIZE(e)], offset, mask);           \
 +            offset = FN(offset, wrap, imm);                     \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +        return offset;                                          \
 +    }
 +
 +#define DO_VIDUP_ALL(OP, FN)                    \
 +    DO_VIDUP(OP##b, 1, int8_t, FN)              \
 +    DO_VIDUP(OP##h, 2, int16_t, FN)             \
 +    DO_VIDUP(OP##w, 4, int32_t, FN)
 +
 +#define DO_VIWDUP_ALL(OP, FN)                   \
 +    DO_VIWDUP(OP##b, 1, int8_t, FN)             \
 +    DO_VIWDUP(OP##h, 2, int16_t, FN)            \
 +    DO_VIWDUP(OP##w, 4, int32_t, FN)
 +
 +static uint32_t do_add_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
 +{
 +    offset += imm;
 +    if (offset == wrap) {
 +        offset = 0;
 +    }
 +    return offset;
 +}
 +
 +static uint32_t do_sub_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
 +{
 +    if (offset == 0) {
 +        offset = wrap;
 +    }
 +    offset -= imm;
 +    return offset;
 +}
 +
 +DO_VIDUP_ALL(vidup, DO_ADD)
 +DO_VIWDUP_ALL(viwdup, do_add_wrap)
 +DO_VIWDUP_ALL(vdwdup, do_sub_wrap)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@
  #include "translate.h"
  #include "translate-a32.h"
 +static inline int vidup_imm(DisasContext *s, int x)
 +{
 +    return 1 << x;
 +}
 +
  /* Include the generated decoder */
  #include "decode-mve.c.inc"
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
  typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
 +typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
 +typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLC(DisasContext *s, arg_VSHLC *a)
      mve_update_eci(s);
      return true;
  }
 +
 +static bool do_vidup(DisasContext *s, arg_vidup *a, MVEGenVIDUPFn *fn)
 +{
 +    TCGv_ptr qd;
 +    TCGv_i32 rn;
 +
 +    /*
 +     * Vector increment/decrement and duplicate (VIDUP, VDDUP).
 +     * This fills the vector with elements of successively increasing
 +     * or decreasing values, starting from Rn.
 +     */
 +    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd)) {
 +        return false;
 +    }
 +    if (a->size == MO_64) {
 +        /* size 0b11 is another encoding */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    rn = load_reg(s, a->rn);
 +    fn(rn, cpu_env, qd, rn, tcg_constant_i32(a->imm));
 +    store_reg(s, a->rn, rn);
 +    tcg_temp_free_ptr(qd);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +static bool do_viwdup(DisasContext *s, arg_viwdup *a, MVEGenVIWDUPFn *fn)
 +{
 +    TCGv_ptr qd;
 +    TCGv_i32 rn, rm;
 +
 +    /*
 +     * Vector increment/decrement with wrap and duplicate (VIWDUP, VDWDUP).
 +     * This fills the vector with elements of successively increasing
 +     * or decreasing values, starting from Rn. Rm specifies a point where
 +     * the count wraps back around to 0. The updated offset is written back
 +     * to Rn.
 +     */
 +    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd)) {
 +        return false;
 +    }
 +    if (!fn || a->rm == 13 || a->rm == 15) {
 +        /*
 +         * size 0b11 is another encoding; Rm == 13 is UNPREDICTABLE;
 +         * Rm == 15 is VIDUP, VDDUP.
 +         */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    rn = load_reg(s, a->rn);
 +    rm = load_reg(s, a->rm);
 +    fn(rn, cpu_env, qd, rn, rm, tcg_constant_i32(a->imm));
 +    store_reg(s, a->rn, rn);
 +    tcg_temp_free_ptr(qd);
 +    tcg_temp_free_i32(rm);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +static bool trans_VIDUP(DisasContext *s, arg_vidup *a)
 +{
 +    static MVEGenVIDUPFn * const fns[] = {
 +        gen_helper_mve_vidupb,
 +        gen_helper_mve_viduph,
 +        gen_helper_mve_vidupw,
 +        NULL,
 +    };
 +    return do_vidup(s, a, fns[a->size]);
 +}
 +
 +static bool trans_VDDUP(DisasContext *s, arg_vidup *a)
 +{
 +    static MVEGenVIDUPFn * const fns[] = {
 +        gen_helper_mve_vidupb,
 +        gen_helper_mve_viduph,
 +        gen_helper_mve_vidupw,
 +        NULL,
 +    };
 +    /* VDDUP is just like VIDUP but with a negative immediate */
 +    a->imm = -a->imm;
 +    return do_vidup(s, a, fns[a->size]);
 +}
 +
 +static bool trans_VIWDUP(DisasContext *s, arg_viwdup *a)
 +{
 +    static MVEGenVIWDUPFn * const fns[] = {
 +        gen_helper_mve_viwdupb,
 +        gen_helper_mve_viwduph,
 +        gen_helper_mve_viwdupw,
 +        NULL,
 +    };
 +    return do_viwdup(s, a, fns[a->size]);
 +}
 +
 +static bool trans_VDWDUP(DisasContext *s, arg_viwdup *a)
 +{
 +    static MVEGenVIWDUPFn * const fns[] = {
 +        gen_helper_mve_vdwdupb,
 +        gen_helper_mve_vdwduph,
 +        gen_helper_mve_vdwdupw,
 +        NULL,
 +    };
 +    return do_viwdup(s, a, fns[a->size]);
 +}
 --
 .20.1
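The wrapping variants in this patch step the offset with do_add_wrap() and do_sub_wrap(), which compare only for equality with the wrap value. The sketch below reproduces those two functions and a byte-element fill loop. It also shows the register-field decode from mve.decode (Rn = field * 2, Rm = field * 2 + 1), which maps a 111 Rm field, the pattern used by VIDUP/VDDUP, to register 15. All *_sketch names are hypothetical, and the model ignores predication and ECI.

```c
#include <assert.h>
#include <stdint.h>

/* Same logic as do_add_wrap(): wrap to 0 only on exact equality */
static uint32_t add_wrap_sketch(uint32_t offset, uint32_t wrap, uint32_t imm)
{
    offset += imm;
    if (offset == wrap) {
        offset = 0;
    }
    return offset;
}

/* Same logic as do_sub_wrap(): restart from wrap when we reach 0 */
static uint32_t sub_wrap_sketch(uint32_t offset, uint32_t wrap, uint32_t imm)
{
    if (offset == 0) {
        offset = wrap;
    }
    return offset - imm;
}

/* Hypothetical VIWDUP.8 / VDWDUP.8 with all lanes active */
static uint32_t viwdup_b_sketch(uint8_t d[16], uint32_t offset, uint32_t wrap,
                                uint32_t imm, int decrement)
{
    assert(imm == 1 || imm == 2 || imm == 4 || imm == 8);
    for (int e = 0; e < 16; e++) {
        d[e] = (uint8_t)offset;
        offset = decrement ? sub_wrap_sketch(offset, wrap, imm)
                           : add_wrap_sketch(offset, wrap, imm);
    }
    return offset;              /* written back to Rn */
}

/* Decode helpers matching times_2 (Rn) and times_2_plus_1 (Rm) */
static int vidup_rn_sketch(int field) { return field * 2; }
static int vidup_rm_sketch(int field) { return field * 2 + 1; }
```

With offset 0, wrap 6 and step 2, VIWDUP produces 0, 2, 4, 0, 2, 4, ...; VDWDUP starting from 4 produces 4, 2, 0, 4, 2, 0, .... If the offset is not a multiple of the step, the equality test may never fire, which is why this sketch should not be read as a statement of the architectural rules.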

-[Qemu-devel] [PULL 31/42] hw/arm/aspeed: Use TYPE_TMP105/TYPE_PCA9552 instead of hardcoded string
+[PULL 14/44] target/arm: Factor out gen_vpst()
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+Factor out the "generate code to update VPR.MASK01/MASK23" part of
 trans_VPST(); we are going to want to reuse it for the VPT insns.
-Reviewed-by: Thomas Huth <thuth@redhat.com>
-Reviewed-by: Cédric Le Goater <clg@kaod.org>
-Reviewed-by: Markus Armbruster <armbru@redhat.com>
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20190412165416.7977-2-philmd@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- hw/arm/aspeed.c | 13 +++++++++----
+ target/arm/translate-mve.c | 31 +++++++++++++++++--------------
-file changed, 9 insertions(+), 4 deletions(-)
+file changed, 17 insertions(+), 14 deletions(-)
-diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/aspeed.c
+--- a/target/arm/translate-mve.c
-+++ b/hw/arm/aspeed.c
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
- #include "hw/arm/aspeed_soc.h"
+     return do_long_dual_acc(s, a, fns[a->x]);
  #include "hw/boards.h"
  #include "hw/i2c/smbus_eeprom.h"
 +#include "hw/misc/pca9552.h"
 +#include "hw/misc/tmp105.h"
  #include "qemu/log.h"
  #include "sysemu/block-backend.h"
  #include "hw/loader.h"
@@ -XXX,XX +XXX,XX @@ static void ast2500_evb_i2c_init(AspeedBoardState *bmc)
                            eeprom_buf);
      /* The AST2500 EVB expects a LM75 but a TMP105 is compatible */
 -    i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 7), "tmp105", 0x4d);
 +    i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 7),
 +                     TYPE_TMP105, 0x4d);
      /* The AST2500 EVB does not have an RTC. Let's pretend that one is
       * plugged on the I2C bus header */
@@ -XXX,XX +XXX,XX @@ static void witherspoon_bmc_i2c_init(AspeedBoardState *bmc)
      AspeedSoCState *soc = &bmc->soc;
      uint8_t *eeprom_buf = g_malloc0(8 * 1024);
 -    i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 3), "pca9552", 0x60);
 +    i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 3), TYPE_PCA9552,
 +                     0x60);
      i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 4), "tmp423", 0x4c);
      i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 5), "tmp423", 0x4c);
      /* The Witherspoon expects a TMP275 but a TMP105 is compatible */
 -    i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 9), "tmp105", 0x4a);
 +    i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 9), TYPE_TMP105,
 +                     0x4a);
      /* The witherspoon board expects Epson RX8900 I2C RTC but a ds1338 is
       * good enough */
@@ -XXX,XX +XXX,XX @@ static void witherspoon_bmc_i2c_init(AspeedBoardState *bmc)
      smbus_eeprom_init_one(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 11), 0x51,
                            eeprom_buf);
 -    i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 11), "pca9552",
 +    i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 11), TYPE_PCA9552,
 +                     0x60);
  }
+-static bool trans_VPST(DisasContext *s, arg_VPST *a)
++static void gen_vpst(DisasContext *s, uint32_t mask)
+ {
+-    TCGv_i32 vpr;
+-
+-    /* mask == 0 is a "related encoding" */
+-    if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
+-        return false;
+-    }
+-    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+-        return true;
+-    }
+     /*
+      * Set the VPR mask fields. We take advantage of MASK01 and MASK23
+      * being adjacent fields in the register.
+      *
+-     * This insn is not predicated, but it is subject to beat-wise
++     * Updating the masks is not predicated, but it is subject to beat-wise
+      * execution, and the mask is updated on the odd-numbered beats.
+      * So if PSR.ECI says we should skip beat 1, we mustn't update the
+      * 01 mask field.
+      */
+-    vpr = load_cpu_field(v7m.vpr);
++    TCGv_i32 vpr = load_cpu_field(v7m.vpr);
+     switch (s->eci) {
+     case ECI_NONE:
+     case ECI_A0:
+         /* Update both 01 and 23 fields */
+         tcg_gen_deposit_i32(vpr, vpr,
+-                            tcg_constant_i32(a->mask | (a->mask << 4)),
++                            tcg_constant_i32(mask | (mask << 4)),
+                             R_V7M_VPR_MASK01_SHIFT,
+                             R_V7M_VPR_MASK01_LENGTH + R_V7M_VPR_MASK23_LENGTH);
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
+     case ECI_A0A1A2B0:
+         /* Update only the 23 mask field */
+         tcg_gen_deposit_i32(vpr, vpr,
+-                            tcg_constant_i32(a->mask),
++                            tcg_constant_i32(mask),
+                             R_V7M_VPR_MASK23_SHIFT, R_V7M_VPR_MASK23_LENGTH);
+         break;
+     default:
+         g_assert_not_reached();
+     }
+     store_cpu_field(vpr, v7m.vpr);
++}
++
++static bool trans_VPST(DisasContext *s, arg_VPST *a)
++{
++    /* mask == 0 is a "related encoding" */
++    if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
++        return false;
++    }
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
++        return true;
++    }
++    gen_vpst(s, a->mask);
+     mve_update_and_store_eci(s);
+     return true;
+ }
 --
 .20.1
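The factored-out gen_vpst() deposits the 4-bit mask into both VPR.MASK01 and VPR.MASK23 with a single 8-bit deposit, because the two fields are adjacent. If beat 1 has already executed, it updates only MASK23. Here is a plain-C sketch of the same bit manipulation, assuming the Armv8.1-M VPR layout (P0 in bits [15:0], MASK01 in [19:16], MASK23 in [23:20]); the patch itself uses the R_V7M_VPR_* constants. The skip_beat1 flag is a hypothetical stand-in for the ECI cases in the switch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed VPR field positions */
#define SKETCH_VPR_MASK01_SHIFT 16
#define SKETCH_VPR_MASK23_SHIFT 20
#define SKETCH_VPR_MASK_LENGTH  4

/* Replace len (1..32) bits of value at pos with field, like deposit32() */
static uint32_t deposit32_sketch(uint32_t value, int pos, int len,
                                 uint32_t field)
{
    uint32_t mask = (~0u >> (32 - len)) << pos;
    return (value & ~mask) | ((field << pos) & mask);
}

static uint32_t vpst_sketch(uint32_t vpr, uint32_t mask, int skip_beat1)
{
    assert(mask != 0 && mask < 16);   /* mask == 0 is a related encoding */
    if (!skip_beat1) {
        /* MASK01 and MASK23 are adjacent: update both in one deposit */
        return deposit32_sketch(vpr, SKETCH_VPR_MASK01_SHIFT,
                                2 * SKETCH_VPR_MASK_LENGTH,
                                mask | (mask << 4));
    }
    /* Beat 1 already executed: leave MASK01 alone */
    return deposit32_sketch(vpr, SKETCH_VPR_MASK23_SHIFT,
                            SKETCH_VPR_MASK_LENGTH, mask);
}
```

Both paths leave the P0 predicate bits unchanged, which matches the patch, where the deposits never touch bits below the MASK01 shift.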

-[Qemu-devel] [PULL 23/42] target/arm: New helper function arm_v7m_mmu_idx_all()
+[PULL 15/44] target/arm: Implement MVE integer vector comparisons
-Add a new helper function which returns the MMU index to use
+Implement the MVE integer vector comparison instructions.  These are
-for v7M, where the caller specifies all of the security
+"VCMP (vector)" encodings T1, T2 and T3, and "VPT (vector)" encodings
-state, privilege level and whether the execution priority
+T1, T2 and T3.
-is negative, and reimplement the existing
-arm_v7m_mmu_idx_for_secstate_and_priv() in terms of it.
+These insns compare corresponding elements in each vector, and update
+the VPR.P0 predicate bits with the results of the comparison.  VPT
-We are going to need this for the lazy-FP-stacking code.
+also sets the VPR.MASK01 and VPR.MASK23 fields -- it is effectively
 "VCMP then VPST".
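VPR.P0 holds one predicate bit per byte of the vector, so a VCMP on 16-bit elements writes each lane's result into two adjacent P0 bits. The sketch below shows this for a signed "greater than or equal" comparison. It is not taken from the patch's mve_helper.c code, which falls outside this excerpt, and it ignores predication and ECI. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the P0 result of VCMP.S16 GE, Qn, Qm with all
 * lanes active: each of the 8 lanes sets or clears 2 predicate bits.
 */
static uint16_t vcmp_ge_s16_p0_sketch(const int16_t n[8], const int16_t m[8])
{
    uint16_t p0 = 0;
    for (int e = 0; e < 8; e++) {
        if (n[e] >= m[e]) {
            p0 |= (uint16_t)(0x3u << (2 * e));  /* both bytes of lane e */
        }
    }
    return p0;
}
```

A VPT would additionally set the MASK01/MASK23 fields, in the same way VPST does.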
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-21-peter.maydell@linaro.org
 ---
- target/arm/cpu.h    |  7 +++++++
+ target/arm/helper-mve.h    | 32 ++++++++++++++++++++++
- target/arm/helper.c | 14 +++++++++++---
+ target/arm/mve.decode      | 18 +++++++++++-
-files changed, 18 insertions(+), 3 deletions(-)
+ target/arm/mve_helper.c    | 56 ++++++++++++++++++++++++++++++++++++++
+ target/arm/translate-mve.c | 47 ++++++++++++++++++++++++++++++++
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+files changed, 152 insertions(+), 1 deletion(-)
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-+++ b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
-@@ -XXX,XX +XXX,XX @@ static inline int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx)
+--- a/target/arm/helper-mve.h
-     }
++++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_uqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
  DEF_HELPER_FLAGS_3(mve_sqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
  DEF_HELPER_FLAGS_3(mve_uqrshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
  DEF_HELPER_FLAGS_3(mve_sqrshr, TCG_CALL_NO_RWG, i32, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmpeqb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmpeqh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmpeqw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmpneb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmpneh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmpnew, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmpcsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmpcsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmpcsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmphib, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmphih, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmphiw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmpgeb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmpgeh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmpgew, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmpltb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmplth, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmpltw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmpgtb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmpgth, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmpgtw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmpleb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmpleh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vcmplew, TCG_CALL_NO_WG, void, env, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  &2shift qd qm shift size
  &vidup qd rn size imm
  &viwdup qd rn rm size imm
 +&vcmp qm qn size mask
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@
  @2_shr_w .... .... .. 1 ..... .... .... .... .... &2shift qd=%qd qm=%qm \
           size=2 shift=%rshift_i5
 +# Vector comparison; 4-bit Qm but 3-bit Qn
 +%mask_22_13      22:1 13:3
 +@vcmp    .... .... .. size:2 qn:3 . .... .... .... .... &vcmp qm=%qm mask=%mask_22_13
 +
  # Vector loads and stores
  # Widening loads and narrowing stores:
@@ -XXX,XX +XXX,XX @@ VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  }
+ # Predicate operations
+-%mask_22_13      22:1 13:3
+ VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
+ # Logical immediate operations (1 reg and modified-immediate)
+@@ -XXX,XX +XXX,XX @@ VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
+ VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
+ VSHLC             111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
++
++# Comparisons. We expand out the conditions which are split across
++# encodings T1, T2, T3 and the fc bits. These include VPT, which is
++# effectively "VCMP then VPST". A plain "VCMP" has a mask field of zero.
++VCMPEQ            1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 0 @vcmp
++VCMPNE            1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 0 @vcmp
++VCMPCS            1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
++VCMPHI            1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
++VCMPGE            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
++VCMPLT            1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
++VCMPGT            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
++VCMPLE            1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mve_helper.c
++++ b/target/arm/mve_helper.c
+@@ -XXX,XX +XXX,XX @@ static uint32_t do_sub_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
+ DO_VIDUP_ALL(vidup, DO_ADD)
+ DO_VIWDUP_ALL(viwdup, do_add_wrap)
+ DO_VIWDUP_ALL(vdwdup, do_sub_wrap)
++
 +/*
-+ * Return the MMU index for a v7M CPU with all relevant information
++ * Vector comparison.
-+ * manually specified.
++ * P0 bits for non-executed beats (where eci_mask is 0) are unchanged.
 + * P0 bits for predicated lanes in executed beats (where mask is 0) are 0.
 + * P0 bits otherwise are updated with the results of the comparisons.
 + * We must also keep unchanged the MASK fields at the top of v7m.vpr.
 + */
-+ARMMMUIdx arm_v7m_mmu_idx_all(CPUARMState *env,
++#define DO_VCMP(OP, ESIZE, TYPE, FN)                                    \
-+                              bool secstate, bool priv, bool negpri);
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, void *vm)   \
-+
++    {                                                                   \
- /* Return the MMU index for a v7M CPU in the specified security and
++        TYPE *n = vn, *m = vm;                                          \
-  * privilege state.
++        uint16_t mask = mve_element_mask(env);                          \
-  */
++        uint16_t eci_mask = mve_eci_mask(env);                          \
-diff --git a/target/arm/helper.c b/target/arm/helper.c
++        uint16_t beatpred = 0;                                          \
-index XXXXXXX..XXXXXXX 100644
++        uint16_t emask = MAKE_64BIT_MASK(0, ESIZE);                     \
---- a/target/arm/helper.c
++        unsigned e;                                                     \
-+++ b/target/arm/helper.c
++        for (e = 0; e < 16 / ESIZE; e++) {                              \
-@@ -XXX,XX +XXX,XX @@ int fp_exception_el(CPUARMState *env, int cur_el)
++            bool r = FN(n[H##ESIZE(e)], m[H##ESIZE(e)]);                \
-     return 0;
++            /* Comparison sets 0/1 bits for each byte in the element */ \
 +            beatpred |= r * emask;                                      \
 +            emask <<= ESIZE;                                            \
 +        }                                                               \
 +        beatpred &= mask;                                               \
 +        env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) |           \
 +            (beatpred & eci_mask);                                      \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +#define DO_VCMP_S(OP, FN)                       \
 +    DO_VCMP(OP##b, 1, int8_t, FN)               \
 +    DO_VCMP(OP##h, 2, int16_t, FN)              \
 +    DO_VCMP(OP##w, 4, int32_t, FN)
 +
 +#define DO_VCMP_U(OP, FN)                       \
 +    DO_VCMP(OP##b, 1, uint8_t, FN)              \
 +    DO_VCMP(OP##h, 2, uint16_t, FN)             \
 +    DO_VCMP(OP##w, 4, uint32_t, FN)
 +
 +#define DO_EQ(N, M) ((N) == (M))
 +#define DO_NE(N, M) ((N) != (M))
 +#define DO_EQ(N, M) ((N) == (M))
 +#define DO_EQ(N, M) ((N) == (M))
 +#define DO_GE(N, M) ((N) >= (M))
 +#define DO_LT(N, M) ((N) < (M))
 +#define DO_GT(N, M) ((N) > (M))
 +#define DO_LE(N, M) ((N) <= (M))
 +
 +DO_VCMP_U(vcmpeq, DO_EQ)
 +DO_VCMP_U(vcmpne, DO_NE)
 +DO_VCMP_U(vcmpcs, DO_GE)
 +DO_VCMP_U(vcmphi, DO_GT)
 +DO_VCMP_S(vcmpge, DO_GE)
 +DO_VCMP_S(vcmplt, DO_LT)
 +DO_VCMP_S(vcmpgt, DO_GT)
 +DO_VCMP_S(vcmple, DO_LE)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
  typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
  typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
 +typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static bool trans_VDWDUP(DisasContext *s, arg_viwdup *a)
      };
      return do_viwdup(s, a, fns[a->size]);
  }
++
--ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
++static bool do_vcmp(DisasContext *s, arg_vcmp *a, MVEGenCmpFn *fn)
 -                                                bool secstate, bool priv)
 +ARMMMUIdx arm_v7m_mmu_idx_all(CPUARMState *env,
 +                              bool secstate, bool priv, bool negpri)
  {
      ARMMMUIdx mmu_idx = ARM_MMU_IDX_M;
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
          mmu_idx |= ARM_MMU_IDX_M_PRIV;
      }
 -    if (armv7m_nvic_neg_prio_requested(env->nvic, secstate)) {
 +    if (negpri) {
          mmu_idx |= ARM_MMU_IDX_M_NEGPRI;
      }
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
      return mmu_idx;
  }
 +ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
 +                                                bool secstate, bool priv)
 +{
-+    bool negpri = armv7m_nvic_neg_prio_requested(env->nvic, secstate);
++    TCGv_ptr qn, qm;
 +
-+    return arm_v7m_mmu_idx_all(env, secstate, priv, negpri);
++    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qm) ||
 +        !fn) {
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    qn = mve_qreg_ptr(a->qn);
 +    qm = mve_qreg_ptr(a->qm);
 +    fn(cpu_env, qn, qm);
 +    tcg_temp_free_ptr(qn);
 +    tcg_temp_free_ptr(qm);
 +    if (a->mask) {
 +        /* VPT */
 +        gen_vpst(s, a->mask);
 +    }
 +    mve_update_eci(s);
 +    return true;
 +}
 +
- /* Return the MMU index for a v7M CPU in the specified security state */
++#define DO_VCMP(INSN, FN)                                       \
- ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
++    static bool trans_##INSN(DisasContext *s, arg_vcmp *a)      \
- {
++    {                                                           \
 +        static MVEGenCmpFn * const fns[] = {                    \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +            gen_helper_mve_##FN##w,                             \
 +            NULL,                                               \
 +        };                                                      \
 +        return do_vcmp(s, a, fns[a->size]);                     \
 +    }
 +
 +DO_VCMP(VCMPEQ, vcmpeq)
 +DO_VCMP(VCMPNE, vcmpne)
 +DO_VCMP(VCMPCS, vcmpcs)
 +DO_VCMP(VCMPHI, vcmphi)
 +DO_VCMP(VCMPGE, vcmpge)
 +DO_VCMP(VCMPLT, vcmplt)
 +DO_VCMP(VCMPGT, vcmpgt)
 +DO_VCMP(VCMPLE, vcmple)
 --
 .20.1
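You can try the predicate update performed by the DO_VCMP helper above outside QEMU. The following is a minimal standalone sketch, not QEMU code. It makes these assumptions:

- The byte layout is little-endian, so the H##ESIZE() index swizzle is omitted.
- The predication mask and the ECI mask are plain parameters instead of being computed by mve_element_mask() and mve_eci_mask().
- Only a signed greater-than comparison is modelled.
- The names elem_s() and vcmp_gt_s() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the DO_VCMP predicate update (not QEMU code).
 * A 128-bit vector is held as 16 bytes in little-endian order.
 */

/* Read element e (esize bytes wide) of a vector as a signed value. */
static int64_t elem_s(const uint8_t v[16], unsigned esize, unsigned e)
{
    uint64_t u = 0;
    uint64_t sign = (uint64_t)1 << (8 * esize - 1);

    for (unsigned i = 0; i < esize; i++) {
        u |= (uint64_t)v[e * esize + i] << (8 * i);
    }
    /* Sign-extend from 8 * esize bits without relying on signed shifts. */
    return (u & sign) ? (int64_t)u - (int64_t)(sign << 1) : (int64_t)u;
}

/*
 * Signed "VCMP GT" on elements of esize bytes; returns the new VPR value.
 * mask:     predication mask, one bit per byte (0 = lane predicated off)
 * eci_mask: one bit per byte of the beats actually executed
 */
static uint32_t vcmp_gt_s(uint32_t vpr, const uint8_t n[16],
                          const uint8_t m[16], unsigned esize,
                          uint16_t mask, uint16_t eci_mask)
{
    assert(esize == 1 || esize == 2 || esize == 4);

    uint16_t beatpred = 0;
    /* One predicate bit per byte of the element, e.g. 0x3 for halfwords. */
    uint16_t emask = (uint16_t)((1u << esize) - 1);

    for (unsigned e = 0; e < 16 / esize; e++) {
        if (elem_s(n, esize, e) > elem_s(m, esize, e)) {
            beatpred |= emask;
        }
        emask = (uint16_t)(emask << esize);
    }
    /* Predicated-off lanes in executed beats end up with P0 bits of 0. */
    beatpred &= mask;
    /* Non-executed beats and everything above bit 15 keep their old value. */
    return (vpr & ~(uint32_t)eci_mask) | (beatpred & eci_mask);
}
```

Each element contributes ESIZE identical P0 bits, one per byte, so a true halfword comparison sets two bits. Bits outside eci_mask are carried through unchanged, and so are the VPR.MASK01/MASK23 fields above bit 15.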

-[Qemu-devel] [PULL 22/42] target/arm: Activate M-profile floating point context when FPCCR.ASPEN is set
+[PULL 16/44] target/arm: Implement MVE integer vector-vs-scalar comparisons
-The M-profile FPCCR.ASPEN bit indicates that automatic floating-point
+Implement the MVE integer vector comparison instructions that compare
-context preservation is enabled. Before executing any floating-point
+each element against a scalar from a general purpose register.  These
-instruction, if FPCCR.ASPEN is set and the CONTROL FPCA/SFPA bits
+are "VCMP (vector)" encodings T4, T5 and T6 and "VPT (vector)"
-indicate that there is no active floating point context then we
+encodings T4, T5 and T6.
-must create a new context (by initializing FPSCR and setting
-FPCA/SFPA to indicate that the context is now active). In the
+We have to move the decodetree pattern for VPST, because it
-pseudocode this is handled by ExecuteFPCheck().
+overlaps with VCMP T4 with size = 0b11.
 Implement this with a new TB flag which tracks whether we
 need to create a new FP context.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-20-peter.maydell@linaro.org
 ---
- target/arm/cpu.h       |  2 ++
+ target/arm/helper-mve.h    | 32 +++++++++++++++++++++++++++
- target/arm/translate.h |  1 +
+ target/arm/mve.decode      | 18 +++++++++++++---
- target/arm/helper.c    | 13 +++++++++++++
+ target/arm/mve_helper.c    | 44 +++++++++++++++++++++++++++++++-------
- target/arm/translate.c | 29 +++++++++++++++++++++++++++++
+ target/arm/translate-mve.c | 43 +++++++++++++++++++++++++++++++++++++
-files changed, 45 insertions(+)
+files changed, 126 insertions(+), 11 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/cpu.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, NS, 6, 1)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vcmpgtw, TCG_CALL_NO_WG, void, env, ptr, ptr)
- FIELD(TBFLAG_A32, VFPEN, 7, 1)
+ DEF_HELPER_FLAGS_3(mve_vcmpleb, TCG_CALL_NO_WG, void, env, ptr, ptr)
- FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
+ DEF_HELPER_FLAGS_3(mve_vcmpleh, TCG_CALL_NO_WG, void, env, ptr, ptr)
- FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
+ DEF_HELPER_FLAGS_3(mve_vcmplew, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+/* For M profile only, set if we must create a new FP context */
++
-+FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1)
++DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
- /* For M profile only, set if FPCCR.S does not match current security state */
++DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
- FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1)
++DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
- /* For M profile only, Handler (ie not Thread) mode */
++
-diff --git a/target/arm/translate.h b/target/arm/translate.h
++DEF_HELPER_FLAGS_3(mve_vcmpne_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
-index XXXXXXX..XXXXXXX 100644
++DEF_HELPER_FLAGS_3(mve_vcmpne_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
---- a/target/arm/translate.h
++DEF_HELPER_FLAGS_3(mve_vcmpne_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
-+++ b/target/arm/translate.h
++
-@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
++DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
-     bool v8m_secure; /* true if v8M and we're in Secure mode */
++DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
-     bool v8m_stackcheck; /* true if we need to perform v8M stack limit checks */
++DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
-     bool v8m_fpccr_s_wrong; /* true if v8M FPCCR.S != v8m_secure */
++
-+    bool v7m_new_fp_ctxt_needed; /* ASPEN set but no active FP context */
++DEF_HELPER_FLAGS_3(mve_vcmphi_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
-     /* Immediate value in AArch32 SVC insn; must be set if is_jmp == DISAS_SWI
++DEF_HELPER_FLAGS_3(mve_vcmphi_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
-      * so that top level loop can generate correct syndrome information.
++DEF_HELPER_FLAGS_3(mve_vcmphi_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
-      */
++
-diff --git a/target/arm/helper.c b/target/arm/helper.c
++DEF_HELPER_FLAGS_3(mve_vcmpge_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
-index XXXXXXX..XXXXXXX 100644
++DEF_HELPER_FLAGS_3(mve_vcmpge_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
---- a/target/arm/helper.c
++DEF_HELPER_FLAGS_3(mve_vcmpge_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
-+++ b/target/arm/helper.c
++
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
++DEF_HELPER_FLAGS_3(mve_vcmplt_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
-         flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
++DEF_HELPER_FLAGS_3(mve_vcmplt_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vcmplt_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vcmple_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vcmple_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vcmple_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  &vidup qd rn size imm
  &viwdup qd rn rm size imm
  &vcmp qm qn size mask
 +&vcmp_scalar qn rm size mask
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@
  # Vector comparison; 4-bit Qm but 3-bit Qn
  %mask_22_13      22:1 13:3
  @vcmp    .... .... .. size:2 qn:3 . .... .... .... .... &vcmp qm=%qm mask=%mask_22_13
 +@vcmp_scalar .... .... .. size:2 qn:3 . .... .... .... rm:4 &vcmp_scalar \
 +             mask=%mask_22_13
  # Vector loads and stores
@@ -XXX,XX +XXX,XX @@ VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
                   rdahi=%rdahi rdalo=%rdalo
  }
 -# Predicate operations
 -VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
 -
  # Logical immediate operations (1 reg and modified-immediate)
  # The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
@@ -XXX,XX +XXX,XX @@ VCMPGE            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
  VCMPLT            1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
  VCMPGT            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
  VCMPLE            1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
 +
 +{
 +  VPST            1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
 +  VCMPEQ_scalar   1111 1110 0 . .. ... 1 ... 0 1111 0 1 0 0 .... @vcmp_scalar
 +}
 +VCMPNE_scalar     1111 1110 0 . .. ... 1 ... 0 1111 1 1 0 0 .... @vcmp_scalar
 +VCMPCS_scalar     1111 1110 0 . .. ... 1 ... 0 1111 0 1 1 0 .... @vcmp_scalar
 +VCMPHI_scalar     1111 1110 0 . .. ... 1 ... 0 1111 1 1 1 0 .... @vcmp_scalar
 +VCMPGE_scalar     1111 1110 0 . .. ... 1 ... 1 1111 0 1 0 0 .... @vcmp_scalar
 +VCMPLT_scalar     1111 1110 0 . .. ... 1 ... 1 1111 1 1 0 0 .... @vcmp_scalar
 +VCMPGT_scalar     1111 1110 0 . .. ... 1 ... 1 1111 0 1 1 0 .... @vcmp_scalar
 +VCMPLE_scalar     1111 1110 0 . .. ... 1 ... 1 1111 1 1 1 0 .... @vcmp_scalar
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VIWDUP_ALL(vdwdup, do_sub_wrap)
          mve_advance_vpt(env);                                           \
      }
-+    if (arm_feature(env, ARM_FEATURE_M) &&
+-#define DO_VCMP_S(OP, FN)                       \
-+        (env->v7m.fpccr[env->v7m.secure] & R_V7M_FPCCR_ASPEN_MASK) &&
+-    DO_VCMP(OP##b, 1, int8_t, FN)               \
-+        (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) ||
+-    DO_VCMP(OP##h, 2, int16_t, FN)              \
-+         (env->v7m.secure &&
+-    DO_VCMP(OP##w, 4, int32_t, FN)
-+          !(env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)))) {
++#define DO_VCMP_SCALAR(OP, ESIZE, TYPE, FN)                             \
-+        /*
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,             \
-+         * ASPEN is set, but FPCA/SFPA indicate that there is no active
++                                uint32_t rm)                            \
-+         * FP context; we must create a new FP context before executing
++    {                                                                   \
-+         * any FP insn.
++        TYPE *n = vn;                                                   \
-+         */
++        uint16_t mask = mve_element_mask(env);                          \
-+        flags = FIELD_DP32(flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED, 1);
++        uint16_t eci_mask = mve_eci_mask(env);                          \
-+    }
++        uint16_t beatpred = 0;                                          \
-+
++        uint16_t emask = MAKE_64BIT_MASK(0, ESIZE);                     \
-     *pflags = flags;
++        unsigned e;                                                     \
-     *cs_base = 0;
++        for (e = 0; e < 16 / ESIZE; e++) {                              \
 +            bool r = FN(n[H##ESIZE(e)], (TYPE)rm);                      \
 +            /* Comparison sets 0/1 bits for each byte in the element */ \
 +            beatpred |= r * emask;                                      \
 +            emask <<= ESIZE;                                            \
 +        }                                                               \
 +        beatpred &= mask;                                               \
 +        env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) |           \
 +            (beatpred & eci_mask);                                      \
 +        mve_advance_vpt(env);                                           \
 +    }
 -#define DO_VCMP_U(OP, FN)                       \
 -    DO_VCMP(OP##b, 1, uint8_t, FN)              \
 -    DO_VCMP(OP##h, 2, uint16_t, FN)             \
 -    DO_VCMP(OP##w, 4, uint32_t, FN)
 +#define DO_VCMP_S(OP, FN)                               \
 +    DO_VCMP(OP##b, 1, int8_t, FN)                       \
 +    DO_VCMP(OP##h, 2, int16_t, FN)                      \
 +    DO_VCMP(OP##w, 4, int32_t, FN)                      \
 +    DO_VCMP_SCALAR(OP##_scalarb, 1, int8_t, FN)         \
 +    DO_VCMP_SCALAR(OP##_scalarh, 2, int16_t, FN)        \
 +    DO_VCMP_SCALAR(OP##_scalarw, 4, int32_t, FN)
 +
 +#define DO_VCMP_U(OP, FN)                               \
 +    DO_VCMP(OP##b, 1, uint8_t, FN)                      \
 +    DO_VCMP(OP##h, 2, uint16_t, FN)                     \
 +    DO_VCMP(OP##w, 4, uint32_t, FN)                     \
 +    DO_VCMP_SCALAR(OP##_scalarb, 1, uint8_t, FN)        \
 +    DO_VCMP_SCALAR(OP##_scalarh, 2, uint16_t, FN)       \
 +    DO_VCMP_SCALAR(OP##_scalarw, 4, uint32_t, FN)
  #define DO_EQ(N, M) ((N) == (M))
  #define DO_NE(N, M) ((N) != (M))
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
  typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
  typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
  typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 +typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static bool do_vcmp(DisasContext *s, arg_vcmp *a, MVEGenCmpFn *fn)
      return true;
  }
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
++static bool do_vcmp_scalar(DisasContext *s, arg_vcmp_scalar *a,
---- a/target/arm/translate.c
++                           MVEGenScalarCmpFn *fn)
-+++ b/target/arm/translate.c
++{
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
++    TCGv_ptr qn;
-             /* Don't need to do this for any further FP insns in this TB */
++    TCGv_i32 rm;
-             s->v8m_fpccr_s_wrong = false;
++
-         }
++    if (!dc_isar_feature(aa32_mve, s) || !fn || a->rm == 13) {
-+
++        return false;
-+        if (s->v7m_new_fp_ctxt_needed) {
++    }
-+            /*
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-+             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
++        return true;
-+             * and the FPSCR.
++    }
-+             */
++
-+            TCGv_i32 control, fpscr;
++    qn = mve_qreg_ptr(a->qn);
-+            uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
++    if (a->rm == 15) {
-+
++        /* Encoding Rm=0b1111 means "constant zero" */
-+            fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
++        rm = tcg_constant_i32(0);
-+            gen_helper_vfp_set_fpscr(cpu_env, fpscr);
++    } else {
-+            tcg_temp_free_i32(fpscr);
++        rm = load_reg(s, a->rm);
-+            /*
++    }
-+             * We don't need to arrange to end the TB, because the only
++    fn(cpu_env, qn, rm);
-+             * parts of FPSCR which we cache in the TB flags are the VECLEN
++    tcg_temp_free_ptr(qn);
-+             * and VECSTRIDE, and those don't exist for M-profile.
++    tcg_temp_free_i32(rm);
-+             */
++    if (a->mask) {
-+
++        /* VPT */
-+            if (s->v8m_secure) {
++        gen_vpst(s, a->mask);
-+                bits |= R_V7M_CONTROL_SFPA_MASK;
++    }
-+            }
++    mve_update_eci(s);
-+            control = load_cpu_field(v7m.control[M_REG_S]);
++    return true;
-+            tcg_gen_ori_i32(control, control, bits);
++}
-+            store_cpu_field(control, v7m.control[M_REG_S]);
++
-+            /* Don't need to do this for any further FP insns in this TB */
+ #define DO_VCMP(INSN, FN)                                       \
-+            s->v7m_new_fp_ctxt_needed = false;
+     static bool trans_##INSN(DisasContext *s, arg_vcmp *a)      \
-+        }
+     {                                                           \
@@ -XXX,XX +XXX,XX @@ static bool do_vcmp(DisasContext *s, arg_vcmp *a, MVEGenCmpFn *fn)
              NULL,                                               \
          };                                                      \
          return do_vcmp(s, a, fns[a->size]);                     \
 +    }                                                           \
 +    static bool trans_##INSN##_scalar(DisasContext *s,          \
 +                                      arg_vcmp_scalar *a)       \
 +    {                                                           \
 +        static MVEGenScalarCmpFn * const fns[] = {              \
 +            gen_helper_mve_##FN##_scalarb,                      \
 +            gen_helper_mve_##FN##_scalarh,                      \
 +            gen_helper_mve_##FN##_scalarw,                      \
 +            NULL,                                               \
 +        };                                                      \
 +        return do_vcmp_scalar(s, a, fns[a->size]);              \
      }
-     if (extract32(insn, 28, 4) == 0xf) {
+ DO_VCMP(VCMPEQ, vcmpeq)
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
          regime_is_secure(env, dc->mmu_idx);
      dc->v8m_stackcheck = FIELD_EX32(tb_flags, TBFLAG_A32, STACKCHECK);
      dc->v8m_fpccr_s_wrong = FIELD_EX32(tb_flags, TBFLAG_A32, FPCCR_S_WRONG);
 +    dc->v7m_new_fp_ctxt_needed =
 +        FIELD_EX32(tb_flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED);
      dc->cp_regs = cpu->cp_regs;
      dc->features = env->features;
 --
 .20.1
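Two details of do_vcmp_scalar() and DO_VCMP_SCALAR above are easy to miss:

- An Rm field of 15 encodes the constant zero, and the patch rejects Rm == 13.
- The 32-bit register value is converted with (TYPE)rm, so only its low ESIZE bytes take part in the comparison.

The sketch below models both details for an unsigned byte-sized VCMP CS. It is hypothetical, standalone code rather than QEMU code, and the names vcmp_scalar_operand() and vcmpcs_scalar_u8() are invented. It also returns raw P0 bits without the predication and ECI masking that the real helper applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the Rm handling in do_vcmp_scalar():
 * Rm == 13 is rejected (the trans function returns false, so the insn
 * UNDEFs), Rm == 15 means "constant zero", anything else reads regs[rm].
 */
static bool vcmp_scalar_operand(const uint32_t regs[16], unsigned rm,
                                uint32_t *val)
{
    assert(rm < 16);
    if (rm == 13) {
        return false;
    }
    *val = (rm == 15) ? 0 : regs[rm];
    return true;
}

/*
 * Unsigned ">=" (VCMP CS) of each byte lane against the scalar.  The
 * scalar is truncated to the element type first, mirroring (TYPE)rm.
 * Returns P0 bits before any predication or ECI masking.
 */
static uint16_t vcmpcs_scalar_u8(const uint8_t n[16], uint32_t rm)
{
    uint8_t s = (uint8_t)rm;
    uint16_t p0 = 0;

    for (unsigned e = 0; e < 16; e++) {
        if (n[e] >= s) {
            p0 |= (uint16_t)(1u << e);
        }
    }
    return p0;
}
```

For example, a register holding 0x105 compares as 5 against byte elements. The same register would compare as 0x105 against halfword or word elements.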

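The vector-vs-scalar commit message says the VPST pattern had to move because it overlaps VCMP T4 with size = 0b11. You can check that overlap by turning each decodetree pattern into a fixed-bit mask and value, treating bits written as "." as don't-care. The constants below were derived by hand from the VPST and VCMPEQ_scalar lines in the mve.decode hunk. They, and the helper names matches() and vpt_mask(), are illustrative only and are not taken from decodetree's generated output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Fixed bits of two mve.decode patterns, derived by hand:
 *   VPST          1111 1110 0 . 11 000 1 ... 0 1111 0100 1101
 *   VCMPEQ_scalar 1111 1110 0 . .. ... 1 ... 0 1111 0 1 0 0 ....
 */
#define VPST_MASK           0xFFBF1FFFu  /* don't care: 22, 15:13 */
#define VPST_VALUE          0xFE310F4Du
#define VCMPEQ_SCALAR_MASK  0xFF811FF0u  /* don't care: 22:17, 15:13, 3:0 */
#define VCMPEQ_SCALAR_VALUE 0xFE010F40u

/* True if insn has the pattern's value in every fixed bit position. */
static bool matches(uint32_t insn, uint32_t mask, uint32_t value)
{
    return (insn & mask) == value;
}

/* %mask_22_13: bit 22 is the most significant bit, then bits 15:13. */
static unsigned vpt_mask(uint32_t insn)
{
    return (((insn >> 22) & 1) << 3) | ((insn >> 13) & 7);
}
```

Every VPST encoding also satisfies the VCMPEQ_scalar pattern, read as size = 0b11, Qn = 0 and Rm = 13. In the decode file the overlap is resolved with a { } group that lists VPST first. Neither alternative reading is a valid comparison anyway: size = 0b11 selects a NULL helper, and Rm = 13 is rejected in do_vcmp_scalar().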
-[Qemu-devel] [PULL 15/42] target/arm: Clear CONTROL.SFPA in BXNS and BLXNS
+[PULL 17/44] target/arm: Implement MVE VPSEL
-For v8M floating point support, transitions from Secure
+Implement the MVE VPSEL insn, which sets each byte of the destination
-to Non-secure state via BXNS and BLXNS must clear the
+vector Qd to the byte from either Qn or Qm depending on the value of
-CONTROL.SFPA bit. (This corresponds to the pseudocode
+the corresponding bit in VPR.P0.
 BranchToNS() function.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-13-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 4 ++++
+ target/arm/helper-mve.h    |  2 ++
-file changed, 4 insertions(+)
+ target/arm/mve.decode      |  7 +++++--
  target/arm/mve_helper.c    | 19 +++++++++++++++++++
  target/arm/translate-mve.c |  2 ++
 files changed, 28 insertions(+), 2 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_bxns)(CPUARMState *env, uint32_t dest)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     /* translate.c should have made BXNS UNDEF unless we're secure */
+ DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-     assert(env->v7m.secure);
+ DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+    if (!(dest & 1)) {
++DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+        env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_SFPA_MASK;
++
  DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
  DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
  DEF_HELPER_FLAGS_4(mve_vaddw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VSHLC             111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
  # effectively "VCMP then VPST". A plain "VCMP" has a mask field of zero.
  VCMPEQ            1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 0 @vcmp
  VCMPNE            1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 0 @vcmp
 -VCMPCS            1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
 -VCMPHI            1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
 +{
 +  VPSEL           1111 1110 0 . 11 ... 1 ... 0 1111 . 0 . 0 ... 1 @2op_nosz
 +  VCMPCS          1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
 +  VCMPHI          1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
 +}
  VCMPGE            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
  VCMPLT            1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
  VCMPGT            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VCMP_S(vcmpge, DO_GE)
  DO_VCMP_S(vcmplt, DO_LT)
  DO_VCMP_S(vcmpgt, DO_GT)
  DO_VCMP_S(vcmple, DO_LE)
 +
 +void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
 +{
 +    /*
 +     * Qd[n] = VPR.P0[n] ? Qn[n] : Qm[n]
 +     * but note that whether bytes are written to Qd is still subject
 +     * to (all forms of) predication in the usual way.
 +     */
 +    uint64_t *d = vd, *n = vn, *m = vm;
 +    uint16_t mask = mve_element_mask(env);
 +    uint16_t p0 = FIELD_EX32(env->v7m.vpr, V7M_VPR, P0);
 +    unsigned e;
 +    for (e = 0; e < 16 / 8; e++, mask >>= 8, p0 >>= 8) {
 +        uint64_t r = m[H8(e)];
 +        mergemask(&r, n[H8(e)], p0);
 +        mergemask(&d[H8(e)], r, mask);
 +    }
-     switch_v7m_security_state(env, dest & 1);
++    mve_advance_vpt(env);
-     env->thumb = 1;
++}
-     env->regs[15] = dest & ~1;
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_blxns)(CPUARMState *env, uint32_t dest)
+index XXXXXXX..XXXXXXX 100644
-          */
+--- a/target/arm/translate-mve.c
-         write_v7m_exception(env, 1);
++++ b/target/arm/translate-mve.c
-     }
+@@ -XXX,XX +XXX,XX @@ DO_LOGIC(VORR, gen_helper_mve_vorr)
-+    env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_SFPA_MASK;
+ DO_LOGIC(VORN, gen_helper_mve_vorn)
-     switch_v7m_security_state(env, 0);
+ DO_LOGIC(VEOR, gen_helper_mve_veor)
-     env->thumb = 1;
-     env->regs[15] = dest;
++DO_LOGIC(VPSEL, gen_helper_mve_vpsel)
 +
  #define DO_2OP(INSN, FN) \
      static bool trans_##INSN(DisasContext *s, arg_2op *a)       \
      {                                                           \
 --
 .20.1
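The comment in mve_vpsel describes a two-stage merge. VPR.P0 first chooses between the Qn and Qm byte, and the normal predication mask then decides whether Qd is written at all. The byte-at-a-time sketch below models that behaviour under the assumption that both masks carry one bit per byte. vpsel_model() is a hypothetical name; the real helper does the same selection eight bytes at a time through mergemask().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical byte-wise model of the mve_vpsel helper (not QEMU code).
 * p0:   VPR.P0, one bit per byte (1 = take Qn, 0 = take Qm)
 * mask: predication mask, one bit per byte (0 = leave Qd unchanged)
 */
static void vpsel_model(uint8_t d[16], const uint8_t n[16],
                        const uint8_t m[16], uint16_t p0, uint16_t mask)
{
    for (unsigned i = 0; i < 16; i++) {
        uint8_t r = ((p0 >> i) & 1) ? n[i] : m[i];
        if ((mask >> i) & 1) {
            d[i] = r;
        }
    }
}
```

This matters because VPSEL reads P0 as data, so a lane that is predicated off keeps its old Qd contents even when its P0 bit selects Qn.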

-[Qemu-devel] [PULL 18/42] target/arm: Handle floating point registers in exception return
+[PULL 18/44] target/arm: Implement MVE VMLAS
-Handle floating point registers in exception return.
+Implement the MVE VMLAS insn, which multiplies a vector by a vector
-This corresponds to pseudocode functions ValidateExceptionReturn(),
+and adds a scalar.
 ExceptionReturn(), PopStack() and ConsumeExcStackFrame().
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-16-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 142 +++++++++++++++++++++++++++++++++++++++++++-
+ target/arm/helper-mve.h    |  4 ++++
-file changed, 141 insertions(+), 1 deletion(-)
+ target/arm/mve.decode      |  3 +++
  target/arm/mve_helper.c    | 26 ++++++++++++++++++++++++++
  target/arm/translate-mve.c |  1 +
 files changed, 34 insertions(+)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i3
-     bool rettobase = false;
+ DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     bool exc_secure = false;
+ DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     bool return_to_secure;
-+    bool ftype;
++DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+    bool restore_s16_s31;
++DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     /* If we're not in Handler mode then jumps to magic exception-exit
++
-      * addresses don't have magic behaviour. However for the v8M
+ DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
+ DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-                       excret);
+ DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
  VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 +# The U bit (28) is don't-care because it does not affect the result
 +VMLAS            111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
 +
  # Vector add across vector
  {
    VADDV          111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
          mve_advance_vpt(env);                                           \
      }
-+    ftype = excret & R_V7M_EXCRET_FTYPE_MASK;
++/* "accumulating" version where FN takes d as well as n and m */
-+
++#define DO_2OP_ACC_SCALAR(OP, ESIZE, TYPE, FN)                          \
-+    if (!arm_feature(env, ARM_FEATURE_VFP) && !ftype) {
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
-+        qemu_log_mask(LOG_GUEST_ERROR, "M profile: zero FTYPE in exception "
++                                uint32_t rm)                            \
-+                      "exit PC value 0x%" PRIx32 " is UNPREDICTABLE "
++    {                                                                   \
-+                      "if FPU not present\n",
++        TYPE *d = vd, *n = vn;                                          \
-+                      excret);
++        TYPE m = rm;                                                    \
-+        ftype = true;
++        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            mergemask(&d[H##ESIZE(e)],                                  \
 +                      FN(d[H##ESIZE(e)], n[H##ESIZE(e)], m), mask);     \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
-     if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
+ /* provide unsigned 2-op scalar helpers for all sizes */
-         /* EXC_RETURN.ES validation check (R_SMFL). We must do this before
+ #define DO_2OP_SCALAR_U(OP, FN)                 \
-          * we pick which FAULTMASK to clear.
+     DO_2OP_SCALAR(OP##b, 1, uint8_t, FN)        \
-@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
+@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
-      */
+     DO_2OP_SCALAR(OP##h, 2, int16_t, FN)        \
-     write_v7m_control_spsel_for_secstate(env, return_to_sp_process, exc_secure);
+     DO_2OP_SCALAR(OP##w, 4, int32_t, FN)
-+    /*
++#define DO_2OP_ACC_SCALAR_U(OP, FN)             \
-+     * Clear scratch FP values left in caller saved registers; this
++    DO_2OP_ACC_SCALAR(OP##b, 1, uint8_t, FN)    \
-+     * must happen before any kind of tail chaining.
++    DO_2OP_ACC_SCALAR(OP##h, 2, uint16_t, FN)   \
-+     */
++    DO_2OP_ACC_SCALAR(OP##w, 4, uint32_t, FN)
 +    if ((env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_CLRONRET_MASK) &&
 +        (env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK)) {
 +        if (env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPACT_MASK) {
 +            env->v7m.sfsr |= R_V7M_SFSR_LSERR_MASK;
 +            armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SECURE, false);
 +            qemu_log_mask(CPU_LOG_INT, "...taking SecureFault on existing "
 +                          "stackframe: error during lazy state deactivation\n");
 +            v7m_exception_taken(cpu, excret, true, false);
 +            return;
 +        } else {
 +            /* Clear s0..s15 and FPSCR */
 +            int i;
 +
-+            for (i = 0; i < 16; i += 2) {
+ DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
-+                *aa32_vfp_dreg(env, i / 2) = 0;
+ DO_2OP_SCALAR_U(vsub_scalar, DO_SUB)
-+            }
+ DO_2OP_SCALAR_U(vmul_scalar, DO_MUL)
-+            vfp_set_fpscr(env, 0);
+@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
-+        }
+ DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
-+    }
+ DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
 +/* Vector by vector plus scalar */
 +#define DO_VMLAS(D, N, M) ((N) * (D) + (M))
 +
-     if (sfault) {
++DO_2OP_ACC_SCALAR_U(vmlas, DO_VMLAS)
          env->v7m.sfsr |= R_V7M_SFSR_INVER_MASK;
          armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SECURE, false);
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
              }
          }
 +        if (!ftype) {
 +            /* FP present and we need to handle it */
 +            if (!return_to_secure &&
 +                (env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPACT_MASK)) {
 +                armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SECURE, false);
 +                env->v7m.sfsr |= R_V7M_SFSR_LSERR_MASK;
 +                qemu_log_mask(CPU_LOG_INT,
 +                              "...taking SecureFault on existing stackframe: "
 +                              "Secure LSPACT set but exception return is "
 +                              "not to secure state\n");
 +                v7m_exception_taken(cpu, excret, true, false);
 +                return;
 +            }
 +
-+            restore_s16_s31 = return_to_secure &&
+ /*
-+                (env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_TS_MASK);
+  * Long saturating scalar ops. As with DO_2OP_L, TYPE and H are for the
-+
+  * input (smaller) type and LESIZE, LTYPE, LH for the output (long) type.
-+            if (env->v7m.fpccr[return_to_secure] & R_V7M_FPCCR_LSPACT_MASK) {
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-+                /* State in FPU is still valid, just clear LSPACT */
+index XXXXXXX..XXXXXXX 100644
-+                env->v7m.fpccr[return_to_secure] &= ~R_V7M_FPCCR_LSPACT_MASK;
+--- a/target/arm/translate-mve.c
-+            } else {
++++ b/target/arm/translate-mve.c
-+                int i;
+@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
-+                uint32_t fpscr;
+ DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
-+                bool cpacr_pass, nsacr_pass;
+ DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
-+
+ DO_2OP_SCALAR(VBRSR, vbrsr)
-+                cpacr_pass = v7m_cpacr_pass(env, return_to_secure,
++DO_2OP_SCALAR(VMLAS, vmlas)
-+                                            return_to_priv);
-+                nsacr_pass = return_to_secure ||
+ static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
-+                    extract32(env->v7m.nsacr, 10, 1);
+ {
 +
 +                if (!cpacr_pass) {
 +                    armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE,
 +                                            return_to_secure);
 +                    env->v7m.cfsr[return_to_secure] |= R_V7M_CFSR_NOCP_MASK;
 +                    qemu_log_mask(CPU_LOG_INT,
 +                                  "...taking UsageFault on existing "
 +                                  "stackframe: CPACR.CP10 prevents unstacking "
 +                                  "FP regs\n");
 +                    v7m_exception_taken(cpu, excret, true, false);
 +                    return;
 +                } else if (!nsacr_pass) {
 +                    armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, true);
 +                    env->v7m.cfsr[M_REG_S] |= R_V7M_CFSR_INVPC_MASK;
 +                    qemu_log_mask(CPU_LOG_INT,
 +                                  "...taking Secure UsageFault on existing "
 +                                  "stackframe: NSACR.CP10 prevents unstacking "
 +                                  "FP regs\n");
 +                    v7m_exception_taken(cpu, excret, true, false);
 +                    return;
 +                }
 +
 +                for (i = 0; i < (restore_s16_s31 ? 32 : 16); i += 2) {
 +                    uint32_t slo, shi;
 +                    uint64_t dn;
 +                    uint32_t faddr = frameptr + 0x20 + 4 * i;
 +
 +                    if (i >= 16) {
 +                        faddr += 8; /* Skip the slot for the FPSCR */
 +                    }
 +
 +                    pop_ok = pop_ok &&
 +                        v7m_stack_read(cpu, &slo, faddr, mmu_idx) &&
 +                        v7m_stack_read(cpu, &shi, faddr + 4, mmu_idx);
 +
 +                    if (!pop_ok) {
 +                        break;
 +                    }
 +
 +                    dn = (uint64_t)shi << 32 | slo;
 +                    *aa32_vfp_dreg(env, i / 2) = dn;
 +                }
 +                pop_ok = pop_ok &&
 +                    v7m_stack_read(cpu, &fpscr, frameptr + 0x60, mmu_idx);
 +                if (pop_ok) {
 +                    vfp_set_fpscr(env, fpscr);
 +                }
 +                if (!pop_ok) {
 +                    /*
 +                     * These regs are 0 if security extension present;
 +                     * otherwise merely UNKNOWN. We zero always.
 +                     */
 +                    for (i = 0; i < (restore_s16_s31 ? 32 : 16); i += 2) {
 +                        *aa32_vfp_dreg(env, i / 2) = 0;
 +                    }
 +                    vfp_set_fpscr(env, 0);
 +                }
 +            }
 +        }
 +        env->v7m.control[M_REG_S] = FIELD_DP32(env->v7m.control[M_REG_S],
 +                                               V7M_CONTROL, FPCA, !ftype);
 +
          /* Commit to consuming the stack frame */
          frameptr += 0x20;
 +        if (!ftype) {
 +            frameptr += 0x48;
 +            if (restore_s16_s31) {
 +                frameptr += 0x40;
 +            }
 +        }
          /* Undo stack alignment (the SPREALIGN bit indicates that the original
           * pre-exception SP was not 8-aligned and we added a padding word to
           * align it, so we undo this by ORing in the bit that increases it
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
          *frame_sp_p = frameptr;
      }
      /* This xpsr_write() will invalidate frame_sp_p as it may switch stack */
 -    xpsr_write(env, xpsr, ~XPSR_SPREALIGN);
 +    xpsr_write(env, xpsr, ~(XPSR_SPREALIGN | XPSR_SFPA));
 +
 +    if (env->v7m.secure) {
 +        bool sfpa = xpsr & XPSR_SFPA;
 +
 +        env->v7m.control[M_REG_S] = FIELD_DP32(env->v7m.control[M_REG_S],
 +                                               V7M_CONTROL, SFPA, sfpa);
 +    }
      /* The restored xPSR exception field will be zero if we're
       * resuming in Thread mode. If that doesn't match what the
 --
 .20.1

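The VMLAS arithmetic in the patch above is DO_VMLAS(D, N, M) = N * D + M. The sketch below illustrates it for 16-bit elements. It is a model under assumptions, not QEMU code. `vmlas_u16_sketch` is a hypothetical name, and the registers are plain arrays. The predication mask uses one bit per byte, matching the way the helper shifts mve_element_mask() by ESIZE per element.

```c
#include <stdint.h>

/*
 * Hypothetical model of VMLAS on 16-bit elements (not QEMU code):
 * Qda[e] = Qn[e] * Qda[e] + Rm, kept modulo 2^16. The low and high bytes
 * of each element are each gated by their own predication mask bit.
 */
static void vmlas_u16_sketch(uint16_t qda[8], const uint16_t qn[8],
                             uint32_t rm, uint16_t mask)
{
    uint16_t m = (uint16_t)rm;   /* only the low 16 bits of Rm are used */

    for (int e = 0; e < 8; e++) {
        /* widen first so the multiply cannot overflow a signed int */
        uint16_t r = (uint16_t)((uint32_t)qn[e] * qda[e] + m);
        uint16_t bmask = 0;

        if ((mask >> (2 * e)) & 1) {
            bmask |= 0x00ff;     /* low byte of element e is active */
        }
        if ((mask >> (2 * e + 1)) & 1) {
            bmask |= 0xff00;     /* high byte of element e is active */
        }
        qda[e] = (uint16_t)((qda[e] & ~bmask) | (r & bmask));
    }
}
```

The result is truncated to the element width, so a signed reading gives the same bits. For example, -1 * 2 + 1 = -1, which is 0xffff under either interpretation. This is consistent with the decode comment that the U bit does not affect the result.
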
-[Qemu-devel] [PULL 28/42] target/arm: Implement VLLDM for v7M CPUs with an FPU
+[PULL 19/44] target/arm: Implement MVE shift-by-scalar
-Implement the VLLDM instruction for v7M for the FPU present case.
+Implement the MVE instructions which perform shifts by a scalar.
 These are VSHL T2, VRSHL T2, VQSHL T1 and VQRSHL T2.  They take the
 shift amount in a general purpose register and shift every element in
 the vector by that amount.
 Mostly we can reuse the helper functions for shift-by-immediate; we
 do need two new helpers for VQRSHL.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-26-peter.maydell@linaro.org
 ---
- target/arm/helper.h    |  1 +
+ target/arm/helper-mve.h    |  8 +++++++
- target/arm/helper.c    | 54 ++++++++++++++++++++++++++++++++++++++++++
+ target/arm/mve.decode      | 23 ++++++++++++++++---
- target/arm/translate.c |  2 +-
+ target/arm/mve_helper.c    |  2 ++
-files changed, 56 insertions(+), 1 deletion(-)
+ target/arm/translate-mve.c | 46 ++++++++++++++++++++++++++++++++++++++
 files changed, 76 insertions(+), 3 deletions(-)
-diff --git a/target/arm/helper.h b/target/arm/helper.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(v7m_tt, i32, env, i32, i32)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- DEF_HELPER_1(v7m_preserve_fp_state, void, env)
+ DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- DEF_HELPER_2(v7m_vlstm, void, env, i32)
-+DEF_HELPER_2(v7m_vlldm, void, env, i32)
++DEF_HELPER_FLAGS_4(mve_vqrshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vqrshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- DEF_HELPER_2(v8m_stackcheck, void, env, i32)
++DEF_HELPER_FLAGS_4(mve_vqrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++
-diff --git a/target/arm/helper.c b/target/arm/helper.c
++DEF_HELPER_FLAGS_4(mve_vqrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_4(mve_vshllbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vshllbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vshllbub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/mve.decode
-+++ b/target/arm/helper.c
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
+@@ -XXX,XX +XXX,XX @@
-     g_assert_not_reached();
+ &viwdup qd rn rm size imm
- }
+ &vcmp qm qn size mask
+ &vcmp_scalar qn rm size mask
-+void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
++&shl_scalar qda rm size
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@
  @2_shr_w .... .... .. 1 ..... .... .... .... .... &2shift qd=%qd qm=%qm \
           size=2 shift=%rshift_i5
 +@shl_scalar .... .... .... size:2 .. .... .... .... rm:4 &shl_scalar qda=%qd
 +
  # Vector comparison; 4-bit Qm but 3-bit Qn
  %mask_22_13      22:1 13:3
  @vcmp    .... .... .. size:2 qn:3 . .... .... .... .... &vcmp qm=%qm mask=%mask_22_13
@@ -XXX,XX +XXX,XX @@ VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_no
  VADD_scalar      1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
  VSUB_scalar      1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
 -VMUL_scalar      1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 +
 +{
-+    /* translate.c should never generate calls here in user-only mode */
++  VSHL_S_scalar   1110 1110 0 . 11 .. 01 ... 1 1110 0110 .... @shl_scalar
-+    g_assert_not_reached();
++  VRSHL_S_scalar  1110 1110 0 . 11 .. 11 ... 1 1110 0110 .... @shl_scalar
 +  VQSHL_S_scalar  1110 1110 0 . 11 .. 01 ... 1 1110 1110 .... @shl_scalar
 +  VQRSHL_S_scalar 1110 1110 0 . 11 .. 11 ... 1 1110 1110 .... @shl_scalar
 +  VMUL_scalar     1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 +}
 +
- uint32_t HELPER(v7m_tt)(CPUARMState *env, uint32_t addr, uint32_t op)
++{
- {
++  VSHL_U_scalar   1111 1110 0 . 11 .. 01 ... 1 1110 0110 .... @shl_scalar
-     /* The TT instructions can be used by unprivileged code, but in
++  VRSHL_U_scalar  1111 1110 0 . 11 .. 11 ... 1 1110 0110 .... @shl_scalar
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
++  VQSHL_U_scalar  1111 1110 0 . 11 .. 01 ... 1 1110 1110 .... @shl_scalar
-     env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_FPCA_MASK;
++  VQRSHL_U_scalar 1111 1110 0 . 11 .. 11 ... 1 1110 1110 .... @shl_scalar
 +  VBRSR           1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 +}
 +
  VHADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
  VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
  VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
@@ -XXX,XX +XXX,XX @@ VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
                    size=%size_28
  }
-+void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
+-VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 -
  VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
  DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
  DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
  DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
 +DO_2SHIFT_SAT_U(vqrshli_u, DO_UQRSHL_OP)
 +DO_2SHIFT_SAT_S(vqrshli_s, DO_SQRSHL_OP)
  /* Shift-and-insert; we always work with 64 bits at a time */
  #define DO_2SHIFT_INSERT(OP, ESIZE, SHIFTFN, MASKFN)                    \
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VRSHRI_U, vrshli_u, true)
  DO_2SHIFT(VSRI, vsri, false)
  DO_2SHIFT(VSLI, vsli, false)
 +static bool do_2shift_scalar(DisasContext *s, arg_shl_scalar *a,
 +                             MVEGenTwoOpShiftFn *fn)
 +{
-+    /* fptr is the value of Rn, the frame pointer we load the FP regs from */
++    TCGv_ptr qda;
-+    assert(env->v7m.secure);
++    TCGv_i32 rm;
 +
-+    if (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)) {
++    if (!dc_isar_feature(aa32_mve, s) ||
-+        return;
++        !mve_check_qreg_bank(s, a->qda) ||
 +        a->rm == 13 || a->rm == 15 || !fn) {
 +        /* Rm cases are UNPREDICTABLE */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
-+    /* Check access to the coprocessor is permitted */
++    qda = mve_qreg_ptr(a->qda);
-+    if (!v7m_cpacr_pass(env, true, arm_current_el(env) != 0)) {
++    rm = load_reg(s, a->rm);
-+        raise_exception_ra(env, EXCP_NOCP, 0, 1, GETPC());
++    fn(cpu_env, qda, qda, rm);
 +    tcg_temp_free_ptr(qda);
 +    tcg_temp_free_i32(rm);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +#define DO_2SHIFT_SCALAR(INSN, FN)                                      \
 +    static bool trans_##INSN(DisasContext *s, arg_shl_scalar *a)        \
 +    {                                                                   \
 +        static MVEGenTwoOpShiftFn * const fns[] = {                     \
 +            gen_helper_mve_##FN##b,                                     \
 +            gen_helper_mve_##FN##h,                                     \
 +            gen_helper_mve_##FN##w,                                     \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_2shift_scalar(s, a, fns[a->size]);                    \
 +    }
 +
-+    if (env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPACT_MASK) {
++DO_2SHIFT_SCALAR(VSHL_S_scalar, vshli_s)
-+        /* State in FP is still valid */
++DO_2SHIFT_SCALAR(VSHL_U_scalar, vshli_u)
-+        env->v7m.fpccr[M_REG_S] &= ~R_V7M_FPCCR_LSPACT_MASK;
++DO_2SHIFT_SCALAR(VRSHL_S_scalar, vrshli_s)
-+    } else {
++DO_2SHIFT_SCALAR(VRSHL_U_scalar, vrshli_u)
-+        bool ts = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_TS_MASK;
++DO_2SHIFT_SCALAR(VQSHL_S_scalar, vqshli_s)
-+        int i;
++DO_2SHIFT_SCALAR(VQSHL_U_scalar, vqshli_u)
-+        uint32_t fpscr;
++DO_2SHIFT_SCALAR(VQRSHL_S_scalar, vqrshli_s)
 +DO_2SHIFT_SCALAR(VQRSHL_U_scalar, vqrshli_u)
 +
-+        if (fptr & 7) {
+ #define DO_VSHLL(INSN, FN)                                      \
-+            raise_exception_ra(env, EXCP_UNALIGNED, 0, 1, GETPC());
+     static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
-+        }
+     {                                                           \
 +
 +        for (i = 0; i < (ts ? 32 : 16); i += 2) {
 +            uint32_t slo, shi;
 +            uint64_t dn;
 +            uint32_t faddr = fptr + 4 * i;
 +
 +            if (i >= 16) {
 +                faddr += 8; /* skip the slot for the FPSCR */
 +            }
 +
 +            slo = cpu_ldl_data(env, faddr);
 +            shi = cpu_ldl_data(env, faddr + 4);
 +
 +            dn = (uint64_t) shi << 32 | slo;
 +            *aa32_vfp_dreg(env, i / 2) = dn;
 +        }
 +        fpscr = cpu_ldl_data(env, fptr + 0x40);
 +        vfp_set_fpscr(env, fpscr);
 +    }
 +
 +    env->v7m.control[M_REG_S] |= R_V7M_CONTROL_FPCA_MASK;
 +}
 +
  static bool v7m_push_stack(ARMCPU *cpu)
  {
      /* Do the "set up stack frame" part of exception entry,
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
                      TCGv_i32 fptr = load_reg(s, rn);
                      if (extract32(insn, 20, 1)) {
 -                        /* VLLDM */
 +                        gen_helper_v7m_vlldm(cpu_env, fptr);
                      } else {
                          gen_helper_v7m_vlstm(cpu_env, fptr);
                      }
 --
 .20.1

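As a rough illustration of the saturating rounding shift performed by the new VQRSHL helpers, here is the operation for one signed 16-bit element in C. It is a sketch under stated assumptions, not the QEMU helper. `sqrshl16_sketch` is a hypothetical name. The shift count is assumed to be the signed low byte of Rm. Saturation is reported through `*sat` as a stand-in for the cumulative saturation flag. The code also relies on arithmetic right shift of negative values, which GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model (not QEMU code) of one 16-bit element of a signed
 * saturating rounding shift by a register value.
 */
static int16_t sqrshl16_sketch(int16_t x, uint32_t rm, bool *sat)
{
    /* Assumption: the shift count is the low byte of Rm, read as signed. */
    int shift = (int8_t)(rm & 0xff);

    if (shift <= -16) {
        /* a rounding right shift by >= the element width always gives 0 */
        return 0;
    }
    if (shift < 0) {
        /* rounding right shift: add half an LSB, then shift (round half up);
         * assumes >> on a negative int32_t is arithmetic */
        int n = -shift;
        int32_t v = x;
        return (int16_t)((v + (1 << (n - 1))) >> n);
    }
    if (x == 0) {
        return 0;                        /* zero never saturates */
    }
    if (shift >= 16) {
        *sat = true;                     /* any non-zero value overflows */
        return x > 0 ? INT16_MAX : INT16_MIN;
    }
    /* multiply rather than << so that a negative x is well defined */
    int32_t v = (int32_t)x * (1 << shift);
    if (v > INT16_MAX) {
        *sat = true;
        return INT16_MAX;
    }
    if (v < INT16_MIN) {
        *sat = true;
        return INT16_MIN;
    }
    return (int16_t)v;
}
```

In this model, positive counts shift left with saturation. Negative counts shift right with rounding: for example, 5 shifted by -1 gives 3, and -3 shifted by -1 gives -1.
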
-[Qemu-devel] [PULL 21/42] target/arm: Set FPCCR.S when executing M-profile floating point insns
+[PULL 20/44] target/arm: Move 'x' and 'a' bit definitions into vmlaldav formats
-The M-profile FPCCR.S bit indicates the security status of
+All the users of the vmlaldav formats have an 'x' bit in bit 12 and an
-the floating point context. In the pseudocode ExecuteFPCheck()
+'a' bit in bit 5; move these to the format rather than specifying them
-function it is unconditionally set to match the current
+in each insn pattern.
 security state whenever a floating point instruction is
 executed.
 Implement this by adding a new TB flag which tracks whether
 FPCCR.S is different from the current security state, so
 that we only need to emit the code to update it in the
 less-common case when it is not already set correctly.
 Note that we will add the handling for the other work done
 by ExecuteFPCheck() in later commits.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-19-peter.maydell@linaro.org
 ---
- target/arm/cpu.h       |  2 ++
+ target/arm/mve.decode | 16 ++++++++--------
- target/arm/translate.h |  1 +
+file changed, 8 insertions(+), 8 deletions(-)
  target/arm/helper.c    |  5 +++++
  target/arm/translate.c | 20 ++++++++++++++++++++
 files changed, 28 insertions(+)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/mve.decode
-+++ b/target/arm/cpu.h
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, NS, 6, 1)
+@@ -XXX,XX +XXX,XX @@ VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
- FIELD(TBFLAG_A32, VFPEN, 7, 1)
- FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
+ &vmlaldav rdahi rdalo size qn qm x a
- FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
-+/* For M profile only, set if FPCCR.S does not match current security state */
+-@vmlaldav        .... .... . ... ... . ... . .... .... qm:3 . \
-+FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1)
++@vmlaldav        .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
- /* For M profile only, Handler (ie not Thread) mode */
+                  qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
- FIELD(TBFLAG_A32, HANDLER, 21, 1)
+-@vmlaldav_nosz   .... .... . ... ... . ... . .... .... qm:3 . \
- /* For M profile only, whether we should generate stack-limit checks */
++@vmlaldav_nosz   .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
-diff --git a/target/arm/translate.h b/target/arm/translate.h
+                  qn=%qn rdahi=%rdahi rdalo=%rdalo size=0 &vmlaldav
-index XXXXXXX..XXXXXXX 100644
+-VMLALDAV_S       1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
---- a/target/arm/translate.h
+-VMLALDAV_U       1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
-+++ b/target/arm/translate.h
++VMLALDAV_S       1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
-@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
++VMLALDAV_U       1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
-     bool v7m_handler_mode;
-     bool v8m_secure; /* true if v8M and we're in Secure mode */
+-VMLSLDAV         1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav
-     bool v8m_stackcheck; /* true if we need to perform v8M stack limit checks */
++VMLSLDAV         1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
-+    bool v8m_fpccr_s_wrong; /* true if v8M FPCCR.S != v8m_secure */
-     /* Immediate value in AArch32 SVC insn; must be set if is_jmp == DISAS_SWI
+-VRMLALDAVH_S     1110 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
-      * so that top level loop can generate correct syndrome information.
+-VRMLALDAVH_U     1111 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
-      */
++VRMLALDAVH_S     1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
-diff --git a/target/arm/helper.c b/target/arm/helper.c
++VRMLALDAVH_U     1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+-VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_nosz
-+++ b/target/arm/helper.c
++VRMLSLDAVH       1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-         flags = FIELD_DP32(flags, TBFLAG_A32, STACKCHECK, 1);
+ # Scalar operations
      }
 +    if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
 +        FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S) != env->v7m.secure) {
 +        flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
 +    }
 +
      *pflags = flags;
      *cs_base = 0;
  }
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
          }
      }
 +    if (arm_dc_feature(s, ARM_FEATURE_M)) {
 +        /* Handle M-profile lazy FP state mechanics */
 +
 +        /* Update ownership of FP context: set FPCCR.S to match current state */
 +        if (s->v8m_fpccr_s_wrong) {
 +            TCGv_i32 tmp;
 +
 +            tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
 +            if (s->v8m_secure) {
 +                tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
 +            } else {
 +                tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
 +            }
 +            store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
 +            /* Don't need to do this for any further FP insns in this TB */
 +            s->v8m_fpccr_s_wrong = false;
 +        }
 +    }
 +
      if (extract32(insn, 28, 4) == 0xf) {
          /*
           * Encodings with T=1 (Thumb) or unconditional (ARM):
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
      dc->v8m_secure = arm_feature(env, ARM_FEATURE_M_SECURITY) &&
          regime_is_secure(env, dc->mmu_idx);
      dc->v8m_stackcheck = FIELD_EX32(tb_flags, TBFLAG_A32, STACKCHECK);
 +    dc->v8m_fpccr_s_wrong = FIELD_EX32(tb_flags, TBFLAG_A32, FPCCR_S_WRONG);
      dc->cp_regs = cpu->cp_regs;
      dc->features = env->features;
 --
 .20.1

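To make the format change concrete, the following sketch extracts the two fields that @vmlaldav and @vmlaldav_nosz now capture from a 32-bit instruction word: x from bit 12 and a from bit 5. The names `field`, `struct vmlaldav_xa` and `decode_vmlaldav_xa` are hypothetical; in QEMU, the decodetree-generated decoder does this extraction. Reading x as "exchange" and a as "accumulate" follows the VMLALDAV{A}{X} mnemonic forms.

```c
#include <stdint.h>

/*
 * Local bit-field extractor, for illustration only. QEMU has its own
 * extract32() in include/qemu/bitops.h.
 */
static uint32_t field(uint32_t insn, int start, int len)
{
    return (insn >> start) & ((1u << len) - 1);
}

/* Hypothetical container for the two fields the shared format captures. */
struct vmlaldav_xa {
    unsigned x;   /* bit 12: exchange (pair elements crosswise) */
    unsigned a;   /* bit 5: accumulate into RdaHi:RdaLo */
};

static struct vmlaldav_xa decode_vmlaldav_xa(uint32_t insn)
{
    struct vmlaldav_xa f = { field(insn, 12, 1), field(insn, 5, 1) };
    return f;
}
```

The word 0xEE800E00 matches the fixed bits of the VMLALDAV_S pattern and decodes to x = 0, a = 0. Setting bits 12 and 5 gives 0xEE801E20, which decodes to x = 1, a = 1.
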
-[Qemu-devel] [PULL 25/42] target/arm: Add lazy-FP-stacking support to v7m_stack_write()
+[PULL 21/44] target/arm: Implement MVE integer min/max across vector
-Pushing registers to the stack for v7M needs to handle three cases:
+Implement the MVE integer min/max across vector insns
- * the "normal" case where we pend exceptions
+VMAXV, VMINV, VMAXAV and VMINAV, which find the maximum
- * an "ignore faults" case where we set FSR bits but
+from the vector elements and a general purpose register,
-   do not pend exceptions (this is used when we are
+and store the maximum back into the general purpose
-   handling some kinds of derived exception on exception entry)
+register.
- * a "lazy FP stacking" case, where different FSR bits
-   are set and the exception is pended differently
+These insns overlap with VRMLALDAVH (they use what would
+be RdaHi=0b110).
 Implement this by changing the existing flag argument that
 tells us whether to ignore faults or not into an enum that
 specifies which of the 3 modes we should handle.
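The reduction these insns perform can be sketched for the signed byte form, VMAXV.S8. This is an illustrative model, not the QEMU helper. `vmaxv_s8_sketch` is a hypothetical name, and inactive elements (mask bit clear) are simply skipped. Two further assumptions of the sketch: the incoming register is read as its sign-extended low byte, and the result is returned sign-extended to 32 bits.

```c
#include <stdint.h>

/*
 * Hypothetical model of VMAXV.S8 (not QEMU code): the result is the
 * maximum of the incoming general-purpose register value and every
 * active byte element of Qm.
 */
static uint32_t vmaxv_s8_sketch(const int8_t qm[16], uint32_t rda,
                                uint16_t mask)
{
    /* Assumption: only the low byte of Rda takes part, read as signed. */
    int32_t best = (int8_t)(rda & 0xff);

    for (int e = 0; e < 16; e++) {
        /* byte elements: one predication mask bit per element */
        if (((mask >> e) & 1) && qm[e] > best) {
            best = qm[e];
        }
    }
    /* Assumption: the result is written back sign-extended. */
    return (uint32_t)best;
}
```

Take elements {1, -5, 100, 3, 0, ...} and Rda = 50. The result is 100 when all lanes are active, and 50 when element 2 is predicated off.
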
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-23-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 118 +++++++++++++++++++++++++++++---------------
+ target/arm/helper-mve.h    | 20 ++++++++++++
-file changed, 79 insertions(+), 39 deletions(-)
+ target/arm/mve.decode      | 18 +++++++++--
+ target/arm/mve_helper.c    | 66 ++++++++++++++++++++++++++++++++++++++
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+ target/arm/translate-mve.c | 48 +++++++++++++++++++++++++++
-index XXXXXXX..XXXXXXX 100644
+files changed, 150 insertions(+), 2 deletions(-)
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static bool v7m_cpacr_pass(CPUARMState *env, bool is_secure, bool is_priv)
+index XXXXXXX..XXXXXXX 100644
-     }
+--- a/target/arm/helper-mve.h
- }
++++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
  DEF_HELPER_FLAGS_3(mve_vaddvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
  DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vmaxvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vmaxvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vmaxvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vmaxvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vmaxvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vmaxvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vmaxavb, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vmaxavh, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vmaxavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vminvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminavb, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminavh, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vminavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +
  DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
  DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  &vcmp qm qn size mask
  &vcmp_scalar qn rm size mask
  &shl_scalar qda rm size
 +&vmaxv qm rda size
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@
  @vcmp_scalar .... .... .. size:2 qn:3 . .... .... .... rm:4 &vcmp_scalar \
               mask=%mask_22_13
 +@vmaxv .... .... .... size:2 .. rda:4 .... .... .... &vmaxv qm=%qm
 +
  # Vector loads and stores
  # Widening loads and narrowing stores:
@@ -XXX,XX +XXX,XX @@ VMLALDAV_U       1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
  VMLSLDAV         1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
 -VRMLALDAVH_S     1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
 -VRMLALDAVH_U     1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
 +{
 +  VMAXV_S        1110 1110 1110  .. 10 ....  1111 0 0 . 0 ... 0 @vmaxv
 +  VMINV_S        1110 1110 1110  .. 10 ....  1111 1 0 . 0 ... 0 @vmaxv
 +  VMAXAV         1110 1110 1110  .. 00 ....  1111 0 0 . 0 ... 0 @vmaxv
 +  VMINAV         1110 1110 1110  .. 00 ....  1111 1 0 . 0 ... 0 @vmaxv
 +  VRMLALDAVH_S   1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
 +}
 +
 +{
 +  VMAXV_U        1111 1110 1110  .. 10 ....  1111 0 0 . 0 ... 0 @vmaxv
 +  VMINV_U        1111 1110 1110  .. 10 ....  1111 1 0 . 0 ... 0 @vmaxv
 +  VRMLALDAVH_U   1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
 +}
  VRMLSLDAVH       1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvub, 1, uint8_t)
  DO_VADDV(vaddvuh, 2, uint16_t)
  DO_VADDV(vaddvuw, 4, uint32_t)
 +/*
-+ * What kind of stack write are we doing? This affects how exceptions
++ * Vector max/min across vector. Unlike VADDV, we must
-+ * generated during the stacking are treated.
++ * read ra as the element size, not its full width.
 + * We work with int64_t internally for simplicity.
 + */
-+typedef enum StackingMode {
++#define DO_VMAXMINV(OP, ESIZE, TYPE, RATYPE, FN)                \
-+    STACK_NORMAL,
++    uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
-+    STACK_IGNFAULTS,
++                                    uint32_t ra_in)             \
-+    STACK_LAZYFP,
++    {                                                           \
-+} StackingMode;
++        uint16_t mask = mve_element_mask(env);                  \
-+
++        unsigned e;                                             \
- static bool v7m_stack_write(ARMCPU *cpu, uint32_t addr, uint32_t value,
++        TYPE *m = vm;                                           \
--                            ARMMMUIdx mmu_idx, bool ignfault)
++        int64_t ra = (RATYPE)ra_in;                             \
-+                            ARMMMUIdx mmu_idx, StackingMode mode)
++        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
- {
++            if (mask & 1) {                                     \
-     CPUState *cs = CPU(cpu);
++                ra = FN(ra, m[H##ESIZE(e)]);                    \
-     CPUARMState *env = &cpu->env;
++            }                                                   \
-@@ -XXX,XX +XXX,XX @@ static bool v7m_stack_write(ARMCPU *cpu, uint32_t addr, uint32_t value,
++        }                                                       \
-                       &attrs, &prot, &page_size, &fi, NULL)) {
++        mve_advance_vpt(env);                                   \
-         /* MPU/SAU lookup failed */
++        return ra;                                              \
-         if (fi.type == ARMFault_QEMU_SFault) {
++    }                                                           \
--            qemu_log_mask(CPU_LOG_INT,
++
--                          "...SecureFault with SFSR.AUVIOL during stacking\n");
++#define DO_VMAXMINV_U(INSN, FN)                         \
--            env->v7m.sfsr |= R_V7M_SFSR_AUVIOL_MASK | R_V7M_SFSR_SFARVALID_MASK;
++    DO_VMAXMINV(INSN##b, 1, uint8_t, uint8_t, FN)       \
-+            if (mode == STACK_LAZYFP) {
++    DO_VMAXMINV(INSN##h, 2, uint16_t, uint16_t, FN)     \
-+                qemu_log_mask(CPU_LOG_INT,
++    DO_VMAXMINV(INSN##w, 4, uint32_t, uint32_t, FN)
-+                              "...SecureFault with SFSR.LSPERR "
++#define DO_VMAXMINV_S(INSN, FN)                         \
-+                              "during lazy stacking\n");
++    DO_VMAXMINV(INSN##b, 1, int8_t, int8_t, FN)         \
-+                env->v7m.sfsr |= R_V7M_SFSR_LSPERR_MASK;
++    DO_VMAXMINV(INSN##h, 2, int16_t, int16_t, FN)       \
-+            } else {
++    DO_VMAXMINV(INSN##w, 4, int32_t, int32_t, FN)
-+                qemu_log_mask(CPU_LOG_INT,
++
-+                              "...SecureFault with SFSR.AUVIOL "
++/*
-+                              "during stacking\n");
++ * Helpers for max and min of absolute values across vector:
-+                env->v7m.sfsr |= R_V7M_SFSR_AUVIOL_MASK;
++ * note that we only take the absolute value of 'm', not 'n'
-+            }
++ */
-+            env->v7m.sfsr |= R_V7M_SFSR_SFARVALID_MASK;
++static int64_t do_maxa(int64_t n, int64_t m)
-             env->v7m.sfar = addr;
++{
-             exc = ARMV7M_EXCP_SECURE;
++    if (m < 0) {
-             exc_secure = false;
++        m = -m;
-         } else {
++    }
--            qemu_log_mask(CPU_LOG_INT, "...MemManageFault with CFSR.MSTKERR\n");
++    return MAX(n, m);
--            env->v7m.cfsr[secure] |= R_V7M_CFSR_MSTKERR_MASK;
++}
-+            if (mode == STACK_LAZYFP) {
++
-+                qemu_log_mask(CPU_LOG_INT,
++static int64_t do_mina(int64_t n, int64_t m)
-+                              "...MemManageFault with CFSR.MLSPERR\n");
++{
-+                env->v7m.cfsr[secure] |= R_V7M_CFSR_MLSPERR_MASK;
++    if (m < 0) {
-+            } else {
++        m = -m;
-+                qemu_log_mask(CPU_LOG_INT,
++    }
-+                              "...MemManageFault with CFSR.MSTKERR\n");
++    return MIN(n, m);
-+                env->v7m.cfsr[secure] |= R_V7M_CFSR_MSTKERR_MASK;
++}
-+            }
++
-             exc = ARMV7M_EXCP_MEM;
++DO_VMAXMINV_S(vmaxvs, DO_MAX)
-             exc_secure = secure;
++DO_VMAXMINV_U(vmaxvu, DO_MAX)
-         }
++DO_VMAXMINV_S(vminvs, DO_MIN)
-@@ -XXX,XX +XXX,XX @@ static bool v7m_stack_write(ARMCPU *cpu, uint32_t addr, uint32_t value,
++DO_VMAXMINV_U(vminvu, DO_MIN)
-                          attrs, &txres);
++/*
-     if (txres != MEMTX_OK) {
++ * VMAXAV, VMINAV treat the general purpose input as unsigned
-         /* BusFault trying to write the data */
++ * and the vector elements as signed.
--        qemu_log_mask(CPU_LOG_INT, "...BusFault with BFSR.STKERR\n");
++ */
--        env->v7m.cfsr[M_REG_NS] |= R_V7M_CFSR_STKERR_MASK;
++DO_VMAXMINV(vmaxavb, 1, int8_t, uint8_t, do_maxa)
-+        if (mode == STACK_LAZYFP) {
++DO_VMAXMINV(vmaxavh, 2, int16_t, uint16_t, do_maxa)
-+            qemu_log_mask(CPU_LOG_INT, "...BusFault with BFSR.LSPERR\n");
++DO_VMAXMINV(vmaxavw, 4, int32_t, uint32_t, do_maxa)
-+            env->v7m.cfsr[M_REG_NS] |= R_V7M_CFSR_LSPERR_MASK;
++DO_VMAXMINV(vminavb, 1, int8_t, uint8_t, do_mina)
-+        } else {
++DO_VMAXMINV(vminavh, 2, int16_t, uint16_t, do_mina)
-+            qemu_log_mask(CPU_LOG_INT, "...BusFault with BFSR.STKERR\n");
++DO_VMAXMINV(vminavw, 4, int32_t, uint32_t, do_mina)
-+            env->v7m.cfsr[M_REG_NS] |= R_V7M_CFSR_STKERR_MASK;
++
-+        }
+ #define DO_VADDLV(OP, TYPE, LTYPE)                              \
-         exc = ARMV7M_EXCP_BUS;
+     uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
-         exc_secure = false;
+                                     uint64_t ra)                \
-         goto pend_fault;
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ pend_fault:
+index XXXXXXX..XXXXXXX 100644
-      * later if we have two derived exceptions.
+--- a/target/arm/translate-mve.c
-      * The only case when we must not pend the exception but instead
++++ b/target/arm/translate-mve.c
-      * throw it away is if we are doing the push of the callee registers
+@@ -XXX,XX +XXX,XX @@ DO_VCMP(VCMPGE, vcmpge)
--     * and we've already generated a derived exception. Even in this
+ DO_VCMP(VCMPLT, vcmplt)
--     * case we will still update the fault status registers.
+ DO_VCMP(VCMPGT, vcmpgt)
-+     * and we've already generated a derived exception (this is indicated
+ DO_VCMP(VCMPLE, vcmple)
-+     * by the caller passing STACK_IGNFAULTS). Even in this case we will
++
-+     * still update the fault status registers.
++static bool do_vmaxv(DisasContext *s, arg_vmaxv *a, MVEGenVADDVFn fn)
-      */
++{
--    if (!ignfault) {
++    /*
-+    switch (mode) {
++     * MIN/MAX operations across a vector: compute the min or
-+    case STACK_NORMAL:
++     * max of the initial value in a general purpose register
-         armv7m_nvic_set_pending_derived(env->nvic, exc, exc_secure);
++     * and all the elements in the vector, and store it back
-+        break;
++     * into the general purpose register.
-+    case STACK_LAZYFP:
++     */
-+        armv7m_nvic_set_pending_lazyfp(env->nvic, exc, exc_secure);
++    TCGv_ptr qm;
-+        break;
++    TCGv_i32 rda;
-+    case STACK_IGNFAULTS:
++
-+        break;
++    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qm) ||
-     }
++        !fn || a->rda == 13 || a->rda == 15) {
-     return false;
++        /* Rda cases are UNPREDICTABLE */
- }
++        return false;
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_callee_stack(ARMCPU *cpu, uint32_t lr, bool dotailchain,
++    }
-     uint32_t limit;
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-     bool want_psp;
++        return true;
-     uint32_t sig;
++    }
-+    StackingMode smode = ignore_faults ? STACK_IGNFAULTS : STACK_NORMAL;
++
++    qm = mve_qreg_ptr(a->qm);
-     if (dotailchain) {
++    rda = load_reg(s, a->rda);
-         bool mode = lr & R_V7M_EXCRET_MODE_MASK;
++    fn(rda, cpu_env, qm, rda);
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_callee_stack(ARMCPU *cpu, uint32_t lr, bool dotailchain,
++    store_reg(s, a->rda, rda);
-      */
++    tcg_temp_free_ptr(qm);
-     sig = v7m_integrity_sig(env, lr);
++    mve_update_eci(s);
-     stacked_ok =
++    return true;
--        v7m_stack_write(cpu, frameptr, sig, mmu_idx, ignore_faults) &&
++}
--        v7m_stack_write(cpu, frameptr + 0x8, env->regs[4], mmu_idx,
++
--                        ignore_faults) &&
++#define DO_VMAXV(INSN, FN)                                      \
--        v7m_stack_write(cpu, frameptr + 0xc, env->regs[5], mmu_idx,
++    static bool trans_##INSN(DisasContext *s, arg_vmaxv *a)     \
--                        ignore_faults) &&
++    {                                                           \
--        v7m_stack_write(cpu, frameptr + 0x10, env->regs[6], mmu_idx,
++        static MVEGenVADDVFn * const fns[] = {                  \
--                        ignore_faults) &&
++            gen_helper_mve_##FN##b,                             \
--        v7m_stack_write(cpu, frameptr + 0x14, env->regs[7], mmu_idx,
++            gen_helper_mve_##FN##h,                             \
--                        ignore_faults) &&
++            gen_helper_mve_##FN##w,                             \
--        v7m_stack_write(cpu, frameptr + 0x18, env->regs[8], mmu_idx,
++            NULL,                                               \
--                        ignore_faults) &&
++        };                                                      \
--        v7m_stack_write(cpu, frameptr + 0x1c, env->regs[9], mmu_idx,
++        return do_vmaxv(s, a, fns[a->size]);                    \
--                        ignore_faults) &&
++    }
--        v7m_stack_write(cpu, frameptr + 0x20, env->regs[10], mmu_idx,
++
--                        ignore_faults) &&
++DO_VMAXV(VMAXV_S, vmaxvs)
--        v7m_stack_write(cpu, frameptr + 0x24, env->regs[11], mmu_idx,
++DO_VMAXV(VMAXV_U, vmaxvu)
--                        ignore_faults);
++DO_VMAXV(VMAXAV, vmaxav)
-+        v7m_stack_write(cpu, frameptr, sig, mmu_idx, smode) &&
++DO_VMAXV(VMINV_S, vminvs)
-+        v7m_stack_write(cpu, frameptr + 0x8, env->regs[4], mmu_idx, smode) &&
++DO_VMAXV(VMINV_U, vminvu)
-+        v7m_stack_write(cpu, frameptr + 0xc, env->regs[5], mmu_idx, smode) &&
++DO_VMAXV(VMINAV, vminav)
 +        v7m_stack_write(cpu, frameptr + 0x10, env->regs[6], mmu_idx, smode) &&
 +        v7m_stack_write(cpu, frameptr + 0x14, env->regs[7], mmu_idx, smode) &&
 +        v7m_stack_write(cpu, frameptr + 0x18, env->regs[8], mmu_idx, smode) &&
 +        v7m_stack_write(cpu, frameptr + 0x1c, env->regs[9], mmu_idx, smode) &&
 +        v7m_stack_write(cpu, frameptr + 0x20, env->regs[10], mmu_idx, smode) &&
 +        v7m_stack_write(cpu, frameptr + 0x24, env->regs[11], mmu_idx, smode);
      /* Update SP regardless of whether any of the stack accesses failed. */
      *frame_sp_p = frameptr;
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
       * if it has higher priority).
       */
      stacked_ok = stacked_ok &&
 -        v7m_stack_write(cpu, frameptr, env->regs[0], mmu_idx, false) &&
 -        v7m_stack_write(cpu, frameptr + 4, env->regs[1], mmu_idx, false) &&
 -        v7m_stack_write(cpu, frameptr + 8, env->regs[2], mmu_idx, false) &&
 -        v7m_stack_write(cpu, frameptr + 12, env->regs[3], mmu_idx, false) &&
 -        v7m_stack_write(cpu, frameptr + 16, env->regs[12], mmu_idx, false) &&
 -        v7m_stack_write(cpu, frameptr + 20, env->regs[14], mmu_idx, false) &&
 -        v7m_stack_write(cpu, frameptr + 24, env->regs[15], mmu_idx, false) &&
 -        v7m_stack_write(cpu, frameptr + 28, xpsr, mmu_idx, false);
 +        v7m_stack_write(cpu, frameptr, env->regs[0], mmu_idx, STACK_NORMAL) &&
 +        v7m_stack_write(cpu, frameptr + 4, env->regs[1],
 +                        mmu_idx, STACK_NORMAL) &&
 +        v7m_stack_write(cpu, frameptr + 8, env->regs[2],
 +                        mmu_idx, STACK_NORMAL) &&
 +        v7m_stack_write(cpu, frameptr + 12, env->regs[3],
 +                        mmu_idx, STACK_NORMAL) &&
 +        v7m_stack_write(cpu, frameptr + 16, env->regs[12],
 +                        mmu_idx, STACK_NORMAL) &&
 +        v7m_stack_write(cpu, frameptr + 20, env->regs[14],
 +                        mmu_idx, STACK_NORMAL) &&
 +        v7m_stack_write(cpu, frameptr + 24, env->regs[15],
 +                        mmu_idx, STACK_NORMAL) &&
 +        v7m_stack_write(cpu, frameptr + 28, xpsr, mmu_idx, STACK_NORMAL);
      if (env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) {
          /* FPU is active, try to save its registers */
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
                          faddr += 8; /* skip the slot for the FPSCR */
                      }
                      stacked_ok = stacked_ok &&
 -                        v7m_stack_write(cpu, faddr, slo, mmu_idx, false) &&
 -                        v7m_stack_write(cpu, faddr + 4, shi, mmu_idx, false);
 +                        v7m_stack_write(cpu, faddr, slo,
 +                                        mmu_idx, STACK_NORMAL) &&
 +                        v7m_stack_write(cpu, faddr + 4, shi,
 +                                        mmu_idx, STACK_NORMAL);
                  }
                  stacked_ok = stacked_ok &&
                      v7m_stack_write(cpu, frameptr + 0x60,
 -                                    vfp_get_fpscr(env), mmu_idx, false);
 +                                    vfp_get_fpscr(env), mmu_idx, STACK_NORMAL);
                  if (cpacr_pass) {
                      for (i = 0; i < ((framesize == 0xa8) ? 32 : 16); i += 2) {
                          *aa32_vfp_dreg(env, i / 2) = 0;
 --
 .20.1

-[Qemu-devel] [PULL 24/42] target/arm: New function armv7m_nvic_set_pending_lazyfp()
+[PULL 22/44] target/arm: Implement MVE VABAV
-In the v7M architecture, if an exception is generated in the process
+Implement the MVE VABAV insn, which computes absolute differences
-of doing the lazy stacking of FP registers, the handling of
+between elements of two vectors and accumulates the result into
-possible escalation to HardFault is treated differently to the normal
+a general purpose register.
-approach: it works based on the saved information about exception
-readiness that was stored in the FPCCR when the stack frame was
-created. Provide a new function armv7m_nvic_set_pending_lazyfp()
-which pends exceptions during lazy stacking, and implements
-this logic.
-This corresponds to the pseudocode TakePreserveFPException().
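A minimal C model of what VABAV computes may help when reading the DO_VABAV helper. It is a sketch, not QEMU code: the function names are hypothetical, only the 8-bit forms are shown, vectors are plain 16-byte arrays (H1() host-endianness handling is ignored), and there is one predicate bit per byte lane. As in the helper, each difference is formed in 64 bits and then added to a 32-bit accumulator, which wraps.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model (not QEMU code) of VABAV.S8 and VABAV.U8: for each
 * active byte lane, add |Qn[e] - Qm[e]| to Rda. The difference is formed
 * in 64 bits so it cannot overflow; the 32-bit sum wraps modulo 2^32.
 * One predicate bit per byte lane; host byte order is ignored.
 */
uint32_t model_vabav_s8(const int8_t n[16], const int8_t m[16],
                        uint16_t mask, uint32_t rda)
{
    for (int e = 0; e < 16; e++, mask >>= 1) {
        if (mask & 1) {
            int64_t n0 = n[e], m0 = m[e];
            rda += (uint32_t)(n0 >= m0 ? n0 - m0 : m0 - n0);
        }
    }
    return rda;
}

uint32_t model_vabav_u8(const uint8_t n[16], const uint8_t m[16],
                        uint16_t mask, uint32_t rda)
{
    for (int e = 0; e < 16; e++, mask >>= 1) {
        if (mask & 1) {
            int64_t n0 = n[e], m0 = m[e];
            rda += (uint32_t)(n0 >= m0 ? n0 - m0 : m0 - n0);
        }
    }
    return rda;
}
```

The signedness of the element type matters: the byte 0xC8 compared against 10 contributes 190 under VABAV.U8 (200 - 10) but 66 under VABAV.S8 (|-56 - 10|), which is why the patch provides separate S and U helpers.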
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-22-peter.maydell@linaro.org
 ---
- target/arm/cpu.h      | 12 ++++++
+ target/arm/helper-mve.h    |  7 +++++++
- hw/intc/armv7m_nvic.c | 96 +++++++++++++++++++++++++++++++++++++++++++
+ target/arm/mve.decode      |  6 ++++++
-files changed, 108 insertions(+)
+ target/arm/mve_helper.c    | 26 +++++++++++++++++++++++
+ target/arm/translate-mve.c | 43 ++++++++++++++++++++++++++++++++++++++
+files changed, 82 insertions(+)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/cpu.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ void armv7m_nvic_set_pending(void *opaque, int irq, bool secure);
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vminavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
-  * a different exception).
+ DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
-  */
+ DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
- void armv7m_nvic_set_pending_derived(void *opaque, int irq, bool secure);
-+/**
++DEF_HELPER_FLAGS_4(mve_vabavsb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+ * armv7m_nvic_set_pending_lazyfp: mark this lazy FP exception as pending
++DEF_HELPER_FLAGS_4(mve_vabavsh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+ * @opaque: the NVIC
++DEF_HELPER_FLAGS_4(mve_vabavsw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+ * @irq: the exception number to mark pending
++DEF_HELPER_FLAGS_4(mve_vabavub, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+ * @secure: false for non-banked exceptions or for the nonsecure
++DEF_HELPER_FLAGS_4(mve_vabavuh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+ * version of a banked exception, true for the secure version of a banked
++DEF_HELPER_FLAGS_4(mve_vabavuw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+ * exception.
++
-+ *
+ DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
-+ * Similar to armv7m_nvic_set_pending(), but specifically for exceptions
+ DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
-+ * generated in the course of lazy stacking of FP registers.
+ DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
-+ */
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 +void armv7m_nvic_set_pending_lazyfp(void *opaque, int irq, bool secure);
  /**
   * armv7m_nvic_get_pending_irq_info: return highest priority pending
   *    exception, and whether it targets Secure state
 diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/armv7m_nvic.c
+--- a/target/arm/mve.decode
-+++ b/hw/intc/armv7m_nvic.c
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ void armv7m_nvic_set_pending_derived(void *opaque, int irq, bool secure)
+@@ -XXX,XX +XXX,XX @@
-     do_armv7m_nvic_set_pending(opaque, irq, secure, true);
+ &vcmp_scalar qn rm size mask
  &shl_scalar qda rm size
  &vmaxv qm rda size
 +&vabav qn qm rda size
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@ VMLAS            111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
                   rdahi=%rdahi rdalo=%rdalo
  }
-+void armv7m_nvic_set_pending_lazyfp(void *opaque, int irq, bool secure)
++@vabav           .... .... .. size:2 .... rda:4 .... .... .... &vabav qn=%qn qm=%qm
 +{
 +    /*
 +     * Pend an exception during lazy FP stacking. This differs
 +     * from the usual exception pending because the logic for
 +     * whether we should escalate depends on the saved context
 +     * in the FPCCR register, not on the current state of the CPU/NVIC.
 +     */
 +    NVICState *s = (NVICState *)opaque;
 +    bool banked = exc_is_banked(irq);
 +    VecInfo *vec;
 +    bool targets_secure;
 +    bool escalate = false;
 +    /*
 +     * We will only look at bits in fpccr if this is a banked exception
 +     * (in which case 'secure' tells us whether it is the S or NS version).
 +     * All the bits for the non-banked exceptions are in fpccr_s.
 +     */
 +    uint32_t fpccr_s = s->cpu->env.v7m.fpccr[M_REG_S];
 +    uint32_t fpccr = s->cpu->env.v7m.fpccr[secure];
 +
-+    assert(irq > ARMV7M_EXCP_RESET && irq < s->num_irq);
++VABAV_S          111 0 1110 10 .. ... 0 .... 1111 . 0 . 0 ... 1 @vabav
-+    assert(!secure || banked);
++VABAV_U          111 1 1110 10 .. ... 0 .... 1111 . 0 . 0 ... 1 @vabav
 +
-+    vec = (banked && secure) ? &s->sec_vectors[irq] : &s->vectors[irq];
+ # Logical immediate operations (1 reg and modified-immediate)
-+
-+    targets_secure = banked ? secure : exc_targets_secure(s, irq);
+ # The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
-+
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-+    switch (irq) {
+index XXXXXXX..XXXXXXX 100644
-+    case ARMV7M_EXCP_DEBUG:
+--- a/target/arm/mve_helper.c
-+        if (!(fpccr_s & R_V7M_FPCCR_MONRDY_MASK)) {
++++ b/target/arm/mve_helper.c
-+            /* Ignore DebugMonitor exception */
+@@ -XXX,XX +XXX,XX @@ DO_VMAXMINV(vminavb, 1, int8_t, uint8_t, do_mina)
-+            return;
+ DO_VMAXMINV(vminavh, 2, int16_t, uint16_t, do_mina)
-+        }
+ DO_VMAXMINV(vminavw, 4, int32_t, uint32_t, do_mina)
-+        break;
-+    case ARMV7M_EXCP_MEM:
++#define DO_VABAV(OP, ESIZE, TYPE)                               \
-+        escalate = !(fpccr & R_V7M_FPCCR_MMRDY_MASK);
++    uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
-+        break;
++                                    void *vm, uint32_t ra)      \
-+    case ARMV7M_EXCP_USAGE:
++    {                                                           \
-+        escalate = !(fpccr & R_V7M_FPCCR_UFRDY_MASK);
++        uint16_t mask = mve_element_mask(env);                  \
-+        break;
++        unsigned e;                                             \
-+    case ARMV7M_EXCP_BUS:
++        TYPE *m = vm, *n = vn;                                  \
-+        escalate = !(fpccr_s & R_V7M_FPCCR_BFRDY_MASK);
++        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
-+        break;
++            if (mask & 1) {                                     \
-+    case ARMV7M_EXCP_SECURE:
++                int64_t n0 = n[H##ESIZE(e)];                    \
-+        escalate = !(fpccr_s & R_V7M_FPCCR_SFRDY_MASK);
++                int64_t m0 = m[H##ESIZE(e)];                    \
-+        break;
++                uint32_t r = n0 >= m0 ? (n0 - m0) : (m0 - n0);  \
-+    default:
++                ra += r;                                        \
-+        g_assert_not_reached();
++            }                                                   \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +        return ra;                                              \
 +    }
 +
-+    if (escalate) {
++DO_VABAV(vabavsb, 1, int8_t)
-+        /*
++DO_VABAV(vabavsh, 2, int16_t)
-+         * Escalate to HardFault: faults that initially targeted Secure
++DO_VABAV(vabavsw, 4, int32_t)
-+         * continue to do so, even if HF normally targets NonSecure.
++DO_VABAV(vabavub, 1, uint8_t)
-+         */
++DO_VABAV(vabavuh, 2, uint16_t)
-+        irq = ARMV7M_EXCP_HARD;
++DO_VABAV(vabavuw, 4, uint32_t)
-+        if (arm_feature(&s->cpu->env, ARM_FEATURE_M_SECURITY) &&
++
-+            (targets_secure ||
+ #define DO_VADDLV(OP, TYPE, LTYPE)                              \
-+             !(s->cpu->env.v7m.aircr & R_V7M_AIRCR_BFHFNMINS_MASK))) {
+     uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
-+            vec = &s->sec_vectors[irq];
+                                     uint64_t ra)                \
-+        } else {
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-+            vec = &s->vectors[irq];
+index XXXXXXX..XXXXXXX 100644
-+        }
+--- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
  typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
  typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 +typedef void MVEGenVABAVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ DO_VMAXV(VMAXAV, vmaxav)
  DO_VMAXV(VMINV_S, vminvs)
  DO_VMAXV(VMINV_U, vminvu)
  DO_VMAXV(VMINAV, vminav)
 +
 +static bool do_vabav(DisasContext *s, arg_vabav *a, MVEGenVABAVFn *fn)
 +{
 +    /* Absolute difference accumulated across vector */
 +    TCGv_ptr qn, qm;
 +    TCGv_i32 rda;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qm | a->qn) ||
 +        !fn || a->rda == 13 || a->rda == 15) {
 +        /* Rda cases are UNPREDICTABLE */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
-+    if (!vec->enabled ||
++    qm = mve_qreg_ptr(a->qm);
-+        nvic_exec_prio(s) <= exc_group_prio(s, vec->prio, secure)) {
++    qn = mve_qreg_ptr(a->qn);
-+        if (!(fpccr_s & R_V7M_FPCCR_HFRDY_MASK)) {
++    rda = load_reg(s, a->rda);
-+            /*
++    fn(rda, cpu_env, qn, qm, rda);
-+             * We want to escalate to HardFault but the context the
++    store_reg(s, a->rda, rda);
-+             * FP state belongs to prevents the exception pre-empting.
++    tcg_temp_free_ptr(qm);
-+             */
++    tcg_temp_free_ptr(qn);
-+            cpu_abort(&s->cpu->parent_obj,
++    mve_update_eci(s);
-+                      "Lockup: can't escalate to HardFault during "
++    return true;
-+                      "lazy FP register stacking\n");
++}
-+        }
++
 +#define DO_VABAV(INSN, FN)                                      \
 +    static bool trans_##INSN(DisasContext *s, arg_vabav *a)     \
 +    {                                                           \
 +        static MVEGenVABAVFn * const fns[] = {                  \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +            gen_helper_mve_##FN##w,                             \
 +            NULL,                                               \
 +        };                                                      \
 +        return do_vabav(s, a, fns[a->size]);                    \
 +    }
 +
-+    if (escalate) {
++DO_VABAV(VABAV_S, vabavs)
-+        s->cpu->env.v7m.hfsr |= R_V7M_HFSR_FORCED_MASK;
++DO_VABAV(VABAV_U, vabavu)
 +    }
 +    if (!vec->pending) {
 +        vec->pending = 1;
 +        /*
 +         * We do not call nvic_irq_update(), because we know our caller
 +         * is going to handle causing us to take the exception by
 +         * raising EXCP_LAZYFP, so raising the IRQ line would be
 +         * pointless extra work. We just need to recompute the
 +         * priorities so that armv7m_nvic_can_take_pending_exception()
 +         * returns the right answer.
 +         */
 +        nvic_recompute_state(s);
 +    }
 +}
 +
  /* Make pending IRQ active.  */
  void armv7m_nvic_acknowledge_irq(void *opaque)
  {
 --
 .20.1

-[Qemu-devel] [PULL 13/42] target/arm: Handle floating point registers in exception entry
+[PULL 23/44] target/arm: Implement MVE narrowing moves
-Handle floating point registers in exception entry.
+Implement the MVE narrowing move insns VMOVN, VQMOVN and VQMOVUN.
-This corresponds to the FP-specific parts of the pseudocode
+These take a double-width input, narrow it (possibly saturating) and
-functions ActivateException() and PushStack().
+store the result to either the top or bottom half of the output
+element.
-We defer the code corresponding to UpdateFPCCR() to a later patch.
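The bottom/top narrowing described above can be sketched in C for the 16-bit to 8-bit case. This is an illustrative model under stated assumptions, not the helper code from this patch: the function names are hypothetical, predication is not modelled, the unsigned VQMOVN.U16 form is omitted, and the vector is a plain little-endian byte array, so 16-bit lane e occupies bytes 2*e (bottom half) and 2*e+1 (top half). The saturating forms return whether any lane saturated; in the architecture that condition is what sets the cumulative saturation flag FPSCR.QC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clamp x into [lo, hi], recording in *sat whether clamping happened. */
static int32_t clamp_sat(int32_t x, int32_t lo, int32_t hi, bool *sat)
{
    if (x < lo) {
        *sat = true;
        return lo;
    }
    if (x > hi) {
        *sat = true;
        return hi;
    }
    return x;
}

/*
 * Narrow the eight 16-bit lanes of m into bytes of d.
 * top == false writes byte 2*e (the "B" forms); top == true writes
 * byte 2*e+1 (the "T" forms). The other byte of each pair is untouched.
 */
static bool narrow16(uint8_t d[16], const int16_t m[8], bool top,
                     int32_t lo, int32_t hi)
{
    bool sat = false;

    for (int e = 0; e < 8; e++) {
        d[2 * e + top] = (uint8_t)clamp_sat(m[e], lo, hi, &sat);
    }
    return sat;
}

/* VMOVN.I16 model: keep the low byte of each lane, never saturates. */
void model_vmovn(uint8_t d[16], const int16_t m[8], bool top)
{
    for (int e = 0; e < 8; e++) {
        d[2 * e + top] = (uint8_t)m[e];
    }
}

/* VQMOVN.S16 model: saturate to the signed byte range [-128, 127]. */
bool model_vqmovn_s16(uint8_t d[16], const int16_t m[8], bool top)
{
    return narrow16(d, m, top, INT8_MIN, INT8_MAX);
}

/* VQMOVUN.S16 model: signed input, saturate to [0, 255]. */
bool model_vqmovun_s16(uint8_t d[16], const int16_t m[8], bool top)
{
    return narrow16(d, m, top, 0, UINT8_MAX);
}
```

Because only one half of each double-width destination element is written, a VMOVNB followed by a VMOVNT from a second source interleaves two narrowed vectors into one register.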
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-11-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 98 +++++++++++++++++++++++++++++++++++++++++++--
+ target/arm/helper-mve.h    | 20 ++++++++++
-file changed, 95 insertions(+), 3 deletions(-)
+ target/arm/mve.decode      | 12 ++++++
+ target/arm/mve_helper.c    | 78 ++++++++++++++++++++++++++++++++++++++
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+ target/arm/translate-mve.c | 22 +++++++++++
-index XXXXXXX..XXXXXXX 100644
+files changed, 132 insertions(+)
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void v7m_exception_taken(ARMCPU *cpu, uint32_t lr, bool dotailchain,
+index XXXXXXX..XXXXXXX 100644
-     switch_v7m_security_state(env, targets_secure);
+--- a/target/arm/helper-mve.h
-     write_v7m_control_spsel(env, 0);
++++ b/target/arm/helper-mve.h
-     arm_clear_exclusive(env);
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+    /* Clear SFPA and FPCA (has no effect if no FPU) */
+ DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+    env->v7m.control[M_REG_S] &=
+ DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+        ~(R_V7M_CONTROL_FPCA_MASK | R_V7M_CONTROL_SFPA_MASK);
-     /* Clear IT bits */
++DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     env->condexec_bits = 0;
++DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     env->regs[14] = lr;
++DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
++DEF_HELPER_FLAGS_3(mve_vmovnth, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     uint32_t xpsr = xpsr_read(env);
++
-     uint32_t frameptr = env->regs[13];
++DEF_HELPER_FLAGS_3(mve_vqmovunbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     ARMMMUIdx mmu_idx = arm_mmu_idx(env);
++DEF_HELPER_FLAGS_3(mve_vqmovunbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+    uint32_t framesize;
++DEF_HELPER_FLAGS_3(mve_vqmovuntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+    bool nsacr_cp10 = extract32(env->v7m.nsacr, 10, 1);
++DEF_HELPER_FLAGS_3(mve_vqmovunth, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
-+    if ((env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) &&
++DEF_HELPER_FLAGS_3(mve_vqmovnbsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+        (env->v7m.secure || nsacr_cp10)) {
++DEF_HELPER_FLAGS_3(mve_vqmovnbsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+        if (env->v7m.secure &&
++DEF_HELPER_FLAGS_3(mve_vqmovntsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+            env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_TS_MASK) {
++DEF_HELPER_FLAGS_3(mve_vqmovntsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+            framesize = 0xa8;
++
-+        } else {
++DEF_HELPER_FLAGS_3(mve_vqmovnbub, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+            framesize = 0x68;
++DEF_HELPER_FLAGS_3(mve_vqmovnbuh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+        }
++DEF_HELPER_FLAGS_3(mve_vqmovntub, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+    } else {
++DEF_HELPER_FLAGS_3(mve_vqmovntuh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+        framesize = 0x20;
++
  DEF_HELPER_FLAGS_4(mve_vand, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
  DEF_HELPER_FLAGS_4(mve_vbic, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
  DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
    VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
    VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +  VQMOVUNB       111 0 1110 0 . 11 .. 01 ... 0 1110 1 0 . 0 ... 1 @1op
 +  VQMOVN_BS      111 0 1110 0 . 11 .. 11 ... 0 1110 0 0 . 0 ... 1 @1op
 +
    VMULH_S        111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
  }
@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
    VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
    VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +  VMOVNB         111 1 1110 0 . 11 .. 01 ... 0 1110 1 0 . 0 ... 1 @1op
 +  VQMOVN_BU      111 1 1110 0 . 11 .. 11 ... 0 1110 0 0 . 0 ... 1 @1op
 +
    VMULH_U        111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
  }
@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
    VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
    VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +  VQMOVUNT       111 0 1110 0 . 11 .. 01 ... 1 1110 1 0 . 0 ... 1 @1op
 +  VQMOVN_TS      111 0 1110 0 . 11 .. 11 ... 1 1110 0 0 . 0 ... 1 @1op
 +
    VRMULH_S       111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
  }
@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
    VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
    VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +  VMOVNT         111 1 1110 0 . 11 .. 01 ... 1 1110 1 0 . 0 ... 1 @1op
 +  VQMOVN_TU      111 1 1110 0 . 11 .. 11 ... 1 1110 0 0 . 0 ... 1 @1op
 +
    VRMULH_U       111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
  }
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
  DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
  DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
 +#define DO_VMOVN(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE)                   \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
 +    {                                                                   \
 +        LTYPE *m = vm;                                                  \
 +        TYPE *d = vd;                                                   \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned le;                                                    \
 +        mask >>= ESIZE * TOP;                                           \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
 +            mergemask(&d[H##ESIZE(le * 2 + TOP)],                       \
 +                      m[H##LESIZE(le)], mask);                          \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
++
-     /* Align stack pointer if the guest wants that */
++DO_VMOVN(vmovnbb, false, 1, uint8_t, 2, uint16_t)
-     if ((frameptr & 4) &&
++DO_VMOVN(vmovnbh, false, 2, uint16_t, 4, uint32_t)
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
++DO_VMOVN(vmovntb, true, 1, uint8_t, 2, uint16_t)
-         xpsr |= XPSR_SPREALIGN;
++DO_VMOVN(vmovnth, true, 2, uint16_t, 4, uint32_t)
-     }
++
++#define DO_VMOVN_SAT(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN)           \
--    frameptr -= 0x20;
++    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
-+    xpsr &= ~XPSR_SFPA;
++    {                                                                   \
-+    if (env->v7m.secure &&
++        LTYPE *m = vm;                                                  \
-+        (env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)) {
++        TYPE *d = vd;                                                   \
-+        xpsr |= XPSR_SFPA;
++        uint16_t mask = mve_element_mask(env);                          \
 +        bool qc = false;                                                \
 +        unsigned le;                                                    \
 +        mask >>= ESIZE * TOP;                                           \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
 +            bool sat = false;                                           \
 +            TYPE r = FN(m[H##LESIZE(le)], &sat);                        \
 +            mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);             \
 +            qc |= sat & mask & 1;                                       \
 +        }                                                               \
 +        if (qc) {                                                       \
 +            env->vfp.qc[0] = qc;                                        \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
-+    frameptr -= framesize;
++#define DO_VMOVN_SAT_UB(BOP, TOP, FN)                           \
++    DO_VMOVN_SAT(BOP, false, 1, uint8_t, 2, uint16_t, FN)       \
-     if (arm_feature(env, ARM_FEATURE_V8)) {
++    DO_VMOVN_SAT(TOP, true, 1, uint8_t, 2, uint16_t, FN)
-         uint32_t limit = v7m_sp_limit(env);
++
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
++#define DO_VMOVN_SAT_UH(BOP, TOP, FN)                           \
-         v7m_stack_write(cpu, frameptr + 24, env->regs[15], mmu_idx, false) &&
++    DO_VMOVN_SAT(BOP, false, 2, uint16_t, 4, uint32_t, FN)      \
-         v7m_stack_write(cpu, frameptr + 28, xpsr, mmu_idx, false);
++    DO_VMOVN_SAT(TOP, true, 2, uint16_t, 4, uint32_t, FN)
++
-+    if (env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) {
++#define DO_VMOVN_SAT_SB(BOP, TOP, FN)                           \
-+        /* FPU is active, try to save its registers */
++    DO_VMOVN_SAT(BOP, false, 1, int8_t, 2, int16_t, FN)         \
-+        bool fpccr_s = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
++    DO_VMOVN_SAT(TOP, true, 1, int8_t, 2, int16_t, FN)
-+        bool lspact = env->v7m.fpccr[fpccr_s] & R_V7M_FPCCR_LSPACT_MASK;
++
-+
++#define DO_VMOVN_SAT_SH(BOP, TOP, FN)                           \
-+        if (lspact && arm_feature(env, ARM_FEATURE_M_SECURITY)) {
++    DO_VMOVN_SAT(BOP, false, 2, int16_t, 4, int32_t, FN)        \
-+            qemu_log_mask(CPU_LOG_INT,
++    DO_VMOVN_SAT(TOP, true, 2, int16_t, 4, int32_t, FN)
-+                          "...SecureFault because LSPACT and FPCA both set\n");
++
-+            env->v7m.sfsr |= R_V7M_SFSR_LSERR_MASK;
++#define DO_VQMOVN_SB(N, SATP)                           \
-+            armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SECURE, false);
++    do_sat_bhs((int64_t)(N), INT8_MIN, INT8_MAX, SATP)
-+        } else if (!env->v7m.secure && !nsacr_cp10) {
++#define DO_VQMOVN_UB(N, SATP)                           \
-+            qemu_log_mask(CPU_LOG_INT,
++    do_sat_bhs((uint64_t)(N), 0, UINT8_MAX, SATP)
-+                          "...Secure UsageFault with CFSR.NOCP because "
++#define DO_VQMOVUN_B(N, SATP)                           \
-+                          "NSACR.CP10 prevents stacking FP regs\n");
++    do_sat_bhs((int64_t)(N), 0, UINT8_MAX, SATP)
-+            armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, M_REG_S);
++
-+            env->v7m.cfsr[M_REG_S] |= R_V7M_CFSR_NOCP_MASK;
++#define DO_VQMOVN_SH(N, SATP)                           \
-+        } else {
++    do_sat_bhs((int64_t)(N), INT16_MIN, INT16_MAX, SATP)
-+            if (!(env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPEN_MASK)) {
++#define DO_VQMOVN_UH(N, SATP)                           \
-+                /* Lazy stacking disabled, save registers now */
++    do_sat_bhs((uint64_t)(N), 0, UINT16_MAX, SATP)
-+                int i;
++#define DO_VQMOVUN_H(N, SATP)                           \
-+                bool cpacr_pass = v7m_cpacr_pass(env, env->v7m.secure,
++    do_sat_bhs((int64_t)(N), 0, UINT16_MAX, SATP)
-+                                                 arm_current_el(env) != 0);
++
-+
++DO_VMOVN_SAT_SB(vqmovnbsb, vqmovntsb, DO_VQMOVN_SB)
-+                if (stacked_ok && !cpacr_pass) {
++DO_VMOVN_SAT_SH(vqmovnbsh, vqmovntsh, DO_VQMOVN_SH)
-+                    /*
++DO_VMOVN_SAT_UB(vqmovnbub, vqmovntub, DO_VQMOVN_UB)
-+                     * Take UsageFault if CPACR forbids access. The pseudocode
++DO_VMOVN_SAT_UH(vqmovnbuh, vqmovntuh, DO_VQMOVN_UH)
-+                     * here does a full CheckCPEnabled() but we know the NSACR
++DO_VMOVN_SAT_SB(vqmovunbb, vqmovuntb, DO_VQMOVUN_B)
-+                     * check can never fail as we have already handled that.
++DO_VMOVN_SAT_SH(vqmovunbh, vqmovunth, DO_VQMOVUN_H)
-+                     */
++
-+                    qemu_log_mask(CPU_LOG_INT,
+ uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
-+                                  "...UsageFault with CFSR.NOCP because "
+                            uint32_t shift)
-+                                  "CPACR.CP10 prevents stacking FP regs\n");
+ {
-+                    armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE,
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-+                                            env->v7m.secure);
+index XXXXXXX..XXXXXXX 100644
-+                    env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_NOCP_MASK;
+--- a/target/arm/translate-mve.c
-+                    stacked_ok = false;
++++ b/target/arm/translate-mve.c
-+                }
+@@ -XXX,XX +XXX,XX @@ DO_1OP(VCLS, vcls)
-+
+ DO_1OP(VABS, vabs)
-+                for (i = 0; i < ((framesize == 0xa8) ? 32 : 16); i += 2) {
+ DO_1OP(VNEG, vneg)
-+                    uint64_t dn = *aa32_vfp_dreg(env, i / 2);
-+                    uint32_t faddr = frameptr + 0x20 + 4 * i;
++/* Narrowing moves: only size 0 and 1 are valid */
-+                    uint32_t slo = extract64(dn, 0, 32);
++#define DO_VMOVN(INSN, FN) \
-+                    uint32_t shi = extract64(dn, 32, 32);
++    static bool trans_##INSN(DisasContext *s, arg_1op *a)       \
-+
++    {                                                           \
-+                    if (i >= 16) {
++        static MVEGenOneOpFn * const fns[] = {                  \
-+                        faddr += 8; /* skip the slot for the FPSCR */
++            gen_helper_mve_##FN##b,                             \
-+                    }
++            gen_helper_mve_##FN##h,                             \
-+                    stacked_ok = stacked_ok &&
++            NULL,                                               \
-+                        v7m_stack_write(cpu, faddr, slo, mmu_idx, false) &&
++            NULL,                                               \
-+                        v7m_stack_write(cpu, faddr + 4, shi, mmu_idx, false);
++        };                                                      \
-+                }
++        return do_1op(s, a, fns[a->size]);                      \
 +                stacked_ok = stacked_ok &&
 +                    v7m_stack_write(cpu, frameptr + 0x60,
 +                                    vfp_get_fpscr(env), mmu_idx, false);
 +                if (cpacr_pass) {
 +                    for (i = 0; i < ((framesize == 0xa8) ? 32 : 16); i += 2) {
 +                        *aa32_vfp_dreg(env, i / 2) = 0;
 +                    }
 +                    vfp_set_fpscr(env, 0);
 +                }
 +            } else {
 +                /* Lazy stacking enabled, save necessary info to stack later */
 +                /* TODO : equivalent of UpdateFPCCR() pseudocode */
 +            }
 +        }
 +    }
 +
-     /*
++DO_VMOVN(VMOVNB, vmovnb)
-      * If we broke a stack limit then SP was already updated earlier;
++DO_VMOVN(VMOVNT, vmovnt)
-      * otherwise we update SP regardless of whether any of the stack
++DO_VMOVN(VQMOVUNB, vqmovunb)
-@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
++DO_VMOVN(VQMOVUNT, vqmovunt)
++DO_VMOVN(VQMOVN_BS, vqmovnbs)
-     if (arm_feature(env, ARM_FEATURE_V8)) {
++DO_VMOVN(VQMOVN_TS, vqmovnts)
-         lr = R_V7M_EXCRET_RES1_MASK |
++DO_VMOVN(VQMOVN_BU, vqmovnbu)
--            R_V7M_EXCRET_DCRS_MASK |
++DO_VMOVN(VQMOVN_TU, vqmovntu)
--            R_V7M_EXCRET_FTYPE_MASK;
++
-+            R_V7M_EXCRET_DCRS_MASK;
+ static bool trans_VREV16(DisasContext *s, arg_1op *a)
-         /* The S bit indicates whether we should return to Secure
+ {
-          * or NonSecure (ie our current state).
+     static MVEGenOneOpFn * const fns[] = {
           * The ES bit indicates whether we're taking this exception
@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
          if (env->v7m.secure) {
              lr |= R_V7M_EXCRET_S_MASK;
          }
 +        if (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK)) {
 +            lr |= R_V7M_EXCRET_FTYPE_MASK;
 +        }
      } else {
          lr = R_V7M_EXCRET_RES1_MASK |
              R_V7M_EXCRET_S_MASK |
 --
 2.20.1

-[Qemu-devel] [PULL 09/42] target/arm: Decode FP instructions for M profile
+[PULL 24/44] target/arm: Rename MVEGenDualAccOpFn to MVEGenLongDualAccOpFn
-Correct the decode of the M-profile "coprocessor and
+The MVEGenDualAccOpFn is a bit misnamed, since it is used for
-floating-point instructions" space:
+the "long dual accumulate" operations that use a 64-bit
- * op0 == 0b11 is always unallocated
+accumulator. Rename it to MVEGenLongDualAccOpFn so we can
- * if the CPU has an FPU then all insns with op1 == 0b101
+use the former name for the 32-bit accumulator insns.
    are floating point and go to disas_vfp_insn()
 For the moment we leave VLLDM and VLSTM as NOPs; in
 a later commit we will fill in the proper implementation
 for the case where an FPU is present.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-7-peter.maydell@linaro.org
 ---
- target/arm/translate.c | 26 ++++++++++++++++++++++----
+ target/arm/translate-mve.c | 16 ++++++++--------
- 1 file changed, 22 insertions(+), 4 deletions(-)
+ 1 file changed, 8 insertions(+), 8 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/translate-mve.c
-+++ b/target/arm/translate.c
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
-     case 6: case 7: case 14: case 15:
+ typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
-         /* Coprocessor.  */
+ typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
-         if (arm_dc_feature(s, ARM_FEATURE_M)) {
+ typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
--            /* We don't currently implement M profile FP support,
+-typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
--             * so this entire space should give a NOCP fault, with
++typedef void MVEGenLongDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
--             * the exception of the v8M VLLDM and VLSTM insns, which
+ typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
--             * must be NOPs in Secure state and UNDEF in Nonsecure state.
+ typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
-+            /* 0b111x_11xx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx */
+ typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
-+            if (extract32(insn, 24, 2) == 3) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMULLT_scalar(DisasContext *s, arg_2scalar *a)
-+                goto illegal_op; /* op0 = 0b11 : unallocated */
+ }
-+            }
-+
+ static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
-+            /*
+-                             MVEGenDualAccOpFn *fn)
-+             * Decode VLLDM and VLSTM first: these are nonstandard because:
++                             MVEGenLongDualAccOpFn *fn)
-+             *  * if there is no FPU then these insns must NOP in
+ {
-+             *    Secure state and UNDEF in Nonsecure state
+     TCGv_ptr qn, qm;
-+             *  * if there is an FPU then these insns do not have
+     TCGv_i64 rda;
-+             *    the usual behaviour that disas_vfp_insn() provides of
+@@ -XXX,XX +XXX,XX @@ static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
-+             *    being controlled by CPACR/NSACR enable bits or the
-+             *    lazy-stacking logic.
+ static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
-              */
+ {
-             if (arm_dc_feature(s, ARM_FEATURE_V8) &&
+-    static MVEGenDualAccOpFn * const fns[4][2] = {
-                 (insn & 0xffa00f00) == 0xec200a00) {
++    static MVEGenLongDualAccOpFn * const fns[4][2] = {
-@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
+         { NULL, NULL },
-                 /* Just NOP since FP support is not implemented */
+         { gen_helper_mve_vmlaldavsh, gen_helper_mve_vmlaldavxsh },
-                 break;
+         { gen_helper_mve_vmlaldavsw, gen_helper_mve_vmlaldavxsw },
-             }
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
-+            if (arm_dc_feature(s, ARM_FEATURE_VFP) &&
-+                ((insn >> 8) & 0xe) == 10) {
+ static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
-+                /* FP, and the CPU supports it */
+ {
-+                if (disas_vfp_insn(s, insn)) {
+-    static MVEGenDualAccOpFn * const fns[4][2] = {
-+                    goto illegal_op;
++    static MVEGenLongDualAccOpFn * const fns[4][2] = {
-+                }
+         { NULL, NULL },
-+                break;
+         { gen_helper_mve_vmlaldavuh, NULL },
-+            }
+         { gen_helper_mve_vmlaldavuw, NULL },
-+
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
-             /* All other insns: NOCP */
-             gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
+ static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
-                                default_exception_el(s));
+ {
 -    static MVEGenDualAccOpFn * const fns[4][2] = {
 +    static MVEGenLongDualAccOpFn * const fns[4][2] = {
          { NULL, NULL },
          { gen_helper_mve_vmlsldavsh, gen_helper_mve_vmlsldavxsh },
          { gen_helper_mve_vmlsldavsw, gen_helper_mve_vmlsldavxsw },
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
  static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
  {
 -    static MVEGenDualAccOpFn * const fns[] = {
 +    static MVEGenLongDualAccOpFn * const fns[] = {
          gen_helper_mve_vrmlaldavhsw, gen_helper_mve_vrmlaldavhxsw,
      };
      return do_long_dual_acc(s, a, fns[a->x]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
  static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
  {
 -    static MVEGenDualAccOpFn * const fns[] = {
 +    static MVEGenLongDualAccOpFn * const fns[] = {
          gen_helper_mve_vrmlaldavhuw, NULL,
      };
      return do_long_dual_acc(s, a, fns[a->x]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
  static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
  {
 -    static MVEGenDualAccOpFn * const fns[] = {
 +    static MVEGenLongDualAccOpFn * const fns[] = {
          gen_helper_mve_vrmlsldavhsw, gen_helper_mve_vrmlsldavhxsw,
      };
      return do_long_dual_acc(s, a, fns[a->x]);
 --
 2.20.1

-[Qemu-devel] [PULL 06/42] target/arm: Implement dummy versions of M-profile FP-related registers
+[PULL 25/44] target/arm: Implement MVE VMLADAV and VMLSLDAV
-The M-profile floating point support has three associated config
+Implement the MVE VMLADAV and VMLSDAV insns.  Like the VMLALDAV and
-registers: FPCAR, FPCCR and FPDSCR. It also makes the registers
+VMLSLDAV insns already implemented, these accumulate multiplied
-CPACR and NSACR have behaviour other than reads-as-zero.
+vector elements; but they accumulate a 32-bit result rather than a
-Add support for all of these as simple reads-as-written registers.
+64-bit one.
-We will hook up actual functionality later.
+Note that these encodings overlap with what would be RdaHi=0b111 for
-The main complexity here is handling the FPCCR register, which
+VMLALDAV, VMLSLDAV, VRMLALDAVH and VRMLSLDAVH.
 has a mix of banked and unbanked bits.
 Note that we don't share storage with the A-profile
 cpu->cp15.nsacr and cpu->cp15.cpacr_el1, though the behaviour
 is quite similar, for two reasons:
  * the M profile CPACR is banked between security states
  * it preserves the invariant that M profile uses no state
    inside the cp15 substruct
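The 32-bit dual-accumulate behaviour of the VMLADAV/VMLSDAV family can be sketched in plain C for the signed 16-bit case. This is an illustration, not QEMU code: `dav_s16` is a hypothetical name, VPT predication is ignored (all lanes active), and only one element size is shown. In the exchange ("X") forms each even lane e pairs m[e] with n[e+1] and each odd lane with n[e-1], which is simply n[e ^ 1]; in the subtracting forms the odd-lane products are subtracted. The accumulator wraps modulo 2^32, matching the `uint32_t` accumulator in the DO_DAV helper in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Dual multiply-accumulate over eight signed 16-bit lanes into a 32-bit
 * accumulator. exchange selects the "X" forms (swap adjacent n elements);
 * subtract_odd selects the VMLSDAV forms (subtract odd-lane products).
 */
static uint32_t dav_s16(uint32_t acc, const int16_t n[8], const int16_t m[8],
                        bool exchange, bool subtract_odd)
{
    for (int e = 0; e < 8; e++) {
        int ne = exchange ? (e ^ 1) : e;
        /* A 16x16 product always fits in int32_t; convert to unsigned so
         * that accumulation wraps modulo 2^32 with no undefined overflow. */
        uint32_t prod = (uint32_t)((int32_t)n[ne] * (int32_t)m[e]);

        if ((e & 1) && subtract_odd) {
            acc -= prod;
        } else {
            acc += prod;
        }
    }
    return acc;
}
```

Doing the accumulation in unsigned arithmetic keeps the wraparound well defined in C, which is the same reason the helper uses a `uint32_t` accumulator even for the signed variants.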
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-4-peter.maydell@linaro.org
 ---
- target/arm/cpu.h      |  34 ++++++++++++
+ target/arm/helper-mve.h    | 17 ++++++++++
- hw/intc/armv7m_nvic.c | 125 ++++++++++++++++++++++++++++++++++++++++++
+ target/arm/mve.decode      | 33 +++++++++++++++++---
- target/arm/cpu.c      |   5 ++
+ target/arm/mve_helper.c    | 41 ++++++++++++++++++++++++
- target/arm/machine.c  |  16 ++++++
+ target/arm/translate-mve.c | 64 ++++++++++++++++++++++++++++++++++++++
- 4 files changed, 180 insertions(+)
+ 4 files changed, 150 insertions(+), 5 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/cpu.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrmlaldavhuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-         uint32_t scr[M_REG_NUM_BANKS];
+ DEF_HELPER_FLAGS_4(mve_vrmlsldavhsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-         uint32_t msplim[M_REG_NUM_BANKS];
+ DEF_HELPER_FLAGS_4(mve_vrmlsldavhxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
-         uint32_t psplim[M_REG_NUM_BANKS];
-+        uint32_t fpcar[M_REG_NUM_BANKS];
++DEF_HELPER_FLAGS_4(mve_vmladavsb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+        uint32_t fpccr[M_REG_NUM_BANKS];
++DEF_HELPER_FLAGS_4(mve_vmladavsh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+        uint32_t fpdscr[M_REG_NUM_BANKS];
++DEF_HELPER_FLAGS_4(mve_vmladavsw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+        uint32_t cpacr[M_REG_NUM_BANKS];
++DEF_HELPER_FLAGS_4(mve_vmladavub, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+        uint32_t nsacr;
++DEF_HELPER_FLAGS_4(mve_vmladavuh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-     } v7m;
++DEF_HELPER_FLAGS_4(mve_vmladavuw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vmlsdavb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-     /* Information associated with an exception about to be taken:
++DEF_HELPER_FLAGS_4(mve_vmlsdavh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ FIELD(V7M_CSSELR, LEVEL, 1, 3)
++DEF_HELPER_FLAGS_4(mve_vmlsdavw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-  */
++
- FIELD(V7M_CSSELR, INDEX, 0, 4)
++DEF_HELPER_FLAGS_4(mve_vmladavsxb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vmladavsxh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+/* v7M FPCCR bits */
++DEF_HELPER_FLAGS_4(mve_vmladavsxw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+FIELD(V7M_FPCCR, LSPACT, 0, 1)
++DEF_HELPER_FLAGS_4(mve_vmlsdavxb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+FIELD(V7M_FPCCR, USER, 1, 1)
++DEF_HELPER_FLAGS_4(mve_vmlsdavxh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+FIELD(V7M_FPCCR, S, 2, 1)
++DEF_HELPER_FLAGS_4(mve_vmlsdavxw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
-+FIELD(V7M_FPCCR, THREAD, 3, 1)
++
-+FIELD(V7M_FPCCR, HFRDY, 4, 1)
+ DEF_HELPER_FLAGS_3(mve_vaddvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
-+FIELD(V7M_FPCCR, MMRDY, 5, 1)
+ DEF_HELPER_FLAGS_3(mve_vaddvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
-+FIELD(V7M_FPCCR, BFRDY, 6, 1)
+ DEF_HELPER_FLAGS_3(mve_vaddvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
-+FIELD(V7M_FPCCR, SFRDY, 7, 1)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-+FIELD(V7M_FPCCR, MONRDY, 8, 1)
+index XXXXXXX..XXXXXXX 100644
-+FIELD(V7M_FPCCR, SPLIMVIOL, 9, 1)
+--- a/target/arm/mve.decode
-+FIELD(V7M_FPCCR, UFRDY, 10, 1)
++++ b/target/arm/mve.decode
-+FIELD(V7M_FPCCR, RES0, 11, 15)
+@@ -XXX,XX +XXX,XX @@ VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
-+FIELD(V7M_FPCCR, TS, 26, 1)
+ %size_16 16:1 !function=plus_1
-+FIELD(V7M_FPCCR, CLRONRETS, 27, 1)
-+FIELD(V7M_FPCCR, CLRONRET, 28, 1)
+ &vmlaldav rdahi rdalo size qn qm x a
-+FIELD(V7M_FPCCR, LSPENS, 29, 1)
++&vmladav rda size qn qm x a
-+FIELD(V7M_FPCCR, LSPEN, 30, 1)
-+FIELD(V7M_FPCCR, ASPEN, 31, 1)
+ @vmlaldav        .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
-+/* These bits are banked. Others are non-banked and live in the M_REG_S bank */
+                  qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
-+#define R_V7M_FPCCR_BANKED_MASK                 \
+ @vmlaldav_nosz   .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
-+    (R_V7M_FPCCR_LSPACT_MASK |                  \
+                  qn=%qn rdahi=%rdahi rdalo=%rdalo size=0 &vmlaldav
-+     R_V7M_FPCCR_USER_MASK |                    \
+-VMLALDAV_S       1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
-+     R_V7M_FPCCR_THREAD_MASK |                  \
+-VMLALDAV_U       1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
-+     R_V7M_FPCCR_MMRDY_MASK |                   \
++@vmladav         .... .... .... ... . ... x:1 .... . . a:1 . qm:3 . \
-+     R_V7M_FPCCR_SPLIMVIOL_MASK |               \
++                 qn=%qn rda=%rdalo size=%size_16 &vmladav
-+     R_V7M_FPCCR_UFRDY_MASK |                   \
++@vmladav_nosz    .... .... .... ... . ... x:1 .... . . a:1 . qm:3 . \
-+     R_V7M_FPCCR_ASPEN_MASK)
++                 qn=%qn rda=%rdalo size=0 &vmladav
 -VMLSLDAV         1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
 +{
 +  VMLADAV_S      1110 1110 1111  ... . ... . 1110 . 0 . 0 ... 0 @vmladav
 +  VMLALDAV_S     1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
 +}
 +{
 +  VMLADAV_U      1111 1110 1111  ... . ... . 1110 . 0 . 0 ... 0 @vmladav
 +  VMLALDAV_U     1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
 +}
 +
 +{
 +  VMLSDAV        1110 1110 1111  ... . ... . 1110 . 0 . 0 ... 1 @vmladav
 +  VMLSLDAV       1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
 +}
 +
 +{
 +  VMLSDAV        1111 1110 1111  ... 0 ... . 1110 . 0 . 0 ... 1 @vmladav_nosz
 +  VRMLSLDAVH     1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
 +}
 +
 +VMLADAV_S        1110 1110 1111  ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
 +VMLADAV_U        1111 1110 1111  ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
  {
    VMAXV_S        1110 1110 1110  .. 10 ....  1111 0 0 . 0 ... 0 @vmaxv
    VMINV_S        1110 1110 1110  .. 10 ....  1111 1 0 . 0 ... 0 @vmaxv
    VMAXAV         1110 1110 1110  .. 00 ....  1111 0 0 . 0 ... 0 @vmaxv
    VMINAV         1110 1110 1110  .. 00 ....  1111 1 0 . 0 ... 0 @vmaxv
 +  VMLADAV_S      1110 1110 1111  ... 0 ... . 1111 . 0 . 0 ... 0 @vmladav_nosz
    VRMLALDAVH_S   1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
  }
  {
    VMAXV_U        1111 1110 1110  .. 10 ....  1111 0 0 . 0 ... 0 @vmaxv
    VMINV_U        1111 1110 1110  .. 10 ....  1111 1 0 . 0 ... 0 @vmaxv
 +  VMLADAV_U      1111 1110 1111  ... 0 ... . 1111 . 0 . 0 ... 0 @vmladav_nosz
    VRMLALDAVH_U   1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
  }
 -VRMLSLDAVH       1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
 -
  # Scalar operations
  VADD_scalar      1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlsldavxsh, 2, int16_t, true, +=, -=)
  DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
  DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
 +/*
 + * Multiply add dual accumulate ops
 + */
 +#define DO_DAV(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC) \
 +    uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
 +                                    void *vm, uint32_t a)               \
 +    {                                                                   \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        TYPE *n = vn, *m = vm;                                          \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            if (mask & 1) {                                             \
 +                if (e & 1) {                                            \
 +                    a ODDACC                                            \
 +                        n[H##ESIZE(e - 1 * XCHG)] * m[H##ESIZE(e)];     \
 +                } else {                                                \
 +                    a EVENACC                                           \
 +                        n[H##ESIZE(e + 1 * XCHG)] * m[H##ESIZE(e)];     \
 +                }                                                       \
 +            }                                                           \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +        return a;                                                       \
 +    }
 +
 +#define DO_DAV_S(INSN, XCHG, EVENACC, ODDACC)           \
 +    DO_DAV(INSN##b, 1, int8_t, XCHG, EVENACC, ODDACC)   \
 +    DO_DAV(INSN##h, 2, int16_t, XCHG, EVENACC, ODDACC)  \
 +    DO_DAV(INSN##w, 4, int32_t, XCHG, EVENACC, ODDACC)
 +
 +#define DO_DAV_U(INSN, XCHG, EVENACC, ODDACC)           \
 +    DO_DAV(INSN##b, 1, uint8_t, XCHG, EVENACC, ODDACC)  \
 +    DO_DAV(INSN##h, 2, uint16_t, XCHG, EVENACC, ODDACC) \
 +    DO_DAV(INSN##w, 4, uint32_t, XCHG, EVENACC, ODDACC)
 +
 +DO_DAV_S(vmladavs, false, +=, +=)
 +DO_DAV_U(vmladavu, false, +=, +=)
 +DO_DAV_S(vmlsdav, false, +=, -=)
 +DO_DAV_S(vmladavsx, true, +=, +=)
 +DO_DAV_S(vmlsdavx, true, +=, -=)
 +
  /*
-  * System register ID fields.
+  * Rounding multiply add long dual accumulate high. In the pseudocode
-  */
+  * this is implemented with a 72-bit internal accumulator value of which
-diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/armv7m_nvic.c
+--- a/target/arm/translate-mve.c
-+++ b/hw/intc/armv7m_nvic.c
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
+@@ -XXX,XX +XXX,XX @@ typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TC
-     }
+ typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
-     case 0xd84: /* CSSELR */
+ typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
-         return cpu->env.v7m.csselr[attrs.secure];
+ typedef void MVEGenVABAVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
-+    case 0xd88: /* CPACR */
++typedef void MVEGenDualAccOpFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
-+        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
-+            return 0;
+ /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
-+        }
+ static inline long mve_qreg_offset(unsigned reg)
-+        return cpu->env.v7m.cpacr[attrs.secure];
+@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
-+    case 0xd8c: /* NSACR */
+     return do_long_dual_acc(s, a, fns[a->x]);
-+        if (!attrs.secure || !arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+ }
-+            return 0;
-+        }
++static bool do_dual_acc(DisasContext *s, arg_vmladav *a, MVEGenDualAccOpFn *fn)
-+        return cpu->env.v7m.nsacr;
++{
-     /* TODO: Implement debug registers.  */
++    TCGv_ptr qn, qm;
-     case 0xd90: /* MPU_TYPE */
++    TCGv_i32 rda;
-         /* Unified MPU; if the MPU is not present this value is zero */
++
-@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
++    if (!dc_isar_feature(aa32_mve, s) ||
-             return 0;
++        !mve_check_qreg_bank(s, a->qn) ||
-         }
++        !fn) {
-         return cpu->env.v7m.sfar;
++        return false;
-+    case 0xf34: /* FPCCR */
++    }
-+        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-+            return 0;
++        return true;
-+        }
++    }
-+        if (attrs.secure) {
++
-+            return cpu->env.v7m.fpccr[M_REG_S];
++    qn = mve_qreg_ptr(a->qn);
-+        } else {
++    qm = mve_qreg_ptr(a->qm);
-+            /*
++
-+             * NS can read LSPEN, CLRONRET and MONRDY. It can read
++    /*
-+             * BFRDY and HFRDY if AIRCR.BFHFNMINS != 0;
++     * This insn is subject to beat-wise execution. Partial execution
-+             * other non-banked bits RAZ.
++     * of an A=0 (no-accumulate) insn which does not execute the first
-+             * TODO: MONRDY should RAZ/WI if DEMCR.SDME is set.
++     * beat must start with the current rda value, not 0.
-+             */
++     */
-+            uint32_t value = cpu->env.v7m.fpccr[M_REG_S];
++    if (a->a || mve_skip_first_beat(s)) {
-+            uint32_t mask = R_V7M_FPCCR_LSPEN_MASK |
++        rda = load_reg(s, a->rda);
-+                R_V7M_FPCCR_CLRONRET_MASK |
++    } else {
-+                R_V7M_FPCCR_MONRDY_MASK;
++        rda = tcg_const_i32(0);
-+
++    }
-+            if (s->cpu->env.v7m.aircr & R_V7M_AIRCR_BFHFNMINS_MASK) {
++
-+                mask |= R_V7M_FPCCR_BFRDY_MASK | R_V7M_FPCCR_HFRDY_MASK;
++    fn(rda, cpu_env, qn, qm, rda);
-+            }
++    store_reg(s, a->rda, rda);
-+
++    tcg_temp_free_ptr(qn);
-+            value &= mask;
++    tcg_temp_free_ptr(qm);
 +
-+            value |= cpu->env.v7m.fpccr[M_REG_NS];
++    mve_update_eci(s);
-+            return value;
++    return true;
-+        }
++}
-+    case 0xf38: /* FPCAR */
++
-+        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
++#define DO_DUAL_ACC(INSN, FN)                                           \
-+            return 0;
++    static bool trans_##INSN(DisasContext *s, arg_vmladav *a)           \
-+        }
++    {                                                                   \
-+        return cpu->env.v7m.fpcar[attrs.secure];
++        static MVEGenDualAccOpFn * const fns[4][2] = {                  \
-+    case 0xf3c: /* FPDSCR */
++            { gen_helper_mve_##FN##b, gen_helper_mve_##FN##xb },        \
-+        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
++            { gen_helper_mve_##FN##h, gen_helper_mve_##FN##xh },        \
-+            return 0;
++            { gen_helper_mve_##FN##w, gen_helper_mve_##FN##xw },        \
-+        }
++            { NULL, NULL },                                             \
-+        return cpu->env.v7m.fpdscr[attrs.secure];
++        };                                                              \
-     case 0xf40: /* MVFR0 */
++        return do_dual_acc(s, a, fns[a->size][a->x]);                   \
-         return cpu->isar.mvfr0;
++    }
-     case 0xf44: /* MVFR1 */
++
-@@ -XXX,XX +XXX,XX @@ static void nvic_writel(NVICState *s, uint32_t offset, uint32_t value,
++DO_DUAL_ACC(VMLADAV_S, vmladavs)
-             cpu->env.v7m.csselr[attrs.secure] = value & R_V7M_CSSELR_INDEX_MASK;
++DO_DUAL_ACC(VMLSDAV, vmlsdav)
-         }
++
-         break;
++static bool trans_VMLADAV_U(DisasContext *s, arg_vmladav *a)
-+    case 0xd88: /* CPACR */
++{
-+        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
++    static MVEGenDualAccOpFn * const fns[4][2] = {
-+            /* We implement only the Floating Point extension's CP10/CP11 */
++        { gen_helper_mve_vmladavub, NULL },
-+            cpu->env.v7m.cpacr[attrs.secure] = value & (0xf << 20);
++        { gen_helper_mve_vmladavuh, NULL },
-+        }
++        { gen_helper_mve_vmladavuw, NULL },
-+        break;
++        { NULL, NULL },
-+    case 0xd8c: /* NSACR */
++    };
-+        if (attrs.secure && arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
++    return do_dual_acc(s, a, fns[a->size][a->x]);
-+            /* We implement only the Floating Point extension's CP10/CP11 */
++}
-+            cpu->env.v7m.nsacr = value & (3 << 10);
++
-+        }
+ static void gen_vpst(DisasContext *s, uint32_t mask)
-+        break;
+ {
-     case 0xd90: /* MPU_TYPE */
+     /*
          return; /* RO */
      case 0xd94: /* MPU_CTRL */
@@ -XXX,XX +XXX,XX @@ static void nvic_writel(NVICState *s, uint32_t offset, uint32_t value,
          }
          break;
      }
 +    case 0xf34: /* FPCCR */
 +        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +            /* Not all bits here are banked. */
 +            uint32_t fpccr_s;
 +
 +            if (!arm_feature(&cpu->env, ARM_FEATURE_V8)) {
 +                /* Don't allow setting of bits not present in v7M */
 +                value &= (R_V7M_FPCCR_LSPACT_MASK |
 +                          R_V7M_FPCCR_USER_MASK |
 +                          R_V7M_FPCCR_THREAD_MASK |
 +                          R_V7M_FPCCR_HFRDY_MASK |
 +                          R_V7M_FPCCR_MMRDY_MASK |
 +                          R_V7M_FPCCR_BFRDY_MASK |
 +                          R_V7M_FPCCR_MONRDY_MASK |
 +                          R_V7M_FPCCR_LSPEN_MASK |
 +                          R_V7M_FPCCR_ASPEN_MASK);
 +            }
 +            value &= ~R_V7M_FPCCR_RES0_MASK;
 +
 +            if (!attrs.secure) {
 +                /* Some non-banked bits are configurably writable by NS */
 +                fpccr_s = cpu->env.v7m.fpccr[M_REG_S];
 +                if (!(fpccr_s & R_V7M_FPCCR_LSPENS_MASK)) {
 +                    uint32_t lspen = FIELD_EX32(value, V7M_FPCCR, LSPEN);
 +                    fpccr_s = FIELD_DP32(fpccr_s, V7M_FPCCR, LSPEN, lspen);
 +                }
 +                if (!(fpccr_s & R_V7M_FPCCR_CLRONRETS_MASK)) {
 +                    uint32_t cor = FIELD_EX32(value, V7M_FPCCR, CLRONRET);
 +                    fpccr_s = FIELD_DP32(fpccr_s, V7M_FPCCR, CLRONRET, cor);
 +                }
 +                if ((s->cpu->env.v7m.aircr & R_V7M_AIRCR_BFHFNMINS_MASK)) {
 +                    uint32_t hfrdy = FIELD_EX32(value, V7M_FPCCR, HFRDY);
 +                    uint32_t bfrdy = FIELD_EX32(value, V7M_FPCCR, BFRDY);
 +                    fpccr_s = FIELD_DP32(fpccr_s, V7M_FPCCR, HFRDY, hfrdy);
 +                    fpccr_s = FIELD_DP32(fpccr_s, V7M_FPCCR, BFRDY, bfrdy);
 +                }
 +                /* TODO MONRDY should RAZ/WI if DEMCR.SDME is set */
 +                {
 +                    uint32_t monrdy = FIELD_EX32(value, V7M_FPCCR, MONRDY);
 +                    fpccr_s = FIELD_DP32(fpccr_s, V7M_FPCCR, MONRDY, monrdy);
 +                }
 +
 +                /*
 +                 * All other non-banked bits are RAZ/WI from NS; write
 +                 * just the banked bits to fpccr[M_REG_NS].
 +                 */
 +                value &= R_V7M_FPCCR_BANKED_MASK;
 +                cpu->env.v7m.fpccr[M_REG_NS] = value;
 +            } else {
 +                fpccr_s = value;
 +            }
 +            cpu->env.v7m.fpccr[M_REG_S] = fpccr_s;
 +        }
 +        break;
 +    case 0xf38: /* FPCAR */
 +        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +            value &= ~7;
 +            cpu->env.v7m.fpcar[attrs.secure] = value;
 +        }
 +        break;
 +    case 0xf3c: /* FPDSCR */
 +        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +            value &= 0x07c00000;
 +            cpu->env.v7m.fpdscr[attrs.secure] = value;
 +        }
 +        break;
      case 0xf50: /* ICIALLU */
      case 0xf58: /* ICIMVAU */
      case 0xf5c: /* DCIMVAC */
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
              env->v7m.ccr[M_REG_S] |= R_V7M_CCR_UNALIGN_TRP_MASK;
          }
 +        if (arm_feature(env, ARM_FEATURE_VFP)) {
 +            env->v7m.fpccr[M_REG_NS] = R_V7M_FPCCR_ASPEN_MASK;
 +            env->v7m.fpccr[M_REG_S] = R_V7M_FPCCR_ASPEN_MASK |
 +                R_V7M_FPCCR_LSPEN_MASK | R_V7M_FPCCR_S_MASK;
 +        }
          /* Unlike A/R profile, M profile defines the reset LR value */
          env->regs[14] = 0xffffffff;
 diff --git a/target/arm/machine.c b/target/arm/machine.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/machine.c
 +++ b/target/arm/machine.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_v8m = {
      }
  };
 +static const VMStateDescription vmstate_m_fp = {
 +    .name = "cpu/m/fp",
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .needed = vfp_needed,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT32_ARRAY(env.v7m.fpcar, ARMCPU, M_REG_NUM_BANKS),
 +        VMSTATE_UINT32_ARRAY(env.v7m.fpccr, ARMCPU, M_REG_NUM_BANKS),
 +        VMSTATE_UINT32_ARRAY(env.v7m.fpdscr, ARMCPU, M_REG_NUM_BANKS),
 +        VMSTATE_UINT32_ARRAY(env.v7m.cpacr, ARMCPU, M_REG_NUM_BANKS),
 +        VMSTATE_UINT32(env.v7m.nsacr, ARMCPU),
 +        VMSTATE_END_OF_LIST()
 +    }
 +};
 +
  static const VMStateDescription vmstate_m = {
      .name = "cpu/m",
      .version_id = 4,
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m = {
          &vmstate_m_scr,
          &vmstate_m_other_sp,
          &vmstate_m_v8m,
 +        &vmstate_m_fp,
          NULL
      }
  };
 --
 .20.1
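
For readers following the DO_DAV helper in the VMLADAV patch above, here is a minimal standalone sketch of what one VMLADAV, VMLADAVX or VMLSDAV invocation computes for 16-bit signed lanes. It is not QEMU code: the function name dual_acc_s16 is hypothetical, and it assumes the MVE predicate mask carries one bit per byte (so lane e of a 16-bit vector is gated by bit 2*e), as mve_element_mask() and the `mask >>= ESIZE` step imply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical standalone model of one DO_DAV invocation on 16-bit
 * signed lanes: eight lanes per 128-bit vector, one predicate bit per
 * byte, so lane e is active when bit 2*e of the mask is set.
 *   xchg:         pair n[e ^ 1] with m[e] (the "X" forms)
 *   subtract_odd: odd lanes subtract their product (VMLSDAV)
 */
static uint32_t dual_acc_s16(const int16_t n[8], const int16_t m[8],
                             uint16_t mask, bool xchg, bool subtract_odd,
                             uint32_t acc)
{
    for (unsigned e = 0; e < 8; e++, mask >>= 2) {
        if (!(mask & 1)) {
            continue; /* lane predicated off: contributes nothing */
        }
        unsigned ne = xchg ? (e ^ 1) : e;
        /* int16 * int16 always fits in int32 */
        int32_t prod = (int32_t)n[ne] * m[e];
        if ((e & 1) && subtract_odd) {
            acc -= (uint32_t)prod; /* wraps modulo 2^32, like the helper */
        } else {
            acc += (uint32_t)prod;
        }
    }
    return acc;
}
```

For example, with n = {1, 2, ..., 8}, m all ones and every lane active, the plain form returns 36 while the subtracting form returns (uint32_t)-4 (16 from the even lanes minus 20 from the odd lanes). The exchange form's n[e ^ 1] is the same index as the n[e + XCHG] / n[e - XCHG] split used for even and odd lanes in DO_DAV.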

-[Qemu-devel] [PULL 05/42] hw/intc/armv7m_nvic: Allow reading of M-profile MVFR* registers
+[PULL 26/44] target/arm: Implement MVE VMLA
-For M-profile the MVFR* ID registers are memory mapped, in the
+Implement the MVE VMLA insn, which multiplies a vector by a scalar
-range we implement via the NVIC. Allow them to be read.
+and accumulates into another vector.
 (If the CPU has no FPU, these registers are defined to be RAZ.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-3-peter.maydell@linaro.org
 ---
- hw/intc/armv7m_nvic.c | 6 ++++++
+ target/arm/helper-mve.h    | 4 ++++
- 1 file changed, 6 insertions(+)
+ target/arm/mve.decode      | 1 +
  target/arm/mve_helper.c    | 5 +++++
  target/arm/translate-mve.c | 1 +
  4 files changed, 11 insertions(+)
-diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/armv7m_nvic.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/intc/armv7m_nvic.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i3
-             return 0;
+ DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         }
+ DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         return cpu->env.v7m.sfar;
-+    case 0xf40: /* MVFR0 */
++DEF_HELPER_FLAGS_4(mve_vmlab, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        return cpu->isar.mvfr0;
++DEF_HELPER_FLAGS_4(mve_vmlah, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+    case 0xf44: /* MVFR1 */
++DEF_HELPER_FLAGS_4(mve_vmlaw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        return cpu->isar.mvfr1;
++
-+    case 0xf48: /* MVFR2 */
+ DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        return cpu->isar.mvfr2;
+ DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     default:
+ DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     bad_offset:
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-         qemu_log_mask(LOG_GUEST_ERROR, "NVIC: Bad read offset 0x%x\n", offset);
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  # The U bit (28) is don't-care because it does not affect the result
 +VMLA             111- 1110 0 . .. ... 1 ... 0 1110 . 100 .... @2scalar
  VMLAS            111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
  # Vector add across vector
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
  DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
  DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
 +/* Vector by scalar plus vector */
 +#define DO_VMLA(D, N, M) ((N) * (M) + (D))
 +
 +DO_2OP_ACC_SCALAR_U(vmla, DO_VMLA)
 +
  /* Vector by vector plus scalar */
  #define DO_VMLAS(D, N, M) ((N) * (D) + (M))
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
  DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
  DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
  DO_2OP_SCALAR(VBRSR, vbrsr)
 +DO_2OP_SCALAR(VMLA, vmla)
  DO_2OP_SCALAR(VMLAS, vmlas)
  static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
 --
 .20.1
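
The VMLA helper is built from DO_VMLA, d = n * m + d, and the decode comment notes that the U bit is don't-care. A small sketch for 8-bit lanes shows why: the accumulate is computed modulo 2^8, so a signed and an unsigned reading of the same bits give the same result. The name vmla_u8 is hypothetical, and the predicate handling (bit e of the mask gates lane e, inactive lanes keep their old value) is a simplification of mergemask() for byte-sized elements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch of VMLA on 8-bit lanes: d[e] = n[e] * scalar + d[e].
 * Bit e of the predicate mask gates lane e; inactive lanes keep their
 * previous contents.
 */
static void vmla_u8(uint8_t d[16], const uint8_t n[16], uint32_t rm,
                    uint16_t mask)
{
    uint8_t m = (uint8_t)rm; /* only the low 8 bits of the scalar are used */
    for (unsigned e = 0; e < 16; e++, mask >>= 1) {
        if (mask & 1) {
            /* computed in int, then truncated modulo 2^8 */
            d[e] = (uint8_t)(n[e] * m + d[e]);
        }
    }
}
```

With d[0] = -3, n[0] = -2 and a scalar of 5, lane 0 ends up holding (uint8_t)-13, i.e. 243, whichever signedness the operands are read with.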

-[Qemu-devel] [PULL 04/42] target/arm: Make sure M-profile FPSCR RES0 bits are not settable
+[PULL 27/44] target/arm: Implement MVE saturating doubling multiply accumulates
-Enforce that for M-profile various FPSCR bits which are RES0 there
+Implement the MVE saturating doubling multiply accumulate insns
-but have defined meanings on A-profile are never settable. This
+VQDMLAH, VQRDMLAH, VQDMLASH and VQRDMLASH.  These perform a multiply,
-ensures that M-profile code can't enable the A-profile behaviour
+double, add the accumulator shifted by the element size, possibly
-(notably vector length/stride handling) by accident.
+round, saturate to twice the element size, then take the high half of
 the result.  The *MLAH insns do vector * scalar + vector, and the
 *MLASH insns do vector * vector + scalar.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-2-peter.maydell@linaro.org
 ---
- target/arm/vfp_helper.c | 8 ++++++++
+ target/arm/helper-mve.h    | 16 +++++++
- 1 file changed, 8 insertions(+)
+ target/arm/mve.decode      |  5 ++
  target/arm/mve_helper.c    | 95 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  4 ++
  4 files changed, 120 insertions(+)
-diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp_helper.c
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/vfp_helper.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         val &= ~FPCR_FZ16;
+ DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqdmlahb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqdmlahh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqdmlahw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqrdmlahb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrdmlahh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrdmlahw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqdmlashb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqdmlashh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqdmlashw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqrdmlashb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrdmlashh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrdmlashw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
  DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
  DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  VMLA             111- 1110 0 . .. ... 1 ... 0 1110 . 100 .... @2scalar
  VMLAS            111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
 +VQRDMLAH         1110 1110 0 . .. ... 0 ... 0 1110 . 100 .... @2scalar
 +VQRDMLASH        1110 1110 0 . .. ... 0 ... 1 1110 . 100 .... @2scalar
 +VQDMLAH          1110 1110 0 . .. ... 0 ... 0 1110 . 110 .... @2scalar
 +VQDMLASH         1110 1110 0 . .. ... 0 ... 1 1110 . 110 .... @2scalar
 +
  # Vector add across vector
  {
    VADDV          111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
          mve_advance_vpt(env);                                           \
      }
-+    if (arm_feature(env, ARM_FEATURE_M)) {
++#define DO_2OP_SAT_ACC_SCALAR(OP, ESIZE, TYPE, FN)                      \
-+        /*
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
-+         * M profile FPSCR is RES0 for the QC, STRIDE, FZ16, LEN bits
++                                uint32_t rm)                            \
-+         * and also for the trapped-exception-handling bits IxE.
++    {                                                                   \
-+         */
++        TYPE *d = vd, *n = vn;                                          \
-+        val &= 0xf7c0009f;
++        TYPE m = rm;                                                    \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        bool qc = false;                                                \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            bool sat = false;                                           \
 +            mergemask(&d[H##ESIZE(e)],                                  \
 +                      FN(d[H##ESIZE(e)], n[H##ESIZE(e)], m, &sat),      \
 +                      mask);                                            \
 +            qc |= sat & mask & 1;                                       \
 +        }                                                               \
 +        if (qc) {                                                       \
 +            env->vfp.qc[0] = qc;                                        \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
-     /*
+ /* provide unsigned 2-op scalar helpers for all sizes */
-      * We don't implement trapped exception handling, so the
+ #define DO_2OP_SCALAR_U(OP, FN)                 \
-      * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
+     DO_2OP_SCALAR(OP##b, 1, uint8_t, FN)        \
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
  DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
  DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
 +static int8_t do_vqdmlah_b(int8_t a, int8_t b, int8_t c, int round, bool *sat)
 +{
 +    int64_t r = (int64_t)a * b * 2 + ((int64_t)c << 8) + (round << 7);
 +    return do_sat_bhw(r, INT16_MIN, INT16_MAX, sat) >> 8;
 +}
 +
 +static int16_t do_vqdmlah_h(int16_t a, int16_t b, int16_t c,
 +                           int round, bool *sat)
 +{
 +    int64_t r = (int64_t)a * b * 2 + ((int64_t)c << 16) + (round << 15);
 +    return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat) >> 16;
 +}
 +
 +static int32_t do_vqdmlah_w(int32_t a, int32_t b, int32_t c,
 +                            int round, bool *sat)
 +{
 +    /*
 +     * Architecturally we should do the entire add, double, round
 +     * and then check for saturation. We do three saturating adds,
 +     * but we need to be careful about the order. If the first
 +     * m1 + m2 saturates then it's impossible for the *2+rc to
 +     * bring it back into the non-saturated range. However, if
 +     * m1 + m2 is negative then it's possible that doing the doubling
 +     * would take the intermediate result below INT64_MIN and the
 +     * addition of the rounding constant then brings it back in range.
 +     * So we add half the rounding constant and half the "c << esize"
 +     * before doubling rather than adding the rounding constant after
 +     * the doubling.
 +     */
 +    int64_t m1 = (int64_t)a * b;
 +    int64_t m2 = (int64_t)c << 31;
 +    int64_t r;
 +    if (sadd64_overflow(m1, m2, &r) ||
 +        sadd64_overflow(r, (round << 30), &r) ||
 +        sadd64_overflow(r, r, &r)) {
 +        *sat = true;
 +        return r < 0 ? INT32_MAX : INT32_MIN;
 +    }
 +    return r >> 32;
 +}
 +
 +/*
 + * The *MLAH insns are vector * scalar + vector;
 + * the *MLASH insns are vector * vector + scalar
 + */
 +#define DO_VQDMLAH_B(D, N, M, S) do_vqdmlah_b(N, M, D, 0, S)
 +#define DO_VQDMLAH_H(D, N, M, S) do_vqdmlah_h(N, M, D, 0, S)
 +#define DO_VQDMLAH_W(D, N, M, S) do_vqdmlah_w(N, M, D, 0, S)
 +#define DO_VQRDMLAH_B(D, N, M, S) do_vqdmlah_b(N, M, D, 1, S)
 +#define DO_VQRDMLAH_H(D, N, M, S) do_vqdmlah_h(N, M, D, 1, S)
 +#define DO_VQRDMLAH_W(D, N, M, S) do_vqdmlah_w(N, M, D, 1, S)
 +
 +#define DO_VQDMLASH_B(D, N, M, S) do_vqdmlah_b(N, D, M, 0, S)
 +#define DO_VQDMLASH_H(D, N, M, S) do_vqdmlah_h(N, D, M, 0, S)
 +#define DO_VQDMLASH_W(D, N, M, S) do_vqdmlah_w(N, D, M, 0, S)
 +#define DO_VQRDMLASH_B(D, N, M, S) do_vqdmlah_b(N, D, M, 1, S)
 +#define DO_VQRDMLASH_H(D, N, M, S) do_vqdmlah_h(N, D, M, 1, S)
 +#define DO_VQRDMLASH_W(D, N, M, S) do_vqdmlah_w(N, D, M, 1, S)
 +
 +DO_2OP_SAT_ACC_SCALAR(vqdmlahb, 1, int8_t, DO_VQDMLAH_B)
 +DO_2OP_SAT_ACC_SCALAR(vqdmlahh, 2, int16_t, DO_VQDMLAH_H)
 +DO_2OP_SAT_ACC_SCALAR(vqdmlahw, 4, int32_t, DO_VQDMLAH_W)
 +DO_2OP_SAT_ACC_SCALAR(vqrdmlahb, 1, int8_t, DO_VQRDMLAH_B)
 +DO_2OP_SAT_ACC_SCALAR(vqrdmlahh, 2, int16_t, DO_VQRDMLAH_H)
 +DO_2OP_SAT_ACC_SCALAR(vqrdmlahw, 4, int32_t, DO_VQRDMLAH_W)
 +
 +DO_2OP_SAT_ACC_SCALAR(vqdmlashb, 1, int8_t, DO_VQDMLASH_B)
 +DO_2OP_SAT_ACC_SCALAR(vqdmlashh, 2, int16_t, DO_VQDMLASH_H)
 +DO_2OP_SAT_ACC_SCALAR(vqdmlashw, 4, int32_t, DO_VQDMLASH_W)
 +DO_2OP_SAT_ACC_SCALAR(vqrdmlashb, 1, int8_t, DO_VQRDMLASH_B)
 +DO_2OP_SAT_ACC_SCALAR(vqrdmlashh, 2, int16_t, DO_VQRDMLASH_H)
 +DO_2OP_SAT_ACC_SCALAR(vqrdmlashw, 4, int32_t, DO_VQRDMLASH_W)
 +
  /* Vector by scalar plus vector */
  #define DO_VMLA(D, N, M) ((N) * (M) + (D))
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
  DO_2OP_SCALAR(VBRSR, vbrsr)
  DO_2OP_SCALAR(VMLA, vmla)
  DO_2OP_SCALAR(VMLAS, vmlas)
 +DO_2OP_SCALAR(VQDMLAH, vqdmlah)
 +DO_2OP_SCALAR(VQRDMLAH, vqrdmlah)
 +DO_2OP_SCALAR(VQDMLASH, vqdmlash)
 +DO_2OP_SCALAR(VQRDMLASH, vqrdmlash)
  static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
  {
 --
 .20.1
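
To make the commit message's description concrete (multiply, double, add the accumulator shifted up by the element size, optionally round, saturate to twice the element size, keep the high half), here is a standalone sketch of one 16-bit lane, mirroring do_vqdmlah_h(). The helper names sat_range and vqdmlah_s16 are hypothetical. The accumulator is scaled with a multiply rather than a left shift because left-shifting a negative value is undefined in standard C, and the final right shift assumes the arithmetic shift that GCC and Clang implement for signed types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clamp v to [min, max]; set *sat if clamping was needed. */
static int64_t sat_range(int64_t v, int64_t min, int64_t max, bool *sat)
{
    if (v < min) {
        *sat = true;
        return min;
    }
    if (v > max) {
        *sat = true;
        return max;
    }
    return v;
}

/*
 * Hypothetical model of one 16-bit lane of VQDMLAH (round = false) or
 * VQRDMLAH (round = true): 2*a*b + c*2^16 (+ 2^15 if rounding),
 * saturated to 32 bits, high 16 bits returned.
 */
static int16_t vqdmlah_s16(int16_t a, int16_t b, int16_t c, bool round,
                           bool *sat)
{
    int64_t r = (int64_t)a * b * 2 + (int64_t)c * 65536
                + (round ? 32768 : 0);
    /* >> on a negative value: arithmetic shift on GCC/Clang */
    return (int16_t)(sat_range(r, INT32_MIN, INT32_MAX, sat) >> 16);
}
```

As in the patch, the *MLASH variants reuse the same lane function with the operands rotated: VQDMLAH passes (n, scalar, d) while VQDMLASH passes (n, d, scalar). Multiplying INT16_MIN by itself is the classic saturating case: 2 * 2^30 = 2^31 exceeds INT32_MAX, so the lane returns INT16_MAX and sets the flag.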

-[Qemu-devel] [PULL 16/42] target/arm: Clean excReturn bits when tail chaining
+[PULL 28/44] target/arm: Implement MVE VQABS, VQNEG
-The TailChain() pseudocode specifies that a tail chaining
+Implement the MVE 1-operand saturating operations VQABS and VQNEG.
 exception should sanitize the excReturn all-ones bits and
 (if there is no FPU) the excReturn FType bits; we weren't
 doing this.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-14-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 8 ++++++++
+ target/arm/helper-mve.h    |  8 ++++++++
- 1 file changed, 8 insertions(+)
+ target/arm/mve.decode      |  3 +++
  target/arm/mve_helper.c    | 37 +++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  2 ++
  4 files changed, 50 insertions(+)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void v7m_exception_taken(ARMCPU *cpu, uint32_t lr, bool dotailchain,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     qemu_log_mask(CPU_LOG_INT, "...taking pending %s exception %d\n",
+ DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-                   targets_secure ? "secure" : "nonsecure", exc);
+ DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+    if (dotailchain) {
++DEF_HELPER_FLAGS_3(mve_vqabsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+        /* Sanitize LR FType and PREFIX bits */
++DEF_HELPER_FLAGS_3(mve_vqabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+        if (!arm_feature(env, ARM_FEATURE_VFP)) {
++DEF_HELPER_FLAGS_3(mve_vqabsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+            lr |= R_V7M_EXCRET_FTYPE_MASK;
++
-+        }
++DEF_HELPER_FLAGS_3(mve_vqnegb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-+        lr = deposit32(lr, 24, 8, 0xff);
++DEF_HELPER_FLAGS_3(mve_vqnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +DEF_HELPER_FLAGS_3(mve_vqnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 +
  DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
  DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
  DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VABS_fp          1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
  VNEG             1111 1111 1 . 11 .. 01 ... 0 0011 11 . 0 ... 0 @1op
  VNEG_fp          1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
 +VQABS            1111 1111 1 . 11 .. 00 ... 0 0111 01 . 0 ... 0 @1op
 +VQNEG            1111 1111 1 . 11 .. 00 ... 0 0111 11 . 0 ... 0 @1op
 +
  &vdup qd rt size
  # Qd is in the fields usually named Qn
  @vdup            .... .... . . .. ... . rt:4 .... . . . . .... qd=%qn &vdup
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
      }
      mve_advance_vpt(env);
  }
 +
 +#define DO_1OP_SAT(OP, ESIZE, TYPE, FN)                                 \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
 +    {                                                                   \
 +        TYPE *d = vd, *m = vm;                                          \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        bool qc = false;                                                \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            bool sat = false;                                           \
 +            mergemask(&d[H##ESIZE(e)], FN(m[H##ESIZE(e)], &sat), mask); \
 +            qc |= sat & mask & 1;                                       \
 +        }                                                               \
 +        if (qc) {                                                       \
 +            env->vfp.qc[0] = qc;                                        \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
-     if (arm_feature(env, ARM_FEATURE_V8)) {
++#define DO_VQABS_B(N, SATP) \
-         if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
++    do_sat_bhs(DO_ABS((int64_t)N), INT8_MIN, INT8_MAX, SATP)
-             (lr & R_V7M_EXCRET_S_MASK)) {
++#define DO_VQABS_H(N, SATP) \
 +    do_sat_bhs(DO_ABS((int64_t)N), INT16_MIN, INT16_MAX, SATP)
 +#define DO_VQABS_W(N, SATP) \
 +    do_sat_bhs(DO_ABS((int64_t)N), INT32_MIN, INT32_MAX, SATP)
 +
 +#define DO_VQNEG_B(N, SATP) do_sat_bhs(-(int64_t)N, INT8_MIN, INT8_MAX, SATP)
 +#define DO_VQNEG_H(N, SATP) do_sat_bhs(-(int64_t)N, INT16_MIN, INT16_MAX, SATP)
 +#define DO_VQNEG_W(N, SATP) do_sat_bhs(-(int64_t)N, INT32_MIN, INT32_MAX, SATP)
 +
 +DO_1OP_SAT(vqabsb, 1, int8_t, DO_VQABS_B)
 +DO_1OP_SAT(vqabsh, 2, int16_t, DO_VQABS_H)
 +DO_1OP_SAT(vqabsw, 4, int32_t, DO_VQABS_W)
 +
 +DO_1OP_SAT(vqnegb, 1, int8_t, DO_VQNEG_B)
 +DO_1OP_SAT(vqnegh, 2, int16_t, DO_VQNEG_H)
 +DO_1OP_SAT(vqnegw, 4, int32_t, DO_VQNEG_W)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_1OP(VCLZ, vclz)
  DO_1OP(VCLS, vcls)
  DO_1OP(VABS, vabs)
  DO_1OP(VNEG, vneg)
 +DO_1OP(VQABS, vqabs)
 +DO_1OP(VQNEG, vqneg)
  /* Narrowing moves: only size 0 and 1 are valid */
  #define DO_VMOVN(INSN, FN) \
 --
 .20.1

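The DO_1OP_SAT helpers in the VQABS/VQNEG patch widen each element to int64_t before taking the absolute value or negating, so that the single overflowing input (the most negative value) is clamped instead of wrapping, and only active lanes contribute to the sticky QC flag. The following is a minimal standalone sketch of the byte case, assuming a 16-byte vector and a one-bit-per-byte predicate mask like the one mve_element_mask() returns. The names sat_clamp, vqabs_s8, vqneg_s8 and vqabs_vec_s8 are hypothetical and are not QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Clamp a widened intermediate into [min, max], setting *satp if clamping
 * happened. Hypothetical stand-in for the do_sat_bhs() helper in the patch.
 */
static int64_t sat_clamp(int64_t val, int64_t min, int64_t max, bool *satp)
{
    if (val > max) {
        *satp = true;
        return max;
    }
    if (val < min) {
        *satp = true;
        return min;
    }
    return val;
}

/* Saturating absolute value of one signed byte (a VQABS.S8 lane). */
static int8_t vqabs_s8(int8_t n, bool *satp)
{
    int64_t wide = n;   /* widen first so that |-128| = 128 is representable */
    return (int8_t)sat_clamp(wide < 0 ? -wide : wide, INT8_MIN, INT8_MAX, satp);
}

/* Saturating negation of one signed byte (a VQNEG.S8 lane). */
static int8_t vqneg_s8(int8_t n, bool *satp)
{
    return (int8_t)sat_clamp(-(int64_t)n, INT8_MIN, INT8_MAX, satp);
}

/*
 * VQABS.S8 over a 16-byte vector. Bit e of mask enables lane e; disabled
 * lanes keep their old contents. Returns true if any enabled lane
 * saturated, i.e. the value that would be ORed into the sticky QC flag.
 */
static bool vqabs_vec_s8(int8_t d[16], const int8_t m[16], uint16_t mask)
{
    bool qc = false;

    for (unsigned e = 0; e < 16; e++, mask >>= 1) {
        bool sat = false;
        int8_t r = vqabs_s8(m[e], &sat);

        if (mask & 1) {
            d[e] = r;
            qc |= sat;
        }
    }
    return qc;
}
```

With every element of m set to -128 and mask 0x0001, only d[0] becomes 127 and the function returns true; with mask 0 nothing is written and it returns false, mirroring the `sat & mask & 1` test in the helper.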
-[Qemu-devel] [PULL 01/42] hw/arm/smmuv3: Remove SMMUNotifierNode
+[PULL 29/44] target/arm: Implement MVE VMAXA, VMINA
-From: Eric Auger <eric.auger@redhat.com>
+Implement the MVE VMAXA and VMINA insns, which take the absolute
 value of the signed elements in the input vector and then accumulate
 the unsigned max or min into the destination vector.
-The SMMUNotifierNode struct is not necessary and brings extra
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-complexity so let's remove it. We now directly track the SMMUDevices
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-which have registered IOMMU MR notifiers.
+---
  target/arm/helper-mve.h    |  8 ++++++++
  target/arm/mve.decode      |  4 ++++
  target/arm/mve_helper.c    | 26 ++++++++++++++++++++++++++
  target/arm/translate-mve.c |  2 ++
 files changed, 40 insertions(+)
-This is inspired from the same transformation on intel-iommu
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 done in commit b4a4ba0d68f50f218ee3957b6638dbee32a5eeef
 ("intel-iommu: remove IntelIOMMUNotifierNode")
 Signed-off-by: Eric Auger <eric.auger@redhat.com>
 Reviewed-by: Peter Xu <peterx@redhat.com>
 Message-id: 20190409160219.19026-1-eric.auger@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  include/hw/arm/smmu-common.h |  8 ++------
  hw/arm/smmu-common.c         |  6 +++---
  hw/arm/smmuv3.c              | 28 +++++++---------------------
 files changed, 12 insertions(+), 30 deletions(-)
 diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/smmu-common.h
+--- a/target/arm/helper-mve.h
-+++ b/include/hw/arm/smmu-common.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ typedef struct SMMUDevice {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vqnegb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     AddressSpace       as;
+ DEF_HELPER_FLAGS_3(mve_vqnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     uint32_t           cfg_cache_hits;
+ DEF_HELPER_FLAGS_3(mve_vqnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     uint32_t           cfg_cache_misses;
-+    QLIST_ENTRY(SMMUDevice) next;
++DEF_HELPER_FLAGS_3(mve_vmaxab, TCG_CALL_NO_WG, void, env, ptr, ptr)
- } SMMUDevice;
++DEF_HELPER_FLAGS_3(mve_vmaxah, TCG_CALL_NO_WG, void, env, ptr, ptr)
++DEF_HELPER_FLAGS_3(mve_vmaxaw, TCG_CALL_NO_WG, void, env, ptr, ptr)
--typedef struct SMMUNotifierNode {
++
--    SMMUDevice *sdev;
++DEF_HELPER_FLAGS_3(mve_vminab, TCG_CALL_NO_WG, void, env, ptr, ptr)
--    QLIST_ENTRY(SMMUNotifierNode) next;
++DEF_HELPER_FLAGS_3(mve_vminah, TCG_CALL_NO_WG, void, env, ptr, ptr)
--} SMMUNotifierNode;
++DEF_HELPER_FLAGS_3(mve_vminaw, TCG_CALL_NO_WG, void, env, ptr, ptr)
--
++
- typedef struct SMMUPciBus {
+ DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     PCIBus       *bus;
+ DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
-     SMMUDevice   *pbdev[0]; /* Parent array is sparse, so dynamically alloc */
+ DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
-@@ -XXX,XX +XXX,XX @@ typedef struct SMMUState {
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
      GHashTable *iotlb;
      SMMUPciBus *smmu_pcibus_by_bus_num[SMMU_PCI_BUS_MAX];
      PCIBus *pci_bus;
 -    QLIST_HEAD(, SMMUNotifierNode) notifiers_list;
 +    QLIST_HEAD(, SMMUDevice) devices_with_notifiers;
      uint8_t bus_num;
      PCIBus *primary_bus;
  } SMMUState;
 diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/smmu-common.c
+--- a/target/arm/mve.decode
-+++ b/hw/arm/smmu-common.c
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ inline void smmu_inv_notifiers_mr(IOMMUMemoryRegion *mr)
+@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
- /* Unmap all notifiers of all mr's */
+   VQMOVUNB       111 0 1110 0 . 11 .. 01 ... 0 1110 1 0 . 0 ... 1 @1op
- void smmu_inv_notifiers_all(SMMUState *s)
+   VQMOVN_BS      111 0 1110 0 . 11 .. 11 ... 0 1110 0 0 . 0 ... 1 @1op
- {
--    SMMUNotifierNode *node;
++  VMAXA          111 0 1110 0 . 11 .. 11 ... 0 1110 1 0 . 0 ... 1 @1op
-+    SMMUDevice *sdev;
++
+   VMULH_S        111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 -    QLIST_FOREACH(node, &s->notifiers_list, next) {
 -        smmu_inv_notifiers_mr(&node->sdev->iommu);
 +    QLIST_FOREACH(sdev, &s->devices_with_notifiers, next) {
 +        smmu_inv_notifiers_mr(&sdev->iommu);
      }
  }
-diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
+@@ -XXX,XX +XXX,XX @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
    VQMOVUNT       111 0 1110 0 . 11 .. 01 ... 1 1110 1 0 . 0 ... 1 @1op
    VQMOVN_TS      111 0 1110 0 . 11 .. 11 ... 1 1110 0 0 . 0 ... 1 @1op
 +  VMINA          111 0 1110 0 . 11 .. 11 ... 1 1110 1 0 . 0 ... 1 @1op
 +
    VRMULH_S       111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
  }
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/smmuv3.c
+--- a/target/arm/mve_helper.c
-+++ b/hw/arm/smmuv3.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
+@@ -XXX,XX +XXX,XX @@ DO_1OP_SAT(vqabsw, 4, int32_t, DO_VQABS_W)
- /* invalidate an asid/iova tuple in all mr's */
+ DO_1OP_SAT(vqnegb, 1, int8_t, DO_VQNEG_B)
- static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, dma_addr_t iova)
+ DO_1OP_SAT(vqnegh, 2, int16_t, DO_VQNEG_H)
- {
+ DO_1OP_SAT(vqnegw, 4, int32_t, DO_VQNEG_W)
--    SMMUNotifierNode *node;
++
-+    SMMUDevice *sdev;
++/*
++ * VMAXA, VMINA: vd is unsigned; vm is signed, and we take its
--    QLIST_FOREACH(node, &s->notifiers_list, next) {
++ * absolute value; we then do an unsigned comparison.
--        IOMMUMemoryRegion *mr = &node->sdev->iommu;
++ */
-+    QLIST_FOREACH(sdev, &s->devices_with_notifiers, next) {
++#define DO_VMAXMINA(OP, ESIZE, STYPE, UTYPE, FN)                        \
-+        IOMMUMemoryRegion *mr = &sdev->iommu;
++    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
-         IOMMUNotifier *n;
++    {                                                                   \
++        UTYPE *d = vd;                                                  \
-         trace_smmuv3_inv_notifiers_iova(mr->parent_obj.name, asid, iova);
++        STYPE *m = vm;                                                  \
-@@ -XXX,XX +XXX,XX @@ static void smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
++        uint16_t mask = mve_element_mask(env);                          \
-     SMMUDevice *sdev = container_of(iommu, SMMUDevice, iommu);
++        unsigned e;                                                     \
-     SMMUv3State *s3 = sdev->smmu;
++        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
-     SMMUState *s = &(s3->smmu_state);
++            UTYPE r = DO_ABS(m[H##ESIZE(e)]);                           \
--    SMMUNotifierNode *node = NULL;
++            r = FN(d[H##ESIZE(e)], r);                                  \
--    SMMUNotifierNode *next_node = NULL;
++            mergemask(&d[H##ESIZE(e)], r, mask);                        \
++        }                                                               \
-     if (new & IOMMU_NOTIFIER_MAP) {
++        mve_advance_vpt(env);                                           \
-         int bus_num = pci_bus_num(sdev->bus);
++    }
-@@ -XXX,XX +XXX,XX @@ static void smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
++
++DO_VMAXMINA(vmaxab, 1, int8_t, uint8_t, DO_MAX)
-     if (old == IOMMU_NOTIFIER_NONE) {
++DO_VMAXMINA(vmaxah, 2, int16_t, uint16_t, DO_MAX)
-         trace_smmuv3_notify_flag_add(iommu->parent_obj.name);
++DO_VMAXMINA(vmaxaw, 4, int32_t, uint32_t, DO_MAX)
--        node = g_malloc0(sizeof(*node));
++DO_VMAXMINA(vminab, 1, int8_t, uint8_t, DO_MIN)
--        node->sdev = sdev;
++DO_VMAXMINA(vminah, 2, int16_t, uint16_t, DO_MIN)
--        QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
++DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
--        return;
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
--    }
+index XXXXXXX..XXXXXXX 100644
--
+--- a/target/arm/translate-mve.c
--    /* update notifier node with new flags */
++++ b/target/arm/translate-mve.c
--    QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
+@@ -XXX,XX +XXX,XX @@ DO_1OP(VABS, vabs)
--        if (node->sdev == sdev) {
+ DO_1OP(VNEG, vneg)
--            if (new == IOMMU_NOTIFIER_NONE) {
+ DO_1OP(VQABS, vqabs)
--                trace_smmuv3_notify_flag_del(iommu->parent_obj.name);
+ DO_1OP(VQNEG, vqneg)
--                QLIST_REMOVE(node, next);
++DO_1OP(VMAXA, vmaxa)
--                g_free(node);
++DO_1OP(VMINA, vmina)
--            }
--            return;
+ /* Narrowing moves: only size 0 and 1 are valid */
--        }
+ #define DO_VMOVN(INSN, FN) \
 +        QLIST_INSERT_HEAD(&s->devices_with_notifiers, sdev, next);
 +    } else if (new == IOMMU_NOTIFIER_NONE) {
 +        trace_smmuv3_notify_flag_del(iommu->parent_obj.name);
 +        QLIST_REMOVE(sdev, next);
      }
  }
 --
 .20.1

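Because VMAXA and VMINA treat vd as unsigned while vm is signed, the absolute value of the most negative element (128 for a byte) still fits in the destination lane, and the comparison has to be unsigned. A minimal sketch of the byte case follows. The names abs_s8_to_u8, vmaxa_u8, vmina_u8 and vmaxa_vec_u8 are hypothetical, and predication is simplified to a plain per-byte mask.

```c
#include <stdint.h>

/* |m| as an unsigned byte: promoting to int first makes |-128| = 128. */
static uint8_t abs_s8_to_u8(int8_t m)
{
    int v = m;
    return (uint8_t)(v < 0 ? -v : v);
}

/* One VMAXA.S8 lane: unsigned max of the destination and |m|. */
static uint8_t vmaxa_u8(uint8_t d, int8_t m)
{
    uint8_t a = abs_s8_to_u8(m);
    return d > a ? d : a;
}

/* One VMINA.S8 lane: unsigned min of the destination and |m|. */
static uint8_t vmina_u8(uint8_t d, int8_t m)
{
    uint8_t a = abs_s8_to_u8(m);
    return d < a ? d : a;
}

/* VMAXA.S8 over 16 lanes; bit e of mask enables lane e. */
static void vmaxa_vec_u8(uint8_t d[16], const int8_t m[16], uint16_t mask)
{
    for (unsigned e = 0; e < 16; e++, mask >>= 1) {
        if (mask & 1) {
            d[e] = vmaxa_u8(d[e], m[e]);
        }
    }
}
```

For instance, vmaxa_u8(200, -128) keeps 200. Reinterpreting the destination byte 0xC8 as signed (-56) would wrongly make it the smaller value, which is why the helper uses UTYPE for vd.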
-[Qemu-devel] [PULL 14/42] target/arm: Implement v7m_update_fpccr()
+[PULL 30/44] target/arm: Implement MVE VMOV to/from 2 general-purpose registers
-Implement the code which updates the FPCCR register on an
+Implement the MVE VMOV forms that move data between 2 general-purpose
-exception entry where we are going to use lazy FP stacking.
+registers and 2 32-bit lanes in a vector register.
 We have to defer to the NVIC to determine whether the
 various exceptions are currently ready or not.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20190416125744.27770-12-peter.maydell@linaro.org
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/cpu.h      | 14 +++++++++
+ target/arm/translate-a32.h |  1 +
- hw/intc/armv7m_nvic.c | 34 ++++++++++++++++++++++
+ target/arm/mve.decode      |  4 ++
- target/arm/helper.c   | 67 ++++++++++++++++++++++++++++++++++++++++++-
+ target/arm/translate-mve.c | 85 ++++++++++++++++++++++++++++++++++++++
-files changed, 114 insertions(+), 1 deletion(-)
+ target/arm/translate-vfp.c |  2 +-
 files changed, 91 insertions(+), 1 deletion(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/translate-a32.h
-+++ b/target/arm/cpu.h
++++ b/target/arm/translate-a32.h
-@@ -XXX,XX +XXX,XX @@ void armv7m_nvic_acknowledge_irq(void *opaque);
+@@ -XXX,XX +XXX,XX @@ void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
-  * (Ignoring -1, this is the same as the RETTOBASE value before completion.)
+ void clear_eci_state(DisasContext *s);
-  */
+ bool mve_eci_check(DisasContext *s);
- int armv7m_nvic_complete_irq(void *opaque, int irq, bool secure);
+ void mve_update_and_store_eci(DisasContext *s);
-+/**
++bool mve_skip_vmov(DisasContext *s, int vn, int index, int size);
-+ * armv7m_nvic_get_ready_status(void *opaque, int irq, bool secure)
-+ * @opaque: the NVIC
+ static inline TCGv_i32 load_cpu_offset(int offset)
-+ * @irq: the exception number to mark pending
+ {
-+ * @secure: false for non-banked exceptions or for the nonsecure
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 + * version of a banked exception, true for the secure version of a banked
 + * exception.
 + *
 + * Return whether an exception is "ready", i.e. whether the exception is
 + * enabled and is configured at a priority which would allow it to
 + * interrupt the current execution priority. This controls whether the
 + * RDY bit for it in the FPCCR is set.
 + */
 +bool armv7m_nvic_get_ready_status(void *opaque, int irq, bool secure);
  /**
   * armv7m_nvic_raw_execution_priority: return the raw execution priority
   * @opaque: the NVIC
 diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/armv7m_nvic.c
+--- a/target/arm/mve.decode
-+++ b/hw/intc/armv7m_nvic.c
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ int armv7m_nvic_complete_irq(void *opaque, int irq, bool secure)
+@@ -XXX,XX +XXX,XX @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
-     return ret;
+ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
- }
+                  size=2 p=1
-+bool armv7m_nvic_get_ready_status(void *opaque, int irq, bool secure)
++# Moves between 2 32-bit vector lanes and 2 general purpose registers
 +VMOV_to_2gp      1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
 +VMOV_from_2gp    1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
 +
  # Vector 2-op
  VAND             1110 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
  VBIC             1110 1111 0 . 01 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool do_vabav(DisasContext *s, arg_vabav *a, MVEGenVABAVFn *fn)
  DO_VABAV(VABAV_S, vabavs)
  DO_VABAV(VABAV_U, vabavu)
 +
 +static bool trans_VMOV_to_2gp(DisasContext *s, arg_VMOV_to_2gp *a)
 +{
 +    /*
-+     * Return whether an exception is "ready", i.e. it is enabled and is
++     * VMOV two 32-bit vector lanes to two general-purpose registers.
-+     * configured at a priority which would allow it to interrupt the
++     * This insn is not predicated but it is subject to beat-wise
-+     * current execution priority.
++     * execution if it is not in an IT block. For us this means
-+     *
++     * only that if PSR.ECI says we should not be executing the beat
-+     * irq and secure have the same semantics as for armv7m_nvic_set_pending():
++     * corresponding to the lane of the vector register being accessed
-+     * for non-banked exceptions secure is always false; for banked exceptions
++     * then we should skip performing the move, and that we need to do
-+     * it indicates which of the exceptions is required.
++     * the usual check for bad ECI state and advance of ECI state.
 +     * (If PSR.ECI is non-zero then we cannot be in an IT block.)
 +     */
-+    NVICState *s = (NVICState *)opaque;
++    TCGv_i32 tmp;
-+    bool banked = exc_is_banked(irq);
++    int vd;
 +    VecInfo *vec;
 +    int running = nvic_exec_prio(s);
 +
-+    assert(irq > ARMV7M_EXCP_RESET && irq < s->num_irq);
++    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd) ||
-+    assert(!secure || banked);
++        a->rt == 13 || a->rt == 15 || a->rt2 == 13 || a->rt2 == 15 ||
-+
++        a->rt == a->rt2) {
-+    /*
++        /* Rt/Rt2 cases are UNPREDICTABLE */
-+     * HardFault is an odd special case: we always check against -1,
++        return false;
-+     * even if we're secure and HardFault has priority -3; we never
++    }
-+     * need to check for enabled state.
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-+     */
++        return true;
 +    if (irq == ARMV7M_EXCP_HARD) {
 +        return running > -1;
 +    }
 +
-+    vec = (banked && secure) ? &s->sec_vectors[irq] : &s->vectors[irq];
++    /* Convert Qreg index to Dreg for read_neon_element32() etc */
 +    vd = a->qd * 2;
 +
-+    return vec->enabled &&
++    if (!mve_skip_vmov(s, vd, a->idx, MO_32)) {
-+        exc_group_prio(s, vec->prio, secure) < running;
++        tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, vd, a->idx, MO_32);
 +        store_reg(s, a->rt, tmp);
 +    }
 +    if (!mve_skip_vmov(s, vd + 1, a->idx, MO_32)) {
 +        tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, vd + 1, a->idx, MO_32);
 +        store_reg(s, a->rt2, tmp);
 +    }
 +
 +    mve_update_and_store_eci(s);
 +    return true;
 +}
 +
- /* callback when external interrupt line is changed */
++static bool trans_VMOV_from_2gp(DisasContext *s, arg_VMOV_to_2gp *a)
  static void set_irq_level(void *opaque, int n, int level)
  {
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void v7m_exception_taken(ARMCPU *cpu, uint32_t lr, bool dotailchain,
      env->thumb = addr & 1;
  }
 +static void v7m_update_fpccr(CPUARMState *env, uint32_t frameptr,
 +                             bool apply_splim)
 +{
 +    /*
-+     * Like the pseudocode UpdateFPCCR: save state in FPCAR and FPCCR
++     * VMOV two general-purpose registers to two 32-bit vector lanes.
-+     * that we will need later in order to do lazy FP reg stacking.
++     * This insn is not predicated but it is subject to beat-wise
 +     * execution if it is not in an IT block. For us this means
 +     * only that if PSR.ECI says we should not be executing the beat
 +     * corresponding to the lane of the vector register being accessed
 +     * then we should skip performing the move, and that we need to do
 +     * the usual check for bad ECI state and advance of ECI state.
 +     * (If PSR.ECI is non-zero then we cannot be in an IT block.)
 +     */
-+    bool is_secure = env->v7m.secure;
++    TCGv_i32 tmp;
-+    void *nvic = env->nvic;
++    int vd;
 +    /*
 +     * Some bits are unbanked and live always in fpccr[M_REG_S]; some bits
 +     * are banked and we want to update the bit in the bank for the
 +     * current security state; and in one case we want to specifically
 +     * update the NS banked version of a bit even if we are secure.
 +     */
 +    uint32_t *fpccr_s = &env->v7m.fpccr[M_REG_S];
 +    uint32_t *fpccr_ns = &env->v7m.fpccr[M_REG_NS];
 +    uint32_t *fpccr = &env->v7m.fpccr[is_secure];
 +    bool hfrdy, bfrdy, mmrdy, ns_ufrdy, s_ufrdy, sfrdy, monrdy;
 +
-+    env->v7m.fpcar[is_secure] = frameptr & ~0x7;
++    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd) ||
-+
++        a->rt == 13 || a->rt == 15 || a->rt2 == 13 || a->rt2 == 15) {
-+    if (apply_splim && arm_feature(env, ARM_FEATURE_V8)) {
++        /* Rt/Rt2 cases are UNPREDICTABLE */
-+        bool splimviol;
++        return false;
-+        uint32_t splim = v7m_sp_limit(env);
++    }
-+        bool ign = armv7m_nvic_neg_prio_requested(nvic, is_secure) &&
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-+            (env->v7m.ccr[is_secure] & R_V7M_CCR_STKOFHFNMIGN_MASK);
++        return true;
 +
 +        splimviol = !ign && frameptr < splim;
 +        *fpccr = FIELD_DP32(*fpccr, V7M_FPCCR, SPLIMVIOL, splimviol);
 +    }
 +
-+    *fpccr = FIELD_DP32(*fpccr, V7M_FPCCR, LSPACT, 1);
++    /* Convert Qreg idx to Dreg for write_neon_element32() etc */
 +    vd = a->qd * 2;
 +
-+    *fpccr_s = FIELD_DP32(*fpccr_s, V7M_FPCCR, S, is_secure);
++    if (!mve_skip_vmov(s, vd, a->idx, MO_32)) {
 +        tmp = load_reg(s, a->rt);
 +        write_neon_element32(tmp, vd, a->idx, MO_32);
 +        tcg_temp_free_i32(tmp);
 +    }
 +    if (!mve_skip_vmov(s, vd + 1, a->idx, MO_32)) {
 +        tmp = load_reg(s, a->rt2);
 +        write_neon_element32(tmp, vd + 1, a->idx, MO_32);
 +        tcg_temp_free_i32(tmp);
 +    }
 +
-+    *fpccr = FIELD_DP32(*fpccr, V7M_FPCCR, USER, arm_current_el(env) == 0);
++    mve_update_and_store_eci(s);
-+
++    return true;
 +    *fpccr = FIELD_DP32(*fpccr, V7M_FPCCR, THREAD,
 +                        !arm_v7m_is_handler_mode(env));
 +
 +    hfrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_HARD, false);
 +    *fpccr_s = FIELD_DP32(*fpccr_s, V7M_FPCCR, HFRDY, hfrdy);
 +
 +    bfrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_BUS, false);
 +    *fpccr_s = FIELD_DP32(*fpccr_s, V7M_FPCCR, BFRDY, bfrdy);
 +
 +    mmrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_MEM, is_secure);
 +    *fpccr = FIELD_DP32(*fpccr, V7M_FPCCR, MMRDY, mmrdy);
 +
 +    ns_ufrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_USAGE, false);
 +    *fpccr_ns = FIELD_DP32(*fpccr_ns, V7M_FPCCR, UFRDY, ns_ufrdy);
 +
 +    monrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_DEBUG, false);
 +    *fpccr_s = FIELD_DP32(*fpccr_s, V7M_FPCCR, MONRDY, monrdy);
 +
 +    if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
 +        s_ufrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_USAGE, true);
 +        *fpccr_s = FIELD_DP32(*fpccr_s, V7M_FPCCR, UFRDY, s_ufrdy);
 +
 +        sfrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_SECURE, false);
 +        *fpccr_s = FIELD_DP32(*fpccr_s, V7M_FPCCR, SFRDY, sfrdy);
 +    }
 +}
-+
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
- static bool v7m_push_stack(ARMCPU *cpu)
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c
 +++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
      return true;
  }
 -static bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
 +bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
  {
-     /* Do the "set up stack frame" part of exception entry,
+     /*
-@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
+      * In a CPU with MVE, the VMOV (vector lane to general-purpose register)
                  }
              } else {
                  /* Lazy stacking enabled, save necessary info to stack later */
 -                /* TODO : equivalent of UpdateFPCCR() pseudocode */
 +                v7m_update_fpccr(env, frameptr + 0x20, true);
              }
          }
      }
 --
 .20.1

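The VMOV translator converts Qd into the D-register pair 2*Qd and 2*Qd+1 and accesses element idx of each, so the two lanes moved are Qd[idx] and Qd[idx + 2]. The sketch below models that mapping and the beat-wise skip. As a simplification, it assumes that the PSR.ECI state (which the real code decodes in mve_skip_vmov()) can be reduced to a count of already-completed beats, and that 32-bit lane n is handled in beat n. QReg, vmov_to_2gp and vmov_from_2gp are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical model of a Q register as four 32-bit lanes. */
typedef struct {
    uint32_t lane[4];
} QReg;

/*
 * idx must be 0 or 1. Element idx of D[2*Qd + half] is lane 2*half + idx
 * of Qd, so the two lanes touched are idx and idx + 2.
 * beats_done stands in for PSR.ECI: lanes whose beat has already
 * completed (lane n < beats_done) are skipped.
 */
static void vmov_to_2gp(const QReg *q, unsigned idx, unsigned beats_done,
                        uint32_t *rt, uint32_t *rt2)
{
    unsigned lo = idx, hi = idx + 2;

    if (lo >= beats_done) {
        *rt = q->lane[lo];
    }
    if (hi >= beats_done) {
        *rt2 = q->lane[hi];
    }
}

/* The reverse direction: write Rt and Rt2 into lanes idx and idx + 2. */
static void vmov_from_2gp(QReg *q, unsigned idx, unsigned beats_done,
                          uint32_t rt, uint32_t rt2)
{
    unsigned lo = idx, hi = idx + 2;

    if (lo >= beats_done) {
        q->lane[lo] = rt;
    }
    if (hi >= beats_done) {
        q->lane[hi] = rt2;
    }
}
```

With lanes {10, 11, 12, 13} and idx 1, a fresh execution reads 11 and 13. If two beats have already completed, only the lane 3 move (into Rt2) is performed, which matches the patch's description of skipping moves for beats that ECI says are done.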
-[Qemu-devel] [PULL 26/42] target/arm: Implement M-profile lazy FP state preservation
+[PULL 31/44] target/arm: Implement MVE VPNOT
-The M-profile architecture floating point system supports
+Implement the MVE VPNOT insn, which inverts the bits in VPR.P0
-lazy FP state preservation, where FP registers are not
+(subject to both predication and to beatwise execution).
 pushed to the stack when an exception occurs but are instead
 only saved if and when the first FP instruction in the exception
 handler is executed. Implement this in QEMU, corresponding
 to the check of LSPACT in the pseudocode ExecuteFPCheck().
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-24-peter.maydell@linaro.org
 ---
- target/arm/cpu.h       |   3 ++
+ target/arm/helper-mve.h    |  1 +
- target/arm/helper.h    |   2 +
+ target/arm/mve.decode      |  1 +
- target/arm/translate.h |   1 +
+ target/arm/mve_helper.c    | 17 +++++++++++++++++
- target/arm/helper.c    | 112 +++++++++++++++++++++++++++++++++++++++++
+ target/arm/translate-mve.c | 19 +++++++++++++++++++
- target/arm/translate.c |  22 ++++++++
+files changed, 38 insertions(+)
 files changed, 140 insertions(+)
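The P0 update in the mve_vpnot helper can be read as a single bitwise expression. Bits for beats that are not being executed keep their value. Bits for executed beats become the inverse of P0 restricted to the active predicate mask. A minimal sketch follows, treating P0 as a plain 16-bit value. The real helper works on the 32-bit VPR and preserves its upper fields. vpnot_p0 is a hypothetical name.

```c
#include <stdint.h>

/*
 * New P0 after VPNOT.
 * p0:       current predicate bits, one per byte lane
 * mask:     active-lane mask (0 = lane predicated out)
 * eci_mask: 1 for byte lanes in beats that are being executed
 */
static uint16_t vpnot_p0(uint16_t p0, uint16_t mask, uint16_t eci_mask)
{
    /* Invert, then force predicated-out lanes to 0 (same logic as VCMP). */
    uint16_t beatpred = (uint16_t)(~p0 & mask);

    /* Unexecuted beats keep their old P0 bits. */
    return (uint16_t)((p0 & ~eci_mask) | (beatpred & eci_mask));
}
```

With every beat executed and no predication, vpnot_p0(0x00FF, 0xFFFF, 0xFFFF) yields 0xFF00. With eci_mask 0xFFF0, which models a resumed instruction whose first beat had already completed, the low four P0 bits are left as they were.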
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/cpu.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- #define EXCP_NOCP           17   /* v7M NOCP UsageFault */
+ DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- #define EXCP_INVSTATE       18   /* v7M INVSTATE UsageFault */
- #define EXCP_STKOF          19   /* v8M STKOF UsageFault */
+ DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-+#define EXCP_LAZYFP         20   /* v7M fault during lazy FP stacking */
++DEF_HELPER_FLAGS_1(mve_vpnot, TCG_CALL_NO_WG, void, env)
- /* NB: add new EXCP_ defines to the array in arm_log_exception() too */
+ DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
- #define ARMV7M_EXCP_RESET   1
+ DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, NS, 6, 1)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
  FIELD(TBFLAG_A32, VFPEN, 7, 1)
  FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
  FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
 +/* For M profile only, set if FPCCR.LSPACT is set */
 +FIELD(TBFLAG_A32, LSPACT, 18, 1)
  /* For M profile only, set if we must create a new FP context */
  FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1)
  /* For M profile only, set if FPCCR.S does not match current security state */
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
+--- a/target/arm/mve.decode
-+++ b/target/arm/helper.h
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(v7m_blxns, void, env, i32)
+@@ -XXX,XX +XXX,XX @@ VCMPGT            1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
+ VCMPLE            1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
- DEF_HELPER_3(v7m_tt, i32, env, i32, i32)
+ {
-+DEF_HELPER_1(v7m_preserve_fp_state, void, env)
++  VPNOT           1111 1110 0 0 11 000 1 000 0 1111 0100 1101
-+
+   VPST            1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
- DEF_HELPER_2(v8m_stackcheck, void, env, i32)
+   VCMPEQ_scalar   1111 1110 0 . .. ... 1 ... 0 1111 0 1 0 0 .... @vcmp_scalar
+ }
- DEF_HELPER_4(access_check_cp_reg, void, env, ptr, i32, i32)
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.h
+--- a/target/arm/mve_helper.c
-+++ b/target/arm/translate.h
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
+@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
-     bool v8m_stackcheck; /* true if we need to perform v8M stack limit checks */
+     mve_advance_vpt(env);
      bool v8m_fpccr_s_wrong; /* true if v8M FPCCR.S != v8m_secure */
      bool v7m_new_fp_ctxt_needed; /* ASPEN set but no active FP context */
 +    bool v7m_lspact; /* FPCCR.LSPACT set */
      /* Immediate value in AArch32 SVC insn; must be set if is_jmp == DISAS_SWI
       * so that top level loop can generate correct syndrome information.
       */
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_blxns)(CPUARMState *env, uint32_t dest)
      g_assert_not_reached();
  }
-+void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
++void HELPER(mve_vpnot)(CPUARMState *env)
 +{
-+    /* translate.c should never generate calls here in user-only mode */
++    /*
-+    g_assert_not_reached();
++     * P0 bits for unexecuted beats (where eci_mask is 0) are unchanged.
 +     * P0 bits for predicated lanes in executed beats (where mask is 0) are 0.
 +     * P0 bits otherwise are inverted.
 +     * (This is the same logic as VCMP.)
 +     * This insn is itself subject to predication and to beat-wise execution,
 +     * and after it executes VPT state advances in the usual way.
 +     */
 +    uint16_t mask = mve_element_mask(env);
 +    uint16_t eci_mask = mve_eci_mask(env);
 +    uint16_t beatpred = ~env->v7m.vpr & mask;
 +    env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | (beatpred & eci_mask);
 +    mve_advance_vpt(env);
 +}
 +
- uint32_t HELPER(v7m_tt)(CPUARMState *env, uint32_t addr, uint32_t op)
+ #define DO_1OP_SAT(OP, ESIZE, TYPE, FN)                                 \
- {
+     void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
-     /* The TT instructions can be used by unprivileged code, but in
+     {                                                                   \
-@@ -XXX,XX +XXX,XX @@ pend_fault:
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-     return false;
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
      return true;
  }
-+void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
++static bool trans_VPNOT(DisasContext *s, arg_VPNOT *a)
 +{
 +    /*
-+     * Preserve FP state (because LSPACT was set and we are about
++     * Invert the predicate in VPR.P0. We have to call out to
-+     * to execute an FP instruction). This corresponds to the
++     * a helper because this insn itself is beatwise and can
-+     * PreserveFPState() pseudocode.
++     * be predicated.
 +     * We may throw an exception if the stacking fails.
 +     */
-+    ARMCPU *cpu = arm_env_get_cpu(env);
++    if (!dc_isar_feature(aa32_mve, s)) {
-+    bool is_secure = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
++        return false;
-+    bool negpri = !(env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_HFRDY_MASK);
++    }
-+    bool is_priv = !(env->v7m.fpccr[is_secure] & R_V7M_FPCCR_USER_MASK);
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-+    bool splimviol = env->v7m.fpccr[is_secure] & R_V7M_FPCCR_SPLIMVIOL_MASK;
++        return true;
 +    uint32_t fpcar = env->v7m.fpcar[is_secure];
 +    bool stacked_ok = true;
 +    bool ts = is_secure && (env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_TS_MASK);
 +    bool take_exception;
 +
 +    /* Take the iothread lock as we are going to touch the NVIC */
 +    qemu_mutex_lock_iothread();
 +
 +    /* Check the background context had access to the FPU */
 +    if (!v7m_cpacr_pass(env, is_secure, is_priv)) {
 +        armv7m_nvic_set_pending_lazyfp(env->nvic, ARMV7M_EXCP_USAGE, is_secure);
 +        env->v7m.cfsr[is_secure] |= R_V7M_CFSR_NOCP_MASK;
 +        stacked_ok = false;
 +    } else if (!is_secure && !extract32(env->v7m.nsacr, 10, 1)) {
 +        armv7m_nvic_set_pending_lazyfp(env->nvic, ARMV7M_EXCP_USAGE, M_REG_S);
 +        env->v7m.cfsr[M_REG_S] |= R_V7M_CFSR_NOCP_MASK;
 +        stacked_ok = false;
 +    }
 +
-+    if (!splimviol && stacked_ok) {
++    gen_helper_mve_vpnot(cpu_env);
-+        /* We only stack if the stack limit wasn't violated */
++    mve_update_eci(s);
-+        int i;
++    return true;
 +        ARMMMUIdx mmu_idx;
 +
 +        mmu_idx = arm_v7m_mmu_idx_all(env, is_secure, is_priv, negpri);
 +        for (i = 0; i < (ts ? 32 : 16); i += 2) {
 +            uint64_t dn = *aa32_vfp_dreg(env, i / 2);
 +            uint32_t faddr = fpcar + 4 * i;
 +            uint32_t slo = extract64(dn, 0, 32);
 +            uint32_t shi = extract64(dn, 32, 32);
 +
 +            if (i >= 16) {
 +                faddr += 8; /* skip the slot for the FPSCR */
 +            }
 +            stacked_ok = stacked_ok &&
 +                v7m_stack_write(cpu, faddr, slo, mmu_idx, STACK_LAZYFP) &&
 +                v7m_stack_write(cpu, faddr + 4, shi, mmu_idx, STACK_LAZYFP);
 +        }
 +
 +        stacked_ok = stacked_ok &&
 +            v7m_stack_write(cpu, fpcar + 0x40,
 +                            vfp_get_fpscr(env), mmu_idx, STACK_LAZYFP);
 +    }
 +
 +    /*
 +     * We definitely pended an exception, but it's possible that it
 +     * might not be able to be taken now. If its priority permits us
 +     * to take it now, then we must not update the LSPACT or FP regs,
 +     * but instead jump out to take the exception immediately.
 +     * If it's just pending and won't be taken until the current
 +     * handler exits, then we do update LSPACT and the FP regs.
 +     */
 +    take_exception = !stacked_ok &&
 +        armv7m_nvic_can_take_pending_exception(env->nvic);
 +
 +    qemu_mutex_unlock_iothread();
 +
 +    if (take_exception) {
 +        raise_exception_ra(env, EXCP_LAZYFP, 0, 1, GETPC());
 +    }
 +
 +    env->v7m.fpccr[is_secure] &= ~R_V7M_FPCCR_LSPACT_MASK;
 +
 +    if (ts) {
 +        /* Clear s0 to s31 and the FPSCR */
 +        int i;
 +
 +        for (i = 0; i < 32; i += 2) {
 +            *aa32_vfp_dreg(env, i / 2) = 0;
 +        }
 +        vfp_set_fpscr(env, 0);
 +    }
 +    /*
 +     * Otherwise s0 to s15 and FPSCR are UNKNOWN; we choose to leave them
 +     * unchanged.
 +     */
 +}
 +
- /* Write to v7M CONTROL.SPSEL bit for the specified security bank.
+ static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
-  * This may change the current stack pointer between Main and Process
+ {
-  * stack pointers if it is done for the CONTROL register for the current
+     /* VADDV: vector add across vector */
@@ -XXX,XX +XXX,XX @@ static void arm_log_exception(int idx)
              [EXCP_NOCP] = "v7M NOCP UsageFault",
              [EXCP_INVSTATE] = "v7M INVSTATE UsageFault",
              [EXCP_STKOF] = "v8M STKOF UsageFault",
 +            [EXCP_LAZYFP] = "v7M exception during lazy FP stacking",
          };
          if (idx >= 0 && idx < ARRAY_SIZE(excnames)) {
@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
              return;
          }
          break;
 +    case EXCP_LAZYFP:
 +        /*
 +         * We already pended the specific exception in the NVIC in the
 +         * v7m_preserve_fp_state() helper function.
 +         */
 +        break;
      default:
          cpu_abort(cs, "Unhandled exception 0x%x\n", cs->exception_index);
          return; /* Never happens.  Keep compiler happy.  */
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
          flags = FIELD_DP32(flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED, 1);
      }
 +    if (arm_feature(env, ARM_FEATURE_M)) {
 +        bool is_secure = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
 +
 +        if (env->v7m.fpccr[is_secure] & R_V7M_FPCCR_LSPACT_MASK) {
 +            flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
 +        }
 +    }
 +
      *pflags = flags;
      *cs_base = 0;
  }
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
      if (arm_dc_feature(s, ARM_FEATURE_M)) {
          /* Handle M-profile lazy FP state mechanics */
 +        /* Trigger lazy-state preservation if necessary */
 +        if (s->v7m_lspact) {
 +            /*
 +             * Lazy state saving affects external memory and also the NVIC,
 +             * so we must mark it as an IO operation for icount.
 +             */
 +            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 +                gen_io_start();
 +            }
 +            gen_helper_v7m_preserve_fp_state(cpu_env);
 +            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 +                gen_io_end();
 +            }
 +            /*
 +             * If the preserve_fp_state helper doesn't throw an exception
 +             * then it will clear LSPACT; we don't need to repeat this for
 +             * any further FP insns in this TB.
 +             */
 +            s->v7m_lspact = false;
 +        }
 +
          /* Update ownership of FP context: set FPCCR.S to match current state */
          if (s->v8m_fpccr_s_wrong) {
              TCGv_i32 tmp;
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
      dc->v8m_fpccr_s_wrong = FIELD_EX32(tb_flags, TBFLAG_A32, FPCCR_S_WRONG);
      dc->v7m_new_fp_ctxt_needed =
          FIELD_EX32(tb_flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED);
 +    dc->v7m_lspact = FIELD_EX32(tb_flags, TBFLAG_A32, LSPACT);
      dc->cp_regs = cpu->cp_regs;
      dc->features = env->features;
 --
 .20.1
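The address arithmetic in the v7m_preserve_fp_state() stacking loop above can be isolated into a small function. This is a minimal sketch that assumes only what the loop itself shows: S registers are stored in pairs starting at FPCAR, 8 extra bytes are skipped before S16, and the FPSCR goes to FPCAR + 0x40. sreg_stack_offset(), sregs_stacked() and FPSCR_STACK_OFFSET are hypothetical names, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset from FPCAR at which the loop above stores the FPSCR. */
#define FPSCR_STACK_OFFSET 0x40u

/*
 * Hypothetical helper: byte offset from FPCAR at which single-precision
 * register S<sreg> is stacked, following the loop's arithmetic
 * (faddr = fpcar + 4 * i, plus 8 once i >= 16).
 */
static uint32_t sreg_stack_offset(unsigned sreg)
{
    uint32_t off;

    assert(sreg < 32);
    off = 4u * sreg;
    if (sreg >= 16) {
        /* S16..S31 start after the FPSCR word and one further word */
        off += 8;
    }
    return off;
}

/* Hypothetical helper: all 32 S registers when FPCCR.TS is set, else 16. */
static unsigned sregs_stacked(int ts)
{
    return ts ? 32 : 16;
}
```

With these definitions S15 lands at FPCAR + 0x3c and S16 at FPCAR + 0x48, so the FPSCR at 0x40 sits between the two halves and the word at 0x44 is not written by this loop.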

-[Qemu-devel] [PULL 07/42] target/arm: Disable most VFP sysregs for M-profile
+[PULL 32/44] target/arm: Implement MVE VCTP
-The only "system register" that M-profile floating point exposes
+Implement the MVE VCTP insn, which sets the VPR.P0 predicate bits so
-via the VMRS/VMRS instructions is FPSCR, and it does not have
+as to predicate any element at index Rn or greater.  As
-the odd special case for rd==15. Add a check to ensure we only
+with VPNOT, this insn itself is predicable and subject to beatwise
-expose FPSCR.
+execution.
 The calculation of the mask is the same as is used to determine
 ltpmask in mve_element_mask(), but we precalculate masklen in
 generated code to avoid having to have 4 helpers specialized by size.
 We put the decode line in with the low-overhead-loop insns in
 t32.decode because it's logically part of that collection of insn
 patterns, even though it is an MVE only insn.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-5-peter.maydell@linaro.org
 ---
- target/arm/translate.c | 19 +++++++++++++++++--
+ target/arm/helper-mve.h    |  2 ++
-file changed, 17 insertions(+), 2 deletions(-)
+ target/arm/translate-a32.h |  1 +
  target/arm/t32.decode      |  1 +
  target/arm/mve_helper.c    | 20 ++++++++++++++++++++
  target/arm/translate-mve.c |  2 +-
  target/arm/translate.c     | 33 +++++++++++++++++++++++++++++++++
 files changed, 58 insertions(+), 1 deletion(-)
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper-mve.h
++++ b/target/arm/helper-mve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ DEF_HELPER_FLAGS_1(mve_vpnot, TCG_CALL_NO_WG, void, env)
++DEF_HELPER_FLAGS_2(mve_vctp, TCG_CALL_NO_WG, void, env, i32)
++
+ DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+ DEF_HELPER_FLAGS_4(mve_vaddw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a32.h
++++ b/target/arm/translate-a32.h
+@@ -XXX,XX +XXX,XX @@ long neon_element_offset(int reg, int element, MemOp memop);
+ void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
+ void clear_eci_state(DisasContext *s);
+ bool mve_eci_check(DisasContext *s);
++void mve_update_eci(DisasContext *s);
+ void mve_update_and_store_eci(DisasContext *s);
+ bool mve_skip_vmov(DisasContext *s, int vn, int index, int size);
+diff --git a/target/arm/t32.decode b/target/arm/t32.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/t32.decode
++++ b/target/arm/t32.decode
+@@ -XXX,XX +XXX,XX @@ BL               1111 0. .......... 11.1 ............         @branch24
+       # This is DLSTP
+       DLS        1111 0 0000 0 size:2 rn:4 1110 0000 0000 0001
+     }
++    VCTP         1111 0 0000 0 size:2 rn:4 1110 1000 0000 0001
+   ]
+ }
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mve_helper.c
++++ b/target/arm/mve_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vpnot)(CPUARMState *env)
+     mve_advance_vpt(env);
+ }
++/*
++ * VCTP: P0 unexecuted bits unchanged, predicated bits zeroed,
++ * otherwise set according to value of Rn. The calculation of
++ * newmask here works in the same way as the calculation of the
++ * ltpmask in mve_element_mask(), but we have pre-calculated
++ * the masklen in the generated code.
++ */
++void HELPER(mve_vctp)(CPUARMState *env, uint32_t masklen)
++{
++    uint16_t mask = mve_element_mask(env);
++    uint16_t eci_mask = mve_eci_mask(env);
++    uint16_t newmask;
++
++    assert(masklen <= 16);
++    newmask = masklen ? MAKE_64BIT_MASK(0, masklen) : 0;
++    newmask &= mask;
++    env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | (newmask & eci_mask);
++    mve_advance_vpt(env);
++}
++
+ #define DO_1OP_SAT(OP, ESIZE, TYPE, FN)                                 \
+     void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
+     {                                                                   \
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-mve.c
++++ b/target/arm/translate-mve.c
+@@ -XXX,XX +XXX,XX @@ bool mve_eci_check(DisasContext *s)
+     }
+ }
+-static void mve_update_eci(DisasContext *s)
++void mve_update_eci(DisasContext *s)
+ {
+     /*
+      * The helper function will always update the CPUState field,
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static bool trans_LCTP(DisasContext *s, arg_LCTP *a)
-                     }
+     return true;
-                 }
+ }
-             } else { /* !dp */
-+                bool is_sysreg;
++static bool trans_VCTP(DisasContext *s, arg_VCTP *a)
 +{
 +    /*
 +     * M-profile Create Vector Tail Predicate. This insn is itself
 +     * predicated and is subject to beatwise execution.
 +     */
 +    TCGv_i32 rn_shifted, masklen;
 +
-                 if ((insn & 0x6f) != 0x00)
++    if (!dc_isar_feature(aa32_mve, s) || a->rn == 13 || a->rn == 15) {
-                     return 1;
++        return false;
-                 rn = VFP_SREG_N(insn);
++    }
 +
-+                is_sysreg = extract32(insn, 21, 1);
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
-+                if (arm_dc_feature(s, ARM_FEATURE_M)) {
++    /*
-+                    /*
++     * We pre-calculate the mask length here to avoid having
-+                     * The only M-profile VFP vmrs/vmsr sysreg is FPSCR.
++     * to have multiple helpers specialized for size.
-+                     * Writes to R15 are UNPREDICTABLE; we choose to undef.
++     * We pass the helper "rn <= (1 << (4 - size)) ? (rn << size) : 16".
-+                     */
++     */
-+                    if (is_sysreg && (rd == 15 || (rn >> 1) != ARM_VFP_FPSCR)) {
++    rn_shifted = tcg_temp_new_i32();
-+                        return 1;
++    masklen = load_reg(s, a->rn);
-+                    }
++    tcg_gen_shli_i32(rn_shifted, masklen, a->size);
-+                }
++    tcg_gen_movcond_i32(TCG_COND_LEU, masklen,
-+
++                        masklen, tcg_constant_i32(1 << (4 - a->size)),
-                 if (insn & ARM_CP_RW_BIT) {
++                        rn_shifted, tcg_constant_i32(16));
-                     /* vfp->arm */
++    gen_helper_mve_vctp(cpu_env, masklen);
--                    if (insn & (1 << 21)) {
++    tcg_temp_free_i32(masklen);
-+                    if (is_sysreg) {
++    tcg_temp_free_i32(rn_shifted);
-                         /* system register */
++    mve_update_eci(s);
-                         rn >>= 1;
++    return true;
++}
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-                     }
+ static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
-                 } else {
+ {
                      /* arm->vfp */
 -                    if (insn & (1 << 21)) {
 +                    if (is_sysreg) {
                          rn >>= 1;
                          /* system register */
                          switch (rn) {
 --
 .20.1
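The VCTP predicate calculation can be modelled outside TCG. This minimal sketch assumes a 16-bit VPR.P0 value and the same per-byte element and ECI masks that the helper obtains from mve_element_mask() and mve_eci_mask(). vctp_masklen() and vctp_new_vpr() are hypothetical names, and the sketch ignores the VPR mask fields that mve_advance_vpt() updates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the translate-time calculation in trans_VCTP():
 * "rn <= (1 << (4 - size)) ? (rn << size) : 16", where size is log2 of
 * the element size in bytes and each element owns (1 << size) P0 bits.
 */
static uint32_t vctp_masklen(uint32_t rn, unsigned size)
{
    assert(size <= 3);
    return rn <= (1u << (4 - size)) ? (rn << size) : 16;
}

/* Hypothetical model of the new P0 value computed by the mve_vctp helper. */
static uint16_t vctp_new_vpr(uint16_t vpr_p0, uint32_t masklen,
                             uint16_t elem_mask, uint16_t eci_mask)
{
    uint16_t newmask;

    assert(masklen <= 16);
    /* Low masklen bits set: elements with index below Rn are active */
    newmask = masklen ? (uint16_t)((1u << masklen) - 1) : 0;
    /* Bits for lanes that are themselves predicated off are zeroed */
    newmask &= elem_mask;
    /* Bits for beats that ECI says were already executed keep their value */
    return (uint16_t)((vpr_p0 & ~eci_mask) | (newmask & eci_mask));
}
```

For example, VCTP.32 with Rn = 3 gives a masklen of 12, so P0 becomes 0x0fff when no lanes are predicated off and no beats are skipped; any Rn of 4 or more saturates at 16.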

-[Qemu-devel] [PULL 03/42] configure: Remove --source-path option
+[PULL 33/44] target/arm: Implement MVE scatter-gather insns
-Normally configure identifies the source path by looking
+Implement the MVE gather-loads and scatter-stores which
-at the location where the configure script itself exists.
+form the address by adding a base value from a scalar
-We also provide a --source-path option which lets the user
+register to an offset in each element of a vector.
 manually override this.
 There isn't really an obvious use case for the --source-path
 option, and in commit 927128222b0a91f56c13a in 2017 we
 accidentally added some logic that looks at $source_path
 before the command line option that overrides it has been
 processed.
 The fact that nobody complained suggests that there isn't
 any use of this option and we aren't testing it either;
 remove it. This allows us to move the "make $source_path
 absolute" logic up so that there is no window in the script
 where $source_path is set but not yet absolute.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20190318134019.23729-1-peter.maydell@linaro.org
 ---
- configure | 10 ++--------
+ target/arm/helper-mve.h    |  32 +++++++++
-file changed, 2 insertions(+), 8 deletions(-)
+ target/arm/mve.decode      |  12 ++++
  target/arm/mve_helper.c    | 129 +++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  97 ++++++++++++++++++++++++++++
 files changed, 270 insertions(+)
-diff --git a/configure b/configure
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-index XXXXXXX..XXXXXXX 100755
+index XXXXXXX..XXXXXXX 100644
---- a/configure
+--- a/target/arm/helper-mve.h
-+++ b/configure
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ ld_has() {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
+ DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
- # default parameters
+ DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
- source_path=$(dirname "$0")
-+# make source path absolute
++DEF_HELPER_FLAGS_4(mve_vldrb_sg_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+source_path=$(cd "$source_path"; pwd)
++DEF_HELPER_FLAGS_4(mve_vldrb_sg_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- cpu=""
++DEF_HELPER_FLAGS_4(mve_vldrh_sg_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- iasl="iasl"
++
- interp_prefix="/usr/gnemul/qemu-%M"
++DEF_HELPER_FLAGS_4(mve_vldrb_sg_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ for opt do
++DEF_HELPER_FLAGS_4(mve_vldrb_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-   ;;
++DEF_HELPER_FLAGS_4(mve_vldrb_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-   --cxx=*) CXX="$optarg"
++DEF_HELPER_FLAGS_4(mve_vldrh_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-   ;;
++DEF_HELPER_FLAGS_4(mve_vldrh_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--  --source-path=*) source_path="$optarg"
++DEF_HELPER_FLAGS_4(mve_vldrw_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--  ;;
++DEF_HELPER_FLAGS_4(mve_vldrd_sg_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-   --cpu=*) cpu="$optarg"
++
-   ;;
++DEF_HELPER_FLAGS_4(mve_vstrb_sg_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-   --extra-cflags=*) QEMU_CFLAGS="$QEMU_CFLAGS $optarg"
++DEF_HELPER_FLAGS_4(mve_vstrb_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ if test "$debug_info" = "yes"; then
++DEF_HELPER_FLAGS_4(mve_vstrb_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     LDFLAGS="-g $LDFLAGS"
++DEF_HELPER_FLAGS_4(mve_vstrh_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- fi
++DEF_HELPER_FLAGS_4(mve_vstrh_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vstrw_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--# make source path absolute
++DEF_HELPER_FLAGS_4(mve_vstrd_sg_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--source_path=$(cd "$source_path"; pwd)
++
--
++DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- # running configure in the source tree?
++
- # we know that's the case if configure is there.
++DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- if test -f "./configure"; then
++DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ for opt do
++DEF_HELPER_FLAGS_4(mve_vldrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-   ;;
++DEF_HELPER_FLAGS_4(mve_vldrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-   --interp-prefix=*) interp_prefix="$optarg"
++
-   ;;
++DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--  --source-path=*)
++DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--  ;;
++DEF_HELPER_FLAGS_4(mve_vstrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-   --cross-prefix=*)
++DEF_HELPER_FLAGS_4(mve_vstrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-   ;;
++
-   --cc=*)
+ DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ $(echo Available targets: $default_target_list | \
-   --target-list-exclude=LIST exclude a set of targets from the default target-list
+ DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
- Advanced options (experts only):
+index XXXXXXX..XXXXXXX 100644
--  --source-path=PATH       path of source code [$source_path]
+--- a/target/arm/mve.decode
-   --cross-prefix=PREFIX    use PREFIX for compile tools [$cross_prefix]
++++ b/target/arm/mve.decode
-   --cc=CC                  use C compiler CC [$cc]
+@@ -XXX,XX +XXX,XX @@
-   --iasl=IASL              use ACPI compiler IASL [$iasl]
+ &shl_scalar qda rm size
  &vmaxv qm rda size
  &vabav qn qm rda size
 +&vldst_sg qd qm rn size msize os
 +
 +# scatter-gather memory size is in bits 6:4
 +%sg_msize 6:1 4:1
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
  @vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
 +@vldst_sg .... .... .... rn:4 .... ... size:2 ... ... os:1 &vldst_sg \
 +          qd=%qd qm=%qm msize=%sg_msize
 +
  @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
  @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
  @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
  VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
                   size=2 p=1
 +# gather loads/scatter stores
 +VLDR_S_sg        111 0 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
 +VLDR_U_sg        111 1 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
 +VSTR_sg          111 0 1100 1 . 00 .... ... 0 111 . .... .... @vldst_sg
 +
  # Moves between 2 32-bit vector lanes and 2 general purpose registers
  VMOV_to_2gp      1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
  VMOV_from_2gp    1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
  #undef DO_VLDR
  #undef DO_VSTR
 +/*
 + * Gather loads/scatter stores. Here each element of Qm specifies
 + * an offset to use from the base register Rn. In the _os_ versions
 + * that offset is scaled by the element size.
 + * For loads, predicated lanes are zeroed instead of retaining
 + * their previous values.
 + */
 +#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN)            \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        TYPE *d = vd;                                                   \
 +        OFFTYPE *m = vm;                                                \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        uint16_t eci_mask = mve_eci_mask(env);                          \
 +        unsigned e;                                                     \
 +        uint32_t addr;                                                  \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \
 +            if (!(eci_mask & 1)) {                                      \
 +                continue;                                               \
 +            }                                                           \
 +            addr = ADDRFN(base, m[H##ESIZE(e)]);                        \
 +            d[H##ESIZE(e)] = (mask & 1) ?                               \
 +                cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0;         \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +/* We know here TYPE is unsigned so always the same as the offset type */
 +#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN)                     \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        TYPE *d = vd;                                                   \
 +        TYPE *m = vm;                                                   \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        uint32_t addr;                                                  \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +            addr = ADDRFN(base, m[H##ESIZE(e)]);                        \
 +            if (mask & 1) {                                             \
 +                cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
 +            }                                                           \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +/*
 + * 64-bit accesses are slightly different: they are done as two 32-bit
 + * accesses, controlled by the predicate mask for the relevant beat,
 + * and with a single 32-bit offset in the first of the two Qm elements.
 + * Note that for QEMU our IMPDEF AIRCR.ENDIANNESS is always 0 (little).
 + */
 +#define DO_VLDR64_SG(OP, ADDRFN)                                        \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        uint32_t *d = vd;                                               \
 +        uint32_t *m = vm;                                               \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        uint16_t eci_mask = mve_eci_mask(env);                          \
 +        unsigned e;                                                     \
 +        uint32_t addr;                                                  \
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) {      \
 +            if (!(eci_mask & 1)) {                                      \
 +                continue;                                               \
 +            }                                                           \
 +            addr = ADDRFN(base, m[H4(e & ~1)]);                         \
 +            addr += 4 * (e & 1);                                        \
 +            d[H4(e)] = (mask & 1) ? cpu_ldl_data_ra(env, addr, GETPC()) : 0; \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +#define DO_VSTR64_SG(OP, ADDRFN)                                        \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        uint32_t *d = vd;                                               \
 +        uint32_t *m = vm;                                               \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        uint32_t addr;                                                  \
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
 +            addr = ADDRFN(base, m[H4(e & ~1)]);                         \
 +            addr += 4 * (e & 1);                                        \
 +            if (mask & 1) {                                             \
 +                cpu_stl_data_ra(env, addr, d[H4(e)], GETPC());          \
 +            }                                                           \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +#define ADDR_ADD(BASE, OFFSET) ((BASE) + (OFFSET))
 +#define ADDR_ADD_OSH(BASE, OFFSET) ((BASE) + ((OFFSET) << 1))
 +#define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2))
 +#define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3))
 +
 +DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD)
 +DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD)
 +DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD)
 +
 +DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD)
 +DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD)
 +DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD)
 +DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD)
 +DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD)
 +DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD)
 +DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD)
 +
 +DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH)
 +DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH)
 +DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH)
 +DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW)
 +DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD)
 +
 +DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD)
 +DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD)
 +DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD)
 +DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD)
 +DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD)
 +DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD)
 +DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD)
 +
 +DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH)
 +DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH)
 +DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW)
 +DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD)
 +
  /*
   * The mergemask(D, R, M) macro performs the operation "*D = R" but
   * storing only the bytes which correspond to 1 bits in M,
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static inline int vidup_imm(DisasContext *s, int x)
  #include "decode-mve.c.inc"
  typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 +typedef void MVEGenLdStSGFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
@@ -XXX,XX +XXX,XX @@ DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h, MO_8)
  DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w, MO_8)
  DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w, MO_16)
 +static bool do_ldst_sg(DisasContext *s, arg_vldst_sg *a, MVEGenLdStSGFn fn)
 +{
 +    TCGv_i32 addr;
 +    TCGv_ptr qd, qm;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd | a->qm) ||
 +        !fn || a->rn == 15) {
 +        /* Rn case is UNPREDICTABLE */
 +        return false;
 +    }
 +
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    addr = load_reg(s, a->rn);
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    qm = mve_qreg_ptr(a->qm);
 +    fn(cpu_env, qd, qm, addr);
 +    tcg_temp_free_ptr(qd);
 +    tcg_temp_free_ptr(qm);
 +    tcg_temp_free_i32(addr);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +/*
 + * The naming scheme here is "vldrb_sg_sh == in-memory byte loads
 + * signextended to halfword elements in register". _os_ indicates that
 + * the offsets in Qm should be scaled by the element size.
 + */
 +/* This macro is just to make the arrays more compact in these functions */
 +#define F(N) gen_helper_mve_##N
 +
 +/* VLDRB/VSTRB (ie msize 1) with OS=1 is UNPREDICTABLE; we UNDEF */
 +static bool trans_VLDR_S_sg(DisasContext *s, arg_vldst_sg *a)
 +{
 +    static MVEGenLdStSGFn * const fns[2][4][4] = { {
 +            { NULL, F(vldrb_sg_sh), F(vldrb_sg_sw), NULL },
 +            { NULL, NULL,           F(vldrh_sg_sw), NULL },
 +            { NULL, NULL,           NULL,           NULL },
 +            { NULL, NULL,           NULL,           NULL }
 +        }, {
 +            { NULL, NULL,              NULL,              NULL },
 +            { NULL, NULL,              F(vldrh_sg_os_sw), NULL },
 +            { NULL, NULL,              NULL,              NULL },
 +            { NULL, NULL,              NULL,              NULL }
 +        }
 +    };
 +    if (a->qd == a->qm) {
 +        return false; /* UNPREDICTABLE */
 +    }
 +    return do_ldst_sg(s, a, fns[a->os][a->msize][a->size]);
 +}
 +
 +static bool trans_VLDR_U_sg(DisasContext *s, arg_vldst_sg *a)
 +{
 +    static MVEGenLdStSGFn * const fns[2][4][4] = { {
 +            { F(vldrb_sg_ub), F(vldrb_sg_uh), F(vldrb_sg_uw), NULL },
 +            { NULL,           F(vldrh_sg_uh), F(vldrh_sg_uw), NULL },
 +            { NULL,           NULL,           F(vldrw_sg_uw), NULL },
 +            { NULL,           NULL,           NULL,           F(vldrd_sg_ud) }
 +        }, {
 +            { NULL, NULL,              NULL,              NULL },
 +            { NULL, F(vldrh_sg_os_uh), F(vldrh_sg_os_uw), NULL },
 +            { NULL, NULL,              F(vldrw_sg_os_uw), NULL },
 +            { NULL, NULL,              NULL,              F(vldrd_sg_os_ud) }
 +        }
 +    };
 +    if (a->qd == a->qm) {
 +        return false; /* UNPREDICTABLE */
 +    }
 +    return do_ldst_sg(s, a, fns[a->os][a->msize][a->size]);
 +}
 +
 +static bool trans_VSTR_sg(DisasContext *s, arg_vldst_sg *a)
 +{
 +    static MVEGenLdStSGFn * const fns[2][4][4] = { {
 +            { F(vstrb_sg_ub), F(vstrb_sg_uh), F(vstrb_sg_uw), NULL },
 +            { NULL,           F(vstrh_sg_uh), F(vstrh_sg_uw), NULL },
 +            { NULL,           NULL,           F(vstrw_sg_uw), NULL },
 +            { NULL,           NULL,           NULL,           F(vstrd_sg_ud) }
 +        }, {
 +            { NULL, NULL,              NULL,              NULL },
 +            { NULL, F(vstrh_sg_os_uh), F(vstrh_sg_os_uw), NULL },
 +            { NULL, NULL,              F(vstrw_sg_os_uw), NULL },
 +            { NULL, NULL,              NULL,              F(vstrd_sg_os_ud) }
 +        }
 +    };
 +    return do_ldst_sg(s, a, fns[a->os][a->msize][a->size]);
 +}
 +
 +#undef F
 +
  static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
  {
      TCGv_ptr qd;
 --
 .20.1

-[Qemu-devel] [PULL 11/42] target/arm: Handle SFPA and FPCA bits in reads and writes of CONTROL
+[PULL 34/44] target/arm: Implement MVE scatter-gather immediate forms
-The M-profile CONTROL register has two bits -- SFPA and FPCA --
+Implement the MVE VLDR/VSTR insns which do scatter-gather using base
-which relate to floating-point support, and should be RES0 otherwise.
+addresses from Qm plus or minus an immediate offset (possibly with
-Handle them correctly in the MSR/MRS register access code.
+writeback). Note that writeback is not predicated but it does have
-Neither is banked between security states, so they are stored
+to honour ECI state, so we have to add an eci_mask check to the
-in v7m.control[M_REG_S] regardless of current security state.
+VSTR_SG macros (the VLDR_SG macros already needed this to be able
 to distinguish "skip beat" from "set predicated element to 0").
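For illustration, here is a minimal C model of the word-sized (VLDRW) immediate form. It assumes every beat is active (no VPT predication, no ECI beat skipping) and uses a flat byte array in place of guest memory. The function names are hypothetical and are not part of QEMU. The offset rule (the 7-bit immediate shifted left by the log2 access size, and negated when the A bit is clear) follows do_ldst_sg_imm() in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit load from a flat byte array standing in for guest memory. */
static uint32_t ld32_le(const uint8_t *mem, size_t mem_size, uint32_t addr)
{
    assert((size_t)addr + 4 <= mem_size);
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8) |
           ((uint32_t)mem[addr + 2] << 16) | ((uint32_t)mem[addr + 3] << 24);
}

/*
 * Offset as computed in do_ldst_sg_imm(): imm scaled by the access size
 * (msize is log2 of the size in bytes), negated if the A (add) bit is clear.
 * Unsigned wraparound makes qm[e] + offset equal to qm[e] - (imm << msize).
 */
static uint32_t sg_imm_offset(uint32_t imm, unsigned msize, bool add)
{
    uint32_t offset = imm << msize;
    return add ? offset : (uint32_t)-offset;
}

/*
 * Hypothetical model of VLDRW (scatter-gather, immediate offset): element e
 * of Qd is loaded from Qm[e] + offset; with writeback (W bit) Qm[e] receives
 * the address that was used. Predication and ECI are not modelled.
 */
static void vldrw_sg_imm_model(uint32_t qd[4], uint32_t qm[4],
                               const uint8_t *mem, size_t mem_size,
                               uint32_t offset, bool wb)
{
    for (unsigned e = 0; e < 4; e++) {
        uint32_t addr = qm[e] + offset;
        qd[e] = ld32_le(mem, mem_size, addr);
        if (wb) {
            qm[e] = addr;
        }
    }
}
```

With writeback, each Qm lane ends up holding the address it has just used. Executing the same instruction again therefore advances every lane by the same offset.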
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-9-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 57 ++++++++++++++++++++++++++++++++++++++-------
+ target/arm/helper-mve.h    |  5 +++
-file changed, 49 insertions(+), 8 deletions(-)
+ target/arm/mve.decode      | 10 +++++
  target/arm/mve_helper.c    | 91 ++++++++++++++++++++++++--------------
  target/arm/translate-mve.c | 72 ++++++++++++++++++++++++++++++
 files changed, 146 insertions(+), 32 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(v7m_mrs)(CPUARMState *env, uint32_t reg)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         return xpsr_read(env) & mask;
+ DEF_HELPER_FLAGS_4(mve_vstrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         break;
+ DEF_HELPER_FLAGS_4(mve_vstrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     case 20: /* CONTROL */
--        return env->v7m.control[env->v7m.secure];
++DEF_HELPER_FLAGS_4(mve_vldrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+    {
++DEF_HELPER_FLAGS_4(mve_vldrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        uint32_t value = env->v7m.control[env->v7m.secure];
++DEF_HELPER_FLAGS_4(mve_vstrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        if (!env->v7m.secure) {
++DEF_HELPER_FLAGS_4(mve_vstrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+            /* SFPA is RAZ/WI from NS; FPCA is stored in the M_REG_S bank */
++
-+            value |= env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK;
+ DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
-+        }
-+        return value;
+ DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
-+    }
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-     case 0x94: /* CONTROL_NS */
+index XXXXXXX..XXXXXXX 100644
-         /* We have to handle this here because unprivileged Secure code
+--- a/target/arm/mve.decode
-          * can read the NS CONTROL register.
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(v7m_mrs)(CPUARMState *env, uint32_t reg)
+@@ -XXX,XX +XXX,XX @@
-         if (!env->v7m.secure) {
+ &vmaxv qm rda size
-             return 0;
+ &vabav qn qm rda size
-         }
+ &vldst_sg qd qm rn size msize os
--        return env->v7m.control[M_REG_NS];
++&vldst_sg_imm qd qm a w imm
-+        return env->v7m.control[M_REG_NS] |
-+            (env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK);
+ # scatter-gather memory size is in bits 6:4
  %sg_msize 6:1 4:1
@@ -XXX,XX +XXX,XX @@
  @vldst_sg .... .... .... rn:4 .... ... size:2 ... ... os:1 &vldst_sg \
            qd=%qd qm=%qm msize=%sg_msize
 +# Qm is in the fields usually labeled Qn
 +@vldst_sg_imm .... .... a:1 . w:1 . .... .... .... . imm:7 &vldst_sg_imm \
 +              qd=%qd qm=%qn
 +
  @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
  @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
  @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
@@ -XXX,XX +XXX,XX @@ VLDR_S_sg        111 0 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
  VLDR_U_sg        111 1 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
  VSTR_sg          111 0 1100 1 . 00 .... ... 0 111 . .... .... @vldst_sg
 +VLDRW_sg_imm     111 1 1101 ... 1 ... 0 ... 1 1110 .... .... @vldst_sg_imm
 +VLDRD_sg_imm     111 1 1101 ... 1 ... 0 ... 1 1111 .... .... @vldst_sg_imm
 +VSTRW_sg_imm     111 1 1101 ... 0 ... 0 ... 1 1110 .... .... @vldst_sg_imm
 +VSTRD_sg_imm     111 1 1101 ... 0 ... 0 ... 1 1111 .... .... @vldst_sg_imm
 +
  # Moves between 2 32-bit vector lanes and 2 general purpose registers
  VMOV_to_2gp      1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
  VMOV_from_2gp    1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
   * For loads, predicated lanes are zeroed instead of retaining
   * their previous values.
   */
 -#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN)            \
 +#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN, WB)        \
      void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
                            uint32_t base)                                \
      {                                                                   \
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
              addr = ADDRFN(base, m[H##ESIZE(e)]);                        \
              d[H##ESIZE(e)] = (mask & 1) ?                               \
                  cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0;         \
 +            if (WB) {                                                   \
 +                m[H##ESIZE(e)] = addr;                                  \
 +            }                                                           \
          }                                                               \
          mve_advance_vpt(env);                                           \
      }
-     if (el == 0) {
+ /* We know here TYPE is unsigned so always the same as the offset type */
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
+-#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN)                     \
-      */
++#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN, WB)                 \
-     uint32_t mask = extract32(maskreg, 8, 4);
+     void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
-     uint32_t reg = extract32(maskreg, 0, 8);
+                           uint32_t base)                                \
-+    int cur_el = arm_current_el(env);
+     {                                                                   \
+         TYPE *d = vd;                                                   \
--    if (arm_current_el(env) == 0 && reg > 7) {
+         TYPE *m = vm;                                                   \
--        /* only xPSR sub-fields may be written by unprivileged */
+         uint16_t mask = mve_element_mask(env);                          \
-+    if (cur_el == 0 && reg > 7 && reg != 20) {
++        uint16_t eci_mask = mve_eci_mask(env);                          \
-+        /*
+         unsigned e;                                                     \
-+         * only xPSR sub-fields and CONTROL.SFPA may be written by
+         uint32_t addr;                                                  \
-+         * unprivileged code
+-        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
-+         */
++        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \
-         return;
++            if (!(eci_mask & 1)) {                                      \
 +                continue;                                               \
 +            }                                                           \
              addr = ADDRFN(base, m[H##ESIZE(e)]);                        \
              if (mask & 1) {                                             \
                  cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
              }                                                           \
 +            if (WB) {                                                   \
 +                m[H##ESIZE(e)] = addr;                                  \
 +            }                                                           \
          }                                                               \
          mve_advance_vpt(env);                                           \
      }
+@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
+  * accesses, controlled by the predicate mask for the relevant beat,
-                 env->v7m.control[M_REG_NS] &= ~R_V7M_CONTROL_NPRIV_MASK;
+  * and with a single 32-bit offset in the first of the two Qm elements.
-                 env->v7m.control[M_REG_NS] |= val & R_V7M_CONTROL_NPRIV_MASK;
+  * Note that for QEMU our IMPDEF AIRCR.ENDIANNESS is always 0 (little).
-             }
++ * Address writeback happens on the odd beats and updates the address
-+            /*
++ * stored in the even-beat element.
-+             * SFPA is RAZ/WI from NS. FPCA is RO if NSACR.CP10 == 0,
+  */
-+             * RES0 if the FPU is not present, and is stored in the S bank
+-#define DO_VLDR64_SG(OP, ADDRFN)                                        \
-+             */
++#define DO_VLDR64_SG(OP, ADDRFN, WB)                                    \
-+            if (arm_feature(env, ARM_FEATURE_VFP) &&
+     void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
-+                extract32(env->v7m.nsacr, 10, 1)) {
+                           uint32_t base)                                \
-+                env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_FPCA_MASK;
+     {                                                                   \
-+                env->v7m.control[M_REG_S] |= val & R_V7M_CONTROL_FPCA_MASK;
+@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
-+            }
+             addr = ADDRFN(base, m[H4(e & ~1)]);                         \
-             return;
+             addr += 4 * (e & 1);                                        \
-         case 0x98: /* SP_NS */
+             d[H4(e)] = (mask & 1) ? cpu_ldl_data_ra(env, addr, GETPC()) : 0; \
-         {
++            if (WB && (e & 1)) {                                        \
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
++                m[H4(e & ~1)] = addr - 4;                               \
-         env->v7m.faultmask[env->v7m.secure] = val & 1;
++            }                                                           \
-         break;
+         }                                                               \
-     case 20: /* CONTROL */
+         mve_advance_vpt(env);                                           \
--        /* Writing to the SPSEL bit only has an effect if we are in
+     }
-+        /*
-+         * Writing to the SPSEL bit only has an effect if we are in
+-#define DO_VSTR64_SG(OP, ADDRFN)                                        \
-          * thread mode; other bits can be updated by any privileged code.
++#define DO_VSTR64_SG(OP, ADDRFN, WB)                                    \
-          * write_v7m_control_spsel() deals with updating the SPSEL bit in
+     void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm,         \
-          * env->v7m.control, so we only need update the others.
+                           uint32_t base)                                \
-          * For v7M, we must just ignore explicit writes to SPSEL in handler
+     {                                                                   \
-          * mode; for v8M the write is permitted but will have no effect.
+         uint32_t *d = vd;                                               \
-+         * All these bits are writes-ignored from non-privileged code,
+         uint32_t *m = vm;                                               \
-+         * except for SFPA.
+         uint16_t mask = mve_element_mask(env);                          \
-          */
++        uint16_t eci_mask = mve_eci_mask(env);                          \
--        if (arm_feature(env, ARM_FEATURE_V8) ||
+         unsigned e;                                                     \
--            !arm_v7m_is_handler_mode(env)) {
+         uint32_t addr;                                                  \
-+        if (cur_el > 0 && (arm_feature(env, ARM_FEATURE_V8) ||
+-        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
-+                           !arm_v7m_is_handler_mode(env))) {
++        for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) {      \
-             write_v7m_control_spsel(env, (val & R_V7M_CONTROL_SPSEL_MASK) != 0);
++            if (!(eci_mask & 1)) {                                      \
-         }
++                continue;                                               \
--        if (arm_feature(env, ARM_FEATURE_M_MAIN)) {
++            }                                                           \
-+        if (cur_el > 0 && arm_feature(env, ARM_FEATURE_M_MAIN)) {
+             addr = ADDRFN(base, m[H4(e & ~1)]);                         \
-             env->v7m.control[env->v7m.secure] &= ~R_V7M_CONTROL_NPRIV_MASK;
+             addr += 4 * (e & 1);                                        \
-             env->v7m.control[env->v7m.secure] |= val & R_V7M_CONTROL_NPRIV_MASK;
+             if (mask & 1) {                                             \
-         }
+                 cpu_stl_data_ra(env, addr, d[H4(e)], GETPC());          \
-+        if (arm_feature(env, ARM_FEATURE_VFP)) {
+             }                                                           \
-+            /*
++            if (WB && (e & 1)) {                                        \
-+             * SFPA is RAZ/WI from NS or if no FPU.
++                m[H4(e & ~1)] = addr - 4;                               \
-+             * FPCA is RO if NSACR.CP10 == 0, RES0 if the FPU is not present.
++            }                                                           \
-+             * Both are stored in the S bank.
+         }                                                               \
-+             */
+         mve_advance_vpt(env);                                           \
-+            if (env->v7m.secure) {
+     }
-+                env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_SFPA_MASK;
+@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
-+                env->v7m.control[M_REG_S] |= val & R_V7M_CONTROL_SFPA_MASK;
+ #define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2))
-+            }
+ #define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3))
-+            if (cur_el > 0 &&
-+                (env->v7m.secure || !arm_feature(env, ARM_FEATURE_M_SECURITY) ||
+-DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD)
-+                 extract32(env->v7m.nsacr, 10, 1))) {
+-DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD)
-+                env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_FPCA_MASK;
+-DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD)
-+                env->v7m.control[M_REG_S] |= val & R_V7M_CONTROL_FPCA_MASK;
++DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD, false)
-+            }
++DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD, false)
-+        }
++DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD, false)
-         break;
-     default:
+-DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD)
-     bad_reg:
+-DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD)
 -DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD)
 -DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD)
 -DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD)
 -DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD)
 -DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD)
 +DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD, false)
 +DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD, false)
 +DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD, false)
 +DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD, false)
 +DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD, false)
 +DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, false)
 +DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD, false)
 -DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH)
 -DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH)
 -DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH)
 -DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW)
 -DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD)
 +DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH, false)
 +DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH, false)
 +DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH, false)
 +DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW, false)
 +DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD, false)
 -DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD)
 -DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD)
 -DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD)
 -DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD)
 -DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD)
 -DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD)
 -DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD)
 +DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD, false)
 +DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD, false)
 +DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD, false)
 +DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD, false)
 +DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD, false)
 +DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD, false)
 +DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD, false)
 -DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH)
 -DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH)
 -DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW)
 -DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD)
 +DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH, false)
 +DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH, false)
 +DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW, false)
 +DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD, false)
 +
 +DO_VLDR_SG(vldrw_sg_wb_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true)
 +DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true)
 +DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true)
 +DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
  /*
   * The mergemask(D, R, M) macro performs the operation "*D = R" but
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSTR_sg(DisasContext *s, arg_vldst_sg *a)
  #undef F
 +static bool do_ldst_sg_imm(DisasContext *s, arg_vldst_sg_imm *a,
 +                           MVEGenLdStSGFn *fn, unsigned msize)
 +{
 +    uint32_t offset;
 +    TCGv_ptr qd, qm;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd | a->qm) ||
 +        !fn) {
 +        return false;
 +    }
 +
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    offset = a->imm << msize;
 +    if (!a->a) {
 +        offset = -offset;
 +    }
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    qm = mve_qreg_ptr(a->qm);
 +    fn(cpu_env, qd, qm, tcg_constant_i32(offset));
 +    tcg_temp_free_ptr(qd);
 +    tcg_temp_free_ptr(qm);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +static bool trans_VLDRW_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
 +{
 +    static MVEGenLdStSGFn * const fns[] = {
 +        gen_helper_mve_vldrw_sg_uw,
 +        gen_helper_mve_vldrw_sg_wb_uw,
 +    };
 +    if (a->qd == a->qm) {
 +        return false; /* UNPREDICTABLE */
 +    }
 +    return do_ldst_sg_imm(s, a, fns[a->w], MO_32);
 +}
 +
 +static bool trans_VLDRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
 +{
 +    static MVEGenLdStSGFn * const fns[] = {
 +        gen_helper_mve_vldrd_sg_ud,
 +        gen_helper_mve_vldrd_sg_wb_ud,
 +    };
 +    if (a->qd == a->qm) {
 +        return false; /* UNPREDICTABLE */
 +    }
 +    return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
 +}
 +
 +static bool trans_VSTRW_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
 +{
 +    static MVEGenLdStSGFn * const fns[] = {
 +        gen_helper_mve_vstrw_sg_uw,
 +        gen_helper_mve_vstrw_sg_wb_uw,
 +    };
 +    return do_ldst_sg_imm(s, a, fns[a->w], MO_32);
 +}
 +
 +static bool trans_VSTRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
 +{
 +    static MVEGenLdStSGFn * const fns[] = {
 +        gen_helper_mve_vstrd_sg_ud,
 +        gen_helper_mve_vstrd_sg_wb_ud,
 +    };
 +    return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
 +}
 +
  static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
  {
      TCGv_ptr qd;
 --
 .20.1
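The 64-bit writeback rule in DO_VLDR64_SG above is easier to follow in a hypothetical standalone C model. The names are invented for illustration, predication and ECI are not modelled, and a flat byte array stands in for guest memory. Each 64-bit element is accessed as two 32-bit beats. Both beats take their address from the even Qm element, and writeback happens only on the odd beat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit load from a flat byte array standing in for guest memory. */
static uint32_t ld32_le(const uint8_t *mem, size_t mem_size, uint32_t addr)
{
    assert((size_t)addr + 4 <= mem_size);
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8) |
           ((uint32_t)mem[addr + 2] << 16) | ((uint32_t)mem[addr + 3] << 24);
}

/*
 * Hypothetical model of VLDRD (scatter-gather, immediate offset, optional
 * writeback). Only the even Qm elements hold addresses; the odd ones are
 * neither read nor written.
 */
static void vldrd_sg_imm_model(uint32_t qd[4], uint32_t qm[4],
                               const uint8_t *mem, size_t mem_size,
                               uint32_t offset, bool wb)
{
    for (unsigned e = 0; e < 4; e++) {
        /* Both beats of a 64-bit element use the even Qm element's address */
        uint32_t addr = qm[e & ~1u] + offset + 4 * (e & 1);
        qd[e] = ld32_le(mem, mem_size, addr);
        if (wb && (e & 1)) {
            /* Odd beat: write back the start address of the 64-bit element */
            qm[e & ~1u] = addr - 4;
        }
    }
}
```

Deferring writeback to the odd beat is what keeps this correct. If the even Qm element were updated on the even beat, the odd beat would apply the offset a second time.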

-[Qemu-devel] [PULL 02/42] hw/ssi/xilinx_spips: Avoid variable length array
+[PULL 35/44] target/arm: Implement MVE interleaving loads/stores
-In the stripe8() function we use a variable length array; however
+Implement the MVE interleaving load/store functions VLD2, VLD4, VST2
-we know that the maximum length required is MAX_NUM_BUSSES. Use
+and VST4.  VLD2 loads 16 bytes of data from memory and writes to 2
-a fixed-length array and an assert instead.
+consecutive Qregs; VLD4 loads 16 bytes of data from memory and writes
 to 4 consecutive Qregs.  The 'pattern' field in the encoding
 determines the offset into memory which is accessed and also which
 elements in the Qregs are written to.  (The intention is that a
 sequence of four consecutive VLD4 with different pattern values
 performs a complete de-interleaving load of 64 bytes into all
 elements of the 4 Qregs.) VST2 and VST4 do the same, but for stores.
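Below is a rough C model of the byte-sized VLD4 beat, following DO_VLD4B in this patch. Each beat makes one 32-bit little-endian load from base + 4 * off and scatters the four bytes into element off of Qn..Qn+3. The helper names are hypothetical, and the per-pattern offset tables are not reproduced here. The model assumes only what the description states: the four patterns together cover all 16 element positions, so running all of them is equivalent to visiting every offset once, in any order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One beat of the VLD4 byte form (hypothetical model of DO_VLD4B): load the
 * word at base + 4 * off and place byte e (little-endian order) into element
 * 'off' of Q register e.
 */
static void vld4b_beat(uint8_t q[4][16], const uint8_t *mem, size_t mem_size,
                       uint32_t base, unsigned off)
{
    uint32_t addr = base + off * 4;
    assert(off < 16 && (size_t)addr + 4 <= mem_size);
    uint32_t data = (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8) |
                    ((uint32_t)mem[addr + 2] << 16) |
                    ((uint32_t)mem[addr + 3] << 24);
    for (unsigned e = 0; e < 4; e++, data >>= 8) {
        q[e][off] = (uint8_t)data;
    }
}

/*
 * Net effect of four VLD4 insns with different patterns, assuming their
 * offset tables partition 0..15 between them: every offset is visited once.
 */
static void vld4b_full(uint8_t q[4][16], const uint8_t *mem, size_t mem_size,
                       uint32_t base)
{
    for (unsigned off = 0; off < 16; off++) {
        vld4b_beat(q, mem, mem_size, base, off);
    }
}
```

After the full sequence, q[e][k] holds the byte at base + 4 * k + e. An interleaved x0 y0 z0 w0 x1 y1 ... stream in memory therefore ends up as x in Qn, y in Qn+1, z in Qn+2 and w in Qn+3.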
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
 Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
 Message-id: 20190328152635.2794-1-peter.maydell@linaro.org
 ---
- hw/ssi/xilinx_spips.c | 6 ++++--
+ target/arm/helper-mve.h    |  48 ++++++
-file changed, 4 insertions(+), 2 deletions(-)
+ target/arm/mve.decode      |  11 ++
  target/arm/mve_helper.c    | 342 +++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  94 ++++++++++
 files changed, 495 insertions(+)
-diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/ssi/xilinx_spips.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/ssi/xilinx_spips.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_qspips_reset(DeviceState *d)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vldrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vstrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- static inline void stripe8(uint8_t *x, int num, bool dir)
+ DEF_HELPER_FLAGS_4(mve_vstrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_3(mve_vld20b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld20h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld20w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vld21b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld21h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld21w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vld40b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld40h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld40w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vld41b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld41h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld41w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vld42b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld42h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld42w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vld43b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld43h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vld43w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vst20b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst20h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst20w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vst21b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst21h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst21w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vst40b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst40h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst40w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vst41b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst41h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst41w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vst42b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst42h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst42w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vst43b, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst43h, TCG_CALL_NO_WG, void, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_vst43w, TCG_CALL_NO_WG, void, env, i32, i32)
 +
  DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  &vabav qn qm rda size
  &vldst_sg qd qm rn size msize os
  &vldst_sg_imm qd qm a w imm
 +&vldst_il qd rn size pat w
  # scatter-gather memory size is in bits 6:4
  %sg_msize 6:1 4:1
@@ -XXX,XX +XXX,XX @@
  @vldst_sg_imm .... .... a:1 . w:1 . .... .... .... . imm:7 &vldst_sg_imm \
                qd=%qd qm=%qn
 +# Deinterleaving load/interleaving store
 +@vldst_il .... .... .. w:1 . rn:4 .... ... size:2 pat:2 ..... &vldst_il \
 +          qd=%qd
 +
  @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
  @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
  @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
@@ -XXX,XX +XXX,XX @@ VLDRD_sg_imm     111 1 1101 ... 1 ... 0 ... 1 1111 .... .... @vldst_sg_imm
  VSTRW_sg_imm     111 1 1101 ... 0 ... 0 ... 1 1110 .... .... @vldst_sg_imm
  VSTRD_sg_imm     111 1 1101 ... 0 ... 0 ... 1 1111 .... .... @vldst_sg_imm
 +# deinterleaving loads/interleaving stores
 +VLD2             1111 1100 1 .. 1 .... ... 1 111 .. .. 00000 @vldst_il
 +VLD4             1111 1100 1 .. 1 .... ... 1 111 .. .. 00001 @vldst_il
 +VST2             1111 1100 1 .. 0 .... ... 1 111 .. .. 00000 @vldst_il
 +VST4             1111 1100 1 .. 0 .... ... 1 111 .. .. 00001 @vldst_il
 +
  # Moves between 2 32-bit vector lanes and 2 general purpose registers
  VMOV_to_2gp      1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
  VMOV_from_2gp    1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true)
  DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true)
  DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
 +/*
 + * Deinterleaving loads/interleaving stores.
 + *
 + * For these helpers we are passed the index of the first Qreg
 + * (VLD2/VST2 will also access Qn+1, VLD4/VST4 access Qn .. Qn+3)
 + * and the value of the base address register Rn.
 + * The helpers are specialized for pattern and element size, so
 + * for instance vld42h is VLD4 with pattern 2, element size MO_16.
 + *
 + * These insns are beatwise but not predicated, so we must honour ECI,
 + * but need not look at mve_element_mask().
 + *
 + * The pseudocode implements these insns with multiple memory accesses
 + * of the element size, but rules R_VVVG and R_FXDM permit us to make
 + * one 32-bit memory access per beat.
 + */
 +#define DO_VLD4B(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat, e;                                                    \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 4;                                \
 +            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
 +            for (e = 0; e < 4; e++, data >>= 8) {                       \
 +                uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \
 +                qd[H1(off[beat])] = data;                               \
 +            }                                                           \
 +        }                                                               \
 +    }
 +
 +#define DO_VLD4H(OP, O1, O2)                                            \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O1, O2, O2 };               \
 +        uint32_t addr, data;                                            \
 +        int y; /* y counts 0 2 0 2 */                                   \
 +        uint16_t *qd;                                                   \
 +        for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) {   \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 8 + (beat & 1) * 4;               \
 +            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
 +            qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y);             \
 +            qd[H2(off[beat])] = data;                                   \
 +            data >>= 16;                                                \
 +            qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1);         \
 +            qd[H2(off[beat])] = data;                                   \
 +        }                                                               \
 +    }
 +
 +#define DO_VLD4W(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        uint32_t *qd;                                                   \
 +        int y;                                                          \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 4;                                \
 +            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
 +            y = (beat + (O1 & 2)) & 3;                                  \
 +            qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y);             \
 +            qd[H4(off[beat] >> 2)] = data;                              \
 +        }                                                               \
 +    }
 +
 +DO_VLD4B(vld40b, 0, 1, 10, 11)
 +DO_VLD4B(vld41b, 2, 3, 12, 13)
 +DO_VLD4B(vld42b, 4, 5, 14, 15)
 +DO_VLD4B(vld43b, 6, 7, 8, 9)
 +
 +DO_VLD4H(vld40h, 0, 5)
 +DO_VLD4H(vld41h, 1, 6)
 +DO_VLD4H(vld42h, 2, 7)
 +DO_VLD4H(vld43h, 3, 4)
 +
 +DO_VLD4W(vld40w, 0, 1, 10, 11)
 +DO_VLD4W(vld41w, 2, 3, 12, 13)
 +DO_VLD4W(vld42w, 4, 5, 14, 15)
 +DO_VLD4W(vld43w, 6, 7, 8, 9)
 +
 +#define DO_VLD2B(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat, e;                                                    \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        uint8_t *qd;                                                    \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 2;                                \
 +            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
 +            for (e = 0; e < 4; e++, data >>= 8) {                       \
 +                qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1));    \
 +                qd[H1(off[beat] + (e >> 1))] = data;                    \
 +            }                                                           \
 +        }                                                               \
 +    }
 +
 +#define DO_VLD2H(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        int e;                                                          \
 +        uint16_t *qd;                                                   \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 4;                                \
 +            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
 +            for (e = 0; e < 2; e++, data >>= 16) {                      \
 +                qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e);         \
 +                qd[H2(off[beat])] = data;                               \
 +            }                                                           \
 +        }                                                               \
 +    }
 +
 +#define DO_VLD2W(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        uint32_t *qd;                                                   \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat];                                    \
 +            data = cpu_ldl_le_data_ra(env, addr, GETPC());              \
 +            qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1));    \
 +            qd[H4(off[beat] >> 3)] = data;                              \
 +        }                                                               \
 +    }
 +
 +DO_VLD2B(vld20b, 0, 2, 12, 14)
 +DO_VLD2B(vld21b, 4, 6, 8, 10)
 +
 +DO_VLD2H(vld20h, 0, 1, 6, 7)
 +DO_VLD2H(vld21h, 2, 3, 4, 5)
 +
 +DO_VLD2W(vld20w, 0, 4, 24, 28)
 +DO_VLD2W(vld21w, 8, 12, 16, 20)
 +
 +#define DO_VST4B(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat, e;                                                    \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 4;                                \
 +            data = 0;                                                   \
 +            for (e = 3; e >= 0; e--) {                                  \
 +                uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \
 +                data = (data << 8) | qd[H1(off[beat])];                 \
 +            }                                                           \
 +            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
 +        }                                                               \
 +    }
 +
 +#define DO_VST4H(OP, O1, O2)                                            \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O1, O2, O2 };               \
 +        uint32_t addr, data;                                            \
 +        int y; /* y counts 0 2 0 2 */                                   \
 +        uint16_t *qd;                                                   \
 +        for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) {   \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 8 + (beat & 1) * 4;               \
 +            qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y);             \
 +            data = qd[H2(off[beat])];                                   \
 +            qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1);         \
 +            data |= qd[H2(off[beat])] << 16;                            \
 +            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
 +        }                                                               \
 +    }
 +
 +#define DO_VST4W(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        uint32_t *qd;                                                   \
 +        int y;                                                          \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 4;                                \
 +            y = (beat + (O1 & 2)) & 3;                                  \
 +            qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y);             \
 +            data = qd[H4(off[beat] >> 2)];                              \
 +            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
 +        }                                                               \
 +    }
 +
 +DO_VST4B(vst40b, 0, 1, 10, 11)
 +DO_VST4B(vst41b, 2, 3, 12, 13)
 +DO_VST4B(vst42b, 4, 5, 14, 15)
 +DO_VST4B(vst43b, 6, 7, 8, 9)
 +
 +DO_VST4H(vst40h, 0, 5)
 +DO_VST4H(vst41h, 1, 6)
 +DO_VST4H(vst42h, 2, 7)
 +DO_VST4H(vst43h, 3, 4)
 +
 +DO_VST4W(vst40w, 0, 1, 10, 11)
 +DO_VST4W(vst41w, 2, 3, 12, 13)
 +DO_VST4W(vst42w, 4, 5, 14, 15)
 +DO_VST4W(vst43w, 6, 7, 8, 9)
 +
 +#define DO_VST2B(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat, e;                                                    \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        uint8_t *qd;                                                    \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 2;                                \
 +            data = 0;                                                   \
 +            for (e = 3; e >= 0; e--) {                                  \
 +                qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1));    \
 +                data = (data << 8) | qd[H1(off[beat] + (e >> 1))];      \
 +            }                                                           \
 +            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
 +        }                                                               \
 +    }
 +
 +#define DO_VST2H(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        int e;                                                          \
 +        uint16_t *qd;                                                   \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat] * 4;                                \
 +            data = 0;                                                   \
 +            for (e = 1; e >= 0; e--) {                                  \
 +                qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e);         \
 +                data = (data << 16) | qd[H2(off[beat])];                \
 +            }                                                           \
 +            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
 +        }                                                               \
 +    }
 +
 +#define DO_VST2W(OP, O1, O2, O3, O4)                                    \
 +    void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx,             \
 +                          uint32_t base)                                \
 +    {                                                                   \
 +        int beat;                                                       \
 +        uint16_t mask = mve_eci_mask(env);                              \
 +        static const uint8_t off[4] = { O1, O2, O3, O4 };               \
 +        uint32_t addr, data;                                            \
 +        uint32_t *qd;                                                   \
 +        for (beat = 0; beat < 4; beat++, mask >>= 4) {                  \
 +            if ((mask & 1) == 0) {                                      \
 +                /* ECI says skip this beat */                           \
 +                continue;                                               \
 +            }                                                           \
 +            addr = base + off[beat];                                    \
 +            qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1));    \
 +            data = qd[H4(off[beat] >> 3)];                              \
 +            cpu_stl_le_data_ra(env, addr, data, GETPC());               \
 +        }                                                               \
 +    }
 +
 +DO_VST2B(vst20b, 0, 2, 12, 14)
 +DO_VST2B(vst21b, 4, 6, 8, 10)
 +
 +DO_VST2H(vst20h, 0, 1, 6, 7)
 +DO_VST2H(vst21h, 2, 3, 4, 5)
 +
 +DO_VST2W(vst20w, 0, 4, 24, 28)
 +DO_VST2W(vst21w, 8, 12, 16, 20)
 +
  /*
   * The mergemask(D, R, M) macro performs the operation "*D = R" but
   * storing only the bytes which correspond to 1 bits in M,
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static inline int vidup_imm(DisasContext *s, int x)
  typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenLdStSGFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 +typedef void MVEGenLdStIlFn(TCGv_ptr, TCGv_i32, TCGv_i32);
  typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSTRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
      return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
  }
 +static bool do_vldst_il(DisasContext *s, arg_vldst_il *a, MVEGenLdStIlFn *fn,
 +                        int addrinc)
 +{
 +    TCGv_i32 rn;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd) ||
 +        !fn || (a->rn == 13 && a->w) || a->rn == 15) {
 +        /* Variously UNPREDICTABLE or UNDEF or related-encoding */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    rn = load_reg(s, a->rn);
 +    /*
 +     * We pass the index of Qd, not a pointer, because the helper must
 +     * access multiple Q registers starting at Qd and working up.
 +     */
 +    fn(cpu_env, tcg_constant_i32(a->qd), rn);
 +
 +    if (a->w) {
 +        tcg_gen_addi_i32(rn, rn, addrinc);
 +        store_reg(s, a->rn, rn);
 +    } else {
 +        tcg_temp_free_i32(rn);
 +    }
 +    mve_update_and_store_eci(s);
 +    return true;
 +}
 +
 +/* This macro is just to make the arrays more compact in these functions */
 +#define F(N) gen_helper_mve_##N
 +
 +static bool trans_VLD2(DisasContext *s, arg_vldst_il *a)
 +{
 +    static MVEGenLdStIlFn * const fns[4][4] = {
 +        { F(vld20b), F(vld20h), F(vld20w), NULL, },
 +        { F(vld21b), F(vld21h), F(vld21w), NULL, },
 +        { NULL, NULL, NULL, NULL },
 +        { NULL, NULL, NULL, NULL },
 +    };
 +    if (a->qd > 6) {
 +        return false;
 +    }
 +    return do_vldst_il(s, a, fns[a->pat][a->size], 32);
 +}
 +
 +static bool trans_VLD4(DisasContext *s, arg_vldst_il *a)
 +{
 +    static MVEGenLdStIlFn * const fns[4][4] = {
 +        { F(vld40b), F(vld40h), F(vld40w), NULL, },
 +        { F(vld41b), F(vld41h), F(vld41w), NULL, },
 +        { F(vld42b), F(vld42h), F(vld42w), NULL, },
 +        { F(vld43b), F(vld43h), F(vld43w), NULL, },
 +    };
 +    if (a->qd > 4) {
 +        return false;
 +    }
 +    return do_vldst_il(s, a, fns[a->pat][a->size], 64);
 +}
 +
 +static bool trans_VST2(DisasContext *s, arg_vldst_il *a)
 +{
 +    static MVEGenLdStIlFn * const fns[4][4] = {
 +        { F(vst20b), F(vst20h), F(vst20w), NULL, },
 +        { F(vst21b), F(vst21h), F(vst21w), NULL, },
 +        { NULL, NULL, NULL, NULL },
 +        { NULL, NULL, NULL, NULL },
 +    };
 +    if (a->qd > 6) {
 +        return false;
 +    }
 +    return do_vldst_il(s, a, fns[a->pat][a->size], 32);
 +}
 +
 +static bool trans_VST4(DisasContext *s, arg_vldst_il *a)
 +{
 +    static MVEGenLdStIlFn * const fns[4][4] = {
 +        { F(vst40b), F(vst40h), F(vst40w), NULL, },
 +        { F(vst41b), F(vst41h), F(vst41w), NULL, },
 +        { F(vst42b), F(vst42h), F(vst42w), NULL, },
 +        { F(vst43b), F(vst43h), F(vst43w), NULL, },
 +    };
 +    if (a->qd > 4) {
 +        return false;
 +    }
 +    return do_vldst_il(s, a, fns[a->pat][a->size], 64);
 +}
 +
 +#undef F
 +
  static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
  {
--    uint8_t r[num];
+     TCGv_ptr qd;
 -    memset(r, 0, sizeof(uint8_t) * num);
 +    uint8_t r[MAX_NUM_BUSSES];
      int idx[2] = {0, 0};
      int bit[2] = {0, 7};
      int d = dir;
 +    assert(num <= MAX_NUM_BUSSES);
 +    memset(r, 0, sizeof(uint8_t) * num);
 +
      for (idx[0] = 0; idx[0] < num; ++idx[0]) {
          for (bit[0] = 7; bit[0] >= 0; bit[0]--) {
              r[idx[!d]] |= x[idx[d]] & 1 << bit[d] ? 1 << bit[!d] : 0;
 --
 .20.1
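The beat-wise VLD4 helpers are easier to check against a whole-instruction model. The following is a minimal host-side sketch, not QEMU code. It assumes a little-endian host, so the `H1()` byte-index swizzle is the identity, and it uses hypothetical names (`vld4b_pattern`, `vld4b_reference`, `vld4b_offsets`). The per-beat offsets are copied from the `DO_VLD4B()` instantiations, and the mask follows the same one-nibble-per-beat layout that the helpers consume from `mve_eci_mask()`. Running all four patterns (vld40b to vld43b) should leave byte k of Q[d+j] equal to memory byte 4*k + j. That is a full de-interleave of 64 bytes, which matches the writeback increment of 64 that `trans_VLD4()` passes to `do_vldst_il()`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Per-beat offsets copied from DO_VLD4B(vld40b..vld43b, ...). */
static const uint8_t vld4b_offsets[4][4] = {
    { 0, 1, 10, 11 },
    { 2, 3, 12, 13 },
    { 4, 5, 14, 15 },
    { 6, 7, 8, 9 },
};

/*
 * Hypothetical model of one DO_VLD4B() helper. q[0..3] stand in for
 * Qd..Qd+3 and mem for guest memory at the base address. As in the
 * helper, each beat holds one nibble of the mask, and bit 0 of that
 * nibble says whether the beat executes.
 */
static void vld4b_pattern(uint8_t q[4][16], const uint8_t mem[64],
                          const uint8_t off[4], uint16_t mask)
{
    for (int beat = 0; beat < 4; beat++, mask >>= 4) {
        if ((mask & 1) == 0) {
            continue;                           /* ECI says skip this beat */
        }
        const uint8_t *p = mem + off[beat] * 4; /* one 32-bit access */
        for (int e = 0; e < 4; e++) {
            q[e][off[beat]] = p[e];             /* byte e goes to Q[d+e] */
        }
    }
}

/* Whole-instruction result: byte k of Q[d+j] is memory byte 4*k + j. */
static void vld4b_reference(uint8_t q[4][16], const uint8_t mem[64])
{
    for (int j = 0; j < 4; j++) {
        for (int k = 0; k < 16; k++) {
            q[j][k] = mem[4 * k + j];
        }
    }
}
```

Splitting the instruction into four patterns lets each beat perform one aligned 32-bit access. With all four patterns applied, every byte of every destination register is written exactly once. Clearing a beat's nibble in the mask leaves that beat's destination bytes untouched, which mirrors the "ECI says skip this beat" path.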

-[Qemu-devel] [PULL 10/42] target/arm: Clear CONTROL_S.SFPA in SG insn if FPU present
+[PULL 36/44] target/arm: Re-indent sdiv and udiv helpers
-If the floating point extension is present, then the SG instruction
+We're about to make a code change to the sdiv and udiv helper
-must clear the CONTROL_S.SFPA bit. Implement this.
+functions, so first fix their indentation and coding style.
 (On a no-FPU system the bit will always be zero, so we don't need
 to make the clearing of the bit conditional on ARM_FEATURE_VFP.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-8-peter.maydell@linaro.org
+Message-id: 20210730151636.17254-2-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 1 +
+ target/arm/helper.c | 15 +++++++++------
- 1 file changed, 1 insertion(+)
+ 1 file changed, 9 insertions(+), 6 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static bool v7m_handle_execute_nsc(ARMCPU *cpu)
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(uxtb16)(uint32_t x)
-     qemu_log_mask(CPU_LOG_INT, "...really an SG instruction at 0x%08" PRIx32
-                   ", executing it\n", env->regs[15]);
+ int32_t HELPER(sdiv)(int32_t num, int32_t den)
-     env->regs[14] &= ~1;
+ {
-+    env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_SFPA_MASK;
+-    if (den == 0)
-     switch_v7m_security_state(env, true);
+-      return 0;
-     xpsr_write(env, 0, XPSR_IT);
+-    if (num == INT_MIN && den == -1)
-     env->regs[15] += 4;
+-      return INT_MIN;
 +    if (den == 0) {
 +        return 0;
 +    }
 +    if (num == INT_MIN && den == -1) {
 +        return INT_MIN;
 +    }
      return num / den;
  }
  uint32_t HELPER(udiv)(uint32_t num, uint32_t den)
  {
 -    if (den == 0)
 -      return 0;
 +    if (den == 0) {
 +        return 0;
 +    }
      return num / den;
  }
 --
 .20.1
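The re-indented helpers encode the non-trapping division rules. Division by zero returns 0. The signed case INT_MIN / -1 returns INT_MIN, because evaluating that quotient directly would overflow, which is undefined behaviour in C. The following standalone sketch restates those rules outside QEMU. The names `arm_sdiv` and `arm_udiv` are hypothetical, and it uses `INT32_MIN` where the helper relies on `INT_MIN` being the 32-bit minimum on the host.

```c
#include <assert.h>
#include <stdint.h>

/* Same special cases as HELPER(sdiv), without the CPU state argument. */
static int32_t arm_sdiv(int32_t num, int32_t den)
{
    if (den == 0) {
        return 0;               /* non-trapping divide by zero yields 0 */
    }
    if (num == INT32_MIN && den == -1) {
        return INT32_MIN;       /* result wraps; avoids C overflow UB */
    }
    return num / den;           /* C division truncates toward zero */
}

/* Same special case as HELPER(udiv). */
static uint32_t arm_udiv(uint32_t num, uint32_t den)
{
    if (den == 0) {
        return 0;
    }
    return num / den;
}
```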

-[Qemu-devel] [PULL 27/42] target/arm: Implement VLSTM for v7M CPUs with an FPU
+[PULL 37/44] target/arm: Implement M-profile trapping on division by zero
-Implement the VLSTM instruction for v7M for the FPU present case.
+Unlike A-profile, for M-profile the UDIV and SDIV insns can be
 configured to raise an exception on division by zero, using the CCR
 DIV_0_TRP bit.
 Implement support for setting this bit by making the helper functions
 raise the appropriate exception.
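The following is a simplified, hypothetical model of the trap decision in `handle_possible_div0_trap()`. A zero divisor traps only on an M-profile CPU whose CCR, banked by the current security state, has DIV_0_TRP set. The sketch makes several assumptions. `struct model_cpu` and `model_udiv` are illustrative stand-ins for `CPUARMState` and `HELPER(udiv)`. DIV_0_TRP is taken to be CCR bit 4, as in the ARMv7-M architecture. The pending UsageFault is recorded in a flag instead of leaving via `raise_exception_ra()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CCR_DIV_0_TRP (1u << 4)     /* assumed: CCR.DIV_0_TRP is bit 4 */

/* Hypothetical, cut-down CPU state: one banked CCR per security state. */
struct model_cpu {
    bool m_profile;
    bool secure;                    /* current security state */
    uint32_t ccr[2];                /* [0] = Non-secure, [1] = Secure */
    bool divbyzero_pending;         /* stands in for the UsageFault */
};

/* Mirrors the condition tested in handle_possible_div0_trap(). */
static bool div0_should_trap(const struct model_cpu *cpu)
{
    return cpu->m_profile && (cpu->ccr[cpu->secure] & CCR_DIV_0_TRP);
}

static uint32_t model_udiv(struct model_cpu *cpu, uint32_t num, uint32_t den)
{
    if (den == 0) {
        if (div0_should_trap(cpu)) {
            /*
             * QEMU longjmps out here, so the instruction never completes.
             * The model only records the fault; the 0 returned below is
             * then meaningless.
             */
            cpu->divbyzero_pending = true;
        }
        return 0;                   /* usual non-trapping result */
    }
    return num / den;
}
```

Because the CCR is banked, the Secure and Non-secure states can configure trapping independently. This is why the helper indexes `env->v7m.ccr` by `env->v7m.secure`.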
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190416125744.27770-25-peter.maydell@linaro.org
+Message-id: 20210730151636.17254-3-peter.maydell@linaro.org
 ---
- target/arm/cpu.h       |  2 +
+ target/arm/cpu.h       |  1 +
- target/arm/helper.h    |  2 +
+ target/arm/helper.h    |  4 ++--
- target/arm/helper.c    | 84 ++++++++++++++++++++++++++++++++++++++++++
+ target/arm/helper.c    | 19 +++++++++++++++++--
- target/arm/translate.c | 15 +++++++-
+ target/arm/m_helper.c  |  4 ++++
- 4 files changed, 102 insertions(+), 1 deletion(-)
+ target/arm/translate.c |  4 ++--
 5 files changed, 26 insertions(+), 6 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
 @@ -XXX,XX +XXX,XX @@
- #define EXCP_INVSTATE       18   /* v7M INVSTATE UsageFault */
- #define EXCP_STKOF          19   /* v8M STKOF UsageFault */
  #define EXCP_LAZYFP         20   /* v7M fault during lazy FP stacking */
-+#define EXCP_LSERR          21   /* v8M LSERR SecureFault */
+ #define EXCP_LSERR          21   /* v8M LSERR SecureFault */
-+#define EXCP_UNALIGNED      22   /* v7M UNALIGNED UsageFault */
+ #define EXCP_UNALIGNED      22   /* v7M UNALIGNED UsageFault */
 +#define EXCP_DIVBYZERO      23   /* v7M DIVBYZERO UsageFault */
  /* NB: add new EXCP_ defines to the array in arm_log_exception() too */
  #define ARMV7M_EXCP_RESET   1
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(v7m_tt, i32, env, i32, i32)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(add_saturate, i32, env, i32, i32)
+ DEF_HELPER_3(sub_saturate, i32, env, i32, i32)
- DEF_HELPER_1(v7m_preserve_fp_state, void, env)
+ DEF_HELPER_3(add_usaturate, i32, env, i32, i32)
+ DEF_HELPER_3(sub_usaturate, i32, env, i32, i32)
-+DEF_HELPER_2(v7m_vlstm, void, env, i32)
+-DEF_HELPER_FLAGS_2(sdiv, TCG_CALL_NO_RWG_SE, s32, s32, s32)
-+
+-DEF_HELPER_FLAGS_2(udiv, TCG_CALL_NO_RWG_SE, i32, i32, i32)
- DEF_HELPER_2(v8m_stackcheck, void, env, i32)
++DEF_HELPER_FLAGS_3(sdiv, TCG_CALL_NO_RWG, s32, env, s32, s32)
++DEF_HELPER_FLAGS_3(udiv, TCG_CALL_NO_RWG, i32, env, i32, i32)
- DEF_HELPER_4(access_check_cp_reg, void, env, ptr, i32, i32)
+ DEF_HELPER_FLAGS_1(rbit, TCG_CALL_NO_RWG_SE, i32, i32)
  #define PAS_OP(pfx)  \
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sxtb16)(uint32_t x)
-     g_assert_not_reached();
+     return res;
  }
-+void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
++static void handle_possible_div0_trap(CPUARMState *env, uintptr_t ra)
 +{
-+    /* translate.c should never generate calls here in user-only mode */
++    /*
-+    g_assert_not_reached();
++     * Take a division-by-zero exception if necessary; otherwise return
 +     * to get the usual non-trapping division behaviour (result of 0)
 +     */
 +    if (arm_feature(env, ARM_FEATURE_M)
 +        && (env->v7m.ccr[env->v7m.secure] & R_V7M_CCR_DIV_0_TRP_MASK)) {
 +        raise_exception_ra(env, EXCP_DIVBYZERO, 0, 1, ra);
 +    }
 +}
 +
- uint32_t HELPER(v7m_tt)(CPUARMState *env, uint32_t addr, uint32_t op)
+ uint32_t HELPER(uxtb16)(uint32_t x)
  {
-     /* The TT instructions can be used by unprivileged code, but in
+     uint32_t res;
-@@ -XXX,XX +XXX,XX @@ static void v7m_update_fpccr(CPUARMState *env, uint32_t frameptr,
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(uxtb16)(uint32_t x)
      return res;
  }
 -int32_t HELPER(sdiv)(int32_t num, int32_t den)
 +int32_t HELPER(sdiv)(CPUARMState *env, int32_t num, int32_t den)
  {
      if (den == 0) {
 +        handle_possible_div0_trap(env, GETPC());
          return 0;
      }
+     if (num == INT_MIN && den == -1) {
+@@ -XXX,XX +XXX,XX @@ int32_t HELPER(sdiv)(int32_t num, int32_t den)
+     return num / den;
  }
-+void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
+-uint32_t HELPER(udiv)(uint32_t num, uint32_t den)
-+{
++uint32_t HELPER(udiv)(CPUARMState *env, uint32_t num, uint32_t den)
 +    /* fptr is the value of Rn, the frame pointer we store the FP regs to */
 +    bool s = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
 +    bool lspact = env->v7m.fpccr[s] & R_V7M_FPCCR_LSPACT_MASK;
 +
 +    assert(env->v7m.secure);
 +
 +    if (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)) {
 +        return;
 +    }
 +
 +    /* Check access to the coprocessor is permitted */
 +    if (!v7m_cpacr_pass(env, true, arm_current_el(env) != 0)) {
 +        raise_exception_ra(env, EXCP_NOCP, 0, 1, GETPC());
 +    }
 +
 +    if (lspact) {
 +        /* LSPACT should not be active when there is active FP state */
 +        raise_exception_ra(env, EXCP_LSERR, 0, 1, GETPC());
 +    }
 +
 +    if (fptr & 7) {
 +        raise_exception_ra(env, EXCP_UNALIGNED, 0, 1, GETPC());
 +    }
 +
 +    /*
 +     * Note that we do not use v7m_stack_write() here, because the
 +     * accesses should not set the FSR bits for stacking errors if they
 +     * fail. (In pseudocode terms, they are AccType_NORMAL, not AccType_STACK
 +     * or AccType_LAZYFP). Faults in cpu_stl_data() will throw exceptions
 +     * and longjmp out.
 +     */
 +    if (!(env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPEN_MASK)) {
 +        bool ts = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_TS_MASK;
 +        int i;
 +
 +        for (i = 0; i < (ts ? 32 : 16); i += 2) {
 +            uint64_t dn = *aa32_vfp_dreg(env, i / 2);
 +            uint32_t faddr = fptr + 4 * i;
 +            uint32_t slo = extract64(dn, 0, 32);
 +            uint32_t shi = extract64(dn, 32, 32);
 +
 +            if (i >= 16) {
 +                faddr += 8; /* skip the slot for the FPSCR */
 +            }
 +            cpu_stl_data(env, faddr, slo);
 +            cpu_stl_data(env, faddr + 4, shi);
 +        }
 +        cpu_stl_data(env, fptr + 0x40, vfp_get_fpscr(env));
 +
 +        /*
 +         * If TS is 0 then s0 to s15 and FPSCR are UNKNOWN; we choose to
 +         * leave them unchanged, matching our choice in v7m_preserve_fp_state.
 +         */
 +        if (ts) {
 +            for (i = 0; i < 32; i += 2) {
 +                *aa32_vfp_dreg(env, i / 2) = 0;
 +            }
 +            vfp_set_fpscr(env, 0);
 +        }
 +    } else {
 +        v7m_update_fpccr(env, fptr, false);
 +    }
 +
 +    env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_FPCA_MASK;
 +}
 +
  static bool v7m_push_stack(ARMCPU *cpu)
  {
-     /* Do the "set up stack frame" part of exception entry,
+     if (den == 0) {
-@@ -XXX,XX +XXX,XX @@ static void arm_log_exception(int idx)
++        handle_possible_div0_trap(env, GETPC());
-             [EXCP_INVSTATE] = "v7M INVSTATE UsageFault",
+         return 0;
-             [EXCP_STKOF] = "v8M STKOF UsageFault",
+     }
      return num / den;
@@ -XXX,XX +XXX,XX @@ void arm_log_exception(int idx)
              [EXCP_LAZYFP] = "v7M exception during lazy FP stacking",
-+            [EXCP_LSERR] = "v8M LSERR UsageFault",
+             [EXCP_LSERR] = "v8M LSERR UsageFault",
-+            [EXCP_UNALIGNED] = "v7M UNALIGNED UsageFault",
+             [EXCP_UNALIGNED] = "v7M UNALIGNED UsageFault",
 +            [EXCP_DIVBYZERO] = "v7M DIVBYZERO UsageFault",
          };
          if (idx >= 0 && idx < ARRAY_SIZE(excnames)) {
+diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/m_helper.c
++++ b/target/arm/m_helper.c
 @@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
          armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
-         env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_STKOF_MASK;
+         env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_UNALIGNED_MASK;
          break;
-+    case EXCP_LSERR:
++    case EXCP_DIVBYZERO:
 +        armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SECURE, false);
 +        env->v7m.sfsr |= R_V7M_SFSR_LSERR_MASK;
 +        break;
 +    case EXCP_UNALIGNED:
 +        armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
-+        env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_UNALIGNED_MASK;
++        env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_DIVBYZERO_MASK;
 +        break;
      case EXCP_SWI:
          /* The PC already points to the next instruction.  */
          armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SVC, env->v7m.secure);
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static bool op_div(DisasContext *s, arg_rrr *a, bool u)
-                 if (!s->v8m_secure || (insn & 0x0040f0ff)) {
+     t1 = load_reg(s, a->rn);
-                     goto illegal_op;
+     t2 = load_reg(s, a->rm);
-                 }
+     if (u) {
--                /* Just NOP since FP support is not implemented */
+-        gen_helper_udiv(t1, t1, t2);
-+
++        gen_helper_udiv(t1, cpu_env, t1, t2);
-+                if (arm_dc_feature(s, ARM_FEATURE_VFP)) {
+     } else {
-+                    TCGv_i32 fptr = load_reg(s, rn);
+-        gen_helper_sdiv(t1, t1, t2);
-+
++        gen_helper_sdiv(t1, cpu_env, t1, t2);
-+                    if (extract32(insn, 20, 1)) {
+     }
-+                        /* VLLDM */
+     tcg_temp_free_i32(t2);
-+                    } else {
+     store_reg(s, a->rd, t1);
 +                        gen_helper_v7m_vlstm(cpu_env, fptr);
 +                    }
 +                    tcg_temp_free_i32(fptr);
 +
 +                    /* End the TB, because we have updated FP control bits */
 +                    s->base.is_jmp = DISAS_UPDATE;
 +                }
                  break;
              }
              if (arm_dc_feature(s, ARM_FEATURE_VFP) &&
 --
 .20.1
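
The store loop in v7m_vlstm() above implies a fixed layout for the lazy FP save area at fptr when FPCCR.LSPEN is clear. The sketch below restates that layout as standalone C. vlstm_sreg_offset(), vlstm_frame_bytes() and vlstm_fptr_aligned() are hypothetical helpers written for illustration, not QEMU functions; they only recompute what the loop and the `fptr & 7` check already do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLSTM_FPSCR_OFFSET 0x40u  /* cpu_stl_data(env, fptr + 0x40, fpscr) */

/*
 * Hypothetical helper: byte offset from fptr at which s<n> is stored.
 * s0..s15 start at 0x00..0x3c; for s16..s31 the loop adds 8 to skip the
 * FPSCR slot at 0x40 and the word after it, so s16 lands at 0x48.
 */
static uint32_t vlstm_sreg_offset(unsigned n)
{
    assert(n < 32);
    uint32_t off = 4 * n;
    if (n >= 16) {
        off += 8;
    }
    return off;
}

/* Hypothetical helper: bytes spanned by the stores, depending on FPCCR.TS. */
static uint32_t vlstm_frame_bytes(bool ts)
{
    /* TS set: s0..s31 are stored, ending with s31; TS clear: ends at FPSCR. */
    return ts ? vlstm_sreg_offset(31) + 4 : VLSTM_FPSCR_OFFSET + 4;
}

/* Mirrors the "fptr & 7" check that raises EXCP_UNALIGNED. */
static bool vlstm_fptr_aligned(uint32_t fptr)
{
    return (fptr & 7) == 0;
}
```

With TS clear the stores cover 0x44 bytes (s0..s15 plus FPSCR); with TS set they extend to s31 at offset 0x84, 0x88 bytes in total.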

-[Qemu-devel] [PULL 36/42] hw/devices: Move CBus declarations into a new header
+[PULL 38/44] target/arm: kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+From: Hamza Mahfooz <someguy@effective-light.com>
-Reviewed-by: Thomas Huth <thuth@redhat.com>
+As per commit 5626f8c6d468 ("rcu: Add automatically released rcu_read_lock
-Reviewed-by: Markus Armbruster <armbru@redhat.com>
+variants"), RCU_READ_LOCK_GUARD() should be used instead of
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+rcu_read_{un}lock().
-Message-id: 20190412165416.7977-7-philmd@redhat.com
 Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
 Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
 Message-id: 20210727235201.11491-1-someguy@effective-light.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/devices.h   | 14 --------------
+ target/arm/kvm.c | 17 ++++++++---------
- include/hw/misc/cbus.h | 32 ++++++++++++++++++++++++++++++++
+file changed, 8 insertions(+), 9 deletions(-)
  hw/arm/nseries.c       |  1 +
  hw/misc/cbus.c         |  2 +-
  MAINTAINERS            |  1 +
 files changed, 35 insertions(+), 15 deletions(-)
  create mode 100644 include/hw/misc/cbus.h
-diff --git a/include/hw/devices.h b/include/hw/devices.h
+diff --git a/target/arm/kvm.c b/target/arm/kvm.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/devices.h
+--- a/target/arm/kvm.c
-+++ b/include/hw/devices.h
++++ b/target/arm/kvm.c
-@@ -XXX,XX +XXX,XX @@ void tsc2005_set_transform(void *opaque, MouseTransformInfo *info);
+@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
- /* stellaris_input.c */
+     hwaddr xlat, len, doorbell_gpa;
- void stellaris_gamepad_init(int n, qemu_irq *irq, const int *keycode);
+     MemoryRegionSection mrs;
+     MemoryRegion *mr;
--/* cbus.c */
+-    int ret = 1;
--typedef struct {
--    qemu_irq clk;
+     if (as == &address_space_memory) {
--    qemu_irq dat;
+         return 0;
--    qemu_irq sel;
+@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
--} CBus;
--CBus *cbus_init(qemu_irq dat_out);
+     /* MSI doorbell address is translated by an IOMMU */
--void cbus_attach(CBus *bus, void *slave_opaque);
 -    rcu_read_lock();
 +    RCU_READ_LOCK_GUARD();
 +
      mr = address_space_translate(as, address, &xlat, &len, true,
                                   MEMTXATTRS_UNSPECIFIED);
 +
      if (!mr) {
 -        goto unlock;
 +        return 1;
      }
 +
      mrs = memory_region_find(mr, xlat, 1);
 +
      if (!mrs.mr) {
 -        goto unlock;
 +        return 1;
      }
      doorbell_gpa = mrs.offset_within_address_space;
@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
      trace_kvm_arm_fixup_msi_route(address, doorbell_gpa);
 -    ret = 0;
 -
--void *retu_init(qemu_irq irq, int vilma);
+-unlock:
--void *tahvo_init(qemu_irq irq, int betty);
+-    rcu_read_unlock();
--
+-    return ret;
--void retu_key_event(void *retu, int state);
++    return 0;
--
+ }
- #endif
-diff --git a/include/hw/misc/cbus.h b/include/hw/misc/cbus.h
+ int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/misc/cbus.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * CBUS three-pin bus and the Retu / Betty / Tahvo / Vilma / Avilma /
 + * Hinku / Vinku / Ahne / Pihi chips used in various Nokia platforms.
 + * Based on reverse-engineering of a linux driver.
 + *
 + * Copyright (C) 2008 Nokia Corporation
 + * Written by Andrzej Zaborowski
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
 +#ifndef HW_MISC_CBUS_H
 +#define HW_MISC_CBUS_H
 +
 +#include "hw/irq.h"
 +
 +typedef struct {
 +    qemu_irq clk;
 +    qemu_irq dat;
 +    qemu_irq sel;
 +} CBus;
 +
 +CBus *cbus_init(qemu_irq dat_out);
 +void cbus_attach(CBus *bus, void *slave_opaque);
 +
 +void *retu_init(qemu_irq irq, int vilma);
 +void *tahvo_init(qemu_irq irq, int betty);
 +
 +void retu_key_event(void *retu, int state);
 +
 +#endif
 diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/nseries.c
 +++ b/hw/arm/nseries.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/i2c/i2c.h"
  #include "hw/devices.h"
  #include "hw/display/blizzard.h"
 +#include "hw/misc/cbus.h"
  #include "hw/misc/tmp105.h"
  #include "hw/block/flash.h"
  #include "hw/hw.h"
 diff --git a/hw/misc/cbus.c b/hw/misc/cbus.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/cbus.c
 +++ b/hw/misc/cbus.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/osdep.h"
  #include "hw/hw.h"
  #include "hw/irq.h"
 -#include "hw/devices.h"
 +#include "hw/misc/cbus.h"
  #include "sysemu/sysemu.h"
  //#define DEBUG
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: hw/input/tsc2005.c
  F: hw/misc/cbus.c
  F: hw/timer/twl92230.c
  F: include/hw/display/blizzard.h
 +F: include/hw/misc/cbus.h
  Palm
  M: Andrzej Zaborowski <balrogg@gmail.com>
 --
 .20.1
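
The rewritten kvm_arch_fixup_msi_route() above replaces the `goto unlock` exits with plain `return`s and relies on RCU_READ_LOCK_GUARD() to drop the read lock when the scope ends. The sketch below shows the same scope-guard idea in isolation using the GCC/Clang `cleanup` attribute. The depth counter, FAKE_RCU_GUARD() and lookup() are hypothetical stand-ins, not QEMU's RCU implementation.

```c
#include <assert.h>

/* Hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock(). */
static int rcu_depth;

static void fake_rcu_read_lock(void)
{
    rcu_depth++;
}

static void fake_rcu_read_unlock(void)
{
    assert(rcu_depth > 0);  /* catches an unbalanced unlock */
    rcu_depth--;
}

/* Cleanup callback: called with a pointer to the guard variable at scope exit. */
static void fake_rcu_guard_release(int *guard)
{
    (void)guard;
    fake_rcu_read_unlock();
}

/* Hypothetical guard: take the lock now, release it on every exit path. */
#define FAKE_RCU_GUARD() \
    int fake_rcu_guard_var \
        __attribute__((cleanup(fake_rcu_guard_release), unused)) = \
        (fake_rcu_read_lock(), 0)

/* Same shape as the new kvm_arch_fixup_msi_route(): early returns, no goto. */
static int lookup(int have_mr, int have_section)
{
    FAKE_RCU_GUARD();

    if (!have_mr) {
        return 1;
    }
    if (!have_section) {
        return 1;
    }
    return 0;
}
```

Every `return` in lookup() leaves rcu_depth back at zero, which is what lets the patch drop the `ret` variable and the `unlock:` label.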

-[Qemu-devel] [PULL 40/42] hw/net/ne2000-isa: Add guards to the header
+[PULL 39/44] hw/char/pl011: add support for sending break
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+From: Jan Luebbe <jlu@pengutronix.de>
-Reviewed-by: Thomas Huth <thuth@redhat.com>
+Break events are currently only handled by chardev/char-serial.c, so we
-Reviewed-by: Markus Armbruster <armbru@redhat.com>
+just ignore errors, which results in no behaviour change for other
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+chardevs.
-Message-id: 20190412165416.7977-11-philmd@redhat.com
 Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
 Message-id: 20210806144700.3751979-1-jlu@pengutronix.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/net/ne2000-isa.h | 6 ++++++
+ hw/char/pl011.c | 6 ++++++
 file changed, 6 insertions(+)
-diff --git a/include/hw/net/ne2000-isa.h b/include/hw/net/ne2000-isa.h
+diff --git a/hw/char/pl011.c b/hw/char/pl011.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/net/ne2000-isa.h
+--- a/hw/char/pl011.c
-+++ b/include/hw/net/ne2000-isa.h
++++ b/hw/char/pl011.c
 @@ -XXX,XX +XXX,XX @@
-  * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ #include "hw/qdev-properties-system.h"
-  * See the COPYING file in the top-level directory.
+ #include "migration/vmstate.h"
-  */
+ #include "chardev/char-fe.h"
-+
++#include "chardev/char-serial.h"
-+#ifndef HW_NET_NE2K_ISA_H
+ #include "qemu/log.h"
-+#define HW_NET_NE2K_ISA_H
+ #include "qemu/module.h"
-+
+ #include "trace.h"
- #include "hw/hw.h"
+@@ -XXX,XX +XXX,XX @@ static void pl011_write(void *opaque, hwaddr offset,
- #include "hw/qdev.h"
+             s->read_count = 0;
- #include "hw/isa/isa.h"
+             s->read_pos = 0;
-@@ -XXX,XX +XXX,XX @@ static inline ISADevice *isa_ne2000_init(ISABus *bus, int base, int irq,
+         }
-     }
++        if ((s->lcr ^ value) & 0x1) {
-     return d;
++            int break_enable = value & 0x1;
- }
++            qemu_chr_fe_ioctl(&s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
-+
++                              &break_enable);
-+#endif
++        }
          s->lcr = value;
          pl011_set_read_trigger(s);
          break;
 --
 .20.1
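
The pl011_write() change above forwards a break request only when bit 0 (BRK) of the value written to LCR_H differs from the current s->lcr, so rewriting LCR_H with BRK unchanged does not issue a redundant ioctl. A minimal model of that edge detection follows; fake_set_break() is a hypothetical recorder standing in for the qemu_chr_fe_ioctl(..., CHR_IOCTL_SERIAL_SET_BREAK, ...) call.

```c
#include <assert.h>
#include <stdint.h>

#define LCR_BRK 0x1u  /* bit 0 of LCR_H: send break */

/* Hypothetical recorder for the break ioctl. */
static int break_calls;
static int last_break_enable = -1;

static void fake_set_break(int enable)
{
    assert(enable == 0 || enable == 1);
    break_calls++;
    last_break_enable = enable;
}

/* Hypothetical model of the LCR_H write path: act only when BRK changes. */
static void lcr_write(uint32_t *lcr, uint32_t value)
{
    if ((*lcr ^ value) & LCR_BRK) {
        fake_set_break((value & LCR_BRK) != 0);
    }
    *lcr = value;
}
```

Because chardevs other than char-serial do not handle the ioctl, the patch ignores its return value, and the model above does the same.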

-[Qemu-devel] [PULL 35/42] hw/devices: Move Blizzard declarations into a new header
+[PULL 40/44] fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+From: Guenter Roeck <linux@roeck-us.net>
-Add an entry for the Blizzard device in MAINTAINERS.
+Instantiate SAI1/2/3 and ASRC as unimplemented devices to avoid random
 Linux kernel crashes, such as
-Reviewed-by: Thomas Huth <thuth@redhat.com>
+Unhandled fault: external abort on non-linefetch (0x808) at 0xd1580010
-Reviewed-by: Markus Armbruster <armbru@redhat.com>
+pgd = (ptrval)
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+[d1580010] *pgd=8231b811, *pte=02034653, *ppte=02034453
-Message-id: 20190412165416.7977-6-philmd@redhat.com
+Internal error: : 808 [#1] SMP ARM
 ...
 [<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
 [<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
 [<c09580f4>] (_regmap_write) from [<c095837c>] (_regmap_update_bits+0xe4/0xec)
 [<c095837c>] (_regmap_update_bits) from [<c09599b4>] (regmap_update_bits_base+0x50/0x74)
 [<c09599b4>] (regmap_update_bits_base) from [<c0d3e9e4>] (fsl_asrc_runtime_resume+0x1e4/0x21c)
 [<c0d3e9e4>] (fsl_asrc_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
 [<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
 [<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
 [<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
 [<c0942dfc>] (__pm_runtime_resume) from [<c0d3ecc4>] (fsl_asrc_probe+0x2a8/0x708)
 [<c0d3ecc4>] (fsl_asrc_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
 [<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
 [<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
 [<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
 [<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
 [<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
 [<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
 [<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
 [<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
 [<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
 [<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
 [<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)
 or
 Unhandled fault: external abort on non-linefetch (0x808) at 0xd19b0000
 pgd = (ptrval)
 [d19b0000] *pgd=82711811, *pte=308a0653, *ppte=308a0453
 Internal error: : 808 [#1] SMP ARM
 ...
 [<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
 [<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
 [<c09580f4>] (_regmap_write) from [<c0959b28>] (regmap_write+0x3c/0x60)
 [<c0959b28>] (regmap_write) from [<c0d41130>] (fsl_sai_runtime_resume+0x9c/0x1ec)
 [<c0d41130>] (fsl_sai_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
 [<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
 [<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
 [<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
 [<c0942dfc>] (__pm_runtime_resume) from [<c0d4231c>] (fsl_sai_probe+0x2b8/0x65c)
 [<c0d4231c>] (fsl_sai_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
 [<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
 [<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
 [<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
 [<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
 [<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
 [<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
 [<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
 [<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
 [<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
 [<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
 [<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Guenter Roeck <linux@roeck-us.net>
 Message-id: 20210810160318.87376-1-linux@roeck-us.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/devices.h          |  7 -------
+ hw/arm/fsl-imx6ul.c | 12 ++++++++++++
- include/hw/display/blizzard.h | 22 ++++++++++++++++++++++
+file changed, 12 insertions(+)
  hw/arm/nseries.c              |  1 +
  hw/display/blizzard.c         |  2 +-
  MAINTAINERS                   |  2 ++
 files changed, 26 insertions(+), 8 deletions(-)
  create mode 100644 include/hw/display/blizzard.h
-diff --git a/include/hw/devices.h b/include/hw/devices.h
+diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/devices.h
+--- a/hw/arm/fsl-imx6ul.c
-+++ b/include/hw/devices.h
++++ b/hw/arm/fsl-imx6ul.c
-@@ -XXX,XX +XXX,XX @@ void tsc2005_set_transform(void *opaque, MouseTransformInfo *info);
+@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
- /* stellaris_input.c */
+      */
- void stellaris_gamepad_init(int n, qemu_irq *irq, const int *keycode);
+     create_unimplemented_device("sdma", FSL_IMX6UL_SDMA_ADDR, 0x4000);
--/* blizzard.c */
++    /*
--void *s1d13745_init(qemu_irq gpio_int);
++     * SAI (Audio SSI (Synchronous Serial Interface))
--void s1d13745_write(void *opaque, int dc, uint16_t value);
++     */
--void s1d13745_write_block(void *opaque, int dc,
++    create_unimplemented_device("sai1", FSL_IMX6UL_SAI1_ADDR, 0x4000);
--                void *buf, size_t len, int pitch);
++    create_unimplemented_device("sai2", FSL_IMX6UL_SAI2_ADDR, 0x4000);
--uint16_t s1d13745_read(void *opaque, int dc);
++    create_unimplemented_device("sai3", FSL_IMX6UL_SAI3_ADDR, 0x4000);
 -
  /* cbus.c */
  typedef struct {
      qemu_irq clk;
 diff --git a/include/hw/display/blizzard.h b/include/hw/display/blizzard.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/display/blizzard.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * Epson S1D13744/S1D13745 (Blizzard/Hailstorm/Tornado) LCD/TV controller.
 + *
 + * Copyright (C) 2008 Nokia Corporation
 + * Written by Andrzej Zaborowski
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
-+#ifndef HW_DISPLAY_BLIZZARD_H
+     /*
-+#define HW_DISPLAY_BLIZZARD_H
+      * PWM
       */
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
      create_unimplemented_device("pwm3", FSL_IMX6UL_PWM3_ADDR, 0x4000);
      create_unimplemented_device("pwm4", FSL_IMX6UL_PWM4_ADDR, 0x4000);
 +    /*
 +     * Audio ASRC (asynchronous sample rate converter)
 +     */
 +    create_unimplemented_device("asrc", FSL_IMX6UL_ASRC_ADDR, 0x4000);
 +
-+#include "hw/irq.h"
+     /*
-+
+      * CAN
-+void *s1d13745_init(qemu_irq gpio_int);
+      */
 +void s1d13745_write(void *opaque, int dc, uint16_t value);
 +void s1d13745_write_block(void *opaque, int dc,
 +                          void *buf, size_t len, int pitch);
 +uint16_t s1d13745_read(void *opaque, int dc);
 +
 +#endif
 diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/nseries.c
 +++ b/hw/arm/nseries.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/boards.h"
  #include "hw/i2c/i2c.h"
  #include "hw/devices.h"
 +#include "hw/display/blizzard.h"
  #include "hw/misc/tmp105.h"
  #include "hw/block/flash.h"
  #include "hw/hw.h"
 diff --git a/hw/display/blizzard.c b/hw/display/blizzard.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/display/blizzard.c
 +++ b/hw/display/blizzard.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/osdep.h"
  #include "qemu-common.h"
  #include "ui/console.h"
 -#include "hw/devices.h"
 +#include "hw/display/blizzard.h"
  #include "ui/pixel_ops.h"
  typedef void (*blizzard_fn_t)(uint8_t *, const uint8_t *, unsigned int);
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ M: Peter Maydell <peter.maydell@linaro.org>
  L: qemu-arm@nongnu.org
  S: Odd Fixes
  F: hw/arm/nseries.c
 +F: hw/display/blizzard.c
  F: hw/input/lm832x.c
  F: hw/input/tsc2005.c
  F: hw/misc/cbus.c
  F: hw/timer/twl92230.c
 +F: include/hw/display/blizzard.h
  Palm
  M: Andrzej Zaborowski <balrogg@gmail.com>
 --
 .20.1
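
The fsl-imx6ul patch above avoids the crashes because an access to an address with nothing mapped is reported to the guest as a bus error, which Linux prints as the "external abort" in the traces, while a region claimed by create_unimplemented_device() accepts the access: reads return zero and writes are ignored. The sketch below models that difference. The base addresses, UnimpRegion and mmio_read32() are hypothetical; only the 0x4000 region size comes from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t base;
    uint32_t size;
} UnimpRegion;

/* Hypothetical base addresses; the real ones are FSL_IMX6UL_SAI1_ADDR etc. */
static const UnimpRegion unimp_regions[] = {
    { "sai1", 0x10000000u, 0x4000u },
    { "sai2", 0x10004000u, 0x4000u },
    { "sai3", 0x10008000u, 0x4000u },
    { "asrc", 0x1000c000u, 0x4000u },
};

typedef enum { ACCESS_OK, ACCESS_BUS_ERROR } AccessResult;

static AccessResult mmio_read32(uint32_t addr, uint32_t *val)
{
    assert(val != NULL);
    for (size_t i = 0; i < sizeof(unimp_regions) / sizeof(unimp_regions[0]); i++) {
        const UnimpRegion *r = &unimp_regions[i];
        /* Unsigned wrap-around makes addr < base fail this test too. */
        if (addr - r->base < r->size) {
            *val = 0;             /* unimplemented device: reads as zero */
            return ACCESS_OK;
        }
    }
    return ACCESS_BUS_ERROR;      /* nothing mapped: the guest takes an abort */
}
```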

-[Qemu-devel] [PULL 41/42] hw/net/lan9118: Export TYPE_LAN9118 and use it instead of hardcoded string
+[PULL 41/44] hw/dma/pl330: Add memory region to replace default
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+From: "Wen, Jianxian" <Jianxian.Wen@verisilicon.com>
-Reviewed-by: Markus Armbruster <armbru@redhat.com>
+Add a memory region property which can be connected to an IOMMU region to support SMMU translation.
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20190412165416.7977-12-philmd@redhat.com
+Signed-off-by: Jianxian Wen <jianxian.wen@verisilicon.com>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 4C23C17B8E87E74E906A25A3254A03F4FA1FEC31@SHASXM03.verisilicon.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/net/lan9118.h | 2 ++
+ hw/arm/exynos4210.c  |  3 +++
- hw/arm/exynos4_boards.c  | 3 ++-
+ hw/arm/xilinx_zynq.c |  3 +++
- hw/arm/mps2-tz.c         | 3 ++-
+ hw/dma/pl330.c       | 26 ++++++++++++++++++++++----
- hw/net/lan9118.c         | 1 -
+files changed, 28 insertions(+), 4 deletions(-)
 files changed, 6 insertions(+), 3 deletions(-)
-diff --git a/include/hw/net/lan9118.h b/include/hw/net/lan9118.h
+diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/net/lan9118.h
+--- a/hw/arm/exynos4210.c
-+++ b/include/hw/net/lan9118.h
++++ b/hw/arm/exynos4210.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static DeviceState *pl330_create(uint32_t base, qemu_or_irq *orgate,
- #include "hw/irq.h"
+     int i;
- #include "net/net.h"
+     dev = qdev_new("pl330");
-+#define TYPE_LAN9118 "lan9118"
++    object_property_set_link(OBJECT(dev), "memory",
 +                             OBJECT(get_system_memory()),
 +                             &error_fatal);
      qdev_prop_set_uint8(dev, "num_events", nevents);
      qdev_prop_set_uint8(dev, "num_chnls",  8);
      qdev_prop_set_uint8(dev, "num_periph_req",  nreq);
 diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/xilinx_zynq.c
 +++ b/hw/arm/xilinx_zynq.c
@@ -XXX,XX +XXX,XX @@ static void zynq_init(MachineState *machine)
      sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[39-IRQ_OFFSET]);
      dev = qdev_new("pl330");
 +    object_property_set_link(OBJECT(dev), "memory",
 +                             OBJECT(address_space_mem),
 +                             &error_fatal);
      qdev_prop_set_uint8(dev, "num_chnls",  8);
      qdev_prop_set_uint8(dev, "num_periph_req",  4);
      qdev_prop_set_uint8(dev, "num_events",  16);
 diff --git a/hw/dma/pl330.c b/hw/dma/pl330.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/dma/pl330.c
 +++ b/hw/dma/pl330.c
@@ -XXX,XX +XXX,XX @@ struct PL330State {
      uint8_t num_faulting;
      uint8_t periph_busy[PL330_PERIPH_NUM];
 +    /* Memory region that DMA operation access */
 +    MemoryRegion *mem_mr;
 +    AddressSpace *mem_as;
  };
  #define TYPE_PL330 "pl330"
@@ -XXX,XX +XXX,XX @@ static inline const PL330InsnDesc *pl330_fetch_insn(PL330Chan *ch)
      uint8_t opcode;
      int i;
 -    dma_memory_read(&address_space_memory, ch->pc, &opcode, 1);
 +    dma_memory_read(ch->parent->mem_as, ch->pc, &opcode, 1);
      for (i = 0; insn_desc[i].size; i++) {
          if ((opcode & insn_desc[i].opmask) == insn_desc[i].opcode) {
              return &insn_desc[i];
@@ -XXX,XX +XXX,XX @@ static inline void pl330_exec_insn(PL330Chan *ch, const PL330InsnDesc *insn)
      uint8_t buf[PL330_INSN_MAXSIZE];
      assert(insn->size <= PL330_INSN_MAXSIZE);
 -    dma_memory_read(&address_space_memory, ch->pc, buf, insn->size);
 +    dma_memory_read(ch->parent->mem_as, ch->pc, buf, insn->size);
      insn->exec(ch, buf[0], &buf[1], insn->size - 1);
  }
@@ -XXX,XX +XXX,XX @@ static int pl330_exec_cycle(PL330Chan *channel)
      if (q != NULL && q->len <= pl330_fifo_num_free(&s->fifo)) {
          int len = q->len - (q->addr & (q->len - 1));
 -        dma_memory_read(&address_space_memory, q->addr, buf, len);
 +        dma_memory_read(s->mem_as, q->addr, buf, len);
          trace_pl330_exec_cycle(q->addr, len);
          if (trace_event_get_state_backends(TRACE_PL330_HEXDUMP)) {
              pl330_hexdump(buf, len);
@@ -XXX,XX +XXX,XX @@ static int pl330_exec_cycle(PL330Chan *channel)
              fifo_res = pl330_fifo_get(&s->fifo, buf, len, q->tag);
          }
          if (fifo_res == PL330_FIFO_OK || q->z) {
 -            dma_memory_write(&address_space_memory, q->addr, buf, len);
 +            dma_memory_write(s->mem_as, q->addr, buf, len);
              trace_pl330_exec_cycle(q->addr, len);
              if (trace_event_get_state_backends(TRACE_PL330_HEXDUMP)) {
                  pl330_hexdump(buf, len);
@@ -XXX,XX +XXX,XX @@ static void pl330_realize(DeviceState *dev, Error **errp)
                            "dma", PL330_IOMEM_SIZE);
      sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
 +    if (!s->mem_mr) {
 +        error_setg(errp, "'memory' link is not set");
 +        return;
 +    } else if (s->mem_mr == get_system_memory()) {
 +        /* Avoid creating new AS for system memory. */
 +        s->mem_as = &address_space_memory;
 +    } else {
 +        s->mem_as = g_new0(AddressSpace, 1);
 +        address_space_init(s->mem_as, s->mem_mr,
 +                           memory_region_name(s->mem_mr));
 +    }
 +
- void lan9118_init(NICInfo *, uint32_t, qemu_irq);
+     s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, pl330_exec_cycle_timer, s);
- #endif
+     s->cfg[0] = (s->mgr_ns_at_rst ? 0x4 : 0) |
-diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c
+@@ -XXX,XX +XXX,XX @@ static Property pl330_properties[] = {
-index XXXXXXX..XXXXXXX 100644
+     DEFINE_PROP_UINT8("rd_q_dep", PL330State, rd_q_dep, 16),
---- a/hw/arm/exynos4_boards.c
+     DEFINE_PROP_UINT16("data_buffer_dep", PL330State, data_buffer_dep, 256),
-+++ b/hw/arm/exynos4_boards.c
-@@ -XXX,XX +XXX,XX @@
++    DEFINE_PROP_LINK("memory", PL330State, mem_mr,
- #include "hw/arm/arm.h"
++                     TYPE_MEMORY_REGION, MemoryRegion *),
- #include "exec/address-spaces.h"
++
- #include "hw/arm/exynos4210.h"
+     DEFINE_PROP_END_OF_LIST(),
 +#include "hw/net/lan9118.h"
  #include "hw/boards.h"
  #undef DEBUG
@@ -XXX,XX +XXX,XX @@ static void lan9215_init(uint32_t base, qemu_irq irq)
      /* This should be a 9215 but the 9118 is close enough */
      if (nd_table[0].used) {
          qemu_check_nic_model(&nd_table[0], "lan9118");
 -        dev = qdev_create(NULL, "lan9118");
 +        dev = qdev_create(NULL, TYPE_LAN9118);
          qdev_set_nic_properties(dev, &nd_table[0]);
          qdev_prop_set_uint32(dev, "mode_16bit", 1);
          qdev_init_nofail(dev);
 diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/mps2-tz.c
 +++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/arm/armsse.h"
  #include "hw/dma/pl080.h"
  #include "hw/ssi/pl022.h"
 +#include "hw/net/lan9118.h"
  #include "net/net.h"
  #include "hw/core/split-irq.h"
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_eth_dev(MPS2TZMachineState *mms, void *opaque,
       * except that it doesn't support the checksum-offload feature.
       */
      qemu_check_nic_model(nd, "lan9118");
 -    mms->lan9118 = qdev_create(NULL, "lan9118");
 +    mms->lan9118 = qdev_create(NULL, TYPE_LAN9118);
      qdev_set_nic_properties(mms->lan9118, nd);
      qdev_init_nofail(mms->lan9118);
 diff --git a/hw/net/lan9118.c b/hw/net/lan9118.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/net/lan9118.c
 +++ b/hw/net/lan9118.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_lan9118_packet = {
      }
  };
--#define TYPE_LAN9118 "lan9118"
- #define LAN9118(obj) OBJECT_CHECK(lan9118_state, (obj), TYPE_LAN9118)
- typedef struct {
 --
 .20.1
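
The pl330_realize() hunk above picks the address space used for every DMA read and write: it rejects a missing 'memory' link, reuses the global address_space_memory when the link points at system memory, and otherwise builds a dedicated AddressSpace rooted at the linked region (for example an IOMMU region). The sketch below restates that decision with hypothetical Fake* types and a hypothetical pick_dma_as(); it is not the QEMU memory API.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { const char *name; } FakeMemoryRegion;
typedef struct { FakeMemoryRegion *root; } FakeAddressSpace;

static FakeMemoryRegion fake_system_memory = { "system" };
static FakeAddressSpace fake_address_space_memory = { &fake_system_memory };

static FakeAddressSpace *pick_dma_as(FakeMemoryRegion *mr, const char **errp)
{
    assert(errp != NULL);
    if (!mr) {
        *errp = "'memory' link is not set";
        return NULL;
    }
    if (mr == &fake_system_memory) {
        return &fake_address_space_memory;  /* avoid a new AS for system memory */
    }
    FakeAddressSpace *as = calloc(1, sizeof(*as));
    if (!as) {
        abort();                            /* like g_new0(), never return NULL */
    }
    as->root = mr;                          /* stands in for address_space_init() */
    return as;
}
```

Reusing the global address space for the common case means boards such as exynos4210 and xilinx_zynq, which link plain system memory, keep the same DMA behaviour as before.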

-[Qemu-devel] [PULL 33/42] hw/display/tc6393xb: Remove unused functions
+[PULL 42/44] sbsa-ref: Rename SBSA_GWDT enum value
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+From: Eduardo Habkost <ehabkost@redhat.com>
-No code used the tc6393xb_gpio_in_get() and tc6393xb_gpio_out_set()
+The SBSA_GWDT enum value conflicts with the SBSA_GWDT() QOM type
-functions since their introduction in commit 88d2c950b002. Time to
+checking helper, preventing us from using an OBJECT_DEFINE* or
-remove them.
+DEFINE_INSTANCE_CHECKER macro for the SBSA_GWDT() wrapper.
-Suggested-by: Markus Armbruster <armbru@redhat.com>
+If I understand the SBSA 6.0 specification correctly, the signal
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+being connected to IRQ 16 is the WS0 output signal from the
-Message-id: 20190412165416.7977-4-philmd@redhat.com
+Generic Watchdog.  Rename the enum value to SBSA_GWDT_WS0 to be
 more explicit and avoid the name conflict.
 Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
 Message-id: 20210806023119.431680-1-ehabkost@redhat.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/devices.h  |  3 ---
+ hw/arm/sbsa-ref.c | 6 +++---
- hw/display/tc6393xb.c | 16 ----------------
+file changed, 3 insertions(+), 3 deletions(-)
 files changed, 19 deletions(-)
-diff --git a/include/hw/devices.h b/include/hw/devices.h
+diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/devices.h
+--- a/hw/arm/sbsa-ref.c
-+++ b/include/hw/devices.h
++++ b/hw/arm/sbsa-ref.c
-@@ -XXX,XX +XXX,XX @@ void retu_key_event(void *retu, int state);
+@@ -XXX,XX +XXX,XX @@ enum {
- typedef struct TC6393xbState TC6393xbState;
+     SBSA_GIC_DIST,
- TC6393xbState *tc6393xb_init(struct MemoryRegion *sysmem,
+     SBSA_GIC_REDIST,
-                              uint32_t base, qemu_irq irq);
+     SBSA_SECURE_EC,
--void tc6393xb_gpio_out_set(TC6393xbState *s, int line,
+-    SBSA_GWDT,
--                    qemu_irq handler);
++    SBSA_GWDT_WS0,
--qemu_irq *tc6393xb_gpio_in_get(TC6393xbState *s);
+     SBSA_GWDT_REFRESH,
- qemu_irq tc6393xb_l3v_get(TC6393xbState *s);
+     SBSA_GWDT_CONTROL,
+     SBSA_SMMU,
- #endif
+@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
-diff --git a/hw/display/tc6393xb.c b/hw/display/tc6393xb.c
+     [SBSA_AHCI] = 10,
-index XXXXXXX..XXXXXXX 100644
+     [SBSA_EHCI] = 11,
---- a/hw/display/tc6393xb.c
+     [SBSA_SMMU] = 12, /* ... to 15 */
-+++ b/hw/display/tc6393xb.c
+-    [SBSA_GWDT] = 16,
-@@ -XXX,XX +XXX,XX @@ struct TC6393xbState {
++    [SBSA_GWDT_WS0] = 16,
               blanked : 1;
  };
--qemu_irq *tc6393xb_gpio_in_get(TC6393xbState *s)
+ static const char * const valid_cpus[] = {
--{
+@@ -XXX,XX +XXX,XX @@ static void create_wdt(const SBSAMachineState *sms)
--    return s->gpio_in;
+     hwaddr cbase = sbsa_ref_memmap[SBSA_GWDT_CONTROL].base;
--}
+     DeviceState *dev = qdev_new(TYPE_WDT_SBSA);
--
+     SysBusDevice *s = SYS_BUS_DEVICE(dev);
- static void tc6393xb_gpio_set(void *opaque, int line, int level)
+-    int irq = sbsa_ref_irqmap[SBSA_GWDT];
- {
++    int irq = sbsa_ref_irqmap[SBSA_GWDT_WS0];
- //    TC6393xbState *s = opaque;
-@@ -XXX,XX +XXX,XX @@ static void tc6393xb_gpio_set(void *opaque, int line, int level)
+     sysbus_realize_and_unref(s, &error_fatal);
-     // FIXME: how does the chip reflect the GPIO input level change?
+     sysbus_mmio_map(s, 0, rbase);
  }
 -void tc6393xb_gpio_out_set(TC6393xbState *s, int line,
 -                    qemu_irq handler)
 -{
 -    if (line >= TC6393XB_GPIOS) {
 -        fprintf(stderr, "TC6393xb: no GPIO pin %d\n", line);
 -        return;
 -    }
 -
 -    s->handler[line] = handler;
 -}
 -
  static void tc6393xb_gpio_handler_update(TC6393xbState *s)
  {
      uint32_t level, diff;
 --
 .20.1
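
In sbsa-ref.c the enum values are used as designated-initializer indices into sbsa_ref_irqmap[] and sbsa_ref_memmap[], as the hunks above show. A QOM instance checker is a static inline function, so an enumerator named SBSA_GWDT and a checker function SBSA_GWDT() cannot both exist at file scope. The sketch below uses a trimmed, hypothetical subset of the enum to show the renamed constant coexisting with such a function; SBSAGWDTState and this SBSA_GWDT() are placeholders, not the QEMU definitions.

```c
#include <assert.h>

/* Trimmed, hypothetical subset of the sbsa-ref memory-map enum. */
enum {
    SBSA_AHCI,
    SBSA_EHCI,
    SBSA_SMMU,
    SBSA_GWDT_WS0,      /* was SBSA_GWDT before the rename */
    SBSA_GWDT_REFRESH,
    SBSA_GWDT_CONTROL,
    SBSA_NUM_ENTRIES    /* hypothetical: sizes the table below */
};

/* Placeholder state type and checker, shaped like a QOM instance checker. */
typedef struct { int dummy; } SBSAGWDTState;

static inline SBSAGWDTState *SBSA_GWDT(void *obj)
{
    /* An enumerator also named SBSA_GWDT would make this a redeclaration. */
    return (SBSAGWDTState *)obj;
}

/* IRQ numbers from the patch; entries not listed are zero-initialised. */
static const int irqmap[SBSA_NUM_ENTRIES] = {
    [SBSA_AHCI] = 10,
    [SBSA_EHCI] = 11,
    [SBSA_SMMU] = 12, /* ... to 15 */
    [SBSA_GWDT_WS0] = 16,
};
```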

-[Qemu-devel] [PULL 30/42] hw/dma: Compile the bcm2835_dma device as common object
+[PULL 43/44] fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+From: Guenter Roeck <linux@roeck-us.net>
-This device is used by both ARM (BCM2836, for raspi2) and AArch64
+Instantiate SAI1/2/3 as unimplemented devices to avoid Linux kernel crashes
-(BCM2837, for raspi3) targets, and is not CPU-specific.
+such as the following.
 Move it to common object, so we build it once for all targets.
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Unhandled fault: external abort on non-linefetch (0x808) at 0xd19b0000
-Message-id: 20190427133028.12874-1-philmd@redhat.com
+pgd = (ptrval)
 [d19b0000] *pgd=82711811, *pte=308a0653, *ppte=308a0453
 Internal error: : 808 [#1] SMP ARM
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc5 #1
 ...
 [<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
 [<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
 [<c09580f4>] (_regmap_write) from [<c0959b28>] (regmap_write+0x3c/0x60)
 [<c0959b28>] (regmap_write) from [<c0d41130>] (fsl_sai_runtime_resume+0x9c/0x1ec)
 [<c0d41130>] (fsl_sai_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
 [<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
 [<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
 [<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
 [<c0942dfc>] (__pm_runtime_resume) from [<c0d4231c>] (fsl_sai_probe+0x2b8/0x65c)
 [<c0d4231c>] (fsl_sai_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
 [<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
 [<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
 [<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
 [<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
 [<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
 [<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
 [<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
 [<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
 [<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
 [<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
 [<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)
 Signed-off-by: Guenter Roeck <linux@roeck-us.net>
 Message-id: 20210810175607.538090-1-linux@roeck-us.net
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/dma/Makefile.objs | 2 +-
+ include/hw/arm/fsl-imx7.h | 5 +++++
- 1 file changed, 1 insertion(+), 1 deletion(-)
+ hw/arm/fsl-imx7.c         | 7 +++++++
  2 files changed, 12 insertions(+)
-diff --git a/hw/dma/Makefile.objs b/hw/dma/Makefile.objs
+diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/dma/Makefile.objs
+--- a/include/hw/arm/fsl-imx7.h
-+++ b/hw/dma/Makefile.objs
++++ b/include/hw/arm/fsl-imx7.h
-@@ -XXX,XX +XXX,XX @@ common-obj-$(CONFIG_XLNX_ZYNQMP_ARM) += xlnx-zdma.o
+@@ -XXX,XX +XXX,XX @@ enum FslIMX7MemoryMap {
+     FSL_IMX7_UART6_ADDR           = 0x30A80000,
- obj-$(CONFIG_OMAP) += omap_dma.o soc_dma.o
+     FSL_IMX7_UART7_ADDR           = 0x30A90000,
- obj-$(CONFIG_PXA2XX) += pxa2xx_dma.o
--obj-$(CONFIG_RASPI) += bcm2835_dma.o
++    FSL_IMX7_SAI1_ADDR            = 0x308A0000,
-+common-obj-$(CONFIG_RASPI) += bcm2835_dma.o
++    FSL_IMX7_SAI2_ADDR            = 0x308B0000,
 +    FSL_IMX7_SAI3_ADDR            = 0x308C0000,
 +    FSL_IMX7_SAIn_SIZE            = 0x10000,
 +
      FSL_IMX7_ENET1_ADDR           = 0x30BE0000,
      FSL_IMX7_ENET2_ADDR           = 0x30BF0000,
 diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/fsl-imx7.c
 +++ b/hw/arm/fsl-imx7.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
      create_unimplemented_device("can1", FSL_IMX7_CAN1_ADDR, FSL_IMX7_CANn_SIZE);
      create_unimplemented_device("can2", FSL_IMX7_CAN2_ADDR, FSL_IMX7_CANn_SIZE);
 +    /*
 +     * SAI (Audio SSI (Synchronous Serial Interface))
 +     */
 +    create_unimplemented_device("sai1", FSL_IMX7_SAI1_ADDR, FSL_IMX7_SAIn_SIZE);
 +    create_unimplemented_device("sai2", FSL_IMX7_SAI2_ADDR, FSL_IMX7_SAIn_SIZE);
 +    create_unimplemented_device("sai3", FSL_IMX7_SAI3_ADDR, FSL_IMX7_SAIn_SIZE);
 +
      /*
       * OCOTP
       */
 --
 2.20.1

-[Qemu-devel] [PULL 34/42] hw/devices: Move TC6393XB declarations into a new header
+[PULL 44/44] docs: Document how to use gdb with unix sockets
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+From: Sebastian Meyer <meyer@absint.com>
-Reviewed-by: Markus Armbruster <armbru@redhat.com>
+With gdb 9.0 or later it is possible to connect to a gdbstub
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+over unix sockets, which is better than a TCP socket connection
-Message-id: 20190412165416.7977-5-philmd@redhat.com
+in some situations. The QEMU command line to set this up is
 non-obvious; document it.
 Signed-off-by: Sebastian Meyer <meyer@absint.com>
 Message-id: 162867284829.27377.4784930719350564918-0@git.sr.ht
 [PMM: Tweaked commit message; adjusted wording in a couple of
 places; fixed rST formatting issue; moved section up out of
 the 'advanced debugging options' subsection]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/devices.h          |  6 ------
+ docs/system/gdb.rst | 26 +++++++++++++++++++++++++-
- include/hw/display/tc6393xb.h | 24 ++++++++++++++++++++++++
+ 1 file changed, 25 insertions(+), 1 deletion(-)
  hw/arm/tosa.c                 |  2 +-
  hw/display/tc6393xb.c         |  2 +-
  MAINTAINERS                   |  1 +
  5 files changed, 27 insertions(+), 8 deletions(-)
  create mode 100644 include/hw/display/tc6393xb.h
-diff --git a/include/hw/devices.h b/include/hw/devices.h
+diff --git a/docs/system/gdb.rst b/docs/system/gdb.rst
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/devices.h
+--- a/docs/system/gdb.rst
-+++ b/include/hw/devices.h
++++ b/docs/system/gdb.rst
-@@ -XXX,XX +XXX,XX @@ void *tahvo_init(qemu_irq irq, int betty);
+@@ -XXX,XX +XXX,XX @@ The ``-s`` option will make QEMU listen for an incoming connection
+ from gdb on TCP port 1234, and ``-S`` will make QEMU not start the
- void retu_key_event(void *retu, int state);
+ guest until you tell it to from gdb. (If you want to specify which
+ TCP port to use or to use something other than TCP for the gdbstub
--/* tc6393xb.c */
+-connection, use the ``-gdb dev`` option instead of ``-s``.)
--typedef struct TC6393xbState TC6393xbState;
++connection, use the ``-gdb dev`` option instead of ``-s``. See
--TC6393xbState *tc6393xb_init(struct MemoryRegion *sysmem,
++`Using unix sockets`_ for an example.)
--                             uint32_t base, qemu_irq irq);
--qemu_irq tc6393xb_l3v_get(TC6393xbState *s);
+ .. parsed-literal::
--
- #endif
+@@ -XXX,XX +XXX,XX @@ not just those in the cluster you are currently working on::
-diff --git a/include/hw/display/tc6393xb.h b/include/hw/display/tc6393xb.h
-new file mode 100644
+   (gdb) set schedule-multiple on
-index XXXXXXX..XXXXXXX
---- /dev/null
++Using unix sockets
-+++ b/include/hw/display/tc6393xb.h
++==================
@@ -XXX,XX +XXX,XX @@
 +/*
 + * Toshiba TC6393XB I/O Controller.
 + * Found in Sharp Zaurus SL-6000 (tosa) or some
 + * Toshiba e-Series PDAs.
 + *
 + * Copyright (c) 2007 Hervé Poussineau
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
-+#ifndef HW_DISPLAY_TC6393XB_H
++An alternative method for connecting gdb to the QEMU gdbstub is to use
-+#define HW_DISPLAY_TC6393XB_H
++a unix socket (if supported by your operating system). This is useful when
 +running several tests in parallel, or if you do not have a known free TCP
 +port (e.g. when running automated tests).
 +
-+#include "exec/memory.h"
++First create a chardev with the appropriate options, then
-+#include "hw/irq.h"
++instruct the gdbserver to use that device:
 +
-+typedef struct TC6393xbState TC6393xbState;
++.. parsed-literal::
 +
-+TC6393xbState *tc6393xb_init(struct MemoryRegion *sysmem,
++   |qemu_system| -chardev socket,path=/tmp/gdb-socket,server=on,wait=off,id=gdb0 -gdb chardev:gdb0 -S ...
 +                             uint32_t base, qemu_irq irq);
 +qemu_irq tc6393xb_l3v_get(TC6393xbState *s);
 +
-+#endif
++Start gdb as before, but this time connect using the path to
-diff --git a/hw/arm/tosa.c b/hw/arm/tosa.c
++the socket::
-index XXXXXXX..XXXXXXX 100644
++
---- a/hw/arm/tosa.c
++   (gdb) target remote /tmp/gdb-socket
-+++ b/hw/arm/tosa.c
++
-@@ -XXX,XX +XXX,XX @@
++Note that to use a unix socket for the connection you will need
- #include "hw/hw.h"
++gdb version 9.0 or newer.
- #include "hw/arm/pxa.h"
++
- #include "hw/arm/arm.h"
+ Advanced debugging options
--#include "hw/devices.h"
+ ==========================
- #include "hw/arm/sharpsl.h"
  #include "hw/pcmcia.h"
  #include "hw/boards.h"
 +#include "hw/display/tc6393xb.h"
  #include "hw/i2c/i2c.h"
  #include "hw/ssi/ssi.h"
  #include "hw/sysbus.h"
 diff --git a/hw/display/tc6393xb.c b/hw/display/tc6393xb.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/display/tc6393xb.c
 +++ b/hw/display/tc6393xb.c
@@ -XXX,XX +XXX,XX @@
  #include "qapi/error.h"
  #include "qemu/host-utils.h"
  #include "hw/hw.h"
 -#include "hw/devices.h"
 +#include "hw/display/tc6393xb.h"
  #include "hw/block/flash.h"
  #include "ui/console.h"
  #include "ui/pixel_ops.h"
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: hw/misc/mst_fpga.c
  F: hw/misc/max111x.c
  F: include/hw/arm/pxa.h
  F: include/hw/arm/sharpsl.h
 +F: include/hw/display/tc6393xb.h
  SABRELITE / i.MX6
  M: Peter Maydell <peter.maydell@linaro.org>
 --
 2.20.1

First pullreq for arm of the 4.1 series, since I'm back from
holiday now. This is mostly my M-profile FPU series and Philippe's
devices.h cleanup. I have a pile of other patchsets to work through
in my to-review folder, but 42 patches is definitely quite
big enough to send now...

thanks
-- PMM

The following changes since commit 413a99a92c13ec408dcf2adaa87918dc81e890c8:

Add Nios II semihosting support. (2019-04-29 16:09:51 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20190429

for you to fetch changes up to 437cc27ddfded3bbab6afd5ac1761e0e195edba7:

hw/devices: Move SMSC 91C111 declaration into a new header (2019-04-29 17:57:21 +0100)

----------------------------------------------------------------
target-arm queue:
 * remove "bag of random stuff" hw/devices.h header
 * implement FPU for Cortex-M and enable it for Cortex-M4 and -M33
 * hw/dma: Compile the bcm2835_dma device as common object
 * configure: Remove --source-path option
 * hw/ssi/xilinx_spips: Avoid variable length array
 * hw/arm/smmuv3: Remove SMMUNotifierNode

----------------------------------------------------------------
Eric Auger (1):
      hw/arm/smmuv3: Remove SMMUNotifierNode

Peter Maydell (28):
      hw/ssi/xilinx_spips: Avoid variable length array
      configure: Remove --source-path option
      target/arm: Make sure M-profile FPSCR RES0 bits are not settable
      hw/intc/armv7m_nvic: Allow reading of M-profile MVFR* registers
      target/arm: Implement dummy versions of M-profile FP-related registers
      target/arm: Disable most VFP sysregs for M-profile
      target/arm: Honour M-profile FP enable bits
      target/arm: Decode FP instructions for M profile
      target/arm: Clear CONTROL_S.SFPA in SG insn if FPU present
      target/arm: Handle SFPA and FPCA bits in reads and writes of CONTROL
      target/arm/helper: don't return early for STKOF faults during stacking
      target/arm: Handle floating point registers in exception entry
      target/arm: Implement v7m_update_fpccr()
      target/arm: Clear CONTROL.SFPA in BXNS and BLXNS
      target/arm: Clean excReturn bits when tail chaining
      target/arm: Allow for floating point in callee stack integrity check
      target/arm: Handle floating point registers in exception return
      target/arm: Move NS TBFLAG from bit 19 to bit 6
      target/arm: Overlap VECSTRIDE and XSCALE_CPAR TB flags
      target/arm: Set FPCCR.S when executing M-profile floating point insns
      target/arm: Activate M-profile floating point context when FPCCR.ASPEN is set
      target/arm: New helper function arm_v7m_mmu_idx_all()
      target/arm: New function armv7m_nvic_set_pending_lazyfp()
      target/arm: Add lazy-FP-stacking support to v7m_stack_write()
      target/arm: Implement M-profile lazy FP state preservation
      target/arm: Implement VLSTM for v7M CPUs with an FPU
      target/arm: Implement VLLDM for v7M CPUs with an FPU
      target/arm: Enable FPU for Cortex-M4 and Cortex-M33

Philippe Mathieu-Daudé (13):
      hw/dma: Compile the bcm2835_dma device as common object
      hw/arm/aspeed: Use TYPE_TMP105/TYPE_PCA9552 instead of hardcoded string
      hw/arm/nseries: Use TYPE_TMP105 instead of hardcoded string
      hw/display/tc6393xb: Remove unused functions
      hw/devices: Move TC6393XB declarations into a new header
      hw/devices: Move Blizzard declarations into a new header
      hw/devices: Move CBus declarations into a new header
      hw/devices: Move Gamepad declarations into a new header
      hw/devices: Move TI touchscreen declarations into a new header
      hw/devices: Move LAN9118 declarations into a new header
      hw/net/ne2000-isa: Add guards to the header
      hw/net/lan9118: Export TYPE_LAN9118 and use it instead of hardcoded string
      hw/devices: Move SMSC 91C111 declaration into a new header

From: Eric Auger <eric.auger@redhat.com>

The SMMUNotifierNode struct is not necessary and brings extra
complexity so let's remove it. We now directly track the SMMUDevices
which have registered IOMMU MR notifiers.

This is inspired by the same transformation on intel-iommu
done in commit b4a4ba0d68f50f218ee3957b6638dbee32a5eeef
("intel-iommu: remove IntelIOMMUNotifierNode")

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-id: 20190409160219.19026-1-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/smmu-common.h |  8 ++------
 hw/arm/smmu-common.c         |  6 +++---
 hw/arm/smmuv3.c              | 28 +++++++---------------------
 3 files changed, 12 insertions(+), 30 deletions(-)

diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -XXX,XX +XXX,XX @@ typedef struct SMMUDevice {
     AddressSpace       as;
     uint32_t           cfg_cache_hits;
     uint32_t           cfg_cache_misses;
+    QLIST_ENTRY(SMMUDevice) next;
 } SMMUDevice;
 
-typedef struct SMMUNotifierNode {
-    SMMUDevice *sdev;
-    QLIST_ENTRY(SMMUNotifierNode) next;
-} SMMUNotifierNode;
-
 typedef struct SMMUPciBus {
     PCIBus       *bus;
     SMMUDevice   *pbdev[0]; /* Parent array is sparse, so dynamically alloc */
@@ -XXX,XX +XXX,XX @@ typedef struct SMMUState {
     GHashTable *iotlb;
     SMMUPciBus *smmu_pcibus_by_bus_num[SMMU_PCI_BUS_MAX];
     PCIBus *pci_bus;
-    QLIST_HEAD(, SMMUNotifierNode) notifiers_list;
+    QLIST_HEAD(, SMMUDevice) devices_with_notifiers;
     uint8_t bus_num;
     PCIBus *primary_bus;
 } SMMUState;
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -XXX,XX +XXX,XX @@ inline void smmu_inv_notifiers_mr(IOMMUMemoryRegion *mr)
 /* Unmap all notifiers of all mr's */
 void smmu_inv_notifiers_all(SMMUState *s)
 {
-    SMMUNotifierNode *node;
+    SMMUDevice *sdev;
 
-    QLIST_FOREACH(node, &s->notifiers_list, next) {
-        smmu_inv_notifiers_mr(&node->sdev->iommu);
+    QLIST_FOREACH(sdev, &s->devices_with_notifiers, next) {
+        smmu_inv_notifiers_mr(&sdev->iommu);
     }
 }
 
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -XXX,XX +XXX,XX @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 /* invalidate an asid/iova tuple in all mr's */
 static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, dma_addr_t iova)
 {
-    SMMUNotifierNode *node;
+    SMMUDevice *sdev;
 
-    QLIST_FOREACH(node, &s->notifiers_list, next) {
-        IOMMUMemoryRegion *mr = &node->sdev->iommu;
+    QLIST_FOREACH(sdev, &s->devices_with_notifiers, next) {
+        IOMMUMemoryRegion *mr = &sdev->iommu;
         IOMMUNotifier *n;
 
         trace_smmuv3_inv_notifiers_iova(mr->parent_obj.name, asid, iova);
@@ -XXX,XX +XXX,XX @@ static void smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
     SMMUDevice *sdev = container_of(iommu, SMMUDevice, iommu);
     SMMUv3State *s3 = sdev->smmu;
     SMMUState *s = &(s3->smmu_state);
-    SMMUNotifierNode *node = NULL;
-    SMMUNotifierNode *next_node = NULL;
 
     if (new & IOMMU_NOTIFIER_MAP) {
         int bus_num = pci_bus_num(sdev->bus);
@@ -XXX,XX +XXX,XX @@ static void smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
 
     if (old == IOMMU_NOTIFIER_NONE) {
         trace_smmuv3_notify_flag_add(iommu->parent_obj.name);
-        node = g_malloc0(sizeof(*node));
-        node->sdev = sdev;
-        QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
-        return;
-    }
-
-    /* update notifier node with new flags */
-    QLIST_FOREACH_SAFE(node, &s->notifiers_list, next, next_node) {
-        if (node->sdev == sdev) {
-            if (new == IOMMU_NOTIFIER_NONE) {
-                trace_smmuv3_notify_flag_del(iommu->parent_obj.name);
-                QLIST_REMOVE(node, next);
-                g_free(node);
-            }
-            return;
-        }
+        QLIST_INSERT_HEAD(&s->devices_with_notifiers, sdev, next);
+    } else if (new == IOMMU_NOTIFIER_NONE) {
+        trace_smmuv3_notify_flag_del(iommu->parent_obj.name);
+        QLIST_REMOVE(sdev, next);
     }
 }
 
-- 
2.20.1
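For readers less familiar with the QLIST macros: the change above works by embedding the list link directly in each SMMUDevice, so a device can be linked or unlinked in O(1) without allocating, freeing or searching for a separate wrapper node. The following is a minimal standalone sketch of that intrusive-list pattern that assumes nothing from QEMU. SketchDevice, SketchDeviceList and the helper functions are hypothetical stand-ins for SMMUDevice, devices_with_notifiers, QLIST_INSERT_HEAD and QLIST_REMOVE, not their real implementations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SMMUDevice with an embedded list link. */
typedef struct SketchDevice {
    int id;
    struct SketchDevice *next;   /* next device that has notifiers */
    struct SketchDevice **pprev; /* address of the pointer pointing at us */
} SketchDevice;

/* Hypothetical stand-in for the devices_with_notifiers list head. */
typedef struct {
    SketchDevice *first;
} SketchDeviceList;

/* O(1) insertion at the head, in the style of QLIST_INSERT_HEAD. */
static void list_insert_head(SketchDeviceList *l, SketchDevice *d)
{
    d->next = l->first;
    if (d->next) {
        d->next->pprev = &d->next;
    }
    l->first = d;
    d->pprev = &l->first;
}

/* O(1) removal with no list walk, in the style of QLIST_REMOVE. */
static void list_remove(SketchDevice *d)
{
    if (d->next) {
        d->next->pprev = d->pprev;
    }
    *d->pprev = d->next;
    d->next = NULL;
    d->pprev = NULL;
}

static size_t list_length(const SketchDeviceList *l)
{
    size_t n = 0;
    for (const SketchDevice *d = l->first; d; d = d->next) {
        n++;
    }
    return n;
}

/*
 * Same shape as the new smmuv3_notify_flag_changed() logic: join the
 * list when the first notifier appears, leave it when the last one goes.
 * Assumes it is only called when the flags actually change.
 */
static void flags_changed(SketchDeviceList *l, SketchDevice *d,
                          int old_flags, int new_flags)
{
    if (old_flags == 0) {
        list_insert_head(l, d);
    } else if (new_flags == 0) {
        list_remove(d);
    }
}
```

The pprev pointer is what lets removal skip the QLIST_FOREACH_SAFE search that the old code needed to find the matching SMMUNotifierNode.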

In the stripe8() function we use a variable length array; however
we know that the maximum length required is MAX_NUM_BUSSES. Use
a fixed-length array and an assert instead.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20190328152635.2794-1-peter.maydell@linaro.org
---
 hw/ssi/xilinx_spips.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/ssi/xilinx_spips.c
+++ b/hw/ssi/xilinx_spips.c
@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_qspips_reset(DeviceState *d)
 
 static inline void stripe8(uint8_t *x, int num, bool dir)
 {
-    uint8_t r[num];
-    memset(r, 0, sizeof(uint8_t) * num);
+    uint8_t r[MAX_NUM_BUSSES];
     int idx[2] = {0, 0};
     int bit[2] = {0, 7};
     int d = dir;
 
+    assert(num <= MAX_NUM_BUSSES);
+    memset(r, 0, sizeof(uint8_t) * num);
+
     for (idx[0] = 0; idx[0] < num; ++idx[0]) {
         for (bit[0] = 7; bit[0] >= 0; bit[0]--) {
             r[idx[!d]] |= x[idx[d]] & 1 << bit[d] ? 1 << bit[!d] : 0;
-- 
2.20.1
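The pattern in this patch (a scratch buffer sized by a compile-time bound, with an assert on the runtime length placed before anything uses that length to touch the buffer) can be shown in isolation. SKETCH_MAX_LANES and or_reverse_lanes() are hypothetical: the bound is not the real value of MAX_NUM_BUSSES, and the loop only reverses byte order rather than reproducing the bit interleaving done by stripe8(). Like stripe8(), it accumulates with |=, so the buffer must be zeroed first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compile-time bound, playing the role of MAX_NUM_BUSSES. */
#define SKETCH_MAX_LANES 8

/* Reverse the order of num bytes in x, using a fixed-size scratch array. */
static void or_reverse_lanes(uint8_t *x, int num)
{
    uint8_t r[SKETCH_MAX_LANES];

    /* Check the runtime length against the fixed buffer first... */
    assert(num >= 0 && num <= SKETCH_MAX_LANES);
    /* ...and only then use it to size any access to r. */
    memset(r, 0, sizeof(r[0]) * num);

    for (int i = 0; i < num; i++) {
        r[num - 1 - i] |= x[i];
    }
    memcpy(x, r, num);
}
```

Compared with a VLA, the stack usage is fixed and visible at compile time, and an out-of-range num fails loudly instead of silently growing the stack frame.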

Normally configure identifies the source path by looking
at the location where the configure script itself exists.
We also provide a --source-path option which lets the user
manually override this.

There isn't really an obvious use case for the --source-path
option, and in commit 927128222b0a91f56c13a in 2017 we
accidentally added some logic that looks at $source_path
before the command line option that overrides it has been
processed.

The fact that nobody complained suggests that there isn't
any use of this option and we aren't testing it either;
remove it. This allows us to move the "make $source_path
absolute" logic up so that there is no window in the script
where $source_path is set but not yet absolute.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20190318134019.23729-1-peter.maydell@linaro.org
---
 configure | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/configure b/configure
index XXXXXXX..XXXXXXX 100755
--- a/configure
+++ b/configure
@@ -XXX,XX +XXX,XX @@ ld_has() {
 
 # default parameters
 source_path=$(dirname "$0")
+# make source path absolute
+source_path=$(cd "$source_path"; pwd)
 cpu=""
 iasl="iasl"
 interp_prefix="/usr/gnemul/qemu-%M"
@@ -XXX,XX +XXX,XX @@ for opt do
   ;;
   --cxx=*) CXX="$optarg"
   ;;
-  --source-path=*) source_path="$optarg"
-  ;;
   --cpu=*) cpu="$optarg"
   ;;
   --extra-cflags=*) QEMU_CFLAGS="$QEMU_CFLAGS $optarg"
@@ -XXX,XX +XXX,XX @@ if test "$debug_info" = "yes"; then
     LDFLAGS="-g $LDFLAGS"
 fi
 
-# make source path absolute
-source_path=$(cd "$source_path"; pwd)
-
 # running configure in the source tree?
 # we know that's the case if configure is there.
 if test -f "./configure"; then
@@ -XXX,XX +XXX,XX @@ for opt do
   ;;
   --interp-prefix=*) interp_prefix="$optarg"
   ;;
-  --source-path=*)
-  ;;
   --cross-prefix=*)
   ;;
   --cc=*)
@@ -XXX,XX +XXX,XX @@ $(echo Available targets: $default_target_list | \
   --target-list-exclude=LIST exclude a set of targets from the default target-list
 
 Advanced options (experts only):
-  --source-path=PATH       path of source code [$source_path]
   --cross-prefix=PREFIX    use PREFIX for compile tools [$cross_prefix]
   --cc=CC                  use C compiler CC [$cc]
   --iasl=IASL              use ACPI compiler IASL [$iasl]
-- 
2.20.1

Enforce that for M-profile various FPSCR bits which are RES0 there
but have defined meanings on A-profile are never settable. This
ensures that M-profile code can't enable the A-profile behaviour
(notably vector length/stride handling) by accident.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-2-peter.maydell@linaro.org
---
 target/arm/vfp_helper.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
         val &= ~FPCR_FZ16;
     }
 
+    if (arm_feature(env, ARM_FEATURE_M)) {
+        /*
+         * M profile FPSCR is RES0 for the QC, STRIDE, FZ16, LEN bits
+         * and also for the trapped-exception-handling bits IxE.
+         */
+        val &= 0xf7c0009f;
+    }
+
     /*
      * We don't implement trapped exception handling, so the
      * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
-- 
2.20.1
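The literal 0xf7c0009f can be cross-checked by building it from named FPSCR fields. This sketch assumes the standard Arm FPSCR layout (NZCV in bits 31:28, QC at 27, AHP/DN/FZ at 26:24, RMode at 23:22, STRIDE at 21:20, FZ16 at 19, LEN at 18:16, trap enables in 15 and 12:8, cumulative flags in 7 and 4:0). The macro and function names are illustrative and are not QEMU's.

```c
#include <assert.h>
#include <stdint.h>

/* FPSCR fields that remain writable on M-profile. */
#define FPSCR_NZCV   (0xfu << 28)  /* N, Z, C, V: bits 31:28 */
#define FPSCR_AHP    (1u << 26)
#define FPSCR_DN     (1u << 25)
#define FPSCR_FZ     (1u << 24)
#define FPSCR_RMODE  (3u << 22)    /* bits 23:22 */
#define FPSCR_IDC    (1u << 7)
#define FPSCR_IXC    (1u << 4)
#define FPSCR_UFC    (1u << 3)
#define FPSCR_OFC    (1u << 2)
#define FPSCR_DZC    (1u << 1)
#define FPSCR_IOC    (1u << 0)

/* QC, STRIDE, FZ16, LEN and the IxE trap enables are deliberately absent. */
#define M_FPSCR_WRITABLE \
    (FPSCR_NZCV | FPSCR_AHP | FPSCR_DN | FPSCR_FZ | FPSCR_RMODE | \
     FPSCR_IDC | FPSCR_IXC | FPSCR_UFC | FPSCR_OFC | FPSCR_DZC | FPSCR_IOC)

/* The named fields must reproduce the literal used in vfp_set_fpscr. */
_Static_assert(M_FPSCR_WRITABLE == 0xf7c0009fu, "M-profile FPSCR mask");

static uint32_t m_profile_sanitize_fpscr(uint32_t val)
{
    return val & M_FPSCR_WRITABLE;
}
```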

For M-profile the MVFR* ID registers are memory mapped, in the
range we implement via the NVIC. Allow them to be read.
(If the CPU has no FPU, these registers are defined to be RAZ.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-3-peter.maydell@linaro.org
---
 hw/intc/armv7m_nvic.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
             return 0;
         }
         return cpu->env.v7m.sfar;
+    case 0xf40: /* MVFR0 */
+        return cpu->isar.mvfr0;
+    case 0xf44: /* MVFR1 */
+        return cpu->isar.mvfr1;
+    case 0xf48: /* MVFR2 */
+        return cpu->isar.mvfr2;
     default:
     bad_offset:
         qemu_log_mask(LOG_GUEST_ERROR, "NVIC: Bad read offset 0x%x\n", offset);
-- 
2.20.1
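A minimal sketch of the offset decode added here, with hypothetical type and function names: the three MVFR registers sit at consecutive word offsets 0xf40, 0xf44 and 0xf48 in the NVIC's register window. As in the patch, the handler just returns the stored ID values. The sketch assumes that a CPU model without an FPU has all three values set to zero, which is what makes the registers RAZ without an explicit feature check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the CPU's ID register state. */
typedef struct {
    uint32_t mvfr0, mvfr1, mvfr2;
} SketchIsar;

/*
 * Decode a read in the MVFR part of the NVIC window.
 * Returns 1 and fills *out if the offset is an MVFR register, else 0.
 */
static int sketch_read_mvfr(const SketchIsar *isar, uint32_t offset,
                            uint32_t *out)
{
    switch (offset) {
    case 0xf40: /* MVFR0 */
        *out = isar->mvfr0;
        return 1;
    case 0xf44: /* MVFR1 */
        *out = isar->mvfr1;
        return 1;
    case 0xf48: /* MVFR2 */
        *out = isar->mvfr2;
        return 1;
    default:
        return 0;
    }
}
```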

The M-profile floating point support has three associated config
registers: FPCAR, FPCCR and FPDSCR. It also makes the registers
CPACR and NSACR have behaviour other than reads-as-zero.
Add support for all of these as simple reads-as-written registers.
We will hook up actual functionality later.

The main complexity here is handling the FPCCR register, which
has a mix of banked and unbanked bits.

Note that we don't share storage with the A-profile
cpu->cp15.nsacr and cpu->cp15.cpacr_el1, though the behaviour
is quite similar, for two reasons:
 * the M profile CPACR is banked between security states
 * it preserves the invariant that M profile uses no state
   inside the cp15 substruct
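The least obvious part of this patch is how a Non-secure read of FPCCR is assembled from two storage slots: banked bits come from fpccr[M_REG_NS], while LSPEN, CLRONRET, MONRDY (and BFRDY/HFRDY when AIRCR.BFHFNMINS is set) are read from the non-banked copy kept in fpccr[M_REG_S]. Below is a standalone sketch of that rule using the bit positions from the FIELD(V7M_FPCCR, ...) definitions in this patch's diff. The function name and the bfhfnmins parameter are hypothetical, and the TODO about MONRDY and DEMCR.SDME is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from FIELD(V7M_FPCCR, ...); only the bits used here. */
#define FPCCR_S         (1u << 2)
#define FPCCR_HFRDY     (1u << 4)
#define FPCCR_BFRDY     (1u << 6)
#define FPCCR_MONRDY    (1u << 8)
#define FPCCR_CLRONRET  (1u << 28)
#define FPCCR_LSPEN     (1u << 30)
#define FPCCR_ASPEN     (1u << 31)

/*
 * Non-secure view of FPCCR. fpccr_ns is assumed to hold only banked bits
 * (the write path masks it with R_V7M_FPCCR_BANKED_MASK); every
 * non-banked bit not selected by mask reads as zero.
 */
static uint32_t fpccr_ns_read(uint32_t fpccr_s, uint32_t fpccr_ns,
                              bool bfhfnmins)
{
    uint32_t mask = FPCCR_LSPEN | FPCCR_CLRONRET | FPCCR_MONRDY;

    if (bfhfnmins) {
        mask |= FPCCR_BFRDY | FPCCR_HFRDY;
    }
    return (fpccr_s & mask) | fpccr_ns;
}
```

With the reset values set in arm_cpu_reset() (NS copy = ASPEN, S copy = ASPEN | LSPEN | S), a Non-secure read returns ASPEN | LSPEN: ASPEN from the banked NS copy and LSPEN from the shared Secure copy, while S stays hidden.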

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-4-peter.maydell@linaro.org
---
 target/arm/cpu.h      |  34 ++++++++++++
 hw/intc/armv7m_nvic.c | 125 ++++++++++++++++++++++++++++++++++++++++++
 target/arm/cpu.c      |   5 ++
 target/arm/machine.c  |  16 ++++++
 4 files changed, 180 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
         uint32_t scr[M_REG_NUM_BANKS];
         uint32_t msplim[M_REG_NUM_BANKS];
         uint32_t psplim[M_REG_NUM_BANKS];
+        uint32_t fpcar[M_REG_NUM_BANKS];
+        uint32_t fpccr[M_REG_NUM_BANKS];
+        uint32_t fpdscr[M_REG_NUM_BANKS];
+        uint32_t cpacr[M_REG_NUM_BANKS];
+        uint32_t nsacr;
     } v7m;
 
     /* Information associated with an exception about to be taken:
@@ -XXX,XX +XXX,XX @@ FIELD(V7M_CSSELR, LEVEL, 1, 3)
  */
 FIELD(V7M_CSSELR, INDEX, 0, 4)
 
+/* v7M FPCCR bits */
+FIELD(V7M_FPCCR, LSPACT, 0, 1)
+FIELD(V7M_FPCCR, USER, 1, 1)
+FIELD(V7M_FPCCR, S, 2, 1)
+FIELD(V7M_FPCCR, THREAD, 3, 1)
+FIELD(V7M_FPCCR, HFRDY, 4, 1)
+FIELD(V7M_FPCCR, MMRDY, 5, 1)
+FIELD(V7M_FPCCR, BFRDY, 6, 1)
+FIELD(V7M_FPCCR, SFRDY, 7, 1)
+FIELD(V7M_FPCCR, MONRDY, 8, 1)
+FIELD(V7M_FPCCR, SPLIMVIOL, 9, 1)
+FIELD(V7M_FPCCR, UFRDY, 10, 1)
+FIELD(V7M_FPCCR, RES0, 11, 15)
+FIELD(V7M_FPCCR, TS, 26, 1)
+FIELD(V7M_FPCCR, CLRONRETS, 27, 1)
+FIELD(V7M_FPCCR, CLRONRET, 28, 1)
+FIELD(V7M_FPCCR, LSPENS, 29, 1)
+FIELD(V7M_FPCCR, LSPEN, 30, 1)
+FIELD(V7M_FPCCR, ASPEN, 31, 1)
+/* These bits are banked. Others are non-banked and live in the M_REG_S bank */
+#define R_V7M_FPCCR_BANKED_MASK                 \
+    (R_V7M_FPCCR_LSPACT_MASK |                  \
+     R_V7M_FPCCR_USER_MASK |                    \
+     R_V7M_FPCCR_THREAD_MASK |                  \
+     R_V7M_FPCCR_MMRDY_MASK |                   \
+     R_V7M_FPCCR_SPLIMVIOL_MASK |               \
+     R_V7M_FPCCR_UFRDY_MASK |                   \
+     R_V7M_FPCCR_ASPEN_MASK)
+
 /*
  * System register ID fields.
  */
diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
     }
     case 0xd84: /* CSSELR */
         return cpu->env.v7m.csselr[attrs.secure];
+    case 0xd88: /* CPACR */
+        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+            return 0;
+        }
+        return cpu->env.v7m.cpacr[attrs.secure];
+    case 0xd8c: /* NSACR */
+        if (!attrs.secure || !arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+            return 0;
+        }
+        return cpu->env.v7m.nsacr;
     /* TODO: Implement debug registers.  */
     case 0xd90: /* MPU_TYPE */
         /* Unified MPU; if the MPU is not present this value is zero */
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
             return 0;
         }
         return cpu->env.v7m.sfar;
+    case 0xf34: /* FPCCR */
+        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+            return 0;
+        }
+        if (attrs.secure) {
+            return cpu->env.v7m.fpccr[M_REG_S];
+        } else {
+            /*
+             * NS can read LSPEN, CLRONRET and MONRDY. It can read
+             * BFRDY and HFRDY if AIRCR.BFHFNMINS != 0;
+             * other non-banked bits RAZ.
+             * TODO: MONRDY should RAZ/WI if DEMCR.SDME is set.
+             */
+            uint32_t value = cpu->env.v7m.fpccr[M_REG_S];
+            uint32_t mask = R_V7M_FPCCR_LSPEN_MASK |
+                R_V7M_FPCCR_CLRONRET_MASK |
+                R_V7M_FPCCR_MONRDY_MASK;
+
+            if (s->cpu->env.v7m.aircr & R_V7M_AIRCR_BFHFNMINS_MASK) {
+                mask |= R_V7M_FPCCR_BFRDY_MASK | R_V7M_FPCCR_HFRDY_MASK;
+            }
+
+            value &= mask;
+
+            value |= cpu->env.v7m.fpccr[M_REG_NS];
+            return value;
+        }
+    case 0xf38: /* FPCAR */
+        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+            return 0;
+        }
+        return cpu->env.v7m.fpcar[attrs.secure];
+    case 0xf3c: /* FPDSCR */
+        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+            return 0;
+        }
+        return cpu->env.v7m.fpdscr[attrs.secure];
     case 0xf40: /* MVFR0 */
         return cpu->isar.mvfr0;
     case 0xf44: /* MVFR1 */
@@ -XXX,XX +XXX,XX @@ static void nvic_writel(NVICState *s, uint32_t offset, uint32_t value,
             cpu->env.v7m.csselr[attrs.secure] = value & R_V7M_CSSELR_INDEX_MASK;
         }
         break;
+    case 0xd88: /* CPACR */
+        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+            /* We implement only the Floating Point extension's CP10/CP11 */
+            cpu->env.v7m.cpacr[attrs.secure] = value & (0xf << 20);
+        }
+        break;
+    case 0xd8c: /* NSACR */
+        if (attrs.secure && arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+            /* We implement only the Floating Point extension's CP10/CP11 */
+            cpu->env.v7m.nsacr = value & (3 << 10);
+        }
+        break;
     case 0xd90: /* MPU_TYPE */
         return; /* RO */
     case 0xd94: /* MPU_CTRL */
@@ -XXX,XX +XXX,XX @@ static void nvic_writel(NVICState *s, uint32_t offset, uint32_t value,
         }
         break;
     }
+    case 0xf34: /* FPCCR */
+        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+            /* Not all bits here are banked. */
+            uint32_t fpccr_s;
+
+            if (!arm_feature(&cpu->env, ARM_FEATURE_V8)) {
+                /* Don't allow setting of bits not present in v7M */
+                value &= (R_V7M_FPCCR_LSPACT_MASK |
+                          R_V7M_FPCCR_USER_MASK |
+                          R_V7M_FPCCR_THREAD_MASK |
+                          R_V7M_FPCCR_HFRDY_MASK |
+                          R_V7M_FPCCR_MMRDY_MASK |
+                          R_V7M_FPCCR_BFRDY_MASK |
+                          R_V7M_FPCCR_MONRDY_MASK |
+                          R_V7M_FPCCR_LSPEN_MASK |
+                          R_V7M_FPCCR_ASPEN_MASK);
+            }
+            value &= ~R_V7M_FPCCR_RES0_MASK;
+
+            if (!attrs.secure) {
+                /* Some non-banked bits are configurably writable by NS */
+                fpccr_s = cpu->env.v7m.fpccr[M_REG_S];
+                if (!(fpccr_s & R_V7M_FPCCR_LSPENS_MASK)) {
+                    uint32_t lspen = FIELD_EX32(value, V7M_FPCCR, LSPEN);
+                    fpccr_s = FIELD_DP32(fpccr_s, V7M_FPCCR, LSPEN, lspen);
+                }
+                if (!(fpccr_s & R_V7M_FPCCR_CLRONRETS_MASK)) {
+                    uint32_t cor = FIELD_EX32(value, V7M_FPCCR, CLRONRET);
+                    fpccr_s = FIELD_DP32(fpccr_s, V7M_FPCCR, CLRONRET, cor);
+                }
+                if ((s->cpu->env.v7m.aircr & R_V7M_AIRCR_BFHFNMINS_MASK)) {
+                    uint32_t hfrdy = FIELD_EX32(value, V7M_FPCCR, HFRDY);
+                    uint32_t bfrdy = FIELD_EX32(value, V7M_FPCCR, BFRDY);
+                    fpccr_s = FIELD_DP32(fpccr_s, V7M_FPCCR, HFRDY, hfrdy);
+                    fpccr_s = FIELD_DP32(fpccr_s, V7M_FPCCR, BFRDY, bfrdy);
+                }
+                /* TODO MONRDY should RAZ/WI if DEMCR.SDME is set */
+                {
+                    uint32_t monrdy = FIELD_EX32(value, V7M_FPCCR, MONRDY);
+                    fpccr_s = FIELD_DP32(fpccr_s, V7M_FPCCR, MONRDY, monrdy);
+                }
+
+                /*
+                 * All other non-banked bits are RAZ/WI from NS; write
+                 * just the banked bits to fpccr[M_REG_NS].
+                 */
+                value &= R_V7M_FPCCR_BANKED_MASK;
+                cpu->env.v7m.fpccr[M_REG_NS] = value;
+            } else {
+                fpccr_s = value;
+            }
+            cpu->env.v7m.fpccr[M_REG_S] = fpccr_s;
+        }
+        break;
+    case 0xf38: /* FPCAR */
+        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+            value &= ~7;
+            cpu->env.v7m.fpcar[attrs.secure] = value;
+        }
+        break;
+    case 0xf3c: /* FPDSCR */
+        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+            value &= 0x07c00000;
+            cpu->env.v7m.fpdscr[attrs.secure] = value;
+        }
+        break;
     case 0xf50: /* ICIALLU */
     case 0xf58: /* ICIMVAU */
     case 0xf5c: /* DCIMVAC */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
             env->v7m.ccr[M_REG_S] |= R_V7M_CCR_UNALIGN_TRP_MASK;
         }
 
+        if (arm_feature(env, ARM_FEATURE_VFP)) {
+            env->v7m.fpccr[M_REG_NS] = R_V7M_FPCCR_ASPEN_MASK;
+            env->v7m.fpccr[M_REG_S] = R_V7M_FPCCR_ASPEN_MASK |
+                R_V7M_FPCCR_LSPEN_MASK | R_V7M_FPCCR_S_MASK;
+        }
         /* Unlike A/R profile, M profile defines the reset LR value */
         env->regs[14] = 0xffffffff;
 
diff --git a/target/arm/machine.c b/target/arm/machine.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_v8m = {
     }
 };
 
+static const VMStateDescription vmstate_m_fp = {
+    .name = "cpu/m/fp",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = vfp_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32_ARRAY(env.v7m.fpcar, ARMCPU, M_REG_NUM_BANKS),
+        VMSTATE_UINT32_ARRAY(env.v7m.fpccr, ARMCPU, M_REG_NUM_BANKS),
+        VMSTATE_UINT32_ARRAY(env.v7m.fpdscr, ARMCPU, M_REG_NUM_BANKS),
+        VMSTATE_UINT32_ARRAY(env.v7m.cpacr, ARMCPU, M_REG_NUM_BANKS),
+        VMSTATE_UINT32(env.v7m.nsacr, ARMCPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_m = {
     .name = "cpu/m",
     .version_id = 4,
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m = {
         &vmstate_m_scr,
         &vmstate_m_other_sp,
         &vmstate_m_v8m,
+        &vmstate_m_fp,
         NULL
     }
 };
-- 
2.20.1

The only "system register" that M-profile floating point exposes
via the VMRS/VMSR instructions is FPSCR, and it does not have
the odd special case for rd==15. Add a check to ensure we only
expose FPSCR.
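
For illustration only, the M-profile sysreg filter described above
can be modelled as a standalone C function. This is a sketch, not
QEMU code. sketch_extract32() and m_vfp_sysreg_undefs() are
hypothetical names, and SKETCH_VFP_FPSCR assumes ARM_VFP_FPSCR is 1.
It also assumes the field layout the decoder relies on: bit 21
selects a system register, bits [19:16] give the register number
(the same value as rn >> 1), and bits [15:12] give Rt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match QEMU's ARM_VFP_FPSCR register number. */
#define SKETCH_VFP_FPSCR 1u

/* Extract 'len' bits starting at bit 'start' (like QEMU's extract32). */
static uint32_t sketch_extract32(uint32_t value, int start, int len)
{
    return (value >> start) & ((1u << len) - 1u);
}

/*
 * Return true if an M-profile VMRS/VMSR encoding must UNDEF:
 * FPSCR is the only valid system register, and Rt == 15 is rejected.
 */
static bool m_vfp_sysreg_undefs(uint32_t insn)
{
    bool is_sysreg = sketch_extract32(insn, 21, 1);
    uint32_t rt = sketch_extract32(insn, 12, 4);
    uint32_t reg = sketch_extract32(insn, 16, 4); /* same as rn >> 1 */

    return is_sysreg && (rt == 15 || reg != SKETCH_VFP_FPSCR);
}
```

Under these assumptions, VMRS r0, FPSCR (0xEEF10A10) is accepted,
while the same access with Rt == 15 (0xEEF1FA10) or an access to
FPEXC (0xEEF80A10) would UNDEF.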

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-5-peter.maydell@linaro.org
---
 target/arm/translate.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                     }
                 }
             } else { /* !dp */
+                bool is_sysreg;
+
                 if ((insn & 0x6f) != 0x00)
                     return 1;
                 rn = VFP_SREG_N(insn);
+
+                is_sysreg = extract32(insn, 21, 1);
+
+                if (arm_dc_feature(s, ARM_FEATURE_M)) {
+                    /*
+                     * The only M-profile VFP vmrs/vmsr sysreg is FPSCR.
+                     * Writes to R15 are UNPREDICTABLE; we choose to undef.
+                     */
+                    if (is_sysreg && (rd == 15 || (rn >> 1) != ARM_VFP_FPSCR)) {
+                        return 1;
+                    }
+                }
+
                 if (insn & ARM_CP_RW_BIT) {
                     /* vfp->arm */
-                    if (insn & (1 << 21)) {
+                    if (is_sysreg) {
                         /* system register */
                         rn >>= 1;
 
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                     }
                 } else {
                     /* arm->vfp */
-                    if (insn & (1 << 21)) {
+                    if (is_sysreg) {
                         rn >>= 1;
                         /* system register */
                         switch (rn) {
-- 
2.20.1

Like AArch64, M-profile floating point has no FPEXC enable
bit to gate floating point, so always set the VFPEN TB flag.

M-profile also has CPACR and NSACR similar to A-profile;
they behave slightly differently:
 * the CPACR is banked between Secure and Non-Secure
 * if the NSACR forces a trap then this is taken to
   the Secure state, not the Non-Secure state

Honour the CPACR and NSACR settings. The NSACR handling
requires us to borrow the exception.target_el field
(usually meaningless for M profile) to distinguish the
NOCP UsageFault taken to Secure state from the more
usual fault taken to the current security state.
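
A minimal C sketch of the resulting M-profile permission check
follows. It assumes, as in the patch below, that CPACR.CP10 occupies
bits [21:20] and NSACR.CP10 is bit 10. The function names are
hypothetical. The return values mimic how fp_exception_el() uses 1
for a NOCP UsageFault to the current security state and 3 (the
borrowed target_el) for one taken to Secure.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPACR.CP10 is bits [21:20]; 2 is UNPREDICTABLE and treated like 0. */
static bool sketch_cpacr_pass(uint32_t cpacr, bool is_priv)
{
    switch ((cpacr >> 20) & 3u) {
    case 1:
        return is_priv;
    case 3:
        return true;
    default: /* 0 or 2 */
        return false;
    }
}

/*
 * Models the M-profile branch of fp_exception_el():
 * 0 = FP access allowed, 1 = NOCP UsageFault to the current
 * security state, 3 = NOCP UsageFault taken to Secure (NSACR trap).
 * cpacr[] is indexed like v7m.cpacr[]: 0 = Non-secure, 1 = Secure.
 */
static int sketch_m_fp_exception_el(const uint32_t cpacr[2], uint32_t nsacr,
                                    bool has_security, bool secure,
                                    bool is_priv)
{
    if (!sketch_cpacr_pass(cpacr[secure], is_priv)) {
        return 1;
    }
    if (has_security && !secure && !((nsacr >> 10) & 1u)) {
        return 3;
    }
    return 0;
}
```

Checking CPACR first means a banked CPACR denial is reported to the
current security state even when NSACR would also have trapped.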

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-6-peter.maydell@linaro.org
---
 target/arm/helper.c    | 55 +++++++++++++++++++++++++++++++++++++++---
 target/arm/translate.c | 10 ++++++--
 2 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t arm_phys_excp_target_el(CPUState *cs, uint32_t excp_idx,
     return target_el;
 }
 
+/*
+ * Return true if the v7M CPACR permits access to the FPU for the specified
+ * security state and privilege level.
+ */
+static bool v7m_cpacr_pass(CPUARMState *env, bool is_secure, bool is_priv)
+{
+    switch (extract32(env->v7m.cpacr[is_secure], 20, 2)) {
+    case 0:
+    case 2: /* UNPREDICTABLE: we treat like 0 */
+        return false;
+    case 1:
+        return is_priv;
+    case 3:
+        return true;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static bool v7m_stack_write(ARMCPU *cpu, uint32_t addr, uint32_t value,
                             ARMMMUIdx mmu_idx, bool ignfault)
 {
@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
         env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_UNDEFINSTR_MASK;
         break;
     case EXCP_NOCP:
-        armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
-        env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_NOCP_MASK;
+    {
+        /*
+         * NOCP might be directed to something other than the current
+         * security state if this fault is because of NSACR; we indicate
+         * the target security state using exception.target_el.
+         */
+        int target_secstate;
+
+        if (env->exception.target_el == 3) {
+            target_secstate = M_REG_S;
+        } else {
+            target_secstate = env->v7m.secure;
+        }
+        armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, target_secstate);
+        env->v7m.cfsr[target_secstate] |= R_V7M_CFSR_NOCP_MASK;
         break;
+    }
     case EXCP_INVSTATE:
         armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
         env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_INVSTATE_MASK;
@@ -XXX,XX +XXX,XX @@ int fp_exception_el(CPUARMState *env, int cur_el)
         return 0;
     }
 
+    if (arm_feature(env, ARM_FEATURE_M)) {
+        /* CPACR can cause a NOCP UsageFault taken to current security state */
+        if (!v7m_cpacr_pass(env, env->v7m.secure, cur_el != 0)) {
+            return 1;
+        }
+
+        if (arm_feature(env, ARM_FEATURE_M_SECURITY) && !env->v7m.secure) {
+            if (!extract32(env->v7m.nsacr, 10, 1)) {
+                /* FP insns cause a NOCP UsageFault taken to Secure */
+                return 3;
+            }
+        }
+
+        return 0;
+    }
+
     /* The CPACR controls traps to EL1, or PL1 if we're 32 bit:
      * 0, 2 : trap EL0 and EL1/PL1 accesses
      * 1    : trap only EL0 accesses
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         flags = FIELD_DP32(flags, TBFLAG_A32, SCTLR_B, arm_sctlr_b(env));
         flags = FIELD_DP32(flags, TBFLAG_A32, NS, !access_secure_reg(env));
         if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)
-            || arm_el_is_aa64(env, 1)) {
+            || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
             flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
         }
         flags = FIELD_DP32(flags, TBFLAG_A32, XSCALE_CPAR, env->cp15.c15_cpar);
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
      * for attempts to execute invalid vfp/neon encodings with FP disabled.
      */
     if (s->fp_excp_el) {
-        gen_exception_insn(s, 4, EXCP_UDEF,
-                           syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
+        if (arm_dc_feature(s, ARM_FEATURE_M)) {
+            gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
+                               s->fp_excp_el);
+        } else {
+            gen_exception_insn(s, 4, EXCP_UDEF,
+                               syn_fp_access_trap(1, 0xe, false),
+                               s->fp_excp_el);
+        }
         return 0;
     }
 
-- 
2.20.1

Correct the decode of the M-profile "coprocessor and
floating-point instructions" space:
 * op0 == 0b11 is always unallocated
 * if the CPU has an FPU then all insns with op1 == 0b101
   are floating point and go to disas_vfp_insn()

For the moment we leave VLLDM and VLSTM as NOPs; in
a later commit we will fill in the proper implementation
for the case where an FPU is present.
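
The resulting decode order can be sketched as a small classifier.
This is an illustrative model, not QEMU's decoder. The enum and
function names are hypothetical, insn is assumed to hold the 32-bit
Thumb encoding with the first halfword in the top 16 bits, and only
the masks used in the patch below are checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the possible outcomes of the decode. */
enum sketch_m_cp_decode {
    SKETCH_UNALLOCATED, /* op0 == 0b11: UNDEF */
    SKETCH_VLLDM_VLSTM, /* decoded specially, ahead of the FP check */
    SKETCH_VFP,         /* handed to disas_vfp_insn() */
    SKETCH_NOCP,        /* everything else: NOCP UsageFault */
};

static enum sketch_m_cp_decode sketch_classify(uint32_t insn, bool is_v8,
                                               bool has_fpu)
{
    if (((insn >> 24) & 3u) == 3u) {
        return SKETCH_UNALLOCATED;
    }
    if (is_v8 && (insn & 0xffa00f00u) == 0xec200a00u) {
        return SKETCH_VLLDM_VLSTM;
    }
    /* op1 (bits [11:9]) == 0b101: coprocessor 10 or 11, i.e. FP */
    if (has_fpu && ((insn >> 8) & 0xeu) == 10u) {
        return SKETCH_VFP;
    }
    return SKETCH_NOCP;
}
```

For example, 0xEC300A00 (VLLDM r0) is recognised before the FP
check, and 0xEEF10A10 (VMRS r0, FPSCR) goes to the VFP decoder only
when an FPU is present; otherwise it takes the NOCP path.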

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-7-peter.maydell@linaro.org
---
 target/arm/translate.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
     case 6: case 7: case 14: case 15:
         /* Coprocessor.  */
         if (arm_dc_feature(s, ARM_FEATURE_M)) {
-            /* We don't currently implement M profile FP support,
-             * so this entire space should give a NOCP fault, with
-             * the exception of the v8M VLLDM and VLSTM insns, which
-             * must be NOPs in Secure state and UNDEF in Nonsecure state.
+            /* 0b111x_11xx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx */
+            if (extract32(insn, 24, 2) == 3) {
+                goto illegal_op; /* op0 = 0b11 : unallocated */
+            }
+
+            /*
+             * Decode VLLDM and VLSTM first: these are nonstandard because:
+             *  * if there is no FPU then these insns must NOP in
+             *    Secure state and UNDEF in Nonsecure state
+             *  * if there is an FPU then these insns do not have
+             *    the usual behaviour that disas_vfp_insn() provides of
+             *    being controlled by CPACR/NSACR enable bits or the
+             *    lazy-stacking logic.
              */
             if (arm_dc_feature(s, ARM_FEATURE_V8) &&
                 (insn & 0xffa00f00) == 0xec200a00) {
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
                 /* Just NOP since FP support is not implemented */
                 break;
             }
+            if (arm_dc_feature(s, ARM_FEATURE_VFP) &&
+                ((insn >> 8) & 0xe) == 10) {
+                /* FP, and the CPU supports it */
+                if (disas_vfp_insn(s, insn)) {
+                    goto illegal_op;
+                }
+                break;
+            }
+
             /* All other insns: NOCP */
             gen_exception_insn(s, 4, EXCP_NOCP, syn_uncategorized(),
                                default_exception_el(s));
-- 
2.20.1

If the floating point extension is present, then the SG instruction
must clear the CONTROL_S.SFPA bit. Implement this.

(On a no-FPU system the bit will always be zero, so we don't need
to make the clearing of the bit conditional on ARM_FEATURE_VFP.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-8-peter.maydell@linaro.org
---
 target/arm/helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static bool v7m_handle_execute_nsc(ARMCPU *cpu)
     qemu_log_mask(CPU_LOG_INT, "...really an SG instruction at 0x%08" PRIx32
                   ", executing it\n", env->regs[15]);
     env->regs[14] &= ~1;
+    env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_SFPA_MASK;
     switch_v7m_security_state(env, true);
     xpsr_write(env, 0, XPSR_IT);
     env->regs[15] += 4;
-- 
2.20.1

The M-profile CONTROL register has two bits -- SFPA and FPCA --
which relate to floating-point support and should be RES0 when no FPU is present.
Handle them correctly in the MSR/MRS register access code.
Neither is banked between security states, so they are stored
in v7m.control[M_REG_S] regardless of current security state.
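
A simplified model of the resulting MRS/MSR behaviour for these two
bits follows. It assumes FPCA is CONTROL bit 2 and SFPA is bit 3. It
uses a two-element array indexed like v7m.control[] (0 = Non-secure,
1 = Secure) and covers only the MRS CONTROL read and the FP-related
part of an MSR CONTROL write on a CPU with an FPU. All names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed CONTROL bit positions: FPCA is bit 2, SFPA is bit 3. */
#define SKETCH_CONTROL_FPCA (1u << 2)
#define SKETCH_CONTROL_SFPA (1u << 3)

enum { SKETCH_NS = 0, SKETCH_S = 1 }; /* bank indices, like M_REG_NS/M_REG_S */

/* MRS CONTROL: NS sees FPCA from the S bank; SFPA is RAZ from NS. */
static uint32_t sketch_mrs_control(const uint32_t control[2], bool secure)
{
    uint32_t value = control[secure];

    if (!secure) {
        value |= control[SKETCH_S] & SKETCH_CONTROL_FPCA;
    }
    return value;
}

/* The FP-related part of MSR CONTROL on a CPU with an FPU. */
static void sketch_msr_control_fp(uint32_t control[2], uint32_t val,
                                  bool secure, bool is_priv,
                                  bool has_security, uint32_t nsacr)
{
    /* SFPA: writable from Secure (even unprivileged), WI from NS. */
    if (secure) {
        control[SKETCH_S] &= ~SKETCH_CONTROL_SFPA;
        control[SKETCH_S] |= val & SKETCH_CONTROL_SFPA;
    }
    /* FPCA: privileged only; from NS only if NSACR.CP10 is set. */
    if (is_priv && (secure || !has_security || ((nsacr >> 10) & 1u))) {
        control[SKETCH_S] &= ~SKETCH_CONTROL_FPCA;
        control[SKETCH_S] |= val & SKETCH_CONTROL_FPCA;
    }
}
```

Storing both bits only in the Secure bank means a Non-secure read
never has to merge two copies of FPCA; it just ORs in the single
shared bit.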

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-9-peter.maydell@linaro.org
---
 target/arm/helper.c | 57 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 49 insertions(+), 8 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(v7m_mrs)(CPUARMState *env, uint32_t reg)
         return xpsr_read(env) & mask;
         break;
     case 20: /* CONTROL */
-        return env->v7m.control[env->v7m.secure];
+    {
+        uint32_t value = env->v7m.control[env->v7m.secure];
+        if (!env->v7m.secure) {
+            /* SFPA is RAZ/WI from NS; FPCA is stored in the M_REG_S bank */
+            value |= env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK;
+        }
+        return value;
+    }
     case 0x94: /* CONTROL_NS */
         /* We have to handle this here because unprivileged Secure code
          * can read the NS CONTROL register.
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(v7m_mrs)(CPUARMState *env, uint32_t reg)
         if (!env->v7m.secure) {
             return 0;
         }
-        return env->v7m.control[M_REG_NS];
+        return env->v7m.control[M_REG_NS] |
+            (env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK);
     }
 
     if (el == 0) {
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
      */
     uint32_t mask = extract32(maskreg, 8, 4);
     uint32_t reg = extract32(maskreg, 0, 8);
+    int cur_el = arm_current_el(env);
 
-    if (arm_current_el(env) == 0 && reg > 7) {
-        /* only xPSR sub-fields may be written by unprivileged */
+    if (cur_el == 0 && reg > 7 && reg != 20) {
+        /*
+         * only xPSR sub-fields and CONTROL.SFPA may be written by
+         * unprivileged code
+         */
         return;
     }
 
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
                 env->v7m.control[M_REG_NS] &= ~R_V7M_CONTROL_NPRIV_MASK;
                 env->v7m.control[M_REG_NS] |= val & R_V7M_CONTROL_NPRIV_MASK;
             }
+            /*
+             * SFPA is RAZ/WI from NS. FPCA is RO if NSACR.CP10 == 0,
+             * RES0 if the FPU is not present, and is stored in the S bank
+             */
+            if (arm_feature(env, ARM_FEATURE_VFP) &&
+                extract32(env->v7m.nsacr, 10, 1)) {
+                env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_FPCA_MASK;
+                env->v7m.control[M_REG_S] |= val & R_V7M_CONTROL_FPCA_MASK;
+            }
             return;
         case 0x98: /* SP_NS */
         {
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
         env->v7m.faultmask[env->v7m.secure] = val & 1;
         break;
     case 20: /* CONTROL */
-        /* Writing to the SPSEL bit only has an effect if we are in
+        /*
+         * Writing to the SPSEL bit only has an effect if we are in
          * thread mode; other bits can be updated by any privileged code.
          * write_v7m_control_spsel() deals with updating the SPSEL bit in
          * env->v7m.control, so we only need update the others.
          * For v7M, we must just ignore explicit writes to SPSEL in handler
          * mode; for v8M the write is permitted but will have no effect.
+         * All these bits are writes-ignored from non-privileged code,
+         * except for SFPA.
          */
-        if (arm_feature(env, ARM_FEATURE_V8) ||
-            !arm_v7m_is_handler_mode(env)) {
+        if (cur_el > 0 && (arm_feature(env, ARM_FEATURE_V8) ||
+                           !arm_v7m_is_handler_mode(env))) {
             write_v7m_control_spsel(env, (val & R_V7M_CONTROL_SPSEL_MASK) != 0);
         }
-        if (arm_feature(env, ARM_FEATURE_M_MAIN)) {
+        if (cur_el > 0 && arm_feature(env, ARM_FEATURE_M_MAIN)) {
             env->v7m.control[env->v7m.secure] &= ~R_V7M_CONTROL_NPRIV_MASK;
             env->v7m.control[env->v7m.secure] |= val & R_V7M_CONTROL_NPRIV_MASK;
         }
+        if (arm_feature(env, ARM_FEATURE_VFP)) {
+            /*
+             * SFPA is RAZ/WI from NS or if no FPU.
+             * FPCA is RO if NSACR.CP10 == 0, RES0 if the FPU is not present.
+             * Both are stored in the S bank.
+             */
+            if (env->v7m.secure) {
+                env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_SFPA_MASK;
+                env->v7m.control[M_REG_S] |= val & R_V7M_CONTROL_SFPA_MASK;
+            }
+            if (cur_el > 0 &&
+                (env->v7m.secure || !arm_feature(env, ARM_FEATURE_M_SECURITY) ||
+                 extract32(env->v7m.nsacr, 10, 1))) {
+                env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_FPCA_MASK;
+                env->v7m.control[M_REG_S] |= val & R_V7M_CONTROL_FPCA_MASK;
+            }
+        }
         break;
     default:
     bad_reg:
-- 
2.20.1

Currently the code in v7m_push_stack() which detects a violation
of the v8M stack limit simply returns early if it does so. This
is OK for the current integer-only code, but won't work for the
floating point handling we're about to add. We need to continue
executing the rest of the function so that we check for other
exceptions like not having permission to use the FPU and so
that we correctly set the FPCCR state if we are doing lazy
stacking. Refactor to avoid the early return.
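
The shape of the refactor can be shown with a toy model in C. This
is not QEMU code. SketchCpu, sketch_stack_write() and
sketch_push_stack() are hypothetical. The memory writes are only
counted, and fp_checks_done stands in for the FP permission and
FPCCR handling that must still run after a limit violation.

```c
#include <stdbool.h>
#include <stdint.h>

/* A toy model of the push: it counts writes instead of touching memory. */
typedef struct {
    uint32_t sp;
    int writes;          /* number of stack writes attempted */
    bool fp_checks_done; /* stands in for the FP permission/FPCCR logic */
} SketchCpu;

static bool sketch_stack_write(SketchCpu *cpu)
{
    cpu->writes++;
    return true; /* the toy memory never faults */
}

/* Returns true if a derived exception was raised, like v7m_push_stack(). */
static bool sketch_push_stack(SketchCpu *cpu, uint32_t limit,
                              uint32_t framesize)
{
    bool stacked_ok = true, limitviol = false;
    uint32_t frameptr = cpu->sp - framesize;

    if (frameptr < limit) {
        cpu->sp = limit;    /* SP is set to the limit on violation */
        limitviol = true;
        stacked_ok = false; /* suppresses all further memory writes */
    }

    for (int i = 0; i < 8; i++) {
        /* short-circuit: once stacked_ok is false, no write happens */
        stacked_ok = stacked_ok && sketch_stack_write(cpu);
    }

    /* Reached even after a limit violation: there is no early return. */
    cpu->fp_checks_done = true;

    if (!limitviol) {
        cpu->sp = frameptr;
    }
    return !stacked_ok;
}
```

Folding the violation into stacked_ok lets the existing && chain
skip the memory accesses without needing a separate code path.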

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-10-peter.maydell@linaro.org
---
 target/arm/helper.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
      * should ignore further stack faults trying to process
      * that derived exception.)
      */
-    bool stacked_ok;
+    bool stacked_ok = true, limitviol = false;
     CPUARMState *env = &cpu->env;
     uint32_t xpsr = xpsr_read(env);
     uint32_t frameptr = env->regs[13];
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
             armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE,
                                     env->v7m.secure);
             env->regs[13] = limit;
-            return true;
+            /*
+             * We won't try to perform any further memory accesses but
+             * we must continue through the following code to check for
+             * permission faults during FPU state preservation, and we
+             * must update FPCCR if lazy stacking is enabled.
+             */
+            limitviol = true;
+            stacked_ok = false;
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
      * (which may be taken in preference to the one we started with
      * if it has higher priority).
      */
-    stacked_ok =
+    stacked_ok = stacked_ok &&
         v7m_stack_write(cpu, frameptr, env->regs[0], mmu_idx, false) &&
         v7m_stack_write(cpu, frameptr + 4, env->regs[1], mmu_idx, false) &&
         v7m_stack_write(cpu, frameptr + 8, env->regs[2], mmu_idx, false) &&
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
         v7m_stack_write(cpu, frameptr + 24, env->regs[15], mmu_idx, false) &&
         v7m_stack_write(cpu, frameptr + 28, xpsr, mmu_idx, false);
 
-    /* Update SP regardless of whether any of the stack accesses failed. */
-    env->regs[13] = frameptr;
+    /*
+     * If we broke a stack limit then SP was already updated earlier;
+     * otherwise we update SP regardless of whether any of the stack
+     * accesses failed or we took some other kind of fault.
+     */
+    if (!limitviol) {
+        env->regs[13] = frameptr;
+    }
 
     return !stacked_ok;
 }
-- 
2.20.1

Handle floating point registers in exception entry.
This corresponds to the FP-specific parts of the pseudocode
functions ActivateException() and PushStack().

We defer the code corresponding to UpdateFPCCR() to a later patch.
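
As a cross-check of the frame layout implied by the patch below, the
frame size and the stack offset of each single-precision register can
be computed as follows. This is a sketch derived from the framesize
logic and the register loop in v7m_push_stack(). The function names
are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exception frame size, following the framesize logic in v7m_push_stack(). */
static uint32_t sketch_frame_size(bool fpca, bool secure, bool nsacr_cp10,
                                  bool fpccr_ts)
{
    if (fpca && (secure || nsacr_cp10)) {
        /* FP context: S0-S15 + FPSCR, plus S16-S31 if FPCCR.TS applies */
        return (secure && fpccr_ts) ? 0xa8 : 0x68;
    }
    return 0x20; /* basic integer frame only: R0-R3, R12, LR, PC, xPSR */
}

/* Byte offset from the frame pointer at which S<sreg> is stacked. */
static uint32_t sketch_sreg_offset(int sreg)
{
    uint32_t off = 0x20 + 4 * (uint32_t)sreg;

    if (sreg >= 16) {
        off += 8; /* skip FPSCR (0x60) and the reserved word (0x64) */
    }
    return off;
}
```

So S0 lands at 0x20, S15 at 0x5c, FPSCR at 0x60, S16 at 0x68 and
S31 at 0xa4, which fills exactly the 0xa8-byte frame.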

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-11-peter.maydell@linaro.org
---
 target/arm/helper.c | 98 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 95 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void v7m_exception_taken(ARMCPU *cpu, uint32_t lr, bool dotailchain,
     switch_v7m_security_state(env, targets_secure);
     write_v7m_control_spsel(env, 0);
     arm_clear_exclusive(env);
+    /* Clear SFPA and FPCA (has no effect if no FPU) */
+    env->v7m.control[M_REG_S] &=
+        ~(R_V7M_CONTROL_FPCA_MASK | R_V7M_CONTROL_SFPA_MASK);
     /* Clear IT bits */
     env->condexec_bits = 0;
     env->regs[14] = lr;
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
     uint32_t xpsr = xpsr_read(env);
     uint32_t frameptr = env->regs[13];
     ARMMMUIdx mmu_idx = arm_mmu_idx(env);
+    uint32_t framesize;
+    bool nsacr_cp10 = extract32(env->v7m.nsacr, 10, 1);
+
+    if ((env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) &&
+        (env->v7m.secure || nsacr_cp10)) {
+        if (env->v7m.secure &&
+            env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_TS_MASK) {
+            framesize = 0xa8;
+        } else {
+            framesize = 0x68;
+        }
+    } else {
+        framesize = 0x20;
+    }
 
     /* Align stack pointer if the guest wants that */
     if ((frameptr & 4) &&
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
         xpsr |= XPSR_SPREALIGN;
     }
 
-    frameptr -= 0x20;
+    xpsr &= ~XPSR_SFPA;
+    if (env->v7m.secure &&
+        (env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)) {
+        xpsr |= XPSR_SFPA;
+    }
+
+    frameptr -= framesize;
 
     if (arm_feature(env, ARM_FEATURE_V8)) {
         uint32_t limit = v7m_sp_limit(env);
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
         v7m_stack_write(cpu, frameptr + 24, env->regs[15], mmu_idx, false) &&
         v7m_stack_write(cpu, frameptr + 28, xpsr, mmu_idx, false);
 
+    if (env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) {
+        /* FPU is active, try to save its registers */
+        bool fpccr_s = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
+        bool lspact = env->v7m.fpccr[fpccr_s] & R_V7M_FPCCR_LSPACT_MASK;
+
+        if (lspact && arm_feature(env, ARM_FEATURE_M_SECURITY)) {
+            qemu_log_mask(CPU_LOG_INT,
+                          "...SecureFault because LSPACT and FPCA both set\n");
+            env->v7m.sfsr |= R_V7M_SFSR_LSERR_MASK;
+            armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SECURE, false);
+        } else if (!env->v7m.secure && !nsacr_cp10) {
+            qemu_log_mask(CPU_LOG_INT,
+                          "...Secure UsageFault with CFSR.NOCP because "
+                          "NSACR.CP10 prevents stacking FP regs\n");
+            armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, M_REG_S);
+            env->v7m.cfsr[M_REG_S] |= R_V7M_CFSR_NOCP_MASK;
+        } else {
+            if (!(env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPEN_MASK)) {
+                /* Lazy stacking disabled, save registers now */
+                int i;
+                bool cpacr_pass = v7m_cpacr_pass(env, env->v7m.secure,
+                                                 arm_current_el(env) != 0);
+
+                if (stacked_ok && !cpacr_pass) {
+                    /*
+                     * Take UsageFault if CPACR forbids access. The pseudocode
+                     * here does a full CheckCPEnabled() but we know the NSACR
+                     * check can never fail as we have already handled that.
+                     */
+                    qemu_log_mask(CPU_LOG_INT,
+                                  "...UsageFault with CFSR.NOCP because "
+                                  "CPACR.CP10 prevents stacking FP regs\n");
+                    armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE,
+                                            env->v7m.secure);
+                    env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_NOCP_MASK;
+                    stacked_ok = false;
+                }
+
+                for (i = 0; i < ((framesize == 0xa8) ? 32 : 16); i += 2) {
+                    uint64_t dn = *aa32_vfp_dreg(env, i / 2);
+                    uint32_t faddr = frameptr + 0x20 + 4 * i;
+                    uint32_t slo = extract64(dn, 0, 32);
+                    uint32_t shi = extract64(dn, 32, 32);
+
+                    if (i >= 16) {
+                        faddr += 8; /* skip the slot for the FPSCR */
+                    }
+                    stacked_ok = stacked_ok &&
+                        v7m_stack_write(cpu, faddr, slo, mmu_idx, false) &&
+                        v7m_stack_write(cpu, faddr + 4, shi, mmu_idx, false);
+                }
+                stacked_ok = stacked_ok &&
+                    v7m_stack_write(cpu, frameptr + 0x60,
+                                    vfp_get_fpscr(env), mmu_idx, false);
+                if (cpacr_pass) {
+                    for (i = 0; i < ((framesize == 0xa8) ? 32 : 16); i += 2) {
+                        *aa32_vfp_dreg(env, i / 2) = 0;
+                    }
+                    vfp_set_fpscr(env, 0);
+                }
+            } else {
+                /* Lazy stacking enabled, save necessary info to stack later */
+                /* TODO : equivalent of UpdateFPCCR() pseudocode */
+            }
+        }
+    }
+
     /*
      * If we broke a stack limit then SP was already updated earlier;
      * otherwise we update SP regardless of whether any of the stack
@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
 
     if (arm_feature(env, ARM_FEATURE_V8)) {
         lr = R_V7M_EXCRET_RES1_MASK |
-            R_V7M_EXCRET_DCRS_MASK |
-            R_V7M_EXCRET_FTYPE_MASK;
+            R_V7M_EXCRET_DCRS_MASK;
         /* The S bit indicates whether we should return to Secure
          * or NonSecure (ie our current state).
          * The ES bit indicates whether we're taking this exception
@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
         if (env->v7m.secure) {
             lr |= R_V7M_EXCRET_S_MASK;
         }
+        if (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK)) {
+            lr |= R_V7M_EXCRET_FTYPE_MASK;
+        }
     } else {
         lr = R_V7M_EXCRET_RES1_MASK |
             R_V7M_EXCRET_S_MASK |
-- 
2.20.1

Implement the code which updates the FPCCR register on an
exception entry where we are going to use lazy FP stacking.
We have to defer to the NVIC to determine whether the
various exceptions are currently ready or not.
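
The readiness decision that the NVIC makes for each RDY bit, plus the
FPCAR alignment done by v7m_update_fpccr(), can be sketched as
below. This is a simplified model of armv7m_nvic_get_ready_status().
The caller is assumed to have already computed the exception's group
priority and the current execution priority, and the function names
are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified readiness test: numerically lower values are higher
 * priority, so an exception is ready only if its group priority is
 * strictly below the current execution priority.
 */
static bool sketch_ready(bool is_hardfault, bool enabled, int group_prio,
                         int running)
{
    if (is_hardfault) {
        /* Always compared against -1, and never gated on enable state. */
        return running > -1;
    }
    return enabled && group_prio < running;
}

/* FPCAR records the frame pointer with the low three bits cleared. */
static uint32_t sketch_fpcar(uint32_t frameptr)
{
    return frameptr & ~7u;
}
```

A disabled exception is never ready, however high its configured
priority, which is why the enable check is ANDed with the priority
comparison.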

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190416125744.27770-12-peter.maydell@linaro.org
---
 target/arm/cpu.h      | 14 +++++++++
 hw/intc/armv7m_nvic.c | 34 ++++++++++++++++++++++
 target/arm/helper.c   | 67 ++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ void armv7m_nvic_acknowledge_irq(void *opaque);
  * (Ignoring -1, this is the same as the RETTOBASE value before completion.)
  */
 int armv7m_nvic_complete_irq(void *opaque, int irq, bool secure);
+/**
+ * armv7m_nvic_get_ready_status(void *opaque, int irq, bool secure)
+ * @opaque: the NVIC
+ * @irq: the exception number whose ready status to return
+ * @secure: false for non-banked exceptions or for the nonsecure
+ * version of a banked exception, true for the secure version of a banked
+ * exception.
+ *
+ * Return whether an exception is "ready", i.e. whether the exception is
+ * enabled and is configured at a priority which would allow it to
+ * interrupt the current execution priority. This controls whether the
+ * RDY bit for it in the FPCCR is set.
+ */
+bool armv7m_nvic_get_ready_status(void *opaque, int irq, bool secure);
 /**
  * armv7m_nvic_raw_execution_priority: return the raw execution priority
  * @opaque: the NVIC
diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -XXX,XX +XXX,XX @@ int armv7m_nvic_complete_irq(void *opaque, int irq, bool secure)
     return ret;
 }
 
+bool armv7m_nvic_get_ready_status(void *opaque, int irq, bool secure)
+{
+    /*
+     * Return whether an exception is "ready", i.e. it is enabled and is
+     * configured at a priority which would allow it to interrupt the
+     * current execution priority.
+     *
+     * irq and secure have the same semantics as for armv7m_nvic_set_pending():
+     * for non-banked exceptions secure is always false; for banked exceptions
+     * it indicates which of the exceptions is required.
+     */
+    NVICState *s = (NVICState *)opaque;
+    bool banked = exc_is_banked(irq);
+    VecInfo *vec;
+    int running = nvic_exec_prio(s);
+
+    assert(irq > ARMV7M_EXCP_RESET && irq < s->num_irq);
+    assert(!secure || banked);
+
+    /*
+     * HardFault is an odd special case: we always check against -1,
+     * even if we're secure and HardFault has priority -3; we never
+     * need to check for enabled state.
+     */
+    if (irq == ARMV7M_EXCP_HARD) {
+        return running > -1;
+    }
+
+    vec = (banked && secure) ? &s->sec_vectors[irq] : &s->vectors[irq];
+
+    return vec->enabled &&
+        exc_group_prio(s, vec->prio, secure) < running;
+}
+
 /* callback when external interrupt line is changed */
 static void set_irq_level(void *opaque, int n, int level)
 {
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void v7m_exception_taken(ARMCPU *cpu, uint32_t lr, bool dotailchain,
     env->thumb = addr & 1;
 }
 
+static void v7m_update_fpccr(CPUARMState *env, uint32_t frameptr,
+                             bool apply_splim)
+{
+    /*
+     * Like the pseudocode UpdateFPCCR: save state in FPCAR and FPCCR
+     * that we will need later in order to do lazy FP reg stacking.
+     */
+    bool is_secure = env->v7m.secure;
+    void *nvic = env->nvic;
+    /*
+     * Some bits are unbanked and live always in fpccr[M_REG_S]; some bits
+     * are banked and we want to update the bit in the bank for the
+     * current security state; and in one case we want to specifically
+     * update the NS banked version of a bit even if we are secure.
+     */
+    uint32_t *fpccr_s = &env->v7m.fpccr[M_REG_S];
+    uint32_t *fpccr_ns = &env->v7m.fpccr[M_REG_NS];
+    uint32_t *fpccr = &env->v7m.fpccr[is_secure];
+    bool hfrdy, bfrdy, mmrdy, ns_ufrdy, s_ufrdy, sfrdy, monrdy;
+
+    env->v7m.fpcar[is_secure] = frameptr & ~0x7;
+
+    if (apply_splim && arm_feature(env, ARM_FEATURE_V8)) {
+        bool splimviol;
+        uint32_t splim = v7m_sp_limit(env);
+        bool ign = armv7m_nvic_neg_prio_requested(nvic, is_secure) &&
+            (env->v7m.ccr[is_secure] & R_V7M_CCR_STKOFHFNMIGN_MASK);
+
+        splimviol = !ign && frameptr < splim;
+        *fpccr = FIELD_DP32(*fpccr, V7M_FPCCR, SPLIMVIOL, splimviol);
+    }
+
+    *fpccr = FIELD_DP32(*fpccr, V7M_FPCCR, LSPACT, 1);
+
+    *fpccr_s = FIELD_DP32(*fpccr_s, V7M_FPCCR, S, is_secure);
+
+    *fpccr = FIELD_DP32(*fpccr, V7M_FPCCR, USER, arm_current_el(env) == 0);
+
+    *fpccr = FIELD_DP32(*fpccr, V7M_FPCCR, THREAD,
+                        !arm_v7m_is_handler_mode(env));
+
+    hfrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_HARD, false);
+    *fpccr_s = FIELD_DP32(*fpccr_s, V7M_FPCCR, HFRDY, hfrdy);
+
+    bfrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_BUS, false);
+    *fpccr_s = FIELD_DP32(*fpccr_s, V7M_FPCCR, BFRDY, bfrdy);
+
+    mmrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_MEM, is_secure);
+    *fpccr = FIELD_DP32(*fpccr, V7M_FPCCR, MMRDY, mmrdy);
+
+    ns_ufrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_USAGE, false);
+    *fpccr_ns = FIELD_DP32(*fpccr_ns, V7M_FPCCR, UFRDY, ns_ufrdy);
+
+    monrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_DEBUG, false);
+    *fpccr_s = FIELD_DP32(*fpccr_s, V7M_FPCCR, MONRDY, monrdy);
+
+    if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
+        s_ufrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_USAGE, true);
+        *fpccr_s = FIELD_DP32(*fpccr_s, V7M_FPCCR, UFRDY, s_ufrdy);
+
+        sfrdy = armv7m_nvic_get_ready_status(nvic, ARMV7M_EXCP_SECURE, false);
+        *fpccr_s = FIELD_DP32(*fpccr_s, V7M_FPCCR, SFRDY, sfrdy);
+    }
+}
+
 static bool v7m_push_stack(ARMCPU *cpu)
 {
     /* Do the "set up stack frame" part of exception entry,
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
                 }
             } else {
                 /* Lazy stacking enabled, save necessary info to stack later */
-                /* TODO : equivalent of UpdateFPCCR() pseudocode */
+                v7m_update_fpccr(env, frameptr + 0x20, true);
             }
         }
     }
-- 
2.20.1

For v8M floating point support, transitions from Secure
to Non-secure state via BXNS and BLXNS must clear the
CONTROL.SFPA bit. (This corresponds to the pseudocode
BranchToNS() function.)
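A minimal model of the rule, assuming CONTROL.SFPA is bit 3 of the Secure CONTROL register. MiniM33State, model_bxns() and model_blxns() are hypothetical names, not QEMU code. BXNS only clears SFPA when bit 0 of the destination selects Non-secure state, while BLXNS always branches to Non-secure state and so always clears it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: CONTROL.SFPA is bit 3 of the Secure CONTROL register. */
#define CONTROL_SFPA_MASK (1u << 3)

/* Hypothetical model of the Secure CONTROL register and security state. */
typedef struct {
    uint32_t control_s;
    bool secure;
} MiniM33State;

/* BXNS: bit 0 of dest selects the target state; only a move to NS clears SFPA. */
static void model_bxns(MiniM33State *st, uint32_t dest)
{
    assert(st->secure);  /* BXNS is UNDEF unless executing in Secure state */
    if (!(dest & 1)) {
        st->control_s &= ~CONTROL_SFPA_MASK;
    }
    st->secure = dest & 1;
}

/* BLXNS always branches to Non-secure state, so it always clears SFPA. */
static void model_blxns(MiniM33State *st)
{
    assert(st->secure);
    st->control_s &= ~CONTROL_SFPA_MASK;
    st->secure = false;
}
```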

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-13-peter.maydell@linaro.org
---
 target/arm/helper.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_bxns)(CPUARMState *env, uint32_t dest)
     /* translate.c should have made BXNS UNDEF unless we're secure */
     assert(env->v7m.secure);
 
+    if (!(dest & 1)) {
+        env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_SFPA_MASK;
+    }
     switch_v7m_security_state(env, dest & 1);
     env->thumb = 1;
     env->regs[15] = dest & ~1;
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_blxns)(CPUARMState *env, uint32_t dest)
          */
         write_v7m_exception(env, 1);
     }
+    env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_SFPA_MASK;
     switch_v7m_security_state(env, 0);
     env->thumb = 1;
     env->regs[15] = dest;
-- 
2.20.1

The TailChain() pseudocode specifies that a tail chaining
exception should sanitize the excReturn all-ones PREFIX bits and
(if there is no FPU) the excReturn FType bit; we weren't
doing this.
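A standalone sketch of the sanitization step, assuming EXC_RETURN.FType is bit 4 and the PREFIX occupies bits [31:24]. The function name is hypothetical; the masking expression is intended to be equivalent to the deposit32(lr, 24, 8, 0xff) call in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: EXC_RETURN.FType is bit 4; bits [31:24] hold the all-ones PREFIX. */
#define EXCRET_FTYPE_MASK (1u << 4)

/* Hypothetical helper mirroring the TailChain() sanitization. */
static uint32_t sanitize_tailchain_lr(uint32_t lr, bool has_fpu)
{
    if (!has_fpu) {
        /* Without an FPU there can be no extended FP frame: force FType=1. */
        lr |= EXCRET_FTYPE_MASK;
    }
    /* Same effect as deposit32(lr, 24, 8, 0xff): force the PREFIX to 0xff. */
    lr = (lr & 0x00ffffffu) | 0xff000000u;
    return lr;
}
```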

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-14-peter.maydell@linaro.org
---
 target/arm/helper.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void v7m_exception_taken(ARMCPU *cpu, uint32_t lr, bool dotailchain,
     qemu_log_mask(CPU_LOG_INT, "...taking pending %s exception %d\n",
                   targets_secure ? "secure" : "nonsecure", exc);
 
+    if (dotailchain) {
+        /* Sanitize LR FType and PREFIX bits */
+        if (!arm_feature(env, ARM_FEATURE_VFP)) {
+            lr |= R_V7M_EXCRET_FTYPE_MASK;
+        }
+        lr = deposit32(lr, 24, 8, 0xff);
+    }
+
     if (arm_feature(env, ARM_FEATURE_V8)) {
         if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
             (lr & R_V7M_EXCRET_S_MASK)) {
-- 
2.20.1

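The magic value pushed onto the callee-saves stack frame as an
integrity check differs when floating point is present: bit 0 of
the signature then comes from the EXC_RETURN FType bit rather
than always being 1.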
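The signature calculation from v7m_integrity_sig() can be sketched in isolation. Here has_fpu stands in for arm_feature(env, ARM_FEATURE_VFP), and the FType bit position (bit 4) is an assumption stated in the code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: EXC_RETURN.FType is bit 4. */
#define EXCRET_FTYPE_MASK (1u << 4)

/* Integrity signature for the callee-saves part of the stack frame. */
static uint32_t integrity_sig(bool has_fpu, uint32_t lr)
{
    uint32_t sig = 0xfefa125a;

    /* Bit 0 is 1 unless an FPU is present and FType=0 (FP frame stacked). */
    if (!has_fpu || (lr & EXCRET_FTYPE_MASK)) {
        sig |= 1;
    }
    return sig;
}
```

Without an FPU the result is always 0xfefa125b, which is the constant the code used before this patch.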

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-15-peter.maydell@linaro.org
---
 target/arm/helper.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ load_fail:
     return false;
 }
 
+static uint32_t v7m_integrity_sig(CPUARMState *env, uint32_t lr)
+{
+    /*
+     * Return the integrity signature value for the callee-saves
+     * stack frame section. @lr is the exception return payload/LR value
+     * whose FType bit forms bit 0 of the signature if FP is present.
+     */
+    uint32_t sig = 0xfefa125a;
+
+    if (!arm_feature(env, ARM_FEATURE_VFP) || (lr & R_V7M_EXCRET_FTYPE_MASK)) {
+        sig |= 1;
+    }
+    return sig;
+}
+
 static bool v7m_push_callee_stack(ARMCPU *cpu, uint32_t lr, bool dotailchain,
                                   bool ignore_faults)
 {
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_callee_stack(ARMCPU *cpu, uint32_t lr, bool dotailchain,
     bool stacked_ok;
     uint32_t limit;
     bool want_psp;
+    uint32_t sig;
 
     if (dotailchain) {
         bool mode = lr & R_V7M_EXCRET_MODE_MASK;
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_callee_stack(ARMCPU *cpu, uint32_t lr, bool dotailchain,
     /* Write as much of the stack frame as we can. A write failure may
      * cause us to pend a derived exception.
      */
+    sig = v7m_integrity_sig(env, lr);
     stacked_ok =
-        v7m_stack_write(cpu, frameptr, 0xfefa125b, mmu_idx, ignore_faults) &&
+        v7m_stack_write(cpu, frameptr, sig, mmu_idx, ignore_faults) &&
         v7m_stack_write(cpu, frameptr + 0x8, env->regs[4], mmu_idx,
                         ignore_faults) &&
         v7m_stack_write(cpu, frameptr + 0xc, env->regs[5], mmu_idx,
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
         if (return_to_secure &&
             ((excret & R_V7M_EXCRET_ES_MASK) == 0 ||
              (excret & R_V7M_EXCRET_DCRS_MASK) == 0)) {
-            uint32_t expected_sig = 0xfefa125b;
             uint32_t actual_sig;
 
             pop_ok = v7m_stack_read(cpu, &actual_sig, frameptr, mmu_idx);
 
-            if (pop_ok && expected_sig != actual_sig) {
+            if (pop_ok && v7m_integrity_sig(env, excret) != actual_sig) {
                 /* Take a SecureFault on the current stack */
                 env->v7m.sfsr |= R_V7M_SFSR_INVIS_MASK;
                 armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SECURE, false);
-- 
2.20.1

Handle floating point registers in exception return.
This corresponds to pseudocode functions ValidateExceptionReturn(),
ExceptionReturn(), PopStack() and ConsumeExcStackFrame().
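The stack-frame layout that the unstacking code below assumes can be summarized with two small helpers (hypothetical names). The constants come from the patch itself: 0x20 bytes of integer frame, 0x48 bytes for s0-s15 plus the FPSCR and reserved slot, and a further 0x40 bytes for s16-s31 when they were stacked:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes consumed from the stack, before any SPREALIGN adjustment. */
static uint32_t exc_frame_size(bool ftype, bool restore_s16_s31)
{
    uint32_t size = 0x20;          /* r0-r3, r12, lr, pc, xPSR */

    if (!ftype) {
        size += 0x48;              /* s0-s15 (0x40) + FPSCR + reserved word */
        if (restore_s16_s31) {
            size += 0x40;          /* s16-s31 */
        }
    }
    return size;
}

/* Offset of single-precision register s<n> from the frame pointer. */
static uint32_t fp_reg_offset(unsigned n)
{
    assert(n < 32);
    uint32_t off = 0x20 + 4 * n;

    if (n >= 16) {
        off += 8;                  /* skip the FPSCR and reserved slot */
    }
    return off;
}
```

With these definitions the FPSCR slot sits at 0x20 + 0x40 = 0x60, matching the frameptr + 0x60 read in the patch, and s31 ends exactly at the full 0xa8-byte frame size.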

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-16-peter.maydell@linaro.org
---
 target/arm/helper.c | 142 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 141 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
     bool rettobase = false;
     bool exc_secure = false;
     bool return_to_secure;
+    bool ftype;
+    bool restore_s16_s31;
 
     /* If we're not in Handler mode then jumps to magic exception-exit
      * addresses don't have magic behaviour. However for the v8M
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
                       excret);
     }
 
+    ftype = excret & R_V7M_EXCRET_FTYPE_MASK;
+
+    if (!arm_feature(env, ARM_FEATURE_VFP) && !ftype) {
+        qemu_log_mask(LOG_GUEST_ERROR, "M profile: zero FTYPE in exception "
+                      "exit PC value 0x%" PRIx32 " is UNPREDICTABLE "
+                      "if FPU not present\n",
+                      excret);
+        ftype = true;
+    }
+
     if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
         /* EXC_RETURN.ES validation check (R_SMFL). We must do this before
          * we pick which FAULTMASK to clear.
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
      */
     write_v7m_control_spsel_for_secstate(env, return_to_sp_process, exc_secure);
 
+    /*
+     * Clear scratch FP values left in caller saved registers; this
+     * must happen before any kind of tail chaining.
+     */
+    if ((env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_CLRONRET_MASK) &&
+        (env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK)) {
+        if (env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPACT_MASK) {
+            env->v7m.sfsr |= R_V7M_SFSR_LSERR_MASK;
+            armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SECURE, false);
+            qemu_log_mask(CPU_LOG_INT, "...taking SecureFault on existing "
+                          "stackframe: error during lazy state deactivation\n");
+            v7m_exception_taken(cpu, excret, true, false);
+            return;
+        } else {
+            /* Clear s0..s15 and FPSCR */
+            int i;
+
+            for (i = 0; i < 16; i += 2) {
+                *aa32_vfp_dreg(env, i / 2) = 0;
+            }
+            vfp_set_fpscr(env, 0);
+        }
+    }
+
     if (sfault) {
         env->v7m.sfsr |= R_V7M_SFSR_INVER_MASK;
         armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SECURE, false);
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
             }
         }
 
+        if (!ftype) {
+            /* FP present and we need to handle it */
+            if (!return_to_secure &&
+                (env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPACT_MASK)) {
+                armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SECURE, false);
+                env->v7m.sfsr |= R_V7M_SFSR_LSERR_MASK;
+                qemu_log_mask(CPU_LOG_INT,
+                              "...taking SecureFault on existing stackframe: "
+                              "Secure LSPACT set but exception return is "
+                              "not to secure state\n");
+                v7m_exception_taken(cpu, excret, true, false);
+                return;
+            }
+
+            restore_s16_s31 = return_to_secure &&
+                (env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_TS_MASK);
+
+            if (env->v7m.fpccr[return_to_secure] & R_V7M_FPCCR_LSPACT_MASK) {
+                /* State in FPU is still valid, just clear LSPACT */
+                env->v7m.fpccr[return_to_secure] &= ~R_V7M_FPCCR_LSPACT_MASK;
+            } else {
+                int i;
+                uint32_t fpscr;
+                bool cpacr_pass, nsacr_pass;
+
+                cpacr_pass = v7m_cpacr_pass(env, return_to_secure,
+                                            return_to_priv);
+                nsacr_pass = return_to_secure ||
+                    extract32(env->v7m.nsacr, 10, 1);
+
+                if (!cpacr_pass) {
+                    armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE,
+                                            return_to_secure);
+                    env->v7m.cfsr[return_to_secure] |= R_V7M_CFSR_NOCP_MASK;
+                    qemu_log_mask(CPU_LOG_INT,
+                                  "...taking UsageFault on existing "
+                                  "stackframe: CPACR.CP10 prevents unstacking "
+                                  "FP regs\n");
+                    v7m_exception_taken(cpu, excret, true, false);
+                    return;
+                } else if (!nsacr_pass) {
+                    armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, true);
+                    env->v7m.cfsr[M_REG_S] |= R_V7M_CFSR_INVPC_MASK;
+                    qemu_log_mask(CPU_LOG_INT,
+                                  "...taking Secure UsageFault on existing "
+                                  "stackframe: NSACR.CP10 prevents unstacking "
+                                  "FP regs\n");
+                    v7m_exception_taken(cpu, excret, true, false);
+                    return;
+                }
+
+                for (i = 0; i < (restore_s16_s31 ? 32 : 16); i += 2) {
+                    uint32_t slo, shi;
+                    uint64_t dn;
+                    uint32_t faddr = frameptr + 0x20 + 4 * i;
+
+                    if (i >= 16) {
+                        faddr += 8; /* Skip the slot for the FPSCR */
+                    }
+
+                    pop_ok = pop_ok &&
+                        v7m_stack_read(cpu, &slo, faddr, mmu_idx) &&
+                        v7m_stack_read(cpu, &shi, faddr + 4, mmu_idx);
+
+                    if (!pop_ok) {
+                        break;
+                    }
+
+                    dn = (uint64_t)shi << 32 | slo;
+                    *aa32_vfp_dreg(env, i / 2) = dn;
+                }
+                pop_ok = pop_ok &&
+                    v7m_stack_read(cpu, &fpscr, frameptr + 0x60, mmu_idx);
+                if (pop_ok) {
+                    vfp_set_fpscr(env, fpscr);
+                }
+                if (!pop_ok) {
+                    /*
+                     * These regs are 0 if security extension present;
+                     * otherwise merely UNKNOWN. We zero always.
+                     */
+                    for (i = 0; i < (restore_s16_s31 ? 32 : 16); i += 2) {
+                        *aa32_vfp_dreg(env, i / 2) = 0;
+                    }
+                    vfp_set_fpscr(env, 0);
+                }
+            }
+        }
+        env->v7m.control[M_REG_S] = FIELD_DP32(env->v7m.control[M_REG_S],
+                                               V7M_CONTROL, FPCA, !ftype);
+
         /* Commit to consuming the stack frame */
         frameptr += 0x20;
+        if (!ftype) {
+            frameptr += 0x48;
+            if (restore_s16_s31) {
+                frameptr += 0x40;
+            }
+        }
         /* Undo stack alignment (the SPREALIGN bit indicates that the original
          * pre-exception SP was not 8-aligned and we added a padding word to
          * align it, so we undo this by ORing in the bit that increases it
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
         *frame_sp_p = frameptr;
     }
     /* This xpsr_write() will invalidate frame_sp_p as it may switch stack */
-    xpsr_write(env, xpsr, ~XPSR_SPREALIGN);
+    xpsr_write(env, xpsr, ~(XPSR_SPREALIGN | XPSR_SFPA));
+
+    if (env->v7m.secure) {
+        bool sfpa = xpsr & XPSR_SFPA;
+
+        env->v7m.control[M_REG_S] = FIELD_DP32(env->v7m.control[M_REG_S],
+                                               V7M_CONTROL, SFPA, sfpa);
+    }
 
     /* The restored xPSR exception field will be zero if we're
      * resuming in Thread mode. If that doesn't match what the
-- 
2.20.1

Move the NS TBFLAG down from bit 19 to bit 6, which has not
been used since commit c1e3781090b9d36c60 in 2015, when we
started passing the entire MMU index in the TB flags rather
than just a 'privilege level' bit.

This rearrangement is not strictly necessary, but means that
we can put M-profile-only bits next to each other rather
than scattered across the flag word.
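TB flag fields are packed into a single 32-bit word by bit position. The sketch below uses simplified, hypothetical stand-ins for QEMU's extract32()/deposit32() to show NS at bit 6, next to VFPEN at bit 7, with bit 19 left clear:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-ins for QEMU's extract32()/deposit32(), for illustration. */
static uint32_t field_extract(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0u >> (32 - length));
}

static uint32_t field_deposit(uint32_t value, int start, int length,
                              uint32_t fieldval)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    uint32_t mask = (~0u >> (32 - length)) << start;

    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Bit positions after this patch: NS at bit 6, beside VFPEN at bit 7. */
enum { TBF_NS_SHIFT = 6, TBF_VFPEN_SHIFT = 7 };
```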

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-17-peter.maydell@linaro.org
---
 target/arm/cpu.h | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

We are close to running out of TB flags for AArch32; we could
start using the cs_base word, but before we do that we can
economise on our usage by sharing the same bits for the VFP
VECSTRIDE field and the XScale XSCALE_CPAR field. This
works because no XScale CPU ever had VFP.
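A minimal sketch of how bits [5:4] can carry either field, depending on whether the CPU is XScale. This mirrors the cpu_get_tb_cpu_state() and arm_tr_init_disas_context() changes in spirit only; the function and type names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits [5:4]: VECSTRIDE on VFP CPUs, XSCALE_CPAR on XScale (which has no VFP). */
#define SHARED_SHIFT 4
#define SHARED_MASK  (3u << SHARED_SHIFT)

typedef struct {
    unsigned vec_stride;
    unsigned c15_cpar;
} DecodedFlags;

static uint32_t pack_shared_bits(uint32_t flags, bool is_xscale,
                                 unsigned vec_stride, unsigned cpar)
{
    unsigned v = is_xscale ? (cpar & 3) : (vec_stride & 3);

    return (flags & ~SHARED_MASK) | (v << SHARED_SHIFT);
}

/* The reader decodes the bits as whichever field this CPU type uses. */
static DecodedFlags unpack_shared_bits(uint32_t flags, bool is_xscale)
{
    DecodedFlags d = { 0, 0 };
    unsigned v = (flags & SHARED_MASK) >> SHARED_SHIFT;

    if (is_xscale) {
        d.c15_cpar = v;
    } else {
        d.vec_stride = v;
    }
    return d;
}
```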

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-18-peter.maydell@linaro.org
---
 target/arm/cpu.h       | 10 ++++++----
 target/arm/cpu.c       |  7 +++++++
 target/arm/helper.c    |  6 +++++-
 target/arm/translate.c |  9 +++++++--
 4 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
 FIELD(TBFLAG_A32, THUMB, 0, 1)
 FIELD(TBFLAG_A32, VECLEN, 1, 3)
 FIELD(TBFLAG_A32, VECSTRIDE, 4, 2)
+/*
+ * We store the bottom two bits of the CPAR as TB flags and handle
+ * checks on the other bits at runtime. This shares the same bits as
+ * VECSTRIDE, which is OK as no XScale CPU has VFP.
+ */
+FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
 /*
  * Indicates whether cp register reads and writes by guest code should access
  * the secure or nonsecure bank of banked registers; note that this is not
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, NS, 6, 1)
 FIELD(TBFLAG_A32, VFPEN, 7, 1)
 FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
 FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
-/* We store the bottom two bits of the CPAR as TB flags and handle
- * checks on the other bits at runtime
- */
-FIELD(TBFLAG_A32, XSCALE_CPAR, 17, 2)
 /* For M profile only, Handler (ie not Thread) mode */
 FIELD(TBFLAG_A32, HANDLER, 21, 1)
 /* For M profile only, whether we should generate stack-limit checks */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         set_feature(env, ARM_FEATURE_THUMB_DSP);
     }
 
+    /*
+     * We rely on no XScale CPU having VFP so we can use the same bits in the
+     * TB flags field for VECSTRIDE and XSCALE_CPAR.
+     */
+    assert(!(arm_feature(env, ARM_FEATURE_VFP) &&
+             arm_feature(env, ARM_FEATURE_XSCALE)));
+
     if (arm_feature(env, ARM_FEATURE_V7) &&
         !arm_feature(env, ARM_FEATURE_M) &&
         !arm_feature(env, ARM_FEATURE_PMSA)) {
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
             flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
         }
-        flags = FIELD_DP32(flags, TBFLAG_A32, XSCALE_CPAR, env->cp15.c15_cpar);
+        /* Note that XSCALE_CPAR shares bits with VECSTRIDE */
+        if (arm_feature(env, ARM_FEATURE_XSCALE)) {
+            flags = FIELD_DP32(flags, TBFLAG_A32,
+                               XSCALE_CPAR, env->cp15.c15_cpar);
+        }
     }
 
     flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX, arm_to_core_mmu_idx(mmu_idx));
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     dc->fp_excp_el = FIELD_EX32(tb_flags, TBFLAG_ANY, FPEXC_EL);
     dc->vfp_enabled = FIELD_EX32(tb_flags, TBFLAG_A32, VFPEN);
     dc->vec_len = FIELD_EX32(tb_flags, TBFLAG_A32, VECLEN);
-    dc->vec_stride = FIELD_EX32(tb_flags, TBFLAG_A32, VECSTRIDE);
-    dc->c15_cpar = FIELD_EX32(tb_flags, TBFLAG_A32, XSCALE_CPAR);
+    if (arm_feature(env, ARM_FEATURE_XSCALE)) {
+        dc->c15_cpar = FIELD_EX32(tb_flags, TBFLAG_A32, XSCALE_CPAR);
+        dc->vec_stride = 0;
+    } else {
+        dc->vec_stride = FIELD_EX32(tb_flags, TBFLAG_A32, VECSTRIDE);
+        dc->c15_cpar = 0;
+    }
     dc->v7m_handler_mode = FIELD_EX32(tb_flags, TBFLAG_A32, HANDLER);
     dc->v8m_secure = arm_feature(env, ARM_FEATURE_M_SECURITY) &&
         regime_is_secure(env, dc->mmu_idx);
-- 
2.20.1

The M-profile FPCCR.S bit indicates the security status of
the floating point context. In the pseudocode ExecuteFPCheck()
function it is unconditionally set to match the current
security state whenever a floating point instruction is
executed.

Implement this by adding a new TB flag which tracks whether
FPCCR.S is different from the current security state, so
that we only need to emit the code to update it in the
less-common case when it is not already set correctly.

Note that we will add the handling for the other work done
by ExecuteFPCheck() in later commits.
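The "compute once per TB, fix up at the first FP insn" pattern can be modelled as follows. The model assumes FPCCR.S is bit 2, ignores the ARM_FEATURE_M_SECURITY check, and collapses translation and execution into a single step; all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPCCR_S_MASK (1u << 2)   /* assumed: FPCCR.S is bit 2 */

typedef struct {
    uint32_t fpccr_s;            /* Secure bank, which holds the unbanked S bit */
    bool secure;
} MiniFPState;

typedef struct {
    bool fpccr_s_wrong;          /* copy of the TB flag */
    bool secure;
} MiniDisasCtx;

/* TB-creation time: flag the TB only if FPCCR.S disagrees with the state. */
static bool compute_fpccr_s_wrong(const MiniFPState *st)
{
    bool s = (st->fpccr_s & FPCCR_S_MASK) != 0;
    return s != st->secure;
}

/* Per FP insn: update FPCCR.S only the first time; returns 1 if updated. */
static int on_fp_insn(MiniDisasCtx *dc, MiniFPState *st)
{
    if (!dc->fpccr_s_wrong) {
        return 0;
    }
    if (dc->secure) {
        st->fpccr_s |= FPCCR_S_MASK;
    } else {
        st->fpccr_s &= ~FPCCR_S_MASK;
    }
    dc->fpccr_s_wrong = false;   /* no further updates in this TB */
    return 1;
}
```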

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-19-peter.maydell@linaro.org
---
 target/arm/cpu.h       |  2 ++
 target/arm/translate.h |  1 +
 target/arm/helper.c    |  5 +++++
 target/arm/translate.c | 20 ++++++++++++++++++++
 4 files changed, 28 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, NS, 6, 1)
 FIELD(TBFLAG_A32, VFPEN, 7, 1)
 FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
 FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
+/* For M profile only, set if FPCCR.S does not match current security state */
+FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1)
 /* For M profile only, Handler (ie not Thread) mode */
 FIELD(TBFLAG_A32, HANDLER, 21, 1)
 /* For M profile only, whether we should generate stack-limit checks */
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool v7m_handler_mode;
     bool v8m_secure; /* true if v8M and we're in Secure mode */
     bool v8m_stackcheck; /* true if we need to perform v8M stack limit checks */
+    bool v8m_fpccr_s_wrong; /* true if v8M FPCCR.S != v8m_secure */
     /* Immediate value in AArch32 SVC insn; must be set if is_jmp == DISAS_SWI
      * so that top level loop can generate correct syndrome information.
      */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         flags = FIELD_DP32(flags, TBFLAG_A32, STACKCHECK, 1);
     }
 
+    if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
+        FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S) != env->v7m.secure) {
+        flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
+    }
+
     *pflags = flags;
     *cs_base = 0;
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
         }
     }
 
+    if (arm_dc_feature(s, ARM_FEATURE_M)) {
+        /* Handle M-profile lazy FP state mechanics */
+
+        /* Update ownership of FP context: set FPCCR.S to match current state */
+        if (s->v8m_fpccr_s_wrong) {
+            TCGv_i32 tmp;
+
+            tmp = load_cpu_field(v7m.fpccr[M_REG_S]);
+            if (s->v8m_secure) {
+                tcg_gen_ori_i32(tmp, tmp, R_V7M_FPCCR_S_MASK);
+            } else {
+                tcg_gen_andi_i32(tmp, tmp, ~R_V7M_FPCCR_S_MASK);
+            }
+            store_cpu_field(tmp, v7m.fpccr[M_REG_S]);
+            /* Don't need to do this for any further FP insns in this TB */
+            s->v8m_fpccr_s_wrong = false;
+        }
+    }
+
     if (extract32(insn, 28, 4) == 0xf) {
         /*
          * Encodings with T=1 (Thumb) or unconditional (ARM):
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     dc->v8m_secure = arm_feature(env, ARM_FEATURE_M_SECURITY) &&
         regime_is_secure(env, dc->mmu_idx);
     dc->v8m_stackcheck = FIELD_EX32(tb_flags, TBFLAG_A32, STACKCHECK);
+    dc->v8m_fpccr_s_wrong = FIELD_EX32(tb_flags, TBFLAG_A32, FPCCR_S_WRONG);
     dc->cp_regs = cpu->cp_regs;
     dc->features = env->features;
 
-- 
2.20.1

The M-profile FPCCR.ASPEN bit indicates that automatic floating-point
context preservation is enabled. Before executing any floating-point
instruction, if FPCCR.ASPEN is set and the CONTROL FPCA/SFPA bits
indicate that there is no active floating point context then we
must create a new context (by initializing FPSCR and setting
FPCA/SFPA to indicate that the context is now active). In the
pseudocode this is handled by ExecuteFPCheck().

Implement this with a new TB flag which tracks whether we
need to create a new FP context.
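The flag condition and the context-creation step can be sketched as two pure functions. This assumes FPCCR.ASPEN is bit 31 and CONTROL.FPCA/SFPA are bits 2 and 3. FPSCR is modelled as a plain copy of FPDSCR, whereas QEMU goes through vfp_set_fpscr(). The names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPCCR_ASPEN_MASK  (1u << 31)  /* assumed bit positions */
#define CONTROL_FPCA_MASK (1u << 2)
#define CONTROL_SFPA_MASK (1u << 3)

/* fpccr_cur: FPCCR bank for the current security state. */
static bool new_fp_ctxt_needed(uint32_t fpccr_cur, uint32_t control_s,
                               bool secure)
{
    if (!(fpccr_cur & FPCCR_ASPEN_MASK)) {
        return false;                 /* automatic preservation disabled */
    }
    if (!(control_s & CONTROL_FPCA_MASK)) {
        return true;                  /* no active FP context at all */
    }
    return secure && !(control_s & CONTROL_SFPA_MASK);
}

/* Create the context: FPSCR <- FPDSCR; set FPCA, and SFPA if Secure. */
static uint32_t create_fp_ctxt(uint32_t control_s, bool secure,
                               uint32_t fpdscr, uint32_t *fpscr)
{
    *fpscr = fpdscr;
    control_s |= CONTROL_FPCA_MASK;
    if (secure) {
        control_s |= CONTROL_SFPA_MASK;
    }
    return control_s;
}
```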

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-20-peter.maydell@linaro.org
---
 target/arm/cpu.h       |  2 ++
 target/arm/translate.h |  1 +
 target/arm/helper.c    | 13 +++++++++++++
 target/arm/translate.c | 29 +++++++++++++++++++++++++++++
 4 files changed, 45 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, NS, 6, 1)
 FIELD(TBFLAG_A32, VFPEN, 7, 1)
 FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
 FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
+/* For M profile only, set if we must create a new FP context */
+FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1)
 /* For M profile only, set if FPCCR.S does not match current security state */
 FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1)
 /* For M profile only, Handler (ie not Thread) mode */
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool v8m_secure; /* true if v8M and we're in Secure mode */
     bool v8m_stackcheck; /* true if we need to perform v8M stack limit checks */
     bool v8m_fpccr_s_wrong; /* true if v8M FPCCR.S != v8m_secure */
+    bool v7m_new_fp_ctxt_needed; /* ASPEN set but no active FP context */
     /* Immediate value in AArch32 SVC insn; must be set if is_jmp == DISAS_SWI
      * so that top level loop can generate correct syndrome information.
      */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
     }
 
+    if (arm_feature(env, ARM_FEATURE_M) &&
+        (env->v7m.fpccr[env->v7m.secure] & R_V7M_FPCCR_ASPEN_MASK) &&
+        (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) ||
+         (env->v7m.secure &&
+          !(env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)))) {
+        /*
+         * ASPEN is set, but FPCA/SFPA indicate that there is no active
+         * FP context; we must create a new FP context before executing
+         * any FP insn.
+         */
+        flags = FIELD_DP32(flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED, 1);
+    }
+
     *pflags = flags;
     *cs_base = 0;
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             /* Don't need to do this for any further FP insns in this TB */
             s->v8m_fpccr_s_wrong = false;
         }
+
+        if (s->v7m_new_fp_ctxt_needed) {
+            /*
+             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
+             * and the FPSCR.
+             */
+            TCGv_i32 control, fpscr;
+            uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
+
+            fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
+            gen_helper_vfp_set_fpscr(cpu_env, fpscr);
+            tcg_temp_free_i32(fpscr);
+            /*
+             * We don't need to arrange to end the TB, because the only
+             * parts of FPSCR which we cache in the TB flags are the VECLEN
+             * and VECSTRIDE, and those don't exist for M-profile.
+             */
+
+            if (s->v8m_secure) {
+                bits |= R_V7M_CONTROL_SFPA_MASK;
+            }
+            control = load_cpu_field(v7m.control[M_REG_S]);
+            tcg_gen_ori_i32(control, control, bits);
+            store_cpu_field(control, v7m.control[M_REG_S]);
+            /* Don't need to do this for any further FP insns in this TB */
+            s->v7m_new_fp_ctxt_needed = false;
+        }
     }
 
     if (extract32(insn, 28, 4) == 0xf) {
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
         regime_is_secure(env, dc->mmu_idx);
     dc->v8m_stackcheck = FIELD_EX32(tb_flags, TBFLAG_A32, STACKCHECK);
     dc->v8m_fpccr_s_wrong = FIELD_EX32(tb_flags, TBFLAG_A32, FPCCR_S_WRONG);
+    dc->v7m_new_fp_ctxt_needed =
+        FIELD_EX32(tb_flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED);
     dc->cp_regs = cpu->cp_regs;
     dc->features = env->features;
 
-- 
2.20.1

Add a new helper function which returns the MMU index to use
for v7M, where the caller specifies all of the security
state, privilege level and whether the execution priority
is negative, and reimplement the existing
arm_v7m_mmu_idx_for_secstate_and_priv() in terms of it.

We are going to need this for the lazy-FP-stacking code.
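The refactor follows a common pattern: an "all arguments explicit" function plus a thin wrapper that looks up the missing argument. The bit values below are assumed to mirror QEMU's M-profile MMU index encoding. FakeNVIC is a hypothetical stand-in for the NVIC query that armv7m_nvic_neg_prio_requested() performs:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values, assumed to match QEMU's M-profile encoding. */
enum {
    MMU_IDX_M        = 0x40,
    MMU_IDX_M_PRIV   = 0x1,
    MMU_IDX_M_NEGPRI = 0x2,
    MMU_IDX_M_S      = 0x4,
};

/* Every input is supplied by the caller, as in arm_v7m_mmu_idx_all(). */
static int v7m_mmu_idx_all(bool secstate, bool priv, bool negpri)
{
    int idx = MMU_IDX_M;

    if (priv) {
        idx |= MMU_IDX_M_PRIV;
    }
    if (negpri) {
        idx |= MMU_IDX_M_NEGPRI;
    }
    if (secstate) {
        idx |= MMU_IDX_M_S;
    }
    return idx;
}

/* Hypothetical NVIC stand-in: [0] = Non-secure, [1] = Secure. */
typedef struct {
    bool neg_prio[2];
} FakeNVIC;

/* Wrapper: derive negpri from the (fake) NVIC, then delegate. */
static int mmu_idx_for_secstate_and_priv(const FakeNVIC *nvic,
                                         bool secstate, bool priv)
{
    return v7m_mmu_idx_all(secstate, priv, nvic->neg_prio[secstate]);
}
```

The lazy-FP code can then pass a negpri value saved earlier instead of the current one.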

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-21-peter.maydell@linaro.org
---
 target/arm/cpu.h    |  7 +++++++
 target/arm/helper.c | 14 +++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx)
     }
 }
 
+/*
+ * Return the MMU index for a v7M CPU with all relevant information
+ * manually specified.
+ */
+ARMMMUIdx arm_v7m_mmu_idx_all(CPUARMState *env,
+                              bool secstate, bool priv, bool negpri);
+
 /* Return the MMU index for a v7M CPU in the specified security and
  * privilege state.
  */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int fp_exception_el(CPUARMState *env, int cur_el)
     return 0;
 }
 
-ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
-                                                bool secstate, bool priv)
+ARMMMUIdx arm_v7m_mmu_idx_all(CPUARMState *env,
+                              bool secstate, bool priv, bool negpri)
 {
     ARMMMUIdx mmu_idx = ARM_MMU_IDX_M;
 
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
         mmu_idx |= ARM_MMU_IDX_M_PRIV;
     }
 
-    if (armv7m_nvic_neg_prio_requested(env->nvic, secstate)) {
+    if (negpri) {
         mmu_idx |= ARM_MMU_IDX_M_NEGPRI;
     }
 
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
     return mmu_idx;
 }
 
+ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
+                                                bool secstate, bool priv)
+{
+    bool negpri = armv7m_nvic_neg_prio_requested(env->nvic, secstate);
+
+    return arm_v7m_mmu_idx_all(env, secstate, priv, negpri);
+}
+
 /* Return the MMU index for a v7M CPU in the specified security state */
 ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 {
-- 
2.20.1

In the v7M architecture, if an exception is generated in the process
of doing the lazy stacking of FP registers, the handling of
possible escalation to HardFault differs from the normal approach:
it is based on the exception-readiness information that was saved
in the FPCCR when the stack frame was created, not on the current
state of the CPU and NVIC. Provide a new function
armv7m_nvic_set_pending_lazyfp() which pends exceptions raised
during lazy stacking and implements this logic.

This corresponds to the pseudocode TakePreserveFPException().
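
The decision made by the switch statement in armv7m_nvic_set_pending_lazyfp() can be sketched in isolation as below. This is a simplified model under stated assumptions: the FPCCR bit positions are taken to follow the Armv8-M FPCCR layout, and the LazyExc and LazyAction names are hypothetical. It only chooses between ignoring the exception, pending it normally, or escalating to HardFault; the HardFault target selection and the HFRDY lockup check are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FPCCR ready bits (positions assumed from the Armv8-M FPCCR layout) */
#define FPCCR_HFRDY  (1u << 4)
#define FPCCR_MMRDY  (1u << 5)
#define FPCCR_BFRDY  (1u << 6)
#define FPCCR_SFRDY  (1u << 7)
#define FPCCR_MONRDY (1u << 8)
#define FPCCR_UFRDY  (1u << 10)

/* Hypothetical labels for the exceptions lazy stacking can raise */
typedef enum { EXC_MEM, EXC_BUS, EXC_USAGE, EXC_SECURE, EXC_DEBUG } LazyExc;
typedef enum { LAZY_IGNORE, LAZY_PEND, LAZY_ESCALATE } LazyAction;

/*
 * fpccr_s is the Secure FPCCR (holds the bits for non-banked exceptions);
 * fpccr is the bank matching a banked exception's security state.
 */
static LazyAction lazyfp_action(LazyExc exc, uint32_t fpccr_s, uint32_t fpccr)
{
    bool ready;

    switch (exc) {
    case EXC_DEBUG:
        /* DebugMonitor is dropped, never escalated, if MONRDY was clear */
        return (fpccr_s & FPCCR_MONRDY) ? LAZY_PEND : LAZY_IGNORE;
    case EXC_MEM:
        ready = fpccr & FPCCR_MMRDY;      /* banked */
        break;
    case EXC_USAGE:
        ready = fpccr & FPCCR_UFRDY;      /* banked */
        break;
    case EXC_BUS:
        ready = fpccr_s & FPCCR_BFRDY;    /* non-banked */
        break;
    case EXC_SECURE:
        ready = fpccr_s & FPCCR_SFRDY;    /* non-banked */
        break;
    default:
        assert(0);
        return LAZY_IGNORE;
    }
    /* Saved "not ready" state means the fault escalates to HardFault */
    return ready ? LAZY_PEND : LAZY_ESCALATE;
}
```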

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-22-peter.maydell@linaro.org
---
 target/arm/cpu.h      | 12 ++++++
 hw/intc/armv7m_nvic.c | 96 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ void armv7m_nvic_set_pending(void *opaque, int irq, bool secure);
  * a different exception).
  */
 void armv7m_nvic_set_pending_derived(void *opaque, int irq, bool secure);
+/**
+ * armv7m_nvic_set_pending_lazyfp: mark this lazy FP exception as pending
+ * @opaque: the NVIC
+ * @irq: the exception number to mark pending
+ * @secure: false for non-banked exceptions or for the nonsecure
+ * version of a banked exception, true for the secure version of a banked
+ * exception.
+ *
+ * Similar to armv7m_nvic_set_pending(), but specifically for exceptions
+ * generated in the course of lazy stacking of FP registers.
+ */
+void armv7m_nvic_set_pending_lazyfp(void *opaque, int irq, bool secure);
 /**
  * armv7m_nvic_get_pending_irq_info: return highest priority pending
  *    exception, and whether it targets Secure state
diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -XXX,XX +XXX,XX @@ void armv7m_nvic_set_pending_derived(void *opaque, int irq, bool secure)
     do_armv7m_nvic_set_pending(opaque, irq, secure, true);
 }
 
+void armv7m_nvic_set_pending_lazyfp(void *opaque, int irq, bool secure)
+{
+    /*
+     * Pend an exception during lazy FP stacking. This differs
+     * from the usual exception pending because the logic for
+     * whether we should escalate depends on the saved context
+     * in the FPCCR register, not on the current state of the CPU/NVIC.
+     */
+    NVICState *s = (NVICState *)opaque;
+    bool banked = exc_is_banked(irq);
+    VecInfo *vec;
+    bool targets_secure;
+    bool escalate = false;
+    /*
+     * We will only look at bits in fpccr if this is a banked exception
+     * (in which case 'secure' tells us whether it is the S or NS version).
+     * All the bits for the non-banked exceptions are in fpccr_s.
+     */
+    uint32_t fpccr_s = s->cpu->env.v7m.fpccr[M_REG_S];
+    uint32_t fpccr = s->cpu->env.v7m.fpccr[secure];
+
+    assert(irq > ARMV7M_EXCP_RESET && irq < s->num_irq);
+    assert(!secure || banked);
+
+    vec = (banked && secure) ? &s->sec_vectors[irq] : &s->vectors[irq];
+
+    targets_secure = banked ? secure : exc_targets_secure(s, irq);
+
+    switch (irq) {
+    case ARMV7M_EXCP_DEBUG:
+        if (!(fpccr_s & R_V7M_FPCCR_MONRDY_MASK)) {
+            /* Ignore DebugMonitor exception */
+            return;
+        }
+        break;
+    case ARMV7M_EXCP_MEM:
+        escalate = !(fpccr & R_V7M_FPCCR_MMRDY_MASK);
+        break;
+    case ARMV7M_EXCP_USAGE:
+        escalate = !(fpccr & R_V7M_FPCCR_UFRDY_MASK);
+        break;
+    case ARMV7M_EXCP_BUS:
+        escalate = !(fpccr_s & R_V7M_FPCCR_BFRDY_MASK);
+        break;
+    case ARMV7M_EXCP_SECURE:
+        escalate = !(fpccr_s & R_V7M_FPCCR_SFRDY_MASK);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (escalate) {
+        /*
+         * Escalate to HardFault: faults that initially targeted Secure
+         * continue to do so, even if HF normally targets NonSecure.
+         */
+        irq = ARMV7M_EXCP_HARD;
+        if (arm_feature(&s->cpu->env, ARM_FEATURE_M_SECURITY) &&
+            (targets_secure ||
+             !(s->cpu->env.v7m.aircr & R_V7M_AIRCR_BFHFNMINS_MASK))) {
+            vec = &s->sec_vectors[irq];
+        } else {
+            vec = &s->vectors[irq];
+        }
+    }
+
+    if (!vec->enabled ||
+        nvic_exec_prio(s) <= exc_group_prio(s, vec->prio, secure)) {
+        if (!(fpccr_s & R_V7M_FPCCR_HFRDY_MASK)) {
+            /*
+             * We want to escalate to HardFault but the context the
+             * FP state belongs to prevents the exception pre-empting.
+             */
+            cpu_abort(&s->cpu->parent_obj,
+                      "Lockup: can't escalate to HardFault during "
+                      "lazy FP register stacking\n");
+        }
+    }
+
+    if (escalate) {
+        s->cpu->env.v7m.hfsr |= R_V7M_HFSR_FORCED_MASK;
+    }
+    if (!vec->pending) {
+        vec->pending = 1;
+        /*
+         * We do not call nvic_irq_update(), because we know our caller
+         * is going to handle causing us to take the exception by
+         * raising EXCP_LAZYFP, so raising the IRQ line would be
+         * pointless extra work. We just need to recompute the
+         * priorities so that armv7m_nvic_can_take_pending_exception()
+         * returns the right answer.
+         */
+        nvic_recompute_state(s);
+    }
+}
+
 /* Make pending IRQ active.  */
 void armv7m_nvic_acknowledge_irq(void *opaque)
 {
-- 
2.20.1

Pushing registers to the stack for v7M needs to handle three cases:
 * the "normal" case where we pend exceptions
 * an "ignore faults" case where we set FSR bits but
   do not pend exceptions (this is used when we are
   handling some kinds of derived exception on exception entry)
 * a "lazy FP stacking" case, where different FSR bits
   are set and the exception is pended differently

Implement this by changing the existing flag argument that
tells us whether to ignore faults or not into an enum that
specifies which of the 3 modes we should handle.
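
As an illustration of how the mode selects both the fault status bit and the pending behaviour, here is a small C model. Only StackingMode is taken from this patch; StackFault, FsrBit and PendAction are hypothetical labels, and the real code sets mask bits in env->v7m.sfsr and env->v7m.cfsr rather than returning enum values. Note that a status bit is recorded in every mode, including STACK_IGNFAULTS; only the pending step is skipped.

```c
#include <assert.h>

/* Mirrors the StackingMode enum added by this patch */
typedef enum StackingMode {
    STACK_NORMAL,
    STACK_IGNFAULTS,
    STACK_LAZYFP,
} StackingMode;

/* Hypothetical labels for the kind of fault a stack write can raise */
typedef enum { FAULT_SECURE, FAULT_MEM, FAULT_BUS } StackFault;

/* Hypothetical labels for the fault status bit that gets set */
typedef enum {
    SFSR_AUVIOL, SFSR_LSPERR,
    CFSR_MSTKERR, CFSR_MLSPERR,
    BFSR_STKERR, BFSR_LSPERR,
} FsrBit;

/* How (or whether) the fault is pended with the NVIC */
typedef enum { PEND_DERIVED, PEND_LAZYFP, PEND_NONE } PendAction;

/* Lazy FP stacking uses the *LSPERR bits; other modes the stacking bits */
static FsrBit stack_fault_fsr_bit(StackFault f, StackingMode mode)
{
    int lazy = (mode == STACK_LAZYFP);

    switch (f) {
    case FAULT_SECURE:
        return lazy ? SFSR_LSPERR : SFSR_AUVIOL;
    case FAULT_MEM:
        return lazy ? CFSR_MLSPERR : CFSR_MSTKERR;
    case FAULT_BUS:
    default:
        return lazy ? BFSR_LSPERR : BFSR_STKERR;
    }
}

/* Models the switch at the pend_fault label in v7m_stack_write() */
static PendAction stack_fault_pend_action(StackingMode mode)
{
    switch (mode) {
    case STACK_NORMAL:
        return PEND_DERIVED;  /* armv7m_nvic_set_pending_derived() */
    case STACK_LAZYFP:
        return PEND_LAZYFP;   /* armv7m_nvic_set_pending_lazyfp() */
    case STACK_IGNFAULTS:
    default:
        return PEND_NONE;     /* status bits only, no exception */
    }
}
```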

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-23-peter.maydell@linaro.org
---
 target/arm/helper.c | 118 +++++++++++++++++++++++++++++---------------
 1 file changed, 79 insertions(+), 39 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static bool v7m_cpacr_pass(CPUARMState *env, bool is_secure, bool is_priv)
     }
 }
 
+/*
+ * What kind of stack write are we doing? This affects how exceptions
+ * generated during the stacking are treated.
+ */
+typedef enum StackingMode {
+    STACK_NORMAL,
+    STACK_IGNFAULTS,
+    STACK_LAZYFP,
+} StackingMode;
+
 static bool v7m_stack_write(ARMCPU *cpu, uint32_t addr, uint32_t value,
-                            ARMMMUIdx mmu_idx, bool ignfault)
+                            ARMMMUIdx mmu_idx, StackingMode mode)
 {
     CPUState *cs = CPU(cpu);
     CPUARMState *env = &cpu->env;
@@ -XXX,XX +XXX,XX @@ static bool v7m_stack_write(ARMCPU *cpu, uint32_t addr, uint32_t value,
                       &attrs, &prot, &page_size, &fi, NULL)) {
         /* MPU/SAU lookup failed */
         if (fi.type == ARMFault_QEMU_SFault) {
-            qemu_log_mask(CPU_LOG_INT,
-                          "...SecureFault with SFSR.AUVIOL during stacking\n");
-            env->v7m.sfsr |= R_V7M_SFSR_AUVIOL_MASK | R_V7M_SFSR_SFARVALID_MASK;
+            if (mode == STACK_LAZYFP) {
+                qemu_log_mask(CPU_LOG_INT,
+                              "...SecureFault with SFSR.LSPERR "
+                              "during lazy stacking\n");
+                env->v7m.sfsr |= R_V7M_SFSR_LSPERR_MASK;
+            } else {
+                qemu_log_mask(CPU_LOG_INT,
+                              "...SecureFault with SFSR.AUVIOL "
+                              "during stacking\n");
+                env->v7m.sfsr |= R_V7M_SFSR_AUVIOL_MASK;
+            }
+            env->v7m.sfsr |= R_V7M_SFSR_SFARVALID_MASK;
             env->v7m.sfar = addr;
             exc = ARMV7M_EXCP_SECURE;
             exc_secure = false;
         } else {
-            qemu_log_mask(CPU_LOG_INT, "...MemManageFault with CFSR.MSTKERR\n");
-            env->v7m.cfsr[secure] |= R_V7M_CFSR_MSTKERR_MASK;
+            if (mode == STACK_LAZYFP) {
+                qemu_log_mask(CPU_LOG_INT,
+                              "...MemManageFault with CFSR.MLSPERR\n");
+                env->v7m.cfsr[secure] |= R_V7M_CFSR_MLSPERR_MASK;
+            } else {
+                qemu_log_mask(CPU_LOG_INT,
+                              "...MemManageFault with CFSR.MSTKERR\n");
+                env->v7m.cfsr[secure] |= R_V7M_CFSR_MSTKERR_MASK;
+            }
             exc = ARMV7M_EXCP_MEM;
             exc_secure = secure;
         }
@@ -XXX,XX +XXX,XX @@ static bool v7m_stack_write(ARMCPU *cpu, uint32_t addr, uint32_t value,
                          attrs, &txres);
     if (txres != MEMTX_OK) {
         /* BusFault trying to write the data */
-        qemu_log_mask(CPU_LOG_INT, "...BusFault with BFSR.STKERR\n");
-        env->v7m.cfsr[M_REG_NS] |= R_V7M_CFSR_STKERR_MASK;
+        if (mode == STACK_LAZYFP) {
+            qemu_log_mask(CPU_LOG_INT, "...BusFault with BFSR.LSPERR\n");
+            env->v7m.cfsr[M_REG_NS] |= R_V7M_CFSR_LSPERR_MASK;
+        } else {
+            qemu_log_mask(CPU_LOG_INT, "...BusFault with BFSR.STKERR\n");
+            env->v7m.cfsr[M_REG_NS] |= R_V7M_CFSR_STKERR_MASK;
+        }
         exc = ARMV7M_EXCP_BUS;
         exc_secure = false;
         goto pend_fault;
@@ -XXX,XX +XXX,XX @@ pend_fault:
      * later if we have two derived exceptions.
      * The only case when we must not pend the exception but instead
      * throw it away is if we are doing the push of the callee registers
-     * and we've already generated a derived exception. Even in this
-     * case we will still update the fault status registers.
+     * and we've already generated a derived exception (this is indicated
+     * by the caller passing STACK_IGNFAULTS). Even in this case we will
+     * still update the fault status registers.
      */
-    if (!ignfault) {
+    switch (mode) {
+    case STACK_NORMAL:
         armv7m_nvic_set_pending_derived(env->nvic, exc, exc_secure);
+        break;
+    case STACK_LAZYFP:
+        armv7m_nvic_set_pending_lazyfp(env->nvic, exc, exc_secure);
+        break;
+    case STACK_IGNFAULTS:
+        break;
     }
     return false;
 }
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_callee_stack(ARMCPU *cpu, uint32_t lr, bool dotailchain,
     uint32_t limit;
     bool want_psp;
     uint32_t sig;
+    StackingMode smode = ignore_faults ? STACK_IGNFAULTS : STACK_NORMAL;
 
     if (dotailchain) {
         bool mode = lr & R_V7M_EXCRET_MODE_MASK;
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_callee_stack(ARMCPU *cpu, uint32_t lr, bool dotailchain,
      */
     sig = v7m_integrity_sig(env, lr);
     stacked_ok =
-        v7m_stack_write(cpu, frameptr, sig, mmu_idx, ignore_faults) &&
-        v7m_stack_write(cpu, frameptr + 0x8, env->regs[4], mmu_idx,
-                        ignore_faults) &&
-        v7m_stack_write(cpu, frameptr + 0xc, env->regs[5], mmu_idx,
-                        ignore_faults) &&
-        v7m_stack_write(cpu, frameptr + 0x10, env->regs[6], mmu_idx,
-                        ignore_faults) &&
-        v7m_stack_write(cpu, frameptr + 0x14, env->regs[7], mmu_idx,
-                        ignore_faults) &&
-        v7m_stack_write(cpu, frameptr + 0x18, env->regs[8], mmu_idx,
-                        ignore_faults) &&
-        v7m_stack_write(cpu, frameptr + 0x1c, env->regs[9], mmu_idx,
-                        ignore_faults) &&
-        v7m_stack_write(cpu, frameptr + 0x20, env->regs[10], mmu_idx,
-                        ignore_faults) &&
-        v7m_stack_write(cpu, frameptr + 0x24, env->regs[11], mmu_idx,
-                        ignore_faults);
+        v7m_stack_write(cpu, frameptr, sig, mmu_idx, smode) &&
+        v7m_stack_write(cpu, frameptr + 0x8, env->regs[4], mmu_idx, smode) &&
+        v7m_stack_write(cpu, frameptr + 0xc, env->regs[5], mmu_idx, smode) &&
+        v7m_stack_write(cpu, frameptr + 0x10, env->regs[6], mmu_idx, smode) &&
+        v7m_stack_write(cpu, frameptr + 0x14, env->regs[7], mmu_idx, smode) &&
+        v7m_stack_write(cpu, frameptr + 0x18, env->regs[8], mmu_idx, smode) &&
+        v7m_stack_write(cpu, frameptr + 0x1c, env->regs[9], mmu_idx, smode) &&
+        v7m_stack_write(cpu, frameptr + 0x20, env->regs[10], mmu_idx, smode) &&
+        v7m_stack_write(cpu, frameptr + 0x24, env->regs[11], mmu_idx, smode);
 
     /* Update SP regardless of whether any of the stack accesses failed. */
     *frame_sp_p = frameptr;
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
      * if it has higher priority).
      */
     stacked_ok = stacked_ok &&
-        v7m_stack_write(cpu, frameptr, env->regs[0], mmu_idx, false) &&
-        v7m_stack_write(cpu, frameptr + 4, env->regs[1], mmu_idx, false) &&
-        v7m_stack_write(cpu, frameptr + 8, env->regs[2], mmu_idx, false) &&
-        v7m_stack_write(cpu, frameptr + 12, env->regs[3], mmu_idx, false) &&
-        v7m_stack_write(cpu, frameptr + 16, env->regs[12], mmu_idx, false) &&
-        v7m_stack_write(cpu, frameptr + 20, env->regs[14], mmu_idx, false) &&
-        v7m_stack_write(cpu, frameptr + 24, env->regs[15], mmu_idx, false) &&
-        v7m_stack_write(cpu, frameptr + 28, xpsr, mmu_idx, false);
+        v7m_stack_write(cpu, frameptr, env->regs[0], mmu_idx, STACK_NORMAL) &&
+        v7m_stack_write(cpu, frameptr + 4, env->regs[1],
+                        mmu_idx, STACK_NORMAL) &&
+        v7m_stack_write(cpu, frameptr + 8, env->regs[2],
+                        mmu_idx, STACK_NORMAL) &&
+        v7m_stack_write(cpu, frameptr + 12, env->regs[3],
+                        mmu_idx, STACK_NORMAL) &&
+        v7m_stack_write(cpu, frameptr + 16, env->regs[12],
+                        mmu_idx, STACK_NORMAL) &&
+        v7m_stack_write(cpu, frameptr + 20, env->regs[14],
+                        mmu_idx, STACK_NORMAL) &&
+        v7m_stack_write(cpu, frameptr + 24, env->regs[15],
+                        mmu_idx, STACK_NORMAL) &&
+        v7m_stack_write(cpu, frameptr + 28, xpsr, mmu_idx, STACK_NORMAL);
 
     if (env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) {
         /* FPU is active, try to save its registers */
@@ -XXX,XX +XXX,XX @@ static bool v7m_push_stack(ARMCPU *cpu)
                         faddr += 8; /* skip the slot for the FPSCR */
                     }
                     stacked_ok = stacked_ok &&
-                        v7m_stack_write(cpu, faddr, slo, mmu_idx, false) &&
-                        v7m_stack_write(cpu, faddr + 4, shi, mmu_idx, false);
+                        v7m_stack_write(cpu, faddr, slo,
+                                        mmu_idx, STACK_NORMAL) &&
+                        v7m_stack_write(cpu, faddr + 4, shi,
+                                        mmu_idx, STACK_NORMAL);
                 }
                 stacked_ok = stacked_ok &&
                     v7m_stack_write(cpu, frameptr + 0x60,
-                                    vfp_get_fpscr(env), mmu_idx, false);
+                                    vfp_get_fpscr(env), mmu_idx, STACK_NORMAL);
                 if (cpacr_pass) {
                     for (i = 0; i < ((framesize == 0xa8) ? 32 : 16); i += 2) {
                         *aa32_vfp_dreg(env, i / 2) = 0;
-- 
2.20.1

The M-profile architecture floating point system supports
lazy FP state preservation, where FP registers are not
pushed to the stack when an exception occurs but are instead
only saved if and when the first FP instruction in the exception
handler is executed. Implement this in QEMU, corresponding
to the check of LSPACT in the pseudocode ExecuteFPCheck().
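
The translate-time side can be modelled roughly as follows. TransState and translate_fp_insn() are hypothetical names standing in for DisasContext.v7m_lspact and the code in disas_vfp_insn(); the counter stands in for emitting a call to the v7m_preserve_fp_state helper. Because the helper clears FPCCR.LSPACT whenever it returns normally, only the first FP instruction in a translation block needs the call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-TB translation state, modelling DisasContext */
typedef struct {
    bool lspact;         /* FPCCR.LSPACT was set when the TB was created */
    int preserve_calls;  /* number of preserve-FP-state calls emitted */
} TransState;

/* Called once for each FP instruction translated in the TB */
static void translate_fp_insn(TransState *s)
{
    if (s->lspact) {
        /* Stands in for emitting gen_helper_v7m_preserve_fp_state() */
        s->preserve_calls++;
        /* If the helper returns, LSPACT is clear for the rest of the TB */
        s->lspact = false;
    }
    /* ...normal translation of the FP instruction would follow here... */
}
```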

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-24-peter.maydell@linaro.org
---
 target/arm/cpu.h       |   3 ++
 target/arm/helper.h    |   2 +
 target/arm/translate.h |   1 +
 target/arm/helper.c    | 112 +++++++++++++++++++++++++++++++++++++++++
 target/arm/translate.c |  22 ++++++++
 5 files changed, 140 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@
 #define EXCP_NOCP           17   /* v7M NOCP UsageFault */
 #define EXCP_INVSTATE       18   /* v7M INVSTATE UsageFault */
 #define EXCP_STKOF          19   /* v8M STKOF UsageFault */
+#define EXCP_LAZYFP         20   /* v7M fault during lazy FP stacking */
 /* NB: add new EXCP_ defines to the array in arm_log_exception() too */
 
 #define ARMV7M_EXCP_RESET   1
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, NS, 6, 1)
 FIELD(TBFLAG_A32, VFPEN, 7, 1)
 FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
 FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
+/* For M profile only, set if FPCCR.LSPACT is set */
+FIELD(TBFLAG_A32, LSPACT, 18, 1)
 /* For M profile only, set if we must create a new FP context */
 FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1)
 /* For M profile only, set if FPCCR.S does not match current security state */
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(v7m_blxns, void, env, i32)
 
 DEF_HELPER_3(v7m_tt, i32, env, i32, i32)
 
+DEF_HELPER_1(v7m_preserve_fp_state, void, env)
+
 DEF_HELPER_2(v8m_stackcheck, void, env, i32)
 
 DEF_HELPER_4(access_check_cp_reg, void, env, ptr, i32, i32)
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     bool v8m_stackcheck; /* true if we need to perform v8M stack limit checks */
     bool v8m_fpccr_s_wrong; /* true if v8M FPCCR.S != v8m_secure */
     bool v7m_new_fp_ctxt_needed; /* ASPEN set but no active FP context */
+    bool v7m_lspact; /* FPCCR.LSPACT set */
     /* Immediate value in AArch32 SVC insn; must be set if is_jmp == DISAS_SWI
      * so that top level loop can generate correct syndrome information.
      */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_blxns)(CPUARMState *env, uint32_t dest)
     g_assert_not_reached();
 }
 
+void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
+{
+    /* translate.c should never generate calls here in user-only mode */
+    g_assert_not_reached();
+}
+
 uint32_t HELPER(v7m_tt)(CPUARMState *env, uint32_t addr, uint32_t op)
 {
     /* The TT instructions can be used by unprivileged code, but in
@@ -XXX,XX +XXX,XX @@ pend_fault:
     return false;
 }
 
+void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
+{
+    /*
+     * Preserve FP state (because LSPACT was set and we are about
+     * to execute an FP instruction). This corresponds to the
+     * PreserveFPState() pseudocode.
+     * We may throw an exception if the stacking fails.
+     */
+    ARMCPU *cpu = arm_env_get_cpu(env);
+    bool is_secure = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
+    bool negpri = !(env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_HFRDY_MASK);
+    bool is_priv = !(env->v7m.fpccr[is_secure] & R_V7M_FPCCR_USER_MASK);
+    bool splimviol = env->v7m.fpccr[is_secure] & R_V7M_FPCCR_SPLIMVIOL_MASK;
+    uint32_t fpcar = env->v7m.fpcar[is_secure];
+    bool stacked_ok = true;
+    bool ts = is_secure && (env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_TS_MASK);
+    bool take_exception;
+
+    /* Take the iothread lock as we are going to touch the NVIC */
+    qemu_mutex_lock_iothread();
+
+    /* Check the background context had access to the FPU */
+    if (!v7m_cpacr_pass(env, is_secure, is_priv)) {
+        armv7m_nvic_set_pending_lazyfp(env->nvic, ARMV7M_EXCP_USAGE, is_secure);
+        env->v7m.cfsr[is_secure] |= R_V7M_CFSR_NOCP_MASK;
+        stacked_ok = false;
+    } else if (!is_secure && !extract32(env->v7m.nsacr, 10, 1)) {
+        armv7m_nvic_set_pending_lazyfp(env->nvic, ARMV7M_EXCP_USAGE, M_REG_S);
+        env->v7m.cfsr[M_REG_S] |= R_V7M_CFSR_NOCP_MASK;
+        stacked_ok = false;
+    }
+
+    if (!splimviol && stacked_ok) {
+        /* We only stack if the stack limit wasn't violated */
+        int i;
+        ARMMMUIdx mmu_idx;
+
+        mmu_idx = arm_v7m_mmu_idx_all(env, is_secure, is_priv, negpri);
+        for (i = 0; i < (ts ? 32 : 16); i += 2) {
+            uint64_t dn = *aa32_vfp_dreg(env, i / 2);
+            uint32_t faddr = fpcar + 4 * i;
+            uint32_t slo = extract64(dn, 0, 32);
+            uint32_t shi = extract64(dn, 32, 32);
+
+            if (i >= 16) {
+                faddr += 8; /* skip the slot for the FPSCR */
+            }
+            stacked_ok = stacked_ok &&
+                v7m_stack_write(cpu, faddr, slo, mmu_idx, STACK_LAZYFP) &&
+                v7m_stack_write(cpu, faddr + 4, shi, mmu_idx, STACK_LAZYFP);
+        }
+
+        stacked_ok = stacked_ok &&
+            v7m_stack_write(cpu, fpcar + 0x40,
+                            vfp_get_fpscr(env), mmu_idx, STACK_LAZYFP);
+    }
+
+    /*
+     * If stacking failed we definitely pended an exception, but it's
+     * possible that it might not be able to be taken now. If its priority
+     * permits us to take it now, then we must not update the LSPACT or FP
+     * regs, but instead jump out to take the exception immediately.
+     * If it's just pending and won't be taken until the current
+     * handler exits, then we do update LSPACT and the FP regs.
+     */
+    take_exception = !stacked_ok &&
+        armv7m_nvic_can_take_pending_exception(env->nvic);
+
+    qemu_mutex_unlock_iothread();
+
+    if (take_exception) {
+        raise_exception_ra(env, EXCP_LAZYFP, 0, 1, GETPC());
+    }
+
+    env->v7m.fpccr[is_secure] &= ~R_V7M_FPCCR_LSPACT_MASK;
+
+    if (ts) {
+        /* Clear s0 to s31 and the FPSCR */
+        int i;
+
+        for (i = 0; i < 32; i += 2) {
+            *aa32_vfp_dreg(env, i / 2) = 0;
+        }
+        vfp_set_fpscr(env, 0);
+    }
+    /*
+     * Otherwise s0 to s15 and FPSCR are UNKNOWN; we choose to leave them
+     * unchanged.
+     */
+}
+
 /* Write to v7M CONTROL.SPSEL bit for the specified security bank.
  * This may change the current stack pointer between Main and Process
  * stack pointers if it is done for the CONTROL register for the current
@@ -XXX,XX +XXX,XX @@ static void arm_log_exception(int idx)
             [EXCP_NOCP] = "v7M NOCP UsageFault",
             [EXCP_INVSTATE] = "v7M INVSTATE UsageFault",
             [EXCP_STKOF] = "v8M STKOF UsageFault",
+            [EXCP_LAZYFP] = "v7M exception during lazy FP stacking",
         };
 
         if (idx >= 0 && idx < ARRAY_SIZE(excnames)) {
@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
             return;
         }
         break;
+    case EXCP_LAZYFP:
+        /*
+         * We already pended the specific exception in the NVIC in the
+         * v7m_preserve_fp_state() helper function.
+         */
+        break;
     default:
         cpu_abort(cs, "Unhandled exception 0x%x\n", cs->exception_index);
         return; /* Never happens.  Keep compiler happy.  */
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         flags = FIELD_DP32(flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED, 1);
     }
 
+    if (arm_feature(env, ARM_FEATURE_M)) {
+        bool is_secure = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
+
+        if (env->v7m.fpccr[is_secure] & R_V7M_FPCCR_LSPACT_MASK) {
+            flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
+        }
+    }
+
     *pflags = flags;
     *cs_base = 0;
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
     if (arm_dc_feature(s, ARM_FEATURE_M)) {
         /* Handle M-profile lazy FP state mechanics */
 
+        /* Trigger lazy-state preservation if necessary */
+        if (s->v7m_lspact) {
+            /*
+             * Lazy state saving affects external memory and also the NVIC,
+             * so we must mark it as an IO operation for icount.
+             */
+            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+                gen_io_start();
+            }
+            gen_helper_v7m_preserve_fp_state(cpu_env);
+            if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
+                gen_io_end();
+            }
+            /*
+             * If the preserve_fp_state helper doesn't throw an exception
+             * then it will clear LSPACT; we don't need to repeat this for
+             * any further FP insns in this TB.
+             */
+            s->v7m_lspact = false;
+        }
+
         /* Update ownership of FP context: set FPCCR.S to match current state */
         if (s->v8m_fpccr_s_wrong) {
             TCGv_i32 tmp;
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     dc->v8m_fpccr_s_wrong = FIELD_EX32(tb_flags, TBFLAG_A32, FPCCR_S_WRONG);
     dc->v7m_new_fp_ctxt_needed =
         FIELD_EX32(tb_flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED);
+    dc->v7m_lspact = FIELD_EX32(tb_flags, TBFLAG_A32, LSPACT);
     dc->cp_regs = cpu->cp_regs;
     dc->features = env->features;
 
-- 
2.20.1

Implement the VLSTM instruction for v7M for the FPU present case.
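
The memory layout that the v7m_vlstm helper writes (the same layout v7m_preserve_fp_state writes at FPCAR) can be sketched as a pair of address calculations: s0 to s15 at fptr + 4*n, the FPSCR at fptr + 0x40, then a skip of 8 bytes (the FPSCR slot and the word after it) so that s16 to s31 start at fptr + 0x48. The helper names and FP_FRAME_FPSCR_OFFSET are hypothetical; the alignment test mirrors the helper's fptr & 7 check, which raises an UNALIGNED UsageFault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of the FPSCR word from the start of the FP save area */
#define FP_FRAME_FPSCR_OFFSET 0x40u

/* VLSTM requires an 8-byte aligned frame pointer */
static bool fp_frame_aligned(uint32_t fptr)
{
    return (fptr & 7u) == 0;
}

/* Address at which single-precision register s<sreg> is stored */
static uint32_t fp_frame_sreg_addr(uint32_t fptr, unsigned sreg)
{
    assert(sreg < 32);
    uint32_t addr = fptr + 4u * sreg;

    if (sreg >= 16) {
        addr += 8u;  /* skip the FPSCR slot and the word following it */
    }
    return addr;
}
```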

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-25-peter.maydell@linaro.org
---
 target/arm/cpu.h       |  2 +
 target/arm/helper.h    |  2 +
 target/arm/helper.c    | 84 ++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate.c | 15 +++++++-
 4 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@
 #define EXCP_INVSTATE       18   /* v7M INVSTATE UsageFault */
 #define EXCP_STKOF          19   /* v8M STKOF UsageFault */
 #define EXCP_LAZYFP         20   /* v7M fault during lazy FP stacking */
+#define EXCP_LSERR          21   /* v8M LSERR SecureFault */
+#define EXCP_UNALIGNED      22   /* v7M UNALIGNED UsageFault */
 /* NB: add new EXCP_ defines to the array in arm_log_exception() too */
 
 #define ARMV7M_EXCP_RESET   1
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(v7m_tt, i32, env, i32, i32)
 
 DEF_HELPER_1(v7m_preserve_fp_state, void, env)
 
+DEF_HELPER_2(v7m_vlstm, void, env, i32)
+
 DEF_HELPER_2(v8m_stackcheck, void, env, i32)
 
 DEF_HELPER_4(access_check_cp_reg, void, env, ptr, i32, i32)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
     g_assert_not_reached();
 }
 
+void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
+{
+    /* translate.c should never generate calls here in user-only mode */
+    g_assert_not_reached();
+}
+
 uint32_t HELPER(v7m_tt)(CPUARMState *env, uint32_t addr, uint32_t op)
 {
     /* The TT instructions can be used by unprivileged code, but in
@@ -XXX,XX +XXX,XX @@ static void v7m_update_fpccr(CPUARMState *env, uint32_t frameptr,
     }
 }
 
+void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
+{
+    /* fptr is the value of Rn, the frame pointer we store the FP regs to */
+    bool s = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
+    bool lspact = env->v7m.fpccr[s] & R_V7M_FPCCR_LSPACT_MASK;
+
+    assert(env->v7m.secure);
+
+    if (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)) {
+        return;
+    }
+
+    /* Check access to the coprocessor is permitted */
+    if (!v7m_cpacr_pass(env, true, arm_current_el(env) != 0)) {
+        raise_exception_ra(env, EXCP_NOCP, 0, 1, GETPC());
+    }
+
+    if (lspact) {
+        /* LSPACT should not be active when there is active FP state */
+        raise_exception_ra(env, EXCP_LSERR, 0, 1, GETPC());
+    }
+
+    if (fptr & 7) {
+        raise_exception_ra(env, EXCP_UNALIGNED, 0, 1, GETPC());
+    }
+
+    /*
+     * Note that we do not use v7m_stack_write() here, because the
+     * accesses should not set the FSR bits for stacking errors if they
+     * fail. (In pseudocode terms, they are AccType_NORMAL, not AccType_STACK
+     * or AccType_LAZYFP). Faults in cpu_stl_data() will throw exceptions
+     * and longjmp out.
+     */
+    if (!(env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPEN_MASK)) {
+        bool ts = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_TS_MASK;
+        int i;
+
+        for (i = 0; i < (ts ? 32 : 16); i += 2) {
+            uint64_t dn = *aa32_vfp_dreg(env, i / 2);
+            uint32_t faddr = fptr + 4 * i;
+            uint32_t slo = extract64(dn, 0, 32);
+            uint32_t shi = extract64(dn, 32, 32);
+
+            if (i >= 16) {
+                faddr += 8; /* skip the slot for the FPSCR */
+            }
+            cpu_stl_data(env, faddr, slo);
+            cpu_stl_data(env, faddr + 4, shi);
+        }
+        cpu_stl_data(env, fptr + 0x40, vfp_get_fpscr(env));
+
+        /*
+         * If TS is 0 then s0 to s15 and FPSCR are UNKNOWN; we choose to
+         * leave them unchanged, matching our choice in v7m_preserve_fp_state.
+         */
+        if (ts) {
+            for (i = 0; i < 32; i += 2) {
+                *aa32_vfp_dreg(env, i / 2) = 0;
+            }
+            vfp_set_fpscr(env, 0);
+        }
+    } else {
+        v7m_update_fpccr(env, fptr, false);
+    }
+
+    env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_FPCA_MASK;
+}
+
 static bool v7m_push_stack(ARMCPU *cpu)
 {
     /* Do the "set up stack frame" part of exception entry,
@@ -XXX,XX +XXX,XX @@ static void arm_log_exception(int idx)
             [EXCP_INVSTATE] = "v7M INVSTATE UsageFault",
             [EXCP_STKOF] = "v8M STKOF UsageFault",
             [EXCP_LAZYFP] = "v7M exception during lazy FP stacking",
+            [EXCP_LSERR] = "v8M LSERR UsageFault",
+            [EXCP_UNALIGNED] = "v7M UNALIGNED UsageFault",
         };
 
         if (idx >= 0 && idx < ARRAY_SIZE(excnames)) {
@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
         armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
         env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_STKOF_MASK;
         break;
+    case EXCP_LSERR:
+        armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SECURE, false);
+        env->v7m.sfsr |= R_V7M_SFSR_LSERR_MASK;
+        break;
+    case EXCP_UNALIGNED:
+        armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
+        env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_UNALIGNED_MASK;
+        break;
     case EXCP_SWI:
         /* The PC already points to the next instruction.  */
         armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SVC, env->v7m.secure);
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
                 if (!s->v8m_secure || (insn & 0x0040f0ff)) {
                     goto illegal_op;
                 }
-                /* Just NOP since FP support is not implemented */
+
+                if (arm_dc_feature(s, ARM_FEATURE_VFP)) {
+                    TCGv_i32 fptr = load_reg(s, rn);
+
+                    if (extract32(insn, 20, 1)) {
+                        /* VLLDM */
+                    } else {
+                        gen_helper_v7m_vlstm(cpu_env, fptr);
+                    }
+                    tcg_temp_free_i32(fptr);
+
+                    /* End the TB, because we have updated FP control bits */
+                    s->base.is_jmp = DISAS_UPDATE;
+                }
                 break;
             }
             if (arm_dc_feature(s, ARM_FEATURE_VFP) &&
-- 
2.20.1

Implement the VLLDM instruction for v7M for the FPU present case.
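As a rough illustration of the frame layout the VLLDM helper added by this patch reads, here is a standalone sketch (not QEMU code). It assumes the offsets the helper uses: S0-S15 at fptr+0x00..0x3C, FPSCR at fptr+0x40, a reserved word at 0x44, and S16-S31 from fptr+0x48, which are only loaded when FPCCR.TS is set. The names frame_offset and load_dreg are hypothetical. Guest memory is modelled as a plain uint32_t array, and each 64-bit D register is rebuilt from two 32-bit words, with the low word at the lower address, as the helper's `(uint64_t) shi << 32 | slo` does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the lazy FP frame read by the VLLDM helper.
 * Returns the byte offset from fptr of the low word of the register
 * pair S<i>/S<i+1>; i is even (0, 2, ..., 30), like the helper's loop.
 */
static uint32_t frame_offset(int i)
{
    uint32_t off = 4 * (uint32_t)i;

    if (i >= 16) {
        off += 8; /* skip the FPSCR slot (0x40) and the reserved word (0x44) */
    }
    return off;
}

/*
 * Rebuild D<i/2> from the frame, modelled as an array of 32-bit words
 * indexed by (byte offset / 4). The low word is at the lower address.
 */
static uint64_t load_dreg(const uint32_t *frame, int i)
{
    uint32_t idx;
    uint32_t slo, shi;

    assert(i >= 0 && i < 32 && (i & 1) == 0);
    idx = frame_offset(i) / 4;
    slo = frame[idx];
    shi = frame[idx + 1];
    return (uint64_t)shi << 32 | slo;
}
```

For example, frame_offset(16) is 0x48 and frame_offset(30) is 0x80, so a full TS=1 frame spans 0x88 bytes.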

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-26-peter.maydell@linaro.org
---
 target/arm/helper.h    |  1 +
 target/arm/helper.c    | 54 ++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate.c |  2 +-
 3 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(v7m_tt, i32, env, i32, i32)
 DEF_HELPER_1(v7m_preserve_fp_state, void, env)
 
 DEF_HELPER_2(v7m_vlstm, void, env, i32)
+DEF_HELPER_2(v7m_vlldm, void, env, i32)
 
 DEF_HELPER_2(v8m_stackcheck, void, env, i32)
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
     g_assert_not_reached();
 }
 
+void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
+{
+    /* translate.c should never generate calls here in user-only mode */
+    g_assert_not_reached();
+}
+
 uint32_t HELPER(v7m_tt)(CPUARMState *env, uint32_t addr, uint32_t op)
 {
     /* The TT instructions can be used by unprivileged code, but in
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
     env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_FPCA_MASK;
 }
 
+void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
+{
+    /* fptr is the value of Rn, the frame pointer we load the FP regs from */
+    assert(env->v7m.secure);
+
+    if (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)) {
+        return;
+    }
+
+    /* Check access to the coprocessor is permitted */
+    if (!v7m_cpacr_pass(env, true, arm_current_el(env) != 0)) {
+        raise_exception_ra(env, EXCP_NOCP, 0, 1, GETPC());
+    }
+
+    if (env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPACT_MASK) {
+        /* State in FP is still valid */
+        env->v7m.fpccr[M_REG_S] &= ~R_V7M_FPCCR_LSPACT_MASK;
+    } else {
+        bool ts = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_TS_MASK;
+        int i;
+        uint32_t fpscr;
+
+        if (fptr & 7) {
+            raise_exception_ra(env, EXCP_UNALIGNED, 0, 1, GETPC());
+        }
+
+        for (i = 0; i < (ts ? 32 : 16); i += 2) {
+            uint32_t slo, shi;
+            uint64_t dn;
+            uint32_t faddr = fptr + 4 * i;
+
+            if (i >= 16) {
+                faddr += 8; /* skip the slot for the FPSCR */
+            }
+
+            slo = cpu_ldl_data(env, faddr);
+            shi = cpu_ldl_data(env, faddr + 4);
+
+            dn = (uint64_t) shi << 32 | slo;
+            *aa32_vfp_dreg(env, i / 2) = dn;
+        }
+        fpscr = cpu_ldl_data(env, fptr + 0x40);
+        vfp_set_fpscr(env, fpscr);
+    }
+
+    env->v7m.control[M_REG_S] |= R_V7M_CONTROL_FPCA_MASK;
+}
+
 static bool v7m_push_stack(ARMCPU *cpu)
 {
     /* Do the "set up stack frame" part of exception entry,
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
                     TCGv_i32 fptr = load_reg(s, rn);
 
                     if (extract32(insn, 20, 1)) {
-                        /* VLLDM */
+                        gen_helper_v7m_vlldm(cpu_env, fptr);
                     } else {
                         gen_helper_v7m_vlstm(cpu_env, fptr);
                     }
-- 
2.20.1

Enable the FPU by default for the Cortex-M4 and Cortex-M33.
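To make the MVFR0 value 0x10110021 that this patch programs for both cores easier to read, here is a minimal decoding sketch. The field positions follow my reading of the M-profile Architecture Reference Manual and should be checked against it. field32, struct mvfr0_fields and decode_mvfr0 are hypothetical names rather than QEMU APIs, although field32 has the same semantics as QEMU's extract32().

```c
#include <assert.h>
#include <stdint.h>

/* Extract LENGTH bits of VALUE starting at bit START (like QEMU's extract32()). */
static uint32_t field32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Hypothetical decoded view of a subset of the MVFR0 ID register. */
struct mvfr0_fields {
    uint32_t simdreg;  /* bits [3:0]:   FP register bank size */
    uint32_t fpsp;     /* bits [7:4]:   single-precision support */
    uint32_t fpdp;     /* bits [11:8]:  double-precision support, 0 = none */
    uint32_t fpdivide; /* bits [19:16]: hardware VDIV */
    uint32_t fpsqrt;   /* bits [23:20]: hardware VSQRT */
    uint32_t fpround;  /* bits [31:28]: rounding-mode support */
};

static struct mvfr0_fields decode_mvfr0(uint32_t mvfr0)
{
    struct mvfr0_fields f;

    f.simdreg = field32(mvfr0, 0, 4);
    f.fpsp = field32(mvfr0, 4, 4);
    f.fpdp = field32(mvfr0, 8, 4);
    f.fpdivide = field32(mvfr0, 16, 4);
    f.fpsqrt = field32(mvfr0, 20, 4);
    f.fpround = field32(mvfr0, 28, 4);
    return f;
}
```

Decoding 0x10110021 gives SIMDReg=1, FPSP=2, FPDP=0, FPDivide=1, FPSqrt=1 and FPRound=1, which is consistent with a single-precision-only FPU on both cores.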

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190416125744.27770-27-peter.maydell@linaro.org
---
 target/arm/cpu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void cortex_m4_initfn(Object *obj)
     set_feature(&cpu->env, ARM_FEATURE_M);
     set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
     set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
+    set_feature(&cpu->env, ARM_FEATURE_VFP4);
     cpu->midr = 0x410fc240; /* r0p0 */
     cpu->pmsav7_dregion = 8;
+    cpu->isar.mvfr0 = 0x10110021;
+    cpu->isar.mvfr1 = 0x11000011;
+    cpu->isar.mvfr2 = 0x00000000;
     cpu->id_pfr0 = 0x00000030;
     cpu->id_pfr1 = 0x00000200;
     cpu->id_dfr0 = 0x00100000;
@@ -XXX,XX +XXX,XX @@ static void cortex_m33_initfn(Object *obj)
     set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
     set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
     set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
+    set_feature(&cpu->env, ARM_FEATURE_VFP4);
     cpu->midr = 0x410fd213; /* r0p3 */
     cpu->pmsav7_dregion = 16;
     cpu->sau_sregion = 8;
+    cpu->isar.mvfr0 = 0x10110021;
+    cpu->isar.mvfr1 = 0x11000011;
+    cpu->isar.mvfr2 = 0x00000040;
     cpu->id_pfr0 = 0x00000030;
     cpu->id_pfr1 = 0x00000210;
     cpu->id_dfr0 = 0x00200000;
-- 
2.20.1