[XEN v4] xen/arm64: io: Decode ldr/str post-indexing instructions

Ayan Kumar Halder posted 1 patch 2 years, 2 months ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20220125211808.23810-1-ayankuma@xilinx.com
There is a newer version of this series
xen/arch/arm/decode.c | 124 +++++++++++++++++++++++++++++++++++++++++-
xen/arch/arm/decode.h |  41 +++++++++++++-
xen/arch/arm/io.c     |  41 +++++++++++---
3 files changed, 195 insertions(+), 11 deletions(-)
[XEN v4] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 2 months ago
At the moment, Xen is only handling data abort with valid syndrome (i.e.
ISV=0). Unfortunately, this doesn't cover all the instructions a domain
could use to access MMIO regions.

For instance, a baremetal OS can use any of the following instructions, where
x1 contains the address of the MMIO region:

1.      ldr     x2,    [x1],    #8
2.      ldr     w2,    [x1],    #-4
3.      ldr     x2,    [x1],    #-8
4.      ldr     w2,    [x1],    #4
5.      ldrh    w2,    [x1],    #2
6.      ldrb    w2,    [x1],    #1
7.      str     x2,    [x1],    #8
8.      str     w2,    [x1],    #-4
9.      strh    w2,    [x1],    #2
10.     strb    w2,    [x1],    #1

In the following two instructions, Rn could theoretically be stack pointer which
might contain the address of the MMIO region:-
11.     ldrb    w2,    [Rn],    #1
12.     ldrb    wzr,   [Rn],    #1

In order to handle post-indexing store/load instructions (like those mentioned
above), Xen will need to fetch and decode the instruction.

This patch only cover post-index store/load instructions from AArch64 mode.
For now, this is left unimplemented for trap from AArch32 mode.

Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
---

Changelog :-
v2 - 1. Updated the rn register after reading from it. (Pointed by Julien,
        Stefano)
     2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
     3. Fixed coding style issues (Pointed by Julien)
     4. In the previous patch, I was updating dabt->sign based on the signedness
        of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b,
        Page 3221, SSE indicates the signedness of the data item loaded. In our
        case, the data item loaded is always unsigned.

v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit variants).
       Thus, I have removed the check for "instr->code.opc == 0" (Suggested by
       Andre)
    2. Handled the scenario when rn = SP, rt = XZR (Suggested by Jan, Andre)
    3. Added restriction for "rt != rn" (Suggested by Andre)
    4. Moved union ldr_str_instr_class {} to decode.h. This is the header included
       by io.c and decode.c (where the union is referred). (Suggested by Jan)
    5. Indentation and typo fixes (Suggested by Jan)

v4- 1. Fixed the patch as per Stefano's comments on v3. They are as follows :-
        1.1 Use macros to determine the fixed values in the instruction opcode
        1.2 Checked if instr != NULL
        1.3 Changed some data types and added #define ARM_64 for AArch64 specific
            code 
        1.4 Moved post_increment_register() to decode.c so that the decoding
            logic is confined to a single file.
        1.5 Moved some checks from post_increment_register() to
            decode_loadstore_postindexing()
        1.6 Removed a duplicate check
    2. Updated the commit message as per Andre's comments.
    3. Changed the names of a label and some comments. *32bit* was erroneously
       mentioned in a label and comments in decode_loadstore_postindexing()
       although the function handled all variants of ldr/str post indexing.

 xen/arch/arm/decode.c | 124 +++++++++++++++++++++++++++++++++++++++++-
 xen/arch/arm/decode.h |  41 +++++++++++++-
 xen/arch/arm/io.c     |  41 +++++++++++---
 3 files changed, 195 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
index 792c2e92a7..0c12af7afa 100644
--- a/xen/arch/arm/decode.c
+++ b/xen/arch/arm/decode.c
@@ -84,6 +84,101 @@ bad_thumb2:
     return 1;
 }
 
+static int decode_loadstore_postindexing(register_t pc,
+                                         struct hsr_dabt *dabt,
+                                         union ldr_str_instr_class *instr)
+{
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+
+    if ( instr == NULL )
+    {
+        gprintk(XENLOG_ERR, "instr should not be NULL\n");
+        return -EINVAL;
+    }
+
+    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
+    {
+        gprintk(XENLOG_ERR, "Could not copy the instruction from PC\n");
+        return -EFAULT;
+    }
+
+    /*
+     * Rn -ne Rt for ldr/str instruction.
+     * Check https://developer.arm.com/documentation/dui0802/a/CIHGJHED
+     * (Register restrictions)
+     *
+     * The only exception for this is when rn = 31. It denotes SP ("Use of SP")
+     *
+     * And when rt = 31, it denotes wzr/xzr. (Refer
+     * https://developer.arm.com/documentation/den0024/a/ARMv8-Registers/AArch64-special-registers
+     * "There is no register called X31 or W31. Many instructions are encoded
+     * such that the number 31 represents the zero register, ZR (WZR/XZR)."
+     */
+    if ( (instr->code.rn == instr->code.rt) && (instr->code.rn != 31) )
+    {
+        gprintk(XENLOG_ERR, "Rn should not be equal to Rt except for r31\n");
+        return -EINVAL;
+    }
+
+    /* First, let's check for the fixed values */
+    if ( (instr->value & POST_INDEX_FIXED_MASK) != POST_INDEX_FIXED_VALUE )
+    {
+        gprintk(XENLOG_ERR, "Cannot decode instruction 0x%x",instr->value);
+        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
+            " ldr/str post indexing\n");
+        goto bad_loadstore;
+    }
+
+    /*
+     * Handle when rn = SP
+     * Refer ArmV8 ARM DDI 0487G.b, Page - D1-2463 "Stack pointer register selection"
+     * As we are interested in handling exceptions only from EL1 in AArch64 state,
+     * thus M[3:0] == EL1h (Page - C5-480 "When exception taken from AArch64 state:")
+     */
+    if ( (instr->code.rn == 31) && ((regs->cpsr & PSR_MODE_MASK) != PSR_MODE_EL1h) )
+    {
+        gprintk(XENLOG_ERR, "SP is valid only for EL1h\n");
+        goto bad_loadstore;
+    }
+
+    if ( instr->code.v != 0 )
+    {
+        gprintk(XENLOG_ERR,
+            "ldr/str post indexing for vector types are not supported\n");
+        goto bad_loadstore;
+    }
+
+    /* Check for STR (immediate) */
+    if ( instr->code.opc == 0 )
+    {
+        dabt->write = 1;
+    }
+    /* Check for LDR (immediate) */
+    else if ( instr->code.opc == 1 )
+    {
+        dabt->write = 0;
+    }
+    else
+    {
+        gprintk(XENLOG_ERR,
+            "Decoding ldr/str post indexing is not supported for this variant\n");
+        goto bad_loadstore;
+    }
+
+    gprintk(XENLOG_INFO,
+        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
+        instr->code.rt, instr->code.size, instr->code.imm9);
+
+    update_dabt(dabt, instr->code.rt, instr->code.size, false);
+    dabt->valid = 1;
+
+    return 0;
+
+ bad_loadstore:
+    gprintk(XENLOG_ERR, "unhandled Arm instruction 0x%x\n", instr->value);
+    return 1;
+}
+
 static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
 {
     uint16_t instr;
@@ -150,17 +245,44 @@ bad_thumb:
     return 1;
 }
 
-int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
+int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
+                       union ldr_str_instr_class *instr)
 {
     if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
         return decode_thumb(regs->pc, dabt);
 
+    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
+    {
+        return decode_loadstore_postindexing(regs->pc, dabt, instr);
+    }
+
     /* TODO: Handle ARM instruction */
     gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
 
     return 1;
 }
 
+#if CONFIG_ARM_64
+void post_increment_register(union ldr_str_instr_class *instr)
+{
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    register_t val;
+
+    /* handle when rn = SP */
+    if ( instr->code.rn == 31 )
+        val = regs->sp_el1;
+    else
+        val = get_user_reg(regs, instr->code.rn);
+
+    val += instr->code.imm9;
+
+    if ( instr->code.rn == 31 )
+        regs->sp_el1 = val;
+    else
+        set_user_reg(regs, instr->code.rn, val);
+}
+#endif
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
index 4613763bdb..511cd4a05f 100644
--- a/xen/arch/arm/decode.h
+++ b/xen/arch/arm/decode.h
@@ -23,6 +23,35 @@
 #include <asm/regs.h>
 #include <asm/processor.h>
 
+/*
+ * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
+ * Page 318 specifies the following bit pattern for
+ * "load/store register (immediate post-indexed)".
+ *
+ * 31 30 29  27 26 25  23   21 20              11   9         4       0
+ * ___________________________________________________________________
+ * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
+ * |____|______|__|____|____|__|_______________|____|_________|_______|
+ */
+union ldr_str_instr_class {
+    uint32_t value;
+    struct ldr_str {
+        unsigned int rt:5;     /* Rt register */
+        unsigned int rn:5;     /* Rn register */
+        unsigned int fixed1:2; /* value == 01b */
+        signed int imm9:9;            /* imm9 */
+        unsigned int fixed2:1; /* value == 0b */
+        unsigned int opc:2;    /* opc */
+        unsigned int fixed3:2; /* value == 00b */
+        unsigned int v:1;      /* vector */
+        unsigned int fixed4:3; /* value == 111b */
+        unsigned int size:2;   /* size */
+    } code;
+};
+
+#define POST_INDEX_FIXED_MASK   0x3B200C00
+#define POST_INDEX_FIXED_VALUE  0x38000400
+
 /**
  * Decode an instruction from pc
  * /!\ This function is not intended to fully decode an instruction. It
@@ -35,8 +64,18 @@
  */
 
 int decode_instruction(const struct cpu_user_regs *regs,
-                       struct hsr_dabt *dabt);
+                       struct hsr_dabt *dabt,
+                       union ldr_str_instr_class *instr);
 
+/**
+ * Update the register value for Rn
+ * /!\ This function is used to update the register value for Rn when a
+ * post indexing ldr/str instruction is decoded.
+ *
+ * This function will get:
+ * - The post indexing ldr/str instruction opcode
+ */
+void post_increment_register(union ldr_str_instr_class *instr);
 #endif /* __ARCH_ARM_DECODE_H_ */
 
 /*
diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index 729287e37c..b9c15e1fe7 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -106,14 +106,29 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
         .gpa = gpa,
         .dabt = dabt
     };
+    int rc;
+    union ldr_str_instr_class instr = {0};
 
     ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
 
+    /*
+     * Armv8 processor does not provide a valid syndrome for post-indexing
+     * ldr/str instructions. So in order to process these instructions,
+     * Xen must decode them.
+     */
+    if ( !info.dabt.valid )
+    {
+        rc = decode_instruction(regs, &info.dabt, &instr);
+        if ( rc )
+        {
+            gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
+            return IO_ABORT;
+        }
+    }
+
     handler = find_mmio_handler(v->domain, info.gpa);
     if ( !handler )
     {
-        int rc;
-
         rc = try_fwd_ioserv(regs, v, &info);
         if ( rc == IO_HANDLED )
             return handle_ioserv(regs, v);
@@ -121,10 +136,6 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
         return rc;
     }
 
-    /* All the instructions used on emulated MMIO region should be valid */
-    if ( !dabt.valid )
-        return IO_ABORT;
-
     /*
      * Erratum 766422: Thumb store translation fault to Hypervisor may
      * not have correct HSR Rt value.
@@ -134,7 +145,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
     {
         int rc;
 
-        rc = decode_instruction(regs, &info.dabt);
+        rc = decode_instruction(regs, &info.dabt, NULL);
         if ( rc )
         {
             gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
@@ -143,9 +154,21 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
     }
 
     if ( info.dabt.write )
-        return handle_write(handler, v, &info);
+        rc = handle_write(handler, v, &info);
     else
-        return handle_read(handler, v, &info);
+        rc = handle_read(handler, v, &info);
+
+#if CONFIG_ARM_64
+    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
+    {
+        if ( instr.value != 0 )
+        {
+            post_increment_register(&instr);
+        }
+    }
+#endif
+
+    return rc;
 }
 
 void register_mmio_handler(struct domain *d,
-- 
2.17.1


Re: [XEN v4] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Stefano Stabellini 2 years, 2 months ago
On Tue, 25 Jan 2022, Ayan Kumar Halder wrote:
> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> could use to access MMIO regions.
> 
> For instance, a baremetal OS can use any of the following instructions, where
> x1 contains the address of the MMIO region:
> 
> 1.      ldr     x2,    [x1],    #8
> 2.      ldr     w2,    [x1],    #-4
> 3.      ldr     x2,    [x1],    #-8
> 4.      ldr     w2,    [x1],    #4
> 5.      ldrh    w2,    [x1],    #2
> 6.      ldrb    w2,    [x1],    #1
> 7.      str     x2,    [x1],    #8
> 8.      str     w2,    [x1],    #-4
> 9.      strh    w2,    [x1],    #2
> 10.     strb    w2,    [x1],    #1
> 
> In the following two instructions, Rn could theoretically be stack pointer which
> might contain the address of the MMIO region:-
> 11.     ldrb    w2,    [Rn],    #1
> 12.     ldrb    wzr,   [Rn],    #1
> 
> In order to handle post-indexing store/load instructions (like those mentioned
> above), Xen will need to fetch and decode the instruction.
> 
> This patch only cover post-index store/load instructions from AArch64 mode.
> For now, this is left unimplemented for trap from AArch32 mode.

NIT: "For now, AArch32 mode is left unimplemented."


> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
> ---
> 
> Changelog :-
> v2 - 1. Updated the rn register after reading from it. (Pointed by Julien,
>         Stefano)
>      2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
>      3. Fixed coding style issues (Pointed by Julien)
>      4. In the previous patch, I was updating dabt->sign based on the signedness
>         of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b,
>         Page 3221, SSE indicates the signedness of the data item loaded. In our
>         case, the data item loaded is always unsigned.
> 
> v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit variants).
>        Thus, I have removed the check for "instr->code.opc == 0" (Suggested by
>        Andre)
>     2. Handled the scenario when rn = SP, rt = XZR (Suggested by Jan, Andre)
>     3. Added restriction for "rt != rn" (Suggested by Andre)
>     4. Moved union ldr_str_instr_class {} to decode.h. This is the header included
>        by io.c and decode.c (where the union is referred). (Suggested by Jan)
>     5. Indentation and typo fixes (Suggested by Jan)
> 
> v4- 1. Fixed the patch as per Stefano's comments on v3. They are as follows :-
>         1.1 Use macros to determine the fixed values in the instruction opcode
>         1.2 Checked if instr != NULL
>         1.3 Changed some data types and added #define ARM_64 for AArch64 specific
>             code 
>         1.4 Moved post_increment_register() to decode.c so that the decoding
>             logic is confined to a single file.
>         1.5 Moved some checks from post_increment_register() to
>             decode_loadstore_postindexing()
>         1.6 Removed a duplicate check
>     2. Updated the commit message as per Andre's comments.
>     3. Changed the names of a label and some comments. *32bit* was erroneously
>        mentioned in a label and comments in decode_loadstore_postindexing()
>        although the function handled all variants of ldr/str post indexing.
> 
>  xen/arch/arm/decode.c | 124 +++++++++++++++++++++++++++++++++++++++++-
>  xen/arch/arm/decode.h |  41 +++++++++++++-
>  xen/arch/arm/io.c     |  41 +++++++++++---
>  3 files changed, 195 insertions(+), 11 deletions(-)
> 
> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
> index 792c2e92a7..0c12af7afa 100644
> --- a/xen/arch/arm/decode.c
> +++ b/xen/arch/arm/decode.c
> @@ -84,6 +84,101 @@ bad_thumb2:
>      return 1;
>  }
>  
> +static int decode_loadstore_postindexing(register_t pc,
> +                                         struct hsr_dabt *dabt,
> +                                         union ldr_str_instr_class *instr)
> +{
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +
> +    if ( instr == NULL )
> +    {
> +        gprintk(XENLOG_ERR, "instr should not be NULL\n");
> +        return -EINVAL;
> +    }
> +
> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
> +    {
> +        gprintk(XENLOG_ERR, "Could not copy the instruction from PC\n");
> +        return -EFAULT;
> +    }
> +
> +    /*
> +     * Rn -ne Rt for ldr/str instruction.
> +     * Check https://developer.arm.com/documentation/dui0802/a/CIHGJHED
> +     * (Register restrictions)
> +     *
> +     * The only exception for this is when rn = 31. It denotes SP ("Use of SP")
> +     *
> +     * And when rt = 31, it denotes wzr/xzr. (Refer
> +     * https://developer.arm.com/documentation/den0024/a/ARMv8-Registers/AArch64-special-registers
> +     * "There is no register called X31 or W31. Many instructions are encoded
> +     * such that the number 31 represents the zero register, ZR (WZR/XZR)."
> +     */
> +    if ( (instr->code.rn == instr->code.rt) && (instr->code.rn != 31) )
> +    {
> +        gprintk(XENLOG_ERR, "Rn should not be equal to Rt except for r31\n");
> +        return -EINVAL;
> +    }
> +
> +    /* First, let's check for the fixed values */
> +    if ( (instr->value & POST_INDEX_FIXED_MASK) != POST_INDEX_FIXED_VALUE )
> +    {
> +        gprintk(XENLOG_ERR, "Cannot decode instruction 0x%x",instr->value);
> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
> +            " ldr/str post indexing\n");
> +        goto bad_loadstore;
> +    }
> +
> +    /*
> +     * Handle when rn = SP
> +     * Refer ArmV8 ARM DDI 0487G.b, Page - D1-2463 "Stack pointer register selection"
> +     * As we are interested in handling exceptions only from EL1 in AArch64 state,
> +     * thus M[3:0] == EL1h (Page - C5-480 "When exception taken from AArch64 state:")
> +     */
> +    if ( (instr->code.rn == 31) && ((regs->cpsr & PSR_MODE_MASK) != PSR_MODE_EL1h) )
> +    {
> +        gprintk(XENLOG_ERR, "SP is valid only for EL1h\n");
> +        goto bad_loadstore;
> +    }
> +
> +    if ( instr->code.v != 0 )
> +    {
> +        gprintk(XENLOG_ERR,
> +            "ldr/str post indexing for vector types are not supported\n");
> +        goto bad_loadstore;
> +    }
> +
> +    /* Check for STR (immediate) */
> +    if ( instr->code.opc == 0 )
> +    {
> +        dabt->write = 1;
> +    }
> +    /* Check for LDR (immediate) */
> +    else if ( instr->code.opc == 1 )
> +    {
> +        dabt->write = 0;
> +    }
> +    else
> +    {
> +        gprintk(XENLOG_ERR,
> +            "Decoding ldr/str post indexing is not supported for this variant\n");
> +        goto bad_loadstore;
> +    }
> +
> +    gprintk(XENLOG_INFO,
> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
> +        instr->code.rt, instr->code.size, instr->code.imm9);
> +
> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
> +    dabt->valid = 1;
> +
> +    return 0;
> +
> + bad_loadstore:
> +    gprintk(XENLOG_ERR, "unhandled Arm instruction 0x%x\n", instr->value);
> +    return 1;
> +}
> +
>  static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
>  {
>      uint16_t instr;
> @@ -150,17 +245,44 @@ bad_thumb:
>      return 1;
>  }
>  
> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
> +                       union ldr_str_instr_class *instr)
>  {
>      if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
>          return decode_thumb(regs->pc, dabt);
>  
> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
> +    {
> +        return decode_loadstore_postindexing(regs->pc, dabt, instr);
> +    }
> +
>      /* TODO: Handle ARM instruction */
>      gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
>  
>      return 1;
>  }
>  
> +#if CONFIG_ARM_64
> +void post_increment_register(union ldr_str_instr_class *instr)
> +{
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    register_t val;
> +
> +    /* handle when rn = SP */
> +    if ( instr->code.rn == 31 )
> +        val = regs->sp_el1;
> +    else
> +        val = get_user_reg(regs, instr->code.rn);
> +
> +    val += instr->code.imm9;
> +
> +    if ( instr->code.rn == 31 )
> +        regs->sp_el1 = val;
> +    else
> +        set_user_reg(regs, instr->code.rn, val);
> +}
> +#endif
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
> index 4613763bdb..511cd4a05f 100644
> --- a/xen/arch/arm/decode.h
> +++ b/xen/arch/arm/decode.h
> @@ -23,6 +23,35 @@
>  #include <asm/regs.h>
>  #include <asm/processor.h>
>  
> +/*
> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
> + * Page 318 specifies the following bit pattern for
> + * "load/store register (immediate post-indexed)".
> + *
> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> + * ___________________________________________________________________
> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> + * |____|______|__|____|____|__|_______________|____|_________|_______|
> + */
> +union ldr_str_instr_class {
> +    uint32_t value;
> +    struct ldr_str {
> +        unsigned int rt:5;     /* Rt register */
> +        unsigned int rn:5;     /* Rn register */
> +        unsigned int fixed1:2; /* value == 01b */
> +        signed int imm9:9;            /* imm9 */
> +        unsigned int fixed2:1; /* value == 0b */
> +        unsigned int opc:2;    /* opc */
> +        unsigned int fixed3:2; /* value == 00b */
> +        unsigned int v:1;      /* vector */
> +        unsigned int fixed4:3; /* value == 111b */
> +        unsigned int size:2;   /* size */
> +    } code;
> +};
> +
> +#define POST_INDEX_FIXED_MASK   0x3B200C00
> +#define POST_INDEX_FIXED_VALUE  0x38000400
> +
>  /**
>   * Decode an instruction from pc
>   * /!\ This function is not intended to fully decode an instruction. It
> @@ -35,8 +64,18 @@
>   */
>  
>  int decode_instruction(const struct cpu_user_regs *regs,
> -                       struct hsr_dabt *dabt);
> +                       struct hsr_dabt *dabt,
> +                       union ldr_str_instr_class *instr);
>  
> +/**

NIT: the Xen coding style only has /*, not /**

In any case aside from these two minor NITs that can be fixed on commit:

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> + * Update the register value for Rn
> + * /!\ This function is used to update the register value for Rn when a
> + * post indexing ldr/str instruction is decoded.
> + *
> + * This function will get:
> + * - The post indexing ldr/str instruction opcode
> + */
> +void post_increment_register(union ldr_str_instr_class *instr);
>  #endif /* __ARCH_ARM_DECODE_H_ */
>  
>  /*
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index 729287e37c..b9c15e1fe7 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -106,14 +106,29 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>          .gpa = gpa,
>          .dabt = dabt
>      };
> +    int rc;
> +    union ldr_str_instr_class instr = {0};
>  
>      ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>  
> +    /*
> +     * Armv8 processor does not provide a valid syndrome for post-indexing
> +     * ldr/str instructions. So in order to process these instructions,
> +     * Xen must decode them.
> +     */
> +    if ( !info.dabt.valid )
> +    {
> +        rc = decode_instruction(regs, &info.dabt, &instr);
> +        if ( rc )
> +        {
> +            gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> +            return IO_ABORT;
> +        }
> +    }
> +
>      handler = find_mmio_handler(v->domain, info.gpa);
>      if ( !handler )
>      {
> -        int rc;
> -
>          rc = try_fwd_ioserv(regs, v, &info);
>          if ( rc == IO_HANDLED )
>              return handle_ioserv(regs, v);
> @@ -121,10 +136,6 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>          return rc;
>      }
>  
> -    /* All the instructions used on emulated MMIO region should be valid */
> -    if ( !dabt.valid )
> -        return IO_ABORT;
> -
>      /*
>       * Erratum 766422: Thumb store translation fault to Hypervisor may
>       * not have correct HSR Rt value.
> @@ -134,7 +145,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>      {
>          int rc;
>  
> -        rc = decode_instruction(regs, &info.dabt);
> +        rc = decode_instruction(regs, &info.dabt, NULL);
>          if ( rc )
>          {
>              gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> @@ -143,9 +154,21 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>      }
>  
>      if ( info.dabt.write )
> -        return handle_write(handler, v, &info);
> +        rc = handle_write(handler, v, &info);
>      else
> -        return handle_read(handler, v, &info);
> +        rc = handle_read(handler, v, &info);
> +
> +#if CONFIG_ARM_64
> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
> +    {
> +        if ( instr.value != 0 )
> +        {
> +            post_increment_register(&instr);
> +        }
> +    }
> +#endif
> +
> +    return rc;
>  }
>  
>  void register_mmio_handler(struct domain *d,
> -- 
> 2.17.1
> 

Re: [XEN v4] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Julien Grall 2 years, 2 months ago
Hi,

On 25/01/2022 21:18, Ayan Kumar Halder wrote:
> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> could use to access MMIO regions.
> 
> For instance, a baremetal OS can use any of the following instructions, where
> x1 contains the address of the MMIO region:
> 
> 1.      ldr     x2,    [x1],    #8
> 2.      ldr     w2,    [x1],    #-4
> 3.      ldr     x2,    [x1],    #-8
> 4.      ldr     w2,    [x1],    #4
> 5.      ldrh    w2,    [x1],    #2
> 6.      ldrb    w2,    [x1],    #1
> 7.      str     x2,    [x1],    #8
> 8.      str     w2,    [x1],    #-4
> 9.      strh    w2,    [x1],    #2
> 10.     strb    w2,    [x1],    #1
> 
> In the following two instructions, Rn could theoretically be stack pointer which
> might contain the address of the MMIO region:-
> 11.     ldrb    w2,    [Rn],    #1
> 12.     ldrb    wzr,   [Rn],    #1
> 
> In order to handle post-indexing store/load instructions (like those mentioned
> above), Xen will need to fetch and decode the instruction.
> 
> This patch only cover post-index store/load instructions from AArch64 mode.
> For now, this is left unimplemented for trap from AArch32 mode.
> 
> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
> ---
> 
> Changelog :-
> v2 - 1. Updated the rn register after reading from it. (Pointed by Julien,
>          Stefano)
>       2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
>       3. Fixed coding style issues (Pointed by Julien)
>       4. In the previous patch, I was updating dabt->sign based on the signedness
>          of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b,
>          Page 3221, SSE indicates the signedness of the data item loaded. In our
>          case, the data item loaded is always unsigned.
> 
> v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit variants).
>         Thus, I have removed the check for "instr->code.opc == 0" (Suggested by
>         Andre)
>      2. Handled the scenario when rn = SP, rt = XZR (Suggested by Jan, Andre)
>      3. Added restriction for "rt != rn" (Suggested by Andre)
>      4. Moved union ldr_str_instr_class {} to decode.h. This is the header included
>         by io.c and decode.c (where the union is referred). (Suggested by Jan)
>      5. Indentation and typo fixes (Suggested by Jan)
> 
> v4- 1. Fixed the patch as per Stefano's comments on v3. They are as follows :-
>          1.1 Use macros to determine the fixed values in the instruction opcode
>          1.2 Checked if instr != NULL
>          1.3 Changed some data types and added #define ARM_64 for AArch64 specific
>              code
>          1.4 Moved post_increment_register() to decode.c so that the decoding
>              logic is confined to a single file.
>          1.5 Moved some checks from post_increment_register() to
>              decode_loadstore_postindexing()
>          1.6 Removed a duplicate check
>      2. Updated the commit message as per Andre's comments.
>      3. Changed the names of a label and some comments. *32bit* was erroneously
>         mentioned in a label and comments in decode_loadstore_postindexing()
>         although the function handled all variants of ldr/str post indexing.
> 
>   xen/arch/arm/decode.c | 124 +++++++++++++++++++++++++++++++++++++++++-
>   xen/arch/arm/decode.h |  41 +++++++++++++-
>   xen/arch/arm/io.c     |  41 +++++++++++---
>   3 files changed, 195 insertions(+), 11 deletions(-)
> 
> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
> index 792c2e92a7..0c12af7afa 100644
> --- a/xen/arch/arm/decode.c
> +++ b/xen/arch/arm/decode.c
> @@ -84,6 +84,101 @@ bad_thumb2:
>       return 1;
>   }
>   
> +static int decode_loadstore_postindexing(register_t pc,

This is only handling AArch64 instruction. So please add aarch64 (or 
arm64) in the name.

> +                                         struct hsr_dabt *dabt,
> +                                         union ldr_str_instr_class *instr)
> +{
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +
> +    if ( instr == NULL )

Wouldn't it be a programming error? If so, should it be ASSERT(...)?

> +    {
> +        gprintk(XENLOG_ERR, "instr should not be NULL\n");
> +        return -EINVAL;
> +    }
> +
> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
> +    {
> +        gprintk(XENLOG_ERR, "Could not copy the instruction from PC\n");
> +        return -EFAULT;
> +    }
> +
> +    /*
> +     * Rn -ne Rt for ldr/str instruction.
> +     * Check https://developer.arm.com/documentation/dui0802/a/CIHGJHED
> +     * (Register restrictions)

I find a bit odd that you are pointing to 3 different spec (Compiler, 
Cortex-A, Arm Arm). Can we please use only the Arm Arm? It should 
contain everything we need...

> +     *
> +     * The only exception for this is when rn = 31. It denotes SP ("Use of SP")
> +     *
> +     * And when rt = 31, it denotes wzr/xzr. (Refer
> +     * https://developer.arm.com/documentation/den0024/a/ARMv8-Registers/AArch64-special-registers

Same here.

Also, please avoid URL and use the document reference (e.g. ARM DDI 
0487F.c for the Arm Arm) are they are easy to find on google.

> +     * "There is no register called X31 or W31. Many instructions are encoded
> +     * such that the number 31 represents the zero register, ZR (WZR/XZR)."
> +     */
> +    if ( (instr->code.rn == instr->code.rt) && (instr->code.rn != 31) )
> +    {
> +        gprintk(XENLOG_ERR, "Rn should not be equal to Rt except for r31\n");
> +        return -EINVAL;
> +    }
> +
> +    /* First, let's check for the fixed values */
> +    if ( (instr->value & POST_INDEX_FIXED_MASK) != POST_INDEX_FIXED_VALUE )
> +    {
> +        gprintk(XENLOG_ERR, "Cannot decode instruction 0x%x",instr->value);
> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
> +            " ldr/str post indexing\n");

Please don't split the message. Instead, it should be:

gprintk(XENLOG_ERR,
         "...");

But I would combine the two messages and simply write:

"Decoding instruction 0x%x is not supported". Not need to say that we 
support only ldr/str post indexing.

> +        goto bad_loadstore;

I am a bit confused why some of the error path is using error and other 
goto. Can you clarify it?

> +    }
> +
> +    /*
> +     * Handle when rn = SP
> +     * Refer ArmV8 ARM DDI 0487G.b, Page - D1-2463 "Stack pointer register selection"
> +     * As we are interested in handling exceptions only from EL1 in AArch64 state,
> +     * thus M[3:0] == EL1h (Page - C5-480 "When exception taken from AArch64 state:")

I read the last sentence as "We only support decoding from instruction 
run at EL1". But I can't find a check guarantee that.

> +     */
> +    if ( (instr->code.rn == 31) && ((regs->cpsr & PSR_MODE_MASK) != PSR_MODE_EL1h) )
> +    {
> +        gprintk(XENLOG_ERR, "SP is valid only for EL1h\n");
> +        goto bad_loadstore;
> +    }
> +
> +    if ( instr->code.v != 0 )
> +    {
> +        gprintk(XENLOG_ERR,
> +            "ldr/str post indexing for vector types are not supported\n");
> +        goto bad_loadstore;
> +    }
> +
> +    /* Check for STR (immediate) */
> +    if ( instr->code.opc == 0 )
> +    {
> +        dabt->write = 1;
> +    }

Coding style: We don't use {} for single line. In this case, it would 
also result to have a more readable code.

> +    /* Check for LDR (immediate) */
> +    else if ( instr->code.opc == 1 )
> +    {
> +        dabt->write = 0;
> +    }

Same.

> +    else
> +    {
> +        gprintk(XENLOG_ERR,
> +            "Decoding ldr/str post indexing is not supported for this variant\n");

The indentation looks wrong here.

> +        goto bad_loadstore;
> +    }
> +
> +    gprintk(XENLOG_INFO,
> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
> +        instr->code.rt, instr->code.size, instr->code.imm9);

The indentation looks wrong here.

> +
> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
> +    dabt->valid = 1;
> +
> +    return 0;
> +
> + bad_loadstore:
> +    gprintk(XENLOG_ERR, "unhandled Arm instruction 0x%x\n", instr->value);
> +    return 1;
> +}
> +
>   static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
>   {
>       uint16_t instr;
> @@ -150,17 +245,44 @@ bad_thumb:
>       return 1;
>   }
>   
> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
> +                       union ldr_str_instr_class *instr)

I would like to avoid make the assumption that the instr we decode will 
always be a store/load. So please rename it to something more generic.

>   {
>       if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
>           return decode_thumb(regs->pc, dabt);
>   
> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )

The first part could be dropped because if psr_mode_is_32bit returns 0 
then it must mean the domain is 64-bit.

> +    {
> +        return decode_loadstore_postindexing(regs->pc, dabt, instr);
> +    }

Coding style: Please drop the {}.

> +
>       /* TODO: Handle ARM instruction */
>       gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
>   
>       return 1;
>   }
>   
> +#if CONFIG_ARM_64
> +void post_increment_register(union ldr_str_instr_class *instr)

instr should not be modified, so please use const. Also, it would be 
preferrable to pass the regs in parameter. So the none of the decoding 
code relies on the current regs.

Furthermore, decode.c should only contain code to update the syndrome 
and in theory Arm could decide to provide an valid syndrome in future 
revision. So I would move this code in io.c (or maybe traps.c).

> +{
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    register_t val;
> +
> +    /* handle when rn = SP */
> +    if ( instr->code.rn == 31 )
> +        val = regs->sp_el1;
> +    else
> +        val = get_user_reg(regs, instr->code.rn);
> +
> +    val += instr->code.imm9;
> +
> +    if ( instr->code.rn == 31 )
> +        regs->sp_el1 = val;
> +    else
> +        set_user_reg(regs, instr->code.rn, val);
> +}
> +#endif
> +
>   /*
>    * Local variables:
>    * mode: C
> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
> index 4613763bdb..511cd4a05f 100644
> --- a/xen/arch/arm/decode.h
> +++ b/xen/arch/arm/decode.h
> @@ -23,6 +23,35 @@
>   #include <asm/regs.h>
>   #include <asm/processor.h>
>   
> +/*
> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
> + * Page 318 specifies the following bit pattern for
> + * "load/store register (immediate post-indexed)".
> + *
> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> + * ___________________________________________________________________
> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> + * |____|______|__|____|____|__|_______________|____|_________|_______|
> + */
> +union ldr_str_instr_class {
> +    uint32_t value;
> +    struct ldr_str {
> +        unsigned int rt:5;     /* Rt register */
> +        unsigned int rn:5;     /* Rn register */
> +        unsigned int fixed1:2; /* value == 01b */
> +        signed int imm9:9;            /* imm9 */
> +        unsigned int fixed2:1; /* value == 0b */
> +        unsigned int opc:2;    /* opc */
> +        unsigned int fixed3:2; /* value == 00b */
> +        unsigned int v:1;      /* vector */
> +        unsigned int fixed4:3; /* value == 111b */
> +        unsigned int size:2;   /* size */
> +    } code;
> +};

Looking at the code, post_increment_register() only care about 'rn' and 
'imm9'. So rather than exposing the full instruction, could we instead 
provide the strict minimum? I.e something like:

struct
{
      enum instr_type; /* Unknown, ldr/str post increment */
      union
      {
          struct
          {
            register; /* Register to increment */
            imm;      /* Immediate to add */
          } ldr_str;
      }
      uint64_t register;
}

> +
> +#define POST_INDEX_FIXED_MASK   0x3B200C00
> +#define POST_INDEX_FIXED_VALUE  0x38000400
> +
>   /**
>    * Decode an instruction from pc
>    * /!\ This function is not intended to fully decode an instruction. It
> @@ -35,8 +64,18 @@
>    */
>   
>   int decode_instruction(const struct cpu_user_regs *regs,
> -                       struct hsr_dabt *dabt);
> +                       struct hsr_dabt *dabt,
> +                       union ldr_str_instr_class *instr);
>   
> +/**
> + * Update the register value for Rn
> + * /!\ This function is used to update the register value for Rn when a

NIT: I would drop /!\ because this looks more the description of the 
function rather than a warning.

> + * post indexing ldr/str instruction is decoded.
> + *
> + * This function will get:
> + * - The post indexing ldr/str instruction opcode
> + */
> +void post_increment_register(union ldr_str_instr_class *instr);
>   #endif /* __ARCH_ARM_DECODE_H_ */
>   
>   /*
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index 729287e37c..b9c15e1fe7 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -106,14 +106,29 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>           .gpa = gpa,
>           .dabt = dabt
>       };
> +    int rc;
> +    union ldr_str_instr_class instr = {0};
>   
>       ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>   
> +    /*
> +     * Armv8 processor does not provide a valid syndrome for post-indexing
> +     * ldr/str instructions. So in order to process these instructions,
> +     * Xen must decode them.
> +     */

This sort of comments will become stall if we decide to add more 
decoding. So I would keep the comments in io.c generic (i.e. no mention 
of load/store).

> +    if ( !info.dabt.valid )

I would rather prefer if we keep using dabt.valid here rather than 
info.dabt.valid. It is shorter and keep consistent with how it was 
checked before.

> +    {
> +        rc = decode_instruction(regs, &info.dabt, &instr);
> +        if ( rc )
> +        {
> +            gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> +            return IO_ABORT;
> +        }
> +    }
> +
>       handler = find_mmio_handler(v->domain, info.gpa);
>       if ( !handler )
>       {
> -        int rc;
> -
>           rc = try_fwd_ioserv(regs, v, &info);
>           if ( rc == IO_HANDLED )
>               return handle_ioserv(regs, v);
> @@ -121,10 +136,6 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>           return rc;
>       }
>   
> -    /* All the instructions used on emulated MMIO region should be valid */
> -    if ( !dabt.valid )
> -        return IO_ABORT;
> -

I think moving if ( !dabt.valid ) earlier should be part of a pre-patch. 
This would allows us to backport it as we don't want to forward the I/O 
to an IOREQ server if ISV=0.

>       /*
>        * Erratum 766422: Thumb store translation fault to Hypervisor may
>        * not have correct HSR Rt value.
> @@ -134,7 +145,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>       {
>           int rc;
>   
> -        rc = decode_instruction(regs, &info.dabt);
> +        rc = decode_instruction(regs, &info.dabt, NULL);

Could we combine the two decode_instruction()?

>           if ( rc )
>           {
>               gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> @@ -143,9 +154,21 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>       }
>   
>       if ( info.dabt.write )
> -        return handle_write(handler, v, &info);
> +        rc = handle_write(handler, v, &info);
>       else
> -        return handle_read(handler, v, &info);
> +        rc = handle_read(handler, v, &info);
> +
> +#if CONFIG_ARM_64

Rather than using #ifdef here, I prefer if provide a stub for 
post_increment_register() that contains ASSERT_UNREACHABLE(). So the 
code in io.c is more arch-agnostic.

> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )

This is a fairly expensive check. Can we avoid it and instead rely on 
the instruction to be decoded?

> +    {
> +        if ( instr.value != 0 )

I would prefer if we carry a boolean to indicate whether we manually 
decoded the instruction. The main advantage is one doesn't need to check 
in the Arm Arm to figure out that the 0x0 will never result to a valid 
instruction (it is thankfully used for UDF).

> +        {
> +            post_increment_register(&instr);
> +        }
> +    }
> +#endif
> +
> +    return rc;
>   }
>   
>   void register_mmio_handler(struct domain *d,

Cheers,

-- 
Julien Grall

Re: [XEN v4] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Stefano Stabellini 2 years, 2 months ago
On Tue, 25 Jan 2022, Julien Grall wrote:
> > +
> >       /* TODO: Handle ARM instruction */
> >       gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
> >         return 1;
> >   }
> >   +#if CONFIG_ARM_64
> > +void post_increment_register(union ldr_str_instr_class *instr)
> 
> instr should not be modified, so please use const. Also, it would be
> preferrable to pass the regs in parameter. So the none of the decoding code
> relies on the current regs.
> 
> Furthermore, decode.c should only contain code to update the syndrome and in
> theory Arm could decide to provide an valid syndrome in future revision. So I
> would move this code in io.c (or maybe traps.c).

I was the one to suggest moving it to decode.c to keep it closer to the
decoding function it is related to, and also because it felt a bit out
of place in io.c.

I don't feel strongly about this at all; I am fine either way.


> > +{
> > +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> > +    register_t val;
> > +
> > +    /* handle when rn = SP */
> > +    if ( instr->code.rn == 31 )
> > +        val = regs->sp_el1;
> > +    else
> > +        val = get_user_reg(regs, instr->code.rn);
> > +
> > +    val += instr->code.imm9;
> > +
> > +    if ( instr->code.rn == 31 )
> > +        regs->sp_el1 = val;
> > +    else
> > +        set_user_reg(regs, instr->code.rn, val);
> > +}
> > +#endif
> > +
> >   /*
> >    * Local variables:
> >    * mode: C
> > diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
> > index 4613763bdb..511cd4a05f 100644
> > --- a/xen/arch/arm/decode.h
> > +++ b/xen/arch/arm/decode.h
> > @@ -23,6 +23,35 @@
> >   #include <asm/regs.h>
> >   #include <asm/processor.h>
> >   +/*
> > + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
> > + * Page 318 specifies the following bit pattern for
> > + * "load/store register (immediate post-indexed)".
> > + *
> > + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> > + * ___________________________________________________________________
> > + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> > + * |____|______|__|____|____|__|_______________|____|_________|_______|
> > + */
> > +union ldr_str_instr_class {
> > +    uint32_t value;
> > +    struct ldr_str {
> > +        unsigned int rt:5;     /* Rt register */
> > +        unsigned int rn:5;     /* Rn register */
> > +        unsigned int fixed1:2; /* value == 01b */
> > +        signed int imm9:9;            /* imm9 */
> > +        unsigned int fixed2:1; /* value == 0b */
> > +        unsigned int opc:2;    /* opc */
> > +        unsigned int fixed3:2; /* value == 00b */
> > +        unsigned int v:1;      /* vector */
> > +        unsigned int fixed4:3; /* value == 111b */
> > +        unsigned int size:2;   /* size */
> > +    } code;
> > +};
> 
> Looking at the code, post_increment_register() only care about 'rn' and
> 'imm9'. So rather than exposing the full instruction, could we instead provide
> the strict minimum? I.e something like:
> 
> struct
> {
>      enum instr_type; /* Unknown, ldr/str post increment */
>      union
>      {
>          struct
>          {
>            register; /* Register to increment */
>            imm;      /* Immediate to add */
>          } ldr_str;
>      }
>      uint64_t register;
> }
 
The full description helped a lot during review. I would prefer to keep
it if you don't feel strongly about it.

Re: [XEN v4] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Julien Grall 2 years, 2 months ago
Hi,

On 26/01/2022 01:45, Stefano Stabellini wrote:
> On Tue, 25 Jan 2022, Julien Grall wrote:
>>> +
>>>        /* TODO: Handle ARM instruction */
>>>        gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
>>>          return 1;
>>>    }
>>>    +#if CONFIG_ARM_64
>>> +void post_increment_register(union ldr_str_instr_class *instr)
>>
>> instr should not be modified, so please use const. Also, it would be
>> preferrable to pass the regs in parameter. So the none of the decoding code
>> relies on the current regs.
>>
>> Furthermore, decode.c should only contain code to update the syndrome and in
>> theory Arm could decide to provide an valid syndrome in future revision. So I
>> would move this code in io.c (or maybe traps.c).
> 
> I was the one to suggest moving it to decode.c to keep it closer to the
> decoding function it is related to, and also because it felt a bit out
> of place in io.c.

How about traps.c? This is where we also take care of incrementing pc 
after we handle a MMIO trap.

>>> +{
>>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>>> +    register_t val;
>>> +
>>> +    /* handle when rn = SP */
>>> +    if ( instr->code.rn == 31 )
>>> +        val = regs->sp_el1;
>>> +    else
>>> +        val = get_user_reg(regs, instr->code.rn);
>>> +
>>> +    val += instr->code.imm9;
>>> +
>>> +    if ( instr->code.rn == 31 )
>>> +        regs->sp_el1 = val;
>>> +    else
>>> +        set_user_reg(regs, instr->code.rn, val);
>>> +}
>>> +#endif
>>> +
>>>    /*
>>>     * Local variables:
>>>     * mode: C
>>> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
>>> index 4613763bdb..511cd4a05f 100644
>>> --- a/xen/arch/arm/decode.h
>>> +++ b/xen/arch/arm/decode.h
>>> @@ -23,6 +23,35 @@
>>>    #include <asm/regs.h>
>>>    #include <asm/processor.h>
>>>    +/*
>>> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
>>> + * Page 318 specifies the following bit pattern for
>>> + * "load/store register (immediate post-indexed)".
>>> + *
>>> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
>>> + * ___________________________________________________________________
>>> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
>>> + * |____|______|__|____|____|__|_______________|____|_________|_______|
>>> + */
>>> +union ldr_str_instr_class {
>>> +    uint32_t value;
>>> +    struct ldr_str {

No need to name the struct here.

>>> +        unsigned int rt:5;     /* Rt register */
>>> +        unsigned int rn:5;     /* Rn register */
>>> +        unsigned int fixed1:2; /* value == 01b */
>>> +        signed int imm9:9;            /* imm9 */
>>> +        unsigned int fixed2:1; /* value == 0b */
>>> +        unsigned int opc:2;    /* opc */
>>> +        unsigned int fixed3:2; /* value == 00b */
>>> +        unsigned int v:1;      /* vector */
>>> +        unsigned int fixed4:3; /* value == 111b */
>>> +        unsigned int size:2;   /* size */
>>> +    } code;

It would be best to name it ldr_str so this can be easily extended (e.g. 
no renaming) for other instructions in the future.

>>> +};
>>
>> Looking at the code, post_increment_register() only care about 'rn' and
>> 'imm9'. So rather than exposing the full instruction, could we instead provide
>> the strict minimum? I.e something like:
>>
>> struct
>> {
>>       enum instr_type; /* Unknown, ldr/str post increment */
>>       union
>>       {
>>           struct
>>           {
>>             register; /* Register to increment */
>>             imm;      /* Immediate to add */
>>           } ldr_str;
>>       }
>>       uint64_t register;
>> }
>   
> The full description helped a lot during review. I would prefer to keep
> it if you don't feel strongly about it.

I haven't suggested to drop the union. Instead, I am suggesting to keep 
it internally to decode.c and expose something different to the external 
the user. The idea is the caller doesn't care about the full 
instruction, it only cares about what action to do.

Basically, what I am asking is an augmented dabt. So all the information 
are in one place rather than having to carry two structure (struct 
hsr_dabt and union ldr_str_instr_class) which contain mostly redundant 
information.

Cheers,

-- 
Julien Grall

Re: [XEN v4] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Stefano Stabellini 2 years, 2 months ago
On Wed, 26 Jan 2022, Julien Grall wrote:
> On 26/01/2022 01:45, Stefano Stabellini wrote:
> > On Tue, 25 Jan 2022, Julien Grall wrote:
> > > > +
> > > >        /* TODO: Handle ARM instruction */
> > > >        gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
> > > >          return 1;
> > > >    }
> > > >    +#if CONFIG_ARM_64
> > > > +void post_increment_register(union ldr_str_instr_class *instr)
> > > 
> > > instr should not be modified, so please use const. Also, it would be
> > > preferrable to pass the regs in parameter. So the none of the decoding
> > > code
> > > relies on the current regs.
> > > 
> > > Furthermore, decode.c should only contain code to update the syndrome and
> > > in
> > > theory Arm could decide to provide an valid syndrome in future revision.
> > > So I
> > > would move this code in io.c (or maybe traps.c).
> > 
> > I was the one to suggest moving it to decode.c to keep it closer to the
> > decoding function it is related to, and also because it felt a bit out
> > of place in io.c.
> 
> How about traps.c? This is where we also take care of incrementing pc after we
> handle a MMIO trap.
> 
> > > > +{
> > > > +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> > > > +    register_t val;
> > > > +
> > > > +    /* handle when rn = SP */
> > > > +    if ( instr->code.rn == 31 )
> > > > +        val = regs->sp_el1;
> > > > +    else
> > > > +        val = get_user_reg(regs, instr->code.rn);
> > > > +
> > > > +    val += instr->code.imm9;
> > > > +
> > > > +    if ( instr->code.rn == 31 )
> > > > +        regs->sp_el1 = val;
> > > > +    else
> > > > +        set_user_reg(regs, instr->code.rn, val);
> > > > +}
> > > > +#endif
> > > > +
> > > >    /*
> > > >     * Local variables:
> > > >     * mode: C
> > > > diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
> > > > index 4613763bdb..511cd4a05f 100644
> > > > --- a/xen/arch/arm/decode.h
> > > > +++ b/xen/arch/arm/decode.h
> > > > @@ -23,6 +23,35 @@
> > > >    #include <asm/regs.h>
> > > >    #include <asm/processor.h>
> > > >    +/*
> > > > + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and
> > > > Stores
> > > > + * Page 318 specifies the following bit pattern for
> > > > + * "load/store register (immediate post-indexed)".
> > > > + *
> > > > + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> > > > + * ___________________________________________________________________
> > > > + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> > > > + * |____|______|__|____|____|__|_______________|____|_________|_______|
> > > > + */
> > > > +union ldr_str_instr_class {
> > > > +    uint32_t value;
> > > > +    struct ldr_str {
> 
> No need to name the struct here.
> 
> > > > +        unsigned int rt:5;     /* Rt register */
> > > > +        unsigned int rn:5;     /* Rn register */
> > > > +        unsigned int fixed1:2; /* value == 01b */
> > > > +        signed int imm9:9;            /* imm9 */
> > > > +        unsigned int fixed2:1; /* value == 0b */
> > > > +        unsigned int opc:2;    /* opc */
> > > > +        unsigned int fixed3:2; /* value == 00b */
> > > > +        unsigned int v:1;      /* vector */
> > > > +        unsigned int fixed4:3; /* value == 111b */
> > > > +        unsigned int size:2;   /* size */
> > > > +    } code;
> 
> It would be best to name it ldr_str so this can be easily extended (e.g. no
> renaming) for other instructions in the future.
> 
> > > > +};
> > > 
> > > Looking at the code, post_increment_register() only care about 'rn' and
> > > 'imm9'. So rather than exposing the full instruction, could we instead
> > > provide
> > > the strict minimum? I.e something like:
> > > 
> > > struct
> > > {
> > >       enum instr_type; /* Unknown, ldr/str post increment */
> > >       union
> > >       {
> > >           struct
> > >           {
> > >             register; /* Register to increment */
> > >             imm;      /* Immediate to add */
> > >           } ldr_str;
> > >       }
> > >       uint64_t register;
> > > }
> >   The full description helped a lot during review. I would prefer to keep
> > it if you don't feel strongly about it.
> 
> I haven't suggested to drop the union. Instead, I am suggesting to keep it
> internally to decode.c and expose something different to the external the
> user. The idea is the caller doesn't care about the full instruction, it only
> cares about what action to do.
> 
> Basically, what I am asking is an augmented dabt. So all the information are
> in one place rather than having to carry two structure (struct hsr_dabt and
> union ldr_str_instr_class) which contain mostly redundant information.

That could be a good idea. We shouldn't need "union ldr_str_instr_class"
in io.c, it should only be needed in decode.c. Thus, it could be
internal to decode.c. That's fine by me.

Re: [XEN v4] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 2 months ago
Hi Julien/Stefano,

Thanks for your feedback. The comments are useful.

I need a couple clarifications and then I can send the pre-patch and v5 
(with the fixes suggested).


On 25/01/2022 23:02, Julien Grall wrote:
> Hi,
>
> On 25/01/2022 21:18, Ayan Kumar Halder wrote:
>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
>> could use to access MMIO regions.
>>
>> For instance, a baremetal OS can use any of the following 
>> instructions, where
>> x1 contains the address of the MMIO region:
>>
>> 1.      ldr     x2,    [x1],    #8
>> 2.      ldr     w2,    [x1],    #-4
>> 3.      ldr     x2,    [x1],    #-8
>> 4.      ldr     w2,    [x1],    #4
>> 5.      ldrh    w2,    [x1],    #2
>> 6.      ldrb    w2,    [x1],    #1
>> 7.      str     x2,    [x1],    #8
>> 8.      str     w2,    [x1],    #-4
>> 9.      strh    w2,    [x1],    #2
>> 10.     strb    w2,    [x1],    #1
>>
>> In the following two instructions, Rn could theoretically be stack 
>> pointer which
>> might contain the address of the MMIO region:-
>> 11.     ldrb    w2,    [Rn],    #1
>> 12.     ldrb    wzr,   [Rn],    #1
>>
>> In order to handle post-indexing store/load instructions (like those 
>> mentioned
>> above), Xen will need to fetch and decode the instruction.
>>
>> This patch only cover post-index store/load instructions from AArch64 
>> mode.
>> For now, this is left unimplemented for trap from AArch32 mode.
>>
>> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
>> ---
>>
>> Changelog :-
>> v2 - 1. Updated the rn register after reading from it. (Pointed by 
>> Julien,
>>          Stefano)
>>       2. Used a union to represent the instruction opcode (Suggestd 
>> by Bertrand)
>>       3. Fixed coding style issues (Pointed by Julien)
>>       4. In the previous patch, I was updating dabt->sign based on 
>> the signedness
>>          of imm9. This was incorrect. As mentioned in ARMv8 ARM DDI 
>> 0487G.b,
>>          Page 3221, SSE indicates the signedness of the data item 
>> loaded. In our
>>          case, the data item loaded is always unsigned.
>>
>> v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit 
>> variants).
>>         Thus, I have removed the check for "instr->code.opc == 0" 
>> (Suggested by
>>         Andre)
>>      2. Handled the scenario when rn = SP, rt = XZR (Suggested by 
>> Jan, Andre)
>>      3. Added restriction for "rt != rn" (Suggested by Andre)
>>      4. Moved union ldr_str_instr_class {} to decode.h. This is the 
>> header included
>>         by io.c and decode.c (where the union is referred). 
>> (Suggested by Jan)
>>      5. Indentation and typo fixes (Suggested by Jan)
>>
>> v4- 1. Fixed the patch as per Stefano's comments on v3. They are as 
>> follows :-
>>          1.1 Use macros to determine the fixed values in the 
>> instruction opcode
>>          1.2 Checked if instr != NULL
>>          1.3 Changed some data types and added #define ARM_64 for 
>> AArch64 specific
>>              code
>>          1.4 Moved post_increment_register() to decode.c so that the 
>> decoding
>>              logic is confined to a single file.
>>          1.5 Moved some checks from post_increment_register() to
>>              decode_loadstore_postindexing()
>>          1.6 Removed a duplicate check
>>      2. Updated the commit message as per Andre's comments.
>>      3. Changed the names of a label and some comments. *32bit* was 
>> erroneously
>>         mentioned in a label and comments in 
>> decode_loadstore_postindexing()
>>         although the function handled all variants of ldr/str post 
>> indexing.
>>
>>   xen/arch/arm/decode.c | 124 +++++++++++++++++++++++++++++++++++++++++-
>>   xen/arch/arm/decode.h |  41 +++++++++++++-
>>   xen/arch/arm/io.c     |  41 +++++++++++---
>>   3 files changed, 195 insertions(+), 11 deletions(-)
>>
>> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
>> index 792c2e92a7..0c12af7afa 100644
>> --- a/xen/arch/arm/decode.c
>> +++ b/xen/arch/arm/decode.c
>> @@ -84,6 +84,101 @@ bad_thumb2:
>>       return 1;
>>   }
>>   +static int decode_loadstore_postindexing(register_t pc,
>
> This is only handling AArch64 instruction. So please add aarch64 (or 
> arm64) in the name.
>
>> + struct hsr_dabt *dabt,
>> +                                         union ldr_str_instr_class 
>> *instr)
>> +{
>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +
>> +    if ( instr == NULL )
>
> Wouldn't it be a programming error? If so, should it be ASSERT(...)?
>
>> +    {
>> +        gprintk(XENLOG_ERR, "instr should not be NULL\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, 
>> sizeof (instr)) )
>> +    {
>> +        gprintk(XENLOG_ERR, "Could not copy the instruction from 
>> PC\n");
>> +        return -EFAULT;
>> +    }
>> +
>> +    /*
>> +     * Rn -ne Rt for ldr/str instruction.
>> +     * Check https://developer.arm.com/documentation/dui0802/a/CIHGJHED
>> +     * (Register restrictions)
>
> I find a bit odd that you are pointing to 3 different spec (Compiler, 
> Cortex-A, Arm Arm). Can we please use only the Arm Arm? It should 
> contain everything we need...
>
>> +     *
>> +     * The only exception for this is when rn = 31. It denotes SP 
>> ("Use of SP")
>> +     *
>> +     * And when rt = 31, it denotes wzr/xzr. (Refer
>> +     * 
>> https://developer.arm.com/documentation/den0024/a/ARMv8-Registers/AArch64-special-registers
>
> Same here.
>
> Also, please avoid URL and use the document reference (e.g. ARM DDI 
> 0487F.c for the Arm Arm) are they are easy to find on google.
>
>> +     * "There is no register called X31 or W31. Many instructions 
>> are encoded
>> +     * such that the number 31 represents the zero register, ZR 
>> (WZR/XZR)."
>> +     */
>> +    if ( (instr->code.rn == instr->code.rt) && (instr->code.rn != 31) )
>> +    {
>> +        gprintk(XENLOG_ERR, "Rn should not be equal to Rt except for 
>> r31\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    /* First, let's check for the fixed values */
>> +    if ( (instr->value & POST_INDEX_FIXED_MASK) != 
>> POST_INDEX_FIXED_VALUE )
>> +    {
>> +        gprintk(XENLOG_ERR, "Cannot decode instruction 
>> 0x%x",instr->value);
>> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions 
>> other than"
>> +            " ldr/str post indexing\n");
>
> Please don't split the message. Instead, it should be:
>
> gprintk(XENLOG_ERR,
>         "...");
>
> But I would combine the two messages and simply write:
>
> "Decoding instruction 0x%x is not supported". Not need to say that we 
> support only ldr/str post indexing.
>
>> +        goto bad_loadstore;
>
> I am a bit confused why some of the error path is using error and 
> other goto. Can you clarify it?
Sorry, this is my carelessness. I was thinking to return an error 
(immediately) for issues un-related to decoding (For eg instr == NULL, 
copy_from_guest()). But on re-thinking, it does not sound sane. I will 
use goto for all the errors as it is important to display the error 
message for the user.
>
>> +    }
>> +
>> +    /*
>> +     * Handle when rn = SP
>> +     * Refer ArmV8 ARM DDI 0487G.b, Page - D1-2463 "Stack pointer 
>> register selection"
>> +     * As we are interested in handling exceptions only from EL1 in 
>> AArch64 state,
>> +     * thus M[3:0] == EL1h (Page - C5-480 "When exception taken from 
>> AArch64 state:")
>
> I read the last sentence as "We only support decoding from instruction 
> run at EL1". But I can't find a check guarantee that.
Sorry, the check below does that but is used only for rn == 31. I should 
move ((regs->cpsr & PSR_MODE_MASK) != PSR_MODE_EL1h) ) into a separate 
check of its own.
>
>> +     */
>> +    if ( (instr->code.rn == 31) && ((regs->cpsr & PSR_MODE_MASK) != 
>> PSR_MODE_EL1h) )
>> +    {
>> +        gprintk(XENLOG_ERR, "SP is valid only for EL1h\n");
>> +        goto bad_loadstore;
>> +    }
>> +
>> +    if ( instr->code.v != 0 )
>> +    {
>> +        gprintk(XENLOG_ERR,
>> +            "ldr/str post indexing for vector types are not 
>> supported\n");
>> +        goto bad_loadstore;
>> +    }
>> +
>> +    /* Check for STR (immediate) */
>> +    if ( instr->code.opc == 0 )
>> +    {
>> +        dabt->write = 1;
>> +    }
>
> Coding style: We don't use {} for single line. In this case, it would 
> also result to have a more readable code.
>
>> +    /* Check for LDR (immediate) */
>> +    else if ( instr->code.opc == 1 )
>> +    {
>> +        dabt->write = 0;
>> +    }
>
> Same.
>
>> +    else
>> +    {
>> +        gprintk(XENLOG_ERR,
>> +            "Decoding ldr/str post indexing is not supported for 
>> this variant\n");
>
> The indentation looks wrong here.
>
>> +        goto bad_loadstore;
>> +    }
>> +
>> +    gprintk(XENLOG_INFO,
>> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, 
>> instr->code.imm9 = %d\n",
>> +        instr->code.rt, instr->code.size, instr->code.imm9);
>
> The indentation looks wrong here.
>
>> +
>> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
>> +    dabt->valid = 1;
>> +
>> +    return 0;
>> +
>> + bad_loadstore:
>> +    gprintk(XENLOG_ERR, "unhandled Arm instruction 0x%x\n", 
>> instr->value);
>> +    return 1;
>> +}
>> +
>>   static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
>>   {
>>       uint16_t instr;
>> @@ -150,17 +245,44 @@ bad_thumb:
>>       return 1;
>>   }
>>   -int decode_instruction(const struct cpu_user_regs *regs, struct 
>> hsr_dabt *dabt)
>> +int decode_instruction(const struct cpu_user_regs *regs, struct 
>> hsr_dabt *dabt,
>> +                       union ldr_str_instr_class *instr)
>
> I would like to avoid make the assumption that the instr we decode 
> will always be a store/load. So please rename it to something more 
> generic.

A difference of thought. Should we not name it as per the current usage 
? This will avoid any ambiguity. Later, if this gets extended, then it 
can be named more appropriately depending on the usage.

Also, the bit-pattern in "union ldr_str_instr_class" is very much 
specific to ldr/str. So changing the name to something generic seems 
incorrect imo.

>
>>   {
>>       if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
>>           return decode_thumb(regs->pc, dabt);
>>   +    if ( (is_64bit_domain(current->domain) && 
>> !psr_mode_is_32bit(regs)) )
>
> The first part could be dropped because if psr_mode_is_32bit returns 0 
> then it must mean the domain is 64-bit.
>
>> +    {
>> +        return decode_loadstore_postindexing(regs->pc, dabt, instr);
>> +    }
>
> Coding style: Please drop the {}.
>
>> +
>>       /* TODO: Handle ARM instruction */
>>       gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
>>         return 1;
>>   }
>>   +#if CONFIG_ARM_64
>> +void post_increment_register(union ldr_str_instr_class *instr)
>
> instr should not be modified, so please use const. Also, it would be 
> preferrable to pass the regs in parameter. So the none of the decoding 
> code relies on the current regs.
>
> Furthermore, decode.c should only contain code to update the syndrome 
> and in theory Arm could decide to provide an valid syndrome in future 
> revision. So I would move this code in io.c (or maybe traps.c).
@Stefano/Julien - Can we all agree on traps.c ?
>
>> +{
>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +    register_t val;
>> +
>> +    /* handle when rn = SP */
>> +    if ( instr->code.rn == 31 )
>> +        val = regs->sp_el1;
>> +    else
>> +        val = get_user_reg(regs, instr->code.rn);
>> +
>> +    val += instr->code.imm9;
>> +
>> +    if ( instr->code.rn == 31 )
>> +        regs->sp_el1 = val;
>> +    else
>> +        set_user_reg(regs, instr->code.rn, val);
>> +}
>> +#endif
>> +
>>   /*
>>    * Local variables:
>>    * mode: C
>> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
>> index 4613763bdb..511cd4a05f 100644
>> --- a/xen/arch/arm/decode.h
>> +++ b/xen/arch/arm/decode.h
>> @@ -23,6 +23,35 @@
>>   #include <asm/regs.h>
>>   #include <asm/processor.h>
>>   +/*
>> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and 
>> Stores
>> + * Page 318 specifies the following bit pattern for
>> + * "load/store register (immediate post-indexed)".
>> + *
>> + * 31 30 29  27 26 25  23   21 20              11   9 4       0
>> + * ___________________________________________________________________
>> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn |  Rt   |
>> + * |____|______|__|____|____|__|_______________|____|_________|_______|
>> + */
>> +union ldr_str_instr_class {
>> +    uint32_t value;
>> +    struct ldr_str {
>> +        unsigned int rt:5;     /* Rt register */
>> +        unsigned int rn:5;     /* Rn register */
>> +        unsigned int fixed1:2; /* value == 01b */
>> +        signed int imm9:9;            /* imm9 */
>> +        unsigned int fixed2:1; /* value == 0b */
>> +        unsigned int opc:2;    /* opc */
>> +        unsigned int fixed3:2; /* value == 00b */
>> +        unsigned int v:1;      /* vector */
>> +        unsigned int fixed4:3; /* value == 111b */
>> +        unsigned int size:2;   /* size */
>> +    } code;
>> +};
>
> Looking at the code, post_increment_register() only care about 'rn' 
> and 'imm9'. So rather than exposing the full instruction, could we 
> instead provide the strict minimum? I.e something like:
>
> struct
> {
>      enum instr_type; /* Unknown, ldr/str post increment */
>      union
>      {
>          struct
>          {
>            register; /* Register to increment */
>            imm;      /* Immediate to add */
>          } ldr_str;
>      }
>      uint64_t register;
> }
>
>> +
>> +#define POST_INDEX_FIXED_MASK   0x3B200C00
>> +#define POST_INDEX_FIXED_VALUE  0x38000400
>> +
>>   /**
>>    * Decode an instruction from pc
>>    * /!\ This function is not intended to fully decode an 
>> instruction. It
>> @@ -35,8 +64,18 @@
>>    */
>>     int decode_instruction(const struct cpu_user_regs *regs,
>> -                       struct hsr_dabt *dabt);
>> +                       struct hsr_dabt *dabt,
>> +                       union ldr_str_instr_class *instr);
>>   +/**
>> + * Update the register value for Rn
>> + * /!\ This function is used to update the register value for Rn when a
>
> NIT: I would drop /!\ because this looks more the description of the 
> function rather than a warning.
>
>> + * post indexing ldr/str instruction is decoded.
>> + *
>> + * This function will get:
>> + * - The post indexing ldr/str instruction opcode
>> + */
>> +void post_increment_register(union ldr_str_instr_class *instr);
>>   #endif /* __ARCH_ARM_DECODE_H_ */
>>     /*
>> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
>> index 729287e37c..b9c15e1fe7 100644
>> --- a/xen/arch/arm/io.c
>> +++ b/xen/arch/arm/io.c
>> @@ -106,14 +106,29 @@ enum io_state try_handle_mmio(struct 
>> cpu_user_regs *regs,
>>           .gpa = gpa,
>>           .dabt = dabt
>>       };
>> +    int rc;
>> +    union ldr_str_instr_class instr = {0};
>>         ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>>   +    /*
>> +     * Armv8 processor does not provide a valid syndrome for 
>> post-indexing
>> +     * ldr/str instructions. So in order to process these instructions,
>> +     * Xen must decode them.
>> +     */
>
> This sort of comments will become stall if we decide to add more 
> decoding. So I would keep the comments in io.c generic (i.e. no 
> mention of load/store).
Yes, I can remove ldr/str from comments
>
>> +    if ( !info.dabt.valid )
>
> I would rather prefer if we keep using dabt.valid here rather than 
> info.dabt.valid. It is shorter and keep consistent with how it was 
> checked before.
>
>> +    {
>> +        rc = decode_instruction(regs, &info.dabt, &instr);
>> +        if ( rc )
>> +        {
>> +            gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
>> +            return IO_ABORT;
>> +        }
>> +    }
>> +
>>       handler = find_mmio_handler(v->domain, info.gpa);
>>       if ( !handler )
>>       {
>> -        int rc;
>> -
>>           rc = try_fwd_ioserv(regs, v, &info);
>>           if ( rc == IO_HANDLED )
>>               return handle_ioserv(regs, v);
>> @@ -121,10 +136,6 @@ enum io_state try_handle_mmio(struct 
>> cpu_user_regs *regs,
>>           return rc;
>>       }
>>   -    /* All the instructions used on emulated MMIO region should be 
>> valid */
>> -    if ( !dabt.valid )
>> -        return IO_ABORT;
>> -
>
> I think moving if ( !dabt.valid ) earlier should be part of a 
> pre-patch. This would allows us to backport it as we don't want to 
> forward the I/O to an IOREQ server if ISV=0.
I would say that in the pre-patch, we should move "if ( !dabt.valid )" 
before "find_mmio_handler()". The reason being if the intruction was not 
decoded successfully (ie ISV=0), then there is no need to find the mmio 
handler corresponding to the gpa. Please let me know your thoughts and I 
can send the pre-patch.
>
>>       /*
>>        * Erratum 766422: Thumb store translation fault to Hypervisor may
>>        * not have correct HSR Rt value.
>> @@ -134,7 +145,7 @@ enum io_state try_handle_mmio(struct 
>> cpu_user_regs *regs,
>>       {
>>           int rc;
>>   -        rc = decode_instruction(regs, &info.dabt);
>> +        rc = decode_instruction(regs, &info.dabt, NULL);
>
> Could we combine the two decode_instruction()?

Do you mean something like this :-

if ( (!info.dabt.valid ) || (( check_workaround_766422() && (regs->cpsr 
& PSR_THUMB) &&
          dabt.write) )

{

         rc = decode_instruction(regs, &info.dabt, &instr); // We know 
that for PSR_THUMB, instr is ignored.
         if ( rc )
         {
             gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
             return IO_ABORT;
         }

}

>
>>           if ( rc )
>>           {
>>               gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
>> @@ -143,9 +154,21 @@ enum io_state try_handle_mmio(struct 
>> cpu_user_regs *regs,
>>       }
>>         if ( info.dabt.write )
>> -        return handle_write(handler, v, &info);
>> +        rc = handle_write(handler, v, &info);
>>       else
>> -        return handle_read(handler, v, &info);
>> +        rc = handle_read(handler, v, &info);
>> +
>> +#if CONFIG_ARM_64
>
> Rather than using #ifdef here, I prefer if provide a stub for 
> post_increment_register() that contains ASSERT_UNREACHABLE(). So the 
> code in io.c is more arch-agnostic.

So you mean something like this (in traps.c):-

#if CONFIG_ARM_64

void post_increment_register(union ldr_str_instr_class *instr)

{

     // handle the post increment

}

#else

void post_increment_register(union ldr_str_instr_class *instr)

{

     ASSERT_UNREACHABLE();

}

#endif

If so, I am fine with this.

>
>> +    if ( (is_64bit_domain(current->domain) && 
>> !psr_mode_is_32bit(regs)) )
>
> This is a fairly expensive check. Can we avoid it and instead rely on 
> the instruction to be decoded?

yes, as this has already been checked in decode_instruction(), so we can 
just rely on instr.

- Ayan

>
>> +    {
>> +        if ( instr.value != 0 )
>
> I would prefer if we carry a boolean to indicate whether we manually 
> decoded the instruction. The main advantage is one doesn't need to 
> check in the Arm Arm to figure out that the 0x0 will never result to a 
> valid instruction (it is thankfully used for UDF).
>
>> +        {
>> +            post_increment_register(&instr);
>> +        }
>> +    }
>> +#endif
>> +
>> +    return rc;
>>   }
>>     void register_mmio_handler(struct domain *d,
>
> Cheers,
>

Re: [XEN v4] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Julien Grall 2 years, 2 months ago

On 26/01/2022 12:21, Ayan Kumar Halder wrote:
> Hi Julien/Stefano,

Hi,

> On 25/01/2022 23:02, Julien Grall wrote:
>>> +    }
>>> +
>>> +    /*
>>> +     * Handle when rn = SP
>>> +     * Refer ArmV8 ARM DDI 0487G.b, Page - D1-2463 "Stack pointer 
>>> register selection"
>>> +     * As we are interested in handling exceptions only from EL1 in 
>>> AArch64 state,
>>> +     * thus M[3:0] == EL1h (Page - C5-480 "When exception taken from 
>>> AArch64 state:")
>>
>> I read the last sentence as "We only support decoding from instruction 
>> run at EL1". But I can't find a check guarantee that.
> Sorry, the check below does that but is used only for rn == 31. I should 
> move ((regs->cpsr & PSR_MODE_MASK) != PSR_MODE_EL1h) ) into a separate 
> check of its own.

The question is why do we want to limit instruction decoding to EL1?

>>
>>> +     */
>>> +    if ( (instr->code.rn == 31) && ((regs->cpsr & PSR_MODE_MASK) != 
>>> PSR_MODE_EL1h) )
>>> +    {
>>> +        gprintk(XENLOG_ERR, "SP is valid only for EL1h\n");
>>> +        goto bad_loadstore;
>>> +    }
>>> +
>>> +    if ( instr->code.v != 0 )
>>> +    {
>>> +        gprintk(XENLOG_ERR,
>>> +            "ldr/str post indexing for vector types are not 
>>> supported\n");
>>> +        goto bad_loadstore;
>>> +    }
>>> +
>>> +    /* Check for STR (immediate) */
>>> +    if ( instr->code.opc == 0 )
>>> +    {
>>> +        dabt->write = 1;
>>> +    }
>>
>> Coding style: We don't use {} for single line. In this case, it would 
>> also result to have a more readable code.
>>
>>> +    /* Check for LDR (immediate) */
>>> +    else if ( instr->code.opc == 1 )
>>> +    {
>>> +        dabt->write = 0;
>>> +    }
>>
>> Same.
>>
>>> +    else
>>> +    {
>>> +        gprintk(XENLOG_ERR,
>>> +            "Decoding ldr/str post indexing is not supported for 
>>> this variant\n");
>>
>> The indentation looks wrong here.
>>
>>> +        goto bad_loadstore;
>>> +    }
>>> +
>>> +    gprintk(XENLOG_INFO,
>>> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, 
>>> instr->code.imm9 = %d\n",
>>> +        instr->code.rt, instr->code.size, instr->code.imm9);
>>
>> The indentation looks wrong here.
>>
>>> +
>>> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
>>> +    dabt->valid = 1;
>>> +
>>> +    return 0;
>>> +
>>> + bad_loadstore:
>>> +    gprintk(XENLOG_ERR, "unhandled Arm instruction 0x%x\n", 
>>> instr->value);
>>> +    return 1;
>>> +}
>>> +
>>>   static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
>>>   {
>>>       uint16_t instr;
>>> @@ -150,17 +245,44 @@ bad_thumb:
>>>       return 1;
>>>   }
>>>   -int decode_instruction(const struct cpu_user_regs *regs, struct 
>>> hsr_dabt *dabt)
>>> +int decode_instruction(const struct cpu_user_regs *regs, struct 
>>> hsr_dabt *dabt,
>>> +                       union ldr_str_instr_class *instr)
>>
>> I would like to avoid make the assumption that the instr we decode 
>> will always be a store/load. So please rename it to something more 
>> generic.
> 
> A difference of thought. Should we not name it as per the current usage 
> ? This will avoid any ambiguity.

What ambiguity? The caller should be completely agnostic to what 
instruction we decode.

> Later, if this gets extended, then it 
> can be named more appropriately depending on the usage.

Renaming afterwards is exactly what I want to avoid.

> 
> Also, the bit-pattern in "union ldr_str_instr_class" is very much 
> specific to ldr/str.

I agree that the bit-pattern is specific to load/store. But that's just 
one way to interpret the 32-bit value. It would be easy to add another 
way to interpret it. I.e:

union
{
    value;
    struct {
    } str_ldr;
    struct {
    } brk;
}

>> I think moving if ( !dabt.valid ) earlier should be part of a 
>> pre-patch. This would allows us to backport it as we don't want to 
>> forward the I/O to an IOREQ server if ISV=0.
> I would say that in the pre-patch, we should move "if ( !dabt.valid )" 
> before "find_mmio_handler()". The reason being if the intruction was not 
> decoded successfully (ie ISV=0), then there is no need to find the mmio 
> handler corresponding to the gpa. Please let me know your thoughts and I 
> can send the pre-patch.

Sounds good to me.

>>
>>>       /*
>>>        * Erratum 766422: Thumb store translation fault to Hypervisor may
>>>        * not have correct HSR Rt value.
>>> @@ -134,7 +145,7 @@ enum io_state try_handle_mmio(struct 
>>> cpu_user_regs *regs,
>>>       {
>>>           int rc;
>>>   -        rc = decode_instruction(regs, &info.dabt);
>>> +        rc = decode_instruction(regs, &info.dabt, NULL);
>>
>> Could we combine the two decode_instruction()?
> 
> Do you mean something like this :-

Yes. But...

> 
> if ( (!info.dabt.valid ) || (( check_workaround_766422() && (regs->cpsr 
> & PSR_THUMB) &&
>           dabt.write) )
> 
> {
> 
>          rc = decode_instruction(regs, &info.dabt, &instr); // We know 
> that for PSR_THUMB, instr is ignored.

... without this comment. Also see my reply to Stefano's email.

[...]

> So you mean something like this (in traps.c):-
> 
> #if CONFIG_ARM_64
> 
> void post_increment_register(union ldr_str_instr_class *instr)
> 
> {
> 
>      // handle the post increment
> 
> }
> 
> #else
> 
> void post_increment_register(union ldr_str_instr_class *instr)
> 
> {
> 
>      ASSERT_UNREACHABLE();
> 
> }
> 
> #endif
> 
> If so, I am fine with this.

Yes.

Cheers,

-- 
Julien Grall

Re: [XEN v4] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Stefano Stabellini 2 years, 2 months ago
On Wed, 26 Jan 2022, Ayan Kumar Halder wrote:
> > Furthermore, decode.c should only contain code to update the syndrome and in
> > theory Arm could decide to provide an valid syndrome in future revision. So
> > I would move this code in io.c (or maybe traps.c).
> @Stefano/Julien - Can we all agree on traps.c ?

Fine by me

Re: [XEN v4] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Julien Grall 2 years, 2 months ago
Hi Ayan,

Below some more comments on a few issues I noticed while reviewing your 
other patch yesterday.

On 25/01/2022 21:18, Ayan Kumar Halder wrote:
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index 729287e37c..b9c15e1fe7 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -106,14 +106,29 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>           .gpa = gpa,
>           .dabt = dabt
>       };
> +    int rc;
> +    union ldr_str_instr_class instr = {0};
>   
>       ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>   
> +    /*
> +     * Armv8 processor does not provide a valid syndrome for post-indexing
> +     * ldr/str instructions. So in order to process these instructions,
> +     * Xen must decode them.
> +     */
> +    if ( !info.dabt.valid )
> +    {
> +        rc = decode_instruction(regs, &info.dabt, &instr);
> +        if ( rc )
> +        {
> +            gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> +            return IO_ABORT;

Sorry, I should have noticed this earlier.

If we return IO_ABORT here, it means Xen will inject an abort to the 
domain. However, an I/O may trap in Xen for other reasons. One explain 
is when Xen modify the P2M it has to temporarily remove the mapping and 
then recreate it. This leaves a small window when an access may trap.

In this situation, we don't care that the instruction syndrome is 
invalid. Therefore, it would be wrong to inject an abort to the domain. 
What we want is looking whether another part of Xen handles it.

One possibility would be to return IO_UNHANDLED here. However... it 
means that we will end up to decode the instruction when this is not 
necessary. So I think we want to move the decode after 
find_mmio_handler() has succeeded.

Regarding, try_fwd_ioserv(). As you rightly pointed out, it already 
contains a dabt.valid check. We would want to augment it to decode the 
instruction. I think the thumb workaround should be there as well.

> +        }
> +    }
> +
>       handler = find_mmio_handler(v->domain, info.gpa);
>       if ( !handler )
>       {
> -        int rc;
> -
>           rc = try_fwd_ioserv(regs, v, &info);
>           if ( rc == IO_HANDLED )
>               return handle_ioserv(regs, v);
> @@ -121,10 +136,6 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>           return rc;
>       }
>   
> -    /* All the instructions used on emulated MMIO region should be valid */
> -    if ( !dabt.valid )
> -        return IO_ABORT;
> -
>       /*
>        * Erratum 766422: Thumb store translation fault to Hypervisor may
>        * not have correct HSR Rt value.
> @@ -134,7 +145,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>       {
>           int rc;
>   
> -        rc = decode_instruction(regs, &info.dabt);
> +        rc = decode_instruction(regs, &info.dabt, NULL);
>           if ( rc )
>           {
>               gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> @@ -143,9 +154,21 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>       }
>   
>       if ( info.dabt.write )
> -        return handle_write(handler, v, &info);
> +        rc = handle_write(handler, v, &info);
>       else
> -        return handle_read(handler, v, &info);
> +        rc = handle_read(handler, v, &info);
> +
> +#if CONFIG_ARM_64
> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
> +    {
> +        if ( instr.value != 0 )
> +        {
> +            post_increment_register(&instr);
> +        }

This path will not be reached when the I/O is forwarded to an IOREQ 
server. I think we need to add similar code in 
arch_ioreq_complete_mmio() (just before advance_pc()).

Just for completeness, even if it would be easier, I don't think we can 
update the register before the MMIO is handled (i.e. in 
try_fwd_ioserv()) because in theory the instruction is not completed. So 
I am a bit worry that this may impact other subsystem (the obvious one 
are the registers dump would be incorrect).

Cheers,

-- 
Julien Grall