[XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions

Ayan Kumar Halder posted 1 patch 2 years, 2 months ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20220120215527.28138-1-ayankuma@xilinx.com
There is a newer version of this series
xen/arch/arm/decode.c | 78 ++++++++++++++++++++++++++++++++++++++++++-
xen/arch/arm/decode.h | 29 +++++++++++++++-
xen/arch/arm/io.c     | 66 ++++++++++++++++++++++++++++++++----
3 files changed, 165 insertions(+), 8 deletions(-)
[XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 2 months ago
At the moment, Xen is only handling data abort with valid syndrome (i.e.
ISV=0). Unfortunately, this doesn't cover all the instructions a domain
could use to access MMIO regions.

For instance, a baremetal OS can use any of the following instructions, where
x1 contains the address of the MMIO region:

1.      ldr     x2,    [x1],    #4
2.      ldr     w2,    [x1],    #-4
3.      ldr     x2,    [x1],    #-8
4.      ldr     w2,    [x1],    #4
5.      ldrh    w2,    [x1],    #8
6.      ldrb    w2,    [x1],    #16
7.      str     x2,    [x1],    #4
8.      str     w2,    [x1],    #-4
9.      strh    w2,    [x1],    #8
10.     strb    w2,    [x1],    #16

In the following two instructions, sp contains the address of the MMIO region:-
11.     ldrb    w2,    [sp],    #16
12.     ldrb    wzr,   [sp],    #16

In order to handle post-indexing store/load instructions (like those mentioned
above), Xen will need to fetch and decode the instruction.

This patch only cover post-index store/load instructions from AArch64 mode.
For now, this is left unimplemented for trap from AArch32 mode.

Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
---

Changelog :-
v2 - 1. Updated the rn register after reading from it. (Pointed by Julien,
        Stefano)
     2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
     3. Fixed coding style issues (Pointed by Julien)
     4. In the previous patch, I was updating dabt->sign based on the signedness
        of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b,
        Page 3221, SSE indicates the signedness of the data item loaded. In our
        case, the data item loaded is always unsigned.

v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit variants).
       Thus, I have removed the check for "instr->code.opc == 0" (Suggested by
       Andre)
    2. Handled the scenario when rn = SP, rt = XZR (Suggested by Jan, Andre)
    3. Added restriction for "rt != rn" (Suggested by Andre)
    4. Moved union ldr_str_instr_class {} to decode.h. This is the header included
       by io.c and decode.c (where the union is referred). (Suggested by Jan)
    5. Indentation and typo fixes (Suggested by Jan)

Changes suggested but could not be considered due to reasons :-
    1. Using accessor macros instead of bitfields for "ldr_str_instr_class". (Andre)
       Reason - I could not find a simple way to represent 9 bit signed integer
       (ie imm9) without using bitfields. If I use accessor macros, then I need
       to manually calculate two's complement to obtain the value when signed
       bit is present.

    2. I/D cache cohenerncy (Andre)
       Reason :- I could not see any instruction to flush the I cache.
       Refer https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/IC--Instruction-Cache-operation--an-alias-of-SYS-?lang=en#sa_ic_op
       So, this patch assumes that the I/D caches are coherent.

 xen/arch/arm/decode.c | 78 ++++++++++++++++++++++++++++++++++++++++++-
 xen/arch/arm/decode.h | 29 +++++++++++++++-
 xen/arch/arm/io.c     | 66 ++++++++++++++++++++++++++++++++----
 3 files changed, 165 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
index 792c2e92a7..f1c59ddd1a 100644
--- a/xen/arch/arm/decode.c
+++ b/xen/arch/arm/decode.c
@@ -84,6 +84,76 @@ bad_thumb2:
     return 1;
 }
 
+static int decode_loadstore_postindexing(register_t pc,
+                                         struct hsr_dabt *dabt,
+                                         union ldr_str_instr_class *instr)
+{
+    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
+        return -EFAULT;
+
+    /*
+     * Rn -ne Rt for ldr/str instruction.
+     * Check https://developer.arm.com/documentation/dui0802/a/CIHGJHED
+     * (Register restrictions)
+     *
+     * The only exception for this is when rn = 31. It denotes SP ("Use of SP")
+     *
+     * And when rt = 31, it denotes wzr/xzr. (Refer
+     * https://developer.arm.com/documentation/den0024/a/ARMv8-Registers/AArch64-special-registers
+     * "There is no register called X31 or W31. Many instructions are encoded
+     * such that the number 31 represents the zero register, ZR (WZR/XZR)."
+     */
+    if ( (instr->code.rn == instr->code.rt) && (instr->code.rn != 31) )
+        return -EINVAL;
+
+    /* First, let's check for the fixed values */
+    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
+         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
+    {
+        gprintk(XENLOG_ERR, "Cannot decode instruction 0x%x",instr->value);
+        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
+            " ldr/str post indexing\n");
+        goto bad_32bit_loadstore;
+    }
+
+    if ( instr->code.v != 0 )
+    {
+        gprintk(XENLOG_ERR,
+            "ldr/str post indexing for vector types are not supported\n");
+        goto bad_32bit_loadstore;
+    }
+
+    /* Check for STR (immediate) - 32 bit variant */
+    if ( instr->code.opc == 0 )
+    {
+        dabt->write = 1;
+    }
+    /* Check for LDR (immediate) - 32 bit variant */
+    else if ( instr->code.opc == 1 )
+    {
+        dabt->write = 0;
+    }
+    else
+    {
+        gprintk(XENLOG_ERR,
+            "Decoding ldr/str post indexing is not supported for this variant\n");
+        goto bad_32bit_loadstore;
+    }
+
+    gprintk(XENLOG_INFO,
+        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
+        instr->code.rt, instr->code.size, instr->code.imm9);
+
+    update_dabt(dabt, instr->code.rt, instr->code.size, false);
+    dabt->valid = 1;
+
+    return 0;
+
+ bad_32bit_loadstore:
+    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
+    return 1;
+}
+
 static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
 {
     uint16_t instr;
@@ -150,11 +220,17 @@ bad_thumb:
     return 1;
 }
 
-int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
+int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
+                       union ldr_str_instr_class *instr)
 {
     if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
         return decode_thumb(regs->pc, dabt);
 
+    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
+    {
+        return decode_loadstore_postindexing(regs->pc, dabt, instr);
+    }
+
     /* TODO: Handle ARM instruction */
     gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
 
diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
index 4613763bdb..5c918c9bed 100644
--- a/xen/arch/arm/decode.h
+++ b/xen/arch/arm/decode.h
@@ -23,6 +23,32 @@
 #include <asm/regs.h>
 #include <asm/processor.h>
 
+/*
+ * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
+ * Page 318 specifies the following bit pattern for
+ * "load/store register (immediate post-indexed)".
+ *
+ * 31 30 29  27 26 25  23   21 20              11   9         4       0
+ * ___________________________________________________________________
+ * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
+ * |____|______|__|____|____|__|_______________|____|_________|_______|
+ */
+union ldr_str_instr_class {
+    uint32_t value;
+    struct ldr_str {
+        unsigned int rt:5;     /* Rt register */
+        unsigned int rn:5;     /* Rn register */
+        unsigned int fixed1:2; /* value == 01b */
+        signed int imm9:9;            /* imm9 */
+        unsigned int fixed2:1; /* value == 0b */
+        unsigned int opc:2;    /* opc */
+        unsigned int fixed3:2; /* value == 00b */
+        unsigned int v:1;      /* vector */
+        unsigned int fixed4:3; /* value == 111b */
+        unsigned int size:2;   /* size */
+    } code;
+};
+
 /**
  * Decode an instruction from pc
  * /!\ This function is not intended to fully decode an instruction. It
@@ -35,7 +61,8 @@
  */
 
 int decode_instruction(const struct cpu_user_regs *regs,
-                       struct hsr_dabt *dabt);
+                       struct hsr_dabt *dabt,
+                       union ldr_str_instr_class *instr);
 
 #endif /* __ARCH_ARM_DECODE_H_ */
 
diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index 729287e37c..acb483f235 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -65,6 +65,39 @@ static enum io_state handle_write(const struct mmio_handler *handler,
     return ret ? IO_HANDLED : IO_ABORT;
 }
 
+static void post_increment_register(union ldr_str_instr_class *instr)
+{
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    unsigned int val;
+
+    /* handle when rn = SP */
+    if ( instr->code.rn == 31 )
+    {
+        if ( (regs->cpsr & PSR_MODE_MASK) == PSR_MODE_EL1h )
+        {
+            val = regs->sp_el1;
+        }
+        else
+        {
+            BUG();
+        }
+    }
+    else
+    {
+        val = get_user_reg(regs, instr->code.rn);
+    }
+    val += instr->code.imm9;
+
+    if ( instr->code.rn == 31 )
+    {
+        regs->sp_el1 = val;
+    }
+    else
+    {
+        set_user_reg(regs, instr->code.rn, val);
+    }
+}
+
 /* This function assumes that mmio regions are not overlapped */
 static int cmp_mmio_handler(const void *key, const void *elem)
 {
@@ -106,14 +139,29 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
         .gpa = gpa,
         .dabt = dabt
     };
+    int rc;
+    union ldr_str_instr_class instr = {0};
 
     ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
 
+    /*
+     * Armv8 processor does not provide a valid syndrome for post-indexing
+     * ldr/str instructions. So in order to process these instructions,
+     * Xen must decode them.
+     */
+    if ( !info.dabt.valid )
+    {
+        rc = decode_instruction(regs, &info.dabt, &instr);
+        if ( rc )
+        {
+            gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
+            return IO_ABORT;
+        }
+    }
+
     handler = find_mmio_handler(v->domain, info.gpa);
     if ( !handler )
     {
-        int rc;
-
         rc = try_fwd_ioserv(regs, v, &info);
         if ( rc == IO_HANDLED )
             return handle_ioserv(regs, v);
@@ -122,7 +170,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
     }
 
     /* All the instructions used on emulated MMIO region should be valid */
-    if ( !dabt.valid )
+    if ( !info.dabt.valid )
         return IO_ABORT;
 
     /*
@@ -134,7 +182,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
     {
         int rc;
 
-        rc = decode_instruction(regs, &info.dabt);
+        rc = decode_instruction(regs, &info.dabt, NULL);
         if ( rc )
         {
             gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
@@ -143,9 +191,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
     }
 
     if ( info.dabt.write )
-        return handle_write(handler, v, &info);
+        rc = handle_write(handler, v, &info);
     else
-        return handle_read(handler, v, &info);
+        rc = handle_read(handler, v, &info);
+
+    if ( instr.value != 0 )
+    {
+        post_increment_register(&instr);
+    }
+    return rc;
 }
 
 void register_mmio_handler(struct domain *d,
-- 
2.17.1


Re: [XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Stefano Stabellini 2 years, 2 months ago
On Thu, 20 Jan 2022, Ayan Kumar Halder wrote:
> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> could use to access MMIO regions.
> 
> For instance, a baremetal OS can use any of the following instructions, where
> x1 contains the address of the MMIO region:
> 
> 1.      ldr     x2,    [x1],    #4
> 2.      ldr     w2,    [x1],    #-4
> 3.      ldr     x2,    [x1],    #-8
> 4.      ldr     w2,    [x1],    #4
> 5.      ldrh    w2,    [x1],    #8
> 6.      ldrb    w2,    [x1],    #16
> 7.      str     x2,    [x1],    #4
> 8.      str     w2,    [x1],    #-4
> 9.      strh    w2,    [x1],    #8
> 10.     strb    w2,    [x1],    #16
> 
> In the following two instructions, sp contains the address of the MMIO region:-
> 11.     ldrb    w2,    [sp],    #16
> 12.     ldrb    wzr,   [sp],    #16
> 
> In order to handle post-indexing store/load instructions (like those mentioned
> above), Xen will need to fetch and decode the instruction.
> 
> This patch only cover post-index store/load instructions from AArch64 mode.
> For now, this is left unimplemented for trap from AArch32 mode.
> 
> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>

This is a lot better, thanks!


> ---
> 
> Changelog :-
> v2 - 1. Updated the rn register after reading from it. (Pointed by Julien,
>         Stefano)
>      2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
>      3. Fixed coding style issues (Pointed by Julien)
>      4. In the previous patch, I was updating dabt->sign based on the signedness
>         of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b,
>         Page 3221, SSE indicates the signedness of the data item loaded. In our
>         case, the data item loaded is always unsigned.
> 
> v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit variants).
>        Thus, I have removed the check for "instr->code.opc == 0" (Suggested by
>        Andre)
>     2. Handled the scenario when rn = SP, rt = XZR (Suggested by Jan, Andre)
>     3. Added restriction for "rt != rn" (Suggested by Andre)
>     4. Moved union ldr_str_instr_class {} to decode.h. This is the header included
>        by io.c and decode.c (where the union is referred). (Suggested by Jan)
>     5. Indentation and typo fixes (Suggested by Jan)
> 
> Changes suggested but could not be considered due to reasons :-
>     1. Using accessor macros instead of bitfields for "ldr_str_instr_class". (Andre)
>        Reason - I could not find a simple way to represent 9 bit signed integer
>        (ie imm9) without using bitfields. If I use accessor macros, then I need
>        to manually calculate two's complement to obtain the value when signed
>        bit is present.
> 
>     2. I/D cache cohenerncy (Andre)
>        Reason :- I could not see any instruction to flush the I cache.
>        Refer https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/IC--Instruction-Cache-operation--an-alias-of-SYS-?lang=en#sa_ic_op
>        So, this patch assumes that the I/D caches are coherent.
> 
>  xen/arch/arm/decode.c | 78 ++++++++++++++++++++++++++++++++++++++++++-
>  xen/arch/arm/decode.h | 29 +++++++++++++++-
>  xen/arch/arm/io.c     | 66 ++++++++++++++++++++++++++++++++----
>  3 files changed, 165 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
> index 792c2e92a7..f1c59ddd1a 100644
> --- a/xen/arch/arm/decode.c
> +++ b/xen/arch/arm/decode.c
> @@ -84,6 +84,76 @@ bad_thumb2:
>      return 1;
>  }
>  
> +static int decode_loadstore_postindexing(register_t pc,
> +                                         struct hsr_dabt *dabt,
> +                                         union ldr_str_instr_class *instr)
> +{
> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
> +        return -EFAULT;
> +
> +    /*
> +     * Rn -ne Rt for ldr/str instruction.
> +     * Check https://developer.arm.com/documentation/dui0802/a/CIHGJHED
> +     * (Register restrictions)
> +     *
> +     * The only exception for this is when rn = 31. It denotes SP ("Use of SP")
> +     *
> +     * And when rt = 31, it denotes wzr/xzr. (Refer
> +     * https://developer.arm.com/documentation/den0024/a/ARMv8-Registers/AArch64-special-registers
> +     * "There is no register called X31 or W31. Many instructions are encoded
> +     * such that the number 31 represents the zero register, ZR (WZR/XZR)."
> +     */
> +    if ( (instr->code.rn == instr->code.rt) && (instr->code.rn != 31) )
> +        return -EINVAL;
>
> +    /* First, let's check for the fixed values */
> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
> +    {
> +        gprintk(XENLOG_ERR, "Cannot decode instruction 0x%x",instr->value);
> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
> +            " ldr/str post indexing\n");
> +        goto bad_32bit_loadstore;
> +    }

Maybe this is a useless optimization but I would write this using masks
and bitwise opts:

#define POST_INDX_FIXED_MASK  0x38200c00
#define POST_INDX_FIXED_VALUE 0x38000400

if ( (instr->value & POST_INDX_FIXED_MASK) != POST_INDX_FIXED_VALUE )
    goto bad_32bit_loadstore;



> +    if ( instr->code.v != 0 )
> +    {
> +        gprintk(XENLOG_ERR,
> +            "ldr/str post indexing for vector types are not supported\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    /* Check for STR (immediate) - 32 bit variant */
> +    if ( instr->code.opc == 0 )
> +    {
> +        dabt->write = 1;
> +    }
> +    /* Check for LDR (immediate) - 32 bit variant */
> +    else if ( instr->code.opc == 1 )
> +    {
> +        dabt->write = 0;
> +    }
> +    else
> +    {
> +        gprintk(XENLOG_ERR,
> +            "Decoding ldr/str post indexing is not supported for this variant\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    gprintk(XENLOG_INFO,
> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
> +        instr->code.rt, instr->code.size, instr->code.imm9);
> +
> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
> +    dabt->valid = 1;
> +
> +    return 0;
> +
> + bad_32bit_loadstore:
> +    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
> +    return 1;
> +}
> +
>  static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
>  {
>      uint16_t instr;
> @@ -150,11 +220,17 @@ bad_thumb:
>      return 1;
>  }
>  
> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
> +                       union ldr_str_instr_class *instr)
>  {
>      if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
>          return decode_thumb(regs->pc, dabt);
>  
> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
> +    {

We should also check that instr != NULL either here or at the beginning
of decode_loadstore_postindexing


> +        return decode_loadstore_postindexing(regs->pc, dabt, instr);
> +    }
> +
>      /* TODO: Handle ARM instruction */
>      gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
>  
> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
> index 4613763bdb..5c918c9bed 100644
> --- a/xen/arch/arm/decode.h
> +++ b/xen/arch/arm/decode.h
> @@ -23,6 +23,32 @@
>  #include <asm/regs.h>
>  #include <asm/processor.h>
>  
> +/*
> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
> + * Page 318 specifies the following bit pattern for
> + * "load/store register (immediate post-indexed)".
> + *
> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> + * ___________________________________________________________________
> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> + * |____|______|__|____|____|__|_______________|____|_________|_______|
> + */
> +union ldr_str_instr_class {
> +    uint32_t value;
> +    struct ldr_str {
> +        unsigned int rt:5;     /* Rt register */
> +        unsigned int rn:5;     /* Rn register */
> +        unsigned int fixed1:2; /* value == 01b */
> +        signed int imm9:9;            /* imm9 */
> +        unsigned int fixed2:1; /* value == 0b */
> +        unsigned int opc:2;    /* opc */
> +        unsigned int fixed3:2; /* value == 00b */
> +        unsigned int v:1;      /* vector */
> +        unsigned int fixed4:3; /* value == 111b */
> +        unsigned int size:2;   /* size */
> +    } code;
> +};
> +
>  /**
>   * Decode an instruction from pc
>   * /!\ This function is not intended to fully decode an instruction. It
> @@ -35,7 +61,8 @@
>   */
>  
>  int decode_instruction(const struct cpu_user_regs *regs,
> -                       struct hsr_dabt *dabt);
> +                       struct hsr_dabt *dabt,
> +                       union ldr_str_instr_class *instr);
>  
>  #endif /* __ARCH_ARM_DECODE_H_ */
>  
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index 729287e37c..acb483f235 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -65,6 +65,39 @@ static enum io_state handle_write(const struct mmio_handler *handler,
>      return ret ? IO_HANDLED : IO_ABORT;
>  }
>  
> +static void post_increment_register(union ldr_str_instr_class *instr)
> +{
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    unsigned int val;

register_t val


> +    /* handle when rn = SP */
> +    if ( instr->code.rn == 31 )
> +    {
> +        if ( (regs->cpsr & PSR_MODE_MASK) == PSR_MODE_EL1h )
> +        {
> +            val = regs->sp_el1;

I think you need to #ifdef ARM64 the entire function because sp_el1 is
aarch64 only. Also, it might be better to move post_increment_register
to decode.c to keep is closer to the other half of the relevant code. I
don't feel strongly about it though, if other reviewers prefer to keep
it here, it is only fine by me.


> +        }
> +        else
> +        {
> +            BUG();

BUG is not a good idea in this code path because it might allow a guest
to cause a hypervisor crash. Instead you could print an error and
call:

    domain_crash(current->domain);

But I think it would be even better to add a check:

if ( (regs->cpsr & PSR_MODE_MASK) != PSR_MODE_EL1h )
{
    goto bad_32bit_loadstore;
}

in decode_loadstore_postindexing or in decode_instruction so that if get
here we are sure that we can handle the post-indexing instruction
completely.


> +        }
> +    }
> +    else
> +    {
> +        val = get_user_reg(regs, instr->code.rn);
> +    }
> +    val += instr->code.imm9;
> +    if ( instr->code.rn == 31 )
> +    {
> +        regs->sp_el1 = val;
> +    }
> +    else
> +    {
> +        set_user_reg(regs, instr->code.rn, val);
> +    }
> +}
> +
>  /* This function assumes that mmio regions are not overlapped */
>  static int cmp_mmio_handler(const void *key, const void *elem)
>  {
> @@ -106,14 +139,29 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>          .gpa = gpa,
>          .dabt = dabt
>      };
> +    int rc;
> +    union ldr_str_instr_class instr = {0};
>  
>      ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>  
> +    /*
> +     * Armv8 processor does not provide a valid syndrome for post-indexing
> +     * ldr/str instructions. So in order to process these instructions,
> +     * Xen must decode them.
> +     */
> +    if ( !info.dabt.valid )
> +    {
> +        rc = decode_instruction(regs, &info.dabt, &instr);
> +        if ( rc )
> +        {
> +            gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> +            return IO_ABORT;
> +        }
> +    }
> +
>      handler = find_mmio_handler(v->domain, info.gpa);
>      if ( !handler )
>      {
> -        int rc;
> -
>          rc = try_fwd_ioserv(regs, v, &info);
>          if ( rc == IO_HANDLED )
>              return handle_ioserv(regs, v);
> @@ -122,7 +170,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>      }
>  
>      /* All the instructions used on emulated MMIO region should be valid */
> -    if ( !dabt.valid )
> +    if ( !info.dabt.valid )
>          return IO_ABORT;

Is this check still necessary given the new info.dabt.valid check above?


>      /*
> @@ -134,7 +182,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>      {
>          int rc;
>  
> -        rc = decode_instruction(regs, &info.dabt);
> +        rc = decode_instruction(regs, &info.dabt, NULL);
>          if ( rc )
>          {
>              gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> @@ -143,9 +191,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>      }
>  
>      if ( info.dabt.write )
> -        return handle_write(handler, v, &info);
> +        rc = handle_write(handler, v, &info);
>      else
> -        return handle_read(handler, v, &info);
> +        rc = handle_read(handler, v, &info);
> +
> +    if ( instr.value != 0 )
> +    {
> +        post_increment_register(&instr);
> +    }
> +    return rc;
>  }
>  
>  void register_mmio_handler(struct domain *d,

Re: [XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 2 months ago
Hi Stefano/Andre/All,

Thanks for the feedback.

On 22/01/2022 01:04, Stefano Stabellini wrote:
> On Thu, 20 Jan 2022, Ayan Kumar Halder wrote:
>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
>> could use to access MMIO regions.
>>
>> For instance, a baremetal OS can use any of the following instructions, where
>> x1 contains the address of the MMIO region:
>>
>> 1.      ldr     x2,    [x1],    #4
>> 2.      ldr     w2,    [x1],    #-4
>> 3.      ldr     x2,    [x1],    #-8
>> 4.      ldr     w2,    [x1],    #4
>> 5.      ldrh    w2,    [x1],    #8
>> 6.      ldrb    w2,    [x1],    #16
>> 7.      str     x2,    [x1],    #4
>> 8.      str     w2,    [x1],    #-4
>> 9.      strh    w2,    [x1],    #8
>> 10.     strb    w2,    [x1],    #16
>>
>> In the following two instructions, sp contains the address of the MMIO region:-
>> 11.     ldrb    w2,    [sp],    #16
>> 12.     ldrb    wzr,   [sp],    #16
>>
>> In order to handle post-indexing store/load instructions (like those mentioned
>> above), Xen will need to fetch and decode the instruction.
>>
>> This patch only cover post-index store/load instructions from AArch64 mode.
>> For now, this is left unimplemented for trap from AArch32 mode.
>>
>> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
> This is a lot better, thanks!
>
>
>> ---
>>
>> Changelog :-
>> v2 - 1. Updated the rn register after reading from it. (Pointed by Julien,
>>          Stefano)
>>       2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
>>       3. Fixed coding style issues (Pointed by Julien)
>>       4. In the previous patch, I was updating dabt->sign based on the signedness
>>          of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b,
>>          Page 3221, SSE indicates the signedness of the data item loaded. In our
>>          case, the data item loaded is always unsigned.
>>
>> v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit variants).
>>         Thus, I have removed the check for "instr->code.opc == 0" (Suggested by
>>         Andre)
>>      2. Handled the scenario when rn = SP, rt = XZR (Suggested by Jan, Andre)
>>      3. Added restriction for "rt != rn" (Suggested by Andre)
>>      4. Moved union ldr_str_instr_class {} to decode.h. This is the header included
>>         by io.c and decode.c (where the union is referred). (Suggested by Jan)
>>      5. Indentation and typo fixes (Suggested by Jan)
>>
>> Changes suggested but could not be considered due to reasons :-
>>      1. Using accessor macros instead of bitfields for "ldr_str_instr_class". (Andre)
>>         Reason - I could not find a simple way to represent 9 bit signed integer
>>         (ie imm9) without using bitfields. If I use accessor macros, then I need
>>         to manually calculate two's complement to obtain the value when signed
>>         bit is present.
>>
>>      2. I/D cache cohenerncy (Andre)
>>         Reason :- I could not see any instruction to flush the I cache.
>>         Refer https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/IC--Instruction-Cache-operation--an-alias-of-SYS-?lang=en#sa_ic_op
>>         So, this patch assumes that the I/D caches are coherent.
>>
>>   xen/arch/arm/decode.c | 78 ++++++++++++++++++++++++++++++++++++++++++-
>>   xen/arch/arm/decode.h | 29 +++++++++++++++-
>>   xen/arch/arm/io.c     | 66 ++++++++++++++++++++++++++++++++----
>>   3 files changed, 165 insertions(+), 8 deletions(-)
>>
>> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
>> index 792c2e92a7..f1c59ddd1a 100644
>> --- a/xen/arch/arm/decode.c
>> +++ b/xen/arch/arm/decode.c
>> @@ -84,6 +84,76 @@ bad_thumb2:
>>       return 1;
>>   }
>>   
>> +static int decode_loadstore_postindexing(register_t pc,
>> +                                         struct hsr_dabt *dabt,
>> +                                         union ldr_str_instr_class *instr)
>> +{
>> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
>> +        return -EFAULT;
>> +
>> +    /*
>> +     * Rn -ne Rt for ldr/str instruction.
>> +     * Check https://developer.arm.com/documentation/dui0802/a/CIHGJHED
>> +     * (Register restrictions)
>> +     *
>> +     * The only exception for this is when rn = 31. It denotes SP ("Use of SP")
>> +     *
>> +     * And when rt = 31, it denotes wzr/xzr. (Refer
>> +     * https://developer.arm.com/documentation/den0024/a/ARMv8-Registers/AArch64-special-registers
>> +     * "There is no register called X31 or W31. Many instructions are encoded
>> +     * such that the number 31 represents the zero register, ZR (WZR/XZR)."
>> +     */
>> +    if ( (instr->code.rn == instr->code.rt) && (instr->code.rn != 31) )
>> +        return -EINVAL;
>>
>> +    /* First, let's check for the fixed values */
>> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
>> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
>> +    {
>> +        gprintk(XENLOG_ERR, "Cannot decode instruction 0x%x",instr->value);
>> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
>> +            " ldr/str post indexing\n");
>> +        goto bad_32bit_loadstore;
>> +    }
> Maybe this is a useless optimization but I would write this using masks
> and bitwise opts:
>
> #define POST_INDX_FIXED_MASK  0x38200c00

Good suggestion. This should be 0x3B200c00

> #define POST_INDX_FIXED_VALUE 0x38000400
>
> if ( (instr->value & POST_INDX_FIXED_MASK) != POST_INDX_FIXED_VALUE )
>      goto bad_32bit_loadstore;

Done in v4.

>
>
>
>> +    if ( instr->code.v != 0 )
>> +    {
>> +        gprintk(XENLOG_ERR,
>> +            "ldr/str post indexing for vector types are not supported\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    /* Check for STR (immediate) - 32 bit variant */
>> +    if ( instr->code.opc == 0 )
>> +    {
>> +        dabt->write = 1;
>> +    }
>> +    /* Check for LDR (immediate) - 32 bit variant */
>> +    else if ( instr->code.opc == 1 )
>> +    {
>> +        dabt->write = 0;
>> +    }
>> +    else
>> +    {
>> +        gprintk(XENLOG_ERR,
>> +            "Decoding ldr/str post indexing is not supported for this variant\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    gprintk(XENLOG_INFO,
>> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
>> +        instr->code.rt, instr->code.size, instr->code.imm9);
>> +
>> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
>> +    dabt->valid = 1;
>> +
>> +    return 0;
>> +
>> + bad_32bit_loadstore:
>> +    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
>> +    return 1;
>> +}
>> +
>>   static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
>>   {
>>       uint16_t instr;
>> @@ -150,11 +220,17 @@ bad_thumb:
>>       return 1;
>>   }
>>   
>> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
>> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
>> +                       union ldr_str_instr_class *instr)
>>   {
>>       if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
>>           return decode_thumb(regs->pc, dabt);
>>   
>> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
>> +    {
> We should also check that instr != NULL either here or at the beginning
> of decode_loadstore_postindexing

Done at the beginning of decode_loadstore_postindexing()

>
>
>> +        return decode_loadstore_postindexing(regs->pc, dabt, instr);
>> +    }
>> +
>>       /* TODO: Handle ARM instruction */
>>       gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
>>   
>> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
>> index 4613763bdb..5c918c9bed 100644
>> --- a/xen/arch/arm/decode.h
>> +++ b/xen/arch/arm/decode.h
>> @@ -23,6 +23,32 @@
>>   #include <asm/regs.h>
>>   #include <asm/processor.h>
>>   
>> +/*
>> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
>> + * Page 318 specifies the following bit pattern for
>> + * "load/store register (immediate post-indexed)".
>> + *
>> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
>> + * ___________________________________________________________________
>> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
>> + * |____|______|__|____|____|__|_______________|____|_________|_______|
>> + */
>> +union ldr_str_instr_class {
>> +    uint32_t value;
>> +    struct ldr_str {
>> +        unsigned int rt:5;     /* Rt register */
>> +        unsigned int rn:5;     /* Rn register */
>> +        unsigned int fixed1:2; /* value == 01b */
>> +        signed int imm9:9;            /* imm9 */
>> +        unsigned int fixed2:1; /* value == 0b */
>> +        unsigned int opc:2;    /* opc */
>> +        unsigned int fixed3:2; /* value == 00b */
>> +        unsigned int v:1;      /* vector */
>> +        unsigned int fixed4:3; /* value == 111b */
>> +        unsigned int size:2;   /* size */
>> +    } code;
>> +};
>> +
>>   /**
>>    * Decode an instruction from pc
>>    * /!\ This function is not intended to fully decode an instruction. It
>> @@ -35,7 +61,8 @@
>>    */
>>   
>>   int decode_instruction(const struct cpu_user_regs *regs,
>> -                       struct hsr_dabt *dabt);
>> +                       struct hsr_dabt *dabt,
>> +                       union ldr_str_instr_class *instr);
>>   
>>   #endif /* __ARCH_ARM_DECODE_H_ */
>>   
>> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
>> index 729287e37c..acb483f235 100644
>> --- a/xen/arch/arm/io.c
>> +++ b/xen/arch/arm/io.c
>> @@ -65,6 +65,39 @@ static enum io_state handle_write(const struct mmio_handler *handler,
>>       return ret ? IO_HANDLED : IO_ABORT;
>>   }
>>   
>> +static void post_increment_register(union ldr_str_instr_class *instr)
>> +{
>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +    unsigned int val;
> register_t val

Done

>
>
>> +    /* handle when rn = SP */
>> +    if ( instr->code.rn == 31 )
>> +    {
>> +        if ( (regs->cpsr & PSR_MODE_MASK) == PSR_MODE_EL1h )
>> +        {
>> +            val = regs->sp_el1;
> I think you need to #ifdef ARM64 the entire function because sp_el1 is
> aarch64 only. Also, it might be better to move post_increment_register
> to decode.c to keep is closer to the other half of the relevant code. I
> don't feel strongly about it though, if other reviewers prefer to keep
> it here, it is only fine by me.

Makes sense as this is a part of the decoding logic. I haved moved this to decode.c as well.

>
>
>> +        }
>> +        else
>> +        {
>> +            BUG();
> BUG is not a good idea in this code path because it might allow a guest
> to cause a hypervisor crash. Instead you could print an error and
> call:
>
>      domain_crash(current->domain);
>
> But I think it would be even better to add a check:
>
> if ( (regs->cpsr & PSR_MODE_MASK) != PSR_MODE_EL1h )
> {
>      goto bad_32bit_loadstore;
> }
>
> in decode_loadstore_postindexing or in decode_instruction so that if get
> here we are sure that we can handle the post-indexing instruction
> completely.

That is a good suggestion. I will move the check to decode_loadstore_postindexing
(). Yes, Xen should not crash because of the guest.

>
>> +        }
>> +    }
>> +    else
>> +    {
>> +        val = get_user_reg(regs, instr->code.rn);
>> +    }
>> +    val += instr->code.imm9;
>> +    if ( instr->code.rn == 31 )
>> +    {
>> +        regs->sp_el1 = val;
>> +    }
>> +    else
>> +    {
>> +        set_user_reg(regs, instr->code.rn, val);
>> +    }
>> +}
>> +
>>   /* This function assumes that mmio regions are not overlapped */
>>   static int cmp_mmio_handler(const void *key, const void *elem)
>>   {
>> @@ -106,14 +139,29 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>           .gpa = gpa,
>>           .dabt = dabt
>>       };
>> +    int rc;
>> +    union ldr_str_instr_class instr = {0};
>>   
>>       ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>>   
>> +    /*
>> +     * Armv8 processor does not provide a valid syndrome for post-indexing
>> +     * ldr/str instructions. So in order to process these instructions,
>> +     * Xen must decode them.
>> +     */
>> +    if ( !info.dabt.valid )
>> +    {
>> +        rc = decode_instruction(regs, &info.dabt, &instr);
>> +        if ( rc )
>> +        {
>> +            gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
>> +            return IO_ABORT;
>> +        }
>> +    }
>> +
>>       handler = find_mmio_handler(v->domain, info.gpa);
>>       if ( !handler )
>>       {
>> -        int rc;
>> -
>>           rc = try_fwd_ioserv(regs, v, &info);
>>           if ( rc == IO_HANDLED )
>>               return handle_ioserv(regs, v);
>> @@ -122,7 +170,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>       }
>>   
>>       /* All the instructions used on emulated MMIO region should be valid */
>> -    if ( !dabt.valid )
>> +    if ( !info.dabt.valid )
>>           return IO_ABORT;
> Is this check still necessary given the new info.dabt.valid check above?

Yes, this check is redundant. I will remove this.
I have sent out a v4 patch with all these fixes.
Please review "[XEN v4] xen/arm64: io: Decode ldr/str post-indexing instructions"

- Ayan

>
>
>>       /*
>> @@ -134,7 +182,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>       {
>>           int rc;
>>   
>> -        rc = decode_instruction(regs, &info.dabt);
>> +        rc = decode_instruction(regs, &info.dabt, NULL);
>>           if ( rc )
>>           {
>>               gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
>> @@ -143,9 +191,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>       }
>>   
>>       if ( info.dabt.write )
>> -        return handle_write(handler, v, &info);
>> +        rc = handle_write(handler, v, &info);
>>       else
>> -        return handle_read(handler, v, &info);
>> +        rc = handle_read(handler, v, &info);
>> +
>> +    if ( instr.value != 0 )
>> +    {
>> +        post_increment_register(&instr);
>> +    }
>> +    return rc;
>>   }
>>   
>>   void register_mmio_handler(struct domain *d,

Re: [XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Andre Przywara 2 years, 2 months ago
On Thu, 20 Jan 2022 21:55:27 +0000
Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:

Hi,

> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> could use to access MMIO regions.
> 
> For instance, a baremetal OS can use any of the following instructions, where
> x1 contains the address of the MMIO region:
> 
> 1.      ldr     x2,    [x1],    #4

That looks dodgy, since is misaligns the pointer afterwards. MMIO
access typically go to device memory, which must be naturally aligned.
Just don't give a bad example here and change that to a multiple of 8.

> 2.      ldr     w2,    [x1],    #-4

(this one is fine, btw, because it's a 32-bit read)

> 3.      ldr     x2,    [x1],    #-8
> 4.      ldr     w2,    [x1],    #4
> 5.      ldrh    w2,    [x1],    #8
> 6.      ldrb    w2,    [x1],    #16

More naturally I'd use the data size of the postindex value ...
ldr  x2 ... #-8
ldr  w2 ... #4
ldrh w2 ... #2
ldrb w2 ... #1

> 7.      str     x2,    [x1],    #4

This is a again problematic, because x1 is not 8-byte aligned anymore
after that.

> 8.      str     w2,    [x1],    #-4
> 9.      strh    w2,    [x1],    #8
> 10.     strb    w2,    [x1],    #16
> 
> In the following two instructions, sp contains the address of the MMIO region:-

Really? I don't think you should give people funny ideas, just mention
that the Rn register could theoretically be the stack pointer.

> 11.     ldrb    w2,    [sp],    #16
> 12.     ldrb    wzr,   [sp],    #16
> 
> In order to handle post-indexing store/load instructions (like those mentioned
> above), Xen will need to fetch and decode the instruction.
> 
> This patch only cover post-index store/load instructions from AArch64 mode.
> For now, this is left unimplemented for trap from AArch32 mode.
> 
> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
> ---
> 
> Changelog :-
> v2 - 1. Updated the rn register after reading from it. (Pointed by Julien,
>         Stefano)
>      2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
>      3. Fixed coding style issues (Pointed by Julien)
>      4. In the previous patch, I was updating dabt->sign based on the signedness
>         of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b,
>         Page 3221, SSE indicates the signedness of the data item loaded. In our
>         case, the data item loaded is always unsigned.
> 
> v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit variants).
>        Thus, I have removed the check for "instr->code.opc == 0" (Suggested by
>        Andre)
>     2. Handled the scenario when rn = SP, rt = XZR (Suggested by Jan, Andre)
>     3. Added restriction for "rt != rn" (Suggested by Andre)
>     4. Moved union ldr_str_instr_class {} to decode.h. This is the header included
>        by io.c and decode.c (where the union is referred). (Suggested by Jan)
>     5. Indentation and typo fixes (Suggested by Jan)
> 
> Changes suggested but could not be considered due to reasons :-
>     1. Using accessor macros instead of bitfields for "ldr_str_instr_class". (Andre)
>        Reason - I could not find a simple way to represent 9 bit signed integer
>        (ie imm9) without using bitfields. If I use accessor macros, then I need
>        to manually calculate two's complement to obtain the value when signed
>        bit is present.
> 
>     2. I/D cache cohenerncy (Andre)
>        Reason :- I could not see any instruction to flush the I cache.

First, please try to avoid the term "flush", because it is somewhat
overloaded. The architecture speaks of "clean" and "invalidate", which
are more precise.
Assuming you mean "clean" here: conceptually there is no such thing for
the I cache, because it's always clean. The I$ will only be read from
the CPU side - from the instruction fetcher - there is nothing written
back through it. Every store goes through the data path - always.
That is the problem that I tried to sketch you previously: you don't
have a guarantee that the instruction you read from memory is the same
that the CPU executed. The guest could have changed the instruction
after the I$ fetched that. So the CPU will execute (and trap) on
instruction X, but you will read Y. I leave it up to your imagination
if that could be exploited.

>        Refer https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/IC--Instruction-Cache-operation--an-alias-of-SYS-?lang=en#sa_ic_op
>        So, this patch assumes that the I/D caches are coherent.

Bold. ;-)

Cheers,
Andre

> 
>  xen/arch/arm/decode.c | 78 ++++++++++++++++++++++++++++++++++++++++++-
>  xen/arch/arm/decode.h | 29 +++++++++++++++-
>  xen/arch/arm/io.c     | 66 ++++++++++++++++++++++++++++++++----
>  3 files changed, 165 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
> index 792c2e92a7..f1c59ddd1a 100644
> --- a/xen/arch/arm/decode.c
> +++ b/xen/arch/arm/decode.c
> @@ -84,6 +84,76 @@ bad_thumb2:
>      return 1;
>  }
>  
> +static int decode_loadstore_postindexing(register_t pc,
> +                                         struct hsr_dabt *dabt,
> +                                         union ldr_str_instr_class *instr)
> +{
> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
> +        return -EFAULT;
> +
> +    /*
> +     * Rn -ne Rt for ldr/str instruction.
> +     * Check https://developer.arm.com/documentation/dui0802/a/CIHGJHED
> +     * (Register restrictions)
> +     *
> +     * The only exception for this is when rn = 31. It denotes SP ("Use of SP")
> +     *
> +     * And when rt = 31, it denotes wzr/xzr. (Refer
> +     * https://developer.arm.com/documentation/den0024/a/ARMv8-Registers/AArch64-special-registers
> +     * "There is no register called X31 or W31. Many instructions are encoded
> +     * such that the number 31 represents the zero register, ZR (WZR/XZR)."
> +     */
> +    if ( (instr->code.rn == instr->code.rt) && (instr->code.rn != 31) )
> +        return -EINVAL;
> +
> +    /* First, let's check for the fixed values */
> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
> +    {
> +        gprintk(XENLOG_ERR, "Cannot decode instruction 0x%x",instr->value);
> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
> +            " ldr/str post indexing\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    if ( instr->code.v != 0 )
> +    {
> +        gprintk(XENLOG_ERR,
> +            "ldr/str post indexing for vector types are not supported\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    /* Check for STR (immediate) - 32 bit variant */
> +    if ( instr->code.opc == 0 )
> +    {
> +        dabt->write = 1;
> +    }
> +    /* Check for LDR (immediate) - 32 bit variant */
> +    else if ( instr->code.opc == 1 )
> +    {
> +        dabt->write = 0;
> +    }
> +    else
> +    {
> +        gprintk(XENLOG_ERR,
> +            "Decoding ldr/str post indexing is not supported for this variant\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    gprintk(XENLOG_INFO,
> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
> +        instr->code.rt, instr->code.size, instr->code.imm9);
> +
> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
> +    dabt->valid = 1;
> +
> +    return 0;
> +
> + bad_32bit_loadstore:
> +    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
> +    return 1;
> +}
> +
>  static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
>  {
>      uint16_t instr;
> @@ -150,11 +220,17 @@ bad_thumb:
>      return 1;
>  }
>  
> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
> +                       union ldr_str_instr_class *instr)
>  {
>      if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
>          return decode_thumb(regs->pc, dabt);
>  
> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
> +    {
> +        return decode_loadstore_postindexing(regs->pc, dabt, instr);
> +    }
> +
>      /* TODO: Handle ARM instruction */
>      gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
>  
> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
> index 4613763bdb..5c918c9bed 100644
> --- a/xen/arch/arm/decode.h
> +++ b/xen/arch/arm/decode.h
> @@ -23,6 +23,32 @@
>  #include <asm/regs.h>
>  #include <asm/processor.h>
>  
> +/*
> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
> + * Page 318 specifies the following bit pattern for
> + * "load/store register (immediate post-indexed)".
> + *
> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> + * ___________________________________________________________________
> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> + * |____|______|__|____|____|__|_______________|____|_________|_______|
> + */
> +union ldr_str_instr_class {
> +    uint32_t value;
> +    struct ldr_str {
> +        unsigned int rt:5;     /* Rt register */
> +        unsigned int rn:5;     /* Rn register */
> +        unsigned int fixed1:2; /* value == 01b */
> +        signed int imm9:9;            /* imm9 */
> +        unsigned int fixed2:1; /* value == 0b */
> +        unsigned int opc:2;    /* opc */
> +        unsigned int fixed3:2; /* value == 00b */
> +        unsigned int v:1;      /* vector */
> +        unsigned int fixed4:3; /* value == 111b */
> +        unsigned int size:2;   /* size */
> +    } code;
> +};
> +
>  /**
>   * Decode an instruction from pc
>   * /!\ This function is not intended to fully decode an instruction. It
> @@ -35,7 +61,8 @@
>   */
>  
>  int decode_instruction(const struct cpu_user_regs *regs,
> -                       struct hsr_dabt *dabt);
> +                       struct hsr_dabt *dabt,
> +                       union ldr_str_instr_class *instr);
>  
>  #endif /* __ARCH_ARM_DECODE_H_ */
>  
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index 729287e37c..acb483f235 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -65,6 +65,39 @@ static enum io_state handle_write(const struct mmio_handler *handler,
>      return ret ? IO_HANDLED : IO_ABORT;
>  }
>  
> +static void post_increment_register(union ldr_str_instr_class *instr)
> +{
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    unsigned int val;
> +
> +    /* handle when rn = SP */
> +    if ( instr->code.rn == 31 )
> +    {
> +        if ( (regs->cpsr & PSR_MODE_MASK) == PSR_MODE_EL1h )
> +        {
> +            val = regs->sp_el1;
> +        }
> +        else
> +        {
> +            BUG();
> +        }
> +    }
> +    else
> +    {
> +        val = get_user_reg(regs, instr->code.rn);
> +    }
> +    val += instr->code.imm9;
> +
> +    if ( instr->code.rn == 31 )
> +    {
> +        regs->sp_el1 = val;
> +    }
> +    else
> +    {
> +        set_user_reg(regs, instr->code.rn, val);
> +    }
> +}
> +
>  /* This function assumes that mmio regions are not overlapped */
>  static int cmp_mmio_handler(const void *key, const void *elem)
>  {
> @@ -106,14 +139,29 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>          .gpa = gpa,
>          .dabt = dabt
>      };
> +    int rc;
> +    union ldr_str_instr_class instr = {0};
>  
>      ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>  
> +    /*
> +     * Armv8 processor does not provide a valid syndrome for post-indexing
> +     * ldr/str instructions. So in order to process these instructions,
> +     * Xen must decode them.
> +     */
> +    if ( !info.dabt.valid )
> +    {
> +        rc = decode_instruction(regs, &info.dabt, &instr);
> +        if ( rc )
> +        {
> +            gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> +            return IO_ABORT;
> +        }
> +    }
> +
>      handler = find_mmio_handler(v->domain, info.gpa);
>      if ( !handler )
>      {
> -        int rc;
> -
>          rc = try_fwd_ioserv(regs, v, &info);
>          if ( rc == IO_HANDLED )
>              return handle_ioserv(regs, v);
> @@ -122,7 +170,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>      }
>  
>      /* All the instructions used on emulated MMIO region should be valid */
> -    if ( !dabt.valid )
> +    if ( !info.dabt.valid )
>          return IO_ABORT;
>  
>      /*
> @@ -134,7 +182,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>      {
>          int rc;
>  
> -        rc = decode_instruction(regs, &info.dabt);
> +        rc = decode_instruction(regs, &info.dabt, NULL);
>          if ( rc )
>          {
>              gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> @@ -143,9 +191,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>      }
>  
>      if ( info.dabt.write )
> -        return handle_write(handler, v, &info);
> +        rc = handle_write(handler, v, &info);
>      else
> -        return handle_read(handler, v, &info);
> +        rc = handle_read(handler, v, &info);
> +
> +    if ( instr.value != 0 )
> +    {
> +        post_increment_register(&instr);
> +    }
> +    return rc;
>  }
>  
>  void register_mmio_handler(struct domain *d,


Re: [XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 2 months ago
Hi Andre,

Many thanks for your feedback. I have one clarification :-

On 22/01/2022 01:30, Andre Przywara wrote:
> On Thu, 20 Jan 2022 21:55:27 +0000
> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>
> Hi,
>
>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
>> could use to access MMIO regions.
>>
>> For instance, a baremetal OS can use any of the following instructions, where
>> x1 contains the address of the MMIO region:
>>
>> 1.      ldr     x2,    [x1],    #4
> That looks dodgy, since is misaligns the pointer afterwards. MMIO
> access typically go to device memory, which must be naturally aligned.
> Just don't give a bad example here and change that to a multiple of 8.
>
>> 2.      ldr     w2,    [x1],    #-4
> (this one is fine, btw, because it's a 32-bit read)
>
>> 3.      ldr     x2,    [x1],    #-8
>> 4.      ldr     w2,    [x1],    #4
>> 5.      ldrh    w2,    [x1],    #8
>> 6.      ldrb    w2,    [x1],    #16
> More naturally I'd use the data size of the postindex value ...
> ldr  x2 ... #-8
> ldr  w2 ... #4
> ldrh w2 ... #2
> ldrb w2 ... #1
>
>> 7.      str     x2,    [x1],    #4
> This is a again problematic, because x1 is not 8-byte aligned anymore
> after that.
>
>> 8.      str     w2,    [x1],    #-4
>> 9.      strh    w2,    [x1],    #8
>> 10.     strb    w2,    [x1],    #16
>>
>> In the following two instructions, sp contains the address of the MMIO region:-
> Really? I don't think you should give people funny ideas, just mention
> that the Rn register could theoretically be the stack pointer.
>
>> 11.     ldrb    w2,    [sp],    #16
>> 12.     ldrb    wzr,   [sp],    #16
>>
>> In order to handle post-indexing store/load instructions (like those mentioned
>> above), Xen will need to fetch and decode the instruction.
>>
>> This patch only cover post-index store/load instructions from AArch64 mode.
>> For now, this is left unimplemented for trap from AArch32 mode.
>>
>> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
>> ---
>>
>> Changelog :-
>> v2 - 1. Updated the rn register after reading from it. (Pointed by Julien,
>>          Stefano)
>>       2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
>>       3. Fixed coding style issues (Pointed by Julien)
>>       4. In the previous patch, I was updating dabt->sign based on the signedness
>>          of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b,
>>          Page 3221, SSE indicates the signedness of the data item loaded. In our
>>          case, the data item loaded is always unsigned.
>>
>> v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit variants).
>>         Thus, I have removed the check for "instr->code.opc == 0" (Suggested by
>>         Andre)
>>      2. Handled the scenario when rn = SP, rt = XZR (Suggested by Jan, Andre)
>>      3. Added restriction for "rt != rn" (Suggested by Andre)
>>      4. Moved union ldr_str_instr_class {} to decode.h. This is the header included
>>         by io.c and decode.c (where the union is referred). (Suggested by Jan)
>>      5. Indentation and typo fixes (Suggested by Jan)
>>
>> Changes suggested but could not be considered due to reasons :-
>>      1. Using accessor macros instead of bitfields for "ldr_str_instr_class". (Andre)
>>         Reason - I could not find a simple way to represent 9 bit signed integer
>>         (ie imm9) without using bitfields. If I use accessor macros, then I need
>>         to manually calculate two's complement to obtain the value when signed
>>         bit is present.
>>
>>      2. I/D cache cohenerncy (Andre)
>>         Reason :- I could not see any instruction to flush the I cache.
> First, please try to avoid the term "flush", because it is somewhat
> overloaded. The architecture speaks of "clean" and "invalidate", which
> are more precise.
> Assuming you mean "clean" here: conceptually there is no such thing for
> the I cache, because it's always clean. The I$ will only be read from
> the CPU side - from the instruction fetcher - there is nothing written
> back through it. Every store goes through the data path - always.
> That is the problem that I tried to sketch you previously: you don't
> have a guarantee that the instruction you read from memory is the same
> that the CPU executed. The guest could have changed the instruction
> after the I$ fetched that. So the CPU will execute (and trap) on
> instruction X, but you will read Y. I leave it up to your imagination
> if that could be exploited.

I see what you mean.

Refer Armv8 Arm DDI 0487G.b Page D1-2476, it says that (for instr/data 
abort) the faulting virtual address and IPA is saved in FAR_ELx and 
HPFAR_EL2 respectively. But, I do not see if the faulting instruction is 
saved in any special register. Is there something I am missing ?

Else, :( this is a limitation of the architecture (imo). A hypervisor 
can be interested to see which instruction caused the abort when ISV = 0.

Also, if an instruction is being modified by the guest (after it has 
been loaded in the I cache), and if the guest does not invalidate the I 
cache + ISB, then this is a malicious behavior by the guest. Is my 
understanding correct ?

- Ayan

>
>>         Refer https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/IC--Instruction-Cache-operation--an-alias-of-SYS-?lang=en#sa_ic_op
>>         So, this patch assumes that the I/D caches are coherent.
> Bold. ;-)
>
> Cheers,
> Andre
>
>>   xen/arch/arm/decode.c | 78 ++++++++++++++++++++++++++++++++++++++++++-
>>   xen/arch/arm/decode.h | 29 +++++++++++++++-
>>   xen/arch/arm/io.c     | 66 ++++++++++++++++++++++++++++++++----
>>   3 files changed, 165 insertions(+), 8 deletions(-)
>>
>> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
>> index 792c2e92a7..f1c59ddd1a 100644
>> --- a/xen/arch/arm/decode.c
>> +++ b/xen/arch/arm/decode.c
>> @@ -84,6 +84,76 @@ bad_thumb2:
>>       return 1;
>>   }
>>   
>> +static int decode_loadstore_postindexing(register_t pc,
>> +                                         struct hsr_dabt *dabt,
>> +                                         union ldr_str_instr_class *instr)
>> +{
>> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
>> +        return -EFAULT;
>> +
>> +    /*
>> +     * Rn -ne Rt for ldr/str instruction.
>> +     * Check https://developer.arm.com/documentation/dui0802/a/CIHGJHED
>> +     * (Register restrictions)
>> +     *
>> +     * The only exception for this is when rn = 31. It denotes SP ("Use of SP")
>> +     *
>> +     * And when rt = 31, it denotes wzr/xzr. (Refer
>> +     * https://developer.arm.com/documentation/den0024/a/ARMv8-Registers/AArch64-special-registers
>> +     * "There is no register called X31 or W31. Many instructions are encoded
>> +     * such that the number 31 represents the zero register, ZR (WZR/XZR)."
>> +     */
>> +    if ( (instr->code.rn == instr->code.rt) && (instr->code.rn != 31) )
>> +        return -EINVAL;
>> +
>> +    /* First, let's check for the fixed values */
>> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
>> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
>> +    {
>> +        gprintk(XENLOG_ERR, "Cannot decode instruction 0x%x",instr->value);
>> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
>> +            " ldr/str post indexing\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    if ( instr->code.v != 0 )
>> +    {
>> +        gprintk(XENLOG_ERR,
>> +            "ldr/str post indexing for vector types are not supported\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    /* Check for STR (immediate) - 32 bit variant */
>> +    if ( instr->code.opc == 0 )
>> +    {
>> +        dabt->write = 1;
>> +    }
>> +    /* Check for LDR (immediate) - 32 bit variant */
>> +    else if ( instr->code.opc == 1 )
>> +    {
>> +        dabt->write = 0;
>> +    }
>> +    else
>> +    {
>> +        gprintk(XENLOG_ERR,
>> +            "Decoding ldr/str post indexing is not supported for this variant\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    gprintk(XENLOG_INFO,
>> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
>> +        instr->code.rt, instr->code.size, instr->code.imm9);
>> +
>> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
>> +    dabt->valid = 1;
>> +
>> +    return 0;
>> +
>> + bad_32bit_loadstore:
>> +    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
>> +    return 1;
>> +}
>> +
>>   static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
>>   {
>>       uint16_t instr;
>> @@ -150,11 +220,17 @@ bad_thumb:
>>       return 1;
>>   }
>>   
>> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
>> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
>> +                       union ldr_str_instr_class *instr)
>>   {
>>       if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
>>           return decode_thumb(regs->pc, dabt);
>>   
>> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
>> +    {
>> +        return decode_loadstore_postindexing(regs->pc, dabt, instr);
>> +    }
>> +
>>       /* TODO: Handle ARM instruction */
>>       gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
>>   
>> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
>> index 4613763bdb..5c918c9bed 100644
>> --- a/xen/arch/arm/decode.h
>> +++ b/xen/arch/arm/decode.h
>> @@ -23,6 +23,32 @@
>>   #include <asm/regs.h>
>>   #include <asm/processor.h>
>>   
>> +/*
>> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
>> + * Page 318 specifies the following bit pattern for
>> + * "load/store register (immediate post-indexed)".
>> + *
>> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
>> + * ___________________________________________________________________
>> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
>> + * |____|______|__|____|____|__|_______________|____|_________|_______|
>> + */
>> +union ldr_str_instr_class {
>> +    uint32_t value;
>> +    struct ldr_str {
>> +        unsigned int rt:5;     /* Rt register */
>> +        unsigned int rn:5;     /* Rn register */
>> +        unsigned int fixed1:2; /* value == 01b */
>> +        signed int imm9:9;            /* imm9 */
>> +        unsigned int fixed2:1; /* value == 0b */
>> +        unsigned int opc:2;    /* opc */
>> +        unsigned int fixed3:2; /* value == 00b */
>> +        unsigned int v:1;      /* vector */
>> +        unsigned int fixed4:3; /* value == 111b */
>> +        unsigned int size:2;   /* size */
>> +    } code;
>> +};
>> +
>>   /**
>>    * Decode an instruction from pc
>>    * /!\ This function is not intended to fully decode an instruction. It
>> @@ -35,7 +61,8 @@
>>    */
>>   
>>   int decode_instruction(const struct cpu_user_regs *regs,
>> -                       struct hsr_dabt *dabt);
>> +                       struct hsr_dabt *dabt,
>> +                       union ldr_str_instr_class *instr);
>>   
>>   #endif /* __ARCH_ARM_DECODE_H_ */
>>   
>> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
>> index 729287e37c..acb483f235 100644
>> --- a/xen/arch/arm/io.c
>> +++ b/xen/arch/arm/io.c
>> @@ -65,6 +65,39 @@ static enum io_state handle_write(const struct mmio_handler *handler,
>>       return ret ? IO_HANDLED : IO_ABORT;
>>   }
>>   
>> +static void post_increment_register(union ldr_str_instr_class *instr)
>> +{
>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +    unsigned int val;
>> +
>> +    /* handle when rn = SP */
>> +    if ( instr->code.rn == 31 )
>> +    {
>> +        if ( (regs->cpsr & PSR_MODE_MASK) == PSR_MODE_EL1h )
>> +        {
>> +            val = regs->sp_el1;
>> +        }
>> +        else
>> +        {
>> +            BUG();
>> +        }
>> +    }
>> +    else
>> +    {
>> +        val = get_user_reg(regs, instr->code.rn);
>> +    }
>> +    val += instr->code.imm9;
>> +
>> +    if ( instr->code.rn == 31 )
>> +    {
>> +        regs->sp_el1 = val;
>> +    }
>> +    else
>> +    {
>> +        set_user_reg(regs, instr->code.rn, val);
>> +    }
>> +}
>> +
>>   /* This function assumes that mmio regions are not overlapped */
>>   static int cmp_mmio_handler(const void *key, const void *elem)
>>   {
>> @@ -106,14 +139,29 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>           .gpa = gpa,
>>           .dabt = dabt
>>       };
>> +    int rc;
>> +    union ldr_str_instr_class instr = {0};
>>   
>>       ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>>   
>> +    /*
>> +     * Armv8 processor does not provide a valid syndrome for post-indexing
>> +     * ldr/str instructions. So in order to process these instructions,
>> +     * Xen must decode them.
>> +     */
>> +    if ( !info.dabt.valid )
>> +    {
>> +        rc = decode_instruction(regs, &info.dabt, &instr);
>> +        if ( rc )
>> +        {
>> +            gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
>> +            return IO_ABORT;
>> +        }
>> +    }
>> +
>>       handler = find_mmio_handler(v->domain, info.gpa);
>>       if ( !handler )
>>       {
>> -        int rc;
>> -
>>           rc = try_fwd_ioserv(regs, v, &info);
>>           if ( rc == IO_HANDLED )
>>               return handle_ioserv(regs, v);
>> @@ -122,7 +170,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>       }
>>   
>>       /* All the instructions used on emulated MMIO region should be valid */
>> -    if ( !dabt.valid )
>> +    if ( !info.dabt.valid )
>>           return IO_ABORT;
>>   
>>       /*
>> @@ -134,7 +182,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>       {
>>           int rc;
>>   
>> -        rc = decode_instruction(regs, &info.dabt);
>> +        rc = decode_instruction(regs, &info.dabt, NULL);
>>           if ( rc )
>>           {
>>               gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
>> @@ -143,9 +191,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>       }
>>   
>>       if ( info.dabt.write )
>> -        return handle_write(handler, v, &info);
>> +        rc = handle_write(handler, v, &info);
>>       else
>> -        return handle_read(handler, v, &info);
>> +        rc = handle_read(handler, v, &info);
>> +
>> +    if ( instr.value != 0 )
>> +    {
>> +        post_increment_register(&instr);
>> +    }
>> +    return rc;
>>   }
>>   
>>   void register_mmio_handler(struct domain *d,

Re: [XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Andre Przywara 2 years, 2 months ago
On Mon, 24 Jan 2022 12:07:42 +0000
Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:

Hi Ayan,

> Many thanks for your feedback. I have one clarification :-
> 
> On 22/01/2022 01:30, Andre Przywara wrote:
> > On Thu, 20 Jan 2022 21:55:27 +0000
> > Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> >
> > Hi,
> >  
> >> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> >> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> >> could use to access MMIO regions.
> >>
> >> For instance, a baremetal OS can use any of the following instructions, where
> >> x1 contains the address of the MMIO region:
> >>
> >> 1.      ldr     x2,    [x1],    #4  
> > That looks dodgy, since is misaligns the pointer afterwards. MMIO
> > access typically go to device memory, which must be naturally aligned.
> > Just don't give a bad example here and change that to a multiple of 8.
> >  
> >> 2.      ldr     w2,    [x1],    #-4  
> > (this one is fine, btw, because it's a 32-bit read)
> >  
> >> 3.      ldr     x2,    [x1],    #-8
> >> 4.      ldr     w2,    [x1],    #4
> >> 5.      ldrh    w2,    [x1],    #8
> >> 6.      ldrb    w2,    [x1],    #16  
> > More naturally I'd use the data size of the postindex value ...
> > ldr  x2 ... #-8
> > ldr  w2 ... #4
> > ldrh w2 ... #2
> > ldrb w2 ... #1
> >  
> >> 7.      str     x2,    [x1],    #4  
> > This is a again problematic, because x1 is not 8-byte aligned anymore
> > after that.
> >  
> >> 8.      str     w2,    [x1],    #-4
> >> 9.      strh    w2,    [x1],    #8
> >> 10.     strb    w2,    [x1],    #16
> >>
> >> In the following two instructions, sp contains the address of the MMIO region:-  
> > Really? I don't think you should give people funny ideas, just mention
> > that the Rn register could theoretically be the stack pointer.
> >  
> >> 11.     ldrb    w2,    [sp],    #16
> >> 12.     ldrb    wzr,   [sp],    #16
> >>
> >> In order to handle post-indexing store/load instructions (like those mentioned
> >> above), Xen will need to fetch and decode the instruction.
> >>
> >> This patch only cover post-index store/load instructions from AArch64 mode.
> >> For now, this is left unimplemented for trap from AArch32 mode.
> >>
> >> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
> >> ---
> >>
> >> Changelog :-
> >> v2 - 1. Updated the rn register after reading from it. (Pointed by Julien,
> >>          Stefano)
> >>       2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
> >>       3. Fixed coding style issues (Pointed by Julien)
> >>       4. In the previous patch, I was updating dabt->sign based on the signedness
> >>          of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b,
> >>          Page 3221, SSE indicates the signedness of the data item loaded. In our
> >>          case, the data item loaded is always unsigned.
> >>
> >> v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit variants).
> >>         Thus, I have removed the check for "instr->code.opc == 0" (Suggested by
> >>         Andre)
> >>      2. Handled the scenario when rn = SP, rt = XZR (Suggested by Jan, Andre)
> >>      3. Added restriction for "rt != rn" (Suggested by Andre)
> >>      4. Moved union ldr_str_instr_class {} to decode.h. This is the header included
> >>         by io.c and decode.c (where the union is referred). (Suggested by Jan)
> >>      5. Indentation and typo fixes (Suggested by Jan)
> >>
> >> Changes suggested but could not be considered due to reasons :-
> >>      1. Using accessor macros instead of bitfields for "ldr_str_instr_class". (Andre)
> >>         Reason - I could not find a simple way to represent 9 bit signed integer
> >>         (ie imm9) without using bitfields. If I use accessor macros, then I need
> >>         to manually calculate two's complement to obtain the value when signed
> >>         bit is present.
> >>
> >>      2. I/D cache cohenerncy (Andre)
> >>         Reason :- I could not see any instruction to flush the I cache.  
> > First, please try to avoid the term "flush", because it is somewhat
> > overloaded. The architecture speaks of "clean" and "invalidate", which
> > are more precise.
> > Assuming you mean "clean" here: conceptually there is no such thing for
> > the I cache, because it's always clean. The I$ will only be read from
> > the CPU side - from the instruction fetcher - there is nothing written
> > back through it. Every store goes through the data path - always.
> > That is the problem that I tried to sketch you previously: you don't
> > have a guarantee that the instruction you read from memory is the same
> > that the CPU executed. The guest could have changed the instruction
> > after the I$ fetched that. So the CPU will execute (and trap) on
> > instruction X, but you will read Y. I leave it up to your imagination
> > if that could be exploited.  
> 
> I see what you mean.
> 
> Refer Armv8 Arm DDI 0487G.b Page D1-2476, it says that (for instr/data 
> abort) the faulting virtual address and IPA is saved in FAR_ELx and 
> HPFAR_EL2 respectively. But, I do not see if the faulting instruction is 
> saved in any special register. Is there something I am missing ?

No, indeed there is no such thing. You get the address, but not the
faulting instruction. It would indeed be nice to have from a software
developer's point of view, but the architecture does not support it.
One reason might be that it's potentially hard to implement, because it
could be tricky to reconstruct the original instruction, when it has been
broken down to something different in the actual pipelines.

> Else, :( this is a limitation of the architecture (imo). A hypervisor 
> can be interested to see which instruction caused the abort when ISV = 0.

One of the reasons I suggested to just avoid those instructions for MMIO
in the first place, especially if their usage was somewhat questionable to
begin with.

> Also, if an instruction is being modified by the guest (after it has 
> been loaded in the I cache), and if the guest does not invalidate the I 
> cache + ISB, then this is a malicious behavior by the guest. Is my 
> understanding correct ?

I wouldn't say malicious per se, there might be legitimate reasons to do
so, but in the Xen context this is mostly irrelevant, since we don't trust
the guest anyway. So whether it's malicious or accidental, the hypervisor
might be mislead.

Cheers,
Andre

> >  
> >>         Refer https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/IC--Instruction-Cache-operation--an-alias-of-SYS-?lang=en#sa_ic_op
> >>         So, this patch assumes that the I/D caches are coherent.  
> > Bold. ;-)
> >
> > Cheers,
> > Andre
> >  
> >>   xen/arch/arm/decode.c | 78 ++++++++++++++++++++++++++++++++++++++++++-
> >>   xen/arch/arm/decode.h | 29 +++++++++++++++-
> >>   xen/arch/arm/io.c     | 66 ++++++++++++++++++++++++++++++++----
> >>   3 files changed, 165 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
> >> index 792c2e92a7..f1c59ddd1a 100644
> >> --- a/xen/arch/arm/decode.c
> >> +++ b/xen/arch/arm/decode.c
> >> @@ -84,6 +84,76 @@ bad_thumb2:
> >>       return 1;
> >>   }
> >>   
> >> +static int decode_loadstore_postindexing(register_t pc,
> >> +                                         struct hsr_dabt *dabt,
> >> +                                         union ldr_str_instr_class *instr)
> >> +{
> >> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
> >> +        return -EFAULT;
> >> +
> >> +    /*
> >> +     * Rn -ne Rt for ldr/str instruction.
> >> +     * Check https://developer.arm.com/documentation/dui0802/a/CIHGJHED
> >> +     * (Register restrictions)
> >> +     *
> >> +     * The only exception for this is when rn = 31. It denotes SP ("Use of SP")
> >> +     *
> >> +     * And when rt = 31, it denotes wzr/xzr. (Refer
> >> +     * https://developer.arm.com/documentation/den0024/a/ARMv8-Registers/AArch64-special-registers
> >> +     * "There is no register called X31 or W31. Many instructions are encoded
> >> +     * such that the number 31 represents the zero register, ZR (WZR/XZR)."
> >> +     */
> >> +    if ( (instr->code.rn == instr->code.rt) && (instr->code.rn != 31) )
> >> +        return -EINVAL;
> >> +
> >> +    /* First, let's check for the fixed values */
> >> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
> >> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
> >> +    {
> >> +        gprintk(XENLOG_ERR, "Cannot decode instruction 0x%x",instr->value);
> >> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
> >> +            " ldr/str post indexing\n");
> >> +        goto bad_32bit_loadstore;
> >> +    }
> >> +
> >> +    if ( instr->code.v != 0 )
> >> +    {
> >> +        gprintk(XENLOG_ERR,
> >> +            "ldr/str post indexing for vector types are not supported\n");
> >> +        goto bad_32bit_loadstore;
> >> +    }
> >> +
> >> +    /* Check for STR (immediate) - 32 bit variant */
> >> +    if ( instr->code.opc == 0 )
> >> +    {
> >> +        dabt->write = 1;
> >> +    }
> >> +    /* Check for LDR (immediate) - 32 bit variant */
> >> +    else if ( instr->code.opc == 1 )
> >> +    {
> >> +        dabt->write = 0;
> >> +    }
> >> +    else
> >> +    {
> >> +        gprintk(XENLOG_ERR,
> >> +            "Decoding ldr/str post indexing is not supported for this variant\n");
> >> +        goto bad_32bit_loadstore;
> >> +    }
> >> +
> >> +    gprintk(XENLOG_INFO,
> >> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
> >> +        instr->code.rt, instr->code.size, instr->code.imm9);
> >> +
> >> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
> >> +    dabt->valid = 1;
> >> +
> >> +    return 0;
> >> +
> >> + bad_32bit_loadstore:
> >> +    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
> >> +    return 1;
> >> +}
> >> +
> >>   static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
> >>   {
> >>       uint16_t instr;
> >> @@ -150,11 +220,17 @@ bad_thumb:
> >>       return 1;
> >>   }
> >>   
> >> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
> >> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
> >> +                       union ldr_str_instr_class *instr)
> >>   {
> >>       if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
> >>           return decode_thumb(regs->pc, dabt);
> >>   
> >> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
> >> +    {
> >> +        return decode_loadstore_postindexing(regs->pc, dabt, instr);
> >> +    }
> >> +
> >>       /* TODO: Handle ARM instruction */
> >>       gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
> >>   
> >> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
> >> index 4613763bdb..5c918c9bed 100644
> >> --- a/xen/arch/arm/decode.h
> >> +++ b/xen/arch/arm/decode.h
> >> @@ -23,6 +23,32 @@
> >>   #include <asm/regs.h>
> >>   #include <asm/processor.h>
> >>   
> >> +/*
> >> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
> >> + * Page 318 specifies the following bit pattern for
> >> + * "load/store register (immediate post-indexed)".
> >> + *
> >> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> >> + * ___________________________________________________________________
> >> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> >> + * |____|______|__|____|____|__|_______________|____|_________|_______|
> >> + */
> >> +union ldr_str_instr_class {
> >> +    uint32_t value;
> >> +    struct ldr_str {
> >> +        unsigned int rt:5;     /* Rt register */
> >> +        unsigned int rn:5;     /* Rn register */
> >> +        unsigned int fixed1:2; /* value == 01b */
> >> +        signed int imm9:9;            /* imm9 */
> >> +        unsigned int fixed2:1; /* value == 0b */
> >> +        unsigned int opc:2;    /* opc */
> >> +        unsigned int fixed3:2; /* value == 00b */
> >> +        unsigned int v:1;      /* vector */
> >> +        unsigned int fixed4:3; /* value == 111b */
> >> +        unsigned int size:2;   /* size */
> >> +    } code;
> >> +};
> >> +
> >>   /**
> >>    * Decode an instruction from pc
> >>    * /!\ This function is not intended to fully decode an instruction. It
> >> @@ -35,7 +61,8 @@
> >>    */
> >>   
> >>   int decode_instruction(const struct cpu_user_regs *regs,
> >> -                       struct hsr_dabt *dabt);
> >> +                       struct hsr_dabt *dabt,
> >> +                       union ldr_str_instr_class *instr);
> >>   
> >>   #endif /* __ARCH_ARM_DECODE_H_ */
> >>   
> >> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> >> index 729287e37c..acb483f235 100644
> >> --- a/xen/arch/arm/io.c
> >> +++ b/xen/arch/arm/io.c
> >> @@ -65,6 +65,39 @@ static enum io_state handle_write(const struct mmio_handler *handler,
> >>       return ret ? IO_HANDLED : IO_ABORT;
> >>   }
> >>   
> >> +static void post_increment_register(union ldr_str_instr_class *instr)
> >> +{
> >> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> >> +    unsigned int val;
> >> +
> >> +    /* handle when rn = SP */
> >> +    if ( instr->code.rn == 31 )
> >> +    {
> >> +        if ( (regs->cpsr & PSR_MODE_MASK) == PSR_MODE_EL1h )
> >> +        {
> >> +            val = regs->sp_el1;
> >> +        }
> >> +        else
> >> +        {
> >> +            BUG();
> >> +        }
> >> +    }
> >> +    else
> >> +    {
> >> +        val = get_user_reg(regs, instr->code.rn);
> >> +    }
> >> +    val += instr->code.imm9;
> >> +
> >> +    if ( instr->code.rn == 31 )
> >> +    {
> >> +        regs->sp_el1 = val;
> >> +    }
> >> +    else
> >> +    {
> >> +        set_user_reg(regs, instr->code.rn, val);
> >> +    }
> >> +}
> >> +
> >>   /* This function assumes that mmio regions are not overlapped */
> >>   static int cmp_mmio_handler(const void *key, const void *elem)
> >>   {
> >> @@ -106,14 +139,29 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
> >>           .gpa = gpa,
> >>           .dabt = dabt
> >>       };
> >> +    int rc;
> >> +    union ldr_str_instr_class instr = {0};
> >>   
> >>       ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
> >>   
> >> +    /*
> >> +     * Armv8 processor does not provide a valid syndrome for post-indexing
> >> +     * ldr/str instructions. So in order to process these instructions,
> >> +     * Xen must decode them.
> >> +     */
> >> +    if ( !info.dabt.valid )
> >> +    {
> >> +        rc = decode_instruction(regs, &info.dabt, &instr);
> >> +        if ( rc )
> >> +        {
> >> +            gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> >> +            return IO_ABORT;
> >> +        }
> >> +    }
> >> +
> >>       handler = find_mmio_handler(v->domain, info.gpa);
> >>       if ( !handler )
> >>       {
> >> -        int rc;
> >> -
> >>           rc = try_fwd_ioserv(regs, v, &info);
> >>           if ( rc == IO_HANDLED )
> >>               return handle_ioserv(regs, v);
> >> @@ -122,7 +170,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
> >>       }
> >>   
> >>       /* All the instructions used on emulated MMIO region should be valid */
> >> -    if ( !dabt.valid )
> >> +    if ( !info.dabt.valid )
> >>           return IO_ABORT;
> >>   
> >>       /*
> >> @@ -134,7 +182,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
> >>       {
> >>           int rc;
> >>   
> >> -        rc = decode_instruction(regs, &info.dabt);
> >> +        rc = decode_instruction(regs, &info.dabt, NULL);
> >>           if ( rc )
> >>           {
> >>               gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> >> @@ -143,9 +191,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
> >>       }
> >>   
> >>       if ( info.dabt.write )
> >> -        return handle_write(handler, v, &info);
> >> +        rc = handle_write(handler, v, &info);
> >>       else
> >> -        return handle_read(handler, v, &info);
> >> +        rc = handle_read(handler, v, &info);
> >> +
> >> +    if ( instr.value != 0 )
> >> +    {
> >> +        post_increment_register(&instr);
> >> +    }
> >> +    return rc;
> >>   }
> >>   
> >>   void register_mmio_handler(struct domain *d,  


Re: [XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 2 months ago
Hi Andre,

Thanks forn your comments.

On 24/01/2022 14:36, Andre Przywara wrote:
> On Mon, 24 Jan 2022 12:07:42 +0000
> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>
> Hi Ayan,
>
>> Many thanks for your feedback. I have one clarification :-
>>
>> On 22/01/2022 01:30, Andre Przywara wrote:
>>> On Thu, 20 Jan 2022 21:55:27 +0000
>>> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>>>
>>> Hi,
>>>   
>>>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
>>>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
>>>> could use to access MMIO regions.
>>>>
>>>> For instance, a baremetal OS can use any of the following instructions, where
>>>> x1 contains the address of the MMIO region:
>>>>
>>>> 1.      ldr     x2,    [x1],    #4
>>> That looks dodgy, since is misaligns the pointer afterwards. MMIO
>>> access typically go to device memory, which must be naturally aligned.
>>> Just don't give a bad example here and change that to a multiple of 8.
>>>   
>>>> 2.      ldr     w2,    [x1],    #-4
>>> (this one is fine, btw, because it's a 32-bit read)
>>>   
>>>> 3.      ldr     x2,    [x1],    #-8
>>>> 4.      ldr     w2,    [x1],    #4
>>>> 5.      ldrh    w2,    [x1],    #8
>>>> 6.      ldrb    w2,    [x1],    #16
>>> More naturally I'd use the data size of the postindex value ...
>>> ldr  x2 ... #-8
>>> ldr  w2 ... #4
>>> ldrh w2 ... #2
>>> ldrb w2 ... #1
>>>   
>>>> 7.      str     x2,    [x1],    #4
>>> This is a again problematic, because x1 is not 8-byte aligned anymore
>>> after that.
>>>   
>>>> 8.      str     w2,    [x1],    #-4
>>>> 9.      strh    w2,    [x1],    #8
>>>> 10.     strb    w2,    [x1],    #16
>>>>
>>>> In the following two instructions, sp contains the address of the MMIO region:-
>>> Really? I don't think you should give people funny ideas, just mention
>>> that the Rn register could theoretically be the stack pointer.
>>>   
>>>> 11.     ldrb    w2,    [sp],    #16
>>>> 12.     ldrb    wzr,   [sp],    #16
>>>>
>>>> In order to handle post-indexing store/load instructions (like those mentioned
>>>> above), Xen will need to fetch and decode the instruction.
>>>>
>>>> This patch only cover post-index store/load instructions from AArch64 mode.
>>>> For now, this is left unimplemented for trap from AArch32 mode.
>>>>
>>>> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
>>>> ---
>>>>
>>>> Changelog :-
>>>> v2 - 1. Updated the rn register after reading from it. (Pointed by Julien,
>>>>           Stefano)
>>>>        2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
>>>>        3. Fixed coding style issues (Pointed by Julien)
>>>>        4. In the previous patch, I was updating dabt->sign based on the signedness
>>>>           of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b,
>>>>           Page 3221, SSE indicates the signedness of the data item loaded. In our
>>>>           case, the data item loaded is always unsigned.
>>>>
>>>> v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit variants).
>>>>          Thus, I have removed the check for "instr->code.opc == 0" (Suggested by
>>>>          Andre)
>>>>       2. Handled the scenario when rn = SP, rt = XZR (Suggested by Jan, Andre)
>>>>       3. Added restriction for "rt != rn" (Suggested by Andre)
>>>>       4. Moved union ldr_str_instr_class {} to decode.h. This is the header included
>>>>          by io.c and decode.c (where the union is referred). (Suggested by Jan)
>>>>       5. Indentation and typo fixes (Suggested by Jan)
>>>>
>>>> Changes suggested but could not be considered due to reasons :-
>>>>       1. Using accessor macros instead of bitfields for "ldr_str_instr_class". (Andre)
>>>>          Reason - I could not find a simple way to represent 9 bit signed integer
>>>>          (ie imm9) without using bitfields. If I use accessor macros, then I need
>>>>          to manually calculate two's complement to obtain the value when signed
>>>>          bit is present.
>>>>
>>>>       2. I/D cache cohenerncy (Andre)
>>>>          Reason :- I could not see any instruction to flush the I cache.
>>> First, please try to avoid the term "flush", because it is somewhat
>>> overloaded. The architecture speaks of "clean" and "invalidate", which
>>> are more precise.
>>> Assuming you mean "clean" here: conceptually there is no such thing for
>>> the I cache, because it's always clean. The I$ will only be read from
>>> the CPU side - from the instruction fetcher - there is nothing written
>>> back through it. Every store goes through the data path - always.
>>> That is the problem that I tried to sketch you previously: you don't
>>> have a guarantee that the instruction you read from memory is the same
>>> that the CPU executed. The guest could have changed the instruction
>>> after the I$ fetched that. So the CPU will execute (and trap) on
>>> instruction X, but you will read Y. I leave it up to your imagination
>>> if that could be exploited.
>> I see what you mean.
>>
>> Refer Armv8 Arm DDI 0487G.b Page D1-2476, it says that (for instr/data
>> abort) the faulting virtual address and IPA is saved in FAR_ELx and
>> HPFAR_EL2 respectively. But, I do not see if the faulting instruction is
>> saved in any special register. Is there something I am missing ?
> No, indeed there is no such thing. You get the address, but not the
> faulting instruction. It would indeed be nice to have from a software
> developer's point of view, but the architecture does not support it.
> One reason might be that it's potentially hard to implement, because it
> could be tricky to reconstruct the original instruction, when it has been
> broken down to something different in the actual pipelines.

Is it possible for Arm to implement such a register which will hold the 
instruction that caused the exception, in the future architecture 
revision ? This would be useful given that the faulting address (PC) can 
not be trusted to get to the original faulting instruction (as it can be 
changed after being loaded in I cache). I could imagine this being 
useful in any scenario when the user wants to know which instruction 
caused the fault.

Stefano/Julien/Bertrand/Volodymyr :- I would love to hear your inputs on 
this as well.

As for the patch, I will mention this issue (as a comment in the code) 
where we are loading the instruction from PC. 
Stefano/Julien/Bertrand/Volodymyr:- Does it look fine with you ?

- Ayan

>
>> Else, :( this is a limitation of the architecture (imo). A hypervisor
>> can be interested to see which instruction caused the abort when ISV = 0.
> One of the reasons I suggested to just avoid those instructions for MMIO
> in the first place, especially if their usage was somewhat questionable to
> begin with.
>
>> Also, if an instruction is being modified by the guest (after it has
>> been loaded in the I cache), and if the guest does not invalidate the I
>> cache + ISB, then this is a malicious behavior by the guest. Is my
>> understanding correct ?
> I wouldn't say malicious per se, there might be legitimate reasons to do
> so, but in the Xen context this is mostly irrelevant, since we don't trust
> the guest anyway. So whether it's malicious or accidental, the hypervisor
> might be mislead.
>
> Cheers,
> Andre
>
>>>   
>>>>          Refer https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/IC--Instruction-Cache-operation--an-alias-of-SYS-?lang=en#sa_ic_op
>>>>          So, this patch assumes that the I/D caches are coherent.
>>> Bold. ;-)
>>>
>>> Cheers,
>>> Andre
>>>   
>>>>    xen/arch/arm/decode.c | 78 ++++++++++++++++++++++++++++++++++++++++++-
>>>>    xen/arch/arm/decode.h | 29 +++++++++++++++-
>>>>    xen/arch/arm/io.c     | 66 ++++++++++++++++++++++++++++++++----
>>>>    3 files changed, 165 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
>>>> index 792c2e92a7..f1c59ddd1a 100644
>>>> --- a/xen/arch/arm/decode.c
>>>> +++ b/xen/arch/arm/decode.c
>>>> @@ -84,6 +84,76 @@ bad_thumb2:
>>>>        return 1;
>>>>    }
>>>>    
>>>> +static int decode_loadstore_postindexing(register_t pc,
>>>> +                                         struct hsr_dabt *dabt,
>>>> +                                         union ldr_str_instr_class *instr)
>>>> +{
>>>> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
>>>> +        return -EFAULT;
>>>> +
>>>> +    /*
>>>> +     * Rn -ne Rt for ldr/str instruction.
>>>> +     * Check https://developer.arm.com/documentation/dui0802/a/CIHGJHED
>>>> +     * (Register restrictions)
>>>> +     *
>>>> +     * The only exception for this is when rn = 31. It denotes SP ("Use of SP")
>>>> +     *
>>>> +     * And when rt = 31, it denotes wzr/xzr. (Refer
>>>> +     * https://developer.arm.com/documentation/den0024/a/ARMv8-Registers/AArch64-special-registers
>>>> +     * "There is no register called X31 or W31. Many instructions are encoded
>>>> +     * such that the number 31 represents the zero register, ZR (WZR/XZR)."
>>>> +     */
>>>> +    if ( (instr->code.rn == instr->code.rt) && (instr->code.rn != 31) )
>>>> +        return -EINVAL;
>>>> +
>>>> +    /* First, let's check for the fixed values */
>>>> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
>>>> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
>>>> +    {
>>>> +        gprintk(XENLOG_ERR, "Cannot decode instruction 0x%x",instr->value);
>>>> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
>>>> +            " ldr/str post indexing\n");
>>>> +        goto bad_32bit_loadstore;
>>>> +    }
>>>> +
>>>> +    if ( instr->code.v != 0 )
>>>> +    {
>>>> +        gprintk(XENLOG_ERR,
>>>> +            "ldr/str post indexing for vector types are not supported\n");
>>>> +        goto bad_32bit_loadstore;
>>>> +    }
>>>> +
>>>> +    /* Check for STR (immediate) - 32 bit variant */
>>>> +    if ( instr->code.opc == 0 )
>>>> +    {
>>>> +        dabt->write = 1;
>>>> +    }
>>>> +    /* Check for LDR (immediate) - 32 bit variant */
>>>> +    else if ( instr->code.opc == 1 )
>>>> +    {
>>>> +        dabt->write = 0;
>>>> +    }
>>>> +    else
>>>> +    {
>>>> +        gprintk(XENLOG_ERR,
>>>> +            "Decoding ldr/str post indexing is not supported for this variant\n");
>>>> +        goto bad_32bit_loadstore;
>>>> +    }
>>>> +
>>>> +    gprintk(XENLOG_INFO,
>>>> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
>>>> +        instr->code.rt, instr->code.size, instr->code.imm9);
>>>> +
>>>> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
>>>> +    dabt->valid = 1;
>>>> +
>>>> +    return 0;
>>>> +
>>>> + bad_32bit_loadstore:
>>>> +    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
>>>> +    return 1;
>>>> +}
>>>> +
>>>>    static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
>>>>    {
>>>>        uint16_t instr;
>>>> @@ -150,11 +220,17 @@ bad_thumb:
>>>>        return 1;
>>>>    }
>>>>    
>>>> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
>>>> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
>>>> +                       union ldr_str_instr_class *instr)
>>>>    {
>>>>        if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
>>>>            return decode_thumb(regs->pc, dabt);
>>>>    
>>>> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
>>>> +    {
>>>> +        return decode_loadstore_postindexing(regs->pc, dabt, instr);
>>>> +    }
>>>> +
>>>>        /* TODO: Handle ARM instruction */
>>>>        gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
>>>>    
>>>> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
>>>> index 4613763bdb..5c918c9bed 100644
>>>> --- a/xen/arch/arm/decode.h
>>>> +++ b/xen/arch/arm/decode.h
>>>> @@ -23,6 +23,32 @@
>>>>    #include <asm/regs.h>
>>>>    #include <asm/processor.h>
>>>>    
>>>> +/*
>>>> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
>>>> + * Page 318 specifies the following bit pattern for
>>>> + * "load/store register (immediate post-indexed)".
>>>> + *
>>>> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
>>>> + * ___________________________________________________________________
>>>> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
>>>> + * |____|______|__|____|____|__|_______________|____|_________|_______|
>>>> + */
>>>> +union ldr_str_instr_class {
>>>> +    uint32_t value;
>>>> +    struct ldr_str {
>>>> +        unsigned int rt:5;     /* Rt register */
>>>> +        unsigned int rn:5;     /* Rn register */
>>>> +        unsigned int fixed1:2; /* value == 01b */
>>>> +        signed int imm9:9;            /* imm9 */
>>>> +        unsigned int fixed2:1; /* value == 0b */
>>>> +        unsigned int opc:2;    /* opc */
>>>> +        unsigned int fixed3:2; /* value == 00b */
>>>> +        unsigned int v:1;      /* vector */
>>>> +        unsigned int fixed4:3; /* value == 111b */
>>>> +        unsigned int size:2;   /* size */
>>>> +    } code;
>>>> +};
>>>> +
>>>>    /**
>>>>     * Decode an instruction from pc
>>>>     * /!\ This function is not intended to fully decode an instruction. It
>>>> @@ -35,7 +61,8 @@
>>>>     */
>>>>    
>>>>    int decode_instruction(const struct cpu_user_regs *regs,
>>>> -                       struct hsr_dabt *dabt);
>>>> +                       struct hsr_dabt *dabt,
>>>> +                       union ldr_str_instr_class *instr);
>>>>    
>>>>    #endif /* __ARCH_ARM_DECODE_H_ */
>>>>    
>>>> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
>>>> index 729287e37c..acb483f235 100644
>>>> --- a/xen/arch/arm/io.c
>>>> +++ b/xen/arch/arm/io.c
>>>> @@ -65,6 +65,39 @@ static enum io_state handle_write(const struct mmio_handler *handler,
>>>>        return ret ? IO_HANDLED : IO_ABORT;
>>>>    }
>>>>    
>>>> +static void post_increment_register(union ldr_str_instr_class *instr)
>>>> +{
>>>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>>>> +    unsigned int val;
>>>> +
>>>> +    /* handle when rn = SP */
>>>> +    if ( instr->code.rn == 31 )
>>>> +    {
>>>> +        if ( (regs->cpsr & PSR_MODE_MASK) == PSR_MODE_EL1h )
>>>> +        {
>>>> +            val = regs->sp_el1;
>>>> +        }
>>>> +        else
>>>> +        {
>>>> +            BUG();
>>>> +        }
>>>> +    }
>>>> +    else
>>>> +    {
>>>> +        val = get_user_reg(regs, instr->code.rn);
>>>> +    }
>>>> +    val += instr->code.imm9;
>>>> +
>>>> +    if ( instr->code.rn == 31 )
>>>> +    {
>>>> +        regs->sp_el1 = val;
>>>> +    }
>>>> +    else
>>>> +    {
>>>> +        set_user_reg(regs, instr->code.rn, val);
>>>> +    }
>>>> +}
>>>> +
>>>>    /* This function assumes that mmio regions are not overlapped */
>>>>    static int cmp_mmio_handler(const void *key, const void *elem)
>>>>    {
>>>> @@ -106,14 +139,29 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>>>            .gpa = gpa,
>>>>            .dabt = dabt
>>>>        };
>>>> +    int rc;
>>>> +    union ldr_str_instr_class instr = {0};
>>>>    
>>>>        ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>>>>    
>>>> +    /*
>>>> +     * Armv8 processor does not provide a valid syndrome for post-indexing
>>>> +     * ldr/str instructions. So in order to process these instructions,
>>>> +     * Xen must decode them.
>>>> +     */
>>>> +    if ( !info.dabt.valid )
>>>> +    {
>>>> +        rc = decode_instruction(regs, &info.dabt, &instr);
>>>> +        if ( rc )
>>>> +        {
>>>> +            gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
>>>> +            return IO_ABORT;
>>>> +        }
>>>> +    }
>>>> +
>>>>        handler = find_mmio_handler(v->domain, info.gpa);
>>>>        if ( !handler )
>>>>        {
>>>> -        int rc;
>>>> -
>>>>            rc = try_fwd_ioserv(regs, v, &info);
>>>>            if ( rc == IO_HANDLED )
>>>>                return handle_ioserv(regs, v);
>>>> @@ -122,7 +170,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>>>        }
>>>>    
>>>>        /* All the instructions used on emulated MMIO region should be valid */
>>>> -    if ( !dabt.valid )
>>>> +    if ( !info.dabt.valid )
>>>>            return IO_ABORT;
>>>>    
>>>>        /*
>>>> @@ -134,7 +182,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>>>        {
>>>>            int rc;
>>>>    
>>>> -        rc = decode_instruction(regs, &info.dabt);
>>>> +        rc = decode_instruction(regs, &info.dabt, NULL);
>>>>            if ( rc )
>>>>            {
>>>>                gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
>>>> @@ -143,9 +191,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>>>        }
>>>>    
>>>>        if ( info.dabt.write )
>>>> -        return handle_write(handler, v, &info);
>>>> +        rc = handle_write(handler, v, &info);
>>>>        else
>>>> -        return handle_read(handler, v, &info);
>>>> +        rc = handle_read(handler, v, &info);
>>>> +
>>>> +    if ( instr.value != 0 )
>>>> +    {
>>>> +        post_increment_register(&instr);
>>>> +    }
>>>> +    return rc;
>>>>    }
>>>>    
>>>>    void register_mmio_handler(struct domain *d,

Re: [XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Julien Grall 2 years, 2 months ago
Hi,

On 24/01/2022 17:27, Ayan Kumar Halder wrote:
> Thanks forn your comments.
> 
> On 24/01/2022 14:36, Andre Przywara wrote:
>> On Mon, 24 Jan 2022 12:07:42 +0000
>> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>>
>> Hi Ayan,
>>
>>> Many thanks for your feedback. I have one clarification :-
>>>
>>> On 22/01/2022 01:30, Andre Przywara wrote:
>>>> On Thu, 20 Jan 2022 21:55:27 +0000
>>>> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>>>>
>>>> Hi,
>>>>> At the moment, Xen is only handling data abort with valid syndrome 
>>>>> (i.e.
>>>>> ISV=0). Unfortunately, this doesn't cover all the instructions a 
>>>>> domain
>>>>> could use to access MMIO regions.
>>>>>
>>>>> For instance, a baremetal OS can use any of the following 
>>>>> instructions, where
>>>>> x1 contains the address of the MMIO region:
>>>>>
>>>>> 1.      ldr     x2,    [x1],    #4
>>>> That looks dodgy, since is misaligns the pointer afterwards. MMIO
>>>> access typically go to device memory, which must be naturally aligned.
>>>> Just don't give a bad example here and change that to a multiple of 8.
>>>>> 2.      ldr     w2,    [x1],    #-4
>>>> (this one is fine, btw, because it's a 32-bit read)
>>>>> 3.      ldr     x2,    [x1],    #-8
>>>>> 4.      ldr     w2,    [x1],    #4
>>>>> 5.      ldrh    w2,    [x1],    #8
>>>>> 6.      ldrb    w2,    [x1],    #16
>>>> More naturally I'd use the data size of the postindex value ...
>>>> ldr  x2 ... #-8
>>>> ldr  w2 ... #4
>>>> ldrh w2 ... #2
>>>> ldrb w2 ... #1
>>>>> 7.      str     x2,    [x1],    #4
>>>> This is a again problematic, because x1 is not 8-byte aligned anymore
>>>> after that.
>>>>> 8.      str     w2,    [x1],    #-4
>>>>> 9.      strh    w2,    [x1],    #8
>>>>> 10.     strb    w2,    [x1],    #16
>>>>>
>>>>> In the following two instructions, sp contains the address of the 
>>>>> MMIO region:-
>>>> Really? I don't think you should give people funny ideas, just mention
>>>> that the Rn register could theoretically be the stack pointer.
>>>>> 11.     ldrb    w2,    [sp],    #16
>>>>> 12.     ldrb    wzr,   [sp],    #16
>>>>>
>>>>> In order to handle post-indexing store/load instructions (like 
>>>>> those mentioned
>>>>> above), Xen will need to fetch and decode the instruction.
>>>>>
>>>>> This patch only cover post-index store/load instructions from 
>>>>> AArch64 mode.
>>>>> For now, this is left unimplemented for trap from AArch32 mode.
>>>>>
>>>>> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
>>>>> ---
>>>>>
>>>>> Changelog :-
>>>>> v2 - 1. Updated the rn register after reading from it. (Pointed by 
>>>>> Julien,
>>>>>           Stefano)
>>>>>        2. Used a union to represent the instruction opcode 
>>>>> (Suggestd by Bertrand)
>>>>>        3. Fixed coding style issues (Pointed by Julien)
>>>>>        4. In the previous patch, I was updating dabt->sign based on 
>>>>> the signedness
>>>>>           of imm9. This was incorrect. As mentioned in ARMv8 ARM  
>>>>> DDI 0487G.b,
>>>>>           Page 3221, SSE indicates the signedness of the data item 
>>>>> loaded. In our
>>>>>           case, the data item loaded is always unsigned.
>>>>>
>>>>> v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit 
>>>>> variants).
>>>>>          Thus, I have removed the check for "instr->code.opc == 0" 
>>>>> (Suggested by
>>>>>          Andre)
>>>>>       2. Handled the scenario when rn = SP, rt = XZR (Suggested by 
>>>>> Jan, Andre)
>>>>>       3. Added restriction for "rt != rn" (Suggested by Andre)
>>>>>       4. Moved union ldr_str_instr_class {} to decode.h. This is 
>>>>> the header included
>>>>>          by io.c and decode.c (where the union is referred). 
>>>>> (Suggested by Jan)
>>>>>       5. Indentation and typo fixes (Suggested by Jan)
>>>>>
>>>>> Changes suggested but could not be considered due to reasons :-
>>>>>       1. Using accessor macros instead of bitfields for 
>>>>> "ldr_str_instr_class". (Andre)
>>>>>          Reason - I could not find a simple way to represent 9 bit 
>>>>> signed integer
>>>>>          (ie imm9) without using bitfields. If I use accessor 
>>>>> macros, then I need
>>>>>          to manually calculate two's complement to obtain the value 
>>>>> when signed
>>>>>          bit is present.
>>>>>
>>>>>       2. I/D cache cohenerncy (Andre)
>>>>>          Reason :- I could not see any instruction to flush the I 
>>>>> cache.
>>>> First, please try to avoid the term "flush", because it is somewhat
>>>> overloaded. The architecture speaks of "clean" and "invalidate", which
>>>> are more precise.
>>>> Assuming you mean "clean" here: conceptually there is no such thing for
>>>> the I cache, because it's always clean. The I$ will only be read from
>>>> the CPU side - from the instruction fetcher - there is nothing written
>>>> back through it. Every store goes through the data path - always.
>>>> That is the problem that I tried to sketch you previously: you don't
>>>> have a guarantee that the instruction you read from memory is the same
>>>> that the CPU executed. The guest could have changed the instruction
>>>> after the I$ fetched that. So the CPU will execute (and trap) on
>>>> instruction X, but you will read Y. I leave it up to your imagination
>>>> if that could be exploited.
>>> I see what you mean.
>>>
>>> Refer Armv8 Arm DDI 0487G.b Page D1-2476, it says that (for instr/data
>>> abort) the faulting virtual address and IPA is saved in FAR_ELx and
>>> HPFAR_EL2 respectively. But, I do not see if the faulting instruction is
>>> saved in any special register. Is there something I am missing ?
>> No, indeed there is no such thing. You get the address, but not the
>> faulting instruction. It would indeed be nice to have from a software
>> developer's point of view, but the architecture does not support it.
>> One reason might be that it's potentially hard to implement, because it
>> could be tricky to reconstruct the original instruction, when it has been
>> broken down to something different in the actual pipelines.
> 
> Is it possible for Arm to implement such a register which will hold the 
> instruction that caused the exception, in the future architecture 
> revision ? This would be useful given that the faulting address (PC) can 
> not be trusted to get to the original faulting instruction (as it can be 
> changed after being loaded in I cache). I could imagine this being 
> useful in any scenario when the user wants to know which instruction 
> caused the fault.
> 
> Stefano/Julien/Bertrand/Volodymyr :- I would love to hear your inputs on 
> this as well.

I have replied on Andre's email. I don't view this issue raised as a 
critical issue (see my answer on Andre's email for more details).


> 
> As for the patch, I will mention this issue (as a comment in the code) 
> where we are loading the instruction from PC. 
> Stefano/Julien/Bertrand/Volodymyr:- Does it look fine with you ?

I will queue the patch for review. I can wait the next version if you 
plan to respin it.

Cheers,

-- 
Julien Grall

Re: [XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Stefano Stabellini 2 years, 2 months ago
On Mon, 24 Jan 2022, Ayan Kumar Halder wrote:
> Hi Andre,
> 
> Thanks forn your comments.
> 
> On 24/01/2022 14:36, Andre Przywara wrote:
> > On Mon, 24 Jan 2022 12:07:42 +0000
> > Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> > 
> > Hi Ayan,
> > 
> > > Many thanks for your feedback. I have one clarification :-
> > > 
> > > On 22/01/2022 01:30, Andre Przywara wrote:
> > > > On Thu, 20 Jan 2022 21:55:27 +0000
> > > > Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> > > > 
> > > > Hi,
> > > >   
> > > > > At the moment, Xen is only handling data abort with valid syndrome
> > > > > (i.e.
> > > > > ISV=0). Unfortunately, this doesn't cover all the instructions a
> > > > > domain
> > > > > could use to access MMIO regions.
> > > > > 
> > > > > For instance, a baremetal OS can use any of the following
> > > > > instructions, where
> > > > > x1 contains the address of the MMIO region:
> > > > > 
> > > > > 1.      ldr     x2,    [x1],    #4
> > > > That looks dodgy, since is misaligns the pointer afterwards. MMIO
> > > > access typically go to device memory, which must be naturally aligned.
> > > > Just don't give a bad example here and change that to a multiple of 8.
> > > >   
> > > > > 2.      ldr     w2,    [x1],    #-4
> > > > (this one is fine, btw, because it's a 32-bit read)
> > > >   
> > > > > 3.      ldr     x2,    [x1],    #-8
> > > > > 4.      ldr     w2,    [x1],    #4
> > > > > 5.      ldrh    w2,    [x1],    #8
> > > > > 6.      ldrb    w2,    [x1],    #16
> > > > More naturally I'd use the data size of the postindex value ...
> > > > ldr  x2 ... #-8
> > > > ldr  w2 ... #4
> > > > ldrh w2 ... #2
> > > > ldrb w2 ... #1
> > > >   
> > > > > 7.      str     x2,    [x1],    #4
> > > > This is a again problematic, because x1 is not 8-byte aligned anymore
> > > > after that.
> > > >   
> > > > > 8.      str     w2,    [x1],    #-4
> > > > > 9.      strh    w2,    [x1],    #8
> > > > > 10.     strb    w2,    [x1],    #16
> > > > > 
> > > > > In the following two instructions, sp contains the address of the MMIO
> > > > > region:-
> > > > Really? I don't think you should give people funny ideas, just mention
> > > > that the Rn register could theoretically be the stack pointer.
> > > >   
> > > > > 11.     ldrb    w2,    [sp],    #16
> > > > > 12.     ldrb    wzr,   [sp],    #16
> > > > > 
> > > > > In order to handle post-indexing store/load instructions (like those
> > > > > mentioned
> > > > > above), Xen will need to fetch and decode the instruction.
> > > > > 
> > > > > This patch only cover post-index store/load instructions from AArch64
> > > > > mode.
> > > > > For now, this is left unimplemented for trap from AArch32 mode.
> > > > > 
> > > > > Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
> > > > > ---
> > > > > 
> > > > > Changelog :-
> > > > > v2 - 1. Updated the rn register after reading from it. (Pointed by
> > > > > Julien,
> > > > >           Stefano)
> > > > >        2. Used a union to represent the instruction opcode (Suggestd
> > > > > by Bertrand)
> > > > >        3. Fixed coding style issues (Pointed by Julien)
> > > > >        4. In the previous patch, I was updating dabt->sign based on
> > > > > the signedness
> > > > >           of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI
> > > > > 0487G.b,
> > > > >           Page 3221, SSE indicates the signedness of the data item
> > > > > loaded. In our
> > > > >           case, the data item loaded is always unsigned.
> > > > > 
> > > > > v3- 1. Handled all the variants of ldr/str (ie 64, 32, 16, 8 bit
> > > > > variants).
> > > > >          Thus, I have removed the check for "instr->code.opc == 0"
> > > > > (Suggested by
> > > > >          Andre)
> > > > >       2. Handled the scenario when rn = SP, rt = XZR (Suggested by
> > > > > Jan, Andre)
> > > > >       3. Added restriction for "rt != rn" (Suggested by Andre)
> > > > >       4. Moved union ldr_str_instr_class {} to decode.h. This is the
> > > > > header included
> > > > >          by io.c and decode.c (where the union is referred).
> > > > > (Suggested by Jan)
> > > > >       5. Indentation and typo fixes (Suggested by Jan)
> > > > > 
> > > > > Changes suggested but could not be considered due to reasons :-
> > > > >       1. Using accessor macros instead of bitfields for
> > > > > "ldr_str_instr_class". (Andre)
> > > > >          Reason - I could not find a simple way to represent 9 bit
> > > > > signed integer
> > > > >          (ie imm9) without using bitfields. If I use accessor macros,
> > > > > then I need
> > > > >          to manually calculate two's complement to obtain the value
> > > > > when signed
> > > > >          bit is present.
> > > > > 
> > > > >       2. I/D cache cohenerncy (Andre)
> > > > >          Reason :- I could not see any instruction to flush the I
> > > > > cache.
> > > > First, please try to avoid the term "flush", because it is somewhat
> > > > overloaded. The architecture speaks of "clean" and "invalidate", which
> > > > are more precise.
> > > > Assuming you mean "clean" here: conceptually there is no such thing for
> > > > the I cache, because it's always clean. The I$ will only be read from
> > > > the CPU side - from the instruction fetcher - there is nothing written
> > > > back through it. Every store goes through the data path - always.
> > > > That is the problem that I tried to sketch you previously: you don't
> > > > have a guarantee that the instruction you read from memory is the same
> > > > that the CPU executed. The guest could have changed the instruction
> > > > after the I$ fetched that. So the CPU will execute (and trap) on
> > > > instruction X, but you will read Y. I leave it up to your imagination
> > > > if that could be exploited.
> > > I see what you mean.
> > > 
> > > Refer Armv8 Arm DDI 0487G.b Page D1-2476, it says that (for instr/data
> > > abort) the faulting virtual address and IPA is saved in FAR_ELx and
> > > HPFAR_EL2 respectively. But, I do not see if the faulting instruction is
> > > saved in any special register. Is there something I am missing ?
> > No, indeed there is no such thing. You get the address, but not the
> > faulting instruction. It would indeed be nice to have from a software
> > developer's point of view, but the architecture does not support it.
> > One reason might be that it's potentially hard to implement, because it
> > could be tricky to reconstruct the original instruction, when it has been
> > broken down to something different in the actual pipelines.
> 
> Is it possible for Arm to implement such a register which will hold the
> instruction that caused the exception, in the future architecture revision ?
> This would be useful given that the faulting address (PC) can not be trusted
> to get to the original faulting instruction (as it can be changed after being
> loaded in I cache). I could imagine this being useful in any scenario when the
> user wants to know which instruction caused the fault.
> 
> Stefano/Julien/Bertrand/Volodymyr :- I would love to hear your inputs on this
> as well.
> 
> As for the patch, I will mention this issue (as a comment in the code) where
> we are loading the instruction from PC. Stefano/Julien/Bertrand/Volodymyr:-
> Does it look fine with you ?

As this issue could happen on any architecture (the guest could change
the instruction from another vcpu while the other is trapping in Xen)
and given that we do quite a bit of emulation on x86 I asked Jan on IRC
how do we handle this kind of things on x86 today. He had a good answer:
"By not making any assumptions on what we're going to find."

In other words, don't assume you are going to find a store or a load
instruction at the memory location pointed by the PC. You could find
total garbage (because it was changed in between). Make sure to check
everything is as expected before taking any actions.

And I think you are already doing that in decode_loadstore_postindexing.

These are the fields:

+ * 31 30 29  27 26 25  23   21 20              11   9         4       0
+ * ___________________________________________________________________
+ * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
+ * |____|______|__|____|____|__|_______________|____|_________|_______|
+ */
+union ldr_str_instr_class {
+    uint32_t value;
+    struct ldr_str {
+        unsigned int rt:5;     /* Rt register */
+        unsigned int rn:5;     /* Rn register */
+        unsigned int fixed1:2; /* value == 01b */
+        signed int imm9:9;            /* imm9 */
+        unsigned int fixed2:1; /* value == 0b */
+        unsigned int opc:2;    /* opc */
+        unsigned int fixed3:2; /* value == 00b */
+        unsigned int v:1;      /* vector */
+        unsigned int fixed4:3; /* value == 111b */
+        unsigned int size:2;   /* size */
+    } code;
+};


This patch already checks for:
- the fixed values
- v
- opc
- some special rt and rn values

Considering that:
- size is fine either way
- as rt and rn are 5 bits wide, all values are acceptable (x0->x31)

It doesn't look like we are missing anything, unless imm9 is restricted
to some ranges only.

Re: [XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Jan Beulich 2 years, 2 months ago
On 24.01.2022 19:41, Stefano Stabellini wrote:
> On Mon, 24 Jan 2022, Ayan Kumar Halder wrote:
>> As for the patch, I will mention this issue (as a comment in the code) where
>> we are loading the instruction from PC. Stefano/Julien/Bertrand/Volodymyr:-
>> Does it look fine with you ?
> 
> As this issue could happen on any architecture (the guest could change
> the instruction from another vcpu while the other is trapping in Xen)
> and given that we do quite a bit of emulation on x86 I asked Jan on IRC
> how do we handle this kind of things on x86 today. He had a good answer:
> "By not making any assumptions on what we're going to find."
> 
> In other words, don't assume you are going to find a store or a load
> instruction at the memory location pointed by the PC. You could find
> total garbage (because it was changed in between). Make sure to check
> everything is as expected before taking any actions.
> 
> And I think you are already doing that in decode_loadstore_postindexing.
> 
> These are the fields:
> 
> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> + * ___________________________________________________________________
> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> + * |____|______|__|____|____|__|_______________|____|_________|_______|
> + */
> +union ldr_str_instr_class {
> +    uint32_t value;
> +    struct ldr_str {
> +        unsigned int rt:5;     /* Rt register */
> +        unsigned int rn:5;     /* Rn register */
> +        unsigned int fixed1:2; /* value == 01b */
> +        signed int imm9:9;            /* imm9 */
> +        unsigned int fixed2:1; /* value == 0b */
> +        unsigned int opc:2;    /* opc */
> +        unsigned int fixed3:2; /* value == 00b */
> +        unsigned int v:1;      /* vector */
> +        unsigned int fixed4:3; /* value == 111b */
> +        unsigned int size:2;   /* size */
> +    } code;
> +};
> 
> 
> This patch already checks for:
> - the fixed values
> - v
> - opc
> - some special rt and rn values
> 
> Considering that:
> - size is fine either way
> - as rt and rn are 5 bits wide, all values are acceptable (x0->x31)
> 
> It doesn't look like we are missing anything, unless imm9 is restricted
> to some ranges only.

Beyond decoding there's at least one further assumption one may
mistakenly make: The address may not be suitably aligned and it may
not reference MMIO (or, should that matter, not the specific region
of MMIO that other trap-provided info my hint at).

Jan


Re: [XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 2 months ago
Hi Jan/All,

On 25/01/2022 08:55, Jan Beulich wrote:
> On 24.01.2022 19:41, Stefano Stabellini wrote:
>> On Mon, 24 Jan 2022, Ayan Kumar Halder wrote:
>>> As for the patch, I will mention this issue (as a comment in the code) where
>>> we are loading the instruction from PC. Stefano/Julien/Bertrand/Volodymyr:-
>>> Does it look fine with you ?
>> As this issue could happen on any architecture (the guest could change
>> the instruction from another vcpu while the other is trapping in Xen)
>> and given that we do quite a bit of emulation on x86 I asked Jan on IRC
>> how do we handle this kind of things on x86 today. He had a good answer:
>> "By not making any assumptions on what we're going to find."
>>
>> In other words, don't assume you are going to find a store or a load
>> instruction at the memory location pointed by the PC. You could find
>> total garbage (because it was changed in between). Make sure to check
>> everything is as expected before taking any actions.
>>
>> And I think you are already doing that in decode_loadstore_postindexing.
>>
>> These are the fields:
>>
>> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
>> + * ___________________________________________________________________
>> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
>> + * |____|______|__|____|____|__|_______________|____|_________|_______|
>> + */
>> +union ldr_str_instr_class {
>> +    uint32_t value;
>> +    struct ldr_str {
>> +        unsigned int rt:5;     /* Rt register */
>> +        unsigned int rn:5;     /* Rn register */
>> +        unsigned int fixed1:2; /* value == 01b */
>> +        signed int imm9:9;            /* imm9 */
>> +        unsigned int fixed2:1; /* value == 0b */
>> +        unsigned int opc:2;    /* opc */
>> +        unsigned int fixed3:2; /* value == 00b */
>> +        unsigned int v:1;      /* vector */
>> +        unsigned int fixed4:3; /* value == 111b */
>> +        unsigned int size:2;   /* size */
>> +    } code;
>> +};
>>
>>
>> This patch already checks for:
>> - the fixed values
>> - v
>> - opc
>> - some special rt and rn values
>>
>> Considering that:
>> - size is fine either way
>> - as rt and rn are 5 bits wide, all values are acceptable (x0->x31)
>>
>> It doesn't look like we are missing anything, unless imm9 is restricted
>> to some ranges only.
> Beyond decoding there's at least one further assumption one may
> mistakenly make: The address may not be suitably aligned and it may
> not reference MMIO (or, should that matter, not the specific region
> of MMIO that other trap-provided info my hint at).

As I see, Xen will read/write to the MMIO address provided either by 
gva_to_ipa_pa() or HPFAR_EL2.

However, (you are correct), that the address pointed by Rn might not 
point to the same address (assuming that the instruction was changed 
after being loaded in I cache). In any case, Xen will simply increment 
(or decrement) Rn. The guest will find this new value of Rn (and that 
should be fine it was the guest who had changed the instruction).

In any case, I don't see Xen doing something erroneous.

I will send out a v4 patch addressing the issues pointed by Stefano and 
Andre (commit message).

- Ayan

>
> Jan
>

Re: [XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Julien Grall 2 years, 2 months ago
Hi Andre,

On 24/01/2022 14:36, Andre Przywara wrote:
> On Mon, 24 Jan 2022 12:07:42 +0000
>> Also, if an instruction is being modified by the guest (after it has
>> been loaded in the I cache), and if the guest does not invalidate the I
>> cache + ISB, then this is a malicious behavior by the guest. Is my
>> understanding correct ?
> 
> I wouldn't say malicious per se, there might be legitimate reasons to do
> so, but in the Xen context this is mostly irrelevant, since we don't trust
> the guest anyway. So whether it's malicious or accidental, the hypervisor
> might be mislead.

I agree the hypervisor will be mislead to execute the wrong instruction. 
But, in reality, I don't see how this is a massive problem as this 
thread seems to imply. At best the guest will shoot itself in the foot.

IOW, for now, I think it is fine to assume that the guest will have 
invalidated the cache instruction before executing any instruction that 
may fault with ISV=0. This could be revisted if we have use-cases where 
we really need to know what the guest executed.

Cheers,

-- 
Julien Grall

Re: [XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions
Posted by Andre Przywara 2 years, 2 months ago
On Mon, 24 Jan 2022 17:58:55 +0000
Julien Grall <julien@xen.org> wrote:

Hi Julien,

> Hi Andre,
> 
> On 24/01/2022 14:36, Andre Przywara wrote:
> > On Mon, 24 Jan 2022 12:07:42 +0000  
> >> Also, if an instruction is being modified by the guest (after it has
> >> been loaded in the I cache), and if the guest does not invalidate the I
> >> cache + ISB, then this is a malicious behavior by the guest. Is my
> >> understanding correct ?  
> > 
> > I wouldn't say malicious per se, there might be legitimate reasons to do
> > so, but in the Xen context this is mostly irrelevant, since we don't trust
> > the guest anyway. So whether it's malicious or accidental, the hypervisor
> > might be mislead.  
> 
> I agree the hypervisor will be mislead to execute the wrong instruction. 
> But, in reality, I don't see how this is a massive problem as this 
> thread seems to imply. At best the guest will shoot itself in the foot.

I didn't really imply anything, I genuinely meant that I don't want to
spend brain cells thinking about possible exploits - I always figured you
(and Xen people in general) are so much better in this. (genuine
compliment!)
I was just pointing out that this emulation might be wrong then.
That ties back to the original question of how many bitter pills you want
to swallow for having this emulation code - which is your decision to make.

Cheers,
Andre

> IOW, for now, I think it is fine to assume that the guest will have 
> invalidated the cache instruction before executing any instruction that 
> may fault with ISV=0. This could be revisted if we have use-cases where 
> we really need to know what the guest executed.
> 
> Cheers,
>