[XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions

Ayan Kumar Halder posted 1 patch 2 years, 4 months ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20211129191638.19877-1-ayankuma@xilinx.com
xen/arch/arm/decode.c     | 68 ++++++++++++++++++++++++++++++++++++++-
xen/arch/arm/decode.h     |  3 +-
xen/arch/arm/io.c         | 40 +++++++++++++++++++----
xen/include/asm-arm/hsr.h | 26 +++++++++++++++
4 files changed, 129 insertions(+), 8 deletions(-)
[XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 4 months ago
At the moment, Xen is only handling data abort with valid syndrome (i.e.
ISV=0). Unfortunately, this doesn't cover all the instructions a domain
could use to access MMIO regions.

For instance, Xilinx baremetal OS will use:

        volatile u32 *LocalAddr = (volatile u32 *)Addr;
        *LocalAddr = Value;

This leave the compiler to decide which store instructions to use. This
may be a post-index store instruction where the HW will not provide a
valid syndrome.

In order to handle post-indexing store/load instructions, Xen will need
to fetch and decode the instruction.

This patch only cover post-index store/load instructions from AArch64 mode.
For now, this is left unimplemented for trap from AArch32 mode.

Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
---

Changelog :-

v2 :- 1. Updated the rn register after reading from it. (Pointed by Julien,
 Stefano)
2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
3. Fixed coding style issues (Pointed by Julien)
4. In the previous patch, I was updating dabt->sign based on the signedness of
imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b, Page 3221,
SSE indicates the signedness of the data item loaded. In our case, the data item
loaded is always unsigned.

This has been tested for the following cases :-
ldr x2, [x1], #4

ldr w2, [x1], #-4

str x2, [x1], #4

str w2, [x1], #-4

The reason being  I am testing on 32bit MMIO registers only. I don't see a 8bit
or 16bit MMIO register.

 xen/arch/arm/decode.c     | 68 ++++++++++++++++++++++++++++++++++++++-
 xen/arch/arm/decode.h     |  3 +-
 xen/arch/arm/io.c         | 40 +++++++++++++++++++----
 xen/include/asm-arm/hsr.h | 26 +++++++++++++++
 4 files changed, 129 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
index 792c2e92a7..0b3e8fcbc6 100644
--- a/xen/arch/arm/decode.c
+++ b/xen/arch/arm/decode.c
@@ -84,6 +84,66 @@ bad_thumb2:
     return 1;
 }
 
+static int decode_32bit_loadstore_postindexing(register_t pc,
+                                               struct hsr_dabt *dabt,
+                                               union ldr_str_instr_class *instr)
+{
+    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
+        return -EFAULT;
+
+    /* First, let's check for the fixed values */
+    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
+         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
+    {
+        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
+            " ldr/str post indexing\n");
+        goto bad_32bit_loadstore;
+    }
+
+    if ( instr->code.size != 2 )
+    {
+        gprintk(XENLOG_ERR,
+            "ldr/str post indexing is supported for 32 bit variant only\n");
+        goto bad_32bit_loadstore;
+    }
+
+    if ( instr->code.v != 0 )
+    {
+        gprintk(XENLOG_ERR,
+            "ldr/str post indexing for vector types are not supported\n");
+        goto bad_32bit_loadstore;
+    }
+
+    /* Check for STR (immediate) - 32 bit variant */
+    if ( instr->code.opc == 0 )
+    {
+        dabt->write = 1;
+    }
+    /* Check for LDR (immediate) - 32 bit variant */
+    else if ( instr->code.opc == 1 )
+    {
+        dabt->write = 0;
+    }
+    else
+    {
+        gprintk(XENLOG_ERR,
+            "Decoding ldr/str post indexing is not supported for this variant\n");
+        goto bad_32bit_loadstore;
+    }
+
+    gprintk(XENLOG_INFO,
+        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
+        instr->code.rt, instr->code.size, instr->code.imm9);
+
+    update_dabt(dabt, instr->code.rt, instr->code.size, false);
+    dabt->valid = 1;
+
+    return 0;
+bad_32bit_loadstore:
+    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
+    return 1;
+}
+
 static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
 {
     uint16_t instr;
@@ -150,11 +210,17 @@ bad_thumb:
     return 1;
 }
 
-int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
+int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
+                       union ldr_str_instr_class *instr)
 {
     if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
         return decode_thumb(regs->pc, dabt);
 
+    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
+    {
+        return decode_32bit_loadstore_postindexing(regs->pc, dabt, instr);
+    }
+
     /* TODO: Handle ARM instruction */
     gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
 
diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
index 4613763bdb..d82fc4a0f6 100644
--- a/xen/arch/arm/decode.h
+++ b/xen/arch/arm/decode.h
@@ -35,7 +35,8 @@
  */
 
 int decode_instruction(const struct cpu_user_regs *regs,
-                       struct hsr_dabt *dabt);
+                       struct hsr_dabt *dabt,
+                       union ldr_str_instr_class *instr);
 
 #endif /* __ARCH_ARM_DECODE_H_ */
 
diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index 729287e37c..0d60754bc4 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -65,6 +65,16 @@ static enum io_state handle_write(const struct mmio_handler *handler,
     return ret ? IO_HANDLED : IO_ABORT;
 }
 
+static void post_incremenet_register(union ldr_str_instr_class *instr)
+{
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    unsigned int val;
+
+    val = get_user_reg(regs, instr->code.rn);
+    val += instr->code.imm9;
+    set_user_reg(regs, instr->code.rn, val);
+}
+
 /* This function assumes that mmio regions are not overlapped */
 static int cmp_mmio_handler(const void *key, const void *elem)
 {
@@ -106,14 +116,26 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
         .gpa = gpa,
         .dabt = dabt
     };
+    int rc;
+    union ldr_str_instr_class instr = {0};
 
     ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
 
+    /*
+     * Armv8 processor does not provide a valid syndrome for post-indexing
+     * ldr/str instructions. So in order to process these instructions,
+     * Xen must decode them.
+     */
+    if ( !info.dabt.valid )
+    {
+        rc = decode_instruction(regs, &info.dabt, &instr);
+        if ( rc )
+            return IO_ABORT;
+    }
+
     handler = find_mmio_handler(v->domain, info.gpa);
     if ( !handler )
     {
-        int rc;
-
         rc = try_fwd_ioserv(regs, v, &info);
         if ( rc == IO_HANDLED )
             return handle_ioserv(regs, v);
@@ -122,7 +144,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
     }
 
     /* All the instructions used on emulated MMIO region should be valid */
-    if ( !dabt.valid )
+    if ( !info.dabt.valid )
         return IO_ABORT;
 
     /*
@@ -134,7 +156,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
     {
         int rc;
 
-        rc = decode_instruction(regs, &info.dabt);
+        rc = decode_instruction(regs, &info.dabt, NULL);
         if ( rc )
         {
             gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
@@ -143,9 +165,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
     }
 
     if ( info.dabt.write )
-        return handle_write(handler, v, &info);
+        rc = handle_write(handler, v, &info);
     else
-        return handle_read(handler, v, &info);
+        rc = handle_read(handler, v, &info);
+
+    if ( instr.value != 0 )
+    {
+        post_incremenet_register(&instr);
+    }
+    return rc;
 }
 
 void register_mmio_handler(struct domain *d,
diff --git a/xen/include/asm-arm/hsr.h b/xen/include/asm-arm/hsr.h
index 9b91b28c48..72d67d2801 100644
--- a/xen/include/asm-arm/hsr.h
+++ b/xen/include/asm-arm/hsr.h
@@ -15,6 +15,32 @@ enum dabt_size {
     DABT_DOUBLE_WORD = 3,
 };
 
+/*
+ * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
+ * Page 318 specifies the following bit pattern for
+ * "load/store register (immediate post-indexed)".
+ *
+ * 31 30 29  27 26 25  23   21 20              11   9         4       0
+ * ___________________________________________________________________
+ * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
+ * |____|______|__|____|____|__|_______________|____|_________|_______|
+ */
+union ldr_str_instr_class {
+    uint32_t value;
+    struct ldr_str {
+        unsigned int rt:5;     /* Rt register */
+        unsigned int rn:5;     /* Rn register */
+        unsigned int fixed1:2; /* value == 01b */
+        int imm9:9;            /* imm9 */
+        unsigned int fixed2:1; /* value == 0b */
+        unsigned int opc:2;    /* opc */
+        unsigned int fixed3:2; /* value == 00b */
+        unsigned int v:1;      /* vector */
+        unsigned int fixed4:3; /* value == 111b */
+        unsigned int size:2;   /* size */
+    } code;
+};
+
 union hsr {
     register_t bits;
     struct {
-- 
2.17.1


Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Jan Beulich 2 years, 4 months ago
On 29.11.2021 20:16, Ayan Kumar Halder wrote:
> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> could use to access MMIO regions.
> 
> For instance, Xilinx baremetal OS will use:
> 
>         volatile u32 *LocalAddr = (volatile u32 *)Addr;
>         *LocalAddr = Value;
> 
> This leave the compiler to decide which store instructions to use. This
> may be a post-index store instruction where the HW will not provide a
> valid syndrome.
> 
> In order to handle post-indexing store/load instructions, Xen will need
> to fetch and decode the instruction.
> 
> This patch only cover post-index store/load instructions from AArch64 mode.
> For now, this is left unimplemented for trap from AArch32 mode.
> 
> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>

Just a couple of general remarks, with no judgement towards its use
in the codebase, and leaving out several obvious style issues.

> Changelog :-
> 
> v2 :- 1. Updated the rn register after reading from it. (Pointed by Julien,
>  Stefano)
> 2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
> 3. Fixed coding style issues (Pointed by Julien)
> 4. In the previous patch, I was updating dabt->sign based on the signedness of
> imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b, Page 3221,
> SSE indicates the signedness of the data item loaded. In our case, the data item
> loaded is always unsigned.
> 
> This has been tested for the following cases :-
> ldr x2, [x1], #4

DYM "ldr w2, [x1], #4" or "ldr x2, [x1], #8" here?

> ldr w2, [x1], #-4
> 
> str x2, [x1], #4

Similar aspect here.

> str w2, [x1], #-4
> 
> The reason being  I am testing on 32bit MMIO registers only. I don't see a 8bit
> or 16bit MMIO register.

As per this, perhaps the former of the two.

> --- a/xen/arch/arm/decode.c
> +++ b/xen/arch/arm/decode.c
> @@ -84,6 +84,66 @@ bad_thumb2:
>      return 1;
>  }
>  
> +static int decode_32bit_loadstore_postindexing(register_t pc,
> +                                               struct hsr_dabt *dabt,
> +                                               union ldr_str_instr_class *instr)
> +{
> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
> +        return -EFAULT;
> +
> +    /* First, let's check for the fixed values */
> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
> +    {
> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
> +            " ldr/str post indexing\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    if ( instr->code.size != 2 )
> +    {
> +        gprintk(XENLOG_ERR,
> +            "ldr/str post indexing is supported for 32 bit variant only\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    if ( instr->code.v != 0 )
> +    {
> +        gprintk(XENLOG_ERR,
> +            "ldr/str post indexing for vector types are not supported\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    /* Check for STR (immediate) - 32 bit variant */
> +    if ( instr->code.opc == 0 )
> +    {
> +        dabt->write = 1;
> +    }
> +    /* Check for LDR (immediate) - 32 bit variant */
> +    else if ( instr->code.opc == 1 )
> +    {
> +        dabt->write = 0;
> +    }
> +    else
> +    {
> +        gprintk(XENLOG_ERR,
> +            "Decoding ldr/str post indexing is not supported for this variant\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    gprintk(XENLOG_INFO,
> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
> +        instr->code.rt, instr->code.size, instr->code.imm9);
> +
> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
> +    dabt->valid = 1;
> +
> +    return 0;
> +bad_32bit_loadstore:

Please indent labels by at least a blank. I also think this would
benefit from a preceding blank line.

> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -65,6 +65,16 @@ static enum io_state handle_write(const struct mmio_handler *handler,
>      return ret ? IO_HANDLED : IO_ABORT;
>  }
>  
> +static void post_incremenet_register(union ldr_str_instr_class *instr)

I think you mean post_increment_register()?

> +{
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    unsigned int val;
> +
> +    val = get_user_reg(regs, instr->code.rn);
> +    val += instr->code.imm9;
> +    set_user_reg(regs, instr->code.rn, val);

I don't think this handles the SP case correctly, and I also don't see
that case getting rejected elsewhere.

> --- a/xen/include/asm-arm/hsr.h
> +++ b/xen/include/asm-arm/hsr.h
> @@ -15,6 +15,32 @@ enum dabt_size {
>      DABT_DOUBLE_WORD = 3,
>  };
>  
> +/*
> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
> + * Page 318 specifies the following bit pattern for
> + * "load/store register (immediate post-indexed)".
> + *
> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> + * ___________________________________________________________________
> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> + * |____|______|__|____|____|__|_______________|____|_________|_______|
> + */
> +union ldr_str_instr_class {
> +    uint32_t value;
> +    struct ldr_str {
> +        unsigned int rt:5;     /* Rt register */
> +        unsigned int rn:5;     /* Rn register */
> +        unsigned int fixed1:2; /* value == 01b */
> +        int imm9:9;            /* imm9 */

Plain int bitfields are, iirc, signed or unsigned at the compiler's
discretion. Hence I think you mean explicitly "signed int" here.

> +        unsigned int fixed2:1; /* value == 0b */
> +        unsigned int opc:2;    /* opc */
> +        unsigned int fixed3:2; /* value == 00b */
> +        unsigned int v:1;      /* vector */
> +        unsigned int fixed4:3; /* value == 111b */
> +        unsigned int size:2;   /* size */
> +    } code;
> +};

I'd recommend types needed in just one CU to live there, rather than
getting exposed to every source file including this header (even more
so when - aiui - this is entirely unrelated to HSR). When used in
just a single function, it might even want to live here (i.e. as
close as possible to the [only] use).

Jan


Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 4 months ago
Hi Jan,

Thanks a lot for the feedback.
I need a clarification.

On 30/11/2021 07:57, Jan Beulich wrote:
> On 29.11.2021 20:16, Ayan Kumar Halder wrote:
>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
>> could use to access MMIO regions.
>>
>> For instance, Xilinx baremetal OS will use:
>>
>>          volatile u32 *LocalAddr = (volatile u32 *)Addr;
>>          *LocalAddr = Value;
>>
>> This leave the compiler to decide which store instructions to use. This
>> may be a post-index store instruction where the HW will not provide a
>> valid syndrome.
>>
>> In order to handle post-indexing store/load instructions, Xen will need
>> to fetch and decode the instruction.
>>
>> This patch only cover post-index store/load instructions from AArch64 mode.
>> For now, this is left unimplemented for trap from AArch32 mode.
>>
>> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
> 
> Just a couple of general remarks, with no judgement towards its use
> in the codebase, and leaving out several obvious style issues.
> 
>> Changelog :-
>>
>> v2 :- 1. Updated the rn register after reading from it. (Pointed by Julien,
>>   Stefano)
>> 2. Used a union to represent the instruction opcode (Suggestd by Bertrand)
>> 3. Fixed coding style issues (Pointed by Julien)
>> 4. In the previous patch, I was updating dabt->sign based on the signedness of
>> imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI 0487G.b, Page 3221,
>> SSE indicates the signedness of the data item loaded. In our case, the data item
>> loaded is always unsigned.
>>
>> This has been tested for the following cases :-
>> ldr x2, [x1], #4
> 
> DYM "ldr w2, [x1], #4" or "ldr x2, [x1], #8" here?
Yes, you are correct.
It is "ldr w2, [x1], #4"

> 
>> ldr w2, [x1], #-4
>>
>> str x2, [x1], #4
> 
> Similar aspect here.
"str w2, [x1], #4"

> 
>> str w2, [x1], #-4
>>
>> The reason being  I am testing on 32bit MMIO registers only. I don't see a 8bit
>> or 16bit MMIO register.
> 
> As per this, perhaps the former of the two.
> 
>> --- a/xen/arch/arm/decode.c
>> +++ b/xen/arch/arm/decode.c
>> @@ -84,6 +84,66 @@ bad_thumb2:
>>       return 1;
>>   }
>>   
>> +static int decode_32bit_loadstore_postindexing(register_t pc,
>> +                                               struct hsr_dabt *dabt,
>> +                                               union ldr_str_instr_class *instr)
>> +{
>> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
>> +        return -EFAULT;
>> +
>> +    /* First, let's check for the fixed values */
>> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
>> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
>> +    {
>> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
>> +            " ldr/str post indexing\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    if ( instr->code.size != 2 )
>> +    {
>> +        gprintk(XENLOG_ERR,
>> +            "ldr/str post indexing is supported for 32 bit variant only\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    if ( instr->code.v != 0 )
>> +    {
>> +        gprintk(XENLOG_ERR,
>> +            "ldr/str post indexing for vector types are not supported\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    /* Check for STR (immediate) - 32 bit variant */
>> +    if ( instr->code.opc == 0 )
>> +    {
>> +        dabt->write = 1;
>> +    }
>> +    /* Check for LDR (immediate) - 32 bit variant */
>> +    else if ( instr->code.opc == 1 )
>> +    {
>> +        dabt->write = 0;
>> +    }
>> +    else
>> +    {
>> +        gprintk(XENLOG_ERR,
>> +            "Decoding ldr/str post indexing is not supported for this variant\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    gprintk(XENLOG_INFO,
>> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
>> +        instr->code.rt, instr->code.size, instr->code.imm9);
>> +
>> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
>> +    dabt->valid = 1;
>> +
>> +    return 0;
>> +bad_32bit_loadstore:
> 
> Please indent labels by at least a blank. I also think this would
> benefit from a preceding blank line.
Ack

> 
>> --- a/xen/arch/arm/io.c
>> +++ b/xen/arch/arm/io.c
>> @@ -65,6 +65,16 @@ static enum io_state handle_write(const struct mmio_handler *handler,
>>       return ret ? IO_HANDLED : IO_ABORT;
>>   }
>>   
>> +static void post_incremenet_register(union ldr_str_instr_class *instr)
> 
> I think you mean post_increment_register()?
Ack. Sorry for my carelessness. :(

> 
>> +{
>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +    unsigned int val;
>> +
>> +    val = get_user_reg(regs, instr->code.rn);
>> +    val += instr->code.imm9;
>> +    set_user_reg(regs, instr->code.rn, val);
> 
> I don't think this handles the SP case correctly, and I also don't see
> that case getting rejected elsewhere.

Sorry, I did not understand you. Can you explain a bit more ?

Following 
https://www.keil.com/support/man/docs/armasm/armasm_dom1361289873425.htm 
, Are you saying that we need to handle this restriction
"You can use SP for Rt in non-word instructions in ARM code but this is 
deprecated in ARMv6T2 and above"


> 
>> --- a/xen/include/asm-arm/hsr.h
>> +++ b/xen/include/asm-arm/hsr.h
>> @@ -15,6 +15,32 @@ enum dabt_size {
>>       DABT_DOUBLE_WORD = 3,
>>   };
>>   
>> +/*
>> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
>> + * Page 318 specifies the following bit pattern for
>> + * "load/store register (immediate post-indexed)".
>> + *
>> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
>> + * ___________________________________________________________________
>> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
>> + * |____|______|__|____|____|__|_______________|____|_________|_______|
>> + */
>> +union ldr_str_instr_class {
>> +    uint32_t value;
>> +    struct ldr_str {
>> +        unsigned int rt:5;     /* Rt register */
>> +        unsigned int rn:5;     /* Rn register */
>> +        unsigned int fixed1:2; /* value == 01b */
>> +        int imm9:9;            /* imm9 */
> 
> Plain int bitfields are, iirc, signed or unsigned at the compiler's
> discretion. Hence I think you mean explicitly "signed int" here.
> 
>> +        unsigned int fixed2:1; /* value == 0b */
>> +        unsigned int opc:2;    /* opc */
>> +        unsigned int fixed3:2; /* value == 00b */
>> +        unsigned int v:1;      /* vector */
>> +        unsigned int fixed4:3; /* value == 111b */
>> +        unsigned int size:2;   /* size */
>> +    } code;
>> +};
> 
> I'd recommend types needed in just one CU to live there, rather than
> getting exposed to every source file including this header (even more
> so when - aiui - this is entirely unrelated to HSR). When used in
> just a single function, it might even want to live here (i.e. as
> close as possible to the [only] use).

Ack. I will try to use bitmask as Andre suggested.

- Ayan
> 
> Jan
> 

Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Jan Beulich 2 years, 4 months ago
On 30.11.2021 19:35, Ayan Kumar Halder wrote:
> On 30/11/2021 07:57, Jan Beulich wrote:
>> On 29.11.2021 20:16, Ayan Kumar Halder wrote:
>>> +{
>>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>>> +    unsigned int val;
>>> +
>>> +    val = get_user_reg(regs, instr->code.rn);
>>> +    val += instr->code.imm9;
>>> +    set_user_reg(regs, instr->code.rn, val);
>>
>> I don't think this handles the SP case correctly, and I also don't see
>> that case getting rejected elsewhere.
> 
> Sorry, I did not understand you. Can you explain a bit more ?
> 
> Following 
> https://www.keil.com/support/man/docs/armasm/armasm_dom1361289873425.htm 
> , Are you saying that we need to handle this restriction
> "You can use SP for Rt in non-word instructions in ARM code but this is 
> deprecated in ARMv6T2 and above"

Are you looking at the correct (part of the) doc? It feels like this is
Arm32 wording (plus it's Rn I'm talking about, not Rt) ... DDI0487G-b
has nothing like this on the "LDR (immediate)" insn page. And even if
it had, "deprecated" doesn't mean "impossible", so you'd still need to
deal with the situation in a way that's not silently doing the wrong
thing (IOW you may be fine not actually emulating the case, but then
you need to clearly fail emulation rather than using XZR).

I have to admit I don't recall what the behavior is when Rt == 31. But
what you may further want to deal with independent of that is Rt == Rn.

Jan


Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Andre Przywara 2 years, 4 months ago
On Mon, 29 Nov 2021 19:16:38 +0000
Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:

Hi,

> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> could use to access MMIO regions.
> 
> For instance, Xilinx baremetal OS will use:
> 
>         volatile u32 *LocalAddr = (volatile u32 *)Addr;
>         *LocalAddr = Value;
> 
> This leave the compiler to decide which store instructions to use.

As mentioned in the other email, this is wrong, if this points to MMIO:
don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
an MMIO area, you should not see traps that you cannot handle already.

So I don't think it's a good idea to use that as an example. And since
this patch only seems to address this use case, I would doubt its
usefulness in general.

> This
> may be a post-index store instruction where the HW will not provide a
> valid syndrome.
> 
> In order to handle post-indexing store/load instructions, Xen will need
> to fetch and decode the instruction.
> 
> This patch only cover post-index store/load instructions from AArch64
> mode. For now, this is left unimplemented for trap from AArch32 mode.
> 
> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
> ---
> 
> Changelog :-
> 
> v2 :- 1. Updated the rn register after reading from it. (Pointed by
> Julien, Stefano)
> 2. Used a union to represent the instruction opcode (Suggestd by
> Bertrand) 3. Fixed coding style issues (Pointed by Julien)
> 4. In the previous patch, I was updating dabt->sign based on the
> signedness of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI
> 0487G.b, Page 3221, SSE indicates the signedness of the data item
> loaded. In our case, the data item loaded is always unsigned.
> 
> This has been tested for the following cases :-
> ldr x2, [x1], #4

As Jan already mentioned: this is a bad example. First, this is a 64-bit
access, which you don't emulate below. And second, you want to keep the
pointer aligned. Unaligned accesses to device memory always trap, as per
the architecture, even on bare metal.

> 
> ldr w2, [x1], #-4
> 
> str x2, [x1], #4

Same as above.

> str w2, [x1], #-4
> 
> The reason being  I am testing on 32bit MMIO registers only. I don't see
> a 8bit or 16bit MMIO register.

Where did you look? There are plenty of examples out there, even the GIC
allows 8-bit accesses to certain registers (grep for "VGIC_ACCESS_"), and
the Linux GIC driver is using them (but with proper accessors, of course).
Also GICv3 supports 64-bit accesses to some registers. Some PL011 UARTs use
16-bit MMIO accesses.

>  xen/arch/arm/decode.c     | 68 ++++++++++++++++++++++++++++++++++++++-
>  xen/arch/arm/decode.h     |  3 +-
>  xen/arch/arm/io.c         | 40 +++++++++++++++++++----
>  xen/include/asm-arm/hsr.h | 26 +++++++++++++++
>  4 files changed, 129 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
> index 792c2e92a7..0b3e8fcbc6 100644
> --- a/xen/arch/arm/decode.c
> +++ b/xen/arch/arm/decode.c
> @@ -84,6 +84,66 @@ bad_thumb2:
>      return 1;
>  }
>  
> +static int decode_32bit_loadstore_postindexing(register_t pc,
> +                                               struct hsr_dabt *dabt,
> +                                               union ldr_str_instr_class *instr)
> +{
> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
> +        return -EFAULT;
> +
> +    /* First, let's check for the fixed values */
> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
> +    {
> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
> +            " ldr/str post indexing\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    if ( instr->code.size != 2 )

I don't see a good reason for this limitation. If you are going to dissect
the instruction, why not just support at least all access widths, so
64-bits, but also {ldr,str}{b,w}? I think the framework does the heavy
lifting for you already?
Same for the restriction to post-index above, supporting pre-index as well
should be easy.

To me this has the bitter taste for being a one trick pony to work around
your particular (broken!) use case.

> +    {
> +        gprintk(XENLOG_ERR,
> +            "ldr/str post indexing is supported for 32 bit variant only\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    if ( instr->code.v != 0 )
> +    {
> +        gprintk(XENLOG_ERR,
> +            "ldr/str post indexing for vector types are not supported\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    /* Check for STR (immediate) - 32 bit variant */
> +    if ( instr->code.opc == 0 )
> +    {
> +        dabt->write = 1;
> +    }
> +    /* Check for LDR (immediate) - 32 bit variant */
> +    else if ( instr->code.opc == 1 )
> +    {
> +        dabt->write = 0;
> +    }
> +    else
> +    {
> +        gprintk(XENLOG_ERR,
> +            "Decoding ldr/str post indexing is not supported for this variant\n");
> +        goto bad_32bit_loadstore;
> +    }
> +
> +    gprintk(XENLOG_INFO,
> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
> +        instr->code.rt, instr->code.size, instr->code.imm9);
> +
> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
> +    dabt->valid = 1;
> +
> +    return 0;
> +bad_32bit_loadstore:
> +    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
> +    return 1;
> +}
> +
>  static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
>  {
>      uint16_t instr;
> @@ -150,11 +210,17 @@ bad_thumb:
>      return 1;
>  }
>  
> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
> +                       union ldr_str_instr_class *instr)
>  {
>      if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
>          return decode_thumb(regs->pc, dabt);
>  
> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
> +    {
> +        return decode_32bit_loadstore_postindexing(regs->pc, dabt, instr);
> +    }
> +
>      /* TODO: Handle ARM instruction */
>      gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
>  
> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
> index 4613763bdb..d82fc4a0f6 100644
> --- a/xen/arch/arm/decode.h
> +++ b/xen/arch/arm/decode.h
> @@ -35,7 +35,8 @@
>   */
>  
>  int decode_instruction(const struct cpu_user_regs *regs,
> -                       struct hsr_dabt *dabt);
> +                       struct hsr_dabt *dabt,
> +                       union ldr_str_instr_class *instr);
>  
>  #endif /* __ARCH_ARM_DECODE_H_ */
>  
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index 729287e37c..0d60754bc4 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -65,6 +65,16 @@ static enum io_state handle_write(const struct
> mmio_handler *handler, return ret ? IO_HANDLED : IO_ABORT;
>  }
>  
> +static void post_incremenet_register(union ldr_str_instr_class *instr)
> +{
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    unsigned int val;
> +
> +    val = get_user_reg(regs, instr->code.rn);
> +    val += instr->code.imm9;
> +    set_user_reg(regs, instr->code.rn, val);
> +}
> +
>  /* This function assumes that mmio regions are not overlapped */
>  static int cmp_mmio_handler(const void *key, const void *elem)
>  {
> @@ -106,14 +116,26 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> *regs, .gpa = gpa,
>          .dabt = dabt
>      };
> +    int rc;
> +    union ldr_str_instr_class instr = {0};
>  
>      ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>  
> +    /*
> +     * Armv8 processor does not provide a valid syndrome for post-indexing
> +     * ldr/str instructions. So in order to process these instructions,
> +     * Xen must decode them.
> +     */
> +    if ( !info.dabt.valid )
> +    {
> +        rc = decode_instruction(regs, &info.dabt, &instr);
> +        if ( rc )
> +            return IO_ABORT;
> +    }
> +
>      handler = find_mmio_handler(v->domain, info.gpa);
>      if ( !handler )
>      {
> -        int rc;
> -
>          rc = try_fwd_ioserv(regs, v, &info);
>          if ( rc == IO_HANDLED )
>              return handle_ioserv(regs, v);
> @@ -122,7 +144,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> *regs, }
>  
>      /* All the instructions used on emulated MMIO region should be
> valid */
> -    if ( !dabt.valid )
> +    if ( !info.dabt.valid )
>          return IO_ABORT;
>  
>      /*
> @@ -134,7 +156,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> *regs, {
>          int rc;
>  
> -        rc = decode_instruction(regs, &info.dabt);
> +        rc = decode_instruction(regs, &info.dabt, NULL);
>          if ( rc )
>          {
>              gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> @@ -143,9 +165,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> *regs, }
>  
>      if ( info.dabt.write )
> -        return handle_write(handler, v, &info);
> +        rc = handle_write(handler, v, &info);
>      else
> -        return handle_read(handler, v, &info);
> +        rc = handle_read(handler, v, &info);
> +
> +    if ( instr.value != 0 )
> +    {
> +        post_incremenet_register(&instr);
> +    }
> +    return rc;
>  }
>  
>  void register_mmio_handler(struct domain *d,
> diff --git a/xen/include/asm-arm/hsr.h b/xen/include/asm-arm/hsr.h
> index 9b91b28c48..72d67d2801 100644
> --- a/xen/include/asm-arm/hsr.h
> +++ b/xen/include/asm-arm/hsr.h
> @@ -15,6 +15,32 @@ enum dabt_size {
>      DABT_DOUBLE_WORD = 3,
>  };
>  
> +/*
> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
> + * Page 318 specifies the following bit pattern for
> + * "load/store register (immediate post-indexed)".
> + *
> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> + * ___________________________________________________________________
> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> + * |____|______|__|____|____|__|_______________|____|_________|_______|
> + */
> +union ldr_str_instr_class {
> +    uint32_t value;
> +    struct ldr_str {
> +        unsigned int rt:5;     /* Rt register */

I don't think it's a particular good idea to use a bit-field here, if that
is expected to mimic a certain hardware provided bit pattern.
It works in practise (TM), but the C standard does not guarantee the order
the bits are allocated (ISO/IEC 9899:201x §6.7.2.1, stanza 11).
Since you are *reading* only from the instruction word, you should get away
with accessor macros to extract the bits you need. For instance for
filtering the opcode, you could use: ((insn & 0x3fe00c00) == 0x38400400)

Cheers,
Andre

> +        unsigned int rn:5;     /* Rn register */
> +        unsigned int fixed1:2; /* value == 01b */
> +        int imm9:9;            /* imm9 */
> +        unsigned int fixed2:1; /* value == 0b */
> +        unsigned int opc:2;    /* opc */
> +        unsigned int fixed3:2; /* value == 00b */
> +        unsigned int v:1;      /* vector */
> +        unsigned int fixed4:3; /* value == 111b */
> +        unsigned int size:2;   /* size */
> +    } code;
> +};
> +
>  union hsr {
>      register_t bits;
>      struct {


Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 4 months ago
Hi Andre,

Thanks for your comments. They are useful.

On 30/11/2021 09:49, Andre Przywara wrote:
> On Mon, 29 Nov 2021 19:16:38 +0000
> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> 
> Hi,
> 
>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
>> could use to access MMIO regions.
>>
>> For instance, Xilinx baremetal OS will use:
>>
>>          volatile u32 *LocalAddr = (volatile u32 *)Addr;
>>          *LocalAddr = Value;
>>
>> This leave the compiler to decide which store instructions to use.
> 
> As mentioned in the other email, this is wrong, if this points to MMIO:
> don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
> an MMIO area, you should not see traps that you cannot handle already.
> 
> So I don't think it's a good idea to use that as an example. And since
> this patch only seems to address this use case, I would doubt its
> usefulness in general.
Yes, I should have fixed the comment.

Currently, I am testing with baremetal app which uses inline assembly 
code with post indexing instructions, to access the MMIO.

ATM, I am testing with 32 bit MMIO only.

On the usefulness, I am kind of torn as it is legitimate for post 
indexing instructions to be used in an inline-assembly code for 
accessing MMIO. However, that may not be something commonly seen.

@Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can 
you comment if we should have decoding logic or not ?

> 
>> This
>> may be a post-index store instruction where the HW will not provide a
>> valid syndrome.
>>
>> In order to handle post-indexing store/load instructions, Xen will need
>> to fetch and decode the instruction.
>>
>> This patch only cover post-index store/load instructions from AArch64
>> mode. For now, this is left unimplemented for trap from AArch32 mode.
>>
>> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
>> ---
>>
>> Changelog :-
>>
>> v2 :- 1. Updated the rn register after reading from it. (Pointed by
>> Julien, Stefano)
>> 2. Used a union to represent the instruction opcode (Suggestd by
>> Bertrand) 3. Fixed coding style issues (Pointed by Julien)
>> 4. In the previous patch, I was updating dabt->sign based on the
>> signedness of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI
>> 0487G.b, Page 3221, SSE indicates the signedness of the data item
>> loaded. In our case, the data item loaded is always unsigned.
>>
>> This has been tested for the following cases :-
>> ldr x2, [x1], #4
> 
> As Jan already mentioned: this is a bad example. First, this is a 64-bit
> access, which you don't emulate below. And second, you want to keep the
> pointer aligned. Unaligned accesses to device memory always trap, as per
> the architecture, even on bare metal.
> 
>>
>> ldr w2, [x1], #-4
>>
>> str x2, [x1], #4
> 
> Same as above.
> 
>> str w2, [x1], #-4
>>
>> The reason being  I am testing on 32bit MMIO registers only. I don't see
>> a 8bit or 16bit MMIO register.
> 
> Where did you look? There are plenty of examples out there, even the GIC
> allows 8-bit accesses to certain registers (grep for "VGIC_ACCESS_"), and
> the Linux GIC driver is using them (but with proper accessors, of course).
> Also GICv3 supports 64-bit accesses to some registers. Some PL011 UARTs use
> 16-bit MMIO accesses.
Yes, sorry I see them now. GICD_IPRIORITYR can be accessed with 8 bits.
Unfortunately, I have GIC-v2 on my hardware system. So, probably I can't 
test 64 bit access.

> 
>>   xen/arch/arm/decode.c     | 68 ++++++++++++++++++++++++++++++++++++++-
>>   xen/arch/arm/decode.h     |  3 +-
>>   xen/arch/arm/io.c         | 40 +++++++++++++++++++----
>>   xen/include/asm-arm/hsr.h | 26 +++++++++++++++
>>   4 files changed, 129 insertions(+), 8 deletions(-)
>>
>> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
>> index 792c2e92a7..0b3e8fcbc6 100644
>> --- a/xen/arch/arm/decode.c
>> +++ b/xen/arch/arm/decode.c
>> @@ -84,6 +84,66 @@ bad_thumb2:
>>       return 1;
>>   }
>>   
>> +static int decode_32bit_loadstore_postindexing(register_t pc,
>> +                                               struct hsr_dabt *dabt,
>> +                                               union ldr_str_instr_class *instr)
>> +{
>> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
>> +        return -EFAULT;
>> +
>> +    /* First, let's check for the fixed values */
>> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
>> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
>> +    {
>> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
>> +            " ldr/str post indexing\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    if ( instr->code.size != 2 )
> 
> I don't see a good reason for this limitation. If you are going to dissect
> the instruction, why not just support at least all access widths, so
> 64-bits, but also {ldr,str}{b,w}? I think the framework does the heavy
> lifting for you already?

I see your point. My intention was to test first with the restricted 
instruction set (ie ldr/str - 32 bit access with post indexing only) and 
get an opinion from the community. If the patch looks sane, then this 
can be extended with other variants as well.

> Same for the restriction to post-index above, supporting pre-index as well
> should be easy.
For Pre-indexing instruction, the ISS is valid. So I am not sure what is 
to be done here?

AFAIU, if the ISS is valid, there is no need to explicitly decode the 
instructions.
> 
> To me this has the bitter taste for being a one trick pony to work around
> your particular (broken!) use case.
> 
>> +    {
>> +        gprintk(XENLOG_ERR,
>> +            "ldr/str post indexing is supported for 32 bit variant only\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    if ( instr->code.v != 0 )
>> +    {
>> +        gprintk(XENLOG_ERR,
>> +            "ldr/str post indexing for vector types are not supported\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    /* Check for STR (immediate) - 32 bit variant */
>> +    if ( instr->code.opc == 0 )
>> +    {
>> +        dabt->write = 1;
>> +    }
>> +    /* Check for LDR (immediate) - 32 bit variant */
>> +    else if ( instr->code.opc == 1 )
>> +    {
>> +        dabt->write = 0;
>> +    }
>> +    else
>> +    {
>> +        gprintk(XENLOG_ERR,
>> +            "Decoding ldr/str post indexing is not supported for this variant\n");
>> +        goto bad_32bit_loadstore;
>> +    }
>> +
>> +    gprintk(XENLOG_INFO,
>> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
>> +        instr->code.rt, instr->code.size, instr->code.imm9);
>> +
>> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
>> +    dabt->valid = 1;
>> +
>> +    return 0;
>> +bad_32bit_loadstore:
>> +    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
>> +    return 1;
>> +}
>> +
>>   static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
>>   {
>>       uint16_t instr;
>> @@ -150,11 +210,17 @@ bad_thumb:
>>       return 1;
>>   }
>>   
>> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
>> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
>> +                       union ldr_str_instr_class *instr)
>>   {
>>       if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
>>           return decode_thumb(regs->pc, dabt);
>>   
>> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
>> +    {
>> +        return decode_32bit_loadstore_postindexing(regs->pc, dabt, instr);
>> +    }
>> +
>>       /* TODO: Handle ARM instruction */
>>       gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
>>   
>> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
>> index 4613763bdb..d82fc4a0f6 100644
>> --- a/xen/arch/arm/decode.h
>> +++ b/xen/arch/arm/decode.h
>> @@ -35,7 +35,8 @@
>>    */
>>   
>>   int decode_instruction(const struct cpu_user_regs *regs,
>> -                       struct hsr_dabt *dabt);
>> +                       struct hsr_dabt *dabt,
>> +                       union ldr_str_instr_class *instr);
>>   
>>   #endif /* __ARCH_ARM_DECODE_H_ */
>>   
>> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
>> index 729287e37c..0d60754bc4 100644
>> --- a/xen/arch/arm/io.c
>> +++ b/xen/arch/arm/io.c
>> @@ -65,6 +65,16 @@ static enum io_state handle_write(const struct
>> mmio_handler *handler, return ret ? IO_HANDLED : IO_ABORT;
>>   }
>>   
>> +static void post_incremenet_register(union ldr_str_instr_class *instr)
>> +{
>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +    unsigned int val;
>> +
>> +    val = get_user_reg(regs, instr->code.rn);
>> +    val += instr->code.imm9;
>> +    set_user_reg(regs, instr->code.rn, val);
>> +}
>> +
>>   /* This function assumes that mmio regions are not overlapped */
>>   static int cmp_mmio_handler(const void *key, const void *elem)
>>   {
>> @@ -106,14 +116,26 @@ enum io_state try_handle_mmio(struct cpu_user_regs
>> *regs, .gpa = gpa,
>>           .dabt = dabt
>>       };
>> +    int rc;
>> +    union ldr_str_instr_class instr = {0};
>>   
>>       ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>>   
>> +    /*
>> +     * Armv8 processor does not provide a valid syndrome for post-indexing
>> +     * ldr/str instructions. So in order to process these instructions,
>> +     * Xen must decode them.
>> +     */
>> +    if ( !info.dabt.valid )
>> +    {
>> +        rc = decode_instruction(regs, &info.dabt, &instr);
>> +        if ( rc )
>> +            return IO_ABORT;
>> +    }
>> +
>>       handler = find_mmio_handler(v->domain, info.gpa);
>>       if ( !handler )
>>       {
>> -        int rc;
>> -
>>           rc = try_fwd_ioserv(regs, v, &info);
>>           if ( rc == IO_HANDLED )
>>               return handle_ioserv(regs, v);
>> @@ -122,7 +144,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs
>> *regs, }
>>   
>>       /* All the instructions used on emulated MMIO region should be
>> valid */
>> -    if ( !dabt.valid )
>> +    if ( !info.dabt.valid )
>>           return IO_ABORT;
>>   
>>       /*
>> @@ -134,7 +156,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs
>> *regs, {
>>           int rc;
>>   
>> -        rc = decode_instruction(regs, &info.dabt);
>> +        rc = decode_instruction(regs, &info.dabt, NULL);
>>           if ( rc )
>>           {
>>               gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
>> @@ -143,9 +165,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs
>> *regs, }
>>   
>>       if ( info.dabt.write )
>> -        return handle_write(handler, v, &info);
>> +        rc = handle_write(handler, v, &info);
>>       else
>> -        return handle_read(handler, v, &info);
>> +        rc = handle_read(handler, v, &info);
>> +
>> +    if ( instr.value != 0 )
>> +    {
>> +        post_incremenet_register(&instr);
>> +    }
>> +    return rc;
>>   }
>>   
>>   void register_mmio_handler(struct domain *d,
>> diff --git a/xen/include/asm-arm/hsr.h b/xen/include/asm-arm/hsr.h
>> index 9b91b28c48..72d67d2801 100644
>> --- a/xen/include/asm-arm/hsr.h
>> +++ b/xen/include/asm-arm/hsr.h
>> @@ -15,6 +15,32 @@ enum dabt_size {
>>       DABT_DOUBLE_WORD = 3,
>>   };
>>   
>> +/*
>> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
>> + * Page 318 specifies the following bit pattern for
>> + * "load/store register (immediate post-indexed)".
>> + *
>> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
>> + * ___________________________________________________________________
>> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
>> + * |____|______|__|____|____|__|_______________|____|_________|_______|
>> + */
>> +union ldr_str_instr_class {
>> +    uint32_t value;
>> +    struct ldr_str {
>> +        unsigned int rt:5;     /* Rt register */
> 
> I don't think it's a particular good idea to use a bit-field here, if that
> is expected to mimic a certain hardware provided bit pattern.
> It works in practise (TM), but the C standard does not guarantee the order
> the bits are allocated (ISO/IEC 9899:201x §6.7.2.1, stanza 11).
> Since you are *reading* only from the instruction word, you should get away
> with accessor macros to extract the bits you need. For instance for
> filtering the opcode, you could use: ((insn & 0x3fe00c00) == 0x38400400)

Yes, this is a very good point. I will use bitmasks to access the bits 
from the register.

I saw the same logic (ie using bitfields) is used for some other 
registers as well. For eg hsr_dabt, hsr_iabt in 
xen/include/asm-arm/hsr.h. May be that needs fixing as well for some 
other time. :)

- Ayan
> 
> Cheers,
> Andre
> 
>> +        unsigned int rn:5;     /* Rn register */
>> +        unsigned int fixed1:2; /* value == 01b */
>> +        int imm9:9;            /* imm9 */
>> +        unsigned int fixed2:1; /* value == 0b */
>> +        unsigned int opc:2;    /* opc */
>> +        unsigned int fixed3:2; /* value == 00b */
>> +        unsigned int v:1;      /* vector */
>> +        unsigned int fixed4:3; /* value == 111b */
>> +        unsigned int size:2;   /* size */
>> +    } code;
>> +};
>> +
>>   union hsr {
>>       register_t bits;
>>       struct {
> 

Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Bertrand Marquis 2 years, 4 months ago
Hi Ayan,

> On 30 Nov 2021, at 19:13, Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> 
> Hi Andre,
> 
> Thanks for your comments. They are useful.
> 
> On 30/11/2021 09:49, Andre Przywara wrote:
>> On Mon, 29 Nov 2021 19:16:38 +0000
>> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>> Hi,
>>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
>>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
>>> could use to access MMIO regions.
>>> 
>>> For instance, Xilinx baremetal OS will use:
>>> 
>>>         volatile u32 *LocalAddr = (volatile u32 *)Addr;
>>>         *LocalAddr = Value;
>>> 
>>> This leave the compiler to decide which store instructions to use.
>> As mentioned in the other email, this is wrong, if this points to MMIO:
>> don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
>> an MMIO area, you should not see traps that you cannot handle already.
>> So I don't think it's a good idea to use that as an example. And since
>> this patch only seems to address this use case, I would doubt its
>> usefulness in general.
> Yes, I should have fixed the comment.
> 
> Currently, I am testing with baremetal app which uses inline assembly code with post indexing instructions, to access the MMIO.
> 
> ATM, I am testing with 32 bit MMIO only.
> 
> On the usefulness, I am kind of torn as it is legitimate for post indexing instructions to be used in an inline-assembly code for accessing MMIO. However, that may not be something commonly seen.
> 
> @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can you comment if we should have decoding logic or not ?

Andre gave you the official statement from Arm and there is nothing more I can say.
I will leave this decision to Stefano and Julien.

Regards
Bertrand

> 
>>> This
>>> may be a post-index store instruction where the HW will not provide a
>>> valid syndrome.
>>> 
>>> In order to handle post-indexing store/load instructions, Xen will need
>>> to fetch and decode the instruction.
>>> 
>>> This patch only cover post-index store/load instructions from AArch64
>>> mode. For now, this is left unimplemented for trap from AArch32 mode.
>>> 
>>> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
>>> ---
>>> 
>>> Changelog :-
>>> 
>>> v2 :- 1. Updated the rn register after reading from it. (Pointed by
>>> Julien, Stefano)
>>> 2. Used a union to represent the instruction opcode (Suggestd by
>>> Bertrand) 3. Fixed coding style issues (Pointed by Julien)
>>> 4. In the previous patch, I was updating dabt->sign based on the
>>> signedness of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI
>>> 0487G.b, Page 3221, SSE indicates the signedness of the data item
>>> loaded. In our case, the data item loaded is always unsigned.
>>> 
>>> This has been tested for the following cases :-
>>> ldr x2, [x1], #4
>> As Jan already mentioned: this is a bad example. First, this is a 64-bit
>> access, which you don't emulate below. And second, you want to keep the
>> pointer aligned. Unaligned accesses to device memory always trap, as per
>> the architecture, even on bare metal.
>>> 
>>> ldr w2, [x1], #-4
>>> 
>>> str x2, [x1], #4
>> Same as above.
>>> str w2, [x1], #-4
>>> 
>>> The reason being  I am testing on 32bit MMIO registers only. I don't see
>>> a 8bit or 16bit MMIO register.
>> Where did you look? There are plenty of examples out there, even the GIC
>> allows 8-bit accesses to certain registers (grep for "VGIC_ACCESS_"), and
>> the Linux GIC driver is using them (but with proper accessors, of course).
>> Also GICv3 supports 64-bit accesses to some registers. Some PL011 UARTs use
>> 16-bit MMIO accesses.
> Yes, sorry I see them now. GICD_IPRIORITYR can be accessed with 8 bits.
> Unfortunately, I have GIC-v2 on my hardware system. So, probably I can't test 64 bit access.
> 
>>>  xen/arch/arm/decode.c     | 68 ++++++++++++++++++++++++++++++++++++++-
>>>  xen/arch/arm/decode.h     |  3 +-
>>>  xen/arch/arm/io.c         | 40 +++++++++++++++++++----
>>>  xen/include/asm-arm/hsr.h | 26 +++++++++++++++
>>>  4 files changed, 129 insertions(+), 8 deletions(-)
>>> 
>>> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
>>> index 792c2e92a7..0b3e8fcbc6 100644
>>> --- a/xen/arch/arm/decode.c
>>> +++ b/xen/arch/arm/decode.c
>>> @@ -84,6 +84,66 @@ bad_thumb2:
>>>      return 1;
>>>  }
>>>  +static int decode_32bit_loadstore_postindexing(register_t pc,
>>> +                                               struct hsr_dabt *dabt,
>>> +                                               union ldr_str_instr_class *instr)
>>> +{
>>> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
>>> +        return -EFAULT;
>>> +
>>> +    /* First, let's check for the fixed values */
>>> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
>>> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
>>> +    {
>>> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
>>> +            " ldr/str post indexing\n");
>>> +        goto bad_32bit_loadstore;
>>> +    }
>>> +
>>> +    if ( instr->code.size != 2 )
>> I don't see a good reason for this limitation. If you are going to dissect
>> the instruction, why not just support at least all access widths, so
>> 64-bits, but also {ldr,str}{b,w}? I think the framework does the heavy
>> lifting for you already?
> 
> I see your point. My intention was to test first with the restricted instruction set (ie ldr/str - 32 bit access with post indexing only) and get an opinion from the community. If the patch looks sane, then this can be extended with other variants as well.
> 
>> Same for the restriction to post-index above, supporting pre-index as well
>> should be easy.
> For Pre-indexing instruction, the ISS is valid. So I am not sure what is to be done here?
> 
> AFAIU, if the ISS is valid, there is no need to explicitly decode the instructions.
>> To me this has the bitter taste for being a one trick pony to work around
>> your particular (broken!) use case.
>>> +    {
>>> +        gprintk(XENLOG_ERR,
>>> +            "ldr/str post indexing is supported for 32 bit variant only\n");
>>> +        goto bad_32bit_loadstore;
>>> +    }
>>> +
>>> +    if ( instr->code.v != 0 )
>>> +    {
>>> +        gprintk(XENLOG_ERR,
>>> +            "ldr/str post indexing for vector types are not supported\n");
>>> +        goto bad_32bit_loadstore;
>>> +    }
>>> +
>>> +    /* Check for STR (immediate) - 32 bit variant */
>>> +    if ( instr->code.opc == 0 )
>>> +    {
>>> +        dabt->write = 1;
>>> +    }
>>> +    /* Check for LDR (immediate) - 32 bit variant */
>>> +    else if ( instr->code.opc == 1 )
>>> +    {
>>> +        dabt->write = 0;
>>> +    }
>>> +    else
>>> +    {
>>> +        gprintk(XENLOG_ERR,
>>> +            "Decoding ldr/str post indexing is not supported for this variant\n");
>>> +        goto bad_32bit_loadstore;
>>> +    }
>>> +
>>> +    gprintk(XENLOG_INFO,
>>> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
>>> +        instr->code.rt, instr->code.size, instr->code.imm9);
>>> +
>>> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
>>> +    dabt->valid = 1;
>>> +
>>> +    return 0;
>>> +bad_32bit_loadstore:
>>> +    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
>>> +    return 1;
>>> +}
>>> +
>>>  static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
>>>  {
>>>      uint16_t instr;
>>> @@ -150,11 +210,17 @@ bad_thumb:
>>>      return 1;
>>>  }
>>>  -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
>>> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
>>> +                       union ldr_str_instr_class *instr)
>>>  {
>>>      if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
>>>          return decode_thumb(regs->pc, dabt);
>>>  +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
>>> +    {
>>> +        return decode_32bit_loadstore_postindexing(regs->pc, dabt, instr);
>>> +    }
>>> +
>>>      /* TODO: Handle ARM instruction */
>>>      gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
>>>  diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
>>> index 4613763bdb..d82fc4a0f6 100644
>>> --- a/xen/arch/arm/decode.h
>>> +++ b/xen/arch/arm/decode.h
>>> @@ -35,7 +35,8 @@
>>>   */
>>>    int decode_instruction(const struct cpu_user_regs *regs,
>>> -                       struct hsr_dabt *dabt);
>>> +                       struct hsr_dabt *dabt,
>>> +                       union ldr_str_instr_class *instr);
>>>    #endif /* __ARCH_ARM_DECODE_H_ */
>>>  diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
>>> index 729287e37c..0d60754bc4 100644
>>> --- a/xen/arch/arm/io.c
>>> +++ b/xen/arch/arm/io.c
>>> @@ -65,6 +65,16 @@ static enum io_state handle_write(const struct
>>> mmio_handler *handler, return ret ? IO_HANDLED : IO_ABORT;
>>>  }
>>>  +static void post_incremenet_register(union ldr_str_instr_class *instr)
>>> +{
>>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>>> +    unsigned int val;
>>> +
>>> +    val = get_user_reg(regs, instr->code.rn);
>>> +    val += instr->code.imm9;
>>> +    set_user_reg(regs, instr->code.rn, val);
>>> +}
>>> +
>>>  /* This function assumes that mmio regions are not overlapped */
>>>  static int cmp_mmio_handler(const void *key, const void *elem)
>>>  {
>>> @@ -106,14 +116,26 @@ enum io_state try_handle_mmio(struct cpu_user_regs
>>> *regs, .gpa = gpa,
>>>          .dabt = dabt
>>>      };
>>> +    int rc;
>>> +    union ldr_str_instr_class instr = {0};
>>>        ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>>>  +    /*
>>> +     * Armv8 processor does not provide a valid syndrome for post-indexing
>>> +     * ldr/str instructions. So in order to process these instructions,
>>> +     * Xen must decode them.
>>> +     */
>>> +    if ( !info.dabt.valid )
>>> +    {
>>> +        rc = decode_instruction(regs, &info.dabt, &instr);
>>> +        if ( rc )
>>> +            return IO_ABORT;
>>> +    }
>>> +
>>>      handler = find_mmio_handler(v->domain, info.gpa);
>>>      if ( !handler )
>>>      {
>>> -        int rc;
>>> -
>>>          rc = try_fwd_ioserv(regs, v, &info);
>>>          if ( rc == IO_HANDLED )
>>>              return handle_ioserv(regs, v);
>>> @@ -122,7 +144,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs
>>> *regs, }
>>>        /* All the instructions used on emulated MMIO region should be
>>> valid */
>>> -    if ( !dabt.valid )
>>> +    if ( !info.dabt.valid )
>>>          return IO_ABORT;
>>>        /*
>>> @@ -134,7 +156,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs
>>> *regs, {
>>>          int rc;
>>>  -        rc = decode_instruction(regs, &info.dabt);
>>> +        rc = decode_instruction(regs, &info.dabt, NULL);
>>>          if ( rc )
>>>          {
>>>              gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
>>> @@ -143,9 +165,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs
>>> *regs, }
>>>        if ( info.dabt.write )
>>> -        return handle_write(handler, v, &info);
>>> +        rc = handle_write(handler, v, &info);
>>>      else
>>> -        return handle_read(handler, v, &info);
>>> +        rc = handle_read(handler, v, &info);
>>> +
>>> +    if ( instr.value != 0 )
>>> +    {
>>> +        post_incremenet_register(&instr);
>>> +    }
>>> +    return rc;
>>>  }
>>>    void register_mmio_handler(struct domain *d,
>>> diff --git a/xen/include/asm-arm/hsr.h b/xen/include/asm-arm/hsr.h
>>> index 9b91b28c48..72d67d2801 100644
>>> --- a/xen/include/asm-arm/hsr.h
>>> +++ b/xen/include/asm-arm/hsr.h
>>> @@ -15,6 +15,32 @@ enum dabt_size {
>>>      DABT_DOUBLE_WORD = 3,
>>>  };
>>>  +/*
>>> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
>>> + * Page 318 specifies the following bit pattern for
>>> + * "load/store register (immediate post-indexed)".
>>> + *
>>> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
>>> + * ___________________________________________________________________
>>> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
>>> + * |____|______|__|____|____|__|_______________|____|_________|_______|
>>> + */
>>> +union ldr_str_instr_class {
>>> +    uint32_t value;
>>> +    struct ldr_str {
>>> +        unsigned int rt:5;     /* Rt register */
>> I don't think it's a particular good idea to use a bit-field here, if that
>> is expected to mimic a certain hardware provided bit pattern.
>> It works in practise (TM), but the C standard does not guarantee the order
>> the bits are allocated (ISO/IEC 9899:201x §6.7.2.1, stanza 11).
>> Since you are *reading* only from the instruction word, you should get away
>> with accessor macros to extract the bits you need. For instance for
>> filtering the opcode, you could use: ((insn & 0x3fe00c00) == 0x38400400)
> 
> Yes, this is a very good point. I will use bitmasks to access the bits from the register.
> 
> I saw the same logic (ie using bitfields) is used for some other registers as well. For eg hsr_dabt, hsr_iabt in xen/include/asm-arm/hsr.h. May be that needs fixing as well for some other time. :)
> 
> - Ayan
>> Cheers,
>> Andre
>>> +        unsigned int rn:5;     /* Rn register */
>>> +        unsigned int fixed1:2; /* value == 01b */
>>> +        int imm9:9;            /* imm9 */
>>> +        unsigned int fixed2:1; /* value == 0b */
>>> +        unsigned int opc:2;    /* opc */
>>> +        unsigned int fixed3:2; /* value == 00b */
>>> +        unsigned int v:1;      /* vector */
>>> +        unsigned int fixed4:3; /* value == 111b */
>>> +        unsigned int size:2;   /* size */
>>> +    } code;
>>> +};
>>> +
>>>  union hsr {
>>>      register_t bits;
>>>      struct {

Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Andre Przywara 2 years, 4 months ago
On Wed, 1 Dec 2021 08:41:13 +0000
Bertrand Marquis <Bertrand.Marquis@arm.com> wrote:

Hi,

> > On 30 Nov 2021, at 19:13, Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> >
> > Hi Andre,
> >
> > Thanks for your comments. They are useful.
> >
> > On 30/11/2021 09:49, Andre Przywara wrote:  
> >> On Mon, 29 Nov 2021 19:16:38 +0000
> >> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> >> Hi,  
> >>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> >>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> >>> could use to access MMIO regions.
> >>>
> >>> For instance, Xilinx baremetal OS will use:
> >>>
> >>>         volatile u32 *LocalAddr = (volatile u32 *)Addr;
> >>>         *LocalAddr = Value;
> >>>
> >>> This leave the compiler to decide which store instructions to use.  
> >> As mentioned in the other email, this is wrong, if this points to MMIO:
> >> don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
> >> an MMIO area, you should not see traps that you cannot handle already.
> >> So I don't think it's a good idea to use that as an example. And since
> >> this patch only seems to address this use case, I would doubt its
> >> usefulness in general.  
> > Yes, I should have fixed the comment.
> >
> > Currently, I am testing with baremetal app which uses inline assembly code with post indexing instructions, to access the MMIO.
> >
> > ATM, I am testing with 32 bit MMIO only.
> >
> > On the usefulness, I am kind of torn as it is legitimate for post indexing instructions to be used in an inline-assembly code for accessing MMIO. However, that may not be something commonly seen.
> >
> > @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can you comment if we should have decoding logic or not ?  
> 
> Andre gave you the official statement from Arm and there is nothing more I can say.
> I will leave this decision to Stefano and Julien.

Well, I gave some statement about what the architecture says, and about
the problems for letting a compiler generate MMIO accesses.
That doesn't prevent anyone from emulating those instructions, but I
wonder if this is doing more harm than good (little tested code in
a sensitive code path of the HV), given the use cases. If this is
only to cover the (illegitimate) use case at hand, it's certainly not
worth it, IMHO.

Thanks,
Andre

> 
> Regards
> Bertrand
> 
> >  
> >>> This
> >>> may be a post-index store instruction where the HW will not provide a
> >>> valid syndrome.
> >>>
> >>> In order to handle post-indexing store/load instructions, Xen will need
> >>> to fetch and decode the instruction.
> >>>
> >>> This patch only cover post-index store/load instructions from AArch64
> >>> mode. For now, this is left unimplemented for trap from AArch32 mode.
> >>>
> >>> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
> >>> ---
> >>>
> >>> Changelog :-
> >>>
> >>> v2 :- 1. Updated the rn register after reading from it. (Pointed by
> >>> Julien, Stefano)
> >>> 2. Used a union to represent the instruction opcode (Suggestd by
> >>> Bertrand) 3. Fixed coding style issues (Pointed by Julien)
> >>> 4. In the previous patch, I was updating dabt->sign based on the
> >>> signedness of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI
> >>> 0487G.b, Page 3221, SSE indicates the signedness of the data item
> >>> loaded. In our case, the data item loaded is always unsigned.
> >>>
> >>> This has been tested for the following cases :-
> >>> ldr x2, [x1], #4  
> >> As Jan already mentioned: this is a bad example. First, this is a 64-bit
> >> access, which you don't emulate below. And second, you want to keep the
> >> pointer aligned. Unaligned accesses to device memory always trap, as per
> >> the architecture, even on bare metal.  
> >>>
> >>> ldr w2, [x1], #-4
> >>>
> >>> str x2, [x1], #4  
> >> Same as above.  
> >>> str w2, [x1], #-4
> >>>
> >>> The reason being  I am testing on 32bit MMIO registers only. I don't see
> >>> a 8bit or 16bit MMIO register.  
> >> Where did you look? There are plenty of examples out there, even the GIC
> >> allows 8-bit accesses to certain registers (grep for "VGIC_ACCESS_"), and
> >> the Linux GIC driver is using them (but with proper accessors, of course).
> >> Also GICv3 supports 64-bit accesses to some registers. Some PL011 UARTs use
> >> 16-bit MMIO accesses.  
> > Yes, sorry I see them now. GICD_IPRIORITYR can be accessed with 8 bits.
> > Unfortunately, I have GIC-v2 on my hardware system. So, probably I can't test 64 bit access.
> >  
> >>>  xen/arch/arm/decode.c     | 68 ++++++++++++++++++++++++++++++++++++++-
> >>>  xen/arch/arm/decode.h     |  3 +-
> >>>  xen/arch/arm/io.c         | 40 +++++++++++++++++++----
> >>>  xen/include/asm-arm/hsr.h | 26 +++++++++++++++
> >>>  4 files changed, 129 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
> >>> index 792c2e92a7..0b3e8fcbc6 100644
> >>> --- a/xen/arch/arm/decode.c
> >>> +++ b/xen/arch/arm/decode.c
> >>> @@ -84,6 +84,66 @@ bad_thumb2:
> >>>      return 1;
> >>>  }
> >>>  +static int decode_32bit_loadstore_postindexing(register_t pc,
> >>> +                                               struct hsr_dabt *dabt,
> >>> +                                               union ldr_str_instr_class *instr)
> >>> +{
> >>> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
> >>> +        return -EFAULT;
> >>> +
> >>> +    /* First, let's check for the fixed values */
> >>> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
> >>> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
> >>> +    {
> >>> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
> >>> +            " ldr/str post indexing\n");
> >>> +        goto bad_32bit_loadstore;
> >>> +    }
> >>> +
> >>> +    if ( instr->code.size != 2 )  
> >> I don't see a good reason for this limitation. If you are going to dissect
> >> the instruction, why not just support at least all access widths, so
> >> 64-bits, but also {ldr,str}{b,w}? I think the framework does the heavy
> >> lifting for you already?  
> >
> > I see your point. My intention was to test first with the restricted instruction set (ie ldr/str - 32 bit access with post indexing only) and get an opinion from the community. If the patch looks sane, then this can be extended with other variants as well.
> >  
> >> Same for the restriction to post-index above, supporting pre-index as well
> >> should be easy.  
> > For Pre-indexing instruction, the ISS is valid. So I am not sure what is to be done here?
> >
> > AFAIU, if the ISS is valid, there is no need to explicitly decode the instructions.  
> >> To me this has the bitter taste for being a one trick pony to work around
> >> your particular (broken!) use case.  
> >>> +    {
> >>> +        gprintk(XENLOG_ERR,
> >>> +            "ldr/str post indexing is supported for 32 bit variant only\n");
> >>> +        goto bad_32bit_loadstore;
> >>> +    }
> >>> +
> >>> +    if ( instr->code.v != 0 )
> >>> +    {
> >>> +        gprintk(XENLOG_ERR,
> >>> +            "ldr/str post indexing for vector types are not supported\n");
> >>> +        goto bad_32bit_loadstore;
> >>> +    }
> >>> +
> >>> +    /* Check for STR (immediate) - 32 bit variant */
> >>> +    if ( instr->code.opc == 0 )
> >>> +    {
> >>> +        dabt->write = 1;
> >>> +    }
> >>> +    /* Check for LDR (immediate) - 32 bit variant */
> >>> +    else if ( instr->code.opc == 1 )
> >>> +    {
> >>> +        dabt->write = 0;
> >>> +    }
> >>> +    else
> >>> +    {
> >>> +        gprintk(XENLOG_ERR,
> >>> +            "Decoding ldr/str post indexing is not supported for this variant\n");
> >>> +        goto bad_32bit_loadstore;
> >>> +    }
> >>> +
> >>> +    gprintk(XENLOG_INFO,
> >>> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
> >>> +        instr->code.rt, instr->code.size, instr->code.imm9);
> >>> +
> >>> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
> >>> +    dabt->valid = 1;
> >>> +
> >>> +    return 0;
> >>> +bad_32bit_loadstore:
> >>> +    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
> >>> +    return 1;
> >>> +}
> >>> +
> >>>  static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
> >>>  {
> >>>      uint16_t instr;
> >>> @@ -150,11 +210,17 @@ bad_thumb:
> >>>      return 1;
> >>>  }
> >>>  -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
> >>> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
> >>> +                       union ldr_str_instr_class *instr)
> >>>  {
> >>>      if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
> >>>          return decode_thumb(regs->pc, dabt);
> >>>  +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
> >>> +    {
> >>> +        return decode_32bit_loadstore_postindexing(regs->pc, dabt, instr);
> >>> +    }
> >>> +
> >>>      /* TODO: Handle ARM instruction */
> >>>      gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
> >>>  diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
> >>> index 4613763bdb..d82fc4a0f6 100644
> >>> --- a/xen/arch/arm/decode.h
> >>> +++ b/xen/arch/arm/decode.h
> >>> @@ -35,7 +35,8 @@
> >>>   */
> >>>    int decode_instruction(const struct cpu_user_regs *regs,
> >>> -                       struct hsr_dabt *dabt);
> >>> +                       struct hsr_dabt *dabt,
> >>> +                       union ldr_str_instr_class *instr);
> >>>    #endif /* __ARCH_ARM_DECODE_H_ */
> >>>  diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> >>> index 729287e37c..0d60754bc4 100644
> >>> --- a/xen/arch/arm/io.c
> >>> +++ b/xen/arch/arm/io.c
> >>> @@ -65,6 +65,16 @@ static enum io_state handle_write(const struct
> >>> mmio_handler *handler, return ret ? IO_HANDLED : IO_ABORT;
> >>>  }
> >>>  +static void post_incremenet_register(union ldr_str_instr_class *instr)
> >>> +{
> >>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> >>> +    unsigned int val;
> >>> +
> >>> +    val = get_user_reg(regs, instr->code.rn);
> >>> +    val += instr->code.imm9;
> >>> +    set_user_reg(regs, instr->code.rn, val);
> >>> +}
> >>> +
> >>>  /* This function assumes that mmio regions are not overlapped */
> >>>  static int cmp_mmio_handler(const void *key, const void *elem)
> >>>  {
> >>> @@ -106,14 +116,26 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> >>> *regs, .gpa = gpa,
> >>>          .dabt = dabt
> >>>      };
> >>> +    int rc;
> >>> +    union ldr_str_instr_class instr = {0};
> >>>        ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
> >>>  +    /*
> >>> +     * Armv8 processor does not provide a valid syndrome for post-indexing
> >>> +     * ldr/str instructions. So in order to process these instructions,
> >>> +     * Xen must decode them.
> >>> +     */
> >>> +    if ( !info.dabt.valid )
> >>> +    {
> >>> +        rc = decode_instruction(regs, &info.dabt, &instr);
> >>> +        if ( rc )
> >>> +            return IO_ABORT;
> >>> +    }
> >>> +
> >>>      handler = find_mmio_handler(v->domain, info.gpa);
> >>>      if ( !handler )
> >>>      {
> >>> -        int rc;
> >>> -
> >>>          rc = try_fwd_ioserv(regs, v, &info);
> >>>          if ( rc == IO_HANDLED )
> >>>              return handle_ioserv(regs, v);
> >>> @@ -122,7 +144,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> >>> *regs, }
> >>>        /* All the instructions used on emulated MMIO region should be
> >>> valid */
> >>> -    if ( !dabt.valid )
> >>> +    if ( !info.dabt.valid )
> >>>          return IO_ABORT;
> >>>        /*
> >>> @@ -134,7 +156,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> >>> *regs, {
> >>>          int rc;
> >>>  -        rc = decode_instruction(regs, &info.dabt);
> >>> +        rc = decode_instruction(regs, &info.dabt, NULL);
> >>>          if ( rc )
> >>>          {
> >>>              gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> >>> @@ -143,9 +165,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> >>> *regs, }
> >>>        if ( info.dabt.write )
> >>> -        return handle_write(handler, v, &info);
> >>> +        rc = handle_write(handler, v, &info);
> >>>      else
> >>> -        return handle_read(handler, v, &info);
> >>> +        rc = handle_read(handler, v, &info);
> >>> +
> >>> +    if ( instr.value != 0 )
> >>> +    {
> >>> +        post_incremenet_register(&instr);
> >>> +    }
> >>> +    return rc;
> >>>  }
> >>>    void register_mmio_handler(struct domain *d,
> >>> diff --git a/xen/include/asm-arm/hsr.h b/xen/include/asm-arm/hsr.h
> >>> index 9b91b28c48..72d67d2801 100644
> >>> --- a/xen/include/asm-arm/hsr.h
> >>> +++ b/xen/include/asm-arm/hsr.h
> >>> @@ -15,6 +15,32 @@ enum dabt_size {
> >>>      DABT_DOUBLE_WORD = 3,
> >>>  };
> >>>  +/*
> >>> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
> >>> + * Page 318 specifies the following bit pattern for
> >>> + * "load/store register (immediate post-indexed)".
> >>> + *
> >>> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> >>> + * ___________________________________________________________________
> >>> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> >>> + * |____|______|__|____|____|__|_______________|____|_________|_______|
> >>> + */
> >>> +union ldr_str_instr_class {
> >>> +    uint32_t value;
> >>> +    struct ldr_str {
> >>> +        unsigned int rt:5;     /* Rt register */  
> >> I don't think it's a particular good idea to use a bit-field here, if that
> >> is expected to mimic a certain hardware provided bit pattern.
> >> It works in practise (TM), but the C standard does not guarantee the order
> >> the bits are allocated (ISO/IEC 9899:201x §6.7.2.1, stanza 11).
> >> Since you are *reading* only from the instruction word, you should get away
> >> with accessor macros to extract the bits you need. For instance for
> >> filtering the opcode, you could use: ((insn & 0x3fe00c00) == 0x38400400)  
> >
> > Yes, this is a very good point. I will use bitmasks to access the bits from the register.
> >
> > I saw the same logic (ie using bitfields) is used for some other registers as well. For eg hsr_dabt, hsr_iabt in xen/include/asm-arm/hsr.h. May be that needs fixing as well for some other time. :)

Well, there is no easy and clear answer as to whether to use bit-fields in
those occasions or not. From a practical point of view, it works (TM), and
has some advantages, like saving you from fiddling around with a 9-bit
sign extension, for instance.
But the Linux kernel discourages those works-for-me solutions, one killer
problem here is endianess, which is not a problem for Xen, IIRC.
I personally prefer robust code: but not relying on certain implementation
specifics (and be they very obvious or wide-spread), there will be less
surprises in the future.

So I'd leave this up to the maintainers to decide, IIUC the original
suggestion came from Bertrand?

Cheers,
Andre


> >
> > - Ayan  
> >> Cheers,
> >> Andre  
> >>> +        unsigned int rn:5;     /* Rn register */
> >>> +        unsigned int fixed1:2; /* value == 01b */
> >>> +        int imm9:9;            /* imm9 */
> >>> +        unsigned int fixed2:1; /* value == 0b */
> >>> +        unsigned int opc:2;    /* opc */
> >>> +        unsigned int fixed3:2; /* value == 00b */
> >>> +        unsigned int v:1;      /* vector */
> >>> +        unsigned int fixed4:3; /* value == 111b */
> >>> +        unsigned int size:2;   /* size */
> >>> +    } code;
> >>> +};
> >>> +
> >>>  union hsr {
> >>>      register_t bits;
> >>>      struct {  
> 


Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Julien Grall 2 years, 4 months ago
Hi all,

On 01/12/2021 08:41, Bertrand Marquis wrote:
> Hi Ayan,
> 
>> On 30 Nov 2021, at 19:13, Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>>
>> Hi Andre,
>>
>> Thanks for your comments. They are useful.
>>
>> On 30/11/2021 09:49, Andre Przywara wrote:
>>> On Mon, 29 Nov 2021 19:16:38 +0000
>>> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>>> Hi,
>>>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
>>>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
>>>> could use to access MMIO regions.
>>>>
>>>> For instance, Xilinx baremetal OS will use:
>>>>
>>>>          volatile u32 *LocalAddr = (volatile u32 *)Addr;
>>>>          *LocalAddr = Value;
>>>>
>>>> This leave the compiler to decide which store instructions to use.
>>> As mentioned in the other email, this is wrong, if this points to MMIO:
>>> don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
>>> an MMIO area, you should not see traps that you cannot handle already.
>>> So I don't think it's a good idea to use that as an example. And since
>>> this patch only seems to address this use case, I would doubt its
>>> usefulness in general.
>> Yes, I should have fixed the comment.
>>
>> Currently, I am testing with baremetal app which uses inline assembly code with post indexing instructions, to access the MMIO.
>>
>> ATM, I am testing with 32 bit MMIO only.
>>
>> On the usefulness, I am kind of torn as it is legitimate for post indexing instructions to be used in an inline-assembly code for accessing MMIO. However, that may not be something commonly seen.
>>
>> @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can you comment if we should have decoding logic or not ?
> 
> Andre gave you the official statement from Arm and there is nothing more I can say.

I think this would be handy for other hypervisor and OS developper to 
know what they can expect when running in a virtualized environment. So 
would it be possible to update the Arm Arm reflecting this statement?

> I will leave this decision to Stefano and Julien.

I have had a chat on IRC with Stefano about this. I think the main 
sticking point is the Arm Arm doesn't clearly state those instructions 
should not be used by a virtualized OS on MMIO regions.

To me, this topic looks similar to the set/way instruction dilemma. They 
are a pain to virtualize (and the Arm Arm clearly hint it) but we had to 
do it because some OSes relied on them.

I think the main difference is the Arm Arm doesn't hint they should not 
be used (it only says a valid syndrome is not provided) and the 
implementation should hopefully be smaller and self-contained.

So I would be inclined to allow Xen to decode post-indexing instructions 
(pending the review).

Cheers,

-- 
Julien Grall

Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Andre Przywara 2 years, 4 months ago
On Mon, 6 Dec 2021 19:31:06 +0000
Julien Grall <julien@xen.org> wrote:

Hi,

> On 01/12/2021 08:41, Bertrand Marquis wrote:
> > Hi Ayan,
> >   
> >> On 30 Nov 2021, at 19:13, Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> >>
> >> Hi Andre,
> >>
> >> Thanks for your comments. They are useful.
> >>
> >> On 30/11/2021 09:49, Andre Przywara wrote:  
> >>> On Mon, 29 Nov 2021 19:16:38 +0000
> >>> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> >>> Hi,  
> >>>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> >>>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> >>>> could use to access MMIO regions.
> >>>>
> >>>> For instance, Xilinx baremetal OS will use:
> >>>>
> >>>>          volatile u32 *LocalAddr = (volatile u32 *)Addr;
> >>>>          *LocalAddr = Value;
> >>>>
> >>>> This leave the compiler to decide which store instructions to use.  
> >>> As mentioned in the other email, this is wrong, if this points to MMIO:
> >>> don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
> >>> an MMIO area, you should not see traps that you cannot handle already.
> >>> So I don't think it's a good idea to use that as an example. And since
> >>> this patch only seems to address this use case, I would doubt its
> >>> usefulness in general.  
> >> Yes, I should have fixed the comment.
> >>
> >> Currently, I am testing with baremetal app which uses inline assembly code with post indexing instructions, to access the MMIO.
> >>
> >> ATM, I am testing with 32 bit MMIO only.
> >>
> >> On the usefulness, I am kind of torn as it is legitimate for post indexing instructions to be used in an inline-assembly code for accessing MMIO. However, that may not be something commonly seen.
> >>
> >> @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can you comment if we should have decoding logic or not ?  
> > 
> > Andre gave you the official statement from Arm and there is nothing more I can say.  
> 
> I think this would be handy for other hypervisor and OS developper to 
> know what they can expect when running in a virtualized environment. So 
> would it be possible to update the Arm Arm reflecting this statement?

I don't think it's within the scope of the ARM ARM to say that. It just
says that "there is no syndrome information", and your mileage may vary in
working around that.

Personally I would say that if you expect your software to work nicely
under a hypervisor, then just avoid those instructions. The Linux kernel
certainly did so.

You can try to do instruction emulation, but doing this right can get
tricky quickly: think about I$/D$ coherency, MMU on or off, etc.

> > I will leave this decision to Stefano and Julien.  
> 
> I have had a chat on IRC with Stefano about this. I think the main 
> sticking point is the Arm Arm doesn't clearly state those instructions 
> should not be used by a virtualized OS on MMIO regions.

I don't understand why the ARM ARM would need to say that. Certainly you
realise that immediately when trying to use them, and apparently it was not
a problem in the last 8ish years of Xen/ARM's existence.

So it's your decision on having the emulation, I personally would only do
it when there is a *good* use case.
And please apply the demanded scrutiny on the review - including all the
corner cases like Rn=Rt, XZR vs. SP (as Jan said) and possibly MMU status.

Cheers,
Andre

> To me, this topic looks similar to the set/way instruction dilemma. They 
> are a pain to virtualize (and the Arm Arm clearly hint it) but we had to 
> do it because some OSes relied on them.
> 
> I think the main difference is the Arm Arm doesn't hint they should not 
> be used (it only says a valid syndrome is not provided) and the 
> implementation should hopefully be smaller and self-contained.
> 
> So I would be inclined to allow Xen to decode post-indexing instructions 
> (pending the review).
> 
> Cheers,
> 


Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 3 months ago
Hi,

Thank you so much for your feedback.

I need a couple of clarifications before I can start with the v3 patch.

On 08/12/2021 12:00, Andre Przywara wrote:
> On Mon, 6 Dec 2021 19:31:06 +0000
> Julien Grall <julien@xen.org> wrote:
>
> Hi,
>
>> On 01/12/2021 08:41, Bertrand Marquis wrote:
>>> Hi Ayan,
>>>    
>>>> On 30 Nov 2021, at 19:13, Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>>>>
>>>> Hi Andre,
>>>>
>>>> Thanks for your comments. They are useful.
>>>>
>>>> On 30/11/2021 09:49, Andre Przywara wrote:
>>>>> On Mon, 29 Nov 2021 19:16:38 +0000
>>>>> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>>>>> Hi,
>>>>>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
>>>>>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
>>>>>> could use to access MMIO regions.
>>>>>>
>>>>>> For instance, Xilinx baremetal OS will use:
>>>>>>
>>>>>>           volatile u32 *LocalAddr = (volatile u32 *)Addr;
>>>>>>           *LocalAddr = Value;
>>>>>>
>>>>>> This leave the compiler to decide which store instructions to use.
>>>>> As mentioned in the other email, this is wrong, if this points to MMIO:
>>>>> don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
>>>>> an MMIO area, you should not see traps that you cannot handle already.
>>>>> So I don't think it's a good idea to use that as an example. And since
>>>>> this patch only seems to address this use case, I would doubt its
>>>>> usefulness in general.
>>>> Yes, I should have fixed the comment.
>>>>
>>>> Currently, I am testing with baremetal app which uses inline assembly code with post indexing instructions, to access the MMIO.
>>>>
>>>> ATM, I am testing with 32 bit MMIO only.
>>>>
>>>> On the usefulness, I am kind of torn as it is legitimate for post indexing instructions to be used in an inline-assembly code for accessing MMIO. However, that may not be something commonly seen.
>>>>
>>>> @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can you comment if we should have decoding logic or not ?
>>> Andre gave you the official statement from Arm and there is nothing more I can say.
>> I think this would be handy for other hypervisor and OS developper to
>> know what they can expect when running in a virtualized environment. So
>> would it be possible to update the Arm Arm reflecting this statement?
> I don't think it's within the scope of the ARM ARM to say that. It just
> says that "there is no syndrome information", and your mileage may vary in
> working around that.
>
> Personally I would say that if you expect your software to work nicely
> under a hypervisor, then just avoid those instructions. The Linux kernel
> certainly did so.
>
> You can try to do instruction emulation, but doing this right can get
> tricky quickly: think about I$/D$ coherency, MMU on or off, etc.

I am trying to get all the restrictions that need to be checked. I have 
referred 
https://developer.arm.com/documentation/dui0802/a/A64-General-Instructions/Register-restrictions-for-A64-instructions?lang=en 
and "Arm A64 Instruction Set Architecture - DDI 0596" - LDR (immediate).

So far I only see the following restrictions:-

Rn -ne Rt

Rt -ne SP

You had mentioned the following cases :-

1. XZR vs SP - I see that both these refer to register no 31. Xen gets 
the register number (for Rn/Rt) only, so I am not sure what is to be 
done here.

2. MMU on or off - As I see in try_handle_mmio(), one gets the physical 
address in gpa. So I am not sure what is to be done here.

3. I/D coherency - I don't understand how this affects instruction decoding.

Please help me to understand further.

- Ayan

>
>>> I will leave this decision to Stefano and Julien.
>> I have had a chat on IRC with Stefano about this. I think the main
>> sticking point is the Arm Arm doesn't clearly state those instructions
>> should not be used by a virtualized OS on MMIO regions.
> I don't understand why the ARM ARM would need to say that. Certainly you
> realise that immediately when trying to use them, and apparently it was not
> a problem in the last 8ish years of Xen/ARM's existence.
>
> So it's your decision on having the emulation, I personally would only do
> it when there is a *good* use case.
> And please apply the demanded scrutiny on the review - including all the
> corner cases like Rn=Rt, XZR vs. SP (as Jan said) and possibly MMU status.
>
> Cheers,
> Andre
>
>> To me, this topic looks similar to the set/way instruction dilemma. They
>> are a pain to virtualize (and the Arm Arm clearly hint it) but we had to
>> do it because some OSes relied on them.
>>
>> I think the main difference is the Arm Arm doesn't hint they should not
>> be used (it only says a valid syndrome is not provided) and the
>> implementation should hopefully be smaller and self-contained.
>>
>> So I would be inclined to allow Xen to decode post-indexing instructions
>> (pending the review).
>>
>> Cheers,
>>

Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Andre Przywara 2 years, 3 months ago
On Wed, 5 Jan 2022 16:55:11 +0000
Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:

Hi,

> Thank you so much for your feedback.
> 
> I need a couple of clarifications before I can start with the v3 patch.
> 
> On 08/12/2021 12:00, Andre Przywara wrote:
> > On Mon, 6 Dec 2021 19:31:06 +0000
> > Julien Grall <julien@xen.org> wrote:
> >
> > Hi,
> >  
> >> On 01/12/2021 08:41, Bertrand Marquis wrote:  
> >>> Hi Ayan,
> >>>      
> >>>> On 30 Nov 2021, at 19:13, Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> >>>>
> >>>> Hi Andre,
> >>>>
> >>>> Thanks for your comments. They are useful.
> >>>>
> >>>> On 30/11/2021 09:49, Andre Przywara wrote:  
> >>>>> On Mon, 29 Nov 2021 19:16:38 +0000
> >>>>> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> >>>>> Hi,  
> >>>>>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> >>>>>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> >>>>>> could use to access MMIO regions.
> >>>>>>
> >>>>>> For instance, Xilinx baremetal OS will use:
> >>>>>>
> >>>>>>           volatile u32 *LocalAddr = (volatile u32 *)Addr;
> >>>>>>           *LocalAddr = Value;
> >>>>>>
> >>>>>> This leave the compiler to decide which store instructions to use.  
> >>>>> As mentioned in the other email, this is wrong, if this points to MMIO:
> >>>>> don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
> >>>>> an MMIO area, you should not see traps that you cannot handle already.
> >>>>> So I don't think it's a good idea to use that as an example. And since
> >>>>> this patch only seems to address this use case, I would doubt its
> >>>>> usefulness in general.  
> >>>> Yes, I should have fixed the comment.
> >>>>
> >>>> Currently, I am testing with baremetal app which uses inline assembly code with post indexing instructions, to access the MMIO.
> >>>>
> >>>> ATM, I am testing with 32 bit MMIO only.
> >>>>
> >>>> On the usefulness, I am kind of torn as it is legitimate for post indexing instructions to be used in an inline-assembly code for accessing MMIO. However, that may not be something commonly seen.
> >>>>
> >>>> @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can you comment if we should have decoding logic or not ?  
> >>> Andre gave you the official statement from Arm and there is nothing more I can say.  
> >> I think this would be handy for other hypervisor and OS developper to
> >> know what they can expect when running in a virtualized environment. So
> >> would it be possible to update the Arm Arm reflecting this statement?  
> > I don't think it's within the scope of the ARM ARM to say that. It just
> > says that "there is no syndrome information", and your mileage may vary in
> > working around that.
> >
> > Personally I would say that if you expect your software to work nicely
> > under a hypervisor, then just avoid those instructions. The Linux kernel
> > certainly did so.
> >
> > You can try to do instruction emulation, but doing this right can get
> > tricky quickly: think about I$/D$ coherency, MMU on or off, etc.  
> 
> I am trying to get all the restrictions that need to be checked. I have 
> referred 
> https://developer.arm.com/documentation/dui0802/a/A64-General-Instructions/Register-restrictions-for-A64-instructions?lang=en 
> and "Arm A64 Instruction Set Architecture - DDI 0596" - LDR (immediate).
> 
> So far I only see the following restrictions:-
> 
> Rn -ne Rt
> 
> Rt -ne SP
> 
> You had mentioned the following cases :-
> 
> 1. XZR vs SP - I see that both these refer to register no 31. Xen gets 
> the register number (for Rn/Rt) only, so I am not sure what is to be 
> done here.

But the emulation code in Xen needs to know whether number 31 refers to the
stack pointer or to the zero register. This depends on the context of the
instruction. It's nothing magical about it, you just need to figure out
which it is in your case and make sure that the code uses the right
value. In the LDR case n==31 means SP, but t==31 means XZR. The emulation
would need to consider this.

> 2. MMU on or off - As I see in try_handle_mmio(), one gets the physical 
> address in gpa. So I am not sure what is to be done here.

You *might* be fine, if Xen takes care of that, I don't know.
But it needs to explicitly consider those two cases - and also make
sure that caching effects are cared for.

> 3. I/D coherency - I don't understand how this affects instruction decoding.

In the ARM architecture the I cache and D cache are not necessarily
coherent. So whatever the CPU reads via the data bus does not need to be
the same data that the instruction fetcher read a while ago. And while
there is a well-known sequence to make the current data side visible to
the instruction side, there is nothing architectural for the other way
around.
For example the CPU fetches a bunch of instructions into the I cache, then
consumes it from there. Now some code changes those instruction words in
"memory". Without explicit D-cache-clean-to-the-PoU and
I-cache-invalidate+ISB the CPU might still execute the old instructions.
But if your emulation code tries to read the instruction from memory, it
fetches the data-visible side of it - which is not what the CPU executed,
so end up emulating the wrong thing.

Please also keep in mind that the architecture does not *guarantee*
coherency, but in reality it might actually be coherent - either because
of the particular silicon implementation, or because of side effects in
the system (data cache line being evicted, ISB executed). So seeing the
effects of coherency in your testing does not mean you are safe, it might
just happen by chance.

Cheers,
Andre

> Please help me to understand further.
> 
> - Ayan
> 
> >  
> >>> I will leave this decision to Stefano and Julien.  
> >> I have had a chat on IRC with Stefano about this. I think the main
> >> sticking point is the Arm Arm doesn't clearly state those instructions
> >> should not be used by a virtualized OS on MMIO regions.  
> > I don't understand why the ARM ARM would need to say that. Certainly you
> > realise that immediately when trying to use them, and apparently it was not
> > a problem in the last 8ish years of Xen/ARM's existence.
> >
> > So it's your decision on having the emulation, I personally would only do
> > it when there is a *good* use case.
> > And please apply the demanded scrutiny on the review - including all the
> > corner cases like Rn=Rt, XZR vs. SP (as Jan said) and possibly MMU status.
> >
> > Cheers,
> > Andre
> >  
> >> To me, this topic looks similar to the set/way instruction dilemma. They
> >> are a pain to virtualize (and the Arm Arm clearly hint it) but we had to
> >> do it because some OSes relied on them.
> >>
> >> I think the main difference is the Arm Arm doesn't hint they should not
> >> be used (it only says a valid syndrome is not provided) and the
> >> implementation should hopefully be smaller and self-contained.
> >>
> >> So I would be inclined to allow Xen to decode post-indexing instructions
> >> (pending the review).
> >>
> >> Cheers,
> >>  


Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 3 months ago
Hi Andre,

Many thanks for your inputs. It is making better sense now. Much 
appreciated.

A few questions/clarifications :-

On 06/01/2022 15:33, Andre Przywara wrote:
> On Wed, 5 Jan 2022 16:55:11 +0000
> Ayan Kumar Halder<ayan.kumar.halder@xilinx.com>  wrote:
>
> Hi,
>
>> Thank you so much for your feedback.
>>
>> I need a couple of clarifications before I can start with the v3 patch.
>>
>> On 08/12/2021 12:00, Andre Przywara wrote:
>>> On Mon, 6 Dec 2021 19:31:06 +0000
>>> Julien Grall<julien@xen.org>  wrote:
>>>
>>> Hi,
>>>   
>>>> On 01/12/2021 08:41, Bertrand Marquis wrote:
>>>>> Hi Ayan,
>>>>>       
>>>>>> On 30 Nov 2021, at 19:13, Ayan Kumar Halder<ayan.kumar.halder@xilinx.com>  wrote:
>>>>>>
>>>>>> Hi Andre,
>>>>>>
>>>>>> Thanks for your comments. They are useful.
>>>>>>
>>>>>> On 30/11/2021 09:49, Andre Przywara wrote:
>>>>>>> On Mon, 29 Nov 2021 19:16:38 +0000
>>>>>>> Ayan Kumar Halder<ayan.kumar.halder@xilinx.com>  wrote:
>>>>>>> Hi,
>>>>>>>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
>>>>>>>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
>>>>>>>> could use to access MMIO regions.
>>>>>>>>
>>>>>>>> For instance, Xilinx baremetal OS will use:
>>>>>>>>
>>>>>>>>            volatile u32 *LocalAddr = (volatile u32 *)Addr;
>>>>>>>>            *LocalAddr = Value;
>>>>>>>>
>>>>>>>> This leave the compiler to decide which store instructions to use.
>>>>>>> As mentioned in the other email, this is wrong, if this points to MMIO:
>>>>>>> don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
>>>>>>> an MMIO area, you should not see traps that you cannot handle already.
>>>>>>> So I don't think it's a good idea to use that as an example. And since
>>>>>>> this patch only seems to address this use case, I would doubt its
>>>>>>> usefulness in general.
>>>>>> Yes, I should have fixed the comment.
>>>>>>
>>>>>> Currently, I am testing with baremetal app which uses inline assembly code with post indexing instructions, to access the MMIO.
>>>>>>
>>>>>> ATM, I am testing with 32 bit MMIO only.
>>>>>>
>>>>>> On the usefulness, I am kind of torn as it is legitimate for post indexing instructions to be used in an inline-assembly code for accessing MMIO. However, that may not be something commonly seen.
>>>>>>
>>>>>> @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can you comment if we should have decoding logic or not ?
>>>>> Andre gave you the official statement from Arm and there is nothing more I can say.
>>>> I think this would be handy for other hypervisor and OS developper to
>>>> know what they can expect when running in a virtualized environment. So
>>>> would it be possible to update the Arm Arm reflecting this statement?
>>> I don't think it's within the scope of the ARM ARM to say that. It just
>>> says that "there is no syndrome information", and your mileage may vary in
>>> working around that.
>>>
>>> Personally I would say that if you expect your software to work nicely
>>> under a hypervisor, then just avoid those instructions. The Linux kernel
>>> certainly did so.
>>>
>>> You can try to do instruction emulation, but doing this right can get
>>> tricky quickly: think about I$/D$ coherency, MMU on or off, etc.
>> I am trying to get all the restrictions that need to be checked. I have
>> referred
>> https://developer.arm.com/documentation/dui0802/a/A64-General-Instructions/Register-restrictions-for-A64-instructions?lang=en  
>> and "Arm A64 Instruction Set Architecture - DDI 0596" - LDR (immediate).
>>
>> So far I only see the following restrictions:-
>>
>> Rn -ne Rt
>>
>> Rt -ne SP
>>
>> You had mentioned the following cases :-
>>
>> 1. XZR vs SP - I see that both these refer to register no 31. Xen gets
>> the register number (for Rn/Rt) only, so I am not sure what is to be
>> done here.
> But the emulation code in Xen needs to know whether number 31 refers to the
> stack pointer or to the zero register. This depends on the context of the
> instruction. It's nothing magical about it, you just need to figure out
> which it is in your case and make sure that the code uses the right
> value. In the LDR case n==31 means SP, but t==31 means XZR. The emulation
> would need to consider this.

I see what you mean. I had a look at 
xen/tools/qemu-xen-dir-remote/target/arm/cpu.h, CPUARMState {}. I don't 
see anywhere we need to refer register by its name. IIUC, the 
instruction decoding logic needs to know the register number only.

In fact, Xen would only read CPUARMState->xregs[31] for SP/XZR in 
Aarch64 context. Let me know what I might be missing here.

Beata/Richard - I see you have committed most recently on the file. 
Please feel free to add your inputs.

>
>> 2. MMU on or off - As I see in try_handle_mmio(), one gets the physical
>> address in gpa. So I am not sure what is to be done here.
> You *might* be fine, if Xen takes care of that, I don't know.
> But it needs to explicitly consider those two cases - and also make
> sure that caching effects are cared for.
>
>> 3. I/D coherency - I don't understand how this affects instruction decoding.
> In the ARM architecture the I cache and D cache are not necessarily
> coherent. So whatever the CPU reads via the data bus does not need to be
> the same data that the instruction fetcher read a while ago. And while
> there is a well-known sequence to make the current data side visible to
> the instruction side, there is nothing architectural for the other way
> around.
> For example the CPU fetches a bunch of instructions into the I cache, then
> consumes it from there. Now some code changes those instruction words in
> "memory". Without explicit D-cache-clean-to-the-PoU and
> I-cache-invalidate+ISB the CPU might still execute the old instructions.

This makes sense. Referring 
https://developer.arm.com/documentation/den0024/a/Caches/Cache-maintenance, 
I assume that the following instructions need to be executed before the 
instruction is decoded.

Xn <--- contains the address of the instruction to be decoded.

STR Wt, [Xn] DC CVAU, Xn // Clean datacache by VA to point of 
unification (PoU) DSB ISH // Ensure visibility of the datacleaned from 
cache IC IVAU, Xn // Invalidate instruction cache by VA to PoU DSB ISH 
// Ensure completion of the invalidations ISB // Synchronize the fetched 
instruction stream

> But if your emulation code tries to read the instruction from memory, it
> fetches the data-visible side of it - which is not what the CPU executed,
> so end up emulating the wrong thing.

I am guessing that this should not be a very unusual case. Isn't this 
similar to a host processor loading and modifying a firmware in the 
memory. The firmware can then be executed by a different 
micro-controller or some programmable engine. I mean the data pipeline 
or instruction pipeline should read the same contents from memory.

>
> Please also keep in mind that the architecture does not *guarantee*
> coherency, but in reality it might actually be coherent - either because
> of the particular silicon implementation, or because of side effects in
> the system (data cache line being evicted, ISB executed). So seeing the
> effects of coherency in your testing does not mean you are safe, it might
> just happen by chance.

Is there some good way to test this to ensure the best possible 
reliability ?

- Ayan

>
> Cheers,
> Andre
>
>> Please help me to understand further.
>>
>> - Ayan
>>
>>>   
>>>>> I will leave this decision to Stefano and Julien.
>>>> I have had a chat on IRC with Stefano about this. I think the main
>>>> sticking point is the Arm Arm doesn't clearly state those instructions
>>>> should not be used by a virtualized OS on MMIO regions.
>>> I don't understand why the ARM ARM would need to say that. Certainly you
>>> realise that immediately when trying to use them, and apparently it was not
>>> a problem in the last 8ish years of Xen/ARM's existence.
>>>
>>> So it's your decision on having the emulation, I personally would only do
>>> it when there is a *good* use case.
>>> And please apply the demanded scrutiny on the review - including all the
>>> corner cases like Rn=Rt, XZR vs. SP (as Jan said) and possibly MMU status.
>>>
>>> Cheers,
>>> Andre
>>>   
>>>> To me, this topic looks similar to the set/way instruction dilemma. They
>>>> are a pain to virtualize (and the Arm Arm clearly hint it) but we had to
>>>> do it because some OSes relied on them.
>>>>
>>>> I think the main difference is the Arm Arm doesn't hint they should not
>>>> be used (it only says a valid syndrome is not provided) and the
>>>> implementation should hopefully be smaller and self-contained.
>>>>
>>>> So I would be inclined to allow Xen to decode post-indexing instructions
>>>> (pending the review).
>>>>
>>>> Cheers,
>>>>   
Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Andre Przywara 2 years, 3 months ago
On Mon, 10 Jan 2022 14:33:11 +0000
Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:

Hi Ayan,

> Many thanks for your inputs. It is making better sense now. Much 
> appreciated.
> 
> A few questions/clarifications :-
> 
> On 06/01/2022 15:33, Andre Przywara wrote:
> > On Wed, 5 Jan 2022 16:55:11 +0000
> > Ayan Kumar Halder<ayan.kumar.halder@xilinx.com>  wrote:
> >
> > Hi,
> >  
> >> Thank you so much for your feedback.
> >>
> >> I need a couple of clarifications before I can start with the v3 patch.
> >>
> >> On 08/12/2021 12:00, Andre Przywara wrote:  
> >>> On Mon, 6 Dec 2021 19:31:06 +0000
> >>> Julien Grall<julien@xen.org>  wrote:
> >>>
> >>> Hi,
> >>>     
> >>>> On 01/12/2021 08:41, Bertrand Marquis wrote:  
> >>>>> Hi Ayan,
> >>>>>         
> >>>>>> On 30 Nov 2021, at 19:13, Ayan Kumar Halder<ayan.kumar.halder@xilinx.com>  wrote:
> >>>>>>
> >>>>>> Hi Andre,
> >>>>>>
> >>>>>> Thanks for your comments. They are useful.
> >>>>>>
> >>>>>> On 30/11/2021 09:49, Andre Przywara wrote:  
> >>>>>>> On Mon, 29 Nov 2021 19:16:38 +0000
> >>>>>>> Ayan Kumar Halder<ayan.kumar.halder@xilinx.com>  wrote:
> >>>>>>> Hi,  
> >>>>>>>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> >>>>>>>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> >>>>>>>> could use to access MMIO regions.
> >>>>>>>>
> >>>>>>>> For instance, Xilinx baremetal OS will use:
> >>>>>>>>
> >>>>>>>>            volatile u32 *LocalAddr = (volatile u32 *)Addr;
> >>>>>>>>            *LocalAddr = Value;
> >>>>>>>>
> >>>>>>>> This leave the compiler to decide which store instructions to use.  
> >>>>>>> As mentioned in the other email, this is wrong, if this points to MMIO:
> >>>>>>> don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
> >>>>>>> an MMIO area, you should not see traps that you cannot handle already.
> >>>>>>> So I don't think it's a good idea to use that as an example. And since
> >>>>>>> this patch only seems to address this use case, I would doubt its
> >>>>>>> usefulness in general.  
> >>>>>> Yes, I should have fixed the comment.
> >>>>>>
> >>>>>> Currently, I am testing with baremetal app which uses inline assembly code with post indexing instructions, to access the MMIO.
> >>>>>>
> >>>>>> ATM, I am testing with 32 bit MMIO only.
> >>>>>>
> >>>>>> On the usefulness, I am kind of torn as it is legitimate for post indexing instructions to be used in an inline-assembly code for accessing MMIO. However, that may not be something commonly seen.
> >>>>>>
> >>>>>> @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can you comment if we should have decoding logic or not ?  
> >>>>> Andre gave you the official statement from Arm and there is nothing more I can say.  
> >>>> I think this would be handy for other hypervisor and OS developper to
> >>>> know what they can expect when running in a virtualized environment. So
> >>>> would it be possible to update the Arm Arm reflecting this statement?  
> >>> I don't think it's within the scope of the ARM ARM to say that. It just
> >>> says that "there is no syndrome information", and your mileage may vary in
> >>> working around that.
> >>>
> >>> Personally I would say that if you expect your software to work nicely
> >>> under a hypervisor, then just avoid those instructions. The Linux kernel
> >>> certainly did so.
> >>>
> >>> You can try to do instruction emulation, but doing this right can get
> >>> tricky quickly: think about I$/D$ coherency, MMU on or off, etc.  
> >> I am trying to get all the restrictions that need to be checked. I have
> >> referred
> >> https://developer.arm.com/documentation/dui0802/a/A64-General-Instructions/Register-restrictions-for-A64-instructions?lang=en  
> >> and "Arm A64 Instruction Set Architecture - DDI 0596" - LDR (immediate).
> >>
> >> So far I only see the following restrictions:-
> >>
> >> Rn -ne Rt
> >>
> >> Rt -ne SP
> >>
> >> You had mentioned the following cases :-
> >>
> >> 1. XZR vs SP - I see that both these refer to register no 31. Xen gets
> >> the register number (for Rn/Rt) only, so I am not sure what is to be
> >> done here.  
> > But the emulation code in Xen needs to know whether number 31 refers to the
> > stack pointer or to the zero register. This depends on the context of the
> > instruction. It's nothing magical about it, you just need to figure out
> > which it is in your case and make sure that the code uses the right
> > value. In the LDR case n==31 means SP, but t==31 means XZR. The emulation
> > would need to consider this.  
> 
> I see what you mean. I had a look at 
> xen/tools/qemu-xen-dir-remote/target/arm/cpu.h, CPUARMState {}. I don't 
> see anywhere we need to refer register by its name. IIUC, the 
> instruction decoding logic needs to know the register number only.

But there is of course still a difference between them: they refer to two
separate registers, they just share the encoding of x31 in the instruction.
Each instruction then *knows* what register it needs to go to when it sees
number 31.
By its very nature, you don't need to save XZR, so GPR[31] is typically
the SP, if you have an array holding register values, and you
probably handle the XZR case in code.

> In fact, Xen would only read CPUARMState->xregs[31] for SP/XZR in 
> Aarch64 context. Let me know what I might be missing here.
> 
> Beata/Richard - I see you have committed most recently on the file. 
> Please feel free to add your inputs.
> 
> >  
> >> 2. MMU on or off - As I see in try_handle_mmio(), one gets the physical
> >> address in gpa. So I am not sure what is to be done here.  
> > You *might* be fine, if Xen takes care of that, I don't know.
> > But it needs to explicitly consider those two cases - and also make
> > sure that caching effects are cared for.
> >  
> >> 3. I/D coherency - I don't understand how this affects instruction decoding.  
> > In the ARM architecture the I cache and D cache are not necessarily
> > coherent. So whatever the CPU reads via the data bus does not need to be
> > the same data that the instruction fetcher read a while ago. And while
> > there is a well-known sequence to make the current data side visible to
> > the instruction side, there is nothing architectural for the other way
> > around.
> > For example the CPU fetches a bunch of instructions into the I cache, then
> > consumes it from there. Now some code changes those instruction words in
> > "memory". Without explicit D-cache-clean-to-the-PoU and
> > I-cache-invalidate+ISB the CPU might still execute the old instructions.  
> 
> This makes sense. Referring 
> https://developer.arm.com/documentation/den0024/a/Caches/Cache-maintenance, 
> I assume that the following instructions need to be executed before the 
> instruction is decoded.
> 
> Xn <--- contains the address of the instruction to be decoded.
> 
> STR Wt, [Xn]
> DC CVAU, Xn // Clean datacache by VA to point of unification (PoU)
> DSB ISH // Ensure visibility of the datacleaned from cache
> IC IVAU, Xn // Invalidate instruction cache by VA to PoU
> DSB ISH // Ensure completion of the invalidations
> ISB // Synchronize the fetched instruction stream

That is what I sketched above: the typical sequence to make sure the
instruction fetcher reads the right instructions, after you updated them.
BUT this is not what we need here, we need to other way around: read the
instruction that the CPU executed, but via the data bus, from memory. And
I wouldn't know of a neat architectural way to ensure this.

> > But if your emulation code tries to read the instruction from memory, it
> > fetches the data-visible side of it - which is not what the CPU executed,
> > so end up emulating the wrong thing.  
> 
> I am guessing that this should not be a very unusual case. Isn't this 
> similar to a host processor loading and modifying a firmware in the 
> memory. The firmware can then be executed by a different 
> micro-controller or some programmable engine. I mean the data pipeline 
> or instruction pipeline should read the same contents from memory.

... if they are coherent. Otherwise you need explicit maintenance, see
above. I mean it's the same situation with an external processor: this
core reads instructions from memory and is probably buffering/caching them
(crucial for performance). Now you need to tell if that buffer might
contain stale data, and if the data in memory has changed meanwhile.

> > Please also keep in mind that the architecture does not *guarantee*
> > coherency, but in reality it might actually be coherent - either because
> > of the particular silicon implementation, or because of side effects in
> > the system (data cache line being evicted, ISB executed). So seeing the
> > effects of coherency in your testing does not mean you are safe, it might
> > just happen by chance.  
> 
> Is there some good way to test this to ensure the best possible 
> reliability ?

That's the problem, I think: You can reason on the basis of the
architecture as described in the ARM ARM, but this brings you to the
problems above. You can now try to hand wave away the problems, by stating
that this will never happen in practice, but you then leave the steady
ground the architecture gives you. That's why I was asking before if this
is really necessary code to have, or if you would just be better off
fixing the guests.

Cheers,
Andre

> >> Please help me to understand further.
> >>
> >> - Ayan
> >>  
> >>>     
> >>>>> I will leave this decision to Stefano and Julien.  
> >>>> I have had a chat on IRC with Stefano about this. I think the main
> >>>> sticking point is the Arm Arm doesn't clearly state those instructions
> >>>> should not be used by a virtualized OS on MMIO regions.  
> >>> I don't understand why the ARM ARM would need to say that. Certainly you
> >>> realise that immediately when trying to use them, and apparently it was not
> >>> a problem in the last 8ish years of Xen/ARM's existence.
> >>>
> >>> So it's your decision on having the emulation, I personally would only do
> >>> it when there is a *good* use case.
> >>> And please apply the demanded scrutiny on the review - including all the
> >>> corner cases like Rn=Rt, XZR vs. SP (as Jan said) and possibly MMU status.
> >>>
> >>> Cheers,
> >>> Andre
> >>>     
> >>>> To me, this topic looks similar to the set/way instruction dilemma. They
> >>>> are a pain to virtualize (and the Arm Arm clearly hint it) but we had to
> >>>> do it because some OSes relied on them.
> >>>>
> >>>> I think the main difference is the Arm Arm doesn't hint they should not
> >>>> be used (it only says a valid syndrome is not provided) and the
> >>>> implementation should hopefully be smaller and self-contained.
> >>>>
> >>>> So I would be inclined to allow Xen to decode post-indexing instructions
> >>>> (pending the review).
> >>>>
> >>>> Cheers,
> >>>>     


Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Ayan Kumar Halder 2 years, 3 months ago
Hi Andre/All,

Many thanks for your feedback.

On 11/01/2022 12:52, Andre Przywara wrote:
> On Mon, 10 Jan 2022 14:33:11 +0000
> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>
> Hi Ayan,
>
>> Many thanks for your inputs. It is making better sense now. Much
>> appreciated.
>>
>> A few questions/clarifications :-
>>
>> On 06/01/2022 15:33, Andre Przywara wrote:
>>> On Wed, 5 Jan 2022 16:55:11 +0000
>>> Ayan Kumar Halder<ayan.kumar.halder@xilinx.com>  wrote:
>>>
>>> Hi,
>>>   
>>>> Thank you so much for your feedback.
>>>>
>>>> I need a couple of clarifications before I can start with the v3 patch.
>>>>
>>>> On 08/12/2021 12:00, Andre Przywara wrote:
>>>>> On Mon, 6 Dec 2021 19:31:06 +0000
>>>>> Julien Grall<julien@xen.org>  wrote:
>>>>>
>>>>> Hi,
>>>>>      
>>>>>> On 01/12/2021 08:41, Bertrand Marquis wrote:
>>>>>>> Hi Ayan,
>>>>>>>          
>>>>>>>> On 30 Nov 2021, at 19:13, Ayan Kumar Halder<ayan.kumar.halder@xilinx.com>  wrote:
>>>>>>>>
>>>>>>>> Hi Andre,
>>>>>>>>
>>>>>>>> Thanks for your comments. They are useful.
>>>>>>>>
>>>>>>>> On 30/11/2021 09:49, Andre Przywara wrote:
>>>>>>>>> On Mon, 29 Nov 2021 19:16:38 +0000
>>>>>>>>> Ayan Kumar Halder<ayan.kumar.halder@xilinx.com>  wrote:
>>>>>>>>> Hi,
>>>>>>>>>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
>>>>>>>>>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
>>>>>>>>>> could use to access MMIO regions.
>>>>>>>>>>
>>>>>>>>>> For instance, Xilinx baremetal OS will use:
>>>>>>>>>>
>>>>>>>>>>             volatile u32 *LocalAddr = (volatile u32 *)Addr;
>>>>>>>>>>             *LocalAddr = Value;
>>>>>>>>>>
>>>>>>>>>> This leave the compiler to decide which store instructions to use.
>>>>>>>>> As mentioned in the other email, this is wrong, if this points to MMIO:
>>>>>>>>> don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
>>>>>>>>> an MMIO area, you should not see traps that you cannot handle already.
>>>>>>>>> So I don't think it's a good idea to use that as an example. And since
>>>>>>>>> this patch only seems to address this use case, I would doubt its
>>>>>>>>> usefulness in general.
>>>>>>>> Yes, I should have fixed the comment.
>>>>>>>>
>>>>>>>> Currently, I am testing with baremetal app which uses inline assembly code with post indexing instructions, to access the MMIO.
>>>>>>>>
>>>>>>>> ATM, I am testing with 32 bit MMIO only.
>>>>>>>>
>>>>>>>> On the usefulness, I am kind of torn as it is legitimate for post indexing instructions to be used in an inline-assembly code for accessing MMIO. However, that may not be something commonly seen.
>>>>>>>>
>>>>>>>> @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can you comment if we should have decoding logic or not ?
>>>>>>> Andre gave you the official statement from Arm and there is nothing more I can say.
>>>>>> I think this would be handy for other hypervisor and OS developper to
>>>>>> know what they can expect when running in a virtualized environment. So
>>>>>> would it be possible to update the Arm Arm reflecting this statement?
>>>>> I don't think it's within the scope of the ARM ARM to say that. It just
>>>>> says that "there is no syndrome information", and your mileage may vary in
>>>>> working around that.
>>>>>
>>>>> Personally I would say that if you expect your software to work nicely
>>>>> under a hypervisor, then just avoid those instructions. The Linux kernel
>>>>> certainly did so.
>>>>>
>>>>> You can try to do instruction emulation, but doing this right can get
>>>>> tricky quickly: think about I$/D$ coherency, MMU on or off, etc.
>>>> I am trying to get all the restrictions that need to be checked. I have
>>>> referred
>>>> https://developer.arm.com/documentation/dui0802/a/A64-General-Instructions/Register-restrictions-for-A64-instructions?lang=en
>>>> and "Arm A64 Instruction Set Architecture - DDI 0596" - LDR (immediate).
>>>>
>>>> So far I only see the following restrictions:-
>>>>
>>>> Rn -ne Rt
>>>>
>>>> Rt -ne SP
>>>>
>>>> You had mentioned the following cases :-
>>>>
>>>> 1. XZR vs SP - I see that both these refer to register no 31. Xen gets
>>>> the register number (for Rn/Rt) only, so I am not sure what is to be
>>>> done here.
>>> But the emulation code in Xen needs to know whether number 31 refers to the
>>> stack pointer or to the zero register. This depends on the context of the
>>> instruction. It's nothing magical about it, you just need to figure out
>>> which it is in your case and make sure that the code uses the right
>>> value. In the LDR case n==31 means SP, but t==31 means XZR. The emulation
>>> would need to consider this.
>> I see what you mean. I had a look at
>> xen/tools/qemu-xen-dir-remote/target/arm/cpu.h, CPUARMState {}. I don't
>> see anywhere we need to refer register by its name. IIUC, the
>> instruction decoding logic needs to know the register number only.
> But there is of course still a difference between them: they refer to two
> separate registers, they just share the encoding of x31 in the instruction.
> Each instruction then *knows* what register it needs to go to when it sees
> number 31.
> By its very nature, you don't need to save XZR, so GPR[31] is typically
> the SP, if you have an array holding register values, and you
> probably handle the XZR case in code.
>
>> In fact, Xen would only read CPUARMState->xregs[31] for SP/XZR in
>> Aarch64 context. Let me know what I might be missing here.
>>
>> Beata/Richard - I see you have committed most recently on the file.
>> Please feel free to add your inputs.
>>
>>>   
>>>> 2. MMU on or off - As I see in try_handle_mmio(), one gets the physical
>>>> address in gpa. So I am not sure what is to be done here.
>>> You *might* be fine, if Xen takes care of that, I don't know.
>>> But it needs to explicitly consider those two cases - and also make
>>> sure that caching effects are cared for.
>>>   
>>>> 3. I/D coherency - I don't understand how this affects instruction decoding.
>>> In the ARM architecture the I cache and D cache are not necessarily
>>> coherent. So whatever the CPU reads via the data bus does not need to be
>>> the same data that the instruction fetcher read a while ago. And while
>>> there is a well-known sequence to make the current data side visible to
>>> the instruction side, there is nothing architectural for the other way
>>> around.
>>> For example the CPU fetches a bunch of instructions into the I cache, then
>>> consumes it from there. Now some code changes those instruction words in
>>> "memory". Without explicit D-cache-clean-to-the-PoU and
>>> I-cache-invalidate+ISB the CPU might still execute the old instructions.
>> This makes sense. Referring
>> https://developer.arm.com/documentation/den0024/a/Caches/Cache-maintenance,
>> I assume that the following instructions need to be executed before the
>> instruction is decoded.
>>
>> Xn <--- contains the address of the instruction to be decoded.
>>
>> STR Wt, [Xn]
>> DC CVAU, Xn // Clean datacache by VA to point of unification (PoU)
>> DSB ISH // Ensure visibility of the datacleaned from cache
>> IC IVAU, Xn // Invalidate instruction cache by VA to PoU
>> DSB ISH // Ensure completion of the invalidations
>> ISB // Synchronize the fetched instruction stream
> That is what I sketched above: the typical sequence to make sure the
> instruction fetcher reads the right instructions, after you updated them.
> BUT this is not what we need here, we need to other way around: read the
> instruction that the CPU executed, but via the data bus, from memory. And
> I wouldn't know of a neat architectural way to ensure this.
>
>>> But if your emulation code tries to read the instruction from memory, it
>>> fetches the data-visible side of it - which is not what the CPU executed,
>>> so end up emulating the wrong thing.
>> I am guessing that this should not be a very unusual case. Isn't this
>> similar to a host processor loading and modifying a firmware in the
>> memory. The firmware can then be executed by a different
>> micro-controller or some programmable engine. I mean the data pipeline
>> or instruction pipeline should read the same contents from memory.
> ... if they are coherent. Otherwise you need explicit maintenance, see
> above. I mean it's the same situation with an external processor: this
> core reads instructions from memory and is probably buffering/caching them
> (crucial for performance). Now you need to tell if that buffer might
> contain stale data, and if the data in memory has changed meanwhile.
>
>>> Please also keep in mind that the architecture does not *guarantee*
>>> coherency, but in reality it might actually be coherent - either because
>>> of the particular silicon implementation, or because of side effects in
>>> the system (data cache line being evicted, ISB executed). So seeing the
>>> effects of coherency in your testing does not mean you are safe, it might
>>> just happen by chance.
>> Is there some good way to test this to ensure the best possible
>> reliability ?
> That's the problem, I think: You can reason on the basis of the
> architecture as described in the ARM ARM, but this brings you to the
> problems above. You can now try to hand wave away the problems, by stating
> that this will never happen in practice, but you then leave the steady
> ground the architecture gives you. That's why I was asking before if this
> is really necessary code to have, or if you would just be better off
> fixing the guests.
>
> Cheers,
> Andre
>
>>>> Please help me to understand further.
>>>>
>>>> - Ayan
>>>>   
>>>>>      
>>>>>>> I will leave this decision to Stefano and Julien.
>>>>>> I have had a chat on IRC with Stefano about this. I think the main
>>>>>> sticking point is the Arm Arm doesn't clearly state those instructions
>>>>>> should not be used by a virtualized OS on MMIO regions.
>>>>> I don't understand why the ARM ARM would need to say that. Certainly you
>>>>> realise that immediately when trying to use them, and apparently it was not
>>>>> a problem in the last 8ish years of Xen/ARM's existence.
>>>>>
>>>>> So it's your decision on having the emulation, I personally would only do
>>>>> it when there is a *good* use case.
>>>>> And please apply the demanded scrutiny on the review - including all the
>>>>> corner cases like Rn=Rt, XZR vs. SP (as Jan said) and possibly MMU status.

I have submitted a v3 patch for this addressing most of the comments.

"[XEN v3] xen/arm64: io: Decode ldr/str post-indexing instructions". 
Please have a look.

- Ayan

>>>>>
>>>>> Cheers,
>>>>> Andre
>>>>>      
>>>>>> To me, this topic looks similar to the set/way instruction dilemma. They
>>>>>> are a pain to virtualize (and the Arm Arm clearly hint it) but we had to
>>>>>> do it because some OSes relied on them.
>>>>>>
>>>>>> I think the main difference is the Arm Arm doesn't hint they should not
>>>>>> be used (it only says a valid syndrome is not provided) and the
>>>>>> implementation should hopefully be smaller and self-contained.
>>>>>>
>>>>>> So I would be inclined to allow Xen to decode post-indexing instructions
>>>>>> (pending the review).
>>>>>>
>>>>>> Cheers,
>>>>>>      

Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Stefano Stabellini 2 years, 4 months ago
On Mon, 6 Dec 2021, Julien Grall wrote:
> On 01/12/2021 08:41, Bertrand Marquis wrote:
> > Hi Ayan,
> > 
> > > On 30 Nov 2021, at 19:13, Ayan Kumar Halder <ayan.kumar.halder@xilinx.com>
> > > wrote:
> > > 
> > > Hi Andre,
> > > 
> > > Thanks for your comments. They are useful.
> > > 
> > > On 30/11/2021 09:49, Andre Przywara wrote:
> > > > On Mon, 29 Nov 2021 19:16:38 +0000
> > > > Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> > > > Hi,
> > > > > At the moment, Xen is only handling data abort with valid syndrome
> > > > > (i.e.
> > > > > ISV=0). Unfortunately, this doesn't cover all the instructions a
> > > > > domain
> > > > > could use to access MMIO regions.
> > > > > 
> > > > > For instance, Xilinx baremetal OS will use:
> > > > > 
> > > > >          volatile u32 *LocalAddr = (volatile u32 *)Addr;
> > > > >          *LocalAddr = Value;
> > > > > 
> > > > > This leave the compiler to decide which store instructions to use.
> > > > As mentioned in the other email, this is wrong, if this points to MMIO:
> > > > don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
> > > > an MMIO area, you should not see traps that you cannot handle already.
> > > > So I don't think it's a good idea to use that as an example. And since
> > > > this patch only seems to address this use case, I would doubt its
> > > > usefulness in general.
> > > Yes, I should have fixed the comment.
> > > 
> > > Currently, I am testing with baremetal app which uses inline assembly code
> > > with post indexing instructions, to access the MMIO.
> > > 
> > > ATM, I am testing with 32 bit MMIO only.
> > > 
> > > On the usefulness, I am kind of torn as it is legitimate for post indexing
> > > instructions to be used in an inline-assembly code for accessing MMIO.
> > > However, that may not be something commonly seen.
> > > 
> > > @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can
> > > you comment if we should have decoding logic or not ?
> > 
> > Andre gave you the official statement from Arm and there is nothing more I
> > can say.
> 
> I think this would be handy for other hypervisor and OS developper to know
> what they can expect when running in a virtualized environment. So would it be
> possible to update the Arm Arm reflecting this statement?
> 
> > I will leave this decision to Stefano and Julien.
> 
> I have had a chat on IRC with Stefano about this. I think the main sticking
> point is the Arm Arm doesn't clearly state those instructions should not be
> used by a virtualized OS on MMIO regions.
> 
> To me, this topic looks similar to the set/way instruction dilemma. They are a
> pain to virtualize (and the Arm Arm clearly hint it) but we had to do it
> because some OSes relied on them.
> 
> I think the main difference is the Arm Arm doesn't hint they should not be
> used (it only says a valid syndrome is not provided) and the implementation
> should hopefully be smaller and self-contained.
> 
> So I would be inclined to allow Xen to decode post-indexing instructions
> (pending the review).

I am of the same opinion.

(FYI in any case we'll also work internally to improve Xilinx baremetal.)

Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Andre Przywara 2 years, 4 months ago
On Tue, 30 Nov 2021 19:13:41 +0000
Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:

Hi Ayan,

> Thanks for your comments. They are useful.
> 
> On 30/11/2021 09:49, Andre Przywara wrote:
> > On Mon, 29 Nov 2021 19:16:38 +0000
> > Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
> > 
> > Hi,
> >   
> >> At the moment, Xen is only handling data abort with valid syndrome (i.e.
> >> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
> >> could use to access MMIO regions.
> >>
> >> For instance, Xilinx baremetal OS will use:
> >>
> >>          volatile u32 *LocalAddr = (volatile u32 *)Addr;
> >>          *LocalAddr = Value;
> >>
> >> This leave the compiler to decide which store instructions to use.  
> > 
> > As mentioned in the other email, this is wrong, if this points to MMIO:
> > don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
> > an MMIO area, you should not see traps that you cannot handle already.
> > 
> > So I don't think it's a good idea to use that as an example. And since
> > this patch only seems to address this use case, I would doubt its
> > usefulness in general.  
> Yes, I should have fixed the comment.
> 
> Currently, I am testing with baremetal app which uses inline assembly 
> code with post indexing instructions, to access the MMIO.
> 
> ATM, I am testing with 32 bit MMIO only.
> 
> On the usefulness, I am kind of torn as it is legitimate for post 
> indexing instructions to be used in an inline-assembly code for 
> accessing MMIO. However, that may not be something commonly seen.

It is legitimate, but I question the usefulness: for a start it wouldn't
work under today's Xen (or KVM), and I doubt this would be backported. So
you would always require users to use the newest version of the hypervisor.
Also MMIO accesses are slow anyway, so while using post-indexing might be
convenient from an assembly writer's point of view, it doesn't give you
much performance-wise, probably. So if you are interested in running under
Xen (or other virtualisation solutions), just don't use them for MMIO.

And I am not sure that Xen aims to virtualise every piece of code on the
planet: certainly we have certain expectations about the platform, so you
would probably consider running under a hypervisor from the very beginning.

> @Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can 
> you comment if we should have decoding logic or not ?
> 
> >   
> >> This
> >> may be a post-index store instruction where the HW will not provide a
> >> valid syndrome.
> >>
> >> In order to handle post-indexing store/load instructions, Xen will need
> >> to fetch and decode the instruction.
> >>
> >> This patch only cover post-index store/load instructions from AArch64
> >> mode. For now, this is left unimplemented for trap from AArch32 mode.
> >>
> >> Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
> >> ---
> >>
> >> Changelog :-
> >>
> >> v2 :- 1. Updated the rn register after reading from it. (Pointed by
> >> Julien, Stefano)
> >> 2. Used a union to represent the instruction opcode (Suggestd by
> >> Bertrand) 3. Fixed coding style issues (Pointed by Julien)
> >> 4. In the previous patch, I was updating dabt->sign based on the
> >> signedness of imm9. This was incorrect. As mentioned in ARMv8 ARM  DDI
> >> 0487G.b, Page 3221, SSE indicates the signedness of the data item
> >> loaded. In our case, the data item loaded is always unsigned.
> >>
> >> This has been tested for the following cases :-
> >> ldr x2, [x1], #4  
> > 
> > As Jan already mentioned: this is a bad example. First, this is a 64-bit
> > access, which you don't emulate below. And second, you want to keep the
> > pointer aligned. Unaligned accesses to device memory always trap, as per
> > the architecture, even on bare metal.
> >   
> >>
> >> ldr w2, [x1], #-4
> >>
> >> str x2, [x1], #4  
> > 
> > Same as above.
> >   
> >> str w2, [x1], #-4
> >>
> >> The reason being  I am testing on 32bit MMIO registers only. I don't see
> >> a 8bit or 16bit MMIO register.  
> > 
> > Where did you look? There are plenty of examples out there, even the GIC
> > allows 8-bit accesses to certain registers (grep for "VGIC_ACCESS_"), and
> > the Linux GIC driver is using them (but with proper accessors, of course).
> > Also GICv3 supports 64-bit accesses to some registers. Some PL011 UARTs use
> > 16-bit MMIO accesses.  
> Yes, sorry I see them now. GICD_IPRIORITYR can be accessed with 8 bits.
> Unfortunately, I have GIC-v2 on my hardware system. So, probably I can't 
> test 64 bit access.
> 
> >   
> >>   xen/arch/arm/decode.c     | 68 ++++++++++++++++++++++++++++++++++++++-
> >>   xen/arch/arm/decode.h     |  3 +-
> >>   xen/arch/arm/io.c         | 40 +++++++++++++++++++----
> >>   xen/include/asm-arm/hsr.h | 26 +++++++++++++++
> >>   4 files changed, 129 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
> >> index 792c2e92a7..0b3e8fcbc6 100644
> >> --- a/xen/arch/arm/decode.c
> >> +++ b/xen/arch/arm/decode.c
> >> @@ -84,6 +84,66 @@ bad_thumb2:
> >>       return 1;
> >>   }
> >>   
> >> +static int decode_32bit_loadstore_postindexing(register_t pc,
> >> +                                               struct hsr_dabt *dabt,
> >> +                                               union ldr_str_instr_class *instr)
> >> +{
> >> +    if ( raw_copy_from_guest(&instr->value, (void * __user)pc, sizeof (instr)) )
> >> +        return -EFAULT;
> >> +
> >> +    /* First, let's check for the fixed values */
> >> +    if ( !((instr->code.fixed1 == 1) && (instr->code.fixed2 == 0) &&
> >> +         (instr->code.fixed3 == 0) && (instr->code.fixed4 == 7)) )
> >> +    {
> >> +        gprintk(XENLOG_ERR, "Decoding not supported for instructions other than"
> >> +            " ldr/str post indexing\n");
> >> +        goto bad_32bit_loadstore;
> >> +    }
> >> +
> >> +    if ( instr->code.size != 2 )  
> > 
> > I don't see a good reason for this limitation. If you are going to dissect
> > the instruction, why not just support at least all access widths, so
> > 64-bits, but also {ldr,str}{b,w}? I think the framework does the heavy
> > lifting for you already?  
> 
> I see your point. My intention was to test first with the restricted 
> instruction set (ie ldr/str - 32 bit access with post indexing only) and 
> get an opinion from the community. If the patch looks sane, then this 
> can be extended with other variants as well.
> 
> > Same for the restriction to post-index above, supporting pre-index as well
> > should be easy.  
> For Pre-indexing instruction, the ISS is valid. So I am not sure what is 
> to be done here?

Where did you find this? The ARM ARM says that ISV=0 when the instruction
is using writeback, which would include pre-index.
I tested "ldr w0, [x1, #4]!" and it failed on KVM, as expected.

> AFAIU, if the ISS is valid, there is no need to explicitly decode the 
> instructions.
> > 
> > To me this has the bitter taste for being a one trick pony to work around
> > your particular (broken!) use case.
> >   
> >> +    {
> >> +        gprintk(XENLOG_ERR,
> >> +            "ldr/str post indexing is supported for 32 bit variant only\n");
> >> +        goto bad_32bit_loadstore;
> >> +    }
> >> +
> >> +    if ( instr->code.v != 0 )
> >> +    {
> >> +        gprintk(XENLOG_ERR,
> >> +            "ldr/str post indexing for vector types are not supported\n");
> >> +        goto bad_32bit_loadstore;
> >> +    }
> >> +
> >> +    /* Check for STR (immediate) - 32 bit variant */
> >> +    if ( instr->code.opc == 0 )
> >> +    {
> >> +        dabt->write = 1;
> >> +    }
> >> +    /* Check for LDR (immediate) - 32 bit variant */
> >> +    else if ( instr->code.opc == 1 )
> >> +    {
> >> +        dabt->write = 0;
> >> +    }
> >> +    else
> >> +    {
> >> +        gprintk(XENLOG_ERR,
> >> +            "Decoding ldr/str post indexing is not supported for this variant\n");
> >> +        goto bad_32bit_loadstore;
> >> +    }
> >> +
> >> +    gprintk(XENLOG_INFO,
> >> +        "instr->code.rt = 0x%x, instr->code.size = 0x%x, instr->code.imm9 = %d\n",
> >> +        instr->code.rt, instr->code.size, instr->code.imm9);
> >> +
> >> +    update_dabt(dabt, instr->code.rt, instr->code.size, false);
> >> +    dabt->valid = 1;
> >> +
> >> +    return 0;
> >> +bad_32bit_loadstore:
> >> +    gprintk(XENLOG_ERR, "unhandled 32bit Arm instruction 0x%x\n", instr->value);
> >> +    return 1;
> >> +}
> >> +
> >>   static int decode_thumb(register_t pc, struct hsr_dabt *dabt)
> >>   {
> >>       uint16_t instr;
> >> @@ -150,11 +210,17 @@ bad_thumb:
> >>       return 1;
> >>   }
> >>   
> >> -int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt)
> >> +int decode_instruction(const struct cpu_user_regs *regs, struct hsr_dabt *dabt,
> >> +                       union ldr_str_instr_class *instr)
> >>   {
> >>       if ( is_32bit_domain(current->domain) && regs->cpsr & PSR_THUMB )
> >>           return decode_thumb(regs->pc, dabt);
> >>   
> >> +    if ( (is_64bit_domain(current->domain) && !psr_mode_is_32bit(regs)) )
> >> +    {
> >> +        return decode_32bit_loadstore_postindexing(regs->pc, dabt, instr);
> >> +    }
> >> +
> >>       /* TODO: Handle ARM instruction */
> >>       gprintk(XENLOG_ERR, "unhandled ARM instruction\n");
> >>   
> >> diff --git a/xen/arch/arm/decode.h b/xen/arch/arm/decode.h
> >> index 4613763bdb..d82fc4a0f6 100644
> >> --- a/xen/arch/arm/decode.h
> >> +++ b/xen/arch/arm/decode.h
> >> @@ -35,7 +35,8 @@
> >>    */
> >>   
> >>   int decode_instruction(const struct cpu_user_regs *regs,
> >> -                       struct hsr_dabt *dabt);
> >> +                       struct hsr_dabt *dabt,
> >> +                       union ldr_str_instr_class *instr);
> >>   
> >>   #endif /* __ARCH_ARM_DECODE_H_ */
> >>   
> >> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> >> index 729287e37c..0d60754bc4 100644
> >> --- a/xen/arch/arm/io.c
> >> +++ b/xen/arch/arm/io.c
> >> @@ -65,6 +65,16 @@ static enum io_state handle_write(const struct
> >> mmio_handler *handler, return ret ? IO_HANDLED : IO_ABORT;
> >>   }
> >>   
> >> +static void post_incremenet_register(union ldr_str_instr_class *instr)
> >> +{
> >> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> >> +    unsigned int val;
> >> +
> >> +    val = get_user_reg(regs, instr->code.rn);
> >> +    val += instr->code.imm9;
> >> +    set_user_reg(regs, instr->code.rn, val);
> >> +}
> >> +
> >>   /* This function assumes that mmio regions are not overlapped */
> >>   static int cmp_mmio_handler(const void *key, const void *elem)
> >>   {
> >> @@ -106,14 +116,26 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> >> *regs, .gpa = gpa,
> >>           .dabt = dabt
> >>       };
> >> +    int rc;
> >> +    union ldr_str_instr_class instr = {0};
> >>   
> >>       ASSERT(hsr.ec == HSR_EC_DATA_ABORT_LOWER_EL);
> >>   
> >> +    /*
> >> +     * Armv8 processor does not provide a valid syndrome for post-indexing
> >> +     * ldr/str instructions. So in order to process these instructions,
> >> +     * Xen must decode them.
> >> +     */
> >> +    if ( !info.dabt.valid )
> >> +    {
> >> +        rc = decode_instruction(regs, &info.dabt, &instr);
> >> +        if ( rc )
> >> +            return IO_ABORT;
> >> +    }
> >> +
> >>       handler = find_mmio_handler(v->domain, info.gpa);
> >>       if ( !handler )
> >>       {
> >> -        int rc;
> >> -
> >>           rc = try_fwd_ioserv(regs, v, &info);
> >>           if ( rc == IO_HANDLED )
> >>               return handle_ioserv(regs, v);
> >> @@ -122,7 +144,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> >> *regs, }
> >>   
> >>       /* All the instructions used on emulated MMIO region should be
> >> valid */
> >> -    if ( !dabt.valid )
> >> +    if ( !info.dabt.valid )
> >>           return IO_ABORT;
> >>   
> >>       /*
> >> @@ -134,7 +156,7 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> >> *regs, {
> >>           int rc;
> >>   
> >> -        rc = decode_instruction(regs, &info.dabt);
> >> +        rc = decode_instruction(regs, &info.dabt, NULL);
> >>           if ( rc )
> >>           {
> >>               gprintk(XENLOG_DEBUG, "Unable to decode instruction\n");
> >> @@ -143,9 +165,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs
> >> *regs, }
> >>   
> >>       if ( info.dabt.write )
> >> -        return handle_write(handler, v, &info);
> >> +        rc = handle_write(handler, v, &info);
> >>       else
> >> -        return handle_read(handler, v, &info);
> >> +        rc = handle_read(handler, v, &info);
> >> +
> >> +    if ( instr.value != 0 )
> >> +    {
> >> +        post_incremenet_register(&instr);
> >> +    }
> >> +    return rc;
> >>   }
> >>   
> >>   void register_mmio_handler(struct domain *d,
> >> diff --git a/xen/include/asm-arm/hsr.h b/xen/include/asm-arm/hsr.h
> >> index 9b91b28c48..72d67d2801 100644
> >> --- a/xen/include/asm-arm/hsr.h
> >> +++ b/xen/include/asm-arm/hsr.h
> >> @@ -15,6 +15,32 @@ enum dabt_size {
> >>       DABT_DOUBLE_WORD = 3,
> >>   };
> >>   
> >> +/*
> >> + * Refer to the ARMv8 ARM (DDI 0487G.b), Section C4.1.4 Loads and Stores
> >> + * Page 318 specifies the following bit pattern for
> >> + * "load/store register (immediate post-indexed)".
> >> + *
> >> + * 31 30 29  27 26 25  23   21 20              11   9         4       0
> >> + * ___________________________________________________________________
> >> + * |size|1 1 1 |V |0 0 |opc |0 |      imm9     |0 1 |  Rn     |  Rt   |
> >> + * |____|______|__|____|____|__|_______________|____|_________|_______|
> >> + */
> >> +union ldr_str_instr_class {
> >> +    uint32_t value;
> >> +    struct ldr_str {
> >> +        unsigned int rt:5;     /* Rt register */  
> > 
> > I don't think it's a particular good idea to use a bit-field here, if that
> > is expected to mimic a certain hardware provided bit pattern.
> > It works in practise (TM), but the C standard does not guarantee the order
> > the bits are allocated (ISO/IEC 9899:201x §6.7.2.1, stanza 11).
> > Since you are *reading* only from the instruction word, you should get away
> > with accessor macros to extract the bits you need. For instance for
> > filtering the opcode, you could use: ((insn & 0x3fe00c00) == 0x38400400)  
> 
> Yes, this is a very good point. I will use bitmasks to access the bits 
> from the register.
> 
> I saw the same logic (ie using bitfields) is used for some other 
> registers as well. For eg hsr_dabt, hsr_iabt in 
> xen/include/asm-arm/hsr.h. May be that needs fixing as well for some 
> other time. :)

(erroneously put that part already in the mail to Bertrand, for the
sake of completeness here it is again):

Well, there is no easy and clear answer as to whether to use bit-fields in
those occasions or not. From a practical point of view, it works (TM), and
has some advantages, like saving you from fiddling around with a 9-bit
sign extension, for instance.
But the Linux kernel discourages those works-for-me solutions, one killer
problem here is endianess, which is not a problem for Xen, IIRC.
I personally prefer robust code: by not relying on certain implementation
specifics (and be they very obvious or wide-spread), there will be less
surprises in the future.

So I'd leave this up to the maintainers to decide, IIUC the original
suggestion came from Bertrand?

Cheers,
Andre

> >> +        unsigned int rn:5;     /* Rn register */
> >> +        unsigned int fixed1:2; /* value == 01b */
> >> +        int imm9:9;            /* imm9 */
> >> +        unsigned int fixed2:1; /* value == 0b */
> >> +        unsigned int opc:2;    /* opc */
> >> +        unsigned int fixed3:2; /* value == 00b */
> >> +        unsigned int v:1;      /* vector */
> >> +        unsigned int fixed4:3; /* value == 111b */
> >> +        unsigned int size:2;   /* size */
> >> +    } code;
> >> +};
> >> +
> >>   union hsr {
> >>       register_t bits;
> >>       struct {  
> >   


Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions
Posted by Jan Beulich 2 years, 4 months ago
On 01.12.2021 12:58, Andre Przywara wrote:
> On Tue, 30 Nov 2021 19:13:41 +0000
> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>> On 30/11/2021 09:49, Andre Przywara wrote:
>>> On Mon, 29 Nov 2021 19:16:38 +0000
>>> Ayan Kumar Halder <ayan.kumar.halder@xilinx.com> wrote:
>>>> At the moment, Xen is only handling data abort with valid syndrome (i.e.
>>>> ISV=0). Unfortunately, this doesn't cover all the instructions a domain
>>>> could use to access MMIO regions.
>>>>
>>>> For instance, Xilinx baremetal OS will use:
>>>>
>>>>          volatile u32 *LocalAddr = (volatile u32 *)Addr;
>>>>          *LocalAddr = Value;
>>>>
>>>> This leave the compiler to decide which store instructions to use.  
>>>
>>> As mentioned in the other email, this is wrong, if this points to MMIO:
>>> don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
>>> an MMIO area, you should not see traps that you cannot handle already.
>>>
>>> So I don't think it's a good idea to use that as an example. And since
>>> this patch only seems to address this use case, I would doubt its
>>> usefulness in general.  
>> Yes, I should have fixed the comment.
>>
>> Currently, I am testing with baremetal app which uses inline assembly 
>> code with post indexing instructions, to access the MMIO.
>>
>> ATM, I am testing with 32 bit MMIO only.
>>
>> On the usefulness, I am kind of torn as it is legitimate for post 
>> indexing instructions to be used in an inline-assembly code for 
>> accessing MMIO. However, that may not be something commonly seen.
> 
> It is legitimate, but I question the usefulness: for a start it wouldn't
> work under today's Xen (or KVM), and I doubt this would be backported. So
> you would always require users to use the newest version of the hypervisor.
> Also MMIO accesses are slow anyway, so while using post-indexing might be
> convenient from an assembly writer's point of view, it doesn't give you
> much performance-wise, probably.

Just so it gets mentioned: Smaller code size may also be a consideration
here, beyond the more general one about performance.

Jan