[PATCH] This patch implements several Octeon +/II instructions.

owl129@126.com posted 1 patch 10 months, 1 week ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20240119045626.9698-1-owl129@126.com
Maintainers: "Philippe Mathieu-Daudé" <philmd@linaro.org>, Aurelien Jarno <aurelien@aurel32.net>, Jiaxun Yang <jiaxun.yang@flygoat.com>, Aleksandar Rikalo <aleksandar.rikalo@syrmia.com>
target/mips/tcg/octeon.decode      |  35 ++++
target/mips/tcg/octeon_translate.c | 281 +++++++++++++++++++++++++++++
2 files changed, 316 insertions(+)
[PATCH] This patch implements several Octeon +/II instructions.
Posted by owl129@126.com 10 months, 1 week ago
From: owl <owl129@126.com>


Octeon+ 
- SAA 
- SAAD

Octeon2
- LAI
- LAID
- LAD
- LADD
- LAS
- LASD
- LAC
- LACD
- LAA
- LAAD
- LAW
- LAWD

- LWX
- LHX
- LDX
- LBUX
- LWUX
- LHUX
- LBX

Signed-off-by: owl <owl129@126.com>
---
 target/mips/tcg/octeon.decode      |  35 ++++
 target/mips/tcg/octeon_translate.c | 281 +++++++++++++++++++++++++++++
 2 files changed, 316 insertions(+)

diff --git a/target/mips/tcg/octeon.decode b/target/mips/tcg/octeon.decode
index 0c787cb498..980ed619d0 100644
--- a/target/mips/tcg/octeon.decode
+++ b/target/mips/tcg/octeon.decode
@@ -39,3 +39,38 @@ CINS         011100 ..... ..... ..... ..... 11001 . @bitfield
 POP          011100 rs:5 00000 rd:5 00000 10110 dw:1
 SEQNE        011100 rs:5 rt:5 rd:5 00000 10101 ne:1
 SEQNEI       011100 rs:5 rt:5 imm:s10 10111 ne:1
+
+
+# SAA rt, (base)
+# SAAD rt, (base)
+SAA         011100 base:5 rt:5 00000 00000 011000
+SAAD        011100 base:5 rt:5 00000 00000 011001
+
+LAI         011100 ..... ..... ..... 00010 011111 @r3
+LAID        011100 ..... ..... ..... 00011 011111 @r3
+LAD         011100 ..... ..... ..... 00110 011111 @r3
+LADD        011100 ..... ..... ..... 00111 011111 @r3
+LAS         011100 ..... ..... ..... 01010 011111 @r3
+LASD        011100 ..... ..... ..... 01011 011111 @r3
+LAC         011100 ..... ..... ..... 01110 011111 @r3
+LACD        011100 ..... ..... ..... 01111 011111 @r3
+LAA         011100 ..... ..... ..... 10010 011111 @r3
+LAAD        011100 ..... ..... ..... 10011 011111 @r3
+LAW         011100 ..... ..... ..... 10110 011111 @r3
+LAWD        011100 ..... ..... ..... 10111 011111 @r3
+
+
+# LWX
+# LHX
+# LDX
+# LBUX
+# LWUX
+# LHUX
+# LBX
+LWX         011111 ..... ..... ..... 00000 001010 @r3
+LHX         011111 ..... ..... ..... 00100 001010 @r3
+LDX         011111 ..... ..... ..... 01000 001010 @r3
+LBUX        011111 ..... ..... ..... 00110 001010 @r3
+LWUX        011111 ..... ..... ..... 10000 001010 @r3
+LHUX        011111 ..... ..... ..... 10100 001010 @r3
+LBX         011111 ..... ..... ..... 10110 001010 @r3
\ No newline at end of file
diff --git a/target/mips/tcg/octeon_translate.c b/target/mips/tcg/octeon_translate.c
index e25c4cbaa0..e9ec372ad3 100644
--- a/target/mips/tcg/octeon_translate.c
+++ b/target/mips/tcg/octeon_translate.c
@@ -174,3 +174,284 @@ static bool trans_SEQNEI(DisasContext *ctx, arg_SEQNEI *a)
     }
     return true;
 }
+
+/*
+ * Octeon+
+ *  https://sourceware.org/legacy-ml/binutils/2011-11/msg00085.html
+ */
+static bool trans_SAA(DisasContext *ctx, arg_SAA *a)
+{
+    TCGv t0 = tcg_temp_new();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->base], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    tcg_gen_add_tl(t0, t0, cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->base], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+
+static bool trans_SAAD(DisasContext *ctx, arg_SAAD *a)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->base], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    tcg_gen_add_tl(t0, t0, cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->base], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+
+/*
+ *  Octeon2
+ *   https://chromium.googlesource.com/chromiumos/third_party/gdb/+/refs/heads/master/opcodes/mips-opc.c
+ *   https://github.com/MarvellEmbeddedProcessors/Octeon-Toolchain
+ *   https://bugs.kde.org/show_bug.cgi?id=326444
+ *   https://gcc.gnu.org/legacy-ml/gcc-patches/2011-12/msg01134.html
+ */
+static bool trans_LAI(DisasContext *ctx, arg_LAI *a)
+{
+    TCGv t0 = tcg_temp_new();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    gen_store_gpr(t0, a->rd);
+    tcg_gen_addi_tl(t0, t0, 1);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+
+static bool trans_LAID(DisasContext *ctx, arg_LAID *a)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    gen_store_gpr(t0, a->rd);
+    tcg_gen_addi_tl(t0, t0, 1);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+
+static bool trans_LAD(DisasContext *ctx, arg_LAD *a)
+{
+    TCGv t0 = tcg_temp_new();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    gen_store_gpr(t0, a->rd);
+    tcg_gen_subi_tl(t0, t0, 1);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+
+static bool trans_LADD(DisasContext *ctx, arg_LADD *a)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    gen_store_gpr(t0, a->rd);
+    tcg_gen_subi_tl(t0, t0, 1);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+/* Load Atomic Set Word - LAS; Cavium OCTEON2 */
+static bool trans_LAS(DisasContext *ctx, arg_LAS *a)
+{
+    TCGv t0 = tcg_temp_new();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    gen_store_gpr(t0, a->rd);
+    tcg_gen_movi_tl(t0, 0xffffffff);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+
+    return true;
+}
+/* Load Atomic Set Doubleword - LASD; Cavium OCTEON2 */
+static bool trans_LASD(DisasContext *ctx, arg_LASD *a)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    gen_store_gpr(t0, a->rd);
+    tcg_gen_movi_tl(t0, 0xffffffffffffffffULL);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+/* Load Atomic Clear Word - LAC; Cavium OCTEON2 */
+static bool trans_LAC(DisasContext *ctx, arg_LAC *a)
+{
+    TCGv t0 = tcg_temp_new();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    gen_store_gpr(t0, a->rd);
+    tcg_gen_movi_tl(t0, 0);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+/* Load Atomic Clear Doubleword - LACD; Cavium OCTEON2 */
+static bool trans_LACD(DisasContext *ctx, arg_LACD *a)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    gen_store_gpr(t0, a->rd);
+    tcg_gen_movi_tl(t0, 0);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+
+/* Load Atomic Add Word - LAA; Cavium OCTEON2 */
+static bool trans_LAA(DisasContext *ctx, arg_LAA *a)
+{
+    TCGv t0 = tcg_temp_new();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    gen_store_gpr(t0, a->rd);
+    tcg_gen_add_tl(t0, t0, cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+
+/* Load Atomic Add Doubleword - LAAD; Cavium OCTEON2 */
+static bool trans_LAAD(DisasContext *ctx, arg_LAAD *a)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    gen_store_gpr(t0, a->rd);
+    tcg_gen_add_tl(t0, t0, cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+/* Load Atomic Swap Word - LAW; Cavium OCTEON2 */
+static bool trans_LAW(DisasContext *ctx, arg_LAW *a)
+{
+    TCGv t0 = tcg_temp_new();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    gen_store_gpr(t0, a->rd);
+    tcg_gen_mov_tl(t0, cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+
+static bool trans_LAWD(DisasContext *ctx, arg_LAWD *a)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    gen_store_gpr(t0, a->rd);
+    tcg_gen_mov_tl(t0, cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_st_tl(t0, cpu_gpr[a->rs], ctx->mem_idx, MO_TEUQ |
+                           ctx->default_tcg_memop_mask);
+    return true;
+}
+
+
+static bool trans_LWX(DisasContext *ctx, arg_LWX *a)
+{
+    TCGv t0 = tcg_temp_new();
+    gen_op_addr_add(ctx, t0, cpu_gpr[a->rs], cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_ld_tl(t0, t0, ctx->mem_idx, MO_TESL |
+                           ctx->default_tcg_memop_mask);
+
+    /* On MIPS64, sign-extend the 32-bit word to 64 bits. */
+    tcg_gen_ext32s_tl(cpu_gpr[a->rd], t0);
+    return true;
+}
+
+static bool trans_LHX(DisasContext *ctx, arg_LHX *a)
+{
+    TCGv t0 = tcg_temp_new();
+    gen_op_addr_add(ctx, t0, cpu_gpr[a->rs], cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_ld_tl(t0, t0, ctx->mem_idx, MO_TESW |
+                           ctx->default_tcg_memop_mask);
+
+    /* Sign-extend the 16-bit halfword to the register width. */
+    tcg_gen_ext16s_tl(cpu_gpr[a->rd], t0);
+    return true;
+}
+
+static bool trans_LDX(DisasContext *ctx, arg_LDX *a)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    gen_op_addr_add(ctx, t0, cpu_gpr[a->rs], cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_ld_tl(t0, t0, ctx->mem_idx, MO_TESQ |
+                           ctx->default_tcg_memop_mask);
+    /* Doubleword load: no extension needed. */
+    gen_store_gpr(t0, a->rd);
+    return true;
+}
+
+static bool trans_LBUX(DisasContext *ctx, arg_LBUX *a)
+{
+    TCGv t0 = tcg_temp_new();
+    gen_op_addr_add(ctx, t0, cpu_gpr[a->rs], cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_ld_tl(t0, t0, ctx->mem_idx, MO_UB |
+                           ctx->default_tcg_memop_mask);
+
+    tcg_gen_ext8u_tl(cpu_gpr[a->rd], t0);
+    return true;
+}
+
+static bool trans_LWUX(DisasContext *ctx, arg_LWUX *a)
+{
+    TCGv t0 = tcg_temp_new();
+    gen_op_addr_add(ctx, t0, cpu_gpr[a->rs], cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_ld_tl(t0, t0, ctx->mem_idx, MO_TEUL |
+                           ctx->default_tcg_memop_mask);
+
+    tcg_gen_ext32u_tl(cpu_gpr[a->rd], t0);
+    return true;
+}
+
+static bool trans_LHUX(DisasContext *ctx, arg_LHUX *a)
+{
+    TCGv t0 = tcg_temp_new();
+    gen_op_addr_add(ctx, t0, cpu_gpr[a->rs], cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_ld_tl(t0, t0, ctx->mem_idx, MO_TEUW |
+                           ctx->default_tcg_memop_mask);
+
+    tcg_gen_ext16u_tl(cpu_gpr[a->rd], t0);
+    return true;
+}
+
+static bool trans_LBX(DisasContext *ctx, arg_LBX *a)
+{
+    TCGv t0 = tcg_temp_new();
+    gen_op_addr_add(ctx, t0, cpu_gpr[a->rs], cpu_gpr[a->rt]);
+
+    tcg_gen_qemu_ld_tl(t0, t0, ctx->mem_idx, MO_SB |
+                           ctx->default_tcg_memop_mask);
+
+    tcg_gen_ext8s_tl(cpu_gpr[a->rd], t0);
+    return true;
+}
-- 
2.34.1
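
For reference, the Octeon2 load-atomic translators in the patch share one shape: load the old value from the address in rs, copy it to rd, then store a new value back. The following is a minimal C sketch of that behaviour for the doubleword forms, read from the translator bodies and instruction names in this patch rather than from the Octeon manual (set stores all ones, clear stores zero). The names `la_op` and `la_model64` are hypothetical and not part of QEMU, and the model covers a single CPU only, so it says nothing about atomicity with respect to other cores.

```c
#include <stdint.h>

/* Hypothetical names for illustration; none of this exists in QEMU. */
typedef enum { LA_INC, LA_DEC, LA_SET, LA_CLEAR, LA_ADD, LA_SWAP } la_op;

/*
 * Model of the doubleword forms (LAID, LADD, LASD, LACD, LAAD, LAWD).
 * Returns the old memory value (the value written to rd) and updates
 * *mem with the new value. rt is only used by LA_ADD and LA_SWAP.
 */
static uint64_t la_model64(la_op op, uint64_t *mem, uint64_t rt)
{
    uint64_t old = *mem;

    switch (op) {
    case LA_INC:   *mem = old + 1;     break;  /* LAID */
    case LA_DEC:   *mem = old - 1;     break;  /* LADD */
    case LA_SET:   *mem = UINT64_MAX;  break;  /* LASD: all ones */
    case LA_CLEAR: *mem = 0;           break;  /* LACD: zero */
    case LA_ADD:   *mem = old + rt;    break;  /* LAAD */
    case LA_SWAP:  *mem = rt;          break;  /* LAWD */
    }
    return old;
}
```

A test could execute each instruction under QEMU and compare both rd and the memory doubleword against what this model returns.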
Re: [PATCH] This patch implements several Octeon +/II instructions.
Posted by Philippe Mathieu-Daudé 9 months, 2 weeks ago
Hi,

On 19/1/24 05:56, owl129@126.com wrote:
> From: owl <owl129@126.com>
> 

Thanks for your patch!

> 
> Octeon+
> - SAA
> - SAAD

It seems this could be split into 3 parts: SA*, LA*, and the rest.

> Octeon2
> - LAI
> - LAID
> - LAD
> - LADD
> - LAS
> - LASD
> - LAC
> - LACD
> - LAA
> - LAAD
> - LAW
> - LAWD
> 
> - LWX
> - LHX
> - LDX
> - LBUX
> - LWUX
> - LHUX
> - LBX
> 
> Signed-off-by: owl <owl129@126.com>
> ---
>   target/mips/tcg/octeon.decode      |  35 ++++
>   target/mips/tcg/octeon_translate.c | 281 +++++++++++++++++++++++++++++
>   2 files changed, 316 insertions(+)

How can we test it? Is there any distribution producing a kernel for
Octeon+/2? Per https://github.com/MarvellEmbeddedProcessors/marvell-dpdk
I understand there could be Linux and FreeBSD, is that correct?

Thanks,

Phil.
Re: Re: [PATCH] This patch implements several Octeon +/II instructions.
Posted by owl129 9 months, 2 weeks ago
Hi,

> How can we test it? Is there any distribution producing a kernel for
> Octeon+/2? Per https://github.com/MarvellEmbeddedProcessors/marvell-dpdk
> I understand there could be Linux and FreeBSD, is that correct?

Actually, I don't know how to fully test each instruction.


To the best of my knowledge, the Octeon instruction set is optimized for network security and application processors,
and the latest reference manual is not publicly available. (https://dokumen.tips/documents/cavium-networks-octeon-plus-cn50xx-hardware-2008-cavium-networks-octeon-plus.html?page=1)


I found the instruction specifications in Marvell's toolchain and in Valgrind's VEX IR translation (https://sourceware.org/git/?p=valgrind.git;a=blob;f=VEX/priv/guest_mips_toIR.c;h=1285edad0b83b0f0a6b21fc63d2235d50f94d204;hb=HEAD#l2909).
With this patch I have successfully emulated an ELF binary compiled for Cavium (Marvell) on an x86 host.


Can you help or give me some suggestions about testing?

Best,

owl129
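
One low-level option for testing, short of booting an Octeon kernel, is to hand-assemble instruction words from the bit patterns in octeon.decode and place them in a small test program as raw `.word` data. The sketch below packs a few of the patterns added by this patch. It assumes that `@r3` uses the standard MIPS R-type layout (rs in bits 25:21, rt in bits 20:16, rd in bits 15:11), which this patch does not show. The helper names `enc_r`, `enc_laa`, `enc_lwx` and `enc_saa` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a MIPS R-type style word: op(6) rs(5) rt(5) rd(5) sa(5) funct(6). */
static uint32_t enc_r(uint32_t op, uint32_t rs, uint32_t rt,
                      uint32_t rd, uint32_t sa, uint32_t funct)
{
    assert(op < 64 && rs < 32 && rt < 32 && rd < 32 && sa < 32 && funct < 64);
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct;
}

/* LAA: 011100 rs rt rd 10010 011111 (rs = address, rt = addend, rd = old value). */
static uint32_t enc_laa(uint32_t rd, uint32_t rs, uint32_t rt)
{
    return enc_r(0x1c, rs, rt, rd, 0x12, 0x1f);
}

/* LWX: 011111 rs rt rd 00000 001010 (address = rs + rt, rd = loaded word). */
static uint32_t enc_lwx(uint32_t rd, uint32_t rs, uint32_t rt)
{
    return enc_r(0x1f, rs, rt, rd, 0x00, 0x0a);
}

/* SAA: 011100 base rt 00000 00000 011000 (adds rt to the word at base). */
static uint32_t enc_saa(uint32_t rt, uint32_t base)
{
    return enc_r(0x1c, base, rt, 0, 0, 0x18);
}
```

Printing each result as `.word 0x%08x` gives lines that can be pasted into an assembly test, which avoids depending on an assembler that knows the Octeon mnemonics.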