From: Chunyan Zhang
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Charlie Jenkins, Song Liu, Yu Kuai
Cc: linux-riscv@lists.infradead.org, linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, Chunyan Zhang
Subject: [PATCH V2] raid6: Add RISC-V SIMD syndrome and recovery calculations
Date: Mon, 27 Jan 2025 14:15:29 +0800
Message-Id: <20250127061529.2437012-1-zhangchunyan@iscas.ac.cn>

The assembly is based on the ARM NEON implementation and int.uc, but uses
RISC-V vector instructions to implement the RAID6 syndrome and recovery
calculations.
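
A minimal scalar sketch of the per-byte recurrence that every vector lane
in rvv.c implements (gf2_mul2() and gen_syndrome_byte() are illustrative
names, not part of this patch; the real code keeps the whole stripe in
vector registers and uses vsra.vi/vsll.vi/vand.vx/vxor.vv instead):

#include <stdint.h>
#include <stddef.h>

/* GF(2^8) multiply by 2 with the 0x1d reduction used throughout the patch */
static uint8_t gf2_mul2(uint8_t v)
{
	uint8_t mask = (v & 0x80) ? 0x1d : 0x00;   /* MASK(wq) & NBYTES(0x1d) */

	return (uint8_t)(v << 1) ^ mask;           /* SHLBYTE(wq) ^ mask */
}

/*
 * One byte of P/Q: P is the plain XOR of the data disks, Q is the
 * Reed-Solomon syndrome accumulated from the highest data disk down.
 */
static void gen_syndrome_byte(int data_disks, uint8_t **dptr, size_t off,
			      uint8_t *p, uint8_t *q)
{
	int z, z0 = data_disks - 1;                /* highest data disk */
	uint8_t wp = dptr[z0][off], wq = wp;

	for (z = z0 - 1; z >= 0; z--) {
		uint8_t wd = dptr[z][off];

		wq = gf2_mul2(wq) ^ wd;            /* wq = w1 ^ wd */
		wp ^= wd;                          /* wp ^= wd */
	}
	p[off] = wp;
	q[off] = wq;
}
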
Results on QEMU running with the option "-icount shift=0":

raid6: rvvx1    gen() 1008 MB/s
raid6: rvvx2    gen() 1395 MB/s
raid6: rvvx4    gen() 1584 MB/s
raid6: rvvx8    gen() 1694 MB/s
raid6: int64x8  gen()  113 MB/s
raid6: int64x4  gen()  116 MB/s
raid6: int64x2  gen()  272 MB/s
raid6: int64x1  gen()  229 MB/s
raid6: using algorithm rvvx8 gen() 1694 MB/s
raid6: .... xor() 1000 MB/s, rmw enabled
raid6: using rvv recovery algorithm

[Charlie:
- Fixup vector options]
Signed-off-by: Charlie Jenkins
Signed-off-by: Chunyan Zhang
---
V2:
- Add raid6_rvvx8;
- Address the vector options issue;
- Add .valid callback to raid6_rvv and raid6_recov_rvv;
- Removed unneeded check of crypto_simd_usable();
RFC: https://lore.kernel.org/lkml/20241220114023.667347-1-zhangchunyan@iscas.ac.cn/
---
 include/linux/raid/pq.h |    5 +
 lib/raid6/Makefile      |    1 +
 lib/raid6/algos.c       |    9 +
 lib/raid6/recov_rvv.c   |  234 ++++
 lib/raid6/rvv.c         | 1269 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 1518 insertions(+)
 create mode 100644 lib/raid6/recov_rvv.c
 create mode 100644 lib/raid6/rvv.c

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 98030accf641..72ff44cca864 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -108,6 +108,10 @@ extern const struct raid6_calls raid6_vpermxor4;
 extern const struct raid6_calls raid6_vpermxor8;
 extern const struct raid6_calls raid6_lsx;
 extern const struct raid6_calls raid6_lasx;
+extern const struct raid6_calls raid6_rvvx1;
+extern const struct raid6_calls raid6_rvvx2;
+extern const struct raid6_calls raid6_rvvx4;
+extern const struct raid6_calls raid6_rvvx8;
 
 struct raid6_recov_calls {
 	void (*data2)(int, size_t, int, int, void **);
@@ -125,6 +129,7 @@ extern const struct raid6_recov_calls raid6_recov_s390xc;
 extern const struct raid6_recov_calls raid6_recov_neon;
 extern const struct raid6_recov_calls raid6_recov_lsx;
 extern const struct raid6_recov_calls raid6_recov_lasx;
+extern const struct raid6_recov_calls raid6_recov_rvv;
 
 extern const struct raid6_calls raid6_neonx1;
 extern const struct raid6_calls raid6_neonx2;
diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index 29127dd05d63..5be0a4e60ab1 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -10,6 +10,7 @@ raid6_pq-$(CONFIG_ALTIVEC) += altivec1.o altivec2.o altivec4.o altivec8.o \
 raid6_pq-$(CONFIG_KERNEL_MODE_NEON) += neon.o neon1.o neon2.o neon4.o neon8.o recov_neon.o recov_neon_inner.o
 raid6_pq-$(CONFIG_S390) += s390vx8.o recov_s390xc.o
 raid6_pq-$(CONFIG_LOONGARCH) += loongarch_simd.o recov_loongarch_simd.o
+raid6_pq-$(CONFIG_RISCV_ISA_V) += rvv.o recov_rvv.o
 
 hostprogs += mktables
 
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index cd2e88ee1f14..99980ff5b985 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -80,6 +80,12 @@ const struct raid6_calls * const raid6_algos[] = {
 #ifdef CONFIG_CPU_HAS_LSX
 	&raid6_lsx,
 #endif
+#endif
+#ifdef CONFIG_RISCV_ISA_V
+	&raid6_rvvx1,
+	&raid6_rvvx2,
+	&raid6_rvvx4,
+	&raid6_rvvx8,
 #endif
 	&raid6_intx8,
 	&raid6_intx4,
@@ -115,6 +121,9 @@ const struct raid6_recov_calls *const raid6_recov_algos[] = {
 #ifdef CONFIG_CPU_HAS_LSX
 	&raid6_recov_lsx,
 #endif
+#endif
+#ifdef CONFIG_RISCV_ISA_V
+	&raid6_recov_rvv,
 #endif
 	&raid6_recov_intx1,
 	NULL
diff --git a/lib/raid6/recov_rvv.c b/lib/raid6/recov_rvv.c
new file mode 100644
index 000000000000..db271d2987c6
--- /dev/null
+++ b/lib/raid6/recov_rvv.c
@@ -0,0 +1,234 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2024 Institute of Software,
CAS. + * Author: Chunyan Zhang + */ + +#include +#include +#include +#include + +static int rvv_has_vector(void) +{ + return has_vector(); +} + +static void __raid6_2data_recov_rvv(int bytes, u8 *p, u8 *q, u8 *dp, + u8 *dq, const u8 *pbmul, + const u8 *qmul) +{ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsetvli x0, %[avl], e8, m1, ta, ma\n" + ".option pop\n" + : : + [avl]"r"(16) + ); + + /* + * while ( bytes-- ) { + * uint8_t px, qx, db; + * + * px =3D *p ^ *dp; + * qx =3D qmul[*q ^ *dq]; + * *dq++ =3D db =3D pbmul[px] ^ qx; + * *dp++ =3D db ^ px; + * p++; q++; + * } + */ + while (bytes) { + /* + * v0:px, v1:dp, + * v2:qx, v3:dq, + * v4:vx, v5:vy, + * v6:qm0, v7:qm1, + * v8:pm0, v9:pm1, + * v14:p/qm[vx], v15:p/qm[vy] + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v0, (%[px])\n" + "vle8.v v1, (%[dp])\n" + "vxor.vv v0, v0, v1\n" + "vle8.v v2, (%[qx])\n" + "vle8.v v3, (%[dq])\n" + "vxor.vv v4, v2, v3\n" + "vsrl.vi v5, v4, 4\n" + "vand.vi v4, v4, 0xf\n" + "vle8.v v6, (%[qm0])\n" + "vle8.v v7, (%[qm1])\n" + "vrgather.vv v14, v6, v4\n" /* v14 =3D qm[vx] */ + "vrgather.vv v15, v7, v5\n" /* v15 =3D qm[vy] */ + "vxor.vv v2, v14, v15\n" /* v2 =3D qmul[*q ^ *dq] */ + + "vsrl.vi v5, v0, 4\n" + "vand.vi v4, v0, 0xf\n" + "vle8.v v8, (%[pm0])\n" + "vle8.v v9, (%[pm1])\n" + "vrgather.vv v14, v8, v4\n" /* v14 =3D pm[vx] */ + "vrgather.vv v15, v9, v5\n" /* v15 =3D pm[vy] */ + "vxor.vv v4, v14, v15\n" /* v4 =3D pbmul[px] */ + "vxor.vv v3, v4, v2\n" /* v3 =3D db =3D pbmul[px] ^ qx */ + "vxor.vv v1, v3, v0\n" /* v1 =3D db ^ px; */ + "vse8.v v3, (%[dq])\n" + "vse8.v v1, (%[dp])\n" + ".option pop\n" + : : + [px]"r"(p), + [dp]"r"(dp), + [qx]"r"(q), + [dq]"r"(dq), + [qm0]"r"(qmul), + [qm1]"r"(qmul + 16), + [pm0]"r"(pbmul), + [pm1]"r"(pbmul + 16) + :); + + bytes -=3D 16; + p +=3D 16; + q +=3D 16; + dp +=3D 16; + dq +=3D 16; + } +} + +static void __raid6_datap_recov_rvv(int bytes, uint8_t *p, uint8_t *q, uin= t8_t *dq, + const uint8_t *qmul) +{ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsetvli x0, %[avl], e8, m1, ta, ma\n" + ".option pop\n" + : : + [avl]"r"(16) + ); + + /* + * while (bytes--) { + * *p++ ^=3D *dq =3D qmul[*q ^ *dq]; + * q++; dq++; + * } + */ + while (bytes) { + /* + * v0:vx, v1:vy, + * v2:dq, v3:p, + * v4:qm0, v5:qm1, + * v10:m[vx], v11:m[vy] + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v0, (%[vx])\n" + "vle8.v v2, (%[dq])\n" + "vxor.vv v0, v0, v2\n" + "vsrl.vi v1, v0, 4\n" + "vand.vi v0, v0, 0xf\n" + "vle8.v v4, (%[qm0])\n" + "vle8.v v5, (%[qm1])\n" + "vrgather.vv v10, v4, v0\n" + "vrgather.vv v11, v5, v1\n" + "vxor.vv v0, v10, v11\n" + "vle8.v v1, (%[vy])\n" + "vxor.vv v1, v0, v1\n" + "vse8.v v0, (%[dq])\n" + "vse8.v v1, (%[vy])\n" + ".option pop\n" + : : + [vx]"r"(q), + [vy]"r"(p), + [dq]"r"(dq), + [qm0]"r"(qmul), + [qm1]"r"(qmul + 16) + :); + + bytes -=3D 16; + p +=3D 16; + q +=3D 16; + dq +=3D 16; + } +} + + +static void raid6_2data_recov_rvv(int disks, size_t bytes, int faila, + int failb, void **ptrs) +{ + u8 *p, *q, *dp, *dq; + const u8 *pbmul; /* P multiplier table for B data */ + const u8 *qmul; /* Q multiplier table (for both) */ + + p =3D (u8 *)ptrs[disks - 2]; + q =3D (u8 *)ptrs[disks - 1]; + + /* + * Compute syndrome with zero for the missing data pages + * Use the dead data pages as temporary storage for + * delta p and delta q + */ + dp =3D (u8 *)ptrs[faila]; + ptrs[faila] =3D (void *)raid6_empty_zero_page; + ptrs[disks - 2] =3D dp; + dq =3D (u8 *)ptrs[failb]; + ptrs[failb] =3D (void 
*)raid6_empty_zero_page; + ptrs[disks - 1] =3D dq; + + raid6_call.gen_syndrome(disks, bytes, ptrs); + + /* Restore pointer table */ + ptrs[faila] =3D dp; + ptrs[failb] =3D dq; + ptrs[disks - 2] =3D p; + ptrs[disks - 1] =3D q; + + /* Now, pick the proper data tables */ + pbmul =3D raid6_vgfmul[raid6_gfexi[failb-faila]]; + qmul =3D raid6_vgfmul[raid6_gfinv[raid6_gfexp[faila] ^ + raid6_gfexp[failb]]]; + + kernel_vector_begin(); + __raid6_2data_recov_rvv(bytes, p, q, dp, dq, pbmul, qmul); + kernel_vector_end(); +} + +static void raid6_datap_recov_rvv(int disks, size_t bytes, int faila, + void **ptrs) +{ + u8 *p, *q, *dq; + const u8 *qmul; /* Q multiplier table */ + + p =3D (u8 *)ptrs[disks - 2]; + q =3D (u8 *)ptrs[disks - 1]; + + /* + * Compute syndrome with zero for the missing data page + * Use the dead data page as temporary storage for delta q + */ + dq =3D (u8 *)ptrs[faila]; + ptrs[faila] =3D (void *)raid6_empty_zero_page; + ptrs[disks - 1] =3D dq; + + raid6_call.gen_syndrome(disks, bytes, ptrs); + + /* Restore pointer table */ + ptrs[faila] =3D dq; + ptrs[disks - 1] =3D q; + + /* Now, pick the proper data tables */ + qmul =3D raid6_vgfmul[raid6_gfinv[raid6_gfexp[faila]]]; + + kernel_vector_begin(); + __raid6_datap_recov_rvv(bytes, p, q, dq, qmul); + kernel_vector_end(); +} + +const struct raid6_recov_calls raid6_recov_rvv =3D { + .data2 =3D raid6_2data_recov_rvv, + .datap =3D raid6_datap_recov_rvv, + .valid =3D rvv_has_vector, + .name =3D "rvv", + .priority =3D 1, +}; diff --git a/lib/raid6/rvv.c b/lib/raid6/rvv.c new file mode 100644 index 000000000000..fd0ec33edb1e --- /dev/null +++ b/lib/raid6/rvv.c @@ -0,0 +1,1269 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * RAID-6 syndrome calculation using RISCV vector instructions + * + * Copyright 2024 Institute of Software, CAS. + * Author: Chunyan Zhang + * + * Based on neon.uc: + * Copyright 2002-2004 H. 
Peter Anvin + */ + +#include +#include +#include +#include +#include + +#define NSIZE (riscv_v_vsize / 32) /* NSIZE =3D vlenb */ + +static int rvv_has_vector(void) +{ + return has_vector(); +} + +static void raid6_rvv1_gen_syndrome_real(int disks, unsigned long bytes, v= oid **ptrs) +{ + u8 **dptr =3D (u8 **)ptrs; + int d, z, z0; + u8 *p, *q; + + z0 =3D disks - 3; /* Highest data disk */ + p =3D dptr[z0+1]; /* XOR parity */ + q =3D dptr[z0+2]; /* RS syndrome */ + + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsetvli t0, x0, e8, m1, ta, ma\n" + ".option pop\n" + ); + + /* v0:wp0, v1:wq0, v2:wd0/w20, v3:w10 */ + for (d =3D 0 ; d < bytes ; d +=3D NSIZE*1) { + /* wq$$ =3D wp$$ =3D *(unative_t *)&dptr[z0][d+$$*NSIZE]; */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v0, (%[wp0])\n" + "vle8.v v1, (%[wp0])\n" + ".option pop\n" + : : + [wp0]"r"(&dptr[z0][d+0*NSIZE]) + ); + + for (z =3D z0-1 ; z >=3D 0 ; z--) { + /* + * w2$$ =3D MASK(wq$$); + * w1$$ =3D SHLBYTE(wq$$); + * w2$$ &=3D NBYTES(0x1d); + * w1$$ ^=3D w2$$; + * wd$$ =3D *(unative_t *)&dptr[z][d+$$*NSIZE]; + * wq$$ =3D w1$$ ^ wd$$; + * wp$$ ^=3D wd$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsra.vi v2, v1, 7\n" + "vsll.vi v3, v1, 1\n" + "vand.vx v2, v2, %[x1d]\n" + "vxor.vv v3, v3, v2\n" + "vle8.v v2, (%[wd0])\n" + "vxor.vv v1, v3, v2\n" + "vxor.vv v0, v0, v2\n" + ".option pop\n" + : : + [wd0]"r"(&dptr[z][d+0*NSIZE]), + [x1d]"r"(0x1d) + ); + } + + /* + * *(unative_t *)&p[d+NSIZE*$$] =3D wp$$; + * *(unative_t *)&q[d+NSIZE*$$] =3D wq$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vse8.v v0, (%[wp0])\n" + "vse8.v v1, (%[wq0])\n" + ".option pop\n" + : : + [wp0]"r"(&p[d+NSIZE*0]), + [wq0]"r"(&q[d+NSIZE*0]) + ); + } +} + +static void raid6_rvv1_xor_syndrome_real(int disks, int start, int stop, + unsigned long bytes, void **ptrs) +{ + u8 **dptr =3D (u8 **)ptrs; + u8 *p, *q; + int d, z, z0; + + z0 =3D stop; /* P/Q right side optimization */ + p =3D dptr[disks-2]; /* XOR parity */ + q =3D dptr[disks-1]; /* RS syndrome */ + + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsetvli t0, x0, e8, m1, ta, ma\n" + ".option pop\n" + ); + + /* v0:wp0, v1:wq0, v2:wd0/w20, v3:w10 */ + for (d =3D 0 ; d < bytes ; d +=3D NSIZE*1) { + /* wq$$ =3D wp$$ =3D *(unative_t *)&dptr[z0][d+$$*NSIZE]; */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v0, (%[wp0])\n" + "vle8.v v1, (%[wp0])\n" + ".option pop\n" + : : + [wp0]"r"(&dptr[z0][d+0*NSIZE]) + ); + + /* P/Q data pages */ + for (z =3D z0-1 ; z >=3D start ; z--) { + /* + * w2$$ =3D MASK(wq$$); + * w1$$ =3D SHLBYTE(wq$$); + * w2$$ &=3D NBYTES(0x1d); + * w1$$ ^=3D w2$$; + * wd$$ =3D *(unative_t *)&dptr[z][d+$$*NSIZE]; + * wq$$ =3D w1$$ ^ wd$$; + * wp$$ ^=3D wd$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsra.vi v2, v1, 7\n" + "vsll.vi v3, v1, 1\n" + "vand.vx v2, v2, %[x1d]\n" + "vxor.vv v3, v3, v2\n" + "vle8.v v2, (%[wd0])\n" + "vxor.vv v1, v3, v2\n" + "vxor.vv v0, v0, v2\n" + ".option pop\n" + : : + [wd0]"r"(&dptr[z][d+0*NSIZE]), + [x1d]"r"(0x1d) + ); + } + + /* P/Q left side optimization */ + for (z =3D start-1 ; z >=3D 0 ; z--) { + /* + * w2$$ =3D MASK(wq$$); + * w1$$ =3D SHLBYTE(wq$$); + * w2$$ &=3D NBYTES(0x1d); + * wq$$ =3D w1$$ ^ w2$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsra.vi v2, v1, 7\n" + "vsll.vi v3, v1, 1\n" + "vand.vx v2, v2, %[x1d]\n" + "vxor.vv v1, v3, v2\n" + ".option pop\n" + : : + [x1d]"r"(0x1d) + ); + } + + /* + * *(unative_t 
*)&p[d+NSIZE*$$] ^=3D wp$$; + * *(unative_t *)&q[d+NSIZE*$$] ^=3D wq$$; + * v0:wp0, v1:wq0, v2:p0, v3:q0 + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v2, (%[wp0])\n" + "vle8.v v3, (%[wq0])\n" + "vxor.vv v2, v2, v0\n" + "vxor.vv v3, v3, v1\n" + "vse8.v v2, (%[wp0])\n" + "vse8.v v3, (%[wq0])\n" + ".option pop\n" + : : + [wp0]"r"(&p[d+NSIZE*0]), + [wq0]"r"(&q[d+NSIZE*0]) + ); + } +} + +static void raid6_rvv2_gen_syndrome_real(int disks, unsigned long bytes, v= oid **ptrs) +{ + u8 **dptr =3D (u8 **)ptrs; + int d, z, z0; + u8 *p, *q; + + z0 =3D disks - 3; /* Highest data disk */ + p =3D dptr[z0+1]; /* XOR parity */ + q =3D dptr[z0+2]; /* RS syndrome */ + + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsetvli t0, x0, e8, m1, ta, ma\n" + ".option pop\n" + ); + + /* + * v0:wp0, v1:wq0, v2:wd0/w20, v3:w10 + * v4:wp1, v5:wq1, v6:wd1/w21, v7:w11 + */ + for (d =3D 0 ; d < bytes ; d +=3D NSIZE*2) { + /* wq$$ =3D wp$$ =3D *(unative_t *)&dptr[z0][d+$$*NSIZE]; */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v0, (%[wp0])\n" + "vle8.v v1, (%[wp0])\n" + "vle8.v v4, (%[wp1])\n" + "vle8.v v5, (%[wp1])\n" + ".option pop\n" + : : + [wp0]"r"(&dptr[z0][d+0*NSIZE]), + [wp1]"r"(&dptr[z0][d+1*NSIZE]) + ); + + for (z =3D z0-1 ; z >=3D 0 ; z--) { + /* + * w2$$ =3D MASK(wq$$); + * w1$$ =3D SHLBYTE(wq$$); + * w2$$ &=3D NBYTES(0x1d); + * w1$$ ^=3D w2$$; + * wd$$ =3D *(unative_t *)&dptr[z][d+$$*NSIZE]; + * wq$$ =3D w1$$ ^ wd$$; + * wp$$ ^=3D wd$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsra.vi v2, v1, 7\n" + "vsll.vi v3, v1, 1\n" + "vand.vx v2, v2, %[x1d]\n" + "vxor.vv v3, v3, v2\n" + "vle8.v v2, (%[wd0])\n" + "vxor.vv v1, v3, v2\n" + "vxor.vv v0, v0, v2\n" + + "vsra.vi v6, v5, 7\n" + "vsll.vi v7, v5, 1\n" + "vand.vx v6, v6, %[x1d]\n" + "vxor.vv v7, v7, v6\n" + "vle8.v v6, (%[wd1])\n" + "vxor.vv v5, v7, v6\n" + "vxor.vv v4, v4, v6\n" + ".option pop\n" + : : + [wd0]"r"(&dptr[z][d+0*NSIZE]), + [wd1]"r"(&dptr[z][d+1*NSIZE]), + [x1d]"r"(0x1d) + ); + } + + /* + * *(unative_t *)&p[d+NSIZE*$$] =3D wp$$; + * *(unative_t *)&q[d+NSIZE*$$] =3D wq$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vse8.v v0, (%[wp0])\n" + "vse8.v v1, (%[wq0])\n" + "vse8.v v4, (%[wp1])\n" + "vse8.v v5, (%[wq1])\n" + ".option pop\n" + : : + [wp0]"r"(&p[d+NSIZE*0]), + [wq0]"r"(&q[d+NSIZE*0]), + [wp1]"r"(&p[d+NSIZE*1]), + [wq1]"r"(&q[d+NSIZE*1]) + ); + } +} + +static void raid6_rvv2_xor_syndrome_real(int disks, int start, int stop, + unsigned long bytes, void **ptrs) +{ + u8 **dptr =3D (u8 **)ptrs; + u8 *p, *q; + int d, z, z0; + + z0 =3D stop; /* P/Q right side optimization */ + p =3D dptr[disks-2]; /* XOR parity */ + q =3D dptr[disks-1]; /* RS syndrome */ + + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsetvli t0, x0, e8, m1, ta, ma\n" + ".option pop\n" + ); + + /* + * v0:wp0, v1:wq0, v2:wd0/w20, v3:w10 + * v4:wp1, v5:wq1, v6:wd1/w21, v7:w11 + */ + for (d =3D 0 ; d < bytes ; d +=3D NSIZE*2) { + /* wq$$ =3D wp$$ =3D *(unative_t *)&dptr[z0][d+$$*NSIZE]; */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v0, (%[wp0])\n" + "vle8.v v1, (%[wp0])\n" + "vle8.v v4, (%[wp1])\n" + "vle8.v v5, (%[wp1])\n" + ".option pop\n" + : : + [wp0]"r"(&dptr[z0][d+0*NSIZE]), + [wp1]"r"(&dptr[z0][d+1*NSIZE]) + ); + + /* P/Q data pages */ + for (z =3D z0-1 ; z >=3D start ; z--) { + /* + * w2$$ =3D MASK(wq$$); + * w1$$ =3D SHLBYTE(wq$$); + * w2$$ &=3D NBYTES(0x1d); + * w1$$ ^=3D w2$$; + * wd$$ =3D *(unative_t *)&dptr[z][d+$$*NSIZE]; + * 
wq$$ =3D w1$$ ^ wd$$; + * wp$$ ^=3D wd$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsra.vi v2, v1, 7\n" + "vsll.vi v3, v1, 1\n" + "vand.vx v2, v2, %[x1d]\n" + "vxor.vv v3, v3, v2\n" + "vle8.v v2, (%[wd0])\n" + "vxor.vv v1, v3, v2\n" + "vxor.vv v0, v0, v2\n" + + "vsra.vi v6, v5, 7\n" + "vsll.vi v7, v5, 1\n" + "vand.vx v6, v6, %[x1d]\n" + "vxor.vv v7, v7, v6\n" + "vle8.v v6, (%[wd1])\n" + "vxor.vv v5, v7, v6\n" + "vxor.vv v4, v4, v6\n" + ".option pop\n" + : : + [wd0]"r"(&dptr[z][d+0*NSIZE]), + [wd1]"r"(&dptr[z][d+1*NSIZE]), + [x1d]"r"(0x1d) + ); + } + + /* P/Q left side optimization */ + for (z =3D start-1 ; z >=3D 0 ; z--) { + /* + * w2$$ =3D MASK(wq$$); + * w1$$ =3D SHLBYTE(wq$$); + * w2$$ &=3D NBYTES(0x1d); + * wq$$ =3D w1$$ ^ w2$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsra.vi v2, v1, 7\n" + "vsll.vi v3, v1, 1\n" + "vand.vx v2, v2, %[x1d]\n" + "vxor.vv v1, v3, v2\n" + + "vsra.vi v6, v5, 7\n" + "vsll.vi v7, v5, 1\n" + "vand.vx v6, v6, %[x1d]\n" + "vxor.vv v5, v7, v6\n" + ".option pop\n" + : : + [x1d]"r"(0x1d) + ); + } + + /* + * *(unative_t *)&p[d+NSIZE*$$] ^=3D wp$$; + * *(unative_t *)&q[d+NSIZE*$$] ^=3D wq$$; + * v0:wp0, v1:wq0, v2:p0, v3:q0 + * v4:wp1, v5:wq1, v6:p1, v7:q1 + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v2, (%[wp0])\n" + "vle8.v v3, (%[wq0])\n" + "vxor.vv v2, v2, v0\n" + "vxor.vv v3, v3, v1\n" + "vse8.v v2, (%[wp0])\n" + "vse8.v v3, (%[wq0])\n" + + "vle8.v v6, (%[wp1])\n" + "vle8.v v7, (%[wq1])\n" + "vxor.vv v6, v6, v4\n" + "vxor.vv v7, v7, v5\n" + "vse8.v v6, (%[wp1])\n" + "vse8.v v7, (%[wq1])\n" + ".option pop\n" + : : + [wp0]"r"(&p[d+NSIZE*0]), + [wq0]"r"(&q[d+NSIZE*0]), + [wp1]"r"(&p[d+NSIZE*1]), + [wq1]"r"(&q[d+NSIZE*1]) + ); + } +} + +static void raid6_rvv4_gen_syndrome_real(int disks, unsigned long bytes, v= oid **ptrs) +{ + u8 **dptr =3D (u8 **)ptrs; + int d, z, z0; + u8 *p, *q; + + z0 =3D disks - 3; /* Highest data disk */ + p =3D dptr[z0+1]; /* XOR parity */ + q =3D dptr[z0+2]; /* RS syndrome */ + + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsetvli t0, x0, e8, m1, ta, ma\n" + ".option pop\n" + ); + + /* + * v0:wp0, v1:wq0, v2:wd0/w20, v3:w10 + * v4:wp1, v5:wq1, v6:wd1/w21, v7:w11 + * v8:wp2, v9:wq2, v10:wd2/w22, v11:w12 + * v12:wp3, v13:wq3, v14:wd3/w23, v15:w13 + */ + for (d =3D 0 ; d < bytes ; d +=3D NSIZE*4) { + /* wq$$ =3D wp$$ =3D *(unative_t *)&dptr[z0][d+$$*NSIZE]; */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v0, (%[wp0])\n" + "vle8.v v1, (%[wp0])\n" + "vle8.v v4, (%[wp1])\n" + "vle8.v v5, (%[wp1])\n" + "vle8.v v8, (%[wp2])\n" + "vle8.v v9, (%[wp2])\n" + "vle8.v v12, (%[wp3])\n" + "vle8.v v13, (%[wp3])\n" + ".option pop\n" + : : + [wp0]"r"(&dptr[z0][d+0*NSIZE]), + [wp1]"r"(&dptr[z0][d+1*NSIZE]), + [wp2]"r"(&dptr[z0][d+2*NSIZE]), + [wp3]"r"(&dptr[z0][d+3*NSIZE]) + ); + + for (z =3D z0-1 ; z >=3D 0 ; z--) { + /* + * w2$$ =3D MASK(wq$$); + * w1$$ =3D SHLBYTE(wq$$); + * w2$$ &=3D NBYTES(0x1d); + * w1$$ ^=3D w2$$; + * wd$$ =3D *(unative_t *)&dptr[z][d+$$*NSIZE]; + * wq$$ =3D w1$$ ^ wd$$; + * wp$$ ^=3D wd$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsra.vi v2, v1, 7\n" + "vsll.vi v3, v1, 1\n" + "vand.vx v2, v2, %[x1d]\n" + "vxor.vv v3, v3, v2\n" + "vle8.v v2, (%[wd0])\n" + "vxor.vv v1, v3, v2\n" + "vxor.vv v0, v0, v2\n" + + "vsra.vi v6, v5, 7\n" + "vsll.vi v7, v5, 1\n" + "vand.vx v6, v6, %[x1d]\n" + "vxor.vv v7, v7, v6\n" + "vle8.v v6, (%[wd1])\n" + "vxor.vv v5, v7, v6\n" + "vxor.vv v4, v4, v6\n" + + "vsra.vi v10, 
v9, 7\n" + "vsll.vi v11, v9, 1\n" + "vand.vx v10, v10, %[x1d]\n" + "vxor.vv v11, v11, v10\n" + "vle8.v v10, (%[wd2])\n" + "vxor.vv v9, v11, v10\n" + "vxor.vv v8, v8, v10\n" + + "vsra.vi v14, v13, 7\n" + "vsll.vi v15, v13, 1\n" + "vand.vx v14, v14, %[x1d]\n" + "vxor.vv v15, v15, v14\n" + "vle8.v v14, (%[wd3])\n" + "vxor.vv v13, v15, v14\n" + "vxor.vv v12, v12, v14\n" + ".option pop\n" + : : + [wd0]"r"(&dptr[z][d+0*NSIZE]), + [wd1]"r"(&dptr[z][d+1*NSIZE]), + [wd2]"r"(&dptr[z][d+2*NSIZE]), + [wd3]"r"(&dptr[z][d+3*NSIZE]), + [x1d]"r"(0x1d) + ); + } + + /* + * *(unative_t *)&p[d+NSIZE*$$] =3D wp$$; + * *(unative_t *)&q[d+NSIZE*$$] =3D wq$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vse8.v v0, (%[wp0])\n" + "vse8.v v1, (%[wq0])\n" + "vse8.v v4, (%[wp1])\n" + "vse8.v v5, (%[wq1])\n" + "vse8.v v8, (%[wp2])\n" + "vse8.v v9, (%[wq2])\n" + "vse8.v v12, (%[wp3])\n" + "vse8.v v13, (%[wq3])\n" + ".option pop\n" + : : + [wp0]"r"(&p[d+NSIZE*0]), + [wq0]"r"(&q[d+NSIZE*0]), + [wp1]"r"(&p[d+NSIZE*1]), + [wq1]"r"(&q[d+NSIZE*1]), + [wp2]"r"(&p[d+NSIZE*2]), + [wq2]"r"(&q[d+NSIZE*2]), + [wp3]"r"(&p[d+NSIZE*3]), + [wq3]"r"(&q[d+NSIZE*3]) + ); + } +} + +static void raid6_rvv4_xor_syndrome_real(int disks, int start, int stop, + unsigned long bytes, void **ptrs) +{ + u8 **dptr =3D (u8 **)ptrs; + u8 *p, *q; + int d, z, z0; + + z0 =3D stop; /* P/Q right side optimization */ + p =3D dptr[disks-2]; /* XOR parity */ + q =3D dptr[disks-1]; /* RS syndrome */ + + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsetvli t0, x0, e8, m1, ta, ma\n" + ".option pop\n" + ); + + /* + * v0:wp0, v1:wq0, v2:wd0/w20, v3:w10 + * v4:wp1, v5:wq1, v6:wd1/w21, v7:w11 + * v8:wp2, v9:wq2, v10:wd2/w22, v11:w12 + * v12:wp3, v13:wq3, v14:wd3/w23, v15:w13 + */ + for (d =3D 0 ; d < bytes ; d +=3D NSIZE*4) { + /* wq$$ =3D wp$$ =3D *(unative_t *)&dptr[z0][d+$$*NSIZE]; */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v0, (%[wp0])\n" + "vle8.v v1, (%[wp0])\n" + "vle8.v v4, (%[wp1])\n" + "vle8.v v5, (%[wp1])\n" + "vle8.v v8, (%[wp2])\n" + "vle8.v v9, (%[wp2])\n" + "vle8.v v12, (%[wp3])\n" + "vle8.v v13, (%[wp3])\n" + ".option pop\n" + : : + [wp0]"r"(&dptr[z0][d+0*NSIZE]), + [wp1]"r"(&dptr[z0][d+1*NSIZE]), + [wp2]"r"(&dptr[z0][d+2*NSIZE]), + [wp3]"r"(&dptr[z0][d+3*NSIZE]) + ); + + /* P/Q data pages */ + for (z =3D z0-1 ; z >=3D start ; z--) { + /* + * w2$$ =3D MASK(wq$$); + * w1$$ =3D SHLBYTE(wq$$); + * w2$$ &=3D NBYTES(0x1d); + * w1$$ ^=3D w2$$; + * wd$$ =3D *(unative_t *)&dptr[z][d+$$*NSIZE]; + * wq$$ =3D w1$$ ^ wd$$; + * wp$$ ^=3D wd$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsra.vi v2, v1, 7\n" + "vsll.vi v3, v1, 1\n" + "vand.vx v2, v2, %[x1d]\n" + "vxor.vv v3, v3, v2\n" + "vle8.v v2, (%[wd0])\n" + "vxor.vv v1, v3, v2\n" + "vxor.vv v0, v0, v2\n" + + "vsra.vi v6, v5, 7\n" + "vsll.vi v7, v5, 1\n" + "vand.vx v6, v6, %[x1d]\n" + "vxor.vv v7, v7, v6\n" + "vle8.v v6, (%[wd1])\n" + "vxor.vv v5, v7, v6\n" + "vxor.vv v4, v4, v6\n" + + "vsra.vi v10, v9, 7\n" + "vsll.vi v11, v9, 1\n" + "vand.vx v10, v10, %[x1d]\n" + "vxor.vv v11, v11, v10\n" + "vle8.v v10, (%[wd2])\n" + "vxor.vv v9, v11, v10\n" + "vxor.vv v8, v8, v10\n" + + "vsra.vi v14, v13, 7\n" + "vsll.vi v15, v13, 1\n" + "vand.vx v14, v14, %[x1d]\n" + "vxor.vv v15, v15, v14\n" + "vle8.v v14, (%[wd3])\n" + "vxor.vv v13, v15, v14\n" + "vxor.vv v12, v12, v14\n" + ".option pop\n" + : : + [wd0]"r"(&dptr[z][d+0*NSIZE]), + [wd1]"r"(&dptr[z][d+1*NSIZE]), + [wd2]"r"(&dptr[z][d+2*NSIZE]), + [wd3]"r"(&dptr[z][d+3*NSIZE]), + 
[x1d]"r"(0x1d) + ); + } + + /* P/Q left side optimization */ + for (z =3D start-1 ; z >=3D 0 ; z--) { + /* + * w2$$ =3D MASK(wq$$); + * w1$$ =3D SHLBYTE(wq$$); + * w2$$ &=3D NBYTES(0x1d); + * wq$$ =3D w1$$ ^ w2$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsra.vi v2, v1, 7\n" + "vsll.vi v3, v1, 1\n" + "vand.vx v2, v2, %[x1d]\n" + "vxor.vv v1, v3, v2\n" + + "vsra.vi v6, v5, 7\n" + "vsll.vi v7, v5, 1\n" + "vand.vx v6, v6, %[x1d]\n" + "vxor.vv v5, v7, v6\n" + + "vsra.vi v10, v9, 7\n" + "vsll.vi v11, v9, 1\n" + "vand.vx v10, v10, %[x1d]\n" + "vxor.vv v9, v11, v10\n" + + "vsra.vi v14, v13, 7\n" + "vsll.vi v15, v13, 1\n" + "vand.vx v14, v14, %[x1d]\n" + "vxor.vv v13, v15, v14\n" + ".option pop\n" + : : + [x1d]"r"(0x1d) + ); + } + + /* + * *(unative_t *)&p[d+NSIZE*$$] ^=3D wp$$; + * *(unative_t *)&q[d+NSIZE*$$] ^=3D wq$$; + * v0:wp0, v1:wq0, v2:p0, v3:q0 + * v4:wp1, v5:wq1, v6:p1, v7:q1 + * v8:wp2, v9:wq2, v10:p2, v11:q2 + * v12:wp3, v13:wq3, v14:p3, v15:q3 + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v2, (%[wp0])\n" + "vle8.v v3, (%[wq0])\n" + "vxor.vv v2, v2, v0\n" + "vxor.vv v3, v3, v1\n" + "vse8.v v2, (%[wp0])\n" + "vse8.v v3, (%[wq0])\n" + + "vle8.v v6, (%[wp1])\n" + "vle8.v v7, (%[wq1])\n" + "vxor.vv v6, v6, v4\n" + "vxor.vv v7, v7, v5\n" + "vse8.v v6, (%[wp1])\n" + "vse8.v v7, (%[wq1])\n" + + "vle8.v v10, (%[wp2])\n" + "vle8.v v11, (%[wq2])\n" + "vxor.vv v10, v10, v8\n" + "vxor.vv v11, v11, v9\n" + "vse8.v v10, (%[wp2])\n" + "vse8.v v11, (%[wq2])\n" + + "vle8.v v14, (%[wp3])\n" + "vle8.v v15, (%[wq3])\n" + "vxor.vv v14, v14, v12\n" + "vxor.vv v15, v15, v13\n" + "vse8.v v14, (%[wp3])\n" + "vse8.v v15, (%[wq3])\n" + ".option pop\n" + : : + [wp0]"r"(&p[d+NSIZE*0]), + [wq0]"r"(&q[d+NSIZE*0]), + [wp1]"r"(&p[d+NSIZE*1]), + [wq1]"r"(&q[d+NSIZE*1]), + [wp2]"r"(&p[d+NSIZE*2]), + [wq2]"r"(&q[d+NSIZE*2]), + [wp3]"r"(&p[d+NSIZE*3]), + [wq3]"r"(&q[d+NSIZE*3]) + ); + } +} + +static void raid6_rvv8_gen_syndrome_real(int disks, unsigned long bytes, v= oid **ptrs) +{ + u8 **dptr =3D (u8 **)ptrs; + int d, z, z0; + u8 *p, *q; + + z0 =3D disks - 3; /* Highest data disk */ + p =3D dptr[z0+1]; /* XOR parity */ + q =3D dptr[z0+2]; /* RS syndrome */ + + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsetvli t0, x0, e8, m1, ta, ma\n" + ".option pop\n" + ); + + /* + * v0:wp0, v1:wq0, v2:wd0/w20, v3:w10 + * v4:wp1, v5:wq1, v6:wd1/w21, v7:w11 + * v8:wp2, v9:wq2, v10:wd2/w22, v11:w12 + * v12:wp3, v13:wq3, v14:wd3/w23, v15:w13 + * v16:wp4, v17:wq4, v18:wd4/w24, v19:w14 + * v20:wp5, v21:wq5, v22:wd5/w25, v23:w15 + * v24:wp6, v25:wq6, v26:wd6/w26, v27:w16 + * v28:wp7, v29:wq7, v30:wd7/w27, v31:w17 + */ + for (d =3D 0; d < bytes; d +=3D NSIZE*8) { + /* wq$$ =3D wp$$ =3D *(unative_t *)&dptr[z0][d+$$*NSIZE]; */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v0, (%[wp0])\n" + "vle8.v v1, (%[wp0])\n" + "vle8.v v4, (%[wp1])\n" + "vle8.v v5, (%[wp1])\n" + "vle8.v v8, (%[wp2])\n" + "vle8.v v9, (%[wp2])\n" + "vle8.v v12, (%[wp3])\n" + "vle8.v v13, (%[wp3])\n" + "vle8.v v16, (%[wp4])\n" + "vle8.v v17, (%[wp4])\n" + "vle8.v v20, (%[wp5])\n" + "vle8.v v21, (%[wp5])\n" + "vle8.v v24, (%[wp6])\n" + "vle8.v v25, (%[wp6])\n" + "vle8.v v28, (%[wp7])\n" + "vle8.v v29, (%[wp7])\n" + ".option pop\n" + : : + [wp0]"r"(&dptr[z0][d+0*NSIZE]), + [wp1]"r"(&dptr[z0][d+1*NSIZE]), + [wp2]"r"(&dptr[z0][d+2*NSIZE]), + [wp3]"r"(&dptr[z0][d+3*NSIZE]), + [wp4]"r"(&dptr[z0][d+4*NSIZE]), + [wp5]"r"(&dptr[z0][d+5*NSIZE]), + [wp6]"r"(&dptr[z0][d+6*NSIZE]), + 
[wp7]"r"(&dptr[z0][d+7*NSIZE]) + ); + + for (z =3D z0-1; z >=3D 0; z--) { + /* + * w2$$ =3D MASK(wq$$); + * w1$$ =3D SHLBYTE(wq$$); + * w2$$ &=3D NBYTES(0x1d); + * w1$$ ^=3D w2$$; + * wd$$ =3D *(unative_t *)&dptr[z][d+$$*NSIZE]; + * wq$$ =3D w1$$ ^ wd$$; + * wp$$ ^=3D wd$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsra.vi v2, v1, 7\n" + "vsll.vi v3, v1, 1\n" + "vand.vx v2, v2, %[x1d]\n" + "vxor.vv v3, v3, v2\n" + "vle8.v v2, (%[wd0])\n" + "vxor.vv v1, v3, v2\n" + "vxor.vv v0, v0, v2\n" + + "vsra.vi v6, v5, 7\n" + "vsll.vi v7, v5, 1\n" + "vand.vx v6, v6, %[x1d]\n" + "vxor.vv v7, v7, v6\n" + "vle8.v v6, (%[wd1])\n" + "vxor.vv v5, v7, v6\n" + "vxor.vv v4, v4, v6\n" + + "vsra.vi v10, v9, 7\n" + "vsll.vi v11, v9, 1\n" + "vand.vx v10, v10, %[x1d]\n" + "vxor.vv v11, v11, v10\n" + "vle8.v v10, (%[wd2])\n" + "vxor.vv v9, v11, v10\n" + "vxor.vv v8, v8, v10\n" + + "vsra.vi v14, v13, 7\n" + "vsll.vi v15, v13, 1\n" + "vand.vx v14, v14, %[x1d]\n" + "vxor.vv v15, v15, v14\n" + "vle8.v v14, (%[wd3])\n" + "vxor.vv v13, v15, v14\n" + "vxor.vv v12, v12, v14\n" + + "vsra.vi v18, v17, 7\n" + "vsll.vi v19, v17, 1\n" + "vand.vx v18, v18, %[x1d]\n" + "vxor.vv v19, v19, v18\n" + "vle8.v v18, (%[wd4])\n" + "vxor.vv v17, v19, v18\n" + "vxor.vv v16, v16, v18\n" + + "vsra.vi v22, v21, 7\n" + "vsll.vi v23, v21, 1\n" + "vand.vx v22, v22, %[x1d]\n" + "vxor.vv v23, v23, v22\n" + "vle8.v v22, (%[wd5])\n" + "vxor.vv v21, v23, v22\n" + "vxor.vv v20, v20, v22\n" + + "vsra.vi v26, v25, 7\n" + "vsll.vi v27, v25, 1\n" + "vand.vx v26, v26, %[x1d]\n" + "vxor.vv v27, v27, v26\n" + "vle8.v v26, (%[wd6])\n" + "vxor.vv v25, v27, v26\n" + "vxor.vv v24, v24, v26\n" + + "vsra.vi v30, v29, 7\n" + "vsll.vi v31, v29, 1\n" + "vand.vx v30, v30, %[x1d]\n" + "vxor.vv v31, v31, v30\n" + "vle8.v v30, (%[wd7])\n" + "vxor.vv v29, v31, v30\n" + "vxor.vv v28, v28, v30\n" + ".option pop\n" + : : + [wd0]"r"(&dptr[z][d+0*NSIZE]), + [wd1]"r"(&dptr[z][d+1*NSIZE]), + [wd2]"r"(&dptr[z][d+2*NSIZE]), + [wd3]"r"(&dptr[z][d+3*NSIZE]), + [wd4]"r"(&dptr[z][d+4*NSIZE]), + [wd5]"r"(&dptr[z][d+5*NSIZE]), + [wd6]"r"(&dptr[z][d+6*NSIZE]), + [wd7]"r"(&dptr[z][d+7*NSIZE]), + [x1d]"r"(0x1d) + ); + } + + /* + * *(unative_t *)&p[d+NSIZE*$$] =3D wp$$; + * *(unative_t *)&q[d+NSIZE*$$] =3D wq$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vse8.v v0, (%[wp0])\n" + "vse8.v v1, (%[wq0])\n" + "vse8.v v4, (%[wp1])\n" + "vse8.v v5, (%[wq1])\n" + "vse8.v v8, (%[wp2])\n" + "vse8.v v9, (%[wq2])\n" + "vse8.v v12, (%[wp3])\n" + "vse8.v v13, (%[wq3])\n" + "vse8.v v16, (%[wp4])\n" + "vse8.v v17, (%[wq4])\n" + "vse8.v v20, (%[wp5])\n" + "vse8.v v21, (%[wq5])\n" + "vse8.v v24, (%[wp6])\n" + "vse8.v v25, (%[wq6])\n" + "vse8.v v28, (%[wp7])\n" + "vse8.v v29, (%[wq7])\n" + ".option pop\n" + : : + [wp0]"r"(&p[d+NSIZE*0]), + [wq0]"r"(&q[d+NSIZE*0]), + [wp1]"r"(&p[d+NSIZE*1]), + [wq1]"r"(&q[d+NSIZE*1]), + [wp2]"r"(&p[d+NSIZE*2]), + [wq2]"r"(&q[d+NSIZE*2]), + [wp3]"r"(&p[d+NSIZE*3]), + [wq3]"r"(&q[d+NSIZE*3]), + [wp4]"r"(&p[d+NSIZE*4]), + [wq4]"r"(&q[d+NSIZE*4]), + [wp5]"r"(&p[d+NSIZE*5]), + [wq5]"r"(&q[d+NSIZE*5]), + [wp6]"r"(&p[d+NSIZE*6]), + [wq6]"r"(&q[d+NSIZE*6]), + [wp7]"r"(&p[d+NSIZE*7]), + [wq7]"r"(&q[d+NSIZE*7]) + ); + } +} + +static void raid6_rvv8_xor_syndrome_real(int disks, int start, int stop, + unsigned long bytes, void **ptrs) +{ + u8 **dptr =3D (u8 **)ptrs; + u8 *p, *q; + int d, z, z0; + + z0 =3D stop; /* P/Q right side optimization */ + p =3D dptr[disks-2]; /* XOR parity */ + q =3D dptr[disks-1]; /* RS syndrome */ + + asm volatile ( 
+ ".option push\n" + ".option arch,+v\n" + "vsetvli t0, x0, e8, m1, ta, ma\n" + ".option pop\n" + ); + + /* + * v0:wp0, v1:wq0, v2:wd0/w20, v3:w10 + * v4:wp1, v5:wq1, v6:wd1/w21, v7:w11 + * v8:wp2, v9:wq2, v10:wd2/w22, v11:w12 + * v12:wp3, v13:wq3, v14:wd3/w23, v15:w13 + * v16:wp4, v17:wq4, v18:wd4/w24, v19:w14 + * v20:wp5, v21:wq5, v22:wd5/w25, v23:w15 + * v24:wp6, v25:wq6, v26:wd6/w26, v27:w16 + * v28:wp7, v29:wq7, v30:wd7/w27, v31:w17 + */ + for (d =3D 0; d < bytes; d +=3D NSIZE*8) { + /* wq$$ =3D wp$$ =3D *(unative_t *)&dptr[z0][d+$$*NSIZE]; */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v0, (%[wp0])\n" + "vle8.v v1, (%[wp0])\n" + "vle8.v v4, (%[wp1])\n" + "vle8.v v5, (%[wp1])\n" + "vle8.v v8, (%[wp2])\n" + "vle8.v v9, (%[wp2])\n" + "vle8.v v12, (%[wp3])\n" + "vle8.v v13, (%[wp3])\n" + "vle8.v v16, (%[wp4])\n" + "vle8.v v17, (%[wp4])\n" + "vle8.v v20, (%[wp5])\n" + "vle8.v v21, (%[wp5])\n" + "vle8.v v24, (%[wp6])\n" + "vle8.v v25, (%[wp6])\n" + "vle8.v v28, (%[wp7])\n" + "vle8.v v29, (%[wp7])\n" + ".option pop\n" + : : + [wp0]"r"(&dptr[z0][d+0*NSIZE]), + [wp1]"r"(&dptr[z0][d+1*NSIZE]), + [wp2]"r"(&dptr[z0][d+2*NSIZE]), + [wp3]"r"(&dptr[z0][d+3*NSIZE]), + [wp4]"r"(&dptr[z0][d+4*NSIZE]), + [wp5]"r"(&dptr[z0][d+5*NSIZE]), + [wp6]"r"(&dptr[z0][d+6*NSIZE]), + [wp7]"r"(&dptr[z0][d+7*NSIZE]) + ); + + /* P/Q data pages */ + for (z =3D z0-1; z >=3D start; z--) { + /* + * w2$$ =3D MASK(wq$$); + * w1$$ =3D SHLBYTE(wq$$); + * w2$$ &=3D NBYTES(0x1d); + * w1$$ ^=3D w2$$; + * wd$$ =3D *(unative_t *)&dptr[z][d+$$*NSIZE]; + * wq$$ =3D w1$$ ^ wd$$; + * wp$$ ^=3D wd$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsra.vi v2, v1, 7\n" + "vsll.vi v3, v1, 1\n" + "vand.vx v2, v2, %[x1d]\n" + "vxor.vv v3, v3, v2\n" + "vle8.v v2, (%[wd0])\n" + "vxor.vv v1, v3, v2\n" + "vxor.vv v0, v0, v2\n" + + "vsra.vi v6, v5, 7\n" + "vsll.vi v7, v5, 1\n" + "vand.vx v6, v6, %[x1d]\n" + "vxor.vv v7, v7, v6\n" + "vle8.v v6, (%[wd1])\n" + "vxor.vv v5, v7, v6\n" + "vxor.vv v4, v4, v6\n" + + "vsra.vi v10, v9, 7\n" + "vsll.vi v11, v9, 1\n" + "vand.vx v10, v10, %[x1d]\n" + "vxor.vv v11, v11, v10\n" + "vle8.v v10, (%[wd2])\n" + "vxor.vv v9, v11, v10\n" + "vxor.vv v8, v8, v10\n" + + "vsra.vi v14, v13, 7\n" + "vsll.vi v15, v13, 1\n" + "vand.vx v14, v14, %[x1d]\n" + "vxor.vv v15, v15, v14\n" + "vle8.v v14, (%[wd3])\n" + "vxor.vv v13, v15, v14\n" + "vxor.vv v12, v12, v14\n" + + "vsra.vi v18, v17, 7\n" + "vsll.vi v19, v17, 1\n" + "vand.vx v18, v18, %[x1d]\n" + "vxor.vv v19, v19, v18\n" + "vle8.v v18, (%[wd4])\n" + "vxor.vv v17, v19, v18\n" + "vxor.vv v16, v16, v18\n" + + "vsra.vi v22, v21, 7\n" + "vsll.vi v23, v21, 1\n" + "vand.vx v22, v22, %[x1d]\n" + "vxor.vv v23, v23, v22\n" + "vle8.v v22, (%[wd5])\n" + "vxor.vv v21, v23, v22\n" + "vxor.vv v20, v20, v22\n" + + "vsra.vi v26, v25, 7\n" + "vsll.vi v27, v25, 1\n" + "vand.vx v26, v26, %[x1d]\n" + "vxor.vv v27, v27, v26\n" + "vle8.v v26, (%[wd6])\n" + "vxor.vv v25, v27, v26\n" + "vxor.vv v24, v24, v26\n" + + "vsra.vi v30, v29, 7\n" + "vsll.vi v31, v29, 1\n" + "vand.vx v30, v30, %[x1d]\n" + "vxor.vv v31, v31, v30\n" + "vle8.v v30, (%[wd7])\n" + "vxor.vv v29, v31, v30\n" + "vxor.vv v28, v28, v30\n" + ".option pop\n" + : : + [wd0]"r"(&dptr[z][d+0*NSIZE]), + [wd1]"r"(&dptr[z][d+1*NSIZE]), + [wd2]"r"(&dptr[z][d+2*NSIZE]), + [wd3]"r"(&dptr[z][d+3*NSIZE]), + [wd4]"r"(&dptr[z][d+4*NSIZE]), + [wd5]"r"(&dptr[z][d+5*NSIZE]), + [wd6]"r"(&dptr[z][d+6*NSIZE]), + [wd7]"r"(&dptr[z][d+7*NSIZE]), + [x1d]"r"(0x1d) + ); + } + + /* P/Q left side optimization */ + for 
(z =3D start-1; z >=3D 0; z--) { + /* + * w2$$ =3D MASK(wq$$); + * w1$$ =3D SHLBYTE(wq$$); + * w2$$ &=3D NBYTES(0x1d); + * wq$$ =3D w1$$ ^ w2$$; + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vsra.vi v2, v1, 7\n" + "vsll.vi v3, v1, 1\n" + "vand.vx v2, v2, %[x1d]\n" + "vxor.vv v1, v3, v2\n" + + "vsra.vi v6, v5, 7\n" + "vsll.vi v7, v5, 1\n" + "vand.vx v6, v6, %[x1d]\n" + "vxor.vv v5, v7, v6\n" + + "vsra.vi v10, v9, 7\n" + "vsll.vi v11, v9, 1\n" + "vand.vx v10, v10, %[x1d]\n" + "vxor.vv v9, v11, v10\n" + + "vsra.vi v14, v13, 7\n" + "vsll.vi v15, v13, 1\n" + "vand.vx v14, v14, %[x1d]\n" + "vxor.vv v13, v15, v14\n" + + "vsra.vi v18, v17, 7\n" + "vsll.vi v19, v17, 1\n" + "vand.vx v18, v18, %[x1d]\n" + "vxor.vv v17, v19, v18\n" + + "vsra.vi v22, v21, 7\n" + "vsll.vi v23, v21, 1\n" + "vand.vx v22, v22, %[x1d]\n" + "vxor.vv v21, v23, v22\n" + + "vsra.vi v26, v25, 7\n" + "vsll.vi v27, v25, 1\n" + "vand.vx v26, v26, %[x1d]\n" + "vxor.vv v25, v27, v26\n" + + "vsra.vi v30, v29, 7\n" + "vsll.vi v31, v29, 1\n" + "vand.vx v30, v30, %[x1d]\n" + "vxor.vv v29, v31, v30\n" + ".option pop\n" + : : + [x1d]"r"(0x1d) + ); + } + + /* + * *(unative_t *)&p[d+NSIZE*$$] ^=3D wp$$; + * *(unative_t *)&q[d+NSIZE*$$] ^=3D wq$$; + * v0:wp0, v1:wq0, v2:p0, v3:q0 + * v4:wp1, v5:wq1, v6:p1, v7:q1 + * v8:wp2, v9:wq2, v10:p2, v11:q2 + * v12:wp3, v13:wq3, v14:p3, v15:q3 + * v16:wp4, v17:wq4, v18:p4, v19:q4 + * v20:wp5, v21:wq5, v22:p5, v23:q5 + * v24:wp6, v25:wq6, v26:p6, v27:q6 + * v28:wp7, v29:wq7, v30:p7, v31:q7 + */ + asm volatile ( + ".option push\n" + ".option arch,+v\n" + "vle8.v v2, (%[wp0])\n" + "vle8.v v3, (%[wq0])\n" + "vxor.vv v2, v2, v0\n" + "vxor.vv v3, v3, v1\n" + "vse8.v v2, (%[wp0])\n" + "vse8.v v3, (%[wq0])\n" + + "vle8.v v6, (%[wp1])\n" + "vle8.v v7, (%[wq1])\n" + "vxor.vv v6, v6, v4\n" + "vxor.vv v7, v7, v5\n" + "vse8.v v6, (%[wp1])\n" + "vse8.v v7, (%[wq1])\n" + + "vle8.v v10, (%[wp2])\n" + "vle8.v v11, (%[wq2])\n" + "vxor.vv v10, v10, v8\n" + "vxor.vv v11, v11, v9\n" + "vse8.v v10, (%[wp2])\n" + "vse8.v v11, (%[wq2])\n" + + "vle8.v v14, (%[wp3])\n" + "vle8.v v15, (%[wq3])\n" + "vxor.vv v14, v14, v12\n" + "vxor.vv v15, v15, v13\n" + "vse8.v v14, (%[wp3])\n" + "vse8.v v15, (%[wq3])\n" + + "vle8.v v18, (%[wp4])\n" + "vle8.v v19, (%[wq4])\n" + "vxor.vv v18, v18, v16\n" + "vxor.vv v19, v19, v17\n" + "vse8.v v18, (%[wp4])\n" + "vse8.v v19, (%[wq4])\n" + + "vle8.v v22, (%[wp5])\n" + "vle8.v v23, (%[wq5])\n" + "vxor.vv v22, v22, v20\n" + "vxor.vv v23, v23, v21\n" + "vse8.v v22, (%[wp5])\n" + "vse8.v v23, (%[wq5])\n" + + "vle8.v v26, (%[wp6])\n" + "vle8.v v27, (%[wq6])\n" + "vxor.vv v26, v26, v24\n" + "vxor.vv v27, v27, v25\n" + "vse8.v v26, (%[wp6])\n" + "vse8.v v27, (%[wq6])\n" + + "vle8.v v30, (%[wp7])\n" + "vle8.v v31, (%[wq7])\n" + "vxor.vv v30, v30, v28\n" + "vxor.vv v31, v31, v29\n" + "vse8.v v30, (%[wp7])\n" + "vse8.v v31, (%[wq7])\n" + ".option pop\n" + : : + [wp0]"r"(&p[d+NSIZE*0]), + [wq0]"r"(&q[d+NSIZE*0]), + [wp1]"r"(&p[d+NSIZE*1]), + [wq1]"r"(&q[d+NSIZE*1]), + [wp2]"r"(&p[d+NSIZE*2]), + [wq2]"r"(&q[d+NSIZE*2]), + [wp3]"r"(&p[d+NSIZE*3]), + [wq3]"r"(&q[d+NSIZE*3]), + [wp4]"r"(&p[d+NSIZE*4]), + [wq4]"r"(&q[d+NSIZE*4]), + [wp5]"r"(&p[d+NSIZE*5]), + [wq5]"r"(&q[d+NSIZE*5]), + [wp6]"r"(&p[d+NSIZE*6]), + [wq6]"r"(&q[d+NSIZE*6]), + [wp7]"r"(&p[d+NSIZE*7]), + [wq7]"r"(&q[d+NSIZE*7]) + ); + } +} + +#define RAID6_RVV_WRAPPER(_n) \ + static void raid6_rvv ## _n ## _gen_syndrome(int disks, \ + size_t bytes, void **ptrs) \ + { \ + void raid6_rvv ## _n ## _gen_syndrome_real(int, \ + unsigned long, 
void**); \ + kernel_vector_begin(); \ + raid6_rvv ## _n ## _gen_syndrome_real(disks, \ + (unsigned long)bytes, ptrs); \ + kernel_vector_end(); \ + } \ + static void raid6_rvv ## _n ## _xor_syndrome(int disks, \ + int start, int stop, \ + size_t bytes, void **ptrs) \ + { \ + void raid6_rvv ## _n ## _xor_syndrome_real(int, \ + int, int, unsigned long, void**); \ + kernel_vector_begin(); \ + raid6_rvv ## _n ## _xor_syndrome_real(disks, \ + start, stop, (unsigned long)bytes, ptrs); \ + kernel_vector_end(); \ + } \ + struct raid6_calls const raid6_rvvx ## _n =3D { \ + raid6_rvv ## _n ## _gen_syndrome, \ + raid6_rvv ## _n ## _xor_syndrome, \ + rvv_has_vector, \ + "rvvx" #_n, \ + 0 \ + } + +RAID6_RVV_WRAPPER(1); +RAID6_RVV_WRAPPER(2); +RAID6_RVV_WRAPPER(4); +RAID6_RVV_WRAPPER(8); --=20 2.34.1