From nobody Wed Dec 17 12:18:38 2025
From: Jerry Shih
To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org
Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: [PATCH v2 01/13] RISC-V: add helper function to read the vector VLEN
Date: Mon, 27 Nov 2023 15:06:51 +0800
Message-Id: <20231127070703.1697-2-jerry.shih@sifive.com>

From: Heiko Stuebner

VLEN describes the length of each vector register, and some instructions need a specific minimal VLEN to work correctly.

The vector code already includes a variable riscv_v_vsize, filled during boot, that contains the value of "32 vector registers with vlenb length". vlenb is the value contained in the CSR_VLENB register and represents "VLEN / 8".

So add riscv_vector_vlen() to return the actual VLEN value for in-kernel users that need to check the available VLEN.

Signed-off-by: Heiko Stuebner
Signed-off-by: Jerry Shih
Reviewed-by: Eric Biggers
---
 arch/riscv/include/asm/vector.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 9fb2dea66abd..1fd3e5510b64 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -244,4 +244,15 @@ void kernel_vector_allow_preemption(void);
 #define kernel_vector_allow_preemption() do {} while (0)
 #endif
 
+/*
+ * Return the implementation's vlen value.
+ *
+ * riscv_v_vsize contains the value of "32 vector registers with vlenb length"
+ * so rebuild the vlen value in bits from it.
+ */
+static inline int riscv_vector_vlen(void)
+{
+	return riscv_v_vsize / 32 * 8;
+}
+
 #endif /* ! __ASM_RISCV_VECTOR_H */
-- 
2.28.0
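
A minimal sketch of how in-kernel users are expected to consume the new helper (the 128-bit floor matches the Zvk*-based algorithms added later in this series; the wrapper name below is illustrative, not part of the patch):

  #include <asm/vector.h>

  /*
   * Illustrative gate for a vector-crypto code path: riscv_v_vsize holds
   * "32 vector registers * vlenb bytes", so riscv_vector_vlen() converts
   * it back to VLEN in bits as riscv_v_vsize / 32 * 8.
   */
  static bool example_vlen_at_least_128(void)
  {
          return riscv_vector_vlen() >= 128;
  }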
From nobody Wed Dec 17 12:18:38 2025
From: Jerry Shih
To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org
Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: [PATCH v2 02/13] RISC-V: hook new crypto subdir into build-system
Date: Mon, 27 Nov 2023 15:06:52 +0800
Message-Id: <20231127070703.1697-3-jerry.shih@sifive.com>

From: Heiko Stuebner

Create a crypto subdirectory for the accelerated cryptography routines being added and hook it into the riscv Kbuild and the main crypto Kconfig.
Signed-off-by: Heiko Stuebner
Signed-off-by: Jerry Shih
Reviewed-by: Eric Biggers
---
 arch/riscv/Kbuild          | 1 +
 arch/riscv/crypto/Kconfig  | 5 +++++
 arch/riscv/crypto/Makefile | 4 ++++
 crypto/Kconfig             | 3 +++
 4 files changed, 13 insertions(+)
 create mode 100644 arch/riscv/crypto/Kconfig
 create mode 100644 arch/riscv/crypto/Makefile

diff --git a/arch/riscv/Kbuild b/arch/riscv/Kbuild
index d25ad1c19f88..2c585f7a0b6e 100644
--- a/arch/riscv/Kbuild
+++ b/arch/riscv/Kbuild
@@ -2,6 +2,7 @@
 
 obj-y += kernel/ mm/ net/
 obj-$(CONFIG_BUILTIN_DTB) += boot/dts/
+obj-$(CONFIG_CRYPTO) += crypto/
 obj-y += errata/
 obj-$(CONFIG_KVM) += kvm/
 
diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
new file mode 100644
index 000000000000..10d60edc0110
--- /dev/null
+++ b/arch/riscv/crypto/Kconfig
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
+
+endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
new file mode 100644
index 000000000000..b3b6332c9f6d
--- /dev/null
+++ b/arch/riscv/crypto/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# linux/arch/riscv/crypto/Makefile
+#
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 650b1b3620d8..c7b23d2c58e4 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1436,6 +1436,9 @@ endif
 if PPC
 source "arch/powerpc/crypto/Kconfig"
 endif
+if RISCV
+source "arch/riscv/crypto/Kconfig"
+endif
 if S390
 source "arch/s390/crypto/Kconfig"
 endif
-- 
2.28.0
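
The new directory starts out empty; the algorithm patches that follow populate it. As a sketch of the shape those modules take (identifiers and header choices follow the AES patch later in this series and are assumptions here, shown only for illustration), each one probes the required extensions and the vector length at init time before registering with the crypto API:

  #include <linux/module.h>
  #include <linux/crypto.h>
  #include <asm/hwcap.h>
  #include <asm/vector.h>

  /* Illustrative init gate; the algorithm definition comes from a later patch. */
  extern struct crypto_alg riscv64_aes_alg_zvkned;

  static int __init riscv64_aes_mod_init(void)
  {
          if (riscv_isa_extension_available(NULL, ZVKNED) &&
              riscv_vector_vlen() >= 128)
                  return crypto_register_alg(&riscv64_aes_alg_zvkned);

          return -ENODEV;
  }
  module_init(riscv64_aes_mod_init);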
From nobody Wed Dec 17 12:18:38 2025
From: Jerry Shih
To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org
Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: [PATCH v2 03/13] RISC-V: crypto: add OpenSSL perl module for vector instructions
Date: Mon, 27 Nov 2023 15:06:53 +0800
Message-Id: <20231127070703.1697-4-jerry.shih@sifive.com>

OpenSSL has some RISC-V vector cryptography implementations which could be reused for the kernel. These implementations use a number of perl helpers for handling vector and vector-crypto-extension instructions. This patch takes these perl helpers from OpenSSL (openssl/openssl#21923). The unused scalar crypto instructions in the original perl module are skipped.

Co-developed-by: Christoph Müllner
Signed-off-by: Christoph Müllner
Co-developed-by: Heiko Stuebner
Signed-off-by: Heiko Stuebner
Co-developed-by: Phoebe Chen
Signed-off-by: Phoebe Chen
Signed-off-by: Jerry Shih
---
 arch/riscv/crypto/riscv.pm | 828 +++++++++++++++++++++++++++++++++++++
 1 file changed, 828 insertions(+)
 create mode 100644 arch/riscv/crypto/riscv.pm

diff --git a/arch/riscv/crypto/riscv.pm b/arch/riscv/crypto/riscv.pm
new file mode 100644
index 000000000000..e188f7476e3e
--- /dev/null
+++ b/arch/riscv/crypto/riscv.pm
@@ -0,0 +1,828 @@
+#! /usr/bin/env perl
+# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause
+#
+# This file is dual-licensed, meaning that you can use it under your
+# choice of either of the following two licenses:
+#
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You can obtain
+# a copy in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+#
+# or
+#
+# Copyright (c) 2023, Christoph Müllner
+# Copyright (c) 2023, Jerry Shih
+# Copyright (c) 2023, Phoebe Chen
+# All rights reserved.
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +use strict; +use warnings; + +# Set $have_stacktrace to 1 if we have Devel::StackTrace +my $have_stacktrace =3D 0; +if (eval {require Devel::StackTrace;1;}) { + $have_stacktrace =3D 1; +} + +my @regs =3D map("x$_",(0..31)); +# Mapping from the RISC-V psABI ABI mnemonic names to the register number. +my @regaliases =3D ('zero','ra','sp','gp','tp','t0','t1','t2','s0','s1', + map("a$_",(0..7)), + map("s$_",(2..11)), + map("t$_",(3..6)) +); + +my %reglookup; +@reglookup{@regs} =3D @regs; +@reglookup{@regaliases} =3D @regs; + +# Takes a register name, possibly an alias, and converts it to a register = index +# from 0 to 31 +sub read_reg { + my $reg =3D lc shift; + if (!exists($reglookup{$reg})) { + my $trace =3D ""; + if ($have_stacktrace) { + $trace =3D Devel::StackTrace->new->as_string; + } + die("Unknown register ".$reg."\n".$trace); + } + my $regstr =3D $reglookup{$reg}; + if (!($regstr =3D~ /^x([0-9]+)$/)) { + my $trace =3D ""; + if ($have_stacktrace) { + $trace =3D Devel::StackTrace->new->as_string; + } + die("Could not process register ".$reg."\n".$trace); + } + return $1; +} + +# Read the sew setting(8, 16, 32 and 64) and convert to vsew encoding. +sub read_sew { + my $sew_setting =3D shift; + + if ($sew_setting eq "e8") { + return 0; + } elsif ($sew_setting eq "e16") { + return 1; + } elsif ($sew_setting eq "e32") { + return 2; + } elsif ($sew_setting eq "e64") { + return 3; + } else { + my $trace =3D ""; + if ($have_stacktrace) { + $trace =3D Devel::StackTrace->new->as_string; + } + die("Unsupported SEW setting:".$sew_setting."\n".$trace); + } +} + +# Read the LMUL settings and convert to vlmul encoding. 
+sub read_lmul { + my $lmul_setting =3D shift; + + if ($lmul_setting eq "mf8") { + return 5; + } elsif ($lmul_setting eq "mf4") { + return 6; + } elsif ($lmul_setting eq "mf2") { + return 7; + } elsif ($lmul_setting eq "m1") { + return 0; + } elsif ($lmul_setting eq "m2") { + return 1; + } elsif ($lmul_setting eq "m4") { + return 2; + } elsif ($lmul_setting eq "m8") { + return 3; + } else { + my $trace =3D ""; + if ($have_stacktrace) { + $trace =3D Devel::StackTrace->new->as_string; + } + die("Unsupported LMUL setting:".$lmul_setting."\n".$trace); + } +} + +# Read the tail policy settings and convert to vta encoding. +sub read_tail_policy { + my $tail_setting =3D shift; + + if ($tail_setting eq "ta") { + return 1; + } elsif ($tail_setting eq "tu") { + return 0; + } else { + my $trace =3D ""; + if ($have_stacktrace) { + $trace =3D Devel::StackTrace->new->as_string; + } + die("Unsupported tail policy setting:".$tail_setting."\n".$trace); + } +} + +# Read the mask policy settings and convert to vma encoding. +sub read_mask_policy { + my $mask_setting =3D shift; + + if ($mask_setting eq "ma") { + return 1; + } elsif ($mask_setting eq "mu") { + return 0; + } else { + my $trace =3D ""; + if ($have_stacktrace) { + $trace =3D Devel::StackTrace->new->as_string; + } + die("Unsupported mask policy setting:".$mask_setting."\n".$trace); + } +} + +my @vregs =3D map("v$_",(0..31)); +my %vreglookup; +@vreglookup{@vregs} =3D @vregs; + +sub read_vreg { + my $vreg =3D lc shift; + if (!exists($vreglookup{$vreg})) { + my $trace =3D ""; + if ($have_stacktrace) { + $trace =3D Devel::StackTrace->new->as_string; + } + die("Unknown vector register ".$vreg."\n".$trace); + } + if (!($vreg =3D~ /^v([0-9]+)$/)) { + my $trace =3D ""; + if ($have_stacktrace) { + $trace =3D Devel::StackTrace->new->as_string; + } + die("Could not process vector register ".$vreg."\n".$trace); + } + return $1; +} + +# Read the vm settings and convert to mask encoding. +sub read_mask_vreg { + my $vreg =3D shift; + # The default value is unmasked. + my $mask_bit =3D 1; + + if (defined($vreg)) { + my $reg_id =3D read_vreg $vreg; + if ($reg_id =3D=3D 0) { + $mask_bit =3D 0; + } else { + my $trace =3D ""; + if ($have_stacktrace) { + $trace =3D Devel::StackTrace->new->as_string; + } + die("The ".$vreg." 
is not the mask register v0.\n".$trace); + } + } + return $mask_bit; +} + +# Vector instructions + +sub vadd_vv { + # vadd.vv vd, vs2, vs1, vm + my $template =3D 0b000000_0_00000_00000_000_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15)= | ($vd << 7)); +} + +sub vadd_vx { + # vadd.vx vd, vs2, rs1, vm + my $template =3D 0b000000_0_00000_00000_100_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15)= | ($vd << 7)); +} + +sub vsub_vv { + # vsub.vv vd, vs2, vs1, vm + my $template =3D 0b000010_0_00000_00000_000_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15)= | ($vd << 7)); +} + +sub vsub_vx { + # vsub.vx vd, vs2, rs1, vm + my $template =3D 0b000010_0_00000_00000_100_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15)= | ($vd << 7)); +} + +sub vid_v { + # vid.v vd + my $template =3D 0b0101001_00000_10001_010_00000_1010111; + my $vd =3D read_vreg shift; + return ".word ".($template | ($vd << 7)); +} + +sub viota_m { + # viota.m vd, vs2, vm + my $template =3D 0b010100_0_00000_10000_010_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7)); +} + +sub vle8_v { + # vle8.v vd, (rs1), vm + my $template =3D 0b000000_0_00000_00000_000_00000_0000111; + my $vd =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7)); +} + +sub vle32_v { + # vle32.v vd, (rs1), vm + my $template =3D 0b000000_0_00000_00000_110_00000_0000111; + my $vd =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7)); +} + +sub vle64_v { + # vle64.v vd, (rs1) + my $template =3D 0b0000001_00000_00000_111_00000_0000111; + my $vd =3D read_vreg shift; + my $rs1 =3D read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($vd << 7)); +} + +sub vlse32_v { + # vlse32.v vd, (rs1), rs2 + my $template =3D 0b0000101_00000_00000_110_00000_0000111; + my $vd =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $rs2 =3D read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vlsseg_nf_e32_v { + # vlssege32.v vd, (rs1), rs2 + my $template =3D 0b0000101_00000_00000_110_00000_0000111; + my $nf =3D shift; + $nf -=3D 1; + my $vd =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $rs2 =3D read_reg shift; + return ".word ".($template | ($nf << 29) | ($rs2 << 20) | ($rs1 << 15)= | ($vd << 7)); +} + +sub vlse64_v { + # vlse64.v vd, (rs1), rs2 + my $template =3D 0b0000101_00000_00000_111_00000_0000111; + my $vd =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $rs2 =3D read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vluxei8_v { + # vluxei8.v vd, (rs1), vs2, vm + my $template 
=3D 0b000001_0_00000_00000_000_00000_0000111; + my $vd =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $vs2 =3D read_vreg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15)= | ($vd << 7)); +} + +sub vmerge_vim { + # vmerge.vim vd, vs2, imm, v0 + my $template =3D 0b0101110_00000_00000_011_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $imm =3D shift; + return ".word ".($template | ($vs2 << 20) | ($imm << 15) | ($vd << 7)); +} + +sub vmerge_vvm { + # vmerge.vvm vd vs2 vs1 + my $template =3D 0b0101110_00000_00000_000_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)) +} + +sub vmseq_vi { + # vmseq.vi vd vs1, imm + my $template =3D 0b0110001_00000_00000_011_00000_1010111; + my $vd =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + my $imm =3D shift; + return ".word ".($template | ($vs1 << 20) | ($imm << 15) | ($vd << 7)) +} + +sub vmsgtu_vx { + # vmsgtu.vx vd vs2, rs1, vm + my $template =3D 0b011110_0_00000_00000_100_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15)= | ($vd << 7)) +} + +sub vmv_v_i { + # vmv.v.i vd, imm + my $template =3D 0b0101111_00000_00000_011_00000_1010111; + my $vd =3D read_vreg shift; + my $imm =3D shift; + return ".word ".($template | ($imm << 15) | ($vd << 7)); +} + +sub vmv_v_x { + # vmv.v.x vd, rs1 + my $template =3D 0b0101111_00000_00000_100_00000_1010111; + my $vd =3D read_vreg shift; + my $rs1 =3D read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($vd << 7)); +} + +sub vmv_v_v { + # vmv.v.v vd, vs1 + my $template =3D 0b0101111_00000_00000_000_00000_1010111; + my $vd =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + return ".word ".($template | ($vs1 << 15) | ($vd << 7)); +} + +sub vor_vv_v0t { + # vor.vv vd, vs2, vs1, v0.t + my $template =3D 0b0010100_00000_00000_000_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vse8_v { + # vse8.v vd, (rs1), vm + my $template =3D 0b000000_0_00000_00000_000_00000_0100111; + my $vd =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7)); +} + +sub vse32_v { + # vse32.v vd, (rs1), vm + my $template =3D 0b000000_0_00000_00000_110_00000_0100111; + my $vd =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7)); +} + +sub vssseg_nf_e32_v { + # vsssege32.v vs3, (rs1), rs2 + my $template =3D 0b0000101_00000_00000_110_00000_0100111; + my $nf =3D shift; + $nf -=3D 1; + my $vs3 =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $rs2 =3D read_reg shift; + return ".word ".($template | ($nf << 29) | ($rs2 << 20) | ($rs1 << 15)= | ($vs3 << 7)); +} + +sub vsuxei8_v { + # vsuxei8.v vs3, (rs1), vs2, vm + my $template =3D 0b000001_0_00000_00000_000_00000_0100111; + my $vs3 =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $vs2 =3D read_vreg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) = | ($vs3 << 7)); +} + +sub 
vse64_v { + # vse64.v vd, (rs1) + my $template =3D 0b0000001_00000_00000_111_00000_0100111; + my $vd =3D read_vreg shift; + my $rs1 =3D read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($vd << 7)); +} + +sub vsetivli__x0_2_e64_m1_tu_mu { + # vsetivli x0, 2, e64, m1, tu, mu + return ".word 0xc1817057"; +} + +sub vsetivli__x0_4_e32_m1_tu_mu { + # vsetivli x0, 4, e32, m1, tu, mu + return ".word 0xc1027057"; +} + +sub vsetivli__x0_4_e64_m1_tu_mu { + # vsetivli x0, 4, e64, m1, tu, mu + return ".word 0xc1827057"; +} + +sub vsetivli__x0_8_e32_m1_tu_mu { + # vsetivli x0, 8, e32, m1, tu, mu + return ".word 0xc1047057"; +} + +sub vsetvli { + # vsetvli rd, rs1, vtypei + my $template =3D 0b0_00000000000_00000_111_00000_1010111; + my $rd =3D read_reg shift; + my $rs1 =3D read_reg shift; + my $sew =3D read_sew shift; + my $lmul =3D read_lmul shift; + my $tail_policy =3D read_tail_policy shift; + my $mask_policy =3D read_mask_policy shift; + my $vtypei =3D ($mask_policy << 7) | ($tail_policy << 6) | ($sew << 3)= | $lmul; + + return ".word ".($template | ($vtypei << 20) | ($rs1 << 15) | ($rd << = 7)); +} + +sub vsetivli { + # vsetvli rd, uimm, vtypei + my $template =3D 0b11_0000000000_00000_111_00000_1010111; + my $rd =3D read_reg shift; + my $uimm =3D shift; + my $sew =3D read_sew shift; + my $lmul =3D read_lmul shift; + my $tail_policy =3D read_tail_policy shift; + my $mask_policy =3D read_mask_policy shift; + my $vtypei =3D ($mask_policy << 7) | ($tail_policy << 6) | ($sew << 3)= | $lmul; + + return ".word ".($template | ($vtypei << 20) | ($uimm << 15) | ($rd <<= 7)); +} + +sub vslidedown_vi { + # vslidedown.vi vd, vs2, uimm + my $template =3D 0b0011111_00000_00000_011_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $uimm =3D shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)= ); +} + +sub vslidedown_vx { + # vslidedown.vx vd, vs2, rs1 + my $template =3D 0b0011111_00000_00000_100_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $rs1 =3D read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vslideup_vi_v0t { + # vslideup.vi vd, vs2, uimm, v0.t + my $template =3D 0b0011100_00000_00000_011_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $uimm =3D shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)= ); +} + +sub vslideup_vi { + # vslideup.vi vd, vs2, uimm + my $template =3D 0b0011101_00000_00000_011_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $uimm =3D shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)= ); +} + +sub vsll_vi { + # vsll.vi vd, vs2, uimm, vm + my $template =3D 0b1001011_00000_00000_011_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $uimm =3D shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)= ); +} + +sub vsrl_vx { + # vsrl.vx vd, vs2, rs1 + my $template =3D 0b1010001_00000_00000_100_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $rs1 =3D read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vsse32_v { + # vse32.v vs3, (rs1), rs2 + my $template =3D 0b0000101_00000_00000_110_00000_0100111; + my $vs3 =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $rs2 =3D read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7)= ); +} + +sub vsse64_v { + # 
vsse64.v vs3, (rs1), rs2 + my $template =3D 0b0000101_00000_00000_111_00000_0100111; + my $vs3 =3D read_vreg shift; + my $rs1 =3D read_reg shift; + my $rs2 =3D read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7)= ); +} + +sub vxor_vv_v0t { + # vxor.vv vd, vs2, vs1, v0.t + my $template =3D 0b0010110_00000_00000_000_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vxor_vv { + # vxor.vv vd, vs2, vs1 + my $template =3D 0b0010111_00000_00000_000_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vzext_vf2 { + # vzext.vf2 vd, vs2, vm + my $template =3D 0b010010_0_00000_00110_010_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7)); +} + +# Vector crypto instructions + +## Zvbb and Zvkb instructions +## +## vandn (also in zvkb) +## vbrev +## vbrev8 (also in zvkb) +## vrev8 (also in zvkb) +## vclz +## vctz +## vcpop +## vrol (also in zvkb) +## vror (also in zvkb) +## vwsll + +sub vbrev8_v { + # vbrev8.v vd, vs2, vm + my $template =3D 0b010010_0_00000_01000_010_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7)); +} + +sub vrev8_v { + # vrev8.v vd, vs2, vm + my $template =3D 0b010010_0_00000_01001_010_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7)); +} + +sub vror_vi { + # vror.vi vd, vs2, uimm + my $template =3D 0b01010_0_1_00000_00000_011_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $uimm =3D shift; + my $uimm_i5 =3D $uimm >> 5; + my $uimm_i4_0 =3D $uimm & 0b11111; + + return ".word ".($template | ($uimm_i5 << 26) | ($vs2 << 20) | ($uimm_= i4_0 << 15) | ($vd << 7)); +} + +sub vwsll_vv { + # vwsll.vv vd, vs2, vs1, vm + my $template =3D 0b110101_0_00000_00000_000_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + my $vm =3D read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15)= | ($vd << 7)); +} + +## Zvbc instructions + +sub vclmulh_vx { + # vclmulh.vx vd, vs2, rs1 + my $template =3D 0b0011011_00000_00000_110_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $rs1 =3D read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vclmul_vx_v0t { + # vclmul.vx vd, vs2, rs1, v0.t + my $template =3D 0b0011000_00000_00000_110_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $rs1 =3D read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vclmul_vx { + # vclmul.vx vd, vs2, rs1 + my $template =3D 0b0011001_00000_00000_110_00000_1010111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $rs1 =3D read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +## Zvkg instructions + +sub vghsh_vv { + # vghsh.vv vd, vs2, vs1 + my $template =3D 0b1011001_00000_00000_010_00000_1110111; + 
my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vgmul_vv { + # vgmul.vv vd, vs2 + my $template =3D 0b1010001_00000_10001_010_00000_1110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +## Zvkned instructions + +sub vaesdf_vs { + # vaesdf.vs vd, vs2 + my $template =3D 0b101001_1_00000_00001_010_00000_1110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vaesdm_vs { + # vaesdm.vs vd, vs2 + my $template =3D 0b101001_1_00000_00000_010_00000_1110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vaesef_vs { + # vaesef.vs vd, vs2 + my $template =3D 0b101001_1_00000_00011_010_00000_1110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vaesem_vs { + # vaesem.vs vd, vs2 + my $template =3D 0b101001_1_00000_00010_010_00000_1110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vaeskf1_vi { + # vaeskf1.vi vd, vs2, uimmm + my $template =3D 0b100010_1_00000_00000_010_00000_1110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $uimm =3D shift; + return ".word ".($template | ($uimm << 15) | ($vs2 << 20) | ($vd << 7)= ); +} + +sub vaeskf2_vi { + # vaeskf2.vi vd, vs2, uimm + my $template =3D 0b101010_1_00000_00000_010_00000_1110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $uimm =3D shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)= ); +} + +sub vaesz_vs { + # vaesz.vs vd, vs2 + my $template =3D 0b101001_1_00000_00111_010_00000_1110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +## Zvknha and Zvknhb instructions + +sub vsha2ms_vv { + # vsha2ms.vv vd, vs2, vs1 + my $template =3D 0b1011011_00000_00000_010_00000_1110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7)); +} + +sub vsha2ch_vv { + # vsha2ch.vv vd, vs2, vs1 + my $template =3D 0b101110_10000_00000_001_00000_01110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7)); +} + +sub vsha2cl_vv { + # vsha2cl.vv vd, vs2, vs1 + my $template =3D 0b101111_10000_00000_001_00000_01110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $vs1 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7)); +} + +## Zvksed instructions + +sub vsm4k_vi { + # vsm4k.vi vd, vs2, uimm + my $template =3D 0b1000011_00000_00000_010_00000_1110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + my $uimm =3D shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)= ); +} + +sub vsm4r_vs { + # vsm4r.vs vd, vs2 + my $template =3D 0b1010011_00000_10000_010_00000_1110111; + my $vd =3D read_vreg shift; + my $vs2 =3D read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +## zvksh instructions + +sub vsm3c_vi { + # vsm3c.vi vd, vs2, uimm + my $template =3D 
0b1010111_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsm3me_vv {
+    # vsm3me.vv vd, vs2, vs1
+    my $template = 0b1000001_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+1;
-- 
2.28.0
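
To make the encoding scheme concrete: every helper in riscv.pm ORs the operand register indices into a fixed 32-bit opcode template and emits the result as a raw .word, so the generated assembly does not require a vector-crypto-aware assembler. A standalone C sketch of the arithmetic behind vadd_vv (the template and shifts mirror the module above; the program itself is only an illustration):

  #include <stdint.h>
  #include <stdio.h>

  /* Same math as riscv.pm's vadd_vv: funct6=000000, OPIVV funct3, opcode 0x57. */
  static uint32_t encode_vadd_vv(uint32_t vd, uint32_t vs2, uint32_t vs1,
                                 uint32_t vm /* 1 = unmasked */)
  {
          const uint32_t template = 0x00000057; /* 0b000000_0_00000_00000_000_00000_1010111 */

          return template | (vm << 25) | (vs2 << 20) | (vs1 << 15) | (vd << 7);
  }

  int main(void)
  {
          /* vadd.vv v1, v2, v3 (unmasked) -> ".word 0x022180d7" */
          printf(".word 0x%08x\n", encode_vadd_vv(1, 2, 3, 1));
          return 0;
  }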
From nobody Wed Dec 17 12:18:38 2025
From: Jerry Shih
To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org
Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: [PATCH v2 04/13] RISC-V: crypto: add Zvkned accelerated AES implementation
Date: Mon, 27 Nov 2023 15:06:54 +0800
Message-Id: <20231127070703.1697-5-jerry.shih@sifive.com>

Add an AES implementation using the Zvkned vector crypto extension, ported from OpenSSL (openssl/openssl#21923).

Co-developed-by: Christoph Müllner
Signed-off-by: Christoph Müllner
Co-developed-by: Heiko Stuebner
Signed-off-by: Heiko Stuebner
Co-developed-by: Phoebe Chen
Signed-off-by: Phoebe Chen
Signed-off-by: Jerry Shih
---
Changelog v2:
 - Do not turn on the kconfig `AES_RISCV64` option by default.
 - Switch to the `crypto_aes_ctx` structure for the AES key.
 - Use the `Zvkned` extension for AES-128/256 key expansion.
 - Export riscv64_aes_* symbols for other modules.
 - Add the `asmlinkage` qualifier to the crypto asm functions.
 - Initialize the riscv64_aes_alg_zvkned structure members in declaration order.
---
 arch/riscv/crypto/Kconfig               |  11 +
 arch/riscv/crypto/Makefile              |  11 +
 arch/riscv/crypto/aes-riscv64-glue.c    | 151 ++++
 arch/riscv/crypto/aes-riscv64-glue.h    |  18 +
 arch/riscv/crypto/aes-riscv64-zvkned.pl | 593 ++++++++++++++++++++++++
 5 files changed, 784 insertions(+)
 create mode 100644 arch/riscv/crypto/aes-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/aes-riscv64-glue.h
 create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 10d60edc0110..65189d4d47b3 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -2,4 +2,15 @@
 
 menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
 
+config CRYPTO_AES_RISCV64
+	tristate "Ciphers: AES"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_ALGAPI
+	select CRYPTO_LIB_AES
+	help
+	  Block ciphers: AES cipher algorithms (FIPS-197)
+
+	  Architecture: riscv64 using:
+	  - Zvkned vector crypto extension
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index b3b6332c9f6d..90ca91d8df26 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -2,3 +2,14 @@
 #
 # linux/arch/riscv/crypto/Makefile
 #
+
+obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
+aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o
+
+quiet_cmd_perlasm = PERLASM $@
+      cmd_perlasm = $(PERL) $(<) void $(@)
+
+$(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl
+	$(call cmd,perlasm)
+
+clean-files += aes-riscv64-zvkned.S
diff --git a/arch/riscv/crypto/aes-riscv64-glue.c b/arch/riscv/crypto/aes-riscv64-glue.c
new file mode 100644
index 000000000000..091e368edb30
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-glue.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Port of the OpenSSL AES implementation for RISC-V
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner
+ *
+ * Copyright (C) 2023 SiFive, Inc.
+ * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "aes-riscv64-glue.h" + +/* aes cipher using zvkned vector crypto extension */ +asmlinkage int rv64i_zvkned_set_encrypt_key(const u8 *user_key, const int = bytes, + const struct crypto_aes_ctx *key); +asmlinkage void rv64i_zvkned_encrypt(const u8 *in, u8 *out, + const struct crypto_aes_ctx *key); +asmlinkage void rv64i_zvkned_decrypt(const u8 *in, u8 *out, + const struct crypto_aes_ctx *key); + +int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key, + unsigned int keylen) +{ + int ret; + + ret =3D aes_check_keylen(keylen); + if (ret < 0) + return -EINVAL; + + /* + * The RISC-V AES vector crypto key expanding doesn't support AES-192. + * Use the generic software key expanding for that case. + */ + if ((keylen =3D=3D 16 || keylen =3D=3D 32) && crypto_simd_usable()) { + /* + * All zvkned-based functions use encryption expanding keys for both + * encryption and decryption. + */ + kernel_vector_begin(); + rv64i_zvkned_set_encrypt_key(key, keylen, ctx); + kernel_vector_end(); + } else { + ret =3D aes_expandkey(ctx, key, keylen); + } + + return ret; +} +EXPORT_SYMBOL(riscv64_aes_setkey); + +void riscv64_aes_encrypt_zvkned(const struct crypto_aes_ctx *ctx, u8 *dst, + const u8 *src) +{ + if (crypto_simd_usable()) { + kernel_vector_begin(); + rv64i_zvkned_encrypt(src, dst, ctx); + kernel_vector_end(); + } else { + aes_encrypt(ctx, dst, src); + } +} +EXPORT_SYMBOL(riscv64_aes_encrypt_zvkned); + +void riscv64_aes_decrypt_zvkned(const struct crypto_aes_ctx *ctx, u8 *dst, + const u8 *src) +{ + if (crypto_simd_usable()) { + kernel_vector_begin(); + rv64i_zvkned_decrypt(src, dst, ctx); + kernel_vector_end(); + } else { + aes_decrypt(ctx, dst, src); + } +} +EXPORT_SYMBOL(riscv64_aes_decrypt_zvkned); + +static int aes_setkey(struct crypto_tfm *tfm, const u8 *key, + unsigned int keylen) +{ + struct crypto_aes_ctx *ctx =3D crypto_tfm_ctx(tfm); + + return riscv64_aes_setkey(ctx, key, keylen); +} + +static void aes_encrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *= src) +{ + const struct crypto_aes_ctx *ctx =3D crypto_tfm_ctx(tfm); + + riscv64_aes_encrypt_zvkned(ctx, dst, src); +} + +static void aes_decrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *= src) +{ + const struct crypto_aes_ctx *ctx =3D crypto_tfm_ctx(tfm); + + riscv64_aes_decrypt_zvkned(ctx, dst, src); +} + +static struct crypto_alg riscv64_aes_alg_zvkned =3D { + .cra_flags =3D CRYPTO_ALG_TYPE_CIPHER, + .cra_blocksize =3D AES_BLOCK_SIZE, + .cra_ctxsize =3D sizeof(struct crypto_aes_ctx), + .cra_priority =3D 300, + .cra_name =3D "aes", + .cra_driver_name =3D "aes-riscv64-zvkned", + .cra_cipher =3D { + .cia_min_keysize =3D AES_MIN_KEY_SIZE, + .cia_max_keysize =3D AES_MAX_KEY_SIZE, + .cia_setkey =3D aes_setkey, + .cia_encrypt =3D aes_encrypt_zvkned, + .cia_decrypt =3D aes_decrypt_zvkned, + }, + .cra_module =3D THIS_MODULE, +}; + +static inline bool check_aes_ext(void) +{ + return riscv_isa_extension_available(NULL, ZVKNED) && + riscv_vector_vlen() >=3D 128; +} + +static int __init riscv64_aes_mod_init(void) +{ + if (check_aes_ext()) + return crypto_register_alg(&riscv64_aes_alg_zvkned); + + return -ENODEV; +} + +static void __exit riscv64_aes_mod_fini(void) +{ + crypto_unregister_alg(&riscv64_aes_alg_zvkned); +} + +module_init(riscv64_aes_mod_init); +module_exit(riscv64_aes_mod_fini); + +MODULE_DESCRIPTION("AES (RISC-V accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL"); 
+MODULE_ALIAS_CRYPTO("aes"); diff --git a/arch/riscv/crypto/aes-riscv64-glue.h b/arch/riscv/crypto/aes-r= iscv64-glue.h new file mode 100644 index 000000000000..0416bbc4318e --- /dev/null +++ b/arch/riscv/crypto/aes-riscv64-glue.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef AES_RISCV64_GLUE_H +#define AES_RISCV64_GLUE_H + +#include +#include + +int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, const u8 *key, + unsigned int keylen); + +void riscv64_aes_encrypt_zvkned(const struct crypto_aes_ctx *ctx, u8 *dst, + const u8 *src); + +void riscv64_aes_decrypt_zvkned(const struct crypto_aes_ctx *ctx, u8 *dst, + const u8 *src); + +#endif /* AES_RISCV64_GLUE_H */ diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/ae= s-riscv64-zvkned.pl new file mode 100644 index 000000000000..303e82d9f6f0 --- /dev/null +++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl @@ -0,0 +1,593 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph M=C3=BCllner +# Copyright (c) 2023, Phoebe Chen +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# - RV64I +# - RISC-V Vector ('V') with VLEN >=3D 128 +# - RISC-V Vector AES block cipher extension ('Zvkned') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extensio= n) +# $flavour is the first argument if it doesn't look like a file +my $output =3D $#ARGV >=3D 0 && $ARGV[$#ARGV] =3D~ m|\.\w+$| ? pop : undef; +my $flavour =3D $#ARGV >=3D 0 && $ARGV[0] !~ m|\.| ? 
shift : undef; + +$output and open STDOUT,">$output"; + +my $code=3D<<___; +.text +___ + +my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, + $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15, + $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23, + $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) =3D map("v$_",(0..31)); + +{ +##########################################################################= ###### +# int rv64i_zvkned_set_encrypt_key(const unsigned char *userKey, const int= bytes, +# AES_KEY *key) +my ($UKEY, $BYTES, $KEYP) =3D ("a0", "a1", "a2"); +my ($T0) =3D ("t0"); + +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvkned_set_encrypt_key +.type rv64i_zvkned_set_encrypt_key,\@function +rv64i_zvkned_set_encrypt_key: + beqz $UKEY, L_fail_m1 + beqz $KEYP, L_fail_m1 + + # Store the key length. + sw $BYTES, 480($KEYP) + + li $T0, 32 + beq $BYTES, $T0, L_set_key_256 + li $T0, 16 + beq $BYTES, $T0, L_set_key_128 + + j L_fail_m2 + +L_set_key_128: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + # Load the key + @{[vle32_v $V10, $UKEY]} + + # Generate keys for round 2-11 into registers v11-v20. + @{[vaeskf1_vi $V11, $V10, 1]} # v11 <- rk2 (w[ 4, 7]) + @{[vaeskf1_vi $V12, $V11, 2]} # v12 <- rk3 (w[ 8,11]) + @{[vaeskf1_vi $V13, $V12, 3]} # v13 <- rk4 (w[12,15]) + @{[vaeskf1_vi $V14, $V13, 4]} # v14 <- rk5 (w[16,19]) + @{[vaeskf1_vi $V15, $V14, 5]} # v15 <- rk6 (w[20,23]) + @{[vaeskf1_vi $V16, $V15, 6]} # v16 <- rk7 (w[24,27]) + @{[vaeskf1_vi $V17, $V16, 7]} # v17 <- rk8 (w[28,31]) + @{[vaeskf1_vi $V18, $V17, 8]} # v18 <- rk9 (w[32,35]) + @{[vaeskf1_vi $V19, $V18, 9]} # v19 <- rk10 (w[36,39]) + @{[vaeskf1_vi $V20, $V19, 10]} # v20 <- rk11 (w[40,43]) + + # Store the round keys + @{[vse32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V11, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V12, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V13, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V14, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V15, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V16, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V17, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V18, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V19, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V20, $KEYP]} + + li a0, 1 + ret + +L_set_key_256: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + # Load the key + @{[vle32_v $V10, $UKEY]} + addi $UKEY, $UKEY, 16 + @{[vle32_v $V11, $UKEY]} + + @{[vmv_v_v $V12, $V10]} + @{[vaeskf2_vi $V12, $V11, 2]} + @{[vmv_v_v $V13, $V11]} + @{[vaeskf2_vi $V13, $V12, 3]} + @{[vmv_v_v $V14, $V12]} + @{[vaeskf2_vi $V14, $V13, 4]} + @{[vmv_v_v $V15, $V13]} + @{[vaeskf2_vi $V15, $V14, 5]} + @{[vmv_v_v $V16, $V14]} + @{[vaeskf2_vi $V16, $V15, 6]} + @{[vmv_v_v $V17, $V15]} + @{[vaeskf2_vi $V17, $V16, 7]} + @{[vmv_v_v $V18, $V16]} + @{[vaeskf2_vi $V18, $V17, 8]} + @{[vmv_v_v $V19, $V17]} + @{[vaeskf2_vi $V19, $V18, 9]} + @{[vmv_v_v $V20, $V18]} + @{[vaeskf2_vi $V20, $V19, 10]} + @{[vmv_v_v $V21, $V19]} + @{[vaeskf2_vi $V21, $V20, 11]} + @{[vmv_v_v $V22, $V20]} + @{[vaeskf2_vi $V22, $V21, 12]} + @{[vmv_v_v $V23, $V21]} + @{[vaeskf2_vi $V23, $V22, 13]} + @{[vmv_v_v $V24, $V22]} + @{[vaeskf2_vi $V24, $V23, 14]} + + @{[vse32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V11, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V12, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V13, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V14, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V15, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V16, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V17, 
$KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V18, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V19, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V20, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V21, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V22, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V23, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vse32_v $V24, $KEYP]} + + li a0, 1 + ret +.size rv64i_zvkned_set_encrypt_key,.-rv64i_zvkned_set_encrypt_key +___ +} + +{ +##########################################################################= ###### +# void rv64i_zvkned_encrypt(const unsigned char *in, unsigned char *out, +# const AES_KEY *key); +my ($INP, $OUTP, $KEYP) =3D ("a0", "a1", "a2"); +my ($T0) =3D ("t0"); +my ($KEY_LEN) =3D ("a3"); + +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvkned_encrypt +.type rv64i_zvkned_encrypt,\@function +rv64i_zvkned_encrypt: + # Load key length. + lwu $KEY_LEN, 480($KEYP) + + # Get proper routine for key length. + li $T0, 32 + beq $KEY_LEN, $T0, L_enc_256 + li $T0, 24 + beq $KEY_LEN, $T0, L_enc_192 + li $T0, 16 + beq $KEY_LEN, $T0, L_enc_128 + + j L_fail_m2 +.size rv64i_zvkned_encrypt,.-rv64i_zvkned_encrypt +___ + +$code .=3D <<___; +.p2align 3 +L_enc_128: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[vle32_v $V1, $INP]} + + @{[vle32_v $V10, $KEYP]} + @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + @{[vaesem_vs $V1, $V11]} # with round key w[ 4, 7] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, $KEYP]} + @{[vaesem_vs $V1, $V12]} # with round key w[ 8,11] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, $KEYP]} + @{[vaesem_vs $V1, $V13]} # with round key w[12,15] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V14, $KEYP]} + @{[vaesem_vs $V1, $V14]} # with round key w[16,19] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V15, $KEYP]} + @{[vaesem_vs $V1, $V15]} # with round key w[20,23] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V16, $KEYP]} + @{[vaesem_vs $V1, $V16]} # with round key w[24,27] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V17, $KEYP]} + @{[vaesem_vs $V1, $V17]} # with round key w[28,31] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V18, $KEYP]} + @{[vaesem_vs $V1, $V18]} # with round key w[32,35] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V19, $KEYP]} + @{[vaesem_vs $V1, $V19]} # with round key w[36,39] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V20, $KEYP]} + @{[vaesef_vs $V1, $V20]} # with round key w[40,43] + + @{[vse32_v $V1, $OUTP]} + + ret +.size L_enc_128,.-L_enc_128 +___ + +$code .=3D <<___; +.p2align 3 +L_enc_192: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[vle32_v $V1, $INP]} + + @{[vle32_v $V10, $KEYP]} + @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + @{[vaesem_vs $V1, $V11]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, $KEYP]} + @{[vaesem_vs $V1, $V12]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, $KEYP]} + @{[vaesem_vs $V1, $V13]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V14, $KEYP]} + @{[vaesem_vs $V1, $V14]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V15, $KEYP]} + @{[vaesem_vs $V1, $V15]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V16, $KEYP]} + @{[vaesem_vs $V1, $V16]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V17, $KEYP]} + @{[vaesem_vs $V1, $V17]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V18, $KEYP]} + @{[vaesem_vs $V1, $V18]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V19, $KEYP]} + @{[vaesem_vs $V1, $V19]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V20, $KEYP]} + @{[vaesem_vs $V1, $V20]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V21, $KEYP]} + @{[vaesem_vs $V1, $V21]} + addi $KEYP, 
$KEYP, 16 + @{[vle32_v $V22, $KEYP]} + @{[vaesef_vs $V1, $V22]} + + @{[vse32_v $V1, $OUTP]} + ret +.size L_enc_192,.-L_enc_192 +___ + +$code .=3D <<___; +.p2align 3 +L_enc_256: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[vle32_v $V1, $INP]} + + @{[vle32_v $V10, $KEYP]} + @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + @{[vaesem_vs $V1, $V11]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, $KEYP]} + @{[vaesem_vs $V1, $V12]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, $KEYP]} + @{[vaesem_vs $V1, $V13]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V14, $KEYP]} + @{[vaesem_vs $V1, $V14]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V15, $KEYP]} + @{[vaesem_vs $V1, $V15]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V16, $KEYP]} + @{[vaesem_vs $V1, $V16]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V17, $KEYP]} + @{[vaesem_vs $V1, $V17]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V18, $KEYP]} + @{[vaesem_vs $V1, $V18]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V19, $KEYP]} + @{[vaesem_vs $V1, $V19]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V20, $KEYP]} + @{[vaesem_vs $V1, $V20]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V21, $KEYP]} + @{[vaesem_vs $V1, $V21]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V22, $KEYP]} + @{[vaesem_vs $V1, $V22]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V23, $KEYP]} + @{[vaesem_vs $V1, $V23]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V24, $KEYP]} + @{[vaesef_vs $V1, $V24]} + + @{[vse32_v $V1, $OUTP]} + ret +.size L_enc_256,.-L_enc_256 +___ + +##########################################################################= ###### +# void rv64i_zvkned_decrypt(const unsigned char *in, unsigned char *out, +# const AES_KEY *key); +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvkned_decrypt +.type rv64i_zvkned_decrypt,\@function +rv64i_zvkned_decrypt: + # Load key length. + lwu $KEY_LEN, 480($KEYP) + + # Get proper routine for key length. 
+ li $T0, 32 + beq $KEY_LEN, $T0, L_dec_256 + li $T0, 24 + beq $KEY_LEN, $T0, L_dec_192 + li $T0, 16 + beq $KEY_LEN, $T0, L_dec_128 + + j L_fail_m2 +.size rv64i_zvkned_decrypt,.-rv64i_zvkned_decrypt +___ + +$code .=3D <<___; +.p2align 3 +L_dec_128: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[vle32_v $V1, $INP]} + + addi $KEYP, $KEYP, 160 + @{[vle32_v $V20, $KEYP]} + @{[vaesz_vs $V1, $V20]} # with round key w[40,43] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V19, $KEYP]} + @{[vaesdm_vs $V1, $V19]} # with round key w[36,39] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V18, $KEYP]} + @{[vaesdm_vs $V1, $V18]} # with round key w[32,35] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V17, $KEYP]} + @{[vaesdm_vs $V1, $V17]} # with round key w[28,31] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V16, $KEYP]} + @{[vaesdm_vs $V1, $V16]} # with round key w[24,27] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V15, $KEYP]} + @{[vaesdm_vs $V1, $V15]} # with round key w[20,23] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V14, $KEYP]} + @{[vaesdm_vs $V1, $V14]} # with round key w[16,19] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V13, $KEYP]} + @{[vaesdm_vs $V1, $V13]} # with round key w[12,15] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V12, $KEYP]} + @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V11, $KEYP]} + @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V10, $KEYP]} + @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3] + + @{[vse32_v $V1, $OUTP]} + + ret +.size L_dec_128,.-L_dec_128 +___ + +$code .=3D <<___; +.p2align 3 +L_dec_192: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[vle32_v $V1, $INP]} + + addi $KEYP, $KEYP, 192 + @{[vle32_v $V22, $KEYP]} + @{[vaesz_vs $V1, $V22]} # with round key w[48,51] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V21, $KEYP]} + @{[vaesdm_vs $V1, $V21]} # with round key w[44,47] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V20, $KEYP]} + @{[vaesdm_vs $V1, $V20]} # with round key w[40,43] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V19, $KEYP]} + @{[vaesdm_vs $V1, $V19]} # with round key w[36,39] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V18, $KEYP]} + @{[vaesdm_vs $V1, $V18]} # with round key w[32,35] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V17, $KEYP]} + @{[vaesdm_vs $V1, $V17]} # with round key w[28,31] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V16, $KEYP]} + @{[vaesdm_vs $V1, $V16]} # with round key w[24,27] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V15, $KEYP]} + @{[vaesdm_vs $V1, $V15]} # with round key w[20,23] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V14, $KEYP]} + @{[vaesdm_vs $V1, $V14]} # with round key w[16,19] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V13, $KEYP]} + @{[vaesdm_vs $V1, $V13]} # with round key w[12,15] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V12, $KEYP]} + @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V11, $KEYP]} + @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V10, $KEYP]} + @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3] + + @{[vse32_v $V1, $OUTP]} + + ret +.size L_dec_192,.-L_dec_192 +___ + +$code .=3D <<___; +.p2align 3 +L_dec_256: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[vle32_v $V1, $INP]} + + addi $KEYP, $KEYP, 224 + @{[vle32_v $V24, $KEYP]} + @{[vaesz_vs $V1, $V24]} # with round key w[56,59] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V23, $KEYP]} + @{[vaesdm_vs $V1, $V23]} # with round key w[52,55] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V22, $KEYP]} + @{[vaesdm_vs $V1, $V22]} # with round key w[48,51] + 
addi $KEYP, $KEYP, -16 + @{[vle32_v $V21, $KEYP]} + @{[vaesdm_vs $V1, $V21]} # with round key w[44,47] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V20, $KEYP]} + @{[vaesdm_vs $V1, $V20]} # with round key w[40,43] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V19, $KEYP]} + @{[vaesdm_vs $V1, $V19]} # with round key w[36,39] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V18, $KEYP]} + @{[vaesdm_vs $V1, $V18]} # with round key w[32,35] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V17, $KEYP]} + @{[vaesdm_vs $V1, $V17]} # with round key w[28,31] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V16, $KEYP]} + @{[vaesdm_vs $V1, $V16]} # with round key w[24,27] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V15, $KEYP]} + @{[vaesdm_vs $V1, $V15]} # with round key w[20,23] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V14, $KEYP]} + @{[vaesdm_vs $V1, $V14]} # with round key w[16,19] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V13, $KEYP]} + @{[vaesdm_vs $V1, $V13]} # with round key w[12,15] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V12, $KEYP]} + @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V11, $KEYP]} + @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V10, $KEYP]} + @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3] + + @{[vse32_v $V1, $OUTP]} + + ret +.size L_dec_256,.-L_dec_256 +___ +} + +$code .=3D <<___; +L_fail_m1: + li a0, -1 + ret +.size L_fail_m1,.-L_fail_m1 + +L_fail_m2: + li a0, -2 + ret +.size L_fail_m2,.-L_fail_m2 + +L_end: + ret +.size L_end,.-L_end +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; --=20 2.28.0 From nobody Wed Dec 17 12:18:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E45C5C07E97 for ; Mon, 27 Nov 2023 07:07:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232417AbjK0HHd (ORCPT ); Mon, 27 Nov 2023 02:07:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39774 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232405AbjK0HHZ (ORCPT ); Mon, 27 Nov 2023 02:07:25 -0500 Received: from mail-pl1-x631.google.com (mail-pl1-x631.google.com [IPv6:2607:f8b0:4864:20::631]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D937B1A6 for ; Sun, 26 Nov 2023 23:07:28 -0800 (PST) Received: by mail-pl1-x631.google.com with SMTP id d9443c01a7336-1cfc3f50504so6123445ad.3 for ; Sun, 26 Nov 2023 23:07:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1701068848; x=1701673648; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=I9NcydlPjJuZdBvk47SliSWRPXO51iH+ClRWgIXdngI=; b=R3bGiBJyq+uYPEskiMD2ig1vN6/rcwG2a45RUwH5v8t+GhKBASiJ2KCfoRAn8QERwB Fwjs7IIY/oCs9P1Eafo/z/2qRS1WfEXf7i210aFCLEDdvsIyNEh2DNin+8kvlzQer6OX 2M31tXupx7SV+Vk4A7EoPk1pd4hi8K8Xz0T6nxvjeGZk/8oV+5tF2GlstIaMZ2RsLMTY bDMXAcsqi5GJY8zab1zE4JaC+pdDFrQeYIb274LosuLyjt66+OddRoUGLI4ru4y4mR0k tKiamBKph5O4/sifbbYhaIAcjnXWMxdbn726u8JcDt/QzMCnVllmoRHWmiNIYFFGEomz MHGg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701068848; x=1701673648; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=I9NcydlPjJuZdBvk47SliSWRPXO51iH+ClRWgIXdngI=; b=je29FcBaHiCmdrc8u5zGxbe7OvpyhuL3vKeX6NXVMFkQq9HUsUOec7L40802+jjFSs UUTE+/sJyqgrTI71D3mnTc6bSSMYfx5+QON/MN2ejomQ8hJ5qQuxzbNcYr3G3j6Wik8E EBJSGq5lmhun/DCWnnAD/L8k08RROHES1HmbxPWCNDdS83+aP/WQEoPkfIVdi7597af+ ig3AIxGXscZy48KuKA0N1dMZ7y4+sxYzYLuQVwzCXltk2ko3wruOpq8MaAeRdAnP/BFx Pr12094Ln1JIbEQspC2BIw/hpftpcn8xmRPlm1mbIy0S5P2oB4s6HZV1Cpkya3So9qV6 rafw== X-Gm-Message-State: AOJu0YzBjT1y51K0wK/gpsK/y4FMyCEzJyE6ooGkFHpbkTvvEqOfp8OP gTcEhthU+3wfdgkVGkgdwoj6+g== X-Google-Smtp-Source: AGHT+IELRS3hxE4VCKaoEswdjN2LTSxsJZBADubHAmIAofnrpOYQED6FfVAgy1DMUxcK/odvBEY6yg== X-Received: by 2002:a17:903:1d1:b0:1cf:d58b:da39 with SMTP id e17-20020a17090301d100b001cfd58bda39mr1135943plh.64.1701068848370; Sun, 26 Nov 2023 23:07:28 -0800 (PST) Received: from localhost.localdomain ([101.10.45.230]) by smtp.gmail.com with ESMTPSA id jh15-20020a170903328f00b001cfcd3a764esm1340134plb.77.2023.11.26.23.07.25 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 26 Nov 2023 23:07:28 -0800 (PST) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH v2 05/13] crypto: simd - Update `walksize` in simd skcipher Date: Mon, 27 Nov 2023 15:06:55 +0800 Message-Id: <20231127070703.1697-6-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231127070703.1697-1-jerry.shih@sifive.com> References: <20231127070703.1697-1-jerry.shih@sifive.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The `walksize` assignment is missed in simd skcipher. 
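For illustration only (not part of this patch): a minimal sketch of an internal skcipher that relies on `walksize` being propagated. The RISC-V AES drivers later in this series declare an eight-block walksize so that `skcipher_walk` hands them several blocks per step; if the simd/cryptd wrapper does not copy the field, the wrapped algorithm only ever sees the default walksize of a single block. The field values below mirror those drivers; the algorithm name and handlers are hypothetical.

#include <linux/crypto.h>
#include <linux/module.h>
#include <crypto/aes.h>
#include <crypto/internal/skcipher.h>

/* setkey/encrypt/decrypt handlers are omitted from this sketch. */
static struct skcipher_alg example_internal_alg = {
	.min_keysize	= AES_MIN_KEY_SIZE,
	.max_keysize	= AES_MAX_KEY_SIZE,
	.ivsize		= AES_BLOCK_SIZE,
	.chunksize	= AES_BLOCK_SIZE,
	.walksize	= AES_BLOCK_SIZE * 8,	/* ask the walk for 8 blocks at a time */
	.base = {
		.cra_name		= "__example(aes)",
		.cra_driver_name	= "__example-aes-illustrative",
		.cra_flags		= CRYPTO_ALG_INTERNAL,
		.cra_blocksize		= AES_BLOCK_SIZE,
		.cra_ctxsize		= sizeof(struct crypto_aes_ctx),
		.cra_module		= THIS_MODULE,
	},
};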
Signed-off-by: Jerry Shih --- crypto/cryptd.c | 1 + crypto/simd.c | 1 + 2 files changed, 2 insertions(+) diff --git a/crypto/cryptd.c b/crypto/cryptd.c index bbcc368b6a55..253d13504ccb 100644 --- a/crypto/cryptd.c +++ b/crypto/cryptd.c @@ -405,6 +405,7 @@ static int cryptd_create_skcipher(struct crypto_templat= e *tmpl, (alg->base.cra_flags & CRYPTO_ALG_INTERNAL); inst->alg.ivsize =3D crypto_skcipher_alg_ivsize(alg); inst->alg.chunksize =3D crypto_skcipher_alg_chunksize(alg); + inst->alg.walksize =3D crypto_skcipher_alg_walksize(alg); inst->alg.min_keysize =3D crypto_skcipher_alg_min_keysize(alg); inst->alg.max_keysize =3D crypto_skcipher_alg_max_keysize(alg); =20 diff --git a/crypto/simd.c b/crypto/simd.c index edaa479a1ec5..ea0caabf90f1 100644 --- a/crypto/simd.c +++ b/crypto/simd.c @@ -181,6 +181,7 @@ struct simd_skcipher_alg *simd_skcipher_create_compat(c= onst char *algname, =20 alg->ivsize =3D ialg->ivsize; alg->chunksize =3D ialg->chunksize; + alg->walksize =3D ialg->walksize; alg->min_keysize =3D ialg->min_keysize; alg->max_keysize =3D ialg->max_keysize; =20 --=20 2.28.0 From nobody Wed Dec 17 12:18:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4AC72C07D5A for ; Mon, 27 Nov 2023 07:07:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232441AbjK0HHl (ORCPT ); Mon, 27 Nov 2023 02:07:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39700 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232229AbjK0HH2 (ORCPT ); Mon, 27 Nov 2023 02:07:28 -0500 Received: from mail-pl1-x62a.google.com (mail-pl1-x62a.google.com [IPv6:2607:f8b0:4864:20::62a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D6D8B181 for ; Sun, 26 Nov 2023 23:07:31 -0800 (PST) Received: by mail-pl1-x62a.google.com with SMTP id d9443c01a7336-1cfafe3d46bso16442325ad.0 for ; Sun, 26 Nov 2023 23:07:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1701068851; x=1701673651; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=KWDZyqBJhOVpYqYhwbudR4lRinTiZ1fIB/oQ8fHi3tg=; b=kjzYockN9LFJ6sx2k0sGTTv2ytJuV4elNRp7fBYMsUAQ9WMnE/5gs3P+8QSQ9lYcUu y0q54OvISdijCRal+GXbTQQM9k8P4lbgU7+Qi6cXrbec2Vy0BhNWcgObBZGenfaBM0pa WWfUkV5tBE3ALfJosouIAt6/wP/1T092eOP9/CmWzyHQUYfUK4BkAioZkNr2/Sly8FSm umWvG6sqhe0fLBciczQW2wL/PushexwwSJTU843BMvxw4DkiMzx+1G5AQJ7Ugi7Omh7L waH/IKc9Ewewq1bLR7U2nQWiX6+COnNYXo+fV/MpRKzFBgi9xpjCIy6jkqqN4dKvlNhK KIOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701068851; x=1701673651; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=KWDZyqBJhOVpYqYhwbudR4lRinTiZ1fIB/oQ8fHi3tg=; b=UgwRmO9MlVcS4Vv1SXXAhulinlnsmUk3n+VWxtw4f7GT54oE2wK/96dedB4KyY4AZC fqUoAl77RX2zMyno/3IqFvg3iw8dvYSUjKJ01NVzjlytV6WSJnQuS9m9VTubxyfSvF0K i3a/RsyHE6ju3GHPQ5y/42hMtADUTk1jdqj7rWrC9ImCw+KbqBVzRCgfRg85vOG4Bl+D RVVn2MRfMWge/YJIgIjVvxI4uhW4jBgwykHDOi4J0xv10Bhv6Mg8alV6ugRjqkTbDn2V hLW9Ec1yJw4ZMdpim4POqCEvQ+9Vuqx+vxiL/YJfab4iRgrRviVrSHTLJLNWgTa6tJbx AuOA== X-Gm-Message-State: AOJu0Yxqwjw4fyqmORe757pqaw6v5aNdr9/RL/FtKA5eoogCbs/HjtTy 
itDkLQebff+oojDlEubIyM9vsA== X-Google-Smtp-Source: AGHT+IH+pvi642lKHelJCCdm6Sbw3UusLCtlqBwe0TaqWn+6ozeTKQf9INKnJ4k0xOSABXksoQFmKA== X-Received: by 2002:a17:902:820b:b0:1cf:cd4e:ca02 with SMTP id x11-20020a170902820b00b001cfcd4eca02mr2699019pln.24.1701068851234; Sun, 26 Nov 2023 23:07:31 -0800 (PST) Received: from localhost.localdomain ([101.10.45.230]) by smtp.gmail.com with ESMTPSA id jh15-20020a170903328f00b001cfcd3a764esm1340134plb.77.2023.11.26.23.07.28 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 26 Nov 2023 23:07:31 -0800 (PST) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH v2 06/13] crypto: scatterwalk - Add scatterwalk_next() to get the next scatterlist in scatter_walk Date: Mon, 27 Nov 2023 15:06:56 +0800 Message-Id: <20231127070703.1697-7-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231127070703.1697-1-jerry.shih@sifive.com> References: <20231127070703.1697-1-jerry.shih@sifive.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" In some situations, we might split the `skcipher_request` into several segments. When we try to move to next segment, we might use `scatterwalk_ffwd()` to get the corresponding `scatterlist` iterating from the head of `scatterlist`. This helper function could just gather the information in `skcipher_walk` and move to next `scatterlist` directly. 
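For reference, the AES-XTS glue code added later in this series uses the helper as shown below to continue with the tail part of a request after the head part has been walked, instead of fast-forwarding from the start of the original scatterlist (`walk`, `req` and `tail_bytes` are as in that caller):

	struct scatterlist sg_src[2], sg_dst[2];
	struct scatterlist *src, *dst;

	/* Pick up right where the previous skcipher_walk stopped. */
	dst = src = scatterwalk_next(sg_src, &walk.in);
	if (req->dst != req->src)
		dst = scatterwalk_next(sg_dst, &walk.out);
	skcipher_request_set_crypt(req, src, dst, tail_bytes, req->iv);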
Signed-off-by: Jerry Shih --- include/crypto/scatterwalk.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h index 32fc4473175b..b1a90afe695d 100644 --- a/include/crypto/scatterwalk.h +++ b/include/crypto/scatterwalk.h @@ -98,7 +98,12 @@ void scatterwalk_map_and_copy(void *buf, struct scatterl= ist *sg, unsigned int start, unsigned int nbytes, int out); =20 struct scatterlist *scatterwalk_ffwd(struct scatterlist dst[2], - struct scatterlist *src, - unsigned int len); + struct scatterlist *src, unsigned int len); + +static inline struct scatterlist *scatterwalk_next(struct scatterlist dst[= 2], + struct scatter_walk *src) +{ + return scatterwalk_ffwd(dst, src->sg, src->offset - src->sg->offset); +} =20 #endif /* _CRYPTO_SCATTERWALK_H */ --=20 2.28.0 From nobody Wed Dec 17 12:18:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70B53C4167B for ; Mon, 27 Nov 2023 07:08:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232527AbjK0HIG (ORCPT ); Mon, 27 Nov 2023 02:08:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39786 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232512AbjK0HHk (ORCPT ); Mon, 27 Nov 2023 02:07:40 -0500 Received: from mail-pl1-x62d.google.com (mail-pl1-x62d.google.com [IPv6:2607:f8b0:4864:20::62d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC7F0191 for ; Sun, 26 Nov 2023 23:07:35 -0800 (PST) Received: by mail-pl1-x62d.google.com with SMTP id d9443c01a7336-1cfb2176150so11004475ad.3 for ; Sun, 26 Nov 2023 23:07:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1701068855; x=1701673655; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=s1s7i3VA49GEZVjag/Kbhi+FIt8+X1iRiytC3VJdGAg=; b=bZbye1PTMTRi/rawxPtlfPPa8km4RZ0l3B+7K3XyihDu17U9PSLCvaaxC5NlFZDZb1 HEi5x8xtOrDs4dGYmdjI7hPwjEuYsbjY9slL9GlooPTEO61BdMEZH4fkDPgc+1vr1L7q yRxXin6L1AurZfCoK0GCsUPrR1WFRpmkkfe2JuXiJUet6CoARCD1KgCtnSMc4lukNmO3 18L0yKFLXHmrw8/9STRr3UPkcAHADhha68MR7QNItWz/eqHcxpdWervPgnyPPLkzup7/ T/UabG6R3fwE2E22YD2R49g86RXokDdEmDmejJnsbSx5EpHX8XhsbNlCu+igJT6hnjxy PYfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701068855; x=1701673655; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=s1s7i3VA49GEZVjag/Kbhi+FIt8+X1iRiytC3VJdGAg=; b=oF9lZtwBDS80xdql9pyofI3yHy1k3T8OKPesnCdHf/g61snpRPnItoToPr2YBFotIA GU43aQSImEb/yM5FVqjA5gKzMvOYAUM7vSQ2kKr0V9Svx38Rfsrijw5fgG08koNtazIJ btNj9LwIcGgqgc/fIsBkeRJAzp7qufVtBki09u1r7lttDlkaL+z1ghr0sgW/2Gs3ydfp eX7mVYIOS8rpF9fpHybND5ooGkCTB5DQmRVjm7BwrxcvGSCBYHCz4otJJpi/SfGpws4q OrjD+uv4OGPaR69jVtHUJXn/xgKzbhB9CxF1kUor/ti931EK6UYaSFg8ibEGB92clg3T SiMA== X-Gm-Message-State: AOJu0YyU45VA8/urANsqKlvka9j6sEjZDVyqFtgxbjXaRSwpMubOTWQE nUXd8orFx/BSlMJJjU/VLzPiT2pU0Qj0sYhU4kwi3Q== X-Google-Smtp-Source: AGHT+IEdpkYekzvZiIQ/njqiH1GpCr+lk2R945AdXDLzoB19kY/lDSKWeYSQh2uFfNGIUhkXyGVo9g== X-Received: by 2002:a17:902:ead2:b0:1cf:63bb:82a6 with SMTP id 
p18-20020a170902ead200b001cf63bb82a6mr9164128pld.65.1701068854621; Sun, 26 Nov 2023 23:07:34 -0800 (PST) Received: from localhost.localdomain ([101.10.45.230]) by smtp.gmail.com with ESMTPSA id jh15-20020a170903328f00b001cfcd3a764esm1340134plb.77.2023.11.26.23.07.31 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 26 Nov 2023 23:07:34 -0800 (PST) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH v2 07/13] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations Date: Mon, 27 Nov 2023 15:06:57 +0800 Message-Id: <20231127070703.1697-8-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231127070703.1697-1-jerry.shih@sifive.com> References: <20231127070703.1697-1-jerry.shih@sifive.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Port the vector-crypto accelerated CBC, CTR, ECB and XTS block modes for AES cipher from OpenSSL(openssl/openssl#21923). In addition, support XTS-AES-192 mode which is not existed in OpenSSL. Co-developed-by: Phoebe Chen Signed-off-by: Phoebe Chen Signed-off-by: Jerry Shih --- Changelog v2: - Do not turn on kconfig `AES_BLOCK_RISCV64` option by default. - Update asm function for using aes key in `crypto_aes_ctx` structure. - Turn to use simd skcipher interface for AES-CBC/CTR/ECB/XTS modes. We still have lots of discussions for kernel-vector implementation. Before the final version of kernel-vector, use simd skcipher interface to skip the fallback path for all aes modes in all kinds of contexts. If we could always enable kernel-vector in softirq in the future, we could make the original sync skcipher algorithm back. - Refine aes-xts comments for head and tail blocks handling. - Update VLEN constraint for aex-xts mode. - Add `asmlinkage` qualifier for crypto asm function. - Rename aes-riscv64-zvbb-zvkg-zvkned to aes-riscv64-zvkned-zvbb-zvkg. - Rename aes-riscv64-zvkb-zvkned to aes-riscv64-zvkned-zvkb. - Reorder structure riscv64_aes_algs_zvkned, riscv64_aes_alg_zvkned_zvkb and riscv64_aes_alg_zvkned_zvbb_zvkg members initialization in the order declared. 
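A worked example of the head/tail split performed by the new xts_crypt() below, using illustrative sizes (the walksize is the eight-block value declared by the driver; the request length is made up for the example):

  cryptlen   = 1000 bytes, walksize = 8 * AES_BLOCK_SIZE = 128 bytes
  tail_bytes = 1000 & (16 - 1)      =   8
  tail_bytes = 128 + 8 - 16         = 120   (>= 2 blocks, fits in one walk)
  head_bytes = 1000 - 120           = 880   (a multiple of AES_BLOCK_SIZE)

The head 880 bytes go through the regular per-walk loop; the final 120 bytes, which include the partial block, are handled in a single trailing call so the ciphertext-stealing logic in the assembly sees both the last full blocks and the tail.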
--- arch/riscv/crypto/Kconfig | 21 + arch/riscv/crypto/Makefile | 11 + .../crypto/aes-riscv64-block-mode-glue.c | 514 ++++++++++ .../crypto/aes-riscv64-zvkned-zvbb-zvkg.pl | 949 ++++++++++++++++++ arch/riscv/crypto/aes-riscv64-zvkned-zvkb.pl | 415 ++++++++ arch/riscv/crypto/aes-riscv64-zvkned.pl | 746 ++++++++++++++ 6 files changed, 2656 insertions(+) create mode 100644 arch/riscv/crypto/aes-riscv64-block-mode-glue.c create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.pl create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned-zvkb.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 65189d4d47b3..9d991ddda289 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -13,4 +13,25 @@ config CRYPTO_AES_RISCV64 Architecture: riscv64 using: - Zvkned vector crypto extension =20 +config CRYPTO_AES_BLOCK_RISCV64 + tristate "Ciphers: AES, modes: ECB/CBC/CTR/XTS" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_AES_RISCV64 + select CRYPTO_SIMD + select CRYPTO_SKCIPHER + help + Length-preserving ciphers: AES cipher algorithms (FIPS-197) + with block cipher modes: + - ECB (Electronic Codebook) mode (NIST SP 800-38A) + - CBC (Cipher Block Chaining) mode (NIST SP 800-38A) + - CTR (Counter) mode (NIST SP 800-38A) + - XTS (XOR Encrypt XOR Tweakable Block Cipher with Ciphertext + Stealing) mode (NIST SP 800-38E and IEEE 1619) + + Architecture: riscv64 using: + - Zvkned vector crypto extension + - Zvbb vector extension (XTS) + - Zvkb vector crypto extension (CTR/XTS) + - Zvkg vector crypto extension (XTS) + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 90ca91d8df26..9574b009762f 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -6,10 +6,21 @@ obj-$(CONFIG_CRYPTO_AES_RISCV64) +=3D aes-riscv64.o aes-riscv64-y :=3D aes-riscv64-glue.o aes-riscv64-zvkned.o =20 +obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) +=3D aes-block-riscv64.o +aes-block-riscv64-y :=3D aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-= zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o + quiet_cmd_perlasm =3D PERLASM $@ cmd_perlasm =3D $(PERL) $(<) void $(@) =20 $(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl $(call cmd,perlasm) =20 +$(obj)/aes-riscv64-zvkned-zvbb-zvkg.S: $(src)/aes-riscv64-zvkned-zvbb-zvkg= .pl + $(call cmd,perlasm) + +$(obj)/aes-riscv64-zvkned-zvkb.S: $(src)/aes-riscv64-zvkned-zvkb.pl + $(call cmd,perlasm) + clean-files +=3D aes-riscv64-zvkned.S +clean-files +=3D aes-riscv64-zvkned-zvbb-zvkg.S +clean-files +=3D aes-riscv64-zvkned-zvkb.S diff --git a/arch/riscv/crypto/aes-riscv64-block-mode-glue.c b/arch/riscv/c= rypto/aes-riscv64-block-mode-glue.c new file mode 100644 index 000000000000..36fdd83b11ef --- /dev/null +++ b/arch/riscv/crypto/aes-riscv64-block-mode-glue.c @@ -0,0 +1,514 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Port of the OpenSSL AES block mode implementations for RISC-V + * + * Copyright (C) 2023 SiFive, Inc. 
+ * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "aes-riscv64-glue.h" + +struct riscv64_aes_xts_ctx { + struct crypto_aes_ctx ctx1; + struct crypto_aes_ctx ctx2; +}; + +/* aes cbc block mode using zvkned vector crypto extension */ +asmlinkage void rv64i_zvkned_cbc_encrypt(const u8 *in, u8 *out, size_t len= gth, + const struct crypto_aes_ctx *key, + u8 *ivec); +asmlinkage void rv64i_zvkned_cbc_decrypt(const u8 *in, u8 *out, size_t len= gth, + const struct crypto_aes_ctx *key, + u8 *ivec); +/* aes ecb block mode using zvkned vector crypto extension */ +asmlinkage void rv64i_zvkned_ecb_encrypt(const u8 *in, u8 *out, size_t len= gth, + const struct crypto_aes_ctx *key); +asmlinkage void rv64i_zvkned_ecb_decrypt(const u8 *in, u8 *out, size_t len= gth, + const struct crypto_aes_ctx *key); + +/* aes ctr block mode using zvkb and zvkned vector crypto extension */ +/* This func operates on 32-bit counter. Caller has to handle the overflow= . */ +asmlinkage void +rv64i_zvkb_zvkned_ctr32_encrypt_blocks(const u8 *in, u8 *out, size_t lengt= h, + const struct crypto_aes_ctx *key, + u8 *ivec); + +/* aes xts block mode using zvbb, zvkg and zvkned vector crypto extension = */ +asmlinkage void +rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt(const u8 *in, u8 *out, size_t lengt= h, + const struct crypto_aes_ctx *key, u8 *iv, + int update_iv); +asmlinkage void +rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt(const u8 *in, u8 *out, size_t lengt= h, + const struct crypto_aes_ctx *key, u8 *iv, + int update_iv); + +typedef void (*aes_xts_func)(const u8 *in, u8 *out, size_t length, + const struct crypto_aes_ctx *key, u8 *iv, + int update_iv); + +/* ecb */ +static int aes_setkey(struct crypto_skcipher *tfm, const u8 *in_key, + unsigned int key_len) +{ + struct crypto_aes_ctx *ctx =3D crypto_skcipher_ctx(tfm); + + return riscv64_aes_setkey(ctx, in_key, key_len); +} + +static int ecb_encrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm =3D crypto_skcipher_reqtfm(req); + const struct crypto_aes_ctx *ctx =3D crypto_skcipher_ctx(tfm); + struct skcipher_walk walk; + unsigned int nbytes; + int err; + + /* If we have error here, the `nbytes` will be zero. 
*/ + err =3D skcipher_walk_virt(&walk, req, false); + while ((nbytes =3D walk.nbytes)) { + kernel_vector_begin(); + rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr, + nbytes & (~(AES_BLOCK_SIZE - 1)), ctx); + kernel_vector_end(); + err =3D skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1)); + } + + return err; +} + +static int ecb_decrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm =3D crypto_skcipher_reqtfm(req); + const struct crypto_aes_ctx *ctx =3D crypto_skcipher_ctx(tfm); + struct skcipher_walk walk; + unsigned int nbytes; + int err; + + err =3D skcipher_walk_virt(&walk, req, false); + while ((nbytes =3D walk.nbytes)) { + kernel_vector_begin(); + rv64i_zvkned_ecb_decrypt(walk.src.virt.addr, walk.dst.virt.addr, + nbytes & (~(AES_BLOCK_SIZE - 1)), ctx); + kernel_vector_end(); + err =3D skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1)); + } + + return err; +} + +/* cbc */ +static int cbc_encrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm =3D crypto_skcipher_reqtfm(req); + const struct crypto_aes_ctx *ctx =3D crypto_skcipher_ctx(tfm); + struct skcipher_walk walk; + unsigned int nbytes; + int err; + + err =3D skcipher_walk_virt(&walk, req, false); + while ((nbytes =3D walk.nbytes)) { + kernel_vector_begin(); + rv64i_zvkned_cbc_encrypt(walk.src.virt.addr, walk.dst.virt.addr, + nbytes & (~(AES_BLOCK_SIZE - 1)), ctx, + walk.iv); + kernel_vector_end(); + err =3D skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1)); + } + + return err; +} + +static int cbc_decrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm =3D crypto_skcipher_reqtfm(req); + const struct crypto_aes_ctx *ctx =3D crypto_skcipher_ctx(tfm); + struct skcipher_walk walk; + unsigned int nbytes; + int err; + + err =3D skcipher_walk_virt(&walk, req, false); + while ((nbytes =3D walk.nbytes)) { + kernel_vector_begin(); + rv64i_zvkned_cbc_decrypt(walk.src.virt.addr, walk.dst.virt.addr, + nbytes & (~(AES_BLOCK_SIZE - 1)), ctx, + walk.iv); + kernel_vector_end(); + err =3D skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1)); + } + + return err; +} + +/* ctr */ +static int ctr_encrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm =3D crypto_skcipher_reqtfm(req); + const struct crypto_aes_ctx *ctx =3D crypto_skcipher_ctx(tfm); + struct skcipher_walk walk; + unsigned int ctr32; + unsigned int nbytes; + unsigned int blocks; + unsigned int current_blocks; + unsigned int current_length; + int err; + + /* the ctr iv uses big endian */ + ctr32 =3D get_unaligned_be32(req->iv + 12); + err =3D skcipher_walk_virt(&walk, req, false); + while ((nbytes =3D walk.nbytes)) { + if (nbytes !=3D walk.total) { + nbytes &=3D (~(AES_BLOCK_SIZE - 1)); + blocks =3D nbytes / AES_BLOCK_SIZE; + } else { + /* This is the last walk. We should handle the tail data. */ + blocks =3D DIV_ROUND_UP(nbytes, AES_BLOCK_SIZE); + } + ctr32 +=3D blocks; + + kernel_vector_begin(); + /* + * The `if` block below detects the overflow, which is then handled by + * limiting the amount of blocks to the exact overflow point. 
+ */ + if (ctr32 >=3D blocks) { + rv64i_zvkb_zvkned_ctr32_encrypt_blocks( + walk.src.virt.addr, walk.dst.virt.addr, nbytes, + ctx, req->iv); + } else { + /* use 2 ctr32 function calls for overflow case */ + current_blocks =3D blocks - ctr32; + current_length =3D + min(nbytes, current_blocks * AES_BLOCK_SIZE); + rv64i_zvkb_zvkned_ctr32_encrypt_blocks( + walk.src.virt.addr, walk.dst.virt.addr, + current_length, ctx, req->iv); + crypto_inc(req->iv, 12); + + if (ctr32) { + rv64i_zvkb_zvkned_ctr32_encrypt_blocks( + walk.src.virt.addr + + current_blocks * AES_BLOCK_SIZE, + walk.dst.virt.addr + + current_blocks * AES_BLOCK_SIZE, + nbytes - current_length, ctx, req->iv); + } + } + kernel_vector_end(); + + err =3D skcipher_walk_done(&walk, walk.nbytes - nbytes); + } + + return err; +} + +/* xts */ +static int xts_setkey(struct crypto_skcipher *tfm, const u8 *in_key, + unsigned int key_len) +{ + struct riscv64_aes_xts_ctx *ctx =3D crypto_skcipher_ctx(tfm); + unsigned int xts_single_key_len =3D key_len / 2; + int ret; + + ret =3D xts_verify_key(tfm, in_key, key_len); + if (ret) + return ret; + ret =3D riscv64_aes_setkey(&ctx->ctx1, in_key, xts_single_key_len); + if (ret) + return ret; + return riscv64_aes_setkey(&ctx->ctx2, in_key + xts_single_key_len, + xts_single_key_len); +} + +static int xts_crypt(struct skcipher_request *req, aes_xts_func func) +{ + struct crypto_skcipher *tfm =3D crypto_skcipher_reqtfm(req); + const struct riscv64_aes_xts_ctx *ctx =3D crypto_skcipher_ctx(tfm); + struct skcipher_request sub_req; + struct scatterlist sg_src[2], sg_dst[2]; + struct scatterlist *src, *dst; + struct skcipher_walk walk; + unsigned int walk_size =3D crypto_skcipher_walksize(tfm); + unsigned int tail_bytes; + unsigned int head_bytes; + unsigned int nbytes; + unsigned int update_iv =3D 1; + int err; + + /* xts input size should be bigger than AES_BLOCK_SIZE */ + if (req->cryptlen < AES_BLOCK_SIZE) + return -EINVAL; + + /* + * We split xts-aes cryption into `head` and `tail` parts. + * The head block contains the input from the beginning which doesn't need + * `ciphertext stealing` method. + * The tail block contains at least two AES blocks including ciphertext + * stealing data from the end. + */ + if (req->cryptlen <=3D walk_size) { + /* + * All data is in one `walk`. We could handle it within one AES-XTS call= in + * the end. + */ + tail_bytes =3D req->cryptlen; + head_bytes =3D 0; + } else { + if (req->cryptlen & (AES_BLOCK_SIZE - 1)) { + /* + * with ciphertext stealing + * + * Find the largest tail size which is small than `walk` size while the + * head part still fits AES block boundary. + */ + tail_bytes =3D req->cryptlen & (AES_BLOCK_SIZE - 1); + tail_bytes =3D walk_size + tail_bytes - AES_BLOCK_SIZE; + head_bytes =3D req->cryptlen - tail_bytes; + } else { + /* no ciphertext stealing */ + tail_bytes =3D 0; + head_bytes =3D req->cryptlen; + } + } + + riscv64_aes_encrypt_zvkned(&ctx->ctx2, req->iv, req->iv); + + if (head_bytes && tail_bytes) { + /* If we have to parts, setup new request for head part only. 
*/ + skcipher_request_set_tfm(&sub_req, tfm); + skcipher_request_set_callback( + &sub_req, skcipher_request_flags(req), NULL, NULL); + skcipher_request_set_crypt(&sub_req, req->src, req->dst, + head_bytes, req->iv); + req =3D &sub_req; + } + + if (head_bytes) { + err =3D skcipher_walk_virt(&walk, req, false); + while ((nbytes =3D walk.nbytes)) { + if (nbytes =3D=3D walk.total) + update_iv =3D (tail_bytes > 0); + + nbytes &=3D (~(AES_BLOCK_SIZE - 1)); + kernel_vector_begin(); + func(walk.src.virt.addr, walk.dst.virt.addr, nbytes, + &ctx->ctx1, req->iv, update_iv); + kernel_vector_end(); + + err =3D skcipher_walk_done(&walk, walk.nbytes - nbytes); + } + if (err || !tail_bytes) + return err; + + /* + * Setup new request for tail part. + * We use `scatterwalk_next()` to find the next scatterlist from last + * walk instead of iterating from the beginning. + */ + dst =3D src =3D scatterwalk_next(sg_src, &walk.in); + if (req->dst !=3D req->src) + dst =3D scatterwalk_next(sg_dst, &walk.out); + skcipher_request_set_crypt(req, src, dst, tail_bytes, req->iv); + } + + /* tail */ + err =3D skcipher_walk_virt(&walk, req, false); + if (err) + return err; + if (walk.nbytes !=3D tail_bytes) + return -EINVAL; + kernel_vector_begin(); + func(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes, &ctx->ctx1, + req->iv, 0); + kernel_vector_end(); + + return skcipher_walk_done(&walk, 0); +} + +static int xts_encrypt(struct skcipher_request *req) +{ + return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt); +} + +static int xts_decrypt(struct skcipher_request *req) +{ + return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt); +} + +static struct skcipher_alg riscv64_aes_algs_zvkned[] =3D { + { + .setkey =3D aes_setkey, + .encrypt =3D ecb_encrypt, + .decrypt =3D ecb_decrypt, + .min_keysize =3D AES_MIN_KEY_SIZE, + .max_keysize =3D AES_MAX_KEY_SIZE, + .walksize =3D AES_BLOCK_SIZE * 8, + .base =3D { + .cra_flags =3D CRYPTO_ALG_INTERNAL, + .cra_blocksize =3D AES_BLOCK_SIZE, + .cra_ctxsize =3D sizeof(struct crypto_aes_ctx), + .cra_priority =3D 300, + .cra_name =3D "__ecb(aes)", + .cra_driver_name =3D "__ecb-aes-riscv64-zvkned", + .cra_module =3D THIS_MODULE, + }, + }, { + .setkey =3D aes_setkey, + .encrypt =3D cbc_encrypt, + .decrypt =3D cbc_decrypt, + .min_keysize =3D AES_MIN_KEY_SIZE, + .max_keysize =3D AES_MAX_KEY_SIZE, + .ivsize =3D AES_BLOCK_SIZE, + .walksize =3D AES_BLOCK_SIZE * 8, + .base =3D { + .cra_flags =3D CRYPTO_ALG_INTERNAL, + .cra_blocksize =3D AES_BLOCK_SIZE, + .cra_ctxsize =3D sizeof(struct crypto_aes_ctx), + .cra_priority =3D 300, + .cra_name =3D "__cbc(aes)", + .cra_driver_name =3D "__cbc-aes-riscv64-zvkned", + .cra_module =3D THIS_MODULE, + }, + } +}; + +static struct simd_skcipher_alg + *riscv64_aes_simd_algs_zvkned[ARRAY_SIZE(riscv64_aes_algs_zvkned)]; + +static struct skcipher_alg riscv64_aes_alg_zvkned_zvkb[] =3D { + { + .setkey =3D aes_setkey, + .encrypt =3D ctr_encrypt, + .decrypt =3D ctr_encrypt, + .min_keysize =3D AES_MIN_KEY_SIZE, + .max_keysize =3D AES_MAX_KEY_SIZE, + .ivsize =3D AES_BLOCK_SIZE, + .chunksize =3D AES_BLOCK_SIZE, + .walksize =3D AES_BLOCK_SIZE * 8, + .base =3D { + .cra_flags =3D CRYPTO_ALG_INTERNAL, + .cra_blocksize =3D 1, + .cra_ctxsize =3D sizeof(struct crypto_aes_ctx), + .cra_priority =3D 300, + .cra_name =3D "__ctr(aes)", + .cra_driver_name =3D "__ctr-aes-riscv64-zvkned-zvkb", + .cra_module =3D THIS_MODULE, + }, + } +}; + +static struct simd_skcipher_alg *riscv64_aes_simd_alg_zvkned_zvkb[ARRAY_SI= ZE( + riscv64_aes_alg_zvkned_zvkb)]; + +static struct 
skcipher_alg riscv64_aes_alg_zvkned_zvbb_zvkg[] =3D { + { + .setkey =3D xts_setkey, + .encrypt =3D xts_encrypt, + .decrypt =3D xts_decrypt, + .min_keysize =3D AES_MIN_KEY_SIZE * 2, + .max_keysize =3D AES_MAX_KEY_SIZE * 2, + .ivsize =3D AES_BLOCK_SIZE, + .chunksize =3D AES_BLOCK_SIZE, + .walksize =3D AES_BLOCK_SIZE * 8, + .base =3D { + .cra_flags =3D CRYPTO_ALG_INTERNAL, + .cra_blocksize =3D AES_BLOCK_SIZE, + .cra_ctxsize =3D sizeof(struct riscv64_aes_xts_ctx), + .cra_priority =3D 300, + .cra_name =3D "__xts(aes)", + .cra_driver_name =3D "__xts-aes-riscv64-zvkned-zvbb-zvkg", + .cra_module =3D THIS_MODULE, + }, + } +}; + +static struct simd_skcipher_alg + *riscv64_aes_simd_alg_zvkned_zvbb_zvkg[ARRAY_SIZE( + riscv64_aes_alg_zvkned_zvbb_zvkg)]; + +static int __init riscv64_aes_block_mod_init(void) +{ + int ret =3D -ENODEV; + + if (riscv_isa_extension_available(NULL, ZVKNED) && + riscv_vector_vlen() >=3D 128 && riscv_vector_vlen() <=3D 2048) { + ret =3D simd_register_skciphers_compat( + riscv64_aes_algs_zvkned, + ARRAY_SIZE(riscv64_aes_algs_zvkned), + riscv64_aes_simd_algs_zvkned); + if (ret) + return ret; + + if (riscv_isa_extension_available(NULL, ZVBB)) { + ret =3D simd_register_skciphers_compat( + riscv64_aes_alg_zvkned_zvkb, + ARRAY_SIZE(riscv64_aes_alg_zvkned_zvkb), + riscv64_aes_simd_alg_zvkned_zvkb); + if (ret) + goto unregister_zvkned; + + if (riscv_isa_extension_available(NULL, ZVKG)) { + ret =3D simd_register_skciphers_compat( + riscv64_aes_alg_zvkned_zvbb_zvkg, + ARRAY_SIZE( + riscv64_aes_alg_zvkned_zvbb_zvkg), + riscv64_aes_simd_alg_zvkned_zvbb_zvkg); + if (ret) + goto unregister_zvkned_zvkb; + } + } + } + + return ret; + +unregister_zvkned_zvkb: + simd_unregister_skciphers(riscv64_aes_alg_zvkned_zvkb, + ARRAY_SIZE(riscv64_aes_alg_zvkned_zvkb), + riscv64_aes_simd_alg_zvkned_zvkb); +unregister_zvkned: + simd_unregister_skciphers(riscv64_aes_algs_zvkned, + ARRAY_SIZE(riscv64_aes_algs_zvkned), + riscv64_aes_simd_algs_zvkned); + + return ret; +} + +static void __exit riscv64_aes_block_mod_fini(void) +{ + simd_unregister_skciphers(riscv64_aes_alg_zvkned_zvbb_zvkg, + ARRAY_SIZE(riscv64_aes_alg_zvkned_zvbb_zvkg), + riscv64_aes_simd_alg_zvkned_zvbb_zvkg); + simd_unregister_skciphers(riscv64_aes_alg_zvkned_zvkb, + ARRAY_SIZE(riscv64_aes_alg_zvkned_zvkb), + riscv64_aes_simd_alg_zvkned_zvkb); + simd_unregister_skciphers(riscv64_aes_algs_zvkned, + ARRAY_SIZE(riscv64_aes_algs_zvkned), + riscv64_aes_simd_algs_zvkned); +} + +module_init(riscv64_aes_block_mod_init); +module_exit(riscv64_aes_block_mod_fini); + +MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS (RISC-V accelerated)"); +MODULE_AUTHOR("Jerry Shih "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("cbc(aes)"); +MODULE_ALIAS_CRYPTO("ctr(aes)"); +MODULE_ALIAS_CRYPTO("ecb(aes)"); +MODULE_ALIAS_CRYPTO("xts(aes)"); diff --git a/arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.pl b/arch/riscv= /crypto/aes-riscv64-zvkned-zvbb-zvkg.pl new file mode 100644 index 000000000000..6b6aad1cc97a --- /dev/null +++ b/arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.pl @@ -0,0 +1,949 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). 
You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# - RV64I +# - RISC-V Vector ('V') with VLEN >=3D 128 && VLEN <=3D 2048 +# - RISC-V Vector Bit-manipulation extension ('Zvbb') +# - RISC-V Vector GCM/GMAC extension ('Zvkg') +# - RISC-V Vector AES block cipher extension ('Zvkned') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extensio= n) +# $flavour is the first argument if it doesn't look like a file +my $output =3D $#ARGV >=3D 0 && $ARGV[$#ARGV] =3D~ m|\.\w+$| ? pop : undef; +my $flavour =3D $#ARGV >=3D 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=3D<<___; +.text +___ + +{ +##########################################################################= ###### +# void rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt(const unsigned char *in, +# unsigned char *out, size_t l= ength, +# const AES_KEY *key, +# unsigned char iv[16], +# int update_iv) +my ($INPUT, $OUTPUT, $LENGTH, $KEY, $IV, $UPDATE_IV) =3D ("a0", "a1", "a2"= , "a3", "a4", "a5"); +my ($TAIL_LENGTH) =3D ("a6"); +my ($VL) =3D ("a7"); +my ($T0, $T1, $T2, $T3) =3D ("t0", "t1", "t2", "t3"); +my ($STORE_LEN32) =3D ("t4"); +my ($LEN32) =3D ("t5"); +my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, + $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15, + $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23, + $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) =3D map("v$_",(0..31)); + +# load iv to v28 +sub load_xts_iv0 { + my $code=3D<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V28, $IV]} +___ + + return $code; +} + +# prepare input data(v24), iv(v28), bit-reversed-iv(v16), bit-reversed-iv-= multiplier(v20) +sub init_first_round { + my $code=3D<<___; + # load input + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + @{[vle32_v $V24, $INPUT]} + + li $T0, 5 + # We could simplify the initialization steps if we have `block<=3D1`. + blt $LEN32, $T0, 1f + + # Note: We use `vgmul` for GF(2^128) multiplication. 
The `vgmul` uses + # different order of coefficients. We should use`vbrev8` to reverse the + # data when we use `vgmul`. + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vbrev8_v $V0, $V28]} + @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]} + @{[vmv_v_i $V16, 0]} + # v16: [r-IV0, r-IV0, ...] + @{[vaesz_vs $V16, $V0]} + + # Prepare GF(2^128) multiplier [1, x, x^2, x^3, ...] in v8. + # We use `vwsll` to get power of 2 multipliers. Current rvv spec only + # supports `SEW<=3D64`. So, the maximum `VLEN` for this approach is `2= 048`. + # SEW64_BITS * AES_BLOCK_SIZE / LMUL + # =3D 64 * 128 / 4 =3D 2048 + # + # TODO: truncate the vl to `2048` for `vlen>2048` case. + slli $T0, $LEN32, 2 + @{[vsetvli "zero", $T0, "e32", "m1", "ta", "ma"]} + # v2: [`1`, `1`, `1`, `1`, ...] + @{[vmv_v_i $V2, 1]} + # v3: [`0`, `1`, `2`, `3`, ...] + @{[vid_v $V3]} + @{[vsetvli "zero", $T0, "e64", "m2", "ta", "ma"]} + # v4: [`1`, 0, `1`, 0, `1`, 0, `1`, 0, ...] + @{[vzext_vf2 $V4, $V2]} + # v6: [`0`, 0, `1`, 0, `2`, 0, `3`, 0, ...] + @{[vzext_vf2 $V6, $V3]} + slli $T0, $LEN32, 1 + @{[vsetvli "zero", $T0, "e32", "m2", "ta", "ma"]} + # v8: [1<<0=3D1, 0, 0, 0, 1<<1=3Dx, 0, 0, 0, 1<<2=3Dx^2, 0, 0, 0, ...] + @{[vwsll_vv $V8, $V4, $V6]} + + # Compute [r-IV0*1, r-IV0*x, r-IV0*x^2, r-IV0*x^3, ...] in v16 + @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]} + @{[vbrev8_v $V8, $V8]} + @{[vgmul_vv $V16, $V8]} + + # Compute [IV0*1, IV0*x, IV0*x^2, IV0*x^3, ...] in v28. + # Reverse the bits order back. + @{[vbrev8_v $V28, $V16]} + + # Prepare the x^n multiplier in v20. The `n` is the aes-xts block numb= er + # in a LMUL=3D4 register group. + # n =3D ((VLEN*LMUL)/(32*4)) =3D ((VLEN*4)/(32*4)) + # =3D (VLEN/32) + # We could use vsetvli with `e32, m1` to compute the `n` number. + @{[vsetvli $T0, "zero", "e32", "m1", "ta", "ma"]} + li $T1, 1 + sll $T0, $T1, $T0 + @{[vsetivli "zero", 2, "e64", "m1", "ta", "ma"]} + @{[vmv_v_i $V0, 0]} + @{[vsetivli "zero", 1, "e64", "m1", "tu", "ma"]} + @{[vmv_v_x $V0, $T0]} + @{[vsetivli "zero", 2, "e64", "m1", "ta", "ma"]} + @{[vbrev8_v $V0, $V0]} + @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]} + @{[vmv_v_i $V20, 0]} + @{[vaesz_vs $V20, $V0]} + + j 2f +1: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vbrev8_v $V16, $V28]} +2: +___ + + return $code; +} + +# prepare xts enc last block's input(v24) and iv(v28) +sub handle_xts_enc_last_block { + my $code=3D<<___; + bnez $TAIL_LENGTH, 2f + + beqz $UPDATE_IV, 1f + ## Store next IV + addi $VL, $VL, -4 + @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]} + # multiplier + @{[vslidedown_vx $V16, $V16, $VL]} + + # setup `x` multiplier with byte-reversed order + # 0b00000010 =3D> 0b01000000 (0x40) + li $T0, 0x40 + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vmv_v_i $V28, 0]} + @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]} + @{[vmv_v_x $V28, $T0]} + + # IV * `x` + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vgmul_vv $V16, $V28]} + # Reverse the IV's bits order back to big-endian + @{[vbrev8_v $V28, $V16]} + + @{[vse32_v $V28, $IV]} +1: + + ret +2: + # slidedown second to last block + addi $VL, $VL, -4 + @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]} + # ciphertext + @{[vslidedown_vx $V24, $V24, $VL]} + # multiplier + @{[vslidedown_vx $V16, $V16, $VL]} + + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vmv_v_v $V25, $V24]} + + # load last block into v24 + # note: We should load the last block before store the second to last = block + # for in-place operation. 
+ @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]} + @{[vle8_v $V24, $INPUT]} + + # setup `x` multiplier with byte-reversed order + # 0b00000010 =3D> 0b01000000 (0x40) + li $T0, 0x40 + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vmv_v_i $V28, 0]} + @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]} + @{[vmv_v_x $V28, $T0]} + + # compute IV for last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vgmul_vv $V16, $V28]} + @{[vbrev8_v $V28, $V16]} + + # store second to last block + @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "ta", "ma"]} + @{[vse8_v $V25, $OUTPUT]} +___ + + return $code; +} + +# prepare xts dec second to last block's input(v24) and iv(v29) and +# last block's and iv(v28) +sub handle_xts_dec_last_block { + my $code=3D<<___; + bnez $TAIL_LENGTH, 2f + + beqz $UPDATE_IV, 1f + ## Store next IV + # setup `x` multiplier with byte-reversed order + # 0b00000010 =3D> 0b01000000 (0x40) + li $T0, 0x40 + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vmv_v_i $V28, 0]} + @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]} + @{[vmv_v_x $V28, $T0]} + + beqz $LENGTH, 3f + addi $VL, $VL, -4 + @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]} + # multiplier + @{[vslidedown_vx $V16, $V16, $VL]} + +3: + # IV * `x` + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vgmul_vv $V16, $V28]} + # Reverse the IV's bits order back to big-endian + @{[vbrev8_v $V28, $V16]} + + @{[vse32_v $V28, $IV]} +1: + + ret +2: + # load second to last block's ciphertext + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V24, $INPUT]} + addi $INPUT, $INPUT, 16 + + # setup `x` multiplier with byte-reversed order + # 0b00000010 =3D> 0b01000000 (0x40) + li $T0, 0x40 + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vmv_v_i $V20, 0]} + @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]} + @{[vmv_v_x $V20, $T0]} + + beqz $LENGTH, 1f + # slidedown third to last block + addi $VL, $VL, -4 + @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]} + # multiplier + @{[vslidedown_vx $V16, $V16, $VL]} + + # compute IV for last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vgmul_vv $V16, $V20]} + @{[vbrev8_v $V28, $V16]} + + # compute IV for second to last block + @{[vgmul_vv $V16, $V20]} + @{[vbrev8_v $V29, $V16]} + j 2f +1: + # compute IV for second to last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vgmul_vv $V16, $V20]} + @{[vbrev8_v $V29, $V16]} +2: +___ + + return $code; +} + +# Load all 11 round keys to v1-v11 registers. +sub aes_128_load_key { + my $code=3D<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V2, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V3, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V4, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V5, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V6, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V7, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V8, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V9, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V10, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V11, $KEY]} +___ + + return $code; +} + +# Load all 13 round keys to v1-v13 registers. 
+sub aes_192_load_key { + my $code=3D<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V2, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V3, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V4, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V5, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V6, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V7, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V8, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V9, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V10, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V11, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V12, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V13, $KEY]} +___ + + return $code; +} + +# Load all 15 round keys to v1-v15 registers. +sub aes_256_load_key { + my $code=3D<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V2, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V3, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V4, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V5, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V6, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V7, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V8, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V9, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V10, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V11, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V12, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V13, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V14, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V15, $KEY]} +___ + + return $code; +} + +# aes-128 enc with round keys v1-v11 +sub aes_128_enc { + my $code=3D<<___; + @{[vaesz_vs $V24, $V1]} + @{[vaesem_vs $V24, $V2]} + @{[vaesem_vs $V24, $V3]} + @{[vaesem_vs $V24, $V4]} + @{[vaesem_vs $V24, $V5]} + @{[vaesem_vs $V24, $V6]} + @{[vaesem_vs $V24, $V7]} + @{[vaesem_vs $V24, $V8]} + @{[vaesem_vs $V24, $V9]} + @{[vaesem_vs $V24, $V10]} + @{[vaesef_vs $V24, $V11]} +___ + + return $code; +} + +# aes-128 dec with round keys v1-v11 +sub aes_128_dec { + my $code=3D<<___; + @{[vaesz_vs $V24, $V11]} + @{[vaesdm_vs $V24, $V10]} + @{[vaesdm_vs $V24, $V9]} + @{[vaesdm_vs $V24, $V8]} + @{[vaesdm_vs $V24, $V7]} + @{[vaesdm_vs $V24, $V6]} + @{[vaesdm_vs $V24, $V5]} + @{[vaesdm_vs $V24, $V4]} + @{[vaesdm_vs $V24, $V3]} + @{[vaesdm_vs $V24, $V2]} + @{[vaesdf_vs $V24, $V1]} +___ + + return $code; +} + +# aes-192 enc with round keys v1-v13 +sub aes_192_enc { + my $code=3D<<___; + @{[vaesz_vs $V24, $V1]} + @{[vaesem_vs $V24, $V2]} + @{[vaesem_vs $V24, $V3]} + @{[vaesem_vs $V24, $V4]} + @{[vaesem_vs $V24, $V5]} + @{[vaesem_vs $V24, $V6]} + @{[vaesem_vs $V24, $V7]} + @{[vaesem_vs $V24, $V8]} + @{[vaesem_vs $V24, $V9]} + @{[vaesem_vs $V24, $V10]} + @{[vaesem_vs $V24, $V11]} + @{[vaesem_vs $V24, $V12]} + @{[vaesef_vs $V24, $V13]} +___ + + return $code; +} + +# aes-192 dec with round keys v1-v13 +sub aes_192_dec { + my $code=3D<<___; + @{[vaesz_vs $V24, $V13]} + @{[vaesdm_vs $V24, $V12]} + @{[vaesdm_vs $V24, $V11]} + @{[vaesdm_vs $V24, $V10]} + @{[vaesdm_vs $V24, $V9]} + @{[vaesdm_vs $V24, $V8]} + @{[vaesdm_vs $V24, $V7]} + @{[vaesdm_vs $V24, $V6]} + @{[vaesdm_vs $V24, $V5]} + @{[vaesdm_vs $V24, $V4]} + @{[vaesdm_vs $V24, $V3]} + @{[vaesdm_vs $V24, $V2]} + @{[vaesdf_vs $V24, $V1]} +___ + + return $code; +} + +# aes-256 enc with round keys v1-v15 +sub aes_256_enc { + my $code=3D<<___; + @{[vaesz_vs $V24, $V1]} + @{[vaesem_vs $V24, $V2]} + @{[vaesem_vs $V24, $V3]} + @{[vaesem_vs $V24, $V4]} + @{[vaesem_vs $V24, $V5]} + @{[vaesem_vs $V24, $V6]} + @{[vaesem_vs 
$V24, $V7]} + @{[vaesem_vs $V24, $V8]} + @{[vaesem_vs $V24, $V9]} + @{[vaesem_vs $V24, $V10]} + @{[vaesem_vs $V24, $V11]} + @{[vaesem_vs $V24, $V12]} + @{[vaesem_vs $V24, $V13]} + @{[vaesem_vs $V24, $V14]} + @{[vaesef_vs $V24, $V15]} +___ + + return $code; +} + +# aes-256 dec with round keys v1-v15 +sub aes_256_dec { + my $code=3D<<___; + @{[vaesz_vs $V24, $V15]} + @{[vaesdm_vs $V24, $V14]} + @{[vaesdm_vs $V24, $V13]} + @{[vaesdm_vs $V24, $V12]} + @{[vaesdm_vs $V24, $V11]} + @{[vaesdm_vs $V24, $V10]} + @{[vaesdm_vs $V24, $V9]} + @{[vaesdm_vs $V24, $V8]} + @{[vaesdm_vs $V24, $V7]} + @{[vaesdm_vs $V24, $V6]} + @{[vaesdm_vs $V24, $V5]} + @{[vaesdm_vs $V24, $V4]} + @{[vaesdm_vs $V24, $V3]} + @{[vaesdm_vs $V24, $V2]} + @{[vaesdf_vs $V24, $V1]} +___ + + return $code; +} + +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt +.type rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt,\@function +rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt: + @{[load_xts_iv0]} + + # aes block size is 16 + andi $TAIL_LENGTH, $LENGTH, 15 + mv $STORE_LEN32, $LENGTH + beqz $TAIL_LENGTH, 1f + sub $LENGTH, $LENGTH, $TAIL_LENGTH + addi $STORE_LEN32, $LENGTH, -16 +1: + # We make the `LENGTH` become e32 length here. + srli $LEN32, $LENGTH, 2 + srli $STORE_LEN32, $STORE_LEN32, 2 + + # Load key length. + lwu $T0, 480($KEY) + li $T1, 32 + li $T2, 24 + li $T3, 16 + beq $T0, $T1, aes_xts_enc_256 + beq $T0, $T2, aes_xts_enc_192 + beq $T0, $T3, aes_xts_enc_128 +.size rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt,.-rv64i_zvbb_zvkg_zvkned_aes_= xts_encrypt +___ + +$code .=3D <<___; +.p2align 3 +aes_xts_enc_128: + @{[init_first_round]} + @{[aes_128_load_key]} + + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + j 1f + +.Lenc_blocks_128: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + # load plaintext into v24 + @{[vle32_v $V24, $INPUT]} + # update iv + @{[vgmul_vv $V16, $V20]} + # reverse the iv's bits order back + @{[vbrev8_v $V28, $V16]} +1: + @{[vxor_vv $V24, $V24, $V28]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + add $INPUT, $INPUT, $T0 + @{[aes_128_enc]} + @{[vxor_vv $V24, $V24, $V28]} + + # store ciphertext + @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]} + @{[vse32_v $V24, $OUTPUT]} + add $OUTPUT, $OUTPUT, $T0 + sub $STORE_LEN32, $STORE_LEN32, $VL + + bnez $LEN32, .Lenc_blocks_128 + + @{[handle_xts_enc_last_block]} + + # xts last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V28]} + @{[aes_128_enc]} + @{[vxor_vv $V24, $V24, $V28]} + + # store last block ciphertext + addi $OUTPUT, $OUTPUT, -16 + @{[vse32_v $V24, $OUTPUT]} + + ret +.size aes_xts_enc_128,.-aes_xts_enc_128 +___ + +$code .=3D <<___; +.p2align 3 +aes_xts_enc_192: + @{[init_first_round]} + @{[aes_192_load_key]} + + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + j 1f + +.Lenc_blocks_192: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + # load plaintext into v24 + @{[vle32_v $V24, $INPUT]} + # update iv + @{[vgmul_vv $V16, $V20]} + # reverse the iv's bits order back + @{[vbrev8_v $V28, $V16]} +1: + @{[vxor_vv $V24, $V24, $V28]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + add $INPUT, $INPUT, $T0 + @{[aes_192_enc]} + @{[vxor_vv $V24, $V24, $V28]} + + # store ciphertext + @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]} + @{[vse32_v $V24, $OUTPUT]} + add $OUTPUT, $OUTPUT, $T0 + sub $STORE_LEN32, $STORE_LEN32, $VL + + bnez $LEN32, .Lenc_blocks_192 + + @{[handle_xts_enc_last_block]} + + # xts last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V28]} + 
@{[aes_192_enc]} + @{[vxor_vv $V24, $V24, $V28]} + + # store last block ciphertext + addi $OUTPUT, $OUTPUT, -16 + @{[vse32_v $V24, $OUTPUT]} + + ret +.size aes_xts_enc_192,.-aes_xts_enc_192 +___ + +$code .=3D <<___; +.p2align 3 +aes_xts_enc_256: + @{[init_first_round]} + @{[aes_256_load_key]} + + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + j 1f + +.Lenc_blocks_256: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + # load plaintext into v24 + @{[vle32_v $V24, $INPUT]} + # update iv + @{[vgmul_vv $V16, $V20]} + # reverse the iv's bits order back + @{[vbrev8_v $V28, $V16]} +1: + @{[vxor_vv $V24, $V24, $V28]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + add $INPUT, $INPUT, $T0 + @{[aes_256_enc]} + @{[vxor_vv $V24, $V24, $V28]} + + # store ciphertext + @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]} + @{[vse32_v $V24, $OUTPUT]} + add $OUTPUT, $OUTPUT, $T0 + sub $STORE_LEN32, $STORE_LEN32, $VL + + bnez $LEN32, .Lenc_blocks_256 + + @{[handle_xts_enc_last_block]} + + # xts last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V28]} + @{[aes_256_enc]} + @{[vxor_vv $V24, $V24, $V28]} + + # store last block ciphertext + addi $OUTPUT, $OUTPUT, -16 + @{[vse32_v $V24, $OUTPUT]} + + ret +.size aes_xts_enc_256,.-aes_xts_enc_256 +___ + +##########################################################################= ###### +# void rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt(const unsigned char *in, +# unsigned char *out, size_t l= ength, +# const AES_KEY *key, +# unsigned char iv[16], +# int update_iv) +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt +.type rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt,\@function +rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt: + @{[load_xts_iv0]} + + # aes block size is 16 + andi $TAIL_LENGTH, $LENGTH, 15 + beqz $TAIL_LENGTH, 1f + sub $LENGTH, $LENGTH, $TAIL_LENGTH + addi $LENGTH, $LENGTH, -16 +1: + # We make the `LENGTH` become e32 length here. + srli $LEN32, $LENGTH, 2 + + # Load key length. 
+ lwu $T0, 480($KEY) + li $T1, 32 + li $T2, 24 + li $T3, 16 + beq $T0, $T1, aes_xts_dec_256 + beq $T0, $T2, aes_xts_dec_192 + beq $T0, $T3, aes_xts_dec_128 +.size rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt,.-rv64i_zvbb_zvkg_zvkned_aes_= xts_decrypt +___ + +$code .=3D <<___; +.p2align 3 +aes_xts_dec_128: + @{[init_first_round]} + @{[aes_128_load_key]} + + beqz $LEN32, 2f + + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + j 1f + +.Ldec_blocks_128: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + # load ciphertext into v24 + @{[vle32_v $V24, $INPUT]} + # update iv + @{[vgmul_vv $V16, $V20]} + # reverse the iv's bits order back + @{[vbrev8_v $V28, $V16]} +1: + @{[vxor_vv $V24, $V24, $V28]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + add $INPUT, $INPUT, $T0 + @{[aes_128_dec]} + @{[vxor_vv $V24, $V24, $V28]} + + # store plaintext + @{[vse32_v $V24, $OUTPUT]} + add $OUTPUT, $OUTPUT, $T0 + + bnez $LEN32, .Ldec_blocks_128 + +2: + @{[handle_xts_dec_last_block]} + + ## xts second to last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V29]} + @{[aes_128_dec]} + @{[vxor_vv $V24, $V24, $V29]} + @{[vmv_v_v $V25, $V24]} + + # load last block ciphertext + @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]} + @{[vle8_v $V24, $INPUT]} + + # store second to last block plaintext + addi $T0, $OUTPUT, 16 + @{[vse8_v $V25, $T0]} + + ## xts last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V28]} + @{[aes_128_dec]} + @{[vxor_vv $V24, $V24, $V28]} + + # store second to last block plaintext + @{[vse32_v $V24, $OUTPUT]} + + ret +.size aes_xts_dec_128,.-aes_xts_dec_128 +___ + +$code .=3D <<___; +.p2align 3 +aes_xts_dec_192: + @{[init_first_round]} + @{[aes_192_load_key]} + + beqz $LEN32, 2f + + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + j 1f + +.Ldec_blocks_192: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + # load ciphertext into v24 + @{[vle32_v $V24, $INPUT]} + # update iv + @{[vgmul_vv $V16, $V20]} + # reverse the iv's bits order back + @{[vbrev8_v $V28, $V16]} +1: + @{[vxor_vv $V24, $V24, $V28]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + add $INPUT, $INPUT, $T0 + @{[aes_192_dec]} + @{[vxor_vv $V24, $V24, $V28]} + + # store plaintext + @{[vse32_v $V24, $OUTPUT]} + add $OUTPUT, $OUTPUT, $T0 + + bnez $LEN32, .Ldec_blocks_192 + +2: + @{[handle_xts_dec_last_block]} + + ## xts second to last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V29]} + @{[aes_192_dec]} + @{[vxor_vv $V24, $V24, $V29]} + @{[vmv_v_v $V25, $V24]} + + # load last block ciphertext + @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]} + @{[vle8_v $V24, $INPUT]} + + # store second to last block plaintext + addi $T0, $OUTPUT, 16 + @{[vse8_v $V25, $T0]} + + ## xts last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V28]} + @{[aes_192_dec]} + @{[vxor_vv $V24, $V24, $V28]} + + # store second to last block plaintext + @{[vse32_v $V24, $OUTPUT]} + + ret +.size aes_xts_dec_192,.-aes_xts_dec_192 +___ + +$code .=3D <<___; +.p2align 3 +aes_xts_dec_256: + @{[init_first_round]} + @{[aes_256_load_key]} + + beqz $LEN32, 2f + + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + j 1f + +.Ldec_blocks_256: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + # load ciphertext into v24 + @{[vle32_v $V24, $INPUT]} + # update iv + @{[vgmul_vv $V16, $V20]} + # reverse the iv's bits order back + @{[vbrev8_v $V28, $V16]} +1: + @{[vxor_vv $V24, $V24, $V28]} + slli $T0, $VL, 2 + sub $LEN32, 
$LEN32, $VL + add $INPUT, $INPUT, $T0 + @{[aes_256_dec]} + @{[vxor_vv $V24, $V24, $V28]} + + # store plaintext + @{[vse32_v $V24, $OUTPUT]} + add $OUTPUT, $OUTPUT, $T0 + + bnez $LEN32, .Ldec_blocks_256 + +2: + @{[handle_xts_dec_last_block]} + + ## xts second to last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V29]} + @{[aes_256_dec]} + @{[vxor_vv $V24, $V24, $V29]} + @{[vmv_v_v $V25, $V24]} + + # load last block ciphertext + @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]} + @{[vle8_v $V24, $INPUT]} + + # store second to last block plaintext + addi $T0, $OUTPUT, 16 + @{[vse8_v $V25, $T0]} + + ## xts last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V28]} + @{[aes_256_dec]} + @{[vxor_vv $V24, $V24, $V28]} + + # store second to last block plaintext + @{[vse32_v $V24, $OUTPUT]} + + ret +.size aes_xts_dec_256,.-aes_xts_dec_256 +___ +} + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; diff --git a/arch/riscv/crypto/aes-riscv64-zvkned-zvkb.pl b/arch/riscv/cryp= to/aes-riscv64-zvkned-zvkb.pl new file mode 100644 index 000000000000..3b8c324bc4d5 --- /dev/null +++ b/arch/riscv/crypto/aes-riscv64-zvkned-zvkb.pl @@ -0,0 +1,415 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +# - RV64I +# - RISC-V Vector ('V') with VLEN >=3D 128 +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') +# - RISC-V Vector AES block cipher extension ('Zvkned') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extensio= n) +# $flavour is the first argument if it doesn't look like a file +my $output =3D $#ARGV >=3D 0 && $ARGV[$#ARGV] =3D~ m|\.\w+$| ? pop : undef; +my $flavour =3D $#ARGV >=3D 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=3D<<___; +.text +___ + +##########################################################################= ###### +# void rv64i_zvkb_zvkned_ctr32_encrypt_blocks(const unsigned char *in, +# unsigned char *out, size_t l= ength, +# const void *key, +# unsigned char ivec[16]); +{ +my ($INP, $OUTP, $LEN, $KEYP, $IVP) =3D ("a0", "a1", "a2", "a3", "a4"); +my ($T0, $T1, $T2, $T3) =3D ("t0", "t1", "t2", "t3"); +my ($VL) =3D ("t4"); +my ($LEN32) =3D ("t5"); +my ($CTR) =3D ("t6"); +my ($MASK) =3D ("v0"); +my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, + $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15, + $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23, + $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) =3D map("v$_",(0..31)); + +# Prepare the AES ctr input data into v16. +sub init_aes_ctr_input { + my $code=3D<<___; + # Setup mask into v0 + # The mask pattern for 4*N-th elements + # mask v0: [000100010001....] + # Note: + # We could setup the mask just for the maximum element length instea= d of + # the VLMAX. + li $T0, 0b10001000 + @{[vsetvli $T2, "zero", "e8", "m1", "ta", "ma"]} + @{[vmv_v_x $MASK, $T0]} + # Load IV. + # v31:[IV0, IV1, IV2, big-endian count] + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V31, $IVP]} + # Convert the big-endian counter into little-endian. + @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]} + @{[vrev8_v $V31, $V31, $MASK]} + # Splat the IV to v16 + @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]} + @{[vmv_v_i $V16, 0]} + @{[vaesz_vs $V16, $V31]} + # Prepare the ctr pattern into v20 + # v20: [x, x, x, 0, x, x, x, 1, x, x, x, 2, ...] + @{[viota_m $V20, $MASK, $MASK]} + # v16:[IV0, IV1, IV2, count+0, IV0, IV1, IV2, count+1, ...] + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]} + @{[vadd_vv $V16, $V16, $V20, $MASK]} +___ + + return $code; +} + +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvkb_zvkned_ctr32_encrypt_blocks +.type rv64i_zvkb_zvkned_ctr32_encrypt_blocks,\@function +rv64i_zvkb_zvkned_ctr32_encrypt_blocks: + # The aes block size is 16 bytes. + # We try to get the minimum aes block number including the tail data. + addi $T0, $LEN, 15 + # the minimum block number + srli $T0, $T0, 4 + # We make the block number become e32 length here. + slli $LEN32, $T0, 2 + + # Load key length. + lwu $T0, 480($KEYP) + li $T1, 32 + li $T2, 24 + li $T3, 16 + + beq $T0, $T1, ctr32_encrypt_blocks_256 + beq $T0, $T2, ctr32_encrypt_blocks_192 + beq $T0, $T3, ctr32_encrypt_blocks_128 + + ret +.size rv64i_zvkb_zvkned_ctr32_encrypt_blocks,.-rv64i_zvkb_zvkned_ctr32_enc= rypt_blocks +___ + +$code .=3D <<___; +.p2align 3 +ctr32_encrypt_blocks_128: + # Load all 11 round keys to v1-v11 registers. 
+ @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V2, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V3, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V4, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V5, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V6, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V7, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V8, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V9, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + + @{[init_aes_ctr_input]} + + ##### AES body + j 2f +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]} + # Increase ctr in v16. + @{[vadd_vx $V16, $V16, $CTR, $MASK]} +2: + # Prepare the AES ctr input into v24. + # The ctr data uses big-endian form. + @{[vmv_v_v $V24, $V16]} + @{[vrev8_v $V24, $V24, $MASK]} + srli $CTR, $VL, 2 + sub $LEN32, $LEN32, $VL + + # Load plaintext in bytes into v20. + @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]} + @{[vle8_v $V20, $INP]} + sub $LEN, $LEN, $T0 + add $INP, $INP, $T0 + + @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]} + @{[vaesz_vs $V24, $V1]} + @{[vaesem_vs $V24, $V2]} + @{[vaesem_vs $V24, $V3]} + @{[vaesem_vs $V24, $V4]} + @{[vaesem_vs $V24, $V5]} + @{[vaesem_vs $V24, $V6]} + @{[vaesem_vs $V24, $V7]} + @{[vaesem_vs $V24, $V8]} + @{[vaesem_vs $V24, $V9]} + @{[vaesem_vs $V24, $V10]} + @{[vaesef_vs $V24, $V11]} + + # ciphertext + @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V20]} + + # Store the ciphertext. + @{[vse8_v $V24, $OUTP]} + add $OUTP, $OUTP, $T0 + + bnez $LEN, 1b + + ## store ctr iv + @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]} + # Increase ctr in v16. + @{[vadd_vx $V16, $V16, $CTR, $MASK]} + # Convert ctr data back to big-endian. + @{[vrev8_v $V16, $V16, $MASK]} + @{[vse32_v $V16, $IVP]} + + ret +.size ctr32_encrypt_blocks_128,.-ctr32_encrypt_blocks_128 +___ + +$code .=3D <<___; +.p2align 3 +ctr32_encrypt_blocks_192: + # Load all 13 round keys to v1-v13 registers. + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V2, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V3, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V4, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V5, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V6, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V7, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V8, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V9, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, $KEYP]} + + @{[init_aes_ctr_input]} + + ##### AES body + j 2f +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]} + # Increase ctr in v16. + @{[vadd_vx $V16, $V16, $CTR, $MASK]} +2: + # Prepare the AES ctr input into v24. + # The ctr data uses big-endian form. + @{[vmv_v_v $V24, $V16]} + @{[vrev8_v $V24, $V24, $MASK]} + srli $CTR, $VL, 2 + sub $LEN32, $LEN32, $VL + + # Load plaintext in bytes into v20. 
+ @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]} + @{[vle8_v $V20, $INP]} + sub $LEN, $LEN, $T0 + add $INP, $INP, $T0 + + @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]} + @{[vaesz_vs $V24, $V1]} + @{[vaesem_vs $V24, $V2]} + @{[vaesem_vs $V24, $V3]} + @{[vaesem_vs $V24, $V4]} + @{[vaesem_vs $V24, $V5]} + @{[vaesem_vs $V24, $V6]} + @{[vaesem_vs $V24, $V7]} + @{[vaesem_vs $V24, $V8]} + @{[vaesem_vs $V24, $V9]} + @{[vaesem_vs $V24, $V10]} + @{[vaesem_vs $V24, $V11]} + @{[vaesem_vs $V24, $V12]} + @{[vaesef_vs $V24, $V13]} + + # ciphertext + @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V20]} + + # Store the ciphertext. + @{[vse8_v $V24, $OUTP]} + add $OUTP, $OUTP, $T0 + + bnez $LEN, 1b + + ## store ctr iv + @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]} + # Increase ctr in v16. + @{[vadd_vx $V16, $V16, $CTR, $MASK]} + # Convert ctr data back to big-endian. + @{[vrev8_v $V16, $V16, $MASK]} + @{[vse32_v $V16, $IVP]} + + ret +.size ctr32_encrypt_blocks_192,.-ctr32_encrypt_blocks_192 +___ + +$code .=3D <<___; +.p2align 3 +ctr32_encrypt_blocks_256: + # Load all 15 round keys to v1-v15 registers. + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V2, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V3, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V4, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V5, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V6, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V7, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V8, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V9, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V14, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V15, $KEYP]} + + @{[init_aes_ctr_input]} + + ##### AES body + j 2f +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]} + # Increase ctr in v16. + @{[vadd_vx $V16, $V16, $CTR, $MASK]} +2: + # Prepare the AES ctr input into v24. + # The ctr data uses big-endian form. + @{[vmv_v_v $V24, $V16]} + @{[vrev8_v $V24, $V24, $MASK]} + srli $CTR, $VL, 2 + sub $LEN32, $LEN32, $VL + + # Load plaintext in bytes into v20. + @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]} + @{[vle8_v $V20, $INP]} + sub $LEN, $LEN, $T0 + add $INP, $INP, $T0 + + @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]} + @{[vaesz_vs $V24, $V1]} + @{[vaesem_vs $V24, $V2]} + @{[vaesem_vs $V24, $V3]} + @{[vaesem_vs $V24, $V4]} + @{[vaesem_vs $V24, $V5]} + @{[vaesem_vs $V24, $V6]} + @{[vaesem_vs $V24, $V7]} + @{[vaesem_vs $V24, $V8]} + @{[vaesem_vs $V24, $V9]} + @{[vaesem_vs $V24, $V10]} + @{[vaesem_vs $V24, $V11]} + @{[vaesem_vs $V24, $V12]} + @{[vaesem_vs $V24, $V13]} + @{[vaesem_vs $V24, $V14]} + @{[vaesef_vs $V24, $V15]} + + # ciphertext + @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V20]} + + # Store the ciphertext. + @{[vse8_v $V24, $OUTP]} + add $OUTP, $OUTP, $T0 + + bnez $LEN, 1b + + ## store ctr iv + @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]} + # Increase ctr in v16. + @{[vadd_vx $V16, $V16, $CTR, $MASK]} + # Convert ctr data back to big-endian. 
+ @{[vrev8_v $V16, $V16, $MASK]} + @{[vse32_v $V16, $IVP]} + + ret +.size ctr32_encrypt_blocks_256,.-ctr32_encrypt_blocks_256 +___ +} + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/ae= s-riscv64-zvkned.pl index 303e82d9f6f0..71a9248320c0 100644 --- a/arch/riscv/crypto/aes-riscv64-zvkned.pl +++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl @@ -67,6 +67,752 @@ my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, ) =3D map("v$_",(0..31)); =20 +# Load all 11 round keys to v1-v11 registers. +sub aes_128_load_key { + my $KEYP =3D shift; + + my $code=3D<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V2, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V3, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V4, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V5, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V6, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V7, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V8, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V9, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} +___ + + return $code; +} + +# Load all 13 round keys to v1-v13 registers. +sub aes_192_load_key { + my $KEYP =3D shift; + + my $code=3D<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V2, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V3, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V4, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V5, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V6, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V7, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V8, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V9, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, $KEYP]} +___ + + return $code; +} + +# Load all 15 round keys to v1-v15 registers. 
+sub aes_256_load_key { + my $KEYP =3D shift; + + my $code=3D<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V2, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V3, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V4, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V5, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V6, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V7, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V8, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V9, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V14, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V15, $KEYP]} +___ + + return $code; +} + +# aes-128 encryption with round keys v1-v11 +sub aes_128_encrypt { + my $code=3D<<___; + @{[vaesz_vs $V24, $V1]} # with round key w[ 0, 3] + @{[vaesem_vs $V24, $V2]} # with round key w[ 4, 7] + @{[vaesem_vs $V24, $V3]} # with round key w[ 8,11] + @{[vaesem_vs $V24, $V4]} # with round key w[12,15] + @{[vaesem_vs $V24, $V5]} # with round key w[16,19] + @{[vaesem_vs $V24, $V6]} # with round key w[20,23] + @{[vaesem_vs $V24, $V7]} # with round key w[24,27] + @{[vaesem_vs $V24, $V8]} # with round key w[28,31] + @{[vaesem_vs $V24, $V9]} # with round key w[32,35] + @{[vaesem_vs $V24, $V10]} # with round key w[36,39] + @{[vaesef_vs $V24, $V11]} # with round key w[40,43] +___ + + return $code; +} + +# aes-128 decryption with round keys v1-v11 +sub aes_128_decrypt { + my $code=3D<<___; + @{[vaesz_vs $V24, $V11]} # with round key w[40,43] + @{[vaesdm_vs $V24, $V10]} # with round key w[36,39] + @{[vaesdm_vs $V24, $V9]} # with round key w[32,35] + @{[vaesdm_vs $V24, $V8]} # with round key w[28,31] + @{[vaesdm_vs $V24, $V7]} # with round key w[24,27] + @{[vaesdm_vs $V24, $V6]} # with round key w[20,23] + @{[vaesdm_vs $V24, $V5]} # with round key w[16,19] + @{[vaesdm_vs $V24, $V4]} # with round key w[12,15] + @{[vaesdm_vs $V24, $V3]} # with round key w[ 8,11] + @{[vaesdm_vs $V24, $V2]} # with round key w[ 4, 7] + @{[vaesdf_vs $V24, $V1]} # with round key w[ 0, 3] +___ + + return $code; +} + +# aes-192 encryption with round keys v1-v13 +sub aes_192_encrypt { + my $code=3D<<___; + @{[vaesz_vs $V24, $V1]} # with round key w[ 0, 3] + @{[vaesem_vs $V24, $V2]} # with round key w[ 4, 7] + @{[vaesem_vs $V24, $V3]} # with round key w[ 8,11] + @{[vaesem_vs $V24, $V4]} # with round key w[12,15] + @{[vaesem_vs $V24, $V5]} # with round key w[16,19] + @{[vaesem_vs $V24, $V6]} # with round key w[20,23] + @{[vaesem_vs $V24, $V7]} # with round key w[24,27] + @{[vaesem_vs $V24, $V8]} # with round key w[28,31] + @{[vaesem_vs $V24, $V9]} # with round key w[32,35] + @{[vaesem_vs $V24, $V10]} # with round key w[36,39] + @{[vaesem_vs $V24, $V11]} # with round key w[40,43] + @{[vaesem_vs $V24, $V12]} # with round key w[44,47] + @{[vaesef_vs $V24, $V13]} # with round key w[48,51] +___ + + return $code; +} + +# aes-192 decryption with round keys v1-v13 +sub aes_192_decrypt { + my $code=3D<<___; + @{[vaesz_vs $V24, $V13]} # with round key w[48,51] + @{[vaesdm_vs $V24, $V12]} # with round key w[44,47] + @{[vaesdm_vs $V24, $V11]} # with round key w[40,43] + @{[vaesdm_vs $V24, $V10]} # with round key w[36,39] + @{[vaesdm_vs $V24, $V9]} # with round key w[32,35] + @{[vaesdm_vs $V24, $V8]} # with round key w[28,31] + @{[vaesdm_vs $V24, $V7]} # with round key w[24,27] + 
@{[vaesdm_vs $V24, $V6]} # with round key w[20,23] + @{[vaesdm_vs $V24, $V5]} # with round key w[16,19] + @{[vaesdm_vs $V24, $V4]} # with round key w[12,15] + @{[vaesdm_vs $V24, $V3]} # with round key w[ 8,11] + @{[vaesdm_vs $V24, $V2]} # with round key w[ 4, 7] + @{[vaesdf_vs $V24, $V1]} # with round key w[ 0, 3] +___ + + return $code; +} + +# aes-256 encryption with round keys v1-v15 +sub aes_256_encrypt { + my $code=3D<<___; + @{[vaesz_vs $V24, $V1]} # with round key w[ 0, 3] + @{[vaesem_vs $V24, $V2]} # with round key w[ 4, 7] + @{[vaesem_vs $V24, $V3]} # with round key w[ 8,11] + @{[vaesem_vs $V24, $V4]} # with round key w[12,15] + @{[vaesem_vs $V24, $V5]} # with round key w[16,19] + @{[vaesem_vs $V24, $V6]} # with round key w[20,23] + @{[vaesem_vs $V24, $V7]} # with round key w[24,27] + @{[vaesem_vs $V24, $V8]} # with round key w[28,31] + @{[vaesem_vs $V24, $V9]} # with round key w[32,35] + @{[vaesem_vs $V24, $V10]} # with round key w[36,39] + @{[vaesem_vs $V24, $V11]} # with round key w[40,43] + @{[vaesem_vs $V24, $V12]} # with round key w[44,47] + @{[vaesem_vs $V24, $V13]} # with round key w[48,51] + @{[vaesem_vs $V24, $V14]} # with round key w[52,55] + @{[vaesef_vs $V24, $V15]} # with round key w[56,59] +___ + + return $code; +} + +# aes-256 decryption with round keys v1-v15 +sub aes_256_decrypt { + my $code=3D<<___; + @{[vaesz_vs $V24, $V15]} # with round key w[56,59] + @{[vaesdm_vs $V24, $V14]} # with round key w[52,55] + @{[vaesdm_vs $V24, $V13]} # with round key w[48,51] + @{[vaesdm_vs $V24, $V12]} # with round key w[44,47] + @{[vaesdm_vs $V24, $V11]} # with round key w[40,43] + @{[vaesdm_vs $V24, $V10]} # with round key w[36,39] + @{[vaesdm_vs $V24, $V9]} # with round key w[32,35] + @{[vaesdm_vs $V24, $V8]} # with round key w[28,31] + @{[vaesdm_vs $V24, $V7]} # with round key w[24,27] + @{[vaesdm_vs $V24, $V6]} # with round key w[20,23] + @{[vaesdm_vs $V24, $V5]} # with round key w[16,19] + @{[vaesdm_vs $V24, $V4]} # with round key w[12,15] + @{[vaesdm_vs $V24, $V3]} # with round key w[ 8,11] + @{[vaesdm_vs $V24, $V2]} # with round key w[ 4, 7] + @{[vaesdf_vs $V24, $V1]} # with round key w[ 0, 3] +___ + + return $code; +} + +{ +##########################################################################= ##### +# void rv64i_zvkned_cbc_encrypt(const unsigned char *in, unsigned char *ou= t, +# size_t length, const AES_KEY *key, +# unsigned char *ivec, const int enc); +my ($INP, $OUTP, $LEN, $KEYP, $IVP, $ENC) =3D ("a0", "a1", "a2", "a3", "a4= ", "a5"); +my ($T0, $T1) =3D ("t0", "t1", "t2"); + +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvkned_cbc_encrypt +.type rv64i_zvkned_cbc_encrypt,\@function +rv64i_zvkned_cbc_encrypt: + # check whether the length is a multiple of 16 and >=3D 16 + li $T1, 16 + blt $LEN, $T1, L_end + andi $T1, $LEN, 15 + bnez $T1, L_end + + # Load key length. + lwu $T0, 480($KEYP) + + # Get proper routine for key length. + li $T1, 16 + beq $T1, $T0, L_cbc_enc_128 + + li $T1, 24 + beq $T1, $T0, L_cbc_enc_192 + + li $T1, 32 + beq $T1, $T0, L_cbc_enc_256 + + ret +.size rv64i_zvkned_cbc_encrypt,.-rv64i_zvkned_cbc_encrypt +___ + +$code .=3D <<___; +.p2align 3 +L_cbc_enc_128: + # Load all 11 round keys to v1-v11 registers. + @{[aes_128_load_key $KEYP]} + + # Load IV. 
+ @{[vle32_v $V16, $IVP]} + + @{[vle32_v $V24, $INP]} + @{[vxor_vv $V24, $V24, $V16]} + j 2f + +1: + @{[vle32_v $V17, $INP]} + @{[vxor_vv $V24, $V24, $V17]} + +2: + # AES body + @{[aes_128_encrypt]} + + @{[vse32_v $V24, $OUTP]} + + addi $INP, $INP, 16 + addi $OUTP, $OUTP, 16 + addi $LEN, $LEN, -16 + + bnez $LEN, 1b + + @{[vse32_v $V24, $IVP]} + + ret +.size L_cbc_enc_128,.-L_cbc_enc_128 +___ + +$code .=3D <<___; +.p2align 3 +L_cbc_enc_192: + # Load all 13 round keys to v1-v13 registers. + @{[aes_192_load_key $KEYP]} + + # Load IV. + @{[vle32_v $V16, $IVP]} + + @{[vle32_v $V24, $INP]} + @{[vxor_vv $V24, $V24, $V16]} + j 2f + +1: + @{[vle32_v $V17, $INP]} + @{[vxor_vv $V24, $V24, $V17]} + +2: + # AES body + @{[aes_192_encrypt]} + + @{[vse32_v $V24, $OUTP]} + + addi $INP, $INP, 16 + addi $OUTP, $OUTP, 16 + addi $LEN, $LEN, -16 + + bnez $LEN, 1b + + @{[vse32_v $V24, $IVP]} + + ret +.size L_cbc_enc_192,.-L_cbc_enc_192 +___ + +$code .=3D <<___; +.p2align 3 +L_cbc_enc_256: + # Load all 15 round keys to v1-v15 registers. + @{[aes_256_load_key $KEYP]} + + # Load IV. + @{[vle32_v $V16, $IVP]} + + @{[vle32_v $V24, $INP]} + @{[vxor_vv $V24, $V24, $V16]} + j 2f + +1: + @{[vle32_v $V17, $INP]} + @{[vxor_vv $V24, $V24, $V17]} + +2: + # AES body + @{[aes_256_encrypt]} + + @{[vse32_v $V24, $OUTP]} + + addi $INP, $INP, 16 + addi $OUTP, $OUTP, 16 + addi $LEN, $LEN, -16 + + bnez $LEN, 1b + + @{[vse32_v $V24, $IVP]} + + ret +.size L_cbc_enc_256,.-L_cbc_enc_256 +___ + +##########################################################################= ##### +# void rv64i_zvkned_cbc_decrypt(const unsigned char *in, unsigned char *ou= t, +# size_t length, const AES_KEY *key, +# unsigned char *ivec, const int enc); +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvkned_cbc_decrypt +.type rv64i_zvkned_cbc_decrypt,\@function +rv64i_zvkned_cbc_decrypt: + # check whether the length is a multiple of 16 and >=3D 16 + li $T1, 16 + blt $LEN, $T1, L_end + andi $T1, $LEN, 15 + bnez $T1, L_end + + # Load key length. + lwu $T0, 480($KEYP) + + # Get proper routine for key length. + li $T1, 16 + beq $T1, $T0, L_cbc_dec_128 + + li $T1, 24 + beq $T1, $T0, L_cbc_dec_192 + + li $T1, 32 + beq $T1, $T0, L_cbc_dec_256 + + ret +.size rv64i_zvkned_cbc_decrypt,.-rv64i_zvkned_cbc_decrypt +___ + +$code .=3D <<___; +.p2align 3 +L_cbc_dec_128: + # Load all 11 round keys to v1-v11 registers. + @{[aes_128_load_key $KEYP]} + + # Load IV. + @{[vle32_v $V16, $IVP]} + + @{[vle32_v $V24, $INP]} + @{[vmv_v_v $V17, $V24]} + j 2f + +1: + @{[vle32_v $V24, $INP]} + @{[vmv_v_v $V17, $V24]} + addi $OUTP, $OUTP, 16 + +2: + # AES body + @{[aes_128_decrypt]} + + @{[vxor_vv $V24, $V24, $V16]} + @{[vse32_v $V24, $OUTP]} + @{[vmv_v_v $V16, $V17]} + + addi $LEN, $LEN, -16 + addi $INP, $INP, 16 + + bnez $LEN, 1b + + @{[vse32_v $V16, $IVP]} + + ret +.size L_cbc_dec_128,.-L_cbc_dec_128 +___ + +$code .=3D <<___; +.p2align 3 +L_cbc_dec_192: + # Load all 13 round keys to v1-v13 registers. + @{[aes_192_load_key $KEYP]} + + # Load IV. + @{[vle32_v $V16, $IVP]} + + @{[vle32_v $V24, $INP]} + @{[vmv_v_v $V17, $V24]} + j 2f + +1: + @{[vle32_v $V24, $INP]} + @{[vmv_v_v $V17, $V24]} + addi $OUTP, $OUTP, 16 + +2: + # AES body + @{[aes_192_decrypt]} + + @{[vxor_vv $V24, $V24, $V16]} + @{[vse32_v $V24, $OUTP]} + @{[vmv_v_v $V16, $V17]} + + addi $LEN, $LEN, -16 + addi $INP, $INP, 16 + + bnez $LEN, 1b + + @{[vse32_v $V16, $IVP]} + + ret +.size L_cbc_dec_192,.-L_cbc_dec_192 +___ + +$code .=3D <<___; +.p2align 3 +L_cbc_dec_256: + # Load all 15 round keys to v1-v15 registers. 
+ @{[aes_256_load_key $KEYP]} + + # Load IV. + @{[vle32_v $V16, $IVP]} + + @{[vle32_v $V24, $INP]} + @{[vmv_v_v $V17, $V24]} + j 2f + +1: + @{[vle32_v $V24, $INP]} + @{[vmv_v_v $V17, $V24]} + addi $OUTP, $OUTP, 16 + +2: + # AES body + @{[aes_256_decrypt]} + + @{[vxor_vv $V24, $V24, $V16]} + @{[vse32_v $V24, $OUTP]} + @{[vmv_v_v $V16, $V17]} + + addi $LEN, $LEN, -16 + addi $INP, $INP, 16 + + bnez $LEN, 1b + + @{[vse32_v $V16, $IVP]} + + ret +.size L_cbc_dec_256,.-L_cbc_dec_256 +___ +} + +{ +##########################################################################= ##### +# void rv64i_zvkned_ecb_encrypt(const unsigned char *in, unsigned char *ou= t, +# size_t length, const AES_KEY *key, +# const int enc); +my ($INP, $OUTP, $LEN, $KEYP, $ENC) =3D ("a0", "a1", "a2", "a3", "a4"); +my ($VL) =3D ("a5"); +my ($LEN32) =3D ("a6"); +my ($T0, $T1) =3D ("t0", "t1"); + +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvkned_ecb_encrypt +.type rv64i_zvkned_ecb_encrypt,\@function +rv64i_zvkned_ecb_encrypt: + # Make the LEN become e32 length. + srli $LEN32, $LEN, 2 + + # Load key length. + lwu $T0, 480($KEYP) + + # Get proper routine for key length. + li $T1, 16 + beq $T1, $T0, L_ecb_enc_128 + + li $T1, 24 + beq $T1, $T0, L_ecb_enc_192 + + li $T1, 32 + beq $T1, $T0, L_ecb_enc_256 + + ret +.size rv64i_zvkned_ecb_encrypt,.-rv64i_zvkned_ecb_encrypt +___ + +$code .=3D <<___; +.p2align 3 +L_ecb_enc_128: + # Load all 11 round keys to v1-v11 registers. + @{[aes_128_load_key $KEYP]} + +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + + @{[vle32_v $V24, $INP]} + + # AES body + @{[aes_128_encrypt]} + + @{[vse32_v $V24, $OUTP]} + + add $INP, $INP, $T0 + add $OUTP, $OUTP, $T0 + + bnez $LEN32, 1b + + ret +.size L_ecb_enc_128,.-L_ecb_enc_128 +___ + +$code .=3D <<___; +.p2align 3 +L_ecb_enc_192: + # Load all 13 round keys to v1-v13 registers. + @{[aes_192_load_key $KEYP]} + +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + + @{[vle32_v $V24, $INP]} + + # AES body + @{[aes_192_encrypt]} + + @{[vse32_v $V24, $OUTP]} + + add $INP, $INP, $T0 + add $OUTP, $OUTP, $T0 + + bnez $LEN32, 1b + + ret +.size L_ecb_enc_192,.-L_ecb_enc_192 +___ + +$code .=3D <<___; +.p2align 3 +L_ecb_enc_256: + # Load all 15 round keys to v1-v15 registers. + @{[aes_256_load_key $KEYP]} + +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + + @{[vle32_v $V24, $INP]} + + # AES body + @{[aes_256_encrypt]} + + @{[vse32_v $V24, $OUTP]} + + add $INP, $INP, $T0 + add $OUTP, $OUTP, $T0 + + bnez $LEN32, 1b + + ret +.size L_ecb_enc_256,.-L_ecb_enc_256 +___ + +##########################################################################= ##### +# void rv64i_zvkned_ecb_decrypt(const unsigned char *in, unsigned char *ou= t, +# size_t length, const AES_KEY *key, +# const int enc); +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvkned_ecb_decrypt +.type rv64i_zvkned_ecb_decrypt,\@function +rv64i_zvkned_ecb_decrypt: + # Make the LEN become e32 length. + srli $LEN32, $LEN, 2 + + # Load key length. + lwu $T0, 480($KEYP) + + # Get proper routine for key length. + li $T1, 16 + beq $T1, $T0, L_ecb_dec_128 + + li $T1, 24 + beq $T1, $T0, L_ecb_dec_192 + + li $T1, 32 + beq $T1, $T0, L_ecb_dec_256 + + ret +.size rv64i_zvkned_ecb_decrypt,.-rv64i_zvkned_ecb_decrypt +___ + +$code .=3D <<___; +.p2align 3 +L_ecb_dec_128: + # Load all 11 round keys to v1-v11 registers. 
+ @{[aes_128_load_key $KEYP]} + +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + + @{[vle32_v $V24, $INP]} + + # AES body + @{[aes_128_decrypt]} + + @{[vse32_v $V24, $OUTP]} + + add $INP, $INP, $T0 + add $OUTP, $OUTP, $T0 + + bnez $LEN32, 1b + + ret +.size L_ecb_dec_128,.-L_ecb_dec_128 +___ + +$code .=3D <<___; +.p2align 3 +L_ecb_dec_192: + # Load all 13 round keys to v1-v13 registers. + @{[aes_192_load_key $KEYP]} + +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + + @{[vle32_v $V24, $INP]} + + # AES body + @{[aes_192_decrypt]} + + @{[vse32_v $V24, $OUTP]} + + add $INP, $INP, $T0 + add $OUTP, $OUTP, $T0 + + bnez $LEN32, 1b + + ret +.size L_ecb_dec_192,.-L_ecb_dec_192 +___ + +$code .=3D <<___; +.p2align 3 +L_ecb_dec_256: + # Load all 15 round keys to v1-v15 registers. + @{[aes_256_load_key $KEYP]} + +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + + @{[vle32_v $V24, $INP]} + + # AES body + @{[aes_256_decrypt]} + + @{[vse32_v $V24, $OUTP]} + + add $INP, $INP, $T0 + add $OUTP, $OUTP, $T0 + + bnez $LEN32, 1b + + ret +.size L_ecb_dec_256,.-L_ecb_dec_256 +___ +} + { ##########################################################################= ###### # int rv64i_zvkned_set_encrypt_key(const unsigned char *userKey, const int= bytes, --=20 2.28.0 From nobody Wed Dec 17 12:18:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70923C4167B for ; Mon, 27 Nov 2023 07:08:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231849AbjK0HIC (ORCPT ); Mon, 27 Nov 2023 02:08:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52534 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230063AbjK0HHf (ORCPT ); Mon, 27 Nov 2023 02:07:35 -0500 Received: from mail-pl1-x62b.google.com (mail-pl1-x62b.google.com [IPv6:2607:f8b0:4864:20::62b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4FAD7D53 for ; Sun, 26 Nov 2023 23:07:38 -0800 (PST) Received: by mail-pl1-x62b.google.com with SMTP id d9443c01a7336-1cc9b626a96so26864465ad.2 for ; Sun, 26 Nov 2023 23:07:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1701068857; x=1701673657; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=7M9E14E6fLJh7AJb55zP++mgBfFiPcFBw2pNzr7Sm0I=; b=e+8bCO8jEFA6gYSaBocJdXaAeQD99UnoxHcv8K1CJJzl4l8KES2IllcSMwwIOO1umW EiymFzKvalJGKiyzR9H2BK9/QQ/x7lsmhS9HxwucnActR3WNil1tLWJXiDMSMeVOE/ri imcJuKWd87odmk5VR3EaPcRwEaYRlOhPuY7dlOgdf4nPz4VtrexEVg0dIyQxPDepdOHR P3t/wIqWCIySUrMnK+cxuUY6nLmO1zlnf7YVAHONSb49hiGdniJdCyA1z9yP3KEPlhOc PRqVnUdfqirluL2z7EWP/PNCfH30driZ1cNuUIGG+75Mo7df65ma1dvlrpqiBQP142ZP yaTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701068857; x=1701673657; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7M9E14E6fLJh7AJb55zP++mgBfFiPcFBw2pNzr7Sm0I=; b=vq06CxBgzHJ65N/q6DWycZYdtNX/iclQKBJmiIXcr6BSyvmm2g6r6jLz6tah4ndmjs 
IR6BfPCK/FbGfFVNqoH5JUZrxgAfvJZNcdicUvZUYyzxASSyaLK0TrrycQ3dMS58DuEZ 3m4kD/05vAE7IpS99stxm5ZS2taqnD2eYcnZfGbuxjnYLLkYIk+XNix/waBfB6XHoAao tCHS3YE6JqZDKREC2SqVOWMwaYZI76LtPfYYpr5ZSM/b0RgUeQOU9NfRVJ7YDwhkK8im sUK/5bVpPeKaMp6eizMT9KaBqkoNjT4Hqr7FptcwxjiK9kmegtEYTSf9EBG6b4khvH0j Jekg== X-Gm-Message-State: AOJu0YxaMElgYjBspW7YTHUaGVPuXMXU/Q/c0ZOrr0KRbEwYMLLAqm7r orgIlJtMNV1OORl7L9s0c0pktQ== X-Google-Smtp-Source: AGHT+IHVda7BxvkREXau4SuF+gpfjUKcDNap6z/qKUI6eGy91unUWy63cKgknIxJ/anhs1Q3NGprvg== X-Received: by 2002:a17:902:b488:b0:1cc:5aef:f2c1 with SMTP id y8-20020a170902b48800b001cc5aeff2c1mr9325175plr.33.1701068857649; Sun, 26 Nov 2023 23:07:37 -0800 (PST) Received: from localhost.localdomain ([101.10.45.230]) by smtp.gmail.com with ESMTPSA id jh15-20020a170903328f00b001cfcd3a764esm1340134plb.77.2023.11.26.23.07.34 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 26 Nov 2023 23:07:37 -0800 (PST) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH v2 08/13] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation Date: Mon, 27 Nov 2023 15:06:58 +0800 Message-Id: <20231127070703.1697-9-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231127070703.1697-1-jerry.shih@sifive.com> References: <20231127070703.1697-1-jerry.shih@sifive.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add a gcm hash implementation using the Zvkg extension from OpenSSL (openssl/openssl#21923). The perlasm here is different from the original implementation in OpenSSL. The OpenSSL assumes that the H is stored in little-endian. Thus, it needs to convert the H to big-endian for Zvkg instructions. In kernel, we have the big-endian H directly. There is no need for endian conversion. Co-developed-by: Christoph M=C3=BCllner Signed-off-by: Christoph M=C3=BCllner Co-developed-by: Heiko Stuebner Signed-off-by: Heiko Stuebner Signed-off-by: Jerry Shih --- Changelog v2: - Do not turn on kconfig `GHASH_RISCV64` option by default. - Add `asmlinkage` qualifier for crypto asm function. - Update the ghash fallback path in ghash_blocks(). - Rename structure riscv64_ghash_context to riscv64_ghash_tfm_ctx. - Fold ghash_update_zvkg() and ghash_final_zvkg(). - Reorder structure riscv64_ghash_alg_zvkg members initialization in the order declared. 
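For reference, the non-vector fallback taken in ghash_blocks() when
crypto_simd_usable() is false is the classic per-block GHASH step. A
condensed, illustrative sketch follows (the helper name is made up;
crypto_xor(), gf128mul_lle() and the be128 key layout are the ones this
patch relies on):

#include <crypto/algapi.h>	/* crypto_xor() */
#include <crypto/gf128mul.h>	/* be128, gf128mul_lle() */
#include <crypto/ghash.h>	/* GHASH_BLOCK_SIZE */

/* One GHASH iteration: X = (X ^ block) * H in GF(2^128). */
static void ghash_one_block_sw(be128 *x, const be128 *h,
			       const u8 block[GHASH_BLOCK_SIZE])
{
	crypto_xor((u8 *)x, block, GHASH_BLOCK_SIZE);
	gf128mul_lle(x, h);
}

Because the key stays in be128 form end to end, the same H feeds both
gf128mul_lle() here and gcm_ghash_rv64i_zvkg() in the vector path without
any byte-swapping, which is exactly the difference from the OpenSSL
version described above.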
--- arch/riscv/crypto/Kconfig | 10 ++ arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/ghash-riscv64-glue.c | 175 ++++++++++++++++++++++++ arch/riscv/crypto/ghash-riscv64-zvkg.pl | 100 ++++++++++++++ 4 files changed, 292 insertions(+) create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 9d991ddda289..6863f01a2ab0 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -34,4 +34,14 @@ config CRYPTO_AES_BLOCK_RISCV64 - Zvkb vector crypto extension (CTR/XTS) - Zvkg vector crypto extension (XTS) =20 +config CRYPTO_GHASH_RISCV64 + tristate "Hash functions: GHASH" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_GCM + help + GCM GHASH function (NIST SP 800-38D) + + Architecture: riscv64 using: + - Zvkg vector crypto extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 9574b009762f..94a7f8eaa8a7 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -9,6 +9,9 @@ aes-riscv64-y :=3D aes-riscv64-glue.o aes-riscv64-zvkned.o obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) +=3D aes-block-riscv64.o aes-block-riscv64-y :=3D aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-= zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o =20 +obj-$(CONFIG_CRYPTO_GHASH_RISCV64) +=3D ghash-riscv64.o +ghash-riscv64-y :=3D ghash-riscv64-glue.o ghash-riscv64-zvkg.o + quiet_cmd_perlasm =3D PERLASM $@ cmd_perlasm =3D $(PERL) $(<) void $(@) =20 @@ -21,6 +24,10 @@ $(obj)/aes-riscv64-zvkned-zvbb-zvkg.S: $(src)/aes-riscv6= 4-zvkned-zvbb-zvkg.pl $(obj)/aes-riscv64-zvkned-zvkb.S: $(src)/aes-riscv64-zvkned-zvkb.pl $(call cmd,perlasm) =20 +$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl + $(call cmd,perlasm) + clean-files +=3D aes-riscv64-zvkned.S clean-files +=3D aes-riscv64-zvkned-zvbb-zvkg.S clean-files +=3D aes-riscv64-zvkned-zvkb.S +clean-files +=3D ghash-riscv64-zvkg.S diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/gha= sh-riscv64-glue.c new file mode 100644 index 000000000000..b01ab5714677 --- /dev/null +++ b/arch/riscv/crypto/ghash-riscv64-glue.c @@ -0,0 +1,175 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * RISC-V optimized GHASH routines + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + * + * Copyright (C) 2023 SiFive, Inc. + * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* ghash using zvkg vector crypto extension */ +asmlinkage void gcm_ghash_rv64i_zvkg(be128 *Xi, const be128 *H, const u8 *= inp, + size_t len); + +struct riscv64_ghash_tfm_ctx { + be128 key; +}; + +struct riscv64_ghash_desc_ctx { + be128 shash; + u8 buffer[GHASH_BLOCK_SIZE]; + u32 bytes; +}; + +static inline void ghash_blocks(const struct riscv64_ghash_tfm_ctx *tctx, + struct riscv64_ghash_desc_ctx *dctx, + const u8 *src, size_t srclen) +{ + /* The srclen is nonzero and a multiple of 16. 
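+	 * ghash_update_zvkg() buffers any partial block in dctx->buffer and
+	 * only passes whole 16-byte blocks down here; ghash_final_zvkg()
+	 * zero-pads the final partial block before calling in.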
*/ + if (crypto_simd_usable()) { + kernel_vector_begin(); + gcm_ghash_rv64i_zvkg(&dctx->shash, &tctx->key, src, srclen); + kernel_vector_end(); + } else { + do { + crypto_xor((u8 *)&dctx->shash, src, GHASH_BLOCK_SIZE); + gf128mul_lle(&dctx->shash, &tctx->key); + srclen -=3D GHASH_BLOCK_SIZE; + src +=3D GHASH_BLOCK_SIZE; + } while (srclen); + } +} + +static int ghash_init(struct shash_desc *desc) +{ + struct riscv64_ghash_desc_ctx *dctx =3D shash_desc_ctx(desc); + + *dctx =3D (struct riscv64_ghash_desc_ctx){}; + + return 0; +} + +static int ghash_update_zvkg(struct shash_desc *desc, const u8 *src, + unsigned int srclen) +{ + size_t len; + const struct riscv64_ghash_tfm_ctx *tctx =3D crypto_shash_ctx(desc->tfm); + struct riscv64_ghash_desc_ctx *dctx =3D shash_desc_ctx(desc); + + if (dctx->bytes) { + if (dctx->bytes + srclen < GHASH_BLOCK_SIZE) { + memcpy(dctx->buffer + dctx->bytes, src, srclen); + dctx->bytes +=3D srclen; + return 0; + } + memcpy(dctx->buffer + dctx->bytes, src, + GHASH_BLOCK_SIZE - dctx->bytes); + + ghash_blocks(tctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE); + + src +=3D GHASH_BLOCK_SIZE - dctx->bytes; + srclen -=3D GHASH_BLOCK_SIZE - dctx->bytes; + dctx->bytes =3D 0; + } + len =3D srclen & ~(GHASH_BLOCK_SIZE - 1); + + if (len) { + ghash_blocks(tctx, dctx, src, len); + src +=3D len; + srclen -=3D len; + } + + if (srclen) { + memcpy(dctx->buffer, src, srclen); + dctx->bytes =3D srclen; + } + + return 0; +} + +static int ghash_final_zvkg(struct shash_desc *desc, u8 *out) +{ + const struct riscv64_ghash_tfm_ctx *tctx =3D crypto_shash_ctx(desc->tfm); + struct riscv64_ghash_desc_ctx *dctx =3D shash_desc_ctx(desc); + int i; + + if (dctx->bytes) { + for (i =3D dctx->bytes; i < GHASH_BLOCK_SIZE; i++) + dctx->buffer[i] =3D 0; + + ghash_blocks(tctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE); + } + + memcpy(out, &dctx->shash, GHASH_DIGEST_SIZE); + + return 0; +} + +static int ghash_setkey(struct crypto_shash *tfm, const u8 *key, + unsigned int keylen) +{ + struct riscv64_ghash_tfm_ctx *tctx =3D crypto_shash_ctx(tfm); + + if (keylen !=3D GHASH_BLOCK_SIZE) + return -EINVAL; + + memcpy(&tctx->key, key, GHASH_BLOCK_SIZE); + + return 0; +} + +static struct shash_alg riscv64_ghash_alg_zvkg =3D { + .init =3D ghash_init, + .update =3D ghash_update_zvkg, + .final =3D ghash_final_zvkg, + .setkey =3D ghash_setkey, + .descsize =3D sizeof(struct riscv64_ghash_desc_ctx), + .digestsize =3D GHASH_DIGEST_SIZE, + .base =3D { + .cra_blocksize =3D GHASH_BLOCK_SIZE, + .cra_ctxsize =3D sizeof(struct riscv64_ghash_tfm_ctx), + .cra_priority =3D 303, + .cra_name =3D "ghash", + .cra_driver_name =3D "ghash-riscv64-zvkg", + .cra_module =3D THIS_MODULE, + }, +}; + +static inline bool check_ghash_ext(void) +{ + return riscv_isa_extension_available(NULL, ZVKG) && + riscv_vector_vlen() >=3D 128; +} + +static int __init riscv64_ghash_mod_init(void) +{ + if (check_ghash_ext()) + return crypto_register_shash(&riscv64_ghash_alg_zvkg); + + return -ENODEV; +} + +static void __exit riscv64_ghash_mod_fini(void) +{ + crypto_unregister_shash(&riscv64_ghash_alg_zvkg); +} + +module_init(riscv64_ghash_mod_init); +module_exit(riscv64_ghash_mod_fini); + +MODULE_DESCRIPTION("GCM GHASH (RISC-V accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("ghash"); diff --git a/arch/riscv/crypto/ghash-riscv64-zvkg.pl b/arch/riscv/crypto/gh= ash-riscv64-zvkg.pl new file mode 100644 index 000000000000..4beea4ac9cbe --- /dev/null +++ b/arch/riscv/crypto/ghash-riscv64-zvkg.pl @@ -0,0 +1,100 @@ +#! 
/usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph M=C3=BCllner +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# - RV64I +# - RISC-V Vector ('V') with VLEN >=3D 128 +# - RISC-V Vector GCM/GMAC extension ('Zvkg') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extensio= n) +# $flavour is the first argument if it doesn't look like a file +my $output =3D $#ARGV >=3D 0 && $ARGV[$#ARGV] =3D~ m|\.\w+$| ? pop : undef; +my $flavour =3D $#ARGV >=3D 0 && $ARGV[0] !~ m|\.| ? 
shift : undef; + +$output and open STDOUT,">$output"; + +my $code=3D<<___; +.text +___ + +##########################################################################= ##### +# void gcm_ghash_rv64i_zvkg(be128 *Xi, const be128 *H, const u8 *inp, size= _t len) +# +# input: Xi: current hash value +# H: hash key +# inp: pointer to input data +# len: length of input data in bytes (multiple of block size) +# output: Xi: Xi+1 (next hash value Xi) +{ +my ($Xi,$H,$inp,$len) =3D ("a0","a1","a2","a3"); +my ($vXi,$vH,$vinp,$Vzero) =3D ("v1","v2","v3","v4"); + +$code .=3D <<___; +.p2align 3 +.globl gcm_ghash_rv64i_zvkg +.type gcm_ghash_rv64i_zvkg,\@function +gcm_ghash_rv64i_zvkg: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $vH, $H]} + @{[vle32_v $vXi, $Xi]} + +Lstep: + @{[vle32_v $vinp, $inp]} + add $inp, $inp, 16 + add $len, $len, -16 + @{[vghsh_vv $vXi, $vH, $vinp]} + bnez $len, Lstep + + @{[vse32_v $vXi, $Xi]} + ret + +.size gcm_ghash_rv64i_zvkg,.-gcm_ghash_rv64i_zvkg +___ +} + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; --=20 2.28.0 From nobody Wed Dec 17 12:18:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BB0DDC0755A for ; Mon, 27 Nov 2023 07:08:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231851AbjK0HIJ (ORCPT ); Mon, 27 Nov 2023 02:08:09 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39976 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232520AbjK0HHv (ORCPT ); Mon, 27 Nov 2023 02:07:51 -0500 Received: from mail-pl1-x634.google.com (mail-pl1-x634.google.com [IPv6:2607:f8b0:4864:20::634]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 44823D5F for ; Sun, 26 Nov 2023 23:07:41 -0800 (PST) Received: by mail-pl1-x634.google.com with SMTP id d9443c01a7336-1cf8b35a6dbso26423515ad.0 for ; Sun, 26 Nov 2023 23:07:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1701068860; x=1701673660; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=H2ie9UBY5BJCsnebwLbVxGGtRlv0a5BfHvQORsnSkzU=; b=Yuxib0SSWBFUIRPoLM6pvXjk5Qh5530zPyuq9U5NXsPIafnX7E2/3TZVDN6E4jSrd0 cJJ9J7/3cq39yIeK9eEjyTELWUBHASi2Pn7EfPMV5kIxyI4qckdKlygCAZwCycIiwOjC Gjv4hgRgUYHOYg1BEHm4XfESdtq/F0pNlworYWtnLqSO+XJ8i21H5xQ3lba92D2pGQwS LtTU+kNFbW7TySBvLJ7kPJBXFTvJ/3xzIWDPzpOgZujRSPfauXufbbhu9jtXtFY7h/DF 5rXaQtLxZK6TH3rSWmgSGgisAxYXadFo/0xFp/l+D6WeZ3/fJT7MHBAZl0n4tGisgYQa Ym/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701068860; x=1701673660; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=H2ie9UBY5BJCsnebwLbVxGGtRlv0a5BfHvQORsnSkzU=; b=JxgbFvbiPY+iN/3bnioePLWcPtZIrMzOBKKuTJWlN9uQIe2umHJ3ful+sZQ40Z3CB4 4Y1jciy6znOc08fqDWI4GDE8elcbtvkqHaU94kfFWGWF9jESGt7nimIh7WLbQyLOWZ4s dD1g4+doT8kLTIEOhOOIToH74vloOK+M3cvKwzgwt6bplJWos+oGdSed/tRMSwvLA5ar zRlbqRAwQMQPJGyFxnHnMhwREg7erU0L6/mYW/uw/YsKhe+DZ3f66wPwSy5sNk62ASNw OmMHWPOfznLZxQj5FqYtEbo+rae2hInhWB9qXoIJkcvLi7aJAOSSRRMRqJObO5tNSxMA 9uiw== X-Gm-Message-State: AOJu0YzbkN/NJoVLFETkV/jwI+M2askp7d4uFc4/Wl7jBShl5mqF1eVA nuGeCblbStd88kjEQbMf0GxGlg== 
X-Google-Smtp-Source: AGHT+IE1JBBVZSJGyCAbSJ13cr1H5pF+3UXdDbWfyBIJioNv3kdxLCkMNdaACGujT6Ku8o5fRzSGSQ== X-Received: by 2002:a17:902:d38d:b0:1cf:a4e8:d2a1 with SMTP id e13-20020a170902d38d00b001cfa4e8d2a1mr8132283pld.42.1701068860527; Sun, 26 Nov 2023 23:07:40 -0800 (PST) Received: from localhost.localdomain ([101.10.45.230]) by smtp.gmail.com with ESMTPSA id jh15-20020a170903328f00b001cfcd3a764esm1340134plb.77.2023.11.26.23.07.37 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 26 Nov 2023 23:07:40 -0800 (PST) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH v2 09/13] RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations Date: Mon, 27 Nov 2023 15:06:59 +0800 Message-Id: <20231127070703.1697-10-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231127070703.1697-1-jerry.shih@sifive.com> References: <20231127070703.1697-1-jerry.shih@sifive.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add SHA224 and 256 implementations using Zvknha or Zvknhb vector crypto extensions from OpenSSL(openssl/openssl#21923). Co-developed-by: Charalampos Mitrodimas Signed-off-by: Charalampos Mitrodimas Co-developed-by: Heiko Stuebner Signed-off-by: Heiko Stuebner Co-developed-by: Phoebe Chen Signed-off-by: Phoebe Chen Signed-off-by: Jerry Shih --- Changelog v2: - Do not turn on kconfig `SHA256_RISCV64` option by default. - Add `asmlinkage` qualifier for crypto asm function. - Rename sha256-riscv64-zvkb-zvknha_or_zvknhb to sha256-riscv64-zvknha_or_zvknhb-zvkb. - Reorder structure sha256_algs members initialization in the order declared. 
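For context, an illustrative caller reaching this driver through the
generic shash API is sketched below. The "sha256" algorithm name, the
driver name and the priority of 150 (preferred over the generic C
implementation) come from this patch; the wrapper function itself is
hypothetical:

#include <linux/err.h>
#include <crypto/hash.h>
#include <crypto/sha2.h>

static int sha256_digest_example(const u8 *data, unsigned int len,
				 u8 out[SHA256_DIGEST_SIZE])
{
	/* Resolves to sha256-riscv64-zvknha_or_zvknhb-zvkb when available. */
	struct crypto_shash *tfm = crypto_alloc_shash("sha256", 0, 0);
	int err;

	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_shash_tfm_digest(tfm, data, len, out);
	crypto_free_shash(tfm);

	return err;
}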
--- arch/riscv/crypto/Kconfig | 11 + arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/sha256-riscv64-glue.c | 145 ++++++++ .../sha256-riscv64-zvknha_or_zvknhb-zvkb.pl | 318 ++++++++++++++++++ 4 files changed, 481 insertions(+) create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c create mode 100644 arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.= pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 6863f01a2ab0..d31af9190717 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -44,4 +44,15 @@ config CRYPTO_GHASH_RISCV64 Architecture: riscv64 using: - Zvkg vector crypto extension =20 +config CRYPTO_SHA256_RISCV64 + tristate "Hash functions: SHA-224 and SHA-256" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_SHA256 + help + SHA-224 and SHA-256 secure hash algorithm (FIPS 180) + + Architecture: riscv64 using: + - Zvknha or Zvknhb vector crypto extensions + - Zvkb vector crypto extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 94a7f8eaa8a7..e9d7717ec943 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -12,6 +12,9 @@ aes-block-riscv64-y :=3D aes-riscv64-block-mode-glue.o ae= s-riscv64-zvkned-zvbb-zvk obj-$(CONFIG_CRYPTO_GHASH_RISCV64) +=3D ghash-riscv64.o ghash-riscv64-y :=3D ghash-riscv64-glue.o ghash-riscv64-zvkg.o =20 +obj-$(CONFIG_CRYPTO_SHA256_RISCV64) +=3D sha256-riscv64.o +sha256-riscv64-y :=3D sha256-riscv64-glue.o sha256-riscv64-zvknha_or_zvknh= b-zvkb.o + quiet_cmd_perlasm =3D PERLASM $@ cmd_perlasm =3D $(PERL) $(<) void $(@) =20 @@ -27,7 +30,11 @@ $(obj)/aes-riscv64-zvkned-zvkb.S: $(src)/aes-riscv64-zvk= ned-zvkb.pl $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl $(call cmd,perlasm) =20 +$(obj)/sha256-riscv64-zvknha_or_zvknhb-zvkb.S: $(src)/sha256-riscv64-zvknh= a_or_zvknhb-zvkb.pl + $(call cmd,perlasm) + clean-files +=3D aes-riscv64-zvkned.S clean-files +=3D aes-riscv64-zvkned-zvbb-zvkg.S clean-files +=3D aes-riscv64-zvkned-zvkb.S clean-files +=3D ghash-riscv64-zvkg.S +clean-files +=3D sha256-riscv64-zvknha_or_zvknhb-zvkb.S diff --git a/arch/riscv/crypto/sha256-riscv64-glue.c b/arch/riscv/crypto/sh= a256-riscv64-glue.c new file mode 100644 index 000000000000..760d89031d1c --- /dev/null +++ b/arch/riscv/crypto/sha256-riscv64-glue.c @@ -0,0 +1,145 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Linux/riscv64 port of the OpenSSL SHA256 implementation for RISC-V 64 + * + * Copyright (C) 2022 VRULL GmbH + * Author: Heiko Stuebner + * + * Copyright (C) 2023 SiFive, Inc. + * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * sha256 using zvkb and zvknha/b vector crypto extension + * + * This asm function will just take the first 256-bit as the sha256 state = from + * the pointer to `struct sha256_state`. + */ +asmlinkage void +sha256_block_data_order_zvkb_zvknha_or_zvknhb(struct sha256_state *digest, + const u8 *data, int num_blks); + +static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data, + unsigned int len) +{ + int ret =3D 0; + + /* + * Make sure struct sha256_state begins directly with the SHA256 + * 256-bit internal state, as this is what the asm function expect. 
+ */ + BUILD_BUG_ON(offsetof(struct sha256_state, state) !=3D 0); + + if (crypto_simd_usable()) { + kernel_vector_begin(); + ret =3D sha256_base_do_update( + desc, data, len, + sha256_block_data_order_zvkb_zvknha_or_zvknhb); + kernel_vector_end(); + } else { + ret =3D crypto_sha256_update(desc, data, len); + } + + return ret; +} + +static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data, + unsigned int len, u8 *out) +{ + if (crypto_simd_usable()) { + kernel_vector_begin(); + if (len) + sha256_base_do_update( + desc, data, len, + sha256_block_data_order_zvkb_zvknha_or_zvknhb); + sha256_base_do_finalize( + desc, sha256_block_data_order_zvkb_zvknha_or_zvknhb); + kernel_vector_end(); + + return sha256_base_finish(desc, out); + } + + return crypto_sha256_finup(desc, data, len, out); +} + +static int riscv64_sha256_final(struct shash_desc *desc, u8 *out) +{ + return riscv64_sha256_finup(desc, NULL, 0, out); +} + +static struct shash_alg sha256_algs[] =3D { + { + .init =3D sha256_base_init, + .update =3D riscv64_sha256_update, + .final =3D riscv64_sha256_final, + .finup =3D riscv64_sha256_finup, + .descsize =3D sizeof(struct sha256_state), + .digestsize =3D SHA256_DIGEST_SIZE, + .base =3D { + .cra_blocksize =3D SHA256_BLOCK_SIZE, + .cra_priority =3D 150, + .cra_name =3D "sha256", + .cra_driver_name =3D "sha256-riscv64-zvknha_or_zvknhb-zvkb", + .cra_module =3D THIS_MODULE, + }, + }, { + .init =3D sha224_base_init, + .update =3D riscv64_sha256_update, + .final =3D riscv64_sha256_final, + .finup =3D riscv64_sha256_finup, + .descsize =3D sizeof(struct sha256_state), + .digestsize =3D SHA224_DIGEST_SIZE, + .base =3D { + .cra_blocksize =3D SHA224_BLOCK_SIZE, + .cra_priority =3D 150, + .cra_name =3D "sha224", + .cra_driver_name =3D "sha224-riscv64-zvknha_or_zvknhb-zvkb", + .cra_module =3D THIS_MODULE, + }, + }, +}; + +static inline bool check_sha256_ext(void) +{ + /* + * From the spec: + * The Zvknhb ext supports both SHA-256 and SHA-512 and Zvknha only + * supports SHA-256. + */ + return (riscv_isa_extension_available(NULL, ZVKNHA) || + riscv_isa_extension_available(NULL, ZVKNHB)) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >=3D 128; +} + +static int __init riscv64_sha256_mod_init(void) +{ + if (check_sha256_ext()) + return crypto_register_shashes(sha256_algs, + ARRAY_SIZE(sha256_algs)); + + return -ENODEV; +} + +static void __exit riscv64_sha256_mod_fini(void) +{ + crypto_unregister_shashes(sha256_algs, ARRAY_SIZE(sha256_algs)); +} + +module_init(riscv64_sha256_mod_init); +module_exit(riscv64_sha256_mod_fini); + +MODULE_DESCRIPTION("SHA-256 (RISC-V accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("sha224"); +MODULE_ALIAS_CRYPTO("sha256"); diff --git a/arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.pl b/ar= ch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.pl new file mode 100644 index 000000000000..51b2d9d4f8f1 --- /dev/null +++ b/arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.pl @@ -0,0 +1,318 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). 
You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph M=C3=BCllner +# Copyright (c) 2023, Phoebe Chen +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# The generated code of this file depends on the following RISC-V extensio= ns: +# - RV64I +# - RISC-V Vector ('V') with VLEN >=3D 128 +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') +# - RISC-V Vector SHA-2 Secure Hash extension ('Zvknha' or 'Zvknhb') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extensio= n) +# $flavour is the first argument if it doesn't look like a file +my $output =3D $#ARGV >=3D 0 && $ARGV[$#ARGV] =3D~ m|\.\w+$| ? pop : undef; +my $flavour =3D $#ARGV >=3D 0 && $ARGV[0] !~ m|\.| ? 
shift : undef; + +$output and open STDOUT,">$output"; + +my $code=3D<<___; +.text +___ + +my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, + $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15, + $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23, + $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) =3D map("v$_",(0..31)); + +my $K256 =3D "K256"; + +# Function arguments +my ($H, $INP, $LEN, $KT, $H2, $INDEX_PATTERN) =3D ("a0", "a1", "a2", "a3",= "t3", "t4"); + +sub sha_256_load_constant { + my $code=3D<<___; + la $KT, $K256 # Load round constants K256 + @{[vle32_v $V10, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V11, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V12, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V13, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V14, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V16, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V17, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V18, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V19, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V20, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V21, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V22, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V23, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V24, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V25, $KT]} +___ + + return $code; +} + +##########################################################################= ###### +# void sha256_block_data_order_zvkb_zvknha_or_zvknhb(void *c, const void *= p, size_t len) +$code .=3D <<___; +.p2align 2 +.globl sha256_block_data_order_zvkb_zvknha_or_zvknhb +.type sha256_block_data_order_zvkb_zvknha_or_zvknhb,\@function +sha256_block_data_order_zvkb_zvknha_or_zvknhb: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[sha_256_load_constant]} + + # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c} + # The dst vtype is e32m1 and the index vtype is e8mf4. + # We use index-load with the following index pattern at v26. + # i8 index: + # 20, 16, 4, 0 + # Instead of setting the i8 index, we could use a single 32bit + # little-endian value to cover the 4xi8 index. + # i32 value: + # 0x 00 04 10 14 + li $INDEX_PATTERN, 0x00041014 + @{[vsetivli "zero", 1, "e32", "m1", "ta", "ma"]} + @{[vmv_v_x $V26, $INDEX_PATTERN]} + + addi $H2, $H, 8 + + # Use index-load to get {f,e,b,a},{h,g,d,c} + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vluxei8_v $V6, $H, $V26]} + @{[vluxei8_v $V7, $H2, $V26]} + + # Setup v0 mask for the vmerge to replace the first word (idx=3D=3D0) = in key-scheduling. + # The AVL is 4 in SHA, so we could use a single e8(8 element masking) = for masking. + @{[vsetivli "zero", 1, "e8", "m1", "ta", "ma"]} + @{[vmv_v_i $V0, 0x01]} + + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + +L_round_loop: + # Decrement length by 1 + add $LEN, $LEN, -1 + + # Keep the current state as we need it later: H' =3D H+{a',b',c',...,h= '}. + @{[vmv_v_v $V30, $V6]} + @{[vmv_v_v $V31, $V7]} + + # Load the 512-bits of the message block in v1-v4 and perform + # an endian swap on each 4 bytes element. 
+ @{[vle32_v $V1, $INP]} + @{[vrev8_v $V1, $V1]} + add $INP, $INP, 16 + @{[vle32_v $V2, $INP]} + @{[vrev8_v $V2, $V2]} + add $INP, $INP, 16 + @{[vle32_v $V3, $INP]} + @{[vrev8_v $V3, $V3]} + add $INP, $INP, 16 + @{[vle32_v $V4, $INP]} + @{[vrev8_v $V4, $V4]} + add $INP, $INP, 16 + + # Quad-round 0 (+0, Wt from oldest to newest in v1->v2->v3->v4) + @{[vadd_vv $V5, $V10, $V1]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V3, $V2, $V0]} + @{[vsha2ms_vv $V1, $V5, $V4]} # Generate W[19:16] + + # Quad-round 1 (+1, v2->v3->v4->v1) + @{[vadd_vv $V5, $V11, $V2]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V4, $V3, $V0]} + @{[vsha2ms_vv $V2, $V5, $V1]} # Generate W[23:20] + + # Quad-round 2 (+2, v3->v4->v1->v2) + @{[vadd_vv $V5, $V12, $V3]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V1, $V4, $V0]} + @{[vsha2ms_vv $V3, $V5, $V2]} # Generate W[27:24] + + # Quad-round 3 (+3, v4->v1->v2->v3) + @{[vadd_vv $V5, $V13, $V4]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V2, $V1, $V0]} + @{[vsha2ms_vv $V4, $V5, $V3]} # Generate W[31:28] + + # Quad-round 4 (+0, v1->v2->v3->v4) + @{[vadd_vv $V5, $V14, $V1]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V3, $V2, $V0]} + @{[vsha2ms_vv $V1, $V5, $V4]} # Generate W[35:32] + + # Quad-round 5 (+1, v2->v3->v4->v1) + @{[vadd_vv $V5, $V15, $V2]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V4, $V3, $V0]} + @{[vsha2ms_vv $V2, $V5, $V1]} # Generate W[39:36] + + # Quad-round 6 (+2, v3->v4->v1->v2) + @{[vadd_vv $V5, $V16, $V3]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V1, $V4, $V0]} + @{[vsha2ms_vv $V3, $V5, $V2]} # Generate W[43:40] + + # Quad-round 7 (+3, v4->v1->v2->v3) + @{[vadd_vv $V5, $V17, $V4]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V2, $V1, $V0]} + @{[vsha2ms_vv $V4, $V5, $V3]} # Generate W[47:44] + + # Quad-round 8 (+0, v1->v2->v3->v4) + @{[vadd_vv $V5, $V18, $V1]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V3, $V2, $V0]} + @{[vsha2ms_vv $V1, $V5, $V4]} # Generate W[51:48] + + # Quad-round 9 (+1, v2->v3->v4->v1) + @{[vadd_vv $V5, $V19, $V2]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V4, $V3, $V0]} + @{[vsha2ms_vv $V2, $V5, $V1]} # Generate W[55:52] + + # Quad-round 10 (+2, v3->v4->v1->v2) + @{[vadd_vv $V5, $V20, $V3]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V1, $V4, $V0]} + @{[vsha2ms_vv $V3, $V5, $V2]} # Generate W[59:56] + + # Quad-round 11 (+3, v4->v1->v2->v3) + @{[vadd_vv $V5, $V21, $V4]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V2, $V1, $V0]} + @{[vsha2ms_vv $V4, $V5, $V3]} # Generate W[63:60] + + # Quad-round 12 (+0, v1->v2->v3->v4) + # Note that we stop generating new message schedule words (Wt, v1-13) + # as we already generated all the words we end up consuming (i.e., W[6= 3:60]). 
+ @{[vadd_vv $V5, $V22, $V1]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + + # Quad-round 13 (+1, v2->v3->v4->v1) + @{[vadd_vv $V5, $V23, $V2]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + + # Quad-round 14 (+2, v3->v4->v1->v2) + @{[vadd_vv $V5, $V24, $V3]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + + # Quad-round 15 (+3, v4->v1->v2->v3) + @{[vadd_vv $V5, $V25, $V4]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + + # H' =3D H+{a',b',c',...,h'} + @{[vadd_vv $V6, $V30, $V6]} + @{[vadd_vv $V7, $V31, $V7]} + bnez $LEN, L_round_loop + + # Store {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h}. + @{[vsuxei8_v $V6, $H, $V26]} + @{[vsuxei8_v $V7, $H2, $V26]} + + ret +.size sha256_block_data_order_zvkb_zvknha_or_zvknhb,.-sha256_block_data_or= der_zvkb_zvknha_or_zvknhb + +.p2align 2 +.type $K256,\@object +$K256: + .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5 + .word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5 + .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3 + .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174 + .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc + .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da + .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7 + .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967 + .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13 + .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85 + .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3 + .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070 + .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5 + .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3 + .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208 + .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 +.size $K256,.-$K256 +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; --=20 2.28.0 From nobody Wed Dec 17 12:18:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 949D4C4167B for ; Mon, 27 Nov 2023 07:08:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232409AbjK0HIY (ORCPT ); Mon, 27 Nov 2023 02:08:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232533AbjK0HHx (ORCPT ); Mon, 27 Nov 2023 02:07:53 -0500 Received: from mail-pl1-x636.google.com (mail-pl1-x636.google.com [IPv6:2607:f8b0:4864:20::636]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 413181990 for ; Sun, 26 Nov 2023 23:07:44 -0800 (PST) Received: by mail-pl1-x636.google.com with SMTP id d9443c01a7336-1cfb2176150so11005685ad.3 for ; Sun, 26 Nov 2023 23:07:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1701068863; x=1701673663; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Zx+JOV6SwFsQe7mCBswIKAe2MRx+dnQNvWPjGwWvo/Q=; b=LbldXemujBFlSa/f74u4WdFGzrh9SagqPajn3hj1vs/rESl55oWhnxvVb4rjqK4J4F sHDs+MeZjLkc/mLdZG7eNySMBcBGB/3IBR5t4tI5I6+ktUXeG/Pen1Mah+zm/Ye7xgde 9A6lonh2xQzdBhscnwbr3o330RKaXrS2bOeWF+FF/NraUv1TV6n3rD6rNhTxQNzkNLRO 7cqfyxb1xmlRpkv3OtW3g2FZKjU6z0gT2vwLrzEivDTJN9P12EdmKmjgfImyA3sX+/d1 
From: Jerry Shih
To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org
Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: [PATCH v2 10/13] RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations
Date: Mon, 27 Nov 2023 15:07:00 +0800
Message-Id: <20231127070703.1697-11-jerry.shih@sifive.com>
In-Reply-To: <20231127070703.1697-1-jerry.shih@sifive.com>
References: <20231127070703.1697-1-jerry.shih@sifive.com>

Add SHA384 and 512 implementations using Zvknhb vector crypto
extension from OpenSSL(openssl/openssl#21923).

Co-developed-by: Charalampos Mitrodimas
Signed-off-by: Charalampos Mitrodimas
Co-developed-by: Heiko Stuebner
Signed-off-by: Heiko Stuebner
Co-developed-by: Phoebe Chen
Signed-off-by: Phoebe Chen
Signed-off-by: Jerry Shih
---
Changelog v2:
- Do not turn on kconfig `SHA512_RISCV64` option by default.
- Add `asmlinkage` qualifier for crypto asm function.
- Rename sha512-riscv64-zvkb-zvknhb to sha512-riscv64-zvknhb-zvkb.
- Reorder structure sha512_algs members initialization in the order declared.
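Not part of the diff below, just an illustration for reviewers: a quick runtime check that the accelerated driver (rather than sha512-generic) was bound is to allocate the hash and print the driver name, as in the hypothetical helper sketched here; /proc/crypto shows the same information.

#include <crypto/hash.h>
#include <linux/err.h>
#include <linux/printk.h>

/* Illustrative only: report which "sha512" driver the crypto core bound. */
static void demo_report_sha512_driver(void)
{
        struct crypto_shash *tfm = crypto_alloc_shash("sha512", 0, 0);

        if (IS_ERR(tfm))
                return;

        /* Expect "sha512-riscv64-zvknhb-zvkb" with Zvknhb+Zvkb and VLEN >= 128. */
        pr_info("sha512 driver: %s\n",
                crypto_tfm_alg_driver_name(crypto_shash_tfm(tfm)));

        crypto_free_shash(tfm);
}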
--- arch/riscv/crypto/Kconfig | 11 + arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/sha512-riscv64-glue.c | 139 +++++++++ .../crypto/sha512-riscv64-zvknhb-zvkb.pl | 266 ++++++++++++++++++ 4 files changed, 423 insertions(+) create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c create mode 100644 arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index d31af9190717..ad0b08a13c9a 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -55,4 +55,15 @@ config CRYPTO_SHA256_RISCV64 - Zvknha or Zvknhb vector crypto extensions - Zvkb vector crypto extension =20 +config CRYPTO_SHA512_RISCV64 + tristate "Hash functions: SHA-384 and SHA-512" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_SHA512 + help + SHA-384 and SHA-512 secure hash algorithm (FIPS 180) + + Architecture: riscv64 using: + - Zvknhb vector crypto extension + - Zvkb vector crypto extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index e9d7717ec943..8aabef950ad3 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -15,6 +15,9 @@ ghash-riscv64-y :=3D ghash-riscv64-glue.o ghash-riscv64-z= vkg.o obj-$(CONFIG_CRYPTO_SHA256_RISCV64) +=3D sha256-riscv64.o sha256-riscv64-y :=3D sha256-riscv64-glue.o sha256-riscv64-zvknha_or_zvknh= b-zvkb.o =20 +obj-$(CONFIG_CRYPTO_SHA512_RISCV64) +=3D sha512-riscv64.o +sha512-riscv64-y :=3D sha512-riscv64-glue.o sha512-riscv64-zvknhb-zvkb.o + quiet_cmd_perlasm =3D PERLASM $@ cmd_perlasm =3D $(PERL) $(<) void $(@) =20 @@ -33,8 +36,12 @@ $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl $(obj)/sha256-riscv64-zvknha_or_zvknhb-zvkb.S: $(src)/sha256-riscv64-zvknh= a_or_zvknhb-zvkb.pl $(call cmd,perlasm) =20 +$(obj)/sha512-riscv64-zvknhb-zvkb.S: $(src)/sha512-riscv64-zvknhb-zvkb.pl + $(call cmd,perlasm) + clean-files +=3D aes-riscv64-zvkned.S clean-files +=3D aes-riscv64-zvkned-zvbb-zvkg.S clean-files +=3D aes-riscv64-zvkned-zvkb.S clean-files +=3D ghash-riscv64-zvkg.S clean-files +=3D sha256-riscv64-zvknha_or_zvknhb-zvkb.S +clean-files +=3D sha512-riscv64-zvknhb-zvkb.S diff --git a/arch/riscv/crypto/sha512-riscv64-glue.c b/arch/riscv/crypto/sh= a512-riscv64-glue.c new file mode 100644 index 000000000000..3dd8e1c9d402 --- /dev/null +++ b/arch/riscv/crypto/sha512-riscv64-glue.c @@ -0,0 +1,139 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Linux/riscv64 port of the OpenSSL SHA512 implementation for RISC-V 64 + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + * + * Copyright (C) 2023 SiFive, Inc. + * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * sha512 using zvkb and zvknhb vector crypto extension + * + * This asm function will just take the first 512-bit as the sha512 state = from + * the pointer to `struct sha512_state`. + */ +asmlinkage void sha512_block_data_order_zvkb_zvknhb(struct sha512_state *d= igest, + const u8 *data, + int num_blks); + +static int riscv64_sha512_update(struct shash_desc *desc, const u8 *data, + unsigned int len) +{ + int ret =3D 0; + + /* + * Make sure struct sha512_state begins directly with the SHA512 + * 512-bit internal state, as this is what the asm function expect. 
+ */ + BUILD_BUG_ON(offsetof(struct sha512_state, state) !=3D 0); + + if (crypto_simd_usable()) { + kernel_vector_begin(); + ret =3D sha512_base_do_update( + desc, data, len, sha512_block_data_order_zvkb_zvknhb); + kernel_vector_end(); + } else { + ret =3D crypto_sha512_update(desc, data, len); + } + + return ret; +} + +static int riscv64_sha512_finup(struct shash_desc *desc, const u8 *data, + unsigned int len, u8 *out) +{ + if (crypto_simd_usable()) { + kernel_vector_begin(); + if (len) + sha512_base_do_update( + desc, data, len, + sha512_block_data_order_zvkb_zvknhb); + sha512_base_do_finalize(desc, + sha512_block_data_order_zvkb_zvknhb); + kernel_vector_end(); + + return sha512_base_finish(desc, out); + } + + return crypto_sha512_finup(desc, data, len, out); +} + +static int riscv64_sha512_final(struct shash_desc *desc, u8 *out) +{ + return riscv64_sha512_finup(desc, NULL, 0, out); +} + +static struct shash_alg sha512_algs[] =3D { + { + .init =3D sha512_base_init, + .update =3D riscv64_sha512_update, + .final =3D riscv64_sha512_final, + .finup =3D riscv64_sha512_finup, + .descsize =3D sizeof(struct sha512_state), + .digestsize =3D SHA512_DIGEST_SIZE, + .base =3D { + .cra_blocksize =3D SHA512_BLOCK_SIZE, + .cra_priority =3D 150, + .cra_name =3D "sha512", + .cra_driver_name =3D "sha512-riscv64-zvknhb-zvkb", + .cra_module =3D THIS_MODULE, + }, + }, + { + .init =3D sha384_base_init, + .update =3D riscv64_sha512_update, + .final =3D riscv64_sha512_final, + .finup =3D riscv64_sha512_finup, + .descsize =3D sizeof(struct sha512_state), + .digestsize =3D SHA384_DIGEST_SIZE, + .base =3D { + .cra_blocksize =3D SHA384_BLOCK_SIZE, + .cra_priority =3D 150, + .cra_name =3D "sha384", + .cra_driver_name =3D "sha384-riscv64-zvknhb-zvkb", + .cra_module =3D THIS_MODULE, + }, + }, +}; + +static inline bool check_sha512_ext(void) +{ + return riscv_isa_extension_available(NULL, ZVKNHB) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >=3D 128; +} + +static int __init riscv64_sha512_mod_init(void) +{ + if (check_sha512_ext()) + return crypto_register_shashes(sha512_algs, + ARRAY_SIZE(sha512_algs)); + + return -ENODEV; +} + +static void __exit riscv64_sha512_mod_fini(void) +{ + crypto_unregister_shashes(sha512_algs, ARRAY_SIZE(sha512_algs)); +} + +module_init(riscv64_sha512_mod_init); +module_exit(riscv64_sha512_mod_fini); + +MODULE_DESCRIPTION("SHA-512 (RISC-V accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("sha384"); +MODULE_ALIAS_CRYPTO("sha512"); diff --git a/arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.pl b/arch/riscv/c= rypto/sha512-riscv64-zvknhb-zvkb.pl new file mode 100644 index 000000000000..4be448266a59 --- /dev/null +++ b/arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.pl @@ -0,0 +1,266 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph M=C3=BCllner +# Copyright (c) 2023, Phoebe Chen +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. 
Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# The generated code of this file depends on the following RISC-V extensio= ns: +# - RV64I +# - RISC-V vector ('V') with VLEN >=3D 128 +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') +# - RISC-V Vector SHA-2 Secure Hash extension ('Zvknhb') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extensio= n) +# $flavour is the first argument if it doesn't look like a file +my $output =3D $#ARGV >=3D 0 && $ARGV[$#ARGV] =3D~ m|\.\w+$| ? pop : undef; +my $flavour =3D $#ARGV >=3D 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=3D<<___; +.text +___ + +my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, + $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15, + $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23, + $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) =3D map("v$_",(0..31)); + +my $K512 =3D "K512"; + +# Function arguments +my ($H, $INP, $LEN, $KT, $H2, $INDEX_PATTERN) =3D ("a0", "a1", "a2", "a3",= "t3", "t4"); + +##########################################################################= ###### +# void sha512_block_data_order_zvkb_zvknhb(void *c, const void *p, size_t = len) +$code .=3D <<___; +.p2align 2 +.globl sha512_block_data_order_zvkb_zvknhb +.type sha512_block_data_order_zvkb_zvknhb,\@function +sha512_block_data_order_zvkb_zvknhb: + @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]} + + # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c} + # The dst vtype is e64m2 and the index vtype is e8mf4. + # We use index-load with the following index pattern at v1. + # i8 index: + # 40, 32, 8, 0 + # Instead of setting the i8 index, we could use a single 32bit + # little-endian value to cover the 4xi8 index. + # i32 value: + # 0x 00 08 20 28 + li $INDEX_PATTERN, 0x00082028 + @{[vsetivli "zero", 1, "e32", "m1", "ta", "ma"]} + @{[vmv_v_x $V1, $INDEX_PATTERN]} + + addi $H2, $H, 16 + + # Use index-load to get {f,e,b,a},{h,g,d,c} + @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]} + @{[vluxei8_v $V22, $H, $V1]} + @{[vluxei8_v $V24, $H2, $V1]} + + # Setup v0 mask for the vmerge to replace the first word (idx=3D=3D0) = in key-scheduling. + # The AVL is 4 in SHA, so we could use a single e8(8 element masking) = for masking. 
+ @{[vsetivli "zero", 1, "e8", "m1", "ta", "ma"]} + @{[vmv_v_i $V0, 0x01]} + + @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]} + +L_round_loop: + # Load round constants K512 + la $KT, $K512 + + # Decrement length by 1 + addi $LEN, $LEN, -1 + + # Keep the current state as we need it later: H' =3D H+{a',b',c',...,h= '}. + @{[vmv_v_v $V26, $V22]} + @{[vmv_v_v $V28, $V24]} + + # Load the 1024-bits of the message block in v10-v16 and perform the e= ndian + # swap. + @{[vle64_v $V10, $INP]} + @{[vrev8_v $V10, $V10]} + addi $INP, $INP, 32 + @{[vle64_v $V12, $INP]} + @{[vrev8_v $V12, $V12]} + addi $INP, $INP, 32 + @{[vle64_v $V14, $INP]} + @{[vrev8_v $V14, $V14]} + addi $INP, $INP, 32 + @{[vle64_v $V16, $INP]} + @{[vrev8_v $V16, $V16]} + addi $INP, $INP, 32 + + .rept 4 + # Quad-round 0 (+0, v10->v12->v14->v16) + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V10]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + @{[vmerge_vvm $V18, $V14, $V12, $V0]} + @{[vsha2ms_vv $V10, $V18, $V16]} + + # Quad-round 1 (+1, v12->v14->v16->v10) + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V12]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + @{[vmerge_vvm $V18, $V16, $V14, $V0]} + @{[vsha2ms_vv $V12, $V18, $V10]} + + # Quad-round 2 (+2, v14->v16->v10->v12) + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V14]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + @{[vmerge_vvm $V18, $V10, $V16, $V0]} + @{[vsha2ms_vv $V14, $V18, $V12]} + + # Quad-round 3 (+3, v16->v10->v12->v14) + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V16]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + @{[vmerge_vvm $V18, $V12, $V10, $V0]} + @{[vsha2ms_vv $V16, $V18, $V14]} + .endr + + # Quad-round 16 (+0, v10->v12->v14->v16) + # Note that we stop generating new message schedule words (Wt, v10-16) + # as we already generated all the words we end up consuming (i.e., W[7= 9:76]). + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V10]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + + # Quad-round 17 (+1, v12->v14->v16->v10) + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V12]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + + # Quad-round 18 (+2, v14->v16->v10->v12) + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V14]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + + # Quad-round 19 (+3, v16->v10->v12->v14) + @{[vle64_v $V20, $KT]} + # No t1 increment needed. + @{[vadd_vv $V18, $V20, $V16]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + + # H' =3D H+{a',b',c',...,h'} + @{[vadd_vv $V22, $V26, $V22]} + @{[vadd_vv $V24, $V28, $V24]} + bnez $LEN, L_round_loop + + # Store {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h}. 
+ @{[vsuxei8_v $V22, $H, $V1]} + @{[vsuxei8_v $V24, $H2, $V1]} + + ret +.size sha512_block_data_order_zvkb_zvknhb,.-sha512_block_data_order_zvkb_z= vknhb + +.p2align 3 +.type $K512,\@object +$K512: + .dword 0x428a2f98d728ae22, 0x7137449123ef65cd + .dword 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc + .dword 0x3956c25bf348b538, 0x59f111f1b605d019 + .dword 0x923f82a4af194f9b, 0xab1c5ed5da6d8118 + .dword 0xd807aa98a3030242, 0x12835b0145706fbe + .dword 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2 + .dword 0x72be5d74f27b896f, 0x80deb1fe3b1696b1 + .dword 0x9bdc06a725c71235, 0xc19bf174cf692694 + .dword 0xe49b69c19ef14ad2, 0xefbe4786384f25e3 + .dword 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65 + .dword 0x2de92c6f592b0275, 0x4a7484aa6ea6e483 + .dword 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5 + .dword 0x983e5152ee66dfab, 0xa831c66d2db43210 + .dword 0xb00327c898fb213f, 0xbf597fc7beef0ee4 + .dword 0xc6e00bf33da88fc2, 0xd5a79147930aa725 + .dword 0x06ca6351e003826f, 0x142929670a0e6e70 + .dword 0x27b70a8546d22ffc, 0x2e1b21385c26c926 + .dword 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df + .dword 0x650a73548baf63de, 0x766a0abb3c77b2a8 + .dword 0x81c2c92e47edaee6, 0x92722c851482353b + .dword 0xa2bfe8a14cf10364, 0xa81a664bbc423001 + .dword 0xc24b8b70d0f89791, 0xc76c51a30654be30 + .dword 0xd192e819d6ef5218, 0xd69906245565a910 + .dword 0xf40e35855771202a, 0x106aa07032bbd1b8 + .dword 0x19a4c116b8d2d0c8, 0x1e376c085141ab53 + .dword 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8 + .dword 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb + .dword 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3 + .dword 0x748f82ee5defb2fc, 0x78a5636f43172f60 + .dword 0x84c87814a1f0ab72, 0x8cc702081a6439ec + .dword 0x90befffa23631e28, 0xa4506cebde82bde9 + .dword 0xbef9a3f7b2c67915, 0xc67178f2e372532b + .dword 0xca273eceea26619c, 0xd186b8c721c0c207 + .dword 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178 + .dword 0x06f067aa72176fba, 0x0a637dc5a2c898a6 + .dword 0x113f9804bef90dae, 0x1b710b35131c471b + .dword 0x28db77f523047d84, 0x32caab7b40c72493 + .dword 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c + .dword 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a + .dword 0x5fcb6fab3ad6faec, 0x6c44198c4a475817 +.size $K512,.-$K512 +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; --=20 2.28.0 From nobody Wed Dec 17 12:18:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76FD7C4167B for ; Mon, 27 Nov 2023 07:08:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232388AbjK0HIk (ORCPT ); Mon, 27 Nov 2023 02:08:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39882 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232563AbjK0HH4 (ORCPT ); Mon, 27 Nov 2023 02:07:56 -0500 Received: from mail-pg1-x531.google.com (mail-pg1-x531.google.com [IPv6:2607:f8b0:4864:20::531]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9DA44182 for ; Sun, 26 Nov 2023 23:07:47 -0800 (PST) Received: by mail-pg1-x531.google.com with SMTP id 41be03b00d2f7-53fa455cd94so2382410a12.2 for ; Sun, 26 Nov 2023 23:07:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1701068867; x=1701673667; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=MHvQMLoaTueUI0HUb6KNgDLBfeUfZVP5F06b8uLBkiE=; 
From: Jerry Shih
To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org
Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: [PATCH v2 11/13] RISC-V: crypto: add Zvksed accelerated SM4 implementation
Date: Mon, 27 Nov 2023 15:07:01 +0800
Message-Id: <20231127070703.1697-12-jerry.shih@sifive.com>
In-Reply-To: <20231127070703.1697-1-jerry.shih@sifive.com>
References: <20231127070703.1697-1-jerry.shih@sifive.com>

Add SM4 implementation using Zvksed vector crypto extension from
OpenSSL (openssl/openssl#21923).

The perlasm here is different from the original implementation in
OpenSSL. In OpenSSL, SM4 has the separated set_encrypt_key and
set_decrypt_key functions. In kernel, these set_key functions are
merged into a single one in order to skip the redundant key expanding
instructions.

Co-developed-by: Christoph Müllner
Signed-off-by: Christoph Müllner
Co-developed-by: Heiko Stuebner
Signed-off-by: Heiko Stuebner
Signed-off-by: Jerry Shih
---
Changelog v2:
- Do not turn on kconfig `SM4_RISCV64` option by default.
- Add the missed `static` declaration for riscv64_sm4_zvksed_alg.
- Add `asmlinkage` qualifier for crypto asm function.
- Rename sm4-riscv64-zvkb-zvksed to sm4-riscv64-zvksed-zvkb.
- Reorder structure riscv64_sm4_zvksed_zvkb_alg members initialization in the order declared.
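Not part of the diff below, just an illustration for reviewers: the merged set_key mirrors what the generic fallback already does, where a single key expansion fills both round-key schedules. The sketch below (demo_sm4_fallback() is a made-up name) shows that fallback path, i.e. what the glue code runs when crypto_simd_usable() is false.

#include <crypto/sm4.h>

/* Illustrative only: the non-vector fallback used by the glue code. */
static void demo_sm4_fallback(const u8 key[SM4_KEY_SIZE],
                              const u8 in[SM4_BLOCK_SIZE],
                              u8 out[SM4_BLOCK_SIZE])
{
        struct sm4_ctx ctx;

        /* One expansion fills both ctx.rkey_enc and ctx.rkey_dec. */
        if (sm4_expandkey(&ctx, key, SM4_KEY_SIZE))
                return;

        /* Encrypt one block; pass ctx.rkey_dec instead to decrypt. */
        sm4_crypt_block(ctx.rkey_enc, out, in);
}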
--- arch/riscv/crypto/Kconfig | 17 ++ arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/sm4-riscv64-glue.c | 121 +++++++++++ arch/riscv/crypto/sm4-riscv64-zvksed.pl | 268 ++++++++++++++++++++++++ 4 files changed, 413 insertions(+) create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index ad0b08a13c9a..b28cf1972250 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -66,4 +66,21 @@ config CRYPTO_SHA512_RISCV64 - Zvknhb vector crypto extension - Zvkb vector crypto extension =20 +config CRYPTO_SM4_RISCV64 + tristate "Ciphers: SM4 (ShangMi 4)" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_ALGAPI + select CRYPTO_SM4 + help + SM4 cipher algorithms (OSCCA GB/T 32907-2016, + ISO/IEC 18033-3:2010/Amd 1:2021) + + SM4 (GBT.32907-2016) is a cryptographic standard issued by the + Organization of State Commercial Administration of China (OSCCA) + as an authorized cryptographic algorithms for the use within China. + + Architecture: riscv64 using: + - Zvksed vector crypto extension + - Zvkb vector crypto extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 8aabef950ad3..8e34861bba34 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -18,6 +18,9 @@ sha256-riscv64-y :=3D sha256-riscv64-glue.o sha256-riscv6= 4-zvknha_or_zvknhb-zvkb.o obj-$(CONFIG_CRYPTO_SHA512_RISCV64) +=3D sha512-riscv64.o sha512-riscv64-y :=3D sha512-riscv64-glue.o sha512-riscv64-zvknhb-zvkb.o =20 +obj-$(CONFIG_CRYPTO_SM4_RISCV64) +=3D sm4-riscv64.o +sm4-riscv64-y :=3D sm4-riscv64-glue.o sm4-riscv64-zvksed.o + quiet_cmd_perlasm =3D PERLASM $@ cmd_perlasm =3D $(PERL) $(<) void $(@) =20 @@ -39,9 +42,13 @@ $(obj)/sha256-riscv64-zvknha_or_zvknhb-zvkb.S: $(src)/sh= a256-riscv64-zvknha_or_z $(obj)/sha512-riscv64-zvknhb-zvkb.S: $(src)/sha512-riscv64-zvknhb-zvkb.pl $(call cmd,perlasm) =20 +$(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl + $(call cmd,perlasm) + clean-files +=3D aes-riscv64-zvkned.S clean-files +=3D aes-riscv64-zvkned-zvbb-zvkg.S clean-files +=3D aes-riscv64-zvkned-zvkb.S clean-files +=3D ghash-riscv64-zvkg.S clean-files +=3D sha256-riscv64-zvknha_or_zvknhb-zvkb.S clean-files +=3D sha512-riscv64-zvknhb-zvkb.S +clean-files +=3D sm4-riscv64-zvksed.S diff --git a/arch/riscv/crypto/sm4-riscv64-glue.c b/arch/riscv/crypto/sm4-r= iscv64-glue.c new file mode 100644 index 000000000000..9d9d24b67ee3 --- /dev/null +++ b/arch/riscv/crypto/sm4-riscv64-glue.c @@ -0,0 +1,121 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Linux/riscv64 port of the OpenSSL SM4 implementation for RISC-V 64 + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + * + * Copyright (C) 2023 SiFive, Inc. 
+ * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* sm4 using zvksed vector crypto extension */ +asmlinkage void rv64i_zvksed_sm4_encrypt(const u8 *in, u8 *out, const u32 = *key); +asmlinkage void rv64i_zvksed_sm4_decrypt(const u8 *in, u8 *out, const u32 = *key); +asmlinkage int rv64i_zvksed_sm4_set_key(const u8 *user_key, + unsigned int key_len, u32 *enc_key, + u32 *dec_key); + +static int riscv64_sm4_setkey_zvksed(struct crypto_tfm *tfm, const u8 *key, + unsigned int key_len) +{ + struct sm4_ctx *ctx =3D crypto_tfm_ctx(tfm); + int ret =3D 0; + + if (crypto_simd_usable()) { + kernel_vector_begin(); + if (rv64i_zvksed_sm4_set_key(key, key_len, ctx->rkey_enc, + ctx->rkey_dec)) + ret =3D -EINVAL; + kernel_vector_end(); + } else { + ret =3D sm4_expandkey(ctx, key, key_len); + } + + return ret; +} + +static void riscv64_sm4_encrypt_zvksed(struct crypto_tfm *tfm, u8 *dst, + const u8 *src) +{ + const struct sm4_ctx *ctx =3D crypto_tfm_ctx(tfm); + + if (crypto_simd_usable()) { + kernel_vector_begin(); + rv64i_zvksed_sm4_encrypt(src, dst, ctx->rkey_enc); + kernel_vector_end(); + } else { + sm4_crypt_block(ctx->rkey_enc, dst, src); + } +} + +static void riscv64_sm4_decrypt_zvksed(struct crypto_tfm *tfm, u8 *dst, + const u8 *src) +{ + const struct sm4_ctx *ctx =3D crypto_tfm_ctx(tfm); + + if (crypto_simd_usable()) { + kernel_vector_begin(); + rv64i_zvksed_sm4_decrypt(src, dst, ctx->rkey_dec); + kernel_vector_end(); + } else { + sm4_crypt_block(ctx->rkey_dec, dst, src); + } +} + +static struct crypto_alg riscv64_sm4_zvksed_zvkb_alg =3D { + .cra_flags =3D CRYPTO_ALG_TYPE_CIPHER, + .cra_blocksize =3D SM4_BLOCK_SIZE, + .cra_ctxsize =3D sizeof(struct sm4_ctx), + .cra_priority =3D 300, + .cra_name =3D "sm4", + .cra_driver_name =3D "sm4-riscv64-zvksed-zvkb", + .cra_cipher =3D { + .cia_min_keysize =3D SM4_KEY_SIZE, + .cia_max_keysize =3D SM4_KEY_SIZE, + .cia_setkey =3D riscv64_sm4_setkey_zvksed, + .cia_encrypt =3D riscv64_sm4_encrypt_zvksed, + .cia_decrypt =3D riscv64_sm4_decrypt_zvksed, + }, + .cra_module =3D THIS_MODULE, +}; + +static inline bool check_sm4_ext(void) +{ + return riscv_isa_extension_available(NULL, ZVKSED) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >=3D 128; +} + +static int __init riscv64_sm4_mod_init(void) +{ + if (check_sm4_ext()) + return crypto_register_alg(&riscv64_sm4_zvksed_zvkb_alg); + + return -ENODEV; +} + +static void __exit riscv64_sm4_mod_fini(void) +{ + crypto_unregister_alg(&riscv64_sm4_zvksed_zvkb_alg); +} + +module_init(riscv64_sm4_mod_init); +module_exit(riscv64_sm4_mod_fini); + +MODULE_DESCRIPTION("SM4 (RISC-V accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("sm4"); diff --git a/arch/riscv/crypto/sm4-riscv64-zvksed.pl b/arch/riscv/crypto/sm= 4-riscv64-zvksed.pl new file mode 100644 index 000000000000..dab1d026a360 --- /dev/null +++ b/arch/riscv/crypto/sm4-riscv64-zvksed.pl @@ -0,0 +1,268 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). 
You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph M=C3=BCllner +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# The generated code of this file depends on the following RISC-V extensio= ns: +# - RV64I +# - RISC-V Vector ('V') with VLEN >=3D 128 +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') +# - RISC-V Vector SM4 Block Cipher extension ('Zvksed') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extensio= n) +# $flavour is the first argument if it doesn't look like a file +my $output =3D $#ARGV >=3D 0 && $ARGV[$#ARGV] =3D~ m|\.\w+$| ? pop : undef; +my $flavour =3D $#ARGV >=3D 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=3D<<___; +.text +___ + +#### +# int rv64i_zvksed_sm4_set_key(const u8 *user_key, unsigned int key_len, +# u32 *enc_key, u32 *dec_key); +# +{ +my ($ukey,$key_len,$enc_key,$dec_key)=3D("a0","a1","a2","a3"); +my ($fk,$stride)=3D("a4","a5"); +my ($t0,$t1)=3D("t0","t1"); +my ($vukey,$vfk,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7)=3D("v1","v2","v3"= ,"v4","v5","v6","v7","v8","v9","v10"); +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvksed_sm4_set_key +.type rv64i_zvksed_sm4_set_key,\@function +rv64i_zvksed_sm4_set_key: + li $t0, 16 + beq $t0, $key_len, 1f + li a0, 1 + ret +1: + + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + # Load the user key + @{[vle32_v $vukey, $ukey]} + @{[vrev8_v $vukey, $vukey]} + + # Load the FK. + la $fk, FK + @{[vle32_v $vfk, $fk]} + + # Generate round keys. 
+ @{[vxor_vv $vukey, $vukey, $vfk]} + @{[vsm4k_vi $vk0, $vukey, 0]} # rk[0:3] + @{[vsm4k_vi $vk1, $vk0, 1]} # rk[4:7] + @{[vsm4k_vi $vk2, $vk1, 2]} # rk[8:11] + @{[vsm4k_vi $vk3, $vk2, 3]} # rk[12:15] + @{[vsm4k_vi $vk4, $vk3, 4]} # rk[16:19] + @{[vsm4k_vi $vk5, $vk4, 5]} # rk[20:23] + @{[vsm4k_vi $vk6, $vk5, 6]} # rk[24:27] + @{[vsm4k_vi $vk7, $vk6, 7]} # rk[28:31] + + # Store enc round keys + @{[vse32_v $vk0, $enc_key]} # rk[0:3] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk1, $enc_key]} # rk[4:7] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk2, $enc_key]} # rk[8:11] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk3, $enc_key]} # rk[12:15] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk4, $enc_key]} # rk[16:19] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk5, $enc_key]} # rk[20:23] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk6, $enc_key]} # rk[24:27] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk7, $enc_key]} # rk[28:31] + + # Store dec round keys in reverse order + addi $dec_key, $dec_key, 12 + li $stride, -4 + @{[vsse32_v $vk7, $dec_key, $stride]} # rk[31:28] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk6, $dec_key, $stride]} # rk[27:24] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk5, $dec_key, $stride]} # rk[23:20] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk4, $dec_key, $stride]} # rk[19:16] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk3, $dec_key, $stride]} # rk[15:12] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk2, $dec_key, $stride]} # rk[11:8] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk1, $dec_key, $stride]} # rk[7:4] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk0, $dec_key, $stride]} # rk[3:0] + + li a0, 0 + ret +.size rv64i_zvksed_sm4_set_key,.-rv64i_zvksed_sm4_set_key +___ +} + +#### +# void rv64i_zvksed_sm4_encrypt(const unsigned char *in, unsigned char *ou= t, +# const SM4_KEY *key); +# +{ +my ($in,$out,$keys,$stride)=3D("a0","a1","a2","t0"); +my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=3D("v1","v2","v3= ","v4","v5","v6","v7","v8","v9","v10"); +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvksed_sm4_encrypt +.type rv64i_zvksed_sm4_encrypt,\@function +rv64i_zvksed_sm4_encrypt: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + # Load input data + @{[vle32_v $vdata, $in]} + @{[vrev8_v $vdata, $vdata]} + + # Order of elements was adjusted in sm4_set_key() + # Encrypt with all keys + @{[vle32_v $vk0, $keys]} # rk[0:3] + @{[vsm4r_vs $vdata, $vk0]} + addi $keys, $keys, 16 + @{[vle32_v $vk1, $keys]} # rk[4:7] + @{[vsm4r_vs $vdata, $vk1]} + addi $keys, $keys, 16 + @{[vle32_v $vk2, $keys]} # rk[8:11] + @{[vsm4r_vs $vdata, $vk2]} + addi $keys, $keys, 16 + @{[vle32_v $vk3, $keys]} # rk[12:15] + @{[vsm4r_vs $vdata, $vk3]} + addi $keys, $keys, 16 + @{[vle32_v $vk4, $keys]} # rk[16:19] + @{[vsm4r_vs $vdata, $vk4]} + addi $keys, $keys, 16 + @{[vle32_v $vk5, $keys]} # rk[20:23] + @{[vsm4r_vs $vdata, $vk5]} + addi $keys, $keys, 16 + @{[vle32_v $vk6, $keys]} # rk[24:27] + @{[vsm4r_vs $vdata, $vk6]} + addi $keys, $keys, 16 + @{[vle32_v $vk7, $keys]} # rk[28:31] + @{[vsm4r_vs $vdata, $vk7]} + + # Save the ciphertext (in reverse element order) + @{[vrev8_v $vdata, $vdata]} + li $stride, -4 + addi $out, $out, 12 + @{[vsse32_v $vdata, $out, $stride]} + + ret +.size rv64i_zvksed_sm4_encrypt,.-rv64i_zvksed_sm4_encrypt +___ +} + +#### +# void rv64i_zvksed_sm4_decrypt(const unsigned char *in, unsigned char *ou= t, +# const SM4_KEY *key); +# +{ +my ($in,$out,$keys,$stride)=3D("a0","a1","a2","t0"); +my 
($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=3D("v1","v2","v3= ","v4","v5","v6","v7","v8","v9","v10"); +$code .=3D <<___; +.p2align 3 +.globl rv64i_zvksed_sm4_decrypt +.type rv64i_zvksed_sm4_decrypt,\@function +rv64i_zvksed_sm4_decrypt: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + # Load input data + @{[vle32_v $vdata, $in]} + @{[vrev8_v $vdata, $vdata]} + + # Order of key elements was adjusted in sm4_set_key() + # Decrypt with all keys + @{[vle32_v $vk7, $keys]} # rk[31:28] + @{[vsm4r_vs $vdata, $vk7]} + addi $keys, $keys, 16 + @{[vle32_v $vk6, $keys]} # rk[27:24] + @{[vsm4r_vs $vdata, $vk6]} + addi $keys, $keys, 16 + @{[vle32_v $vk5, $keys]} # rk[23:20] + @{[vsm4r_vs $vdata, $vk5]} + addi $keys, $keys, 16 + @{[vle32_v $vk4, $keys]} # rk[19:16] + @{[vsm4r_vs $vdata, $vk4]} + addi $keys, $keys, 16 + @{[vle32_v $vk3, $keys]} # rk[15:11] + @{[vsm4r_vs $vdata, $vk3]} + addi $keys, $keys, 16 + @{[vle32_v $vk2, $keys]} # rk[11:8] + @{[vsm4r_vs $vdata, $vk2]} + addi $keys, $keys, 16 + @{[vle32_v $vk1, $keys]} # rk[7:4] + @{[vsm4r_vs $vdata, $vk1]} + addi $keys, $keys, 16 + @{[vle32_v $vk0, $keys]} # rk[3:0] + @{[vsm4r_vs $vdata, $vk0]} + + # Save the ciphertext (in reverse element order) + @{[vrev8_v $vdata, $vdata]} + li $stride, -4 + addi $out, $out, 12 + @{[vsse32_v $vdata, $out, $stride]} + + ret +.size rv64i_zvksed_sm4_decrypt,.-rv64i_zvksed_sm4_decrypt +___ +} + +$code .=3D <<___; +# Family Key (little-endian 32-bit chunks) +.p2align 3 +FK: + .word 0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC +.size FK,.-FK +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; --=20 2.28.0 From nobody Wed Dec 17 12:18:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A6D0C4167B for ; Mon, 27 Nov 2023 07:08:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232506AbjK0HIo (ORCPT ); Mon, 27 Nov 2023 02:08:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39936 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232480AbjK0HH7 (ORCPT ); Mon, 27 Nov 2023 02:07:59 -0500 Received: from mail-pl1-x636.google.com (mail-pl1-x636.google.com [IPv6:2607:f8b0:4864:20::636]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8FD571BD9 for ; Sun, 26 Nov 2023 23:07:50 -0800 (PST) Received: by mail-pl1-x636.google.com with SMTP id d9443c01a7336-1cf8b35a6dbso26424215ad.0 for ; Sun, 26 Nov 2023 23:07:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1701068870; x=1701673670; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=yncBKNuxBeXEf1T1sdiq2nXhaNeK0KdzufyHRNnVzQo=; b=hMzHc1dFU0hJLFxD6A2qE0JJLlRWmOH7qMzFmBpuXo2eVJYpFTR9Ncs8pPZCrC752v qFi0AekjhjqyPQbwYxlEjZgtMQAlEuxwvz71+sw0ORCXkxOY+vY37eKkwf7S7CTyXZU7 B299+SPNkU1xXUPmlyi80BPqZdlgLZWKwl4dBAeyWK+DhNVtXZvEpvzpybuVjHFS6bM+ vZuhHwXJHnJNC3cVBU/JQrwLdR1oYPas+64UJtfTwxs2AXqsjbBgXubSS9J/boEvWC1j zodCcDoCm4t2ncvJVAOaYnBXq+OPNERoupJDGXdISpn8AG3fghrR7rLXF5BaoQQuhf/v ydXw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701068870; x=1701673670; h=content-transfer-encoding:mime-version:references:in-reply-to 
From: Jerry Shih
To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org
Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: [PATCH v2 12/13] RISC-V: crypto: add Zvksh accelerated SM3 implementation
Date: Mon, 27 Nov 2023 15:07:02 +0800
Message-Id: <20231127070703.1697-13-jerry.shih@sifive.com>
In-Reply-To: <20231127070703.1697-1-jerry.shih@sifive.com>
References: <20231127070703.1697-1-jerry.shih@sifive.com>

Add SM3 implementation using Zvksh vector crypto extension from
OpenSSL (openssl/openssl#21923).

Co-developed-by: Christoph Müllner
Signed-off-by: Christoph Müllner
Co-developed-by: Heiko Stuebner
Signed-off-by: Heiko Stuebner
Signed-off-by: Jerry Shih
---
Changelog v2:
- Do not turn on kconfig `SM3_RISCV64` option by default.
- Add `asmlinkage` qualifier for crypto asm function.
- Rename sm3-riscv64-zvkb-zvksh to sm3-riscv64-zvksh-zvkb.
- Reorder structure sm3_alg members initialization in the order declared.
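Not part of the diff below, just an illustration for reviewers: like the other hashes in this series, the driver is reached through the normal shash interface and silently falls back to the generic SM3 code when crypto_simd_usable() is false. A minimal incremental-hash sketch (demo_sm3() is a made-up name):

#include <crypto/hash.h>
#include <crypto/sm3.h>
#include <linux/err.h>

/* Illustrative only: incremental SM3 over two buffers via the shash API. */
static int demo_sm3(const u8 *p1, unsigned int l1,
                    const u8 *p2, unsigned int l2,
                    u8 out[SM3_DIGEST_SIZE])
{
        struct crypto_shash *tfm;
        int err;

        tfm = crypto_alloc_shash("sm3", 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        {
                SHASH_DESC_ON_STACK(desc, tfm);

                desc->tfm = tfm;
                err = crypto_shash_init(desc) ?:
                      crypto_shash_update(desc, p1, l1) ?:
                      crypto_shash_update(desc, p2, l2) ?:
                      crypto_shash_final(desc, out);
        }

        crypto_free_shash(tfm);

        return err;
}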
--- arch/riscv/crypto/Kconfig | 12 ++ arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/sm3-riscv64-glue.c | 124 +++++++++++++ arch/riscv/crypto/sm3-riscv64-zvksh.pl | 230 +++++++++++++++++++++++++ 4 files changed, 373 insertions(+) create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index b28cf1972250..7415fb303785 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -66,6 +66,18 @@ config CRYPTO_SHA512_RISCV64 - Zvknhb vector crypto extension - Zvkb vector crypto extension =20 +config CRYPTO_SM3_RISCV64 + tristate "Hash functions: SM3 (ShangMi 3)" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_HASH + select CRYPTO_SM3 + help + SM3 (ShangMi 3) secure hash function (OSCCA GM/T 0004-2012) + + Architecture: riscv64 using: + - Zvksh vector crypto extension + - Zvkb vector crypto extension + config CRYPTO_SM4_RISCV64 tristate "Ciphers: SM4 (ShangMi 4)" depends on 64BIT && RISCV_ISA_V diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 8e34861bba34..b1f857695c1c 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -18,6 +18,9 @@ sha256-riscv64-y :=3D sha256-riscv64-glue.o sha256-riscv6= 4-zvknha_or_zvknhb-zvkb.o obj-$(CONFIG_CRYPTO_SHA512_RISCV64) +=3D sha512-riscv64.o sha512-riscv64-y :=3D sha512-riscv64-glue.o sha512-riscv64-zvknhb-zvkb.o =20 +obj-$(CONFIG_CRYPTO_SM3_RISCV64) +=3D sm3-riscv64.o +sm3-riscv64-y :=3D sm3-riscv64-glue.o sm3-riscv64-zvksh.o + obj-$(CONFIG_CRYPTO_SM4_RISCV64) +=3D sm4-riscv64.o sm4-riscv64-y :=3D sm4-riscv64-glue.o sm4-riscv64-zvksed.o =20 @@ -42,6 +45,9 @@ $(obj)/sha256-riscv64-zvknha_or_zvknhb-zvkb.S: $(src)/sha= 256-riscv64-zvknha_or_z $(obj)/sha512-riscv64-zvknhb-zvkb.S: $(src)/sha512-riscv64-zvknhb-zvkb.pl $(call cmd,perlasm) =20 +$(obj)/sm3-riscv64-zvksh.S: $(src)/sm3-riscv64-zvksh.pl + $(call cmd,perlasm) + $(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl $(call cmd,perlasm) =20 @@ -51,4 +57,5 @@ clean-files +=3D aes-riscv64-zvkned-zvkb.S clean-files +=3D ghash-riscv64-zvkg.S clean-files +=3D sha256-riscv64-zvknha_or_zvknhb-zvkb.S clean-files +=3D sha512-riscv64-zvknhb-zvkb.S +clean-files +=3D sm3-riscv64-zvksh.S clean-files +=3D sm4-riscv64-zvksed.S diff --git a/arch/riscv/crypto/sm3-riscv64-glue.c b/arch/riscv/crypto/sm3-r= iscv64-glue.c new file mode 100644 index 000000000000..63c7af338877 --- /dev/null +++ b/arch/riscv/crypto/sm3-riscv64-glue.c @@ -0,0 +1,124 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Linux/riscv64 port of the OpenSSL SM3 implementation for RISC-V 64 + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + * + * Copyright (C) 2023 SiFive, Inc. + * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * sm3 using zvksh vector crypto extension + * + * This asm function will just take the first 256-bit as the sm3 state from + * the pointer to `struct sm3_state`. + */ +asmlinkage void ossl_hwsm3_block_data_order_zvksh(struct sm3_state *digest, + u8 const *o, int num); + +static int riscv64_sm3_update(struct shash_desc *desc, const u8 *data, + unsigned int len) +{ + int ret =3D 0; + + /* + * Make sure struct sm3_state begins directly with the SM3 256-bit intern= al + * state, as this is what the asm function expect. 
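+	 *
+	 * The asm routine reads and writes those eight 32-bit state words in
+	 * place, and the sm3_base helpers only ever hand it whole 64-byte
+	 * blocks, so no tail handling is needed at this level.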
+ */ + BUILD_BUG_ON(offsetof(struct sm3_state, state) !=3D 0); + + if (crypto_simd_usable()) { + kernel_vector_begin(); + ret =3D sm3_base_do_update(desc, data, len, + ossl_hwsm3_block_data_order_zvksh); + kernel_vector_end(); + } else { + sm3_update(shash_desc_ctx(desc), data, len); + } + + return ret; +} + +static int riscv64_sm3_finup(struct shash_desc *desc, const u8 *data, + unsigned int len, u8 *out) +{ + struct sm3_state *ctx; + + if (crypto_simd_usable()) { + kernel_vector_begin(); + if (len) + sm3_base_do_update(desc, data, len, + ossl_hwsm3_block_data_order_zvksh); + sm3_base_do_finalize(desc, ossl_hwsm3_block_data_order_zvksh); + kernel_vector_end(); + + return sm3_base_finish(desc, out); + } + + ctx =3D shash_desc_ctx(desc); + if (len) + sm3_update(ctx, data, len); + sm3_final(ctx, out); + + return 0; +} + +static int riscv64_sm3_final(struct shash_desc *desc, u8 *out) +{ + return riscv64_sm3_finup(desc, NULL, 0, out); +} + +static struct shash_alg sm3_alg =3D { + .init =3D sm3_base_init, + .update =3D riscv64_sm3_update, + .final =3D riscv64_sm3_final, + .finup =3D riscv64_sm3_finup, + .descsize =3D sizeof(struct sm3_state), + .digestsize =3D SM3_DIGEST_SIZE, + .base =3D { + .cra_blocksize =3D SM3_BLOCK_SIZE, + .cra_priority =3D 150, + .cra_name =3D "sm3", + .cra_driver_name =3D "sm3-riscv64-zvksh-zvkb", + .cra_module =3D THIS_MODULE, + }, +}; + +static inline bool check_sm3_ext(void) +{ + return riscv_isa_extension_available(NULL, ZVKSH) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >=3D 128; +} + +static int __init riscv64_riscv64_sm3_mod_init(void) +{ + if (check_sm3_ext()) + return crypto_register_shash(&sm3_alg); + + return -ENODEV; +} + +static void __exit riscv64_sm3_mod_fini(void) +{ + crypto_unregister_shash(&sm3_alg); +} + +module_init(riscv64_riscv64_sm3_mod_init); +module_exit(riscv64_sm3_mod_fini); + +MODULE_DESCRIPTION("SM3 (RISC-V accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("sm3"); diff --git a/arch/riscv/crypto/sm3-riscv64-zvksh.pl b/arch/riscv/crypto/sm3= -riscv64-zvksh.pl new file mode 100644 index 000000000000..942d78d982e9 --- /dev/null +++ b/arch/riscv/crypto/sm3-riscv64-zvksh.pl @@ -0,0 +1,230 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph M=C3=BCllner +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. 
+# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# The generated code of this file depends on the following RISC-V extensio= ns: +# - RV64I +# - RISC-V Vector ('V') with VLEN >=3D 128 +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') +# - RISC-V Vector SM3 Secure Hash extension ('Zvksh') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extensio= n) +# $flavour is the first argument if it doesn't look like a file +my $output =3D $#ARGV >=3D 0 && $ARGV[$#ARGV] =3D~ m|\.\w+$| ? pop : undef; +my $flavour =3D $#ARGV >=3D 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=3D<<___; +.text +___ + +##########################################################################= ###### +# ossl_hwsm3_block_data_order_zvksh(SM3_CTX *c, const void *p, size_t num); +{ +my ($CTX, $INPUT, $NUM) =3D ("a0", "a1", "a2"); +my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, + $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15, + $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23, + $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) =3D map("v$_",(0..31)); + +$code .=3D <<___; +.text +.p2align 3 +.globl ossl_hwsm3_block_data_order_zvksh +.type ossl_hwsm3_block_data_order_zvksh,\@function +ossl_hwsm3_block_data_order_zvksh: + @{[vsetivli "zero", 8, "e32", "m2", "ta", "ma"]} + + # Load initial state of hash context (c->A-H). + @{[vle32_v $V0, $CTX]} + @{[vrev8_v $V0, $V0]} + +L_sm3_loop: + # Copy the previous state to v2. + # It will be XOR'ed with the current state at the end of the round. + @{[vmv_v_v $V2, $V0]} + + # Load the 64B block in 2x32B chunks. + @{[vle32_v $V6, $INPUT]} # v6 :=3D {w7, ..., w0} + addi $INPUT, $INPUT, 32 + + @{[vle32_v $V8, $INPUT]} # v8 :=3D {w15, ..., w8} + addi $INPUT, $INPUT, 32 + + addi $NUM, $NUM, -1 + + # As vsm3c consumes only w0, w1, w4, w5 we need to slide the input + # 2 elements down so we process elements w2, w3, w6, w7 + # This will be repeated for each odd round. 
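+	# Each vsm3c performs two compression rounds and its immediate is the
+	# round-group number (per the Zvksh spec), so the immediates 0..31 used
+	# below cover all 64 SM3 rounds for one message block.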
+ @{[vslidedown_vi $V4, $V6, 2]} # v4 :=3D {X, X, w7, ..., w2} + + @{[vsm3c_vi $V0, $V6, 0]} + @{[vsm3c_vi $V0, $V4, 1]} + + # Prepare a vector with {w11, ..., w4} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, X, X, w7, ..., w4} + @{[vslideup_vi $V4, $V8, 4]} # v4 :=3D {w11, w10, w9, w8, w7, w6, w5= , w4} + + @{[vsm3c_vi $V0, $V4, 2]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, w11, w10, w9, w8, w7, = w6} + @{[vsm3c_vi $V0, $V4, 3]} + + @{[vsm3c_vi $V0, $V8, 4]} + @{[vslidedown_vi $V4, $V8, 2]} # v4 :=3D {X, X, w15, w14, w13, w12, w1= 1, w10} + @{[vsm3c_vi $V0, $V4, 5]} + + @{[vsm3me_vv $V6, $V8, $V6]} # v6 :=3D {w23, w22, w21, w20, w19, w18= , w17, w16} + + # Prepare a register with {w19, w18, w17, w16, w15, w14, w13, w12} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, X, X, w15, w14, w13, w= 12} + @{[vslideup_vi $V4, $V6, 4]} # v4 :=3D {w19, w18, w17, w16, w15, w14= , w13, w12} + + @{[vsm3c_vi $V0, $V4, 6]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, w19, w18, w17, w16, w1= 5, w14} + @{[vsm3c_vi $V0, $V4, 7]} + + @{[vsm3c_vi $V0, $V6, 8]} + @{[vslidedown_vi $V4, $V6, 2]} # v4 :=3D {X, X, w23, w22, w21, w20, w1= 9, w18} + @{[vsm3c_vi $V0, $V4, 9]} + + @{[vsm3me_vv $V8, $V6, $V8]} # v8 :=3D {w31, w30, w29, w28, w27, w26= , w25, w24} + + # Prepare a register with {w27, w26, w25, w24, w23, w22, w21, w20} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, X, X, w23, w22, w21, w= 20} + @{[vslideup_vi $V4, $V8, 4]} # v4 :=3D {w27, w26, w25, w24, w23, w22= , w21, w20} + + @{[vsm3c_vi $V0, $V4, 10]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, w27, w26, w25, w24, w2= 3, w22} + @{[vsm3c_vi $V0, $V4, 11]} + + @{[vsm3c_vi $V0, $V8, 12]} + @{[vslidedown_vi $V4, $V8, 2]} # v4 :=3D {x, X, w31, w30, w29, w28, w2= 7, w26} + @{[vsm3c_vi $V0, $V4, 13]} + + @{[vsm3me_vv $V6, $V8, $V6]} # v6 :=3D {w32, w33, w34, w35, w36, w37= , w38, w39} + + # Prepare a register with {w35, w34, w33, w32, w31, w30, w29, w28} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, X, X, w31, w30, w29, w= 28} + @{[vslideup_vi $V4, $V6, 4]} # v4 :=3D {w35, w34, w33, w32, w31, w30= , w29, w28} + + @{[vsm3c_vi $V0, $V4, 14]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, w35, w34, w33, w32, w3= 1, w30} + @{[vsm3c_vi $V0, $V4, 15]} + + @{[vsm3c_vi $V0, $V6, 16]} + @{[vslidedown_vi $V4, $V6, 2]} # v4 :=3D {X, X, w39, w38, w37, w36, w3= 5, w34} + @{[vsm3c_vi $V0, $V4, 17]} + + @{[vsm3me_vv $V8, $V6, $V8]} # v8 :=3D {w47, w46, w45, w44, w43, w42= , w41, w40} + + # Prepare a register with {w43, w42, w41, w40, w39, w38, w37, w36} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, X, X, w39, w38, w37, w= 36} + @{[vslideup_vi $V4, $V8, 4]} # v4 :=3D {w43, w42, w41, w40, w39, w38= , w37, w36} + + @{[vsm3c_vi $V0, $V4, 18]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, w43, w42, w41, w40, w3= 9, w38} + @{[vsm3c_vi $V0, $V4, 19]} + + @{[vsm3c_vi $V0, $V8, 20]} + @{[vslidedown_vi $V4, $V8, 2]} # v4 :=3D {X, X, w47, w46, w45, w44, w4= 3, w42} + @{[vsm3c_vi $V0, $V4, 21]} + + @{[vsm3me_vv $V6, $V8, $V6]} # v6 :=3D {w55, w54, w53, w52, w51, w50= , w49, w48} + + # Prepare a register with {w51, w50, w49, w48, w47, w46, w45, w44} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, X, X, w47, w46, w45, w= 44} + @{[vslideup_vi $V4, $V6, 4]} # v4 :=3D {w51, w50, w49, w48, w47, w46= , w45, w44} + + @{[vsm3c_vi $V0, $V4, 22]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, w51, w50, w49, w48, w4= 7, w46} + @{[vsm3c_vi $V0, $V4, 23]} + + @{[vsm3c_vi $V0, $V6, 24]} + @{[vslidedown_vi $V4, $V6, 2]} # v4 :=3D {X, X, w55, w54, w53, 
w52, w5= 1, w50} + @{[vsm3c_vi $V0, $V4, 25]} + + @{[vsm3me_vv $V8, $V6, $V8]} # v8 :=3D {w63, w62, w61, w60, w59, w58= , w57, w56} + + # Prepare a register with {w59, w58, w57, w56, w55, w54, w53, w52} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, X, X, w55, w54, w53, w= 52} + @{[vslideup_vi $V4, $V8, 4]} # v4 :=3D {w59, w58, w57, w56, w55, w54= , w53, w52} + + @{[vsm3c_vi $V0, $V4, 26]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, w59, w58, w57, w56, w5= 5, w54} + @{[vsm3c_vi $V0, $V4, 27]} + + @{[vsm3c_vi $V0, $V8, 28]} + @{[vslidedown_vi $V4, $V8, 2]} # v4 :=3D {X, X, w63, w62, w61, w60, w5= 9, w58} + @{[vsm3c_vi $V0, $V4, 29]} + + @{[vsm3me_vv $V6, $V8, $V6]} # v6 :=3D {w71, w70, w69, w68, w67, w66= , w65, w64} + + # Prepare a register with {w67, w66, w65, w64, w63, w62, w61, w60} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, X, X, w63, w62, w61, w= 60} + @{[vslideup_vi $V4, $V6, 4]} # v4 :=3D {w67, w66, w65, w64, w63, w62= , w61, w60} + + @{[vsm3c_vi $V0, $V4, 30]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 :=3D {X, X, w67, w66, w65, w64, w6= 3, w62} + @{[vsm3c_vi $V0, $V4, 31]} + + # XOR in the previous state. + @{[vxor_vv $V0, $V0, $V2]} + + bnez $NUM, L_sm3_loop # Check if there are any more block to proce= ss +L_sm3_end: + @{[vrev8_v $V0, $V0]} + @{[vse32_v $V0, $CTX]} + ret + +.size ossl_hwsm3_block_data_order_zvksh,.-ossl_hwsm3_block_data_order_zvksh +___ +} + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; --=20 2.28.0 From nobody Wed Dec 17 12:18:38 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E254C4167B for ; Mon, 27 Nov 2023 07:09:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232545AbjK0HI6 (ORCPT ); Mon, 27 Nov 2023 02:08:58 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39822 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232532AbjK0HIV (ORCPT ); Mon, 27 Nov 2023 02:08:21 -0500 Received: from mail-pf1-x430.google.com (mail-pf1-x430.google.com [IPv6:2607:f8b0:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4D5881BF8 for ; Sun, 26 Nov 2023 23:07:53 -0800 (PST) Received: by mail-pf1-x430.google.com with SMTP id d2e1a72fcca58-6cb55001124so3316572b3a.0 for ; Sun, 26 Nov 2023 23:07:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1701068872; x=1701673672; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=65aClSoG7Vvogs/9DSpZCzSUcof37h/9Wvainxt/D1s=; b=H7+h9Bh00LE8XR7Jr/vBNO2taq3FUyH8g0iUjXm7Os3QHcR0RlIrXXGT61SP8IPi28 xrY09NcEU+fW6ayTRmLyGCJrvcLbkeVikGZuT5wpd+/Ix0zyPktUU+M3HVBu5tyE1hGj WOnAdPluO5L2nejsKOjdsFMPIPutyS+ZoomZUHR0HcyoG/f/J+OXfDE0berae91+ntjl SygPY0Zipe/WL15rg0R9crrcR4EeUzODaJfz9crQJF+EwFWYezDs4Q3kBPBz/oRYKDMT hfu5Oic44rutV8jgym3/FiHaF1Gb+zKZs8KjKXOcBio1zeFIqCjZA2a2VgEDIYcihec3 oOCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701068872; x=1701673672; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=65aClSoG7Vvogs/9DSpZCzSUcof37h/9Wvainxt/D1s=; 
b=foftH0ZPb2B2TM+BrNMJLnKVFDg0WCuEo+owcsjfxNgTlgwZuHMByy/9QbCsKJr1ms /JcyiARmQ4XCq+OAgBYpT+ehjR1U5f1kw6q+vWnjmrzKZcsOL780koPaoR0Fud/4mhSA 1lssM93ElNidf0Qbr03fKfyWZYNzNOmLNRzF6k0DX/pnGTVKfQpV0V7F94U+NMgBYnjv +/lALD7Q39kTILqVIQNMVR0hZgBXO7RsDDc/CQwudkuKsv1/KsODp8O3jOck3YdzaMrj jWnZsoGzrYbLUlw3bCahg3YLXBVWG7tOMGafhTxqnKYdckZHB5RnbYCOf/NNWJV3/412 tp3Q== X-Gm-Message-State: AOJu0YxFfPJoUVwFz/+4id9FlgNfeMjrBEK4t/arlJnNpNtZd9mg6OHY tAoOI+71HL96AP5ekh1rPvxZ4A== X-Google-Smtp-Source: AGHT+IHcnauxtqQExXlH4Csib1LUNoeakDZRIE0yrXb2DtrTMmRFI7ZHKdBCpbiQLIdAZt7kTNKppQ== X-Received: by 2002:a17:902:c942:b0:1cd:f823:456d with SMTP id i2-20020a170902c94200b001cdf823456dmr14867788pla.20.1701068872601; Sun, 26 Nov 2023 23:07:52 -0800 (PST) Received: from localhost.localdomain ([101.10.45.230]) by smtp.gmail.com with ESMTPSA id jh15-20020a170903328f00b001cfcd3a764esm1340134plb.77.2023.11.26.23.07.49 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 26 Nov 2023 23:07:52 -0800 (PST) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net, conor.dooley@microchip.com, ebiggers@kernel.org, ardb@kernel.org Cc: heiko@sntech.de, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH v2 13/13] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation Date: Mon, 27 Nov 2023 15:07:03 +0800 Message-Id: <20231127070703.1697-14-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231127070703.1697-1-jerry.shih@sifive.com> References: <20231127070703.1697-1-jerry.shih@sifive.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Add a ChaCha20 vector implementation from OpenSSL(openssl/openssl#21923). Signed-off-by: Jerry Shih --- Changelog v2: - Do not turn on kconfig `CHACHA20_RISCV64` option by default. - Use simd skcipher interface. - Add `asmlinkage` qualifier for crypto asm function. - Reorder structure riscv64_chacha_alg_zvkb members initialization in the order declared. - Use smaller iv buffer instead of whole state matrix as chacha20's input. 
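A usage note (illustrative, not part of the patch): the cipher is reachable through the regular skcipher API, and this driver consumes the 16-byte IV as a little-endian 32-bit block counter followed by the 96-bit nonce. A rough sketch of a synchronous caller; chacha20_encrypt_example() and its parameters are invented for the example:

#include <crypto/chacha.h>
#include <crypto/skcipher.h>
#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/string.h>
#include <asm/unaligned.h>

static int chacha20_encrypt_example(const u8 key[CHACHA_KEY_SIZE],
				    u32 counter, const u8 nonce[12],
				    u8 *buf, unsigned int len)
{
	u8 iv[CHACHA_IV_SIZE];
	struct crypto_skcipher *tfm;
	struct skcipher_request *req;
	struct scatterlist sg;
	DECLARE_CRYPTO_WAIT(wait);
	int err;

	/* IV layout used by the glue code: LE32 block counter, then nonce. */
	put_unaligned_le32(counter, iv);
	memcpy(iv + 4, nonce, 12);

	tfm = crypto_alloc_skcipher("chacha20", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_skcipher_setkey(tfm, key, CHACHA_KEY_SIZE);
	if (err)
		goto out_free_tfm;

	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		err = -ENOMEM;
		goto out_free_tfm;
	}

	/* Encrypt in place; the simd wrapper may complete asynchronously. */
	sg_init_one(&sg, buf, len);
	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
				      CRYPTO_TFM_REQ_MAY_BACKLOG,
				      crypto_req_done, &wait);
	skcipher_request_set_crypt(req, &sg, &sg, len, iv);
	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);

	skcipher_request_free(req);
out_free_tfm:
	crypto_free_skcipher(tfm);
	return err;
}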
--- arch/riscv/crypto/Kconfig | 12 + arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/chacha-riscv64-glue.c | 122 +++++++++ arch/riscv/crypto/chacha-riscv64-zvkb.pl | 321 +++++++++++++++++++++++ 4 files changed, 462 insertions(+) create mode 100644 arch/riscv/crypto/chacha-riscv64-glue.c create mode 100644 arch/riscv/crypto/chacha-riscv64-zvkb.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 7415fb303785..1932297a1e73 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -34,6 +34,18 @@ config CRYPTO_AES_BLOCK_RISCV64 - Zvkb vector crypto extension (CTR/XTS) - Zvkg vector crypto extension (XTS) =20 +config CRYPTO_CHACHA20_RISCV64 + tristate "Ciphers: ChaCha20" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_SIMD + select CRYPTO_SKCIPHER + select CRYPTO_LIB_CHACHA_GENERIC + help + Length-preserving ciphers: ChaCha20 stream cipher algorithm + + Architecture: riscv64 using: + - Zvkb vector crypto extension + config CRYPTO_GHASH_RISCV64 tristate "Hash functions: GHASH" depends on 64BIT && RISCV_ISA_V diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index b1f857695c1c..748c53aa38dc 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -9,6 +9,9 @@ aes-riscv64-y :=3D aes-riscv64-glue.o aes-riscv64-zvkned.o obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) +=3D aes-block-riscv64.o aes-block-riscv64-y :=3D aes-riscv64-block-mode-glue.o aes-riscv64-zvkned-= zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o =20 +obj-$(CONFIG_CRYPTO_CHACHA20_RISCV64) +=3D chacha-riscv64.o +chacha-riscv64-y :=3D chacha-riscv64-glue.o chacha-riscv64-zvkb.o + obj-$(CONFIG_CRYPTO_GHASH_RISCV64) +=3D ghash-riscv64.o ghash-riscv64-y :=3D ghash-riscv64-glue.o ghash-riscv64-zvkg.o =20 @@ -36,6 +39,9 @@ $(obj)/aes-riscv64-zvkned-zvbb-zvkg.S: $(src)/aes-riscv64= -zvkned-zvbb-zvkg.pl $(obj)/aes-riscv64-zvkned-zvkb.S: $(src)/aes-riscv64-zvkned-zvkb.pl $(call cmd,perlasm) =20 +$(obj)/chacha-riscv64-zvkb.S: $(src)/chacha-riscv64-zvkb.pl + $(call cmd,perlasm) + $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl $(call cmd,perlasm) =20 @@ -54,6 +60,7 @@ $(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl clean-files +=3D aes-riscv64-zvkned.S clean-files +=3D aes-riscv64-zvkned-zvbb-zvkg.S clean-files +=3D aes-riscv64-zvkned-zvkb.S +clean-files +=3D chacha-riscv64-zvkb.S clean-files +=3D ghash-riscv64-zvkg.S clean-files +=3D sha256-riscv64-zvknha_or_zvknhb-zvkb.S clean-files +=3D sha512-riscv64-zvknhb-zvkb.S diff --git a/arch/riscv/crypto/chacha-riscv64-glue.c b/arch/riscv/crypto/ch= acha-riscv64-glue.c new file mode 100644 index 000000000000..96047cb75222 --- /dev/null +++ b/arch/riscv/crypto/chacha-riscv64-glue.c @@ -0,0 +1,122 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Port of the OpenSSL ChaCha20 implementation for RISC-V 64 + * + * Copyright (C) 2023 SiFive, Inc. 
+ * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* chacha20 using zvkb vector crypto extension */ +asmlinkage void ChaCha20_ctr32_zvkb(u8 *out, const u8 *input, size_t len, + const u32 *key, const u32 *counter); + +static int chacha20_encrypt(struct skcipher_request *req) +{ + u32 iv[CHACHA_IV_SIZE / sizeof(u32)]; + u8 block_buffer[CHACHA_BLOCK_SIZE]; + struct crypto_skcipher *tfm =3D crypto_skcipher_reqtfm(req); + const struct chacha_ctx *ctx =3D crypto_skcipher_ctx(tfm); + struct skcipher_walk walk; + unsigned int nbytes; + unsigned int tail_bytes; + int err; + + iv[0] =3D get_unaligned_le32(req->iv); + iv[1] =3D get_unaligned_le32(req->iv + 4); + iv[2] =3D get_unaligned_le32(req->iv + 8); + iv[3] =3D get_unaligned_le32(req->iv + 12); + + err =3D skcipher_walk_virt(&walk, req, false); + while (walk.nbytes) { + nbytes =3D walk.nbytes & (~(CHACHA_BLOCK_SIZE - 1)); + tail_bytes =3D walk.nbytes & (CHACHA_BLOCK_SIZE - 1); + kernel_vector_begin(); + if (nbytes) { + ChaCha20_ctr32_zvkb(walk.dst.virt.addr, + walk.src.virt.addr, nbytes, + ctx->key, iv); + iv[0] +=3D nbytes / CHACHA_BLOCK_SIZE; + } + if (walk.nbytes =3D=3D walk.total && tail_bytes > 0) { + memcpy(block_buffer, walk.src.virt.addr + nbytes, + tail_bytes); + ChaCha20_ctr32_zvkb(block_buffer, block_buffer, + CHACHA_BLOCK_SIZE, ctx->key, iv); + memcpy(walk.dst.virt.addr + nbytes, block_buffer, + tail_bytes); + tail_bytes =3D 0; + } + kernel_vector_end(); + + err =3D skcipher_walk_done(&walk, tail_bytes); + } + + return err; +} + +static struct skcipher_alg riscv64_chacha_alg_zvkb[] =3D { + { + .setkey =3D chacha20_setkey, + .encrypt =3D chacha20_encrypt, + .decrypt =3D chacha20_encrypt, + .min_keysize =3D CHACHA_KEY_SIZE, + .max_keysize =3D CHACHA_KEY_SIZE, + .ivsize =3D CHACHA_IV_SIZE, + .chunksize =3D CHACHA_BLOCK_SIZE, + .walksize =3D CHACHA_BLOCK_SIZE * 4, + .base =3D { + .cra_flags =3D CRYPTO_ALG_INTERNAL, + .cra_blocksize =3D 1, + .cra_ctxsize =3D sizeof(struct chacha_ctx), + .cra_priority =3D 300, + .cra_name =3D "__chacha20", + .cra_driver_name =3D "__chacha20-riscv64-zvkb", + .cra_module =3D THIS_MODULE, + }, + } +}; + +static struct simd_skcipher_alg + *riscv64_chacha_simd_alg_zvkb[ARRAY_SIZE(riscv64_chacha_alg_zvkb)]; + +static inline bool check_chacha20_ext(void) +{ + return riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >=3D 128; +} + +static int __init riscv64_chacha_mod_init(void) +{ + if (check_chacha20_ext()) + return simd_register_skciphers_compat( + riscv64_chacha_alg_zvkb, + ARRAY_SIZE(riscv64_chacha_alg_zvkb), + riscv64_chacha_simd_alg_zvkb); + + return -ENODEV; +} + +static void __exit riscv64_chacha_mod_fini(void) +{ + simd_unregister_skciphers(riscv64_chacha_alg_zvkb, + ARRAY_SIZE(riscv64_chacha_alg_zvkb), + riscv64_chacha_simd_alg_zvkb); +} + +module_init(riscv64_chacha_mod_init); +module_exit(riscv64_chacha_mod_fini); + +MODULE_DESCRIPTION("ChaCha20 (RISC-V accelerated)"); +MODULE_AUTHOR("Jerry Shih "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("chacha20"); diff --git a/arch/riscv/crypto/chacha-riscv64-zvkb.pl b/arch/riscv/crypto/c= hacha-riscv64-zvkb.pl new file mode 100644 index 000000000000..6602ef79452f --- /dev/null +++ b/arch/riscv/crypto/chacha-riscv64-zvkb.pl @@ -0,0 +1,321 @@ +#! 
/usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023-2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# - RV64I +# - RISC-V Vector ('V') with VLEN >=3D 128 +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extensio= n) +# $flavour is the first argument if it doesn't look like a file +my $output =3D $#ARGV >=3D 0 && $ARGV[$#ARGV] =3D~ m|\.\w+$| ? pop : un= def; +my $flavour =3D $#ARGV >=3D 0 && $ARGV[0] !~ m|\.| ? shift : unde= f; + +$output and open STDOUT, ">$output"; + +my $code =3D <<___; +.text +___ + +# void ChaCha20_ctr32_zvkb(unsigned char *out, const unsigned char *inp, +# size_t len, const unsigned int key[8], +# const unsigned int counter[4]); +##########################################################################= ###### +my ( $OUTPUT, $INPUT, $LEN, $KEY, $COUNTER ) =3D ( "a0", "a1", "a2", "a3",= "a4" ); +my ( $T0 ) =3D ( "t0" ); +my ( $CONST_DATA0, $CONST_DATA1, $CONST_DATA2, $CONST_DATA3 ) =3D + ( "a5", "a6", "a7", "t1" ); +my ( $KEY0, $KEY1, $KEY2,$KEY3, $KEY4, $KEY5, $KEY6, $KEY7, + $COUNTER0, $COUNTER1, $NONCE0, $NONCE1 +) =3D ( "s0", "s1", "s2", "s3", "s4", "s5", "s6", + "s7", "s8", "s9", "s10", "s11" ); +my ( $VL, $STRIDE, $CHACHA_LOOP_COUNT ) =3D ( "t2", "t3", "t4" ); +my ( + $V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, $V8, $V9, $V10, + $V11, $V12, $V13, $V14, $V15, $V16, $V17, $V18, $V19, $V20, $V21, + $V22, $V23, $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) =3D map( "v$_", ( 0 .. 
31 ) ); + +sub chacha_quad_round_group { + my ( + $A0, $B0, $C0, $D0, $A1, $B1, $C1, $D1, + $A2, $B2, $C2, $D2, $A3, $B3, $C3, $D3 + ) =3D @_; + + my $code =3D <<___; + # a +=3D b; d ^=3D a; d <<<=3D 16; + @{[vadd_vv $A0, $A0, $B0]} + @{[vadd_vv $A1, $A1, $B1]} + @{[vadd_vv $A2, $A2, $B2]} + @{[vadd_vv $A3, $A3, $B3]} + @{[vxor_vv $D0, $D0, $A0]} + @{[vxor_vv $D1, $D1, $A1]} + @{[vxor_vv $D2, $D2, $A2]} + @{[vxor_vv $D3, $D3, $A3]} + @{[vror_vi $D0, $D0, 32 - 16]} + @{[vror_vi $D1, $D1, 32 - 16]} + @{[vror_vi $D2, $D2, 32 - 16]} + @{[vror_vi $D3, $D3, 32 - 16]} + # c +=3D d; b ^=3D c; b <<<=3D 12; + @{[vadd_vv $C0, $C0, $D0]} + @{[vadd_vv $C1, $C1, $D1]} + @{[vadd_vv $C2, $C2, $D2]} + @{[vadd_vv $C3, $C3, $D3]} + @{[vxor_vv $B0, $B0, $C0]} + @{[vxor_vv $B1, $B1, $C1]} + @{[vxor_vv $B2, $B2, $C2]} + @{[vxor_vv $B3, $B3, $C3]} + @{[vror_vi $B0, $B0, 32 - 12]} + @{[vror_vi $B1, $B1, 32 - 12]} + @{[vror_vi $B2, $B2, 32 - 12]} + @{[vror_vi $B3, $B3, 32 - 12]} + # a +=3D b; d ^=3D a; d <<<=3D 8; + @{[vadd_vv $A0, $A0, $B0]} + @{[vadd_vv $A1, $A1, $B1]} + @{[vadd_vv $A2, $A2, $B2]} + @{[vadd_vv $A3, $A3, $B3]} + @{[vxor_vv $D0, $D0, $A0]} + @{[vxor_vv $D1, $D1, $A1]} + @{[vxor_vv $D2, $D2, $A2]} + @{[vxor_vv $D3, $D3, $A3]} + @{[vror_vi $D0, $D0, 32 - 8]} + @{[vror_vi $D1, $D1, 32 - 8]} + @{[vror_vi $D2, $D2, 32 - 8]} + @{[vror_vi $D3, $D3, 32 - 8]} + # c +=3D d; b ^=3D c; b <<<=3D 7; + @{[vadd_vv $C0, $C0, $D0]} + @{[vadd_vv $C1, $C1, $D1]} + @{[vadd_vv $C2, $C2, $D2]} + @{[vadd_vv $C3, $C3, $D3]} + @{[vxor_vv $B0, $B0, $C0]} + @{[vxor_vv $B1, $B1, $C1]} + @{[vxor_vv $B2, $B2, $C2]} + @{[vxor_vv $B3, $B3, $C3]} + @{[vror_vi $B0, $B0, 32 - 7]} + @{[vror_vi $B1, $B1, 32 - 7]} + @{[vror_vi $B2, $B2, 32 - 7]} + @{[vror_vi $B3, $B3, 32 - 7]} +___ + + return $code; +} + +$code .=3D <<___; +.p2align 3 +.globl ChaCha20_ctr32_zvkb +.type ChaCha20_ctr32_zvkb,\@function +ChaCha20_ctr32_zvkb: + srli $LEN, $LEN, 6 + beqz $LEN, .Lend + + addi sp, sp, -96 + sd s0, 0(sp) + sd s1, 8(sp) + sd s2, 16(sp) + sd s3, 24(sp) + sd s4, 32(sp) + sd s5, 40(sp) + sd s6, 48(sp) + sd s7, 56(sp) + sd s8, 64(sp) + sd s9, 72(sp) + sd s10, 80(sp) + sd s11, 88(sp) + + li $STRIDE, 64 + + #### chacha block data + # "expa" little endian + li $CONST_DATA0, 0x61707865 + # "nd 3" little endian + li $CONST_DATA1, 0x3320646e + # "2-by" little endian + li $CONST_DATA2, 0x79622d32 + # "te k" little endian + li $CONST_DATA3, 0x6b206574 + + lw $KEY0, 0($KEY) + lw $KEY1, 4($KEY) + lw $KEY2, 8($KEY) + lw $KEY3, 12($KEY) + lw $KEY4, 16($KEY) + lw $KEY5, 20($KEY) + lw $KEY6, 24($KEY) + lw $KEY7, 28($KEY) + + lw $COUNTER0, 0($COUNTER) + lw $COUNTER1, 4($COUNTER) + lw $NONCE0, 8($COUNTER) + lw $NONCE1, 12($COUNTER) + +.Lblock_loop: + @{[vsetvli $VL, $LEN, "e32", "m1", "ta", "ma"]} + + # init chacha const states + @{[vmv_v_x $V0, $CONST_DATA0]} + @{[vmv_v_x $V1, $CONST_DATA1]} + @{[vmv_v_x $V2, $CONST_DATA2]} + @{[vmv_v_x $V3, $CONST_DATA3]} + + # init chacha key states + @{[vmv_v_x $V4, $KEY0]} + @{[vmv_v_x $V5, $KEY1]} + @{[vmv_v_x $V6, $KEY2]} + @{[vmv_v_x $V7, $KEY3]} + @{[vmv_v_x $V8, $KEY4]} + @{[vmv_v_x $V9, $KEY5]} + @{[vmv_v_x $V10, $KEY6]} + @{[vmv_v_x $V11, $KEY7]} + + # init chacha key states + @{[vid_v $V12]} + @{[vadd_vx $V12, $V12, $COUNTER0]} + @{[vmv_v_x $V13, $COUNTER1]} + + # init chacha nonce states + @{[vmv_v_x $V14, $NONCE0]} + @{[vmv_v_x $V15, $NONCE1]} + + # load the top-half of input data + @{[vlsseg_nf_e32_v 8, $V16, $INPUT, $STRIDE]} + + li $CHACHA_LOOP_COUNT, 10 +.Lround_loop: + addi $CHACHA_LOOP_COUNT, 
$CHACHA_LOOP_COUNT, -1 + @{[chacha_quad_round_group + $V0, $V4, $V8, $V12, + $V1, $V5, $V9, $V13, + $V2, $V6, $V10, $V14, + $V3, $V7, $V11, $V15]} + @{[chacha_quad_round_group + $V0, $V5, $V10, $V15, + $V1, $V6, $V11, $V12, + $V2, $V7, $V8, $V13, + $V3, $V4, $V9, $V14]} + bnez $CHACHA_LOOP_COUNT, .Lround_loop + + # load the bottom-half of input data + addi $T0, $INPUT, 32 + @{[vlsseg_nf_e32_v 8, $V24, $T0, $STRIDE]} + + # add chacha top-half initial block states + @{[vadd_vx $V0, $V0, $CONST_DATA0]} + @{[vadd_vx $V1, $V1, $CONST_DATA1]} + @{[vadd_vx $V2, $V2, $CONST_DATA2]} + @{[vadd_vx $V3, $V3, $CONST_DATA3]} + @{[vadd_vx $V4, $V4, $KEY0]} + @{[vadd_vx $V5, $V5, $KEY1]} + @{[vadd_vx $V6, $V6, $KEY2]} + @{[vadd_vx $V7, $V7, $KEY3]} + # xor with the top-half input + @{[vxor_vv $V16, $V16, $V0]} + @{[vxor_vv $V17, $V17, $V1]} + @{[vxor_vv $V18, $V18, $V2]} + @{[vxor_vv $V19, $V19, $V3]} + @{[vxor_vv $V20, $V20, $V4]} + @{[vxor_vv $V21, $V21, $V5]} + @{[vxor_vv $V22, $V22, $V6]} + @{[vxor_vv $V23, $V23, $V7]} + + # save the top-half of output + @{[vssseg_nf_e32_v 8, $V16, $OUTPUT, $STRIDE]} + + # add chacha bottom-half initial block states + @{[vadd_vx $V8, $V8, $KEY4]} + @{[vadd_vx $V9, $V9, $KEY5]} + @{[vadd_vx $V10, $V10, $KEY6]} + @{[vadd_vx $V11, $V11, $KEY7]} + @{[vid_v $V0]} + @{[vadd_vx $V12, $V12, $COUNTER0]} + @{[vadd_vx $V13, $V13, $COUNTER1]} + @{[vadd_vx $V14, $V14, $NONCE0]} + @{[vadd_vx $V15, $V15, $NONCE1]} + @{[vadd_vv $V12, $V12, $V0]} + # xor with the bottom-half input + @{[vxor_vv $V24, $V24, $V8]} + @{[vxor_vv $V25, $V25, $V9]} + @{[vxor_vv $V26, $V26, $V10]} + @{[vxor_vv $V27, $V27, $V11]} + @{[vxor_vv $V29, $V29, $V13]} + @{[vxor_vv $V28, $V28, $V12]} + @{[vxor_vv $V30, $V30, $V14]} + @{[vxor_vv $V31, $V31, $V15]} + + # save the bottom-half of output + addi $T0, $OUTPUT, 32 + @{[vssseg_nf_e32_v 8, $V24, $T0, $STRIDE]} + + # update counter + add $COUNTER0, $COUNTER0, $VL + sub $LEN, $LEN, $VL + # increase offset for `4 * 16 * VL =3D 64 * VL` + slli $T0, $VL, 6 + add $INPUT, $INPUT, $T0 + add $OUTPUT, $OUTPUT, $T0 + bnez $LEN, .Lblock_loop + + ld s0, 0(sp) + ld s1, 8(sp) + ld s2, 16(sp) + ld s3, 24(sp) + ld s4, 32(sp) + ld s5, 40(sp) + ld s6, 48(sp) + ld s7, 56(sp) + ld s8, 64(sp) + ld s9, 72(sp) + ld s10, 80(sp) + ld s11, 88(sp) + addi sp, sp, 96 + +.Lend: + ret +.size ChaCha20_ctr32_zvkb,.-ChaCha20_ctr32_zvkb +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; --=20 2.28.0