[PATCH v2] fscrypt: improve filename encryption and decryption performance

Yuwen Chen posted 1 patch 3 months ago
fs/crypto/fname.c         | 22 ++++++++++++++--------
include/crypto/skcipher.h |  9 +++++++++
2 files changed, 23 insertions(+), 8 deletions(-)
[PATCH v2] fscrypt: improve filename encryption and decryption performance
Posted by Yuwen Chen 3 months ago
With the CONFIG_UNICODE configuration enabled, the fname_decrypt
and fscrypt_fname_encrypt functions may be called very frequently.
Since filenames are generally short, the frequent invocation of
memory allocation and release operations by these two functions
will lead to very poor performance.

Use the following program to conduct a file-creation test in
the private program directory(/data/media/0/Android/data/*)
of Android.

int main(int argc, char **argv)
{
    size_t fcnt = 0;
    char path[PATH_MAX];
    char buf[4096] = {0};
    int i, fd;

    if (argc < 2)
        return - EINVAL;

    fcnt = atoi(argv[1]);
    for (i = 0; i < fcnt; i++) {
        snprintf(path, sizeof(path), "./%d", i);
        fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0)
            return - 1;
        write(fd, buf, sizeof(buf));
        close(fd);
    }
    return 0;
}

The test platform is Snapdragon 8s Gen4, with a kernel version
of v6.6 and a userdebug version.

Before this submission was merged, when creating 2000 files,
the performance test results are as follows:
$ time /data/file_creater 2000
0m14.83s real     0m00.00s user     0m14.30s system
0m15.61s real     0m00.00s user     0m15.04s system
0m14.72s real     0m00.01s user     0m14.18s system

After this submission was merged, the performance is as follows:
$ time /data/file_creater 2000
0m07.64s real     0m00.00s user     0m07.37s system
0m07.66s real     0m00.00s user     0m07.40s system
0m08.67s real     0m00.01s user     0m08.35s system

Signed-off-by: Yuwen Chen <ywen.chen@foxmail.com>
---
 fs/crypto/fname.c         | 22 ++++++++++++++--------
 include/crypto/skcipher.h |  9 +++++++++
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 010f9c0a4c2f1..168b2a88fa23b 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -92,14 +92,20 @@ static inline bool fscrypt_is_dot_dotdot(const struct qstr *str)
 int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
 			  u8 *out, unsigned int olen)
 {
-	struct skcipher_request *req = NULL;
 	DECLARE_CRYPTO_WAIT(wait);
 	const struct fscrypt_inode_info *ci = inode->i_crypt_info;
 	struct crypto_skcipher *tfm = ci->ci_enc_key.tfm;
+	SKCIPHER_REQUEST_ON_STACK(req, tfm, MAX_SKCIPHER_REQSIZE);
 	union fscrypt_iv iv;
 	struct scatterlist sg;
 	int res;
 
+	/*
+	 * When the size of the statically - allocated skcipher_request
+	 * structure is insufficient, remind us to make modifications.
+	 */
+	BUG_ON(MAX_SKCIPHER_REQSIZE < crypto_skcipher_reqsize(tfm));
+
 	/*
 	 * Copy the filename to the output buffer for encrypting in-place and
 	 * pad it with the needed number of NUL bytes.
@@ -124,7 +130,6 @@ int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
 
 	/* Do the encryption */
 	res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
-	skcipher_request_free(req);
 	if (res < 0) {
 		fscrypt_err(inode, "Filename encryption failed: %d", res);
 		return res;
@@ -148,18 +153,20 @@ static int fname_decrypt(const struct inode *inode,
 			 const struct fscrypt_str *iname,
 			 struct fscrypt_str *oname)
 {
-	struct skcipher_request *req = NULL;
 	DECLARE_CRYPTO_WAIT(wait);
 	struct scatterlist src_sg, dst_sg;
 	const struct fscrypt_inode_info *ci = inode->i_crypt_info;
 	struct crypto_skcipher *tfm = ci->ci_enc_key.tfm;
+	SKCIPHER_REQUEST_ON_STACK(req, tfm, MAX_SKCIPHER_REQSIZE);
 	union fscrypt_iv iv;
 	int res;
 
-	/* Allocate request */
-	req = skcipher_request_alloc(tfm, GFP_NOFS);
-	if (!req)
-		return -ENOMEM;
+	/*
+	 * When the size of the statically - allocated skcipher_request
+	 * structure is insufficient, remind us to make modifications.
+	 */
+	BUG_ON(MAX_SKCIPHER_REQSIZE < crypto_skcipher_reqsize(tfm));
+
 	skcipher_request_set_callback(req,
 		CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
 		crypto_req_done, &wait);
@@ -172,7 +179,6 @@ static int fname_decrypt(const struct inode *inode,
 	sg_init_one(&dst_sg, oname->name, oname->len);
 	skcipher_request_set_crypt(req, &src_sg, &dst_sg, iname->len, &iv);
 	res = crypto_wait_req(crypto_skcipher_decrypt(req), &wait);
-	skcipher_request_free(req);
 	if (res < 0) {
 		fscrypt_err(inode, "Filename decryption failed: %d", res);
 		return res;
diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 9e5853464345b..77e74a038c36e 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -213,6 +213,7 @@ struct lskcipher_alg {
 };
 
 #define MAX_SYNC_SKCIPHER_REQSIZE      384
+#define MAX_SKCIPHER_REQSIZE	       384
 /*
  * This performs a type-check against the "_tfm" argument to make sure
  * all users have the correct skcipher tfm for doing on-stack requests.
@@ -226,6 +227,14 @@ struct lskcipher_alg {
 			crypto_sync_skcipher_tfm((_tfm)), \
 		 (void *)__##name##_desc)
 
+#define SKCIPHER_REQUEST_ON_STACK(name, _tfm, reqsize) \
+	char __##name##_desc[sizeof(struct skcipher_request) + reqsize \
+		] CRYPTO_MINALIGN_ATTR; \
+	struct skcipher_request *name = \
+		(((struct skcipher_request *)__##name##_desc)->base.tfm = \
+			crypto_skcipher_tfm((_tfm)), \
+		 (void *)__##name##_desc)
+
 /**
  * DOC: Symmetric Key Cipher API
  *
-- 
2.34.1
Re: [PATCH v2] fscrypt: improve filename encryption and decryption performance
Posted by Eric Biggers 3 months ago
On Fri, Jul 04, 2025 at 12:13:22PM +0800, Yuwen Chen wrote:
> With the CONFIG_UNICODE configuration enabled, the fname_decrypt
> and fscrypt_fname_encrypt functions may be called very frequently.
> Since filenames are generally short, the frequent invocation of
> memory allocation and release operations by these two functions
> will lead to very poor performance.
> 
> Use the following program to conduct a file-creation test in
> the private program directory(/data/media/0/Android/data/*)
> of Android.
> 
> int main(int argc, char **argv)
> {
>     size_t fcnt = 0;
>     char path[PATH_MAX];
>     char buf[4096] = {0};
>     int i, fd;
> 
>     if (argc < 2)
>         return - EINVAL;
> 
>     fcnt = atoi(argv[1]);
>     for (i = 0; i < fcnt; i++) {
>         snprintf(path, sizeof(path), "./%d", i);
>         fd = open(path, O_RDWR | O_CREAT, 0600);
>         if (fd < 0)
>             return - 1;
>         write(fd, buf, sizeof(buf));
>         close(fd);
>     }
>     return 0;
> }
> 
> The test platform is Snapdragon 8s Gen4, with a kernel version
> of v6.6 and a userdebug version.
> 
> Before this submission was merged, when creating 2000 files,
> the performance test results are as follows:
> $ time /data/file_creater 2000
> 0m14.83s real     0m00.00s user     0m14.30s system
> 0m15.61s real     0m00.00s user     0m15.04s system
> 0m14.72s real     0m00.01s user     0m14.18s system
> 
> After this submission was merged, the performance is as follows:
> $ time /data/file_creater 2000
> 0m07.64s real     0m00.00s user     0m07.37s system
> 0m07.66s real     0m00.00s user     0m07.40s system
> 0m08.67s real     0m00.01s user     0m08.35s system
> 
> Signed-off-by: Yuwen Chen <ywen.chen@foxmail.com>
> ---
>  fs/crypto/fname.c         | 22 ++++++++++++++--------
>  include/crypto/skcipher.h |  9 +++++++++
>  2 files changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
> index 010f9c0a4c2f1..168b2a88fa23b 100644
> --- a/fs/crypto/fname.c
> +++ b/fs/crypto/fname.c
> @@ -92,14 +92,20 @@ static inline bool fscrypt_is_dot_dotdot(const struct qstr *str)
>  int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
>  			  u8 *out, unsigned int olen)
>  {
> -	struct skcipher_request *req = NULL;
>  	DECLARE_CRYPTO_WAIT(wait);
>  	const struct fscrypt_inode_info *ci = inode->i_crypt_info;
>  	struct crypto_skcipher *tfm = ci->ci_enc_key.tfm;
> +	SKCIPHER_REQUEST_ON_STACK(req, tfm, MAX_SKCIPHER_REQSIZE);
>  	union fscrypt_iv iv;
>  	struct scatterlist sg;
>  	int res;
>  
> +	/*
> +	 * When the size of the statically - allocated skcipher_request
> +	 * structure is insufficient, remind us to make modifications.
> +	 */
> +	BUG_ON(MAX_SKCIPHER_REQSIZE < crypto_skcipher_reqsize(tfm));
> +
>  	/*
>  	 * Copy the filename to the output buffer for encrypting in-place and
>  	 * pad it with the needed number of NUL bytes.
> @@ -124,7 +130,6 @@ int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
>  
>  	/* Do the encryption */
>  	res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
> -	skcipher_request_free(req);
>  	if (res < 0) {
>  		fscrypt_err(inode, "Filename encryption failed: %d", res);
>  		return res;

I'm guessing you have some debugging options enabled in your kconfig.  Usually
the allocations aren't quite *that* expensive.  That being said, it's always
been really annoying that they have to be there.

Unfortunately, as far as I know, you actually can't just allocate the
skcipher_request on the stack like that, since the legacy crypto API assumes
that the request memory is DMA-able.  On-stack requests also might not be
properly aligned (see
https://lore.kernel.org/all/CA+55aFxJOzMim_d-O2E2yip8JWo0NdYs_72sNwFKSkTjy8q0Sw@mail.gmail.com/
-- may be outdated, but I haven't heard otherwise).

The problem is really that the legacy crypto API (crypto_skcipher in this case)
was never really designed for efficient CPU-based crypto in the first place.
The correct solution is to add simple library APIs for the algorithms to
lib/crypto/, then update fscrypt to use that instead of crypto_skcipher.

I plan to do that, but I'm first focusing on other related things, such as doing
the same for fsverity.

- Eric
[PATCH v2] fscrypt: improve filename encryption and decryption performance
Posted by Yuwen Chen 3 months ago
On Thu, 3 Jul 2025 22:14:41 -0700, Eric Biggers wrote:
> I'm guessing you have some debugging options enabled in your kconfig.  Usually
> the allocations aren't quite *that* expensive.  That being said, it's always
> been really annoying that they have to be there.

Turn off most of the debugging options and merge these two patches
for memory allocation. The performance test results are as follows:
Before this submission was merged, when creating 10000 files,
the performance test results are as follows:
$ time /data/file_creater 10000
0m10.90s real     0m00.00s user     0m10.69s system

After merge these two patches, the performance is as follows:
$ time /data/file_creater 10000
0m05.32s real     0m00.00s user     0m05.28s system

> Unfortunately, as far as I know, you actually can't just allocate the
> skcipher_request on the stack like that, since the legacy crypto API assumes
> that the request memory is DMA-able.  On-stack requests also might not be
> properly aligned (see
> https://lore.kernel.org/all/CA+55aFxJOzMim_d-O2E2yip8JWo0NdYs_72sNwFKSkTjy8q0Sw@mail.gmail.com/
> -- may be outdated, but I haven't heard otherwise).

Thank you for the reminder. This should be a problem here.
Just, why can SYNC_SKCIPHER_REQUEST_ON_STACK be allocated on
the stack? Is it possible to use ALIGN to achieve alignment?

> The problem is really that the legacy crypto API (crypto_skcipher in this case)
> was never really designed for efficient CPU-based crypto in the first place.
> The correct solution is to add simple library APIs for the algorithms to
> lib/crypto/, then update fscrypt to use that instead of crypto_skcipher.
> I plan to do that, but I'm first focusing on other related things, such as doing
> the same for fsverity.

This sounds very good. For file name decryption, due to the
relatively small amount of data, the cost of interface calls
indeed cannot be ignored. Thank you very much for your guidance.
Re: [PATCH v2] fscrypt: improve filename encryption and decryption performance
Posted by Eric Biggers 3 months ago
On Tue, Jul 08, 2025 at 03:33:34PM +0800, Yuwen Chen wrote:
> On Thu, 3 Jul 2025 22:14:41 -0700, Eric Biggers wrote:
> > I'm guessing you have some debugging options enabled in your kconfig.  Usually
> > the allocations aren't quite *that* expensive.  That being said, it's always
> > been really annoying that they have to be there.
> 
> Turn off most of the debugging options and merge these two patches
> for memory allocation. The performance test results are as follows:
> Before this submission was merged, when creating 10000 files,
> the performance test results are as follows:
> $ time /data/file_creater 10000
> 0m10.90s real     0m00.00s user     0m10.69s system
> 
> After merge these two patches, the performance is as follows:
> $ time /data/file_creater 10000
> 0m05.32s real     0m00.00s user     0m05.28s system
> 
> > Unfortunately, as far as I know, you actually can't just allocate the
> > skcipher_request on the stack like that, since the legacy crypto API assumes
> > that the request memory is DMA-able.  On-stack requests also might not be
> > properly aligned (see
> > https://lore.kernel.org/all/CA+55aFxJOzMim_d-O2E2yip8JWo0NdYs_72sNwFKSkTjy8q0Sw@mail.gmail.com/
> > -- may be outdated, but I haven't heard otherwise).
> 
> Thank you for the reminder. This should be a problem here.
> Just, why can SYNC_SKCIPHER_REQUEST_ON_STACK be allocated on
> the stack? Is it possible to use ALIGN to achieve alignment?

I suppose that in practice the request alignment only matters for the
off-CPU offloads, and that's how SYNC_SKCIPHER_REQUEST_ON_STACK gets
away with maybe not aligning the request reliably.  If you look at e.g.
the software AES-XTS code, it doesn't even use the request context at
all, which makes the entire exercise a bit pointless.

I'm thinking we should just go ahead and use sync_skcipher and
SYNC_SKCIPHER_REQUEST_ON_STACK for now.  Previously this was impossible
because the x86 accelerated AES-XTS algorithms had CRYPTO_ALG_ASYNC set,
but now it is possible.

Can you review and test the following patchset:
https://lore.kernel.org/linux-fscrypt/20250708181313.66961-1-ebiggers@kernel.org/ ?

- Eric