From: David Howells
To: Jeff Layton, Steve French
Cc: David Howells, Matthew Wilcox, Marc Dionne, Paulo Alcantara,
    Shyam Prasad N, Tom Talpey, Dominique Martinet, Eric Van Hensbergen,
    Ilya Dryomov, Christian Brauner, linux-cachefs@redhat.com,
    linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
    linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org,
    v9fs@lists.linux.dev, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 40/59] netfs: Support encryption on Unbuffered/DIO write
Date: Thu, 7 Dec 2023 21:21:47 +0000
Message-ID: <20231207212206.1379128-41-dhowells@redhat.com>
In-Reply-To: <20231207212206.1379128-1-dhowells@redhat.com>
References: <20231207212206.1379128-1-dhowells@redhat.com>

Support unbuffered and direct I/O writes to an encrypted file.  This may
require making an RMW cycle if the write is not appropriately aligned with
respect to the crypto blocks.
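[Illustrative note, not part of the patch: the decision to do an RMW cycle
comes down to whether the write is misaligned with the minimum (crypto)
block size.  A standalone sketch of that calculation, mirroring the
gap_before/gap_after logic in the patch but using a hypothetical helper
name:

	/* Sketch only: how far a write at [start, start + count) overhangs
	 * the enclosing blocks of size (1 << min_bshift).  Non-zero gaps
	 * force the RMW path through the bounce buffer.
	 */
	static void calc_rmw_gaps(unsigned long long start, size_t count,
				  unsigned char min_bshift,
				  size_t *gap_before, size_t *gap_after)
	{
		size_t min_bsize = 1UL << min_bshift; /* e.g. 4096 for min_bshift == 12 */
		size_t bmask = min_bsize - 1;
		unsigned long long end = start + count;

		*gap_before = start & bmask;             /* block bytes before the write */
		*gap_after = (min_bsize - end) & bmask;  /* block bytes after the write */
	}

For example, a 100-byte write at file offset 10 with 4096-byte crypto blocks
gives gap_before = 10 and gap_after = 3986, so the whole block has to be
read, decrypted, modified, re-encrypted and written back.]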
Signed-off-by: David Howells
cc: Jeff Layton
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---
 fs/netfs/direct_read.c       |   2 +-
 fs/netfs/direct_write.c      | 210 ++++++++++++++++++++++++++++++++++-
 fs/netfs/internal.h          |   8 ++
 fs/netfs/io.c                | 117 +++++++++++++++++++
 fs/netfs/main.c              |   1 +
 include/linux/netfs.h        |   4 +
 include/trace/events/netfs.h |   1 +
 7 files changed, 337 insertions(+), 6 deletions(-)

diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
index 158719b56900..c01cbe42db8a 100644
--- a/fs/netfs/direct_read.c
+++ b/fs/netfs/direct_read.c
@@ -88,7 +88,7 @@ static int netfs_copy_xarray_to_iter(struct netfs_io_request *rreq,
  * If we did a direct read to a bounce buffer (say we needed to decrypt it),
  * copy the data obtained to the destination iterator.
  */
-static int netfs_dio_copy_bounce_to_dest(struct netfs_io_request *rreq)
+int netfs_dio_copy_bounce_to_dest(struct netfs_io_request *rreq)
 {
 	struct iov_iter *dest_iter = &rreq->iter;
 	struct kiocb *iocb = rreq->iocb;
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index bb0c2718f57b..45165a3d5f99 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -23,6 +23,100 @@ static void netfs_cleanup_dio_write(struct netfs_io_request *wreq)
 	}
 }
 
+/*
+ * Allocate a bunch of pages and add them into the xarray buffer starting at
+ * the given index.
+ */
+static int netfs_alloc_buffer(struct xarray *xa, pgoff_t index, unsigned int nr_pages)
+{
+	struct page *page;
+	unsigned int n;
+	int ret = 0;
+	LIST_HEAD(list);
+
+	n = alloc_pages_bulk_list(GFP_NOIO, nr_pages, &list);
+	if (n < nr_pages) {
+		ret = -ENOMEM;
+	}
+
+	while ((page = list_first_entry_or_null(&list, struct page, lru))) {
+		list_del(&page->lru);
+		page->index = index;
+		ret = xa_insert(xa, index++, page, GFP_NOIO);
+		if (ret < 0)
+			break;
+	}
+
+	while ((page = list_first_entry_or_null(&list, struct page, lru))) {
+		list_del(&page->lru);
+		__free_page(page);
+	}
+	return ret;
+}
+
+/*
+ * Copy all of the data from the source iterator into folios in the destination
+ * xarray.  We cannot step through and kmap the source iterator if it's an
+ * iovec, so we have to step through the xarray and drop the RCU lock each
+ * time.
+ */
+static int netfs_copy_iter_to_xarray(struct iov_iter *src, struct xarray *xa,
+				     unsigned long long start)
+{
+	struct folio *folio;
+	void *base;
+	pgoff_t index = start / PAGE_SIZE;
+	size_t len, copied, count = iov_iter_count(src);
+
+	XA_STATE(xas, xa, index);
+
+	_enter("%zx", count);
+
+	if (!count)
+		return -EIO;
+
+	len = PAGE_SIZE - offset_in_page(start);
+	rcu_read_lock();
+	xas_for_each(&xas, folio, ULONG_MAX) {
+		size_t offset;
+
+		if (xas_retry(&xas, folio))
+			continue;
+
+		/* There shouldn't be a need to call xas_pause() as no one else
+		 * can see the xarray we're iterating over.
+		 */
+		rcu_read_unlock();
+
+		offset = offset_in_folio(folio, start);
+		_debug("folio %lx +%zx [%llx]", folio->index, offset, start);
+
+		while (offset < folio_size(folio)) {
+			len = min(count, len);
+
+			base = kmap_local_folio(folio, offset);
+			copied = copy_from_iter(base, len, src);
+			kunmap_local(base);
+			if (copied != len)
+				goto out;
+			count -= len;
+			if (count == 0)
+				goto out;
+
+			start += len;
+			offset += len;
+			len = PAGE_SIZE;
+		}
+
+		rcu_read_lock();
+	}
+
+	rcu_read_unlock();
+out:
+	_leave(" = %zx", count);
+	return count ? -EIO : 0;
+}
+
 /*
  * Perform an unbuffered write where we may have to do an RMW operation on an
  * encrypted file.  This can also be used for direct I/O writes.
@@ -31,20 +125,47 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 					   struct netfs_group *netfs_group)
 {
 	struct netfs_io_request *wreq;
+	struct netfs_inode *ctx = netfs_inode(file_inode(iocb->ki_filp));
+	unsigned long long real_size = ctx->remote_i_size;
 	unsigned long long start = iocb->ki_pos;
 	unsigned long long end = start + iov_iter_count(iter);
 	ssize_t ret, n;
-	bool async = !is_sync_kiocb(iocb);
+	size_t min_bsize = 1UL << ctx->min_bshift;
+	size_t bmask = min_bsize - 1;
+	size_t gap_before = start & bmask;
+	size_t gap_after = (min_bsize - end) & bmask;
+	bool use_bounce, async = !is_sync_kiocb(iocb);
+	enum {
+		DIRECT_IO, COPY_TO_BOUNCE, ENC_TO_BOUNCE, COPY_THEN_ENC,
+	} buffering;
 
 	_enter("");
 
+	/* The real size must be rounded out to the crypto block size plus
+	 * any trailer we might want to attach.
+	 */
+	if (real_size && ctx->crypto_bshift) {
+		size_t cmask = (1UL << ctx->crypto_bshift) - 1;
+
+		if (real_size < ctx->crypto_trailer)
+			return -EIO;
+		if ((real_size - ctx->crypto_trailer) & cmask)
+			return -EIO;
+		real_size -= ctx->crypto_trailer;
+	}
+
 	/* We're going to need a bounce buffer if what we transmit is going to
 	 * be different in some way to the source buffer, e.g. because it gets
 	 * encrypted/compressed or because it needs expanding to a block size.
 	 */
-	// TODO
+	use_bounce = test_bit(NETFS_ICTX_ENCRYPTED, &ctx->flags);
+	if (gap_before || gap_after) {
+		if (iocb->ki_flags & IOCB_DIRECT)
+			return -EINVAL;
+		use_bounce = true;
+	}
 
-	_debug("uw %llx-%llx", start, end);
+	_debug("uw %llx-%llx +%zx,%zx", start, end, gap_before, gap_after);
 
 	wreq = netfs_alloc_request(iocb->ki_filp->f_mapping, iocb->ki_filp,
 				   start, end - start,
@@ -53,7 +174,57 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 	if (IS_ERR(wreq))
 		return PTR_ERR(wreq);
 
-	{
+	if (use_bounce) {
+		unsigned long long bstart = start - gap_before;
+		unsigned long long bend = end + gap_after;
+		pgoff_t first = bstart / PAGE_SIZE;
+		pgoff_t last = (bend - 1) / PAGE_SIZE;
+
+		_debug("bounce %llx-%llx %lx-%lx", bstart, bend, first, last);
+
+		ret = netfs_alloc_buffer(&wreq->bounce, first, last - first + 1);
+		if (ret < 0)
+			goto out;
+
+		iov_iter_xarray(&wreq->io_iter, READ, &wreq->bounce,
+				bstart, bend - bstart);
+
+		if (gap_before || gap_after)
+			async = false; /* We may have to repeat the RMW cycle */
+	}
+
+repeat_rmw_cycle:
+	if (use_bounce) {
+		/* If we're going to need to do an RMW cycle, fill in the gaps
+		 * at the ends of the buffer.
+		 */
+		if (gap_before || gap_after) {
+			struct iov_iter buffer = wreq->io_iter;
+
+			if ((gap_before && start - gap_before < real_size) ||
+			    (gap_after && end < real_size)) {
+				ret = netfs_rmw_read(wreq, iocb->ki_filp,
+						     start - gap_before, gap_before,
+						     end, end < real_size ? gap_after : 0);
+				if (ret < 0)
+					goto out;
+			}
+
+			if (gap_before && start - gap_before >= real_size)
+				iov_iter_zero(gap_before, &buffer);
+			if (gap_after && end >= real_size) {
+				iov_iter_advance(&buffer, end - start);
+				iov_iter_zero(gap_after, &buffer);
+			}
+		}
+
+		if (!test_bit(NETFS_RREQ_CONTENT_ENCRYPTION, &wreq->flags))
+			buffering = COPY_TO_BOUNCE;
+		else if (!gap_before && !gap_after && netfs_is_crypto_aligned(wreq, iter))
+			buffering = ENC_TO_BOUNCE;
+		else
+			buffering = COPY_THEN_ENC;
+	} else {
 		/* If this is an async op and we're not using a bounce buffer,
 		 * we have to save the source buffer as the iterator is only
 		 * good until we return.  In such a case, extract an iterator
@@ -77,10 +248,25 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 		}
 
 		wreq->io_iter = wreq->iter;
+		buffering = DIRECT_IO;
 	}
 
 	/* Copy the data into the bounce buffer and encrypt it. */
-	// TODO
+	if (buffering == COPY_TO_BOUNCE ||
+	    buffering == COPY_THEN_ENC) {
+		ret = netfs_copy_iter_to_xarray(iter, &wreq->bounce, wreq->start);
+		if (ret < 0)
+			goto out;
+		wreq->iter = wreq->io_iter;
+		wreq->start -= gap_before;
+		wreq->len += gap_before + gap_after;
+	}
+
+	if (buffering == COPY_THEN_ENC ||
+	    buffering == ENC_TO_BOUNCE) {
+		if (!netfs_encrypt(wreq))
+			goto out;
+	}
 
 	/* Dispatch the write. */
 	__set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
@@ -101,6 +287,20 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 		wait_on_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS,
 			    TASK_UNINTERRUPTIBLE);
 
+		/* See if the write failed due to a 3rd party race when doing
+		 * an RMW on a partially modified block in an encrypted file.
+		 */
+		if (test_and_clear_bit(NETFS_RREQ_REPEAT_RMW, &wreq->flags)) {
+			netfs_clear_subrequests(wreq, false);
+			iov_iter_revert(iter, end - start);
+			wreq->error = 0;
+			wreq->start = start;
+			wreq->len = end - start;
+			wreq->transferred = 0;
+			wreq->submitted = 0;
+			goto repeat_rmw_cycle;
+		}
+
 		ret = wreq->error;
 		_debug("waited = %zd", ret);
 		if (ret == 0) {
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 7180e2931189..942578d98199 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -32,6 +32,11 @@ int netfs_prefetch_for_write(struct file *file, struct folio *folio,
 bool netfs_encrypt(struct netfs_io_request *wreq);
 void netfs_decrypt(struct netfs_io_request *rreq);
 
+/*
+ * direct_read.c
+ */
+int netfs_dio_copy_bounce_to_dest(struct netfs_io_request *rreq);
+
 /*
  * direct_write.c
  */
@@ -42,6 +47,9 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
  * io.c
  */
 int netfs_begin_read(struct netfs_io_request *rreq, bool sync);
+ssize_t netfs_rmw_read(struct netfs_io_request *wreq, struct file *file,
+		       unsigned long long start1, size_t len1,
+		       unsigned long long start2, size_t len2);
 
 /*
  * main.c
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index e4633ebc269f..ae014efe8728 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -780,3 +780,120 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
 out:
 	return ret;
 }
+
+static bool netfs_rmw_read_one(struct netfs_io_request *rreq,
+			       unsigned long long start, size_t len)
+{
+	struct netfs_inode *ctx = netfs_inode(rreq->inode);
+	struct iov_iter io_iter;
+	unsigned long long pstart, end = start + len;
+	pgoff_t first, last;
+	ssize_t ret;
+	size_t min_bsize = 1UL << ctx->min_bshift;
+
+	/* Determine the block we need to load. */
+	end = round_up(end, min_bsize);
+	start = round_down(start, min_bsize);
+
+	/* Determine the folios we need to insert. */
+	pstart = round_down(start, PAGE_SIZE);
+	first = pstart / PAGE_SIZE;
+	last = DIV_ROUND_UP(end, PAGE_SIZE);
+
+	ret = netfs_add_folios_to_buffer(&rreq->bounce, rreq->mapping,
+					 first, last, GFP_NOFS);
+	if (ret < 0) {
+		rreq->error = ret;
+		return false;
+	}
+
+	rreq->start = start;
+	rreq->len = len;
+	rreq->submitted = 0;
+	iov_iter_xarray(&rreq->io_iter, ITER_DEST, &rreq->bounce, start, len);
+
+	io_iter = rreq->io_iter;
+	do {
+		_debug("submit %llx + %zx >= %llx",
+		       rreq->start, rreq->submitted, rreq->i_size);
+		if (rreq->start + rreq->submitted >= rreq->i_size)
+			break;
+		if (!netfs_rreq_submit_slice(rreq, &io_iter, &rreq->subreq_counter))
+			break;
+	} while (rreq->submitted < rreq->len);
+
+	if (rreq->submitted < rreq->len) {
+		netfs_put_request(rreq, false, netfs_rreq_trace_put_no_submit);
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * Begin the process of reading in one or two chunks of data for use by
+ * unbuffered write to perform an RMW cycle.  We don't read directly into the
+ * write buffer as this may get called to redo the read in the case that a
+ * conditional write fails due to conflicting 3rd-party modifications.
+ */
+ssize_t netfs_rmw_read(struct netfs_io_request *wreq, struct file *file,
+		       unsigned long long start1, size_t len1,
+		       unsigned long long start2, size_t len2)
+{
+	struct netfs_io_request *rreq;
+	ssize_t ret;
+
+	_enter("RMW:R=%x %llx-%llx %llx-%llx",
+	       wreq->debug_id, start1, start1 + len1 - 1, start2, start2 + len2 - 1);
+
+	rreq = netfs_alloc_request(wreq->mapping, file,
+				   start1, start2 - start1 + len2, NETFS_RMW_READ);
+	if (IS_ERR(rreq))
+		return PTR_ERR(rreq);
+
+	INIT_WORK(&rreq->work, netfs_rreq_work);
+
+	rreq->iter = wreq->io_iter;
+	__set_bit(NETFS_RREQ_CRYPT_IN_PLACE, &rreq->flags);
+	__set_bit(NETFS_RREQ_USE_BOUNCE_BUFFER, &rreq->flags);
+
+	/* Chop the reads into slices according to what the netfs wants and
+	 * submit each one.
+	 */
+	netfs_get_request(rreq, netfs_rreq_trace_get_for_outstanding);
+	atomic_set(&rreq->nr_outstanding, 1);
+	if (len1 && !netfs_rmw_read_one(rreq, start1, len1))
+		goto wait;
+	if (len2)
+		netfs_rmw_read_one(rreq, start2, len2);
+
+wait:
+	/* Keep nr_outstanding incremented so that the ref always belongs to us
+	 * and the service code isn't punted off to a random thread pool to
+	 * process.
+	 */
+	for (;;) {
+		wait_var_event(&rreq->nr_outstanding,
			       atomic_read(&rreq->nr_outstanding) == 1);
+		netfs_rreq_assess(rreq, false);
+		if (atomic_read(&rreq->nr_outstanding) == 1)
+			break;
+		cond_resched();
+	}
+
+	trace_netfs_rreq(rreq, netfs_rreq_trace_wait_ip);
+	wait_on_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS,
+		    TASK_UNINTERRUPTIBLE);
+
+	ret = rreq->error;
+	if (ret == 0 && rreq->submitted < rreq->len) {
+		trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_read);
+		ret = -EIO;
+	}
+
+	if (ret == 0)
+		ret = netfs_dio_copy_bounce_to_dest(rreq);
+
+	netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
+	return ret;
+}
diff --git a/fs/netfs/main.c b/fs/netfs/main.c
index 9fe96de6960e..122264d5c254 100644
--- a/fs/netfs/main.c
+++ b/fs/netfs/main.c
@@ -33,6 +33,7 @@ static const char *netfs_origins[nr__netfs_io_origin] = {
 	[NETFS_READPAGE]		= "RP",
 	[NETFS_READ_FOR_WRITE]		= "RW",
 	[NETFS_WRITEBACK]		= "WB",
+	[NETFS_RMW_READ]		= "RM",
 	[NETFS_UNBUFFERED_WRITE]	= "UW",
 	[NETFS_DIO_READ]		= "DR",
 	[NETFS_DIO_WRITE]		= "DW",
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 50adcf6942b8..a9898c99e2ba 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -143,6 +143,7 @@ struct netfs_inode {
 #define NETFS_ICTX_ENCRYPTED	2		/* The file contents are encrypted */
 	unsigned char		min_bshift;	/* log2 min block size for bounding box or 0 */
 	unsigned char		crypto_bshift;	/* log2 of crypto block size */
+	unsigned char		crypto_trailer;	/* Size of crypto trailer */
 };
 
 /*
@@ -231,6 +232,7 @@ enum netfs_io_origin {
 	NETFS_READPAGE,			/* This read is a synchronous read */
 	NETFS_READ_FOR_WRITE,		/* This read is to prepare a write */
 	NETFS_WRITEBACK,		/* This write was triggered by writepages */
+	NETFS_RMW_READ,			/* This is an unbuffered read for RMW */
 	NETFS_UNBUFFERED_WRITE,		/* This is an unbuffered write */
 	NETFS_DIO_READ,			/* This is a direct I/O read */
 	NETFS_DIO_WRITE,		/* This is a direct I/O write */
@@ -290,6 +292,7 @@ struct netfs_io_request {
 #define NETFS_RREQ_BLOCKED		10	/* We blocked */
 #define NETFS_RREQ_CONTENT_ENCRYPTION	11	/* Content encryption is in use */
 #define NETFS_RREQ_CRYPT_IN_PLACE	12	/* Enc/dec in place in ->io_iter */
+#define NETFS_RREQ_REPEAT_RMW		13	/* Need to repeat RMW cycle */
 	const struct netfs_request_ops *netfs_ops;
 	void (*cleanup)(struct netfs_io_request *req);
 };
@@ -478,6 +481,7 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
 	ctx->flags = 0;
 	ctx->min_bshift = 0;
 	ctx->crypto_bshift = 0;
+	ctx->crypto_trailer = 0;
 #if IS_ENABLED(CONFIG_FSCACHE)
 	ctx->cache = NULL;
 #endif
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 6394fdf7a9cd..4dc5fcc7b0a4 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -33,6 +33,7 @@
 	EM(NETFS_READPAGE,			"RP")			\
 	EM(NETFS_READ_FOR_WRITE,		"RW")			\
 	EM(NETFS_WRITEBACK,			"WB")			\
+	EM(NETFS_RMW_READ,			"RM")			\
 	EM(NETFS_UNBUFFERED_WRITE,		"UW")			\
 	EM(NETFS_DIO_READ,			"DR")			\
 	E_(NETFS_DIO_WRITE,			"DW")
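
[Reviewer note, not part of the patch: the buffering strategy chosen in
netfs_unbuffered_write_iter_locked() above can be summarised by this
standalone sketch (hypothetical helper name):

	/* Sketch only: which path the write data takes to the server.
	 * "aligned" stands for "no head/tail gap and the source iterator is
	 * crypto-block aligned" (netfs_is_crypto_aligned() in the patch).
	 */
	enum dio_write_buffering {
		DIRECT_IO,	/* transmit the caller's buffer as-is */
		COPY_TO_BOUNCE,	/* copy to the bounce buffer, no encryption */
		ENC_TO_BOUNCE,	/* encrypt from the source into the bounce buffer */
		COPY_THEN_ENC,	/* copy to the bounce buffer, then encrypt in place */
	};

	static enum dio_write_buffering
	pick_buffering(bool use_bounce, bool encrypted, bool aligned)
	{
		if (!use_bounce)
			return DIRECT_IO;
		if (!encrypted)
			return COPY_TO_BOUNCE;
		return aligned ? ENC_TO_BOUNCE : COPY_THEN_ENC;
	}

The unaligned encrypted case (COPY_THEN_ENC) is the one that may take the
repeat_rmw_cycle path if a third party modifies the block between the RMW
read and the conditional write.]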