From nobody Sun Apr 12 02:53:29 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 33A71C00144
	for ; Tue, 2 Aug 2022 03:04:05 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S235590AbiHBDED (ORCPT ); Mon, 1 Aug 2022 23:04:03 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55136 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S234448AbiHBDDv (ORCPT ); Mon, 1 Aug 2022 23:03:51 -0400
Received: from out199-6.us.a.mail.aliyun.com (out199-6.us.a.mail.aliyun.com [47.90.199.6])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8B8731DA4F
	for ; Mon, 1 Aug 2022 20:03:47 -0700 (PDT)
X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R161e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045168;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VL9XT-F_1659409423;
Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VL9XT-F_1659409423)
	by smtp.aliyun-inc.com; Tue, 02 Aug 2022 11:03:43 +0800
From: Jingbo Xu
To: dhowells@redhat.com, linux-cachefs@redhat.com
Cc: linux-kernel@vger.kernel.org, xiang@kernel.org
Subject: [PATCH RFC 1/9] cachefiles: improve FSCACHE_COOKIE_NO_DATA_TO_READ optimization
Date: Tue, 2 Aug 2022 11:03:34 +0800
Message-Id: <20220802030342.46302-2-jefflexu@linux.alibaba.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20220802030342.46302-1-jefflexu@linux.alibaba.com>
References: <20220802030342.46302-1-jefflexu@linux.alibaba.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

With the content map feature introduced later in this series,
cachefiles_prepare_[read|write] can query whether the requested range is
cached through
either SEEK_[DATA|HOLE] llseek or a self-maintained bitmap, depending on
object->content_info.

For an existing backing file, content_info is derived from the xattr of
the backing file. For a newly created tmpfile, content_info is
initialized when the backing file is written for the first time.

This ordering requires the FSCACHE_COOKIE_NO_DATA_TO_READ optimization,
so that llseek is only called after the first write, i.e. after
content_info has been initialized.

This patch includes the following changes:

1. Enable the NO_DATA optimization in cachefiles_prepare_[read|write].

2. Clear FSCACHE_COOKIE_NO_DATA_TO_READ on the first write to the
   backing file. In non-on-demand mode, FSCACHE_COOKIE_NO_DATA_TO_READ
   is cleared when a_ops->release_folio() is called. In on-demand mode,
   cachefiles_prepare_read() has retry logic, i.e. the requested range
   is checked a second time after the on-demand read, so
   FSCACHE_COOKIE_NO_DATA_TO_READ needs to be cleared once the write
   completes.

3. Improve the setting/clearing of FSCACHE_COOKIE_NO_DATA_TO_READ in
   on-demand mode. Since we now rely on the NO_DATA optimization when
   the backing file is a tmpfile, setting the flag in on-demand mode is
   delayed until the size of the backing file is known, i.e. when copen
   completes, so that the flag of a tmpfile is retained.

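For reviewers, the flag lifecycle described above can be modelled in
userspace roughly as follows. This is only an illustrative sketch, not
kernel code: struct model_cookie and the model_*() helpers are
hypothetical names, and the range_cached field merely stands in for the
result of the llseek/bitmap query.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
enum model_source { MODEL_DOWNLOAD_AND_STORE, MODEL_READ_FROM_CACHE };

struct model_cookie {
	bool no_data_to_read;	/* models FSCACHE_COOKIE_NO_DATA_TO_READ */
	bool want_cache_size;	/* models FSCACHE_ADV_WANT_CACHE_SIZE (on-demand mode) */
	bool range_cached;	/* stands in for the llseek/bitmap query result */
};

/* Models the early check in cachefiles_prepare_read(). */
static enum model_source model_prepare_read(const struct model_cookie *c)
{
	/* While the flag is set, the cached range is never queried. */
	if (c->no_data_to_read)
		return MODEL_DOWNLOAD_AND_STORE;
	return c->range_cached ? MODEL_READ_FROM_CACHE : MODEL_DOWNLOAD_AND_STORE;
}

/* Models cachefiles_write_complete(): in on-demand mode a write clears the flag. */
static void model_write_complete(struct model_cookie *c)
{
	c->range_cached = true;
	if (c->want_cache_size && c->no_data_to_read)
		c->no_data_to_read = false;
}

/* Walks a fresh on-demand tmpfile through its first write. */
static void model_demo(void)
{
	struct model_cookie c = { .no_data_to_read = true, .want_cache_size = true };

	assert(model_prepare_read(&c) == MODEL_DOWNLOAD_AND_STORE);
	model_write_complete(&c);
	assert(model_prepare_read(&c) == MODEL_READ_FROM_CACHE);
}
```

Calling model_demo() from a small test program exercises the on-demand
path; with want_cache_size left false the model keeps the flag set after
the write, matching the non-on-demand case where release_folio() is
expected to clear it.
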
Signed-off-by: Jingbo Xu
---
 fs/cachefiles/io.c       | 20 +++++++++++++-------
 fs/cachefiles/ondemand.c |  5 +----
 fs/fscache/cookie.c      |  2 +-
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 000a28f46e59..b513d9bf81f1 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -255,6 +255,7 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret)
 {
 	struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb);
 	struct cachefiles_object *object = ki->object;
+	struct fscache_cookie *cookie = object->cookie;
 	struct inode *inode = file_inode(ki->iocb.ki_filp);
 
 	_enter("%ld", ret);
@@ -269,6 +270,9 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret)
 
 	atomic_long_sub(ki->b_writing, &object->volume->cache->b_writing);
 	set_bit(FSCACHE_COOKIE_HAVE_DATA, &object->cookie->flags);
+	if (cookie->advice & FSCACHE_ADV_WANT_CACHE_SIZE &&
+	    test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
+		clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
 	if (ki->term_func)
 		ki->term_func(ki->term_func_priv, ret, ki->was_async);
 	cachefiles_put_kiocb(ki);
@@ -413,13 +417,6 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
 		goto out_no_object;
 	}
 
-	if (test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) {
-		__set_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
-		why = cachefiles_trace_read_no_data;
-		if (!test_bit(NETFS_SREQ_ONDEMAND, &subreq->flags))
-			goto out_no_object;
-	}
-
 	/* The object and the file may be being created in the background.
*/ if (!file) { why =3D cachefiles_trace_read_no_file; @@ -434,6 +431,11 @@ static enum netfs_io_source cachefiles_prepare_read(st= ruct netfs_io_subrequest * object =3D cachefiles_cres_object(cres); cache =3D object->volume->cache; cachefiles_begin_secure(cache, &saved_cred); + + if (test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) { + why =3D cachefiles_trace_read_no_data; + goto download_and_store; + } retry: off =3D cachefiles_inject_read_error(); if (off =3D=3D 0) @@ -510,6 +512,7 @@ int __cachefiles_prepare_write(struct cachefiles_object= *object, bool no_space_allocated_yet) { struct cachefiles_cache *cache =3D object->volume->cache; + struct fscache_cookie *cookie =3D object->cookie; loff_t start =3D *_start, pos; size_t len =3D *_len, down; int ret; @@ -526,6 +529,9 @@ int __cachefiles_prepare_write(struct cachefiles_object= *object, if (no_space_allocated_yet) goto check_space; =20 + if (test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) + goto check_space; + pos =3D cachefiles_inject_read_error(); if (pos =3D=3D 0) pos =3D vfs_llseek(file, *_start, SEEK_DATA); diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 1fee702d5529..a317857e2dfd 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -166,12 +166,9 @@ int cachefiles_ondemand_copen(struct cachefiles_cache = *cache, char *args) =20 cookie =3D req->object->cookie; cookie->object_size =3D size; - if (size) - clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); - else + if (size =3D=3D 0) set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); trace_cachefiles_ondemand_copen(req->object, id, size); - out: complete(&req->done); return ret; diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index 74920826d8f6..49c269c078eb 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -340,7 +340,7 @@ static struct fscache_cookie *fscache_alloc_cookie( cookie->key_len =3D index_key_len; cookie->aux_len =3D aux_data_len; cookie->object_size 
=3D object_size; - if (object_size =3D=3D 0) + if (object_size =3D=3D 0 && !(advice & FSCACHE_ADV_WANT_CACHE_SIZE)) __set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags); =20 if (fscache_set_key(cookie, index_key, index_key_len) < 0) --=20 2.27.0 From nobody Sun Apr 12 02:53:29 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1DDE4C00144 for ; Tue, 2 Aug 2022 03:04:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235525AbiHBDDy (ORCPT ); Mon, 1 Aug 2022 23:03:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55134 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234389AbiHBDDv (ORCPT ); Mon, 1 Aug 2022 23:03:51 -0400 Received: from out30-133.freemail.mail.aliyun.com (out30-133.freemail.mail.aliyun.com [115.124.30.133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B63ED1EAC6 for ; Mon, 1 Aug 2022 20:03:47 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R161e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046051;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VL9XpKv_1659409424; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VL9XpKv_1659409424) by smtp.aliyun-inc.com; Tue, 02 Aug 2022 11:03:44 +0800 From: Jingbo Xu To: dhowells@redhat.com, linux-cachefs@redhat.com Cc: linux-kernel@vger.kernel.org, xiang@kernel.org Subject: [PATCH RFC 2/9] cachefiles: add content map file helpers Date: Tue, 2 Aug 2022 11:03:35 +0800 Message-Id: <20220802030342.46302-3-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220802030342.46302-1-jefflexu@linux.alibaba.com> References: <20220802030342.46302-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable 
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Besides the mapping mechanism provided by the backing fs, a
self-maintained bitmap can be used to track whether the corresponding
file range is cached by the backing file. In this case, a content map
file is used to persist the bitmap.

As the first step, add the helper functions for looking up and freeing
these content map files.

Signed-off-by: Jingbo Xu
---
 fs/cachefiles/internal.h |  4 ++
 fs/cachefiles/namei.c    | 88 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 6cba2c6de2f9..4c3ee6935811 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -270,6 +270,10 @@ extern struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
 					       bool *_is_new);
 extern void cachefiles_put_directory(struct dentry *dir);
 
+int cachefiles_look_up_map(struct cachefiles_cache *cache,
+			   struct dentry *dir, struct file **pfile);
+void cachefiles_put_map(struct file *file);
+
 extern int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
 			   char *filename);
 
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index facf2ebe464b..2948eea18ca2 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -231,6 +231,94 @@ void cachefiles_put_directory(struct dentry *dir)
 	}
 }
 
+/*
+ * Look up a content map file.
+ */ +int cachefiles_look_up_map(struct cachefiles_cache *cache, + struct dentry *dir, struct file **pfile) +{ + struct dentry *dentry; + struct file *file; + struct path path; + char *name =3D "Map"; + int ret; + + inode_lock_nested(d_inode(dir), I_MUTEX_PARENT); +retry: + ret =3D cachefiles_inject_read_error(); + if (ret) + goto err_unlock_dir; + + dentry =3D lookup_one_len(name, dir, strlen(name)); + if (IS_ERR(dentry)) { + ret =3D PTR_ERR(dentry); + goto err_unlock_dir; + } + + if (d_is_negative(dentry)) { + ret =3D cachefiles_has_space(cache, 1, 0, + cachefiles_has_space_for_create); + if (ret) + goto err_dput; + + ret =3D vfs_create(&init_user_ns, d_inode(dir), dentry, S_IFREG, true); + if (ret) + goto err_dput; + + if (unlikely(d_unhashed(dentry))) { + cachefiles_put_directory(dentry); + goto retry; + } + ASSERT(d_backing_inode(dentry)); + } + + inode_lock(d_inode(dentry)); + inode_unlock(d_inode(dir)); + + if (!__cachefiles_mark_inode_in_use(NULL, dentry)) { + inode_unlock(d_inode(dentry)); + dput(dentry); + return -EBUSY; + } + + inode_unlock(d_inode(dentry)); + ASSERT(d_backing_inode(dentry)); + + if (!d_is_reg(dentry)) { + pr_err("%pd is not a file\n", dentry); + cachefiles_put_directory(dentry); + return -EIO; + } + + path.mnt =3D cache->mnt; + path.dentry =3D dentry; + file =3D open_with_fake_path(&path, O_RDWR | O_LARGEFILE, + d_backing_inode(dentry), cache->cache_cred); + if (IS_ERR(file)) + cachefiles_put_directory(dentry); + + *pfile =3D file; + dput(dentry); + return 0; + +err_dput: + dput(dentry); +err_unlock_dir: + inode_unlock(d_inode(dir)); + return ret; +} + +/* + * Put a content map file. + */ +void cachefiles_put_map(struct file *file) +{ + if (file) { + cachefiles_do_unmark_inode_in_use(NULL, file->f_path.dentry); + fput(file); + } +} + /* * Remove a regular file from the cache. 
  */
-- 
2.27.0

From nobody Sun Apr 12 02:53:29 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E62B4C00144
	for ; Tue, 2 Aug 2022 03:04:22 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S235684AbiHBDEV (ORCPT ); Mon, 1 Aug 2022 23:04:21 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55150 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S235400AbiHBDDw (ORCPT ); Mon, 1 Aug 2022 23:03:52 -0400
Received: from out30-130.freemail.mail.aliyun.com (out30-130.freemail.mail.aliyun.com [115.124.30.130])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EBC3A1EC4D
	for ; Mon, 1 Aug 2022 20:03:47 -0700 (PDT)
X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R191e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045168;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VL9W-MZ_1659409424;
Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VL9W-MZ_1659409424)
	by smtp.aliyun-inc.com; Tue, 02 Aug 2022 11:03:45 +0800
From: Jingbo Xu
To: dhowells@redhat.com, linux-cachefs@redhat.com
Cc: linux-kernel@vger.kernel.org, xiang@kernel.org
Subject: [PATCH RFC 3/9] cachefiles: allocate per-subdir content map files
Date: Tue, 2 Aug 2022 11:03:36 +0800
Message-Id: <20220802030342.46302-4-jefflexu@linux.alibaba.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20220802030342.46302-1-jefflexu@linux.alibaba.com>
References: <20220802030342.46302-1-jefflexu@linux.alibaba.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Allocate one content map file for each sub-directory under one volume,
so that cachefilesd only needs to remove
the whole sub-directory (including the content map file and backing files in the sub-directory) as usual when it's going to cull the whole sub-directory or volume. The content map file will be shared among all backing files under this same sub-directory. Signed-off-by: Jingbo Xu --- fs/cachefiles/internal.h | 1 + fs/cachefiles/namei.c | 2 +- fs/cachefiles/volume.c | 14 ++++++++++++-- 3 files changed, 14 insertions(+), 3 deletions(-) diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 4c3ee6935811..06bde4e0e4f5 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -42,6 +42,7 @@ struct cachefiles_volume { struct fscache_volume *vcookie; /* The netfs's representation */ struct dentry *dentry; /* The volume dentry */ struct dentry *fanout[256]; /* Fanout subdirs */ + struct file *content_map[256]; /* Content map files */ }; =20 /* diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index 2948eea18ca2..d2d5feea64e8 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -292,7 +292,7 @@ int cachefiles_look_up_map(struct cachefiles_cache *cac= he, =20 path.mnt =3D cache->mnt; path.dentry =3D dentry; - file =3D open_with_fake_path(&path, O_RDWR | O_LARGEFILE, + file =3D open_with_fake_path(&path, O_RDWR | O_LARGEFILE | O_DSYNC, d_backing_inode(dentry), cache->cache_cred); if (IS_ERR(file)) cachefiles_put_directory(dentry); diff --git a/fs/cachefiles/volume.c b/fs/cachefiles/volume.c index 89df0ba8ba5e..4decc91a8886 100644 --- a/fs/cachefiles/volume.c +++ b/fs/cachefiles/volume.c @@ -20,6 +20,7 @@ void cachefiles_acquire_volume(struct fscache_volume *vco= okie) struct cachefiles_cache *cache =3D vcookie->cache->cache_priv; const struct cred *saved_cred; struct dentry *vdentry, *fan; + struct file *map; size_t len; char *name; bool is_new =3D false; @@ -73,6 +74,11 @@ void cachefiles_acquire_volume(struct fscache_volume *vc= ookie) if (IS_ERR(fan)) goto error_fan; volume->fanout[i] =3D fan; + + ret =3D 
cachefiles_look_up_map(cache, fan, &map); + if (ret) + goto error_fan; + volume->content_map[i] =3D map; } =20 cachefiles_end_secure(cache, saved_cred); @@ -91,8 +97,10 @@ void cachefiles_acquire_volume(struct fscache_volume *vc= ookie) return; =20 error_fan: - for (i =3D 0; i < 256; i++) + for (i =3D 0; i < 256; i++) { cachefiles_put_directory(volume->fanout[i]); + cachefiles_put_map(volume->content_map[i]); + } error_dir: cachefiles_put_directory(volume->dentry); error_name: @@ -113,8 +121,10 @@ static void __cachefiles_free_volume(struct cachefiles= _volume *volume) =20 volume->vcookie->cache_priv =3D NULL; =20 - for (i =3D 0; i < 256; i++) + for (i =3D 0; i < 256; i++) { cachefiles_put_directory(volume->fanout[i]); + cachefiles_put_map(volume->content_map[i]); + } cachefiles_put_directory(volume->dentry); kfree(volume); } --=20 2.27.0 From nobody Sun Apr 12 02:53:29 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82D8BC00144 for ; Tue, 2 Aug 2022 03:04:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234325AbiHBDEc (ORCPT ); Mon, 1 Aug 2022 23:04:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55152 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231518AbiHBDDw (ORCPT ); Mon, 1 Aug 2022 23:03:52 -0400 Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 99E511EEF2 for ; Mon, 1 Aug 2022 20:03:48 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R121e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045170;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VL9XT0G_1659409425; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com 
fp:SMTPD_---0VL9XT0G_1659409425) by smtp.aliyun-inc.com; Tue, 02 Aug 2022 11:03:46 +0800
From: Jingbo Xu
To: dhowells@redhat.com, linux-cachefs@redhat.com
Cc: linux-kernel@vger.kernel.org, xiang@kernel.org
Subject: [PATCH RFC 4/9] cachefiles: alloc/load/save content map
Date: Tue, 2 Aug 2022 11:03:37 +0800
Message-Id: <20220802030342.46302-5-jefflexu@linux.alibaba.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20220802030342.46302-1-jefflexu@linux.alibaba.com>
References: <20220802030342.46302-1-jefflexu@linux.alibaba.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Besides the SEEK_[DATA|HOLE] llseek mechanism provided by the backing
filesystem, this patch set introduces a bitmap-based mechanism, in which
a self-maintained bitmap tracks whether a file range has been cached by
the backing file.

The bitmap is persisted to the corresponding backing content map file.
Since all backing files under one sub-directory share one backing
content map file, the offset of the content map within the backing
content map file is stored in the xattr of each backing file. The size
of the content map is stored in the same xattr. As shown in the
following patches, the content map size stored in the xattr can be
smaller or larger than the actual size of the backing file.

In the lookup phase, if the backing file already exists, the content map
is loaded from the backing content map file. If the backing file doesn't
exist, i.e. a new tmpfile is created as the backing file, the content
map is not initialized at this point. Instead, it is expanded at runtime
later.

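To illustrate the bitmap mechanism described above, here is a minimal
userspace sketch of a one-bit-per-4K content map. It is not the code
from this series: struct model_map and the model_*() helpers are
hypothetical, the bit order (per byte, LSB first) is an assumption that
differs from the kernel's unsigned long based bitops, and writes are
assumed to be granule aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_GRAN_SIZE 4096ULL	/* mirrors CACHEFILES_GRAN_SIZE: one bit per 4K */

/* Hypothetical in-memory content map; bit order is per byte, LSB first. */
struct model_map {
	uint8_t *bits;
	size_t size;		/* map size in bytes, like content_map_size */
};

static bool model_granule_cached(const struct model_map *m, uint64_t granule)
{
	if (granule / 8 >= m->size)
		return false;	/* beyond the map: treated as not cached */
	return (m->bits[granule / 8] >> (granule % 8)) & 1;
}

/* Mark every granule touched by [start, start + len) that fits in the map. */
static void model_mark_range(struct model_map *m, uint64_t start, uint64_t len)
{
	uint64_t g, last;

	assert(len > 0);
	last = (start + len - 1) / MODEL_GRAN_SIZE;
	for (g = start / MODEL_GRAN_SIZE; g <= last && g / 8 < m->size; g++)
		m->bits[g / 8] |= (uint8_t)(1u << (g % 8));
}

/* True only if every granule touched by [start, start + len) is cached. */
static bool model_range_cached(const struct model_map *m, uint64_t start, uint64_t len)
{
	uint64_t g, last;

	assert(len > 0);
	last = (start + len - 1) / MODEL_GRAN_SIZE;
	for (g = start / MODEL_GRAN_SIZE; g <= last; g++)
		if (!model_granule_cached(m, g))
			return false;
	return true;
}
```

Because the stored map size may be smaller than the backing file
requires, the sketch treats any granule beyond the map as not cached
rather than as an error; the real series instead expands the map at
runtime.
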
Signed-off-by: Jingbo Xu --- fs/cachefiles/Makefile | 3 +- fs/cachefiles/content-map.c | 93 +++++++++++++++++++++++++++++++++++++ fs/cachefiles/interface.c | 8 +++- fs/cachefiles/internal.h | 13 ++++++ fs/cachefiles/namei.c | 4 ++ fs/cachefiles/xattr.c | 9 ++++ 6 files changed, 128 insertions(+), 2 deletions(-) create mode 100644 fs/cachefiles/content-map.c diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile index c37a7a9af10b..59cd26cd7700 100644 --- a/fs/cachefiles/Makefile +++ b/fs/cachefiles/Makefile @@ -13,7 +13,8 @@ cachefiles-y :=3D \ namei.o \ security.o \ volume.o \ - xattr.o + xattr.o \ + content-map.o =20 cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) +=3D error_inject.o cachefiles-$(CONFIG_CACHEFILES_ONDEMAND) +=3D ondemand.o diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c new file mode 100644 index 000000000000..3432efdecbcf --- /dev/null +++ b/fs/cachefiles/content-map.c @@ -0,0 +1,93 @@ +#include +#include +#include +#include "internal.h" + +/* + * Zero the unused tail. + * + * @i_size indicates the size of the backing object. + */ +static void cachefiles_zero_content_map(void *map, size_t content_map_size, + size_t i_size) +{ + unsigned long granules_needed =3D DIV_ROUND_UP(i_size, CACHEFILES_GRAN_SI= ZE); + unsigned long bytes_needed =3D BITS_TO_BYTES(granules_needed); + unsigned long byte_end =3D min_t(unsigned long, bytes_needed, content_map= _size); + int i; + + if (bytes_needed < content_map_size) + memset(map + bytes_needed, 0, content_map_size - bytes_needed); + + for (i =3D granules_needed; i < byte_end * BITS_PER_BYTE; i++) + clear_bit(i, map); +} + +/* + * Load the content map from the backing map file. 
+ */ +int cachefiles_load_content_map(struct cachefiles_object *object) +{ + struct file *file =3D object->volume->content_map[(u8)object->cookie->key= _hash]; + loff_t off =3D object->content_map_off; + size_t size =3D object->content_map_size; + void *map; + int ret; + + if (object->content_info !=3D CACHEFILES_CONTENT_MAP) + return 0; + + map =3D (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, get_order(size)= ); + if (!map) + return -ENOMEM; + + ret =3D kernel_read(file, map, size, &off); + if (ret !=3D size) { + free_pages((unsigned long)map, get_order(size)); + return ret < 0 ? ret : -EIO; + } + + /* + * Zero the unused tail. Later when expanding the content map, the + * content map itself may keep the same size while i_size of the backing + * object is increased. In this case, the original content map is reused + * and part of the original unused tail is used now. Be noted that + * content_map_size stored in xattr may be smaller or larger than the + * actual size of the backing object. + */ + cachefiles_zero_content_map(map, size, object->cookie->object_size); + + object->content_map =3D map; + return 0; +} + +/* + * Save the content map to the backing map file. 
+ */ +void cachefiles_save_content_map(struct cachefiles_object *object) +{ + struct file *file =3D object->volume->content_map[(u8)object->cookie->key= _hash]; + loff_t off; + int ret; + + if (object->content_info !=3D CACHEFILES_CONTENT_MAP || + !object->content_map_size) + return; + + /* allocate space from content map file */ + off =3D object->content_map_off; + if (off =3D=3D CACHEFILES_CONTENT_MAP_OFF_INVAL) { + struct inode *inode =3D file_inode(file); + + inode_lock(inode); + off =3D i_size_read(inode); + i_size_write(inode, off + object->content_map_size); + inode_unlock(inode); + + object->content_map_off =3D off; + } + + ret =3D kernel_write(file, object->content_map, object->content_map_size,= &off); + if (ret !=3D object->content_map_size) + object->content_info =3D CACHEFILES_CONTENT_NO_DATA; +} diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index a69073a1d3f0..4cfbdc87b635 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -38,6 +38,8 @@ struct cachefiles_object *cachefiles_alloc_object(struct = fscache_cookie *cookie) object->volume =3D volume; object->debug_id =3D atomic_inc_return(&cachefiles_object_debug_id); object->cookie =3D fscache_get_cookie(cookie, fscache_cookie_get_attach_o= bject); + object->content_map_off =3D CACHEFILES_CONTENT_MAP_OFF_INVAL; + rwlock_init(&object->content_map_lock); =20 fscache_count_object(vcookie->cache); trace_cachefiles_ref(object->debug_id, cookie->debug_id, 1, @@ -88,6 +90,8 @@ void cachefiles_put_object(struct cachefiles_object *obje= ct, ASSERTCMP(object->file, =3D=3D, NULL); =20 kfree(object->d_name); + free_pages((unsigned long)object->content_map, + get_order(object->content_map_size)); =20 cache =3D object->volume->cache->cache; fscache_put_cookie(object->cookie, fscache_cookie_put_object); @@ -309,8 +313,10 @@ static void cachefiles_commit_object(struct cachefiles= _object *object, update =3D true; if (test_and_clear_bit(FSCACHE_COOKIE_NEEDS_UPDATE, 
&object->cookie->flag= s)) update =3D true; - if (update) + if (update) { + cachefiles_save_content_map(object); cachefiles_set_object_xattr(object); + } =20 if (test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags)) cachefiles_commit_tmpfile(cache, object); diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 06bde4e0e4f5..1335ea5f4a5e 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -19,6 +19,7 @@ #include =20 #define CACHEFILES_DIO_BLOCK_SIZE 4096 +#define CACHEFILES_GRAN_SIZE 4096 /* one bit represents 4K */ =20 struct cachefiles_cache; struct cachefiles_object; @@ -30,6 +31,7 @@ enum cachefiles_content { CACHEFILES_CONTENT_ALL =3D 2, /* Content is all present, no map */ CACHEFILES_CONTENT_BACKFS_MAP =3D 3, /* Content is piecemeal, mapped thro= ugh backing fs */ CACHEFILES_CONTENT_DIRTY =3D 4, /* Content is dirty (only seen on disk) */ + CACHEFILES_CONTENT_MAP =3D 5, /* Content is piecemeal, map in use */ nr__cachefiles_content }; =20 @@ -59,6 +61,11 @@ struct cachefiles_object { refcount_t ref; u8 d_name_len; /* Length of filename */ enum cachefiles_content content_info:8; /* Info about content presence */ + rwlock_t content_map_lock; + void *content_map; + size_t content_map_size; /* size of content map in bytes */ + loff_t content_map_off; /* offset in the backing content map file */ +#define CACHEFILES_CONTENT_MAP_OFF_INVAL -1 unsigned long flags; #define CACHEFILES_OBJECT_USING_TMPFILE 0 /* Have an unlinked tmpfile */ #ifdef CONFIG_CACHEFILES_ONDEMAND @@ -169,6 +176,12 @@ extern int cachefiles_has_space(struct cachefiles_cach= e *cache, unsigned fnr, unsigned bnr, enum cachefiles_has_space_for reason); =20 +/* + * content-map.c + */ +extern int cachefiles_load_content_map(struct cachefiles_object *object); +extern void cachefiles_save_content_map(struct cachefiles_object *object); + /* * daemon.c */ diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index d2d5feea64e8..f5e1ec1d9445 100644 --- 
a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -690,6 +690,10 @@ static bool cachefiles_open_file(struct cachefiles_obj= ect *object, if (ret < 0) goto check_failed; =20 + ret =3D cachefiles_load_content_map(object); + if (ret < 0) + goto check_failed; + object->file =3D file; =20 /* Always update the atime on an object we've just looked up (this is diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c index 00b087c14995..05ac6b70787a 100644 --- a/fs/cachefiles/xattr.c +++ b/fs/cachefiles/xattr.c @@ -20,6 +20,8 @@ struct cachefiles_xattr { __be64 object_size; /* Actual size of the object */ __be64 zero_point; /* Size after which server has no data not written by = us */ + __be64 content_map_off;/* Offset inside the content map file */ + __be64 content_map_size;/* Size of the content map */ __u8 type; /* Type of object */ __u8 content; /* Content presence (enum cachefiles_content) */ __u8 data[]; /* netfs coherency data */ @@ -58,6 +60,8 @@ int cachefiles_set_object_xattr(struct cachefiles_object = *object) buf->zero_point =3D 0; buf->type =3D CACHEFILES_COOKIE_TYPE_DATA; buf->content =3D object->content_info; + buf->content_map_off =3D cpu_to_be64(object->content_map_off); + buf->content_map_size =3D cpu_to_be64(object->content_map_size); if (test_bit(FSCACHE_COOKIE_LOCAL_WRITE, &object->cookie->flags)) buf->content =3D CACHEFILES_CONTENT_DIRTY; if (len > 0) @@ -129,6 +133,11 @@ int cachefiles_check_auxdata(struct cachefiles_object = *object, struct file *file pr_warn("Dirty object in cache\n"); why =3D cachefiles_coherency_check_dirty; } else { + object->content_info =3D buf->content; + if (object->content_info =3D=3D CACHEFILES_CONTENT_MAP) { + object->content_map_off =3D be64_to_cpu(buf->content_map_off); + object->content_map_size =3D be64_to_cpu(buf->content_map_size); + } why =3D cachefiles_coherency_check_ok; ret =3D 0; } --=20 2.27.0 From nobody Sun Apr 12 02:53:29 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 45A8CC00144 for ; Tue, 2 Aug 2022 03:04:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235720AbiHBDE0 (ORCPT ); Mon, 1 Aug 2022 23:04:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55154 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235415AbiHBDDw (ORCPT ); Mon, 1 Aug 2022 23:03:52 -0400 Received: from out30-42.freemail.mail.aliyun.com (out30-42.freemail.mail.aliyun.com [115.124.30.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6096E1FCFA for ; Mon, 1 Aug 2022 20:03:50 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R201e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046056;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VL9W-Nf_1659409426; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VL9W-Nf_1659409426) by smtp.aliyun-inc.com; Tue, 02 Aug 2022 11:03:47 +0800 From: Jingbo Xu To: dhowells@redhat.com, linux-cachefs@redhat.com Cc: linux-kernel@vger.kernel.org, xiang@kernel.org Subject: [PATCH RFC 5/9] cachefiles: mark content map on write to the backing file Date: Tue, 2 Aug 2022 11:03:38 +0800 Message-Id: <20220802030342.46302-6-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220802030342.46302-1-jefflexu@linux.alibaba.com> References: <20220802030342.46302-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Mark the content map on completion of the write to the backing file. 
The expansion of the content map (when the backing file is truncated to
a larger size) and the allocation of the content map (when the backing
file is a newly created tmpfile) are delayed to the point when the
content map needs to be marked. It should be safe to allocate memory
with GFP_KERNEL inside the iocb.ki_complete() callback, since the
callback is scheduled by a workqueue for DIRECT IO.

The content map is sized in units of the block size of the backing
filesystem, so that holes can easily be punched in the backing content
map file if the content map gets truncated or invalidated. Currently the
content map is sized in PAGE_SIZE units, which should be a multiple of
the block size of the backing filesystem.

Each bit of the content map indicates the existence of 4KB of data in
the backing file, thus each (4K sized) chunk of the content map covers
128MB of data in the backing file.

When expanding the content map, a new content map needs to be allocated.
A new offset inside the backing content map file also needs to be
allocated, with a hole punched over the old range starting at the old
offset. Currently the new offset is always allocated in an append style,
i.e. the previous hole will not be reused.

Signed-off-by: Jingbo Xu
---
 fs/cachefiles/content-map.c | 129 ++++++++++++++++++++++++++++++++++++
 fs/cachefiles/internal.h    |   2 +
 fs/cachefiles/io.c          |   3 +
 3 files changed, 134 insertions(+)

diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c
index 3432efdecbcf..877ff79e181b 100644
--- a/fs/cachefiles/content-map.c
+++ b/fs/cachefiles/content-map.c
@@ -1,8 +1,24 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
+/*
+ * Return the size of the content map in bytes.
+ *
+ * There's one bit per granule (CACHEFILES_GRAN_SIZE, i.e. 4K). We size it in
+ * terms of block size chunks (e.g. 4K), so that the map file can be punched
+ * hole when the content map is truncated or invalidated.
In this case, ea= ch 4K + * chunk spans (4096 * BITS_PER_BYTE * CACHEFILES_GRAN_SIZE, i.e. 128M) of= file + * space. + */ +static size_t cachefiles_map_size(loff_t i_size) +{ + i_size =3D round_up(i_size, PAGE_SIZE * BITS_PER_BYTE * CACHEFILES_GRAN_S= IZE); + return i_size / BITS_PER_BYTE / CACHEFILES_GRAN_SIZE; +} + /* * Zero the unused tail. * @@ -91,3 +107,116 @@ void cachefiles_save_content_map(struct cachefiles_obj= ect *object) if (ret !=3D object->content_map_size) object->content_info =3D CACHEFILES_CONTENT_NO_DATA; } + +static loff_t cachefiles_expand_map_off(struct file *file, loff_t old_off, + size_t old_size, size_t new_size) +{ + struct inode *inode =3D file_inode(file); + loff_t new_off; + bool punch =3D false; + + inode_lock(inode); + new_off =3D i_size_read(inode); + /* + * Simply expand the old content map range if possible; or discard the + * old content map range and create a new one. + */ + if (new_off =3D=3D old_off + old_size) { + i_size_write(inode, old_off + new_size); + new_off =3D old_off; + } else { + i_size_write(inode, new_off + new_size); + punch =3D true; + } + inode_unlock(inode); + + if (punch) + vfs_fallocate(file, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, + old_off, old_size); + + return new_off; +} + +/* + * Expand the content map to a larger file size. 
+ */ +static void cachefiles_expand_content_map(struct cachefiles_object *object) +{ + struct file *file =3D object->volume->content_map[(u8)object->cookie->key= _hash]; + size_t size, zap_size; + void *map, *zap; + loff_t off; + + size =3D cachefiles_map_size(object->cookie->object_size); + if (size <=3D object->content_map_size) + return; + + map =3D (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, get_order(size)= ); + if (!map) + return; + + write_lock_bh(&object->content_map_lock); + if (size > object->content_map_size) { + zap =3D object->content_map; + zap_size =3D object->content_map_size; + memcpy(map, zap, zap_size); + object->content_map =3D map; + object->content_map_size =3D size; + + /* expand the content map file */ + off =3D object->content_map_off; + if (off !=3D CACHEFILES_CONTENT_MAP_OFF_INVAL) + object->content_map_off =3D cachefiles_expand_map_off(file, + off, zap_size, size); + } else { + zap =3D map; + zap_size =3D size; + } + write_unlock_bh(&object->content_map_lock); + + free_pages((unsigned long)zap, get_order(zap_size)); +} + +void cachefiles_mark_content_map(struct cachefiles_object *object, + loff_t start, loff_t len) +{ + pgoff_t granule; + loff_t end =3D start + len; + + if (object->cookie->advice & FSCACHE_ADV_SINGLE_CHUNK) { + if (start =3D=3D 0) { + object->content_info =3D CACHEFILES_CONTENT_SINGLE; + set_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &object->cookie->flags); + } + return; + } + + if (object->content_info =3D=3D CACHEFILES_CONTENT_NO_DATA) + object->content_info =3D CACHEFILES_CONTENT_MAP; + + /* TODO: set CACHEFILES_CONTENT_BACKFS_MAP accordingly */ + + if (object->content_info !=3D CACHEFILES_CONTENT_MAP) + return; + + read_lock_bh(&object->content_map_lock); + start =3D round_down(start, CACHEFILES_GRAN_SIZE); + do { + granule =3D start / CACHEFILES_GRAN_SIZE; + if (granule / BITS_PER_BYTE >=3D object->content_map_size) { + read_unlock_bh(&object->content_map_lock); + cachefiles_expand_content_map(object); + 
read_lock_bh(&object->content_map_lock); + } + + if (WARN_ON(granule / BITS_PER_BYTE >=3D object->content_map_size)) + break; + + set_bit(granule, object->content_map); + start +=3D CACHEFILES_GRAN_SIZE; + } while (start < end); + + set_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &object->cookie->flags); + read_unlock_bh(&object->content_map_lock); +} + diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 1335ea5f4a5e..c252746c8f9b 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -181,6 +181,8 @@ extern int cachefiles_has_space(struct cachefiles_cache= *cache, */ extern int cachefiles_load_content_map(struct cachefiles_object *object); extern void cachefiles_save_content_map(struct cachefiles_object *object); +extern void cachefiles_mark_content_map(struct cachefiles_object *object, + loff_t start, loff_t len); =20 /* * daemon.c diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c index b513d9bf81f1..27171fac649e 100644 --- a/fs/cachefiles/io.c +++ b/fs/cachefiles/io.c @@ -264,6 +264,9 @@ static void cachefiles_write_complete(struct kiocb *ioc= b, long ret) __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE); __sb_end_write(inode->i_sb, SB_FREEZE_WRITE); =20 + if (ret =3D=3D ki->len) + cachefiles_mark_content_map(ki->object, ki->start, ki->len); + if (ret < 0) trace_cachefiles_io_error(object, inode, ret, cachefiles_trace_write_error); --=20 2.27.0 From nobody Sun Apr 12 02:53:29 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4F08C00144 for ; Tue, 2 Aug 2022 03:04:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235022AbiHBDEM (ORCPT ); Mon, 1 Aug 2022 23:04:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with 
ESMTP id S235233AbiHBDDw (ORCPT ); Mon, 1 Aug 2022 23:03:52 -0400 Received: from out30-57.freemail.mail.aliyun.com (out30-57.freemail.mail.aliyun.com [115.124.30.57]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6FB30201AD for ; Mon, 1 Aug 2022 20:03:50 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R171e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045168;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VL9XT1b_1659409427; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VL9XT1b_1659409427) by smtp.aliyun-inc.com; Tue, 02 Aug 2022 11:03:48 +0800 From: Jingbo Xu To: dhowells@redhat.com, linux-cachefs@redhat.com Cc: linux-kernel@vger.kernel.org, xiang@kernel.org Subject: [PATCH RFC 6/9] cachefiles: check content map on read/write Date: Tue, 2 Aug 2022 11:03:39 +0800 Message-Id: <20220802030342.46302-7-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220802030342.46302-1-jefflexu@linux.alibaba.com> References: <20220802030342.46302-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" cachefiles_find_next_granule() and cachefiles_find_next_hole() are used to check whether the requested range has been cached. The return values of these two functions imitate those of SEEK_[DATA|HOLE], so that the existing code can be reused as much as possible.
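To make the SEEK_[DATA|HOLE]-like return values concrete, here is a small userspace model of the two lookups. It is only a sketch under stated assumptions: the model_* names and MODEL_GRAN_SIZE are hypothetical, the bitmap is a byte array scanned bit by bit instead of using find_next_bit()/find_next_zero_bit(), and no locking is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_GRAN_SIZE 4096LL	/* assumed stand-in for CACHEFILES_GRAN_SIZE */

/* Test bit n of a byte-array bitmap (LSB-first within each byte). */
static int model_test_bit(const uint8_t *map, size_t n)
{
	return (map[n / 8] >> (n % 8)) & 1;
}

/*
 * SEEK_DATA-like: granule-aligned offset of the first cached granule at or
 * after start, or -ENXIO if there is none.
 */
static long long model_seek_data(const uint8_t *map, size_t map_bytes,
				 long long start)
{
	size_t nbits = map_bytes * 8;
	size_t g;

	assert(start >= 0);
	for (g = (size_t)(start / MODEL_GRAN_SIZE); g < nbits; g++)
		if (model_test_bit(map, g))
			return (long long)g * MODEL_GRAN_SIZE;
	return -ENXIO;
}

/*
 * SEEK_HOLE-like: offset of the first uncached granule at or after start,
 * clamped to the object size (the end of the object counts as a hole).
 */
static long long model_seek_hole(const uint8_t *map, size_t map_bytes,
				 long long start, long long object_size)
{
	size_t nbits = map_bytes * 8;
	size_t g;
	long long off;

	assert(start >= 0);
	g = (size_t)(start / MODEL_GRAN_SIZE);
	if (g > nbits)
		g = nbits;
	while (g < nbits && model_test_bit(map, g))
		g++;
	off = (long long)g * MODEL_GRAN_SIZE;
	return off < object_size ? off : object_size;
}
```

With a map whose bits 1 and 2 are set, a data seek from offset 0 lands on 4096 and a hole seek from 4096 lands on 12288, which is the pair of values cachefiles_prepare_read() would otherwise obtain from vfs_llseek().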
Signed-off-by: Jingbo Xu --- fs/cachefiles/content-map.c | 30 ++++++++++++++++++++++++++++++ fs/cachefiles/internal.h | 4 ++++ fs/cachefiles/io.c | 36 +++++++++++++++++++++++++++++++----- 3 files changed, 65 insertions(+), 5 deletions(-) diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c index 877ff79e181b..949ec5d9e4c9 100644 --- a/fs/cachefiles/content-map.c +++ b/fs/cachefiles/content-map.c @@ -220,3 +220,33 @@ void cachefiles_mark_content_map(struct cachefiles_obj= ect *object, read_unlock_bh(&object->content_map_lock); } =20 +loff_t cachefiles_find_next_granule(struct cachefiles_object *object, + loff_t start) +{ + unsigned long size, granule =3D start / CACHEFILES_GRAN_SIZE; + loff_t result; + + read_lock_bh(&object->content_map_lock); + size =3D object->content_map_size * BITS_PER_BYTE; + result =3D find_next_bit(object->content_map, size, granule); + read_unlock_bh(&object->content_map_lock); + + if (result =3D=3D size) + return -ENXIO; + return result * CACHEFILES_GRAN_SIZE; +} + +loff_t cachefiles_find_next_hole(struct cachefiles_object *object, + loff_t start) +{ + unsigned long size, granule =3D start / CACHEFILES_GRAN_SIZE; + loff_t result; + + read_lock_bh(&object->content_map_lock); + size =3D object->content_map_size * BITS_PER_BYTE; + result =3D find_next_zero_bit(object->content_map, size, granule); + read_unlock_bh(&object->content_map_lock); + + return min_t(loff_t, result * CACHEFILES_GRAN_SIZE, + object->cookie->object_size); +} diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index c252746c8f9b..506700809a6d 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -183,6 +183,10 @@ extern int cachefiles_load_content_map(struct cachefil= es_object *object); extern void cachefiles_save_content_map(struct cachefiles_object *object); extern void cachefiles_mark_content_map(struct cachefiles_object *object, loff_t start, loff_t len); +extern loff_t cachefiles_find_next_granule(struct 
cachefiles_object *objec= t, + loff_t start); +extern loff_t cachefiles_find_next_hole(struct cachefiles_object *object, + loff_t start); =20 /* * daemon.c diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c index 27171fac649e..5c7c84cdafea 100644 --- a/fs/cachefiles/io.c +++ b/fs/cachefiles/io.c @@ -30,6 +30,32 @@ struct cachefiles_kiocb { u64 b_writing; }; =20 +static loff_t cachefiles_seek_data(struct cachefiles_object *object, + struct file *file, loff_t start) +{ + switch (object->content_info) { + case CACHEFILES_CONTENT_MAP: + return cachefiles_find_next_granule(object, start); + case CACHEFILES_CONTENT_BACKFS_MAP: + return vfs_llseek(file, start, SEEK_DATA); + default: + return -EINVAL; + } +} + +static loff_t cachefiles_seek_hole(struct cachefiles_object *object, + struct file *file, loff_t start) +{ + switch (object->content_info) { + case CACHEFILES_CONTENT_MAP: + return cachefiles_find_next_hole(object, start); + case CACHEFILES_CONTENT_BACKFS_MAP: + return vfs_llseek(file, start, SEEK_HOLE); + default: + return -EINVAL; + } +} + static inline void cachefiles_put_kiocb(struct cachefiles_kiocb *ki) { if (refcount_dec_and_test(&ki->ki_refcnt)) { @@ -103,7 +129,7 @@ static int cachefiles_read(struct netfs_cache_resources= *cres, =20 off2 =3D cachefiles_inject_read_error(); if (off2 =3D=3D 0) - off2 =3D vfs_llseek(file, off, SEEK_DATA); + off2 =3D cachefiles_seek_data(object, file, off); if (off2 < 0 && off2 >=3D (loff_t)-MAX_ERRNO && off2 !=3D -ENXIO) { skipped =3D 0; ret =3D off2; @@ -442,7 +468,7 @@ static enum netfs_io_source cachefiles_prepare_read(str= uct netfs_io_subrequest * retry: off =3D cachefiles_inject_read_error(); if (off =3D=3D 0) - off =3D vfs_llseek(file, subreq->start, SEEK_DATA); + off =3D cachefiles_seek_data(object, file, subreq->start); if (off < 0 && off >=3D (loff_t)-MAX_ERRNO) { if (off =3D=3D (loff_t)-ENXIO) { why =3D cachefiles_trace_read_seek_nxio; @@ -468,7 +494,7 @@ static enum netfs_io_source cachefiles_prepare_read(str= 
uct netfs_io_subrequest * =20 to =3D cachefiles_inject_read_error(); if (to =3D=3D 0) - to =3D vfs_llseek(file, subreq->start, SEEK_HOLE); + to =3D cachefiles_seek_hole(object, file, subreq->start); if (to < 0 && to >=3D (loff_t)-MAX_ERRNO) { trace_cachefiles_io_error(object, file_inode(file), to, cachefiles_trace_seek_error); @@ -537,7 +563,7 @@ int __cachefiles_prepare_write(struct cachefiles_object= *object, =20 pos =3D cachefiles_inject_read_error(); if (pos =3D=3D 0) - pos =3D vfs_llseek(file, *_start, SEEK_DATA); + pos =3D cachefiles_seek_data(object, file, *_start); if (pos < 0 && pos >=3D (loff_t)-MAX_ERRNO) { if (pos =3D=3D -ENXIO) goto check_space; /* Unallocated tail */ @@ -558,7 +584,7 @@ int __cachefiles_prepare_write(struct cachefiles_object= *object, =20 pos =3D cachefiles_inject_read_error(); if (pos =3D=3D 0) - pos =3D vfs_llseek(file, *_start, SEEK_HOLE); + pos =3D cachefiles_seek_hole(object, file, *_start); if (pos < 0 && pos >=3D (loff_t)-MAX_ERRNO) { trace_cachefiles_io_error(object, file_inode(file), pos, cachefiles_trace_seek_error); --=20 2.27.0 From nobody Sun Apr 12 02:53:29 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6318FC00144 for ; Tue, 2 Aug 2022 03:04:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235816AbiHBDEi (ORCPT ); Mon, 1 Aug 2022 23:04:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55222 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235520AbiHBDDy (ORCPT ); Mon, 1 Aug 2022 23:03:54 -0400 Received: from out30-54.freemail.mail.aliyun.com (out30-54.freemail.mail.aliyun.com [115.124.30.54]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B46E81DA4F for ; Mon, 1 Aug 2022 20:03:51 -0700 (PDT) X-Alimail-AntiSpam: 
AC=PASS;BC=-1|-1;BR=01201311R791e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045168;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VL9XpO._1659409428; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VL9XpO._1659409428) by smtp.aliyun-inc.com; Tue, 02 Aug 2022 11:03:48 +0800 From: Jingbo Xu To: dhowells@redhat.com, linux-cachefs@redhat.com Cc: linux-kernel@vger.kernel.org, xiang@kernel.org Subject: [PATCH RFC 7/9] cachefiles: free content map on invalidate Date: Tue, 2 Aug 2022 11:03:40 +0800 Message-Id: <20220802030342.46302-8-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220802030342.46302-1-jefflexu@linux.alibaba.com> References: <20220802030342.46302-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Free the content map when the cached file is invalidated. Also hole punch the backing content map file if any. 
Signed-off-by: Jingbo Xu --- fs/cachefiles/content-map.c | 21 +++++++++++++++++++++ fs/cachefiles/interface.c | 1 + fs/cachefiles/internal.h | 1 + 3 files changed, 23 insertions(+) diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c index 949ec5d9e4c9..b73a109844ca 100644 --- a/fs/cachefiles/content-map.c +++ b/fs/cachefiles/content-map.c @@ -250,3 +250,24 @@ loff_t cachefiles_find_next_hole(struct cachefiles_obj= ect *object, return min_t(loff_t, result * CACHEFILES_GRAN_SIZE, object->cookie->object_size); } + +void cachefiles_invalidate_content_map(struct cachefiles_object *object) +{ + struct file *file =3D object->volume->content_map[(u8)object->cookie->key= _hash]; + + if (object->content_info !=3D CACHEFILES_CONTENT_MAP) + return; + + write_lock_bh(&object->content_map_lock); + free_pages((unsigned long)object->content_map, + get_order(object->content_map_size)); + object->content_map =3D NULL; + object->content_map_size =3D 0; + + if (object->content_map_off !=3D CACHEFILES_CONTENT_MAP_OFF_INVAL) { + vfs_fallocate(file, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, + object->content_map_off, object->content_map_size); + object->content_map_off =3D CACHEFILES_CONTENT_MAP_OFF_INVAL; + } + write_unlock_bh(&object->content_map_lock); +} diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 4cfbdc87b635..f87b9a665d85 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -409,6 +409,7 @@ static bool cachefiles_invalidate_cookie(struct fscache= _cookie *cookie) =20 old_file =3D object->file; object->file =3D new_file; + cachefiles_invalidate_content_map(object); object->content_info =3D CACHEFILES_CONTENT_NO_DATA; set_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags); set_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &object->cookie->flags); diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 506700809a6d..c674c4e42529 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -187,6 
+187,7 @@ extern loff_t cachefiles_find_next_granule(struct cache= files_object *object, loff_t start); extern loff_t cachefiles_find_next_hole(struct cachefiles_object *object, loff_t start); +extern void cachefiles_invalidate_content_map(struct cachefiles_object *ob= ject); =20 /* * daemon.c --=20 2.27.0 From nobody Sun Apr 12 02:53:29 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CFAE1C00144 for ; Tue, 2 Aug 2022 03:04:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235817AbiHBDEo (ORCPT ); Mon, 1 Aug 2022 23:04:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55490 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235603AbiHBDEE (ORCPT ); Mon, 1 Aug 2022 23:04:04 -0400 Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 34FDF1EC4D for ; Mon, 1 Aug 2022 20:03:53 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R191e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046056;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VL9W-Pi_1659409429; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VL9W-Pi_1659409429) by smtp.aliyun-inc.com; Tue, 02 Aug 2022 11:03:49 +0800 From: Jingbo Xu To: dhowells@redhat.com, linux-cachefs@redhat.com Cc: linux-kernel@vger.kernel.org, xiang@kernel.org Subject: [PATCH RFC 8/9] cachefiles: resize content map on resize Date: Tue, 2 Aug 2022 11:03:41 +0800 Message-Id: <20220802030342.46302-9-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220802030342.46302-1-jefflexu@linux.alibaba.com> References: <20220802030342.46302-1-jefflexu@linux.alibaba.com> 
MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Adjust the content map when we shorten a backing object. In this case, only the unused tail of the content map after shortening gets zeroed, while the size of the content map itself is not changed. Also the corresponding range in the backing content map file is not changed. Besides, the content map and the corresponding range in the backing content map file are not touched when we expand a backing object. They will be lazily expanded at runtime later. Signed-off-by: Jingbo Xu --- fs/cachefiles/content-map.c | 23 +++++++++++++++++++++++ fs/cachefiles/interface.c | 1 + fs/cachefiles/internal.h | 2 ++ 3 files changed, 26 insertions(+) diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c index b73a109844ca..360c59b06670 100644 --- a/fs/cachefiles/content-map.c +++ b/fs/cachefiles/content-map.c @@ -271,3 +271,26 @@ void cachefiles_invalidate_content_map(struct cachefil= es_object *object) } write_unlock_bh(&object->content_map_lock); } + +/* + * Adjust the content map when we shorten a backing object. + */ +void cachefiles_shorten_content_map(struct cachefiles_object *object, + loff_t new_size) +{ + if (object->content_info !=3D CACHEFILES_CONTENT_MAP) + return; + + read_lock_bh(&object->content_map_lock); + /* + * Nothing needs to be done when content map has not been allocated yet. 
+ */ + if (!object->content_map_size) + goto out; + + if (cachefiles_map_size(new_size) <=3D object->content_map_size) + cachefiles_zero_content_map(object->content_map, + object->content_map_size, new_size); +out: + read_unlock_bh(&object->content_map_lock); +} diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index f87b9a665d85..76f70a9ebe50 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -290,6 +290,7 @@ static void cachefiles_resize_cookie(struct netfs_cache= _resources *cres, cachefiles_begin_secure(cache, &saved_cred); cachefiles_shorten_object(object, file, new_size); cachefiles_end_secure(cache, saved_cred); + cachefiles_shorten_content_map(object, new_size); object->cookie->object_size =3D new_size; return; } diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index c674c4e42529..7747f99f00c1 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -188,6 +188,8 @@ extern loff_t cachefiles_find_next_granule(struct cache= files_object *object, extern loff_t cachefiles_find_next_hole(struct cachefiles_object *object, loff_t start); extern void cachefiles_invalidate_content_map(struct cachefiles_object *ob= ject); +extern void cachefiles_shorten_content_map(struct cachefiles_object *objec= t, + loff_t new_size); =20 /* * daemon.c --=20 2.27.0 From nobody Sun Apr 12 02:53:29 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B2EBC00144 for ; Tue, 2 Aug 2022 03:04:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235840AbiHBDEx (ORCPT ); Mon, 1 Aug 2022 23:04:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55488 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231985AbiHBDEE (ORCPT ); Mon, 1 Aug 2022 23:04:04 
-0400 Received: from out30-45.freemail.mail.aliyun.com (out30-45.freemail.mail.aliyun.com [115.124.30.45]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 09D042C12D for ; Mon, 1 Aug 2022 20:03:52 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R151e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046056;MF=jefflexu@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VL9PAYh_1659409430; Received: from localhost(mailfrom:jefflexu@linux.alibaba.com fp:SMTPD_---0VL9PAYh_1659409430) by smtp.aliyun-inc.com; Tue, 02 Aug 2022 11:03:50 +0800 From: Jingbo Xu To: dhowells@redhat.com, linux-cachefs@redhat.com Cc: linux-kernel@vger.kernel.org, xiang@kernel.org Subject: [PATCH RFC 9/9] cachefiles: cull content map file on cull Date: Tue, 2 Aug 2022 11:03:42 +0800 Message-Id: <20220802030342.46302-10-jefflexu@linux.alibaba.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20220802030342.46302-1-jefflexu@linux.alibaba.com> References: <20220802030342.46302-1-jefflexu@linux.alibaba.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Also punch a hole in the backing content map file when the backing object gets culled. When cachefilesd is going to cull a whole directory, the whole directory is moved to the graveyard and cachefilesd itself then removes all files under the directory one by one. Since each sub-directory under a volume maintains its own backing content map file, cachefilesd already works well with this bitmap-based mechanism and doesn't need any refactoring.
Signed-off-by: Jingbo Xu --- fs/cachefiles/content-map.c | 37 +++++++++++++++++++++++++++++++++++++ fs/cachefiles/internal.h | 4 ++++ fs/cachefiles/namei.c | 4 ++++ fs/cachefiles/xattr.c | 17 +++++++++++++++++ 4 files changed, 62 insertions(+) diff --git a/fs/cachefiles/content-map.c b/fs/cachefiles/content-map.c index 360c59b06670..5584a0182df9 100644 --- a/fs/cachefiles/content-map.c +++ b/fs/cachefiles/content-map.c @@ -294,3 +294,40 @@ void cachefiles_shorten_content_map(struct cachefiles_= object *object, out: read_unlock_bh(&object->content_map_lock); } + +int cachefiles_cull_content_map(struct cachefiles_cache *cache, + struct dentry *dir, struct dentry *victim) +{ + struct dentry *map; + struct file *map_file; + size_t content_map_size =3D 0; + loff_t content_map_off =3D 0; + struct path path; + int ret; + + if (!d_is_reg(victim)) + return 0; + + ret =3D cachefiles_get_content_info(victim, &content_map_size, &content_m= ap_off); + if (ret || !content_map_size) + return ret; + + map =3D lookup_positive_unlocked("Map", dir, strlen("Map")); + if (IS_ERR(map)) + return PTR_ERR(map); + + path.mnt =3D cache->mnt; + path.dentry =3D map; + map_file =3D open_with_fake_path(&path, O_RDWR | O_LARGEFILE, + d_backing_inode(map), cache->cache_cred); + if (IS_ERR(map_file)) { + dput(map); + return PTR_ERR(map_file); + } + + ret =3D vfs_fallocate(map_file, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZ= E, + content_map_off, content_map_size); + + fput(map_file); + return ret; +} diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 7747f99f00c1..9c36631ee051 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -190,6 +190,8 @@ extern loff_t cachefiles_find_next_hole(struct cachefil= es_object *object, extern void cachefiles_invalidate_content_map(struct cachefiles_object *ob= ject); extern void cachefiles_shorten_content_map(struct cachefiles_object *objec= t, loff_t new_size); +extern int cachefiles_cull_content_map(struct cachefiles_cache 
*cache, + struct dentry *dir, struct dentry *victim); =20 /* * daemon.c @@ -384,6 +386,8 @@ extern int cachefiles_remove_object_xattr(struct cachef= iles_cache *cache, extern void cachefiles_prepare_to_write(struct fscache_cookie *cookie); extern bool cachefiles_set_volume_xattr(struct cachefiles_volume *volume); extern int cachefiles_check_volume_xattr(struct cachefiles_volume *volume); +extern int cachefiles_get_content_info(struct dentry *dentry, + size_t *content_map_size, loff_t *content_map_off); =20 /* * Error handling diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index f5e1ec1d9445..79c759468ab3 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -923,6 +923,10 @@ int cachefiles_cull(struct cachefiles_cache *cache, st= ruct dentry *dir, if (ret < 0) goto error_unlock; =20 + ret =3D cachefiles_cull_content_map(cache, dir, victim); + if (ret < 0) + goto error; + ret =3D cachefiles_bury_object(cache, NULL, dir, victim, FSCACHE_OBJECT_WAS_CULLED); if (ret < 0) diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c index 05ac6b70787a..b7091c8e4262 100644 --- a/fs/cachefiles/xattr.c +++ b/fs/cachefiles/xattr.c @@ -283,3 +283,20 @@ int cachefiles_check_volume_xattr(struct cachefiles_vo= lume *volume) _leave(" =3D %d", ret); return ret; } + +int cachefiles_get_content_info(struct dentry *dentry, size_t *content_map= _size, + loff_t *content_map_off) +{ + struct cachefiles_xattr buf; + ssize_t xlen, tlen =3D sizeof(buf); + + xlen =3D vfs_getxattr(&init_user_ns, dentry, cachefiles_xattr_cache, &buf= , tlen); + if (xlen !=3D tlen) + return -ESTALE; + + if (buf.content =3D=3D CACHEFILES_CONTENT_MAP) { + *content_map_off =3D be64_to_cpu(buf.content_map_off); + *content_map_size =3D be64_to_cpu(buf.content_map_size); + } + return 0; +} --=20 2.27.0
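As a worked check of the sizing rule from patch 5/9 (one bit per 4K granule, map size rounded up to whole PAGE_SIZE chunks), the following userspace sketch mirrors the arithmetic of cachefiles_map_size(). The MODEL_* constants and model_map_size() are hypothetical names, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE		4096ULL	/* assumed PAGE_SIZE */
#define MODEL_GRAN_SIZE		4096ULL	/* assumed CACHEFILES_GRAN_SIZE */
#define MODEL_BITS_PER_BYTE	8ULL

/* File space covered by one page-sized chunk of the map: 128 MiB. */
#define MODEL_CHUNK_SPAN \
	(MODEL_PAGE_SIZE * MODEL_BITS_PER_BYTE * MODEL_GRAN_SIZE)

/* Size in bytes of the content map needed for a file of i_size bytes. */
static uint64_t model_map_size(uint64_t i_size)
{
	uint64_t rounded;

	/* Guard against overflow in the round-up below. */
	assert(i_size <= UINT64_MAX - (MODEL_CHUNK_SPAN - 1));

	rounded = (i_size + MODEL_CHUNK_SPAN - 1) / MODEL_CHUNK_SPAN
		  * MODEL_CHUNK_SPAN;
	return rounded / MODEL_BITS_PER_BYTE / MODEL_GRAN_SIZE;
}
```

Under these assumptions a 1-byte file and a 128 MiB file both need a single 4096-byte chunk, while one byte more than 128 MiB needs two chunks (8192 bytes).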