From nobody Mon Feb 9 00:38:52 2026
From: Jiachen Zhang
To: Miklos Szeredi, Jonathan Corbet, linux-fsdevel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: me@jcix.top, Jiachen Zhang
Subject: [PATCH 1/5] fuse: check attributes staleness on fuse_iget()
Date: Tue, 11 Jul 2023 12:34:01 +0800
Message-Id: <20230711043405.66256-2-zhangjiachen.jaycee@bytedance.com>
In-Reply-To: <20230711043405.66256-1-zhangjiachen.jaycee@bytedance.com>
References: <20230711043405.66256-1-zhangjiachen.jaycee@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Function fuse_direntplus_link() might call fuse_iget() to initialize a new
fuse_inode and change its attributes.
If fi->attr_version is always initialized to 0, then even when the
attributes returned by the FUSE_READDIRPLUS request are stale,
fuse_change_attributes() will still apply them to the inode, because the
new fi->attr_version is 0. This wrong behaviour may cause file size
inconsistency even when there are no changes on the server side.

To reproduce the issue, consider the following two programs (A and B)
running concurrently:

        A                                       B
----------------------------------      --------------------------------
{ /fusemnt/dir/f is a file path in a fuse mount, the size of f is 0. }

readdir(/fusemnt/dir) start
// Daemon sets size 0 in f's direntry
                                        fallocate(f, 1024)
                                        stat(f) // B sees size 1024
                                        echo 2 > /proc/sys/vm/drop_caches
readdir(/fusemnt/dir) reply to kernel
Kernel sets 0 on the I_NEW inode
                                        stat(f) // B sees size 0

In the above case, only program B modifies the file size, yet B observes
the file size changing between its two read-only stat() calls. To fix
this issue, readdirplus must still follow the attr_version staleness rule
even if fi->attr_version has been lost due to inode eviction. So this
patch increases fc->attr_version on inode eviction, and compares the
request's attr_version with fc->attr_version when a FUSE_READDIRPLUS
request finishes.
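
For illustration only, the version comparison described above can be
modelled in userspace. This is a sketch under stated assumptions, not
kernel code: every model_* name is hypothetical, locking is omitted, and
the structs are simplified stand-ins for struct fuse_conn and struct
fuse_inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fuse_conn: one connection-wide counter. */
struct model_conn {
	uint64_t attr_version;
};

/* Hypothetical stand-in for struct fuse_inode. */
struct model_inode {
	uint64_t attr_version;	/* 0 for a freshly allocated inode */
	uint64_t size;
	bool attr_valid;	/* false: the next stat() must ask the server */
};

/* Snapshot taken when a request that returns attributes is sent. */
static uint64_t model_request_start(const struct model_conn *fc)
{
	return fc->attr_version;
}

/* Models the patched fuse_evict_inode(): eviction bumps the counter. */
static void model_evict(struct model_conn *fc)
{
	fc->attr_version++;
}

/*
 * Models fuse_iget() creating a new inode from a READDIRPLUS reply.
 * The per-inode version is 0, so only the connection-wide counter can
 * reveal that the reply predates an eviction.
 */
static struct model_inode model_iget(const struct model_conn *fc,
				     uint64_t reply_size,
				     uint64_t req_version)
{
	struct model_inode fi = {
		.attr_version = 0,
		.size = reply_size,
		.attr_valid = true,
	};

	/* A snapshot can never be ahead of the counter it was read from. */
	assert(req_version <= fc->attr_version);

	if (req_version < fc->attr_version)
		fi.attr_valid = false;	/* stale: force a fresh GETATTR */

	return fi;
}
```

In this model, a reply whose snapshot was taken before model_evict() ends
up with attr_valid == false, while a reply with a current snapshot is
trusted.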
Signed-off-by: Jiachen Zhang
---
 fs/fuse/inode.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 660be31aaabc..3e0b1fb1db17 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -115,6 +115,7 @@ static void fuse_free_inode(struct inode *inode)
 
 static void fuse_evict_inode(struct inode *inode)
 {
+	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_inode *fi = get_fuse_inode(inode);
 
 	/* Will write inode on close/munmap and in all other dirtiers */
@@ -137,6 +138,8 @@ static void fuse_evict_inode(struct inode *inode)
 		WARN_ON(!list_empty(&fi->write_files));
 		WARN_ON(!list_empty(&fi->queued_writes));
 	}
+
+	atomic64_inc(&fc->attr_version);
 }
 
 static int fuse_reconfigure(struct fs_context *fsc)
@@ -409,6 +412,10 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 	fi->nlookup++;
 	spin_unlock(&fi->lock);
 	fuse_change_attributes(inode, attr, attr_valid, attr_version);
+	spin_lock(&fi->lock);
+	if (attr_version < atomic64_read(&fc->attr_version))
+		fuse_invalidate_attr(inode);
+	spin_unlock(&fi->lock);
 
 	return inode;
 }
-- 
2.20.1

From nobody Mon Feb 9 00:38:52 2026
From: Jiachen Zhang
To: Miklos Szeredi, Jonathan Corbet, linux-fsdevel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: me@jcix.top, Jiachen Zhang
Subject: [PATCH 2/5] fuse: invalidate dentry on EEXIST creates or ENOENT deletes
Date: Tue, 11 Jul 2023 12:34:02 +0800
Message-Id: <20230711043405.66256-3-zhangjiachen.jaycee@bytedance.com>
In-Reply-To: <20230711043405.66256-1-zhangjiachen.jaycee@bytedance.com>
References: <20230711043405.66256-1-zhangjiachen.jaycee@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

An EEXIST error returned from the server is a strong sign that a local
negative dentry should be invalidated. Similarly, an ENOENT error from
the server can indicate a revalidation failure. This commit invalidates
dentries on EEXIST creates and ENOENT deletes by calling
fuse_invalidate_entry(), which improves consistency with no performance
degradation.
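
As a rough summary of the rule above, the decision can be written as one
predicate. This is a userspace sketch only: enum model_op and
model_should_invalidate() are hypothetical names that do not exist in the
kernel, and the real code makes this decision inline in each operation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical operation classes; not kernel identifiers. */
enum model_op {
	MODEL_OP_CREATE,	/* create-type operations */
	MODEL_OP_REMOVE,	/* unlink, rmdir */
	MODEL_OP_RENAME,
};

/*
 * Returns true when the cached dentry should be invalidated for the
 * given error code, following the rules described in this patch.
 */
static bool model_should_invalidate(enum model_op op, int err)
{
	/* Kernel-style error codes are zero or negative. */
	assert(err <= 0);

	switch (op) {
	case MODEL_OP_CREATE:
		/* Server says the name exists: the negative dentry is stale. */
		return err == -EEXIST;
	case MODEL_OP_REMOVE:
	case MODEL_OP_RENAME:
		/* -EINTR was already handled before this patch. */
		return err == -EINTR || err == -ENOENT;
	}
	return false;
}
```

Success (err == 0) never triggers invalidation here, because the success
paths update the dentry through other helpers.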
Signed-off-by: Jiachen Zhang
---
 fs/fuse/dir.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 5a4a7155cf1c..cfe38ee91ffd 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -755,7 +755,8 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 	if (err == -ENOSYS) {
 		fc->no_create = 1;
 		goto mknod;
-	}
+	} else if (err == -EEXIST)
+		fuse_invalidate_entry(entry);
 out_dput:
 	dput(res);
 	return err;
@@ -835,6 +836,8 @@ static int create_new_entry(struct fuse_mount *fm, struct fuse_args *args,
 	return 0;
 
 out_put_forget_req:
+	if (err == -EEXIST)
+		fuse_invalidate_entry(entry);
 	kfree(forget);
 	return err;
 }
@@ -986,7 +989,7 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
 	if (!err) {
 		fuse_dir_changed(dir);
 		fuse_entry_unlinked(entry);
-	} else if (err == -EINTR)
+	} else if (err == -EINTR || err == -ENOENT)
 		fuse_invalidate_entry(entry);
 	return err;
 }
@@ -1009,7 +1012,7 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry)
 	if (!err) {
 		fuse_dir_changed(dir);
 		fuse_entry_unlinked(entry);
-	} else if (err == -EINTR)
+	} else if (err == -EINTR || err == -ENOENT)
 		fuse_invalidate_entry(entry);
 	return err;
 }
@@ -1050,7 +1053,7 @@ static int fuse_rename_common(struct inode *olddir, struct dentry *oldent,
 		/* newent will end up negative */
 		if (!(flags & RENAME_EXCHANGE) && d_really_is_positive(newent))
 			fuse_entry_unlinked(newent);
-	} else if (err == -EINTR) {
+	} else if (err == -EINTR || err == -ENOENT) {
 		/* If request was interrupted, DEITY only knows if the
 		   rename actually took place.  If the invalidation
 		   fails (e.g. some process has CWD under the renamed
-- 
2.20.1

From nobody Mon Feb 9 00:38:52 2026
From: Jiachen Zhang
To: Miklos Szeredi, Jonathan Corbet, linux-fsdevel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: me@jcix.top, Jiachen Zhang
Subject: [PATCH 3/5] fuse: add FOPEN_INVAL_ATTR
Date: Tue, 11 Jul 2023 12:34:03 +0800
Message-Id: <20230711043405.66256-4-zhangjiachen.jaycee@bytedance.com>
In-Reply-To: <20230711043405.66256-1-zhangjiachen.jaycee@bytedance.com>
References: <20230711043405.66256-1-zhangjiachen.jaycee@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Add FOPEN_INVAL_ATTR so that the fuse daemon can ask the kernel to
invalidate the attr cache on file open.

The fi->attr_version should be increased when handling FOPEN_INVAL_ATTR.
Otherwise, if a FUSE request that returns attributes (getattr, setattr,
lookup, or readdirplus) starts before a FUSE_OPEN reply carrying
FOPEN_INVAL_ATTR but finishes after it, stale attributes will be set on
the inode and the inval_mask will be wrongly cleared.

Signed-off-by: Jiachen Zhang
---
 fs/fuse/file.c            | 10 ++++++++++
 include/uapi/linux/fuse.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index de37a3a06a71..412824a11b7b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -215,6 +215,16 @@ void fuse_finish_open(struct inode *inode, struct file *file)
 		file_update_time(file);
 		fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
 	}
+
+	if (ff->open_flags & FOPEN_INVAL_ATTR) {
+		struct fuse_inode *fi = get_fuse_inode(inode);
+
+		spin_lock(&fi->lock);
+		fi->attr_version = atomic64_inc_return(&fc->attr_version);
+		fuse_invalidate_attr(inode);
+		spin_unlock(&fi->lock);
+	}
+
 	if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
 		fuse_link_write_file(file);
 }
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index b3fcab13fcd3..1a24c11637a4 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -315,6 +315,7 @@ struct fuse_file_lock {
  * FOPEN_STREAM: the file is stream-like (no file position at all)
  * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
  * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
+ * FOPEN_INVAL_ATTR: invalidate the attr cache on open
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
@@ -323,6 +324,7 @@ struct fuse_file_lock {
 #define FOPEN_STREAM		(1 << 4)
 #define FOPEN_NOFLUSH		(1 << 5)
 #define FOPEN_PARALLEL_DIRECT_WRITES	(1 << 6)
+#define FOPEN_INVAL_ATTR	(1 << 7)
 
 /**
  * INIT request/reply flags
-- 
2.20.1

From nobody Mon Feb 9 00:38:52 2026
From: Jiachen Zhang
To: Miklos Szeredi, Jonathan Corbet, linux-fsdevel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: me@jcix.top, Jiachen Zhang
Subject: [PATCH 4/5] fuse: writeback_cache consistency enhancement (writeback_cache_v2)
Date: Tue, 11 Jul 2023 12:34:04 +0800
Message-Id: <20230711043405.66256-5-zhangjiachen.jaycee@bytedance.com>
In-Reply-To: <20230711043405.66256-1-zhangjiachen.jaycee@bytedance.com>
References: <20230711043405.66256-1-zhangjiachen.jaycee@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Some users may want both the high performance of the writeback_cache mode
and a little more consistency among FUSE mounts. The current
writeback_cache mode never updates attributes from the server, so it can
never see file attributes changed by other FUSE mounts, which means
'zero consistency'.

This commit introduces the writeback_cache_v2 mode, which allows
attributes to be updated from the server to the kernel when the inode is
clean and no writeback is in progress. FUSE daemons can select this mode
with the FUSE_WRITEBACK_CACHE_V2 init flag.

In writeback_cache_v2 mode, the server generates the official attributes.
Therefore,

    1. For the cmtime, the values generated by the kernel are only
    temporary: they are never flushed to the server by
    fuse_write_inode(), and they can eventually be replaced by the
    official server cmtime. The mtime-based revalidation of the
    fc->auto_inval_data mode is also skipped, as the kernel-generated
    temporary cmtime is unlikely to equal the official server cmtime.

    2. For the file size, we expect the server to update its file size
    on FUSE_WRITEs. So we increase fi->attr_version in
    fuse_writepage_end() to check the staleness of the returned file
    size.

Together with FOPEN_INVAL_ATTR, a FUSE daemon is able to implement
close-to-open (CTO) consistency like NFS client implementations.
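
The refresh rule above can be sketched as a small userspace model. It is
an illustration under stated assumptions, not the kernel implementation:
the MODEL_* flags, struct model_state and model_cache_mask() are
hypothetical names, and the wb_attr_mutex and fi->lock locking is left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the kernel uses STATX_* and I_* instead. */
#define MODEL_MTIME		(1u << 0)
#define MODEL_CTIME		(1u << 1)
#define MODEL_SIZE		(1u << 2)

#define MODEL_I_DIRTY_DATASYNC	(1u << 0)
#define MODEL_I_DIRTY_PAGES	(1u << 1)
#define MODEL_I_SYNC		(1u << 2)

struct model_state {
	bool writeback_cache;
	bool writeback_cache_v2;
	bool is_regular_file;
	uint32_t i_state;
	unsigned int writes_in_flight;	/* models a non-empty fi->writepages */
};

/*
 * Returns the attributes for which the kernel's cached value wins over
 * the value carried in a server reply.
 */
static uint32_t model_cache_mask(const struct model_state *s)
{
	const uint32_t all = MODEL_MTIME | MODEL_CTIME | MODEL_SIZE;
	const uint32_t busy = MODEL_I_DIRTY_PAGES | MODEL_I_SYNC |
			      MODEL_I_DIRTY_DATASYNC;
	uint32_t mask = 0;

	/* v2 implies writeback_cache, as set up in process_init_reply(). */
	assert(!s->writeback_cache_v2 || s->writeback_cache);

	if (s->writeback_cache && s->is_regular_file)
		mask = all;

	/* Clean inode with no outstanding writes: accept server values. */
	if (s->writeback_cache_v2 && s->is_regular_file &&
	    !(s->i_state & busy) && s->writes_in_flight == 0)
		mask &= ~all;

	return mask;
}
```

With the original writeback_cache mode the mask always covers mtime,
ctime and size for regular files; in this model of the v2 mode it drops
to zero only while the inode is clean and idle.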

Signed-off-by: Jiachen Zhang
---
 fs/fuse/file.c            | 25 +++++++++++++++++++++++
 fs/fuse/fuse_i.h          |  6 ++++++
 fs/fuse/inode.c           | 42 +++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/fuse.h |  9 ++++++++-
 4 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 412824a11b7b..09416caea575 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1914,6 +1914,10 @@ static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
 		 */
 		fuse_send_writepage(fm, next, inarg->offset + inarg->size);
 	}
+
+	if (fc->writeback_cache_v2)
+		fi->attr_version = atomic64_inc_return(&fc->attr_version);
+
 	fi->writectr--;
 	fuse_writepage_finish(fm, wpa);
 	spin_unlock(&fi->lock);
@@ -1943,10 +1947,18 @@ static struct fuse_file *fuse_write_file_get(struct fuse_inode *fi)
 
 int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
 {
+	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_file *ff;
 	int err;
 
+	/*
+	 * Kernel c/mtime should not be updated to the server in the
+	 * writeback_cache_v2 mode as server c/mtime are official.
+	 */
+	if (fc->writeback_cache_v2)
+		return 0;
+
 	/*
 	 * Inode is always written before the last reference is dropped and
 	 * hence this should not be reached from reclaim.
@@ -2375,11 +2387,14 @@ static int fuse_write_begin(struct file *file, struct address_space *mapping,
 {
 	pgoff_t index = pos >> PAGE_SHIFT;
 	struct fuse_conn *fc = get_fuse_conn(file_inode(file));
+	struct fuse_inode *fi = get_fuse_inode(file_inode(file));
 	struct page *page;
 	loff_t fsize;
 	int err = -ENOMEM;
 
 	WARN_ON(!fc->writeback_cache);
+	if (fc->writeback_cache_v2)
+		mutex_lock(&fi->wb_attr_mutex);
 
 	page = grab_cache_page_write_begin(mapping, index);
 	if (!page)
@@ -2411,6 +2426,9 @@ static int fuse_write_begin(struct file *file, struct address_space *mapping,
 	unlock_page(page);
 	put_page(page);
 error:
+	if (fc->writeback_cache_v2)
+		mutex_unlock(&fi->wb_attr_mutex);
+
 	return err;
 }
 
@@ -2419,6 +2437,7 @@ static int fuse_write_end(struct file *file, struct address_space *mapping,
 		struct page *page, void *fsdata)
 {
 	struct inode *inode = page->mapping->host;
+	struct fuse_conn *fc = get_fuse_conn(inode);
 
 	/* Haven't copied anything?  Skip zeroing, size extending, dirtying. */
 	if (!copied)
@@ -2442,6 +2461,12 @@ static int fuse_write_end(struct file *file, struct address_space *mapping,
 	unlock_page(page);
 	put_page(page);
 
+	if (fc->writeback_cache_v2) {
+		struct fuse_inode *fi = get_fuse_inode(inode);
+
+		mutex_unlock(&fi->wb_attr_mutex);
+	}
+
 	return copied;
 }
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 9b7fc7d3c7f1..200be199eb93 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -155,6 +155,9 @@ struct fuse_inode {
 	 */
 	struct fuse_inode_dax *dax;
 #endif
+
+	/** Lock for serializing size updates in writeback_cache_v2 mode */
+	struct mutex wb_attr_mutex;
 };
 
 /** FUSE inode state bits */
@@ -656,6 +659,9 @@ struct fuse_conn {
 	/* show legacy mount options */
 	unsigned int legacy_opts_show:1;
 
+	/* Improved writeback cache policy */
+	unsigned writeback_cache_v2:1;
+
 	/*
 	 * fs kills suid/sgid/cap on write/chown/trunc. suid is killed on
 	 * write/trunc only if caller did not have CAP_FSETID.  sgid is killed
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 3e0b1fb1db17..958f8534a585 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -84,6 +84,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
 	fi->orig_ino = 0;
 	fi->state = 0;
 	mutex_init(&fi->mutex);
+	mutex_init(&fi->wb_attr_mutex);
 	spin_lock_init(&fi->lock);
 	fi->forget = fuse_alloc_forget();
 	if (!fi->forget)
@@ -246,14 +247,36 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 	u32 cache_mask;
 	loff_t oldsize;
 	struct timespec64 old_mtime;
+	bool try_wb_update = false;
+
+	if (fc->writeback_cache_v2 && S_ISREG(inode->i_mode)) {
+		mutex_lock(&fi->wb_attr_mutex);
+		try_wb_update = true;
+	}
 
 	spin_lock(&fi->lock);
 	/*
 	 * In case of writeback_cache enabled, writes update mtime, ctime and
 	 * may update i_size.  In these cases trust the cached value in the
 	 * inode.
+	 *
+	 * In writeback_cache_v2 mode, if all the following conditions are met,
+	 * then we allow the attributes to be refreshed:
+	 *
+	 * - inode is not in the process of being written (I_SYNC)
+	 * - inode has no dirty pages (I_DIRTY_PAGES)
+	 * - inode data-related attributes are clean (I_DIRTY_DATASYNC)
+	 * - inode does not have any page writeback in progress
+	 *
+	 * Note: checking PAGECACHE_TAG_WRITEBACK is not sufficient in fuse,
+	 * since inode can appear to have no PageWriteback pages, yet still have
+	 * outstanding write request.
 	 */
 	cache_mask = fuse_get_cache_mask(inode);
+	if (try_wb_update && !(inode->i_state & (I_DIRTY_PAGES | I_SYNC |
+	    I_DIRTY_DATASYNC)) && RB_EMPTY_ROOT(&fi->writepages))
+		cache_mask &= ~(STATX_MTIME | STATX_CTIME | STATX_SIZE);
+
 	if (cache_mask & STATX_SIZE)
 		attr->size = i_size_read(inode);
 
@@ -269,6 +292,8 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 	if ((attr_version != 0 && fi->attr_version > attr_version) ||
 	    test_bit(FUSE_I_SIZE_UNSTABLE, &fi->state)) {
 		spin_unlock(&fi->lock);
+		if (try_wb_update)
+			mutex_unlock(&fi->wb_attr_mutex);
 		return;
 	}
 
@@ -292,7 +317,13 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 			truncate_pagecache(inode, attr->size);
 			if (!fc->explicit_inval_data)
 				inval = true;
-		} else if (fc->auto_inval_data) {
+		} else if (!fc->writeback_cache_v2 && fc->auto_inval_data) {
+			/*
+			 * When fc->writeback_cache_v2 is set, the old_mtime
+			 * can be generated by kernel and must not equal to
+			 * new_mtime generated by server. So skip in such
+			 * case.
+ */ struct timespec64 new_mtime =3D { .tv_sec =3D attr->mtime, .tv_nsec =3D attr->mtimensec, @@ -312,6 +343,9 @@ void fuse_change_attributes(struct inode *inode, struct= fuse_attr *attr, =20 if (IS_ENABLED(CONFIG_FUSE_DAX)) fuse_dax_dontcache(inode, attr->flags); + + if (try_wb_update) + mutex_unlock(&fi->wb_attr_mutex); } =20 static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr, @@ -1179,6 +1213,10 @@ static void process_init_reply(struct fuse_mount *fm= , struct fuse_args *args, fc->async_dio =3D 1; if (flags & FUSE_WRITEBACK_CACHE) fc->writeback_cache =3D 1; + if (flags & FUSE_WRITEBACK_CACHE_V2) { + fc->writeback_cache =3D 1; + fc->writeback_cache_v2 =3D 1; + } if (flags & FUSE_PARALLEL_DIROPS) fc->parallel_dirops =3D 1; if (flags & FUSE_HANDLE_KILLPRIV) @@ -1262,7 +1300,7 @@ void fuse_send_init(struct fuse_mount *fm) FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA | FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT | FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP | - FUSE_HAS_EXPIRE_ONLY; + FUSE_HAS_EXPIRE_ONLY | FUSE_WRITEBACK_CACHE_V2; #ifdef CONFIG_FUSE_DAX if (fm->fc->dax) flags |=3D FUSE_MAP_ALIGNMENT; diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 1a24c11637a4..850a3c0f87fb 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -207,6 +207,9 @@ * - add FUSE_EXT_GROUPS * - add FUSE_CREATE_SUPP_GROUP * - add FUSE_HAS_EXPIRE_ONLY + * + * 7.39 + * - add FUSE_WRITEBACK_CACHE_V2 init flag */ =20 #ifndef _LINUX_FUSE_H @@ -242,7 +245,7 @@ #define FUSE_KERNEL_VERSION 7 =20 /** Minor version number of this interface */ -#define FUSE_KERNEL_MINOR_VERSION 38 +#define FUSE_KERNEL_MINOR_VERSION 39 =20 /** The node ID of the root inode */ #define FUSE_ROOT_ID 1 @@ -373,6 +376,9 @@ struct fuse_file_lock { * FUSE_CREATE_SUPP_GROUP: add supplementary group info to create, mkdir, * symlink and mknod (single group that matches parent) * FUSE_HAS_EXPIRE_ONLY: kernel supports expiry-only entry 
invalidation
+ * FUSE_WRITEBACK_CACHE_V2:
+ * allow time/size to be refreshed if no pending write
+ * c/mtime not updated from kernel to server
  */
 #define FUSE_ASYNC_READ (1 << 0)
 #define FUSE_POSIX_LOCKS (1 << 1)
@@ -411,6 +417,7 @@ struct fuse_file_lock {
 #define FUSE_HAS_INODE_DAX (1ULL << 33)
 #define FUSE_CREATE_SUPP_GROUP (1ULL << 34)
 #define FUSE_HAS_EXPIRE_ONLY (1ULL << 35)
+#define FUSE_WRITEBACK_CACHE_V2 (1ULL << 36)
 
 /**
  * CUSE INIT request/reply flags
-- 
2.20.1

From nobody Mon Feb 9 00:38:52 2026
From: Jiachen Zhang
To: Miklos Szeredi, Jonathan Corbet, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: me@jcix.top, Jiachen Zhang
Subject: [PATCH 5/5] docs: fuse: improve FUSE consistency explanation
Date: Tue, 11 Jul 2023 12:34:05 +0800
Message-Id: <20230711043405.66256-6-zhangjiachen.jaycee@bytedance.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20230711043405.66256-1-zhangjiachen.jaycee@bytedance.com>
References:
 <20230711043405.66256-1-zhangjiachen.jaycee@bytedance.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Jiachen Zhang
---
 Documentation/filesystems/fuse-io.rst | 32 +++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/fuse-io.rst b/Documentation/filesystems/fuse-io.rst
index 255a368fe534..cdd292dd2e9c 100644
--- a/Documentation/filesystems/fuse-io.rst
+++ b/Documentation/filesystems/fuse-io.rst
@@ -10,6 +10,10 @@ Fuse supports the following I/O modes:
 - cached
   + write-through
   + writeback-cache
+  + writeback-cache-v2
+
+Direct-io Mode
+==============
 
 The direct-io mode can be selected with the FOPEN_DIRECT_IO flag in the
 FUSE_OPEN reply.
@@ -17,6 +21,9 @@ FUSE_OPEN reply.
 In direct-io mode the page cache is completely bypassed for reads and writes.
 No read-ahead takes place. Shared mmap is disabled.
 
+Cached Modes and Cache Coherence
+================================
+
 In cached mode reads may be satisfied from the page cache, and data may be
 read-ahead by the kernel to fill the cache. The cache is always kept consistent
 after any writes to the file. All mmap modes are supported.
@@ -24,7 +31,8 @@ after any writes to the file. All mmap modes are supported.
 The cached mode has two sub modes controlling how writes are handled. The
 write-through mode is the default and is supported on all kernels. The
 writeback-cache mode may be selected by the FUSE_WRITEBACK_CACHE flag in the
-FUSE_INIT reply.
+FUSE_INIT reply. In either mode, if the FOPEN_KEEP_CACHE flag is not set in
+the FUSE_OPEN reply, cached pages of the file are invalidated immediately.
 
 In write-through mode each write is immediately sent to userspace as one or more
 WRITE requests, as well as updating any cached pages (and caching previously
@@ -38,7 +46,27 @@ reclaim on memory pressure) or explicitly (invoked by close(2), fsync(2) and
 when the last ref to the file is being released on munmap(2)). This mode
 assumes that all changes to the filesystem go through the FUSE kernel module
 (size and atime/ctime/mtime attributes are kept up-to-date by the kernel), so
-it's generally not suitable for network filesystems. If a partial page is
+it's generally not suitable for network filesystems (consider the
+writeback-cache-v2 mode described later for them). If a partial page is
 written, then the page needs to be first read from userspace. This means, that
 even for files opened for O_WRONLY it is possible that READ requests will be
 generated by the kernel.
+
+Writeback-cache-v2 mode (enabled by the FUSE_WRITEBACK_CACHE_V2 flag) retains
+the dirty page management logic of the writeback-cache mode, which provides
+great write performance. Furthermore, the v2 mode improves cache coherence for
+scenarios with multiple FUSE mounts, especially for network filesystems. The
+kernel's a/c/mtime and size attributes may be refreshed from the filesystem
+either on timeout or when they have been explicitly invalidated. Meanwhile,
+attributes updated locally by the kernel are not propagated to the
+filesystem. In other words, the filesystem rather than the kernel is considered
+the authoritative source of these attributes.
+
+By combining the writeback-cache-v2 mode with the appropriate open flags
+(FOPEN_KEEP_CACHE and FOPEN_INVAL_ATTR, for keeping the page cache and
+invalidating attributes on FUSE_OPEN respectively), filesystems are able to
+implement close-to-open (CTO) consistency semantics, which are widely supported
+by NFS client implementations. This preserves writeback of dirty pages while
+ensuring cache coherence of attributes and file data, provided that operations
+on a file from different FUSE mounts are properly serialized by users in an
+open-after-close manner.
-- 
2.20.1
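
As a rough illustration of how a FUSE server might detect the flag proposed
in this series, the sketch below joins the two 32-bit INIT flag words into
one 64-bit value and tests bit 36. It assumes the layout used by recent
protocol versions, where flag bits 32 and above travel in a separate flags2
word. The helper names fuse_join_flags(), fuse_offers_wbc_v2() and
fuse_wbc_v2_selfcheck() are hypothetical; they are not part of this series
or of libfuse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value as proposed in patch 4/5 of this series. */
#define FUSE_WRITEBACK_CACHE_V2 (1ULL << 36)

/* Hypothetical helper: join the low and high 32-bit INIT flag words. */
static inline uint64_t fuse_join_flags(uint32_t flags, uint32_t flags2)
{
	return (uint64_t)flags | ((uint64_t)flags2 << 32);
}

/* Hypothetical helper: true if the peer advertised writeback-cache-v2. */
static inline bool fuse_offers_wbc_v2(uint32_t flags, uint32_t flags2)
{
	return (fuse_join_flags(flags, flags2) & FUSE_WRITEBACK_CACHE_V2) != 0;
}

/* Small self-check of the bit arithmetic; bit 36 is bit 4 of flags2. */
static inline void fuse_wbc_v2_selfcheck(void)
{
	assert(fuse_offers_wbc_v2(0, 1u << 4));
	assert(!fuse_offers_wbc_v2(UINT32_MAX, 0));
}
```

A server would only set the flag in its own INIT reply when the kernel
offered it first, so a check of this kind would typically sit in the INIT
handler before any open flags such as FOPEN_KEEP_CACHE are chosen.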