From: Greg Kurz
To: qemu-devel@nongnu.org
Cc: Sebastian Hasler, Greg Kurz, "Dr. David Alan Gilbert",
    virtio-fs@redhat.com, Stefan Hajnoczi, Vivek Goyal
Subject: [PATCH v4 1/2] virtiofsd: Track mounts
Date: Tue, 25 Jan 2022 15:12:11 +0100
Message-Id: <20220125141213.361930-2-groug@kaod.org>
In-Reply-To: <20220125141213.361930-1-groug@kaod.org>

The upcoming implementation of ->syncfs() needs to know about all
submounts in order to call syncfs() on them when virtiofsd is started
without '-o announce_submounts'.

Track every inode that comes up with a new mount id in a GHashTable.
If the mount id isn't available, e.g. because the host has no statx(),
fall back to the device id for the key. This is done during lookup
because we only care about the submounts that the client knows about.

The inode is removed from the hash table when it is ultimately
unreferenced. This can happen on a per-mount basis when the client
posts a FUSE_FORGET request, or for all submounts at once with
FUSE_DESTROY.
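The bookkeeping described above can be sketched in plain C. This is a minimal approximation, not virtiofsd code: the demo_* names are hypothetical, and a small fixed-size array stands in for the GHashTable. It keeps the first inode seen for each mount key, flags that inode with is_mnt, and only removes the entry when a flagged inode goes away. Keys are stored by value in the sketch because GHashTable keeps the key pointer passed to g_hash_table_insert() rather than copying the key, so with g_int64_hash the pointed-to uint64_t must stay valid for as long as the entry exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_MOUNTS 16

/* Hypothetical, trimmed-down stand-in for struct lo_inode. */
struct demo_inode {
    uint64_t mnt_id;   /* 0 when the host could not report a mount id */
    uint64_t dev;      /* st_dev, used as the fallback key */
    bool is_mnt;       /* set on the inode that represents its mount */
};

/* Hypothetical stand-in for lo->mnt_inodes; keys are stored by value. */
struct demo_mnt_table {
    uint64_t keys[DEMO_MAX_MOUNTS];
    struct demo_inode *inodes[DEMO_MAX_MOUNTS];
    size_t count;
};

/* Prefer the mount id, fall back to the device id. */
static uint64_t demo_mnt_key(const struct demo_inode *inode)
{
    return inode->mnt_id ? inode->mnt_id : inode->dev;
}

/* Return the slot holding @key, or -1 if that mount is not tracked. */
static ptrdiff_t demo_find(const struct demo_mnt_table *t, uint64_t key)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->keys[i] == key) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}

/* On lookup: the first inode seen on a given mount represents that mount. */
static void demo_add_mnt_inode(struct demo_mnt_table *t,
                               struct demo_inode *inode)
{
    uint64_t key;

    assert(t != NULL && inode != NULL);
    key = demo_mnt_key(inode);
    if (demo_find(t, key) < 0 && t->count < DEMO_MAX_MOUNTS) {
        t->keys[t->count] = key;      /* copied, so it outlives this call */
        t->inodes[t->count] = inode;
        t->count++;
        inode->is_mnt = true;
    }
}

/* On final unref: only the representative inode removes its mount entry. */
static void demo_remove_mnt_inode(struct demo_mnt_table *t,
                                  struct demo_inode *inode)
{
    ptrdiff_t i;

    if (!inode->is_mnt) {
        return;
    }
    i = demo_find(t, demo_mnt_key(inode));
    if (i >= 0 && t->inodes[i] == inode) {
        /* Swap-remove: move the last entry into the freed slot. */
        t->count--;
        t->keys[i] = t->keys[t->count];
        t->inodes[i] = t->inodes[t->count];
    }
    inode->is_mnt = false;
}
```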

Signed-off-by: Greg Kurz
---
 tools/virtiofsd/passthrough_ll.c | 43 +++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 64b5b4fbb186..7bf31fc129c8 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -117,6 +117,7 @@ struct lo_inode {
     GHashTable *posix_locks; /* protected by lo_inode->plock_mutex */
 
     mode_t filetype;
+    bool is_mnt;
 };
 
 struct lo_cred {
@@ -164,6 +165,7 @@ struct lo_data {
     bool use_statx;
     struct lo_inode root;
     GHashTable *inodes; /* protected by lo->mutex */
+    GHashTable *mnt_inodes; /* protected by lo->mutex */
     struct lo_map ino_map; /* protected by lo->mutex */
     struct lo_map dirp_map; /* protected by lo->mutex */
     struct lo_map fd_map; /* protected by lo->mutex */
@@ -1000,6 +1002,31 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
     return 0;
 }
 
+static uint64_t mnt_inode_key(struct lo_inode *inode)
+{
+    /* Prefer mnt_id, fallback on dev */
+    return inode->key.mnt_id ? inode->key.mnt_id : inode->key.dev;
+}
+
+static void add_mnt_inode(struct lo_data *lo, struct lo_inode *inode)
+{
+    uint64_t mnt_key = mnt_inode_key(inode);
+
+    if (!g_hash_table_contains(lo->mnt_inodes, &mnt_key)) {
+        inode->is_mnt = true;
+        g_hash_table_insert(lo->mnt_inodes, &mnt_key, inode);
+    }
+}
+
+static void remove_mnt_inode(struct lo_data *lo, struct lo_inode *inode)
+{
+    uint64_t mnt_key = mnt_inode_key(inode);
+
+    if (inode->is_mnt) {
+        g_hash_table_remove(lo->mnt_inodes, &mnt_key);
+    }
+}
+
 /*
  * Increments nlookup on the inode on success. unref_inode_lolocked() must be
  * called eventually to decrement nlookup again. If inodep is non-NULL, the
@@ -1086,10 +1113,15 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         pthread_mutex_lock(&lo->mutex);
         inode->fuse_ino = lo_add_inode_mapping(req, inode);
         g_hash_table_insert(lo->inodes, &inode->key, inode);
+        add_mnt_inode(lo, inode);
         pthread_mutex_unlock(&lo->mutex);
     }
     e->ino = inode->fuse_ino;
 
+    fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli%s\n",
+             (unsigned long long) parent, name, (unsigned long long) e->ino,
+             inode->is_mnt ? " (submount)" : "");
+
     /* Transfer ownership of inode pointer to caller or drop it */
     if (inodep) {
         *inodep = inode;
@@ -1099,9 +1131,6 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 
     lo_inode_put(lo, &dir);
 
-    fuse_log(FUSE_LOG_DEBUG, "  %lli/%s -> %lli\n", (unsigned long long)parent,
-             name, (unsigned long long)e->ino);
-
     return 0;
 
 out_err:
@@ -1563,6 +1592,7 @@ static void unref_inode(struct lo_data *lo, struct lo_inode *inode, uint64_t n)
             g_hash_table_destroy(inode->posix_locks);
             pthread_mutex_destroy(&inode->plock_mutex);
         }
+        remove_mnt_inode(lo, inode);
         /* Drop our refcount from lo_do_lookup() */
         lo_inode_put(lo, &inode);
     }
@@ -3337,6 +3367,7 @@ static void lo_destroy(void *userdata)
     struct lo_data *lo = (struct lo_data *)userdata;
 
     pthread_mutex_lock(&lo->mutex);
+    g_hash_table_remove_all(lo->mnt_inodes);
     while (true) {
         GHashTableIter iter;
         gpointer key, value;
@@ -3850,6 +3881,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
         root->posix_locks = g_hash_table_new_full(
             g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
     }
+    add_mnt_inode(lo, root);
 }
 
 static guint lo_key_hash(gconstpointer key)
@@ -3869,6 +3901,10 @@ static gboolean lo_key_equal(gconstpointer a, gconstpointer b)
 
 static void fuse_lo_data_cleanup(struct lo_data *lo)
 {
+    if (lo->mnt_inodes) {
+        g_hash_table_destroy(lo->mnt_inodes);
+    }
+
     if (lo->inodes) {
         g_hash_table_destroy(lo->inodes);
     }
@@ -3931,6 +3967,7 @@ int main(int argc, char *argv[])
     lo.root.fd = -1;
     lo.root.fuse_ino = FUSE_ROOT_ID;
     lo.cache = CACHE_AUTO;
+    lo.mnt_inodes = g_hash_table_new(g_int64_hash, g_int64_equal);
 
     /*
      * Set up the ino map like this:
-- 
2.34.1

From: Greg Kurz
To: qemu-devel@nongnu.org
Cc: Sebastian Hasler, Greg Kurz, "Dr. David Alan Gilbert",
    virtio-fs@redhat.com, Stefan Hajnoczi, Vivek Goyal
Subject: [PATCH v4 2/2] virtiofsd: Add support for FUSE_SYNCFS request
Date: Tue, 25 Jan 2022 15:12:12 +0100
Message-Id: <20220125141213.361930-3-groug@kaod.org>
In-Reply-To: <20220125141213.361930-1-groug@kaod.org>

Honor the expected behavior of syncfs(), which synchronously flushes all
data and metadata on Linux systems.

If virtiofsd is started with '-o announce_submounts', the client is
expected to send a FUSE_SYNCFS request for each individual submount.
In this case, we just create a new file descriptor on the submount inode
with lo_inode_open(), call syncfs() on it and close it. The intermediate
file descriptor is needed because O_PATH descriptors aren't backed by an
actual open file, and syncfs() would fail on them with EBADF.

If virtiofsd is started without '-o announce_submounts', the client only
sends a single FUSE_SYNCFS request, for the root inode. In this case, we
need to loop over all known submounts to sync them. We cannot call
syncfs() with lo->mutex held since it could stall virtiofsd for an
unbounded amount of time: we build the list of inodes with the mutex
held, drop the mutex and then loop over the temporary list. A reference
must be taken on each inode to ensure it doesn't go away once the mutex
is dropped.

Note that syncfs() might suffer a time penalty if the submounts are
being hammered by some unrelated workload on the host. The only way to
prevent that is to avoid shared mounts.

Signed-off-by: Greg Kurz
---
 tools/virtiofsd/fuse_lowlevel.c       | 11 +++
 tools/virtiofsd/fuse_lowlevel.h       | 13 ++++
 tools/virtiofsd/passthrough_ll.c      | 98 +++++++++++++++++++++++++++
 tools/virtiofsd/passthrough_seccomp.c |  1 +
 4 files changed, 123 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index e4679c73abc2..e02d8b25a5f6 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1876,6 +1876,16 @@ static void do_lseek(fuse_req_t req, fuse_ino_t nodeid,
     }
 }
 
+static void do_syncfs(fuse_req_t req, fuse_ino_t nodeid,
+                      struct fuse_mbuf_iter *iter)
+{
+    if (req->se->op.syncfs) {
+        req->se->op.syncfs(req, nodeid);
+    } else {
+        fuse_reply_err(req, ENOSYS);
+    }
+}
+
 static void do_init(fuse_req_t req, fuse_ino_t nodeid,
                     struct fuse_mbuf_iter *iter)
 {
@@ -2280,6 +2290,7 @@ static struct {
     [FUSE_RENAME2] = { do_rename2, "RENAME2" },
     [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
     [FUSE_LSEEK] = { do_lseek, "LSEEK" },
+    [FUSE_SYNCFS] = { do_syncfs, "SYNCFS" },
 };
 
 #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index c55c0ca2fc1c..b889dae4de0e 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1226,6 +1226,19 @@ struct fuse_lowlevel_ops {
      */
     void (*lseek)(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
                   struct fuse_file_info *fi);
+
+    /**
+     * Synchronize file system content
+     *
+     * If this request is answered with an error code of ENOSYS,
+     * this is treated as success and future calls to syncfs() will
+     * succeed automatically without being sent to the filesystem
+     * process.
+     *
+     * @param req request handle
+     * @param ino the inode number
+     */
+    void (*syncfs)(fuse_req_t req, fuse_ino_t ino);
 };
 
 /**
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 7bf31fc129c8..9021eb091a28 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3362,6 +3362,103 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
     }
 }
 
+static int do_syncfs(struct lo_data *lo, struct lo_inode *inode)
+{
+    int fd, err = 0;
+
+    fuse_log(FUSE_LOG_DEBUG, "lo_syncfs(ino=%" PRIu64 ")\n", inode->fuse_ino);
+
+    fd = lo_inode_open(lo, inode, O_RDONLY);
+    if (fd < 0) {
+        return -fd;
+    }
+
+    if (syncfs(fd) < 0) {
+        err = errno;
+    }
+
+    close(fd);
+    return err;
+}
+
+struct syncfs_func_data {
+    struct lo_data *lo;
+    int err;
+};
+
+static void syncfs_func(gpointer data, gpointer user_data)
+{
+    struct syncfs_func_data *sfdata = user_data;
+    struct lo_data *lo = sfdata->lo;
+    struct lo_inode *inode = data;
+
+    if (!sfdata->err) {
+        sfdata->err = do_syncfs(lo, inode);
+    }
+
+    lo_inode_put(lo, &inode);
+}
+
+static int lo_syncfs_all(fuse_req_t req)
+{
+    struct lo_data *lo = lo_data(req);
+    GHashTableIter iter;
+    gpointer key, value;
+    GSList *list = NULL;
+    struct syncfs_func_data sfdata = {
+        .lo = lo,
+        .err = 0,
+    };
+
+    pthread_mutex_lock(&lo->mutex);
+
+    g_hash_table_iter_init(&iter, lo->mnt_inodes);
+    while (g_hash_table_iter_next(&iter, &key, &value)) {
+        struct lo_inode *inode = value;
+
+        /* Reference is put in syncfs_func() */
+        g_atomic_int_inc(&inode->refcount);
+        list = g_slist_prepend(list, inode);
+    }
+
+    pthread_mutex_unlock(&lo->mutex);
+
+    g_slist_foreach(list, syncfs_func, &sfdata);
+    g_slist_free(list);
+    return sfdata.err;
+}
+
+static int lo_syncfs_one(fuse_req_t req, fuse_ino_t ino)
+{
+    struct lo_data *lo = lo_data(req);
+    struct lo_inode *inode;
+    int err;
+
+    inode = lo_inode(req, ino);
+    if (!inode) {
+        return EBADF;
+    }
+
+    err = do_syncfs(lo, inode);
+    lo_inode_put(lo, &inode);
+    return err;
+}
+
+static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
+{
+    struct lo_data *lo = lo_data(req);
+    int err;
+
+    if (lo->announce_submounts) {
+        err = lo_syncfs_one(req, ino);
+    } else {
+        err = lo_syncfs_all(req);
+    }
+
+    fuse_reply_err(req, err);
+}
+
+
 static void lo_destroy(void *userdata)
 {
     struct lo_data *lo = (struct lo_data *)userdata;
@@ -3423,6 +3520,7 @@ static struct fuse_lowlevel_ops lo_oper = {
     .copy_file_range = lo_copy_file_range,
 #endif
     .lseek = lo_lseek,
+    .syncfs = lo_syncfs,
     .destroy = lo_destroy,
 };
 
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index a3ce9f898d2d..3e9d6181dc69 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -108,6 +108,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(set_robust_list),
     SCMP_SYS(setxattr),
     SCMP_SYS(symlinkat),
+    SCMP_SYS(syncfs),
     SCMP_SYS(time), /* Rarely needed, except on static builds */
     SCMP_SYS(tgkill),
     SCMP_SYS(unlinkat),
-- 
2.34.1
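The snapshot-then-sync scheme from the second commit message can be sketched in plain C. This is a minimal, self-contained approximation rather than virtiofsd code. The demo_* types and functions are hypothetical, a fixed array replaces the GHashTable and GSList, and C11 atomics replace g_atomic_int_inc(). The reopen goes through /proc/self/fd, roughly what lo_inode_open() does. The helpers return 0 or a positive errno value, which is the form fuse_reply_err() expects.

```c
#define _GNU_SOURCE             /* for syncfs() and O_PATH */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

#define DEMO_MAX_MNTS 8

/* Hypothetical stand-in for a tracked submount inode: O_PATH fd + refcount. */
struct demo_mnt {
    int opath_fd;
    atomic_int refcount;
};

/* Hypothetical stand-in for lo_data: tracked mounts and their lock. */
struct demo_fs {
    pthread_mutex_t mutex;
    struct demo_mnt *mnts[DEMO_MAX_MNTS];
    size_t nmnts;
};

/* Drop a reference; the real code frees at zero, the sketch only counts. */
static void demo_mnt_put(struct demo_mnt *m)
{
    int old = atomic_fetch_sub(&m->refcount, 1);

    assert(old > 0);
}

/*
 * syncfs() on an O_PATH descriptor fails with EBADF, so reopen the
 * inode through /proc/self/fd first. Returns 0 or a positive errno.
 */
static int demo_syncfs_one(const struct demo_mnt *m)
{
    char path[64];
    int fd, err = 0;

    snprintf(path, sizeof(path), "/proc/self/fd/%d", m->opath_fd);
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return errno;
    }
    if (syncfs(fd) < 0) {
        err = errno;
    }
    close(fd);
    return err;
}

/* Snapshot the mounts under the lock, then sync them with the lock dropped. */
static int demo_syncfs_all(struct demo_fs *fs)
{
    struct demo_mnt *snapshot[DEMO_MAX_MNTS];
    size_t n;
    int err = 0;

    pthread_mutex_lock(&fs->mutex);
    n = fs->nmnts;
    for (size_t i = 0; i < n; i++) {
        /* Pin each entry so it cannot be freed once the lock is released. */
        atomic_fetch_add(&fs->mnts[i]->refcount, 1);
        snapshot[i] = fs->mnts[i];
    }
    pthread_mutex_unlock(&fs->mutex);

    for (size_t i = 0; i < n; i++) {
        /* Keep the first error, but always drop the reference taken above. */
        if (!err) {
            err = demo_syncfs_one(snapshot[i]);
        }
        demo_mnt_put(snapshot[i]);
    }
    return err;
}
```

As in syncfs_func(), the loop stops calling syncfs() after the first failure but keeps releasing references, so no inode is left pinned when an error occurs.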