RFC: https://www.redhat.com/archives/virtio-fs/2020-May/msg00024.html
v1: https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03598.html

Branch: https://github.com/XanClic/qemu.git virtiofs-submounts-v3
Branch: https://git.xanclic.moe/XanClic/qemu.git virtiofs-submounts-v3

Based-on: <160390309510.12234.8858324597971641979.stgit@gimli.home>
          (Alex’s pull request
           “VFIO updates 2020-10-28 (for QEMU 5.2 soft-freeze)”,
           notably the “linux-headers: update against 5.10-rc1” patch)


Hi,

We want to (be able to) announce the host mount structure of the shared
directory to the guest so it can replicate that structure.  This ensures
that whenever the combination of st_dev and st_ino is unique on the
host, it will be unique in the guest as well.

This feature is optional and needs to be enabled explicitly, so that the
mount structure isn’t leaked to the guest if the user doesn’t want it to
be.

The last patch in this series adds a test script.  For it to pass, you
need to compile a kernel that includes the “fuse: Mirror virtio-fs
submounts” patch series (e.g. 5.10-rc1), and provide it to the test (as
described in the test patch).


Known caveats:
- stat(2) doesn’t trigger auto-mounting.  Therefore, issuing a stat() on
  a sub-mountpoint before it’s been auto-mounted will show its parent’s
  st_dev together with the st_ino it has in the sub-mounted filesystem.

  For example, imagine you want to share a whole filesystem with the
  guest, which on the host first looks like this:

    root/           (st_dev=64, st_ino=128)
      sub_fs/       (st_dev=64, st_ino=234)

  And then you mount another filesystem under sub_fs, so it looks like
  this:

    root/           (st_dev=64, st_ino=128)
      sub_fs/       (st_dev=96, st_ino=128)
        ...

  As you can see, sub_fs becomes a mount point, so its st_dev and st_ino
  change from what they were on root’s filesystem to what they are in
  the sub-filesystem.  In fact, root and sub_fs now have the same
  st_ino, which is not unlikely given that both are root nodes in their
  respective filesystems.

  Now, this filesystem is shared with the guest through virtiofsd.
  There is no way for virtiofsd to uncover sub_fs’s original st_ino
  value of 234, so it will always provide st_ino=128 to the guest.
  However, virtiofsd does notice that sub_fs is a mount point and
  announces this fact to the guest.

  We want this to result in something like the following tree in the
  guest:

    root/           (st_dev=32, st_ino=128)
      sub_fs/       (st_dev=33, st_ino=128)
        ...

  That is, sub_fs should be a different filesystem that’s auto-mounted.
  However, as stated above, stat(2) doesn’t trigger auto-mounting, so
  before it happens, the following structure will be visible:

    root/           (st_dev=32, st_ino=128)
      sub_fs/       (st_dev=32, st_ino=128)

  That is, sub_fs and root will have the same st_dev/st_ino combination.

  This can easily be seen by executing find(1) on root in the guest,
  which will subsequently complain about an alleged filesystem loop.

  To properly fix this problem, we probably would have to be able to
  uncover sub_fs’s original st_ino value (i.e. 234) and let the guest
  use that until the auto-mount happens.  However, there is no way to
  get that value (from userspace at least).

  Note that NFS with crossmnt has the exact same issue.


- You can unmount auto-mounted submounts in the guest, but then you
  still cannot unmount them on the host.  The guest still holds a
  reference to the submount’s root directory, because that’s just a
  normal entry in its parent directory (on the submount’s parent
  filesystem).

  This is kind of related to the issue noted above: When the submount is
  unmounted, the guest shouldn’t have a reference to sub_fs as the
  submount’s root directory (host’s st_dev=96, st_ino=128), but to it as
  a normal entry in its parent filesystem (st_dev=64, st_ino=234).

  (When you have multiple nesting levels, you can unmount inner mounts
  when the outer ones have been unmounted in the guest.  For example,
  say you have a structure A/B/C/D, where each is a mount point, then
  unmounting D, C, and B in the guest will allow the host to unmount D
  and C.)


- You can mount a filesystem twice on the host, and then it will show
  the same st_dev for all files within both mounts.  However, the mounts
  are still distinct, so that if you e.g. mount another filesystem in
  one of the trees, it will not show up in the other.

  With this version of the series, both mounts will show up as different
  filesystems in the guest (i.e., both will have their own st_dev).
  That is because the guest receives no information to correlate
  different mounts; it just sees that some directory is a mount point,
  so it allocates a dedicated anonymous block device and uses it for
  that mounted filesystem, independently of what other submounts there
  may be.

  That means if a combination of st_dev+st_ino is unique in the guest,
  it may not be unique on the host.


v2:
- Switch from the FUSE_ATTR_FLAGS to the FUSE_SUBMOUNTS capability

- Include Miklos’s patch for using statx() to include the mount ID as an
  additional key for lo_inodes (besides st_dev and st_ino).

  On one hand, this fixes a bug where if you mount the same filesystem
  twice in the shared directory, virtiofsd used to see it as the exact
  same tree (so you couldn’t mount another filesystem in one of both
  trees, but not in the other -- in the guest, it would either appear in
  both or neither).  Now it sees both trees and all nodes within as
  separate.

  On the other, Miklos's patch allows us to simplify the submount
  detection a bit, because we don’t actually have to store every node
  parent’s st_dev.  It turns out that in all code that actually needs to
  check for submounts, we already have the parent lo_inode around and
  can just query its mount ID and st_dev.

  (While the code was pretty much taken from Miklos as he posted it
  (with minor adjustments), I didn’t add his S-o-b, because he didn’t
  give it.  I hope using Suggested-by, linking to his original mail, and
  CC-ing him on this series will suffice.)


git-backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/7:[down] 'virtiofsd: Check FUSE_SUBMOUNTS'
002/7:[0013] [FC] 'virtiofsd: Add attr_flags to fuse_entry_param'
003/7:[down] 'meson.build: Check for statx()'
004/7:[down] 'virtiofsd: Add mount ID to the lo_inode key'
005/7:[0077] [FC] 'virtiofsd: Announce sub-mount points'
006/7:[----] [--] 'tests/acceptance/boot_linux: Accept SSH pubkey'
007/7:[----] [--] 'tests/acceptance: Add virtiofs_submounts.py'


Max Reitz (7):
  virtiofsd: Check FUSE_SUBMOUNTS
  virtiofsd: Add attr_flags to fuse_entry_param
  meson.build: Check for statx()
  virtiofsd: Add mount ID to the lo_inode key
  virtiofsd: Announce sub-mount points
  tests/acceptance/boot_linux: Accept SSH pubkey
  tests/acceptance: Add virtiofs_submounts.py

 meson.build                                   |  16 +
 tools/virtiofsd/fuse_common.h                 |   7 +
 tools/virtiofsd/fuse_lowlevel.h               |   5 +
 tools/virtiofsd/fuse_lowlevel.c               |   5 +
 tools/virtiofsd/helper.c                      |   1 +
 tools/virtiofsd/passthrough_ll.c              | 117 ++++++-
 tools/virtiofsd/passthrough_seccomp.c         |   1 +
 tests/acceptance/boot_linux.py                |  13 +-
 tests/acceptance/virtiofs_submounts.py        | 289 ++++++++++++++++++
 .../virtiofs_submounts.py.data/cleanup.sh     |  46 +++
 .../guest-cleanup.sh                          |  30 ++
 .../virtiofs_submounts.py.data/guest.sh       | 138 +++++++++
 .../virtiofs_submounts.py.data/host.sh        | 127 ++++++++
 13 files changed, 779 insertions(+), 16 deletions(-)
 create mode 100644 tests/acceptance/virtiofs_submounts.py
 create mode 100644 tests/acceptance/virtiofs_submounts.py.data/cleanup.sh
 create mode 100644 tests/acceptance/virtiofs_submounts.py.data/guest-cleanup.sh
 create mode 100644 tests/acceptance/virtiofs_submounts.py.data/guest.sh
 create mode 100644 tests/acceptance/virtiofs_submounts.py.data/host.sh

-- 
2.26.2
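
To make the first known caveat in the cover letter concrete, here is a
minimal Python sketch that is not part of the series.  It assumes that a
find(1)-style loop check boils down to comparing a directory's
(st_dev, st_ino) pair against those of its ancestors; the real check in
find(1) may differ in detail.  The function name find_loops and the
dict layout are hypothetical, and the numbers are the guest-side values
from the example in the cover letter.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a directory's identity is its (st_dev, st_ino) pair.
Identity = Tuple[int, int]


def find_loops(tree: Dict[str, Identity], path: str) -> List[str]:
    """Return every ancestor of `path` that has the same identity as `path`.

    A non-empty result is what a traversal tool would report as an
    alleged filesystem loop.
    """
    target = tree[path]
    parts = path.strip("/").split("/")
    # Build the list of proper ancestors, e.g. "root" for "root/sub_fs".
    ancestors = ["/".join(parts[:i]) for i in range(1, len(parts))]
    return [a for a in ancestors if tree[a] == target]


# Guest view before sub_fs has been auto-mounted: stat(2) reports the
# parent's st_dev together with the submount root's st_ino.
before = {"root": (32, 128), "root/sub_fs": (32, 128)}

# Guest view once the auto-mount has happened: sub_fs has its own st_dev.
after = {"root": (32, 128), "root/sub_fs": (33, 128)}

print(find_loops(before, "root/sub_fs"))  # ['root']
print(find_loops(after, "root/sub_fs"))   # []
```

In this model the collision disappears as soon as sub_fs is reported
with its own st_dev, which matches the intended guest tree in the
example.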
On Thu, Oct 29, 2020 at 06:17:37PM +0100, Max Reitz wrote:
> [...]

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>