These patches extend FUSE to be able to act as a stacked filesystem. This allows pure passthrough, where the fuse file system simply reflects the lower filesystem, and also allows optional pre and post filtering in BPF and/or the userspace daemon as needed. This can dramatically reduce or even eliminate transitions to and from userspace.

In this patch set, I've reworked the bpf code to add a new struct_op type instead of a new program type, and used new kfuncs in place of new helpers. Additionally, it now uses dynptrs for variable sized buffers. The first three patches are repeats of a previous patch set which I have not yet adjusted for comments. I plan to adjust those and submit them separately with fixes, but wanted to have the current fuse-bpf code visible before then.

Patches 4-7 mostly rearrange existing code to remove noise from the main patch.
Patch 8 contains the main sections of fuse-bpf.
Patches 9-25 implement most FUSE functions as operations on a lower filesystem. From patch 25, you can run fuse as a passthrough filesystem.
Patches 26-32 provide bpf functionality so that you can alter fuse parameters via fuse_op programs.
Patch 33 extends this to userspace, and patches 34-37 add some testing functionality.

There's definitely a lot of cleanup and some restructuring I would like to do. In the current form, I could replace the large macro with a function that takes a struct grouping a bunch of function pointers, although I'm not sure a function that takes three void*'s is much better than the macro... I'm definitely open to suggestions on how to clean that up.

This changes the format of adding a backing file/bpf slightly from v2. fuse_op programs are specified by name, limited to 15 characters. The block added to fuse_bpf_entires has been increased to compensate. This adds one more unused field when specifying the backing file.

Lookup responses that add a backing file must go through an ioctl interface.
This is to prevent any attempts at fooling privileged programs with fd trickery.

Currently, there are two types of fuse_bpf_entry: one for passing the fuse_op program you wish to use, specified by name, and one for passing the fd of the backing file you'd like to associate with the given lookup. In the future, this may be extended to a more complicated system allowing for multiple bpf programs or backing files. This would come with kfuncs for bpf to indicate which backing file should be acted upon. Multiple bpf programs would allow chaining existing programs to extend functionality without requiring an entirely new program set.

You can run this without needing to set up a userspace daemon by adding these mount options:

    root_dir=[fd],no_daemon

where fd is an open file descriptor pointing to the folder you'd like to use as the root directory. The fd can be immediately closed after mounting. You may also set a root_bpf program by setting root_bpf=[fuse_op name] after registering a fuse_op program. This is useful for running various fs tests.
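As an illustration of the daemon-less mount options described above, here is a minimal sketch of a helper that assembles the option string. Only the option names (root_dir, no_daemon, root_bpf) and the 15 character name limit come from the cover letter. The function name build_no_daemon_opts and the constant EX_FUSE_OP_NAME_MAX are hypothetical, and the sketch does not perform the mount itself or cover any other options a FUSE mount may need.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical constant: the cover letter limits fuse_op names to 15 characters. */
#define EX_FUSE_OP_NAME_MAX 15

/*
 * Build the option string for a daemon-less fuse-bpf mount.
 *   root_dir_fd: open descriptor for the directory to use as the root
 *   root_bpf:    name of an already registered fuse_op program, or NULL
 * Returns 0 on success, -1 if the name is too long or the buffer too small.
 */
int build_no_daemon_opts(char *buf, size_t len, int root_dir_fd,
                         const char *root_bpf)
{
    int n;

    if (root_bpf && strlen(root_bpf) > EX_FUSE_OP_NAME_MAX)
        return -1;

    if (root_bpf)
        n = snprintf(buf, len, "root_dir=%d,no_daemon,root_bpf=%s",
                     root_dir_fd, root_bpf);
    else
        n = snprintf(buf, len, "root_dir=%d,no_daemon", root_dir_fd);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would be passed as the data argument of the mount call, and the root directory fd can be closed once the mount has succeeded.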
This patch set is against bpf-next

The main changes for v3:
  Restructured around struct_op programs
  Using dynptrs instead of packets
  Using kfuncs instead of new helpers
  Selftests now use skel for loading

Alessio Balsini (1):
  fs: Generic function to convert iocb to rw flags

Daniel Rosenberg (36):
  bpf: verifier: Accept dynptr mem as mem in herlpers
  bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
  selftests/bpf: Test allowing NULL buffer in dynptr slice
  fuse-bpf: Update fuse side uapi
  fuse-bpf: Add data structures for fuse-bpf
  fuse-bpf: Prepare for fuse-bpf patch
  fuse: Add fuse-bpf, a stacked fs extension for FUSE
  fuse-bpf: Add ioctl interface for /dev/fuse
  fuse-bpf: Don't support export_operations
  fuse-bpf: Add support for access
  fuse-bpf: Partially add mapping support
  fuse-bpf: Add lseek support
  fuse-bpf: Add support for fallocate
  fuse-bpf: Support file/dir open/close
  fuse-bpf: Support mknod/unlink/mkdir/rmdir
  fuse-bpf: Add support for read/write iter
  fuse-bpf: support readdir
  fuse-bpf: Add support for sync operations
  fuse-bpf: Add Rename support
  fuse-bpf: Add attr support
  fuse-bpf: Add support for FUSE_COPY_FILE_RANGE
  fuse-bpf: Add xattr support
  fuse-bpf: Add symlink/link support
  fuse-bpf: allow mounting with no userspace daemon
  bpf: Increase struct_op limits
  fuse-bpf: Add fuse-bpf constants
  WIP: bpf: Add fuse_ops struct_op programs
  fuse-bpf: Export Functions
  fuse: Provide registration functions for fuse-bpf
  fuse-bpf: Set fuse_ops at mount or lookup time
  fuse-bpf: Call bpf for pre/post filters
  fuse-bpf: Add userspace pre/post filters
  WIP: fuse-bpf: add error_out
  tools: Add FUSE, update bpf includes
  fuse-bpf: Add selftests
  fuse: Provide easy way to test fuse struct_op call

 Documentation/bpf/kfuncs.rst | 23 +-
 fs/fuse/Kconfig | 8 +
 fs/fuse/Makefile | 1 +
 fs/fuse/backing.c | 4241 +++++++++++++++++
 fs/fuse/bpf_register.c | 209 +
 fs/fuse/control.c | 2 +-
 fs/fuse/dev.c | 85 +-
 fs/fuse/dir.c | 344 +-
 fs/fuse/file.c | 63 +-
 fs/fuse/fuse_i.h | 495 +-
 fs/fuse/inode.c | 360 +-
 fs/fuse/ioctl.c | 2 +-
 fs/fuse/readdir.c | 5 +
 fs/fuse/xattr.c | 18 +
 fs/overlayfs/file.c | 23 +-
 include/linux/bpf.h | 2 +-
 include/linux/bpf_fuse.h | 283 ++
 include/linux/fs.h | 5 +
 include/uapi/linux/bpf.h | 12 +
 include/uapi/linux/fuse.h | 41 +
 kernel/bpf/Makefile | 4 +
 kernel/bpf/bpf_fuse.c | 241 +
 kernel/bpf/bpf_struct_ops.c | 6 +-
 kernel/bpf/bpf_struct_ops_types.h | 4 +
 kernel/bpf/btf.c | 1 +
 kernel/bpf/helpers.c | 32 +-
 kernel/bpf/verifier.c | 32 +
 tools/include/uapi/linux/bpf.h | 12 +
 tools/include/uapi/linux/fuse.h | 1135 +++++
 .../testing/selftests/bpf/prog_tests/dynptr.c | 1 +
 .../selftests/bpf/progs/dynptr_success.c | 21 +
 .../selftests/filesystems/fuse/.gitignore | 2 +
 .../selftests/filesystems/fuse/Makefile | 189 +
 .../testing/selftests/filesystems/fuse/OWNERS | 2 +
 .../selftests/filesystems/fuse/bpf_common.h | 51 +
 .../selftests/filesystems/fuse/bpf_loader.c | 597 +++
 .../testing/selftests/filesystems/fuse/fd.txt | 21 +
 .../selftests/filesystems/fuse/fd_bpf.bpf.c | 397 ++
 .../selftests/filesystems/fuse/fuse_daemon.c | 300 ++
 .../selftests/filesystems/fuse/fuse_test.c | 2412 ++++++++++
 .../filesystems/fuse/struct_op_test.bpf.c | 642 +++
 .../selftests/filesystems/fuse/test.bpf.c | 996 ++++
 .../filesystems/fuse/test_framework.h | 172 +
 .../selftests/filesystems/fuse/test_fuse.h | 494 ++
 44 files changed, 13755 insertions(+), 231 deletions(-)
 create mode 100644 fs/fuse/backing.c
 create mode 100644 fs/fuse/bpf_register.c
 create mode 100644 include/linux/bpf_fuse.h
 create mode 100644 kernel/bpf/bpf_fuse.c
 create mode 100644 tools/include/uapi/linux/fuse.h
 create mode 100644 tools/testing/selftests/filesystems/fuse/.gitignore
 create mode 100644 tools/testing/selftests/filesystems/fuse/Makefile
 create mode 100644 tools/testing/selftests/filesystems/fuse/OWNERS
 create mode 100644 tools/testing/selftests/filesystems/fuse/bpf_common.h
 create mode 100644 tools/testing/selftests/filesystems/fuse/bpf_loader.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/fd.txt
 create mode 100644 tools/testing/selftests/filesystems/fuse/fd_bpf.bpf.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/fuse_daemon.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/fuse_test.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/struct_op_test.bpf.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/test.bpf.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/test_framework.h
 create mode 100644 tools/testing/selftests/filesystems/fuse/test_fuse.h

base-commit: 49859de997c3115b85544bce6b6ceab60a7fabc4
--
2.40.0.634.g4ca3ef3211-goog
On Tue, Apr 18, 2023 at 4:40 AM Daniel Rosenberg <drosen@google.com> wrote:
>
> These patches extend FUSE to be able to act as a stacked filesystem. This
> allows pure passthrough, where the fuse file system simply reflects the lower
> filesystem, and also allows optional pre and post filtering in BPF and/or the
> userspace daemon as needed. This can dramatically reduce or even eliminate
> transitions to and from userspace.
>
> In this patch set, I've reworked the bpf code to add a new struct_op type
> instead of a new program type, and used new kfuncs in place of new helpers.
> Additionally, it now uses dynptrs for variable sized buffers. The first three
> patches are repeats of a previous patch set which I have not yet adjusted for
> comments. I plan to adjust those and submit them separately with fixes, but
> wanted to have the current fuse-bpf code visible before then.
>
> Patches 4-7 mostly rearrange existing code to remove noise from the main patch.
> Patch 8 contains the main sections of fuse-bpf
> Patches 9-25 implementing most FUSE functions as operations on a lower
> filesystem. From patch 25, you can run fuse as a passthrough filesystem.
> Patches 26-32 provide bpf functionality so that you can alter fuse parameters
> via fuse_op programs.
> Patch 33 extends this to userspace, and patches 34-37 add some testing
> functionality.
>

That's a nice logical breakup for review.
I feel there is so much subtle code in those patches that the only sane path forward is to review and merge them in phases.

Your patches add this config:

+config FUSE_BPF
+       bool "Adds BPF to fuse"
+       depends on FUSE_FS
+       depends on BPF
+       help
+         Extends FUSE by adding BPF to prefilter calls and potentially pass to a
+         backing file system

Since your patches add the PASSTHROUGH functionality before adding BPF functionality, would it make sense to review and merge the PASSTHROUGH functionality strictly before the BPF functionality?
Alternatively, you could aim to merge support for some PASSTHROUGH ops, then support for some BPF functionality, and then slowly add ops to both.

Which brings me to my biggest concern.
I still do not see how these patches replace Alessio's FUSE_DEV_IOC_PASSTHROUGH_OPEN patches.

Is the idea here that the ioctl needs to be done at FUSE_LOOKUP instead of, or in addition to, the ioctl on FUSE_OPEN to set up the read/write passthrough on the backing file?

I am missing things like the FILESYSTEM_MAX_STACK_DEPTH check that was added as a result of review on Alessio's patches.

The reason I am concerned about this is that we are using the FUSE_DEV_IOC_PASSTHROUGH_OPEN patches and I would like to upstream their functionality sooner rather than later. These patches have already been running in production for a while. I believe that they are running in Android as well, and there is value in upstreaming well-tested patches.

The API does not need to stay FUSE_DEV_IOC_PASSTHROUGH_OPEN. It should be an API that is extendable to FUSE-BPF, but it would be useful if the read/write passthrough could be the goal for the first merge.

Does any of this make sense to you?
Can you draw a roadmap for merging FUSE-BPF that starts with a first (hopefully short term) phase that adds the read/write passthrough functionality?

I can help with review and testing of that part if needed.
I was planning to discuss this with you at LSFMM anyway, but better to start the discussion beforehand.

Thanks,
Amir.
On Mon, Apr 17, 2023 at 10:33 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
>
> Which brings me to my biggest concern.
> I still do not see how these patches replace Alessio's
> FUSE_DEV_IOC_PASSTHROUGH_OPEN patches.
>
> Is the idea here that ioctl needs to be done at FUSE_LOOKUP
> instead or in addition to the ioctl on FUSE_OPEN to setup the
> read/write passthrough on the backing file?
>

In these patches, the fuse daemon responds to the lookup request via an ioctl, essentially in the same way it would have to the /dev/fuse node. It just flags the write as coming from an ioctl and calls fuse_dev_do_write. An additional block in the lookup response gives the backing file and what bpf_ops to use. The main difference is that fuse-bpf uses backing inodes, while passthrough uses a file. Fuse-bpf's read/write support currently isn't complete, but it does allow for direct passthrough. You could set ops to default to userspace in every case that Alessio's passthrough code does and it should have about the same effect. With the struct_op change, I did notice that doing something like that is more annoying, and am planning to add a default op which only takes the meta info and runs if the opcode specific op is not present.

> I am missing things like the FILESYSTEM_MAX_STACK_DEPTH check that
> was added as a result of review on Alessio's patches.
>

I'd definitely want to fix any issues that were fixed there. There's a lot of common code between fuse-bpf and fuse passthrough, so many of the suggestions there will apply here.

> The reason I am concerned about this is that we are using the
> FUSE_DEV_IOC_PASSTHROUGH_OPEN patches and I would like
> to upstream their functionality sooner rather than later.
> These patches have already been running in production for a while
> I believe that they are running in Android as well and there is value
> in upstreaming well tested patches.
>
> The API does not need to stay FUSE_DEV_IOC_PASSTHROUGH_OPEN
> it should be an API that is extendable to FUSE-BPF, but it would be
> useful if the read/write passthrough could be the goal for first merge.
>
> Does any of this make sense to you?
> Can you draw a roadmap for merging FUSE-BPF that starts with
> a first (hopefully short term) phase that adds the read/write passthrough
> functionality?
>
> I can help with review and testing of that part if needed.
> I was planning to discuss this with you on LSFMM anyway,
> but better start the discussion beforehand.
>
> Thanks,
> Amir.

We've been using an earlier version of fuse-bpf on Android, closer to the V1 patches. They fit our current needs but don't cover everything we intend to. The V3 patches switch to a new style of bpf program, which I'm hoping to get some feedback on before I spend too much time fixing up the details. The backing calls themselves can be reviewed separately from that though.

Without bpf, we're essentially enabling complete passthrough at a directory or file. Once you set a backing file, fuse-bpf calls the backing filesystem by default, with no additional userspace interaction unless an installed bpf program says otherwise. If we had some commands without others, we'd have behavior changes as we introduce support for additional calls. We'd need a way to set default behavior. Perhaps something like a u64 flag field extension in FUSE_INIT for indicating which opcodes support backing, and a response for what those should default to doing. If there's a bpf_op present for a given opcode, it would be able to override that default. If we had something like that, we'd be able to add support for a subset of opcodes in a sensible way.
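To make the proposed default-behavior negotiation concrete, here is a rough sketch of how a per-opcode decision could be resolved from two u64 masks plus an optional bpf override. Nothing here is taken from the patches: the struct, the enum, the function names and the rule that an opcode without kernel backing support always goes to userspace are all assumptions for illustration. The EX_OP_* values are chosen to mirror FUSE_OPEN, FUSE_READ and FUSE_WRITE from the fuse uapi header.

```c
#include <stddef.h>
#include <stdint.h>

/* Where a request ends up (hypothetical names). */
enum ex_action {
    EX_ACT_USERSPACE = 0, /* hand the request to the fuse daemon */
    EX_ACT_BACKING = 1,   /* run it directly on the backing filesystem */
};

/* Example opcodes, mirroring FUSE_OPEN, FUSE_READ and FUSE_WRITE. */
enum { EX_OP_OPEN = 14, EX_OP_READ = 15, EX_OP_WRITE = 16 };

/* Hypothetical outcome of a FUSE_INIT exchange: one bit per opcode. */
struct ex_backing_caps {
    uint64_t kernel_supported; /* opcodes the kernel can send to a backing fs */
    uint64_t daemon_default;   /* opcodes the daemon wants backed by default */
};

/* Hypothetical per-opcode bpf hook; NULL means no op is installed. */
typedef enum ex_action (*ex_bpf_op)(unsigned int opcode);

/* Example hook that always sends the request to the daemon. */
enum ex_action ex_force_userspace(unsigned int opcode)
{
    (void)opcode;
    return EX_ACT_USERSPACE;
}

enum ex_action ex_resolve(const struct ex_backing_caps *caps,
                          unsigned int opcode, ex_bpf_op op)
{
    uint64_t bit;

    if (opcode >= 64)
        return EX_ACT_USERSPACE; /* does not fit in the u64 mask */

    bit = UINT64_C(1) << opcode;
    if (!(caps->kernel_supported & bit))
        return EX_ACT_USERSPACE; /* kernel has no backing support for it */

    if (op)
        return op(opcode);       /* an installed bpf op overrides the default */

    return (caps->daemon_default & bit) ? EX_ACT_BACKING : EX_ACT_USERSPACE;
}
```

With masks like these, a kernel that gains backing support for a new opcode would not change behavior for an existing daemon, since the daemon's default mask would leave that bit clear.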
On Fri, Apr 21, 2023 at 4:41 AM Daniel Rosenberg <drosen@google.com> wrote:
>
> On Mon, Apr 17, 2023 at 10:33 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> >
> > Which brings me to my biggest concern.
> > I still do not see how these patches replace Alessio's
> > FUSE_DEV_IOC_PASSTHROUGH_OPEN patches.
> >
> > Is the idea here that ioctl needs to be done at FUSE_LOOKUP
> > instead or in addition to the ioctl on FUSE_OPEN to setup the
> > read/write passthrough on the backing file?
> >
>
> In these patches, the fuse daemon responds to the lookup request via
> an ioctl, essentially in the same way it would have to the /dev/fuse
> node. It just flags the write as coming from an ioctl and calls
> fuse_dev_do_write. An additional block in the lookup response gives
> the backing file and what bpf_ops to use. The main difference is that
> fuse-bpf uses backing inodes, while passthrough uses a file.

Ah right.
I wonder if there is benefit in both APIs or if a backing inode is sufficient to implement everything that could be interesting to implement with a backing file.

> Fuse-bpf's read/write support currently isn't complete, but it does
> allow for direct passthrough. You could set ops to default to
> userspace in every case that Alessio's passthrough code does and it
> should have about the same effect.

What are the subtle differences then?

> With the struct_op change, I did
> notice that doing something like that is more annoying, and am
> planning to add a default op which only takes the meta info and runs
> if the opcode specific op is not present.
>

Sounds interesting. I'll wait to see what you propose.

> > I am missing things like the FILESYSTEM_MAX_STACK_DEPTH check that
> > was added as a result of review on Alessio's patches.
> >
>
> I'd definitely want to fix any issues that were fixed there. There's a
> lot of common code between fuse-bpf and fuse passthrough, so many of
> the suggestions there will apply here.
>

That's why I suggested trying to implement the passthrough file ioctl functionality first, to make sure that none of the review comments in the first round were missed.

But if we need the functionality of both ioctls, we can collaborate on merging them separately.

> > The reason I am concerned about this is that we are using the
> > FUSE_DEV_IOC_PASSTHROUGH_OPEN patches and I would like
> > to upstream their functionality sooner rather than later.
> > These patches have already been running in production for a while
> > I believe that they are running in Android as well and there is value
> > in upstreaming well tested patches.
> >
> > The API does not need to stay FUSE_DEV_IOC_PASSTHROUGH_OPEN
> > it should be an API that is extendable to FUSE-BPF, but it would be
> > useful if the read/write passthrough could be the goal for first merge.
> >
> > Does any of this make sense to you?
> > Can you draw a roadmap for merging FUSE-BPF that starts with
> > a first (hopefully short term) phase that adds the read/write passthrough
> > functionality?
> >
> > I can help with review and testing of that part if needed.
> > I was planning to discuss this with you on LSFMM anyway,
> > but better start the discussion beforehand.
> >
> > Thanks,
> > Amir.
>
> We've been using an earlier version of fuse-bpf on Android, closer to
> the V1 patches. They fit our current needs but don't cover everything
> we intend to. The V3 patches switch to a new style of bpf program,
> which I'm hoping to get some feedback on before I spend too much time
> fixing up the details. The backing calls themselves can be reviewed
> separately from that though.
>
> Without bpf, we're essentially enabling complete passthrough at a
> directory or file. Once you set a backing file, fuse-bpf
> calls the backing filesystem by default, with no additional
> userspace interaction unless an installed bpf program says
> otherwise. If we had some commands without others, we'd have behavior
> changes as we introduce support for additional calls. We'd need a way
> to set default behavior. Perhaps something like a u64 flag field
> extension in FUSE_INIT for indicating which opcodes support backing,
> and a response for what those should default to doing. If there's a
> bpf_op present for a given opcode, it would be able to override that
> default. If we had something like that, we'd be able to add support
> for a subset of opcodes in a sensible way.

So maybe this is something to consider.

Thanks,
Amir.
On Tue, 18 Apr 2023 at 03:40, Daniel Rosenberg <drosen@google.com> wrote:
>
> These patches extend FUSE to be able to act as a stacked filesystem. This
> allows pure passthrough, where the fuse file system simply reflects the lower
> filesystem, and also allows optional pre and post filtering in BPF and/or the
> userspace daemon as needed. This can dramatically reduce or even eliminate
> transitions to and from userspace.

I'll ignore BPF for now and concentrate on the passthrough aspect, which I understand better.

The security model needs to be thought about and documented. Think about this: the fuse server now delegates operations it would itself perform to the passthrough code in fuse. The permissions that would have been checked in the context of the fuse server are now checked in the context of the task performing the operation. The server may be able to bypass seccomp restrictions. Files that are open on the backing filesystem are now hidden (e.g. lsof won't find these), which allows the server to obfuscate accesses to backing files. Etc.

These are not particularly worrying if the server is privileged, but fuse comes with the history of supporting unprivileged servers, so we should look at supporting passthrough with unprivileged servers as well.

My other generic comment is that you should add justification for doing this in the first place. I guess it's mainly performance. So how can performance be won in real-life cases? It would also be good to measure the contribution of individual ops to that win. Is there another reason for this besides performance?

Thanks,
Miklos
On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@szeredi.hu> wrote: > > > The security model needs to be thought about and documented. Think > about this: the fuse server now delegates operations it would itself > perform to the passthrough code in fuse. The permissions that would > have been checked in the context of the fuse server are now checked in > the context of the task performing the operation. The server may be > able to bypass seccomp restrictions. Files that are open on the > backing filesystem are now hidden (e.g. lsof won't find these), which > allows the server to obfuscate accesses to backing files. Etc. > > These are not particularly worrying if the server is privileged, but > fuse comes with the history of supporting unprivileged servers, so we > should look at supporting passthrough with unprivileged servers as > well. > This is on my todo list. My current plan is to grab the creds that the daemon uses to respond to FUSE_INIT. That should keep behavior fairly similar. I'm not sure if there are cases where the fuse server is operating under multiple contexts. I don't currently have a plan for exposing open files via lsof. Every such file should relate to one that will show up though. I haven't dug into how that's set up, but I'm open to suggestions. > My other generic comment is that you should add justification for > doing this in the first place. I guess it's mainly performance. So > how performance can be won in real life cases? It would also be good > to measure the contribution of individual ops to that win. Is there > another reason for this besides performance? > > Thanks, > Miklos Our main concern with it is performance. We have some preliminary numbers looking at the pure passthrough case. We've been testing using a ramdrive on a somewhat slow machine, as that should highlight differences more. We ran fio for sequential reads, and random read/write. 
For sequential reads, we were seeing libfuse's passthrough_hp take about a 50% hit, with fuse-bpf not being detectably slower. For random read/write, we were seeing a roughly 90% drop in performance from passthrough_hp, while fuse-bpf has about a 7% drop in read and write speed. When we use a bpf that traces every opcode, that performance hit increases to a roughly 1% drop in sequential read performance, and a 20% drop in both read and write performance for random read/write. We plan to make more complex bpf examples, with fuse daemon equivalents to compare against. We have not looked closely at the impact of individual opcodes yet. There's also a potential ease of use for fuse-bpf. If you're implementing a fuse daemon that is largely mirroring a backing filesystem, you only need to write code for the differences in behavior. For instance, say you want to remove image metadata like location. You could give bpf information on what range of data is metadata, and zero out that section without having to handle any other operations. -Daniel
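The metadata-scrubbing use case mentioned above comes down to clearing the part of a read buffer that overlaps a known byte range of the file. The following is a plain C sketch of that overlap logic only, not a bpf program and not code from the patch set. The function name scrub_metadata and its parameters are hypothetical, and in fuse-bpf the equivalent logic would presumably live in a post-read filter operating on the read buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Zero the part of a just-read buffer that overlaps the file byte range
 * [meta_start, meta_end).
 *   buf:      data returned for the read
 *   read_off: file offset the read started at
 *   read_len: number of bytes in buf
 * Returns the number of bytes cleared.
 */
size_t scrub_metadata(uint8_t *buf, uint64_t read_off, size_t read_len,
                      uint64_t meta_start, uint64_t meta_end)
{
    uint64_t read_end = read_off + read_len;
    /* Intersection of the read window and the metadata range. */
    uint64_t lo = meta_start > read_off ? meta_start : read_off;
    uint64_t hi = meta_end < read_end ? meta_end : read_end;

    if (lo >= hi)
        return 0; /* the read does not touch the metadata */

    memset(buf + (lo - read_off), 0, (size_t)(hi - lo));
    return (size_t)(hi - lo);
}
```

All other operations would stay on the pure passthrough path, which is the ease-of-use point being made: only the difference in behavior needs code.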
On 2023/5/2 17:07, Daniel Rosenberg wrote: > On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@szeredi.hu> wrote: >> >> >> The security model needs to be thought about and documented. Think >> about this: the fuse server now delegates operations it would itself >> perform to the passthrough code in fuse. The permissions that would >> have been checked in the context of the fuse server are now checked in >> the context of the task performing the operation. The server may be >> able to bypass seccomp restrictions. Files that are open on the >> backing filesystem are now hidden (e.g. lsof won't find these), which >> allows the server to obfuscate accesses to backing files. Etc. >> >> These are not particularly worrying if the server is privileged, but >> fuse comes with the history of supporting unprivileged servers, so we >> should look at supporting passthrough with unprivileged servers as >> well. >> > > This is on my todo list. My current plan is to grab the creds that the > daemon uses to respond to FUSE_INIT. That should keep behavior fairly > similar. I'm not sure if there are cases where the fuse server is > operating under multiple contexts. > I don't currently have a plan for exposing open files via lsof. Every > such file should relate to one that will show up though. I haven't dug > into how that's set up, but I'm open to suggestions. > >> My other generic comment is that you should add justification for >> doing this in the first place. I guess it's mainly performance. So >> how performance can be won in real life cases? It would also be good >> to measure the contribution of individual ops to that win. Is there >> another reason for this besides performance? >> >> Thanks, >> Miklos > > Our main concern with it is performance. We have some preliminary > numbers looking at the pure passthrough case. We've been testing using > a ramdrive on a somewhat slow machine, as that should highlight > differences more. 
> We ran fio for sequential reads, and random
> read/write. For sequential reads, we were seeing libfuse's
> passthrough_hp take about a 50% hit, with fuse-bpf not being
> detectably slower. For random read/write, we were seeing a roughly 90%
> drop in performance from passthrough_hp, while fuse-bpf has about a 7%
> drop in read and write speed. When we use a bpf that traces every
> opcode, that performance hit increases to a roughly 1% drop in
> sequential read performance, and a 20% drop in both read and write
> performance for random read/write. We plan to make more complex bpf
> examples, with fuse daemon equivalents to compare against.
>
> We have not looked closely at the impact of individual opcodes yet.
>
> There's also a potential ease of use for fuse-bpf. If you're
> implementing a fuse daemon that is largely mirroring a backing
> filesystem, you only need to write code for the differences in
> behavior. For instance, say you want to remove image metadata like
> location. You could give bpf information on what range of data is
> metadata, and zero out that section without having to handle any other
> operations.

A bit off topic (although I haven't really looked into the FUSE BPF internals):
After roughly listening to this topic in the FS track last week, I'm not quite sure (at least in the long term) whether it might be better if eBPF-related filter/redirect stuff could land in vfs or in a somewhat stackable fs, so that we could redirect/filter any sub-fstree in principle? It's just an open question and I have no strong leaning on this, but do we really need BPF-filter functionality for each individual fs?

It sounds much like
https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file-system-filter-drivers

Thanks,
Gao Xiang

>
> -Daniel
On Wed, May 17, 2023 at 5:50 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
>
>
> On 2023/5/2 17:07, Daniel Rosenberg wrote:
> > On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>
> >>
> >> The security model needs to be thought about and documented. Think
> >> about this: the fuse server now delegates operations it would itself
> >> perform to the passthrough code in fuse. The permissions that would
> >> have been checked in the context of the fuse server are now checked in
> >> the context of the task performing the operation. The server may be
> >> able to bypass seccomp restrictions. Files that are open on the
> >> backing filesystem are now hidden (e.g. lsof won't find these), which
> >> allows the server to obfuscate accesses to backing files. Etc.
> >>
> >> These are not particularly worrying if the server is privileged, but
> >> fuse comes with the history of supporting unprivileged servers, so we
> >> should look at supporting passthrough with unprivileged servers as
> >> well.
> >>
> >
> > This is on my todo list. My current plan is to grab the creds that the
> > daemon uses to respond to FUSE_INIT. That should keep behavior fairly
> > similar. I'm not sure if there are cases where the fuse server is
> > operating under multiple contexts.
> > I don't currently have a plan for exposing open files via lsof. Every
> > such file should relate to one that will show up though. I haven't dug
> > into how that's set up, but I'm open to suggestions.
> >
> >> My other generic comment is that you should add justification for
> >> doing this in the first place. I guess it's mainly performance. So
> >> how performance can be won in real life cases? It would also be good
> >> to measure the contribution of individual ops to that win. Is there
> >> another reason for this besides performance?
> >>
> >> Thanks,
> >> Miklos
> >
> > Our main concern with it is performance. We have some preliminary
> > numbers looking at the pure passthrough case. We've been testing using
> > a ramdrive on a somewhat slow machine, as that should highlight
> > differences more. We ran fio for sequential reads, and random
> > read/write. For sequential reads, we were seeing libfuse's
> > passthrough_hp take about a 50% hit, with fuse-bpf not being
> > detectably slower. For random read/write, we were seeing a roughly 90%
> > drop in performance from passthrough_hp, while fuse-bpf has about a 7%
> > drop in read and write speed. When we use a bpf that traces every
> > opcode, that performance hit increases to a roughly 1% drop in
> > sequential read performance, and a 20% drop in both read and write
> > performance for random read/write. We plan to make more complex bpf
> > examples, with fuse daemon equivalents to compare against.
> >
> > We have not looked closely at the impact of individual opcodes yet.
> >
> > There's also a potential ease of use for fuse-bpf. If you're
> > implementing a fuse daemon that is largely mirroring a backing
> > filesystem, you only need to write code for the differences in
> > behavior. For instance, say you want to remove image metadata like
> > location. You could give bpf information on what range of data is
> > metadata, and zero out that section without having to handle any other
> > operations.
>
> A bit out of topic (although I'm not quite look into FUSE BPF internals)
> After roughly listening to this topic in FS track last week, I'm not
> quite sure (at least in the long term) if it might be better if
> ebpf-related filter/redirect stuffs could be landed in vfs or in a
> somewhat stackable fs so that we could redirect/filter any sub-fstree
> in principle? It's just an open question and I have no real tendency
> of this but do we really need a BPF-filter functionality for each
> individual fs?

I think that is a valid question, but the answer is that even if it makes sense,
doing something like this in vfs would be a much bigger project with larger
consequences on performance and security and whatnot, so even if
(and a very big if) this ever happens, using FUSE-BPF as a playground for
this sort of stuff would be a good idea.

This reminds me of union mounts - it made sense to have union mount
functionality in vfs, but after a long winding road, a stacked fs (overlayfs)
turned out to be a much more practical solution.

>
> It sounds much like
> https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file-system-filter-drivers
>

Nice reference.
I must admit that I found it hard to understand what Windows filter drivers
can do compared to FUSE-BPF design.
It'd be nice to get some comparison from what is planned for FUSE-BPF.

Interesting to note that there is a "legacy" Windows filter driver API,
so Windows didn't get everything right for the first API - that is especially
interesting to look at as repeating other people's mistakes would be a shame.

Thanks,
Amir.
Hi Amir,

On 2023/5/17 23:51, Amir Goldstein wrote:
> On Wed, May 17, 2023 at 5:50 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>>
>>
>> On 2023/5/2 17:07, Daniel Rosenberg wrote:
>>> On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>
>>>>
>>>> The security model needs to be thought about and documented. Think
>>>> about this: the fuse server now delegates operations it would itself
>>>> perform to the passthrough code in fuse. The permissions that would
>>>> have been checked in the context of the fuse server are now checked in
>>>> the context of the task performing the operation. The server may be
>>>> able to bypass seccomp restrictions. Files that are open on the
>>>> backing filesystem are now hidden (e.g. lsof won't find these), which
>>>> allows the server to obfuscate accesses to backing files. Etc.
>>>>
>>>> These are not particularly worrying if the server is privileged, but
>>>> fuse comes with the history of supporting unprivileged servers, so we
>>>> should look at supporting passthrough with unprivileged servers as
>>>> well.
>>>>
>>>
>>> This is on my todo list. My current plan is to grab the creds that the
>>> daemon uses to respond to FUSE_INIT. That should keep behavior fairly
>>> similar. I'm not sure if there are cases where the fuse server is
>>> operating under multiple contexts.
>>> I don't currently have a plan for exposing open files via lsof. Every
>>> such file should relate to one that will show up though. I haven't dug
>>> into how that's set up, but I'm open to suggestions.
>>>
>>>> My other generic comment is that you should add justification for
>>>> doing this in the first place. I guess it's mainly performance. So
>>>> how performance can be won in real life cases? It would also be good
>>>> to measure the contribution of individual ops to that win. Is there
>>>> another reason for this besides performance?
>>>>
>>>> Thanks,
>>>> Miklos
>>>
>>> Our main concern with it is performance. We have some preliminary
>>> numbers looking at the pure passthrough case. We've been testing using
>>> a ramdrive on a somewhat slow machine, as that should highlight
>>> differences more. We ran fio for sequential reads, and random
>>> read/write. For sequential reads, we were seeing libfuse's
>>> passthrough_hp take about a 50% hit, with fuse-bpf not being
>>> detectably slower. For random read/write, we were seeing a roughly 90%
>>> drop in performance from passthrough_hp, while fuse-bpf has about a 7%
>>> drop in read and write speed. When we use a bpf that traces every
>>> opcode, that performance hit increases to a roughly 1% drop in
>>> sequential read performance, and a 20% drop in both read and write
>>> performance for random read/write. We plan to make more complex bpf
>>> examples, with fuse daemon equivalents to compare against.
>>>
>>> We have not looked closely at the impact of individual opcodes yet.
>>>
>>> There's also a potential ease of use for fuse-bpf. If you're
>>> implementing a fuse daemon that is largely mirroring a backing
>>> filesystem, you only need to write code for the differences in
>>> behavior. For instance, say you want to remove image metadata like
>>> location. You could give bpf information on what range of data is
>>> metadata, and zero out that section without having to handle any other
>>> operations.
>>
>> A bit out of topic (although I'm not quite look into FUSE BPF internals)
>> After roughly listening to this topic in FS track last week, I'm not
>> quite sure (at least in the long term) if it might be better if
>> ebpf-related filter/redirect stuffs could be landed in vfs or in a
>> somewhat stackable fs so that we could redirect/filter any sub-fstree
>> in principle? It's just an open question and I have no real tendency
>> of this but do we really need a BPF-filter functionality for each
>> individual fs?
>
> I think that is a valid question, but the answer is that even if it makes sense,
> doing something like this in vfs would be a much bigger project with larger
> consequences on performance and security and whatnot, so even if
> (and a very big if) this ever happens, using FUSE-BPF as a playground for
> this sort of stuff would be a good idea.

My current observation is that the total Fuse-BPF LoC is already beyond the
whole FUSE itself. In addition, it almost hooks all fs operations which
impacts something to me.

>
> This reminds me of union mounts - it made sense to have union mount
> functionality in vfs, but after a long winding road, a stacked fs (overlayfs)
> turned out to be a much more practical solution.

Yeah, I agree. So it was just a pure hint on my side.

>
>>
>> It sounds much like
>> https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file-system-filter-drivers
>>
>
> Nice reference.
> I must admit that I found it hard to understand what Windows filter drivers
> can do compared to FUSE-BPF design.
> It'd be nice to get some comparison from what is planned for FUSE-BPF.

At least some investigation/analysis first might be better in the long
term development.

>
> Interesting to note that there is a "legacy" Windows filter driver API,
> so Windows didn't get everything right for the first API - that is especially
> interesting to look at as repeating other people's mistakes would be a shame.

I'm not familiar with that details as well, yet I saw that they have a
filesystem filter subsystem, so I mentioned it here.

Thanks,
Gao Xiang

>
> Thanks,
> Amir.
On 2023/5/17 00:05, Gao Xiang wrote:
> Hi Amir,
>
> On 2023/5/17 23:51, Amir Goldstein wrote:
>> On Wed, May 17, 2023 at 5:50 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>>
>>>
>>>
>>> On 2023/5/2 17:07, Daniel Rosenberg wrote:
>>>> On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>>
>>>>>
>>>>> The security model needs to be thought about and documented. Think
>>>>> about this: the fuse server now delegates operations it would itself
>>>>> perform to the passthrough code in fuse. The permissions that would
>>>>> have been checked in the context of the fuse server are now checked in
>>>>> the context of the task performing the operation. The server may be
>>>>> able to bypass seccomp restrictions. Files that are open on the
>>>>> backing filesystem are now hidden (e.g. lsof won't find these), which
>>>>> allows the server to obfuscate accesses to backing files. Etc.
>>>>>
>>>>> These are not particularly worrying if the server is privileged, but
>>>>> fuse comes with the history of supporting unprivileged servers, so we
>>>>> should look at supporting passthrough with unprivileged servers as
>>>>> well.
>>>>>
>>>>
>>>> This is on my todo list. My current plan is to grab the creds that the
>>>> daemon uses to respond to FUSE_INIT. That should keep behavior fairly
>>>> similar. I'm not sure if there are cases where the fuse server is
>>>> operating under multiple contexts.
>>>> I don't currently have a plan for exposing open files via lsof. Every
>>>> such file should relate to one that will show up though. I haven't dug
>>>> into how that's set up, but I'm open to suggestions.
>>>>
>>>>> My other generic comment is that you should add justification for
>>>>> doing this in the first place. I guess it's mainly performance. So
>>>>> how performance can be won in real life cases? It would also be good
>>>>> to measure the contribution of individual ops to that win. Is there
>>>>> another reason for this besides performance?
>>>>>
>>>>> Thanks,
>>>>> Miklos
>>>>
>>>> Our main concern with it is performance. We have some preliminary
>>>> numbers looking at the pure passthrough case. We've been testing using
>>>> a ramdrive on a somewhat slow machine, as that should highlight
>>>> differences more. We ran fio for sequential reads, and random
>>>> read/write. For sequential reads, we were seeing libfuse's
>>>> passthrough_hp take about a 50% hit, with fuse-bpf not being
>>>> detectably slower. For random read/write, we were seeing a roughly 90%
>>>> drop in performance from passthrough_hp, while fuse-bpf has about a 7%
>>>> drop in read and write speed. When we use a bpf that traces every
>>>> opcode, that performance hit increases to a roughly 1% drop in
>>>> sequential read performance, and a 20% drop in both read and write
>>>> performance for random read/write. We plan to make more complex bpf
>>>> examples, with fuse daemon equivalents to compare against.
>>>>
>>>> We have not looked closely at the impact of individual opcodes yet.
>>>>
>>>> There's also a potential ease of use for fuse-bpf. If you're
>>>> implementing a fuse daemon that is largely mirroring a backing
>>>> filesystem, you only need to write code for the differences in
>>>> behavior. For instance, say you want to remove image metadata like
>>>> location. You could give bpf information on what range of data is
>>>> metadata, and zero out that section without having to handle any other
>>>> operations.
>>>
>>> A bit out of topic (although I'm not quite look into FUSE BPF internals)
>>> After roughly listening to this topic in FS track last week, I'm not
>>> quite sure (at least in the long term) if it might be better if
>>> ebpf-related filter/redirect stuffs could be landed in vfs or in a
>>> somewhat stackable fs so that we could redirect/filter any sub-fstree
>>> in principle? It's just an open question and I have no real tendency
>>> of this but do we really need a BPF-filter functionality for each
>>> individual fs?
>>
>> I think that is a valid question, but the answer is that even if it makes sense,
>> doing something like this in vfs would be a much bigger project with larger
>> consequences on performance and security and whatnot, so even if
>> (and a very big if) this ever happens, using FUSE-BPF as a playground for
>> this sort of stuff would be a good idea.
>
> My current observation is that the total Fuse-BPF LoC is already beyond the
^ sorry I double-checked now I was wrong, forget about it.
> whole FUSE itself. In addition, it almost hooks all fs operations which
> impacts something to me.
>
>>
>> This reminds me of union mounts - it made sense to have union mount
>> functionality in vfs, but after a long winding road, a stacked fs (overlayfs)
>> turned out to be a much more practical solution.
>
> Yeah, I agree. So it was just a pure hint on my side.
>
>>
>>>
>>> It sounds much like
>>> https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file-system-filter-drivers
>>>
>>
>> Nice reference.
>> I must admit that I found it hard to understand what Windows filter drivers
>> can do compared to FUSE-BPF design.
>> It'd be nice to get some comparison from what is planned for FUSE-BPF.
>
> At least some investigation/analysis first might be better in the long
> term development.
>
>>
>> Interesting to note that there is a "legacy" Windows filter driver API,
>> so Windows didn't get everything right for the first API - that is especially
>> interesting to look at as repeating other people's mistakes would be a shame.
>
> I'm not familiar with that details as well, yet I saw that they have a
> filesystem filter subsystem, so I mentioned it here.
>
> Thanks,
> Gao Xiang
>
>>
>> Thanks,
>> Amir.
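
The metadata-scrubbing idea quoted above (zero out a known byte range and pass everything else through) can be sketched in plain C. This is only an illustration of the range arithmetic, not fuse-bpf code: `scrub_range` and all of its parameters are hypothetical names, and a real filter would presumably run as a BPF program against whatever buffer interface the patch set exposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch set.
 *
 * buf holds the bytes returned by a read covering file offsets
 * [off, off + len). The metadata to hide occupies file offsets
 * [meta_start, meta_end). Zero whatever part of the read overlaps
 * the metadata and return the number of bytes that were zeroed.
 */
static size_t scrub_range(uint8_t *buf, uint64_t off, size_t len,
                          uint64_t meta_start, uint64_t meta_end)
{
    assert(meta_start <= meta_end);

    uint64_t end = off + len;
    /* Intersection of the read window and the metadata range. */
    uint64_t lo = off > meta_start ? off : meta_start;
    uint64_t hi = end < meta_end ? end : meta_end;

    if (lo >= hi)
        return 0; /* no overlap: the read passes through untouched */

    memset(buf + (lo - off), 0, (size_t)(hi - lo));
    return (size_t)(hi - lo);
}
```

A read that does not touch the metadata range returns 0 and leaves the buffer alone, which matches the point made in the thread that only the differing behavior needs code.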