[v3] FUSE BPF: A Stacked Filesystem Extension for FUSE

[RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

Daniel Rosenberg posted 37 patches 2 years, 9 months ago


Documentation/bpf/kfuncs.rst                  |   23 +-
fs/fuse/Kconfig                               |    8 +
fs/fuse/Makefile                              |    1 +
fs/fuse/backing.c                             | 4241 +++++++++++++++++
fs/fuse/bpf_register.c                        |  209 +
fs/fuse/control.c                             |    2 +-
fs/fuse/dev.c                                 |   85 +-
fs/fuse/dir.c                                 |  344 +-
fs/fuse/file.c                                |   63 +-
fs/fuse/fuse_i.h                              |  495 +-
fs/fuse/inode.c                               |  360 +-
fs/fuse/ioctl.c                               |    2 +-
fs/fuse/readdir.c                             |    5 +
fs/fuse/xattr.c                               |   18 +
fs/overlayfs/file.c                           |   23 +-
include/linux/bpf.h                           |    2 +-
include/linux/bpf_fuse.h                      |  283 ++
include/linux/fs.h                            |    5 +
include/uapi/linux/bpf.h                      |   12 +
include/uapi/linux/fuse.h                     |   41 +
kernel/bpf/Makefile                           |    4 +
kernel/bpf/bpf_fuse.c                         |  241 +
kernel/bpf/bpf_struct_ops.c                   |    6 +-
kernel/bpf/bpf_struct_ops_types.h             |    4 +
kernel/bpf/btf.c                              |    1 +
kernel/bpf/helpers.c                          |   32 +-
kernel/bpf/verifier.c                         |   32 +
tools/include/uapi/linux/bpf.h                |   12 +
tools/include/uapi/linux/fuse.h               | 1135 +++++
.../testing/selftests/bpf/prog_tests/dynptr.c |    1 +
.../selftests/bpf/progs/dynptr_success.c      |   21 +
.../selftests/filesystems/fuse/.gitignore     |    2 +
.../selftests/filesystems/fuse/Makefile       |  189 +
.../testing/selftests/filesystems/fuse/OWNERS |    2 +
.../selftests/filesystems/fuse/bpf_common.h   |   51 +
.../selftests/filesystems/fuse/bpf_loader.c   |  597 +++
.../testing/selftests/filesystems/fuse/fd.txt |   21 +
.../selftests/filesystems/fuse/fd_bpf.bpf.c   |  397 ++
.../selftests/filesystems/fuse/fuse_daemon.c  |  300 ++
.../selftests/filesystems/fuse/fuse_test.c    | 2412 ++++++++++
.../filesystems/fuse/struct_op_test.bpf.c     |  642 +++
.../selftests/filesystems/fuse/test.bpf.c     |  996 ++++
.../filesystems/fuse/test_framework.h         |  172 +
.../selftests/filesystems/fuse/test_fuse.h    |  494 ++
44 files changed, 13755 insertions(+), 231 deletions(-)
create mode 100644 fs/fuse/backing.c
create mode 100644 fs/fuse/bpf_register.c
create mode 100644 include/linux/bpf_fuse.h
create mode 100644 kernel/bpf/bpf_fuse.c
create mode 100644 tools/include/uapi/linux/fuse.h
create mode 100644 tools/testing/selftests/filesystems/fuse/.gitignore
create mode 100644 tools/testing/selftests/filesystems/fuse/Makefile
create mode 100644 tools/testing/selftests/filesystems/fuse/OWNERS
create mode 100644 tools/testing/selftests/filesystems/fuse/bpf_common.h
create mode 100644 tools/testing/selftests/filesystems/fuse/bpf_loader.c
create mode 100644 tools/testing/selftests/filesystems/fuse/fd.txt
create mode 100644 tools/testing/selftests/filesystems/fuse/fd_bpf.bpf.c
create mode 100644 tools/testing/selftests/filesystems/fuse/fuse_daemon.c
create mode 100644 tools/testing/selftests/filesystems/fuse/fuse_test.c
create mode 100644 tools/testing/selftests/filesystems/fuse/struct_op_test.bpf.c
create mode 100644 tools/testing/selftests/filesystems/fuse/test.bpf.c
create mode 100644 tools/testing/selftests/filesystems/fuse/test_framework.h
create mode 100644 tools/testing/selftests/filesystems/fuse/test_fuse.h


[RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

Posted by Daniel Rosenberg 2 years, 9 months ago

These patches extend FUSE to be able to act as a stacked filesystem. This
allows pure passthrough, where the fuse file system simply reflects the lower
filesystem, and also allows optional pre and post filtering in BPF and/or the
userspace daemon as needed. This can dramatically reduce or even eliminate
transitions to and from userspace.

In this patch set, I've reworked the bpf code to add a new struct_op type
instead of a new program type, and used new kfuncs in place of new helpers.
Additionally, it now uses dynptrs for variable sized buffers. The first three
patches are repeats of a previous patch set which I have not yet adjusted for
comments. I plan to adjust those and submit them separately with fixes, but
wanted to have the current fuse-bpf code visible before then.

Patches 4-7 mostly rearrange existing code to remove noise from the main patch.
Patch 8 contains the main sections of fuse-bpf.
Patches 9-25 implement most FUSE functions as operations on a lower
filesystem. From patch 25, you can run fuse as a passthrough filesystem.
Patches 26-32 provide bpf functionality so that you can alter fuse parameters
via fuse_op programs.
Patch 33 extends this to userspace, and patches 34-37 add some testing
functionality.

There's definitely a lot of cleanup and some restructuring I would like to do.
In the current form, I could get rid of the large macro in place of a function
that takes a struct that groups a bunch of function pointers, although I'm not
sure a function that takes three void*'s is much better than the macro... I'm
definitely open to suggestions on how to clean that up.

This changes the format of adding a backing file/bpf slightly from v2. fuse_op
programs are specified by name, limited to 15 characters. The block added to
fuse_bpf_entires has been increased to compensate. This adds one more unused
field when specifying the backing file.
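
As a rough illustration of the 15-character name limit, the check below copies a fuse_op name into a fixed-size field. This is only a sketch: the constant names, the 16-byte NUL-terminated buffer, and the function `set_fuse_op_name` are hypothetical and are not taken from the patches.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical constants: the cover letter only states a 15-character limit. */
#define EX_FUSE_OP_NAME_MAX 15
#define EX_FUSE_OP_NAME_BUF (EX_FUSE_OP_NAME_MAX + 1)

/*
 * Copy a fuse_op program name into a fixed-size, NUL-terminated field.
 * Returns 0 on success, -EINVAL for a missing or empty name, and
 * -ENAMETOOLONG if the name is longer than 15 characters.
 */
static int set_fuse_op_name(char dst[EX_FUSE_OP_NAME_BUF], const char *name)
{
	size_t len;

	if (!name || name[0] == '\0')
		return -EINVAL;

	len = strlen(name);
	if (len > EX_FUSE_OP_NAME_MAX)
		return -ENAMETOOLONG;

	memcpy(dst, name, len + 1); /* len + 1 copies the terminating NUL */
	return 0;
}
```

Rejecting an over-long name, rather than silently truncating it, avoids two different programs colliding on the same 15-character prefix.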

Lookup responses that add a backing file must go through an ioctl interface.
This is to prevent any attempts at fooling privileged programs with fd
trickery.

Currently, there are two types of fuse_bpf_entry. One for passing the fuse_op
program you wish to use, specified by name, and one for passing the fd of the
backing file you'd like to associate with the given lookup. In the future, this
may be extended to a more complicated system allowing for multiple bpf programs
or backing files. This would come with kfuncs for bpf to indicate which backing
file should be acted upon. Multiple bpf programs would allow chaining existing
programs to extend functionality without requiring an entirely new program set.

You can run this without needing to set up a userspace daemon by adding these
mount options: root_dir=[fd],no_daemon where fd is an open file descriptor
pointing to the folder you'd like to use as the root directory. The fd can be
immediately closed after mounting. You may also set a root_bpf program by
setting root_bpf=[fuse_op name] after registering a fuse_op program.
This is useful for running various fs tests.
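
A minimal sketch of assembling that option string follows. Only the option names (root_dir, no_daemon, root_bpf) come from the description above; the helper `build_no_daemon_opts` is hypothetical, and any other options a real mount might need are not covered here.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "root_dir=<fd>,no_daemon[,root_bpf=<name>]" into buf.
 * root_fd is an already-open descriptor for the directory to use as root.
 * root_bpf may be NULL or empty when no fuse_op program is wanted.
 * Returns the string length on success, or -1 on bad input or truncation.
 */
static int build_no_daemon_opts(char *buf, size_t len, int root_fd,
				const char *root_bpf)
{
	int n;

	if (!buf || len == 0 || root_fd < 0)
		return -1;

	if (root_bpf && root_bpf[0] != '\0')
		n = snprintf(buf, len, "root_dir=%d,no_daemon,root_bpf=%s",
			     root_fd, root_bpf);
	else
		n = snprintf(buf, len, "root_dir=%d,no_daemon", root_fd);

	if (n < 0 || (size_t)n >= len)
		return -1; /* output error or truncated string */
	return n;
}
```

The resulting string would be handed to mount(2) as the data argument. Since the fd only needs to stay valid until the mount call returns, the caller can close it right afterwards.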

This patch set is against bpf-next

The main changes for v3:
Restructured around struct_op programs
Using dynptrs instead of packets
Using kfuncs instead of new helpers
Selftests now use skel for loading

Alessio Balsini (1):
  fs: Generic function to convert iocb to rw flags

Daniel Rosenberg (36):
  bpf: verifier: Accept dynptr mem as mem in herlpers
  bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
  selftests/bpf: Test allowing NULL buffer in dynptr slice
  fuse-bpf: Update fuse side uapi
  fuse-bpf: Add data structures for fuse-bpf
  fuse-bpf: Prepare for fuse-bpf patch
  fuse: Add fuse-bpf, a stacked fs extension for FUSE
  fuse-bpf: Add ioctl interface for /dev/fuse
  fuse-bpf: Don't support export_operations
  fuse-bpf: Add support for access
  fuse-bpf: Partially add mapping support
  fuse-bpf: Add lseek support
  fuse-bpf: Add support for fallocate
  fuse-bpf: Support file/dir open/close
  fuse-bpf: Support mknod/unlink/mkdir/rmdir
  fuse-bpf: Add support for read/write iter
  fuse-bpf: support readdir
  fuse-bpf: Add support for sync operations
  fuse-bpf: Add Rename support
  fuse-bpf: Add attr support
  fuse-bpf: Add support for FUSE_COPY_FILE_RANGE
  fuse-bpf: Add xattr support
  fuse-bpf: Add symlink/link support
  fuse-bpf: allow mounting with no userspace daemon
  bpf: Increase struct_op limits
  fuse-bpf: Add fuse-bpf constants
  WIP: bpf: Add fuse_ops struct_op programs
  fuse-bpf: Export Functions
  fuse: Provide registration functions for fuse-bpf
  fuse-bpf: Set fuse_ops at mount or lookup time
  fuse-bpf: Call bpf for pre/post filters
  fuse-bpf: Add userspace pre/post filters
  WIP: fuse-bpf: add error_out
  tools: Add FUSE, update bpf includes
  fuse-bpf: Add selftests
  fuse: Provide easy way to test fuse struct_op call


base-commit: 49859de997c3115b85544bce6b6ceab60a7fabc4
-- 
2.40.0.634.g4ca3ef3211-goog

Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

Posted by Amir Goldstein 2 years, 9 months ago

On Tue, Apr 18, 2023 at 4:40 AM Daniel Rosenberg <drosen@google.com> wrote:
>
> These patches extend FUSE to be able to act as a stacked filesystem. This
> allows pure passthrough, where the fuse file system simply reflects the lower
> filesystem, and also allows optional pre and post filtering in BPF and/or the
> userspace daemon as needed. This can dramatically reduce or even eliminate
> transitions to and from userspace.
>
> In this patch set, I've reworked the bpf code to add a new struct_op type
> instead of a new program type, and used new kfuncs in place of new helpers.
> Additionally, it now uses dynptrs for variable sized buffers. The first three
> patches are repeats of a previous patch set which I have not yet adjusted for
> comments. I plan to adjust those and submit them separately with fixes, but
> wanted to have the current fuse-bpf code visible before then.
>
> Patches 4-7 mostly rearrange existing code to remove noise from the main patch.
> Patch 8 contains the main sections of fuse-bpf.
> Patches 9-25 implement most FUSE functions as operations on a lower
> filesystem. From patch 25, you can run fuse as a passthrough filesystem.
> Patches 26-32 provide bpf functionality so that you can alter fuse parameters
> via fuse_op programs.
> Patch 33 extends this to userspace, and patches 34-37 add some testing
> functionality.
>

That's a nice logical breakup for review.

I feel there is so much subtle code in those patches that the
only sane path forward is to review and merge them in phases.

Your patches add this config:

+config FUSE_BPF
+       bool "Adds BPF to fuse"
+       depends on FUSE_FS
+       depends on BPF
+       help
+         Extends FUSE by adding BPF to prefilter calls and potentially pass to a
+         backing file system

Since your patches add the PASSTHROUGH functionality before adding
BPF functionality, would it make sense to review and merge the PASSTHROUGH
functionality strictly before the BPF functionality?

Alternatively, you could aim to merge support for some PASSTHROUGH ops
then support for some BPF functionality and then slowly add ops to both.

Which brings me to my biggest concern.
I still do not see how these patches replace Alessio's
FUSE_DEV_IOC_PASSTHROUGH_OPEN patches.

Is the idea here that ioctl needs to be done at FUSE_LOOKUP
instead or in addition to the ioctl on FUSE_OPEN to setup the
read/write passthrough on the backing file?

I am missing things like the FILESYSTEM_MAX_STACK_DEPTH check that
was added as a result of review on Alessio's patches.

The reason I am concerned about this is that we are using the
FUSE_DEV_IOC_PASSTHROUGH_OPEN patches and I would like
to upstream their functionality sooner rather than later.
These patches have already been running in production for a while.
I believe that they are running in Android as well, and there is value
in upstreaming well-tested patches.

The API does not need to stay FUSE_DEV_IOC_PASSTHROUGH_OPEN
it should be an API that is extendable to FUSE-BPF, but it would be
useful if the read/write passthrough could be the goal for first merge.

Does any of this make sense to you?
Can you draw a roadmap for merging FUSE-BPF that starts with
a first (hopefully short term) phase that adds the read/write passthrough
functionality?

I can help with review and testing of that part if needed.
I was planning to discuss this with you on LSFMM anyway,
but better start the discussion beforehand.

Thanks,
Amir.

Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

Posted by Daniel Rosenberg 2 years, 9 months ago

On Mon, Apr 17, 2023 at 10:33 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
>
> Which brings me to my biggest concern.
> I still do not see how these patches replace Alessio's
> FUSE_DEV_IOC_PASSTHROUGH_OPEN patches.
>
> Is the idea here that ioctl needs to be done at FUSE_LOOKUP
> instead or in addition to the ioctl on FUSE_OPEN to setup the
> read/write passthrough on the backing file?
>

In these patches, the fuse daemon responds to the lookup request via
an ioctl, essentially in the same way it would have to the /dev/fuse
node. It just flags the write as coming from an ioctl and calls
fuse_dev_do_write. An additional block in the lookup response gives
the backing file and what bpf_ops to use. The main difference is that
fuse-bpf uses backing inodes, while passthrough uses a file.
Fuse-bpf's read/write support currently isn't complete, but it does
allow for direct passthrough. You could set ops to default to
userspace in every case that Alessio's passthrough code does and it
should have about the same effect. With the struct_op change, I did
notice that doing something like that is more annoying, and am
planning to add a default op which only takes the meta info and runs
if the opcode specific op is not present.

> I am missing things like the FILESYSTEM_MAX_STACK_DEPTH check that
> was added as a result of review on Alessio's patches.
>

I'd definitely want to fix any issues that were fixed there. There's a
lot of common code between fuse-bpf and fuse passthrough, so many of
the suggestions there will apply here.

> The reason I am concerned about this is that we are using the
> FUSE_DEV_IOC_PASSTHROUGH_OPEN patches and I would like
> to upstream their functionality sooner rather than later.
> These patches have already been running in production for a while.
> I believe that they are running in Android as well, and there is value
> in upstreaming well-tested patches.
>
> The API does not need to stay FUSE_DEV_IOC_PASSTHROUGH_OPEN
> it should be an API that is extendable to FUSE-BPF, but it would be
> useful if the read/write passthrough could be the goal for first merge.
>
> Does any of this make sense to you?
> Can you draw a roadmap for merging FUSE-BPF that starts with
> a first (hopefully short term) phase that adds the read/write passthrough
> functionality?
>
> I can help with review and testing of that part if needed.
> I was planning to discuss this with you on LSFMM anyway,
> but better start the discussion beforehand.
>
> Thanks,
> Amir.

We've been using an earlier version of fuse-bpf on Android, closer to
the V1 patches. They fit our current needs but don't cover everything
we intend to. The V3 patches switch to a new style of bpf program,
which I'm hoping to get some feedback on before I spend too much time
fixing up the details. The backing calls themselves can be reviewed
separately from that though.

Without bpf, we're essentially enabling complete passthrough at a
directory or file. Once you set a backing file, fuse-bpf calls into
the backing filesystem by default, with no additional userspace
interaction unless an installed bpf program says otherwise. If we had
some commands without others, we'd have behavior
changes as we introduce support for additional calls. We'd need a way
to set default behavior. Perhaps something like a u64 flag field
extension in FUSE_INIT for indicating which opcodes support backing,
and a response for what those should default to doing. If there's a
bpf_op present for a given opcode, it would be able to override that
default. If we had something like that, we'd be able to add support
for a subset of opcodes in a sensible way.
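
One way to picture that proposal is the sketch below. It is hypothetical throughout: the struct, enum, and function names are invented for illustration, the opcode numbers are example values, and nothing like this exists in the posted patches. It assumes one bit per opcode in two u64 masks, which only covers opcodes below 64.

```c
#include <stddef.h>
#include <stdint.h>

/* Example opcode numbers used only by this sketch. */
enum { EX_OP_LOOKUP = 1, EX_OP_OPEN = 14, EX_OP_READ = 15, EX_OP_WRITE = 16 };

enum ex_action { EX_TO_DAEMON, EX_TO_BACKING };

/* Hypothetical per-connection state negotiated at init time. */
struct ex_conn {
	uint64_t backing_supported; /* bit N set: kernel can back opcode N */
	uint64_t default_backing;   /* bit N set: daemon wants backing by default */
};

/*
 * Decide where a request goes. bpf_override points at the decision made by
 * a bpf op registered for this opcode, or is NULL when none is present.
 */
static enum ex_action ex_route(const struct ex_conn *c, unsigned int opcode,
			       const enum ex_action *bpf_override)
{
	uint64_t bit;

	if (opcode >= 64)
		return EX_TO_DAEMON; /* not representable in a u64 mask */

	bit = UINT64_C(1) << opcode;
	if (!(c->backing_supported & bit))
		return EX_TO_DAEMON; /* kernel has no backing path for this op */

	if (bpf_override)
		return *bpf_override; /* a bpf op overrides the default */

	return (c->default_backing & bit) ? EX_TO_BACKING : EX_TO_DAEMON;
}
```

With a scheme like this, adding kernel support for a new opcode would not change behavior for an existing daemon, because its default mask would not have the new bit set.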

Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

Posted by Amir Goldstein 2 years, 9 months ago

On Fri, Apr 21, 2023 at 4:41 AM Daniel Rosenberg <drosen@google.com> wrote:
>
> On Mon, Apr 17, 2023 at 10:33 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> >
> > Which brings me to my biggest concern.
> > I still do not see how these patches replace Alessio's
> > FUSE_DEV_IOC_PASSTHROUGH_OPEN patches.
> >
> > Is the idea here that ioctl needs to be done at FUSE_LOOKUP
> > instead or in addition to the ioctl on FUSE_OPEN to setup the
> > read/write passthrough on the backing file?
> >
>
> In these patches, the fuse daemon responds to the lookup request via
> an ioctl, essentially in the same way it would have to the /dev/fuse
> node. It just flags the write as coming from an ioctl and calls
> fuse_dev_do_write. An additional block in the lookup response gives
> the backing file and what bpf_ops to use. The main difference is that
> fuse-bpf uses backing inodes, while passthrough uses a file.

Ah right. I wonder if there is benefit in both APIs or if a backing inode
is sufficient to implement everything that could be interesting to implement
with a backing file.

> Fuse-bpf's read/write support currently isn't complete, but it does
> allow for direct passthrough. You could set ops to default to
> userspace in every case that Alessio's passthrough code does and it
> should have about the same effect.

What are the subtle differences then?

> With the struct_op change, I did
> notice that doing something like that is more annoying, and am
> planning to add a default op which only takes the meta info and runs
> if the opcode specific op is not present.
>

Sounds interesting. I'll wait to see what you propose.

>
> > I am missing things like the FILESYSTEM_MAX_STACK_DEPTH check that
> > was added as a result of review on Alessio's patches.
> >
>
> I'd definitely want to fix any issues that were fixed there. There's a
> lot of common code between fuse-bpf and fuse passthrough, so many of
> the suggestions there will apply here.
>

That's why I suggested trying to implement the passthrough file ioctl
functionality first to make sure that none of the review comments
in the first round were missed.

But if we need the functionality of both ioctls, we can collaborate on
merging them separately.

> > The reason I am concerned about this is that we are using the
> > FUSE_DEV_IOC_PASSTHROUGH_OPEN patches and I would like
> > to upstream their functionality sooner rather than later.
> > These patches have already been running in production for a while.
> > I believe that they are running in Android as well, and there is value
> > in upstreaming well-tested patches.
> >
> > The API does not need to stay FUSE_DEV_IOC_PASSTHROUGH_OPEN
> > it should be an API that is extendable to FUSE-BPF, but it would be
> > useful if the read/write passthrough could be the goal for first merge.
> >
> > Does any of this make sense to you?
> > Can you draw a roadmap for merging FUSE-BPF that starts with
> > a first (hopefully short term) phase that adds the read/write passthrough
> > functionality?
> >
> > I can help with review and testing of that part if needed.
> > I was planning to discuss this with you on LSFMM anyway,
> > but better start the discussion beforehand.
> >
> > Thanks,
> > Amir.
>
> We've been using an earlier version of fuse-bpf on Android, closer to
> the V1 patches. They fit our current needs but don't cover everything
> we intend to. The V3 patches switch to a new style of bpf program,
> which I'm hoping to get some feedback on before I spend too much time
> fixing up the details. The backing calls themselves can be reviewed
> separately from that though.
>
> Without bpf, we're essentially enabling complete passthrough at a
> directory or file. Once you set a backing file, fuse-bpf calls into
> the backing filesystem by default, with no additional userspace
> interaction unless an installed bpf program says otherwise. If we had
> some commands without others, we'd have behavior
> changes as we introduce support for additional calls. We'd need a way
> to set default behavior. Perhaps something like a u64 flag field
> extension in FUSE_INIT for indicating which opcodes support backing,
> and a response for what those should default to doing. If there's a
> bpf_op present for a given opcode, it would be able to override that
> default. If we had something like that, we'd be able to add support
> for a subset of opcodes in a sensible way.

So maybe this is something to consider.

Thanks,
Amir.

Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

Posted by Miklos Szeredi 2 years, 9 months ago

On Tue, 18 Apr 2023 at 03:40, Daniel Rosenberg <drosen@google.com> wrote:
>
> These patches extend FUSE to be able to act as a stacked filesystem. This
> allows pure passthrough, where the fuse file system simply reflects the lower
> filesystem, and also allows optional pre and post filtering in BPF and/or the
> userspace daemon as needed. This can dramatically reduce or even eliminate
> transitions to and from userspace.

I'll ignore BPF for now and concentrate on the passthrough aspect,
which I understand better.

The security model needs to be thought about and documented.  Think
about this: the fuse server now delegates operations it would itself
perform to the passthrough code in fuse.  The permissions that would
have been checked in the context of the fuse server are now checked in
the context of the task performing the operation.  The server may be
able to bypass seccomp restrictions.  Files that are open on the
backing filesystem are now hidden (e.g. lsof won't find these), which
allows the server to obfuscate accesses to backing files.  Etc.

These are not particularly worrying if the server is privileged, but
fuse comes with the history of supporting unprivileged servers, so we
should look at supporting passthrough with unprivileged servers as
well.

My other generic comment is that you should add justification for
doing this in the first place.  I guess it's mainly performance.  So
how can performance be won in real-life cases?   It would also be good
to measure the contribution of individual ops to that win.   Is there
another reason for this besides performance?

Thanks,
Miklos

Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

Posted by Daniel Rosenberg 2 years, 9 months ago

On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>
> The security model needs to be thought about and documented.  Think
> about this: the fuse server now delegates operations it would itself
> perform to the passthrough code in fuse.  The permissions that would
> have been checked in the context of the fuse server are now checked in
> the context of the task performing the operation.  The server may be
> able to bypass seccomp restrictions.  Files that are open on the
> backing filesystem are now hidden (e.g. lsof won't find these), which
> allows the server to obfuscate accesses to backing files.  Etc.
>
> These are not particularly worrying if the server is privileged, but
> fuse comes with the history of supporting unprivileged servers, so we
> should look at supporting passthrough with unprivileged servers as
> well.
>

This is on my todo list. My current plan is to grab the creds that the
daemon uses to respond to FUSE_INIT. That should keep behavior fairly
similar. I'm not sure if there are cases where the fuse server is
operating under multiple contexts.
I don't currently have a plan for exposing open files via lsof. Every
such file should relate to one that will show up though. I haven't dug
into how that's set up, but I'm open to suggestions.

> My other generic comment is that you should add justification for
> doing this in the first place.  I guess it's mainly performance.  So
> how can performance be won in real-life cases?   It would also be good
> to measure the contribution of individual ops to that win.   Is there
> another reason for this besides performance?
>
> Thanks,
> Miklos

Our main concern with it is performance. We have some preliminary
numbers looking at the pure passthrough case. We've been testing using
a ramdrive on a somewhat slow machine, as that should highlight
differences more. We ran fio for sequential reads, and random
read/write. For sequential reads, we were seeing libfuse's
passthrough_hp take about a 50% hit, with fuse-bpf not being
detectably slower. For random read/write, we were seeing a roughly 90%
drop in performance from passthrough_hp, while fuse-bpf has about a 7%
drop in read and write speed. When we use a bpf that traces every
opcode, that performance hit increases to a roughly 1% drop in
sequential read performance, and a 20% drop in both read and write
performance for random read/write. We plan to make more complex bpf
examples, with fuse daemon equivalents to compare against.
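
To put those percentages side by side, the arithmetic below converts each quoted drop into remaining throughput and then into a ratio. It assumes every percentage is measured against the same native (non-FUSE) baseline, which the figures above suggest but do not state outright. The helper names are made up for this sketch.

```c
/* Fraction of baseline throughput left after a given percentage drop. */
static double remaining(double drop_percent)
{
	return 1.0 - drop_percent / 100.0;
}

/*
 * How many times faster configuration A is than configuration B, given
 * each one's percentage drop relative to the same baseline.
 */
static double relative_speed(double drop_a, double drop_b)
{
	return remaining(drop_a) / remaining(drop_b);
}
```

Under that assumption, `relative_speed(7, 90)` comes to roughly 9.3 for random read/write (fuse-bpf versus passthrough_hp), `relative_speed(0, 50)` to 2 for sequential reads, and `relative_speed(20, 90)` to about 8 with the tracing bpf program installed.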

We have not looked closely at the impact of individual opcodes yet.

There's also a potential ease of use for fuse-bpf. If you're
implementing a fuse daemon that is largely mirroring a backing
filesystem, you only need to write code for the differences in
behavior. For instance, say you want to remove image metadata like
location. You could give bpf information on what range of data is
metadata, and zero out that section without having to handle any other
operations.
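
The metadata example boils down to a range-overlap computation on the read buffer. The sketch below shows that logic in plain userspace C. It is not a BPF program and does not use the kfuncs or dynptr interfaces from the patches; `scrub_range` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Zero the part of a read buffer that overlaps the metadata range
 * [meta_start, meta_start + meta_len). buf holds len bytes that were read
 * starting at file offset file_off. Returns the number of bytes zeroed.
 * Offset overflow is not handled in this sketch.
 */
static size_t scrub_range(uint8_t *buf, uint64_t file_off, size_t len,
			  uint64_t meta_start, uint64_t meta_len)
{
	uint64_t read_end = file_off + len;
	uint64_t meta_end = meta_start + meta_len;
	uint64_t lo = file_off > meta_start ? file_off : meta_start;
	uint64_t hi = read_end < meta_end ? read_end : meta_end;

	if (lo >= hi)
		return 0; /* the read does not touch the metadata range */

	memset(buf + (lo - file_off), 0, (size_t)(hi - lo));
	return (size_t)(hi - lo);
}
```

In the scenario described above, logic like this would run as a post filter on read, while every other operation passes straight through to the backing filesystem.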

 -Daniel

Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

Posted by Gao Xiang 2 years, 8 months ago


On 2023/5/2 17:07, Daniel Rosenberg wrote:
> On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>>
>> The security model needs to be thought about and documented.  Think
>> about this: the fuse server now delegates operations it would itself
>> perform to the passthrough code in fuse.  The permissions that would
>> have been checked in the context of the fuse server are now checked in
>> the context of the task performing the operation.  The server may be
>> able to bypass seccomp restrictions.  Files that are open on the
>> backing filesystem are now hidden (e.g. lsof won't find these), which
>> allows the server to obfuscate accesses to backing files.  Etc.
>>
>> These are not particularly worrying if the server is privileged, but
>> fuse comes with the history of supporting unprivileged servers, so we
>> should look at supporting passthrough with unprivileged servers as
>> well.
>>
> 
> This is on my todo list. My current plan is to grab the creds that the
> daemon uses to respond to FUSE_INIT. That should keep behavior fairly
> similar. I'm not sure if there are cases where the fuse server is
> operating under multiple contexts.
> I don't currently have a plan for exposing open files via lsof. Every
> such file should relate to one that will show up though. I haven't dug
> into how that's set up, but I'm open to suggestions.
> 
>> My other generic comment is that you should add justification for
>> doing this in the first place.  I guess it's mainly performance.  So
>> how can performance be won in real-life cases?   It would also be good
>> to measure the contribution of individual ops to that win.   Is there
>> another reason for this besides performance?
>>
>> Thanks,
>> Miklos
> 
> Our main concern with it is performance. We have some preliminary
> numbers looking at the pure passthrough case. We've been testing using
> a ramdrive on a somewhat slow machine, as that should highlight
> differences more. We ran fio for sequential reads, and random
> read/write. For sequential reads, we were seeing libfuse's
> passthrough_hp take about a 50% hit, with fuse-bpf not being
> detectably slower. For random read/write, we were seeing a roughly 90%
> drop in performance from passthrough_hp, while fuse-bpf has about a 7%
> drop in read and write speed. When we use a bpf that traces every
> opcode, that performance hit increases to a roughly 1% drop in
> sequential read performance, and a 20% drop in both read and write
> performance for random read/write. We plan to make more complex bpf
> examples, with fuse daemon equivalents to compare against.
> 
> We have not looked closely at the impact of individual opcodes yet.
> 
> There's also a potential ease of use for fuse-bpf. If you're
> implementing a fuse daemon that is largely mirroring a backing
> filesystem, you only need to write code for the differences in
> behavior. For instance, say you want to remove image metadata like
> location. You could give bpf information on what range of data is
> metadata, and zero out that section without having to handle any other
> operations.
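
For illustration, the metadata-scrubbing idea quoted above can be sketched in plain userspace C. This is not the FUSE-BPF API and not code from the series: `struct scrub_range` and `scrub_read_buffer()` are hypothetical names, and the sketch assumes the filter is handed the buffer returned by a read along with the file offset it was read from. It only shows the overlap arithmetic needed to zero the configured range.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of the byte range to hide (not a FUSE-BPF type). */
struct scrub_range {
	uint64_t start;	/* first file offset to zero */
	uint64_t len;	/* number of bytes to zero */
};

/*
 * Zero the part of a read buffer that overlaps the scrub range.
 * buf holds `size` bytes that were read starting at file offset `off`.
 * Returns the number of bytes zeroed (0 if the ranges do not overlap).
 */
static size_t scrub_read_buffer(unsigned char *buf, size_t size, uint64_t off,
				const struct scrub_range *r)
{
	uint64_t buf_end = off + size;
	uint64_t r_end = r->start + r->len;
	/* Intersection of [off, buf_end) and [r->start, r_end). */
	uint64_t lo = off > r->start ? off : r->start;
	uint64_t hi = buf_end < r_end ? buf_end : r_end;

	if (lo >= hi)
		return 0;

	memset(buf + (lo - off), 0, (size_t)(hi - lo));
	return (size_t)(hi - lo);
}
```

In this sketch every other operation would go straight to the backing file; only the read path needs the extra step.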

A bit off topic (and I haven't looked closely at the FUSE BPF internals):
after listening to this topic in the FS track last week, I wonder
whether, at least in the long term, it might be better to land the
eBPF-related filter/redirect functionality in the VFS or in a
stackable fs, so that we could redirect/filter any sub-fstree
in principle.  It's just an open question and I have no strong preference,
but do we really need BPF-filter functionality for each
individual fs?

It sounds much like
https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file-system-filter-drivers

Thanks,
Gao Xiang

> 
>   -Daniel

Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

Posted by Amir Goldstein 2 years, 8 months ago

On Wed, May 17, 2023 at 5:50 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>
>
>
> On 2023/5/2 17:07, Daniel Rosenberg wrote:
> > On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>
> >>
> >> The security model needs to be thought about and documented.  Think
> >> about this: the fuse server now delegates operations it would itself
> >> perform to the passthrough code in fuse.  The permissions that would
> >> have been checked in the context of the fuse server are now checked in
> >> the context of the task performing the operation.  The server may be
> >> able to bypass seccomp restrictions.  Files that are open on the
> >> backing filesystem are now hidden (e.g. lsof won't find these), which
> >> allows the server to obfuscate accesses to backing files.  Etc.
> >>
> >> These are not particularly worrying if the server is privileged, but
> >> fuse comes with the history of supporting unprivileged servers, so we
> >> should look at supporting passthrough with unprivileged servers as
> >> well.
> >>
> >
> > This is on my todo list. My current plan is to grab the creds that the
> > daemon uses to respond to FUSE_INIT. That should keep behavior fairly
> > similar. I'm not sure if there are cases where the fuse server is
> > operating under multiple contexts.
> > I don't currently have a plan for exposing open files via lsof. Every
> > such file should relate to one that will show up though. I haven't dug
> > into how that's set up, but I'm open to suggestions.
> >
> >> My other generic comment is that you should add justification for
> >> doing this in the first place.  I guess it's mainly performance.  So
> >> how performance can be won in real life cases?   It would also be good
> >> to measure the contribution of individual ops to that win.   Is there
> >> another reason for this besides performance?
> >>
> >> Thanks,
> >> Miklos
> >
> > Our main concern with it is performance. We have some preliminary
> > numbers looking at the pure passthrough case. We've been testing using
> > a ramdrive on a somewhat slow machine, as that should highlight
> > differences more. We ran fio for sequential reads, and random
> > read/write. For sequential reads, we were seeing libfuse's
> > passthrough_hp take about a 50% hit, with fuse-bpf not being
> > detectably slower. For random read/write, we were seeing a roughly 90%
> > drop in performance from passthrough_hp, while fuse-bpf has about a 7%
> > drop in read and write speed. When we use a bpf that traces every
> > opcode, that performance hit increases to a roughly 1% drop in
> > sequential read performance, and a 20% drop in both read and write
> > performance for random read/write. We plan to make more complex bpf
> > examples, with fuse daemon equivalents to compare against.
> >
> > We have not looked closely at the impact of individual opcodes yet.
> >
> > There's also a potential ease of use for fuse-bpf. If you're
> > implementing a fuse daemon that is largely mirroring a backing
> > filesystem, you only need to write code for the differences in
> > behavior. For instance, say you want to remove image metadata like
> > location. You could give bpf information on what range of data is
> > metadata, and zero out that section without having to handle any other
> > operations.
>
> A bit out of topic (although I'm not quite look into FUSE BPF internals)
> After roughly listening to this topic in FS track last week, I'm not
> quite sure (at least in the long term) if it might be better if
> ebpf-related filter/redirect stuffs could be landed in vfs or in a
> somewhat stackable fs so that we could redirect/filter any sub-fstree
> in principle?    It's just an open question and I have no real tendency
> of this but do we really need a BPF-filter functionality for each
> individual fs?

I think that is a valid question, but the answer is that even if it makes sense,
doing something like this in vfs would be a much bigger project with larger
consequences on performance and security and whatnot, so even if
(and a very big if) this ever happens, using FUSE-BPF as a playground for
this sort of stuff would be a good idea.

This reminds me of union mounts - it made sense to have union mount
functionality in vfs, but after a long winding road, a stacked fs (overlayfs)
turned out to be a much more practical solution.

>
> It sounds much like
> https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file-system-filter-drivers
>

Nice reference.
I must admit that I found it hard to understand what Windows filter drivers
can do compared to the FUSE-BPF design.
It'd be nice to get a comparison with what is planned for FUSE-BPF.

Interesting to note that there is a "legacy" Windows filter driver API,
so Windows didn't get everything right for the first API - that is especially
interesting to look at as repeating other people's mistakes would be a shame.

Thanks,
Amir.

Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

Posted by Gao Xiang 2 years, 8 months ago

Hi Amir,

On 2023/5/17 23:51, Amir Goldstein wrote:
> On Wed, May 17, 2023 at 5:50 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>
>>
>>
>> On 2023/5/2 17:07, Daniel Rosenberg wrote:
>>> On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>
>>>>
>>>> The security model needs to be thought about and documented.  Think
>>>> about this: the fuse server now delegates operations it would itself
>>>> perform to the passthrough code in fuse.  The permissions that would
>>>> have been checked in the context of the fuse server are now checked in
>>>> the context of the task performing the operation.  The server may be
>>>> able to bypass seccomp restrictions.  Files that are open on the
>>>> backing filesystem are now hidden (e.g. lsof won't find these), which
>>>> allows the server to obfuscate accesses to backing files.  Etc.
>>>>
>>>> These are not particularly worrying if the server is privileged, but
>>>> fuse comes with the history of supporting unprivileged servers, so we
>>>> should look at supporting passthrough with unprivileged servers as
>>>> well.
>>>>
>>>
>>> This is on my todo list. My current plan is to grab the creds that the
>>> daemon uses to respond to FUSE_INIT. That should keep behavior fairly
>>> similar. I'm not sure if there are cases where the fuse server is
>>> operating under multiple contexts.
>>> I don't currently have a plan for exposing open files via lsof. Every
>>> such file should relate to one that will show up though. I haven't dug
>>> into how that's set up, but I'm open to suggestions.
>>>
>>>> My other generic comment is that you should add justification for
>>>> doing this in the first place.  I guess it's mainly performance.  So
>>>> how performance can be won in real life cases?   It would also be good
>>>> to measure the contribution of individual ops to that win.   Is there
>>>> another reason for this besides performance?
>>>>
>>>> Thanks,
>>>> Miklos
>>>
>>> Our main concern with it is performance. We have some preliminary
>>> numbers looking at the pure passthrough case. We've been testing using
>>> a ramdrive on a somewhat slow machine, as that should highlight
>>> differences more. We ran fio for sequential reads, and random
>>> read/write. For sequential reads, we were seeing libfuse's
>>> passthrough_hp take about a 50% hit, with fuse-bpf not being
>>> detectably slower. For random read/write, we were seeing a roughly 90%
>>> drop in performance from passthrough_hp, while fuse-bpf has about a 7%
>>> drop in read and write speed. When we use a bpf that traces every
>>> opcode, that performance hit increases to a roughly 1% drop in
>>> sequential read performance, and a 20% drop in both read and write
>>> performance for random read/write. We plan to make more complex bpf
>>> examples, with fuse daemon equivalents to compare against.
>>>
>>> We have not looked closely at the impact of individual opcodes yet.
>>>
>>> There's also a potential ease of use for fuse-bpf. If you're
>>> implementing a fuse daemon that is largely mirroring a backing
>>> filesystem, you only need to write code for the differences in
>>> behavior. For instance, say you want to remove image metadata like
>>> location. You could give bpf information on what range of data is
>>> metadata, and zero out that section without having to handle any other
>>> operations.
>>
>> A bit out of topic (although I'm not quite look into FUSE BPF internals)
>> After roughly listening to this topic in FS track last week, I'm not
>> quite sure (at least in the long term) if it might be better if
>> ebpf-related filter/redirect stuffs could be landed in vfs or in a
>> somewhat stackable fs so that we could redirect/filter any sub-fstree
>> in principle?    It's just an open question and I have no real tendency
>> of this but do we really need a BPF-filter functionality for each
>> individual fs?
> 
> I think that is a valid question, but the answer is that even if it makes sense,
> doing something like this in vfs would be a much bigger project with larger
> consequences on performance and security and whatnot, so even if
> (and a very big if) this ever happens, using FUSE-BPF as a playground for
> this sort of stuff would be a good idea.

My current observation is that the total FUSE-BPF LoC already exceeds that
of FUSE itself.  In addition, it hooks almost all fs operations, which
concerns me.

> 
> This reminds me of union mounts - it made sense to have union mount
> functionality in vfs, but after a long winding road, a stacked fs (overlayfs)
> turned out to be a much more practical solution.

Yeah, I agree.  So it was just a pure hint on my side.

> 
>>
>> It sounds much like
>> https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file-system-filter-drivers
>>
> 
> Nice reference.
> I must admit that I found it hard to understand what Windows filter drivers
> can do compared to FUSE-BPF design.
> It'd be nice to get some comparison from what is planned for FUSE-BPF.

At least some investigation/analysis up front might be better for
long-term development.

> 
> Interesting to note that there is a "legacy" Windows filter driver API,
> so Windows didn't get everything right for the first API - that is especially
> interesting to look at as repeating other people's mistakes would be a shame.

I'm not familiar with the details either, but I saw that they have a
filesystem filter subsystem, so I mentioned it here.

Thanks,
Gao Xiang

> 
> Thanks,
> Amir.

Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

Posted by Gao Xiang 2 years, 8 months ago


On 2023/5/18 00:05, Gao Xiang wrote:
> Hi Amir,
> 
> On 2023/5/17 23:51, Amir Goldstein wrote:
>> On Wed, May 17, 2023 at 5:50 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
>>>
>>>
>>>
>>> On 2023/5/2 17:07, Daniel Rosenberg wrote:
>>>> On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>>>
>>>>>
>>>>> The security model needs to be thought about and documented.  Think
>>>>> about this: the fuse server now delegates operations it would itself
>>>>> perform to the passthrough code in fuse.  The permissions that would
>>>>> have been checked in the context of the fuse server are now checked in
>>>>> the context of the task performing the operation.  The server may be
>>>>> able to bypass seccomp restrictions.  Files that are open on the
>>>>> backing filesystem are now hidden (e.g. lsof won't find these), which
>>>>> allows the server to obfuscate accesses to backing files.  Etc.
>>>>>
>>>>> These are not particularly worrying if the server is privileged, but
>>>>> fuse comes with the history of supporting unprivileged servers, so we
>>>>> should look at supporting passthrough with unprivileged servers as
>>>>> well.
>>>>>
>>>>
>>>> This is on my todo list. My current plan is to grab the creds that the
>>>> daemon uses to respond to FUSE_INIT. That should keep behavior fairly
>>>> similar. I'm not sure if there are cases where the fuse server is
>>>> operating under multiple contexts.
>>>> I don't currently have a plan for exposing open files via lsof. Every
>>>> such file should relate to one that will show up though. I haven't dug
>>>> into how that's set up, but I'm open to suggestions.
>>>>
>>>>> My other generic comment is that you should add justification for
>>>>> doing this in the first place.  I guess it's mainly performance.  So
>>>>> how performance can be won in real life cases?   It would also be good
>>>>> to measure the contribution of individual ops to that win.   Is there
>>>>> another reason for this besides performance?
>>>>>
>>>>> Thanks,
>>>>> Miklos
>>>>
>>>> Our main concern with it is performance. We have some preliminary
>>>> numbers looking at the pure passthrough case. We've been testing using
>>>> a ramdrive on a somewhat slow machine, as that should highlight
>>>> differences more. We ran fio for sequential reads, and random
>>>> read/write. For sequential reads, we were seeing libfuse's
>>>> passthrough_hp take about a 50% hit, with fuse-bpf not being
>>>> detectably slower. For random read/write, we were seeing a roughly 90%
>>>> drop in performance from passthrough_hp, while fuse-bpf has about a 7%
>>>> drop in read and write speed. When we use a bpf that traces every
>>>> opcode, that performance hit increases to a roughly 1% drop in
>>>> sequential read performance, and a 20% drop in both read and write
>>>> performance for random read/write. We plan to make more complex bpf
>>>> examples, with fuse daemon equivalents to compare against.
>>>>
>>>> We have not looked closely at the impact of individual opcodes yet.
>>>>
>>>> There's also a potential ease of use for fuse-bpf. If you're
>>>> implementing a fuse daemon that is largely mirroring a backing
>>>> filesystem, you only need to write code for the differences in
>>>> behavior. For instance, say you want to remove image metadata like
>>>> location. You could give bpf information on what range of data is
>>>> metadata, and zero out that section without having to handle any other
>>>> operations.
>>>
>>> A bit out of topic (although I'm not quite look into FUSE BPF internals)
>>> After roughly listening to this topic in FS track last week, I'm not
>>> quite sure (at least in the long term) if it might be better if
>>> ebpf-related filter/redirect stuffs could be landed in vfs or in a
>>> somewhat stackable fs so that we could redirect/filter any sub-fstree
>>> in principle?    It's just an open question and I have no real tendency
>>> of this but do we really need a BPF-filter functionality for each
>>> individual fs?
>>
>> I think that is a valid question, but the answer is that even if it makes sense,
>> doing something like this in vfs would be a much bigger project with larger
>> consequences on performance and security and whatnot, so even if
>> (and a very big if) this ever happens, using FUSE-BPF as a playground for
>> this sort of stuff would be a good idea.
> 
> My current observation is that the total Fuse-BPF LoC is already beyond the


                          ^ Sorry, I double-checked and I was wrong; forget about it.

> whole FUSE itself.  In addition, it almost hooks all fs operations which
> impacts something to me.
> 
>>
>> This reminds me of union mounts - it made sense to have union mount
>> functionality in vfs, but after a long winding road, a stacked fs (overlayfs)
>> turned out to be a much more practical solution.
> 
> Yeah, I agree.  So it was just a pure hint on my side.
> 
>>
>>>
>>> It sounds much like
>>> https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file-system-filter-drivers
>>>
>>
>> Nice reference.
>> I must admit that I found it hard to understand what Windows filter drivers
>> can do compared to FUSE-BPF design.
>> It'd be nice to get some comparison from what is planned for FUSE-BPF.
> 
> At least some investigation/analysis first might be better in the long
> term development.
> 
>>
>> Interesting to note that there is a "legacy" Windows filter driver API,
>> so Windows didn't get everything right for the first API - that is especially
>> interesting to look at as repeating other people's mistakes would be a shame.
> 
> I'm not familiar with that details as well, yet I saw that they have a
> filesystem filter subsystem, so I mentioned it here.
> 
> Thanks,
> Gao Xiang
> 
>>
>> Thanks,
>> Amir.