fs/fhandle.c | 97 ++++++++++++++++++++++------------------------- fs/libfs.c | 1 + fs/pidfs.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++ include/linux/exportfs.h | 20 ++++++++++ include/linux/pseudo_fs.h | 1 + 5 files changed, 164 insertions(+), 51 deletions(-)
Hey,
Now that we have the preliminaries to lookup struct pid based on its
inode number alone we can implement file handle support.
This is based on custom export operation methods which allows pidfs to
implement permission checking and opening of pidfs file handles cleanly
without hacking around in the core file handle code too much.
This is lightly tested.
Thanks!
Christian
---
Christian Brauner (5):
fhandle: simplify error handling
exportfs: add open method
fhandle: pull CAP_DAC_READ_SEARCH check into may_decode_fh()
exportfs: add permission method
pidfs: implement file handle support
Erin Shepherd (1):
pseudofs: add support for export_ops
fs/fhandle.c | 97 ++++++++++++++++++++++-------------------------
fs/libfs.c | 1 +
fs/pidfs.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/exportfs.h | 20 ++++++++++
include/linux/pseudo_fs.h | 1 +
5 files changed, 164 insertions(+), 51 deletions(-)
---
base-commit: 94c9a56ad3521a28177610c63298d66de634cb9d
change-id: 20241129-work-pidfs-file_handle-07bdfb860a38
On Fri, Nov 29, 2024 at 2:39 PM Christian Brauner <brauner@kernel.org> wrote: > > Hey, > > Now that we have the preliminaries to lookup struct pid based on its > inode number alone we can implement file handle support. > > This is based on custom export operation methods which allows pidfs to > implement permission checking and opening of pidfs file handles cleanly > without hacking around in the core file handle code too much. > > This is lightly tested. With my comments addressed as you pushed to vfs-6.14.pidfs branch in your tree, you may add to the patches posted: Reviewed-by: Amir Goldstein <amir73il@gmail.com> HOWEVER, IMO there is still one thing that has to be addressed before merge - We must make sure that nfsd cannot export pidfs. In principal, SB_NOUSER filesystems should not be accessible to userspace paths, so exportfs should not be able to configure nfsd export of pidfs, but maybe this limitation can be worked around by using magic link paths? I think it may be worth explicitly disallowing nfsd export of SB_NOUSER filesystems and we could also consider blocking SB_KERNMOUNT, but may there are users exporting ramfs? Jeff has mentioned that he thinks we are blocking export of cgroupfs by nfsd, but I really don't see where that is being enforced. The requirement for FS_REQUIRES_DEV in check_export() is weak because user can overrule it with manual fsid argument to exportfs. So maybe we disallow nfsd export of kernfs and backport to stable kernels to be on the safe side? On top of that, we may also want to reject nfsd export of any fs with custom ->open() or ->permission() export ops, on the grounds that nfsd does not call these ops? Regarding the two other kernel users of exportfs, namely, overlayfs and fanotify - For overlayfs, I think that in ovl_can_decode_fh() we can safely opt-out of SB_NOUSER and SB_KERNMOUNT filesystems, to not allow nfs exporting of overlayfs over those lower fs. For fanotify, there is already a check in fanotify_events_supported() to disallow sb/mount marks on SB_NOUSER and a comment that questions the value of allowing them for SB_KERNMOUNT. So for pidfs there is no risk wrt fanotify and it does not look like pidfs is going to generate any fanotify events anyway. Thanks, Amir.
On Sat, Nov 30, 2024 at 01:22:05PM +0100, Amir Goldstein wrote:
> On Fri, Nov 29, 2024 at 2:39 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > Hey,
> >
> > Now that we have the preliminaries to lookup struct pid based on its
> > inode number alone we can implement file handle support.
> >
> > This is based on custom export operation methods which allows pidfs to
> > implement permission checking and opening of pidfs file handles cleanly
> > without hacking around in the core file handle code too much.
> >
> > This is lightly tested.
>
> With my comments addressed as you pushed to vfs-6.14.pidfs branch
> in your tree, you may add to the patches posted:
>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
>
> HOWEVER,
> IMO there is still one thing that has to be addressed before merge -
> We must make sure that nfsd cannot export pidfs.
>
> In principal, SB_NOUSER filesystems should not be accessible to
> userspace paths, so exportfs should not be able to configure nfsd
> export of pidfs, but maybe this limitation can be worked around by
> using magic link paths?
I don't see how. I might be missing details.
> I think it may be worth explicitly disallowing nfsd export of SB_NOUSER
> filesystems and we could also consider blocking SB_KERNMOUNT,
> but may there are users exporting ramfs?
No need to restrict it if it's safe, I guess.
> Jeff has mentioned that he thinks we are blocking export of cgroupfs
> by nfsd, but I really don't see where that is being enforced.
> The requirement for FS_REQUIRES_DEV in check_export() is weak
> because user can overrule it with manual fsid argument to exportfs.
> So maybe we disallow nfsd export of kernfs and backport to stable kernels
> to be on the safe side?
File handles and nfs export have become two distinct things and there
filesystems based on kernfs, and pidfs want to support file handles
without support nfs export.
So I think instead of having nfs check what filesystems may be exported
we should let the filesystems indicate that they cannot be exported and
make nfs honour that.
So something like the untested sketch:
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 1358c21837f1..a5c75cb1c812 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -154,6 +154,7 @@ static const struct export_operations kernfs_export_ops = {
.fh_to_dentry = kernfs_fh_to_dentry,
.fh_to_parent = kernfs_fh_to_parent,
.get_parent = kernfs_get_parent_dentry,
+ .flags = EXPORT_OP_FILE_HANDLE,
};
/**
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index eacafe46e3b6..170c5729e7f2 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -417,6 +417,7 @@ static struct svc_export *svc_export_lookup(struct svc_export *);
static int check_export(struct path *path, int *flags, unsigned char *uuid)
{
struct inode *inode = d_inode(path->dentry);
+ const struct export_operations *nop;
/*
* We currently export only dirs, regular files, and (for v4
@@ -449,11 +450,16 @@ static int check_export(struct path *path, int *flags, unsigned char *uuid)
return -EINVAL;
}
- if (!exportfs_can_decode_fh(inode->i_sb->s_export_op)) {
+ if (!exportfs_can_decode_fh(nop)) {
dprintk("exp_export: export of invalid fs type.\n");
return -EINVAL;
}
+ if (nop && nop->flags & EXPORT_OP_FILE_HANDLE) {
+ dprintk("exp_export: filesystem only supports non-exportable file handles.\n");
+ return -EINVAL;
+ }
+
if (is_idmapped_mnt(path->mnt)) {
dprintk("exp_export: export of idmapped mounts not yet supported.\n");
return -EINVAL;
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 9aa7493b1e10..d1646c0789e1 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -83,10 +83,15 @@ void ovl_revert_creds(const struct cred *old_cred)
*/
int ovl_can_decode_fh(struct super_block *sb)
{
+ const struct export_operations *nop = sb->s_export_op;
+
if (!capable(CAP_DAC_READ_SEARCH))
return 0;
- if (!exportfs_can_decode_fh(sb->s_export_op))
+ if (!exportfs_can_decode_fh(nop))
+ return 0;
+
+ if (nop && nop->flags & EXPORT_OP_FILE_HANDLE)
return 0;
return sb->s_export_op->encode_fh ? -1 : FILEID_INO32_GEN;
diff --git a/fs/pidfs.c b/fs/pidfs.c
index dde3e4e90ea9..9d98b5461dc7 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -570,6 +570,7 @@ static const struct export_operations pidfs_export_operations = {
.fh_to_dentry = pidfs_fh_to_dentry,
.open = pidfs_export_open,
.permission = pidfs_export_permission,
+ .flags = EXPORT_OP_FILE_HANDLE,
};
static int pidfs_init_inode(struct inode *inode, void *data)
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index a087606ace19..98f7cb17abee 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -280,6 +280,7 @@ struct export_operations {
*/
#define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
#define EXPORT_OP_ASYNC_LOCK (0x40) /* fs can do async lock request */
+#define EXPORT_OP_FILE_HANDLE (0x80) /* fs only supports file handles, no proper export */
unsigned long flags;
};
> On top of that, we may also want to reject nfsd export of any fs
> with custom ->open() or ->permission() export ops, on the grounds
> that nfsd does not call these ops?
>
> Regarding the two other kernel users of exportfs, namely,
> overlayfs and fanotify -
>
> For overlayfs, I think that in ovl_can_decode_fh() we can safely
> opt-out of SB_NOUSER and SB_KERNMOUNT filesystems,
> to not allow nfs exporting of overlayfs over those lower fs.
>
> For fanotify, there is already a check in fanotify_events_supported()
> to disallow sb/mount marks on SB_NOUSER and a comment that
> questions the value of allowing them for SB_KERNMOUNT.
> So for pidfs there is no risk wrt fanotify and it does not look like pidfs
> is going to generate any fanotify events anyway.
On Sun, Dec 1, 2024 at 9:43 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Sat, Nov 30, 2024 at 01:22:05PM +0100, Amir Goldstein wrote:
> > On Fri, Nov 29, 2024 at 2:39 PM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > Hey,
> > >
> > > Now that we have the preliminaries to lookup struct pid based on its
> > > inode number alone we can implement file handle support.
> > >
> > > This is based on custom export operation methods which allows pidfs to
> > > implement permission checking and opening of pidfs file handles cleanly
> > > without hacking around in the core file handle code too much.
> > >
> > > This is lightly tested.
> >
> > With my comments addressed as you pushed to vfs-6.14.pidfs branch
> > in your tree, you may add to the patches posted:
> >
> > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> >
> > HOWEVER,
> > IMO there is still one thing that has to be addressed before merge -
> > We must make sure that nfsd cannot export pidfs.
> >
> > In principal, SB_NOUSER filesystems should not be accessible to
> > userspace paths, so exportfs should not be able to configure nfsd
> > export of pidfs, but maybe this limitation can be worked around by
> > using magic link paths?
>
> I don't see how. I might be missing details.
AFAIK, nfsd gets the paths to export from userspace via
svc_export_parse() => kern_path(buf, 0, &exp.ex_path)
afterwards check_export() validates exp.ex_path and I see that regular
files can be exported.
I suppose that a pidfs file can have a magic link path no?
The question is whether this magic link path could be passed to nfsd
via the exportfs UAPI.
>
> > I think it may be worth explicitly disallowing nfsd export of SB_NOUSER
> > filesystems and we could also consider blocking SB_KERNMOUNT,
> > but may there are users exporting ramfs?
>
> No need to restrict it if it's safe, I guess.
>
> > Jeff has mentioned that he thinks we are blocking export of cgroupfs
> > by nfsd, but I really don't see where that is being enforced.
> > The requirement for FS_REQUIRES_DEV in check_export() is weak
> > because user can overrule it with manual fsid argument to exportfs.
> > So maybe we disallow nfsd export of kernfs and backport to stable kernels
> > to be on the safe side?
>
> File handles and nfs export have become two distinct things and there
> filesystems based on kernfs, and pidfs want to support file handles
> without support nfs export.
>
> So I think instead of having nfs check what filesystems may be exported
> we should let the filesystems indicate that they cannot be exported and
> make nfs honour that.
Yes, I agree, but...
>
> So something like the untested sketch:
>
> diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
> index 1358c21837f1..a5c75cb1c812 100644
> --- a/fs/kernfs/mount.c
> +++ b/fs/kernfs/mount.c
> @@ -154,6 +154,7 @@ static const struct export_operations kernfs_export_ops = {
> .fh_to_dentry = kernfs_fh_to_dentry,
> .fh_to_parent = kernfs_fh_to_parent,
> .get_parent = kernfs_get_parent_dentry,
> + .flags = EXPORT_OP_FILE_HANDLE,
> };
>
> /**
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index eacafe46e3b6..170c5729e7f2 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -417,6 +417,7 @@ static struct svc_export *svc_export_lookup(struct svc_export *);
> static int check_export(struct path *path, int *flags, unsigned char *uuid)
> {
> struct inode *inode = d_inode(path->dentry);
> + const struct export_operations *nop;
>
> /*
> * We currently export only dirs, regular files, and (for v4
> @@ -449,11 +450,16 @@ static int check_export(struct path *path, int *flags, unsigned char *uuid)
> return -EINVAL;
> }
>
> - if (!exportfs_can_decode_fh(inode->i_sb->s_export_op)) {
> + if (!exportfs_can_decode_fh(nop)) {
> dprintk("exp_export: export of invalid fs type.\n");
> return -EINVAL;
> }
>
> + if (nop && nop->flags & EXPORT_OP_FILE_HANDLE) {
> + dprintk("exp_export: filesystem only supports non-exportable file handles.\n");
> + return -EINVAL;
> + }
> +
> if (is_idmapped_mnt(path->mnt)) {
> dprintk("exp_export: export of idmapped mounts not yet supported.\n");
> return -EINVAL;
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 9aa7493b1e10..d1646c0789e1 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -83,10 +83,15 @@ void ovl_revert_creds(const struct cred *old_cred)
> */
> int ovl_can_decode_fh(struct super_block *sb)
> {
> + const struct export_operations *nop = sb->s_export_op;
> +
> if (!capable(CAP_DAC_READ_SEARCH))
> return 0;
>
> - if (!exportfs_can_decode_fh(sb->s_export_op))
> + if (!exportfs_can_decode_fh(nop))
> + return 0;
> +
> + if (nop && nop->flags & EXPORT_OP_FILE_HANDLE)
> return 0;
>
> return sb->s_export_op->encode_fh ? -1 : FILEID_INO32_GEN;
> diff --git a/fs/pidfs.c b/fs/pidfs.c
> index dde3e4e90ea9..9d98b5461dc7 100644
> --- a/fs/pidfs.c
> +++ b/fs/pidfs.c
> @@ -570,6 +570,7 @@ static const struct export_operations pidfs_export_operations = {
> .fh_to_dentry = pidfs_fh_to_dentry,
> .open = pidfs_export_open,
> .permission = pidfs_export_permission,
> + .flags = EXPORT_OP_FILE_HANDLE,
> };
>
> static int pidfs_init_inode(struct inode *inode, void *data)
> diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> index a087606ace19..98f7cb17abee 100644
> --- a/include/linux/exportfs.h
> +++ b/include/linux/exportfs.h
> @@ -280,6 +280,7 @@ struct export_operations {
> */
> #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
> #define EXPORT_OP_ASYNC_LOCK (0x40) /* fs can do async lock request */
> +#define EXPORT_OP_FILE_HANDLE (0x80) /* fs only supports file handles, no proper export */
This is a bad name IMO, since pidfs clearly does support file handles
and supports the open_by_handle_at() UAPI.
I was going to suggest EXPORT_OP_NO_NFS_EXPORT, but it also
sounds silly, so maybe:
#define EXPORT_OP_LOCAL_FILE_HANDLE (0x80) /* fs only
supports local file handles, no nfs export */
With that you may add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks,
Amir.
On Sun, Dec 01, 2024 at 01:09:17PM +0100, Amir Goldstein wrote:
> On Sun, Dec 1, 2024 at 9:43 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Sat, Nov 30, 2024 at 01:22:05PM +0100, Amir Goldstein wrote:
> > > On Fri, Nov 29, 2024 at 2:39 PM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > Hey,
> > > >
> > > > Now that we have the preliminaries to lookup struct pid based on its
> > > > inode number alone we can implement file handle support.
> > > >
> > > > This is based on custom export operation methods which allows pidfs to
> > > > implement permission checking and opening of pidfs file handles cleanly
> > > > without hacking around in the core file handle code too much.
> > > >
> > > > This is lightly tested.
> > >
> > > With my comments addressed as you pushed to vfs-6.14.pidfs branch
> > > in your tree, you may add to the patches posted:
> > >
> > > Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> > >
> > > HOWEVER,
> > > IMO there is still one thing that has to be addressed before merge -
> > > We must make sure that nfsd cannot export pidfs.
> > >
> > > In principal, SB_NOUSER filesystems should not be accessible to
> > > userspace paths, so exportfs should not be able to configure nfsd
> > > export of pidfs, but maybe this limitation can be worked around by
> > > using magic link paths?
> >
> > I don't see how. I might be missing details.
>
> AFAIK, nfsd gets the paths to export from userspace via
> svc_export_parse() => kern_path(buf, 0, &exp.ex_path)
> afterwards check_export() validates exp.ex_path and I see that regular
> files can be exported.
> I suppose that a pidfs file can have a magic link path no?
> The question is whether this magic link path could be passed to nfsd
> via the exportfs UAPI.
Ah, ok. I see what you mean. You're thinking about specifying
/proc/<pid>/fd/<pidfd> in /etc/exports. Yes, that would work.
>
> >
> > > I think it may be worth explicitly disallowing nfsd export of SB_NOUSER
> > > filesystems and we could also consider blocking SB_KERNMOUNT,
> > > but may there are users exporting ramfs?
> >
> > No need to restrict it if it's safe, I guess.
> >
> > > Jeff has mentioned that he thinks we are blocking export of cgroupfs
> > > by nfsd, but I really don't see where that is being enforced.
> > > The requirement for FS_REQUIRES_DEV in check_export() is weak
> > > because user can overrule it with manual fsid argument to exportfs.
> > > So maybe we disallow nfsd export of kernfs and backport to stable kernels
> > > to be on the safe side?
> >
> > File handles and nfs export have become two distinct things and there
> > filesystems based on kernfs, and pidfs want to support file handles
> > without support nfs export.
> >
> > So I think instead of having nfs check what filesystems may be exported
> > we should let the filesystems indicate that they cannot be exported and
> > make nfs honour that.
>
> Yes, I agree, but...
>
> >
> > So something like the untested sketch:
> >
> > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
> > index 1358c21837f1..a5c75cb1c812 100644
> > --- a/fs/kernfs/mount.c
> > +++ b/fs/kernfs/mount.c
> > @@ -154,6 +154,7 @@ static const struct export_operations kernfs_export_ops = {
> > .fh_to_dentry = kernfs_fh_to_dentry,
> > .fh_to_parent = kernfs_fh_to_parent,
> > .get_parent = kernfs_get_parent_dentry,
> > + .flags = EXPORT_OP_FILE_HANDLE,
> > };
> >
> > /**
> > diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> > index eacafe46e3b6..170c5729e7f2 100644
> > --- a/fs/nfsd/export.c
> > +++ b/fs/nfsd/export.c
> > @@ -417,6 +417,7 @@ static struct svc_export *svc_export_lookup(struct svc_export *);
> > static int check_export(struct path *path, int *flags, unsigned char *uuid)
> > {
> > struct inode *inode = d_inode(path->dentry);
> > + const struct export_operations *nop;
> >
> > /*
> > * We currently export only dirs, regular files, and (for v4
> > @@ -449,11 +450,16 @@ static int check_export(struct path *path, int *flags, unsigned char *uuid)
> > return -EINVAL;
> > }
> >
> > - if (!exportfs_can_decode_fh(inode->i_sb->s_export_op)) {
> > + if (!exportfs_can_decode_fh(nop)) {
> > dprintk("exp_export: export of invalid fs type.\n");
> > return -EINVAL;
> > }
> >
> > + if (nop && nop->flags & EXPORT_OP_FILE_HANDLE) {
> > + dprintk("exp_export: filesystem only supports non-exportable file handles.\n");
> > + return -EINVAL;
> > + }
> > +
> > if (is_idmapped_mnt(path->mnt)) {
> > dprintk("exp_export: export of idmapped mounts not yet supported.\n");
> > return -EINVAL;
> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > index 9aa7493b1e10..d1646c0789e1 100644
> > --- a/fs/overlayfs/util.c
> > +++ b/fs/overlayfs/util.c
> > @@ -83,10 +83,15 @@ void ovl_revert_creds(const struct cred *old_cred)
> > */
> > int ovl_can_decode_fh(struct super_block *sb)
> > {
> > + const struct export_operations *nop = sb->s_export_op;
> > +
> > if (!capable(CAP_DAC_READ_SEARCH))
> > return 0;
> >
> > - if (!exportfs_can_decode_fh(sb->s_export_op))
> > + if (!exportfs_can_decode_fh(nop))
> > + return 0;
> > +
> > + if (nop && nop->flags & EXPORT_OP_FILE_HANDLE)
> > return 0;
> >
> > return sb->s_export_op->encode_fh ? -1 : FILEID_INO32_GEN;
> > diff --git a/fs/pidfs.c b/fs/pidfs.c
> > index dde3e4e90ea9..9d98b5461dc7 100644
> > --- a/fs/pidfs.c
> > +++ b/fs/pidfs.c
> > @@ -570,6 +570,7 @@ static const struct export_operations pidfs_export_operations = {
> > .fh_to_dentry = pidfs_fh_to_dentry,
> > .open = pidfs_export_open,
> > .permission = pidfs_export_permission,
> > + .flags = EXPORT_OP_FILE_HANDLE,
> > };
> >
> > static int pidfs_init_inode(struct inode *inode, void *data)
> > diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> > index a087606ace19..98f7cb17abee 100644
> > --- a/include/linux/exportfs.h
> > +++ b/include/linux/exportfs.h
> > @@ -280,6 +280,7 @@ struct export_operations {
> > */
> > #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
> > #define EXPORT_OP_ASYNC_LOCK (0x40) /* fs can do async lock request */
> > +#define EXPORT_OP_FILE_HANDLE (0x80) /* fs only supports file handles, no proper export */
>
> This is a bad name IMO, since pidfs clearly does support file handles
> and supports the open_by_handle_at() UAPI.
>
> I was going to suggest EXPORT_OP_NO_NFS_EXPORT, but it also
> sounds silly, so maybe:
>
> #define EXPORT_OP_LOCAL_FILE_HANDLE (0x80) /* fs only
> supports local file handles, no nfs export */
Thank you. I'll send a reply with a proper patch to this thread in a second.
> With that you may add:
>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
>
> Thanks,
> Amir.
© 2016 - 2026 Red Hat, Inc.