Add upgrade restrictions to openat2(). Extend struct open_how to allow
setting transitive restrictions on using file descriptors to open other
files. A use case for this feature is to block services or containers
from re-opening/upgrading an O_PATH file descriptor through e.g.
/proc/<pid>/fd/<nr> or OPENAT2_EMPTY_PATH (if upstreamed) as O_WRONLY.

The implementation idea is this: magic paths like /proc/<pid>/fd/<nr>
(currently the only one of its sort AFAIK) go through nd_jump_link() to
hard set current->nameidata. To include information about the fd
yielding the magic link, we add a new struct jump_how as a parameter.
This struct may include restrictions or other metadata attached to the
magic link jump other than the struct path to jump to. So far it has
only one unsigned int field: allowed_upgrades. This is a flag int that
(for now) may be either READ_UPGRADABLE, WRITE_UPGRADABLE, or
DENY_UPGRADES.

The idea is that you can restrict what kind of open flags may be used
to open files in any way using this fd as a starting point
(transitively). The check is enforced in may_open_upgrade(), which is
just the old may_open() with an extra test. To keep this state attached
to the fds, we add a field f_allowed_upgrades to struct file. Then
in do_open(), after success, we compute:

	file->f_allowed_upgrades =
		op->allowed_upgrades & nd->allowed_upgrades;

where op is the struct open_flags that is built from open_how in
build_open_flags(), and nd->allowed_upgrades is set during path
traversal either in path_init() or nd_jump_link().

The implementation and the idea are a bit rough; it is the first bit of
less trivial work I have done on the kernel, hence the RFC status. I did
create some self tests already which this patch passes, and nothing
seems to break on a fresh vng kernel. But obviously there may be MANY
things I am overlooking.

The original idea for this feature comes from the UAPI group kernel
feature idea list [1].
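
For readers following along, the mask computation and the extra check
described above can be modelled in user space. This is only a sketch, not
code from the patch: the bit values of the three flags, the ALL_UPGRADES
helper, and the function names combine_upgrades() and upgrade_permitted()
are assumptions made for illustration, since the cover letter does not
show the real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit values; the patch's actual definitions may differ. */
#define DENY_UPGRADES    0x0u
#define READ_UPGRADABLE  0x1u
#define WRITE_UPGRADABLE 0x2u
/* Hypothetical helper, not named in the cover letter. */
#define ALL_UPGRADES     (READ_UPGRADABLE | WRITE_UPGRADABLE)

/*
 * Models: file->f_allowed_upgrades =
 *                 op->allowed_upgrades & nd->allowed_upgrades;
 * The result can only ever lose bits, never gain them.
 */
static unsigned int combine_upgrades(unsigned int op_allowed,
                                     unsigned int nd_allowed)
{
	/* Reject bits outside the modelled flag set. */
	assert((op_allowed & ~ALL_UPGRADES) == 0);
	assert((nd_allowed & ~ALL_UPGRADES) == 0);
	return op_allowed & nd_allowed;
}

/*
 * Models the "extra test" added on top of may_open(): an open that
 * wants read or write access is refused unless the mask inherited
 * from the starting fd still carries the matching bit.
 */
static bool upgrade_permitted(unsigned int allowed,
                              bool want_read, bool want_write)
{
	if (want_read && !(allowed & READ_UPGRADABLE))
		return false;
	if (want_write && !(allowed & WRITE_UPGRADABLE))
		return false;
	return true;
}
```

Under these assumptions, an fd opened with only READ_UPGRADABLE and then
used as the starting point of a lookup can be re-opened for reading but
not as O_WRONLY, which is the use case the cover letter names.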

[1] https://github.com/uapi-group/kernel-features?tab=readme-ov-file#upgrade-masks-in-openat2

Jori Koolstra (1):
  vfs: transitive upgrade restrictions for fds

 fs/file_table.c                  |  2 ++
 fs/internal.h                    |  1 +
 fs/namei.c                       | 38 ++++++++++++++++++++++++++++----
 fs/open.c                        |  9 ++++++++
 fs/proc/base.c                   | 24 ++++++++++++++------
 fs/proc/fd.c                     |  6 ++++-
 fs/proc/internal.h               |  4 +++-
 include/linux/fcntl.h            |  6 ++++-
 include/linux/fs.h               |  1 +
 include/linux/namei.h            | 15 ++++++++++++-
 include/uapi/asm-generic/fcntl.h |  4 ++++
 include/uapi/linux/openat2.h     |  1 +
 12 files changed, 96 insertions(+), 15 deletions(-)

-- 
2.53.0

On Mon, 2026-03-23 at 23:00 +0100, Jori Koolstra wrote:
> Add upgrade restrictions to openat2(). Extend struct open_how to allow
> setting transitive restrictions on using file descriptors to open other
> files. A use case for this feature is to block services or containers
> from re-opening/upgrading an O_PATH file descriptor through e.g.
> /proc/<pid>/fd/<nr> or OPENAT2_EMPTY_PATH (if upstreamed) as O_WRONLY.
>
> The implementation idea is this: magic paths like /proc/<pid>/fd/<nr>
> (currently the only one of its sort AFAIK) go through nd_jump_link() to
> hard set current->nameidata. To include information about the fd
> yielding the magic link, we add a new struct jump_how as a parameter.
> This struct may include restrictions or other metadata attached to the
> magic link jump other than the struct path to jump to. So far it has
> only one unsigned int field: allowed_upgrades. This is a flag int that
> (for now) may be either READ_UPGRADABLE, WRITE_UPGRADABLE, or
> DENY_UPGRADES.
>
> The idea is that you can restrict what kind of open flags may be used
> to open files in any way using this fd as a starting point
> (transitively). The check is enforced in may_open_upgrade(), which is
> just the old may_open() with an extra test. To keep this state attached
> to the fds, we add a field f_allowed_upgrades to struct file. Then
> in do_open(), after success, we compute:
>
> 	file->f_allowed_upgrades =
> 		op->allowed_upgrades & nd->allowed_upgrades;
>
> where op is the struct open_flags that is built from open_how in
> build_open_flags(), and nd->allowed_upgrades is set during path
> traversal either in path_init() or nd_jump_link().
>
> The implementation and the idea are a bit rough; it is the first bit of
> less trivial work I have done on the kernel, hence the RFC status. I did
> create some self tests already which this patch passes, and nothing
> seems to break on a fresh vng kernel. But obviously there may be MANY
> things I am overlooking.
>
> The original idea for this feature comes from the UAPI group kernel
> feature idea list [1].
>
> [1] https://github.com/uapi-group/kernel-features?tab=readme-ov-file#upgrade-masks-in-openat2
>
> Jori Koolstra (1):
>   vfs: transitive upgrade restrictions for fds
>
>  fs/file_table.c                  |  2 ++
>  fs/internal.h                    |  1 +
>  fs/namei.c                       | 38 ++++++++++++++++++++++++++++----
>  fs/open.c                        |  9 ++++++++
>  fs/proc/base.c                   | 24 ++++++++++++++------
>  fs/proc/fd.c                     |  6 ++++-
>  fs/proc/internal.h               |  4 +++-
>  include/linux/fcntl.h            |  6 ++++-
>  include/linux/fs.h               |  1 +
>  include/linux/namei.h            | 15 ++++++++++++-
>  include/uapi/asm-generic/fcntl.h |  4 ++++
>  include/uapi/linux/openat2.h     |  1 +
>  12 files changed, 96 insertions(+), 15 deletions(-)

It's an interesting idea, but I could see it being difficult to track
the result of this across a large chain of open fds. If you are going
to do this, then at the very least you should add a mechanism (fcntl()
command?) to query the current f_allowed_upgrades mask, so that this
can be debugged in some fashion.

-- 
Jeff Layton <jlayton@kernel.org>
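
To make the tracking concern concrete, here is a small user-space model
of how the mask narrows along a chain of re-opens, plus the kind of
rendering a query mechanism might expose for debugging. It is a sketch
under stated assumptions: the bit values, the unrestricted starting
mask, and the helpers effective_upgrades() and upgrades_to_string() are
all hypothetical and are not part of the patch or of any existing
fcntl() command.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bit values; the patch's actual definitions may differ. */
#define READ_UPGRADABLE  0x1u
#define WRITE_UPGRADABLE 0x2u

/*
 * Effective mask after a chain of opens, each starting from the fd
 * produced by the previous one. Every step intersects its requested
 * mask with what it inherited, so a bit dropped once stays dropped.
 */
static unsigned int effective_upgrades(const unsigned int *requested,
                                       size_t n)
{
	/* Assumption: the first fd in the chain starts unrestricted. */
	unsigned int mask = READ_UPGRADABLE | WRITE_UPGRADABLE;

	assert(requested != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		mask &= requested[i];
	return mask;
}

/* Human-readable form of a mask, as a debugging aid. */
static const char *upgrades_to_string(unsigned int mask)
{
	switch (mask & (READ_UPGRADABLE | WRITE_UPGRADABLE)) {
	case 0:
		return "deny";
	case READ_UPGRADABLE:
		return "read";
	case WRITE_UPGRADABLE:
		return "write";
	default:
		return "read|write";
	}
}
```

In this model the final mask alone does not say which step of the chain
dropped a bit, which is why being able to query each fd individually
would help when debugging.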