From nobody Fri Oct 3 07:40:41 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 197E62E5B21 for ; Wed, 3 Sep 2025 12:02:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756900947; cv=none; b=VOnmyscvX4qjdxGuOxCPQnYmkdAa/zXSdFgaEepqLAr0Ge9OTetiBLsT9lW9n4i/q1tUjM/tQ6f9ReqQr6MR6mLR/mJae30QSa5Dm9ASsdZi+85a55eJSgNDNK/Almeh21ScdS+BxyMS4PSlsZizgClZ0kGJRChi4QNHkO5nD5Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756900947; c=relaxed/simple; bh=+ne/FesykYnH6KE6q6bVytOMwgMV+cNuIB7nYPhn+xg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ad8MHbHlrJsGySiap50a/jBgpQ01LAJdZRT/SvsWHliL1F46oIN7FEvpgYsDLqac4Na7mwfgOZrlGgrzo/hfefF5g/7eIIQ9sjcSz5E99iTNlK/KpyETPPwGyWkvNKJUZqNTrlbwQHAdNOuSvcujNYXcy4DCVaYtzPpIB7MvSvE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=T6qNVjlc; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="T6qNVjlc" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1756900943; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bGLI6Loye0jPC9Hy/grfL+r07q9/QalrfK6eBAUzugs=; b=T6qNVjlcKETGsM14gRtKqeb6/I4UZs5IebnLxTVlTqwxTonvpgrEu9ZWjpnegwolJEWo9u QWlybKMCMmW+SooWKyMj6ZPndbo5ncryxdsQkxh+rkBXOVRwlPzBzWKd6DOH/gTQ2oT6IF e37CNdKkpSZXd0b/DCjpsppC5585QLk= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-554-sXGEiuHvNoKSngvuXyO3Bw-1; Wed, 03 Sep 2025 08:02:19 -0400 X-MC-Unique: sXGEiuHvNoKSngvuXyO3Bw-1 X-Mimecast-MFC-AGG-ID: sXGEiuHvNoKSngvuXyO3Bw_1756900937 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id B4B251956089; Wed, 3 Sep 2025 12:02:10 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.42.28.6]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 9706B30001B1; Wed, 3 Sep 2025 12:02:05 +0000 (UTC) From: David Howells To: Al Viro , Christian Brauner Cc: David Howells , Marc Dionne , Jeffrey Altman , Steve French , linux-afs@lists.infradead.org, openafs-devel@openafs.org, linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Etienne Champetier , Chet Ramey , Cheyenne Wills , Christian Brauner , Mimi Zohar , linux-integrity@vger.kernel.org Subject: [PATCH 1/2] vfs: Allow filesystems with foreign owner IDs to override UID checks Date: Wed, 3 Sep 2025 13:01:53 +0100 Message-ID: <20250903120157.899182-2-dhowells@redhat.com> In-Reply-To: <20250903120157.899182-1-dhowells@redhat.com> References: <20250903120157.899182-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Content-Type: text/plain; charset="utf-8" A number of ownership checks made by the VFS make a number of assumptions: (1) that it is meaningful to compare inode->i_uid to a second ->i_uid or to current_fsuid(), (2) that current_fsuid() represents the subject of the action, (3) that the number in ->i_uid belong to the system's ID space and (4) that the IDs can be represented by 32-bit integers. Network filesystems, however, may violate all four of these assumptions. Indeed, a network filesystem may not even have an actual concept of a UNIX integer UID (cifs without POSIX extensions, for example). Plug-in block filesystems (e.g. USB drives) may also violate this assumption. In particular, AFS implements its own ACL security model with its own per-cell user ID space with 64-bit IDs for some server variants. The subject is represented by a token in a key, not current_fsuid(). The AFS user IDs and the system user IDs for a cell may be numerically equivalent, but that's matter of administrative policy and should perhaps be noted in the cell definition or by mount option. A subsequent patch will address AFS. To help fix this, three functions are defined to perform UID comparison within the VFS: (1) vfs_inode_is_owned_by_me(). This defaults to comparing i_uid to current_fsuid(), with appropriate namespace mapping, assuming that the fsuid identifies the subject of the action. The filesystem may override it by implementing an inode op: int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode); This should return 0 if owned, 1 if not or an error if there's some sort of lookup failure. It may use a means of identifying the subject of the action other than fsuid, for example by using an authentication token stored in a key. (2) vfs_inodes_have_same_owner(). This defaults to comparing the i_uids of two different inodes with appropriate namespace mapping. The filesystem may override it by implementing another inode op: int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode1, struct inode *inode2); Again, this should return 0 if matching, 1 if not or an error if there's some sort of lookup failure. (3) vfs_inode_and_dir_have_same_owner(). This is similar to (2), but assumes that the second inode is the parent directory to the first and takes a nameidata struct instead of a second inode pointer. Fix a number of places within the VFS where such UID checks are made that should be deferring interpretation to the filesystem. (*) chown_ok() (*) chgrp_ok() Call vfs_inode_is_owned_by_me(). Possibly these need to defer all their checks to the network filesystem as the interpretation of the new UID/GID depends on the netfs too, but the ->setattr() method gets a chance to deal with that. (*) coredump_file() Call vfs_is_owned_by_me() to check that the file created is owned by the caller - but the check that's there might be sufficient. (*) inode_owner_or_capable() Call vfs_is_owned_by_me(). I'm not sure whether the namespace mapping makes sense in such a case, but it probably could be used. (*) vfs_setlease() Call vfs_is_owned_by_me(). Actually, it should query if leasing is permitted. Also, setting locks could perhaps do with a permission call to the filesystem driver as AFS, for example, has a lock permission bit in the ACL, but since the AFS server checks that when the RPC call is made, it's probably unnecessary. (*) acl_permission_check() (*) posix_acl_permission() Unchanged. These functions are only used by generic_permission() which is overridden if ->permission() is supplied, and when evaluating a POSIX ACL, it should arguably be checking the UID anyway. AFS, for example, implements its own ACLs and evaluates them in ->permission() and on the server. (*) may_follow_link() Call vfs_inode_and_dir_have_same_owner() and vfs_is_owned_by_me() on the the link and its parent dir. (*) may_create_in_sticky() Call vfs_is_owned_by_me() and also vfs_inode_and_dir_have_same_owner() both. [?] Should this return ok immediately if the open call we're in created the file being checked. (*) __check_sticky() Call vfs_is_owned_by_me() on both the dir and the inode, but for AFS vfs_is_owned_by_me() on a directory doesn't work, so call vfs_inodes_have_same_owner() instead to check the directory (as is done in may_create_in_sticky()). (*) may_dedupe_file() Call vfs_is_owned_by_me(). (*) IMA policy ops. Unchanged for now. I'm not sure what the best way to deal with this is - if, indeed, it needs any changes. Note that wrapping stuff up into vfs_inode_is_owned_by_me() isn't necessarily the most efficient as it means we may end up doing the uid idmapping an extra time - though this is only done in three places, all to do with world-writable sticky dir checks. Signed-off-by: David Howells cc: Etienne Champetier cc: Marc Dionne cc: Jeffrey Altman cc: Chet Ramey cc: Cheyenne Wills cc: Alexander Viro cc: Christian Brauner cc: Steve French cc: Mimi Zohar cc: linux-afs@lists.infradead.org cc: openafs-devel@openafs.org cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-integrity@vger.kernel.org Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=3Dbash-5.3-= rc1#n733 --- Documentation/filesystems/vfs.rst | 21 ++++ fs/attr.c | 58 ++++++----- fs/coredump.c | 2 +- fs/inode.c | 11 +- fs/internal.h | 1 + fs/locks.c | 7 +- fs/namei.c | 161 ++++++++++++++++++++++++------ fs/remap_range.c | 20 ++-- include/linux/fs.h | 6 +- 9 files changed, 216 insertions(+), 71 deletions(-) diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/= vfs.rst index 486a91633474..85e92d5b369a 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -518,6 +518,9 @@ As of kernel 2.6.22, the following members are defined: struct dentry *dentry, struct file_kattr *fa); int (*fileattr_get)(struct dentry *dentry, struct file_kattr *fa); struct offset_ctx *(*get_offset_ctx)(struct inode *inode); + int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode); + int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode, + struct dentry *dentry); }; =20 Again, all methods are called without any locks being held, unless @@ -702,6 +705,24 @@ otherwise noted. filesystem must define this operation to use simple_offset_dir_operations. =20 +``is_owned_by_me`` + called to determine if the file can be considered to be 'owned' by + the owner of the process or if the process has a token that grants + it ownership privileges. If unset, the default is to compare i_uid + to current_fsuid() - but this may give incorrect results for some + network or plug-in block filesystems. For example, AFS determines + ownership entirely according to an obtained token and i_uid may not + even be from the same ID space as current_uid(). + +``have_same_owner`` + called to determine if an inode has the same owner as its immediate + parent on the path walked. If unset, the default is to simply + compare the i_uid of both. For example, AFS compares the owner IDs + of both - but these are a 64-bit values on some variants that might + not fit into a kuid_t and cifs has GUIDs that cannot be compared to + kuid_t. + + The Address Space Object =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 diff --git a/fs/attr.c b/fs/attr.c index 5425c1dbbff9..ec6c66743526 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -16,6 +16,7 @@ #include #include #include +#include "internal.h" =20 /** * setattr_should_drop_sgid - determine whether the setgid bit needs to be @@ -91,19 +92,21 @@ EXPORT_SYMBOL(setattr_should_drop_suidgid); * permissions. On non-idmapped mounts or if permission checking is to be * performed on the raw inode simply pass @nop_mnt_idmap. */ -static bool chown_ok(struct mnt_idmap *idmap, - const struct inode *inode, vfsuid_t ia_vfsuid) +static int chown_ok(struct mnt_idmap *idmap, + struct inode *inode, vfsuid_t ia_vfsuid) { vfsuid_t vfsuid =3D i_uid_into_vfsuid(idmap, inode); - if (vfsuid_eq_kuid(vfsuid, current_fsuid()) && - vfsuid_eq(ia_vfsuid, vfsuid)) - return true; + int ret; + + ret =3D vfs_inode_is_owned_by_me(idmap, inode); + if (ret <=3D 0) + return ret; if (capable_wrt_inode_uidgid(idmap, inode, CAP_CHOWN)) - return true; + return 0; if (!vfsuid_valid(vfsuid) && ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN)) - return true; - return false; + return 0; + return -EPERM; } =20 /** @@ -118,23 +121,27 @@ static bool chown_ok(struct mnt_idmap *idmap, * permissions. On non-idmapped mounts or if permission checking is to be * performed on the raw inode simply pass @nop_mnt_idmap. */ -static bool chgrp_ok(struct mnt_idmap *idmap, - const struct inode *inode, vfsgid_t ia_vfsgid) +static int chgrp_ok(struct mnt_idmap *idmap, + struct inode *inode, vfsgid_t ia_vfsgid) { vfsgid_t vfsgid =3D i_gid_into_vfsgid(idmap, inode); - vfsuid_t vfsuid =3D i_uid_into_vfsuid(idmap, inode); - if (vfsuid_eq_kuid(vfsuid, current_fsuid())) { + int ret; + + ret =3D vfs_inode_is_owned_by_me(idmap, inode); + if (ret < 0) + return ret; + if (ret =3D=3D 0) { if (vfsgid_eq(ia_vfsgid, vfsgid)) - return true; + return 0; if (vfsgid_in_group_p(ia_vfsgid)) - return true; + return 0; } if (capable_wrt_inode_uidgid(idmap, inode, CAP_CHOWN)) - return true; + return 0; if (!vfsgid_valid(vfsgid) && ns_capable(inode->i_sb->s_user_ns, CAP_CHOWN)) - return true; - return false; + return 0; + return -EPERM; } =20 /** @@ -163,6 +170,7 @@ int setattr_prepare(struct mnt_idmap *idmap, struct den= try *dentry, { struct inode *inode =3D d_inode(dentry); unsigned int ia_valid =3D attr->ia_valid; + int ret; =20 /* * First check size constraints. These can't be overriden using @@ -179,14 +187,18 @@ int setattr_prepare(struct mnt_idmap *idmap, struct d= entry *dentry, goto kill_priv; =20 /* Make sure a caller can chown. */ - if ((ia_valid & ATTR_UID) && - !chown_ok(idmap, inode, attr->ia_vfsuid)) - return -EPERM; + if (ia_valid & ATTR_UID) { + ret =3D chown_ok(idmap, inode, attr->ia_vfsuid); + if (ret < 0) + return ret; + } =20 /* Make sure caller can chgrp. */ - if ((ia_valid & ATTR_GID) && - !chgrp_ok(idmap, inode, attr->ia_vfsgid)) - return -EPERM; + if (ia_valid & ATTR_GID) { + ret =3D chgrp_ok(idmap, inode, attr->ia_vfsgid); + if (ret < 0) + return ret; + } =20 /* Make sure a caller can chmod. */ if (ia_valid & ATTR_MODE) { diff --git a/fs/coredump.c b/fs/coredump.c index 5dce257c67fc..dfe5d6c17ff2 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -951,7 +951,7 @@ static bool coredump_file(struct core_name *cn, struct = coredump_params *cprm, * filesystem. */ idmap =3D file_mnt_idmap(file); - if (!vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), current_fsuid())) { + if (vfs_inode_is_owned_by_me(idmap, inode) !=3D 0) { coredump_report_failure("Core dump to %s aborted: cannot preserve file o= wner", cn->corename); return false; } diff --git a/fs/inode.c b/fs/inode.c index 01ebdc40021e..25f0a009a511 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2580,16 +2580,19 @@ EXPORT_SYMBOL(inode_init_owner); * On non-idmapped mounts or if permission checking is to be performed on = the * raw inode simply pass @nop_mnt_idmap. */ -bool inode_owner_or_capable(struct mnt_idmap *idmap, - const struct inode *inode) +bool inode_owner_or_capable(struct mnt_idmap *idmap, struct inode *inode) { vfsuid_t vfsuid; struct user_namespace *ns; + int ret; =20 - vfsuid =3D i_uid_into_vfsuid(idmap, inode); - if (vfsuid_eq_kuid(vfsuid, current_fsuid())) + ret =3D vfs_inode_is_owned_by_me(idmap, inode); + if (ret =3D=3D 0) return true; + if (ret < 0) + return false; =20 + vfsuid =3D i_uid_into_vfsuid(idmap, inode); ns =3D current_user_ns(); if (vfsuid_has_mapping(ns, vfsuid) && ns_capable(ns, CAP_FOWNER)) return true; diff --git a/fs/internal.h b/fs/internal.h index 38e8aab27bbd..bd22c629614b 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -52,6 +52,7 @@ extern int finish_clean_context(struct fs_context *fc); /* * namei.c */ +int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode); extern int filename_lookup(int dfd, struct filename *name, unsigned flags, struct path *path, struct path *root); int do_rmdir(int dfd, struct filename *name); diff --git a/fs/locks.c b/fs/locks.c index 559f02aa4172..c27de96de573 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -68,6 +68,7 @@ #include =20 #include +#include "internal.h" =20 static struct file_lock *file_lock(struct file_lock_core *flc) { @@ -2013,10 +2014,12 @@ int vfs_setlease(struct file *filp, int arg, struct file_lease **lease, void *= *priv) { struct inode *inode =3D file_inode(filp); - vfsuid_t vfsuid =3D i_uid_into_vfsuid(file_mnt_idmap(filp), inode); int error; =20 - if ((!vfsuid_eq_kuid(vfsuid, current_fsuid())) && !capable(CAP_LEASE)) + error =3D vfs_inode_is_owned_by_me(file_mnt_idmap(filp), inode); + if (error < 0) + return error; + if (error !=3D 0 && !capable(CAP_LEASE)) return -EACCES; if (!S_ISREG(inode->i_mode)) return -EINVAL; diff --git a/fs/namei.c b/fs/namei.c index cd43ff89fbaa..cd3c083f9e81 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -53,8 +53,8 @@ * The new code replaces the old recursive symlink resolution with * an iterative one (in case of non-nested symlink chains). It does * this with calls to _follow_link(). - * As a side effect, dir_namei(), _namei() and follow_link() are now=20 - * replaced with a single function lookup_dentry() that can handle all=20 + * As a side effect, dir_namei(), _namei() and follow_link() are now + * replaced with a single function lookup_dentry() that can handle all * the special cases of the former code. * * With the new dcache, the pathname is stored at each inode, at least as @@ -1149,6 +1149,72 @@ fs_initcall(init_fs_namei_sysctls); =20 #endif /* CONFIG_SYSCTL */ =20 +/* + * Determine if an inode is owned by the process (allowing for fsuid overr= ide), + * returning 0 if so, 1 if not and a negative error code if there was a pr= oblem + * making the determination. + */ +int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode) +{ + if (unlikely(inode->i_op->is_owned_by_me)) + return inode->i_op->is_owned_by_me(idmap, inode); + if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), current_fsuid())) + return 0; + return 1; /* Not same. */ +} + +/* + * Determine if an inode has the same owner as its parent, returning 0 if = so, 1 + * if not and a negative error code if there was a problem making the + * determination. + */ +static int vfs_inode_and_dir_have_same_owner(struct mnt_idmap *idmap, stru= ct inode *inode, + const struct nameidata *nd) +{ + if (unlikely(inode->i_op->have_same_owner)) { + struct dentry *parent; + struct inode *dir; + int ret; + + if (inode !=3D nd->inode) { + dir =3D nd->inode; + ret =3D inode->i_op->have_same_owner(idmap, inode, dir); + } else if (nd->flags & LOOKUP_RCU) { + parent =3D READ_ONCE(nd->path.dentry); + dir =3D READ_ONCE(parent->d_inode); + if (!dir) + return -ECHILD; + ret =3D inode->i_op->have_same_owner(idmap, inode, dir); + } else { + parent =3D dget_parent(nd->path.dentry); + dir =3D parent->d_inode; + ret =3D inode->i_op->have_same_owner(idmap, inode, dir); + dput(parent); + } + return ret; + } + + if (vfsuid_valid(nd->dir_vfsuid) && + vfsuid_eq(i_uid_into_vfsuid(idmap, inode), nd->dir_vfsuid)) + return 0; + return 1; /* Not same. */ +} + +/* + * Determine if two inodes have the same owner, returning 0 if so, 1 if no= t and + * a negative error code if there was a problem making the determination. + */ +static int vfs_inodes_have_same_owner(struct mnt_idmap *idmap, struct inod= e *inode, + struct inode *dir) +{ + if (unlikely(inode->i_op->have_same_owner)) + return inode->i_op->have_same_owner(idmap, inode, dir); + if (vfsuid_eq(i_uid_into_vfsuid(idmap, inode), + i_uid_into_vfsuid(idmap, dir))) + return 0; + return 1; /* Not same. */ +} + /** * may_follow_link - Check symlink following for unsafe situations * @nd: nameidata pathwalk data @@ -1165,27 +1231,28 @@ fs_initcall(init_fs_namei_sysctls); * * Returns 0 if following the symlink is allowed, -ve on error. */ -static inline int may_follow_link(struct nameidata *nd, const struct inode= *inode) +static inline int may_follow_link(struct nameidata *nd, struct inode *inod= e) { struct mnt_idmap *idmap; - vfsuid_t vfsuid; + int ret; =20 if (!sysctl_protected_symlinks) return 0; =20 - idmap =3D mnt_idmap(nd->path.mnt); - vfsuid =3D i_uid_into_vfsuid(idmap, inode); - /* Allowed if owner and follower match. */ - if (vfsuid_eq_kuid(vfsuid, current_fsuid())) - return 0; - /* Allowed if parent directory not sticky and world-writable. */ if ((nd->dir_mode & (S_ISVTX|S_IWOTH)) !=3D (S_ISVTX|S_IWOTH)) return 0; =20 + idmap =3D mnt_idmap(nd->path.mnt); + /* Allowed if owner and follower match. */ + ret =3D vfs_inode_is_owned_by_me(idmap, inode); + if (ret <=3D 0) + return ret; + /* Allowed if parent directory and link owner match. */ - if (vfsuid_valid(nd->dir_vfsuid) && vfsuid_eq(nd->dir_vfsuid, vfsuid)) - return 0; + ret =3D vfs_inode_and_dir_have_same_owner(idmap, inode, nd); + if (ret <=3D 0) + return ret; =20 if (nd->flags & LOOKUP_RCU) return -ECHILD; @@ -1283,12 +1350,12 @@ int may_linkat(struct mnt_idmap *idmap, const struc= t path *link) * @inode: the inode of the file to open * * Block an O_CREAT open of a FIFO (or a regular file) when: - * - sysctl_protected_fifos (or sysctl_protected_regular) is enabled - * - the file already exists - * - we are in a sticky directory - * - we don't own the file + * - sysctl_protected_fifos (or sysctl_protected_regular) is enabled, + * - the file already exists, + * - we are in a sticky directory, + * - the directory is world writable, + * - we don't own the file and * - the owner of the directory doesn't own the file - * - the directory is world writable * If the sysctl_protected_fifos (or sysctl_protected_regular) is set to 2 * the directory doesn't have to be world writable: being group writable w= ill * be enough. @@ -1299,13 +1366,45 @@ int may_linkat(struct mnt_idmap *idmap, const struc= t path *link) * On non-idmapped mounts or if permission checking is to be performed on = the * raw inode simply pass @nop_mnt_idmap. * + * For a filesystem (e.g. a network filesystem) that has a separate ID spa= ce + * and has foreign IDs (maybe even non-integer IDs), i_uid cannot be compa= red + * to current_fsuid() and may not be directly comparable to another i_uid. + * Instead, the filesystem is asked to perform the comparisons. With netw= ork + * filesystems, there also exists the possibility of doing anonymous + * operations and having anonymously-owned objects. + * + * We have the following scenarios: + * + * USER DIR FILE FILE ALLOWED + * OWNER OWNER STATE + * =3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D =3D= =3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D + * A A - New Yes + * A A A Exists Yes + * A A C Exists No + * A B - New Yes + * A B A Exists Yes, FO=3D=3DU + * A B B Exists Yes, FO=3D=3DDO + * A B C Exists No + * A anon[1] - New Yes + * A anon[1] A Exists Yes + * A anon[1] C Exists No + * anon A - New Yes + * anon A A Exists Yes, FO=3D=3DDO + * anon anon[1] - New Yes + * anon anon[1] - Exists No + * anon A A Exists Yes, FO=3D=3DDO + * anon A C Exists No + * anon A anon Exists No + * + * [1] Can anonymously-owned dirs be sticky? + * * Returns 0 if the open is allowed, -ve on error. */ static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata = *nd, - struct inode *const inode) + struct inode *inode) { umode_t dir_mode =3D nd->dir_mode; - vfsuid_t dir_vfsuid =3D nd->dir_vfsuid, i_vfsuid; + int ret; =20 if (likely(!(dir_mode & S_ISVTX))) return 0; @@ -1316,13 +1415,13 @@ static int may_create_in_sticky(struct mnt_idmap *i= dmap, struct nameidata *nd, if (S_ISFIFO(inode->i_mode) && !sysctl_protected_fifos) return 0; =20 - i_vfsuid =3D i_uid_into_vfsuid(idmap, inode); - - if (vfsuid_eq(i_vfsuid, dir_vfsuid)) - return 0; + ret =3D vfs_inode_and_dir_have_same_owner(idmap, inode, nd); + if (ret <=3D 0) + return ret; =20 - if (vfsuid_eq_kuid(i_vfsuid, current_fsuid())) - return 0; + ret =3D vfs_inode_is_owned_by_me(idmap, inode); + if (ret <=3D 0) + return ret; =20 if (likely(dir_mode & 0002)) { audit_log_path_denied(AUDIT_ANOM_CREAT, "sticky_create"); @@ -3134,12 +3233,14 @@ EXPORT_SYMBOL(user_path_at); int __check_sticky(struct mnt_idmap *idmap, struct inode *dir, struct inode *inode) { - kuid_t fsuid =3D current_fsuid(); + int ret; =20 - if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), fsuid)) - return 0; - if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, dir), fsuid)) - return 0; + ret =3D vfs_inode_is_owned_by_me(idmap, inode); + if (ret <=3D 0) + return ret; + ret =3D vfs_inodes_have_same_owner(idmap, inode, dir); + if (ret <=3D 0) + return ret; return !capable_wrt_inode_uidgid(idmap, inode, CAP_FOWNER); } EXPORT_SYMBOL(__check_sticky); diff --git a/fs/remap_range.c b/fs/remap_range.c index 26afbbbfb10c..9eee93c27001 100644 --- a/fs/remap_range.c +++ b/fs/remap_range.c @@ -413,20 +413,22 @@ loff_t vfs_clone_file_range(struct file *file_in, lof= f_t pos_in, EXPORT_SYMBOL(vfs_clone_file_range); =20 /* Check whether we are allowed to dedupe the destination file */ -static bool may_dedupe_file(struct file *file) +static int may_dedupe_file(struct file *file) { struct mnt_idmap *idmap =3D file_mnt_idmap(file); struct inode *inode =3D file_inode(file); + int ret; =20 if (capable(CAP_SYS_ADMIN)) - return true; + return 0; if (file->f_mode & FMODE_WRITE) - return true; - if (vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode), current_fsuid())) - return true; + return 0; + ret =3D vfs_inode_is_owned_by_me(idmap, inode); + if (ret <=3D 0) + return ret; if (!inode_permission(idmap, inode, MAY_WRITE)) - return true; - return false; + return 0; + return -EPERM; } =20 loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos, @@ -459,8 +461,8 @@ loff_t vfs_dedupe_file_range_one(struct file *src_file,= loff_t src_pos, if (ret) return ret; =20 - ret =3D -EPERM; - if (!may_dedupe_file(dst_file)) + ret =3D may_dedupe_file(dst_file); + if (ret < 0) goto out_drop_write; =20 ret =3D -EXDEV; diff --git a/include/linux/fs.h b/include/linux/fs.h index d7ab4f96d705..1e975be1bdc4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1986,8 +1986,7 @@ static inline bool sb_start_intwrite_trylock(struct s= uper_block *sb) return __sb_start_write_trylock(sb, SB_FREEZE_FS); } =20 -bool inode_owner_or_capable(struct mnt_idmap *idmap, - const struct inode *inode); +bool inode_owner_or_capable(struct mnt_idmap *idmap, struct inode *inode); =20 /* * VFS helper functions.. @@ -2262,6 +2261,9 @@ struct inode_operations { struct dentry *dentry, struct file_kattr *fa); int (*fileattr_get)(struct dentry *dentry, struct file_kattr *fa); struct offset_ctx *(*get_offset_ctx)(struct inode *inode); + int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode); + int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode1, + struct inode *inode2); } ____cacheline_aligned; =20 /* Did the driver provide valid mmap hook configuration? */ From nobody Fri Oct 3 07:40:41 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A65E2F7449 for ; Wed, 3 Sep 2025 12:02:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756900950; cv=none; b=nRQCDRqV+3vdrL0PUaLnCjvF6sYqgRf/GEnaaBdhX9qlC+yvbT3stnz7AvzwksOc0TQkc/CVil5Ob/WaZBBrFDCPbRC4wYH8iJZ9vY7mOhMulVUSCH8nEKaaEZbyx7MTd25mv6vimed42M03a/cN/12C6BEBinFAZvYmhqx9Ux4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756900950; c=relaxed/simple; bh=KoNPUn7kI8MGjzCwP8b/gzQgxcS+YbXibeTIk0hv5pU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=RP3jBCVVikixV1742Ft/ugJ9vVlOiyEeLFxOTgZifOLrlVfGkjcpM8WLyyFrqKsBMW7/NoJ93sSodrDUkLBg62YOZi/eK6tDlKEQK6IATnNXXg+Ne6efT1X5mH3REoyZuSTnw3C7AegqghoHEISEYxWZ7ZJnwH1O0C4e/XKdoq4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Il0S+m8o; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Il0S+m8o" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1756900947; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XqEopCUVV6AIxWgJeMrxsHn0rkjOl/HkbJLHlKDiYUQ=; b=Il0S+m8oZYDJcOG6BNguA61bAY/rA4NnfQ5mY7qktf1uHVDcIZV25N5EIeCmCVm6DkcM9J taY9rGeQBdVrsKFbcKKCUEvvyg3KXfkN6yPgdJAHaXF2yLH6ai9Zwqsmb2WjZaiJjE+Q4T urSUS/E+mNMTimuYkbS7npbhEX0k69o= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-641-7OHjWuu0Mmy7MlEaHpbjCA-1; Wed, 03 Sep 2025 08:02:23 -0400 X-MC-Unique: 7OHjWuu0Mmy7MlEaHpbjCA-1 X-Mimecast-MFC-AGG-ID: 7OHjWuu0Mmy7MlEaHpbjCA_1756900941 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 7F36C1955D61; Wed, 3 Sep 2025 12:02:16 +0000 (UTC) Received: from warthog.procyon.org.com (unknown [10.42.28.6]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 2FE8819560B8; Wed, 3 Sep 2025 12:02:11 +0000 (UTC) From: David Howells To: Al Viro , Christian Brauner Cc: David Howells , Marc Dionne , Jeffrey Altman , Steve French , linux-afs@lists.infradead.org, openafs-devel@openafs.org, linux-cifs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Etienne Champetier , Chet Ramey , Cheyenne Wills , Christian Brauner Subject: [PATCH 2/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir Date: Wed, 3 Sep 2025 13:01:54 +0100 Message-ID: <20250903120157.899182-3-dhowells@redhat.com> In-Reply-To: <20250903120157.899182-1-dhowells@redhat.com> References: <20250903120157.899182-1-dhowells@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" Since version 1.11 (January 1992) Bash has a work around in redir_open() that causes open(O_CREAT) of a file to be retried without O_CREAT if open() fails with an EACCES error if bash was built with AFS workarounds configured: #if defined (AFS) if ((fd < 0) && (errno =3D=3D EACCES)) { fd =3D open (filename, flags & ~O_CREAT, mode); errno =3D EACCES; /* restore errno */ } #endif /* AFS */ The ~O_CREAT fallback logic was introduced to workaround a bug[1] in the IBM AFS 3.1 cache manager and server which can return EACCES in preference to EEXIST if the requested file exists but the caller is neither granted explicit PRSFS_READ permission nor is the file owner and is granted PRSFS_INSERT permission on the directory. IBM AFS 3.2 altered the cache manager permission checks but failed to correct the permission checks in the AFS server. As of this writing, all IBM AFS derived servers continue to return EACCES in preference to EEXIST when these conditions are met. Bug reports have been filed with all implementations. As an unintended side effect, the Bash fallback logic also undermines the Linux kernel protections against O_CREAT opening FIFOs and regular files not owned by the user in world writeable sticky directories - unless the owner is the same as that of the directory - as was added in commit 30aba6656f61e ("namei: allow restricted O_CREAT of FIFOs and regular files"). As a result the Bash fallback logic masks an incompatibility between the ownership checks performed by may_create_in_sticky() and network filesystems such as AFS where the uid namespace is disjoint from the uid namespace of the local system. However, the bash work around is going to be removed[2]. Fix this in the kernel by using a preceding patch that allows the user ID comparisons to be overridden by: (1) Implement the ->is_owned_by_me() inode op for kafs to determine if the caller owns the file by checking to see if the server indicated the ADMINISTER bit was set in the access rights returned by the FS.FetchStatus and suchlike instead of checking the i_uid to current_fsuid(). Unfortunately, this check doesn't work for directories, but none of the ops should require that. Note that anonymous accesses to AFS will never see the ADMINISTER bit being set and so will not be perceived as owning an anonymously-owned file. (2) Implement the ->have_same_owner() inode op, for kafs to compare the AFS owner IDs retrieved by FS.FetchStatus (which are 64-bit integers with AuriStor's YFS server and, as such, won't fit in a kuid_t). Note that whilst an anonymously-owned file will match an anonymously-owned parent directory, an anonymously-owned directory cannot have the sticky bit set. This can be tested by creating a sticky directory (the user must have a token to do this) and creating a file in it. Then strace bash doing "echo foo >>file" and look at whether bash does a single, successful O_CREAT open on the file or whether that one fails and then bash does one without O_CREAT that succeeds. Fixes: 30aba6656f61 ("namei: allow restricted O_CREAT of FIFOs and regular = files") Reported-by: Etienne Champetier Signed-off-by: David Howells cc: Marc Dionne cc: Jeffrey Altman cc: Chet Ramey cc: Cheyenne Wills cc: Alexander Viro cc: Christian Brauner cc: Steve French cc: linux-afs@lists.infradead.org cc: openafs-devel@openafs.org cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ= [1] Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=3Dbash-5.3-= rc1#n733 [2] --- fs/afs/dir.c | 2 ++ fs/afs/file.c | 2 ++ fs/afs/internal.h | 3 +++ fs/afs/security.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 53 insertions(+) diff --git a/fs/afs/dir.c b/fs/afs/dir.c index bfb69e066672..644782a416d7 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -65,6 +65,8 @@ const struct inode_operations afs_dir_inode_operations = =3D { .permission =3D afs_permission, .getattr =3D afs_getattr, .setattr =3D afs_setattr, + .is_owned_by_me =3D afs_is_owned_by_me, + .have_same_owner =3D afs_have_same_owner, }; =20 const struct address_space_operations afs_dir_aops =3D { diff --git a/fs/afs/file.c b/fs/afs/file.c index f66a92294284..399a40c45d0a 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -47,6 +47,8 @@ const struct inode_operations afs_file_inode_operations = =3D { .getattr =3D afs_getattr, .setattr =3D afs_setattr, .permission =3D afs_permission, + .is_owned_by_me =3D afs_is_owned_by_me, + .have_same_owner =3D afs_have_same_owner, }; =20 const struct address_space_operations afs_file_aops =3D { diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 1124ea4000cb..8c2ca00ac237 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -1515,6 +1515,9 @@ extern struct key *afs_request_key(struct afs_cell *); extern struct key *afs_request_key_rcu(struct afs_cell *); extern int afs_check_permit(struct afs_vnode *, struct key *, afs_access_t= *); extern int afs_permission(struct mnt_idmap *, struct inode *, int); +int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode); +int afs_have_same_owner(struct mnt_idmap *idmap, struct inode *inode1, + struct inode *inode2); extern void __exit afs_clean_up_permit_cache(void); =20 /* diff --git a/fs/afs/security.c b/fs/afs/security.c index 6a7744c9e2a2..19b11c7cb1ff 100644 --- a/fs/afs/security.c +++ b/fs/afs/security.c @@ -477,6 +477,52 @@ int afs_permission(struct mnt_idmap *idmap, struct ino= de *inode, return ret; } =20 +/* + * Determine if an inode is owned by 'me' - whatever that means for the + * filesystem. In the case of AFS, this means that the file is owned by t= he + * AFS user represented by the Rx Security Class token held in a key. Ret= urns + * 0 if owned by me, 1 if not; can also return an error. + */ +int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode) +{ + struct afs_vnode *vnode =3D AFS_FS_I(inode); + afs_access_t access; + struct key *key; + int ret; + + if (S_ISDIR(inode->i_mode)) + return 1; /* The ADMIN right check doesn't work for directories. */ + + key =3D afs_request_key(vnode->volume->cell); + if (IS_ERR(key)) + return PTR_ERR(key); + + /* Get the access rights for the key on this file. */ + ret =3D afs_check_permit(vnode, key, &access); + if (ret < 0) + goto error; + + /* We get the ADMINISTER bit if we own the file. */ + ret =3D (access & AFS_ACE_ADMINISTER) ? 0 : 1; +error: + key_put(key); + return ret; +} + +/* + * Determine if a file has the same owner as its parent - whatever that me= ans + * for the filesystem. In the case of AFS, this means comparing their AFS + * UIDs. Returns 0 if same, 1 if not same; can also return an error. + */ +int afs_have_same_owner(struct mnt_idmap *idmap, struct inode *inode1, + struct inode *inode2) +{ + const struct afs_vnode *vnode1 =3D AFS_FS_I(inode1); + const struct afs_vnode *vnode2 =3D AFS_FS_I(inode2); + + return vnode1->status.owner !=3D vnode2->status.owner; +} + void __exit afs_clean_up_permit_cache(void) { int i;