From: Andrey Albershteyn <aalbersh@redhat.com>
Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
extended attributes/flags. The syscalls take parent directory fd and
path to the child together with struct fsxattr.
This is an alternative to the FS_IOC_FSSETXATTR ioctl, with the
difference that the file doesn't need to be open, as we can reference
it with a path instead of an fd. With this we can manipulate inode
extended attributes not only on regular files but also on special
ones. This is not possible with the FS_IOC_FSSETXATTR ioctl because,
for special files, we cannot call ioctl() directly on the filesystem
inode using an fd.
This patch adds two new syscalls which allow userspace to get/set
extended inode attributes on special files by using a parent directory
fd and a path, like the other *at() syscalls.
CC: linux-api@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
CC: linux-xfs@vger.kernel.org
Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
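For illustration only, here is a rough sketch of how userspace might
call the two syscalls on a kernel that has this patch applied. The
numbers are the ones this patch assigns in the generic and x86-64
tables; they are not in any released header, and on any other kernel
they may be unassigned or mean something else. The wrapper names and
set_projid_nofollow() are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>       /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <linux/fs.h>    /* struct fsxattr */
#include <unistd.h>      /* syscall() */

/*
 * ASSUMPTION: syscall numbers as assigned by this patch (generic and
 * x86-64 tables; alpha uses 577/578). Only valid on a patched kernel.
 */
#define NR_getfsxattrat 467
#define NR_setfsxattrat 468

/* Matches FSXATTR_SIZE_VER0 as defined by this patch. */
static_assert(sizeof(struct fsxattr) == 28, "unexpected struct fsxattr size");

/* Hypothetical wrapper: no libc provides one. */
long getfsxattrat(int dfd, const char *path, struct fsxattr *fsx,
		  unsigned int at_flags)
{
	return syscall(NR_getfsxattrat, dfd, path, fsx, sizeof(*fsx), at_flags);
}

/* Hypothetical wrapper: no libc provides one. */
long setfsxattrat(int dfd, const char *path, const struct fsxattr *fsx,
		  unsigned int at_flags)
{
	return syscall(NR_setfsxattrat, dfd, path, fsx, sizeof(*fsx), at_flags);
}

/*
 * Read-modify-write of the project ID on a path that is never opened,
 * so it can also name a device node, FIFO or socket.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_projid_nofollow(int dfd, const char *path, unsigned int projid)
{
	struct fsxattr fsx;

	if (getfsxattrat(dfd, path, &fsx, AT_SYMLINK_NOFOLLOW) < 0)
		return -1;
	fsx.fsx_projid = projid;
	if (setfsxattrat(dfd, path, &fsx, AT_SYMLINK_NOFOLLOW) < 0)
		return -1;
	return 0;
}
```

A caller would use something like set_projid_nofollow(AT_FDCWD,
"dir/fifo", 42) and check errno for EOPNOTSUPP on filesystems without
fileattr support.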
arch/alpha/kernel/syscalls/syscall.tbl | 2 +
arch/arm/tools/syscall.tbl | 2 +
arch/arm64/tools/syscall_32.tbl | 2 +
arch/m68k/kernel/syscalls/syscall.tbl | 2 +
arch/microblaze/kernel/syscalls/syscall.tbl | 2 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 2 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 2 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +
arch/parisc/kernel/syscalls/syscall.tbl | 2 +
arch/powerpc/kernel/syscalls/syscall.tbl | 2 +
arch/s390/kernel/syscalls/syscall.tbl | 2 +
arch/sh/kernel/syscalls/syscall.tbl | 2 +
arch/sparc/kernel/syscalls/syscall.tbl | 2 +
arch/x86/entry/syscalls/syscall_32.tbl | 2 +
arch/x86/entry/syscalls/syscall_64.tbl | 2 +
arch/xtensa/kernel/syscalls/syscall.tbl | 2 +
fs/inode.c | 130 ++++++++++++++++++++++++++++
include/linux/syscalls.h | 6 ++
include/uapi/asm-generic/unistd.h | 8 +-
include/uapi/linux/fs.h | 3 +
20 files changed, 178 insertions(+), 1 deletion(-)
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index c59d53d6d3f3490f976ca179ddfe02e69265ae4d..4b9e687494c16b60c6fd6ca1dc4d6564706a7e25 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -506,3 +506,5 @@
574 common getxattrat sys_getxattrat
575 common listxattrat sys_listxattrat
576 common removexattrat sys_removexattrat
+577 common getfsxattrat sys_getfsxattrat
+578 common setfsxattrat sys_setfsxattrat
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 49eeb2ad8dbd8e074c6240417693f23fb328afa8..66466257f3c2debb3e2299f0b608c6740c98cab2 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -481,3 +481,5 @@
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
+467 common getfsxattrat sys_getfsxattrat
+468 common setfsxattrat sys_setfsxattrat
diff --git a/arch/arm64/tools/syscall_32.tbl b/arch/arm64/tools/syscall_32.tbl
index 69a829912a05eb8a3e21ed701d1030e31c0148bc..9c516118b154811d8d11d5696f32817430320dbf 100644
--- a/arch/arm64/tools/syscall_32.tbl
+++ b/arch/arm64/tools/syscall_32.tbl
@@ -478,3 +478,5 @@
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
+467 common getfsxattrat sys_getfsxattrat
+468 common setfsxattrat sys_setfsxattrat
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index f5ed71f1910d09769c845c2d062d99ee0449437c..159476387f394a92ee5e29db89b118c630372db2 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -466,3 +466,5 @@
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
+467 common getfsxattrat sys_getfsxattrat
+468 common setfsxattrat sys_setfsxattrat
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 680f568b77f2cbefc3eacb2517f276041f229b1e..a6d59ee740b58cacf823702003cf9bad17c0d3b7 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -472,3 +472,5 @@
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
+467 common getfsxattrat sys_getfsxattrat
+468 common setfsxattrat sys_setfsxattrat
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 0b9b7e25b69ad592642f8533bee9ccfe95ce9626..cfe38fcebe1a0279e11751378d3e71c5ec6b6569 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -405,3 +405,5 @@
464 n32 getxattrat sys_getxattrat
465 n32 listxattrat sys_listxattrat
466 n32 removexattrat sys_removexattrat
+467 n32 getfsxattrat sys_getfsxattrat
+468 n32 setfsxattrat sys_setfsxattrat
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index c844cd5cda620b2809a397cdd6f4315ab6a1bfe2..29a0c5974d1aa2f01e33edc0252d75fb97abe230 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -381,3 +381,5 @@
464 n64 getxattrat sys_getxattrat
465 n64 listxattrat sys_listxattrat
466 n64 removexattrat sys_removexattrat
+467 n64 getfsxattrat sys_getfsxattrat
+468 n64 setfsxattrat sys_setfsxattrat
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 349b8aad1159f404103bd2057a1e64e9bf309f18..6c00436807c57c492ba957fcd59af1202231cf80 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -454,3 +454,5 @@
464 o32 getxattrat sys_getxattrat
465 o32 listxattrat sys_listxattrat
466 o32 removexattrat sys_removexattrat
+467 o32 getfsxattrat sys_getfsxattrat
+468 o32 setfsxattrat sys_setfsxattrat
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index d9fc94c869657fcfbd7aca1d5f5abc9fae2fb9d8..b3578fac43d6b65167787fcc97d2d09f5a9828e7 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -465,3 +465,5 @@
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
+467 common getfsxattrat sys_getfsxattrat
+468 common setfsxattrat sys_setfsxattrat
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index d8b4ab78bef076bd50d49b87dea5060fd8c1686a..808045d82c9465c3bfa96b15947546efe5851e9a 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -557,3 +557,5 @@
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
+467 common getfsxattrat sys_getfsxattrat
+468 common setfsxattrat sys_setfsxattrat
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index e9115b4d8b635b846e5c9ad6ce229605323723a5..78dfc2c184d4815baf8a9e61c546c9936d58a47c 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -469,3 +469,5 @@
464 common getxattrat sys_getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat sys_removexattrat
+467 common getfsxattrat sys_getfsxattrat sys_getfsxattrat
+468 common setfsxattrat sys_setfsxattrat sys_setfsxattrat
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index c8cad33bf250ea110de37bd1407f5a43ec5e38f2..d5a5c8339f0ed25ea07c4aba90351d352033c8a0 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -470,3 +470,5 @@
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
+467 common getfsxattrat sys_getfsxattrat
+468 common setfsxattrat sys_setfsxattrat
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 727f99d333b304b3db0711953a3d91ece18a28eb..817dcd8603bcbffc47f3f59aa3b74b16486453d0 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -512,3 +512,5 @@
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
+467 common getfsxattrat sys_getfsxattrat
+468 common setfsxattrat sys_setfsxattrat
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 4d0fb2fba7e208ae9455459afe11e277321d9f74..b4842c027c5d00c0236b2ba89387c5e2267447bd 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -472,3 +472,5 @@
464 i386 getxattrat sys_getxattrat
465 i386 listxattrat sys_listxattrat
466 i386 removexattrat sys_removexattrat
+467 i386 getfsxattrat sys_getfsxattrat
+468 i386 setfsxattrat sys_setfsxattrat
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5eb708bff1c791debd6cfc5322583b2ae53f6437..b6f0a7236aaee624cf9b484239a1068085a8ffe1 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -390,6 +390,8 @@
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
+467 common getfsxattrat sys_getfsxattrat
+468 common setfsxattrat sys_setfsxattrat
#
# Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 37effc1b134eea061f2c350c1d68b4436b65a4dd..425d56be337d1de22f205ac503df61ff86224fee 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -437,3 +437,5 @@
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
+467 common getfsxattrat sys_getfsxattrat
+468 common setfsxattrat sys_setfsxattrat
diff --git a/fs/inode.c b/fs/inode.c
index 6b4c77268fc0ecace4ac78a9ca777fbffc277f4a..811debf379ab299f287ed90863277cfda27db30c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -23,6 +23,9 @@
#include <linux/rw_hint.h>
#include <linux/seq_file.h>
#include <linux/debugfs.h>
+#include <linux/syscalls.h>
+#include <linux/fileattr.h>
+#include <linux/namei.h>
#include <trace/events/writeback.h>
#define CREATE_TRACE_POINTS
#include <trace/events/timestamp.h>
@@ -2953,3 +2956,130 @@ umode_t mode_strip_sgid(struct mnt_idmap *idmap,
return mode & ~S_ISGID;
}
EXPORT_SYMBOL(mode_strip_sgid);
+
+SYSCALL_DEFINE5(getfsxattrat, int, dfd, const char __user *, filename,
+ struct fsxattr __user *, ufsx, size_t, usize,
+ unsigned int, at_flags)
+{
+ struct fileattr fa = {};
+ struct path filepath;
+ int error;
+ unsigned int lookup_flags = 0;
+ struct filename *name;
+ struct fsxattr fsx = {};
+
+ BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
+ BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
+
+ if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
+ return -EINVAL;
+
+ if (!(at_flags & AT_SYMLINK_NOFOLLOW))
+ lookup_flags |= LOOKUP_FOLLOW;
+
+ if (at_flags & AT_EMPTY_PATH)
+ lookup_flags |= LOOKUP_EMPTY;
+
+ if (usize > PAGE_SIZE)
+ return -E2BIG;
+
+ if (usize < FSXATTR_SIZE_VER0)
+ return -EINVAL;
+
+ name = getname_maybe_null(filename, at_flags);
+ if (!name) {
+ CLASS(fd, f)(dfd);
+
+ if (fd_empty(f))
+ return -EBADF;
+ error = vfs_fileattr_get(file_dentry(fd_file(f)), &fa);
+ } else {
+ error = filename_lookup(dfd, name, lookup_flags, &filepath,
+ NULL);
+ if (error)
+ goto out;
+ error = vfs_fileattr_get(filepath.dentry, &fa);
+ path_put(&filepath);
+ }
+ if (error == -ENOIOCTLCMD)
+ error = -EOPNOTSUPP;
+ if (!error) {
+ fileattr_to_fsxattr(&fa, &fsx);
+ error = copy_struct_to_user(ufsx, usize, &fsx,
+ sizeof(struct fsxattr), NULL);
+ }
+out:
+ putname(name);
+ return error;
+}
+
+SYSCALL_DEFINE5(setfsxattrat, int, dfd, const char __user *, filename,
+ struct fsxattr __user *, ufsx, size_t, usize,
+ unsigned int, at_flags)
+{
+ struct fileattr fa;
+ struct path filepath;
+ int error;
+ unsigned int lookup_flags = 0;
+ struct filename *name;
+ struct mnt_idmap *idmap;
+ struct dentry *dentry;
+ struct vfsmount *mnt;
+ struct fsxattr fsx = {};
+
+ BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
+ BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
+
+ if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
+ return -EINVAL;
+
+ if (!(at_flags & AT_SYMLINK_NOFOLLOW))
+ lookup_flags |= LOOKUP_FOLLOW;
+
+ if (at_flags & AT_EMPTY_PATH)
+ lookup_flags |= LOOKUP_EMPTY;
+
+ if (usize > PAGE_SIZE)
+ return -E2BIG;
+
+ if (usize < FSXATTR_SIZE_VER0)
+ return -EINVAL;
+
+ error = copy_struct_from_user(&fsx, sizeof(struct fsxattr), ufsx, usize);
+ if (error)
+ return error;
+
+ fsxattr_to_fileattr(&fsx, &fa);
+
+ name = getname_maybe_null(filename, at_flags);
+ if (!name) {
+ CLASS(fd, f)(dfd);
+
+ if (fd_empty(f))
+ return -EBADF;
+
+ idmap = file_mnt_idmap(fd_file(f));
+ dentry = file_dentry(fd_file(f));
+ mnt = fd_file(f)->f_path.mnt;
+ } else {
+ error = filename_lookup(dfd, name, lookup_flags, &filepath,
+ NULL);
+ if (error)
+ return error;
+
+ idmap = mnt_idmap(filepath.mnt);
+ dentry = filepath.dentry;
+ mnt = filepath.mnt;
+ }
+
+ error = mnt_want_write(mnt);
+ if (!error) {
+ error = vfs_fileattr_set(idmap, dentry, &fa);
+ if (error == -ENOIOCTLCMD)
+ error = -EOPNOTSUPP;
+ mnt_drop_write(mnt);
+ }
+
+ path_put(&filepath);
+ return error;
+}
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index c6333204d45130eb022f6db460eea34a1f6e91db..e242ea39b3e63a8008bc777764b616fd63bd40c4 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -371,6 +371,12 @@ asmlinkage long sys_removexattrat(int dfd, const char __user *path,
asmlinkage long sys_lremovexattr(const char __user *path,
const char __user *name);
asmlinkage long sys_fremovexattr(int fd, const char __user *name);
+asmlinkage long sys_getfsxattrat(int dfd, const char __user *filename,
+ struct fsxattr __user *ufsx, size_t usize,
+ unsigned int at_flags);
+asmlinkage long sys_setfsxattrat(int dfd, const char __user *filename,
+ struct fsxattr __user *ufsx, size_t usize,
+ unsigned int at_flags);
asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
asmlinkage long sys_eventfd2(unsigned int count, int flags);
asmlinkage long sys_epoll_create1(int flags);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 88dc393c2bca38c0fa1b3fae579f7cfe4931223c..50be2e1007bc2779120d05c6e9512a689f86779c 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -850,8 +850,14 @@ __SYSCALL(__NR_listxattrat, sys_listxattrat)
#define __NR_removexattrat 466
__SYSCALL(__NR_removexattrat, sys_removexattrat)
+/* fs/inode.c */
+#define __NR_getfsxattrat 467
+__SYSCALL(__NR_getfsxattrat, sys_getfsxattrat)
+#define __NR_setfsxattrat 468
+__SYSCALL(__NR_setfsxattrat, sys_setfsxattrat)
+
#undef __NR_syscalls
-#define __NR_syscalls 467
+#define __NR_syscalls 469
/*
* 32 bit systems traditionally used different
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 7539717707337a8cb22396a869baba3bafa08371..aed753e5d50c97da9b895a187fdaecf0477db74b 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -139,6 +139,9 @@ struct fsxattr {
unsigned char fsx_pad[8];
};
+#define FSXATTR_SIZE_VER0 28
+#define FSXATTR_SIZE_LATEST FSXATTR_SIZE_VER0
+
/*
* Flags for the fsx_xflags field
*/
--
2.47.2
On Fri, Mar 21, 2025 at 08:48:42PM +0100, Andrey Albershteyn wrote:
> From: Andrey Albershteyn <aalbersh@redhat.com>
>
> Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> extended attributes/flags. The syscalls take parent directory fd and
> path to the child together with struct fsxattr.
>
> This is an alternative to the FS_IOC_FSSETXATTR ioctl, with the
> difference that the file doesn't need to be open, as we can reference
> it with a path instead of an fd. With this we can manipulate inode
> extended attributes not only on regular files but also on special
> ones. This is not possible with the FS_IOC_FSSETXATTR ioctl because,
> for special files, we cannot call ioctl() directly on the filesystem
> inode using an fd.
>
> This patch adds two new syscalls which allow userspace to get/set
> extended inode attributes on special files by using a parent directory
> fd and a path, like the other *at() syscalls.
>
> CC: linux-api@vger.kernel.org
> CC: linux-fsdevel@vger.kernel.org
> CC: linux-xfs@vger.kernel.org
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> ---
> arch/alpha/kernel/syscalls/syscall.tbl | 2 +
> arch/arm/tools/syscall.tbl | 2 +
> arch/arm64/tools/syscall_32.tbl | 2 +
> arch/m68k/kernel/syscalls/syscall.tbl | 2 +
> arch/microblaze/kernel/syscalls/syscall.tbl | 2 +
> arch/mips/kernel/syscalls/syscall_n32.tbl | 2 +
> arch/mips/kernel/syscalls/syscall_n64.tbl | 2 +
> arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +
> arch/parisc/kernel/syscalls/syscall.tbl | 2 +
> arch/powerpc/kernel/syscalls/syscall.tbl | 2 +
> arch/s390/kernel/syscalls/syscall.tbl | 2 +
> arch/sh/kernel/syscalls/syscall.tbl | 2 +
> arch/sparc/kernel/syscalls/syscall.tbl | 2 +
> arch/x86/entry/syscalls/syscall_32.tbl | 2 +
> arch/x86/entry/syscalls/syscall_64.tbl | 2 +
> arch/xtensa/kernel/syscalls/syscall.tbl | 2 +
> fs/inode.c | 130 ++++++++++++++++++++++++++++
> include/linux/syscalls.h | 6 ++
> include/uapi/asm-generic/unistd.h | 8 +-
> include/uapi/linux/fs.h | 3 +
> 20 files changed, 178 insertions(+), 1 deletion(-)
>
> diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
> index c59d53d6d3f3490f976ca179ddfe02e69265ae4d..4b9e687494c16b60c6fd6ca1dc4d6564706a7e25 100644
> --- a/arch/alpha/kernel/syscalls/syscall.tbl
> +++ b/arch/alpha/kernel/syscalls/syscall.tbl
> @@ -506,3 +506,5 @@
> 574 common getxattrat sys_getxattrat
> 575 common listxattrat sys_listxattrat
> 576 common removexattrat sys_removexattrat
> +577 common getfsxattrat sys_getfsxattrat
> +578 common setfsxattrat sys_setfsxattrat
> diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
> index 49eeb2ad8dbd8e074c6240417693f23fb328afa8..66466257f3c2debb3e2299f0b608c6740c98cab2 100644
> --- a/arch/arm/tools/syscall.tbl
> +++ b/arch/arm/tools/syscall.tbl
> @@ -481,3 +481,5 @@
> 464 common getxattrat sys_getxattrat
> 465 common listxattrat sys_listxattrat
> 466 common removexattrat sys_removexattrat
> +467 common getfsxattrat sys_getfsxattrat
> +468 common setfsxattrat sys_setfsxattrat
> diff --git a/arch/arm64/tools/syscall_32.tbl b/arch/arm64/tools/syscall_32.tbl
> index 69a829912a05eb8a3e21ed701d1030e31c0148bc..9c516118b154811d8d11d5696f32817430320dbf 100644
> --- a/arch/arm64/tools/syscall_32.tbl
> +++ b/arch/arm64/tools/syscall_32.tbl
> @@ -478,3 +478,5 @@
> 464 common getxattrat sys_getxattrat
> 465 common listxattrat sys_listxattrat
> 466 common removexattrat sys_removexattrat
> +467 common getfsxattrat sys_getfsxattrat
> +468 common setfsxattrat sys_setfsxattrat
> diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
> index f5ed71f1910d09769c845c2d062d99ee0449437c..159476387f394a92ee5e29db89b118c630372db2 100644
> --- a/arch/m68k/kernel/syscalls/syscall.tbl
> +++ b/arch/m68k/kernel/syscalls/syscall.tbl
> @@ -466,3 +466,5 @@
> 464 common getxattrat sys_getxattrat
> 465 common listxattrat sys_listxattrat
> 466 common removexattrat sys_removexattrat
> +467 common getfsxattrat sys_getfsxattrat
> +468 common setfsxattrat sys_setfsxattrat
> diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
> index 680f568b77f2cbefc3eacb2517f276041f229b1e..a6d59ee740b58cacf823702003cf9bad17c0d3b7 100644
> --- a/arch/microblaze/kernel/syscalls/syscall.tbl
> +++ b/arch/microblaze/kernel/syscalls/syscall.tbl
> @@ -472,3 +472,5 @@
> 464 common getxattrat sys_getxattrat
> 465 common listxattrat sys_listxattrat
> 466 common removexattrat sys_removexattrat
> +467 common getfsxattrat sys_getfsxattrat
> +468 common setfsxattrat sys_setfsxattrat
> diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
> index 0b9b7e25b69ad592642f8533bee9ccfe95ce9626..cfe38fcebe1a0279e11751378d3e71c5ec6b6569 100644
> --- a/arch/mips/kernel/syscalls/syscall_n32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
> @@ -405,3 +405,5 @@
> 464 n32 getxattrat sys_getxattrat
> 465 n32 listxattrat sys_listxattrat
> 466 n32 removexattrat sys_removexattrat
> +467 n32 getfsxattrat sys_getfsxattrat
> +468 n32 setfsxattrat sys_setfsxattrat
> diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
> index c844cd5cda620b2809a397cdd6f4315ab6a1bfe2..29a0c5974d1aa2f01e33edc0252d75fb97abe230 100644
> --- a/arch/mips/kernel/syscalls/syscall_n64.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
> @@ -381,3 +381,5 @@
> 464 n64 getxattrat sys_getxattrat
> 465 n64 listxattrat sys_listxattrat
> 466 n64 removexattrat sys_removexattrat
> +467 n64 getfsxattrat sys_getfsxattrat
> +468 n64 setfsxattrat sys_setfsxattrat
> diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
> index 349b8aad1159f404103bd2057a1e64e9bf309f18..6c00436807c57c492ba957fcd59af1202231cf80 100644
> --- a/arch/mips/kernel/syscalls/syscall_o32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
> @@ -454,3 +454,5 @@
> 464 o32 getxattrat sys_getxattrat
> 465 o32 listxattrat sys_listxattrat
> 466 o32 removexattrat sys_removexattrat
> +467 o32 getfsxattrat sys_getfsxattrat
> +468 o32 setfsxattrat sys_setfsxattrat
> diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
> index d9fc94c869657fcfbd7aca1d5f5abc9fae2fb9d8..b3578fac43d6b65167787fcc97d2d09f5a9828e7 100644
> --- a/arch/parisc/kernel/syscalls/syscall.tbl
> +++ b/arch/parisc/kernel/syscalls/syscall.tbl
> @@ -465,3 +465,5 @@
> 464 common getxattrat sys_getxattrat
> 465 common listxattrat sys_listxattrat
> 466 common removexattrat sys_removexattrat
> +467 common getfsxattrat sys_getfsxattrat
> +468 common setfsxattrat sys_setfsxattrat
> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
> index d8b4ab78bef076bd50d49b87dea5060fd8c1686a..808045d82c9465c3bfa96b15947546efe5851e9a 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -557,3 +557,5 @@
> 464 common getxattrat sys_getxattrat
> 465 common listxattrat sys_listxattrat
> 466 common removexattrat sys_removexattrat
> +467 common getfsxattrat sys_getfsxattrat
> +468 common setfsxattrat sys_setfsxattrat
> diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
> index e9115b4d8b635b846e5c9ad6ce229605323723a5..78dfc2c184d4815baf8a9e61c546c9936d58a47c 100644
> --- a/arch/s390/kernel/syscalls/syscall.tbl
> +++ b/arch/s390/kernel/syscalls/syscall.tbl
> @@ -469,3 +469,5 @@
> 464 common getxattrat sys_getxattrat sys_getxattrat
> 465 common listxattrat sys_listxattrat sys_listxattrat
> 466 common removexattrat sys_removexattrat sys_removexattrat
> +467 common getfsxattrat sys_getfsxattrat sys_getfsxattrat
> +468 common setfsxattrat sys_setfsxattrat sys_setfsxattrat
> diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
> index c8cad33bf250ea110de37bd1407f5a43ec5e38f2..d5a5c8339f0ed25ea07c4aba90351d352033c8a0 100644
> --- a/arch/sh/kernel/syscalls/syscall.tbl
> +++ b/arch/sh/kernel/syscalls/syscall.tbl
> @@ -470,3 +470,5 @@
> 464 common getxattrat sys_getxattrat
> 465 common listxattrat sys_listxattrat
> 466 common removexattrat sys_removexattrat
> +467 common getfsxattrat sys_getfsxattrat
> +468 common setfsxattrat sys_setfsxattrat
> diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
> index 727f99d333b304b3db0711953a3d91ece18a28eb..817dcd8603bcbffc47f3f59aa3b74b16486453d0 100644
> --- a/arch/sparc/kernel/syscalls/syscall.tbl
> +++ b/arch/sparc/kernel/syscalls/syscall.tbl
> @@ -512,3 +512,5 @@
> 464 common getxattrat sys_getxattrat
> 465 common listxattrat sys_listxattrat
> 466 common removexattrat sys_removexattrat
> +467 common getfsxattrat sys_getfsxattrat
> +468 common setfsxattrat sys_setfsxattrat
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 4d0fb2fba7e208ae9455459afe11e277321d9f74..b4842c027c5d00c0236b2ba89387c5e2267447bd 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -472,3 +472,5 @@
> 464 i386 getxattrat sys_getxattrat
> 465 i386 listxattrat sys_listxattrat
> 466 i386 removexattrat sys_removexattrat
> +467 i386 getfsxattrat sys_getfsxattrat
> +468 i386 setfsxattrat sys_setfsxattrat
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 5eb708bff1c791debd6cfc5322583b2ae53f6437..b6f0a7236aaee624cf9b484239a1068085a8ffe1 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -390,6 +390,8 @@
> 464 common getxattrat sys_getxattrat
> 465 common listxattrat sys_listxattrat
> 466 common removexattrat sys_removexattrat
> +467 common getfsxattrat sys_getfsxattrat
> +468 common setfsxattrat sys_setfsxattrat
>
> #
> # Due to a historical design error, certain syscalls are numbered differently
> diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
> index 37effc1b134eea061f2c350c1d68b4436b65a4dd..425d56be337d1de22f205ac503df61ff86224fee 100644
> --- a/arch/xtensa/kernel/syscalls/syscall.tbl
> +++ b/arch/xtensa/kernel/syscalls/syscall.tbl
> @@ -437,3 +437,5 @@
> 464 common getxattrat sys_getxattrat
> 465 common listxattrat sys_listxattrat
> 466 common removexattrat sys_removexattrat
> +467 common getfsxattrat sys_getfsxattrat
> +468 common setfsxattrat sys_setfsxattrat
> diff --git a/fs/inode.c b/fs/inode.c
> index 6b4c77268fc0ecace4ac78a9ca777fbffc277f4a..811debf379ab299f287ed90863277cfda27db30c 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
I really dislike the name fsxattr for a lot of reasons, but in any case
this definitely shouldn't go in inode.c. Just add a new fs/file_attr.c
file, move all the relevant helpers from fs/ioctl.c in there, and then
add the system calls. Otherwise it's all just very confusing.
> @@ -23,6 +23,9 @@
> #include <linux/rw_hint.h>
> #include <linux/seq_file.h>
> #include <linux/debugfs.h>
> +#include <linux/syscalls.h>
> +#include <linux/fileattr.h>
> +#include <linux/namei.h>
> #include <trace/events/writeback.h>
> #define CREATE_TRACE_POINTS
> #include <trace/events/timestamp.h>
> @@ -2953,3 +2956,130 @@ umode_t mode_strip_sgid(struct mnt_idmap *idmap,
> return mode & ~S_ISGID;
> }
> EXPORT_SYMBOL(mode_strip_sgid);
> +
> +SYSCALL_DEFINE5(getfsxattrat, int, dfd, const char __user *, filename,
This is really misnamed. It will end up confusing userspace to no end:
getxattr()
getxattrat()
getfsxattrat()
Please name these file_setattr() and file_getattr(). There's also no
need for the *at() suffix. We've never been consistent with that. We
have plenty of system calls that do fd+path without having an *at()
suffix. And here it's especially unneeded because there are no
pre-existing system calls that would even force the use of that suffix.
> + struct fsxattr __user *, ufsx, size_t, usize,
> + unsigned int, at_flags)
> +{
> + struct fileattr fa = {};
> + struct path filepath;
> + int error;
> + unsigned int lookup_flags = 0;
> + struct filename *name;
> + struct fsxattr fsx = {};
> +
> + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> +
> + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> + return -EINVAL;
> +
> + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> + lookup_flags |= LOOKUP_FOLLOW;
> +
> + if (at_flags & AT_EMPTY_PATH)
> + lookup_flags |= LOOKUP_EMPTY;
> +
> + if (usize > PAGE_SIZE)
> + return -E2BIG;
> +
> + if (usize < FSXATTR_SIZE_VER0)
> + return -EINVAL;
> +
> + name = getname_maybe_null(filename, at_flags);
> + if (!name) {
This is broken as it doesn't handle AT_FDCWD correctly. You need:
	name = getname_maybe_null(filename, at_flags);
	if (IS_ERR(name))
		return PTR_ERR(name);
	if (!name && dfd >= 0) {
		CLASS(fd, f)(dfd);
> + CLASS(fd, f)(dfd);
> +
> + if (fd_empty(f))
> + return -EBADF;
I'm pretty sure you can just do a:
	path = fd_file(f)->f_path;
	path_get(&path);
and then the vfs_fileattr_get() call and the path_put() call can be
shared between the two branches. Note that you can also use:
	struct path path __free(path_put) = {};
and then the cleanup infrastructure will handle the path_put() for you.
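If the cleanup pattern is unfamiliar, here is a minimal userspace model
of it. This is a sketch under assumptions, not the kernel's
implementation: struct obj, struct path_model, the model_* helpers and
the model_free_path macro are all invented for illustration, and it
relies on the GCC/Clang cleanup attribute. The point is only that an
initialized-empty path plus a put helper that tolerates the empty case
lets every return path share the same release.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the refcounted objects behind a struct path. */
struct obj {
	int refs;
};

/* Stand-in for struct path. */
struct path_model {
	struct obj *o;
};

static inline void model_path_get(struct path_model *p)
{
	if (p->o)
		p->o->refs++;
}

/* Tolerates an empty path, so it is safe to run on every exit. */
static inline void model_path_put(struct path_model *p)
{
	if (p->o) {
		assert(p->o->refs > 0);
		p->o->refs--;
	}
}

/* Hypothetical analogue of __free(path_put) (GCC/Clang extension). */
#define model_free_path __attribute__((cleanup(model_path_put)))

/*
 * Returns the refcount seen while the path is pinned, or -1 if there
 * is nothing to pin. No explicit put on either exit.
 */
int refs_while_pinned(struct obj *o)
{
	struct path_model path model_free_path = { NULL };

	if (!o)
		return -1;	/* cleanup runs on the empty path: no-op */

	path.o = o;
	model_path_get(&path);
	return o->refs;		/* read first, then cleanup drops our ref */
}
```

Both branches of the syscall could then fill in the same path variable
and fall through to one shared vfs_fileattr_get() call.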
> + error = vfs_fileattr_get(file_dentry(fd_file(f)), &fa);
> + } else {
> + error = filename_lookup(dfd, name, lookup_flags, &filepath,
> + NULL);
> + if (error)
> + goto out;
> + error = vfs_fileattr_get(filepath.dentry, &fa);
> + path_put(&filepath);
> + }
> + if (error == -ENOIOCTLCMD)
> + error = -EOPNOTSUPP;
> + if (!error) {
> + fileattr_to_fsxattr(&fa, &fsx);
> + error = copy_struct_to_user(ufsx, usize, &fsx,
> + sizeof(struct fsxattr), NULL);
> + }
> +out:
> + putname(name);
> + return error;
> +}
> +
> +SYSCALL_DEFINE5(setfsxattrat, int, dfd, const char __user *, filename,
> + struct fsxattr __user *, ufsx, size_t, usize,
> + unsigned int, at_flags)
> +{
> + struct fileattr fa;
> + struct path filepath;
> + int error;
> + unsigned int lookup_flags = 0;
> + struct filename *name;
> + struct mnt_idmap *idmap;
> + struct dentry *dentry;
> + struct vfsmount *mnt;
> + struct fsxattr fsx = {};
> +
> + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> +
> + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> + return -EINVAL;
> +
> + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> + lookup_flags |= LOOKUP_FOLLOW;
> +
> + if (at_flags & AT_EMPTY_PATH)
> + lookup_flags |= LOOKUP_EMPTY;
> +
> + if (usize > PAGE_SIZE)
> + return -E2BIG;
> +
> + if (usize < FSXATTR_SIZE_VER0)
> + return -EINVAL;
> +
> + error = copy_struct_from_user(&fsx, sizeof(struct fsxattr), ufsx, usize);
> + if (error)
> + return error;
> +
> + fsxattr_to_fileattr(&fsx, &fa);
> +
> + name = getname_maybe_null(filename, at_flags);
> + if (!name) {
Same comment as above.
> + CLASS(fd, f)(dfd);
> +
> + if (fd_empty(f))
> + return -EBADF;
> +
> + idmap = file_mnt_idmap(fd_file(f));
> + dentry = file_dentry(fd_file(f));
> + mnt = fd_file(f)->f_path.mnt;
This is a UAF. The reference behind fd_file(f) gets auto cleaned up at
the end of the scope, so by the time you call vfs_fileattr_set() nothing
pins fd_file(f)->f_path.mnt and file_dentry(fd_file(f)) anymore...
In general, same comment about unifying the branches via path_get() as
for the get case above. And just keep the path around; don't store the
mount and dentry separately.
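A toy userspace model of the lifetime problem, in case it helps. This is
not kernel code: struct pinned and the pin()/unpin() helpers are made
up, and a "freed" flag stands in for the actual free so the model itself
stays well defined. It assumes the file is closed (for example by
another thread) after the fd scope has ended.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the dentry/mount pair that an open file keeps alive. */
struct pinned {
	int refs;
	int freed;	/* set when the last reference is dropped */
};

static void pin(struct pinned *p)
{
	p->refs++;
}

static void unpin(struct pinned *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0)
		p->freed = 1;
}

/* Posted pattern: keep a bare pointer past the end of the fd scope. */
int borrowed_is_safe(void)
{
	struct pinned d = { .refs = 1 };	/* reference held by the file */
	struct pinned *borrowed;

	{
		/* models the CLASS(fd, f) scope */
		borrowed = &d;			/* no reference of our own */
	}
	unpin(&d);				/* file goes away after the scope */
	return !borrowed->freed;
}

/* Suggested pattern: take our own reference, like path_get(). */
int owned_is_safe(void)
{
	struct pinned d = { .refs = 1 };	/* reference held by the file */
	struct pinned *own;
	int safe;

	{
		own = &d;
		pin(own);			/* path_get() analogue */
	}
	unpin(&d);				/* file goes away after the scope */
	safe = !own->freed;
	unpin(own);				/* path_put() analogue */
	return safe;
}
```

In the first function the object is already gone when it is used; in
the second our own reference keeps it alive until we drop it.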
> + } else {
> + error = filename_lookup(dfd, name, lookup_flags, &filepath,
> + NULL);
> + if (error)
> + return error;
> +
> + idmap = mnt_idmap(filepath.mnt);
> + dentry = filepath.dentry;
> + mnt = filepath.mnt;
> + }
> +
> + error = mnt_want_write(mnt);
> + if (!error) {
> + error = vfs_fileattr_set(idmap, dentry, &fa);
> + if (error == -ENOIOCTLCMD)
> + error = -EOPNOTSUPP;
> + mnt_drop_write(mnt);
> + }
> +
> + path_put(&filepath);
> + return error;
> +}
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index c6333204d45130eb022f6db460eea34a1f6e91db..e242ea39b3e63a8008bc777764b616fd63bd40c4 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -371,6 +371,12 @@ asmlinkage long sys_removexattrat(int dfd, const char __user *path,
> asmlinkage long sys_lremovexattr(const char __user *path,
> const char __user *name);
> asmlinkage long sys_fremovexattr(int fd, const char __user *name);
> +asmlinkage long sys_getfsxattrat(int dfd, const char __user *filename,
> + struct fsxattr __user *ufsx, size_t usize,
> + unsigned int at_flags);
> +asmlinkage long sys_setfsxattrat(int dfd, const char __user *filename,
> + struct fsxattr __user *ufsx, size_t usize,
> + unsigned int at_flags);
> asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
> asmlinkage long sys_eventfd2(unsigned int count, int flags);
> asmlinkage long sys_epoll_create1(int flags);
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 88dc393c2bca38c0fa1b3fae579f7cfe4931223c..50be2e1007bc2779120d05c6e9512a689f86779c 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -850,8 +850,14 @@ __SYSCALL(__NR_listxattrat, sys_listxattrat)
> #define __NR_removexattrat 466
> __SYSCALL(__NR_removexattrat, sys_removexattrat)
>
> +/* fs/inode.c */
> +#define __NR_getfsxattrat 467
> +__SYSCALL(__NR_getfsxattrat, sys_getfsxattrat)
> +#define __NR_setfsxattrat 468
> +__SYSCALL(__NR_setfsxattrat, sys_setfsxattrat)
> +
> #undef __NR_syscalls
> -#define __NR_syscalls 467
> +#define __NR_syscalls 469
>
> /*
> * 32 bit systems traditionally used different
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 7539717707337a8cb22396a869baba3bafa08371..aed753e5d50c97da9b895a187fdaecf0477db74b 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -139,6 +139,9 @@ struct fsxattr {
> unsigned char fsx_pad[8];
> };
>
> +#define FSXATTR_SIZE_VER0 28
> +#define FSXATTR_SIZE_LATEST FSXATTR_SIZE_VER0
> +
> /*
> * Flags for the fsx_xflags field
> */
>
> --
> 2.47.2
>
On Tue 22-04-25 16:59:02, Christian Brauner wrote:
> On Fri, Mar 21, 2025 at 08:48:42PM +0100, Andrey Albershteyn wrote:
> > From: Andrey Albershteyn <aalbersh@redhat.com>
> >
> > Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> > extended attributes/flags. The syscalls take parent directory fd and
> > path to the child together with struct fsxattr.
> >
> > This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
> > that file don't need to be open as we can reference it with a path
> > instead of fd. By having this we can manipulated inode extended
> > attributes not only on regular files but also on special ones. This
> > is not possible with FS_IOC_FSSETXATTR ioctl as with special files
> > we can not call ioctl() directly on the filesystem inode using fd.
> >
> > This patch adds two new syscalls which allows userspace to get/set
> > extended inode attributes on special files by using parent directory
> > and a path - *at() like syscall.
> >
> > CC: linux-api@vger.kernel.org
> > CC: linux-fsdevel@vger.kernel.org
> > CC: linux-xfs@vger.kernel.org
> > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > Acked-by: Arnd Bergmann <arnd@arndb.de>
...
> > + struct fsxattr __user *, ufsx, size_t, usize,
> > + unsigned int, at_flags)
> > +{
> > + struct fileattr fa = {};
> > + struct path filepath;
> > + int error;
> > + unsigned int lookup_flags = 0;
> > + struct filename *name;
> > + struct fsxattr fsx = {};
> > +
> > + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> > + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> > +
> > + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > + return -EINVAL;
> > +
> > + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> > + lookup_flags |= LOOKUP_FOLLOW;
> > +
> > + if (at_flags & AT_EMPTY_PATH)
> > + lookup_flags |= LOOKUP_EMPTY;
> > +
> > + if (usize > PAGE_SIZE)
> > + return -E2BIG;
> > +
> > + if (usize < FSXATTR_SIZE_VER0)
> > + return -EINVAL;
> > +
> > + name = getname_maybe_null(filename, at_flags);
> > + if (!name) {
>
> This is broken as it doesn't handle AT_FDCWD correctly. You need:
>
> name = getname_maybe_null(filename, at_flags);
> if (IS_ERR(name))
> return PTR_ERR(name);
>
> if (!name && dfd >= 0) {
> CLASS(fd, f)(dfd);
Ah, you're indeed right that if dfd == AT_FDCWD and filename == NULL, then
we should operate on cwd but we'd bail with an error here. I missed that
during my review. But as far as I've checked, the same bug is there in
path_setxattrat() and path_getxattrat(), so we should fix it there as
well?
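For illustration, the difference between the two checks can be modelled in plain userspace C. This is a sketch under assumptions: pick_original() and pick_fixed() are hypothetical helpers that only report which branch would be taken, and the lookup branch is assumed to resolve a NULL name relative to dfd.

```c
#include <fcntl.h>
#include <stdbool.h>

#ifndef AT_FDCWD
#define AT_FDCWD -100	/* Linux value, fallback if <fcntl.h> hides it */
#endif

/* Which branch of the syscall handles the request. */
enum target {
	TARGET_FD,	/* CLASS(fd, f)(dfd) branch */
	TARGET_LOOKUP,	/* filename_lookup() branch */
};

/* Check as posted: every NULL name goes to the fd branch. */
enum target pick_original(bool name_is_null, int dfd)
{
	(void)dfd;	/* dfd is not consulted, which is the bug */
	return name_is_null ? TARGET_FD : TARGET_LOOKUP;
}

/* Suggested check: only a real (non-negative) dfd goes to the fd branch. */
enum target pick_fixed(bool name_is_null, int dfd)
{
	return (name_is_null && dfd >= 0) ? TARGET_FD : TARGET_LOOKUP;
}
```

With the original check, AT_FDCWD plus a NULL name lands in the fd branch, where no file exists for a negative descriptor and the call fails with -EBADF; with the suggested check it falls through to the lookup branch.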
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On Wed, Apr 23, 2025 at 11:53:25AM +0200, Jan Kara wrote:
> On Tue 22-04-25 16:59:02, Christian Brauner wrote:
> > On Fri, Mar 21, 2025 at 08:48:42PM +0100, Andrey Albershteyn wrote:
> > > From: Andrey Albershteyn <aalbersh@redhat.com>
> > >
> > > Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> > > extended attributes/flags. The syscalls take parent directory fd and
> > > path to the child together with struct fsxattr.
> > >
> > > This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
> > > that file don't need to be open as we can reference it with a path
> > > instead of fd. By having this we can manipulated inode extended
> > > attributes not only on regular files but also on special ones. This
> > > is not possible with FS_IOC_FSSETXATTR ioctl as with special files
> > > we can not call ioctl() directly on the filesystem inode using fd.
> > >
> > > This patch adds two new syscalls which allows userspace to get/set
> > > extended inode attributes on special files by using parent directory
> > > and a path - *at() like syscall.
> > >
> > > CC: linux-api@vger.kernel.org
> > > CC: linux-fsdevel@vger.kernel.org
> > > CC: linux-xfs@vger.kernel.org
> > > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > > Acked-by: Arnd Bergmann <arnd@arndb.de>
> ...
> > > + struct fsxattr __user *, ufsx, size_t, usize,
> > > + unsigned int, at_flags)
> > > +{
> > > + struct fileattr fa = {};
> > > + struct path filepath;
> > > + int error;
> > > + unsigned int lookup_flags = 0;
> > > + struct filename *name;
> > > + struct fsxattr fsx = {};
> > > +
> > > + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> > > + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> > > +
> > > + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > + return -EINVAL;
> > > +
> > > + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> > > + lookup_flags |= LOOKUP_FOLLOW;
> > > +
> > > + if (at_flags & AT_EMPTY_PATH)
> > > + lookup_flags |= LOOKUP_EMPTY;
> > > +
> > > + if (usize > PAGE_SIZE)
> > > + return -E2BIG;
> > > +
> > > + if (usize < FSXATTR_SIZE_VER0)
> > > + return -EINVAL;
> > > +
> > > + name = getname_maybe_null(filename, at_flags);
> > > + if (!name) {
> >
> > This is broken as it doesn't handle AT_FDCWD correctly. You need:
> >
> > name = getname_maybe_null(filename, at_flags);
> > if (IS_ERR(name))
> > return PTR_ERR(name);
> >
> > if (!name && dfd >= 0) {
> > CLASS(fd, f)(dfd);
>
> Ah, you're indeed right that if dfd == AT_FDCWD and filename == NULL, then
> we should operate on cwd but we'd bail with an error here. I missed that
> during my review. But as far as I've checked, the same bug is there in
> path_setxattrat() and path_getxattrat(), so we should fix it there as
> well?
Yes, please!
On 2025-04-24 11:06:07, Christian Brauner wrote:
> On Wed, Apr 23, 2025 at 11:53:25AM +0200, Jan Kara wrote:
> > On Tue 22-04-25 16:59:02, Christian Brauner wrote:
> > > On Fri, Mar 21, 2025 at 08:48:42PM +0100, Andrey Albershteyn wrote:
> > > > From: Andrey Albershteyn <aalbersh@redhat.com>
> > > >
> > > > Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> > > > extended attributes/flags. The syscalls take parent directory fd and
> > > > path to the child together with struct fsxattr.
> > > >
> > > > This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
> > > > that file don't need to be open as we can reference it with a path
> > > > instead of fd. By having this we can manipulated inode extended
> > > > attributes not only on regular files but also on special ones. This
> > > > is not possible with FS_IOC_FSSETXATTR ioctl as with special files
> > > > we can not call ioctl() directly on the filesystem inode using fd.
> > > >
> > > > This patch adds two new syscalls which allows userspace to get/set
> > > > extended inode attributes on special files by using parent directory
> > > > and a path - *at() like syscall.
> > > >
> > > > CC: linux-api@vger.kernel.org
> > > > CC: linux-fsdevel@vger.kernel.org
> > > > CC: linux-xfs@vger.kernel.org
> > > > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > > > Acked-by: Arnd Bergmann <arnd@arndb.de>
> > ...
> > > > + struct fsxattr __user *, ufsx, size_t, usize,
> > > > + unsigned int, at_flags)
> > > > +{
> > > > + struct fileattr fa = {};
> > > > + struct path filepath;
> > > > + int error;
> > > > + unsigned int lookup_flags = 0;
> > > > + struct filename *name;
> > > > + struct fsxattr fsx = {};
> > > > +
> > > > + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> > > > + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> > > > +
> > > > + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > + return -EINVAL;
> > > > +
> > > > + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> > > > + lookup_flags |= LOOKUP_FOLLOW;
> > > > +
> > > > + if (at_flags & AT_EMPTY_PATH)
> > > > + lookup_flags |= LOOKUP_EMPTY;
> > > > +
> > > > + if (usize > PAGE_SIZE)
> > > > + return -E2BIG;
> > > > +
> > > > + if (usize < FSXATTR_SIZE_VER0)
> > > > + return -EINVAL;
> > > > +
> > > > + name = getname_maybe_null(filename, at_flags);
> > > > + if (!name) {
> > >
> > > This is broken as it doesn't handle AT_FDCWD correctly. You need:
> > >
> > > name = getname_maybe_null(filename, at_flags);
> > > if (IS_ERR(name))
> > > return PTR_ERR(name);
> > >
> > > if (!name && dfd >= 0) {
> > > CLASS(fd, f)(dfd);
> >
> > Ah, you're indeed right that if dfd == AT_FDCWD and filename == NULL, then
> > we should operate on cwd but we'd bail with an error here. I missed that
> > during my review. But as far as I've checked, the same bug is there in
> > path_setxattrat() and path_getxattrat(), so we should fix it there as
> > well?
>
> Yes, please!
>
Thanks for the review, Christian. I will fix the issues you noticed as
suggested. I see that Jan already sent a fix for path_[s|g]etxattrat(),
so I won't do anything here.
--
- Andrey
On Fri 21-03-25 20:48:42, Andrey Albershteyn wrote:
> From: Andrey Albershteyn <aalbersh@redhat.com>
>
> Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> extended attributes/flags. The syscalls take parent directory fd and
> path to the child together with struct fsxattr.
>
> This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
> that file don't need to be open as we can reference it with a path
> instead of fd. By having this we can manipulated inode extended
> attributes not only on regular files but also on special ones. This
> is not possible with FS_IOC_FSSETXATTR ioctl as with special files
> we can not call ioctl() directly on the filesystem inode using fd.
>
> This patch adds two new syscalls which allows userspace to get/set
> extended inode attributes on special files by using parent directory
> and a path - *at() like syscall.
>
> CC: linux-api@vger.kernel.org
> CC: linux-fsdevel@vger.kernel.org
> CC: linux-xfs@vger.kernel.org
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
Looks good. Just two nits below:
> +SYSCALL_DEFINE5(getfsxattrat, int, dfd, const char __user *, filename,
> + struct fsxattr __user *, ufsx, size_t, usize,
> + unsigned int, at_flags)
> +{
> + struct fileattr fa = {};
> + struct path filepath;
> + int error;
> + unsigned int lookup_flags = 0;
> + struct filename *name;
> + struct fsxattr fsx = {};
> +
> + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> +
> + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> + return -EINVAL;
> +
> + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> + lookup_flags |= LOOKUP_FOLLOW;
> +
> + if (at_flags & AT_EMPTY_PATH)
> + lookup_flags |= LOOKUP_EMPTY;
Strictly speaking setting LOOKUP_EMPTY does not have any effect because
empty names are already handled by getname_maybe_null(). But it does not
hurt either so I don't really care...
> +
> + if (usize > PAGE_SIZE)
> + return -E2BIG;
> +
> + if (usize < FSXATTR_SIZE_VER0)
> + return -EINVAL;
> +
> + name = getname_maybe_null(filename, at_flags);
> + if (!name) {
> + CLASS(fd, f)(dfd);
> +
> + if (fd_empty(f))
> + return -EBADF;
> + error = vfs_fileattr_get(file_dentry(fd_file(f)), &fa);
> + } else {
> + error = filename_lookup(dfd, name, lookup_flags, &filepath,
> + NULL);
> + if (error)
> + goto out;
> + error = vfs_fileattr_get(filepath.dentry, &fa);
> + path_put(&filepath);
> + }
> + if (error == -ENOIOCTLCMD)
> + error = -EOPNOTSUPP;
> + if (!error) {
> + fileattr_to_fsxattr(&fa, &fsx);
> + error = copy_struct_to_user(ufsx, usize, &fsx,
> + sizeof(struct fsxattr), NULL);
> + }
> +out:
> + putname(name);
> + return error;
> +}
> +
> +SYSCALL_DEFINE5(setfsxattrat, int, dfd, const char __user *, filename,
> + struct fsxattr __user *, ufsx, size_t, usize,
> + unsigned int, at_flags)
> +{
> + struct fileattr fa;
> + struct path filepath;
> + int error;
> + unsigned int lookup_flags = 0;
> + struct filename *name;
> + struct mnt_idmap *idmap;
> + struct dentry *dentry;
> + struct vfsmount *mnt;
> + struct fsxattr fsx = {};
> +
> + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> +
> + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> + return -EINVAL;
> +
> + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> + lookup_flags |= LOOKUP_FOLLOW;
> +
> + if (at_flags & AT_EMPTY_PATH)
> + lookup_flags |= LOOKUP_EMPTY;
Same comment regarding LOOKUP_EMPTY here.
> +
> + if (usize > PAGE_SIZE)
> + return -E2BIG;
> +
> + if (usize < FSXATTR_SIZE_VER0)
> + return -EINVAL;
> +
> + error = copy_struct_from_user(&fsx, sizeof(struct fsxattr), ufsx, usize);
> + if (error)
> + return error;
> +
> + fsxattr_to_fileattr(&fsx, &fa);
> +
> + name = getname_maybe_null(filename, at_flags);
> + if (!name) {
> + CLASS(fd, f)(dfd);
> +
> + if (fd_empty(f))
> + return -EBADF;
> +
> + idmap = file_mnt_idmap(fd_file(f));
> + dentry = file_dentry(fd_file(f));
> + mnt = fd_file(f)->f_path.mnt;
> + } else {
> + error = filename_lookup(dfd, name, lookup_flags, &filepath,
> + NULL);
> + if (error)
> + return error;
> +
> + idmap = mnt_idmap(filepath.mnt);
> + dentry = filepath.dentry;
> + mnt = filepath.mnt;
> + }
> +
> + error = mnt_want_write(mnt);
> + if (!error) {
> + error = vfs_fileattr_set(idmap, dentry, &fa);
> + if (error == -ENOIOCTLCMD)
> + error = -EOPNOTSUPP;
> + mnt_drop_write(mnt);
> + }
> +
> + path_put(&filepath);
filepath will not be initialized here in case of name == NULL.
> + return error;
> +}
With this fixed feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On Fri, Mar 21, 2025 at 8:49 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
>
> From: Andrey Albershteyn <aalbersh@redhat.com>
>
> Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> extended attributes/flags. The syscalls take parent directory fd and
> path to the child together with struct fsxattr.
>
> This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
> that file don't need to be open as we can reference it with a path
> instead of fd. By having this we can manipulated inode extended
> attributes not only on regular files but also on special ones. This
> is not possible with FS_IOC_FSSETXATTR ioctl as with special files
> we can not call ioctl() directly on the filesystem inode using fd.
>
> This patch adds two new syscalls which allows userspace to get/set
> extended inode attributes on special files by using parent directory
> and a path - *at() like syscall.
>
> CC: linux-api@vger.kernel.org
> CC: linux-fsdevel@vger.kernel.org
> CC: linux-xfs@vger.kernel.org
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> ---
...
> +SYSCALL_DEFINE5(setfsxattrat, int, dfd, const char __user *, filename,
> + struct fsxattr __user *, ufsx, size_t, usize,
> + unsigned int, at_flags)
> +{
> + struct fileattr fa;
> + struct path filepath;
> + int error;
> + unsigned int lookup_flags = 0;
> + struct filename *name;
> + struct mnt_idmap *idmap;
> + struct dentry *dentry;
> + struct vfsmount *mnt;
> + struct fsxattr fsx = {};
> +
> + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> +
> + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> + return -EINVAL;
> +
> + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> + lookup_flags |= LOOKUP_FOLLOW;
> +
> + if (at_flags & AT_EMPTY_PATH)
> + lookup_flags |= LOOKUP_EMPTY;
> +
> + if (usize > PAGE_SIZE)
> + return -E2BIG;
> +
> + if (usize < FSXATTR_SIZE_VER0)
> + return -EINVAL;
> +
> + error = copy_struct_from_user(&fsx, sizeof(struct fsxattr), ufsx, usize);
> + if (error)
> + return error;
> +
> + fsxattr_to_fileattr(&fsx, &fa);
> +
> + name = getname_maybe_null(filename, at_flags);
> + if (!name) {
> + CLASS(fd, f)(dfd);
> +
> + if (fd_empty(f))
> + return -EBADF;
> +
> + idmap = file_mnt_idmap(fd_file(f));
> + dentry = file_dentry(fd_file(f));
> + mnt = fd_file(f)->f_path.mnt;
> + } else {
> + error = filename_lookup(dfd, name, lookup_flags, &filepath,
> + NULL);
> + if (error)
> + return error;
> +
> + idmap = mnt_idmap(filepath.mnt);
> + dentry = filepath.dentry;
> + mnt = filepath.mnt;
> + }
> +
> + error = mnt_want_write(mnt);
> + if (!error) {
> + error = vfs_fileattr_set(idmap, dentry, &fa);
> + if (error == -ENOIOCTLCMD)
> + error = -EOPNOTSUPP;
This is awkward.
vfs_fileattr_set() should return -EOPNOTSUPP.
ioctl_setflags() could maybe convert it to -ENOIOCTLCMD,
but looking at similar cases, ioctl_fiemap() and ioctl_fsfreeze(), the
ioctl returns -EOPNOTSUPP.
I don't think it is necessarily a bad idea to start returning
-EOPNOTSUPP instead of -ENOIOCTLCMD for the ioctl,
because that really reflects the fact that the ioctl is now implemented
in vfs and not in the specific fs.
And I think it would not be a bad idea at all to make that change
together with the merge of the syscalls as a sort of hint to userspace
that uses the ioctl that the syscall API exists.
Thanks,
Amir.
On 2025-03-23 09:56:25, Amir Goldstein wrote:
> On Fri, Mar 21, 2025 at 8:49 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> >
> > From: Andrey Albershteyn <aalbersh@redhat.com>
> >
> > Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> > extended attributes/flags. The syscalls take parent directory fd and
> > path to the child together with struct fsxattr.
> >
> > This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
> > that file don't need to be open as we can reference it with a path
> > instead of fd. By having this we can manipulated inode extended
> > attributes not only on regular files but also on special ones. This
> > is not possible with FS_IOC_FSSETXATTR ioctl as with special files
> > we can not call ioctl() directly on the filesystem inode using fd.
> >
> > This patch adds two new syscalls which allows userspace to get/set
> > extended inode attributes on special files by using parent directory
> > and a path - *at() like syscall.
> >
> > CC: linux-api@vger.kernel.org
> > CC: linux-fsdevel@vger.kernel.org
> > CC: linux-xfs@vger.kernel.org
> > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > Acked-by: Arnd Bergmann <arnd@arndb.de>
> > ---
> ...
> > +SYSCALL_DEFINE5(setfsxattrat, int, dfd, const char __user *, filename,
> > + struct fsxattr __user *, ufsx, size_t, usize,
> > + unsigned int, at_flags)
> > +{
> > + struct fileattr fa;
> > + struct path filepath;
> > + int error;
> > + unsigned int lookup_flags = 0;
> > + struct filename *name;
> > + struct mnt_idmap *idmap;
>
> > + struct dentry *dentry;
> > + struct vfsmount *mnt;
> > + struct fsxattr fsx = {};
> > +
> > + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> > + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> > +
> > + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > + return -EINVAL;
> > +
> > + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> > + lookup_flags |= LOOKUP_FOLLOW;
> > +
> > + if (at_flags & AT_EMPTY_PATH)
> > + lookup_flags |= LOOKUP_EMPTY;
> > +
> > + if (usize > PAGE_SIZE)
> > + return -E2BIG;
> > +
> > + if (usize < FSXATTR_SIZE_VER0)
> > + return -EINVAL;
> > +
> > + error = copy_struct_from_user(&fsx, sizeof(struct fsxattr), ufsx, usize);
> > + if (error)
> > + return error;
> > +
> > + fsxattr_to_fileattr(&fsx, &fa);
> > +
> > + name = getname_maybe_null(filename, at_flags);
> > + if (!name) {
> > + CLASS(fd, f)(dfd);
> > +
> > + if (fd_empty(f))
> > + return -EBADF;
> > +
> > + idmap = file_mnt_idmap(fd_file(f));
> > + dentry = file_dentry(fd_file(f));
> > + mnt = fd_file(f)->f_path.mnt;
> > + } else {
> > + error = filename_lookup(dfd, name, lookup_flags, &filepath,
> > + NULL);
> > + if (error)
> > + return error;
> > +
> > + idmap = mnt_idmap(filepath.mnt);
> > + dentry = filepath.dentry;
> > + mnt = filepath.mnt;
> > + }
> > +
> > + error = mnt_want_write(mnt);
> > + if (!error) {
> > + error = vfs_fileattr_set(idmap, dentry, &fa);
> > + if (error == -ENOIOCTLCMD)
> > + error = -EOPNOTSUPP;
>
> This is awkward.
> vfs_fileattr_set() should return -EOPNOTSUPP.
> ioctl_setflags() could maybe convert it to -ENOIOCTLCMD,
> but looking at similar cases ioctl_fiemap(), ioctl_fsfreeze() the
> ioctl returns -EOPNOTSUPP.
>
> I don't think it is necessarily a bad idea to start returning
> -EOPNOTSUPP instead of -ENOIOCTLCMD for the ioctl
> because that really reflects the fact that the ioctl is now implemented
> in vfs and not in the specific fs.
>
> And I think it would not be a bad idea at all to make that change
> together with the merge of the syscalls as a sort of hint to userspace
> that uses the ioctl that the syscall API exists.
>
> Thanks,
> Amir.
>
Hmm, not sure what you're suggesting here. I see it as:
- get/setfsxattrat should return EOPNOTSUPP as it makes more sense
than ENOIOCTLCMD
- ioctl_setflags returns ENOIOCTLCMD, which is also expected
I don't really see a reason to change what vfs_fileattr_set() returns
and then copy this if() to other places or start returning
EOPNOTSUPP.
--
- Andrey
On Thu, Mar 27, 2025 at 10:33 AM Andrey Albershteyn <aalbersh@redhat.com> wrote:
>
> On 2025-03-23 09:56:25, Amir Goldstein wrote:
> > On Fri, Mar 21, 2025 at 8:49 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> > >
> > > From: Andrey Albershteyn <aalbersh@redhat.com>
> > >
> > > Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> > > extended attributes/flags. The syscalls take parent directory fd and
> > > path to the child together with struct fsxattr.
> > >
> > > This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
> > > that file don't need to be open as we can reference it with a path
> > > instead of fd. By having this we can manipulated inode extended
> > > attributes not only on regular files but also on special ones. This
> > > is not possible with FS_IOC_FSSETXATTR ioctl as with special files
> > > we can not call ioctl() directly on the filesystem inode using fd.
> > >
> > > This patch adds two new syscalls which allows userspace to get/set
> > > extended inode attributes on special files by using parent directory
> > > and a path - *at() like syscall.
> > >
> > > CC: linux-api@vger.kernel.org
> > > CC: linux-fsdevel@vger.kernel.org
> > > CC: linux-xfs@vger.kernel.org
> > > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > > Acked-by: Arnd Bergmann <arnd@arndb.de>
> > > ---
> > ...
> > > +SYSCALL_DEFINE5(setfsxattrat, int, dfd, const char __user *, filename,
> > > + struct fsxattr __user *, ufsx, size_t, usize,
> > > + unsigned int, at_flags)
> > > +{
> > > + struct fileattr fa;
> > > + struct path filepath;
> > > + int error;
> > > + unsigned int lookup_flags = 0;
> > > + struct filename *name;
> > > + struct mnt_idmap *idmap;
> >
> > > + struct dentry *dentry;
> > > + struct vfsmount *mnt;
> > > + struct fsxattr fsx = {};
> > > +
> > > + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> > > + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> > > +
> > > + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > + return -EINVAL;
> > > +
> > > + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> > > + lookup_flags |= LOOKUP_FOLLOW;
> > > +
> > > + if (at_flags & AT_EMPTY_PATH)
> > > + lookup_flags |= LOOKUP_EMPTY;
> > > +
> > > + if (usize > PAGE_SIZE)
> > > + return -E2BIG;
> > > +
> > > + if (usize < FSXATTR_SIZE_VER0)
> > > + return -EINVAL;
> > > +
> > > + error = copy_struct_from_user(&fsx, sizeof(struct fsxattr), ufsx, usize);
> > > + if (error)
> > > + return error;
> > > +
> > > + fsxattr_to_fileattr(&fsx, &fa);
> > > +
> > > + name = getname_maybe_null(filename, at_flags);
> > > + if (!name) {
> > > + CLASS(fd, f)(dfd);
> > > +
> > > + if (fd_empty(f))
> > > + return -EBADF;
> > > +
> > > + idmap = file_mnt_idmap(fd_file(f));
> > > + dentry = file_dentry(fd_file(f));
> > > + mnt = fd_file(f)->f_path.mnt;
> > > + } else {
> > > + error = filename_lookup(dfd, name, lookup_flags, &filepath,
> > > + NULL);
> > > + if (error)
> > > + return error;
> > > +
> > > + idmap = mnt_idmap(filepath.mnt);
> > > + dentry = filepath.dentry;
> > > + mnt = filepath.mnt;
> > > + }
> > > +
> > > + error = mnt_want_write(mnt);
> > > + if (!error) {
> > > + error = vfs_fileattr_set(idmap, dentry, &fa);
> > > + if (error == -ENOIOCTLCMD)
> > > + error = -EOPNOTSUPP;
> >
> > This is awkward.
> > vfs_fileattr_set() should return -EOPNOTSUPP.
> > ioctl_setflags() could maybe convert it to -ENOIOCTLCMD,
> > but looking at similar cases ioctl_fiemap(), ioctl_fsfreeze() the
> > ioctl returns -EOPNOTSUPP.
> >
> > I don't think it is necessarily a bad idea to start returning
> > -EOPNOTSUPP instead of -ENOIOCTLCMD for the ioctl
> > because that really reflects the fact that the ioctl is now implemented
> > in vfs and not in the specific fs.
> >
> > And I think it would not be a bad idea at all to make that change
> > together with the merge of the syscalls as a sort of hint to userspace
> > that uses the ioctl that the syscall API exists.
> >
> > Thanks,
> > Amir.
> >
>
> Hmm, not sure what you're suggesting here. I see it as:
> - get/setfsxattrat should return EOPNOTSUPP as it makes more sense
> than ENOIOCTLCMD
> - ioctl_setflags returns ENOIOCTLCMD, which is also expected
>
> Don't really see a reason to change what vfs_fileattr_set() returns
> and then copying this if() to other places or start returning
> EOPNOTSUPP.
ENOIOCTLCMD conceptually means that the ioctl command is unknown.
This is not the case since ->fileattr_[gs]et() became a vfs API:
the ioctl command is handled by vfs and it is known, but individual
filesystems may not support it, so conceptually, returning EOPNOTSUPP
from ioctl() is more correct these days, exactly as is done with the ioctls
FS_IOC_FIEMAP and FIFREEZE, which were also historically per-fs
ioctls and were made into a vfs API.
The fact that bcachefs does not implement ->fileattr_[gs]et() and does
implement FS_IOC_FS[GS]ETXATTR is an oversight IMO, since it
was probably merged after the vfs conversion patch.
This mistake means that bcachefs fileattr cannot be copied up by
ovl_copy_fileattr(), which uses the vfs API and NOT the ioctl.
However, if you had made the internal vfs API change that I suggested,
it would have broken ovl_real_fileattr_get() and ovl_copy_fileattr(),
so leave it for now - if I care enough I can do it later together with
fixing the overlayfs and fuse code.
Thanks,
Amir.
On Thu, Mar 27, 2025 at 12:39:28PM +0100, Amir Goldstein wrote:
> On Thu, Mar 27, 2025 at 10:33 AM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> >
> > On 2025-03-23 09:56:25, Amir Goldstein wrote:
> > > On Fri, Mar 21, 2025 at 8:49 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> > > >
> > > > From: Andrey Albershteyn <aalbersh@redhat.com>
> > > >
> > > > Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> > > > extended attributes/flags. The syscalls take parent directory fd and
> > > > path to the child together with struct fsxattr.
> > > >
> > > > This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
> > > > that file don't need to be open as we can reference it with a path
> > > > instead of fd. By having this we can manipulated inode extended
> > > > attributes not only on regular files but also on special ones. This
> > > > is not possible with FS_IOC_FSSETXATTR ioctl as with special files
> > > > we can not call ioctl() directly on the filesystem inode using fd.
> > > >
> > > > This patch adds two new syscalls which allows userspace to get/set
> > > > extended inode attributes on special files by using parent directory
> > > > and a path - *at() like syscall.
> > > >
> > > > CC: linux-api@vger.kernel.org
> > > > CC: linux-fsdevel@vger.kernel.org
> > > > CC: linux-xfs@vger.kernel.org
> > > > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > > > Acked-by: Arnd Bergmann <arnd@arndb.de>
> > > > ---
> > > ...
> > > > +SYSCALL_DEFINE5(setfsxattrat, int, dfd, const char __user *, filename,
> > > > + struct fsxattr __user *, ufsx, size_t, usize,
> > > > + unsigned int, at_flags)
> > > > +{
> > > > + struct fileattr fa;
> > > > + struct path filepath;
> > > > + int error;
> > > > + unsigned int lookup_flags = 0;
> > > > + struct filename *name;
> > > > + struct mnt_idmap *idmap;
> > >
> > > > + struct dentry *dentry;
> > > > + struct vfsmount *mnt;
> > > > + struct fsxattr fsx = {};
> > > > +
> > > > + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> > > > + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> > > > +
> > > > + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > + return -EINVAL;
> > > > +
> > > > + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> > > > + lookup_flags |= LOOKUP_FOLLOW;
> > > > +
> > > > + if (at_flags & AT_EMPTY_PATH)
> > > > + lookup_flags |= LOOKUP_EMPTY;
> > > > +
> > > > + if (usize > PAGE_SIZE)
> > > > + return -E2BIG;
> > > > +
> > > > + if (usize < FSXATTR_SIZE_VER0)
> > > > + return -EINVAL;
> > > > +
> > > > + error = copy_struct_from_user(&fsx, sizeof(struct fsxattr), ufsx, usize);
> > > > + if (error)
> > > > + return error;
> > > > +
> > > > + fsxattr_to_fileattr(&fsx, &fa);
> > > > +
> > > > + name = getname_maybe_null(filename, at_flags);
> > > > + if (!name) {
> > > > + CLASS(fd, f)(dfd);
> > > > +
> > > > + if (fd_empty(f))
> > > > + return -EBADF;
> > > > +
> > > > + idmap = file_mnt_idmap(fd_file(f));
> > > > + dentry = file_dentry(fd_file(f));
> > > > + mnt = fd_file(f)->f_path.mnt;
> > > > + } else {
> > > > + error = filename_lookup(dfd, name, lookup_flags, &filepath,
> > > > + NULL);
> > > > + if (error)
> > > > + return error;
> > > > +
> > > > + idmap = mnt_idmap(filepath.mnt);
> > > > + dentry = filepath.dentry;
> > > > + mnt = filepath.mnt;
> > > > + }
> > > > +
> > > > + error = mnt_want_write(mnt);
> > > > + if (!error) {
> > > > + error = vfs_fileattr_set(idmap, dentry, &fa);
> > > > + if (error == -ENOIOCTLCMD)
> > > > + error = -EOPNOTSUPP;
> > >
> > > This is awkward.
> > > vfs_fileattr_set() should return -EOPNOTSUPP.
> > > ioctl_setflags() could maybe convert it to -ENOIOCTLCMD,
> > > but looking at similar cases ioctl_fiemap(), ioctl_fsfreeze() the
> > > ioctl returns -EOPNOTSUPP.
> > >
> > > I don't think it is necessarily a bad idea to start returning
> > > -EOPNOTSUPP instead of -ENOIOCTLCMD for the ioctl
> > > because that really reflects the fact that the ioctl is now implemented
> > > in vfs and not in the specific fs.
> > >
> > > And I think it would not be a bad idea at all to make that change
> > > together with the merge of the syscalls as a sort of hint to userspace
> > > that uses the ioctl that the syscall API exists.
> > >
> > > Thanks,
> > > Amir.
> > >
> >
> > Hmm, not sure what you're suggesting here. I see it as:
> > - get/setfsxattrat should return EOPNOTSUPP as it make more sense
> > than ENOIOCTLCMD
> > - ioctl_setflags returns ENOIOCTLCMD which also expected
> >
> > Don't really see a reason to change what vfs_fileattr_set() returns
> > and then copying this if() to other places or start returning
> > EOPNOTSUPP.
>
> ENOIOCTLCMD conceptually means that the ioctl command is unknown
> This is not the case since ->fileattr_[gs]et() became a vfs API
vfs_fileattr_{g,s}et() should not return ENOIOCTLCMD. Change the return
code to EOPNOTSUPP and then make EOPNOTSUPP be translated to ENOTTY in
overlayfs and to ENOIOCTLCMD in ecryptfs and in fs/ioctl.c. This way
we get a clean VFS API while retaining current behavior. Amir can do his
cleanup based on that.
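
A minimal userspace sketch of that translation scheme, not kernel code. All
model_* names are hypothetical stand-ins, and ENOIOCTLCMD is defined locally
(with the value used by the kernel's include/linux/errno.h) only because it is
not exported to userspace headers:

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal error code; defined here only so the sketch compiles. */
#ifndef ENOIOCTLCMD
#define ENOIOCTLCMD 515
#endif

/* Hypothetical stand-in for vfs_fileattr_set(): reports -EOPNOTSUPP when
 * the filesystem provides no ->fileattr_set() method. */
int model_vfs_fileattr_set(int fs_has_fileattr_set)
{
	return fs_has_fileattr_set ? 0 : -EOPNOTSUPP;
}

/* New syscall path: the VFS error is passed through unchanged. */
int model_syscall_set(int fs_has_fileattr_set)
{
	return model_vfs_fileattr_set(fs_has_fileattr_set);
}

/* ioctl path (fs/ioctl.c, ecryptfs): keep the old -ENOIOCTLCMD. */
int model_ioctl_set(int fs_has_fileattr_set)
{
	int err = model_vfs_fileattr_set(fs_has_fileattr_set);

	return err == -EOPNOTSUPP ? -ENOIOCTLCMD : err;
}

/* overlayfs path: keep the old -ENOTTY. */
int model_ovl_set(int fs_has_fileattr_set)
{
	int err = model_vfs_fileattr_set(fs_has_fileattr_set);

	return err == -EOPNOTSUPP ? -ENOTTY : err;
}

/* Each caller sees the error it saw before; only the VFS helper changed. */
void model_errno_selfcheck(void)
{
	assert(model_syscall_set(0) == -EOPNOTSUPP);
	assert(model_ioctl_set(0) == -ENOIOCTLCMD);
	assert(model_ovl_set(0) == -ENOTTY);
}
```

The point of the sketch is that the translation lives at the edges (ioctl,
ecryptfs, overlayfs), so the if() from the syscall is not needed there.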
On Tue, Apr 22, 2025 at 04:31:29PM +0200, Christian Brauner wrote:
> On Thu, Mar 27, 2025 at 12:39:28PM +0100, Amir Goldstein wrote:
> > On Thu, Mar 27, 2025 at 10:33 AM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> > >
> > > On 2025-03-23 09:56:25, Amir Goldstein wrote:
> > > > On Fri, Mar 21, 2025 at 8:49 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> > > > >
> > > > > From: Andrey Albershteyn <aalbersh@redhat.com>
> > > > >
> > > > > Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> > > > > extended attributes/flags. The syscalls take parent directory fd and
> > > > > path to the child together with struct fsxattr.
> > > > >
> > > > > This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
> > > > > that file don't need to be open as we can reference it with a path
> > > > > instead of fd. By having this we can manipulated inode extended
> > > > > attributes not only on regular files but also on special ones. This
> > > > > is not possible with FS_IOC_FSSETXATTR ioctl as with special files
> > > > > we can not call ioctl() directly on the filesystem inode using fd.
> > > > >
> > > > > This patch adds two new syscalls which allows userspace to get/set
> > > > > extended inode attributes on special files by using parent directory
> > > > > and a path - *at() like syscall.
> > > > >
> > > > > CC: linux-api@vger.kernel.org
> > > > > CC: linux-fsdevel@vger.kernel.org
> > > > > CC: linux-xfs@vger.kernel.org
> > > > > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > > > > Acked-by: Arnd Bergmann <arnd@arndb.de>
> > > > > ---
> > > > ...
> > > > > +SYSCALL_DEFINE5(setfsxattrat, int, dfd, const char __user *, filename,
> > > > > + struct fsxattr __user *, ufsx, size_t, usize,
> > > > > + unsigned int, at_flags)
> > > > > +{
> > > > > + struct fileattr fa;
> > > > > + struct path filepath;
> > > > > + int error;
> > > > > + unsigned int lookup_flags = 0;
> > > > > + struct filename *name;
> > > > > + struct mnt_idmap *idmap;.
> > > >
> > > > > + struct dentry *dentry;
> > > > > + struct vfsmount *mnt;
> > > > > + struct fsxattr fsx = {};
> > > > > +
> > > > > + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> > > > > + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> > > > > +
> > > > > + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > > + return -EINVAL;
> > > > > +
> > > > > + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> > > > > + lookup_flags |= LOOKUP_FOLLOW;
> > > > > +
> > > > > + if (at_flags & AT_EMPTY_PATH)
> > > > > + lookup_flags |= LOOKUP_EMPTY;
> > > > > +
> > > > > + if (usize > PAGE_SIZE)
> > > > > + return -E2BIG;
> > > > > +
> > > > > + if (usize < FSXATTR_SIZE_VER0)
> > > > > + return -EINVAL;
> > > > > +
> > > > > + error = copy_struct_from_user(&fsx, sizeof(struct fsxattr), ufsx, usize);
> > > > > + if (error)
> > > > > + return error;
> > > > > +
> > > > > + fsxattr_to_fileattr(&fsx, &fa);
> > > > > +
> > > > > + name = getname_maybe_null(filename, at_flags);
> > > > > + if (!name) {
> > > > > + CLASS(fd, f)(dfd);
> > > > > +
> > > > > + if (fd_empty(f))
> > > > > + return -EBADF;
> > > > > +
> > > > > + idmap = file_mnt_idmap(fd_file(f));
> > > > > + dentry = file_dentry(fd_file(f));
> > > > > + mnt = fd_file(f)->f_path.mnt;
> > > > > + } else {
> > > > > + error = filename_lookup(dfd, name, lookup_flags, &filepath,
> > > > > + NULL);
> > > > > + if (error)
> > > > > + return error;
> > > > > +
> > > > > + idmap = mnt_idmap(filepath.mnt);
> > > > > + dentry = filepath.dentry;
> > > > > + mnt = filepath.mnt;
> > > > > + }
> > > > > +
> > > > > + error = mnt_want_write(mnt);
> > > > > + if (!error) {
> > > > > + error = vfs_fileattr_set(idmap, dentry, &fa);
> > > > > + if (error == -ENOIOCTLCMD)
> > > > > + error = -EOPNOTSUPP;
> > > >
> > > > This is awkward.
> > > > vfs_fileattr_set() should return -EOPNOTSUPP.
> > > > ioctl_setflags() could maybe convert it to -ENOIOCTLCMD,
> > > > but looking at similar cases ioctl_fiemap(), ioctl_fsfreeze() the
> > > > ioctl returns -EOPNOTSUPP.
> > > >
> > > > I don't think it is necessarily a bad idea to start returning
> > > > -EOPNOTSUPP instead of -ENOIOCTLCMD for the ioctl
> > > > because that really reflects the fact that the ioctl is now implemented
> > > > in vfs and not in the specific fs.
> > > >
> > > > and I think it would not be a bad idea at all to make that change
> > > > together with the merge of the syscalls as a sort of hint to userspace
> > > > that uses the ioctl, that the sycalls API exists.
> > > >
> > > > Thanks,
> > > > Amir.
> > > >
> > >
> > > Hmm, not sure what you're suggesting here. I see it as:
> > > - get/setfsxattrat should return EOPNOTSUPP as it make more sense
> > > than ENOIOCTLCMD
> > > - ioctl_setflags returns ENOIOCTLCMD which also expected
> > >
> > > Don't really see a reason to change what vfs_fileattr_set() returns
> > > and then copying this if() to other places or start returning
> > > EOPNOTSUPP.
> >
> > ENOIOCTLCMD conceptually means that the ioctl command is unknown
> > This is not the case since ->fileattr_[gs]et() became a vfs API
>
> vfs_fileattr_{g,s}et() should not return ENOIOCTLCMD. Change the return
> code to EOPNOTSUPP and then make EOPNOTSUPP be translated to ENOTTY on
> on overlayfs and to ENOIOCTLCMD in ecryptfs and in fs/ioctl.c. This way
> we get a clean VFS api while retaining current behavior. Amir can do his
> cleanup based on that.
Also this get/set dance is not something new apis should do. It should
be handled like setattr_prepare() or generic_fillattr() where the
filesystem calls a VFS helper and that does all of this based on the
current state of the inode instead of calling into the filesystem twice:

int vfs_fileattr_set(struct mnt_idmap *idmap, struct dentry *dentry,
		     struct fileattr *fa)
{
	<snip>
	inode_lock(inode);
	err = vfs_fileattr_get(dentry, &old_ma);
	if (!err) {
		/* initialize missing bits from old_ma */
		if (fa->flags_valid) {
	<snip>
		err = fileattr_set_prepare(inode, &old_ma, fa);
		if (!err && !security_inode_setfsxattr(inode, fa))
			err = inode->i_op->fileattr_set(idmap, dentry, fa);
On 2025-04-22 17:14:10, Christian Brauner wrote:
> On Tue, Apr 22, 2025 at 04:31:29PM +0200, Christian Brauner wrote:
> > On Thu, Mar 27, 2025 at 12:39:28PM +0100, Amir Goldstein wrote:
> > > On Thu, Mar 27, 2025 at 10:33 AM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> > > >
> > > > On 2025-03-23 09:56:25, Amir Goldstein wrote:
> > > > > On Fri, Mar 21, 2025 at 8:49 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> > > > > >
> > > > > > From: Andrey Albershteyn <aalbersh@redhat.com>
> > > > > >
> > > > > > Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> > > > > > extended attributes/flags. The syscalls take parent directory fd and
> > > > > > path to the child together with struct fsxattr.
> > > > > >
> > > > > > This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
> > > > > > that file don't need to be open as we can reference it with a path
> > > > > > instead of fd. By having this we can manipulated inode extended
> > > > > > attributes not only on regular files but also on special ones. This
> > > > > > is not possible with FS_IOC_FSSETXATTR ioctl as with special files
> > > > > > we can not call ioctl() directly on the filesystem inode using fd.
> > > > > >
> > > > > > This patch adds two new syscalls which allows userspace to get/set
> > > > > > extended inode attributes on special files by using parent directory
> > > > > > and a path - *at() like syscall.
> > > > > >
> > > > > > CC: linux-api@vger.kernel.org
> > > > > > CC: linux-fsdevel@vger.kernel.org
> > > > > > CC: linux-xfs@vger.kernel.org
> > > > > > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > > > > > Acked-by: Arnd Bergmann <arnd@arndb.de>
> > > > > > ---
> > > > > ...
> > > > > > +SYSCALL_DEFINE5(setfsxattrat, int, dfd, const char __user *, filename,
> > > > > > + struct fsxattr __user *, ufsx, size_t, usize,
> > > > > > + unsigned int, at_flags)
> > > > > > +{
> > > > > > + struct fileattr fa;
> > > > > > + struct path filepath;
> > > > > > + int error;
> > > > > > + unsigned int lookup_flags = 0;
> > > > > > + struct filename *name;
> > > > > > + struct mnt_idmap *idmap;.
> > > > >
> > > > > > + struct dentry *dentry;
> > > > > > + struct vfsmount *mnt;
> > > > > > + struct fsxattr fsx = {};
> > > > > > +
> > > > > > + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> > > > > > + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> > > > > > +
> > > > > > + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> > > > > > + lookup_flags |= LOOKUP_FOLLOW;
> > > > > > +
> > > > > > + if (at_flags & AT_EMPTY_PATH)
> > > > > > + lookup_flags |= LOOKUP_EMPTY;
> > > > > > +
> > > > > > + if (usize > PAGE_SIZE)
> > > > > > + return -E2BIG;
> > > > > > +
> > > > > > + if (usize < FSXATTR_SIZE_VER0)
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + error = copy_struct_from_user(&fsx, sizeof(struct fsxattr), ufsx, usize);
> > > > > > + if (error)
> > > > > > + return error;
> > > > > > +
> > > > > > + fsxattr_to_fileattr(&fsx, &fa);
> > > > > > +
> > > > > > + name = getname_maybe_null(filename, at_flags);
> > > > > > + if (!name) {
> > > > > > + CLASS(fd, f)(dfd);
> > > > > > +
> > > > > > + if (fd_empty(f))
> > > > > > + return -EBADF;
> > > > > > +
> > > > > > + idmap = file_mnt_idmap(fd_file(f));
> > > > > > + dentry = file_dentry(fd_file(f));
> > > > > > + mnt = fd_file(f)->f_path.mnt;
> > > > > > + } else {
> > > > > > + error = filename_lookup(dfd, name, lookup_flags, &filepath,
> > > > > > + NULL);
> > > > > > + if (error)
> > > > > > + return error;
> > > > > > +
> > > > > > + idmap = mnt_idmap(filepath.mnt);
> > > > > > + dentry = filepath.dentry;
> > > > > > + mnt = filepath.mnt;
> > > > > > + }
> > > > > > +
> > > > > > + error = mnt_want_write(mnt);
> > > > > > + if (!error) {
> > > > > > + error = vfs_fileattr_set(idmap, dentry, &fa);
> > > > > > + if (error == -ENOIOCTLCMD)
> > > > > > + error = -EOPNOTSUPP;
> > > > >
> > > > > This is awkward.
> > > > > vfs_fileattr_set() should return -EOPNOTSUPP.
> > > > > ioctl_setflags() could maybe convert it to -ENOIOCTLCMD,
> > > > > but looking at similar cases ioctl_fiemap(), ioctl_fsfreeze() the
> > > > > ioctl returns -EOPNOTSUPP.
> > > > >
> > > > > I don't think it is necessarily a bad idea to start returning
> > > > > -EOPNOTSUPP instead of -ENOIOCTLCMD for the ioctl
> > > > > because that really reflects the fact that the ioctl is now implemented
> > > > > in vfs and not in the specific fs.
> > > > >
> > > > > and I think it would not be a bad idea at all to make that change
> > > > > together with the merge of the syscalls as a sort of hint to userspace
> > > > > that uses the ioctl, that the sycalls API exists.
> > > > >
> > > > > Thanks,
> > > > > Amir.
> > > > >
> > > >
> > > > Hmm, not sure what you're suggesting here. I see it as:
> > > > - get/setfsxattrat should return EOPNOTSUPP as it make more sense
> > > > than ENOIOCTLCMD
> > > > - ioctl_setflags returns ENOIOCTLCMD which also expected
> > > >
> > > > Don't really see a reason to change what vfs_fileattr_set() returns
> > > > and then copying this if() to other places or start returning
> > > > EOPNOTSUPP.
> > >
> > > ENOIOCTLCMD conceptually means that the ioctl command is unknown
> > > This is not the case since ->fileattr_[gs]et() became a vfs API
> >
> > vfs_fileattr_{g,s}et() should not return ENOIOCTLCMD. Change the return
> > code to EOPNOTSUPP and then make EOPNOTSUPP be translated to ENOTTY on
> > on overlayfs and to ENOIOCTLCMD in ecryptfs and in fs/ioctl.c. This way
> > we get a clean VFS api while retaining current behavior. Amir can do his
> > cleanup based on that.
>
> Also this get/set dance is not something new apis should do. It should
> be handled like setattr_prepare() or generic_fillattr() where the
> filesystem calls a VFS helper and that does all of this based on the
> current state of the inode instead of calling into the filesystem twice:
>
> int vfs_fileattr_set(struct mnt_idmap *idmap, struct dentry *dentry,
> struct fileattr *fa)
> {
> <snip>
> inode_lock(inode);
> err = vfs_fileattr_get(dentry, &old_ma);
> if (!err) {
> /* initialize missing bits from old_ma */
> if (fa->flags_valid) {
> <snip>
> err = fileattr_set_prepare(inode, &old_ma, fa);
> if (!err && !security_inode_setfsxattr(inode, fa))
> err = inode->i_op->fileattr_set(idmap, dentry, fa);
>
You mean something like this? (not all fs are done)
--
From 421445f054ccad3116d55ae22c8995a48bb753fd Mon Sep 17 00:00:00 2001
From: Andrey Albershteyn <aalbersh@kernel.org>
Date: Fri, 25 Apr 2025 17:20:42 +0200
Subject: [PATCH] fs: push retrieval of fileattr down to filesystems
Currently, vfs_fileattr_set() calls into the filesystem twice: first to
retrieve the current state of the inode extended attributes, and then
to set the new ones.

This patch refactors this so that the filesystem first gets the current
inode attribute state and then calls a VFS helper to merge in and
validate the new values. This way vfs_fileattr_set() calls into the
filesystem just once.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
---
fs/ext2/ioctl.c | 9 ++++++
fs/ext4/ioctl.c | 9 ++++++
fs/f2fs/file.c | 12 +++++++-
fs/file_attr.c | 62 ++++++++++++++++++++++++----------------
fs/gfs2/file.c | 9 ++++++
fs/hfsplus/inode.c | 9 ++++++
fs/jfs/ioctl.c | 9 +++++-
fs/ntfs3/file.c | 12 +++++++-
fs/orangefs/inode.c | 9 ++++++
fs/ubifs/ioctl.c | 12 +++++++-
fs/xfs/xfs_ioctl.c | 6 ++++
include/linux/fileattr.h | 2 ++
mm/shmem.c | 8 ++++++
13 files changed, 140 insertions(+), 28 deletions(-)
diff --git a/fs/ext2/ioctl.c b/fs/ext2/ioctl.c
index 44e04484e570..3a45ed9c12b7 100644
--- a/fs/ext2/ioctl.c
+++ b/fs/ext2/ioctl.c
@@ -32,6 +32,15 @@ int ext2_fileattr_set(struct mnt_idmap *idmap,
{
struct inode *inode = d_inode(dentry);
struct ext2_inode_info *ei = EXT2_I(inode);
+ struct fileattr cfa;
+ int err;
+
+ err = ext2_fileattr_get(dentry, &cfa);
+ if (err)
+ return err;
+ err = vfs_fileattr_set_prepare(idmap, dentry, &cfa, fa);
+ if (err)
+ return err;
if (fileattr_has_fsx(fa))
return -EOPNOTSUPP;
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index d17207386ead..f988ff4d7256 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1002,6 +1002,15 @@ int ext4_fileattr_set(struct mnt_idmap *idmap,
struct inode *inode = d_inode(dentry);
u32 flags = fa->flags;
int err = -EOPNOTSUPP;
+ struct fileattr cfa;
+
+ err = ext4_fileattr_get(dentry, &cfa);
+ if (err)
+ return err;
+
+ err = vfs_fileattr_set_prepare(idmap, dentry, &cfa, fa);
+ if (err)
+ return err;
if (flags & ~EXT4_FL_USER_VISIBLE)
goto out;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index abbcbb5865a3..f196a07f1f17 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3371,14 +3371,24 @@ int f2fs_fileattr_set(struct mnt_idmap *idmap,
struct dentry *dentry, struct fileattr *fa)
{
struct inode *inode = d_inode(dentry);
- u32 fsflags = fa->flags, mask = F2FS_SETTABLE_FS_FL;
+ u32 fsflags, mask = F2FS_SETTABLE_FS_FL;
u32 iflags;
+ struct fileattr cfa;
int err;
if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
return -EIO;
if (!f2fs_is_checkpoint_ready(F2FS_I_SB(inode)))
return -ENOSPC;
+
+ err = f2fs_fileattr_get(dentry, &cfa);
+ if (err)
+ return err;
+ err = vfs_fileattr_set_prepare(idmap, dentry, &cfa, fa);
+ if (err)
+ return err;
+ fsflags = fa->flags;
+
if (fsflags & ~F2FS_GETTABLE_FS_FL)
return -EOPNOTSUPP;
fsflags &= F2FS_SETTABLE_FS_FL;
diff --git a/fs/file_attr.c b/fs/file_attr.c
index 5e51c5b851ef..d0a01377bca8 100644
--- a/fs/file_attr.c
+++ b/fs/file_attr.c
@@ -7,6 +7,8 @@
#include <linux/fileattr.h>
#include <linux/namei.h>
+#include "internal.h"
+
/**
* fileattr_fill_xflags - initialize fileattr with xflags
* @fa: fileattr pointer
@@ -225,6 +227,36 @@ static int fileattr_set_prepare(struct inode *inode,
return 0;
}
+/**
+ * vfs_fileattr_set_prepare - merge new fileattr state and check for validity
+ * @idmap: idmap of the mount
+ * @dentry: the object to change
+ * @cfa: current fileattr state
+ * @fa: fileattr pointer with new values
+ *
+ * Return: 0 on success, or a negative error on failure.
+ */
+int vfs_fileattr_set_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
+ struct fileattr *cfa, struct fileattr *fa)
+{
+ int err;
+
+ /* initialize missing bits from cfa */
+ if (fa->flags_valid) {
+ fa->fsx_xflags |= cfa->fsx_xflags & ~FS_XFLAG_COMMON;
+ fa->fsx_extsize = cfa->fsx_extsize;
+ fa->fsx_nextents = cfa->fsx_nextents;
+ fa->fsx_projid = cfa->fsx_projid;
+ fa->fsx_cowextsize = cfa->fsx_cowextsize;
+ } else {
+ fa->flags |= cfa->flags & ~FS_COMMON_FL;
+ }
+
+ err = fileattr_set_prepare(d_inode(dentry), cfa, fa);
+ return err;
+}
+EXPORT_SYMBOL(vfs_fileattr_set_prepare);
+
/**
* vfs_fileattr_set - change miscellaneous file attributes
* @idmap: idmap of the mount
@@ -245,7 +277,6 @@ int vfs_fileattr_set(struct mnt_idmap *idmap, struct dentry *dentry,
struct fileattr *fa)
{
struct inode *inode = d_inode(dentry);
- struct fileattr old_ma = {};
int err;
if (!inode->i_op->fileattr_set)
@@ -255,29 +286,12 @@ int vfs_fileattr_set(struct mnt_idmap *idmap, struct dentry *dentry,
return -EPERM;
inode_lock(inode);
- err = vfs_fileattr_get(dentry, &old_ma);
- if (!err) {
- /* initialize missing bits from old_ma */
- if (fa->flags_valid) {
- fa->fsx_xflags |= old_ma.fsx_xflags & ~FS_XFLAG_COMMON;
- fa->fsx_extsize = old_ma.fsx_extsize;
- fa->fsx_nextents = old_ma.fsx_nextents;
- fa->fsx_projid = old_ma.fsx_projid;
- fa->fsx_cowextsize = old_ma.fsx_cowextsize;
- } else {
- fa->flags |= old_ma.flags & ~FS_COMMON_FL;
- }
-
- err = fileattr_set_prepare(inode, &old_ma, fa);
- if (err)
- goto out;
- err = security_inode_file_setattr(dentry, fa);
- if (err)
- goto out;
- err = inode->i_op->fileattr_set(idmap, dentry, fa);
- if (err)
- goto out;
- }
+ err = security_inode_file_setattr(dentry, fa);
+ if (err)
+ goto out;
+ err = inode->i_op->fileattr_set(idmap, dentry, fa);
+ if (err)
+ goto out;
out:
inode_unlock(inode);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index fd1147aa3891..cf796fa73af2 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -282,10 +282,19 @@ int gfs2_fileattr_set(struct mnt_idmap *idmap,
u32 fsflags = fa->flags, gfsflags = 0;
u32 mask;
int i;
+ struct fileattr cfa;
+ int error;
if (d_is_special(dentry))
return -ENOTTY;
+ error = gfs2_fileattr_get(dentry, &cfa);
+ if (error)
+ return error;
+ error = vfs_fileattr_set_prepare(idmap, dentry, &cfa, fa);
+ if (error)
+ return error;
+
if (fileattr_has_fsx(fa))
return -EOPNOTSUPP;
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index f331e9574217..cdb11d00faea 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -678,6 +678,15 @@ int hfsplus_fileattr_set(struct mnt_idmap *idmap,
struct inode *inode = d_inode(dentry);
struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
unsigned int new_fl = 0;
+ struct fileattr cfa;
+ int err;
+
+ err = hfsplus_fileattr_get(dentry, &cfa);
+ if (err)
+ return err;
+ err = vfs_fileattr_set_prepare(idmap, dentry, &cfa, fa);
+ if (err)
+ return err;
if (fileattr_has_fsx(fa))
return -EOPNOTSUPP;
diff --git a/fs/jfs/ioctl.c b/fs/jfs/ioctl.c
index f7bd7e8f5be4..4c62c14d15b0 100644
--- a/fs/jfs/ioctl.c
+++ b/fs/jfs/ioctl.c
@@ -75,11 +75,18 @@ int jfs_fileattr_set(struct mnt_idmap *idmap,
{
struct inode *inode = d_inode(dentry);
struct jfs_inode_info *jfs_inode = JFS_IP(inode);
- unsigned int flags;
+ unsigned int flags = jfs_inode->mode2 & JFS_FL_USER_VISIBLE;
+ struct fileattr cfa;
+ int err;
if (d_is_special(dentry))
return -ENOTTY;
+ fileattr_fill_flags(&cfa, jfs_map_ext2(flags, 0));
+ err = vfs_fileattr_set_prepare(idmap, dentry, &cfa, fa);
+ if (err)
+ return err;
+
if (fileattr_has_fsx(fa))
return -EOPNOTSUPP;
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index 9b6a3f8d2e7c..bc7ee7595b70 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -83,12 +83,22 @@ int ntfs_fileattr_set(struct mnt_idmap *idmap, struct dentry *dentry,
{
struct inode *inode = d_inode(dentry);
struct ntfs_inode *ni = ntfs_i(inode);
- u32 flags = fa->flags;
+ u32 flags;
unsigned int new_fl = 0;
+ struct fileattr cfa;
+ int err;
+
+ err = ntfs_fileattr_get(dentry, &cfa);
+ if (err)
+ return err;
+ err = vfs_fileattr_set_prepare(idmap, dentry, &cfa, fa);
+ if (err)
+ return err;
if (fileattr_has_fsx(fa))
return -EOPNOTSUPP;
+ flags = fa->flags;
if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | FS_COMPR_FL))
return -EOPNOTSUPP;
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 5ac743c6bc2e..aecb61146443 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -910,6 +910,15 @@ static int orangefs_fileattr_set(struct mnt_idmap *idmap,
struct dentry *dentry, struct fileattr *fa)
{
u64 val = 0;
+ struct fileattr cfa;
+ int error = 0;
+
+ error = orangefs_fileattr_get(dentry, &cfa);
+ if (error)
+ return error;
+ error = vfs_fileattr_set_prepare(idmap, dentry, &cfa, fa);
+ if (error)
+ return error;
gossip_debug(GOSSIP_FILE_DEBUG, "%s: called on %pd\n", __func__,
dentry);
diff --git a/fs/ubifs/ioctl.c b/fs/ubifs/ioctl.c
index 2c99349cf537..e71e362c786b 100644
--- a/fs/ubifs/ioctl.c
+++ b/fs/ubifs/ioctl.c
@@ -148,14 +148,24 @@ int ubifs_fileattr_set(struct mnt_idmap *idmap,
struct dentry *dentry, struct fileattr *fa)
{
struct inode *inode = d_inode(dentry);
- int flags = fa->flags;
+ int flags;
+ struct fileattr cfa;
+ int err;
if (d_is_special(dentry))
return -ENOTTY;
+ err = ubifs_fileattr_get(dentry, &cfa);
+ if (err)
+ return err;
+ err = vfs_fileattr_set_prepare(idmap, dentry, &cfa, fa);
+ if (err)
+ return err;
+
if (fileattr_has_fsx(fa))
return -EOPNOTSUPP;
+ flags = fa->flags;
if (flags & ~UBIFS_GETTABLE_IOCTL_FLAGS)
return -EOPNOTSUPP;
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d250f7f74e3b..c861dc1c3cf0 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -733,12 +733,18 @@ xfs_fileattr_set(
struct xfs_dquot *pdqp = NULL;
struct xfs_dquot *olddquot = NULL;
int error;
+ struct fileattr cfa;
trace_xfs_ioctl_setattr(ip);
if (d_is_special(dentry))
return -ENOTTY;
+ xfs_fill_fsxattr(ip, XFS_DATA_FORK, &cfa);
+ error = vfs_fileattr_set_prepare(idmap, dentry, &cfa, fa);
+ if (error)
+ return error;
+
if (!fa->fsx_valid) {
if (fa->flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL |
FS_NOATIME_FL | FS_NODUMP_FL |
diff --git a/include/linux/fileattr.h b/include/linux/fileattr.h
index f62a5143eb2d..aba76d897533 100644
--- a/include/linux/fileattr.h
+++ b/include/linux/fileattr.h
@@ -75,6 +75,8 @@ static inline bool fileattr_has_fsx(const struct fileattr *fa)
}
int vfs_fileattr_get(struct dentry *dentry, struct fileattr *fa);
+int vfs_fileattr_set_prepare(struct mnt_idmap *idmap, struct dentry *dentry,
+ struct fileattr *cfa, struct fileattr *fa);
int vfs_fileattr_set(struct mnt_idmap *idmap, struct dentry *dentry,
struct fileattr *fa);
int ioctl_getflags(struct file *file, unsigned int __user *argp);
diff --git a/mm/shmem.c b/mm/shmem.c
index 99327c30507c..c2a5991f944f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4199,6 +4199,14 @@ static int shmem_fileattr_set(struct mnt_idmap *idmap,
struct inode *inode = d_inode(dentry);
struct shmem_inode_info *info = SHMEM_I(inode);
int ret, flags;
+ struct fileattr cfa;
+
+ ret = shmem_fileattr_get(dentry, &cfa);
+ if (ret)
+ return ret;
+ ret = vfs_fileattr_set_prepare(idmap, dentry, &cfa, fa);
+ if (ret)
+ return ret;
if (fileattr_has_fsx(fa))
return -EOPNOTSUPP;
--
2.47.2
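
For readers following the helper above, the merge step in
vfs_fileattr_set_prepare() can be modelled in userspace roughly as below. This
is a sketch only: struct model_fileattr and the MODEL_* masks are hypothetical
and the mask values are illustrative, not the real FS_COMMON_FL or
FS_XFLAG_COMMON definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative masks, NOT the kernel's FS_COMMON_FL / FS_XFLAG_COMMON. */
#define MODEL_COMMON_FL		0x000000ffu
#define MODEL_XFLAG_COMMON	0x0000000fu

/* Hypothetical cut-down version of struct fileattr. */
struct model_fileattr {
	uint32_t flags;
	uint32_t fsx_xflags;
	uint32_t fsx_extsize;
	uint32_t fsx_nextents;
	uint32_t fsx_projid;
	uint32_t fsx_cowextsize;
	bool flags_valid;
};

/* Fill in everything the caller's interface could not express from the
 * current state (cfa), mirroring the "initialize missing bits" block. */
void model_fileattr_merge(const struct model_fileattr *cfa,
			  struct model_fileattr *fa)
{
	if (fa->flags_valid) {
		/* Caller used the flags interface: keep the non-common
		 * xflags and all fsx fields from the current state. */
		fa->fsx_xflags |= cfa->fsx_xflags & ~MODEL_XFLAG_COMMON;
		fa->fsx_extsize = cfa->fsx_extsize;
		fa->fsx_nextents = cfa->fsx_nextents;
		fa->fsx_projid = cfa->fsx_projid;
		fa->fsx_cowextsize = cfa->fsx_cowextsize;
	} else {
		/* Caller used the fsxattr interface: keep the non-common
		 * flags from the current state. */
		fa->flags |= cfa->flags & ~MODEL_COMMON_FL;
	}
}
```

With this shape, a filesystem that already has the current state at hand only
has to pass it in, which is what the per-filesystem hunks above do with cfa.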
On Fri, Apr 25, 2025 at 08:16:48PM +0200, Andrey Albershteyn wrote:
> On 2025-04-22 17:14:10, Christian Brauner wrote:
> > On Tue, Apr 22, 2025 at 04:31:29PM +0200, Christian Brauner wrote:
> > > On Thu, Mar 27, 2025 at 12:39:28PM +0100, Amir Goldstein wrote:
> > > > On Thu, Mar 27, 2025 at 10:33 AM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> > > > >
> > > > > On 2025-03-23 09:56:25, Amir Goldstein wrote:
> > > > > > On Fri, Mar 21, 2025 at 8:49 PM Andrey Albershteyn <aalbersh@redhat.com> wrote:
> > > > > > >
> > > > > > > From: Andrey Albershteyn <aalbersh@redhat.com>
> > > > > > >
> > > > > > > Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> > > > > > > extended attributes/flags. The syscalls take parent directory fd and
> > > > > > > path to the child together with struct fsxattr.
> > > > > > >
> > > > > > > This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
> > > > > > > that file don't need to be open as we can reference it with a path
> > > > > > > instead of fd. By having this we can manipulated inode extended
> > > > > > > attributes not only on regular files but also on special ones. This
> > > > > > > is not possible with FS_IOC_FSSETXATTR ioctl as with special files
> > > > > > > we can not call ioctl() directly on the filesystem inode using fd.
> > > > > > >
> > > > > > > This patch adds two new syscalls which allows userspace to get/set
> > > > > > > extended inode attributes on special files by using parent directory
> > > > > > > and a path - *at() like syscall.
> > > > > > >
> > > > > > > CC: linux-api@vger.kernel.org
> > > > > > > CC: linux-fsdevel@vger.kernel.org
> > > > > > > CC: linux-xfs@vger.kernel.org
> > > > > > > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > > > > > > Acked-by: Arnd Bergmann <arnd@arndb.de>
> > > > > > > ---
> > > > > > ...
> > > > > > > +SYSCALL_DEFINE5(setfsxattrat, int, dfd, const char __user *, filename,
> > > > > > > + struct fsxattr __user *, ufsx, size_t, usize,
> > > > > > > + unsigned int, at_flags)
> > > > > > > +{
> > > > > > > + struct fileattr fa;
> > > > > > > + struct path filepath;
> > > > > > > + int error;
> > > > > > > + unsigned int lookup_flags = 0;
> > > > > > > + struct filename *name;
> > > > > > > + struct mnt_idmap *idmap;.
> > > > > >
> > > > > > > + struct dentry *dentry;
> > > > > > > + struct vfsmount *mnt;
> > > > > > > + struct fsxattr fsx = {};
> > > > > > > +
> > > > > > > + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> > > > > > > + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> > > > > > > +
> > > > > > > + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > > > > + return -EINVAL;
> > > > > > > +
> > > > > > > + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> > > > > > > + lookup_flags |= LOOKUP_FOLLOW;
> > > > > > > +
> > > > > > > + if (at_flags & AT_EMPTY_PATH)
> > > > > > > + lookup_flags |= LOOKUP_EMPTY;
> > > > > > > +
> > > > > > > + if (usize > PAGE_SIZE)
> > > > > > > + return -E2BIG;
> > > > > > > +
> > > > > > > + if (usize < FSXATTR_SIZE_VER0)
> > > > > > > + return -EINVAL;
> > > > > > > +
> > > > > > > + error = copy_struct_from_user(&fsx, sizeof(struct fsxattr), ufsx, usize);
> > > > > > > + if (error)
> > > > > > > + return error;
> > > > > > > +
> > > > > > > + fsxattr_to_fileattr(&fsx, &fa);
> > > > > > > +
> > > > > > > + name = getname_maybe_null(filename, at_flags);
> > > > > > > + if (!name) {
> > > > > > > + CLASS(fd, f)(dfd);
> > > > > > > +
> > > > > > > + if (fd_empty(f))
> > > > > > > + return -EBADF;
> > > > > > > +
> > > > > > > + idmap = file_mnt_idmap(fd_file(f));
> > > > > > > + dentry = file_dentry(fd_file(f));
> > > > > > > + mnt = fd_file(f)->f_path.mnt;
> > > > > > > + } else {
> > > > > > > + error = filename_lookup(dfd, name, lookup_flags, &filepath,
> > > > > > > + NULL);
> > > > > > > + if (error)
> > > > > > > + return error;
> > > > > > > +
> > > > > > > + idmap = mnt_idmap(filepath.mnt);
> > > > > > > + dentry = filepath.dentry;
> > > > > > > + mnt = filepath.mnt;
> > > > > > > + }
> > > > > > > +
> > > > > > > + error = mnt_want_write(mnt);
> > > > > > > + if (!error) {
> > > > > > > + error = vfs_fileattr_set(idmap, dentry, &fa);
> > > > > > > + if (error == -ENOIOCTLCMD)
> > > > > > > + error = -EOPNOTSUPP;
> > > > > >
> > > > > > This is awkward.
> > > > > > vfs_fileattr_set() should return -EOPNOTSUPP.
> > > > > > ioctl_setflags() could maybe convert it to -ENOIOCTLCMD,
> > > > > > but looking at similar cases ioctl_fiemap(), ioctl_fsfreeze() the
> > > > > > ioctl returns -EOPNOTSUPP.
> > > > > >
> > > > > > I don't think it is necessarily a bad idea to start returning
> > > > > > -EOPNOTSUPP instead of -ENOIOCTLCMD for the ioctl
> > > > > > because that really reflects the fact that the ioctl is now implemented
> > > > > > in vfs and not in the specific fs.
> > > > > >
> > > > > > and I think it would not be a bad idea at all to make that change
> > > > > > together with the merge of the syscalls as a sort of hint to userspace
> > > > > > that uses the ioctl, that the sycalls API exists.
> > > > > >
> > > > > > Thanks,
> > > > > > Amir.
> > > > > >
> > > > >
> > > > > Hmm, not sure what you're suggesting here. I see it as:
> > > > > - get/setfsxattrat should return EOPNOTSUPP as it make more sense
> > > > > than ENOIOCTLCMD
> > > > > - ioctl_setflags returns ENOIOCTLCMD which also expected
> > > > >
> > > > > Don't really see a reason to change what vfs_fileattr_set() returns
> > > > > and then copying this if() to other places or start returning
> > > > > EOPNOTSUPP.
> > > >
> > > > ENOIOCTLCMD conceptually means that the ioctl command is unknown
> > > > This is not the case since ->fileattr_[gs]et() became a vfs API
> > >
> > > vfs_fileattr_{g,s}et() should not return ENOIOCTLCMD. Change the return
> > > code to EOPNOTSUPP and then make EOPNOTSUPP be translated to ENOTTY on
> > > on overlayfs and to ENOIOCTLCMD in ecryptfs and in fs/ioctl.c. This way
> > > we get a clean VFS api while retaining current behavior. Amir can do his
> > > cleanup based on that.
> >
> > Also this get/set dance is not something new apis should do. It should
> > be handled like setattr_prepare() or generic_fillattr() where the
> > filesystem calls a VFS helper and that does all of this based on the
> > current state of the inode instead of calling into the filesystem twice:
> >
> > int vfs_fileattr_set(struct mnt_idmap *idmap, struct dentry *dentry,
> > struct fileattr *fa)
> > {
> > <snip>
> > inode_lock(inode);
> > err = vfs_fileattr_get(dentry, &old_ma);
> > if (!err) {
> > /* initialize missing bits from old_ma */
> > if (fa->flags_valid) {
> > <snip>
> > err = fileattr_set_prepare(inode, &old_ma, fa);
> > if (!err && !security_inode_setfsxattr(inode, fa))
> > err = inode->i_op->fileattr_set(idmap, dentry, fa);
> >
>
> You mean something like this? (not all fs are done)
Yes, possibly. But don't bother with this now as that'll need some more
thinking and it'll just stall your work for no good reason. Let's just
get the syscalls into mergeable shape now.