[PATCH v3 05/12] man/man2/fspick.2: document "new" mount API

Aleksa Sarai posted 12 patches 4 months, 1 week ago
There is a newer version of this series
[PATCH v3 05/12] man/man2/fspick.2: document "new" mount API
Posted by Aleksa Sarai 4 months, 1 week ago
This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).

Co-authored-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Co-authored-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 man/man2/fspick.2 | 309 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 309 insertions(+)

diff --git a/man/man2/fspick.2 b/man/man2/fspick.2
new file mode 100644
index 0000000000000000000000000000000000000000..a1060bcdb7d57b0656d4065683b5c69407550038
--- /dev/null
+++ b/man/man2/fspick.2
@@ -0,0 +1,309 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH fspick 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+fspick \- select filesystem for reconfiguration
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <fcntl.h>" "          /* Definition of " AT_* " constants */"
+.B #include <sys/mount.h>
+.P
+.BI "int fspick(int " dirfd ", const char *" path ", unsigned int " flags ");"
+.fi
+.SH DESCRIPTION
+The
+.BR fspick ()
+system call is part of
+the suite of file descriptor based mount facilities in Linux.
+.P
+.BR fspick()
+creates a new filesystem configuration context
+for the extant filesystem instance
+associated with the path described by
+.IR dirfd
+and
+.IR path ,
+places it into reconfiguration mode
+(similar to
+.BR mount (8)
+with the
+.I -o remount
+option).
+A new file descriptor
+associated with the filesystem configuration context
+is then returned.
+The calling process must have the
+.BR CAP_SYS_ADMIN
+capability in order to create a new filesystem configuration context.
+.P
+The resultant file descriptor can be used with
+.BR fsconfig (2)
+to specify the desired set of changes to
+filesystem parameters of the filesystem instance.
+Once the desired set of changes have been configured,
+the changes can be effectuated by calling
+.BR fsconfig (2)
+with the
+.B \%FSCONFIG_CMD_RECONFIGURE
+command.
+.P
+As with "*at()" system calls,
+.BR fspick ()
+uses the
+.I dirfd
+argument in conjunction with the
+.I path
+argument to determine the path to operate on, as follows:
+.IP \[bu] 3
+If the pathname given in
+.I path
+is absolute, then
+.I dirfd
+is ignored.
+.IP \[bu]
+If the pathname given in
+.I path
+is relative and
+.I dirfd
+is the special value
+.BR \%AT_FDCWD ,
+then
+.I path
+is interpreted relative to
+the current working directory
+of the calling process (like
+.BR open (2)).
+.IP \[bu]
+If the pathname given in
+.I path
+is relative,
+then it is interpreted relative to
+the directory referred to by the file descriptor
+.I dirfd
+(rather than relative to
+the current working directory
+of the calling process,
+as is done by
+.BR open (2)
+for a relative pathname).
+In this case,
+.I dirfd
+must be a directory
+that was opened for reading
+.RB ( O_RDONLY )
+or using the
+.B O_PATH
+flag.
+.IP \[bu]
+If
+.I path
+is an empty string,
+and
+.I flags
+contains
+.BR \%FSPICK_EMPTY_PATH ,
+then the file descriptor
+.I dirfd
+is operated on directly.
+In this case,
+.I dirfd
+may refer to any type of file,
+not just a directory.
+.P
+.I flags
+can be used to control aspects of how
+.I path
+is resolved and
+properties of the returned file descriptor.
+A value for
+.I flags
+is constructed by bitwise ORing
+zero or more of the following constants:
+.RS
+.TP
+.B FSPICK_CLOEXEC
+Set the close-on-exec
+.RB ( FD_CLOEXEC )
+flag on the new file descriptor.
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+for reasons why this may be useful.
+.TP
+.B FSPICK_EMPTY_PATH
+If
+.I path
+is an empty string,
+operate on the file referred to by
+.I dirfd
+(which may have been obtained from
+.BR open (2),
+.BR fsmount (2),
+or
+.BR open_tree (2)).
+In this case,
+.I dirfd
+may refer to any type of file,
+not just a directory.
+If
+.I dirfd
+is
+.BR \%AT_FDCWD ,
+.BR fspick ()
+will operate on the current working directory
+of the calling process.
+.TP
+.B FSPICK_SYMLINK_NOFOLLOW
+Do not follow symbolic links
+in the terminal component of
+.IR path .
+If
+.I path
+references a symbolic link,
+the returned filesystem context will reference
+the filesystem that the symbolic link itself resides on.
+.TP
+.B FSPICK_NO_AUTOMOUNT
+Do not automount any automount points encountered
+while resolving
+.IR path .
+This allows you to reconfigure an automount point,
+rather than the location that would be mounted.
+This flag has no effect if
+the automount point has already been mounted over.
+.RE
+.P
+As with filesystem contexts created with
+.BR fsopen (2),
+the file descriptor returned by
+.BR fspick ()
+may be queried for message strings at any time by calling
+.BR read (2)
+on the file descriptor.
+(See the "Message retrieval interface" subsection in
+.BR fsopen (2)
+for more details on the message format.)
+.SH RETURN VALUE
+On success, a new file descriptor is returned.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EACCES
+Search permission is denied
+for one of the directories
+in the path prefix of
+.IR path .
+(See also
+.BR path_resolution (7).)
+.TP
+.B EBADF
+.I path
+is relative but
+.I dirfd
+is neither
+.B \%AT_FDCWD
+nor a valid file descriptor.
+.TP
+.B EFAULT
+.I path
+is NULL
+or a pointer to a location
+outside the calling process's accessible address space.
+.TP
+.B EINVAL
+Invalid flag specified in
+.IR flags .
+.TP
+.B ELOOP
+Too many symbolic links encountered when resolving
+.IR path .
+.TP
+.B EMFILE
+The calling process has too many open files to create more.
+.TP
+.B ENAMETOOLONG
+.I path
+is longer than
+.BR PATH_MAX .
+.TP
+.B ENFILE
+The system has too many open files to create more.
+.TP
+.B ENOENT
+A component of
+.I path
+does not exist,
+or is a dangling symbolic link.
+.TP
+.B ENOENT
+.I path
+is an empty string, but
+.B \%FSPICK_EMPTY_PATH
+is not specified in
+.IR flags .
+.TP
+.B ENOTDIR
+A component of the path prefix of
+.I path
+is not a directory;
+or
+.I path
+is relative and
+.I dirfd
+is a file descriptor referring to a file other than a directory.
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B EPERM
+The calling process does not have the required
+.B \%CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit cf3cba4a429be43e5527a3f78859b1bfd9ebc5fb
+.\" commit 400913252d09f9cfb8cce33daee43167921fc343
+glibc 2.36.
+.SH EXAMPLES
+The following example sets the read-only flag
+on the filesystem instance referenced by
+the mount object attached at
+.IR /tmp .
+.P
+.in +4n
+.EX
+int fsfd = fspick(AT_FDCWD, "/tmp", FSPICK_CLOEXEC);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0);
+.EE
+.in
+.P
+The above procedure is functionally equivalent to
+the following mount operation using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount(NULL, "/tmp", NULL, MS_REMOUNT | MS_RDONLY, NULL);
+.EE
+.in
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsmount (2),
+.BR fsopen (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR open_tree (2),
+.BR mount_namespaces (7)
+

-- 
2.50.1
Re: [PATCH v3 05/12] man/man2/fspick.2: document "new" mount API
Posted by Askar Safin 3 months, 3 weeks ago
 ---- On Sat, 09 Aug 2025 00:39:49 +0400  Aleksa Sarai <cyphar@cyphar.com> wrote --- 
 > +The above procedure is functionally equivalent to
 > +the following mount operation using
 > +.BR mount (2):

This is not true.

fspick adds options to superblock. It doesn't remove existing ones.

mount(MS_REMOUNT) replaces options. I. e. mount(2) call provided in
example will unset all other options.

In the end of this message you will find C code, which proves this.

Also, recently I sent patch to mount(2) manpage,
which clarifies MS_REMOUNT | MS_BIND behavior.
This is somewhat related.
The patch comes with another reproducer.
Here is a link: https://lore.kernel.org/linux-man/20250822114315.1571537-1-safinaskar@zohomail.com/

--
Askar Safin
https://types.pl/@safinaskar

// You need to be root (non-initial user namespace will go)

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <sys/vfs.h>
#include <sys/sysmacros.h>
#include <sys/statvfs.h>
#include <linux/openat2.h>

#define MY_ASSERT(cond) do { \
    if (!(cond)) { \
        fprintf (stderr, "%d: %s: assertion failed\n", __LINE__, #cond); \
        exit (1); \
    } \
} while (0)

int
main (void)
{
    // Init
    {
        MY_ASSERT (chdir ("/") == 0);
        MY_ASSERT (unshare (CLONE_NEWNS) == 0);
        MY_ASSERT (mount (NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL) == 0);
        MY_ASSERT (mount (NULL, "/tmp", "tmpfs", 0, NULL) == 0);
    }

    MY_ASSERT (mkdir ("/tmp/a", 0777) == 0);
    MY_ASSERT (mkdir ("/tmp/b", 0777) == 0);

    {
        MY_ASSERT (mount (NULL, "/tmp/a", "tmpfs", MS_SYNCHRONOUS, NULL) == 0);
        {
            struct statfs buf;
            memset (&buf, 0, sizeof buf);
            MY_ASSERT (statfs ("/tmp/a", &buf) == 0);
            MY_ASSERT (buf.f_flags & ST_SYNCHRONOUS);
            MY_ASSERT (!(buf.f_flags & ST_RDONLY));
        }
        {
            int fsfd = fspick (AT_FDCWD, "/tmp/a", FSPICK_CLOEXEC);
            MY_ASSERT (fsfd >= 0);
            MY_ASSERT (fsconfig (fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0) == 0);
            MY_ASSERT (fsconfig (fsfd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0) == 0);
            MY_ASSERT (close (fsfd) == 0);
        }
        {
            struct statfs buf;
            memset (&buf, 0, sizeof buf);
            MY_ASSERT (statfs ("/tmp/a", &buf) == 0);
            MY_ASSERT (buf.f_flags & ST_SYNCHRONOUS);
            MY_ASSERT (buf.f_flags & ST_RDONLY);
        }
        MY_ASSERT (umount ("/tmp/a") == 0);
    }

    {
        MY_ASSERT (mount (NULL, "/tmp/a", "tmpfs", MS_SYNCHRONOUS, NULL) == 0);
        {
            struct statfs buf;
            memset (&buf, 0, sizeof buf);
            MY_ASSERT (statfs ("/tmp/a", &buf) == 0);
            MY_ASSERT (buf.f_flags & ST_SYNCHRONOUS);
            MY_ASSERT (!(buf.f_flags & ST_RDONLY));
        }
        MY_ASSERT (mount (NULL, "/tmp/a", NULL, MS_REMOUNT | MS_RDONLY, NULL) == 0);
        {
            struct statfs buf;
            memset (&buf, 0, sizeof buf);
            MY_ASSERT (statfs ("/tmp/a", &buf) == 0);
            MY_ASSERT (!(buf.f_flags & ST_SYNCHRONOUS));
            MY_ASSERT (buf.f_flags & ST_RDONLY);
        }
        MY_ASSERT (umount ("/tmp/a") == 0);
    }

    printf ("All tests passed\n");
    exit (0);
}
Re: [PATCH v3 05/12] man/man2/fspick.2: document "new" mount API
Posted by Aleksa Sarai 3 months, 3 weeks ago
On 2025-08-22, Askar Safin <safinaskar@zohomail.com> wrote:
>  ---- On Sat, 09 Aug 2025 00:39:49 +0400  Aleksa Sarai <cyphar@cyphar.com> wrote --- 
>  > +The above procedure is functionally equivalent to
>  > +the following mount operation using
>  > +.BR mount (2):
> 
> This is not true.
> 
> fspick adds options to superblock. It doesn't remove existing ones.

fspick "copies the existing parameters" would be more accurate. I can
reword this, but it's an example and I don't think it makes sense to add
a large amount of clarifying text for each example.

The comparisons to mount(2) are meant to be indicative, but if you I can
also just remove them (David's versions didn't include them).

> mount(MS_REMOUNT) replaces options. I. e. mount(2) call provided in
> example will unset all other options.
> 
> In the end of this message you will find C code, which proves this.

Yes, I am already keenly aware of this behaviour.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
Re: [PATCH v3 05/12] man/man2/fspick.2: document "new" mount API
Posted by Askar Safin 3 months, 3 weeks ago
 ---- On Fri, 22 Aug 2025 17:40:18 +0400  Aleksa Sarai <cyphar@cyphar.com> wrote --- 
 > On 2025-08-22, Askar Safin <safinaskar@zohomail.com> wrote:
 > >  ---- On Sat, 09 Aug 2025 00:39:49 +0400  Aleksa Sarai <cyphar@cyphar.com> wrote --- 
 > >  > +The above procedure is functionally equivalent to
 > >  > +the following mount operation using
 > >  > +.BR mount (2):
 > > 
 > > This is not true.
 > > 
 > > fspick adds options to superblock. It doesn't remove existing ones.
 > 
 > fspick "copies the existing parameters" would be more accurate. I can
 > reword this, but it's an example and I don't think it makes sense to add
 > a large amount of clarifying text for each example.

I suggest adding "but mount(2) clears existing parameters here, and fspick/fsconfig doesn't".

--
Askar Safin
https://types.pl/@safinaskar