[PATCH v2 03/11] fsopen.2: document 'new' mount api

Aleksa Sarai posted 11 patches 1 month, 4 weeks ago
[PATCH v2 03/11] fsopen.2: document 'new' mount api
Posted by Aleksa Sarai 1 month, 4 weeks ago
This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).

Co-developed-by: David Howells <dhowells@redhat.com>
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 man/man2/fsopen.2 | 319 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 319 insertions(+)

diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2
new file mode 100644
index 000000000000..ad38ef0782be
--- /dev/null
+++ b/man/man2/fsopen.2
@@ -0,0 +1,319 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH fsopen 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+fsopen \- create a new filesystem context
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <sys/mount.h>"
+.P
+.BI "int fsopen(const char *" fsname ", unsigned int " flags ");"
+.fi
+.SH DESCRIPTION
+The
+.BR fsopen ()
+system call is part of the suite of file descriptor based mount facilities in
+Linux.
+.P
+.BR fsopen ()
+creates a blank filesystem configuration context within the kernel
+for the filesystem named by
+.IR fsname ,
+puts the context into creation mode and attaches it to a file descriptor,
+which is then returned.
+The calling process must have the
+.B \%CAP_SYS_ADMIN
+capability in order to create a new filesystem configuration context.
+.P
+A filesystem configuration context is an in-kernel representation of a pending
+transaction,
+containing a set of configuration parameters that are to be applied
+when creating a new instance of a filesystem
+(or modifying the configuration of an existing filesystem instance,
+such as when using
+.BR fspick (2)).
+.P
+After obtaining a filesystem configuration context with
+.BR fsopen (),
+the general workflow for operating on the context looks like the following:
+.IP (1) 5
+Pass the filesystem context file descriptor to
+.BR fsconfig (2)
+to specify any desired filesystem parameters.
+This may be done as many times as necessary.
+.IP (2)
+Pass the same filesystem context file descriptor to
+.BR fsconfig (2)
+with
+.B \%FSCONFIG_CMD_CREATE
+to create an instance of the configured filesystem.
+.IP (3)
+Pass the same filesystem context file descriptor to
+.BR fsmount (2)
+to create a new mount object for the root of the filesystem,
+which is then attached to a new file descriptor.
+This also places the filesystem context file descriptor into reconfiguration
+mode,
+similar to the mode produced by
+.BR fspick (2).
+.IP (4)
+Use the mount object file descriptor as a
+.I dirfd
+argument to "*at()" system calls;
+or attach the mount object to a mount point
+by passing the mount object file descriptor to
+.BR move_mount (2).
+.P
+A filesystem context will move between different modes throughout its
+lifecycle
+(such as the creation phase when created with
+.BR fsopen (),
+the reconfiguration phase when an existing filesystem instance is selected by
+.BR fspick (2),
+and the intermediate "needs-mount" phase between
+.\" FS_CONTEXT_NEEDS_MOUNT is the term the kernel uses for this.
+.BR \%FSCONFIG_CMD_CREATE
+and
+.BR fsmount (2)),
+which has an impact on what operations are permitted on the filesystem context.
+.P
+The file descriptor returned by
+.BR fsopen ()
+also acts as a channel for filesystem drivers to provide more comprehensive
+error, warning, and information messages
+than are normally provided through the standard
+.BR errno (3)
+interface for system calls.
+If an error occurs at any time during the workflow mentioned above,
+calling
+.BR read (2)
+on the filesystem context file descriptor will retrieve any ancillary
+information about the encountered errors.
+(See the "Message retrieval interface" section for more details on the message
+format.)
+.P
+.I flags
+can be used to control aspects of the creation of the filesystem configuration
+context file descriptor.
+A value for
+.I flags
+is constructed by bitwise ORing
+zero or more of the following constants:
+.RS
+.TP
+.B FSOPEN_CLOEXEC
+Set the close-on-exec
+.RB ( FD_CLOEXEC )
+flag on the new file descriptor.
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+for reasons why this may be useful.
+.RE
+.P
+A list of filesystems supported by the running kernel
+(and thus a list of valid values for
+.IR fsname )
+can be obtained from
+.IR /proc/filesystems .
+(See also
+.BR proc_filesystems (5).)
+.SS Message retrieval interface
+When doing operations on a filesystem configuration context,
+the filesystem driver may choose to provide ancillary information to userspace
+in the form of message strings.
+.P
+The filesystem context file descriptors returned by
+.BR fsopen ()
+and
+.BR fspick (2)
+may be queried for message strings at any time by calling
+.BR read (2)
+on the file descriptor.
+Each call to
+.BR read (2)
+will return a single message,
+prefixed to indicate its class:
+.RS
+.TP
+.B "e <message>"
+An error message was logged.
+This is usually associated with an error being returned from the corresponding
+system call which triggered this message.
+.TP
+.B "w <message>"
+A warning message was logged.
+.TP
+.B "i <message>"
+An informational message was logged.
+.RE
+.P
+Messages are removed from the queue as they are read.
+Note that the message queue has limited depth,
+so it is possible for messages to get lost.
+If there are no messages in the message queue,
+.B read(2)
+will return no data and
+.I errno
+will be set to
+.BR \%ENODATA .
+If the
+.I buf
+argument to
+.BR read (2)
+is not large enough to contain the message,
+.BR read (2)
+will return no data and
+.I errno
+will be set to
+.BR \%EMSGSIZE .
+.P
+If there are multiple filesystem context file descriptors referencing the same
+filesystem instance
+(such as if you call
+.BR fspick (2)
+multiple times for the same mount),
+each one gets its own independent message queue.
+This does not apply to file descriptors that were duplicated with
+.BR dup (2).
+.P
+Messages strings will usually be prefixed by the filesystem driver that logged
+the message, though this may not always be the case.
+See the Linux kernel source code for details.
+.SH RETURN VALUE
+On success, a new file descriptor is returned.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EFAULT
+.I fsname
+is NULL
+or a pointer to a location
+outside the calling process's accessible address space.
+.TP
+.B EINVAL
+.I flags
+had an invalid flag set.
+.TP
+.B EMFILE
+The calling process has too many open files to create more.
+.TP
+.B ENFILE
+The system has too many open files to create more.
+.TP
+.B ENODEV
+The filesystem named by
+.I fsname
+is not supported by the kernel.
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B EPERM
+The calling process does not have the required
+.B \%CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
+glibc 2.36.
+.SH EXAMPLES
+To illustrate the workflow for creating a new mount,
+the following is an example of how to mount an
+.BR ext4 (5)
+filesystem stored on
+.I /dev/sdb1
+onto
+.IR /mnt .
+.P
+.in +4n
+.EX
+int fsfd, mntfd;
+\&
+fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0)
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
+move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+First, an ext4 configuration context is created and attached to the file
+descriptor
+.IR fsfd .
+Then, a series of parameters
+(such as the source of the filesystem)
+are provided using
+.BR fsconfig (2),
+followed by the filesystem instance being created with
+.BR \%FSCONFIG_CMD_CREATE .
+.BR fsmount (2)
+is then used to create a new mount object attached to the file descriptor
+.IR mntfd ,
+which is then attached to the intended mount point using
+.BR move_mount (2).
+.P
+The above procedure is functionally equivalent to the following mount operation
+using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
+      "ro,noatime,acl,user_xattr,iversion");
+.EE
+.in
+.P
+And here's an example of creating a mount object
+of an NFS server share
+and setting a Smack security module label.
+However, instead of attaching it to a mount point,
+the program uses the mount object directly
+to open a file from the NFS share.
+.P
+.in +4n
+.EX
+int fsfd, mntfd, fd;
+\&
+fsfd = fsopen("nfs", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com/pub/linux", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
+fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
+.EE
+.in
+.P
+Unlike the previous example,
+this operation has no trivial equivalent with
+.BR mount (2),
+as it was not previously possible to create a mount object
+that is not attached to any mount point.
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsmount (2),
+.BR fspick (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR open_tree (2),
+.BR mount_namespaces (7)

-- 
2.50.1
Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
Posted by Askar Safin 1 month, 4 weeks ago
 > If there are no messages in the message queue,
 > read(2) will return no data and errno will be set to ENODATA.
 > If the buf argument to read(2) is not large enough to contain the message,
 > read(2) will return no data and errno will be set to EMSGSIZE.

read(2) will return -1 in these cases? If yes, then, please, write this.

Also, I see that you addressed all my requests. Thank you!

And thank you again for writing all these manpages!

--
Askar Safin
https://types.pl/@safinaskar
Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
Posted by Aleksa Sarai 1 month, 3 weeks ago
On 2025-08-08, Askar Safin <safinaskar@zohomail.com> wrote:
>  > If there are no messages in the message queue,
>  > read(2) will return no data and errno will be set to ENODATA.
>  > If the buf argument to read(2) is not large enough to contain the message,
>  > read(2) will return no data and errno will be set to EMSGSIZE.
> 
> read(2) will return -1 in these cases? If yes, then, please, write this.

Yes (well, the syscall returns -EMSGSIZE). I'll try to add a note
without making the paragraph too wordy...

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
Posted by Alejandro Colomar 1 month, 4 weeks ago
Hi Aleksa,

On Thu, Aug 07, 2025 at 03:44:37AM +1000, Aleksa Sarai wrote:
> This is loosely based on the original documentation written by David
> Howells and later maintained by Christian Brauner, but has been
> rewritten to be more from a user perspective (as well as fixing a few
> critical mistakes).
> 
> Co-developed-by: David Howells <dhowells@redhat.com>
> Co-developed-by: Christian Brauner <brauner@kernel.org>

Please use Co-authored-by.  It's documented under CONTRIBUTING.d/:

	$ cat CONTRIBUTING.d/patches/description | grep -A99 Trailer;
	    Trailer
		Sign your patch with "Signed-off-by:".  Read about the
		"Developer's Certificate of Origin" at
		<https://www.kernel.org/doc/Documentation/process/submitting-patches.rst>.
		When appropriate, other tags documented in that file, such as
		"Reported-by:", "Reviewed-by:", "Acked-by:", and "Suggested-by:"
		can be added to the patch.  We use "Co-authored-by:" instead of
		"Co-developed-by:".  Example:

			Signed-off-by: Alejandro Colomar <alx@kernel.org>

I think 'author' is more appropriate than 'developer' for documentation.
It is also more consistent with the Copyright notice, which assigns
copyright to the authors (documented in AUTHORS).  And ironically, even
the kernel documentation about Co-authored-by talks about authorship
instead of development:

	Co-developed-by: states that the patch was co-created by
	multiple developers; it is used to give attribution to
	co-authors (in addition to the author attributed by the From:
	tag) when several people work on a single patch.

> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> ---
>  man/man2/fsopen.2 | 319 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 319 insertions(+)
> 
> diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2
> new file mode 100644
> index 000000000000..ad38ef0782be
> --- /dev/null
> +++ b/man/man2/fsopen.2
> @@ -0,0 +1,319 @@
> +.\" Copyright, the authors of the Linux man-pages project
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH fsopen 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +fsopen \- create a new filesystem context
> +.SH LIBRARY
> +Standard C library
> +.RI ( libc ,\~ \-lc )
> +.SH SYNOPSIS
> +.nf
> +.BR "#include <sys/mount.h>"
> +.P
> +.BI "int fsopen(const char *" fsname ", unsigned int " flags ");"
> +.fi
> +.SH DESCRIPTION
> +The
> +.BR fsopen ()
> +system call is part of the suite of file descriptor based mount facilities in
> +Linux.
> +.P
> +.BR fsopen ()
> +creates a blank filesystem configuration context within the kernel
> +for the filesystem named by
> +.IR fsname ,
> +puts the context into creation mode and attaches it to a file descriptor,
> +which is then returned.
> +The calling process must have the
> +.B \%CAP_SYS_ADMIN
> +capability in order to create a new filesystem configuration context.
> +.P
> +A filesystem configuration context is an in-kernel representation of a pending
> +transaction,

This page still needs semantic newlines.  (Please review all pages
regarding that.)  (In this specific sentence, I'd break after 'is'.)

> +containing a set of configuration parameters that are to be applied
> +when creating a new instance of a filesystem
> +(or modifying the configuration of an existing filesystem instance,
> +such as when using
> +.BR fspick (2)).
> +.P
> +After obtaining a filesystem configuration context with
> +.BR fsopen (),
> +the general workflow for operating on the context looks like the following:
> +.IP (1) 5
> +Pass the filesystem context file descriptor to
> +.BR fsconfig (2)
> +to specify any desired filesystem parameters.
> +This may be done as many times as necessary.
> +.IP (2)
> +Pass the same filesystem context file descriptor to

Do we need to say "same"?  I guess it's obvious.  Or do you expect
any confusion if we don't?

> +.BR fsconfig (2)
> +with
> +.B \%FSCONFIG_CMD_CREATE
> +to create an instance of the configured filesystem.
> +.IP (3)
> +Pass the same filesystem context file descriptor to
> +.BR fsmount (2)
> +to create a new mount object for the root of the filesystem,
> +which is then attached to a new file descriptor.
> +This also places the filesystem context file descriptor into reconfiguration
> +mode,
> +similar to the mode produced by
> +.BR fspick (2).
> +.IP (4)
> +Use the mount object file descriptor as a
> +.I dirfd
> +argument to "*at()" system calls;
> +or attach the mount object to a mount point
> +by passing the mount object file descriptor to
> +.BR move_mount (2).
> +.P
> +A filesystem context will move between different modes throughout its
> +lifecycle
> +(such as the creation phase when created with
> +.BR fsopen (),
> +the reconfiguration phase when an existing filesystem instance is selected by
> +.BR fspick (2),
> +and the intermediate "needs-mount" phase between
> +.\" FS_CONTEXT_NEEDS_MOUNT is the term the kernel uses for this.
> +.BR \%FSCONFIG_CMD_CREATE
> +and
> +.BR fsmount (2)),
> +which has an impact on what operations are permitted on the filesystem context.
> +.P
> +The file descriptor returned by
> +.BR fsopen ()
> +also acts as a channel for filesystem drivers to provide more comprehensive
> +error, warning, and information messages

Should we just say "diagnostic messages" to avoid explicitly mentioning
all the levels?

> +than are normally provided through the standard
> +.BR errno (3)
> +interface for system calls.
> +If an error occurs at any time during the workflow mentioned above,
> +calling
> +.BR read (2)
> +on the filesystem context file descriptor will retrieve any ancillary
> +information about the encountered errors.
> +(See the "Message retrieval interface" section for more details on the message
> +format.)
> +.P
> +.I flags
> +can be used to control aspects of the creation of the filesystem configuration
> +context file descriptor.
> +A value for
> +.I flags
> +is constructed by bitwise ORing
> +zero or more of the following constants:
> +.RS
> +.TP
> +.B FSOPEN_CLOEXEC
> +Set the close-on-exec
> +.RB ( FD_CLOEXEC )
> +flag on the new file descriptor.
> +See the description of the
> +.B O_CLOEXEC
> +flag in
> +.BR open (2)
> +for reasons why this may be useful.
> +.RE
> +.P
> +A list of filesystems supported by the running kernel
> +(and thus a list of valid values for
> +.IR fsname )
> +can be obtained from
> +.IR /proc/filesystems .
> +(See also
> +.BR proc_filesystems (5).)
> +.SS Message retrieval interface
> +When doing operations on a filesystem configuration context,
> +the filesystem driver may choose to provide ancillary information to userspace
> +in the form of message strings.
> +.P
> +The filesystem context file descriptors returned by
> +.BR fsopen ()
> +and
> +.BR fspick (2)
> +may be queried for message strings at any time by calling
> +.BR read (2)
> +on the file descriptor.
> +Each call to
> +.BR read (2)
> +will return a single message,
> +prefixed to indicate its class:
> +.RS
> +.TP
> +.B "e <message>"
> +An error message was logged.
> +This is usually associated with an error being returned from the corresponding
> +system call which triggered this message.
> +.TP
> +.B "w <message>"
> +A warning message was logged.
> +.TP
> +.B "i <message>"
> +An informational message was logged.
> +.RE
> +.P
> +Messages are removed from the queue as they are read.
> +Note that the message queue has limited depth,
> +so it is possible for messages to get lost.
> +If there are no messages in the message queue,
> +.B read(2)
> +will return no data and
> +.I errno
> +will be set to
> +.BR \%ENODATA .
> +If the
> +.I buf
> +argument to
> +.BR read (2)
> +is not large enough to contain the message,
> +.BR read (2)
> +will return no data and
> +.I errno
> +will be set to
> +.BR \%EMSGSIZE .
> +.P
> +If there are multiple filesystem context file descriptors referencing the same
> +filesystem instance
> +(such as if you call
> +.BR fspick (2)
> +multiple times for the same mount),
> +each one gets its own independent message queue.
> +This does not apply to file descriptors that were duplicated with
> +.BR dup (2).
> +.P
> +Messages strings will usually be prefixed by the filesystem driver that logged

s/Messages/Message/

BTW, here, I'd break after 'prefixed', and then after the ','.

> +the message, though this may not always be the case.
> +See the Linux kernel source code for details.
> +.SH RETURN VALUE
> +On success, a new file descriptor is returned.
> +On error, \-1 is returned, and
> +.I errno
> +is set to indicate the error.
> +.SH ERRORS
> +.TP
> +.B EFAULT
> +.I fsname
> +is NULL
> +or a pointer to a location
> +outside the calling process's accessible address space.
> +.TP
> +.B EINVAL
> +.I flags
> +had an invalid flag set.
> +.TP
> +.B EMFILE
> +The calling process has too many open files to create more.
> +.TP
> +.B ENFILE
> +The system has too many open files to create more.
> +.TP
> +.B ENODEV
> +The filesystem named by
> +.I fsname
> +is not supported by the kernel.
> +.TP
> +.B ENOMEM
> +The kernel could not allocate sufficient memory to complete the operation.
> +.TP
> +.B EPERM
> +The calling process does not have the required
> +.B \%CAP_SYS_ADMIN
> +capability.
> +.SH STANDARDS
> +Linux.
> +.SH HISTORY
> +Linux 5.2.
> +.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
> +glibc 2.36.
> +.SH EXAMPLES
> +To illustrate the workflow for creating a new mount,
> +the following is an example of how to mount an
> +.BR ext4 (5)
> +filesystem stored on
> +.I /dev/sdb1
> +onto
> +.IR /mnt .
> +.P
> +.in +4n
> +.EX
> +int fsfd, mntfd;
> +\&
> +fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0)
> +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
> +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> +.EE
> +.in
> +.P
> +First, an ext4 configuration context is created and attached to the file

Here, I'd break after the ',', and if you need to break again, after
'created'.

> +descriptor
> +.IR fsfd .
> +Then, a series of parameters
> +(such as the source of the filesystem)
> +are provided using
> +.BR fsconfig (2),
> +followed by the filesystem instance being created with
> +.BR \%FSCONFIG_CMD_CREATE .
> +.BR fsmount (2)
> +is then used to create a new mount object attached to the file descriptor
> +.IR mntfd ,
> +which is then attached to the intended mount point using
> +.BR move_mount (2).
> +.P
> +The above procedure is functionally equivalent to the following mount operation
> +using
> +.BR mount (2):
> +.P
> +.in +4n
> +.EX
> +mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
> +      "ro,noatime,acl,user_xattr,iversion");
> +.EE
> +.in
> +.P
> +And here's an example of creating a mount object
> +of an NFS server share
> +and setting a Smack security module label.
> +However, instead of attaching it to a mount point,
> +the program uses the mount object directly
> +to open a file from the NFS share.
> +.P
> +.in +4n
> +.EX
> +int fsfd, mntfd, fd;
> +\&
> +fsfd = fsopen("nfs", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com/pub/linux", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> +mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
> +fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
> +.EE
> +.in
> +.P
> +Unlike the previous example,
> +this operation has no trivial equivalent with
> +.BR mount (2),
> +as it was not previously possible to create a mount object
> +that is not attached to any mount point.
> +.SH SEE ALSO
> +.BR fsconfig (2),
> +.BR fsmount (2),
> +.BR fspick (2),
> +.BR mount (2),
> +.BR mount_setattr (2),
> +.BR move_mount (2),
> +.BR open_tree (2),
> +.BR mount_namespaces (7)

Other than those minor comments, the text LGTM.


Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>
Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
Posted by Aleksa Sarai 1 month, 4 weeks ago
On 2025-08-07, Alejandro Colomar <alx@kernel.org> wrote:
> Hi Aleksa,
> 
> On Thu, Aug 07, 2025 at 03:44:37AM +1000, Aleksa Sarai wrote:
> > This is loosely based on the original documentation written by David
> > Howells and later maintained by Christian Brauner, but has been
> > rewritten to be more from a user perspective (as well as fixing a few
> > critical mistakes).
> > 
> > Co-developed-by: David Howells <dhowells@redhat.com>
> > Co-developed-by: Christian Brauner <brauner@kernel.org>
> 
> Please use Co-authored-by.  It's documented under CONTRIBUTING.d/:
> 
> 	$ cat CONTRIBUTING.d/patches/description | grep -A99 Trailer;
> 	    Trailer
> 		Sign your patch with "Signed-off-by:".  Read about the
> 		"Developer's Certificate of Origin" at
> 		<https://www.kernel.org/doc/Documentation/process/submitting-patches.rst>.
> 		When appropriate, other tags documented in that file, such as
> 		"Reported-by:", "Reviewed-by:", "Acked-by:", and "Suggested-by:"
> 		can be added to the patch.  We use "Co-authored-by:" instead of
> 		"Co-developed-by:".  Example:
> 
> 			Signed-off-by: Alejandro Colomar <alx@kernel.org>
> 
> I think 'author' is more appropriate than 'developer' for documentation.
> It is also more consistent with the Copyright notice, which assigns
> copyright to the authors (documented in AUTHORS).  And ironically, even
> the kernel documentation about Co-authored-by talks about authorship
> instead of development:
> 
> 	Co-developed-by: states that the patch was co-created by
> 	multiple developers; it is used to give attribution to
> 	co-authors (in addition to the author attributed by the From:
> 	tag) when several people work on a single patch.

Sure, fixed.

Can you also clarify whether CONTRIBUTING.d/patches/range-diff is
required for submissions? I don't think b4 supports including it (and I
really would prefer to not have to use raw git-send-email again just for
man-pages -- b4 has so many benefits over raw git-send-email). Is the
b4-style changelog I include in the cover-letter sufficient?

I like to think of myself as a fairly prolific git user, but I don't
think I've ever seen --range-diff= output in a git-send-email patch
before...

> > Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> > ---
> >  man/man2/fsopen.2 | 319 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 319 insertions(+)
> > 
> > diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2
> > new file mode 100644
> > index 000000000000..ad38ef0782be
> > --- /dev/null
> > +++ b/man/man2/fsopen.2
> > @@ -0,0 +1,319 @@
> > +.\" Copyright, the authors of the Linux man-pages project
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.TH fsopen 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +fsopen \- create a new filesystem context
> > +.SH LIBRARY
> > +Standard C library
> > +.RI ( libc ,\~ \-lc )
> > +.SH SYNOPSIS
> > +.nf
> > +.BR "#include <sys/mount.h>"
> > +.P
> > +.BI "int fsopen(const char *" fsname ", unsigned int " flags ");"
> > +.fi
> > +.SH DESCRIPTION
> > +The
> > +.BR fsopen ()
> > +system call is part of the suite of file descriptor based mount facilities in
> > +Linux.
> > +.P
> > +.BR fsopen ()
> > +creates a blank filesystem configuration context within the kernel
> > +for the filesystem named by
> > +.IR fsname ,
> > +puts the context into creation mode and attaches it to a file descriptor,
> > +which is then returned.
> > +The calling process must have the
> > +.B \%CAP_SYS_ADMIN
> > +capability in order to create a new filesystem configuration context.
> > +.P
> > +A filesystem configuration context is an in-kernel representation of a pending
> > +transaction,
> 
> This page still needs semantic newlines.  (Please review all pages
> regarding that.)  (In this specific sentence, I'd break after 'is'.)
> 
> > +containing a set of configuration parameters that are to be applied
> > +when creating a new instance of a filesystem
> > +(or modifying the configuration of an existing filesystem instance,
> > +such as when using
> > +.BR fspick (2)).
> > +.P
> > +After obtaining a filesystem configuration context with
> > +.BR fsopen (),
> > +the general workflow for operating on the context looks like the following:
> > +.IP (1) 5
> > +Pass the filesystem context file descriptor to
> > +.BR fsconfig (2)
> > +to specify any desired filesystem parameters.
> > +This may be done as many times as necessary.
> > +.IP (2)
> > +Pass the same filesystem context file descriptor to
> 
> Do we need to say "same"?  I guess it's obvious.  Or do you expect
> any confusion if we don't?
> 
> > +.BR fsconfig (2)
> > +with
> > +.B \%FSCONFIG_CMD_CREATE
> > +to create an instance of the configured filesystem.
> > +.IP (3)
> > +Pass the same filesystem context file descriptor to
> > +.BR fsmount (2)
> > +to create a new mount object for the root of the filesystem,
> > +which is then attached to a new file descriptor.
> > +This also places the filesystem context file descriptor into reconfiguration
> > +mode,
> > +similar to the mode produced by
> > +.BR fspick (2).
> > +.IP (4)
> > +Use the mount object file descriptor as a
> > +.I dirfd
> > +argument to "*at()" system calls;
> > +or attach the mount object to a mount point
> > +by passing the mount object file descriptor to
> > +.BR move_mount (2).
> > +.P
> > +A filesystem context will move between different modes throughout its
> > +lifecycle
> > +(such as the creation phase when created with
> > +.BR fsopen (),
> > +the reconfiguration phase when an existing filesystem instance is selected by
> > +.BR fspick (2),
> > +and the intermediate "needs-mount" phase between
> > +.\" FS_CONTEXT_NEEDS_MOUNT is the term the kernel uses for this.
> > +.BR \%FSCONFIG_CMD_CREATE
> > +and
> > +.BR fsmount (2)),
> > +which has an impact on what operations are permitted on the filesystem context.
> > +.P
> > +The file descriptor returned by
> > +.BR fsopen ()
> > +also acts as a channel for filesystem drivers to provide more comprehensive
> > +error, warning, and information messages
> 
> Should we just say "diagnostic messages" to avoid explicitly mentioning
> all the levels?
> 
> > +than are normally provided through the standard
> > +.BR errno (3)
> > +interface for system calls.
> > +If an error occurs at any time during the workflow mentioned above,
> > +calling
> > +.BR read (2)
> > +on the filesystem context file descriptor will retrieve any ancillary
> > +information about the encountered errors.
> > +(See the "Message retrieval interface" section for more details on the message
> > +format.)
> > +.P
> > +.I flags
> > +can be used to control aspects of the creation of the filesystem configuration
> > +context file descriptor.
> > +A value for
> > +.I flags
> > +is constructed by bitwise ORing
> > +zero or more of the following constants:
> > +.RS
> > +.TP
> > +.B FSOPEN_CLOEXEC
> > +Set the close-on-exec
> > +.RB ( FD_CLOEXEC )
> > +flag on the new file descriptor.
> > +See the description of the
> > +.B O_CLOEXEC
> > +flag in
> > +.BR open (2)
> > +for reasons why this may be useful.
> > +.RE
> > +.P
> > +A list of filesystems supported by the running kernel
> > +(and thus a list of valid values for
> > +.IR fsname )
> > +can be obtained from
> > +.IR /proc/filesystems .
> > +(See also
> > +.BR proc_filesystems (5).)
> > +.SS Message retrieval interface
> > +When doing operations on a filesystem configuration context,
> > +the filesystem driver may choose to provide ancillary information to userspace
> > +in the form of message strings.
> > +.P
> > +The filesystem context file descriptors returned by
> > +.BR fsopen ()
> > +and
> > +.BR fspick (2)
> > +may be queried for message strings at any time by calling
> > +.BR read (2)
> > +on the file descriptor.
> > +Each call to
> > +.BR read (2)
> > +will return a single message,
> > +prefixed to indicate its class:
> > +.RS
> > +.TP
> > +.B "e <message>"
> > +An error message was logged.
> > +This is usually associated with an error being returned from the corresponding
> > +system call which triggered this message.
> > +.TP
> > +.B "w <message>"
> > +A warning message was logged.
> > +.TP
> > +.B "i <message>"
> > +An informational message was logged.
> > +.RE
> > +.P
> > +Messages are removed from the queue as they are read.
> > +Note that the message queue has limited depth,
> > +so it is possible for messages to get lost.
> > +If there are no messages in the message queue,
> > +.B read(2)
> > +will return no data and
> > +.I errno
> > +will be set to
> > +.BR \%ENODATA .
> > +If the
> > +.I buf
> > +argument to
> > +.BR read (2)
> > +is not large enough to contain the message,
> > +.BR read (2)
> > +will return no data and
> > +.I errno
> > +will be set to
> > +.BR \%EMSGSIZE .
> > +.P
> > +If there are multiple filesystem context file descriptors referencing the same
> > +filesystem instance
> > +(such as if you call
> > +.BR fspick (2)
> > +multiple times for the same mount),
> > +each one gets its own independent message queue.
> > +This does not apply to file descriptors that were duplicated with
> > +.BR dup (2).
> > +.P
> > +Messages strings will usually be prefixed by the filesystem driver that logged
> 
> s/Messages/Message/
> 
> BTW, here, I'd break after 'prefixed', and then after the ','.
> 
> > +the message, though this may not always be the case.
> > +See the Linux kernel source code for details.
> > +.SH RETURN VALUE
> > +On success, a new file descriptor is returned.
> > +On error, \-1 is returned, and
> > +.I errno
> > +is set to indicate the error.
> > +.SH ERRORS
> > +.TP
> > +.B EFAULT
> > +.I fsname
> > +is NULL
> > +or a pointer to a location
> > +outside the calling process's accessible address space.
> > +.TP
> > +.B EINVAL
> > +.I flags
> > +had an invalid flag set.
> > +.TP
> > +.B EMFILE
> > +The calling process has too many open files to create more.
> > +.TP
> > +.B ENFILE
> > +The system has too many open files to create more.
> > +.TP
> > +.B ENODEV
> > +The filesystem named by
> > +.I fsname
> > +is not supported by the kernel.
> > +.TP
> > +.B ENOMEM
> > +The kernel could not allocate sufficient memory to complete the operation.
> > +.TP
> > +.B EPERM
> > +The calling process does not have the required
> > +.B \%CAP_SYS_ADMIN
> > +capability.
> > +.SH STANDARDS
> > +Linux.
> > +.SH HISTORY
> > +Linux 5.2.
> > +.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
> > +glibc 2.36.
> > +.SH EXAMPLES
> > +To illustrate the workflow for creating a new mount,
> > +the following is an example of how to mount an
> > +.BR ext4 (5)
> > +filesystem stored on
> > +.I /dev/sdb1
> > +onto
> > +.IR /mnt .
> > +.P
> > +.in +4n
> > +.EX
> > +int fsfd, mntfd;
> > +\&
> > +fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0)
> > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
> > +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> > +.EE
> > +.in
> > +.P
> > +First, an ext4 configuration context is created and attached to the file
> 
> Here, I'd break after the ',', and if you need to break again, after
> 'created'.
> 
> > +descriptor
> > +.IR fsfd .
> > +Then, a series of parameters
> > +(such as the source of the filesystem)
> > +are provided using
> > +.BR fsconfig (2),
> > +followed by the filesystem instance being created with
> > +.BR \%FSCONFIG_CMD_CREATE .
> > +.BR fsmount (2)
> > +is then used to create a new mount object attached to the file descriptor
> > +.IR mntfd ,
> > +which is then attached to the intended mount point using
> > +.BR move_mount (2).
> > +.P
> > +The above procedure is functionally equivalent to the following mount operation
> > +using
> > +.BR mount (2):
> > +.P
> > +.in +4n
> > +.EX
> > +mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
> > +      "ro,noatime,acl,user_xattr,iversion");
> > +.EE
> > +.in
> > +.P
> > +And here's an example of creating a mount object
> > +of an NFS server share
> > +and setting a Smack security module label.
> > +However, instead of attaching it to a mount point,
> > +the program uses the mount object directly
> > +to open a file from the NFS share.
> > +.P
> > +.in +4n
> > +.EX
> > +int fsfd, mntfd, fd;
> > +\&
> > +fsfd = fsopen("nfs", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com/pub/linux", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > +mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
> > +fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
> > +.EE
> > +.in
> > +.P
> > +Unlike the previous example,
> > +this operation has no trivial equivalent with
> > +.BR mount (2),
> > +as it was not previously possible to create a mount object
> > +that is not attached to any mount point.
> > +.SH SEE ALSO
> > +.BR fsconfig (2),
> > +.BR fsmount (2),
> > +.BR fspick (2),
> > +.BR mount (2),
> > +.BR mount_setattr (2),
> > +.BR move_mount (2),
> > +.BR open_tree (2),
> > +.BR mount_namespaces (7)
> 
> Other than those minor comments, the text LGTM.
> 
> 
> Cheers,
> Alex
> 
> -- 
> <https://www.alejandro-colomar.es/>



-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
Posted by Alejandro Colomar 1 month, 4 weeks ago
Hi Aleksa,

On Thu, Aug 07, 2025 at 11:27:04PM +1000, Aleksa Sarai wrote:
> > I think 'author' is more appropriate than 'developer' for documentation.
> > It is also more consistent with the Copyright notice, which assigns
> > copyright to the authors (documented in AUTHORS).  And ironically, even
> > the kernel documentation about Co-authored-by talks about authorship

(Oops, s/Co-authored-by/Co-developed-by/)

> > instead of development:
> > 
> > 	Co-developed-by: states that the patch was co-created by
> > 	multiple developers; it is used to give attribution to
> > 	co-authors (in addition to the author attributed by the From:
> > 	tag) when several people work on a single patch.
> 
> Sure, fixed.
> 
> Can you also clarify whether CONTRIBUTING.d/patches/range-diff is
> required for submissions? I don't think b4 supports including it (and I
> really would prefer to not have to use raw git-send-email again just for
> man-pages -- b4 has so many benefits over raw git-send-email). Is the
> b4-style changelog I include in the cover-letter sufficient?

Yes, that's sufficient.  As Captain Barbossa would say, "the code is
more what you'd call 'guidelines' than actual rules".  ;)

> I like to think of myself as a fairly prolific git user, but I don't
> think I've ever seen --range-diff= output in a git-send-email patch
> before...

Yup, I only learnt about a few years ago.  I have to say it's great as
a reviewer; it changed my efficiency reviewing code when we started
using it at $dayjob-1.

And even as a submitter, it has also saved me a few times, when I
introduced a regression in some revision of a patch set, and I could
easily trace back to the revision where I had introduced it by reading
the range diffs, which are much shorter than the actual code.

Maybe we could ping Konstantin to add this to b4?


Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>
Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
Posted by Aleksa Sarai 1 month, 4 weeks ago
On 2025-08-07, Alejandro Colomar <alx@kernel.org> wrote:
> Hi Aleksa,
> 
> On Thu, Aug 07, 2025 at 11:27:04PM +1000, Aleksa Sarai wrote:
> > > I think 'author' is more appropriate than 'developer' for documentation.
> > > It is also more consistent with the Copyright notice, which assigns
> > > copyright to the authors (documented in AUTHORS).  And ironically, even
> > > the kernel documentation about Co-authored-by talks about authorship
> 
> (Oops, s/Co-authored-by/Co-developed-by/)
> 
> > > instead of development:
> > > 
> > > 	Co-developed-by: states that the patch was co-created by
> > > 	multiple developers; it is used to give attribution to
> > > 	co-authors (in addition to the author attributed by the From:
> > > 	tag) when several people work on a single patch.
> > 
> > Sure, fixed.
> > 
> > Can you also clarify whether CONTRIBUTING.d/patches/range-diff is
> > required for submissions? I don't think b4 supports including it (and I
> > really would prefer to not have to use raw git-send-email again just for
> > man-pages -- b4 has so many benefits over raw git-send-email). Is the
> > b4-style changelog I include in the cover-letter sufficient?
> 
> Yes, that's sufficient.  As Captain Barbossa would say, "the code is
> more what you'd call 'guidelines' than actual rules".  ;)
> 
> > I like to think of myself as a fairly prolific git user, but I don't
> > think I've ever seen --range-diff= output in a git-send-email patch
> > before...
> 
> Yup, I only learnt about a few years ago.  I have to say it's great as
> a reviewer; it changed my efficiency reviewing code when we started
> using it at $dayjob-1.
> 
> And even as a submitter, it has also saved me a few times, when I
> introduced a regression in some revision of a patch set, and I could
> easily trace back to the revision where I had introduced it by reading
> the range diffs, which are much shorter than the actual code.
> 
> Maybe we could ping Konstantin to add this to b4?

Konstantin, would you be interested in a patch to add --range-diff to
the trailing bits of cover letters? I would guess that b4 already has
all of the necessary metadata to reference the right commits.

It seems like a fairly neat way of providing some more metadata about
changes between patchsets, for folks that care about that information.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
Posted by Konstantin Ryabitsev 1 month, 4 weeks ago
On Fri, Aug 08, 2025 at 12:26:48AM +1000, Aleksa Sarai wrote:
> Konstantin, would you be interested in a patch to add --range-diff to
> the trailing bits of cover letters? I would guess that b4 already has
> all of the necessary metadata to reference the right commits.
> 
> It seems like a fairly neat way of providing some more metadata about
> changes between patchsets, for folks that care about that information.

It's already there, just add ${range_diff} to your cover letter template.

Cheers,
-K
Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
Posted by Aleksa Sarai 1 month, 4 weeks ago
On 2025-08-07, Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Fri, Aug 08, 2025 at 12:26:48AM +1000, Aleksa Sarai wrote:
> > Konstantin, would you be interested in a patch to add --range-diff to
> > the trailing bits of cover letters? I would guess that b4 already has
> > all of the necessary metadata to reference the right commits.
> > 
> > It seems like a fairly neat way of providing some more metadata about
> > changes between patchsets, for folks that care about that information.
> 
> It's already there, just add ${range_diff} to your cover letter template.

Oh, my bad... Time to go re-read the b4 docs again.

> Cheers,
> -K
> 

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
Posted by Aleksa Sarai 1 month, 4 weeks ago
On 2025-08-07, Alejandro Colomar <alx@kernel.org> wrote:
> Hi Aleksa,
> 
> On Thu, Aug 07, 2025 at 03:44:37AM +1000, Aleksa Sarai wrote:
> > This is loosely based on the original documentation written by David
> > Howells and later maintained by Christian Brauner, but has been
> > rewritten to be more from a user perspective (as well as fixing a few
> > critical mistakes).
> > 
> > Co-developed-by: David Howells <dhowells@redhat.com>
> > Co-developed-by: Christian Brauner <brauner@kernel.org>
> 
> Please use Co-authored-by.  It's documented under CONTRIBUTING.d/:
> 
> 	$ cat CONTRIBUTING.d/patches/description | grep -A99 Trailer;
> 	    Trailer
> 		Sign your patch with "Signed-off-by:".  Read about the
> 		"Developer's Certificate of Origin" at
> 		<https://www.kernel.org/doc/Documentation/process/submitting-patches.rst>.
> 		When appropriate, other tags documented in that file, such as
> 		"Reported-by:", "Reviewed-by:", "Acked-by:", and "Suggested-by:"
> 		can be added to the patch.  We use "Co-authored-by:" instead of
> 		"Co-developed-by:".  Example:
> 
> 			Signed-off-by: Alejandro Colomar <alx@kernel.org>
> 
> I think 'author' is more appropriate than 'developer' for documentation.
> It is also more consistent with the Copyright notice, which assigns
> copyright to the authors (documented in AUTHORS).  And ironically, even
> the kernel documentation about Co-authored-by talks about authorship
> instead of development:
> 
> 	Co-developed-by: states that the patch was co-created by
> 	multiple developers; it is used to give attribution to
> 	co-authors (in addition to the author attributed by the From:
> 	tag) when several people work on a single patch.
> 
> > Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> > ---
> >  man/man2/fsopen.2 | 319 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 319 insertions(+)
> > 
> > diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2
> > new file mode 100644
> > index 000000000000..ad38ef0782be
> > --- /dev/null
> > +++ b/man/man2/fsopen.2
> > @@ -0,0 +1,319 @@
> > +.\" Copyright, the authors of the Linux man-pages project
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.TH fsopen 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +fsopen \- create a new filesystem context
> > +.SH LIBRARY
> > +Standard C library
> > +.RI ( libc ,\~ \-lc )
> > +.SH SYNOPSIS
> > +.nf
> > +.BR "#include <sys/mount.h>"
> > +.P
> > +.BI "int fsopen(const char *" fsname ", unsigned int " flags ");"
> > +.fi
> > +.SH DESCRIPTION
> > +The
> > +.BR fsopen ()
> > +system call is part of the suite of file descriptor based mount facilities in
> > +Linux.
> > +.P
> > +.BR fsopen ()
> > +creates a blank filesystem configuration context within the kernel
> > +for the filesystem named by
> > +.IR fsname ,
> > +puts the context into creation mode and attaches it to a file descriptor,
> > +which is then returned.
> > +The calling process must have the
> > +.B \%CAP_SYS_ADMIN
> > +capability in order to create a new filesystem configuration context.
> > +.P
> > +A filesystem configuration context is an in-kernel representation of a pending
> > +transaction,
> 
> This page still needs semantic newlines.  (Please review all pages
> regarding that.)  (In this specific sentence, I'd break after 'is'.)

I did try adding them to this page (and all of the other pages -- I
suspect the pages later in the patchset have more aggressive newlining).
If you compare the newline placement between v1 and v2 you'll see that I
have added a lot of newlines in all of the man-pages, but it's possible
I missed a couple of sentences like this one.

To be honest I feel quite lost where the "semantic newlines" school
would deem appropriate to place newlines, and man-pages(7) is very terse
on the topic. Outside of very obvious examples,
it just feels wrong
to have such choppy
line break usage.
I understand
the argument that
this helps
with reviewing diffs,
but I really find it
incredibly unnatural.
(And this tongue-in-cheek example
is probably wrong too.)

> > +containing a set of configuration parameters that are to be applied
> > +when creating a new instance of a filesystem
> > +(or modifying the configuration of an existing filesystem instance,
> > +such as when using
> > +.BR fspick (2)).
> > +.P
> > +After obtaining a filesystem configuration context with
> > +.BR fsopen (),
> > +the general workflow for operating on the context looks like the following:
> > +.IP (1) 5
> > +Pass the filesystem context file descriptor to
> > +.BR fsconfig (2)
> > +to specify any desired filesystem parameters.
> > +This may be done as many times as necessary.
> > +.IP (2)
> > +Pass the same filesystem context file descriptor to
> 
> Do we need to say "same"?  I guess it's obvious.  Or do you expect
> any confusion if we don't?

The first time I saw this interface I was confused when you pass
which file descriptor (especially around the FSCONFIG_CMD_CREATE stage),
so I felt it better to make it clear which file descriptor we are
talking about.

> > +.BR fsconfig (2)
> > +with
> > +.B \%FSCONFIG_CMD_CREATE
> > +to create an instance of the configured filesystem.
> > +.IP (3)
> > +Pass the same filesystem context file descriptor to
> > +.BR fsmount (2)
> > +to create a new mount object for the root of the filesystem,
> > +which is then attached to a new file descriptor.
> > +This also places the filesystem context file descriptor into reconfiguration
> > +mode,
> > +similar to the mode produced by
> > +.BR fspick (2).
> > +.IP (4)
> > +Use the mount object file descriptor as a
> > +.I dirfd
> > +argument to "*at()" system calls;
> > +or attach the mount object to a mount point
> > +by passing the mount object file descriptor to
> > +.BR move_mount (2).
> > +.P
> > +A filesystem context will move between different modes throughout its
> > +lifecycle
> > +(such as the creation phase when created with
> > +.BR fsopen (),
> > +the reconfiguration phase when an existing filesystem instance is selected by
> > +.BR fspick (2),
> > +and the intermediate "needs-mount" phase between
> > +.\" FS_CONTEXT_NEEDS_MOUNT is the term the kernel uses for this.
> > +.BR \%FSCONFIG_CMD_CREATE
> > +and
> > +.BR fsmount (2)),
> > +which has an impact on what operations are permitted on the filesystem context.
> > +.P
> > +The file descriptor returned by
> > +.BR fsopen ()
> > +also acts as a channel for filesystem drivers to provide more comprehensive
> > +error, warning, and information messages
> 
> Should we just say "diagnostic messages" to avoid explicitly mentioning
> all the levels?

Sure.

> > +than are normally provided through the standard
> > +.BR errno (3)
> > +interface for system calls.
> > +If an error occurs at any time during the workflow mentioned above,
> > +calling
> > +.BR read (2)
> > +on the filesystem context file descriptor will retrieve any ancillary
> > +information about the encountered errors.
> > +(See the "Message retrieval interface" section for more details on the message
> > +format.)
> > +.P
> > +.I flags
> > +can be used to control aspects of the creation of the filesystem configuration
> > +context file descriptor.
> > +A value for
> > +.I flags
> > +is constructed by bitwise ORing
> > +zero or more of the following constants:
> > +.RS
> > +.TP
> > +.B FSOPEN_CLOEXEC
> > +Set the close-on-exec
> > +.RB ( FD_CLOEXEC )
> > +flag on the new file descriptor.
> > +See the description of the
> > +.B O_CLOEXEC
> > +flag in
> > +.BR open (2)
> > +for reasons why this may be useful.
> > +.RE
> > +.P
> > +A list of filesystems supported by the running kernel
> > +(and thus a list of valid values for
> > +.IR fsname )
> > +can be obtained from
> > +.IR /proc/filesystems .
> > +(See also
> > +.BR proc_filesystems (5).)
> > +.SS Message retrieval interface
> > +When doing operations on a filesystem configuration context,
> > +the filesystem driver may choose to provide ancillary information to userspace
> > +in the form of message strings.
> > +.P
> > +The filesystem context file descriptors returned by
> > +.BR fsopen ()
> > +and
> > +.BR fspick (2)
> > +may be queried for message strings at any time by calling
> > +.BR read (2)
> > +on the file descriptor.
> > +Each call to
> > +.BR read (2)
> > +will return a single message,
> > +prefixed to indicate its class:
> > +.RS
> > +.TP
> > +.B "e <message>"
> > +An error message was logged.
> > +This is usually associated with an error being returned from the corresponding
> > +system call which triggered this message.
> > +.TP
> > +.B "w <message>"
> > +A warning message was logged.
> > +.TP
> > +.B "i <message>"
> > +An informational message was logged.
> > +.RE
> > +.P
> > +Messages are removed from the queue as they are read.
> > +Note that the message queue has limited depth,
> > +so it is possible for messages to get lost.
> > +If there are no messages in the message queue,
> > +.B read(2)
> > +will return no data and
> > +.I errno
> > +will be set to
> > +.BR \%ENODATA .
> > +If the
> > +.I buf
> > +argument to
> > +.BR read (2)
> > +is not large enough to contain the message,
> > +.BR read (2)
> > +will return no data and
> > +.I errno
> > +will be set to
> > +.BR \%EMSGSIZE .
> > +.P
> > +If there are multiple filesystem context file descriptors referencing the same
> > +filesystem instance
> > +(such as if you call
> > +.BR fspick (2)
> > +multiple times for the same mount),
> > +each one gets its own independent message queue.
> > +This does not apply to file descriptors that were duplicated with
> > +.BR dup (2).
> > +.P
> > +Messages strings will usually be prefixed by the filesystem driver that logged
> 
> s/Messages/Message/
> 
> BTW, here, I'd break after 'prefixed', and then after the ','.
> 
> > +the message, though this may not always be the case.
> > +See the Linux kernel source code for details.
> > +.SH RETURN VALUE
> > +On success, a new file descriptor is returned.
> > +On error, \-1 is returned, and
> > +.I errno
> > +is set to indicate the error.
> > +.SH ERRORS
> > +.TP
> > +.B EFAULT
> > +.I fsname
> > +is NULL
> > +or a pointer to a location
> > +outside the calling process's accessible address space.
> > +.TP
> > +.B EINVAL
> > +.I flags
> > +had an invalid flag set.
> > +.TP
> > +.B EMFILE
> > +The calling process has too many open files to create more.
> > +.TP
> > +.B ENFILE
> > +The system has too many open files to create more.
> > +.TP
> > +.B ENODEV
> > +The filesystem named by
> > +.I fsname
> > +is not supported by the kernel.
> > +.TP
> > +.B ENOMEM
> > +The kernel could not allocate sufficient memory to complete the operation.
> > +.TP
> > +.B EPERM
> > +The calling process does not have the required
> > +.B \%CAP_SYS_ADMIN
> > +capability.
> > +.SH STANDARDS
> > +Linux.
> > +.SH HISTORY
> > +Linux 5.2.
> > +.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
> > +glibc 2.36.
> > +.SH EXAMPLES
> > +To illustrate the workflow for creating a new mount,
> > +the following is an example of how to mount an
> > +.BR ext4 (5)
> > +filesystem stored on
> > +.I /dev/sdb1
> > +onto
> > +.IR /mnt .
> > +.P
> > +.in +4n
> > +.EX
> > +int fsfd, mntfd;
> > +\&
> > +fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0)
> > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
> > +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> > +.EE
> > +.in
> > +.P
> > +First, an ext4 configuration context is created and attached to the file
> 
> Here, I'd break after the ',', and if you need to break again, after
> 'created'.

Okay, I wanted to avoid having lines with single words due to semantic
newlines, but if that's what you prefer I can update that everywhere...

> > +descriptor
> > +.IR fsfd .
> > +Then, a series of parameters
> > +(such as the source of the filesystem)
> > +are provided using
> > +.BR fsconfig (2),
> > +followed by the filesystem instance being created with
> > +.BR \%FSCONFIG_CMD_CREATE .
> > +.BR fsmount (2)
> > +is then used to create a new mount object attached to the file descriptor
> > +.IR mntfd ,
> > +which is then attached to the intended mount point using
> > +.BR move_mount (2).
> > +.P
> > +The above procedure is functionally equivalent to the following mount operation
> > +using
> > +.BR mount (2):
> > +.P
> > +.in +4n
> > +.EX
> > +mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
> > +      "ro,noatime,acl,user_xattr,iversion");
> > +.EE
> > +.in
> > +.P
> > +And here's an example of creating a mount object
> > +of an NFS server share
> > +and setting a Smack security module label.
> > +However, instead of attaching it to a mount point,
> > +the program uses the mount object directly
> > +to open a file from the NFS share.
> > +.P
> > +.in +4n
> > +.EX
> > +int fsfd, mntfd, fd;
> > +\&
> > +fsfd = fsopen("nfs", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com/pub/linux", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > +mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
> > +fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
> > +.EE
> > +.in
> > +.P
> > +Unlike the previous example,
> > +this operation has no trivial equivalent with
> > +.BR mount (2),
> > +as it was not previously possible to create a mount object
> > +that is not attached to any mount point.
> > +.SH SEE ALSO
> > +.BR fsconfig (2),
> > +.BR fsmount (2),
> > +.BR fspick (2),
> > +.BR mount (2),
> > +.BR mount_setattr (2),
> > +.BR move_mount (2),
> > +.BR open_tree (2),
> > +.BR mount_namespaces (7)
> 
> Other than those minor comments, the text LGTM.
> 
> 
> Cheers,
> Alex
> 
> -- 
> <https://www.alejandro-colomar.es/>



-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
Posted by Alejandro Colomar 1 month, 4 weeks ago
Hi Aleksa,

On Thu, Aug 07, 2025 at 10:50:17PM +1000, Aleksa Sarai wrote:
> > > +A filesystem configuration context is an in-kernel representation of a pending
> > > +transaction,
> > 
> > This page still needs semantic newlines.  (Please review all pages
> > regarding that.)  (In this specific sentence, I'd break after 'is'.)
> 
> I did try adding them to this page (and all of the other pages -- I
> suspect the pages later in the patchset have more aggressive newlining).
> If you compare the newline placement between v1 and v2 you'll see that I
> have added a lot of newlines in all of the man-pages, but it's possible
> I missed a couple of sentences like this one.

Yup, it's quite better.  Thanks!

> To be honest I feel quite lost where the "semantic newlines" school
> would deem appropriate to place newlines, and man-pages(7) is very terse
> on the topic. Outside of very obvious examples,
> it just feels wrong
> to have such choppy
> line break usage.
> I understand
> the argument that
> this helps
> with reviewing diffs,
> but I really find it
> incredibly unnatural.
> (And this tongue-in-cheek example
> is probably wrong too.)

I understand.  The guidelines I use are:

	If there's punctuation, break.
	If there isn't punctuation, but the sentence would go past the
	80-char right margin, try to find the best point to break (this
	is sometimes hard or subjective).
	Other than that, there's no need to break.

Does that seem reasonable?  (I can always amend a few cases that you
don't know where to split.)

> 
> > > +containing a set of configuration parameters that are to be applied
> > > +when creating a new instance of a filesystem
> > > +(or modifying the configuration of an existing filesystem instance,
> > > +such as when using
> > > +.BR fspick (2)).
> > > +.P
> > > +After obtaining a filesystem configuration context with
> > > +.BR fsopen (),
> > > +the general workflow for operating on the context looks like the following:
> > > +.IP (1) 5
> > > +Pass the filesystem context file descriptor to
> > > +.BR fsconfig (2)
> > > +to specify any desired filesystem parameters.
> > > +This may be done as many times as necessary.
> > > +.IP (2)
> > > +Pass the same filesystem context file descriptor to
> > 
> > Do we need to say "same"?  I guess it's obvious.  Or do you expect
> > any confusion if we don't?
> 
> The first time I saw this interface I was confused when you pass
> which file descriptor (especially around the FSCONFIG_CMD_CREATE stage),
> so I felt it better to make it clear which file descriptor we are
> talking about.

Okay.

> > > +.EX
> > > +int fsfd, mntfd;
> > > +\&
> > > +fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
> > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
> > > +fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
> > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
> > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
> > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
> > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0)
> > > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > > +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
> > > +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> > > +.EE
> > > +.in
> > > +.P
> > > +First, an ext4 configuration context is created and attached to the file
> > 
> > Here, I'd break after the ',', and if you need to break again, after
> > 'created'.
> 
> Okay, I wanted to avoid having lines with single words due to semantic
> newlines, but if that's what you prefer I can update that everywhere...

I don't have a strong opinion on that.  I sometimes avoid the break if
the rest of the sentence is short and all fits in one line, but if you
already need to break, that'd be the first obvious place to look at.
Other times, I have a more pedantic day, and split at every comma, even
unnecessarily.


Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>