From nobody Thu Oct 2 02:15:09 2025
From: Aleksa Sarai <cyphar@cyphar.com>
Date: Thu, 25 Sep 2025 01:31:23 +1000
Subject: [PATCH v5 1/8] man/man2/fsopen.2: document "new" mount API
Message-Id: <20250925-new-mount-api-v5-1-028fb88023f2@cyphar.com>
References: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
In-Reply-To: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
To: Alejandro Colomar
Cc: "Michael T. Kerrisk", Alexander Viro, Jan Kara, Askar Safin,
 "G. Branden Robinson", linux-man@vger.kernel.org,
 linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, David Howells, Christian Brauner,
 Aleksa Sarai

This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).

Co-authored-by: David Howells
Signed-off-by: David Howells
Co-authored-by: Christian Brauner
Signed-off-by: Christian Brauner
Signed-off-by: Aleksa Sarai
---
 man/man2/fsopen.2 | 385 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 385 insertions(+)

diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2
new file mode 100644
index 0000000000000000000000000000000000000000..7fbc6c3d28e2e741cd9003c105621b4242abd487
--- /dev/null
+++ b/man/man2/fsopen.2
@@ -0,0 +1,385 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH fsopen 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+fsopen \- create a new filesystem context
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.B #include <sys/mount.h>
+.P
+.BI "int fsopen(const char *" fsname ", unsigned int " flags );
+.fi
+.SH DESCRIPTION
+The
+.BR fsopen ()
+system call is part of
+the suite of file-descriptor-based mount facilities in Linux.
+.P
+.BR fsopen ()
+creates a blank filesystem configuration context within the kernel
+for the filesystem named by
+.I fsname
+and places it into creation mode.
+A new file descriptor
+associated with the filesystem configuration context
+is then returned.
+The calling process must have the
+.B \%CAP_SYS_ADMIN
+capability in order to create a new filesystem configuration context.
+.P
+A filesystem configuration context is
+an in-kernel representation of a pending transaction,
+containing a set of configuration parameters that are to be applied
+when creating a new instance of a filesystem
+(or modifying the configuration of an existing filesystem instance,
+such as when using
+.BR fspick (2)).
+.P
+After obtaining a filesystem configuration context with
+.BR fsopen (),
+the general workflow for operating on the context looks like the following:
+.IP (1) 5
+Pass the filesystem context file descriptor to
+.BR fsconfig (2)
+to specify any desired filesystem parameters.
+This may be done as many times as necessary.
+.IP (2)
+Pass the same filesystem context file descriptor to
+.BR fsconfig (2)
+with
+.B \%FSCONFIG_CMD_CREATE
+to create an instance of the configured filesystem.
+.IP (3)
+Pass the same filesystem context file descriptor to
+.BR fsmount (2)
+to create a new detached mount object for
+the root of the filesystem instance,
+which is then attached to a new file descriptor.
+(This also places the filesystem context file descriptor into
+reconfiguration mode,
+similar to the mode produced by
+.BR fspick (2).)
+Once a mount object has been created with
+.BR fsmount (2),
+the filesystem context file descriptor can be safely closed.
+.IP (4)
+Now that a mount object has been created,
+you may
+.RS
+.IP \[bu] 3
+use the detached mount object file descriptor as a
+.I dirfd
+argument to "*at()" system calls;
+and/or
+.IP \[bu]
+attach the mount object to a mount point
+by passing the mount object file descriptor to
+.BR move_mount (2).
+This will also prevent the mount object from
+being unmounted and destroyed when
+the mount object file descriptor is closed.
+.RE
+.IP
+The mount object file descriptor will
+remain associated with the mount object
+even after doing the above operations,
+so you may repeatedly use the mount object file descriptor with
+.BR move_mount (2)
+and/or "*at()" system calls
+as many times as necessary.
+.P
+A filesystem context will move between different modes
+throughout its lifecycle
+(such as the creation phase
+when created with
+.BR fsopen (),
+the reconfiguration phase
+when an existing filesystem instance is selected with
+.BR fspick (2),
+and the intermediate "awaiting-mount" phase
+.\" FS_CONTEXT_AWAITING_MOUNT is the term the kernel uses for this.
+between
+.B \%FSCONFIG_CMD_CREATE
+and
+.BR fsmount (2)),
+which has an impact on
+what operations are permitted on the filesystem context.
+.P
+The file descriptor returned by
+.BR fsopen ()
+also acts as a channel for filesystem drivers to
+provide more comprehensive diagnostic information
+than is normally provided through the standard
+.BR errno (3)
+interface for system calls.
+If an error occurs at any time during the workflow mentioned above,
+calling
+.BR read (2)
+on the filesystem context file descriptor
+will retrieve any ancillary information about the encountered errors.
+(See the "Message retrieval interface" subsection
+for more details on the message format.)
+.P
+.I flags
+can be used to control aspects of
+the creation of the filesystem configuration context file descriptor.
+A value for
+.I flags
+is constructed by bitwise ORing
+zero or more of the following constants:
+.RS
+.TP
+.B FSOPEN_CLOEXEC
+Set the close-on-exec
+.RB ( FD_CLOEXEC )
+flag on the new file descriptor.
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+for reasons why this may be useful.
+.RE
+.P
+A list of filesystems supported by the running kernel
+(and thus a list of valid values for
+.IR fsname )
+can be obtained from
+.IR /proc/filesystems .
+(See also
+.BR proc_filesystems (5).)
+.SS Message retrieval interface
+When doing operations on a filesystem configuration context,
+the filesystem driver may choose to provide
+ancillary information to userspace
+in the form of message strings.
+.P
+The filesystem context file descriptors returned by
+.BR fsopen ()
+and
+.BR fspick (2)
+may be queried for message strings at any time by calling
+.BR read (2)
+on the file descriptor.
+Each call to
+.BR read (2)
+will return a single message,
+prefixed to indicate its class:
+.RS
+.TP
+.BI e\~ message
+An error message was logged.
+This is usually associated with an error being returned
+from the corresponding system call which triggered this message.
+.TP
+.BI w\~ message
+A warning message was logged.
+.TP
+.BI i\~ message
+An informational message was logged.
+.RE
+.P
+Messages are removed from the queue as they are read.
+Note that the message queue has limited depth,
+so it is possible for messages to get lost.
+If there are no messages in the message queue,
+.BR read (2)
+will return \-1 and
+.I errno
+will be set to
+.BR \%ENODATA .
+If the
+.I buf
+argument to
+.BR read (2)
+is not large enough to contain the entire message,
+.BR read (2)
+will return \-1 and
+.I errno
+will be set to
+.BR \%EMSGSIZE .
+(See BUGS.)
+.P
+If there are multiple filesystem contexts
+referencing the same filesystem instance
+(such as if you call
+.BR fspick (2)
+multiple times for the same mount),
+each one gets its own independent message queue.
+This does not apply to multiple file descriptors that are
+tied to the same underlying open file description
+(such as those created with
+.BR dup (2)).
+.P
+Message strings will usually be prefixed by
+the name of the filesystem or kernel subsystem
+that logged the message,
+though this may not always be the case.
+See the Linux kernel source code for details.
+.SH RETURN VALUE
+On success, a new file descriptor is returned.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EFAULT
+.I fsname
+is NULL
+or a pointer to a location
+outside the calling process's accessible address space.
+.TP
+.B EINVAL
+.I flags
+had an invalid flag set.
+.TP
+.B EMFILE
+The calling process has too many open files to create more.
+.TP
+.B ENFILE
+The system has too many open files to create more.
+.TP
+.B ENODEV
+The filesystem named by
+.I fsname
+is not supported by the kernel.
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B EPERM
+The calling process does not have the required
+.B \%CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
+.\" commit 400913252d09f9cfb8cce33daee43167921fc343
+glibc 2.36.
+.SH BUGS
+.SS Message retrieval interface and \fB\%EMSGSIZE\fP
+As described in the "Message retrieval interface" subsection above,
+calling
+.BR read (2)
+with too small a buffer to contain
+the next pending message in the message queue
+for the filesystem configuration context
+will cause
+.BR read (2)
+to return \-1 and set
+.BR errno (3)
+to
+.BR \%EMSGSIZE .
+.P
+However,
+this failed operation still
+consumes the message from the message queue.
+This effectively discards the message silently,
+as no data is copied into the
+.BR read (2)
+buffer.
+.P
+Programs should take care to ensure that
+their buffers are sufficiently large
+to contain any reasonable message string,
+in order to avoid silently losing valuable diagnostic information.
+.\" Aleksa Sarai
+.\" This unfortunate behaviour has existed since this feature was merged, but
+.\" I have sent a patchset which will finally fix it.
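A minimal sketch of a message-draining loop that follows the advice above about buffer sizes. It is not part of the patch: the helper names fs_msg_class() and drain_fs_messages() and the 4096-byte buffer are assumptions chosen for illustration, and the loop simply gives up on EMSGSIZE rather than guessing whether the message was consumed.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: map the one-letter class prefix ("e ", "w ", "i ")
 * described in the "Message retrieval interface" subsection to a label. */
static const char *fs_msg_class(const char *msg)
{
    if (msg[0] != '\0' && msg[1] == ' ') {
        switch (msg[0]) {
        case 'e': return "error";
        case 'w': return "warning";
        case 'i': return "info";
        }
    }
    return "unknown";
}

/* Hypothetical helper: read queued messages from a filesystem context
 * file descriptor until the queue reports ENODATA.
 * Returns the number of messages printed, or -1 on any other error
 * (including EMSGSIZE, where the message may already be lost). */
static int drain_fs_messages(int fsfd, FILE *out)
{
    char buf[4096];     /* assumed to be large enough for any message */
    int count = 0;

    for (;;) {
        ssize_t n = read(fsfd, buf, sizeof(buf) - 1);

        if (n < 0)
            return errno == ENODATA ? count : -1;
        if (n == 0)     /* not expected here; avoids looping on EOF */
            return count;

        buf[n] = '\0';
        if (buf[n - 1] == '\n')     /* drop a trailing newline, if any */
            buf[n - 1] = '\0';
        fprintf(out, "[%s] %s\n", fs_msg_class(buf), buf);
        count++;
    }
}
```

A caller would typically invoke drain_fs_messages(fsfd, stderr) right after a failing fsconfig(2) or fsmount(2) call, before closing the context file descriptor.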
+.\"
+.SH EXAMPLES
+To illustrate the workflow for creating a new mount,
+the following is an example of how to mount an
+.BR ext4 (5)
+filesystem stored on
+.I /dev/sdb1
+onto
+.IR /mnt .
+.P
+.in +4n
+.EX
+int fsfd, mntfd;
+\&
+fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
+move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+First,
+an ext4 configuration context is created and attached to the file descriptor
+.IR fsfd .
+Then, a series of parameters
+(such as the source of the filesystem)
+are provided using
+.BR fsconfig (2),
+followed by the filesystem instance being created with
+.BR \%FSCONFIG_CMD_CREATE .
+.BR fsmount (2)
+is then used to create a new mount object attached to the file descriptor
+.IR mntfd ,
+which is then attached to the intended mount point using
+.BR move_mount (2).
+.P
+The above procedure is functionally equivalent to
+the following mount operation using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
+      "ro,noatime,acl,user_xattr,iversion");
+.EE
+.in
+.P
+And here's an example of creating a mount object
+of an NFS server share
+and setting a Smack security module label.
+However, instead of attaching it to a mount point,
+the program uses the mount object directly
+to open a file from the NFS share.
+.P
+.in +4n
+.EX
+int fsfd, mntfd, fd;
+\&
+fsfd = fsopen("nfs", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com/pub", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
+fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
+.EE
+.in
+.P
+Unlike the previous example,
+this operation has no trivial equivalent with
+.BR mount (2),
+as it was not previously possible to create a mount object
+that is not attached to any mount point.
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsmount (2),
+.BR fspick (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR open_tree (2),
+.BR mount_namespaces (7)
-- 
2.51.0

From nobody Thu Oct 2 02:15:09 2025
From: Aleksa Sarai <cyphar@cyphar.com>
Date: Thu, 25 Sep 2025 01:31:24 +1000
Subject: [PATCH v5 2/8] man/man2/fspick.2: document "new" mount API
Message-Id: <20250925-new-mount-api-v5-2-028fb88023f2@cyphar.com>
References: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
In-Reply-To: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>

This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).

Co-authored-by: David Howells
Signed-off-by: David Howells
Co-authored-by: Christian Brauner
Signed-off-by: Christian Brauner
Signed-off-by: Aleksa Sarai
---
 man/man2/fspick.2 | 343 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 343 insertions(+)

diff --git a/man/man2/fspick.2 b/man/man2/fspick.2
new file mode 100644
index 0000000000000000000000000000000000000000..800aed81d38384be4563f2558e3cef846d7e7cee
--- /dev/null
+++ b/man/man2/fspick.2
@@ -0,0 +1,343 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH fspick 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+fspick \- select filesystem for reconfiguration
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <fcntl.h>" "          /* Definition of " AT_* " constants */"
+.B #include <sys/mount.h>
+.P
+.BI "int fspick(int " dirfd ", const char *" path ", unsigned int " flags );
+.fi
+.SH DESCRIPTION
+The
+.BR fspick ()
+system call is part of
+the suite of file-descriptor-based mount facilities in Linux.
+.P
+.BR fspick ()
+creates a new filesystem configuration context
+for the extant filesystem instance
+associated with the path described by
+.I dirfd
+and
+.IR path ,
+and places it into reconfiguration mode
+(similar to
+.BR mount (8)
+with the
+.I \-o\~remount
+option).
+A new file descriptor
+associated with the filesystem configuration context
+is then returned.
+The calling process must have the
+.B \%CAP_SYS_ADMIN
+capability in order to create a new filesystem configuration context.
+.P
+The resultant file descriptor can be used with
+.BR fsconfig (2)
+to specify the desired set of changes to
+filesystem parameters of the filesystem instance.
+Once the desired set of changes has been configured,
+the changes can be effectuated by calling
+.BR fsconfig (2)
+with the
+.B \%FSCONFIG_CMD_RECONFIGURE
+command.
+In contrast to
+the behaviour of
+.B MS_REMOUNT
+with
+.BR mount (2),
+.BR fspick ()
+instantiates the filesystem configuration context
+with a copy of
+the extant filesystem's filesystem parameters;
+thus,
+subsequent
+.B \%FSCONFIG_CMD_RECONFIGURE
+operations
+will only update filesystem parameters
+explicitly modified with
+.BR fsconfig (2).
+.P
+As with "*at()" system calls,
+.BR fspick ()
+uses the
+.I dirfd
+argument in conjunction with the
+.I path
+argument to determine the path to operate on, as follows:
+.IP \[bu] 3
+If the pathname given in
+.I path
+is absolute, then
+.I dirfd
+is ignored.
+.IP \[bu]
+If the pathname given in
+.I path
+is relative and
+.I dirfd
+is the special value
+.BR \%AT_FDCWD ,
+then
+.I path
+is interpreted relative to
+the current working directory
+of the calling process (like
+.BR open (2)).
+.IP \[bu]
+If the pathname given in
+.I path
+is relative,
+then it is interpreted relative to
+the directory referred to by the file descriptor
+.I dirfd
+(rather than relative to
+the current working directory
+of the calling process,
+as is done by
+.BR open (2)
+for a relative pathname).
+In this case,
+.I dirfd
+must be a directory
+that was opened for reading
+.RB ( O_RDONLY )
+or using the
+.B O_PATH
+flag.
+.IP \[bu]
+If
+.I path
+is an empty string,
+and
+.I flags
+contains
+.BR \%FSPICK_EMPTY_PATH ,
+then the file descriptor
+.I dirfd
+is operated on directly.
+In this case,
+.I dirfd
+may refer to any type of file,
+not just a directory.
+.P
+See
+.BR openat (2)
+for an explanation of why the
+.I dirfd
+argument is useful.
+.P
+.I flags
+can be used to control aspects of how
+.I path
+is resolved and
+properties of the returned file descriptor.
+A value for
+.I flags
+is constructed by bitwise ORing
+zero or more of the following constants:
+.RS
+.TP
+.B FSPICK_CLOEXEC
+Set the close-on-exec
+.RB ( FD_CLOEXEC )
+flag on the new file descriptor.
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+for reasons why this may be useful.
+.TP
+.B FSPICK_EMPTY_PATH
+If
+.I path
+is an empty string,
+operate on the file referred to by
+.I dirfd
+(which may have been obtained from
+.BR open (2),
+.BR fsmount (2),
+or
+.BR open_tree (2)).
+In this case,
+.I dirfd
+may refer to any type of file,
+not just a directory.
+If
+.I dirfd
+is
+.BR \%AT_FDCWD ,
+.BR fspick ()
+will operate on the current working directory
+of the calling process.
+.TP
+.B FSPICK_SYMLINK_NOFOLLOW
+Do not follow symbolic links
+in the terminal component of
+.IR path .
+If
+.I path
+references a symbolic link,
+the returned filesystem context will reference
+the filesystem that the symbolic link itself resides on.
+.TP
+.B FSPICK_NO_AUTOMOUNT
+Do not automount the terminal ("basename") component of
+.I path
+if it is a directory that is an automount point.
+This allows you to reconfigure an automount point,
+rather than the location that would be mounted.
+This flag has no effect
+if the automount point has already been mounted over.
+.RE
+.P
+As with filesystem contexts created with
+.BR fsopen (2),
+the file descriptor returned by
+.BR fspick ()
+may be queried for message strings at any time by calling
+.BR read (2)
+on the file descriptor.
+(See the "Message retrieval interface" subsection in
+.BR fsopen (2)
+for more details on the message format.)
+.SH RETURN VALUE
+On success, a new file descriptor is returned.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EACCES
+Search permission is denied
+for one of the directories
+in the path prefix of
+.IR path .
+(See also
+.BR path_resolution (7).)
+.TP
+.B EBADF
+.I path
+is relative but
+.I dirfd
+is neither
+.B \%AT_FDCWD
+nor a valid file descriptor.
+.TP
+.B EFAULT
+.I path
+is NULL
+or a pointer to a location
+outside the calling process's accessible address space.
+.TP
+.B EINVAL
+Invalid flag specified in
+.IR flags .
+.TP
+.B ELOOP
+Too many symbolic links encountered when resolving
+.IR path .
+.TP
+.B EMFILE
+The calling process has too many open files to create more.
+.TP
+.B ENAMETOOLONG
+.I path
+is longer than
+.BR PATH_MAX .
+.TP
+.B ENFILE
+The system has too many open files to create more.
+.TP
+.B ENOENT
+A component of
+.I path
+does not exist,
+or is a dangling symbolic link.
+.TP
+.B ENOENT
+.I path
+is an empty string, but
+.B \%FSPICK_EMPTY_PATH
+is not specified in
+.IR flags .
+.TP
+.B ENOTDIR
+A component of the path prefix of
+.I path
+is not a directory;
+or
+.I path
+is relative and
+.I dirfd
+is a file descriptor referring to a file other than a directory.
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B EPERM
+The calling process does not have the required
+.B \%CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit cf3cba4a429be43e5527a3f78859b1bfd9ebc5fb
+.\" commit 400913252d09f9cfb8cce33daee43167921fc343
+glibc 2.36.
+.SH EXAMPLES
+The following example sets the read-only flag
+on the filesystem instance referenced by
+the mount object attached at
+.IR /tmp .
+.P
+.in +4n
+.EX
+int fsfd = fspick(AT_FDCWD, "/tmp", FSPICK_CLOEXEC);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0);
+.EE
+.in
+.P
+The above procedure is roughly equivalent to
+the following mount operation using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount(NULL, "/tmp", NULL, MS_REMOUNT | MS_RDONLY, NULL);
+.EE
+.in
+.P
+With the notable caveat that
+in this example,
+.BR mount (2)
+will clear all other filesystem parameters
+(such as
+.B MS_DIRSYNC
+or
+.BR MS_SYNCHRONOUS );
+.BR fsconfig (2)
+will only modify the
+.I ro
+parameter.
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsmount (2),
+.BR fsopen (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR open_tree (2),
+.BR mount_namespaces (7)
-- 
2.51.0

From nobody Thu Oct 2 02:15:09 2025
From: Aleksa Sarai <cyphar@cyphar.com>
Date: Thu, 25 Sep 2025 01:31:25 +1000
Subject: [PATCH v5 3/8] man/man2/fsconfig.2: document "new" mount API
Message-Id: <20250925-new-mount-api-v5-3-028fb88023f2@cyphar.com>
References: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
In-Reply-To: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>

This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).

Co-authored-by: David Howells
Signed-off-by: David Howells
Co-authored-by: Christian Brauner
Signed-off-by: Christian Brauner
Signed-off-by: Aleksa Sarai
---
 man/man2/fsconfig.2 | 729 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 729 insertions(+)

diff --git a/man/man2/fsconfig.2 b/man/man2/fsconfig.2
new file mode 100644
index 0000000000000000000000000000000000000000..a2d844a105c74f17af640d6991046dbd5fa69cf0
--- /dev/null
+++ b/man/man2/fsconfig.2
@@ -0,0 +1,729 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH fsconfig 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+fsconfig \- configure new or existing filesystem context
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.B #include <sys/mount.h>
+.P
+.BI "int fsconfig(int " fd ", unsigned int " cmd ,
+.BI "             const char *_Nullable " key ,
+.BI "             const void *_Nullable " value ", int " aux );
+.fi
+.SH DESCRIPTION
+The
+.BR fsconfig ()
+system call is
part of +the suite of file-descriptor-based mount facilities in Linux. +.P +.BR fsconfig () +is used to supply parameters to +and issue commands against +the filesystem configuration context +associated with the file descriptor +.IR fd . +Filesystem configuration contexts can be created with +.BR fsopen (2) +or be instantiated from an extant filesystem instance with +.BR fspick (2). +.P +The +.I cmd +argument indicates the command to be issued. +Some commands supply parameters to the context +(equivalent to mount options specified with +.BR mount (8)), +while others are meta-operations on the filesystem context. +The valid +.I cmd +values are: +.RS +.TP +.B FSCONFIG_SET_FLAG +Set the flag parameter named by +.IR key . +.I value +must be NULL, +and +.I aux +must be 0. +.TP +.B FSCONFIG_SET_STRING +Set the string parameter named by +.I key +to the value specified by +.IR value . +.I value +points to a null-terminated string, +and +.I aux +must be 0. +.TP +.B FSCONFIG_SET_BINARY +Set the blob parameter named by +.I key +to the contents of the binary blob +specified by +.IR value . +.I value +points to +the start of a buffer +that is +.I aux +bytes in length. +.TP +.B FSCONFIG_SET_FD +Set the file parameter named by +.I key +to the open file description +referenced by the file descriptor +.IR aux . +.I value +must be NULL. +.IP +You may also use +.B \%FSCONFIG_SET_STRING +for file parameters, +with +.I value +set to a null-terminated string +containing a base-10 representation +of the file descriptor number. +This mechanism is primarily intended for compatibility +with older +.BR mount (2)-based +programs, +and only works for parameters +that +.I only +accept file descriptor arguments. +.TP +.B FSCONFIG_SET_PATH +Set the path parameter named by +.I key +to the object at a provided path, +resolved in a similar manner to +.BR openat (2).
+.I value +points to a null-terminated pathname string, +and +.I aux +is equivalent to the +.I dirfd +argument to +.BR openat (2). +See +.BR openat (2) +for an explanation of the need for +.BR \%FSCONFIG_SET_PATH . +.IP +You may also use +.B \%FSCONFIG_SET_STRING +for path parameters, +the behaviour of which is equivalent to +.B \%FSCONFIG_SET_PATH +with +.I aux +set to +.BR \%AT_FDCWD . +.TP +.B FSCONFIG_SET_PATH_EMPTY +As with +.BR \%FSCONFIG_SET_PATH , +except that if +.I value +is an empty string, +the file descriptor specified by +.I aux +is operated on directly +and may be any type of file +(not just a directory). +This is equivalent to the behaviour of +.B \%AT_EMPTY_PATH +with most "*at()" system calls. +If +.I aux +is +.BR \%AT_FDCWD , +the parameter will be set to +the current working directory +of the calling process. +.TP +.B FSCONFIG_CMD_CREATE +This command instructs the filesystem driver +to instantiate an instance of the filesystem in the kernel +with the parameters specified in the filesystem configuration context. +.I key +and +.I value +must be NULL, +and +.I aux +must be 0. +.IP +This command can only be issued once +in the lifetime of a filesystem context. +If the operation succeeds, +the filesystem context +associated with file descriptor +.I fd +now references the created filesystem instance, +and is placed into a special "awaiting-mount" mode +that allows you to use +.BR fsmount (2) +to create a mount object from the filesystem instance. +.\" FS_CONTEXT_AWAITING_MOUNT is the term the kernel uses for this. +If the operation fails, +in most cases +the filesystem context is placed in a failed mode +and cannot be used for any further +.BR fsconfig () +operations +(though you may still retrieve diagnostic messages +through the message retrieval interface, +as described in +the corresponding subsection of +.BR fsopen (2)). +.IP +This command can only be issued against +filesystem configuration contexts +that were created with +.BR fsopen (2). 
+In order to create a filesystem instance, +the calling process must have the +.B \%CAP_SYS_ADMIN +capability. +.IP +An important thing to be aware of is that +the Linux kernel will +.I silently +reuse extant filesystem instances +depending on the filesystem type +and the configured parameters +(each filesystem driver has +its own policy for +how filesystem instances are reused). +This means that +the filesystem instance "created" by +.B \%FSCONFIG_CMD_CREATE +may, in fact, be a reference +to an extant filesystem instance in the kernel. +(For reference, +this behaviour also applies to +.BR mount (2).) +.IP +One side-effect of this behaviour is that +if an extant filesystem instance is reused, +.I all +parameters configured +for this filesystem configuration context +are +.I silently ignored +(with the exception of the +.I ro +and +.I rw +flag parameters; +if the state of the read-only flag in the +extant filesystem instance and the filesystem configuration context +do not match, this operation will return +.BR EBUSY ). +This also means that +.B \%FSCONFIG_CMD_RECONFIGURE +commands issued against +the "created" filesystem instance +will also affect any mount objects associated with +the extant filesystem instance. +.IP +Programs that need to ensure +that they create a new filesystem instance +with specific parameters +(notably, security-related parameters +such as +.I acl +to enable POSIX ACLs\[em]\c +as described in +.BR acl (5)) +should use +.B \%FSCONFIG_CMD_CREATE_EXCL +instead. +.TP +.BR FSCONFIG_CMD_CREATE_EXCL " (since Linux 6.6)" +.\" commit 22ed7ecdaefe0cac0c6e6295e83048af60435b13 +.\" commit 84ab1277ce5a90a8d1f377707d662ac43cc0918a +As with +.BR \%FSCONFIG_CMD_CREATE , +except that the kernel is instructed +to not reuse extant filesystem instances. +If the operation +would be forced to +reuse an extant filesystem instance, +this operation will return +.B EBUSY +instead. 
+.IP +As a result (unlike +.BR \%FSCONFIG_CMD_CREATE ), +if this operation succeeds +then the calling process can be sure that +all of the parameters successfully configured with +.BR fsconfig () +will actually be applied +to the created filesystem instance. +.TP +.B FSCONFIG_CMD_RECONFIGURE +This command instructs the filesystem driver +to apply the parameters specified in the filesystem configuration context +to the extant filesystem instance +referenced by the filesystem configuration context. +.I key +and +.I value +must be NULL, +and +.I aux +must be 0. +.IP +This is primarily intended for use with +.BR fspick (2), +but may also be used to modify +the parameters of a filesystem instance +after +.B \%FSCONFIG_CMD_CREATE +was used to create it +and a mount object was created using +.BR fsmount (2). +In order to reconfigure an extant filesystem instance, +the calling process must have the +.B CAP_SYS_ADMIN +capability. +.IP +If the operation succeeds, +the filesystem context is reset +but remains in reconfiguration mode +and thus can be reused for subsequent +.B \%FSCONFIG_CMD_RECONFIGURE +commands. +If the operation fails, +in most cases +the filesystem context is placed in a failed mode +and cannot be used for any further +.BR fsconfig () +operations +(though you may still retrieve diagnostic messages +through the message retrieval interface, +as described in +the corresponding subsection of +.BR fsopen (2)). +.RE +.P +Parameters specified with +.BI FSCONFIG_SET_ * +do not take effect +until a corresponding +.B \%FSCONFIG_CMD_CREATE +or +.B \%FSCONFIG_CMD_RECONFIGURE +command is issued. +.SH RETURN VALUE +On success, +.BR fsconfig () +returns 0. +On error, \-1 is returned, and +.I errno +is set to indicate the error. +.SH ERRORS +If an error occurs, the filesystem driver may provide +additional information about the error +through the message retrieval interface for filesystem configuration contexts.
+This additional information can be retrieved at any time by calling +.BR read (2) +on the filesystem instance or filesystem configuration context +referenced by the file descriptor +.IR fd . +(See the "Message retrieval interface" subsection in +.BR fsopen (2) +for more details on the message format.) +.P +Even after an error occurs, +the filesystem configuration context is +.I not +invalidated, +and thus can still be used with other +.BR fsconfig () +commands. +This means that users can probe support for filesystem parameters +on a per-parameter basis, +and adjust which parameters they wish to set. +.P +The error values given below result from +filesystem type independent errors. +Each filesystem type may have its own special errors +and its own special behavior. +See the Linux kernel source code for details. +.TP +.B EACCES +A component of a path +provided as a path parameter +was not searchable. +(See also +.BR path_resolution (7).) +.TP +.B EACCES +.B \%FSCONFIG_CMD_CREATE +was attempted +for a read-only filesystem +without specifying the +.RB ' ro ' +flag parameter. +.TP +.B EACCES +A specified block device parameter +is located on a filesystem +mounted with the +.B \%MS_NODEV +option. +.TP +.B EBADF +The file descriptor given by +.I fd +(or possibly by +.IR aux , +depending on the command) +is invalid. +.TP +.B EBUSY +The filesystem context associated with +.I fd +is in the wrong state +for the given command. +.TP +.B EBUSY +The filesystem instance cannot be reconfigured as read-only +with +.B \%FSCONFIG_CMD_RECONFIGURE +because some programs +still hold files open for writing. +.TP +.B EBUSY +A new filesystem instance was requested with +.B \%FSCONFIG_CMD_CREATE_EXCL +but a matching superblock already existed. +.TP +.B EFAULT +One of the pointer arguments +points to a location +outside the calling process's accessible address space. +.TP +.B EINVAL +.I fd +does not refer to +a filesystem configuration context +or filesystem instance. 
+.TP +.B EINVAL +At least one of +.IR key , +.IR value , +and +.I aux +was set to a non-zero value when +.I cmd +required that it be zero +(or NULL). +.TP +.B EINVAL +The parameter named by +.I key +cannot be set +using the type specified with +.IR cmd . +.TP +.B EINVAL +One of the source parameters +referred to +an invalid superblock. +.TP +.B ELOOP +Too many links encountered +during pathname resolution +of a path argument. +.TP +.B ENAMETOOLONG +A path argument was longer than +.BR PATH_MAX . +.TP +.B ENOENT +A path argument had a non-existent component. +.TP +.B ENOENT +A path argument is an empty string, +but +.I cmd +is not +.BR \%FSCONFIG_SET_PATH_EMPTY . +.TP +.B ENOMEM +The kernel could not allocate sufficient memory to complete the operation. +.TP +.B ENOTBLK +The parameter named by +.I key +must be a block device, +but the provided parameter value was not a block device. +.TP +.B ENOTDIR +A component of the path prefix +of a path argument +was not a directory. +.TP +.B EOPNOTSUPP +The command given by +.I cmd +is not valid. +.TP +.B ENXIO +The major number +of a block device parameter +is out of range. +.TP +.B EPERM +The command given by +.I cmd +was +.BR \%FSCONFIG_CMD_CREATE , +.BR \%FSCONFIG_CMD_CREATE_EXCL , +or +.BR \%FSCONFIG_CMD_RECONFIGURE , +but the calling process does not have the required +.B \%CAP_SYS_ADMIN +capability. +.SH STANDARDS +Linux. +.SH HISTORY +Linux 5.2. +.\" commit ecdab150fddb42fe6a739335257949220033b782 +.\" commit 400913252d09f9cfb8cce33daee43167921fc343 +glibc 2.36. +.SH NOTES +.SS Generic filesystem parameters +Each filesystem driver is responsible for +parsing most parameters specified with +.BR fsconfig (), +meaning that individual filesystems +may have very different behaviour +when encountering parameters with the same name.
+In general, +you should not assume that the behaviour of +.BR fsconfig () +when specifying a parameter to one filesystem type +will match the behaviour of the same parameter +with a different filesystem type. +.P +However, +the following generic parameters +apply to all filesystems and have unified behaviour. +They are set using the listed +.BI \%FSCONFIG_SET_ * +command. +.TP +\fIro\fP and \fIrw\fP (\fB\%FSCONFIG_SET_FLAG\fP) +Configure whether the filesystem instance is read-only. +.TP +\fIdirsync\fP (\fB\%FSCONFIG_SET_FLAG\fP) +Make directory changes on this filesystem instance synchronous. +.TP +\fIsync\fP and \fIasync\fP (\fB\%FSCONFIG_SET_FLAG\fP) +Configure whether writes on this filesystem instance +will be made synchronous +(as though the +.B O_SYNC +flag to +.BR open (2) +was specified for +all file opens in this filesystem instance). +.TP +\fIlazytime\fP and \fInolazytime\fP (\fB\%FSCONFIG_SET_FLAG\fP) +Configure whether to reduce on-disk updates +of inode timestamps on this filesystem instance +(as described in the +.B \%MS_LAZYTIME +section of +.BR mount (2)). +.TP +\fImand\fP and \fInomand\fP (\fB\%FSCONFIG_SET_FLAG\fP) +Configure whether the filesystem instance should permit mandatory locking. +Since Linux 5.15, +.\" commit f7e33bdbd6d1bdf9c3df8bba5abcf3399f957ac3 +mandatory locking has been deprecated +and setting this flag is a no-op. +.TP +\fIsource\fP (\fB\%FSCONFIG_SET_STRING\fP) +This parameter is equivalent to the +.I source +parameter passed to +.BR mount (2) +for the same filesystem type, +and is usually the pathname of a block device +containing the filesystem. +This parameter may only be set once +per filesystem configuration context transaction. +.P +In addition, +any filesystem parameters associated with +Linux Security Modules (LSMs) +are also generic with respect to the underlying filesystem. +See the documentation for the LSM you wish to configure for more details. 
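As a rough illustration of the generic parameters listed above, the following C sketch restates them as a lookup table from parameter name to the FSCONFIG_SET_* command this page associates with each one. The names generic_param, generic_params, and generic_param_cmd() are hypothetical and exist only for this example. The FSCONFIG_* constants are assumed to come from the kernel's <linux/mount.h> UAPI header rather than from the libc header named in the SYNOPSIS.

```c
#include <assert.h>
#include <linux/mount.h>    /* FSCONFIG_SET_FLAG, FSCONFIG_SET_STRING */
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a generic parameter name and the
   FSCONFIG_SET_* command that the list above pairs with it. */
struct generic_param {
    const char  *key;
    int          cmd;
};

/* This table only restates the generic parameters documented above;
   it is not taken from the kernel sources. */
static const struct generic_param generic_params[] = {
    { "ro",          FSCONFIG_SET_FLAG   },
    { "rw",          FSCONFIG_SET_FLAG   },
    { "dirsync",     FSCONFIG_SET_FLAG   },
    { "sync",        FSCONFIG_SET_FLAG   },
    { "async",       FSCONFIG_SET_FLAG   },
    { "lazytime",    FSCONFIG_SET_FLAG   },
    { "nolazytime",  FSCONFIG_SET_FLAG   },
    { "mand",        FSCONFIG_SET_FLAG   },
    { "nomand",      FSCONFIG_SET_FLAG   },
    { "source",      FSCONFIG_SET_STRING },
};

/* Return the FSCONFIG_SET_* command for a generic parameter,
   or -1 if key is not one of the generic parameters. */
int
generic_param_cmd(const char *key)
{
    size_t  n = sizeof(generic_params) / sizeof(generic_params[0]);

    assert(key != NULL);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(generic_params[i].key, key) == 0)
            return generic_params[i].cmd;
    }
    return -1;
}
```

A caller might consult such a table before issuing fsconfig(), and treat a return value of -1 as a sign that the parameter is filesystem-specific and that the driver's own documentation has to be checked.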
+.SH CAVEATS +.SS Filesystem parameter types +As a result of +each filesystem driver being responsible for +parsing most parameters specified with +.BR fsconfig (), +some filesystem drivers +may have unintuitive behaviour +with regard to which +.BI \%FSCONFIG_SET_ * +commands are permitted +to configure a given parameter. +.P +In order for +filesystem parameters to be backwards compatible with +.BR mount (2), +they must be parseable as strings; +this almost universally means that +.B \%FSCONFIG_SET_STRING +can also be used to configure them. +.\" Aleksa Sarai +.\" Theoretically, a filesystem could check fc->oldapi and refuse +.\" FSCONFIG_SET_STRING if the operation is coming from the new API, but no +.\" filesystems do this (and probably never will). +However, other +.BI \%FSCONFIG_SET_ * +commands need to be opted into +by each filesystem driver's parameter parser. +.P +One of the most user-visible instances of +this inconsistency is that +many filesystems do not support +configuring path parameters with +.B \%FSCONFIG_SET_PATH +(despite the name), +which can lead to somewhat confusing +.B EINVAL +errors. +(For example, the generic +.I source +parameter\[em]\c +which is usually a path\[em]\c +can only be configured +with +.BR \%FSCONFIG_SET_STRING .) +.P +When writing programs that use +.BR fsconfig () +to configure parameters +with commands other than +.BR \%FSCONFIG_SET_STRING , +users should verify +that the +.BI \%FSCONFIG_SET_ * +commands used to configure each parameter +are supported by the corresponding filesystem driver. +.\" Aleksa Sarai +.\" While this (quite confusing) inconsistency in behaviour is true today +.\" (and has been true since this was merged), this appears to mostly be an +.\" unintended consequence of filesystem drivers hand-coding fsparam parsing. +.\" Path parameters are the most egregious causes of confusion. +.\" Hopefully we can make this no longer the case in a future kernel.
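One possible way to cope with the path-parameter inconsistency described above is to try FSCONFIG_SET_PATH first and to retry with FSCONFIG_SET_STRING when the kernel reports EINVAL. The C sketch below is only an illustration under stated assumptions: the helper names raw_fsconfig() and set_path_param() are hypothetical, the system call is invoked through syscall(2) so that the example does not depend on a libc that ships an fsconfig() wrapper, and the FSCONFIG_* constants are assumed to come from <linux/mount.h>. Since EINVAL can have other causes, the fallback is a heuristic rather than a reliable probe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>          /* AT_FDCWD */
#include <linux/mount.h>    /* FSCONFIG_SET_PATH, FSCONFIG_SET_STRING */
#include <stddef.h>
#include <sys/syscall.h>    /* SYS_fsconfig */
#include <unistd.h>         /* syscall() */

/* Hypothetical thin wrapper over the raw system call. */
static int
raw_fsconfig(int fd, unsigned int cmd, const char *key,
             const void *value, int aux)
{
    return (int) syscall(SYS_fsconfig, fd, cmd, key, value, aux);
}

/* Hypothetical helper: set a path parameter, preferring
   FSCONFIG_SET_PATH and falling back to FSCONFIG_SET_STRING.
   Returns 0 on success; on error returns -1 with errno set
   by the last fsconfig() attempt. */
int
set_path_param(int fsfd, const char *key, const char *path)
{
    assert(key != NULL && path != NULL);

    if (raw_fsconfig(fsfd, FSCONFIG_SET_PATH, key, path, AT_FDCWD) == 0)
        return 0;

    /* Any error other than EINVAL is reported as-is. */
    if (errno != EINVAL)
        return -1;

    /* EINVAL may mean that the driver has not opted into
       FSCONFIG_SET_PATH for this key (an assumption; EINVAL has
       other causes too). Retry with the string form, which this
       page describes as equivalent to FSCONFIG_SET_PATH with
       aux set to AT_FDCWD. */
    return raw_fsconfig(fsfd, FSCONFIG_SET_STRING, key, path, 0);
}
```

For example, set_path_param(fsfd, "source", "/dev/loop0") would end up using FSCONFIG_SET_STRING on a filesystem that rejects FSCONFIG_SET_PATH for the source parameter. The retry relies on the statement in ERRORS that a failed parameter command does not invalidate the filesystem configuration context.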
+.SH EXAMPLES +To illustrate the different kinds of parameters that can be configured with +.BR fsconfig (), +here are a few examples of some different filesystems being created: +.P +.in +4n +.EX +int fsfd, mntfd; +\& +fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC); +fsconfig(fsfd, FSCONFIG_SET_FLAG, "inode64", NULL, 0); +fsconfig(fsfd, FSCONFIG_SET_STRING, "uid", "1234", 0); +fsconfig(fsfd, FSCONFIG_SET_STRING, "huge", "never", 0); +fsconfig(fsfd, FSCONFIG_SET_FLAG, "casefold", NULL, 0); +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_NOEXEC); +move_mount(mntfd, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH); +\& +fsfd = fsopen("erofs", FSOPEN_CLOEXEC); +fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "/dev/loop0", 0); +fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0); +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0); +fsconfig(fsfd, FSCONFIG_CMD_CREATE_EXCL, NULL, NULL, 0); +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_NOSUID); +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH); +.EE +.in +.P +Usually, +specifying the same parameter named by +.I key +multiple times with +.BR fsconfig () +causes the parameter value to be replaced.
+However, some filesystems may have unique behaviour: +.P +.in +4n +.EX +\& +int fsfd, mntfd; +int lowerdirfd = open("/o/ctr/lower1", O_DIRECTORY | O_CLOEXEC); +\& +fsfd = fsopen("overlay", FSOPEN_CLOEXEC); +/* "lowerdir+" appends to the lower dir stack each time */ +fsconfig(fsfd, FSCONFIG_SET_FD, "lowerdir+", NULL, lowerdirfd); +fsconfig(fsfd, FSCONFIG_SET_STRING, "lowerdir+", "/o/ctr/lower2", 0); +fsconfig(fsfd, FSCONFIG_SET_STRING, "lowerdir+", "/o/ctr/lower3", 0); +fsconfig(fsfd, FSCONFIG_SET_STRING, "lowerdir+", "/o/ctr/lower4", 0); +.\" fsconfig(fsfd, FSCONFIG_SET_PATH, "lowerdir+", "/o/ctr/lower5", AT_FDCWD); +.\" fsconfig(fsfd, FSCONFIG_SET_PATH_EMPTY, "lowerdir+", "", lowerdirfd); +.\" Aleksa Sarai: Hopefully these will also be supported in the future. +fsconfig(fsfd, FSCONFIG_SET_STRING, "xino", "auto", 0); +fsconfig(fsfd, FSCONFIG_SET_STRING, "nfs_export", "off", 0); +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0); +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH); +.EE +.in +.P +And here is an example of how +.BR fspick (2) +can be used with +.BR fsconfig () +to reconfigure the parameters +of an extant filesystem instance +attached to +.IR /proc : +.P +.in +4n +.EX +int fsfd = fspick(AT_FDCWD, "/proc", FSPICK_CLOEXEC); +fsconfig(fsfd, FSCONFIG_SET_STRING, "hidepid", "ptraceable", 0); +fsconfig(fsfd, FSCONFIG_SET_STRING, "subset", "pid", 0); +fsconfig(fsfd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0); +.EE +.in +.SH SEE ALSO +.BR fsmount (2), +.BR fsopen (2), +.BR fspick (2), +.BR mount (2), +.BR mount_setattr (2), +.BR move_mount (2), +.BR open_tree (2), +.BR mount_namespaces (7) --=20 2.51.0 From nobody Thu Oct 2 02:15:09 2025 Received: from mout-p-102.mailbox.org (mout-p-102.mailbox.org [80.241.56.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 12D393126D5; Wed, 24
Sep 2025 15:32:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=80.241.56.152 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758727944; cv=none; b=mAi2lLbjrLnWKxI3bLNv7qX4s6sx4l10hK+dyurCRxOBNOjnwuon3bFB5f2VCZ0NryDcBvq0CSWjHYWFT8SBstUMDAgJ3hSE1mMfGu5NehgP9Y5OYvvHGIHP8AFJiavDE0S3Mhz5F5C1NFIpAZpigbNLGGD79zaElaQREif5L10= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758727944; c=relaxed/simple; bh=ChneMzLle+I19p0Vy3bqIHW0I1w/iBP3d0DvcrDAiis=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=nCCIRib+NssDJOseDfcP77hEGSCR/OZn5Rr7zu3jRH1u1wyxyA7B9haPxmxMAv7RxWBSNNLzmsEr6pOjrpqa0ZS7LEwIG+BcE4QqEsUtPKhPN19IKRQrOYMBJCq8SaHM6C0gr1dCmu4er4TM9EmTlfFqKv1tcODqS/B4zxlGMeU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=cyphar.com; spf=pass smtp.mailfrom=cyphar.com; dkim=pass (2048-bit key) header.d=cyphar.com header.i=@cyphar.com header.b=m03d7NDb; arc=none smtp.client-ip=80.241.56.152 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=cyphar.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cyphar.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cyphar.com header.i=@cyphar.com header.b="m03d7NDb" Received: from smtp202.mailbox.org (smtp202.mailbox.org [IPv6:2001:67c:2050:b231:465::202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-102.mailbox.org (Postfix) with ESMTPS id 4cX1762FwCz9t39; Wed, 24 Sep 2025 17:32:18 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cyphar.com; s=MBO0001; t=1758727938; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: 
to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DGi8kxALXCToQwD2Q26NEL4I+n6LW3bxP7Ez9xNeJ+w=; b=m03d7NDbx7o4/0bHDp9o5KauXoldB5cOHtpl6MC/m2e0KDF1SktyXQsBqbTha020zSA5Ei 68DTrNItfoRf6cqUaD6vWOgP7BCoNjgGzzP7j6gwI5Vhb5iGAnCXMKUJt5nCujLR7VtfMz RFlYs+u+nVWmaubhOMvenru+NGGEImMGNSvPWG/OgetVLCfEB967Iu12L3Vabi+IJHGTs8 UCE10ywzf4Sx3Y2misi/ZeBlFLLR63KG4JMEoJcDOScRSuwKO/9mDlYYFnri86pHht/+Av 4v9YBPXlzooGvwV2GjRUAWqhXOVK02Pj1HeKonbXUyk2FVuzaXFUyP0iS4mNSQ== Authentication-Results: outgoing_mbo_mout; dkim=none; spf=pass (outgoing_mbo_mout: domain of cyphar@cyphar.com designates 2001:67c:2050:b231:465::202 as permitted sender) smtp.mailfrom=cyphar@cyphar.com From: Aleksa Sarai Date: Thu, 25 Sep 2025 01:31:26 +1000 Subject: [PATCH v5 4/8] man/man2/fsmount.2: document "new" mount API Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20250925-new-mount-api-v5-4-028fb88023f2@cyphar.com> References: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com> In-Reply-To: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com> To: Alejandro Colomar Cc: "Michael T. Kerrisk" , Alexander Viro , Jan Kara , Askar Safin , "G. 
Branden Robinson" , linux-man@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, David Howells , Christian Brauner , Aleksa Sarai X-Developer-Signature: v=1; a=openpgp-sha256; l=6673; i=cyphar@cyphar.com; h=from:subject:message-id; bh=ChneMzLle+I19p0Vy3bqIHW0I1w/iBP3d0DvcrDAiis=; b=owGbwMvMwCWmMf3Xpe0vXfIZT6slMWRc4bv7/oLh40/6/Uwv9wlvYZ6z8Gyd1cJ+pyv9y9JcH mnN2B4s0DGRhUGMi8FSTJFlm59n6Kb5i68kf1rJBjOHlQlkiLRIAwMDAwMLA19uYl6pkY6Rnqm2 oZ6hoY6RjhEDF6cATLXNKob/+fdj306v1+xKO9+kxJnwnuvEtM0nDl9U7jTi2fPcOepBEsNfiQr tDf/ZpwbOe1DLM/2V3CvjFraTG/Xfx06Zdiv0fMYhRgA= X-Developer-Key: i=cyphar@cyphar.com; a=openpgp; fpr=C9C370B246B09F6DBCFC744C34401015D1D2D386 X-Rspamd-Queue-Id: 4cX1762FwCz9t39 This is loosely based on the original documentation written by David Howells and later maintained by Christian Brauner, but has been rewritten to be more from a user perspective (as well as fixing a few critical mistakes). Co-authored-by: David Howells Signed-off-by: David Howells Co-authored-by: Christian Brauner Signed-off-by: Christian Brauner Signed-off-by: Aleksa Sarai --- man/man2/fsmount.2 | 231 +++++++++++++++++++++++++++++++++++++++++++++++++= ++++ 1 file changed, 231 insertions(+) diff --git a/man/man2/fsmount.2 b/man/man2/fsmount.2 new file mode 100644 index 0000000000000000000000000000000000000000..b62850a68443bb8f6178389eb6c= b1a5f9029ab30 --- /dev/null +++ b/man/man2/fsmount.2 @@ -0,0 +1,231 @@ +.\" Copyright, the authors of the Linux man-pages project +.\" +.\" SPDX-License-Identifier: Linux-man-pages-copyleft +.\" +.TH fsmount 2 (date) "Linux man-pages (unreleased)" +.SH NAME +fsmount \- instantiate mount object from filesystem context +.SH LIBRARY +Standard C library +.RI ( libc ,\~ \-lc ) +.SH SYNOPSIS +.nf +.B #include +.P +.BI "int fsmount(int " fsfd ", unsigned int " flags \ +", unsigned int " attr_flags ); +.fi +.SH DESCRIPTION +The +.BR fsmount () +system call is part of +the suite of file-descriptor-based mount 
facilities in Linux. +.P +.BR fsmount () +creates a new detached mount object +for the root of the new filesystem instance +referenced by the filesystem context file descriptor +.IR fsfd . +A new file descriptor +associated with the detached mount object +is then returned. +In order to create a mount object with +.BR fsmount (), +the calling process must have the +.B \%CAP_SYS_ADMIN +capability. +.P +The filesystem context must have been created with a call to +.BR fsopen (2) +and then had a filesystem instance instantiated with a call to +.BR fsconfig (2) +with +.B \%FSCONFIG_CMD_CREATE +or +.B \%FSCONFIG_CMD_CREATE_EXCL +in order to be in the correct state +for this operation +(the "awaiting-mount" mode in kernel-developer parlance). +.\" FS_CONTEXT_AWAITING_MOUNT is the term the kernel uses for this. +Unlike +.BR open_tree (2) +with +.BR \%OPEN_TREE_CLONE , +.BR fsmount () +can only be called once +in the lifetime of a filesystem context +to produce a mount object. +.P +As with file descriptors returned from +.BR open_tree (2) +called with +.BR OPEN_TREE_CLONE , +the returned file descriptor +can then be used with +.BR move_mount (2), +.BR mount_setattr (2), +or other such system calls to do further mount operations. +This mount object will be unmounted and destroyed +when the file descriptor is closed +if it was not otherwise attached to a mount point +by calling +.BR move_mount (2). +(Note that the unmount operation on +.BR close (2) +is lazy\[em]akin to calling +.BR umount2 (2) +with +.BR MNT_DETACH ; +any existing open references to files +from the mount object +will continue to work, +and the mount object will only be completely destroyed +once it ceases to be busy.) +The returned file descriptor +also acts the same as one produced by +.BR open (2) +with +.BR O_PATH , +meaning it can also be used as a +.I dirfd +argument +to "*at()" system calls. +.P +.I flags +controls the creation of the returned file descriptor. 
+A value for +.I flags +is constructed by bitwise ORing +zero or more of the following constants: +.RS +.TP +.B FSMOUNT_CLOEXEC +Set the close-on-exec +.RB ( FD_CLOEXEC ) +flag on the new file descriptor. +See the description of the +.B O_CLOEXEC +flag in +.BR open (2) +for reasons why this may be useful. +.RE +.P +.I attr_flags +specifies mount attributes +which will be applied to the created mount object, +in the form of +.BI \%MOUNT_ATTR_ * +flags. +The flags are interpreted as though +.BR mount_setattr (2) +was called with +.I attr.attr_set +set to the same value as +.IR attr_flags . +.BI \%MOUNT_ATTR_ * +flags which would require +specifying additional fields in +.BR mount_attr (2type) +(such as +.BR \%MOUNT_ATTR_IDMAP ) +are not valid flag values for +.IR attr_flags . +.P +If the +.BR fsmount () +operation is successful, +the filesystem context +associated with the file descriptor +.I fsfd +is reset +and placed into reconfiguration mode, +as if it were just returned by +.BR fspick (2). +You may continue to use +.BR fsconfig (2) +with the now-reset filesystem context, +including issuing the +.B \%FSCONFIG_CMD_RECONFIGURE +command +to reconfigure the filesystem instance. +.SH RETURN VALUE +On success, a new file descriptor is returned. +On error, \-1 is returned, and +.I errno +is set to indicate the error. +.SH ERRORS +.TP +.B EBUSY +The filesystem context associated with +.I fsfd +is not in the right state +to be used by +.BR fsmount (). +.TP +.B EINVAL +.I flags +had an invalid flag set. +.TP +.B EINVAL +.I attr_flags +had an invalid +.BI MOUNT_ATTR_ * +flag set. +.TP +.B EMFILE +The calling process has too many open files to create more. +.TP +.B ENFILE +The system has too many open files to create more. +.TP +.B ENOSPC +The "anonymous" mount namespace +necessary to contain the new mount object +could not be allocated, +as doing so would exceed +the configured per-user limit on +the number of mount namespaces in the current user namespace. 
+(See also +.BR namespaces (7).) +.TP +.B ENOMEM +The kernel could not allocate sufficient memory to complete the operation. +.TP +.B EPERM +The calling process does not have the required +.B CAP_SYS_ADMIN +capability. +.SH STANDARDS +Linux. +.SH HISTORY +Linux 5.2. +.\" commit 93766fbd2696c2c4453dd8e1070977e9cd4e6b6d +.\" commit 400913252d09f9cfb8cce33daee43167921fc343 +glibc 2.36. +.SH EXAMPLES +.in +4n +.EX +int fsfd, mntfd, tmpfd; +\& +fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC); +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, + MOUNT_ATTR_NODEV | MOUNT_ATTR_NOEXEC); +\& +/* Create a new file without attaching the mount object */ +tmpfd = openat(mntfd, "tmpfile", O_CREAT | O_EXCL | O_RDWR, 0600); +unlinkat(mntfd, "tmpfile", 0); +\& +/* Attach the mount object to "/tmp" */ +move_mount(mntfd, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH); +.EE +.in +.SH SEE ALSO +.BR fsconfig (2), +.BR fsopen (2), +.BR fspick (2), +.BR mount (2), +.BR mount_setattr (2), +.BR move_mount (2), +.BR open_tree (2), +.BR mount_namespaces (7) --=20 2.51.0 From nobody Thu Oct 2 02:15:09 2025 Received: from mout-p-101.mailbox.org (mout-p-101.mailbox.org [80.241.56.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C2FE42FE570; Wed, 24 Sep 2025 15:32:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=80.241.56.151 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758727951; cv=none; b=AAo5DrCcS1gHQ04++VQyiiPqxZiQa3fSj76zxMihr2lC9ZBvKiA78CLYS0y8ZR7ZMaFxtk+edcJ+v8PER8vI6r6TZ9zZWAqqRe58Czp27C/x5BRXLC5TQaBQV+mseQA9nJsXY8sdv1MjwtsHeOiBDFVg4AXqrl+aiU0+tfQZ2XM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758727951; c=relaxed/simple; bh=hZeBAV6BRGIt/pxlV1mBTum5w55VwRzgSVOfIVuiZeQ=;
h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=dgysha4Vozea97D+F38b0K97Hh1uIGRTQ3RII+Q9cCAgkagxt0XtCUuixE+kxlQKlHpoR6p8gAqG1Yzgz2iioZSMtOrkavPSau2nViZzoQLWPcMzdId/+iG57RhxXSGoQtg3KNdZry4YHXiUAUbYIuV4q8RjQ3YWmwDQIDGwMPw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=cyphar.com; spf=pass smtp.mailfrom=cyphar.com; dkim=pass (2048-bit key) header.d=cyphar.com header.i=@cyphar.com header.b=ZKTeUJK1; arc=none smtp.client-ip=80.241.56.151 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=cyphar.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cyphar.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cyphar.com header.i=@cyphar.com header.b="ZKTeUJK1" Received: from smtp202.mailbox.org (smtp202.mailbox.org [10.196.197.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-101.mailbox.org (Postfix) with ESMTPS id 4cX17F0SC4z9t8j; Wed, 24 Sep 2025 17:32:25 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cyphar.com; s=MBO0001; t=1758727945; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=42csQVRg7CHV2e4M9yMyFFNIL29EBlWC/8JHuPeIQDo=; b=ZKTeUJK1hUT36L/DsvJiCvxp9xLolH32e073oMq4SUHPKefT6g0gOshPwMPODW4ZlSWj/r OaQDxox/swiYuW8bs3ynMxnAzl/Vgr8DGJm08+bSxJiYE0tjXqUTjDzR4jWZo/vUi8DX/r O/PF3sRkQf1Jpb0OI7XFj4RDDOd0giUQnUwhPzbcMUYxry+C/Ti+/yxKGVGmJ6opIDYnqM Quubo99h+6rF0suZpf1knU4WrK9+1Nu8Gx6Axl3uPJa4m7xts14EebUBnrNguYk6KYLlEb qr6sJD0eT0g4WoCrvN9pcxi3FwknpbgFisIcaTdIEswE7o6jffMChxfujmO4WQ== From: Aleksa Sarai Date: Thu, 25 Sep 2025 01:31:27 
+1000 Subject: [PATCH v5 5/8] man/man2/move_mount.2: document "new" mount API Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20250925-new-mount-api-v5-5-028fb88023f2@cyphar.com> References: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com> In-Reply-To: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com> To: Alejandro Colomar Cc: "Michael T. Kerrisk" , Alexander Viro , Jan Kara , Askar Safin , "G. Branden Robinson" , linux-man@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, David Howells , Christian Brauner , Aleksa Sarai X-Developer-Signature: v=1; a=openpgp-sha256; l=15621; i=cyphar@cyphar.com; h=from:subject:message-id; bh=hZeBAV6BRGIt/pxlV1mBTum5w55VwRzgSVOfIVuiZeQ=; b=owGbwMvMwCWmMf3Xpe0vXfIZT6slMWRc4bvrWvf3g/FavlMmanmSP3S3LrAqe6jn37zoY2Plk QfKZr/qOiayMIhxMViKKbJs8/MM3TR/8ZXkTyvZYOawMoEMkRZpYGBgYGBh4MtNzCs10jHSM9U2 1DM01DHSMWLg4hSAqY57wvA/xKEh7IhATpps9Y7yGsPZpxkd8mujnjickKyafbZzZ+Yahv9Z3ut /ffywwdtdSPnwHQmbcCGG0x1T9TkitRz8fkrV7OAAAA== X-Developer-Key: i=cyphar@cyphar.com; a=openpgp; fpr=C9C370B246B09F6DBCFC744C34401015D1D2D386 This is loosely based on the original documentation written by David Howells and later maintained by Christian Brauner, but has been rewritten to be more from a user perspective (as well as fixing a few critical mistakes). 
Co-authored-by: David Howells
Signed-off-by: David Howells
Co-authored-by: Christian Brauner
Signed-off-by: Christian Brauner
Signed-off-by: Aleksa Sarai
---
 man/man2/move_mount.2 | 646 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 646 insertions(+)

diff --git a/man/man2/move_mount.2 b/man/man2/move_mount.2
new file mode 100644
index 0000000000000000000000000000000000000000..f954f36c43c444afb167088cc665607dfeb10676
--- /dev/null
+++ b/man/man2/move_mount.2
@@ -0,0 +1,646 @@
+.\" Copyright, the authors of the Linux man-pages project +.\" +.\" SPDX-License-Identifier: Linux-man-pages-copyleft +.\" +.TH move_mount 2 (date) "Linux man-pages (unreleased)" +.SH NAME +move_mount \- move or attach mount object to filesystem +.SH LIBRARY +Standard C library +.RI ( libc ,\~ \-lc ) +.SH SYNOPSIS +.nf +.BR "#include <fcntl.h>" " /* Definition of " AT_* " constants */" +.B #include <sys/mount.h> +.P +.BI "int move_mount(int " from_dirfd ", const char *" from_path , +.BI " int " to_dirfd ", const char *" to_path , +.BI " unsigned int " flags ); +.fi +.SH DESCRIPTION +The +.BR move_mount () +system call is part of +the suite of file-descriptor-based mount facilities in Linux. +.P +.BR move_mount () +moves the mount object indicated by +.I from_dirfd +and +.I from_path +to the path indicated by +.I to_dirfd +and +.IR to_path . +The mount object being moved +can be an existing mount point in the current mount namespace, +or a detached mount object created by +.BR fsmount (2) +or +.BR open_tree (2) +with +.BR \%OPEN_TREE_CLONE . +.P +To access the source mount object +or the destination mount point, +no permissions are required on the object itself, +but if either pathname is supplied, +execute (search) permission is required +on all of the directories specified in +.I from_path +or +.IR to_path . +.P +The calling process must have the +.B \%CAP_SYS_ADMIN +capability in order to move or attach a mount object.
+.P +As with "*at()" system calls, +.BR move_mount () +uses the +.I from_dirfd +and +.I to_dirfd +arguments +in conjunction with the +.I from_path +and +.I to_path +arguments to determine the source and destination objects to operate on +(respectively), as follows: +.IP \[bu] 3 +If the pathname given in +.I *_path +is absolute, then +the corresponding +.I *_dirfd +is ignored. +.IP \[bu] +If the pathname given in +.I *_path +is relative and +the corresponding +.I *_dirfd +is the special value +.BR \%AT_FDCWD , +then +.I *_path +is interpreted relative to +the current working directory +of the calling process (like +.BR open (2)). +.IP \[bu] +If the pathname given in +.I *_path +is relative, +then it is interpreted relative to +the directory referred to by +the corresponding file descriptor +.I *_dirfd +(rather than relative to +the current working directory +of the calling process, +as is done by +.BR open (2) +for a relative pathname). +In this case, +the corresponding +.I *_dirfd +must be a directory +that was opened for reading +.RB ( O_RDONLY ) +or using the +.B O_PATH +flag. +.IP \[bu] +If +.I *_path +is an empty string, +and +.I flags +contains the appropriate +.BI \%MOVE_MOUNT_ * _EMPTY_PATH +flag, +then the corresponding file descriptor +.I *_dirfd +is operated on directly. +In this case, +the corresponding +.I *_dirfd +may refer to any type of file, +not just a directory. +.P +See +.BR openat (2) +for an explanation of why the +.I *_dirfd +arguments are useful. +.P +.I flags +can be used to control aspects of the path lookup +for both the source and destination objects, +as well as other properties of the mount operation. +A value for +.I flags +is constructed by bitwise ORing +zero or more of the following constants: +.RS +.TP +.B MOVE_MOUNT_F_EMPTY_PATH +If +.I from_path +is an empty string, operate on the file referred to by +.I from_dirfd +(which may have been obtained from +.BR open (2), +.BR fsmount (2), +or +.BR open_tree (2)). 
+In this case, +.I from_dirfd +may refer to any type of file, +not just a directory. +If +.I from_dirfd +is +.BR \%AT_FDCWD , +.BR move_mount () +will operate on the current working directory +of the calling process. +.IP +This is the most common mechanism +used to attach detached mount objects +produced by +.BR fsmount (2) +and +.BR open_tree (2) +to a mount point. +.TP +.B MOVE_MOUNT_T_EMPTY_PATH +As with +.BR \%MOVE_MOUNT_F_EMPTY_PATH , +except operating on +.I to_dirfd +and +.IR to_path . +.TP +.B MOVE_MOUNT_F_SYMLINKS +If +.I from_path +references a symbolic link, +then dereference it. +The default behaviour for +.BR move_mount () +is to +.I not follow +symbolic links. +.TP +.B MOVE_MOUNT_T_SYMLINKS +As with +.BR \%MOVE_MOUNT_F_SYMLINKS , +except operating on +.I to_dirfd +and +.IR to_path . +.TP +.B MOVE_MOUNT_F_NO_AUTOMOUNT +Do not automount the terminal ("basename") component of +.I \%from_path +if it is a directory that is an automount point. +This allows a mount object +that has an automount point at its root +to be moved +and prevents unintended triggering of an automount point. +This flag has no effect +if the automount point has already been mounted over. +.TP +.B MOVE_MOUNT_T_NO_AUTOMOUNT +As with +.BR \%MOVE_MOUNT_F_NO_AUTOMOUNT , +except operating on +.I to_dirfd +and +.IR to_path . +This allows an automount point to be manually mounted over. +.TP +.BR MOVE_MOUNT_SET_GROUP " (since Linux 5.15)" +Add the attached private-propagation mount object indicated by +.I to_dirfd +and +.I to_path +into the mount propagation "peer group" +of the attached non-private-propagation mount object indicated by +.I from_dirfd +and +.IR from_path . +.IP +Unlike other +.BR move_mount () +operations, +this operation does not move or attach any mount objects. +Instead, it only updates the metadata +of attached mount objects. 
+(Also, take careful note of +the argument order\[em]\c +the mount object being modified +by this operation is the one specified by +.I to_dirfd +and +.IR to_path .) +.IP +This makes it possible to first create a mount tree +consisting only of private mounts +and then configure the desired propagation layout afterwards. +(See the "SHARED SUBTREES" section of +.BR mount_namespaces (7) +for more information about mount propagation and peer groups.) +.TP +.BR MOVE_MOUNT_BENEATH " (since Linux 6.5)" +If the path indicated by +.I to_dirfd +and +.I to_path +is an existing mount object, +rather than attaching or moving the mount object +indicated by +.I from_dirfd +and +.I from_path +on top of the mount stack, +attach or move it beneath the current top mount +on the mount stack. +.IP +After using +.BR \%MOVE_MOUNT_BENEATH , +it is possible to +.BR umount (2) +the top mount +in order to reveal the mount object +which was attached beneath it earlier. +This allows for the seamless (and atomic) replacement +of intricate mount trees, +which can further be used +to "upgrade" a mount tree with a newer version. +.IP +This operation has several restrictions: +.RS +.IP \[bu] 3 +Mount objects cannot be attached beneath the filesystem root, +including cases where +the filesystem root was configured by +.BR chroot (2) +or +.BR pivot_root (2). +To mount beneath the filesystem root, +.BR pivot_root (2) +must be used. +.IP \[bu] +The target path indicated by +.I to_dirfd +and +.I to_path +must not be a detached mount object, +such as those produced by +.BR open_tree (2) +with +.B \%OPEN_TREE_CLONE +or +.BR fsmount (2). +.IP \[bu] +The current top mount +of the target path's mount stack +and its parent mount +must be in the calling process's mount namespace. +.IP \[bu] +The caller must have sufficient privileges +to unmount the top mount +of the target path's mount stack, +to prove they have privileges +to reveal the underlying mount. 
+.IP \[bu] +Mount propagation events triggered by this +.BR move_mount () +operation +(as described in +.BR mount_namespaces (7)) +are calculated based on the parent mount +of the current top mount +of the target path's mount stack. +.IP \[bu] +The target path's mount +cannot be an ancestor in the mount tree of +the source mount object. +.IP \[bu] +The source mount object +must not have any overmounts, +otherwise it would be possible to create "shadow mounts" +(i.e., two mounts mounted on the same parent mount at the same mount point). +.IP \[bu] +It is not possible to move a mount +beneath a top mount +if the parent mount +of the current top mount +propagates to the top mount itself. +Otherwise, +.B \%MOVE_MOUNT_BENEATH +would cause the mount object +to be propagated +to the top mount +from the parent mount, +defeating the purpose of using +.BR \%MOVE_MOUNT_BENEATH . +.IP \[bu] +It is not possible to move a mount +beneath a top mount +if the parent mount +of the current top mount +propagates to the mount object +being mounted beneath. +Otherwise, this would cause a similar propagation issue +to the previous point, +also defeating the purpose of using +.BR \%MOVE_MOUNT_BENEATH . +.RE +.RE +.P +If +.I from_dirfd +is a mount object file descriptor and +.BR move_mount () +is operating on it directly, +.I from_dirfd +will remain associated with the mount object after +.BR move_mount () +succeeds, +so you may repeatedly use +.I from_dirfd +with +.BR move_mount () +and/or "*at()" system calls +as many times as necessary. +.SH RETURN VALUE +On success, +.BR move_mount () +returns 0. +On error, \-1 is returned, and +.I errno +is set to indicate the error. +.SH ERRORS +.TP +.B EACCES +Search permission is denied +for one of the directories +in the path prefix of one of +.I from_path +or +.IR to_path . +(See also +.BR path_resolution (7).) +.TP +.B EBADF +One of +.I from_dirfd +or +.I to_dirfd +is not a valid file descriptor.
+.TP +.B EFAULT +One of +.I from_path +or +.I to_path +is NULL +or a pointer to a location +outside the calling process's accessible address space. +.TP +.B EINVAL +Invalid flag specified in +.IR flags . +.TP +.B EINVAL +The path indicated by +.I from_dirfd +and +.I from_path +is not a mount object. +.TP +.B EINVAL +The mount object type +of the source mount object and target inode +are not compatible +(i.e., the source is a file but the target is a directory, or vice-versa). +.TP +.B EINVAL +The source mount object or target path +are not in the calling process's mount namespace +(or an anonymous mount namespace of the calling process). +.TP +.B EINVAL +The source mount object's parent mount +has shared mount propagation, +and thus cannot be moved +(as described in +.BR mount_namespaces (7)). +.TP +.B EINVAL +The source mount has +.B MS_UNBINDABLE +child mounts +but the target path +resides on a mount tree with shared mount propagation, +which would otherwise cause the unbindable mounts to be propagated +(as described in +.BR mount_namespaces (7)). +.TP +.B EINVAL +.B \%MOVE_MOUNT_BENEATH +was attempted, +but one of the listed restrictions was violated. +.TP +.B ELOOP +Too many symbolic links encountered +when resolving one of +.I from_path +or +.IR to_path . +.TP +.B ENAMETOOLONG +One of +.I from_path +or +.I to_path +is longer than +.BR PATH_MAX . +.TP +.B ENOENT +A component of one of +.I from_path +or +.I to_path +does not exist. +.TP +.B ENOENT +One of +.I from_path +or +.I to_path +is an empty string, +but the corresponding +.BI MOVE_MOUNT_ * _EMPTY_PATH +flag is not specified in +.IR flags . +.TP +.B ENOTDIR +A component of the path prefix of one of +.I from_path +or +.I to_path +is not a directory, +or one of +.I from_path +or +.I to_path +is relative +and the corresponding +.I from_dirfd +or +.I to_dirfd +is a file descriptor referring to a file other than a directory. 
+.TP +.B ENOMEM +The kernel could not allocate sufficient memory to complete the operation. +.TP +.B EPERM +The calling process does not have the required +.B \%CAP_SYS_ADMIN +capability. +.SH STANDARDS +Linux. +.SH HISTORY +Linux 5.2. +.\" commit 2db154b3ea8e14b04fee23e3fdfd5e9d17fbc6ae +.\" commit 400913252d09f9cfb8cce33daee43167921fc343 +glibc 2.36. +.SH EXAMPLES +.BR move_mount () +can be used to move attached mounts like the following: +.P +.in +4n +.EX +move_mount(AT_FDCWD, "/a", AT_FDCWD, "/b", 0); +.EE +.in +.P +This would move the mount object mounted on +.I /a +to +.IR /b . +The above procedure is functionally equivalent to +the following mount operation +using +.BR mount (2): +.P +.in +4n +.EX +mount("/a", "/b", NULL, MS_MOVE, NULL); +.EE +.in +.P +.BR move_mount () +can also be used in conjunction with file descriptors returned from +.BR open_tree (2) +or +.BR open (2): +.P +.in +4n +.EX +int fd = open_tree(AT_FDCWD, "/mnt", 0); /* open("/mnt", O_PATH); */ +move_mount(fd, "", AT_FDCWD, "/mnt2", MOVE_MOUNT_F_EMPTY_PATH); +move_mount(fd, "", AT_FDCWD, "/mnt3", MOVE_MOUNT_F_EMPTY_PATH); +move_mount(fd, "", AT_FDCWD, "/mnt4", MOVE_MOUNT_F_EMPTY_PATH); +.EE +.in +.P +This would move the mount object mounted at +.I /mnt +to +.IR /mnt2 , +then +.IR /mnt3 , +and then +.IR /mnt4 .
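The snippets in this EXAMPLES section omit error handling for brevity. As a rough sketch only, the simple path-to-path move shown first could be wrapped as below. The helper name move_attached_mount() is hypothetical and not part of any library. The raw syscall(2) form is used on the assumption that the C library might predate the move_mount() wrapper, which HISTORY dates to glibc 2.36; with a newer C library, calling move_mount() directly should work the same way.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>          /* AT_FDCWD */
#include <stddef.h>         /* NULL */
#include <sys/syscall.h>    /* SYS_move_mount (needs Linux 5.2+ headers) */
#include <unistd.h>         /* syscall() */

/*
 * Hypothetical helper, not part of any library: move the mount object
 * attached at "from" so that it is attached at "to" instead.
 * Returns 0 on success, or a negated errno value on failure.
 */
static int
move_attached_mount(const char *from, const char *to)
{
    assert(from != NULL && to != NULL);

    /* flags == 0: no MOVE_MOUNT_* behaviour is requested, so both
       paths must be non-empty and are resolved relative to the
       current working directory if they are not absolute. */
    if (syscall(SYS_move_mount, AT_FDCWD, from, AT_FDCWD, to, 0U) == -1)
        return -errno;
    return 0;
}
```

A caller could compare the result against the codes listed under ERRORS. For instance, -EPERM suggests the caller lacks CAP_SYS_ADMIN, while -EINVAL may mean that the source path is not a mount object (among the other causes listed there).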
+.P +If the source mount object +indicated by +.I from_dirfd +and +.I from_path +is a detached mount object, +.BR move_mount () +can be used to attach it to a mount point: +.P +.in +4n +.EX +int fsfd, mntfd; +\& +fsfd = fsopen("ext4", FSOPEN_CLOEXEC); +fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "/dev/sda1", 0); +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0); +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_NODEV); +move_mount(mntfd, "", AT_FDCWD, "/home", MOVE_MOUNT_F_EMPTY_PATH); +.EE +.in +.P +This would create a new filesystem configuration context for ext4, +configure it, +create a detached mount object, +and then attach it to +.IR /home . +The above procedure is functionally equivalent to +the following mount operation +using +.BR mount (2): +.P +.in +4n +.EX +mount("/dev/sda1", "/home", "ext4", MS_NODEV, "user_xattr"); +.EE +.in +.P +The same operation also works with detached bind-mounts created with +.BR open_tree (2) +with +.BR OPEN_TREE_CLONE : +.P +.in +4n +.EX +int mntfd = open_tree(AT_FDCWD, "/home/cyphar", OPEN_TREE_CLONE); +move_mount(mntfd, "", AT_FDCWD, "/root", MOVE_MOUNT_F_EMPTY_PATH); +.EE +.in +.P +This would create a new bind-mount of +.I /home/cyphar +as a detached mount object, +and then attach it to +.IR /root .
+The above procedure is functionally equivalent to +the following mount operation +using +.BR mount (2): +.P +.in +4n +.EX +mount("/home/cyphar", "/root", NULL, MS_BIND, NULL); +.EE +.in +.SH SEE ALSO +.BR fsconfig (2), +.BR fsmount (2), +.BR fsopen (2), +.BR fspick (2), +.BR mount (2), +.BR mount_setattr (2), +.BR open_tree (2), +.BR mount_namespaces (7)
-- 
2.51.0

From: Aleksa Sarai
Date: Thu, 25 Sep 2025 01:31:28 +1000
Subject: [PATCH v5 6/8] man/man2/open_tree.2: document "new" mount API
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20250925-new-mount-api-v5-6-028fb88023f2@cyphar.com>
References: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
In-Reply-To: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
To: Alejandro Colomar
Cc: "Michael T. Kerrisk", Alexander Viro, Jan Kara, Askar Safin, "G. Branden Robinson", linux-man@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, David Howells, Christian Brauner, Aleksa Sarai

This is loosely based on the original documentation written by David Howells and later maintained by Christian Brauner, but has been rewritten to be more from a user perspective (as well as fixing a few critical mistakes).
Co-authored-by: David Howells
Signed-off-by: David Howells
Co-authored-by: Christian Brauner
Signed-off-by: Christian Brauner
Signed-off-by: Aleksa Sarai
---
 man/man2/open_tree.2 | 518 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 518 insertions(+)

diff --git a/man/man2/open_tree.2 b/man/man2/open_tree.2
new file mode 100644
index 0000000000000000000000000000000000000000..6b04a80927a8b6a394cf7ab341b8d6b29d42d304
--- /dev/null
+++ b/man/man2/open_tree.2
@@ -0,0 +1,518 @@
+.\" Copyright, the authors of the Linux man-pages project +.\" +.\" SPDX-License-Identifier: Linux-man-pages-copyleft +.\" +.TH open_tree 2 (date) "Linux man-pages (unreleased)" +.SH NAME +open_tree \- open path or create detached mount object and attach to fd +.SH LIBRARY +Standard C library +.RI ( libc ,\~ \-lc ) +.SH SYNOPSIS +.nf +.BR "#define _GNU_SOURCE " "/* See feature_test_macros(7) */" +.BR "#include <fcntl.h>" " /* Definition of " AT_* " constants */" +.B #include <sys/mount.h> +.P +.BI "int open_tree(int " dirfd ", const char *" path ", unsigned int " flags ); +.fi +.SH DESCRIPTION +The +.BR open_tree () +system call is part of +the suite of file-descriptor-based mount facilities in Linux. +.IP \[bu] 3 +If +.I flags +contains +.BR \%OPEN_TREE_CLONE , +.BR open_tree () +creates a detached mount object +which consists of a bind-mount of +the path specified by +.IR path . +A new file descriptor +associated with the detached mount object +is then returned. +The mount object is equivalent to a bind-mount +that would be created by +.BR mount (2) +called with +.BR \%MS_BIND , +except that it is tied to a file descriptor +and is not mounted onto the filesystem. +.IP +As with file descriptors returned from +.BR fsmount (2), +the resultant file descriptor can then be used with +.BR move_mount (2), +.BR mount_setattr (2), +or other such system calls to do further mount operations.
+.IP +This mount object will be unmounted and destroyed +when the file descriptor is closed +if it was not otherwise attached to a mount point +by calling +.BR move_mount (2). +This implicit unmount operation is lazy\[em]\c +akin to calling +.BR umount2 (2) +with +.BR \%MNT_DETACH ; +thus, +any existing open references to files +from the mount object +will continue to work, +and the mount object will only be completely destroyed +once it ceases to be busy. +.IP \[bu] +If +.I flags +does not contain +.BR \%OPEN_TREE_CLONE , +.BR open_tree () +returns a file descriptor +that is exactly equivalent to +one produced by +.BR openat (2) +when called with the same +.I dirfd +and +.IR path . +.P +In either case, the resultant file descriptor +acts the same as one produced by +.BR open (2) +with +.BR O_PATH , +meaning it can also be used as a +.I dirfd +argument to +"*at()" system calls. +However, +unlike +.BR open (2) +called with +.BR O_PATH , +automounts will +by default +be triggered by +.BR open_tree () +unless +.B \%AT_NO_AUTOMOUNT +is included in +.IR flags . +.P +As with "*at()" system calls, +.BR open_tree () +uses the +.I dirfd +argument in conjunction with the +.I path +argument to determine the path to operate on, as follows: +.IP \[bu] 3 +If the pathname given in +.I path +is absolute, then +.I dirfd +is ignored. +.IP \[bu] +If the pathname given in +.I path +is relative and +.I dirfd +is the special value +.BR \%AT_FDCWD , +then +.I path +is interpreted relative to +the current working directory +of the calling process (like +.BR open (2)). +.IP \[bu] +If the pathname given in +.I path +is relative, +then it is interpreted relative to +the directory referred to by the file descriptor +.I dirfd +(rather than relative to +the current working directory +of the calling process, +as is done by +.BR open (2) +for a relative pathname). +In this case, +.I dirfd +must be a directory +that was opened for reading +.RB ( \%O_RDONLY ) +or using the +.B O_PATH +flag. 
+.IP \[bu] +If +.I path +is an empty string, +and +.I flags +contains +.BR \%AT_EMPTY_PATH , +then the file descriptor +.I dirfd +is operated on directly. +In this case, +.I dirfd +may refer to any type of file, +not just a directory. +.P +See +.BR openat (2) +for an explanation of why the +.I dirfd +argument is useful. +.P +.I flags +can be used to control aspects of the path lookup +and properties of the returned file descriptor. +A value for +.I flags +is constructed by bitwise ORing +zero or more of the following constants: +.RS +.TP +.B \%AT_EMPTY_PATH +If +.I path +is an empty string, operate on the file referred to by +.I dirfd +(which may have been obtained from +.BR open (2), +.BR fsmount (2), +or from another +.BR open_tree () +call). +In this case, +.I dirfd +may refer to any type of file, not just a directory. +If +.I dirfd +is +.BR \%AT_FDCWD , +.BR open_tree () +will operate on the current working directory +of the calling process. +This flag is Linux-specific; +define +.B \%_GNU_SOURCE +to obtain its definition. +.TP +.B \%AT_NO_AUTOMOUNT +Do not automount the terminal ("basename") component of +.I path +if it is a directory that is an automount point. +This allows you to create a handle to the automount point itself, +rather than the location it would mount. +This flag has no effect if the mount point has already been mounted over. +This flag is Linux-specific; +define +.B \%_GNU_SOURCE +to obtain its definition. +.TP +.B \%AT_SYMLINK_NOFOLLOW +If +.I path +is a symbolic link, do not dereference it; +instead, +create either a handle to the link itself +or a bind-mount of it. +The resultant file descriptor is indistinguishable from one produced by +.BR openat (2) +with +.BR \%O_PATH | O_NOFOLLOW . +.TP +.B \%OPEN_TREE_CLOEXEC +Set the close-on-exec +.RB ( FD_CLOEXEC ) +flag on the new file descriptor. +See the description of the +.B O_CLOEXEC +flag in +.BR open (2) +for reasons why this may be useful.
+.TP +.B \%OPEN_TREE_CLONE +Rather than creating an +.BR openat (2)-style +.B O_PATH +file descriptor, +create a bind-mount of +.I path +(akin to +.IR \%mount\~\-\-bind ) +as a detached mount object. +In order to do this operation, +the calling process must have the +.B \%CAP_SYS_ADMIN +capability. +.TP +.B \%AT_RECURSIVE +Create a recursive bind-mount of the path +(akin to +.IR \%mount\~\-\-rbind ) +as a detached mount object. +This flag is only permitted in conjunction with +.BR \%OPEN_TREE_CLONE . +.SH RETURN VALUE +On success, a new file descriptor is returned. +On error, \-1 is returned, and +.I errno +is set to indicate the error. +.SH ERRORS +.TP +.B EACCES +Search permission is denied for one of the directories +in the path prefix of +.IR path . +(See also +.BR path_resolution (7).) +.TP +.B EBADF +.I path +is relative but +.I dirfd +is neither +.B \%AT_FDCWD +nor a valid file descriptor. +.TP +.B EFAULT +.I path +is NULL +or a pointer to a location +outside the calling process's accessible address space. +.TP +.B EINVAL +Invalid flag specified in +.IR flags . +.TP +.B ELOOP +Too many symbolic links encountered when resolving +.IR path . +.TP +.B EMFILE +The calling process has too many open files to create more. +.TP +.B ENAMETOOLONG +.I path +is longer than +.BR PATH_MAX . +.TP +.B ENFILE +The system has too many open files to create more. +.TP +.B ENOENT +A component of +.I path +does not exist, or is a dangling symbolic link. +.TP +.B ENOENT +.I path +is an empty string, but +.B AT_EMPTY_PATH +is not specified in +.IR flags . +.TP +.B ENOTDIR +A component of the path prefix of +.I path +is not a directory, or +.I path +is relative and +.I dirfd +is a file descriptor referring to a file other than a directory. 
+.TP +.B ENOSPC +The "anonymous" mount namespace +necessary to contain the +.B \%OPEN_TREE_CLONE +detached bind-mount mount object +could not be allocated, +as doing so would exceed +the configured per-user limit on +the number of mount namespaces in the current user namespace. +(See also +.BR namespaces (7).) +.TP +.B ENOMEM +The kernel could not allocate sufficient memory to complete the operation. +.TP +.B EPERM +.I flags +contains +.B \%OPEN_TREE_CLONE +but the calling process does not have the required +.B CAP_SYS_ADMIN +capability. +.SH STANDARDS +Linux. +.SH HISTORY +Linux 5.2. +.\" commit a07b20004793d8926f78d63eb5980559f7813404 +.\" commit 400913252d09f9cfb8cce33daee43167921fc343 +glibc 2.36. +.SH NOTES +.SS Mount propagation +The bind-mount mount objects created by +.BR open_tree () +with +.B \%OPEN_TREE_CLONE +are not associated with +the mount namespace of the calling process. +Instead, each mount object is placed +in a newly allocated "anonymous" mount namespace +associated with the calling process. +.P +One of the side-effects of this is that +(unlike bind-mounts created with +.BR mount (2)), +mount propagation +(as described in +.BR mount_namespaces (7)) +will not be applied to bind-mounts created by +.BR open_tree () +until the bind-mount is attached with +.BR move_mount (2), +at which point the mount object +will be associated with the mount namespace +where it was attached +and mount propagation will resume. +Note that any mount propagation events that occurred +before the mount object was attached +will +.I not +be propagated to the mount object, +even after it is attached. +.SH EXAMPLES +The following examples show how +.BR open_tree () +can be used in place of more traditional +.BR mount (2) +calls with +.BR MS_BIND . 
+.P +.in +4n +.EX +int srcfd = open_tree(AT_FDCWD, "/var", OPEN_TREE_CLONE); +move_mount(srcfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH); +.EE +.in +.P +First, +a detached bind-mount mount object of +.I /var +is created +and associated with the file descriptor +.IR srcfd . +Then, the mount object is attached to +.I /mnt +using +.BR move_mount (2) +with +.B \%MOVE_MOUNT_F_EMPTY_PATH +to request that the detached mount object +associated with the file descriptor +.I srcfd +be moved (and thus attached) to +.IR /mnt . +.P +The above procedure is functionally equivalent to +the following mount operation using +.BR mount (2): +.P +.in +4n +.EX +mount("/var", "/mnt", NULL, MS_BIND, NULL); +.EE +.in +.P +.B \%OPEN_TREE_CLONE +can be combined with +.B \%AT_RECURSIVE +to create recursive detached bind-mount mount objects, +which in turn can be attached to mount points +to create recursive bind-mounts. +.P +.in +4n +.EX +int srcfd = open_tree(AT_FDCWD, "/var", + OPEN_TREE_CLONE | AT_RECURSIVE); +move_mount(srcfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH); +.EE +.in +.P +The above procedure is functionally equivalent to +the following mount operation using +.BR mount (2): +.P +.in +4n +.EX +mount("/var", "/mnt", NULL, MS_BIND | MS_REC, NULL); +.EE +.in +.P +One of the primary benefits of using +.BR open_tree () +and +.BR move_mount (2) +over the traditional +.BR mount (2) +is that operating with +.IR dirfd -style +file descriptors is far easier and more intuitive.
+.P
+.in +4n
+.EX
+int srcfd = open_tree(100, "", AT_EMPTY_PATH | OPEN_TREE_CLONE);
+move_mount(srcfd, "", 200, "foo", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+The above procedure is roughly equivalent to
+the following mount operation using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/proc/self/fd/100",
+      "/proc/self/fd/200/foo",
+      NULL, MS_BIND, NULL);
+.EE
+.in
+.P
+In addition, you can use the file descriptor returned by
+.BR open_tree ()
+as the
+.I dirfd
+argument to any "*at()" system call:
+.P
+.in +4n
+.EX
+int dirfd, fd;
+\&
+dirfd = open_tree(AT_FDCWD, "/etc", OPEN_TREE_CLONE);
+fd = openat(dirfd, "passwd", O_RDONLY);
+fchmodat(dirfd, "shadow", 0000, 0);
+close(dirfd);
+close(fd);
+/* The bind-mount is now destroyed */
+.EE
+.in
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsmount (2),
+.BR fsopen (2),
+.BR fspick (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR mount_namespaces (7)
-- 
2.51.0

From nobody Thu Oct 2 02:15:09 2025
From: Aleksa Sarai
Date: Thu, 25 Sep 2025 01:31:29 +1000
Subject: [PATCH v5 7/8] man/man2/open_tree{,_attr}.2: document new
 open_tree_attr() API
Message-Id: <20250925-new-mount-api-v5-7-028fb88023f2@cyphar.com>
References: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
In-Reply-To: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
To: Alejandro Colomar
Cc: "Michael T. Kerrisk", Alexander Viro, Jan Kara, Askar Safin,
 "G. Branden Robinson", linux-man@vger.kernel.org,
 linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, David Howells, Christian Brauner,
 Aleksa Sarai

This is a new API added in Linux 6.15, and is effectively just a minor
expansion of open_tree(2) in order to allow for MOUNT_ATTR_IDMAP to be
changed for an existing ID-mapped mount. glibc does not yet have a
wrapper for this.

While working on this man-page, I discovered a bug in open_tree_attr(2)
that accidentally permitted changing MOUNT_ATTR_IDMAP for extant
detached ID-mapped mount objects. This is definitely a bug, but there is
no need to add this to BUGS because the patch to fix this has already
been accepted (slated for 6.18, and will be backported to 6.15+).

Cc: Christian Brauner
Signed-off-by: Aleksa Sarai
Reviewed-by: Askar Safin
---
 man/man2/open_tree.2      | 191 ++++++++++++++++++++++++++++++++++++++++++++++
 man/man2/open_tree_attr.2 |   1 +
 2 files changed, 192 insertions(+)

diff --git a/man/man2/open_tree.2 b/man/man2/open_tree.2
index 6b04a80927a8b6a394cf7ab341b8d6b29d42d304..8b48f3b782bbb8d017ff50ae6624707cc1db992b 100644
--- a/man/man2/open_tree.2
+++ b/man/man2/open_tree.2
@@ -15,7 +15,19 @@ .SH SYNOPSIS
 .B #include <sys/mount.h>
 .P
 .BI "int open_tree(int " dirfd ", const char *" path ", unsigned int " flags );
+.P
+.BR "#include <sys/syscall.h>" "    /* Definition of " SYS_* " constants */"
+.P
+.B int syscall(SYS_open_tree_attr,
+.BI "            int " dirfd ", const char *" path ", unsigned int " flags ,
+.BI "            struct mount_attr *_Nullable " attr ", size_t " size );
 .fi
+.P
+.IR Note :
+glibc provides no wrapper for
+.BR open_tree_attr (),
+necessitating the use of
+.BR syscall (2).
 .SH DESCRIPTION
 The
 .BR open_tree ()
@@ -263,6 +275,129 @@ .SH DESCRIPTION
 as a detached mount object.
 This flag is only permitted in conjunction with
 .BR \%OPEN_TREE_CLONE .
+.SS open_tree_attr()
+The
+.BR open_tree_attr ()
+system call operates in exactly the same way as
+.BR open_tree (),
+except for the differences described here.
+.P
+After performing the same operation as with
+.BR open_tree (),
+.BR open_tree_attr ()
+will apply the mount attribute changes described in
+.I attr
+to the file descriptor before it is returned.
+(See
+.BR mount_attr (2type)
+for a description of the
+.I \%mount_attr
+structure.
+As described in
+.BR mount_setattr (2),
+.I size
+must be set to
+.I \%sizeof(struct mount_attr)
+in order to support future extensions.)
+If
+.I attr
+is NULL,
+or has
+.IR \%attr.attr_clr ,
+.IR \%attr.attr_set ,
+and
+.I \%attr.propagation
+all set to zero,
+then
+.BR open_tree_attr ()
+has identical behaviour to
+.BR open_tree ().
+.P
+The application of
+.I attr
+to the resultant file descriptor
+has identical semantics to
+.BR mount_setattr (2),
+except for the following extensions and general caveats:
+.IP \[bu] 3
+Unlike
+.BR mount_setattr (2)
+called with a regular
+.B OPEN_TREE_CLONE
+detached mount object from
+.BR open_tree (),
+.BR open_tree_attr ()
+can specify a different setting for
+.B \%MOUNT_ATTR_IDMAP
+to the original mount object cloned with
+.BR \%OPEN_TREE_CLONE .
+.IP
+Adding
+.B \%MOUNT_ATTR_IDMAP
+to
+.I \%attr.attr_clr
+will disable ID-mapping for the new mount object;
+adding
+.B \%MOUNT_ATTR_IDMAP
+to
+.I \%attr.attr_set
+will configure the mount object to have the ID-mapping defined by
+the user namespace referenced by the file descriptor
+.IR \%attr.userns_fd .
+(The semantics are identical to when
+.BR mount_setattr (2)
+is used to configure
+.BR \%MOUNT_ATTR_IDMAP .)
+.IP
+Changing or removing the mapping
+of an ID-mapped mount is only permitted
+if a new detached mount object is being created with
+.I flags
+including
+.BR \%OPEN_TREE_CLONE .
+.\" Aleksa Sarai
+.\"   At time of writing, this is not actually true because of a bug where
+.\"   open_tree_attr() would accidentally permit changing MOUNT_ATTR_IDMAP for
+.\"   existing detached mount objects without setting OPEN_TREE_CLONE, but a
+.\"   patch to fix it has been slated for 6.18 and will be backported to 6.15+.
+.\"
+.IP \[bu]
+If
+.I flags
+contains
+.BR \%AT_RECURSIVE ,
+then the attributes described in
+.I attr
+are applied recursively
+(just as when
+.BR mount_setattr (2)
+is called with
+.BR \%AT_RECURSIVE ).
+However, this applies in addition to the
+.BR open_tree ()-specific
+behaviour regarding
+.BR \%AT_RECURSIVE ,
+and thus
+.I flags
+must also contain
+.BR \%OPEN_TREE_CLONE .
+.P
+Note that if
+.I flags
+does not contain
+.BR \%OPEN_TREE_CLONE ,
+.BR open_tree_attr ()
+will attempt to modify the mount attributes of
+the mount object attached at
+the path described by
+.I dirfd
+and
+.IR path .
+As with
+.BR mount_setattr (2),
+if said path is not a mount point,
+.BR open_tree_attr ()
+will return an error.
 .SH RETURN VALUE
 On success, a new file descriptor is returned.
 On error, \-1 is returned, and
@@ -356,10 +491,15 @@ .SH ERRORS
 .SH STANDARDS
 Linux.
 .SH HISTORY
+.SS open_tree()
 Linux 5.2.
 .\" commit a07b20004793d8926f78d63eb5980559f7813404
 .\" commit 400913252d09f9cfb8cce33daee43167921fc343
 glibc 2.36.
+.SS open_tree_attr()
+Linux 6.15.
+.\" commit c4a16820d90199409c9bf01c4f794e1e9e8d8fd8
+.\" commit 7a54947e727b6df840780a66c970395ed9734ebe
 .SH NOTES
 .SS Mount propagation
 The bind-mount mount objects created by
@@ -507,6 +647,57 @@ .SH EXAMPLES
 /* The bind-mount is now destroyed */
 .EE
 .in
+.SS open_tree_attr()
+The following is an example of how
+.BR open_tree_attr ()
+can be used to
+take an existing ID-mapped mount and
+construct a new bind-mount mount object
+with a different
+.B \%MOUNT_ATTR_IDMAP
+attribute.
+The resultant detached mount object
+can be used
+like any other mount object
+returned by
+.BR open_tree ().
+.P
+.in +4n
+.EX
+int nsfd1, nsfd2;
+int mntfd1, mntfd2, mntfd3;
+struct mount_attr attr;
+mntfd1 = open_tree(AT_FDCWD, "/foo", OPEN_TREE_CLONE);
+\&
+/* Configure the ID-mapping of mntfd1 */
+nsfd1 = open("/proc/1234/ns/user", O_RDONLY);
+memset(&attr, 0, sizeof(attr));
+attr.attr_set = MOUNT_ATTR_IDMAP;
+attr.userns_fd = nsfd1;
+mount_setattr(mntfd1, "", AT_EMPTY_PATH, &attr, sizeof(attr));
+\&
+/* Create a new copy with a different ID-mapping */
+nsfd2 = open("/proc/5678/ns/user", O_RDONLY);
+memset(&attr, 0, sizeof(attr));
+attr.attr_clr = MOUNT_ATTR_IDMAP;
+.\" Using .attr_clr is not strictly necessary but makes the intent clearer.
+attr.attr_set = MOUNT_ATTR_IDMAP;
+attr.userns_fd = nsfd2;
+mntfd2 = open_tree_attr(mntfd1, "", AT_EMPTY_PATH | OPEN_TREE_CLONE,
+                        &attr, sizeof(attr));
+\&
+/* Create a new copy with the ID-mapping cleared */
+memset(&attr, 0, sizeof(attr));
+attr.attr_clr = MOUNT_ATTR_IDMAP;
+mntfd3 = open_tree_attr(mntfd1, "", AT_EMPTY_PATH | OPEN_TREE_CLONE,
+                        &attr, sizeof(attr));
+.EE
+.in
+.P
+.BR open_tree_attr ()
+can also be used
+with attached mount objects;
+the above example is only intended to be illustrative.
 .SH SEE ALSO
 .BR fsconfig (2),
 .BR fsmount (2),
diff --git a/man/man2/open_tree_attr.2 b/man/man2/open_tree_attr.2
new file mode 100644
index 0000000000000000000000000000000000000000..e57269bbd269bcce0b0a974425644ba75e379f2f
--- /dev/null
+++ b/man/man2/open_tree_attr.2
@@ -0,0 +1 @@
+.so man2/open_tree.2
-- 
2.51.0

From nobody Thu Oct 2 02:15:09 2025
From: Aleksa Sarai
Date: Thu, 25 Sep 2025 01:31:30 +1000
Subject: [PATCH v5 8/8] man/man2/{fsconfig,mount_setattr}.2: add note about
 attribute-parameter distinction
Message-Id: <20250925-new-mount-api-v5-8-028fb88023f2@cyphar.com>
References: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
In-Reply-To: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
To: Alejandro Colomar
Cc: "Michael T. Kerrisk", Alexander Viro, Jan Kara, Askar Safin,
 "G. Branden Robinson", linux-man@vger.kernel.org,
 linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, David Howells, Christian Brauner,
 Aleksa Sarai

This was not particularly well documented in mount(8) or mount(2), and
since this is a fairly notable aspect of the new mount API, we should
probably add some words about it.

Signed-off-by: Aleksa Sarai
---
 man/man2/fsconfig.2      | 12 ++++++++++++
 man/man2/mount_setattr.2 | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/man/man2/fsconfig.2 b/man/man2/fsconfig.2
index a2d844a105c74f17af640d6991046dbd5fa69cf0..3b972761196b9c1577a6f324a2f4135471dd0ab3 100644
--- a/man/man2/fsconfig.2
+++ b/man/man2/fsconfig.2
@@ -580,6 +580,18 @@ .SS Generic filesystem parameters
 Linux Security Modules (LSMs)
 are also generic with respect to the underlying filesystem.
 See the documentation for the LSM you wish to configure for more details.
+.SS Mount attributes and filesystem parameters
+Some filesystem parameters
+(traditionally associated with
+.BR mount (8)-style
+options)
+have a sibling mount attribute
+with superficially similar user-facing behaviour.
+.P
+For a description of the distinction between
+mount attributes and filesystem parameters,
+see the "Mount attributes and filesystem parameters" subsection of
+.BR mount_setattr (2).
 .SH CAVEATS
 .SS Filesystem parameter types
 As a result of
diff --git a/man/man2/mount_setattr.2 b/man/man2/mount_setattr.2
index 2f8a79dfde722b7b58b80797d89798076af94f55..efe22496be95383b986d9a3623324d472a76c189 100644
--- a/man/man2/mount_setattr.2
+++ b/man/man2/mount_setattr.2
@@ -792,6 +792,45 @@ .SS ID-mapped mounts
 .BR chown (2)
 system call changes the ownership globally and permanently.
 .\"
+.SS Mount attributes and filesystem parameters
+Some mount attributes
+(traditionally associated with
+.BR mount (8)-style
+options)
+have a sibling filesystem parameter
+with superficially similar user-facing behaviour.
+For example, the
+.I \-o\~ro
+option to
+.BR mount (8)
+can refer to the
+"read-only" filesystem parameter,
+or the "read-only" mount attribute.
+Both of these result in mount objects becoming read-only,
+but they do have different behaviour.
+.P
+The distinction between these two kinds of option is that
+mount object attributes are applied per-mount-object
+(allowing different mount objects
+derived from a given filesystem instance
+to have different attributes),
+while filesystem instance parameters
+("superblock flags" in kernel-developer parlance)
+apply to all mount objects
+derived from the same filesystem instance.
+.P
+When using
+.BR mount (2),
+the line between these two types of mount options is blurred.
+However, with
+.BR mount_setattr ()
+and
+.BR fsconfig (2),
+the distinction is made much clearer.
+Mount attributes are configured with
+.BR mount_setattr (),
+while filesystem parameters are configured using
+.BR fsconfig (2).
 .SS Extensibility
 In order to allow for future extensibility,
 .BR mount_setattr ()
-- 
2.51.0
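
The SYNOPSIS added in patch 7/8 notes that glibc provides no wrapper for
open_tree_attr(), so callers have to go through syscall(2). A minimal
sketch of such a wrapper is shown here. The name xopen_tree_attr() is
hypothetical and not part of any library. The fallback syscall number 467
is an assumption for architectures that use the common syscall table;
the value from your own kernel headers should take precedence.

```c
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback for older kernel headers. 467 is an assumption that matches
 * the common syscall table; verify it for your architecture. */
#ifndef SYS_open_tree_attr
#define SYS_open_tree_attr 467
#endif

/* Only pointers to the structure are handled in this sketch, so a
 * forward declaration suffices; see mount_attr(2type) for the layout. */
struct mount_attr;

/*
 * Hypothetical convenience wrapper (not a glibc function).
 * Returns a new file descriptor on success, or -1 with errno set.
 */
static inline int xopen_tree_attr(int dirfd, const char *path,
                                  unsigned int flags,
                                  struct mount_attr *attr, size_t size)
{
    return (int) syscall(SYS_open_tree_attr, dirfd, path, flags,
                         attr, size);
}
```

With a helper like this, the five-argument calls in the EXAMPLES section
of patch 7/8 could be written as xopen_tree_attr(mntfd1, "",
AT_EMPTY_PATH | OPEN_TREE_CLONE, &attr, sizeof(attr)). On kernels older
than Linux 6.15 the call is expected to fail with ENOSYS.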