From: Aleksa Sarai <cyphar@cyphar.com>
Date: Thu, 25 Sep 2025 01:31:28 +1000
Subject: [PATCH v5 6/8] man/man2/open_tree.2: document "new" mount API
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20250925-new-mount-api-v5-6-028fb88023f2@cyphar.com>
References: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
In-Reply-To: <20250925-new-mount-api-v5-0-028fb88023f2@cyphar.com>
To: Alejandro Colomar
Cc: "Michael T. Kerrisk", Alexander Viro, Jan Kara, Askar Safin, "G.
 Branden Robinson", linux-man@vger.kernel.org, linux-api@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 David Howells, Christian Brauner, Aleksa Sarai

This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).

Co-authored-by: David Howells
Signed-off-by: David Howells
Co-authored-by: Christian Brauner
Signed-off-by: Christian Brauner
Signed-off-by: Aleksa Sarai
---
 man/man2/open_tree.2 | 518 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 518 insertions(+)

diff --git a/man/man2/open_tree.2 b/man/man2/open_tree.2
new file mode 100644
index 0000000000000000000000000000000000000000..6b04a80927a8b6a394cf7ab341b8d6b29d42d304
--- /dev/null
+++ b/man/man2/open_tree.2
@@ -0,0 +1,518 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH open_tree 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+open_tree \- open path or create detached mount object and attach to fd
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#define _GNU_SOURCE " "/* See feature_test_macros(7) */"
+.BR "#include <fcntl.h>" " /* Definition of " AT_* " constants */"
+.B #include <sys/mount.h>
+.P
+.BI "int open_tree(int " dirfd ", const char *" path ", unsigned int " flags );
+.fi
+.SH DESCRIPTION
+The
+.BR open_tree ()
+system call is part of
+the suite of file-descriptor-based mount facilities in Linux.
+.IP \[bu] 3
+If
+.I flags
+contains
+.BR \%OPEN_TREE_CLONE ,
+.BR open_tree ()
+creates a detached mount object
+which consists of a bind-mount of
+the path specified by
+.IR path .
+A new file descriptor
+associated with the detached mount object
+is then returned.
+The mount object is equivalent to a bind-mount
+that would be created by
+.BR mount (2)
+called with
+.BR \%MS_BIND ,
+except that it is tied to a file descriptor
+and is not mounted onto the filesystem.
+.IP
+As with file descriptors returned from
+.BR fsmount (2),
+the resultant file descriptor can then be used with
+.BR move_mount (2),
+.BR mount_setattr (2),
+or other such system calls to do further mount operations.
+.IP
+This mount object will be unmounted and destroyed
+when the file descriptor is closed
+if it was not otherwise attached to a mount point
+by calling
+.BR move_mount (2).
+This implicit unmount operation is lazy\[em]\c
+akin to calling
+.BR umount2 (2)
+with
+.BR \%MNT_DETACH ;
+thus,
+any existing open references to files
+from the mount object
+will continue to work,
+and the mount object will only be completely destroyed
+once it ceases to be busy.
+.IP \[bu]
+If
+.I flags
+does not contain
+.BR \%OPEN_TREE_CLONE ,
+.BR open_tree ()
+returns a file descriptor
+that is exactly equivalent to
+one produced by
+.BR openat (2)
+when called with the same
+.I dirfd
+and
+.IR path .
+.P
+In either case, the resultant file descriptor
+acts the same as one produced by
+.BR open (2)
+with
+.BR O_PATH ,
+meaning it can also be used as a
+.I dirfd
+argument to
+"*at()" system calls.
+However,
+unlike
+.BR open (2)
+called with
+.BR O_PATH ,
+automounts will
+by default
+be triggered by
+.BR open_tree ()
+unless
+.B \%AT_NO_AUTOMOUNT
+is included in
+.IR flags .
+.P
+As with "*at()" system calls,
+.BR open_tree ()
+uses the
+.I dirfd
+argument in conjunction with the
+.I path
+argument to determine the path to operate on, as follows:
+.IP \[bu] 3
+If the pathname given in
+.I path
+is absolute, then
+.I dirfd
+is ignored.
+.IP \[bu]
+If the pathname given in
+.I path
+is relative and
+.I dirfd
+is the special value
+.BR \%AT_FDCWD ,
+then
+.I path
+is interpreted relative to
+the current working directory
+of the calling process (like
+.BR open (2)).
+.IP \[bu]
+If the pathname given in
+.I path
+is relative,
+then it is interpreted relative to
+the directory referred to by the file descriptor
+.I dirfd
+(rather than relative to
+the current working directory
+of the calling process,
+as is done by
+.BR open (2)
+for a relative pathname).
+In this case,
+.I dirfd
+must be a directory
+that was opened for reading
+.RB ( \%O_RDONLY )
+or using the
+.B O_PATH
+flag.
+.IP \[bu]
+If
+.I path
+is an empty string,
+and
+.I flags
+contains
+.BR \%AT_EMPTY_PATH ,
+then the file descriptor
+.I dirfd
+is operated on directly.
+In this case,
+.I dirfd
+may refer to any type of file,
+not just a directory.
+.P
+See
+.BR openat (2)
+for an explanation of why the
+.I dirfd
+argument is useful.
+.P
+.I flags
+can be used to control aspects of the path lookup
+and properties of the returned file descriptor.
+A value for
+.I flags
+is constructed by bitwise ORing
+zero or more of the following constants:
+.RS
+.TP
+.B \%AT_EMPTY_PATH
+If
+.I path
+is an empty string, operate on the file referred to by
+.I dirfd
+(which may have been obtained from
+.BR open (2),
+.BR fsmount (2),
+or from another
+.BR open_tree ()
+call).
+In this case,
+.I dirfd
+may refer to any type of file, not just a directory.
+If
+.I dirfd
+is
+.BR \%AT_FDCWD ,
+.BR open_tree ()
+will operate on the current working directory
+of the calling process.
+This flag is Linux-specific;
+define
+.B \%_GNU_SOURCE
+to obtain its definition.
+.TP
+.B \%AT_NO_AUTOMOUNT
+Do not automount the terminal ("basename") component of
+.I path
+if it is a directory that is an automount point.
+This allows you to create a handle to the automount point itself,
+rather than the location it would mount.
+This flag has no effect if the mount point has already been mounted over.
+This flag is Linux-specific;
+define
+.B \%_GNU_SOURCE
+to obtain its definition.
+.TP
+.B \%AT_SYMLINK_NOFOLLOW
+If
+.I path
+is a symbolic link, do not dereference it;
+instead,
+create either a handle to the link itself
+or a bind-mount of it.
+The resultant file descriptor is indistinguishable from one produced by
+.BR openat (2)
+with
+.BR \%O_PATH | O_NOFOLLOW .
+.TP
+.B \%OPEN_TREE_CLOEXEC
+Set the close-on-exec
+.RB ( FD_CLOEXEC )
+flag on the new file descriptor.
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+for reasons why this may be useful.
+.TP
+.B \%OPEN_TREE_CLONE
+Rather than creating an
+.BR openat (2)-style
+.B O_PATH
+file descriptor,
+create a bind-mount of
+.I path
+(akin to
+.IR \%mount\~\-\-bind )
+as a detached mount object.
+In order to do this operation,
+the calling process must have the
+.B \%CAP_SYS_ADMIN
+capability.
+.TP
+.B \%AT_RECURSIVE
+Create a recursive bind-mount of the path
+(akin to
+.IR \%mount\~\-\-rbind )
+as a detached mount object.
+This flag is only permitted in conjunction with
+.BR \%OPEN_TREE_CLONE .
+.SH RETURN VALUE
+On success, a new file descriptor is returned.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EACCES
+Search permission is denied for one of the directories
+in the path prefix of
+.IR path .
+(See also
+.BR path_resolution (7).)
+.TP
+.B EBADF
+.I path
+is relative but
+.I dirfd
+is neither
+.B \%AT_FDCWD
+nor a valid file descriptor.
+.TP
+.B EFAULT
+.I path
+is NULL
+or a pointer to a location
+outside the calling process's accessible address space.
+.TP
+.B EINVAL
+Invalid flag specified in
+.IR flags .
+.TP
+.B ELOOP
+Too many symbolic links encountered when resolving
+.IR path .
+.TP
+.B EMFILE
+The calling process has too many open files to create more.
+.TP
+.B ENAMETOOLONG
+.I path
+is longer than
+.BR PATH_MAX .
+.TP
+.B ENFILE
+The system has too many open files to create more.
+.TP
+.B ENOENT
+A component of
+.I path
+does not exist, or is a dangling symbolic link.
+.TP
+.B ENOENT
+.I path
+is an empty string, but
+.B AT_EMPTY_PATH
+is not specified in
+.IR flags .
+.TP
+.B ENOTDIR
+A component of the path prefix of
+.I path
+is not a directory, or
+.I path
+is relative and
+.I dirfd
+is a file descriptor referring to a file other than a directory.
+.TP
+.B ENOSPC
+The "anonymous" mount namespace
+necessary to contain the
+.B \%OPEN_TREE_CLONE
+detached bind-mount mount object
+could not be allocated,
+as doing so would exceed
+the configured per-user limit on
+the number of mount namespaces in the current user namespace.
+(See also
+.BR namespaces (7).)
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B EPERM
+.I flags
+contains
+.B \%OPEN_TREE_CLONE
+but the calling process does not have the required
+.B CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit a07b20004793d8926f78d63eb5980559f7813404
+.\" commit 400913252d09f9cfb8cce33daee43167921fc343
+glibc 2.36.
+.SH NOTES
+.SS Mount propagation
+The bind-mount mount objects created by
+.BR open_tree ()
+with
+.B \%OPEN_TREE_CLONE
+are not associated with
+the mount namespace of the calling process.
+Instead, each mount object is placed
+in a newly allocated "anonymous" mount namespace
+associated with the calling process.
+.P
+One of the side-effects of this is that
+(unlike bind-mounts created with
+.BR mount (2)),
+mount propagation
+(as described in
+.BR mount_namespaces (7))
+will not be applied to bind-mounts created by
+.BR open_tree ()
+until the bind-mount is attached with
+.BR move_mount (2),
+at which point the mount object
+will be associated with the mount namespace
+where it was attached
+and mount propagation will resume.
+Note that any mount propagation events that occurred
+before the mount object was attached
+will
+.I not
+be propagated to the mount object,
+even after it is attached.
+.SH EXAMPLES
+The following examples show how
+.BR open_tree ()
+can be used in place of more traditional
+.BR mount (2)
+calls with
+.BR MS_BIND .
+.P
+.in +4n
+.EX
+int srcfd = open_tree(AT_FDCWD, "/var", OPEN_TREE_CLONE);
+move_mount(srcfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+First,
+a detached bind-mount mount object of
+.I /var
+is created
+and associated with the file descriptor
+.IR srcfd .
+Then, the mount object is attached to
+.I /mnt
+using
+.BR move_mount (2)
+with
+.B \%MOVE_MOUNT_F_EMPTY_PATH
+to request that the detached mount object
+associated with the file descriptor
+.I srcfd
+be moved (and thus attached) to
+.IR /mnt .
+.P
+The above procedure is functionally equivalent to
+the following mount operation using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/var", "/mnt", NULL, MS_BIND, NULL);
+.EE
+.in
+.P
+.B \%OPEN_TREE_CLONE
+can be combined with
+.B \%AT_RECURSIVE
+to create recursive detached bind-mount mount objects,
+which in turn can be attached to mount points
+to create recursive bind-mounts.
+.P
+.in +4n
+.EX
+int srcfd = open_tree(AT_FDCWD, "/var",
+                      OPEN_TREE_CLONE | AT_RECURSIVE);
+move_mount(srcfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+The above procedure is functionally equivalent to
+the following mount operation using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/var", "/mnt", NULL, MS_BIND | MS_REC, NULL);
+.EE
+.in
+.P
+One of the primary benefits of using
+.BR open_tree ()
+and
+.BR move_mount (2)
+over the traditional
+.BR mount (2)
+is that operating with
+.IR dirfd -style
+file descriptors is far easier and more intuitive.
+.P
+.in +4n
+.EX
+int srcfd = open_tree(100, "", AT_EMPTY_PATH | OPEN_TREE_CLONE);
+move_mount(srcfd, "", 200, "foo", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+The above procedure is roughly equivalent to
+the following mount operation using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/proc/self/fd/100",
+      "/proc/self/fd/200/foo",
+      NULL, MS_BIND, NULL);
+.EE
+.in
+.P
+In addition, you can use the file descriptor returned by
+.BR open_tree ()
+as the
+.I dirfd
+argument to any "*at()" system calls:
+.P
+.in +4n
+.EX
+int dirfd, fd;
+\&
+dirfd = open_tree(AT_FDCWD, "/etc", OPEN_TREE_CLONE);
+fd = openat(dirfd, "passwd", O_RDONLY);
+fchmodat(dirfd, "shadow", 0000, 0);
+close(dirfd);
+close(fd);
+/* The bind-mount is now destroyed */
+.EE
+.in
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsmount (2),
+.BR fsopen (2),
+.BR fspick (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR mount_namespaces (7)
-- 
2.51.0
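
Not part of the patch: the EXAMPLES in the page omit error handling, so
the following is a rough sketch of how the open_tree() plus move_mount()
sequence might look with the documented flag rules and errno handling
applied. The helper names open_tree_flags_ok() and bind_mount_at() are
hypothetical and do not appear in the page. The sketch uses raw
syscall(2) numbers from <sys/syscall.h> instead of the glibc 2.36
wrappers so that it also builds against an older libc (kernel headers
from Linux 5.2 or later are assumed), and the fallback constant values
are taken from the Linux UAPI headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_* constants, O_CLOEXEC */
#include <sys/syscall.h>  /* SYS_open_tree, SYS_move_mount */
#include <unistd.h>       /* syscall(), close() */

/* Fallbacks for libc headers that predate the new mount API. */
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#endif
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC
#endif
#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000
#endif
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
#endif

/*
 * Hypothetical helper: return 1 if flags only uses the constants listed
 * in the page and AT_RECURSIVE is paired with OPEN_TREE_CLONE, else 0.
 * This mirrors the documented EINVAL conditions in user space.
 */
int open_tree_flags_ok(unsigned int flags)
{
    const unsigned int known = AT_EMPTY_PATH | AT_NO_AUTOMOUNT |
                               AT_SYMLINK_NOFOLLOW | AT_RECURSIVE |
                               OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC;

    if (flags & ~known)
        return 0;
    if ((flags & AT_RECURSIVE) && !(flags & OPEN_TREE_CLONE))
        return 0;
    return 1;
}

/*
 * Hypothetical helper: bind-mount (from_dirfd, from_path) onto
 * (to_dirfd, to_path). Returns 0 on success, or -1 with errno set.
 * OPEN_TREE_CLONE requires CAP_SYS_ADMIN, so expect EPERM otherwise.
 */
int bind_mount_at(int from_dirfd, const char *from_path,
                  int to_dirfd, const char *to_path, int recursive)
{
    unsigned int flags = OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC;
    int treefd, ret, saved_errno;

    if (recursive)
        flags |= AT_RECURSIVE;
    if (!open_tree_flags_ok(flags)) {
        errno = EINVAL;
        return -1;
    }

    /* Create the detached mount object. */
    treefd = (int) syscall(SYS_open_tree, from_dirfd, from_path, flags);
    if (treefd < 0)
        return -1;

    /* Attach it; the empty source path refers to treefd itself. */
    ret = (int) syscall(SYS_move_mount, treefd, "", to_dirfd, to_path,
                        MOVE_MOUNT_F_EMPTY_PATH);

    /*
     * Closing treefd leaves an attached mount in place; if move_mount()
     * failed, the still-detached mount object is lazily destroyed.
     */
    saved_errno = errno;
    close(treefd);
    errno = saved_errno;
    return ret;
}
```

Under these assumptions, bind_mount_at(AT_FDCWD, "/var", AT_FDCWD,
"/mnt", 1) would correspond to the MS_BIND | MS_REC example in the page.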