Back in 2019, the new mount API was merged[1]. David Howells then set
about writing man pages for these new APIs, and sent some patches back
in 2020[2].
Unfortunately, these patches were never merged, which meant that these
APIs were practically undocumented for many years -- arguably this has
been a contributing factor to the relatively slow adoption of these new
(far better) APIs. For instance, I have often discovered that many folks
are unaware of the read(2)-based message retrieval interface provided by
filesystem context file descriptors.
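For readers who have not met that interface: a filesystem context file descriptor (from fsopen(2) or fspick(2)) queues human-readable kernel messages, and you retrieve them with plain read(2). Below is a minimal sketch. It is not taken from the man pages. It assumes the format described in fsopen(2): one message per read(2), prefixed with "e ", "w " or "i ", and read(2) failing with ENODATA once the queue is empty. The helper names fsctx_classify() and fsctx_drain_log() are hypothetical.

```c
/* Sketch: drain the log of a filesystem context fd (from fsopen(2)/fspick(2)).
 * Assumptions: one message per read(2), each starting with a severity
 * character ('e', 'w' or 'i') and a space; read(2) fails with ENODATA
 * once the queue is empty.  Helper names are hypothetical. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

enum fsctx_severity { FSCTX_ERROR, FSCTX_WARNING, FSCTX_INFO, FSCTX_UNKNOWN };

/* Classify one message by its leading severity character. */
static enum fsctx_severity
fsctx_classify(const char *msg)
{
	if (msg[0] == '\0' || msg[1] != ' ')
		return FSCTX_UNKNOWN;
	switch (msg[0]) {
	case 'e': return FSCTX_ERROR;
	case 'w': return FSCTX_WARNING;
	case 'i': return FSCTX_INFO;
	default:  return FSCTX_UNKNOWN;
	}
}

/* Print all pending messages to stderr.  Returns the number of messages
 * read, or -1 (errno set) on a real read(2) error, for example a message
 * that does not fit in buf. */
static int
fsctx_drain_log(int fsfd)
{
	char buf[8192];
	int count = 0;

	for (;;) {
		ssize_t n = read(fsfd, buf, sizeof(buf) - 1);

		if (n < 0)
			return errno == ENODATA ? count : -1;  /* ENODATA: queue empty */
		if (n == 0)
			return count;  /* defensive; not expected from a fs context fd */
		buf[n] = '\0';
		buf[strcspn(buf, "\n")] = '\0';  /* keep only the first line */
		fprintf(stderr, "%s: %s\n",
		        fsctx_classify(buf) == FSCTX_ERROR ? "error" : "note", buf);
		count++;
	}
}
```

In practice you would call fsctx_drain_log() after fsconfig(2) fails. That lets the kernel's explanation be shown alongside the bare errno value.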
In 2024, Christian Brauner adapted David Howells's original man pages
into the easier-to-edit Markdown format and published them on GitHub[3].
These have been maintained since, including updated information on new
features added since David Howells's 2020 draft pages (such as
MOVE_MOUNT_BENEATH).
While this was a welcome improvement on the previous status quo (which
had lasted over 6 years), in my personal experience not being able to
read these man pages from the terminal has been a fairly common pain
point.
So, this is a modern version of the man pages for these APIs, in the
hopes that we can finally (6 years later) get proper documentation for
these APIs in the man-pages project.
One important thing to note is that most of these were rewritten by me,
with very minimal copying from the versions available from Christian[3].
The reasons for this are threefold:
* Both Howells's original version and Christian's maintained versions
contain crucial mistakes that I have been bitten by in the past (the
most obvious being that all of these APIs were merged in Linux 5.2,
but the man pages all claim they were merged in different versions).
* As the man pages appear to have been written from Howells's
perspective while he was implementing these APIs, some of the wording
is a little too tied to the implementation (or appears to describe
features that don't really exist in the merged versions of these APIs).
* The original versions of the man pages lacked bigger-picture
explanations of the reasoning behind the API, which would have made it
easier for readers to understand what each operation is doing.
I decided that the best way to resolve these issues was to rewrite them
from the perspective of an actual user of these APIs (me), and to check
that we do not repeat the mistakes I found in the originals. I have also
done my best to resolve the issues raised by Michael Kerrisk on the
original patchset sent by Howells[2].
In addition, I have included a man page for open_tree_attr(2) (as a
subsection of the new open_tree(2) man page), which was merged in Linux
6.15.
[1]: https://lore.kernel.org/all/20190507204921.GL23075@ZenIV.linux.org.uk/
[2]: https://lore.kernel.org/linux-man/159680892602.29015.6551860260436544999.stgit@warthog.procyon.org.uk/
[3]: https://github.com/brauner/man-pages-md
Co-authored-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Co-authored-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
Changes in v3:
- `sed -i s|Co-developed-by|Co-authored-by|g`. [Alejandro Colomar]
- Add Signed-off-by for co-authors. [Christian Brauner]
- `sed -i s|needs-mount|awaiting-mount|g`, to match the kernel parlance.
- Fix VERSIONS/HISTORY mixup in mount_attr(2type) that was copied from
open_how(2type). [Alejandro Colomar]
- Fix incorrect .BR usage in SYNOPSIS.
- Some more semantic newlines fixes. [Alejandro Colomar]
- Minor fixes suggested by Alejandro. [Alejandro Colomar]
- open_tree_attr(2): heavily reword everything to be better formatted
and more explicit about its behaviour.
- open_tree(2): write proper explanatory paragraphs for the EXAMPLES.
- mount_setattr(2): fix stray doublequote in SYNOPSIS. [Askar Safin]
- fsopen(2): rework structure of the DESCRIPTION introduction.
- fsopen(2): explicitly say that read(2) failures in the message retrieval
interface are reported as real errors (-1 with errno set), rather than
as a return value of 0. [Askar Safin]
- fsopen(2): add BUGS section to describe the unfortunate -ENODATA
message dropping behaviour that should be fixed by
<https://lore.kernel.org/r/20250807-fscontext-log-cleanups-v3-0-8d91d6242dc3@cyphar.com/>.
- fsconfig(2): add a NOTES subsection about generic filesystem
parameters.
- fsconfig(2): add comment about the weirdness surrounding
FSCONFIG_SET_PATH.
- {fspick,open_tree}(2): Correct AT_NO_AUTOMOUNT description (copied
from David, who probably copied it from statx(2)) -- AT_NO_AUTOMOUNT
applies to all path components, not just the final one. [Christian
Brauner]
- statx(2): fix AT_NO_AUTOMOUNT documentation.
- open_tree(2): swap open(2) reference for openat(2) when saying that
the result is identical. [Askar Safin]
- fsmount(2): fix DESCRIPTION introduction, and rework attr_flags
description to better reference mount_setattr(2).
- {fsopen,fspick,fsmount,open_tree}(2): don't use "attach" when talking
about the file descriptors we return that reference in-kernel objects,
to avoid confusing readers with mount object attachment status.
- fsconfig(2): remove pidns argument example, as it was kind of unclear
and referenced kernel features not yet merged.
- fsconfig(2): remove rambling FSCONFIG_SET_PATH_EMPTY text (which
mostly describes an academic issue that doesn't apply to any existing
filesystem), and instead add a CAVEATS section which touches on the
weird type behaviour of fsconfig(2).
- v2: <https://lore.kernel.org/r/20250807-new-mount-api-v2-0-558a27b8068c@cyphar.com>
Changes in v2:
- `make -R lint-man`. [Alejandro Colomar]
- `sed -i s|Glibc|glibc|g`. [Alejandro Colomar]
- `sed -i s|pathname|path|g` [Alejandro Colomar]
- Clean up macro usage, example code, and synopsis. [Alejandro Colomar]
- Try to use semantic newlines. [Alejandro Colomar]
- Make sure the usage of "filesystem context", "filesystem instance",
and "mount object" are consistent. [Askar Safin]
- Avoid referring to these syscalls without an "at" suffix as "*at()
syscalls". [Askar Safin]
- Use \% to avoid hyphenation of constants. [Askar Safin, G. Branden Robinson]
- Add a new subsection to mount_setattr(2) to describe the distinction
between mount attributes and filesystem parameters.
- (Under protest) double-space-after-period formatted commit messages.
- v1: <https://lore.kernel.org/r/20250806-new-mount-api-v1-0-8678f56c6ee0@cyphar.com>
---
Aleksa Sarai (12):
man/man2/statx.2: correctly document AT_NO_AUTOMOUNT
man/man2/mount_setattr.2: fix stray quote in SYNOPSIS
man/man2/mount_setattr.2: move mount_attr struct to mount_attr(2type)
man/man2/fsopen.2: document "new" mount API
man/man2/fspick.2: document "new" mount API
man/man2/fsconfig.2: document "new" mount API
man/man2/fsmount.2: document "new" mount API
man/man2/move_mount.2: document "new" mount API
man/man2/open_tree.2: document "new" mount API
man/man2/mount_setattr.2: mirror opening sentence from fsopen(2)
man/man2/open_tree{,_attr}.2: document new open_tree_attr() API
man/man2/{fsconfig,mount_setattr}.2: add note about attribute-parameter distinction
man/man2/fsconfig.2 | 681 ++++++++++++++++++++++++++++++++++++++++++
man/man2/fsmount.2 | 220 ++++++++++++++
man/man2/fsopen.2 | 384 ++++++++++++++++++++++++
man/man2/fspick.2 | 309 +++++++++++++++++++
man/man2/mount_setattr.2 | 62 +++-
man/man2/move_mount.2 | 640 +++++++++++++++++++++++++++++++++++++++
man/man2/open_tree.2 | 593 ++++++++++++++++++++++++++++++++++++
man/man2/open_tree_attr.2 | 1 +
man/man2/statx.2 | 6 +-
man/man2type/mount_attr.2type | 61 ++++
10 files changed, 2940 insertions(+), 17 deletions(-)
---
base-commit: e473affca7b039fd018eedb839d6c80e4fd3df17
change-id: 20250802-new-mount-api-436db984f432
Kind regards,
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
---- On Sat, 09 Aug 2025 00:39:44 +0400 Aleksa Sarai <cyphar@cyphar.com> wrote ---
> Back in 2019, the new mount API was merged[1]. David Howells then set
I finished my experiments!
--
Askar Safin
https://types.pl/@safinaskar
There is one particular case where open_tree is more powerful than openat
with O_PATH. open_tree supports AT_EMPTY_PATH, and openat supports nothing
similar. This means that we can convert a normal O_RDONLY file descriptor
to an O_PATH descriptor using open_tree! I.e.:
rd = openat(AT_FDCWD, "/tmp/a", O_RDONLY, 0); // Regular file
open_tree(rd, "", AT_EMPTY_PATH);
You can achieve the same effect using /proc:
rd = openat(AT_FDCWD, "/tmp/a", O_RDONLY, 0); // Regular file
snprintf(buf, sizeof(buf), "/proc/self/fd/%d", rd);
openat(AT_FDCWD, buf, O_PATH, 0);
But I still think this has security implications. It means that even if
we deny a container access to /proc, it is still able to convert O_RDONLY
descriptors to O_PATH descriptors using open_tree. I.e., this is yet
another thing to think about when creating sandboxes.
I know you gave a talk about similar things a long time ago:
https://lwn.net/Articles/934460/ .
(I tested this.)
--
Askar Safin
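Here is a compilable sketch of the experiment above, with both variants side by side. It assumes /proc is mounted for the first variant. The second variant calls open_tree(2) through syscall(2), so no glibc wrapper is needed; it does need SYS_open_tree to be defined by the installed headers (Linux 5.2+). The helper names are hypothetical.

```c
/* Sketch: turn a regular fd into an O_PATH fd, with and without /proc. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Variant 1: re-open through the /proc/self/fd magic link (needs /proc). */
static int
opath_via_proc(int fd)
{
	char buf[64];

	snprintf(buf, sizeof(buf), "/proc/self/fd/%d", fd);
	return openat(AT_FDCWD, buf, O_PATH);
}

/* Variant 2: no /proc needed; open_tree(2) without OPEN_TREE_CLONE
 * returns an O_PATH fd for the object that fd refers to. */
static int
opath_via_open_tree(int fd)
{
#ifdef SYS_open_tree
	return (int) syscall(SYS_open_tree, fd, "", AT_EMPTY_PATH);
#else
	errno = ENOSYS;
	return -1;
#endif
}

/* An O_PATH fd reports O_PATH in F_GETFL, and read(2) on it fails with EBADF. */
static bool
is_opath(int fd)
{
	char c;
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0 || !(flags & O_PATH))
		return false;
	return read(fd, &c, 1) < 0 && errno == EBADF;
}
```

Both variants only downgrade access. Neither one hands back a readable or writable descriptor. Note that open_tree(2) may be filtered by seccomp in some container runtimes, so the second variant can fail with EPERM or ENOSYS.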
On 2025-08-21, Askar Safin <safinaskar@zohomail.com> wrote:
> There is one particular case when open_tree is more powerful than openat with O_PATH. open_tree supports AT_EMPTY_PATH, and openat supports nothing similar.
> This means that we can convert normal O_RDONLY file descriptor to O_PATH descriptor using open_tree! I. e.:
> rd = openat(AT_FDCWD, "/tmp/a", O_RDONLY, 0); // Regular file
> open_tree(rd, "", AT_EMPTY_PATH);
> You can achieve same effect using /proc:
> rd = openat(AT_FDCWD, "/tmp/a", O_RDONLY, 0); // Regular file
> snprintf(buf, sizeof(buf), "/proc/self/fd/%d", rd);
> openat(AT_FDCWD, buf, O_PATH, 0);
> But still I think this has security implications. This means that even if we deny access to /proc for container, it still is able to convert O_RDONLY
> descriptors to O_PATH descriptors using open_tree. I. e. this is yet another thing to think about when creating sandboxes.
> I know you delivered a talk about similar things a lot of time ago: https://lwn.net/Articles/934460/ . (I tested this.)
O_RDONLY -> O_PATH is less of an issue than the other way around. There
isn't much you can do with O_PATH that you can't do with a properly opened
file (by design you should actually have strictly fewer privileges; some
operations are only really possible with O_PATH, but they're not
security-critical in that way).
I was working on a new patchset for resolving this issue (and adding
O_EMPTYPATH support) late last year, but other things fell on my plate
and the design was quite difficult to get to a place where everyone
agreed to it. The core issue is that we would need to block not just
re-opening but also any operation that is a write (or read) in disguise,
which kind of implies you need to have capabilities attached to file
descriptors.
This is already slightly shaky ground if you look at the history of
projects like Capsicum. Also, my impression was that just adding it
to "file_permission" was not sufficient; you need to put it in
"path_permission", which means we have to either bloat "struct path" or
come up with some extended structure that you need to plumb through
everywhere.
But yes, this is still on my list of things to do, just not in the
immediate future.
--
Aleksa Sarai
I noticed that you changed the docs for automounts.
So I dug into the automount implementation.
And I found a bug in openat2.
If RESOLVE_NO_XDEV is specified, then name resolution
doesn't cross automount points (i. e. we get EXDEV),
but automounts still happen!
I think this is a bug.
The bug reproduces on 6.17-rc1.
At the end of this mail you will find a reproducer
and a miniconfig.
If you send patches for this bug, please CC me.
Are automounts actually used? Is it possible to deprecate or
remove them? It seems to me that automounts are a rarely tested, obscure
feature which affects core namei code.
This reproducer is based on the "tracing" automount, which
actually *IS* already deprecated. But the automount mechanism
itself is not deprecated, as far as I know.
Also, I did read the namei code, and I think that
options such as AT_NO_AUTOMOUNT, FSPICK_NO_AUTOMOUNT, etc. affect only
the last component, not all of them. I haven't tested this yet.
I plan to test this in the next few days.
Also, I still haven't finished my experiments. Hopefully I will
finish them within 7 days. :)
Askar Safin
====
miniconfig:
CONFIG_64BIT=y
CONFIG_EXPERT=y
CONFIG_PRINTK=y
CONFIG_PRINTK_TIME=y
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_PROC_FS=y
CONFIG_DEVTMPFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_DEBUG_FS=y
CONFIG_USER_EVENTS=y
CONFIG_FTRACE=y
CONFIG_MULTIUSER=y
CONFIG_NAMESPACES=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
====
/*
Author: Askar Safin
Public domain
Make sure your kernel is compiled with CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED=y
If that openat2 bug reproduces, then this program will
print "BUG REPRODUCED". If openat2 is fixed, then
the program will print "BUG NOT REPRODUCED".
Any other output means that something has gone wrong,
i.e. the results are indeterminate.
This program requires root in the initial user namespace.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <linux/openat2.h>
int
main (void)
{
  // Work in a private mount namespace so nothing leaks to the host
  if (unshare (CLONE_NEWNS) != 0)
    {
      perror ("unshare");
      return 1;
    }
  if (mount (NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
    {
      perror ("mount(NULL, /, NULL, MS_REC | MS_PRIVATE, NULL)");
      return 1;
    }
  if (mount (NULL, "/tmp", "tmpfs", 0, NULL) != 0)
    {
      perror ("mount tmpfs");
      return 1;
    }
  if (mkdir ("/tmp/debugfs", 0777) != 0)
    {
      perror ("mkdir(/tmp/debugfs)");
      return 1;
    }
  if (mount (NULL, "/tmp/debugfs", "debugfs", 0, NULL) != 0)
    {
      perror ("mount debugfs");
      return 1;
    }
  {
    struct statx tracing;
    if (statx (AT_FDCWD, "/tmp/debugfs/tracing", AT_NO_AUTOMOUNT, 0, &tracing) != 0)
      {
        perror ("statx tracing");
        return 1;
      }
    // The kernel must be able to report STATX_ATTR_MOUNT_ROOT
    if (!(tracing.stx_attributes_mask & STATX_ATTR_MOUNT_ROOT))
      {
        fprintf (stderr, "???\n");
        return 1;
      }
    // Let's check that nothing is mounted at /tmp/debugfs/tracing yet
    if (tracing.stx_attributes & STATX_ATTR_MOUNT_ROOT)
      {
        fprintf (stderr, "Something already mounted at /tmp/debugfs/tracing\n");
        return 1;
      }
  }
  if (chdir ("/tmp/debugfs") != 0)
    {
      perror ("chdir");
      return 1;
    }
  {
    // RESOLVE_NO_XDEV must refuse to cross into the tracing automount
    struct open_how how;
    memset (&how, 0, sizeof how);
    how.flags = O_DIRECTORY;
    how.mode = 0;
    how.resolve = RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS;
    if (syscall (SYS_openat2, AT_FDCWD, "tracing", &how, sizeof how) != -1)
      {
        fprintf (stderr, "openat2 crossed automount point\n");
        return 1;
      }
    if (errno != EXDEV)
      {
        fprintf (stderr, "wrong errno\n");
        return 1;
      }
  }
  {
    // Did the failed openat2 still trigger the automount as a side effect?
    struct statx tracing;
    if (statx (AT_FDCWD, "/tmp/debugfs/tracing", AT_NO_AUTOMOUNT, 0, &tracing) != 0)
      {
        perror ("statx tracing (2)");
        return 1;
      }
    if (!(tracing.stx_attributes_mask & STATX_ATTR_MOUNT_ROOT))
      {
        fprintf (stderr, "???\n");
        return 1;
      }
    if (tracing.stx_attributes & STATX_ATTR_MOUNT_ROOT)
      {
        fprintf (stderr, "BUG REPRODUCED. Something mounted at /tmp/debugfs/tracing\n");
        return 0;
      }
    else
      {
        fprintf (stderr, "BUG NOT REPRODUCED\n");
        return 0;
      }
  }
}
On 2025-08-17, Askar Safin <safinaskar@zohomail.com> wrote:
> I noticed that you changed docs for automounts. So I dig into
> automounts implementation. And I found a bug in openat2. If
> RESOLVE_NO_XDEV is specified, then name resolution doesn't cross
> automount points (i. e. we get EXDEV), but automounts still happen! I
> think this is a bug. Bug is reproduced in 6.17-rc1. In the end of this
> mail you will find reproducer. And miniconfig.
Yes, this is a bug -- we check LOOKUP_NO_XDEV after traverse_mounts()
because we want to error out if we actually jumped to a different mount.
We should probably be erroring out in follow_automount() as well, and I
missed this when I wrote openat2().
openat2() also really needs RESOLVE_NO_AUTOMOUNT (and probably
RESOLVE_NO_DOTDOT as well as some other small features). I'll try to
send something soon.
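A toy model of the ordering problem described above may help. It uses hypothetical names and is not kernel code. In the buggy order, the automount fires first; only afterwards is the mount change noticed and -EXDEV returned. In the order suggested here, follow_automount() refuses before triggering anything.

```c
/* Toy model (not kernel code): does the automount fire before the
 * "don't cross mounts" check?  All names here are hypothetical. */
#include <errno.h>
#include <stdbool.h>

struct toy_lookup {
	bool no_xdev;          /* stands in for LOOKUP_NO_XDEV / RESOLVE_NO_XDEV */
	bool mount_triggered;  /* the side effect we want to avoid */
};

/* Buggy order: the automount is triggered, then the mount change is noticed. */
static int
step_buggy(struct toy_lookup *l)
{
	l->mount_triggered = true;  /* automount happens first ... */
	if (l->no_xdev)
		return -EXDEV;      /* ... and only then is EXDEV reported */
	return 0;
}

/* Suggested order: refuse before the automount fires. */
static int
step_fixed(struct toy_lookup *l)
{
	if (l->no_xdev)
		return -EXDEV;      /* bail out before triggering anything */
	l->mount_triggered = true;
	return 0;
}
```

Both orders return the same error to the caller. Only the second one avoids the observable side effect that the reproducer detects.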
> Are automounts actually used? Is it possible to deprecate or
> remove them? It seems for me automounts are rarely tested obscure
> feature, which affects core namei code.
I use them for auto-mounting NFS shares on my laptop, and I'm sure there
are plenty of other users. They are a little bit funky, but I highly doubt
they are "unused". Howells probably disagrees in even stronger terms.
Most distributions provide autofs as a supported package (I think it
even comes pre-installed for some distros).
They are not tested by fstests AFAICS, but that's more of a flaw in
fstests (automount requires you to have a running autofs daemon, which
probably makes testing it in fstests or selftests impractical) not the
feature itself.
> This reproducer is based on "tracing" automount, which
> actually *IS* already deprecated. But automount mechanism
> itself is not deprecated, as well as I know.
The automount behaviour of tracefs is different to the general automount
mechanism which is managed by userspace with the autofs daemon. I don't
know the history behind the deprecation, but I expect that it was
deprecated in favour of configuring it with autofs (or just enabling it
by default).
> Also, I did read namei code, and I think that
> options AT_NO_AUTOMOUNT, FSPICK_NO_AUTOMOUNT, etc affect
> last component only, not all of them. I didn't test this yet.
> I plan to test this within next days.
No, LOOKUP_AUTOMOUNT affects all components. I double-checked this with
Christian.
You would think that it's only the last component (like O_DIRECTORY,
O_NOFOLLOW, AT_SYMLINK_{,NO}FOLLOW) but follow_automount() is called for
all components (i.e., as part of step_into()). It hooks into the regular
lookup flow for mountpoints.
Yes, it is quite funky that AT_NO_AUTOMOUNT is the only AT_* flag that
works this way -- hence why I went with a different RESOLVE_* namespace
for openat2() (which _always_ act on _all_ components).
--
Aleksa Sarai
---- On Sun, 17 Aug 2025 20:16:04 +0400 Aleksa Sarai <cyphar@cyphar.com> wrote ---
> They are not tested by fstests AFAICS, but that's more of a flaw in
> fstests (automount requires you to have a running autofs daemon, which
> probably makes testing it in fstests or selftests impractical) not the
> feature itself.
I suggest testing automounts in fstests/selftests using the "tracing"
automount. This is what I do in my reproducers.
> The automount behaviour of tracefs is different to the general automount
> mechanism which is managed by userspace with the autofs daemon.
Yes. But I was still able to write reproducers using "tracing", so this
automount point is totally okay for tests. (At least for some tests,
such as RESOLVE_NO_XDEV.)
--
Askar Safin
On 2025-08-20, Askar Safin <safinaskar@zohomail.com> wrote:
> ---- On Sun, 17 Aug 2025 20:16:04 +0400 Aleksa Sarai <cyphar@cyphar.com> wrote ---
> > They are not tested by fstests AFAICS, but that's more of a flaw in
> > fstests (automount requires you to have a running autofs daemon, which
> > probably makes testing it in fstests or selftests impractical) not the
> > feature itself.
>
> I suggest testing automounts in fstests/selftests using "tracing" automount.
> This is what I do in my reproducers.
>
> > The automount behaviour of tracefs is different to the general automount
> > mechanism which is managed by userspace with the autofs daemon.
>
> Yes. But I still was able to write reproducers using "tracing", so this
> automount point is totally okay for tests. (At least for some tests,
> such as RESOLVE_NO_XDEV.)
Sure, but I don't think people use allyesconfig when running selftests.
I wonder if the automated test runners even enable deprecated features
like that. In any case, you can definitely write some tests for it. :D
--
Aleksa Sarai
On Mon, Aug 18, 2025 at 02:16:04AM +1000, Aleksa Sarai wrote:
> No, LOOKUP_AUTOMOUNT affects all components. I double-checked this with
> Christian.
Hm? I was asking the question in the chat because I was unsure and not
in front of a computer; you then said that it does affect all components. :)
On 2025-08-19, Christian Brauner <brauner@kernel.org> wrote:
> On Mon, Aug 18, 2025 at 02:16:04AM +1000, Aleksa Sarai wrote:
> > No, LOOKUP_AUTOMOUNT affects all components. I double-checked this with
> > Christian.
>
> Hm? I was asking the question in the chat because I was unsure and not
> in front of a computer you then said that it does affect all components. :)
Yeah I misunderstood what you said -- didn't mean to throw you under the
bus, sorry about that!
--
Aleksa Sarai
I just sent a fix for that RESOLVE_NO_XDEV bug to fsdevel.
Aleksa Sarai <cyphar@cyphar.com>:
> No, LOOKUP_AUTOMOUNT affects all components. I double-checked this with
> Christian.
No. I just tested this. See the tests (and miniconfig) at the end of this
message. statx always follows automounts in non-final components no
matter what. I tested this. And it follows automounts in the final
component depending on AT_NO_AUTOMOUNT. I tested this too. Also,
absolutely all other syscalls always follow automounts in non-final
components no matter what, with the sole exception of openat2 with
RESOLVE_NO_XDEV. I didn't test this, but I conclude it from reading the code.
First of all, the kernel's current documentation of LOOKUP_PARENT is wrong:
https://elixir.bootlin.com/linux/v6.17-rc1/source/include/linux/namei.h#L31
We see there:
#define LOOKUP_PARENT BIT(10) /* Looking up final parent in path */
This is not true. LOOKUP_PARENT means that we are resolving any non-final
component. LOOKUP_PARENT is set when we enter link_path_walk, which
is used for resolving everything except the final component.
And LOOKUP_PARENT is cleared when we leave link_path_walk.
Now let's look here:
https://elixir.bootlin.com/linux/v6.17-rc1/source/fs/namei.c#L1447
if (!(lookup_flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
We never return -EISDIR in this "if" when we are in a non-final component,
thanks to LOOKUP_PARENT here. We fall through to finish_automount instead.
Again: if this is a non-final component, then LOOKUP_PARENT is set, and thus
LOOKUP_AUTOMOUNT is ignored. If this is the final component, then LOOKUP_AUTOMOUNT
may affect things.
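Here is a toy user-space model of just the flag test in the conditional quoted above. It is not kernel code, and it makes the argument easy to check. The M_LOOKUP_* names and bit values are hypothetical stand-ins for the kernel's LOOKUP_* flags, and the part of the condition after "&&" is deliberately left out. The model encodes the claim being made: LOOKUP_PARENT alone is enough to reach finish_automount, so AT_NO_AUTOMOUNT can only matter for the final component.

```c
/* Toy model (not kernel code) of the flag test in follow_automount().
 * Names and bit values are hypothetical; the rest of the condition
 * (after "&&") is not modelled. */
#include <stdbool.h>

enum {
	M_LOOKUP_PARENT    = 1u << 0,  /* set while walking non-final components */
	M_LOOKUP_DIRECTORY = 1u << 1,
	M_LOOKUP_OPEN      = 1u << 2,
	M_LOOKUP_CREATE    = 1u << 3,
	M_LOOKUP_AUTOMOUNT = 1u << 4,  /* not set when AT_NO_AUTOMOUNT is given */
};

/* true: fall through to finish_automount(); false: take the -EISDIR path */
static bool
automount_fires(unsigned int lookup_flags)
{
	return lookup_flags & (M_LOOKUP_PARENT | M_LOOKUP_DIRECTORY |
	                       M_LOOKUP_OPEN | M_LOOKUP_CREATE |
	                       M_LOOKUP_AUTOMOUNT);
}

/* Flags a statx()-like caller would have set for one path component. */
static unsigned int
statx_component_flags(bool final_component, bool at_no_automount)
{
	unsigned int flags = 0;

	if (!at_no_automount)
		flags |= M_LOOKUP_AUTOMOUNT;
	if (!final_component)
		flags |= M_LOOKUP_PARENT;  /* link_path_walk() sets this */
	return flags;
}
```

The four combinations of (final component, AT_NO_AUTOMOUNT) reproduce the outcomes asserted by the test program below: a mount is triggered in every case except a final component with AT_NO_AUTOMOUNT.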
The code below tests that:
- statx always follows non-final automounts
- statx follows final automounts depending on options
The code doesn't test other syscalls; they can be added if needed.
The code was tested in QEMU on Linux 6.17-rc1.
I'm not trying to insult you in any way.
Again: thank you a lot for your work! For openat2 and for these mans.
Askar Safin
====
miniconfig:
CONFIG_64BIT=y
CONFIG_EXPERT=y
CONFIG_PRINTK=y
CONFIG_PRINTK_TIME=y
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_PROC_FS=y
CONFIG_DEVTMPFS=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_DEBUG_FS=y
CONFIG_USER_EVENTS=y
CONFIG_FTRACE=y
CONFIG_MULTIUSER=y
CONFIG_NAMESPACES=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
====
/*
Author: Askar Safin
Public domain
Make sure your kernel is compiled with CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED=y
If all tests pass, the program
should print "All tests passed".
Any other output means that something has gone wrong.
This program requires root in the initial user namespace.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <linux/openat2.h>
#define MY_ASSERT(cond) do { \
    if (!(cond)) { \
      fprintf (stderr, "%s: assertion failed\n", #cond); \
      exit (1); \
    } \
  } while (0)
// Is something mounted at /tmp/debugfs/tracing? (never triggers the automount)
bool
tracing_mounted (void)
{
  struct statx tracing;
  if (statx (AT_FDCWD, "/tmp/debugfs/tracing", AT_NO_AUTOMOUNT, 0, &tracing) != 0)
    {
      perror ("statx tracing");
      exit (1);
    }
  if (!(tracing.stx_attributes_mask & STATX_ATTR_MOUNT_ROOT))
    {
      fprintf (stderr, "???\n");
      exit (1);
    }
  return tracing.stx_attributes & STATX_ATTR_MOUNT_ROOT;
}
void
mount_debugfs (void)
{
  if (mount (NULL, "/tmp/debugfs", "debugfs", 0, NULL) != 0)
    {
      perror ("mount debugfs");
      exit (1);
    }
  MY_ASSERT (!tracing_mounted ());
}
void
umount_debugfs (void)
{
  umount ("/tmp/debugfs/tracing"); // Ignore errors
  if (umount ("/tmp/debugfs") != 0)
    {
      perror ("umount debugfs");
      exit (1);
    }
}
int
main (void)
{
  // Init
  {
    if (chdir ("/") != 0)
      {
        perror ("chdir /");
        exit (1);
      }
    if (unshare (CLONE_NEWNS) != 0)
      {
        perror ("unshare");
        exit (1);
      }
    if (mount (NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
      {
        perror ("mount(NULL, /, NULL, MS_REC | MS_PRIVATE, NULL)");
        exit (1);
      }
    if (mount (NULL, "/tmp", "tmpfs", 0, NULL) != 0)
      {
        perror ("mount tmpfs");
        exit (1);
      }
  }
  if (mkdir ("/tmp/debugfs", 0777) != 0)
    {
      perror ("mkdir(/tmp/debugfs)");
      exit (1);
    }
  // statx always follows automounts in non-final components,
  // both with and without AT_NO_AUTOMOUNT
  {
    mount_debugfs ();
    {
      struct statx readme;
      if (statx (AT_FDCWD, "/tmp/debugfs/tracing/README", 0, 0, &readme) != 0)
        {
          perror ("statx");
          exit (1);
        }
    }
    MY_ASSERT (tracing_mounted ());
    umount_debugfs ();
    mount_debugfs ();
    {
      struct statx readme;
      if (statx (AT_FDCWD, "/tmp/debugfs/tracing/README", AT_NO_AUTOMOUNT, 0, &readme) != 0)
        {
          perror ("statx");
          exit (1);
        }
    }
    MY_ASSERT (tracing_mounted ());
    umount_debugfs ();
  }
  // statx follows automounts in the final component only if AT_NO_AUTOMOUNT is not specified
  {
    mount_debugfs ();
    {
      struct statx tracing;
      if (statx (AT_FDCWD, "/tmp/debugfs/tracing", 0, 0, &tracing) != 0)
        {
          perror ("statx");
          exit (1);
        }
      if (!(tracing.stx_attributes_mask & STATX_ATTR_MOUNT_ROOT))
        {
          fprintf (stderr, "???\n");
          exit (1);
        }
      // Check that this is the new mount, not the automount point itself
      MY_ASSERT (tracing.stx_attributes & STATX_ATTR_MOUNT_ROOT);
    }
    MY_ASSERT (tracing_mounted ());
    umount_debugfs ();
    mount_debugfs ();
    {
      struct statx tracing;
      if (statx (AT_FDCWD, "/tmp/debugfs/tracing", AT_NO_AUTOMOUNT, 0, &tracing) != 0)
        {
          perror ("statx");
          exit (1);
        }
      if (!(tracing.stx_attributes_mask & STATX_ATTR_MOUNT_ROOT))
        {
          fprintf (stderr, "???\n");
          exit (1);
        }
      MY_ASSERT (!(tracing.stx_attributes & STATX_ATTR_MOUNT_ROOT));
    }
    MY_ASSERT (!tracing_mounted ());
    umount_debugfs ();
  }
  printf ("All tests passed\n");
  exit (0);
}
On 2025-08-17, Askar Safin <safinaskar@zohomail.com> wrote:
> I just sent to fsdevel fix for that RESOLVE_NO_XDEV bug.

Thanks, I've sent some review comments.

> if (!(lookup_flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
> LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
>
> We never return -EISDIR in this "if" if we are in non-final component
> thanks to LOOKUP_PARENT here. We fall to finish_automount instead.

Grr, I re-read this conditional a few times and I still misunderstood
what it was doing. My bad.

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
I plan to do a lot of testing of "new" mount API on my computer.
It is quite possible that I will find some bugs in these manpages during testing.
(I already found some, but I'm not sure.)
I think this will take 3-7 days.
So, Alejandro Colomar, please, don't merge this patchset until then.

--
Askar Safin
https://types.pl/@safinaskar
On 2025-08-09, Askar Safin <safinaskar@zohomail.com> wrote:
> I plan to do a lot of testing of "new" mount API on my computer.
> It is quite possible that I will find some bugs in these manpages during testing.
> (I already found some, but I'm not sure.)
> I think this will take 3-7 days.
> So, Alejandro Colomar, please, don't merge this patchset until then.

I don't plan to work on this again for the next week at least (I've
already spent over a week on these docs -- writing, rewriting, and then
rewriting once more for good measure; I've started seeing groff in my
nightmares...), so I will go through review comments after you're done.

There are some rough edges on these APIs I found while writing these
docs, so I plan to fix those this cycle if possible (hopefully those
aren't the bugs you said you found in the docs). Two of the fixes have
already been merged in the vfs tree for 6.18 (the -ENODATA handling bug,
as well as a bug in open_tree_attr() that would've let userspace trigger
UAFs). (Once 6.18 is out, I will send a follow-up patchset to document
the fixes.)

FYI, I've already fixed the few ".BR \% FOO" typos. (My terminal font
doesn't have a bold typeface, so when reviewing the rendered man-pages,
mistakes involving .B are hard to spot.)

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
Hi Aleksa,
On Sun, Aug 10, 2025 at 03:32:25AM +1000, Aleksa Sarai wrote:
> On 2025-08-09, Askar Safin <safinaskar@zohomail.com> wrote:
> > I plan to do a lot of testing of "new" mount API on my computer.
> > It is quite possible that I will find some bugs in these manpages during testing.
> > (I already found some, but I'm not sure.)
> > I think this will take 3-7 days.
> > So, Alejandro Colomar, please, don't merge this patchset until then.
>
> I don't plan to work on this again for the next week at least (I've
> already spent over a week on these docs -- writing, rewriting, and then
> rewriting once more for good measure; I've started seeing groff in my
> nightmares...), so I will go through review comments after you're done.
>
> There are some rough edges on these APIs I found while writing these
> docs, so I plan to fix those this cycle if possible (hopefully those
> aren't the bugs you said you found in the docs). Two of the fixes have
> already been merged in the vfs tree for 6.18 (the -ENODATA handling bug,
> as well as a bug in open_tree_attr() that would've let userspace trigger
> UAFs). (Once 6.18 is out, I will send a follow-up patchset to document
> the fixes.)
>
> FYI, I've already fixed the few ".BR \% FOO" typos. (My terminal font
> doesn't have a bold typeface, so when reviewing the rendered man-pages,
> mistakes involving .B are hard to spot.)
You can review in PDF if you want. See the pdfman(1) script under
src/bin/. It's quite portable:
$ cat src/bin/pdfman
#!/bin/bash
#
# Copyright, the authors of the Linux man-pages project
# SPDX-License-Identifier: GPL-3.0-or-later
set -Eeuo pipefail;
shopt -s lastpipe;
printf '%s\n' "${!#}.XXXXXX" \
| sed 's,.*/,,' \
| xargs mktemp -t \
| read -r tmp;
man -Tpdf "$@" >"$tmp";
xdg-open "$tmp";
It works essentially like man(1), so you can pass any man(7) file as its
argument to read it as a PDF.
(You may or may not have it available in your system, if your distro
packages a recent enough version of the project.)
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es/>
Hi Askar,

On Sat, Aug 09, 2025 at 07:04:06PM +0400, Askar Safin wrote:
> I plan to do a lot of testing of "new" mount API on my computer.
> It is quite possible that I will find some bugs in these manpages during testing.
> (I already found some, but I'm not sure.)
> I think this will take 3-7 days.
> So, Alejandro Colomar, please, don't merge this patchset until then.

Sure, thanks!

Cheers,
Alex

--
<https://www.alejandro-colomar.es/>
© 2016 - 2025 Red Hat, Inc.