[PATCH] ipe: document AT_EXECVE_CHECK TOCTOU issue on OverlayFS

Posted by wufan@kernel.org 1 week, 1 day ago
From: Fan Wu <wufan@kernel.org>

Document a known TOCTOU (time-of-check to time-of-use) issue when using
AT_EXECVE_CHECK with read() on OverlayFS. The deny_write_access()
protection is only held during the syscall, allowing a copy-up operation
to be triggered afterward, causing subsequent read() calls to return
content from the unprotected upper layer.

This is generally not a concern for typical IPE deployments since
dm-verity and fs-verity protected files are effectively read-only.
However, OverlayFS with a writable upper layer presents a special case.

Document mitigation strategies including mounting overlay as read-only
and using mmap() instead of read(). Note that the mmap() mitigation
relies on current OverlayFS implementation details and should not be
considered a security guarantee.

Signed-off-by: Fan Wu <wufan@kernel.org>
---
 Documentation/admin-guide/LSM/ipe.rst | 32 +++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/Documentation/admin-guide/LSM/ipe.rst b/Documentation/admin-guide/LSM/ipe.rst
index a756d8158531..b621a98fe5e2 100644
--- a/Documentation/admin-guide/LSM/ipe.rst
+++ b/Documentation/admin-guide/LSM/ipe.rst
@@ -110,6 +110,34 @@ intercepts during the execution process, this mechanism needs the interpreter
 to take the initiative, and existing interpreters won't be automatically
 supported unless the signal call is added.
 
+.. WARNING::
+
+   There is a known TOCTOU (time-of-check to time-of-use) issue with
+   ``AT_EXECVE_CHECK`` when interpreters use ``read()`` to obtain script
+   contents after the check [#atacexecvecheck_toctou]_. The ``AT_EXECVE_CHECK``
+   protection (via ``deny_write_access()``) is only held during the syscall.
+   After it returns, the file can be modified before the interpreter reads it.
+
+   In typical IPE deployments, this is not a concern because files protected
+   by dm-verity or fs-verity are effectively read-only and cannot be modified.
+   However, OverlayFS presents a special case: when the lower layer is
+   dm-verity protected (read-only) but the upper layer is writable, an
+   attacker with write access can trigger a copy-up operation after the
+   ``AT_EXECVE_CHECK`` returns, causing subsequent ``read()`` calls to return
+   content from the unprotected upper layer instead of the verified lower layer.
+
+   To mitigate this issue on OverlayFS:
+
+   -  Mount the overlay as read-only, or restrict write access to the upper
+      layer.
+   -  Interpreters may use ``mmap()`` instead of ``read()`` to obtain script
+      contents. Currently, OverlayFS fixes the underlying real file reference
+      at ``open()`` time for mmap operations, so mmap will continue to access
+      the original lower layer file even after a copy-up. However, this
+      behavior is an implementation detail of OverlayFS and is not guaranteed
+      to remain stable across kernel versions. Do not rely on this as a
+      security guarantee.
+
 Threat Model
 ------------
 
@@ -833,3 +861,7 @@ A:
                      kernel's fsverity support; IPE does not impose any
                      restrictions on the digest algorithm itself;
                      thus, this list may be out of date.
+
+.. [#atacexecvecheck_toctou] See the O_DENY_WRITE RFC discussion for details on
+                             this TOCTOU issue:
+                             https://lore.kernel.org/all/20250822170800.2116980-1-mic@digikod.net/
-- 
2.52.0
Re: [PATCH] ipe: document AT_EXECVE_CHECK TOCTOU issue on OverlayFS
Posted by Amir Goldstein 1 week, 1 day ago
On Fri, Jan 30, 2026 at 1:14 AM <wufan@kernel.org> wrote:
>
> From: Fan Wu <wufan@kernel.org>
>
> Document a known TOCTOU (time-of-check to time-of-use) issue when using
> AT_EXECVE_CHECK with read() on OverlayFS.

Hi Fan Wu,

TBH, I don't like the way that this problem is being framed.
IIUC, the problem is using IPE on a non-read-only fs.
Is that correct?

The fact that IPE metadata is usually coupled with a read-only fs
is interesting for the understanding of the use case, but unless
the IPE feature mandates a read-only fs, this is a generic problem.

OverlayFS is just one special case, which happens to be common
in Android or containers? IDK, you did not mention this.

Please describe the problem as a generic problem and give
overlayfs as an example, preferably with references to the
real-world use cases.

If I misunderstood, please explain why this problem is exclusive
to overlayfs.

Thanks,
Amir.

> The deny_write_access()
> protection is only held during the syscall, allowing a copy-up operation
> to be triggered afterward, causing subsequent read() calls to return
> content from the unprotected upper layer.
>
> This is generally not a concern for typical IPE deployments since
> dm-verity and fs-verity protected files are effectively read-only.
> However, OverlayFS with a writable upper layer presents a special case.
>
> Document mitigation strategies including mounting overlay as read-only
> and using mmap() instead of read(). Note that the mmap() mitigation
> relies on current OverlayFS implementation details and should not be
> considered a security guarantee.
>
> Signed-off-by: Fan Wu <wufan@kernel.org>
> ---
>  Documentation/admin-guide/LSM/ipe.rst | 32 +++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
>
> diff --git a/Documentation/admin-guide/LSM/ipe.rst b/Documentation/admin-guide/LSM/ipe.rst
> index a756d8158531..b621a98fe5e2 100644
> --- a/Documentation/admin-guide/LSM/ipe.rst
> +++ b/Documentation/admin-guide/LSM/ipe.rst
> @@ -110,6 +110,34 @@ intercepts during the execution process, this mechanism needs the interpreter
>  to take the initiative, and existing interpreters won't be automatically
>  supported unless the signal call is added.
>
> +.. WARNING::
> +
> +   There is a known TOCTOU (time-of-check to time-of-use) issue with
> +   ``AT_EXECVE_CHECK`` when interpreters use ``read()`` to obtain script
> +   contents after the check [#atacexecvecheck_toctou]_. The ``AT_EXECVE_CHECK``
> +   protection (via ``deny_write_access()``) is only held during the syscall.
> +   After it returns, the file can be modified before the interpreter reads it.
> +
> +   In typical IPE deployments, this is not a concern because files protected
> +   by dm-verity or fs-verity are effectively read-only and cannot be modified.
> +   However, OverlayFS presents a special case: when the lower layer is
> +   dm-verity protected (read-only) but the upper layer is writable, an
> +   attacker with write access can trigger a copy-up operation after the
> +   ``AT_EXECVE_CHECK`` returns, causing subsequent ``read()`` calls to return
> +   content from the unprotected upper layer instead of the verified lower layer.
> +
> +   To mitigate this issue on OverlayFS:
> +
> +   -  Mount the overlay as read-only, or restrict write access to the upper
> +      layer.
> +   -  Interpreters may use ``mmap()`` instead of ``read()`` to obtain script
> +      contents. Currently, OverlayFS fixes the underlying real file reference
> +      at ``open()`` time for mmap operations, so mmap will continue to access
> +      the original lower layer file even after a copy-up. However, this
> +      behavior is an implementation detail of OverlayFS and is not guaranteed
> +      to remain stable across kernel versions. Do not rely on this as a
> +      security guarantee.
> +
>  Threat Model
>  ------------
>
> @@ -833,3 +861,7 @@ A:
>                       kernel's fsverity support; IPE does not impose any
>                       restrictions on the digest algorithm itself;
>                       thus, this list may be out of date.
> +
> +.. [#atacexecvecheck_toctou] See the O_DENY_WRITE RFC discussion for details on
> +                             this TOCTOU issue:
> +                             https://lore.kernel.org/all/20250822170800.2116980-1-mic@digikod.net/
> --
> 2.52.0
>
Re: [PATCH] ipe: document AT_EXECVE_CHECK TOCTOU issue on OverlayFS
Posted by Fan Wu 1 week ago
On Fri, Jan 30, 2026 at 3:06 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, Jan 30, 2026 at 1:14 AM <wufan@kernel.org> wrote:
> >
> > From: Fan Wu <wufan@kernel.org>
> >
> > Document a known TOCTOU (time-of-check to time-of-use) issue when using
> > AT_EXECVE_CHECK with read() on OverlayFS.
>
> Hi Fan Wu,
>
> TBH, I don't like the way that this problem is being framed.
> IIUC, the problem is using IPE on a non-read-only fs.
> Is that correct?
>
> The fact that IPE metadata is usually coupled with a read-only fs
> is interesting for the understanding of the use case, but unless
> the IPE feature mandates a read-only fs, this is a generic problem.
>
> OverlayFS is just one special case, which happens to be common
> in Android or containers? IDK, you did not mention this.
>
> Please describe the problem as a generic problem and give
> overlayfs as an example, preferably with references to the
> real-world use cases.
>
> If I misunderstood, please explain why this problem is exclusive
> to overlayfs.
>
> Thanks,
> Amir.
>

Hi Amir,

Thanks for the review. That's exactly why we CC'd you and the
overlayfs folks: we wanted to get your perspective before documenting
this.

Let me give some background. IPE enforces execution policy based on
file integrity properties, primarily dm-verity and fs-verity. These
are the trust anchors, and files without these protections won't be
trusted by IPE. Since dm-verity and fs-verity are inherently
read-only, in typical deployments the TOCTOU issue doesn't exist. To
support overlayfs, IPE uses d_real_inode() to look through the overlay
and get the real inode from the lower layer.

Recently, a new feature, AT_EXECVE_CHECK, was introduced to allow script
interpreters to request LSM checks on script files before execution.
The idea is: the interpreter opens the script, calls execveat() with
AT_EXECVE_CHECK to verify that the file passes security policy, then reads
and executes the content.
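
The check step of that flow can be sketched roughly as follows. This is
only an illustration, not code from any real interpreter: the helper name
check_exec_fd() is hypothetical, the fallback flag values are assumed to
match the kernel UAPI headers, and the raw syscall is used so the sketch
does not depend on the libc version.

```c
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (assumed to match kernel UAPI). */
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif
#ifndef AT_EXECVE_CHECK
#define AT_EXECVE_CHECK 0x10000
#endif

/*
 * Hypothetical helper: ask the kernel whether executing the file behind
 * fd would be allowed. Returns 0 if allowed, -1 with errno set otherwise
 * (typically EACCES when denied, EINVAL on kernels that do not know
 * AT_EXECVE_CHECK).
 */
int check_exec_fd(int fd)
{
	char *const argv[] = { (char *)"script", NULL };
	char *const envp[] = { NULL };

	/* With AT_EXECVE_CHECK nothing is executed; only the checks run. */
	return (int)syscall(SYS_execveat, fd, "", argv, envp,
			    AT_EMPTY_PATH | AT_EXECVE_CHECK);
}
```

An interpreter would call this on the already-open script fd and only
then read() from the same fd; the window between those two steps is
where the problem described next shows up.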

What we found is that on overlayfs with a dm-verity lower layer and
writable upper layer, when a script file only exists in the lower
layer, AT_EXECVE_CHECK passes because IPE sees it's dm-verity
protected. But if another process writes to the same path after
execveat() returns, copy-up happens and subsequent read() from the
original fd returns content from the upper layer. We verified this
through testing.

Overlayfs is popular in container environments, so we want to document
this for IPE users.

We noticed the overlayfs documentation
(https://docs.kernel.org/filesystems/overlayfs.html#non-standard-behavior)
states that if a lower layer file is opened and memory mapped,
subsequent changes are not reflected in the memory mapping. We also
verified this: mmap keeps the original lower layer content after
copy-up. One reason we CC'd you is to ask: is relying on mmap to keep
the original lower file reference a reasonable choice? Or would you
recommend against depending on this behavior?

The narrative in the patch can definitely be adjusted. Would something
like this work better:

"When using AT_EXECVE_CHECK on overlayfs, if the lower layer is
integrity-protected but the upper layer is writable, a copy-up between
the check and read() may cause the interpreter to read unverified
content."

Let us know what you think.

-Fan
Re: [PATCH] ipe: document AT_EXECVE_CHECK TOCTOU issue on OverlayFS
Posted by Amir Goldstein 1 week ago
[+CC fsdevel]

On Fri, Jan 30, 2026 at 8:21 PM Fan Wu <wufan@kernel.org> wrote:
>
> On Fri, Jan 30, 2026 at 3:06 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Fri, Jan 30, 2026 at 1:14 AM <wufan@kernel.org> wrote:
> > >
> > > From: Fan Wu <wufan@kernel.org>
> > >
> > > Document a known TOCTOU (time-of-check to time-of-use) issue when using
> > > AT_EXECVE_CHECK with read() on OverlayFS.
> >
> > Hi Fan Wu,
> >
> > TBH, I don't like the way that this problem is being framed.
> > IIUC, the problem is using IPE on a non-read-only fs.
> > Is that correct?
> >
> > The fact that IPE metadata is usually coupled with a read-only fs
> > is interesting for the understanding of the use case, but unless
> > the IPE feature mandates a read-only fs, this is a generic problem.
> >
> > OverlayFS is just one special case, which happens to be common
> > in Android or containers? IDK, you did not mention this.
> >
> > Please describe the problem as a generic problem and give
> > overlayfs as an example, preferably with references to the
> > real-world use cases.
> >
> > If I misunderstood, please explain why this problem is exclusive
> > to overlayfs.
> >
> > Thanks,
> > Amir.
> >
>
> Hi Amir,
>
> Thanks for the review. That's exactly why we CC'd you and the
> overlayfs folks: we wanted to get your perspective before documenting
> this.
>
> Let me give some background. IPE enforces execution policy based on
> file integrity properties, primarily dm-verity and fs-verity. These
> are the trust anchors, and files without these protections won't be
> trusted by IPE. Since dm-verity and fs-verity are inherently
> read-only, in typical deployments the TOCTOU issue doesn't exist. To
> support overlayfs, IPE uses d_real_inode() to look through the overlay
> and get the real inode from the lower layer.
>
> Recently, a new feature, AT_EXECVE_CHECK, was introduced to allow script
> interpreters to request LSM checks on script files before execution.
> The idea is: the interpreter opens the script, calls execveat() with
> AT_EXECVE_CHECK to verify that the file passes security policy, then reads
> and executes the content.
>
> What we found is that on overlayfs with a dm-verity lower layer and
> writable upper layer, when a script file only exists in the lower
> layer, AT_EXECVE_CHECK passes because IPE sees it's dm-verity
> protected. But if another process writes to the same path after
> execveat() returns, copy-up happens and subsequent read() from the
> original fd returns content from the upper layer. We verified this
> through testing.

I don't understand how this is different from any AT_EXECVE_CHECK
TOCTOU race on a writable filesystem, regardless of IPE.
It seems to me that it is the user calling AT_EXECVE_CHECK who
is responsible for verifying, after reading, that the file has not changed;
if it has changed, then the AT_EXECVE_CHECK result could be invalidated
(depending on policy).

Maybe multi-grain ctime could provide a safe cache invalidating check?
As long as the filesystem is trusted to report true MG ctime.
See below regarding overlayfs...

>
> Overlayfs is popular in container environments, so we want to document
> this for IPE users.
>
> We noticed the overlayfs documentation
> (https://docs.kernel.org/filesystems/overlayfs.html#non-standard-behavior)
> states that if a lower layer file is opened and memory mapped,
> subsequent changes are not reflected in the memory mapping. We also
> verified this: mmap keeps the original lower layer content after
> copy-up. One reason we CC'd you is to ask: is relying on mmap to keep
> the original lower file reference a reasonable choice? Or would you
> recommend against depending on this behavior?

I recommend against depending on this behavior.
Please do not document this as a solution.

It sounds like you are documenting a recipe for how to write a safe
interpreter?

The advice to mount overlayfs read-only seems impractical for 90%
of container users, who use overlayfs specifically to add writability
over a read-only image.

If the generic ctime check is not considered reliable enough for checking
if a file was modified and copied up, what about checking the
STATX_ATTR_VERITY flag of the file?

ovl_getattr() reports the STATX_ flags from the uppermost inode
and then merges some specific fields from lower layers.

I think that would mean that, in the use case you describe, a copy-up
would result in STATX_ATTR_VERITY going away when doing
statx() on the fd after a copy-up.

If this works, better document it as a good solution.
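
Something along these lines, as an untested sketch of the idea rather
than a verified recipe. The helper name fd_has_verity() is hypothetical,
the fallback constants are assumed to match the kernel UAPI headers, and
whether overlayfs really drops the flag after copy-up is the part that
still needs to be confirmed:

```c
#include <fcntl.h>
#include <linux/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (assumed to match kernel UAPI). */
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif
#ifndef STATX_ATTR_VERITY
#define STATX_ATTR_VERITY 0x00100000
#endif

/*
 * Hypothetical helper: returns 1 if statx() reports STATX_ATTR_VERITY
 * for fd, 0 if it does not (or the filesystem does not support the
 * attribute), and -1 on error.
 */
int fd_has_verity(int fd)
{
	struct statx stx;

	/* Raw syscall, so the sketch does not depend on the libc version. */
	if (syscall(SYS_statx, fd, "", AT_EMPTY_PATH,
		    STATX_BASIC_STATS, &stx) < 0)
		return -1;
	if (!(stx.stx_attributes_mask & STATX_ATTR_VERITY))
		return 0;
	return (stx.stx_attributes & STATX_ATTR_VERITY) ? 1 : 0;
}
```

An interpreter could call this on the script fd after read() and reject
the content if the result is not 1. Note that the attribute is set for
fs-verity files; files that are only covered by dm-verity would not be
expected to carry it.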

>
> The narrative in the patch can definitely be adjusted. Would something
> like this work better:
>
> "When using AT_EXECVE_CHECK on overlayfs, if the lower layer is
> integrity-protected but the upper layer is writable, a copy-up between
> the check and read() may cause the interpreter to read unverified
> content."

Sounds fine to me, as long as it is clear that overlayfs is just
a special case of the AT_EXECVE_CHECK TOCTOU race.

Thanks,
Amir.