From: Fan Wu <wufan@kernel.org>

Document a known TOCTOU (time-of-check to time-of-use) issue when using
AT_EXECVE_CHECK with read() on OverlayFS. The deny_write_access()
protection is only held during the syscall, allowing a copy-up operation
to be triggered afterward, causing subsequent read() calls to return
content from the unprotected upper layer.

This is generally not a concern for typical IPE deployments since
dm-verity and fs-verity protected files are effectively read-only.
However, OverlayFS with a writable upper layer presents a special case.

Document mitigation strategies including mounting overlay as read-only
and using mmap() instead of read(). Note that the mmap() mitigation
relies on current OverlayFS implementation details and should not be
considered a security guarantee.

Signed-off-by: Fan Wu <wufan@kernel.org>
---
Documentation/admin-guide/LSM/ipe.rst | 32 +++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/Documentation/admin-guide/LSM/ipe.rst b/Documentation/admin-guide/LSM/ipe.rst
index a756d8158531..b621a98fe5e2 100644
--- a/Documentation/admin-guide/LSM/ipe.rst
+++ b/Documentation/admin-guide/LSM/ipe.rst
@@ -110,6 +110,34 @@ intercepts during the execution process, this mechanism needs the interpreter
 to take the initiative, and existing interpreters won't be automatically
 supported unless the signal call is added.
 
+.. WARNING::
+
+   There is a known TOCTOU (time-of-check to time-of-use) issue with
+   ``AT_EXECVE_CHECK`` when interpreters use ``read()`` to obtain script
+   contents after the check [#atacexecvecheck_toctou]_. The ``AT_EXECVE_CHECK``
+   protection (via ``deny_write_access()``) is only held during the syscall.
+   After it returns, the file can be modified before the interpreter reads it.
+
+   In typical IPE deployments, this is not a concern because files protected
+   by dm-verity or fs-verity are effectively read-only and cannot be modified.
+   However, OverlayFS presents a special case: when the lower layer is
+   dm-verity protected (read-only) but the upper layer is writable, an
+   attacker with write access can trigger a copy-up operation after the
+   ``AT_EXECVE_CHECK`` returns, causing subsequent ``read()`` calls to return
+   content from the unprotected upper layer instead of the verified lower layer.
+
+   To mitigate this issue on OverlayFS:
+
+   - Mount the overlay as read-only, or restrict write access to the upper
+     layer.
+   - Interpreters may use ``mmap()`` instead of ``read()`` to obtain script
+     contents. Currently, OverlayFS fixes the underlying real file reference
+     at ``open()`` time for mmap operations, so mmap will continue to access
+     the original lower layer file even after a copy-up. However, this
+     behavior is an implementation detail of OverlayFS and is not guaranteed
+     to remain stable across kernel versions. Do not rely on this as a
+     security guarantee.
+
 Threat Model
 ------------
 
@@ -833,3 +861,7 @@ A:
kernel's fsverity support; IPE does not impose any
restrictions on the digest algorithm itself;
thus, this list may be out of date.
+
+.. [#atacexecvecheck_toctou] See the O_DENY_WRITE RFC discussion for details on
+   this TOCTOU issue:
+   https://lore.kernel.org/all/20250822170800.2116980-1-mic@digikod.net/
--
2.52.0

On Fri, Jan 30, 2026 at 1:14 AM <wufan@kernel.org> wrote:
>
> From: Fan Wu <wufan@kernel.org>
>
> Document a known TOCTOU (time-of-check to time-of-use) issue when using
> AT_EXECVE_CHECK with read() on OverlayFS.

Hi Fan Wu,

TBH, I don't like the way that this problem is being framed.
IIUC, the problem is using IPE on a non-read-only fs.
Is that correct?

The fact that IPE metadata is usually coupled with a read-only fs
is interesting for the understanding of the use case, but unless
the IPE feature mandates a read-only fs, this is a generic problem.

OverlayFS is just one particular case, which happens to be common
in Android or containers? IDK, you did not mention this.

Please describe the problem as a generic problem and give
overlayfs as an example, preferably with references to
real-world use cases.

If I misunderstood, please explain why this problem is exclusive
to overlayfs.

Thanks,
Amir.

On Fri, Jan 30, 2026 at 3:06 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Hi Fan Wu,
>
> TBH, I don't like the way that this problem is being framed.
> IIUC, the problem is using IPE on a non-read-only fs.
> Is that correct?
>
> The fact that IPE metadata is usually coupled with a read-only fs
> is interesting for the understanding of the use case, but unless
> the IPE feature mandates a read-only fs, this is a generic problem.
>
> OverlayFS is just one particular case, which happens to be common
> in Android or containers? IDK, you did not mention this.
>
> Please describe the problem as a generic problem and give
> overlayfs as an example, preferably with references to
> real-world use cases.
>
> If I misunderstood, please explain why this problem is exclusive
> to overlayfs.
>
> Thanks,
> Amir.
>

Hi Amir,

Thanks for the review. That's exactly why we CC'd you and the
overlayfs folks: we wanted to get your perspective before documenting
this.

Let me give some background. IPE enforces execution policy based on
file integrity properties, primarily dm-verity and fs-verity. These
are the trust anchors, and files without these protections won't be
trusted by IPE. Since dm-verity and fs-verity are inherently
read-only, in typical deployments the TOCTOU issue doesn't exist. To
support overlayfs, IPE uses d_real_inode() to look through the overlay
and get the real inode from the lower layer.

Recently a new feature, AT_EXECVE_CHECK, was introduced to allow script
interpreters to request LSM checks on script files before execution.
The idea is: the interpreter opens the script, calls execveat() with
AT_EXECVE_CHECK to verify that the file passes security policy, then
reads and executes the content.

What we found is that on overlayfs with a dm-verity lower layer and
writable upper layer, when a script file only exists in the lower
layer, AT_EXECVE_CHECK passes because IPE sees it's dm-verity
protected. But if another process writes to the same path after
execveat() returns, copy-up happens and subsequent read() from the
original fd returns content from the upper layer. We verified this
through testing.

Overlayfs is popular in container environments, so we want to document
this for IPE users.

We noticed the overlayfs documentation
(https://docs.kernel.org/filesystems/overlayfs.html#non-standard-behavior)
states that if a lower layer file is opened and memory mapped,
subsequent changes are not reflected in the memory mapping. We also
verified this: mmap keeps the original lower layer content after
copy-up. One reason we CC'd you is to ask: is relying on mmap to keep
the original lower file reference a reasonable choice? Or would you
recommend against depending on this behavior?

The narrative in the patch can definitely be adjusted. Would something
like this work better:

"When using AT_EXECVE_CHECK on overlayfs, if the lower layer is
integrity-protected but the upper layer is writable, a copy-up between
the check and read() may cause the interpreter to read unverified
content."

Let us know what you think.

-Fan
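
For illustration only, the check-then-read window described in the message above can be reproduced in miniature on any writable filesystem, without overlayfs or IPE. The Python sketch below is not taken from the patch or the thread: `policy_check()` is a hypothetical stand-in for the kernel-side verification requested through `execveat()` with `AT_EXECVE_CHECK` (here it is only a SHA-256 allow-list lookup), and `read_all()` and `demo()` are illustrative names. It shows only that a check performed on an open fd says nothing about what a later read through that fd returns.

```python
import hashlib
import os
import tempfile


def read_all(fd: int) -> bytes:
    """Read the whole file behind fd from offset 0 without moving the fd offset."""
    chunks = []
    offset = 0
    while True:
        chunk = os.pread(fd, 65536, offset)
        if not chunk:
            break
        chunks.append(chunk)
        offset += len(chunk)
    return b"".join(chunks)


def policy_check(fd: int, trusted_digests: set) -> bool:
    """Hypothetical stand-in for an AT_EXECVE_CHECK request on fd (assumption)."""
    return hashlib.sha256(read_all(fd)).hexdigest() in trusted_digests


def demo() -> tuple:
    """Run check, concurrent modification, then read; return (check result, bytes read)."""
    trusted = b"echo trusted\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.sh")
        with open(path, "wb") as f:
            f.write(trusted)
        allowed = {hashlib.sha256(trusted).hexdigest()}

        fd = os.open(path, os.O_RDONLY)         # 1. interpreter opens the script
        try:
            passed = policy_check(fd, allowed)  # 2. the check passes
            with open(path, "wb") as f:         # 3. another writer modifies the file
                f.write(b"echo attacker\n")
            content = read_all(fd)              # 4. interpreter reads via the same fd
        finally:
            os.close(fd)
    return passed, content
```

Calling `demo()` returns `(True, b"echo attacker\n")`: the check passed, yet the read saw the replaced content.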

[+CC fsdevel]

On Fri, Jan 30, 2026 at 8:21 PM Fan Wu <wufan@kernel.org> wrote:
>
> Hi Amir,
>
> Thanks for the review. That's exactly why we CC'd you and the
> overlayfs folks: we wanted to get your perspective before documenting
> this.
>
> Let me give some background. IPE enforces execution policy based on
> file integrity properties, primarily dm-verity and fs-verity. These
> are the trust anchors, and files without these protections won't be
> trusted by IPE. Since dm-verity and fs-verity are inherently
> read-only, in typical deployments the TOCTOU issue doesn't exist. To
> support overlayfs, IPE uses d_real_inode() to look through the overlay
> and get the real inode from the lower layer.
>
> Recently a new feature, AT_EXECVE_CHECK, was introduced to allow script
> interpreters to request LSM checks on script files before execution.
> The idea is: the interpreter opens the script, calls execveat() with
> AT_EXECVE_CHECK to verify that the file passes security policy, then
> reads and executes the content.
>
> What we found is that on overlayfs with a dm-verity lower layer and
> writable upper layer, when a script file only exists in the lower
> layer, AT_EXECVE_CHECK passes because IPE sees it's dm-verity
> protected. But if another process writes to the same path after
> execveat() returns, copy-up happens and subsequent read() from the
> original fd returns content from the upper layer. We verified this
> through testing.

I don't understand how this is different from any AT_EXECVE_CHECK
TOCTOU race on a writable filesystem, regardless of IPE.

It seems to me that it is the user calling AT_EXECVE_CHECK who is
responsible for verifying, after reading, that the file has not changed;
if it has changed, then the AT_EXECVE_CHECK result could be invalidated
(depending on policy).

Maybe multi-grain ctime could provide a safe cache invalidating check?
As long as the filesystem is trusted to report true MG ctime.
See below regarding overlayfs...

>
> Overlayfs is popular in container environments, so we want to document
> this for IPE users.
>
> We noticed the overlayfs documentation
> (https://docs.kernel.org/filesystems/overlayfs.html#non-standard-behavior)
> states that if a lower layer file is opened and memory mapped,
> subsequent changes are not reflected in the memory mapping. We also
> verified this: mmap keeps the original lower layer content after
> copy-up. One reason we CC'd you is to ask: is relying on mmap to keep
> the original lower file reference a reasonable choice? Or would you
> recommend against depending on this behavior?

I recommend against depending on this behavior.
Please do not document this as a solution.

It sounds like you are documenting a recipe for how to write a safe
interpreter?
The advice to mount overlayfs read-only seems impractical for 90% of
container users, who use overlayfs specifically to add writability
over a read-only image.

If the generic ctime check is not considered reliable enough for
checking whether a file was modified and copied up, what about checking
the STATX_ATTR_VERITY flag of the file?

ovl_getattr() reports the STATX_ flags from the "upper most" inode
and then merges some specific fields from lower layers.
I think that would mean that, in the use case you describe, a copy-up
would result in STATX_ATTR_VERITY going away when doing fstat() after
the copy-up.

If this works, it would be better to document it as a good solution.

>
> The narrative in the patch can definitely be adjusted. Would something
> like this work better:
>
> "When using AT_EXECVE_CHECK on overlayfs, if the lower layer is
> integrity-protected but the upper layer is writable, a copy-up between
> the check and read() may cause the interpreter to read unverified
> content."

Sounds fine to me, as long as it is clear that overlayfs is just
a particular case of the AT_EXECVE_CHECK TOCTOU race.

Thanks,
Amir.
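
One way to picture the "verify after reading" idea raised in this reply is the Python sketch below. It is an illustration under stated assumptions, not a vetted mitigation: `snapshot()`, `read_verified()` and `FileChanged` are hypothetical names, the comparison uses only what `os.fstat()` exposes (device, inode, size, mtime, ctime), and it does not inspect `STATX_ATTR_VERITY`, which would need `statx()`. Whether these fields change reliably across an overlayfs copy-up, and whether the timestamp granularity is fine enough, are exactly the open questions in the thread. In this sketch the snapshot is meant to be taken before the check, so that any change made after the check is caught.

```python
import os
import tempfile


class FileChanged(Exception):
    """Raised when file metadata differs between the snapshot and the re-check."""


def snapshot(fd: int) -> tuple:
    """Capture the metadata fields used to detect modification (an assumed choice)."""
    st = os.fstat(fd)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def read_verified(fd: int, before: tuple) -> bytes:
    """Read the whole file, then fail if its metadata no longer matches `before`."""
    chunks = []
    offset = 0
    while True:
        chunk = os.pread(fd, 65536, offset)
        if not chunk:
            break
        chunks.append(chunk)
        offset += len(chunk)
    data = b"".join(chunks)
    # Re-check only after the read, so the comparison covers the whole window.
    if snapshot(fd) != before:
        raise FileChanged("file metadata changed since the snapshot was taken")
    return data
```

A caller would take `before = snapshot(fd)`, perform the policy check, and then use `read_verified(fd, before)` in place of a plain read, discarding the content if `FileChanged` is raised. A same-size rewrite within one timestamp tick could still go unnoticed on filesystems with coarse timestamps, which is why the size is included and why finer-grained ctime matters here.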