[PATCH v2] 9pfs: local: ignore O_NOATIME if we don't have permissions

Omar Sandoval posted 1 patch 4 years ago
Test docker-mingw@fedora passed
Test docker-quick@centos7 passed
Test checkpatch passed
Test FreeBSD passed
Test asan passed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/e9bee604e8df528584693a4ec474ded6295ce8ad.1587149256.git.osandov@fb.com
Maintainers: Christian Schoenebeck <qemu_oss@crudebyte.com>, Greg Kurz <groug@kaod.org>
hw/9pfs/9p-util.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
[PATCH v2] 9pfs: local: ignore O_NOATIME if we don't have permissions
Posted by Omar Sandoval 4 years ago
From: Omar Sandoval <osandov@fb.com>

QEMU's local 9pfs server passes O_NOATIME through from the client. If
the QEMU process lacks permission to use O_NOATIME (that is, it neither
owns the file nor has the CAP_FOWNER capability), the open fails with
EPERM. This is a problem when the client believes it has permission to
use O_NOATIME (e.g., a process running as root in the virtual machine).
Additionally, overlayfs on Linux opens files on the lower layer with
O_NOATIME, so in this case a 9pfs mount can't be used as a lower layer
for overlayfs (cf.
https://github.com/osandov/drgn/blob/dabfe1971951701da13863dbe6d8a1d172ad9650/vmtest/onoatimehack.c
and https://github.com/NixOS/nixpkgs/issues/54509).

Luckily, O_NOATIME is effectively a hint, and is often ignored by, e.g.,
network filesystems. open(2) notes that O_NOATIME "may not be effective
on all filesystems. One example is NFS, where the server maintains the
access time." This means that we can honor it when possible but fall
back to ignoring it.

Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
Changes from v1:

* Add comment
* Add Christian's acked-by

 hw/9pfs/9p-util.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
index 79ed6b233e..546f46dc7d 100644
--- a/hw/9pfs/9p-util.h
+++ b/hw/9pfs/9p-util.h
@@ -37,9 +37,22 @@ static inline int openat_file(int dirfd, const char *name, int flags,
 {
     int fd, serrno, ret;
 
+again:
     fd = openat(dirfd, name, flags | O_NOFOLLOW | O_NOCTTY | O_NONBLOCK,
                 mode);
     if (fd == -1) {
+        if (errno == EPERM && (flags & O_NOATIME)) {
+            /*
+             * The client passed O_NOATIME but we lack permissions to honor it.
+             * Rather than failing the open, fall back without O_NOATIME. This
+             * doesn't break the semantics on the client side, as the Linux
+             * open(2) man page notes that O_NOATIME "may not be effective on
+             * all filesystems". In particular, NFS and other network
+             * filesystems ignore it entirely.
+             */
+            flags &= ~O_NOATIME;
+            goto again;
+        }
         return -1;
     }
 
-- 
2.26.1
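
For readers who want to try the fallback outside of QEMU, the retry logic
in the hunk above can be sketched as a standalone helper. This is only a
minimal illustration, not the code in hw/9pfs/9p-util.h: the function name
openat_noatime_fallback is hypothetical, it uses a single retry instead of
the goto, and it leaves out the O_NOFOLLOW | O_NOCTTY | O_NONBLOCK flags
and the fcntl() handling that openat_file() performs.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes O_NOATIME with _GNU_SOURCE */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of QEMU): open a file relative to dirfd.
 * If the kernel rejects O_NOATIME with EPERM, retry once without it.
 * Returns the file descriptor, or -1 with errno set by openat().
 */
static int openat_noatime_fallback(int dirfd, const char *name, int flags,
                                   mode_t mode)
{
    int fd = openat(dirfd, name, flags, mode);

    if (fd == -1 && errno == EPERM && (flags & O_NOATIME)) {
        /* O_NOATIME is only a hint, so drop it rather than fail. */
        fd = openat(dirfd, name, flags & ~O_NOATIME, mode);
    }
    return fd;
}
```

Called as openat_noatime_fallback(AT_FDCWD, "/", O_RDONLY | O_DIRECTORY |
O_NOATIME, 0), this should succeed both for root and for an unprivileged
user who does not own "/". Errors other than EPERM are returned unchanged.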


Re: [PATCH v2] 9pfs: local: ignore O_NOATIME if we don't have permissions
Posted by Greg Kurz 4 years ago
On Fri, 17 Apr 2020 11:48:24 -0700
Omar Sandoval <osandov@osandov.com> wrote:

> From: Omar Sandoval <osandov@fb.com>
> 
> QEMU's local 9pfs server passes O_NOATIME through from the client. If
> the QEMU process lacks permission to use O_NOATIME (that is, it neither
> owns the file nor has the CAP_FOWNER capability), the open fails with
> EPERM. This is a problem when the client believes it has permission to
> use O_NOATIME (e.g., a process running as root in the virtual machine).
> Additionally, overlayfs on Linux opens files on the lower layer with
> O_NOATIME, so in this case a 9pfs mount can't be used as a lower layer
> for overlayfs (cf.
> https://github.com/osandov/drgn/blob/dabfe1971951701da13863dbe6d8a1d172ad9650/vmtest/onoatimehack.c
> and https://github.com/NixOS/nixpkgs/issues/54509).
> 
> Luckily, O_NOATIME is effectively a hint, and is often ignored by, e.g.,
> network filesystems. open(2) notes that O_NOATIME "may not be effective
> on all filesystems. One example is NFS, where the server maintains the
> access time." This means that we can honor it when possible but fall
> back to ignoring it.
> 
> Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---

Applied to 9p-next.

> Changes from v1:
> 
> * Add comment
> * Add Christian's acked-by
> 
>  hw/9pfs/9p-util.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
> index 79ed6b233e..546f46dc7d 100644
> --- a/hw/9pfs/9p-util.h
> +++ b/hw/9pfs/9p-util.h
> @@ -37,9 +37,22 @@ static inline int openat_file(int dirfd, const char *name, int flags,
>  {
>      int fd, serrno, ret;
>  
> +again:
>      fd = openat(dirfd, name, flags | O_NOFOLLOW | O_NOCTTY | O_NONBLOCK,
>                  mode);
>      if (fd == -1) {
> +        if (errno == EPERM && (flags & O_NOATIME)) {
> +            /*
> +             * The client passed O_NOATIME but we lack permissions to honor it.
> +             * Rather than failing the open, fall back without O_NOATIME. This
> +             * doesn't break the semantics on the client side, as the Linux
> +             * open(2) man page notes that O_NOATIME "may not be effective on
> +             * all filesystems". In particular, NFS and other network
> +             * filesystems ignore it entirely.
> +             */
> +            flags &= ~O_NOATIME;
> +            goto again;
> +        }
>          return -1;
>      }
>