[PATCH] linux-user: Use memfd for open syscall emulation

Rainer Müller posted 1 patch 1 year, 9 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20220725162811.87985-1-raimue@codingfarm.de
Maintainers: Laurent Vivier <laurent@vivier.eu>
There is a newer version of this series
linux-user/syscall.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
[PATCH] linux-user: Use memfd for open syscall emulation
Posted by Rainer Müller 1 year, 9 months ago
For certain paths in /proc, the open syscall is intercepted and the
returned file descriptor points to a temporary file with emulated
contents.

If TMPDIR is not accessible or writable for the current user (for
example in a read-only mounted chroot or container) tools such as ps
from procps may fail unexpectedly. Trying to read one of these paths
such as /proc/self/stat would return an error such as ENOENT or EROFS.

To relax the requirement on a writable TMPDIR, use memfd_create()
instead to create an anonymous file and return its file descriptor.

Signed-off-by: Rainer Müller <raimue@codingfarm.de>
---
 linux-user/syscall.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 991b85e6b4..3e4af930ad 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8265,9 +8265,11 @@ static int do_openat(CPUArchState *cpu_env, int dirfd, const char *pathname, int
     }
 
     if (fake_open->filename) {
+        int fd, r;
+
+#ifndef CONFIG_MEMFD
         const char *tmpdir;
         char filename[PATH_MAX];
-        int fd, r;
 
         /* create temporary file to map stat to */
         tmpdir = getenv("TMPDIR");
@@ -8279,6 +8281,12 @@ static int do_openat(CPUArchState *cpu_env, int dirfd, const char *pathname, int
             return fd;
         }
         unlink(filename);
+#else
+        fd = memfd_create("qemu-open", 0);
+        if (fd < 0) {
+            return fd;
+        }
+#endif
 
         if ((r = fake_open->fill(cpu_env, fd))) {
             int e = errno;
-- 
2.25.1


Re: [PATCH] linux-user: Use memfd for open syscall emulation
Posted by Richard Henderson 1 year, 9 months ago
On 7/25/22 09:28, Rainer Müller wrote:
> For certain paths in /proc, the open syscall is intercepted and the
> returned file descriptor points to a temporary file with emulated
> contents.
> 
> If TMPDIR is not accessible or writable for the current user (for
> example in a read-only mounted chroot or container) tools such as ps
> from procps may fail unexpectedly. Trying to read one of these paths
> such as /proc/self/stat would return an error such as ENOENT or EROFS.
> 
> To relax the requirement on a writable TMPDIR, use memfd_create()
> instead to create an anonymous file and return its file descriptor.
> 
> Signed-off-by: Rainer Müller <raimue@codingfarm.de>
> ---
>   linux-user/syscall.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 991b85e6b4..3e4af930ad 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -8265,9 +8265,11 @@ static int do_openat(CPUArchState *cpu_env, int dirfd, const char *pathname, int
>       }
>   
>       if (fake_open->filename) {
> +        int fd, r;
> +
> +#ifndef CONFIG_MEMFD
>           const char *tmpdir;
>           char filename[PATH_MAX];
> -        int fd, r;
>   
>           /* create temporary file to map stat to */
>           tmpdir = getenv("TMPDIR");
> @@ -8279,6 +8281,12 @@ static int do_openat(CPUArchState *cpu_env, int dirfd, const char *pathname, int
>               return fd;
>           }
>           unlink(filename);
> +#else
> +        fd = memfd_create("qemu-open", 0);
> +        if (fd < 0) {
> +            return fd;
> +        }
> +#endif

Even without CONFIG_MEMFD, we will have the memfd_create function available in util/.
I think you should drop the ifdefs like so:

#include "qemu/memfd.h"

     fd = memfd_create(...);
     if (fd < 0) {
         if (errno != ENOSYS) {
             return fd;
         }
         // tmpdir fallback
     }


r~

[PATCH v2] linux-user: Use memfd for open syscall emulation
Posted by Rainer Müller 1 year, 9 months ago
For certain paths in /proc, the open syscall is intercepted and the
returned file descriptor points to a temporary file with emulated
contents.

If TMPDIR is not accessible or writable for the current user (for
example in a read-only mounted chroot or container) tools such as ps
from procps may fail unexpectedly. Trying to read one of these paths
such as /proc/self/stat would return an error such as ENOENT or EROFS.

To relax the requirement on a writable TMPDIR, use memfd_create()
instead to create an anonymous file and return its file descriptor.

Signed-off-by: Rainer Müller <raimue@codingfarm.de>
---
v2: no more #ifdefs, use stub from util/memfd.c with ENOSYS fallback,
    tested with 'strace -e fault=memfd_create'
---
 linux-user/syscall.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 991b85e6b4..7b55726f25 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8269,16 +8269,22 @@ static int do_openat(CPUArchState *cpu_env, int dirfd, const char *pathname, int
         char filename[PATH_MAX];
         int fd, r;
 
-        /* create temporary file to map stat to */
-        tmpdir = getenv("TMPDIR");
-        if (!tmpdir)
-            tmpdir = "/tmp";
-        snprintf(filename, sizeof(filename), "%s/qemu-open.XXXXXX", tmpdir);
-        fd = mkstemp(filename);
+        fd = memfd_create("qemu-open", 0);
         if (fd < 0) {
-            return fd;
+            if (errno != ENOSYS) {
+                return fd;
+            }
+            /* create temporary file to map stat to */
+            tmpdir = getenv("TMPDIR");
+            if (!tmpdir)
+                tmpdir = "/tmp";
+            snprintf(filename, sizeof(filename), "%s/qemu-open.XXXXXX", tmpdir);
+            fd = mkstemp(filename);
+            if (fd < 0) {
+                return fd;
+            }
+            unlink(filename);
         }
-        unlink(filename);
 
         if ((r = fake_open->fill(cpu_env, fd))) {
             int e = errno;
-- 
2.25.1


Re: [PATCH v2] linux-user: Use memfd for open syscall emulation
Posted by Richard Henderson 1 year, 9 months ago
On 7/29/22 08:49, Rainer Müller wrote:
> +            /* create temporary file to map stat to */
> +            tmpdir = getenv("TMPDIR");
> +            if (!tmpdir)
> +                tmpdir = "/tmp";
> +            snprintf(filename, sizeof(filename), "%s/qemu-open.XXXXXX", tmpdir);
> +            fd = mkstemp(filename);
> +            if (fd < 0) {
> +                return fd;
> +            }

We've been using g_file_open_tmp elsewhere; probably good to follow suit here.


r~

Re: [PATCH v2] linux-user: Use memfd for open syscall emulation
Posted by Rainer Müller 1 year, 9 months ago
On 29/07/2022 18.01, Richard Henderson wrote:
> On 7/29/22 08:49, Rainer Müller wrote:
>> +            /* create temporary file to map stat to */
>> +            tmpdir = getenv("TMPDIR");
>> +            if (!tmpdir)
>> +                tmpdir = "/tmp";
>> +            snprintf(filename, sizeof(filename),
>> "%s/qemu-open.XXXXXX", tmpdir);
>> +            fd = mkstemp(filename);
>> +            if (fd < 0) {
>> +                return fd;
>> +            }
> 
> We've been using g_file_open_tmp elsewhere; probably good to follow suit
> here.

That seemed reasonable at first, but with regards to error handling it
gets a bit complicated.

The suggested g_file_open_tmp() would leave us with a GError only, but
to return something meaningful to the caller we must set errno in this
context. As far as I can see, there is no way to convert back to an
errno from GError.

With g_file_open_tmp() we could always set the same generic errno, but
that would hide the real cause completely. I debugged this problem with
this message that was confusing, but at least it gave away a hint:
  cat: can't open '/proc/self/stat': Read-only file system

The other option would be to g_assert_true(fd >= 0) and kill the process
in case opening the temporary file failed. This also feels wrong, as the
caller could still recover from this state and continue.

Rainer

Re: [PATCH v2] linux-user: Use memfd for open syscall emulation
Posted by Richard Henderson 1 year, 9 months ago
On 7/29/22 14:19, Rainer Müller wrote:
> On 29/07/2022 18.01, Richard Henderson wrote:
>> On 7/29/22 08:49, Rainer Müller wrote:
>>> +            /* create temporary file to map stat to */
>>> +            tmpdir = getenv("TMPDIR");
>>> +            if (!tmpdir)
>>> +                tmpdir = "/tmp";
>>> +            snprintf(filename, sizeof(filename),
>>> "%s/qemu-open.XXXXXX", tmpdir);
>>> +            fd = mkstemp(filename);
>>> +            if (fd < 0) {
>>> +                return fd;
>>> +            }
>>
>> We've been using g_file_open_tmp elsewhere; probably good to follow suit
>> here.
> 
> That seemed reasonable at first, but with regards to error handling it
> gets a bit complicated.
> 
> The suggested g_file_open_tmp() would leave us with a GError only, but
> to return something meaningful to the caller we must set errno in this
> context. As far as I can see, there is no way to convert back to an
> errno from GError.
> 
> With g_file_open_tmp() we could always set the same generic errno, but
> that would hide the real cause completely. I debugged this problem with
> this message that was confusing, but at least it gave away a hint:
>    cat: can't open '/proc/self/stat': Read-only file system

That's a good reason.  Let's leave this alone then.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~