[v1] Optimize mass closing of FDs on child spwaning

[libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Michal Privoznik 6 years, 7 months ago

When spawning a child process, between fork() and exec() we close
all file descriptors and keep only those the caller wants us to
pass onto the child. The problem is how we do that. Currently, we
get the limit of opened files and then iterate through each one
of them and either close() it or make it survive exec(). This
approach is suboptimal (although, not that much in default
configurations where the limit is pretty low - 1024). We have
/proc where we can learn what FDs we hold open and thus we can
selectively close only those.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/util/vircommand.c | 78 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 70 insertions(+), 8 deletions(-)

diff --git a/src/util/vircommand.c b/src/util/vircommand.c
index faee36c449..d154bb81d4 100644
--- a/src/util/vircommand.c
+++ b/src/util/vircommand.c
@@ -491,27 +491,89 @@ virExecCommon(virCommandPtr cmd, gid_t *groups, int ngroups)
     return ret;
 }
 
+# ifdef __linux__
+/* On Linux, we can utilize procfs and read the table of opened
+ * FDs and selectively close only those FDs we don't want to pass
+ * onto child process (well, the one we will exec soon since this
+ * is called from the child). */
+static int
+virCommandMassCloseGetFDsLinux(virCommandPtr cmd ATTRIBUTE_UNUSED,
+                               virBitmapPtr fds)
+{
+    DIR *dp = NULL;
+    struct dirent *entry;
+    const char *dirName = "/proc/self/fd";
+    int rc;
+    int ret = -1;
+
+    if (virDirOpen(&dp, dirName) < 0)
+        return -1;
+
+    while ((rc = virDirRead(dp, &entry, dirName)) > 0) {
+        int fd;
+
+        if (virStrToLong_i(entry->d_name, NULL, 10, &fd) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("unable to parse FD: %s"),
+                           entry->d_name);
+            goto cleanup;
+        }
+
+        if (virBitmapSetBit(fds, fd) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("unable to set FD as open: %d"),
+                           fd);
+            goto cleanup;
+        }
+    }
+
+    if (rc < 0)
+        goto cleanup;
+
+    ret = 0;
+ cleanup:
+    VIR_DIR_CLOSE(dp);
+    return ret;
+}
+
+# else /* !__linux__ */
+
+static int
+virCommandMassCloseGetFDsGeneric(virCommandPtr cmd ATTRIBUTE_UNUSED,
+                                 virBitmapPtr fds)
+{
+    virBitmapSetAll(fds);
+    return 0;
+}
+# endif /* !__linux__ */
+
 static int
 virCommandMassClose(virCommandPtr cmd,
                     int childin,
                     int childout,
                     int childerr)
 {
+    VIR_AUTOPTR(virBitmap) fds = NULL;
     int openmax = sysconf(_SC_OPEN_MAX);
-    int fd;
-    int tmpfd;
+    int fd = -1;
 
-    if (openmax < 0) {
-        virReportSystemError(errno,  "%s",
-                             _("sysconf(_SC_OPEN_MAX) failed"));
+    if (!(fds = virBitmapNew(openmax)))
         return -1;
-    }
 
-    for (fd = 3; fd < openmax; fd++) {
+# ifdef __linux__
+    if (virCommandMassCloseGetFDsLinux(cmd, fds) < 0)
+        return -1;
+# else
+    if (virCommandMassCloseGetFDsGeneric(cmd, fds) < 0)
+        return -1;
+# endif
+
+    fd = virBitmapNextSetBit(fds, -1);
+    for (; fd >= 0; fd = virBitmapNextSetBit(fds, fd)) {
         if (fd == childin || fd == childout || fd == childerr)
             continue;
         if (!virCommandFDIsSet(cmd, fd)) {
-            tmpfd = fd;
+            int tmpfd = fd;
             VIR_MASS_CLOSE(tmpfd);
         } else if (virSetInherit(fd, true) < 0) {
             virReportSystemError(errno, _("failed to preserve fd %d"), fd);
-- 
2.21.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Eric Blake 6 years, 7 months ago

On 7/3/19 2:19 AM, Michal Privoznik wrote:
> When spawning a child process, between fork() and exec() we close
> all file descriptors and keep only those the caller wants us to
> pass onto the child. The problem is how we do that. Currently, we
> get the limit of opened files and then iterate through each one
> of them and either close() it or make it survive exec(). This
> approach is suboptimal (although, not that much in default
> configurations where the limit is pretty low - 1024). We have
> /proc where we can learn what FDs we hold open and thus we can
> selectively close only those.
> 
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>  src/util/vircommand.c | 78 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 70 insertions(+), 8 deletions(-)

> +# ifdef __linux__
> +/* On Linux, we can utilize procfs and read the table of opened
> + * FDs and selectively close only those FDs we don't want to pass
> + * onto child process (well, the one we will exec soon since this
> + * is called from the child). */
> +static int
> +virCommandMassCloseGetFDsLinux(virCommandPtr cmd ATTRIBUTE_UNUSED,
> +                               virBitmapPtr fds)
> +{
> +    DIR *dp = NULL;
> +    struct dirent *entry;
> +    const char *dirName = "/proc/self/fd";
> +    int rc;
> +    int ret = -1;
> +
> +    if (virDirOpen(&dp, dirName) < 0)
> +        return -1;

Unfortunately, POSIX says that opendir()/readdir() are not async-safe
(the list of async-safe functions is
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04;
some implementations of opendir() call malloc(), and malloc() is
definitely not async-safe - if you ever fork() in one thread while
another thread is locked inside a malloc, your child process is
deadlocked if it has to malloc because the forked process no longer has
the thread to release the malloc lock).  It also says readdir() is not
threadafe
(https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09)

Therefore, opendir() (hidden in virDirOpen) and readdir() (hidden in
virDirRead) are generally unsafe to be used between fork() of a
multi-threaded app and the subsequent exec(). But maybe you are safe
because this code is only compiled on Linux? Are we absolutely sure this
can't deadlock, in spite of violating POSIX constraints on async-safety?

http://austingroupbugs.net/view.php?id=696 points out some interesting
problems from the POSIX side - readdir_r() is absolutely broken, and
readdir() being marked as thread-unsafe may be too strict.  But then
again, http://austingroupbugs.net/view.php?id=75 states
   "The application shall not modify the structure to which the
     return value of readdir() points, nor any storage areas pointed to
     by pointers within the structure. The returned pointer, and
     pointers within the structure, might be invalidated or the
     structure or the storage areas might be overwritten by a
     subsequent call to readdir() on the same directory stream.
     They shall not be affected by a call to readdir() on a different
     directory stream."
where it may be the intent that if opendir() doesn't take any locks or
other async-unsafe actions, using opendir() after fork() may be safe
after all (readdir on a DIR* that was opened in the parent is not, but
readdir() on a fresh DIR* opened in the child might be safe, even if not
currently strictly portable).

> +
> +    while ((rc = virDirRead(dp, &entry, dirName)) > 0) {
> +        int fd;
> +
> +        if (virStrToLong_i(entry->d_name, NULL, 10, &fd) < 0) {
> +            virReportError(VIR_ERR_INTERNAL_ERROR,
> +                           _("unable to parse FD: %s"),
> +                           entry->d_name);
> +            goto cleanup;
> +        }
> +
> +        if (virBitmapSetBit(fds, fd) < 0) {

Is our use of virBitmap likewise avoiding use of malloc()?

> +            virReportError(VIR_ERR_INTERNAL_ERROR,
> +                           _("unable to set FD as open: %d"),
> +                           fd);
> +            goto cleanup;
> +        }
> +    }
> +
> +    if (rc < 0)
> +        goto cleanup;
> +
> +    ret = 0;
> + cleanup:
> +    VIR_DIR_CLOSE(dp);
> +    return ret;
> +}
> +
> +# else /* !__linux__ */
> +
> +static int
> +virCommandMassCloseGetFDsGeneric(virCommandPtr cmd ATTRIBUTE_UNUSED,
> +                                 virBitmapPtr fds)
> +{
> +    virBitmapSetAll(fds);
> +    return 0;
> +}
> +# endif /* !__linux__ */
> +
>  static int
>  virCommandMassClose(virCommandPtr cmd,
>                      int childin,
>                      int childout,
>                      int childerr)
>  {
> +    VIR_AUTOPTR(virBitmap) fds = NULL;
>      int openmax = sysconf(_SC_OPEN_MAX);
> -    int fd;
> -    int tmpfd;
> +    int fd = -1;
>  
> -    if (openmax < 0) {
> -        virReportSystemError(errno,  "%s",
> -                             _("sysconf(_SC_OPEN_MAX) failed"));
> +    if (!(fds = virBitmapNew(openmax)))

Would it be better to pre-alloc the bitmap in the parent, prior to
fork()ing, to ensure that we are not using malloc() in between fork and
exec?

>          return -1;
> -    }
>  
> -    for (fd = 3; fd < openmax; fd++) {
> +# ifdef __linux__
> +    if (virCommandMassCloseGetFDsLinux(cmd, fds) < 0)
> +        return -1;
> +# else
> +    if (virCommandMassCloseGetFDsGeneric(cmd, fds) < 0)
> +        return -1;
> +# endif
> +
> +    fd = virBitmapNextSetBit(fds, -1);
> +    for (; fd >= 0; fd = virBitmapNextSetBit(fds, fd)) {
>          if (fd == childin || fd == childout || fd == childerr)
>              continue;
>          if (!virCommandFDIsSet(cmd, fd)) {
> -            tmpfd = fd;
> +            int tmpfd = fd;
>              VIR_MASS_CLOSE(tmpfd);
>          } else if (virSetInherit(fd, true) < 0) {
>              virReportSystemError(errno, _("failed to preserve fd %d"), fd);
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Michal Prívozník 6 years, 7 months ago

On 7/12/19 6:55 PM, Eric Blake wrote:
> On 7/3/19 2:19 AM, Michal Privoznik wrote:
>> When spawning a child process, between fork() and exec() we close
>> all file descriptors and keep only those the caller wants us to
>> pass onto the child. The problem is how we do that. Currently, we
>> get the limit of opened files and then iterate through each one
>> of them and either close() it or make it survive exec(). This
>> approach is suboptimal (although, not that much in default
>> configurations where the limit is pretty low - 1024). We have
>> /proc where we can learn what FDs we hold open and thus we can
>> selectively close only those.
>>
>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>> ---
>>  src/util/vircommand.c | 78 ++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 70 insertions(+), 8 deletions(-)
> 
>> +# ifdef __linux__
>> +/* On Linux, we can utilize procfs and read the table of opened
>> + * FDs and selectively close only those FDs we don't want to pass
>> + * onto child process (well, the one we will exec soon since this
>> + * is called from the child). */
>> +static int
>> +virCommandMassCloseGetFDsLinux(virCommandPtr cmd ATTRIBUTE_UNUSED,
>> +                               virBitmapPtr fds)
>> +{
>> +    DIR *dp = NULL;
>> +    struct dirent *entry;
>> +    const char *dirName = "/proc/self/fd";
>> +    int rc;
>> +    int ret = -1;
>> +
>> +    if (virDirOpen(&dp, dirName) < 0)
>> +        return -1;
> 
> Unfortunately, POSIX says that opendir()/readdir() are not async-safe
> (the list of async-safe functions is
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04;
> some implementations of opendir() call malloc(), and malloc() is
> definitely not async-safe - if you ever fork() in one thread while
> another thread is locked inside a malloc, your child process is
> deadlocked if it has to malloc because the forked process no longer has
> the thread to release the malloc lock).  It also says readdir() is not
> threadafe
> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09)
> 
> Therefore, opendir() (hidden in virDirOpen) and readdir() (hidden in
> virDirRead) are generally unsafe to be used between fork() of a
> multi-threaded app and the subsequent exec(). But maybe you are safe
> because this code is only compiled on Linux? Are we absolutely sure this
> can't deadlock, in spite of violating POSIX constraints on async-safety?
> 
> http://austingroupbugs.net/view.php?id=696 points out some interesting
> problems from the POSIX side - readdir_r() is absolutely broken, and
> readdir() being marked as thread-unsafe may be too strict.  But then
> again, http://austingroupbugs.net/view.php?id=75 states
>    "The application shall not modify the structure to which the
>      return value of readdir() points, nor any storage areas pointed to
>      by pointers within the structure. The returned pointer, and
>      pointers within the structure, might be invalidated or the
>      structure or the storage areas might be overwritten by a
>      subsequent call to readdir() on the same directory stream.
>      They shall not be affected by a call to readdir() on a different
>      directory stream."
> where it may be the intent that if opendir() doesn't take any locks or
> other async-unsafe actions, using opendir() after fork() may be safe
> after all (readdir on a DIR* that was opened in the parent is not, but
> readdir() on a fresh DIR* opened in the child might be safe, even if not
> currently strictly portable).

D'oh. POSIX and its limitations. I could allocate the bitmap in the
parent, sure. But avoiding opendir() is not possible (that's the whole
point of this patch). While testing this - on Linux - I did not run into
any deadlock, but maybe I was just lucky. On the other hand, this is
going to run only on Linux, on anything else we'll still iterate over
all FDs.

I find it rather surprising (in a disturbing way) that there is no
better solution to this problem than iterating over all FDs and closing
them. One by one.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Eric Blake 6 years, 7 months ago

[adding libc-help in cc]

On 7/13/19 2:41 AM, Michal Prívozník wrote:
> On 7/12/19 6:55 PM, Eric Blake wrote:
>> On 7/3/19 2:19 AM, Michal Privoznik wrote:
>>> When spawning a child process, between fork() and exec() we close
>>> all file descriptors and keep only those the caller wants us to
>>> pass onto the child. The problem is how we do that. Currently, we
>>> get the limit of opened files and then iterate through each one
>>> of them and either close() it or make it survive exec(). This
>>> approach is suboptimal (although, not that much in default
>>> configurations where the limit is pretty low - 1024). We have
>>> /proc where we can learn what FDs we hold open and thus we can
>>> selectively close only those.
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>>> ---
>>>  src/util/vircommand.c | 78 ++++++++++++++++++++++++++++++++++++++-----
>>>  1 file changed, 70 insertions(+), 8 deletions(-)
>>
>>> +# ifdef __linux__
>>> +/* On Linux, we can utilize procfs and read the table of opened
>>> + * FDs and selectively close only those FDs we don't want to pass
>>> + * onto child process (well, the one we will exec soon since this
>>> + * is called from the child). */
>>> +static int
>>> +virCommandMassCloseGetFDsLinux(virCommandPtr cmd ATTRIBUTE_UNUSED,
>>> +                               virBitmapPtr fds)
>>> +{
>>> +    DIR *dp = NULL;
>>> +    struct dirent *entry;
>>> +    const char *dirName = "/proc/self/fd";
>>> +    int rc;
>>> +    int ret = -1;
>>> +
>>> +    if (virDirOpen(&dp, dirName) < 0)
>>> +        return -1;
>>
>> Unfortunately, POSIX says that opendir()/readdir() are not async-safe
>> (the list of async-safe functions is
>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04;
>> some implementations of opendir() call malloc(), and malloc() is
>> definitely not async-safe - if you ever fork() in one thread while
>> another thread is locked inside a malloc, your child process is
>> deadlocked if it has to malloc because the forked process no longer has
>> the thread to release the malloc lock).  It also says readdir() is not
>> threadafe
>> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09)
>>
>> Therefore, opendir() (hidden in virDirOpen) and readdir() (hidden in
>> virDirRead) are generally unsafe to be used between fork() of a
>> multi-threaded app and the subsequent exec(). But maybe you are safe
>> because this code is only compiled on Linux? Are we absolutely sure this
>> can't deadlock, in spite of violating POSIX constraints on async-safety?
>>
>> http://austingroupbugs.net/view.php?id=696 points out some interesting
>> problems from the POSIX side - readdir_r() is absolutely broken, and
>> readdir() being marked as thread-unsafe may be too strict.  But then
>> again, http://austingroupbugs.net/view.php?id=75 states
>>    "The application shall not modify the structure to which the
>>      return value of readdir() points, nor any storage areas pointed to
>>      by pointers within the structure. The returned pointer, and
>>      pointers within the structure, might be invalidated or the
>>      structure or the storage areas might be overwritten by a
>>      subsequent call to readdir() on the same directory stream.
>>      They shall not be affected by a call to readdir() on a different
>>      directory stream."
>> where it may be the intent that if opendir() doesn't take any locks or
>> other async-unsafe actions, using opendir() after fork() may be safe
>> after all (readdir on a DIR* that was opened in the parent is not, but
>> readdir() on a fresh DIR* opened in the child might be safe, even if not
>> currently strictly portable).
> 
> D'oh. POSIX and its limitations. I could allocate the bitmap in the
> parent, sure. But avoiding opendir() is not possible (that's the whole
> point of this patch). While testing this - on Linux - I did not run into
> any deadlock, but maybe I was just lucky. On the other hand, this is
> going to run only on Linux, on anything else we'll still iterate over
> all FDs.
> 
> I find it rather surprising (in a disturbing way) that there is no
> better solution to this problem than iterating over all FDs and closing
> them. One by one.

Does anyone know if glibc guarantees that opendir/readdir in between
multi-threaded fork() and exec() is safe, even though POSIX does not
guarantee that safety in general?  I know that one approach to avoid
having to close all fds is religiously using O_CLOEXEC everywhere (so
that the race window of having an fd that would leak is not possible),
but that's also an expensive audit, compared to just ensuring that
things are closed after fork().  Are there any other ideas out there
that we should be aware of (and no, pthread_atfork is not a sane idea)?
(various BSD systems have the closefrom() syscall which is more
efficient than lots of close() calls; and Linux may be adding something
similar https://lwn.net/Articles/789023/), Is there any saner way to
close all unneeded fds that were not properly marked O_CLOEXEC by an
application linking against a multithreaded lib that must perform fork/exec?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Florian Weimer 6 years, 6 months ago

* Eric Blake:

> Does anyone know if glibc guarantees that opendir/readdir in between
> multi-threaded fork() and exec() is safe, even though POSIX does not
> guarantee that safety in general?

glibc supports malloc after multi-threaded fork as an extension (or as
a bug, because it makes malloc not async-signal-safe).

Lots of programs depend on this.  OpenJDK even calls malloc after
vfork, which is not officially supported (but of course we can't break
OpenJDK).

> I know that one approach to avoid
> having to close all fds is religiously using O_CLOEXEC everywhere (so
> that the race window of having an fd that would leak is not possible),
> but that's also an expensive audit, compared to just ensuring that
> things are closed after fork().  Are there any other ideas out there
> that we should be aware of (and no, pthread_atfork is not a sane idea)?
> (various BSD systems have the closefrom() syscall which is more
> efficient than lots of close() calls; and Linux may be adding something
> similar https://lwn.net/Articles/789023/), Is there any saner way to
> close all unneeded fds that were not properly marked O_CLOEXEC by an
> application linking against a multithreaded lib that must perform fork/exec?

I tried to add getdents64 (which got committed, but may yet move from
<unistd.h> to <dirent.h>, to match musl) and <sys/direntries.h> (which
did not) in glibc 2.30.  Those interfaces are async-signal-safe
(except on some MIPS variants, where getdents64 has complex
emulation).

If you do not want to use opendir/readdir, issuing getdents64 directly
and parsing the buffer is your best option right now.  (Lowering the
RLIMIT_NOFILE limit does not enable probing for stray descriptors,
unfortunately.)  But opendir/readdir after fork should be fine,
really.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Eric Blake 6 years, 6 months ago

On 7/14/19 12:23 AM, Florian Weimer wrote:
> * Eric Blake:
> 
>> Does anyone know if glibc guarantees that opendir/readdir in between
>> multi-threaded fork() and exec() is safe, even though POSIX does not
>> guarantee that safety in general?
> 
> glibc supports malloc after multi-threaded fork as an extension (or as
> a bug, because it makes malloc not async-signal-safe).

It's not a bug for glibc to provide guarantees above what POSIX
requires, but IS a bug for applications to depend on those guarantees
without realizing they are non-portable.

> 
> If you do not want to use opendir/readdir, issuing getdents64 directly
> and parsing the buffer is your best option right now.  (Lowering the
> RLIMIT_NOFILE limit does not enable probing for stray descriptors,
> unfortunately.)  But opendir/readdir after fork should be fine,
> really.

Thanks for checking; I'm okay with the patch that started this thread
going in libvirt if we tweak it to also include a big fat comment
stating that use of opendir/readdir is not safe in general, but should
be safe in this specific use (because glibc adds async-signal safety to
those functions that was not required by POSIX), since the patch is only
using opendir on Linux.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Florian Weimer 6 years, 6 months ago

* Eric Blake:

> On 7/14/19 12:23 AM, Florian Weimer wrote:
>> * Eric Blake:
>> 
>>> Does anyone know if glibc guarantees that opendir/readdir in between
>>> multi-threaded fork() and exec() is safe, even though POSIX does not
>>> guarantee that safety in general?
>> 
>> glibc supports malloc after multi-threaded fork as an extension (or as
>> a bug, because it makes malloc not async-signal-safe).
>
> It's not a bug for glibc to provide guarantees above what POSIX
> requires, but IS a bug for applications to depend on those guarantees
> without realizing they are non-portable.

It's a bug because it makes malloc not async-signal-safe (as required by
POSIX) in our current implementation of malloc.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Eric Blake 6 years, 6 months ago

On 7/15/19 9:26 AM, Florian Weimer wrote:
> * Eric Blake:
> 
>> On 7/14/19 12:23 AM, Florian Weimer wrote:
>>> * Eric Blake:
>>>
>>>> Does anyone know if glibc guarantees that opendir/readdir in between
>>>> multi-threaded fork() and exec() is safe, even though POSIX does not
>>>> guarantee that safety in general?
>>>
>>> glibc supports malloc after multi-threaded fork as an extension (or as
>>> a bug, because it makes malloc not async-signal-safe).
>>
>> It's not a bug for glibc to provide guarantees above what POSIX
>> requires, but IS a bug for applications to depend on those guarantees
>> without realizing they are non-portable.
> 
> It's a bug because it makes malloc not async-signal-safe (as required by
> POSIX) in our current implementation of malloc.

Huh? malloc() is NOT required by POSIX to be async-signal-safe (it is
NOT in the list at
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
2.4.3 Signal Actions).  POSIX allows for malloc() to be made
async-signal-safe ("Implementations may make other interfaces
async-signal-safe.") but does not require it; therefore, a
strictly-compliant POSIX application CANNOT call malloc() from inside a
signal handler, or after fork()ing from a multi-threaded app but prior
to exec(), because the signal or the fork may have interrupted another
use of malloc(), where the nested malloc() deadlocks trying to obtain
the resource held by the interrupted malloc().  But if glibc has ways to
guarantee that one malloc() can be interrupted by a signal or by a
multi-threaded fork, and still have the second re-entrant usage from
that interrupting context that won't deadlock, then glibc is providing a
stronger guarantee than what POSIX requires.  Anything that has to
obtain a mutex is notoriously difficult to make async-signal-safe, which
is why POSIX does not make the requirement of malloc(), or in turn of
anything like opendir() that might use malloc() under the hood.

I don't get what you say will make malloc() non-async-signal-safe, since
POSIX does not require that was in the first place. I am, however,
stating that glibc might be willing to provide that additional guarantee
(maybe for just opendir(), rather than malloc()) even though relying on
that aspect of glibc means your application is no longer strictly
compliant, but will only work on glibc.  But knowing what glibc
guarantees even when POSIX does not determines what we can do under
#ifdef __linux__ that we otherwise cannot portably do for risk of deadlock.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Florian Weimer 6 years, 6 months ago

* Eric Blake:

> On 7/15/19 9:26 AM, Florian Weimer wrote:
>> * Eric Blake:
>> 
>>> On 7/14/19 12:23 AM, Florian Weimer wrote:
>>>> * Eric Blake:
>>>>
>>>>> Does anyone know if glibc guarantees that opendir/readdir in between
>>>>> multi-threaded fork() and exec() is safe, even though POSIX does not
>>>>> guarantee that safety in general?
>>>>
>>>> glibc supports malloc after multi-threaded fork as an extension (or as
>>>> a bug, because it makes malloc not async-signal-safe).
>>>
>>> It's not a bug for glibc to provide guarantees above what POSIX
>>> requires, but IS a bug for applications to depend on those guarantees
>>> without realizing they are non-portable.
>> 
>> It's a bug because it makes malloc not async-signal-safe (as required by
>> POSIX) in our current implementation of malloc.
>
> Huh? malloc() is NOT required by POSIX to be async-signal-safe (it is
> NOT in the list at

Sorry, I mistyped.  I meant to write fork.  It's on the list.

Thanks,
Florian

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Eric Blake 6 years, 6 months ago

On 7/15/19 12:01 PM, Florian Weimer wrote:

>>>>> glibc supports malloc after multi-threaded fork as an extension (or as
>>>>> a bug, because it makes malloc not async-signal-safe).
>>>>
>>>> It's not a bug for glibc to provide guarantees above what POSIX
>>>> requires, but IS a bug for applications to depend on those guarantees
>>>> without realizing they are non-portable.
>>>
>>> It's a bug because it makes malloc not async-signal-safe (as required by
>>> POSIX) in our current implementation of malloc.
>>
>> Huh? malloc() is NOT required by POSIX to be async-signal-safe (it is
>> NOT in the list at
> 
> Sorry, I mistyped.  I meant to write fork.  It's on the list.

Ah, that makes much more sense.  In other words, the manner in which
glibc guarantees that malloc() can be called after fork() is enough to
simultaneously break the POSIX requirement that fork() can be called
from a signal handler without risking deadlock (because whatever glibc
does to make malloc safe after fork is not re-entrant enough if a signal
handler tries to fork while glibc was already in the middle of preparing
for fork).

Is it worth opening a bug against POSIX to try to strike or reduce the
requirement that fork() be async-signal safe?  For a single-threaded
app, fork()ing in a signal handler might have made sense, but as POSIX
already says:

"When the application calls fork() from a signal handler and any of the
fork handlers registered by pthread_atfork() calls a function that is
not async-signal-safe, the behavior is undefined."
...
"While the fork() function is async-signal-safe, there is no way for an
implementation to determine whether the fork handlers established by
pthread_atfork() are async-signal-safe. The fork handlers may attempt to
execute portions of the implementation that are not async-signal-safe,
such as those that are protected by mutexes, leading to a deadlock
condition. It is therefore undefined for the fork handlers to execute
functions that are not async-signal-safe when fork() is called from a
signal handler."

it might be nicer if the POSIX wording were changed to require fork() to
be async-signal safe only when used in a single-threaded application
(where pthread_atfork() is not in use), or even dropped the requirement
altogether (relying on implementations to provide that additional
guarantee if they'd like, but not requiring it of all implementations).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Florian Weimer 6 years, 6 months ago

* Eric Blake:

> On 7/15/19 12:01 PM, Florian Weimer wrote:
>
>>>>>> glibc supports malloc after multi-threaded fork as an extension (or as
>>>>>> a bug, because it makes malloc not async-signal-safe).
>>>>>
>>>>> It's not a bug for glibc to provide guarantees above what POSIX
>>>>> requires, but IS a bug for applications to depend on those guarantees
>>>>> without realizing they are non-portable.
>>>>
>>>> It's a bug because it makes malloc not async-signal-safe (as required by
>>>> POSIX) in our current implementation of malloc.
>>>
>>> Huh? malloc() is NOT required by POSIX to be async-signal-safe (it is
>>> NOT in the list at
>> 
>> Sorry, I mistyped.  I meant to write fork.  It's on the list.
>
> Ah, that makes much more sense.  In other words, the manner in which
> glibc guarantees that malloc() can be called after fork() is enough to
> simultaneously break the POSIX requirement that fork() can be called
> from a signal handler without risking deadlock (because whatever glibc
> does to make malloc safe after fork is not re-entrant enough if a signal
> handler tries to fork while glibc was already in the middle of preparing
> for fork).

It's taking all the malloc locks, so interrupting malloc is sufficient
to trigger a deadlock with fork.  There are a bunch of other locks as
well, but the malloc locks is the most prominent one.

> Is it worth opening a bug against POSIX to try to strike or reduce the
> requirement that fork() be async-signal safe?  For a single-threaded
> app, fork()ing in a signal handler might have made sense, but as POSIX
> already says:

fork is commonly used in in-process crash handlers.  I suspect this is
in part because it allows one to stop all other threads.  I do not think
this is the proper way to write crash handlers (at least not on Linux),
but it's still very much current practice.  I would certainly welcome a
different requirement from POSIX, but then maybe POSIX does not want to
act as the driver of change on our behalf like this.

Thanks,
Florian

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Daniel P. Berrangé 6 years, 6 months ago

On Sun, Jul 14, 2019 at 07:23:10AM +0200, Florian Weimer wrote:
> * Eric Blake:
> 
> > Does anyone know if glibc guarantees that opendir/readdir in between
> > multi-threaded fork() and exec() is safe, even though POSIX does not
> > guarantee that safety in general?
> 
> glibc supports malloc after multi-threaded fork as an extension (or as
> a bug, because it makes malloc not async-signal-safe).
> 
> Lots of programs depend on this.  OpenJDK even calls malloc after
> vfork, which is not officially supported (but of course we can't break
> OpenJDK).

Yep, libvirt already relies glibc semantics that let malloc/free  work
after fork(), for other code we've had since essentially forever.

> > I know that one approach to avoid
> > having to close all fds is religiously using O_CLOEXEC everywhere (so
> > that the race window of having an fd that would leak is not possible),
> > but that's also an expensive audit, compared to just ensuring that
> > things are closed after fork().  Are there any other ideas out there
> > that we should be aware of (and no, pthread_atfork is not a sane idea)?
> > (various BSD systems have the closefrom() syscall which is more
> > efficient than lots of close() calls; and Linux may be adding something
> > similar https://lwn.net/Articles/789023/), Is there any saner way to
> > close all unneeded fds that were not properly marked O_CLOEXEC by an
> > application linking against a multithreaded lib that must perform fork/exec?
> 
> I tried to add getdents64 (which got committed, but may yet move from
> <unistd.h> to <dirent.h>, to match musl) and <sys/direntries.h> (which
> did not) in glibc 2.30.  Those interfaces are async-signal-safe
> (except on some MIPS variants, where getdents64 has complex
> emulation).
> 
> If you do not want to use opendir/readdir, issuing getdents64 directly
> and parsing the buffer is your best option right now.  (Lowering the
> RLIMIT_NOFILE limit does not enable probing for stray descriptors,
> unfortunately.)  But opendir/readdir after fork should be fine,
> really.

Ok, lets just keep it simple & use opendif/readdir.

If we ever hit problems, we can just disable the code & go back to the
slower code we have right now.

Hopefully the kernel folks will finally merge one of the recent
closefrom()-like  proposals and make life much easier.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Posted by Michal Prívozník 6 years, 7 months ago

On 7/13/19 5:33 PM, Eric Blake wrote:
> [adding libc-help in cc]
> 
> On 7/13/19 2:41 AM, Michal Prívozník wrote:
>> On 7/12/19 6:55 PM, Eric Blake wrote:
>>> On 7/3/19 2:19 AM, Michal Privoznik wrote:
>>>> When spawning a child process, between fork() and exec() we close
>>>> all file descriptors and keep only those the caller wants us to
>>>> pass onto the child. The problem is how we do that. Currently, we
>>>> get the limit of opened files and then iterate through each one
>>>> of them and either close() it or make it survive exec(). This
>>>> approach is suboptimal (although, not that much in default
>>>> configurations where the limit is pretty low - 1024). We have
>>>> /proc where we can learn what FDs we hold open and thus we can
>>>> selectively close only those.
>>>>
>>>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>>>> ---
>>>>  src/util/vircommand.c | 78 ++++++++++++++++++++++++++++++++++++++-----
>>>>  1 file changed, 70 insertions(+), 8 deletions(-)
>>>
>>>> +# ifdef __linux__
>>>> +/* On Linux, we can utilize procfs and read the table of opened
>>>> + * FDs and selectively close only those FDs we don't want to pass
>>>> + * onto child process (well, the one we will exec soon since this
>>>> + * is called from the child). */
>>>> +static int
>>>> +virCommandMassCloseGetFDsLinux(virCommandPtr cmd ATTRIBUTE_UNUSED,
>>>> +                               virBitmapPtr fds)
>>>> +{
>>>> +    DIR *dp = NULL;
>>>> +    struct dirent *entry;
>>>> +    const char *dirName = "/proc/self/fd";
>>>> +    int rc;
>>>> +    int ret = -1;
>>>> +
>>>> +    if (virDirOpen(&dp, dirName) < 0)
>>>> +        return -1;
>>>
>>> Unfortunately, POSIX says that opendir()/readdir() are not async-safe
>>> (the list of async-safe functions is
>>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04;
>>> some implementations of opendir() call malloc(), and malloc() is
>>> definitely not async-safe - if you ever fork() in one thread while
>>> another thread is locked inside a malloc, your child process is
>>> deadlocked if it has to malloc because the forked process no longer has
>>> the thread to release the malloc lock).  It also says readdir() is not
>>> threadafe
>>> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09)
>>>
>>> Therefore, opendir() (hidden in virDirOpen) and readdir() (hidden in
>>> virDirRead) are generally unsafe to be used between fork() of a
>>> multi-threaded app and the subsequent exec(). But maybe you are safe
>>> because this code is only compiled on Linux? Are we absolutely sure this
>>> can't deadlock, in spite of violating POSIX constraints on async-safety?
>>>
>>> http://austingroupbugs.net/view.php?id=696 points out some interesting
>>> problems from the POSIX side - readdir_r() is absolutely broken, and
>>> readdir() being marked as thread-unsafe may be too strict.  But then
>>> again, http://austingroupbugs.net/view.php?id=75 states
>>>    "The application shall not modify the structure to which the
>>>      return value of readdir() points, nor any storage areas pointed to
>>>      by pointers within the structure. The returned pointer, and
>>>      pointers within the structure, might be invalidated or the
>>>      structure or the storage areas might be overwritten by a
>>>      subsequent call to readdir() on the same directory stream.
>>>      They shall not be affected by a call to readdir() on a different
>>>      directory stream."
>>> where it may be the intent that if opendir() doesn't take any locks or
>>> other async-unsafe actions, using opendir() after fork() may be safe
>>> after all (readdir on a DIR* that was opened in the parent is not, but
>>> readdir() on a fresh DIR* opened in the child might be safe, even if not
>>> currently strictly portable).
>>
>> D'oh. POSIX and its limitations. I could allocate the bitmap in the
>> parent, sure. But avoiding opendir() is not possible (that's the whole
>> point of this patch). While testing this - on Linux - I did not run into
>> any deadlock, but maybe I was just lucky. On the other hand, this is
>> going to run only on Linux, on anything else we'll still iterate over
>> all FDs.
>>
>> I find it rather surprising (in a disturbing way) that there is no
>> better solution to this problem than iterating over all FDs and closing
>> them. One by one.
> 
> Does anyone know if glibc guarantees that opendir/readdir in between
> multi-threaded fork() and exec() is safe, even though POSIX does not
> guarantee that safety in general?  I know that one approach to avoid
> having to close all fds is religiously using O_CLOEXEC everywhere (so
> that the race window of having an fd that would leak is not possible),
> but that's also an expensive audit, compared to just ensuring that
> things are closed after fork(). 

Problem is not only that our codebase is big and thus not auditable
easily, the other part of the problem is that we are linking with
variety of libraries that may open/close FDs as we call them (e.g.
libiscsi will open a connection to iSCSI target). But from our POV those
FDs are out of our reach. And yet, we don't want to leak them into child
processes.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list