[PATCH] net: tap: Drop the close of fds for child process

Bin Meng posted 1 patch 1 year ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20230406112041.798585-1-bmeng@tinylab.org
Maintainers: Jason Wang <jasowang@redhat.com>
net/tap.c | 14 --------------
1 file changed, 14 deletions(-)
[PATCH] net: tap: Drop the close of fds for child process
Posted by Bin Meng 1 year ago
Current codes using a brute-force traversal of all file descriptors
do not scale on a system where the maximum number of file descriptors
are set to a very large value (e.g.: in a Docker container of Manjaro
distribution it is set to 1073741816). QEMU just looks freezed during
start-up.

The close-on-exec flag was introduced since a faily old Linux kernel
(2.6.23). With recent newer kernels that QEMU supports, we don't need
to manually close the fds for child process as the proper O_CLOEXEC
flag should have been set properly on files that we don't want child
process to see.

Reported-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---

 net/tap.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 1bf085d422..49e1915484 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -446,13 +446,6 @@ static void launch_script(const char *setup_script, const char *ifname,
         return;
     }
     if (pid == 0) {
-        int open_max = sysconf(_SC_OPEN_MAX), i;
-
-        for (i = 3; i < open_max; i++) {
-            if (i != fd) {
-                close(i);
-            }
-        }
         parg = args;
         *parg++ = (char *)setup_script;
         *parg++ = (char *)ifname;
@@ -536,17 +529,10 @@ static int net_bridge_run_helper(const char *helper, const char *bridge,
         return -1;
     }
     if (pid == 0) {
-        int open_max = sysconf(_SC_OPEN_MAX), i;
         char *fd_buf = NULL;
         char *br_buf = NULL;
         char *helper_cmd = NULL;
 
-        for (i = 3; i < open_max; i++) {
-            if (i != sv[1]) {
-                close(i);
-            }
-        }
-
         fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]);
 
         if (strrchr(helper, ' ') || strrchr(helper, '\t')) {
-- 
2.34.1
Re: [PATCH] net: tap: Drop the close of fds for child process
Posted by Daniel P. Berrangé 1 year ago
On Thu, Apr 06, 2023 at 07:20:41PM +0800, Bin Meng wrote:
> Current codes using a brute-force traversal of all file descriptors
> do not scale on a system where the maximum number of file descriptors
> are set to a very large value (e.g.: in a Docker container of Manjaro
> distribution it is set to 1073741816). QEMU just looks freezed during
> start-up.
> 
> The close-on-exec flag was introduced since a faily old Linux kernel
> (2.6.23). With recent newer kernels that QEMU supports, we don't need
> to manually close the fds for child process as the proper O_CLOEXEC
> flag should have been set properly on files that we don't want child
> process to see.

Even though O_CLOEXEC has existed for a long time, there is plenty
of code that doesn't use it reliably. While QEMU can control its
own code, we use a huge number of 3rd party libraries and we don't
trust them to reliably be using O_CLOEXEC on everything they open.

> Reported-by: Zhangjin Wu <falcon@tinylab.org>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
> ---
> 
>  net/tap.c | 14 --------------
>  1 file changed, 14 deletions(-)
> 
> diff --git a/net/tap.c b/net/tap.c
> index 1bf085d422..49e1915484 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -446,13 +446,6 @@ static void launch_script(const char *setup_script, const char *ifname,
>          return;
>      }
>      if (pid == 0) {
> -        int open_max = sysconf(_SC_OPEN_MAX), i;
> -
> -        for (i = 3; i < open_max; i++) {
> -            if (i != fd) {
> -                close(i);
> -            }
> -        }
>          parg = args;
>          *parg++ = (char *)setup_script;
>          *parg++ = (char *)ifname;
> @@ -536,17 +529,10 @@ static int net_bridge_run_helper(const char *helper, const char *bridge,
>          return -1;
>      }
>      if (pid == 0) {
> -        int open_max = sysconf(_SC_OPEN_MAX), i;
>          char *fd_buf = NULL;
>          char *br_buf = NULL;
>          char *helper_cmd = NULL;
>  
> -        for (i = 3; i < open_max; i++) {
> -            if (i != sv[1]) {
> -                close(i);
> -            }
> -        }

BSD has closefrom(3) we could use here, while modern Linux has
close_range(3, open_max)

We should probe for those two funtions and use them preferentially,
only falling back to the current manual loop where they don't exist.


> -
>          fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]);
>  
>          if (strrchr(helper, ' ') || strrchr(helper, '\t')) {
> -- 
> 2.34.1
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH] net: tap: Drop the close of fds for child process
Posted by Philippe Mathieu-Daudé 1 year ago
On 6/4/23 13:20, Bin Meng wrote:
> Current codes using a brute-force traversal of all file descriptors
> do not scale on a system where the maximum number of file descriptors
> are set to a very large value (e.g.: in a Docker container of Manjaro
> distribution it is set to 1073741816). QEMU just looks freezed during
> start-up.
> 
> The close-on-exec flag was introduced since a faily old Linux kernel
> (2.6.23). With recent newer kernels that QEMU supports, we don't need
> to manually close the fds for child process as the proper O_CLOEXEC
> flag should have been set properly on files that we don't want child
> process to see.

But this file is common to all POSIX implementations, not only Linux.

> Reported-by: Zhangjin Wu <falcon@tinylab.org>
> Signed-off-by: Bin Meng <bmeng@tinylab.org>
> ---
> 
>   net/tap.c | 14 --------------
>   1 file changed, 14 deletions(-)
> 
> diff --git a/net/tap.c b/net/tap.c
> index 1bf085d422..49e1915484 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -446,13 +446,6 @@ static void launch_script(const char *setup_script, const char *ifname,
>           return;
>       }
>       if (pid == 0) {

Maybe guard with #ifndef O_CLOEXEC

> -        int open_max = sysconf(_SC_OPEN_MAX), i;
> -
> -        for (i = 3; i < open_max; i++) {
> -            if (i != fd) {
> -                close(i);
> -            }
> -        }

or add qemu_close_cloexec() in util/osdep.c similar to qemu_open_cloexec()?

>           parg = args;
>           *parg++ = (char *)setup_script;
>           *parg++ = (char *)ifname;
> @@ -536,17 +529,10 @@ static int net_bridge_run_helper(const char *helper, const char *bridge,
>           return -1;
>       }
>       if (pid == 0) {
> -        int open_max = sysconf(_SC_OPEN_MAX), i;
>           char *fd_buf = NULL;
>           char *br_buf = NULL;
>           char *helper_cmd = NULL;
>   
> -        for (i = 3; i < open_max; i++) {
> -            if (i != sv[1]) {
> -                close(i);
> -            }
> -        }
> -
>           fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]);
>   
>           if (strrchr(helper, ' ') || strrchr(helper, '\t')) {
Re: [PATCH] net: tap: Drop the close of fds for child process
Posted by Bin Meng 1 year ago
On Thu, Apr 6, 2023 at 8:34 PM Philippe Mathieu-Daudé <philmd@linaro.org> wrote:
>
> On 6/4/23 13:20, Bin Meng wrote:
> > Current codes using a brute-force traversal of all file descriptors
> > do not scale on a system where the maximum number of file descriptors
> > are set to a very large value (e.g.: in a Docker container of Manjaro
> > distribution it is set to 1073741816). QEMU just looks freezed during
> > start-up.
> >
> > The close-on-exec flag was introduced since a faily old Linux kernel
> > (2.6.23). With recent newer kernels that QEMU supports, we don't need
> > to manually close the fds for child process as the proper O_CLOEXEC
> > flag should have been set properly on files that we don't want child
> > process to see.
>
> But this file is common to all POSIX implementations, not only Linux.

Yes, this file is used for Linux, BSD and Solaris.
I checked that O_CLOEXEC is available on Linux (2.6.23), FreeBSD
(8.3), OpenBSD 5.0, Solaris 11. This flag is part of POSIX.1-2008.

Question is do we still need to support OSes that are older and do not
have this support?

>
> > Reported-by: Zhangjin Wu <falcon@tinylab.org>
> > Signed-off-by: Bin Meng <bmeng@tinylab.org>
> > ---
> >
> >   net/tap.c | 14 --------------
> >   1 file changed, 14 deletions(-)
> >
> > diff --git a/net/tap.c b/net/tap.c
> > index 1bf085d422..49e1915484 100644
> > --- a/net/tap.c
> > +++ b/net/tap.c
> > @@ -446,13 +446,6 @@ static void launch_script(const char *setup_script, const char *ifname,
> >           return;
> >       }
> >       if (pid == 0) {
>
> Maybe guard with #ifndef O_CLOEXEC
>
> > -        int open_max = sysconf(_SC_OPEN_MAX), i;
> > -
> > -        for (i = 3; i < open_max; i++) {
> > -            if (i != fd) {
> > -                close(i);
> > -            }
> > -        }
>
> or add qemu_close_cloexec() in util/osdep.c similar to qemu_open_cloexec()?
>
> >           parg = args;
> >           *parg++ = (char *)setup_script;
> >           *parg++ = (char *)ifname;
> > @@ -536,17 +529,10 @@ static int net_bridge_run_helper(const char *helper, const char *bridge,
> >           return -1;
> >       }
> >       if (pid == 0) {
> > -        int open_max = sysconf(_SC_OPEN_MAX), i;
> >           char *fd_buf = NULL;
> >           char *br_buf = NULL;
> >           char *helper_cmd = NULL;
> >
> > -        for (i = 3; i < open_max; i++) {
> > -            if (i != sv[1]) {
> > -                close(i);
> > -            }
> > -        }
> > -
> >           fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]);
> >
> >           if (strrchr(helper, ' ') || strrchr(helper, '\t')) {

Regards,
Bin