Current codes using a brute-force traversal of all file descriptors
do not scale on a system where the maximum number of file descriptors
are set to a very large value (e.g.: in a Docker container of Manjaro
distribution it is set to 1073741816). QEMU just looks freezed during
start-up.
The close-on-exec flag was introduced since a faily old Linux kernel
(2.6.23). With recent newer kernels that QEMU supports, we don't need
to manually close the fds for child process as the proper O_CLOEXEC
flag should have been set properly on files that we don't want child
process to see.
Reported-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Bin Meng <bmeng@tinylab.org>
---
net/tap.c | 14 --------------
1 file changed, 14 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index 1bf085d422..49e1915484 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -446,13 +446,6 @@ static void launch_script(const char *setup_script, const char *ifname,
return;
}
if (pid == 0) {
- int open_max = sysconf(_SC_OPEN_MAX), i;
-
- for (i = 3; i < open_max; i++) {
- if (i != fd) {
- close(i);
- }
- }
parg = args;
*parg++ = (char *)setup_script;
*parg++ = (char *)ifname;
@@ -536,17 +529,10 @@ static int net_bridge_run_helper(const char *helper, const char *bridge,
return -1;
}
if (pid == 0) {
- int open_max = sysconf(_SC_OPEN_MAX), i;
char *fd_buf = NULL;
char *br_buf = NULL;
char *helper_cmd = NULL;
- for (i = 3; i < open_max; i++) {
- if (i != sv[1]) {
- close(i);
- }
- }
-
fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]);
if (strrchr(helper, ' ') || strrchr(helper, '\t')) {
--
2.34.1
On Thu, Apr 06, 2023 at 07:20:41PM +0800, Bin Meng wrote: > Current codes using a brute-force traversal of all file descriptors > do not scale on a system where the maximum number of file descriptors > are set to a very large value (e.g.: in a Docker container of Manjaro > distribution it is set to 1073741816). QEMU just looks freezed during > start-up. > > The close-on-exec flag was introduced since a faily old Linux kernel > (2.6.23). With recent newer kernels that QEMU supports, we don't need > to manually close the fds for child process as the proper O_CLOEXEC > flag should have been set properly on files that we don't want child > process to see. Even though O_CLOEXEC has existed for a long time, there is plenty of code that doesn't use it reliably. While QEMU can control its own code, we use a huge number of 3rd party libraries and we don't trust them to reliably be using O_CLOEXEC on everything they open. > Reported-by: Zhangjin Wu <falcon@tinylab.org> > Signed-off-by: Bin Meng <bmeng@tinylab.org> > --- > > net/tap.c | 14 -------------- > 1 file changed, 14 deletions(-) > > diff --git a/net/tap.c b/net/tap.c > index 1bf085d422..49e1915484 100644 > --- a/net/tap.c > +++ b/net/tap.c > @@ -446,13 +446,6 @@ static void launch_script(const char *setup_script, const char *ifname, > return; > } > if (pid == 0) { > - int open_max = sysconf(_SC_OPEN_MAX), i; > - > - for (i = 3; i < open_max; i++) { > - if (i != fd) { > - close(i); > - } > - } > parg = args; > *parg++ = (char *)setup_script; > *parg++ = (char *)ifname; > @@ -536,17 +529,10 @@ static int net_bridge_run_helper(const char *helper, const char *bridge, > return -1; > } > if (pid == 0) { > - int open_max = sysconf(_SC_OPEN_MAX), i; > char *fd_buf = NULL; > char *br_buf = NULL; > char *helper_cmd = NULL; > > - for (i = 3; i < open_max; i++) { > - if (i != sv[1]) { > - close(i); > - } > - } BSD has closefrom(3) we could use here, while modern Linux has close_range(3, open_max) We should probe for those two funtions and use them preferentially, only falling back to the current manual loop where they don't exist. > - > fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]); > > if (strrchr(helper, ' ') || strrchr(helper, '\t')) { > -- > 2.34.1 > > With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 6/4/23 13:20, Bin Meng wrote: > Current codes using a brute-force traversal of all file descriptors > do not scale on a system where the maximum number of file descriptors > are set to a very large value (e.g.: in a Docker container of Manjaro > distribution it is set to 1073741816). QEMU just looks freezed during > start-up. > > The close-on-exec flag was introduced since a faily old Linux kernel > (2.6.23). With recent newer kernels that QEMU supports, we don't need > to manually close the fds for child process as the proper O_CLOEXEC > flag should have been set properly on files that we don't want child > process to see. But this file is common to all POSIX implementations, not only Linux. > Reported-by: Zhangjin Wu <falcon@tinylab.org> > Signed-off-by: Bin Meng <bmeng@tinylab.org> > --- > > net/tap.c | 14 -------------- > 1 file changed, 14 deletions(-) > > diff --git a/net/tap.c b/net/tap.c > index 1bf085d422..49e1915484 100644 > --- a/net/tap.c > +++ b/net/tap.c > @@ -446,13 +446,6 @@ static void launch_script(const char *setup_script, const char *ifname, > return; > } > if (pid == 0) { Maybe guard with #ifndef O_CLOEXEC > - int open_max = sysconf(_SC_OPEN_MAX), i; > - > - for (i = 3; i < open_max; i++) { > - if (i != fd) { > - close(i); > - } > - } or add qemu_close_cloexec() in util/osdep.c similar to qemu_open_cloexec()? > parg = args; > *parg++ = (char *)setup_script; > *parg++ = (char *)ifname; > @@ -536,17 +529,10 @@ static int net_bridge_run_helper(const char *helper, const char *bridge, > return -1; > } > if (pid == 0) { > - int open_max = sysconf(_SC_OPEN_MAX), i; > char *fd_buf = NULL; > char *br_buf = NULL; > char *helper_cmd = NULL; > > - for (i = 3; i < open_max; i++) { > - if (i != sv[1]) { > - close(i); > - } > - } > - > fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]); > > if (strrchr(helper, ' ') || strrchr(helper, '\t')) {
On Thu, Apr 6, 2023 at 8:34 PM Philippe Mathieu-Daudé <philmd@linaro.org> wrote: > > On 6/4/23 13:20, Bin Meng wrote: > > Current codes using a brute-force traversal of all file descriptors > > do not scale on a system where the maximum number of file descriptors > > are set to a very large value (e.g.: in a Docker container of Manjaro > > distribution it is set to 1073741816). QEMU just looks freezed during > > start-up. > > > > The close-on-exec flag was introduced since a faily old Linux kernel > > (2.6.23). With recent newer kernels that QEMU supports, we don't need > > to manually close the fds for child process as the proper O_CLOEXEC > > flag should have been set properly on files that we don't want child > > process to see. > > But this file is common to all POSIX implementations, not only Linux. Yes, this file is used for Linux, BSD and Solaris. I checked that O_CLOEXEC is available on Linux (2.6.23), FreeBSD (8.3), OpenBSD 5.0, Solaris 11. This flag is part of POSIX.1-2008. Question is do we still need to support OSes that are older and do not have this support? > > > Reported-by: Zhangjin Wu <falcon@tinylab.org> > > Signed-off-by: Bin Meng <bmeng@tinylab.org> > > --- > > > > net/tap.c | 14 -------------- > > 1 file changed, 14 deletions(-) > > > > diff --git a/net/tap.c b/net/tap.c > > index 1bf085d422..49e1915484 100644 > > --- a/net/tap.c > > +++ b/net/tap.c > > @@ -446,13 +446,6 @@ static void launch_script(const char *setup_script, const char *ifname, > > return; > > } > > if (pid == 0) { > > Maybe guard with #ifndef O_CLOEXEC > > > - int open_max = sysconf(_SC_OPEN_MAX), i; > > - > > - for (i = 3; i < open_max; i++) { > > - if (i != fd) { > > - close(i); > > - } > > - } > > or add qemu_close_cloexec() in util/osdep.c similar to qemu_open_cloexec()? > > > parg = args; > > *parg++ = (char *)setup_script; > > *parg++ = (char *)ifname; > > @@ -536,17 +529,10 @@ static int net_bridge_run_helper(const char *helper, const char *bridge, > > return -1; > > } > > if (pid == 0) { > > - int open_max = sysconf(_SC_OPEN_MAX), i; > > char *fd_buf = NULL; > > char *br_buf = NULL; > > char *helper_cmd = NULL; > > > > - for (i = 3; i < open_max; i++) { > > - if (i != sv[1]) { > > - close(i); > > - } > > - } > > - > > fd_buf = g_strdup_printf("%s%d", "--fd=", sv[1]); > > > > if (strrchr(helper, ' ') || strrchr(helper, '\t')) { Regards, Bin
© 2016 - 2024 Red Hat, Inc.