On macOS when BROKEN_POLL is set in GLib, our tests will periodically
trigger a warning:
(process:50880): GLib-WARNING **: 02:54:15.272: poll(2) failed due to: Bad file descriptor.
Our code is inherantly racy, calling g_source_destroy which
removes the FD from the event thread poll asynchronously but
we close the FD immediately after g_source_destroy returns.
With poll() this results in POLLNVAL which we're ignoring, but
with select() it generates the BADF error on macOS.
We need to ignore the warnings on macOS to avoid abort()ing
our test programs.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
tests/meson.build | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/tests/meson.build b/tests/meson.build
index fd78d8a2fd..08d2ae67cb 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -39,9 +39,24 @@ tests_env = [
'abs_top_srcdir=@0@'.format(meson.source_root()),
'LC_ALL=C',
'LIBVIRT_AUTOSTART=0',
- 'G_DEBUG=fatal-warnings',
]
+
+# On macOS when BROKEN_POLL is set for GLib, our tests will
+# periodically trigger a warning:
+#
+# (process:50880): GLib-WARNING **: 02:54:15.272: poll(2) failed due to: Bad file descriptor.
+#
+# Our code is inherantly racy, calling g_source_destroy which
+# removes the FD from the event thread poll asynchronously
+# but we close the FD immediately after g_source_destroy returns.
+#
+# With poll() this results in POLLNVAL, but with select() it
+# generates the BADF error on macOS
+if host_machine.system() != 'darwin'
+ tests_env += ['G_DEBUG=fatal-warnings']
+endif
+
if use_expensive_tests
tests_env += 'VIR_TEST_EXPENSIVE=1'
else
--
2.35.1
On Thu, Apr 28, 2022 at 12:22:14PM +0100, Daniel P. Berrangé wrote:
> +# On macOS when BROKEN_POLL is set for GLib, our tests will
> +# periodically trigger a warning:
> +#
> +# (process:50880): GLib-WARNING **: 02:54:15.272: poll(2) failed due to: Bad file descriptor.
> +#
> +# Our code is inherantly racy, calling g_source_destroy which
> +# removes the FD from the event thread poll asynchronously
> +# but we close the FD immediately after g_source_destroy returns.
> +#
> +# With poll() this results in POLLNVAL, but with select() it
> +# generates the BADF error on macOS
> +if host_machine.system() != 'darwin'
> + tests_env += ['G_DEBUG=fatal-warnings']
> +endif
Doesn't this just sweep the issue under the carpet? Shouldn't GLib be
changed to do something like
#ifndef BROKEN_POLL
if (poll() < 0)
if (!POLLNVAL)
return -1;
#else
if (select() < 0)
if (!EBADF)
return -1;
#endif
instead? If acting on an fd that's already been closed is okay when
using the poll()-based implementation, the same should apply to the
select()-based one as well.
--
Andrea Bolognani / Red Hat / Virtualization
On Thu, Apr 28, 2022 at 06:39:42AM -0700, Andrea Bolognani wrote: > On Thu, Apr 28, 2022 at 12:22:14PM +0100, Daniel P. Berrangé wrote: > > +# On macOS when BROKEN_POLL is set for GLib, our tests will > > +# periodically trigger a warning: > > +# > > +# (process:50880): GLib-WARNING **: 02:54:15.272: poll(2) failed due to: Bad file descriptor. > > +# > > +# Our code is inherantly racy, calling g_source_destroy which > > +# removes the FD from the event thread poll asynchronously > > +# but we close the FD immediately after g_source_destroy returns. > > +# > > +# With poll() this results in POLLNVAL, but with select() it > > +# generates the BADF error on macOS > > +if host_machine.system() != 'darwin' > > + tests_env += ['G_DEBUG=fatal-warnings'] > > +endif > > Doesn't this just sweep the issue under the carpet? Shouldn't GLib be > changed to do something like > > #ifndef BROKEN_POLL > if (poll() < 0) > if (!POLLNVAL) > return -1; That's not the way POLLNVAL works - this is an even that is set in each 'struct pollfd', so it is associated with individual file descriptors. The application code can thus receive and handle the G_IO_NVAL condition - it isn't fatal from POV of g_poll > #else > if (select() < 0) > if (!EBADF) > return -1; > #endif > > instead? If acting on an fd that's already been closed is okay when > using the poll()-based implementation, the same should apply to the > select()-based one as well. The only way to get the same semantics for select() would be to see EBADF and then iterate over struct pollfd, calling fcntl() to see which one(s) are BADF. Possible, but its unpleasantly non-scalable, so I imagine that's why they didn't do this. I'll still probably inclined to file a bug against glib, if nothing else to provide a bug that other people can discover to explain why the behaviour is different. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Apr 28, 2022 at 02:52:45PM +0100, Daniel P. Berrangé wrote: > On Thu, Apr 28, 2022 at 06:39:42AM -0700, Andrea Bolognani wrote: > > Shouldn't GLib be > > changed to do something like > > > > #ifndef BROKEN_POLL > > if (poll() < 0) > > if (!POLLNVAL) > > return -1; > > That's not the way POLLNVAL works - this is an even that is set in > each 'struct pollfd', so it is associated with individual file > descriptors. The application code can thus receive and handle the > G_IO_NVAL condition - it isn't fatal from POV of g_poll Right, I might have oversimplified things :) But your correction made the situation clearer in my head, thanks! > > #else > > if (select() < 0) > > if (!EBADF) > > return -1; > > #endif > > > > > instead? If acting on an fd that's already been closed is okay when > > using the poll()-based implementation, the same should apply to the > > select()-based one as well. > > The only way to get the same semantics for select() would be to see > EBADF and then iterate over struct pollfd, calling fcntl() to see > which one(s) are BADF. Possible, but its unpleasantly non-scalable, > so I imagine that's why they didn't do this. IIUC g_poll(), just like any other function used as GPollFunc, is supposed to have the same semantics as poll()[1], which means that passing an invalid FD should result in the corresponding flag being set rather than an outright failure. In other words, the current implementation of g_poll() on macOS doesn't follow the contract defined by GLib itself. It seems to me that this is a (fairly serious) bug in the library, no? [1] https://docs.gtk.org/glib/callback.PollFunc.html -- Andrea Bolognani / Red Hat / Virtualization
On Thu, Apr 28, 2022 at 08:33:46AM -0700, Andrea Bolognani wrote: > On Thu, Apr 28, 2022 at 02:52:45PM +0100, Daniel P. Berrangé wrote: > > > #else > > > if (select() < 0) > > > if (!EBADF) > > > return -1; > > > #endif > > > > > > > > instead? If acting on an fd that's already been closed is okay when > > > using the poll()-based implementation, the same should apply to the > > > select()-based one as well. > > > > The only way to get the same semantics for select() would be to see > > EBADF and then iterate over struct pollfd, calling fcntl() to see > > which one(s) are BADF. Possible, but its unpleasantly non-scalable, > > so I imagine that's why they didn't do this. > > IIUC g_poll(), just like any other function used as GPollFunc, is > supposed to have the same semantics as poll()[1], which means that > passing an invalid FD should result in the corresponding flag being > set rather than an outright failure. > > In other words, the current implementation of g_poll() on macOS > doesn't follow the contract defined by GLib itself. It seems to me > that this is a (fairly serious) bug in the library, no? It is significant, but long standing. GLib actually had this behaviour forever on macOS, but it regressed when Meson was introduced, until the recent fix. The question is whether efficiency trumps API semantics. Normally I'm heavily biased towards API semantics, but poll is a performance critical API, so it isn't so easy to declare we must workaround all the quirks. I filed a bug to raise the subject for discussion though https://gitlab.gnome.org/GNOME/glib/-/issues/2644 With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Apr 28, 2022 at 05:55:41PM +0100, Daniel P. Berrangé wrote: > On Thu, Apr 28, 2022 at 08:33:46AM -0700, Andrea Bolognani wrote: > > In other words, the current implementation of g_poll() on macOS > > doesn't follow the contract defined by GLib itself. It seems to me > > that this is a (fairly serious) bug in the library, no? > > It is significant, but long standing. GLib actually had this behaviour > forever on macOS, but it regressed when Meson was introduced, until > the recent fix. > > The question is whether efficiency trumps API semantics. Normally I'm > heavily biased towards API semantics, but poll is a performance > critical API, so it isn't so easy to declare we must workaround all > the quirks. > > I filed a bug to raise the subject for discussion though > > https://gitlab.gnome.org/GNOME/glib/-/issues/2644 That's an excellent bug report! Thanks for spending time on it, and let's see where the upstream discussion leads :) -- Andrea Bolognani / Red Hat / Virtualization
On Fri, Apr 29, 2022 at 01:05:42AM -0700, Andrea Bolognani wrote:
>On Thu, Apr 28, 2022 at 05:55:41PM +0100, Daniel P. Berrangé wrote:
>> On Thu, Apr 28, 2022 at 08:33:46AM -0700, Andrea Bolognani wrote:
>> > In other words, the current implementation of g_poll() on macOS
>> > doesn't follow the contract defined by GLib itself. It seems to me
>> > that this is a (fairly serious) bug in the library, no?
>>
>> It is significant, but long standing. GLib actually had this behaviour
>> forever on macOS, but it regressed when Meson was introduced, until
>> the recent fix.
>>
>> The question is whether efficiency trumps API semantics. Normally I'm
>> heavily biased towards API semantics, but poll is a performance
>> critical API, so it isn't so easy to declare we must workaround all
>> the quirks.
>>
>> I filed a bug to raise the subject for discussion though
>>
>> https://gitlab.gnome.org/GNOME/glib/-/issues/2644
>
>That's an excellent bug report! Thanks for spending time on it, and
>let's see where the upstream discussion leads :)
>
Just so I don't sway the potential discussion in the issue to the wrong
way, I'll ask here. Since BROKEN_POLL is used due to poll() on OS/X not
behaving correctly with fds=NULL, nfds=0 [0], wouldn't it be possible to
"fix" at least the internal calls to g_poll() which are done when
running the main loop since they always have at least one FD (the
eventfd used to signal the main loop [1])? I know it would not fix the
problem, but it would at least workaround the part that is used the
most, I presume.
Martin
[0] https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/gpoll.c#L59
[1] this is a presumption I did not find in the code, but I've seen in
gdb and strace
On Mon, May 02, 2022 at 03:28:50PM +0200, Martin Kletzander wrote: > On Fri, Apr 29, 2022 at 01:05:42AM -0700, Andrea Bolognani wrote: > > On Thu, Apr 28, 2022 at 05:55:41PM +0100, Daniel P. Berrangé wrote: > > > On Thu, Apr 28, 2022 at 08:33:46AM -0700, Andrea Bolognani wrote: > > > > In other words, the current implementation of g_poll() on macOS > > > > doesn't follow the contract defined by GLib itself. It seems to me > > > > that this is a (fairly serious) bug in the library, no? > > > > > > It is significant, but long standing. GLib actually had this behaviour > > > forever on macOS, but it regressed when Meson was introduced, until > > > the recent fix. > > > > > > The question is whether efficiency trumps API semantics. Normally I'm > > > heavily biased towards API semantics, but poll is a performance > > > critical API, so it isn't so easy to declare we must workaround all > > > the quirks. > > > > > > I filed a bug to raise the subject for discussion though > > > > > > https://gitlab.gnome.org/GNOME/glib/-/issues/2644 > > > > That's an excellent bug report! Thanks for spending time on it, and > > let's see where the upstream discussion leads :) > > > > Just so I don't sway the potential discussion in the issue to the wrong > way, I'll ask here. Since BROKEN_POLL is used due to poll() on OS/X not > behaving correctly with fds=NULL, nfds=0 [0], wouldn't it be possible to > "fix" at least the internal calls to g_poll() which are done when > running the main loop since they always have at least one FD (the > eventfd used to signal the main loop [1])? I know it would not fix the > problem, but it would at least workaround the part that is used the > most, I presume. The comment is somewhat misleading - the nfds=0 case is not actually the main problem they were fixing. The real killer issue is that poll() doesn't work with block/char devices - ie any /dev/NNN FD. This is what broke QEMU because it failed with TAP device FDs. > > [0] https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/gpoll.c#L59 Dunno why the comment mentions nfds=0 because the the original commit message describes the device node FDs scenario. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
© 2016 - 2026 Red Hat, Inc.