We ran into so many clang bugs recently, that I'm strongly in favor of making it optional and aim on gcc only. Debugging these issues burns our time needlessly. Just consider what's happening here. I've pushed patches recently that call some virNuma*() APIs during QEMU cmd line generation. And while at it, I removed virNuma*() mocks from qemuxml2argvmock.c (v9.1.0-213-g95ae91fdd4). Andrea told me in review to make sure our CI works. So I've fired up my freebsd VM and compiled my code there. It worked. Except, I'm compiling with -O0 and our CI compiles with the default (-O2). And of course this is the source of the current build breakage. Long story short, clang is so eager to produce the fastest code that it simply ignores written code and generates what it *assumes* behaves the same. Well, it doesn't and this breaks our mocking and thus our tests. And of course it's happening only with optimizations. In this particular case, I've noticed that while virNumaNodesetIsAvailable() calls virNumaNodeIsAvailable(), it disregards the return value and virReportError()-s immediately. This is because the !WITH_NUMACTL variant of virNumaNodeIsAvailable() calls virNumaGetMaxNode() which fails and thus false is returned. WORSE, then I stepped in gdb into the mock, I've seen random numbers as integers. This is because the function from virnumamock.c wasn't optimized as much and followed standard calling convention (it grabbed the integer argument from the stack). But the caller (virNumaNodesetIsAvailable()) was so optimized that it did not even bothered to push anything onto stack. After these patches, there is still one qemuxml2argvtest case failing, and it's simply because no matter how hard I try, I can't stop clang from optimizing the function. Consider the following code: virNumaCPUSetToNodeset(virBitmap *cpuset, virBitmap **nodeset) { g_autoptr(virBitmap) nodes = virBitmapNew(0); ssize_t pos = -1; while ((pos = virBitmapNextSetBit(cpuset, pos)) >= 0) { int node = virNumaGetNodeOfCPU(pos); if (node < 0) { virReportSystemError(errno, _("Unable to get NUMA node of cpu %zd"), pos); return -1; } ... } ... } And this is the assembly code that clang generates (even after VIR_OPTNONE treatment): 116f5f: e8 0c b7 f9 ff call b2670 <virBitmapNew@plt> 116f64: 48 89 44 24 40 mov %rax,0x40(%rsp) 116f69: 48 c7 44 24 18 ff ff movq $0xffffffffffffffff,0x18(%rsp) 116f70: ff ff 116f72: 48 8b 7c 24 38 mov 0x38(%rsp),%rdi 116f77: 48 8b 74 24 18 mov 0x18(%rsp),%rsi 116f7c: e8 0f 74 f9 ff call ae390 <virBitmapNextSetBit@plt> 116f81: eb 00 jmp 116f83 <virNumaCPUSetToNodeset+0x43> 116f83: 48 89 44 24 18 mov %rax,0x18(%rsp) 116f88: 48 83 f8 00 cmp $0x0,%rax 116f8c: 0f 8c b2 00 00 00 jl 117044 <virNumaCPUSetToNodeset+0x104> 116f92: 48 8b 7c 24 18 mov 0x18(%rsp),%rdi 116f97: e8 d4 a1 f9 ff call b1170 <virNumaGetNodeOfCPU@plt> 116f9c: c7 44 24 10 ff ff ff movl $0xffffffff,0x10(%rsp) 116fa3: ff 116fa4: 83 7c 24 10 00 cmpl $0x0,0x10(%rsp) 116fa9: 7d 74 jge 11701f <virNumaCPUSetToNodeset+0xdf> 116fab: e8 80 72 f9 ff call ae230 <__errno_location@plt> 116fb0: 8b 18 mov (%rax),%ebx 116fb2: 48 8d 3d 27 35 2b 00 lea 0x2b3527(%rip),%rdi # 3ca4e0 <vmdk4GetBackingStore.prefix+0xf260> 116fb9: 48 8d 35 8c e3 25 00 lea 0x25e38c(%rip),%rsi # 37534c <modeMap+0x215c> 116fc0: ba 05 00 00 00 mov $0x5,%edx 116fc5: e8 e6 a7 f9 ff call b17b0 <dcgettext@plt> 116fca: 48 8b 4c 24 18 mov 0x18(%rsp),%rcx 116fcf: 48 89 0c 24 mov %rcx,(%rsp) 116fd3: 48 8d 15 ea e0 25 00 lea 0x25e0ea(%rip),%rdx # 3750c4 <modeMap+0x1ed4> 116fda: 48 8d 0d 54 e3 25 00 lea 0x25e354(%rip),%rcx # 375335 <modeMap+0x2145> 116fe1: 31 ff xor %edi,%edi 116fe3: 89 de mov %ebx,%esi 116fe5: 41 b8 10 04 00 00 mov $0x410,%r8d 116feb: 49 89 c1 mov %rax,%r9 116fee: 31 c0 xor %eax,%eax 116ff0: e8 8b c1 f9 ff call b3180 <virReportSystemErrorFull@plt> My assembler is a bit rusty, but at virNumaGetNodeOfCPU@plt is called 116f97, after which, -1 is stored on a stack (which corresponds to @pos variable), after which comparison against 0 is made and the jump happens iff the value (-1) is greater or equal than zero. Otherwise virReportSystemError() is called. To reproduce this issue even on Linux, configure with: export CC=clang meson setup -Dsystem=true -Dexpensive_tests=enabled -Dnumactl=disabled _build At this point, I think we can call clang broken and focus our time on developing libvirt. Not coming up with hacks around broken compilers. Michal Prívozník (2): internal.h: Invent VIR_OPTNONE virnuma: Annotate some functions as VIR_OPTNONE src/internal.h | 20 ++++++++++++++++++++ src/util/virnuma.c | 20 ++++++++++---------- 2 files changed, 30 insertions(+), 10 deletions(-) -- 2.39.2
On Wed, Mar 15, 2023 at 08:10:26PM +0100, Michal Privoznik wrote: > We ran into so many clang bugs recently, that I'm strongly in favor of > making it optional and aim on gcc only. Debugging these issues burns our > time needlessly. > [...] > > After these patches, there is still one qemuxml2argvtest case failing, > and it's simply because no matter how hard I try, I can't stop clang > from optimizing the function. > [...] > > At this point, I think we can call clang broken and focus our time on > developing libvirt. Not coming up with hacks around broken compilers. clang is the default compiler for FreeBSD and macOS, both of which are officially supported platforms. Unless we are willing to change that, ignoring clang is simply out of the question. I've only looked at your patches very briefly and I'm trying to wrap something else up before the end of the week, so unfortunately I'm unlikely to be able to do a proper review within a reasonable timeframe. Plus IIUC even with these patches applied we'd still have at least one failing test case, so they're not a complete solution. So, in the interest of returning the CI to green as soon as possible, I would recommend reverting 95ae91fdd4da quickly. We can then worry about improving the situation compared to the (admittedly poor) status quo as a follow-up, once that urgency is gone. A thought about VIR_OPTNONE. It seems to me that we'd want to apply this to all the functions that currently are marked with G_NO_INLINE for mocking purposes. So, wouldn't it make sense to instead have a single VIR_MOCKABLE annotation that combines the two, and use that everywhere? -- Andrea Bolognani / Red Hat / Virtualization
On 3/16/23 15:51, Andrea Bolognani wrote: > On Wed, Mar 15, 2023 at 08:10:26PM +0100, Michal Privoznik wrote: >> We ran into so many clang bugs recently, that I'm strongly in favor of >> making it optional and aim on gcc only. Debugging these issues burns our >> time needlessly. >> > [...] >> >> After these patches, there is still one qemuxml2argvtest case failing, >> and it's simply because no matter how hard I try, I can't stop clang >> from optimizing the function. >> > [...] >> >> At this point, I think we can call clang broken and focus our time on >> developing libvirt. Not coming up with hacks around broken compilers. > > clang is the default compiler for FreeBSD and macOS, both of which > are officially supported platforms. Unless we are willing to change > that, ignoring clang is simply out of the question. I'm fine with doing that change. FreeBSD offers option to install gcc. And for macOS - the fact we compile there says absolutely nothing. Anybody willing to make libvirt work on macOS is more than welcome to post patches. > > > I've only looked at your patches very briefly and I'm trying to wrap > something else up before the end of the week, so unfortunately I'm > unlikely to be able to do a proper review within a reasonable > timeframe. Plus IIUC even with these patches applied we'd still have > at least one failing test case, so they're not a complete solution. Yep. > > So, in the interest of returning the CI to green as soon as possible, > I would recommend reverting 95ae91fdd4da quickly. We can then worry > about improving the situation compared to the (admittedly poor) > status quo as a follow-up, once that urgency is gone. That won't do any good as it's not the root cause of this problem. The problem is (was until Dan fixed it) that even though I've specifically marked some functions to be not optimized, clang ignored that and optimized them anyway. And in this particular case it was virNumaCPUSetToNodeset() and virNumaGetNodeOfCPU() which are new. They were not implemented in the commit you're referencing. > > A thought about VIR_OPTNONE. It seems to me that we'd want to apply > this to all the functions that currently are marked with G_NO_INLINE > for mocking purposes. So, wouldn't it make sense to instead have a > single VIR_MOCKABLE annotation that combines the two, and use that > everywhere? > The problem is, __attribute__((optnone)) works only when passed at function definition, while noinline can be passed at function definition. At any rate, this is not solved (for new enough clang, i.e. 11.0+) and for even newer clang, where completely unrelated issue is happening, I've posted another fix [1]. Which brings me back to the question from the cover letter - how many clang related problems are acceptable for us? 1: https://listman.redhat.com/archives/libvir-list/2023-March/238923.html Michal
On Mon, Mar 20, 2023 at 03:17:24PM +0100, Michal Prívozník wrote: > On 3/16/23 15:51, Andrea Bolognani wrote: > > On Wed, Mar 15, 2023 at 08:10:26PM +0100, Michal Privoznik wrote: > >> We ran into so many clang bugs recently, that I'm strongly in favor of > >> making it optional and aim on gcc only. Debugging these issues burns our > >> time needlessly. > >> > > [...] > >> > >> After these patches, there is still one qemuxml2argvtest case failing, > >> and it's simply because no matter how hard I try, I can't stop clang > >> from optimizing the function. > >> > > [...] > >> > >> At this point, I think we can call clang broken and focus our time on > >> developing libvirt. Not coming up with hacks around broken compilers. > > > > clang is the default compiler for FreeBSD and macOS, both of which > > are officially supported platforms. Unless we are willing to change > > that, ignoring clang is simply out of the question. > > I'm fine with doing that change. FreeBSD offers option to install gcc. > And for macOS - the fact we compile there says absolutely nothing. > Anybody willing to make libvirt work on macOS is more than welcome to > post patches. clang is the system compiler for those platforms, and IMHO we should be compatible with it. Just as Fedora was long unhappy with certain apps trying to force use of clang, when Fedora's system compiler was GCC, the opposite is reasonable for other platforms. > > A thought about VIR_OPTNONE. It seems to me that we'd want to apply > > this to all the functions that currently are marked with G_NO_INLINE > > for mocking purposes. So, wouldn't it make sense to instead have a > > single VIR_MOCKABLE annotation that combines the two, and use that > > everywhere? > > > > The problem is, __attribute__((optnone)) works only when passed at > function definition, while noinline can be passed at function > definition. Is that a problem in fact ? We only need to suppress inlining when the calling function is in the same compilation unit as the called function. Likewise for this return value optimization. So if 'noinline' annotation was in the definition, rather than the declaration, it would be fine for our purposes. I think it would make sense to have a VIR_MOCKABLE annotation that covered 'noline', 'optnone' and 'noipa' attributes (the precise set varying according to what compiler we target). If anything this is more appealing that having it in the .h file, since this annotations are not intended to influence callers from external compilation units. We would need to update our test program that validates the existance of noinline for mocked functions to look in the .c instead of .h With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 3/21/23 11:46, Daniel P. Berrangé wrote: > On Mon, Mar 20, 2023 at 03:17:24PM +0100, Michal Prívozník wrote: >> On 3/16/23 15:51, Andrea Bolognani wrote: >>> On Wed, Mar 15, 2023 at 08:10:26PM +0100, Michal Privoznik wrote: >>>> We ran into so many clang bugs recently, that I'm strongly in favor of >>>> making it optional and aim on gcc only. Debugging these issues burns our >>>> time needlessly. >>>> >>> [...] >>>> >>>> After these patches, there is still one qemuxml2argvtest case failing, >>>> and it's simply because no matter how hard I try, I can't stop clang >>>> from optimizing the function. >>>> >>> [...] >>>> >>>> At this point, I think we can call clang broken and focus our time on >>>> developing libvirt. Not coming up with hacks around broken compilers. >>> >>> clang is the default compiler for FreeBSD and macOS, both of which >>> are officially supported platforms. Unless we are willing to change >>> that, ignoring clang is simply out of the question. >> >> I'm fine with doing that change. FreeBSD offers option to install gcc. >> And for macOS - the fact we compile there says absolutely nothing. >> Anybody willing to make libvirt work on macOS is more than welcome to >> post patches. > > clang is the system compiler for those platforms, and IMHO we should > be compatible with it. Just as Fedora was long unhappy with certain > apps trying to force use of clang, when Fedora's system compiler was > GCC, the opposite is reasonable for other platforms. > >>> A thought about VIR_OPTNONE. It seems to me that we'd want to apply >>> this to all the functions that currently are marked with G_NO_INLINE >>> for mocking purposes. So, wouldn't it make sense to instead have a >>> single VIR_MOCKABLE annotation that combines the two, and use that >>> everywhere? >>> >> >> The problem is, __attribute__((optnone)) works only when passed at >> function definition, while noinline can be passed at function >> definition. > > Is that a problem in fact ? We only need to suppress inlining when > the calling function is in the same compilation unit as the called > function. Likewise for this return value optimization. So if 'noinline' > annotation was in the definition, rather than the declaration, it > would be fine for our purposes. > > I think it would make sense to have a VIR_MOCKABLE annotation that > covered 'noline', 'optnone' and 'noipa' attributes (the precise set > varying according to what compiler we target). > > If anything this is more appealing that having it in the .h file, > since this annotations are not intended to influence callers from > external compilation units. > > We would need to update our test program that validates the > existance of noinline for mocked functions to look in the .c > instead of .h The problem we're facing is not that functions we mock are optimized, but their callers are optimized in such way that our mocking is ineffective. IOW: int mockThisFunction() { ... } int someRegularNotMockedFunction() { ... mockThisFunction(); ... } we would need to annotate someRegularNotMockedFunction() and leave mockThisFunction() be. This will get hairy pretty soon. And just for the record: the correct set of attributes that work for me are: optnone and used. Michal
On Tue, Mar 21, 2023 at 03:13:02PM +0100, Michal Prívozník wrote: > On 3/21/23 11:46, Daniel P. Berrangé wrote: > > On Mon, Mar 20, 2023 at 03:17:24PM +0100, Michal Prívozník wrote: > >> On 3/16/23 15:51, Andrea Bolognani wrote: > >>> On Wed, Mar 15, 2023 at 08:10:26PM +0100, Michal Privoznik wrote: > >>>> We ran into so many clang bugs recently, that I'm strongly in favor of > >>>> making it optional and aim on gcc only. Debugging these issues burns our > >>>> time needlessly. > >>>> > >>> [...] > >>>> > >>>> After these patches, there is still one qemuxml2argvtest case failing, > >>>> and it's simply because no matter how hard I try, I can't stop clang > >>>> from optimizing the function. > >>>> > >>> [...] > >>>> > >>>> At this point, I think we can call clang broken and focus our time on > >>>> developing libvirt. Not coming up with hacks around broken compilers. > >>> > >>> clang is the default compiler for FreeBSD and macOS, both of which > >>> are officially supported platforms. Unless we are willing to change > >>> that, ignoring clang is simply out of the question. > >> > >> I'm fine with doing that change. FreeBSD offers option to install gcc. > >> And for macOS - the fact we compile there says absolutely nothing. > >> Anybody willing to make libvirt work on macOS is more than welcome to > >> post patches. > > > > clang is the system compiler for those platforms, and IMHO we should > > be compatible with it. Just as Fedora was long unhappy with certain > > apps trying to force use of clang, when Fedora's system compiler was > > GCC, the opposite is reasonable for other platforms. > > > >>> A thought about VIR_OPTNONE. It seems to me that we'd want to apply > >>> this to all the functions that currently are marked with G_NO_INLINE > >>> for mocking purposes. So, wouldn't it make sense to instead have a > >>> single VIR_MOCKABLE annotation that combines the two, and use that > >>> everywhere? > >>> > >> > >> The problem is, __attribute__((optnone)) works only when passed at > >> function definition, while noinline can be passed at function > >> definition. > > > > Is that a problem in fact ? We only need to suppress inlining when > > the calling function is in the same compilation unit as the called > > function. Likewise for this return value optimization. So if 'noinline' > > annotation was in the definition, rather than the declaration, it > > would be fine for our purposes. > > > > I think it would make sense to have a VIR_MOCKABLE annotation that > > covered 'noline', 'optnone' and 'noipa' attributes (the precise set > > varying according to what compiler we target). > > > > If anything this is more appealing that having it in the .h file, > > since this annotations are not intended to influence callers from > > external compilation units. > > > > We would need to update our test program that validates the > > existance of noinline for mocked functions to look in the .c > > instead of .h > > The problem we're facing is not that functions we mock are optimized, > but their callers are optimized in such way that our mocking is > ineffective. IOW: > > int mockThisFunction() { > ... > } > > int someRegularNotMockedFunction() { > ... > mockThisFunction(); > ... > } > > we would need to annotate someRegularNotMockedFunction() and leave > mockThisFunction() be. This will get hairy pretty soon. > > And just for the record: the correct set of attributes that work for me > are: optnone and used. Hmm, i tried adding optnone to both, and it didn't work for me. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 3/21/23 15:19, Daniel P. Berrangé wrote: > On Tue, Mar 21, 2023 at 03:13:02PM +0100, Michal Prívozník wrote: >> On 3/21/23 11:46, Daniel P. Berrangé wrote: >>> On Mon, Mar 20, 2023 at 03:17:24PM +0100, Michal Prívozník wrote: >>>> On 3/16/23 15:51, Andrea Bolognani wrote: >>>>> On Wed, Mar 15, 2023 at 08:10:26PM +0100, Michal Privoznik wrote: >>>>>> We ran into so many clang bugs recently, that I'm strongly in favor of >>>>>> making it optional and aim on gcc only. Debugging these issues burns our >>>>>> time needlessly. >>>>>> >>>>> [...] >>>>>> >>>>>> After these patches, there is still one qemuxml2argvtest case failing, >>>>>> and it's simply because no matter how hard I try, I can't stop clang >>>>>> from optimizing the function. >>>>>> >>>>> [...] >>>>>> >>>>>> At this point, I think we can call clang broken and focus our time on >>>>>> developing libvirt. Not coming up with hacks around broken compilers. >>>>> >>>>> clang is the default compiler for FreeBSD and macOS, both of which >>>>> are officially supported platforms. Unless we are willing to change >>>>> that, ignoring clang is simply out of the question. >>>> >>>> I'm fine with doing that change. FreeBSD offers option to install gcc. >>>> And for macOS - the fact we compile there says absolutely nothing. >>>> Anybody willing to make libvirt work on macOS is more than welcome to >>>> post patches. >>> >>> clang is the system compiler for those platforms, and IMHO we should >>> be compatible with it. Just as Fedora was long unhappy with certain >>> apps trying to force use of clang, when Fedora's system compiler was >>> GCC, the opposite is reasonable for other platforms. >>> >>>>> A thought about VIR_OPTNONE. It seems to me that we'd want to apply >>>>> this to all the functions that currently are marked with G_NO_INLINE >>>>> for mocking purposes. So, wouldn't it make sense to instead have a >>>>> single VIR_MOCKABLE annotation that combines the two, and use that >>>>> everywhere? >>>>> >>>> >>>> The problem is, __attribute__((optnone)) works only when passed at >>>> function definition, while noinline can be passed at function >>>> definition. >>> >>> Is that a problem in fact ? We only need to suppress inlining when >>> the calling function is in the same compilation unit as the called >>> function. Likewise for this return value optimization. So if 'noinline' >>> annotation was in the definition, rather than the declaration, it >>> would be fine for our purposes. >>> >>> I think it would make sense to have a VIR_MOCKABLE annotation that >>> covered 'noline', 'optnone' and 'noipa' attributes (the precise set >>> varying according to what compiler we target). >>> >>> If anything this is more appealing that having it in the .h file, >>> since this annotations are not intended to influence callers from >>> external compilation units. >>> >>> We would need to update our test program that validates the >>> existance of noinline for mocked functions to look in the .c >>> instead of .h >> >> The problem we're facing is not that functions we mock are optimized, >> but their callers are optimized in such way that our mocking is >> ineffective. IOW: >> >> int mockThisFunction() { >> ... >> } >> >> int someRegularNotMockedFunction() { >> ... >> mockThisFunction(); >> ... >> } >> >> we would need to annotate someRegularNotMockedFunction() and leave >> mockThisFunction() be. This will get hairy pretty soon. >> >> And just for the record: the correct set of attributes that work for me >> are: optnone and used. > > Hmm, i tried adding optnone to both, and it didn't work for me. D'oh. Spoke too early. What I did also was rewrite virNumaGetNodeOfCPU() as follows: int VIR_MOCKABLE virNumaGetNodeOfCPU(int cpu) { errno = ENOSYS; if (cpu < 0) return -1; return -1; } (just don't make that a ternary operator, as it stops working) Where VIR_MOCKABLE is: #if defined(__clang__) # define VIR_OPTNONE __attribute__((optnone)) #else # define VIR_OPTNONE #endif #if defined(__clang__) # define VIR_NOIPA #else # define VIR_NOIPA __attribute__((noipa)) #endif #define VIR_MOCKABLE VIR_OPTNONE VIR_NOIPA __attribute__((noinline)) __attribute__((used)) BUT, I also had to annotate virNumaCPUSetToNodeset() which is NOT mocked. And this is the most annoying thing about whole approach. Since we already have a fix and now are fighting two failed CI builds, I wonder whether it makes sense to just forcibly disable all optimizations IF the compiler is clang and it doesn't have -fsemantic-interposition option. Michal
On Tue, Mar 21, 2023 at 10:46:36AM +0000, Daniel P. Berrangé wrote: > On Mon, Mar 20, 2023 at 03:17:24PM +0100, Michal Prívozník wrote: > > On 3/16/23 15:51, Andrea Bolognani wrote: > > > A thought about VIR_OPTNONE. It seems to me that we'd want to apply > > > this to all the functions that currently are marked with G_NO_INLINE > > > for mocking purposes. So, wouldn't it make sense to instead have a > > > single VIR_MOCKABLE annotation that combines the two, and use that > > > everywhere? > > > > > > > The problem is, __attribute__((optnone)) works only when passed at > > function definition, while noinline can be passed at function > > definition. > > Is that a problem in fact ? We only need to suppress inlining when > the calling function is in the same compilation unit as the called > function. Likewise for this return value optimization. So if 'noinline' > annotation was in the definition, rather than the declaration, it > would be fine for our purposes. > > I think it would make sense to have a VIR_MOCKABLE annotation that > covered 'noline', 'optnone' and 'noipa' attributes (the precise set > varying according to what compiler we target). Empirically 'optnone' does *NOT* fix the problem we face. The optimizations about return value still get performed with 'optnone' present :-( After chasing links I found https://reviews.llvm.org/D28404#641077 where there is this nugget "optnone isn't *really* no optimizations: clearly this is true, but then neither is -O0. We run the always inliner, a couple of other passes, and we run several parts of the code generators optimizer. I understand why optnone deficiencies (ie, too many optimizations) might be frustrating, but having *more users* seems likely to make this *better*." Much of that ticket is debate of whether 'optnone' should be thue same as '-O0' There is also a blog post http://maskray.me/blog/2021-05-09-fno-semantic-interposition which confirms what we discovered "For ELF -fpic, Clang never suppresses interprocedural optimizations (including inlining) on default visibility external linkage definitions. So projects relying on blocked interprocedural optimizations have been broken for years. They only probably work recently by specifying -fsemantic-interposition." By 'default visibility' they basically mean any symbol which is not marked as 'weak', which is what we tried todo before and then reverted with: commit 407a281a8e2b6c5078ba1148535663ea64fd9314 Author: Daniel P. Berrangé <berrange@redhat.com> Date: Wed Jul 12 11:07:17 2017 +0100 Revert "Prevent more compiler optimization of mockable functions" This reverts commit e4b980c853d2114b25fa805a84ea288384416221. Looking at the revert again though, I think we might not be out of luck. We had to revert because when we put stuff into .a files it got lost when linked to the final executable when it was marked weak. I think we might be able to work around this using linker flag -Wl,--whole-archive. This wasn't practical in 2017 as we still used automake+libtool and they had a habit of re-ordering such args. Now we're using meson though, it has a 'link_whole' option integrated which should do the right thing with -Wl,--whole-archive. This might let us re-try the weak function approach. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Mon, Mar 20, 2023 at 03:17:24PM +0100, Michal Prívozník wrote: > On 3/16/23 15:51, Andrea Bolognani wrote: > > On Wed, Mar 15, 2023 at 08:10:26PM +0100, Michal Privoznik wrote: > >> We ran into so many clang bugs recently, that I'm strongly in favor of > >> making it optional and aim on gcc only. Debugging these issues burns our > >> time needlessly. > >> > > [...] > >> > >> After these patches, there is still one qemuxml2argvtest case failing, > >> and it's simply because no matter how hard I try, I can't stop clang > >> from optimizing the function. > >> > > [...] > >> > >> At this point, I think we can call clang broken and focus our time on > >> developing libvirt. Not coming up with hacks around broken compilers. > > > > clang is the default compiler for FreeBSD and macOS, both of which > > are officially supported platforms. Unless we are willing to change > > that, ignoring clang is simply out of the question. > > I'm fine with doing that change. FreeBSD offers option to install gcc. > And for macOS - the fact we compile there says absolutely nothing. > Anybody willing to make libvirt work on macOS is more than welcome to > post patches. > > > > > > > I've only looked at your patches very briefly and I'm trying to wrap > > something else up before the end of the week, so unfortunately I'm > > unlikely to be able to do a proper review within a reasonable > > timeframe. Plus IIUC even with these patches applied we'd still have > > at least one failing test case, so they're not a complete solution. > > Yep. > > > > > So, in the interest of returning the CI to green as soon as possible, > > I would recommend reverting 95ae91fdd4da quickly. We can then worry > > about improving the situation compared to the (admittedly poor) > > status quo as a follow-up, once that urgency is gone. > > That won't do any good as it's not the root cause of this problem. The > problem is (was until Dan fixed it) that even though I've specifically > marked some functions to be not optimized, clang ignored that and > optimized them anyway. And in this particular case it was > virNumaCPUSetToNodeset() and virNumaGetNodeOfCPU() which are new. They > were not implemented in the commit you're referencing. > > > > > A thought about VIR_OPTNONE. It seems to me that we'd want to apply > > this to all the functions that currently are marked with G_NO_INLINE > > for mocking purposes. So, wouldn't it make sense to instead have a > > single VIR_MOCKABLE annotation that combines the two, and use that > > everywhere? > > > > The problem is, __attribute__((optnone)) works only when passed at > function definition, while noinline can be passed at function > definition. > > At any rate, this is not solved (for new enough clang, i.e. 11.0+) and > for even newer clang, where completely unrelated issue is happening, > I've posted another fix [1]. Which brings me back to the question from > the cover letter - how many clang related problems are acceptable for > us? Again, this is not a clang problem. This is a libvirt problem caused by us making bogus assumptions when mocking for our unit tests. The GCC maintainers have told me they consider this CLang behaviour acceptable and if we want our mocks to be reliable we need more than just "noinline". With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 3/20/23 15:24, Daniel P. Berrangé wrote: > On Mon, Mar 20, 2023 at 03:17:24PM +0100, Michal Prívozník wrote: >> On 3/16/23 15:51, Andrea Bolognani wrote: >>> On Wed, Mar 15, 2023 at 08:10:26PM +0100, Michal Privoznik wrote: >>>> We ran into so many clang bugs recently, that I'm strongly in favor of >>>> making it optional and aim on gcc only. Debugging these issues burns our >>>> time needlessly. >>>> >>> [...] >>>> >>>> After these patches, there is still one qemuxml2argvtest case failing, >>>> and it's simply because no matter how hard I try, I can't stop clang >>>> from optimizing the function. >>>> >>> [...] >>>> >>>> At this point, I think we can call clang broken and focus our time on >>>> developing libvirt. Not coming up with hacks around broken compilers. >>> >>> clang is the default compiler for FreeBSD and macOS, both of which >>> are officially supported platforms. Unless we are willing to change >>> that, ignoring clang is simply out of the question. >> >> I'm fine with doing that change. FreeBSD offers option to install gcc. >> And for macOS - the fact we compile there says absolutely nothing. >> Anybody willing to make libvirt work on macOS is more than welcome to >> post patches. >> >>> >>> >>> I've only looked at your patches very briefly and I'm trying to wrap >>> something else up before the end of the week, so unfortunately I'm >>> unlikely to be able to do a proper review within a reasonable >>> timeframe. Plus IIUC even with these patches applied we'd still have >>> at least one failing test case, so they're not a complete solution. >> >> Yep. >> >>> >>> So, in the interest of returning the CI to green as soon as possible, >>> I would recommend reverting 95ae91fdd4da quickly. We can then worry >>> about improving the situation compared to the (admittedly poor) >>> status quo as a follow-up, once that urgency is gone. >> >> That won't do any good as it's not the root cause of this problem. The >> problem is (was until Dan fixed it) that even though I've specifically >> marked some functions to be not optimized, clang ignored that and >> optimized them anyway. And in this particular case it was >> virNumaCPUSetToNodeset() and virNumaGetNodeOfCPU() which are new. They >> were not implemented in the commit you're referencing. >> >>> >>> A thought about VIR_OPTNONE. It seems to me that we'd want to apply >>> this to all the functions that currently are marked with G_NO_INLINE >>> for mocking purposes. So, wouldn't it make sense to instead have a >>> single VIR_MOCKABLE annotation that combines the two, and use that >>> everywhere? >>> >> >> The problem is, __attribute__((optnone)) works only when passed at >> function definition, while noinline can be passed at function >> definition. >> >> At any rate, this is not solved (for new enough clang, i.e. 11.0+) and >> for even newer clang, where completely unrelated issue is happening, >> I've posted another fix [1]. Which brings me back to the question from >> the cover letter - how many clang related problems are acceptable for >> us? > > Again, this is not a clang problem. The -fsemantic-interposition is not documented in clang. At least not in its manpage or --help output. I had to go to gcc's manpage to learn what the option is doing. But okay, CLang itself is not broken. But the defaults they chose are weird and we are investing non-negligible amount of time trying to fix them. Not to mention, that apparently the optimization was enabled way before users were able to turn it off. Speaking of which, the option was initially available for x86_64 only: https://github.com/llvm/llvm-project/commit/68a20c7f36d1d51cc46c0bd17384c16bc7818fa2 Hence, FreeBSD 12 (clang 10.0.1) is foobared. And macOS very likely did not pick up aforementioned commit hence it claims it doesn't have the option. > This is a libvirt problem caused by > us making bogus assumptions when mocking for our unit tests. The GCC > maintainers have told me they consider this CLang behaviour acceptable > and if we want our mocks to be reliable we need more than just "noinline". For some reason, I am not able to reproduce this issue with no combination of -ON and -fno-semantic-interposition/-fsemantic interposition with GCC. So CLang must be doing something more than GCC and semantic-interposition stops it. Michal
On Tue, Mar 21, 2023 at 10:35:13AM +0100, Michal Prívozník wrote: > On 3/20/23 15:24, Daniel P. Berrangé wrote: > > On Mon, Mar 20, 2023 at 03:17:24PM +0100, Michal Prívozník wrote: > >> On 3/16/23 15:51, Andrea Bolognani wrote: > >>> On Wed, Mar 15, 2023 at 08:10:26PM +0100, Michal Privoznik wrote: > >>>> We ran into so many clang bugs recently, that I'm strongly in favor of > >>>> making it optional and aim on gcc only. Debugging these issues burns our > >>>> time needlessly. > >>>> > >>> [...] > >>>> > >>>> After these patches, there is still one qemuxml2argvtest case failing, > >>>> and it's simply because no matter how hard I try, I can't stop clang > >>>> from optimizing the function. > >>>> > >>> [...] > >>>> > >>>> At this point, I think we can call clang broken and focus our time on > >>>> developing libvirt. Not coming up with hacks around broken compilers. > >>> > >>> clang is the default compiler for FreeBSD and macOS, both of which > >>> are officially supported platforms. Unless we are willing to change > >>> that, ignoring clang is simply out of the question. > >> > >> I'm fine with doing that change. FreeBSD offers option to install gcc. > >> And for macOS - the fact we compile there says absolutely nothing. > >> Anybody willing to make libvirt work on macOS is more than welcome to > >> post patches. > >> > >>> > >>> > >>> I've only looked at your patches very briefly and I'm trying to wrap > >>> something else up before the end of the week, so unfortunately I'm > >>> unlikely to be able to do a proper review within a reasonable > >>> timeframe. Plus IIUC even with these patches applied we'd still have > >>> at least one failing test case, so they're not a complete solution. > >> > >> Yep. > >> > >>> > >>> So, in the interest of returning the CI to green as soon as possible, > >>> I would recommend reverting 95ae91fdd4da quickly. We can then worry > >>> about improving the situation compared to the (admittedly poor) > >>> status quo as a follow-up, once that urgency is gone. > >> > >> That won't do any good as it's not the root cause of this problem. The > >> problem is (was until Dan fixed it) that even though I've specifically > >> marked some functions to be not optimized, clang ignored that and > >> optimized them anyway. And in this particular case it was > >> virNumaCPUSetToNodeset() and virNumaGetNodeOfCPU() which are new. They > >> were not implemented in the commit you're referencing. > >> > >>> > >>> A thought about VIR_OPTNONE. It seems to me that we'd want to apply > >>> this to all the functions that currently are marked with G_NO_INLINE > >>> for mocking purposes. So, wouldn't it make sense to instead have a > >>> single VIR_MOCKABLE annotation that combines the two, and use that > >>> everywhere? > >>> > >> > >> The problem is, __attribute__((optnone)) works only when passed at > >> function definition, while noinline can be passed at function > >> definition. > >> > >> At any rate, this is not solved (for new enough clang, i.e. 11.0+) and > >> for even newer clang, where completely unrelated issue is happening, > >> I've posted another fix [1]. Which brings me back to the question from > >> the cover letter - how many clang related problems are acceptable for > >> us? > > > > Again, this is not a clang problem. > > The -fsemantic-interposition is not documented in clang. At least not in > its manpage or --help output. I had to go to gcc's manpage to learn what > the option is doing. > > But okay, CLang itself is not broken. But the defaults they chose are > weird and we are investing non-negligible amount of time trying to fix > them. > > Not to mention, that apparently the optimization was enabled way before > users were able to turn it off. Speaking of which, the option was > initially available for x86_64 only: > > https://github.com/llvm/llvm-project/commit/68a20c7f36d1d51cc46c0bd17384c16bc7818fa2 > > Hence, FreeBSD 12 (clang 10.0.1) is foobared. And macOS very likely did > not pick up aforementioned commit hence it claims it doesn't have the > option. Argh. I was mislead by the CI pipeline being green. I just discvoered that we had Cirrus CI mis-configured in /libvirt namespace, so it only run post-merge to master, not pre-merge. I've now marked the variables as non-Protected, so Cirrus CI will always run. It confirms we're still broken in FreeBSD 12 and macOS. > > This is a libvirt problem caused by > > us making bogus assumptions when mocking for our unit tests. The GCC > > maintainers have told me they consider this CLang behaviour acceptable > > and if we want our mocks to be reliable we need more than just "noinline". > > For some reason, I am not able to reproduce this issue with no > combination of -ON and -fno-semantic-interposition/-fsemantic > interposition with GCC. So CLang must be doing something more than GCC > and semantic-interposition stops it. I've not tried especially to reproduce on GCC, I just accepted the GCC maintainers opinion that what we're doing is unsafe to rely on even for GCC. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Wed, Mar 15, 2023 at 08:10:26PM +0100, Michal Privoznik wrote: > We ran into so many clang bugs recently, that I'm strongly in favor of > making it optional and aim on gcc only. Debugging these issues burns our > time needlessly. Just consider what's happening here. > > I've pushed patches recently that call some virNuma*() APIs during QEMU > cmd line generation. And while at it, I removed virNuma*() mocks from > qemuxml2argvmock.c (v9.1.0-213-g95ae91fdd4). Andrea told me in review to > make sure our CI works. So I've fired up my freebsd VM and compiled my > code there. It worked. Except, I'm compiling with -O0 and our CI > compiles with the default (-O2). And of course this is the source of the > current build breakage. > > Long story short, clang is so eager to produce the fastest code that it > simply ignores written code and generates what it *assumes* behaves the > same. Well, it doesn't and this breaks our mocking and thus our tests. snip > And of course it's happening only with optimizations. snip > To reproduce this issue even on Linux, configure with: > > export CC=clang > meson setup -Dsystem=true -Dexpensive_tests=enabled -Dnumactl=disabled _build > > > At this point, I think we can call clang broken and focus our time on > developing libvirt. Not coming up with hacks around broken compilers. Clang's behaviour is not broken or wrong. The code it generates is functionally correct when used, because according to the C stndard it is forbidden to define the same function more than once. Apps linked aginst libvirt.so or our daemons all operate perfectly. The problem is confined to our test suite where we're trying to selectively replace code at runtime and in doing so, we're making assumptions about code optimization that are not valid. In doing this we're violating the language standard. Of course this is a feature that ELF explicitly supports, despite being outside the language stndards. What this means is that we can't assume the compiler default behaviour will do what we want. The sepecific problem we're facing here is that the compiler is doing inter-procedural analysis because it can see the body of both functions. We've already tried to suppress the optimization by using the 'noinline' annotation but unfortunately is possible to keep a function as non-inlined, but still be wrong. Whuat we're seeing here is that a separate optimization is looking at the return value and determining that it is a compile time constant and thus elide the conditional, while still calling the orginal function ! I can demonstrate this a little more clearly with an isolated example: $ cat inline.h int IsAvailable(int i) __attribute__((noinline)); int IsSomethingOK(int i); $ cat inline.c #include <stdbool.h> int IsAvailable(int i) { fprintf(stderr, "Bad hit\n"); return false; } #endif int IsSomethingOK(int i) { while (i--) { if (IsAvailable(i)) continue; fprintf(stderr, "Bad stuff\n"); return -1; } fprintf(stderr, "OK\n"); return 0; } $ cat mock.c #include <stdbool.h> #include <stdio.h> #include "inline.h" int IsAvailable(int i) { fprintf(stderr, "Good hit\n"); return true; } $ LD_LIBRARY_PATH=. ./something Bad hit Bad stuff Here we see 'Bad hit' so it must be running the original IsAvailable() code body, and we see 'Bad stuff' so it must be seeing the 'return false' return value. Now running with the override $ LD_PRELOAD=mock.so LD_LIBRARY_PATH=. ./something Good hit Bad stuff we see 'Good hit' so it is running our replacement mocked impl of IsAvailable(), but we still see 'Bad stuff' so it must be still using the old 'return value' value. This proves that the "inlining" optimization is separate from the "constant return value" optimization. We're only disabling the former optimization, so our mocks are not safe. We've hit this about 6 years ago which lead to using 'weak' annotation too: commit e4b980c853d2114b25fa805a84ea288384416221 Author: Daniel P. Berrangé <berrange@redhat.com> Date: Wed Jul 5 11:46:28 2017 +0100 Prevent more compiler optimization of mockable functions Currently all mockable functions are annotated with the 'noinline' attribute. This is insufficient to guarantee that a function can be reliably mocked with an LD_PRELOAD. The C language spec allows the compiler to assume there is only a single implementation of each function. It can thus do things like propagating constant return values into the caller at compile time, or creating multiple specialized copies of the function body each optimized for a different caller. To prevent these optimizations we must also set the 'noclone' and 'weak' attributes. This fixes the test suite when libvirt.so is built with CLang with optimization enabled. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> but this caused unwanted side effects: commit 407a281a8e2b6c5078ba1148535663ea64fd9314 Author: Daniel P. Berrangé <berrange@redhat.com> Date: Wed Jul 12 11:07:17 2017 +0100 Revert "Prevent more compiler optimization of mockable functions" This reverts commit e4b980c853d2114b25fa805a84ea288384416221. When a binary links against a .a archive (as opposed to a shared library), any symbols which are marked as 'weak' get silently dropped. As a result when the binary later runs, those 'weak' functions have an address of 0x0 and thus crash when run. This happened with virtlogd and virtlockd because they don't link to libvirt.so, but instead just libvirt_util.a and libvirt_rpc.a. The virRandomBits symbols was weak and so left out of the virtlogd & virtlockd binaries, despite being required by virHashTable functions. Various other binaries like libvirt_lxc, libvirt_iohelper, etc also link directly to .a files instead of libvirt.so, so are potentially at risk of dropping symbols leading to a later runtime crash. This is normal linker behaviour because a weak symbol is not treated as undefined, so nothing forces it to be pulled in from the .a You have to force the linker to pull in weak symbols using -u$SYMNAME which is not a practical approach. This risk is silent bad linkage that affects runtime behaviour is not acceptable for a fix that was merely trying to fix the test suite. So stop using __weak__ again. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> There's a bit of a thread here: https://listman.redhat.com/archives/libvir-list/2017-July/150772.html I tied another approach here: https://listman.redhat.com/archives/libvir-list/2017-July/150809.html which didn't work well enough and then IIRC we abandoned attempts at that time. Meanwhile though GCC gained a 'noipa' attribute (no inter procedural analysis) which can suppress these return value optimizations, amongst other things. CLang though doesn't support it yet https://reviews.llvm.org/D101011 Looking at the compiler flags though there is another option which is to pass '-fsemantic-interposition'. With CLang this defaults to off, which means it is free to assume replacement functions have the same semantics. Passing -fsemantic-interposition forces it to assume that replacements might have different semantics, and thus cannot do the return value optimizations. The downside is that it would have a impact on performance across everything we compile with that arg, but I don't think libvirt stuff is generally that performance sensitive, so we can probably just go with -fsemantic-interposition on clang unconditionally. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 3/15/23 20:10, Michal Privoznik wrote: > Another option might be to just turn off optimization for the whole virnuma.c file when compiling with clang. Except not really, because meson deliberately doesn't allow that: https://github.com/mesonbuild/meson/issues/1367 On the other hand, that would solve only the problem at hand and not the root cause. So maybe we need a bigger hammer after all. Something among these lines: diff --git i/meson.build w/meson.build index 319ed790f9..d018693ca0 100644 --- i/meson.build +++ w/meson.build @@ -206,6 +206,12 @@ libvirt_lib_version = '@0@.@1@.@2@'.format(libvirt_so_version, libvirt_age, libv cc = meson.get_compiler('c') cc_flags = [] +compiler_warn = 0 +if cc.get_id() == 'clang' + cc_flags += [ '-O0' ] + compiler_warn = 1 +endif + git_werror = get_option('git_werror') if (git_werror.enabled() or git_werror.auto()) and git and not get_option('werror') cc_flags += [ '-Werror' ] @@ -2268,3 +2274,7 @@ if conf.has('WITH_QEMU') } summary(priv_summary, section: 'Privileges') endif + +if compiler_warn != 0 + summary({'compiler': 'broken !!! Forcibly turning off optimizations. Go unbreak your compiler !!!'}, section: 'Compiler') +endif Of course, we could do more clever check (from my example in cover letter whether Y() calls X()), but I don't know how to write that in meson. But since we officially aim on GCC an Clang and only the latter is broken I guess checking for compiler's ID is just fine. Michal
© 2016 - 2023 Red Hat, Inc.