[PATCH v3 4/4] gcov: use atomic counter updates to fix concurrent access crashes

Konstantin Khorenko posted 4 patches 6 hours ago
Posted by Konstantin Khorenko 6 hours ago
GCC's GCOV instrumentation can merge global branch counters with loop
induction variables as an optimization.  In inflate_fast(), the inner
copy loops get transformed so that the GCOV counter value is loaded
multiple times to compute the loop base address, start index, and end
bound.  Since GCOV counters are global (not per-CPU), concurrent
execution on different CPUs causes the counter to change between loads,
producing inconsistent values and out-of-bounds memory writes.

The crash manifests during IPComp (IP Payload Compression) processing
when inflate_fast() runs concurrently on multiple CPUs:

  BUG: unable to handle page fault for address: ffffd0a3c0902ffa
  RIP: inflate_fast+1431
  Call Trace:
   zlib_inflate
   __deflate_decompress
   crypto_comp_decompress
   ipcomp_decompress [xfrm_ipcomp]
   ipcomp_input [xfrm_ipcomp]
   xfrm_input

At the crash point, the compiler generated three loads from the same
global GCOV counter (__gcov0.inflate_fast+216) to compute base, start,
and end for an indexed loop.  Another CPU modified the counter between
loads, making the values inconsistent — the write went 3.4 MB past a
65 KB buffer.
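
The failure mode can be modelled in plain C.  The sketch below is a
simplified illustration, not the code GCC actually generated:
fake_counter, load_counter(), plan_three_loads() and plan_one_load()
are hypothetical names, and the "other CPU" is simulated by bumping
the counter right after a chosen load.

```c
#include <stdint.h>

/* Hypothetical stand-in for a shared, global GCOV counter. */
static uint64_t fake_counter;

/* Write range relative to the output pointer; end is exclusive. */
struct span {
	int64_t first;
	int64_t end;
};

/*
 * Return the current counter value, then simulate another CPU
 * adding 'ticks' to the counter before the next load happens.
 */
static uint64_t load_counter(uint64_t ticks)
{
	uint64_t v = fake_counter;

	fake_counter += ticks;
	return v;
}

/* Base, start and end each come from a separate load of the counter. */
static struct span plan_three_loads(uint64_t len, uint64_t ticks)
{
	uint64_t bias  = load_counter(0);       /* load 1: base = out - bias */
	uint64_t start = load_counter(ticks);   /* load 2: start index       */
	uint64_t end   = load_counter(0) + len; /* load 3: end bound         */
	struct span s = { (int64_t)(start - bias), (int64_t)(end - bias) };

	return s;
}

/* All three values derive from one snapshot, so they stay consistent. */
static struct span plan_one_load(uint64_t len, uint64_t ticks)
{
	uint64_t c = load_counter(ticks);
	struct span s = { (int64_t)(c - c), (int64_t)(c + len - c) };

	return s;
}
```

In this model, with fake_counter starting at 100,
plan_three_loads(16, 1000) produces the range [0, 1016) for a copy
that was meant to cover 16 bytes, while plan_one_load(16, 1000) still
produces [0, 16).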

Add -fprofile-update=atomic to CFLAGS_GCOV at the global level in the
top-level Makefile, inside the existing CONFIG_CC_IS_GCC block.  This
tells GCC that GCOV counters may be accessed concurrently, so counter
updates use atomic instructions (lock addq on x86-64) instead of a
plain load/store sequence.  It also prevents the compiler from merging
counters with loop induction variables.
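
As a rough source-level picture of the two update styles, the sketch
below contrasts a plain increment with an atomic one.  The names
plain_hits, atomic_hits, hit_plain(), hit_atomic() and
read_atomic_hits() are hypothetical, and the use of relaxed memory
ordering is an assumption about what the instrumentation needs, not a
statement of what GCC emits.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counters, each standing in for one GCOV arc counter. */
static uint64_t plain_hits;
static _Atomic uint64_t atomic_hits;

/*
 * Plain update: a separate load, add and store.  Concurrent callers
 * may lose increments, and the compiler may treat the counter like
 * any other ordinary variable when optimizing.
 */
static void hit_plain(void)
{
	plain_hits = plain_hits + 1;
}

/*
 * Atomic update: a single indivisible read-modify-write.  Relaxed
 * ordering is assumed to be enough here because only the count
 * matters, not ordering against other memory accesses.
 */
static void hit_atomic(void)
{
	atomic_fetch_add_explicit(&atomic_hits, 1, memory_order_relaxed);
}

/* Read the atomic counter back as a plain integer. */
static uint64_t read_atomic_hits(void)
{
	return atomic_load_explicit(&atomic_hits, memory_order_relaxed);
}
```

Called from a single thread, both helpers count identically; the
difference only shows when several CPUs update the same counter at
once.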

Applying this globally rather than per-subsystem not only addresses the
observed crash in zlib but also makes GCOV coverage data more
consistent overall, preventing similar issues in any kernel code path
that may execute concurrently.

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 6b1d9fb1a6b4..a55ad668d6ba 100644
--- a/Makefile
+++ b/Makefile
@@ -806,7 +806,7 @@ all: vmlinux
 
 CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage
 ifdef CONFIG_CC_IS_GCC
-CFLAGS_GCOV	+= -fno-tree-loop-im
+CFLAGS_GCOV	+= -fno-tree-loop-im -fprofile-update=atomic
 endif
 export CFLAGS_GCOV
 
-- 
2.43.5

Re: [PATCH v3 4/4] gcov: use atomic counter updates to fix concurrent access crashes
Posted by Peter Oberparleiter 3 hours ago
On 01.04.2026 16:20, Konstantin Khorenko wrote:
> GCC's GCOV instrumentation can merge global branch counters with loop
> induction variables as an optimization.  In inflate_fast(), the inner
> copy loops get transformed so that the GCOV counter value is loaded
> multiple times to compute the loop base address, start index, and end
> bound.  Since GCOV counters are global (not per-CPU), concurrent
> execution on different CPUs causes the counter to change between loads,
> producing inconsistent values and out-of-bounds memory writes.
> 
> The crash manifests during IPComp (IP Payload Compression) processing
> when inflate_fast() runs concurrently on multiple CPUs:
> 
>   BUG: unable to handle page fault for address: ffffd0a3c0902ffa
>   RIP: inflate_fast+1431
>   Call Trace:
>    zlib_inflate
>    __deflate_decompress
>    crypto_comp_decompress
>    ipcomp_decompress [xfrm_ipcomp]
>    ipcomp_input [xfrm_ipcomp]
>    xfrm_input
> 
> At the crash point, the compiler generated three loads from the same
> global GCOV counter (__gcov0.inflate_fast+216) to compute base, start,
> and end for an indexed loop.  Another CPU modified the counter between
> loads, making the values inconsistent — the write went 3.4 MB past a
> 65 KB buffer.
> 
> Add -fprofile-update=atomic to CFLAGS_GCOV at the global level in the
> top-level Makefile.  This tells GCC that GCOV counters may be
> concurrently accessed, causing counter updates to use atomic
> instructions (lock addq) instead of plain load/store.  This prevents
> the compiler from merging counters with loop induction variables.
> 
> Applying this globally rather than per-subsystem not only addresses the
> observed crash in zlib but makes GCOV coverage data more consistent
> overall, preventing similar issues in any kernel code path that may
> execute concurrently.
> 
> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>

Thanks, this looks good to me!

Successfully tested this series on s390 (except for patch 3 which
depends on x86) using GCC 15.2.0, GCC 10.1.0, and current Clang from git
(20260401).

Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>

-- 
Peter Oberparleiter
Linux on IBM Z Development - IBM Germany R&D