From: Yuanhe Shu
To: Mathieu Desnoyers, Peter Zijlstra
Cc: "Paul E . McKenney", Boqun Feng, Thomas Gleixner,
    linux-kernel@vger.kernel.org, Yuanhe Shu
Subject: [PATCH] rseq: don't promote transient TLS faults to SIGSEGV
Date: Mon, 8 Jun 2026 10:15:53 +0800
Message-Id: <20260608021553.1037128-1-xiangzao@linux.alibaba.com>
X-Mailer: git-send-email 2.39.3
X-Mailing-List: linux-kernel@vger.kernel.org

On return to user space the rseq slow path writes the new cpu_id /
mm_cid into the user-space rseq TLS. rseq_update_usr() already
classifies its failures in rseq_event::fatal: the flag is set only when
corrupt user data is positively identified (e.g. a bad rseq_cs signature
or an out-of-bounds abort IP) and stays clear when the access merely hit
an unresolved page fault. rseq_slowpath_update_usr() ignores that and
calls force_sig(SIGSEGV) on any failure, so a transient page fault on a
still-registered rseq area becomes a fatal SIGSEGV.

This is reachable since glibc >= 2.35 registers rseq for every thread by
default: a memcg OOM victim can die of SIGSEGV (si_code=SI_KERNEL,
si_addr=NULL) shortly after fork, before returning to user space,
because the CoW of the inherited TLS page cannot be charged to the
OOM-locked memcg and the rseq write faults. With oom_score_adj=-1000
the OOM killer finds no killable task, so the rseq SIGSEGV is the sole
outcome; otherwise the rseq SIGSEGV can be delivered before the OOM
killer queues SIGKILL, and the process exits 139 instead of 137,
breaking OOMKilled detection in container runtimes. LTP mm/oom03 and
mm/oom05 reproduce it on v7.1-rc6+, and a strace A/B with
glibc.pthread.rseq as the sole variable shows the SIGSEGV only when rseq
is registered.

Only raise SIGSEGV when rseq_event::fatal is set. A non-fatal fault
leaves the cached IDs untouched and is retried on a later return to
user; a genuinely unmapped area keeps faulting and user space takes
SIGSEGV through its own access. All corruption and ROP-hardening checks
keep their SIGSEGV.

Signal delivery is left untouched: it must abort the interrupted
critical section before the handler runs and therefore cannot safely
defer a fault.

Signed-off-by: Yuanhe Shu
---
Tested on v7.1-rc6+ (vanilla):

 - LTP mm/oom03 (14/14) and mm/oom05 (8/8): pass with the patch (the
   victim is reaped with SIGKILL); without it the rseq SIGSEGV makes the
   same cases fail.
 - strace A/B on the oom03 binary with glibc.pthread.rseq as the sole
   variable: 2 SIGSEGV (SI_KERNEL, si_addr=NULL) with rseq registered,
   0 without -- isolates the cause to the rseq slow path.
 - tools/testing/selftests/rseq: run_param_test.sh,
   run_syscall_errors_test.sh, run_legacy_check.sh and
   run_timeslice_test.sh all pass.
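
For reference, the 139 vs 137 exit codes mentioned in the changelog
follow the shell convention of 128 + signal number (SIGSEGV is 11,
SIGKILL is 9). The following is only a minimal user-space sketch of how
a supervisor might classify a child from its wait status; the helper
names (wait_status_after_signal, shell_exit_code, looks_oom_killed) are
hypothetical and are not taken from any container runtime, and the
"SIGKILL means OOM-killed" heuristic is an assumption made for
illustration.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that kills itself with @sig and
 * return the wait status the parent observes. Sketch only: a real
 * supervisor would handle fork()/waitpid() errors instead of asserting.
 */
int wait_status_after_signal(int sig)
{
	int wstatus = 0;
	pid_t pid = fork();

	assert(pid >= 0);
	if (pid == 0) {
		/* Restore the default action where possible (no-op for SIGKILL). */
		signal(sig, SIG_DFL);
		raise(sig);
		_exit(0);	/* Only reached if the signal did not terminate us. */
	}

	assert(waitpid(pid, &wstatus, 0) == pid);
	return wstatus;
}

/* Hypothetical helper: map a wait status to the shell-style exit code. */
int shell_exit_code(int wstatus)
{
	if (WIFSIGNALED(wstatus))
		return 128 + WTERMSIG(wstatus);
	if (WIFEXITED(wstatus))
		return WEXITSTATUS(wstatus);
	return -1;
}

/* Assumed heuristic: only a SIGKILL death is treated as an OOM kill. */
bool looks_oom_killed(int wstatus)
{
	return WIFSIGNALED(wstatus) && WTERMSIG(wstatus) == SIGKILL;
}
```

Under this heuristic a victim that dies of the rseq SIGSEGV before the
OOM killer's SIGKILL arrives is not recognised as OOM-killed, which is
the misclassification the changelog describes.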

 kernel/rseq.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/rseq.c b/kernel/rseq.c
index e75e3a5e312c..38a19cef4ad0 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -302,11 +302,18 @@ static void rseq_slowpath_update_usr(struct pt_regs *regs)
 
 	if (unlikely(!rseq_update_usr(t, regs, &ids))) {
 		/*
-		 * Clear the errors just in case this might survive magically, but
-		 * leave the rest intact.
+		 * rseq_update_usr() sets rseq_event::fatal only on corrupt
+		 * user data, which keeps its SIGSEGV. A clear fatal bit is an
+		 * unresolved page fault on a still-registered rseq area (e.g.
+		 * a CoW that cannot be charged to an OOM-locked memcg): that
+		 * is transient, so leave the cached IDs untouched and retry on
+		 * a later return to user instead of killing the task.
 		 */
+		bool fatal = t->rseq.event.fatal;
+
 		t->rseq.event.error = 0;
-		force_sig(SIGSEGV);
+		if (fatal)
+			force_sig(SIGSEGV);
 	}
 }
 
-- 
2.39.5 (Apple Git-154)