From nobody Mon Oct 6 17:02:08 2025
From: Nam Cao
To: Alexander Viro, Christian Brauner, Jan Kara, Shuah Khan,
 Davidlohr Bueso, Soheil Hassas Yeganeh, Khazhismel Kumykov,
 Willem de Bruijn, Eric Dumazet, Jens Axboe,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org
Cc: Nam Cao
Subject: [PATCH 1/2] selftests/eventpoll: Add test for multiple waiters
Date: Fri, 18 Jul 2025 09:52:28 +0200
Message-Id: <92a02f24fe05c1a284bd643e830b182e0a2571b2.1752824628.git.namcao@linutronix.de>

Add a test which creates 64 threads that all call epoll_wait() on the same
eventpoll. The source eventfd is written but never read, therefore all the
threads should always see an EPOLLIN event.

This test fails because of a kernel bug, which will be fixed by a follow-up
commit.

Signed-off-by: Nam Cao
---
 .../filesystems/epoll/epoll_wakeup_test.c     | 45 +++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
index 65ede506305c..0852c68d0461 100644
--- a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
+++ b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
@@ -3493,4 +3493,49 @@ TEST(epoll64)
 	close(ctx.sfd[1]);
 }
 
+static void *epoll65_wait(void *ctx_)
+{
+	struct epoll_mtcontext *ctx = ctx_;
+	struct epoll_event event;
+
+	for (int i = 0; i < 100000; ++i) {
+		if (!epoll_wait(ctx->efd[0], &event, 1, 0))
+			return (void *)ENODATA;
+	}
+
+	return (void *)0;
+}
+
+TEST(epoll65)
+{
+	struct epoll_mtcontext ctx;
+	struct epoll_event event;
+	int64_t dummy_data = 99;
+	pthread_t threads[64];
+	uintptr_t ret;
+	int i, err;
+
+	ctx.efd[0] = epoll_create(1);
+	ASSERT_GE(ctx.efd[0], 0);
+	ctx.efd[1] = eventfd(0, 0);
+	ASSERT_GE(ctx.efd[1], 0);
+
+	event.events = EPOLLIN;
+	err = epoll_ctl(ctx.efd[0], EPOLL_CTL_ADD, ctx.efd[1], &event);
+	ASSERT_EQ(err, 0);
+
+	write(ctx.efd[1], &dummy_data, sizeof(dummy_data));
+
+	for (i = 0; i < ARRAY_SIZE(threads); ++i)
+		ASSERT_EQ(pthread_create(&threads[i], NULL, epoll65_wait, &ctx), 0);
+
+	for (i = 0; i < ARRAY_SIZE(threads); ++i) {
+		ASSERT_EQ(pthread_join(threads[i], (void **)&ret), 0);
+		ASSERT_EQ(ret, 0);
+	}
+
+	close(ctx.efd[0]);
+	close(ctx.efd[1]);
+}
+
 TEST_HARNESS_MAIN
-- 
2.39.5

From nobody Mon Oct 6 17:02:08 2025
From: Nam Cao
To: Alexander Viro, Christian Brauner, Jan Kara, Shuah Khan,
 Davidlohr Bueso, Soheil Hassas Yeganeh, Khazhismel Kumykov,
 Willem de Bruijn, Eric Dumazet, Jens Axboe,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org
Cc: Nam Cao, stable@vger.kernel.org
Subject: [PATCH 2/2] eventpoll: Fix epoll_wait() report false negative
Date: Fri, 18 Jul 2025 09:52:29 +0200
Message-Id: <43d64ad765e2c47e958f01246320359b11379466.1752824628.git.namcao@linutronix.de>

ep_events_available() checks for available events by looking at ep->rdllist
and ep->ovflist. However, this is done without a lock, therefore the returned
value is not reliable: both checks on ep->rdllist and ep->ovflist can be
false while ep_start_scan() or ep_done_scan() is being executed on another
CPU, even though events are available.

This bug can be observed as follows:

  1. Create an eventpoll with at least one ready level-triggered event

  2. Create multiple threads that call epoll_wait() with zero timeout.
     The threads do not consume the events, therefore every epoll_wait()
     call should return at least one event.

If one thread is executing ep_events_available() while another thread is
executing ep_start_scan() or ep_done_scan(), epoll_wait() may wrongly return
no event for the former thread.

This reproducer is implemented as TEST(epoll65) in
tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c

Fix it by skipping ep_events_available() and calling ep_try_send_events()
directly.

epoll_sendevents() (io_uring) suffers from the same problem; fix that as
well.

ep_busy_loop() still uses ep_events_available() without a lock, but that is
probably okay (?) for busy-polling.

Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout")
Fixes: ae3a4f1fdc2c ("eventpoll: add epoll_sendevents() helper")
Signed-off-by: Nam Cao
Cc: stable@vger.kernel.org
---
 fs/eventpoll.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0fbf5dfedb24..541481eafc20 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2022,7 +2022,7 @@ static int ep_schedule_timeout(ktime_t *to)
 static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		   int maxevents, struct timespec64 *timeout)
 {
-	int res, eavail, timed_out = 0;
+	int res, eavail = 1, timed_out = 0;
 	u64 slack = 0;
 	wait_queue_entry_t wait;
 	ktime_t expires, *to = NULL;
@@ -2041,16 +2041,6 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		timed_out = 1;
 	}
 
-	/*
-	 * This call is racy: We may or may not see events that are being added
-	 * to the ready list under the lock (e.g., in IRQ callbacks). For cases
-	 * with a non-zero timeout, this thread will check the ready list under
-	 * lock and will add to the wait queue. For cases with a zero
-	 * timeout, the user by definition should not care and will have to
-	 * recheck again.
-	 */
-	eavail = ep_events_available(ep);
-
 	while (1) {
 		if (eavail) {
 			res = ep_try_send_events(ep, events, maxevents);
@@ -2496,9 +2486,7 @@ int epoll_sendevents(struct file *file, struct epoll_event __user *events,
 	 * Racy call, but that's ok - it should get retried based on
 	 * poll readiness anyway.
 	 */
-	if (ep_events_available(ep))
-		return ep_try_send_events(ep, events, maxevents);
-	return 0;
+	return ep_try_send_events(ep, events, maxevents);
 }
 
 /*
-- 
2.39.5
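
The false negative described in the commit message of patch 2/2 can be
illustrated with a small, deterministic userspace model. This is only a
sketch and not kernel code: struct ep_model and all model_*() functions are
hypothetical names, and the model assumes (as a simplification) that a scan
moves the ready items off rdllist while marking ovflist active, that
finishing the scan reverses both steps, and that the lockless check loads
rdllist before ovflist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, not kernel code. The field names mirror
 * ep->rdllist and ep->ovflist; the rest is assumed for illustration.
 */
struct ep_model {
	size_t rdllist_len;	/* number of items on the ready list */
	bool ovflist_active;	/* true while a scan owns the ready items */
	size_t txlist_len;	/* items temporarily held by the scanner */
};

/* Assumed scan start: take the ready items, mark the overflow list active. */
static void model_start_scan(struct ep_model *ep)
{
	ep->txlist_len = ep->rdllist_len;
	ep->rdllist_len = 0;
	ep->ovflist_active = true;
}

/* Assumed scan end: deactivate the overflow list, put the items back. */
static void model_done_scan(struct ep_model *ep)
{
	ep->ovflist_active = false;
	ep->rdllist_len += ep->txlist_len;
	ep->txlist_len = 0;
}

/* What an observer sees when both fields are read at a single instant. */
static bool model_events_available(const struct ep_model *ep)
{
	return ep->rdllist_len != 0 || ep->ovflist_active;
}

/*
 * The lockless check split into its two loads, with another thread's
 * model_done_scan() landing between them.
 */
static bool model_racy_check_during_done_scan(struct ep_model *ep)
{
	bool rdllist_nonempty = ep->rdllist_len != 0;	/* first load */

	model_done_scan(ep);				/* other CPU runs here */

	bool ovflist_active = ep->ovflist_active;	/* second load */

	return rdllist_nonempty || ovflist_active;
}

/* Returns true when the model reproduces the false negative. */
bool model_shows_false_negative(void)
{
	struct ep_model ep = { .rdllist_len = 1 };
	bool before, racy, after;

	before = model_events_available(&ep);	/* one ready event */
	model_start_scan(&ep);			/* another waiter starts a scan */
	assert(model_events_available(&ep));	/* consistent view still sees it */
	racy = model_racy_check_during_done_scan(&ep);
	after = model_events_available(&ep);	/* event is back on the list */

	return before && !racy && after;
}
```

In this model an event is available at every single instant, yet the two
separate loads observe an empty ready list followed by an inactive overflow
list, so the combined check reports nothing. Calling ep_try_send_events()
directly, as the patch does, avoids relying on that unlocked pair of loads.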