From nobody Tue Jun 16 10:00:15 2026
From: Zhenzhong Wu
To: netdev@vger.kernel.org
Cc: edumazet@google.com, ncardwell@google.com, kuniyu@google.com,
	davem@davemloft.net, dsahern@kernel.org, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, shuah@kernel.org,
	tamird@kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Zhenzhong Wu,
	stable@vger.kernel.org
Subject: [PATCH net 1/2] tcp: call sk_data_ready() after listener migration
Date: Sat, 18 Apr 2026 12:16:32 +0800
Message-ID: <20260418041633.691435-2-jt26wzz@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260418041633.691435-1-jt26wzz@gmail.com>
References: <20260418041633.691435-1-jt26wzz@gmail.com>

When inet_csk_listen_stop() migrates an established child socket from
a closing listener to another socket in the same SO_REUSEPORT group,
the target listener gets a new accept-queue entry via
inet_csk_reqsk_queue_add(), but that path never notifies the target
listener's waiters.

As a result, a nonblocking accept() still succeeds because it checks
the accept queue directly, but waiters that sleep for listener
readiness can remain asleep until another connection generates a wakeup.
This affects poll()/epoll_wait()-based waiters, and can also leave a
blocking accept() asleep after migration even though the child is
already in the target listener's accept queue.

This was observed in a local test where listener A completed the
handshake, queued the child, and was closed before userspace called
accept(). The child was migrated to listener B, but listener B never
received a wakeup for the migrated accept-queue entry.

Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
in inet_csk_listen_stop().

The reqsk_timer_handler() path does not need the same change:
half-open requests only become readable to userspace when the final
ACK completes the handshake, and tcp_child_process() already wakes
the listener in that case.

Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
Cc: stable@vger.kernel.org
Signed-off-by: Zhenzhong Wu
---
 net/ipv4/inet_connection_sock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc..da1ce082f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk)
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQSUCCESS);
 					reqsk_migrate_reset(req);
+					READ_ONCE(nsk->sk_data_ready)(nsk);
 				} else {
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQFAILURE);
-- 
2.43.0

From nobody Tue Jun 16 10:00:15 2026
From: Zhenzhong Wu
To: netdev@vger.kernel.org
Cc: edumazet@google.com, ncardwell@google.com, kuniyu@google.com,
	davem@davemloft.net, dsahern@kernel.org, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, shuah@kernel.org,
	tamird@kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Zhenzhong Wu
Subject: [PATCH net 2/2] selftests: net: add reuseport migration wakeup regression tests
Date: Sat, 18 Apr 2026 12:16:33 +0800
Message-ID: <20260418041633.691435-3-jt26wzz@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260418041633.691435-1-jt26wzz@gmail.com>
References: <20260418041633.691435-1-jt26wzz@gmail.com>

Add selftests that reproduce missing wakeups on the target listener
after SO_REUSEPORT migration from inet_csk_listen_stop().

The epoll case connects while only the first listener is active so
the child lands on its accept queue, registers the second listener
with epoll, then closes the first listener to trigger migration. It
verifies that the target listener both accepts the migrated child and
becomes readable via epoll.

The blocking accept case starts a thread blocked in accept() on the
target listener, closes the first listener to trigger migration, and
verifies that the blocked accept() wakes and returns the migrated
child. Wait until the helper thread is actually asleep in accept()
before triggering migration so the test does not race waiter
registration.

Run the tests in a private network namespace and enable
net.ipv4.tcp_migrate_req=1 there so they can exercise the migration
path without relying on a sk_reuseport/migrate BPF program. Treat a
missing or unwritable tcp_migrate_req sysctl as SKIP.

Run both scenarios for IPv4 and IPv6.

These tests cover the bug fixed by the preceding patch.
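The "actually asleep in accept()" check depends on reading the one-letter state field from /proc/self/task/<tid>/stat. A minimal sketch of only the parsing step follows, under the assumption that the stat text is already in a NUL-terminated buffer. The names parse_task_state() and task_is_sleeping() are hypothetical and not part of the patch. The comm field is parenthesised and may itself contain spaces or ')' characters, so the sketch searches for the last ')' rather than splitting on whitespace.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: extract the state letter from the text of a
 * /proc/<pid>/task/<tid>/stat file. The layout is "pid (comm) state ...",
 * and comm may contain ')' itself, so locate the LAST ')' in the buffer.
 * Returns 0 and stores the state letter on success, -1 on malformed input.
 */
int parse_task_state(const char *stat_text, char *state)
{
	const char *close_paren = strrchr(stat_text, ')');

	/* Expect ") X" where X is the state letter. */
	if (!close_paren || close_paren[1] != ' ' || close_paren[2] == '\0')
		return -1;

	*state = close_paren[2];
	return 0;
}

/* 'S' is interruptible sleep, 'D' is uninterruptible sleep. */
int task_is_sleeping(char state)
{
	return state == 'S' || state == 'D';
}
```

A sleeping state only shows that the thread is blocked somewhere, not which syscall it is blocked in. The check is meaningful here only because the helper thread has nothing else to block on between publishing its tid and calling accept4().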
Signed-off-by: Zhenzhong Wu
---
 tools/testing/selftests/net/Makefile          |   3 +
 .../selftests/net/reuseport_migrate_accept.c  | 533 ++++++++++++++++++
 .../selftests/net/reuseport_migrate_epoll.c   | 353 ++++++++++++
 3 files changed, 889 insertions(+)
 create mode 100644 tools/testing/selftests/net/reuseport_migrate_accept.c
 create mode 100644 tools/testing/selftests/net/reuseport_migrate_epoll.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index a275ed584..2f8b6c44d 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -184,6 +184,8 @@ TEST_GEN_PROGS := \
 	reuseport_bpf_cpu \
 	reuseport_bpf_numa \
 	reuseport_dualstack \
+	reuseport_migrate_accept \
+	reuseport_migrate_epoll \
 	sk_bind_sendto_listen \
 	sk_connect_zero_addr \
 	sk_so_peek_off \
@@ -232,6 +234,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
 $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto
 $(OUTPUT)/tcp_inq: LDLIBS += -lpthread
 $(OUTPUT)/bind_bhash: LDLIBS += -lpthread
+$(OUTPUT)/reuseport_migrate_accept: LDLIBS += -lpthread
 $(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
 
 include bpf.mk
diff --git a/tools/testing/selftests/net/reuseport_migrate_accept.c b/tools/testing/selftests/net/reuseport_migrate_accept.c
new file mode 100644
index 000000000..a516843a0
--- /dev/null
+++ b/tools/testing/selftests/net/reuseport_migrate_accept.c
@@ -0,0 +1,533 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "../kselftest.h"
+
+#define ACCEPT_BLOCK_TIMEOUT_MS 1000
+#define ACCEPT_CLEANUP_TIMEOUT_MS 1000
+#define ACCEPT_WAKE_TIMEOUT_MS 2000
+#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
+
+struct reuseport_migrate_case {
+	const char *name;
+	int family;
+	const char *addr;
+};
+
+struct accept_result {
+	int listener_fd;
+	atomic_int started;
+	atomic_int tid;
+	int accepted_fd;
+	int err;
+};
+
+static const struct reuseport_migrate_case test_cases[] = {
+	{
+		.name = "ipv4 blocking accept wake after reuseport migration",
+		.family = AF_INET,
+		.addr = "127.0.0.1",
+	},
+	{
+		.name = "ipv6 blocking accept wake after reuseport migration",
+		.family = AF_INET6,
+		.addr = "::1",
+	},
+};
+
+static void close_fd(int *fd)
+{
+	if (*fd >= 0) {
+		close(*fd);
+		*fd = -1;
+	}
+}
+
+static bool unsupported_addr_err(int family, int err)
+{
+	return family == AF_INET6 &&
+	       (err == EAFNOSUPPORT ||
+		err == EPROTONOSUPPORT ||
+		err == EADDRNOTAVAIL);
+}
+
+static int make_sockaddr(const struct reuseport_migrate_case *test_case,
+			 unsigned short port,
+			 struct sockaddr_storage *addr,
+			 socklen_t *addrlen)
+{
+	memset(addr, 0, sizeof(*addr));
+
+	if (test_case->family == AF_INET) {
+		struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
+
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(port);
+		if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr4);
+		return 0;
+	}
+
+	if (test_case->family == AF_INET6) {
+		struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
+
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(port);
+		if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr6);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
+{
+	int one = 1;
+	int fd;
+
+	fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (fd < 0)
+		return -1;
+
+	if (test_case->family == AF_INET6 &&
+	    setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int enable_tcp_migrate_req(void)
+{
+	int len;
+	int fd;
+
+	fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
+	if (fd < 0) {
+		if (errno == ENOENT || errno == EACCES ||
+		    errno == EPERM || errno == EROFS)
+			return KSFT_SKIP;
+		return KSFT_FAIL;
+	}
+
+	len = write(fd, "1", 1);
+	if (len != 1) {
+		if (errno == EACCES || errno == EPERM || errno == EROFS) {
+			close(fd);
+			return KSFT_SKIP;
+		}
+
+		close(fd);
+		return KSFT_FAIL;
+	}
+
+	close(fd);
+	return KSFT_PASS;
+}
+
+static void setup_netns(void)
+{
+	int ret;
+
+	if (unshare(CLONE_NEWNET))
+		ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
+
+	if (system("ip link set lo up"))
+		ksft_exit_skip("failed to bring up lo interface in netns\n");
+
+	ret = enable_tcp_migrate_req();
+	if (ret == KSFT_SKIP)
+		ksft_exit_skip("failed to enable tcp_migrate_req\n");
+	if (ret == KSFT_FAIL)
+		ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
+}
+
+static void noop_handler(int sig)
+{
+	(void)sig;
+}
+
+static void *accept_thread(void *arg)
+{
+	struct accept_result *result = arg;
+
+	atomic_store_explicit(&result->tid, (int)syscall(SYS_gettid),
+			      memory_order_release);
+	atomic_store_explicit(&result->started, 1, memory_order_release);
+	result->accepted_fd = accept4(result->listener_fd, NULL, NULL,
+				      SOCK_CLOEXEC);
+	if (result->accepted_fd < 0)
+		result->err = errno;
+
+	return NULL;
+}
+
+static int read_thread_state(int tid, char *state)
+{
+	char *close_paren;
+	char path[64];
+	char buf[256];
+	ssize_t len;
+	int fd;
+
+	snprintf(path, sizeof(path), "/proc/self/task/%d/stat", tid);
+
+	fd = open(path, O_RDONLY | O_CLOEXEC);
+	if (fd < 0)
+		return -errno;
+
+	len = read(fd, buf, sizeof(buf) - 1);
+	close(fd);
+	if (len < 0)
+		return -errno;
+	if (!len)
+		return -EINVAL;
+
+	buf[len] = '\0';
+	close_paren = strrchr(buf, ')');
+	if (!close_paren || close_paren[1] != ' ' || !close_paren[2])
+		return -EINVAL;
+
+	*state = close_paren[2];
+	return 0;
+}
+
+static int wait_for_accept_to_block(const struct reuseport_migrate_case *test_case,
+				    int tid)
+{
+	char state = '\0';
+	int ret;
+	int i;
+
+	/*
+	 * A started thread is not enough here: we need to know the waiter
+	 * has actually gone to sleep in accept() before closing listener_a,
+	 * otherwise migration can race ahead of waiter registration. Poll
+	 * /proc task state because the pthread APIs can tell us whether the
+	 * thread has exited, but not whether it is already blocked in the
+	 * target syscall.
+	 */
+	for (i = 0; i < ACCEPT_BLOCK_TIMEOUT_MS; i++) {
+		ret = read_thread_state(tid, &state);
+		if (!ret) {
+			if (state == 'S' || state == 'D')
+				return KSFT_PASS;
+			if (state == 'Z')
+				break;
+		} else if (ret == -ENOENT) {
+			break;
+		}
+
+		usleep(1000);
+	}
+
+	ksft_print_msg("%s: accept waiter never blocked before migration\n",
+		       test_case->name);
+	return KSFT_FAIL;
+}
+
+static int join_thread_with_timeout(pthread_t thread, int timeout_ms,
+				    bool *timed_out)
+{
+	struct timespec deadline;
+	int err;
+
+	*timed_out = false;
+
+	if (clock_gettime(CLOCK_REALTIME, &deadline))
+		return KSFT_FAIL;
+
+	deadline.tv_nsec += timeout_ms * 1000000LL;
+	deadline.tv_sec += deadline.tv_nsec / 1000000000LL;
+	deadline.tv_nsec %= 1000000000LL;
+
+	err = pthread_timedjoin_np(thread, NULL, &deadline);
+	if (!err)
+		return KSFT_PASS;
+
+	if (err != ETIMEDOUT)
+		return KSFT_FAIL;
+
+	*timed_out = true;
+	return KSFT_FAIL;
+}
+
+static int interrupt_accept_thread(pthread_t thread)
+{
+	int err;
+
+	err = pthread_kill(thread, SIGUSR1);
+	if (err && err != ESRCH)
+		return KSFT_FAIL;
+
+	return KSFT_PASS;
+}
+
+static int stop_accept_thread(pthread_t thread, bool *timed_out)
+{
+	if (interrupt_accept_thread(thread))
+		return KSFT_FAIL;
+
+	return join_thread_with_timeout(thread, ACCEPT_CLEANUP_TIMEOUT_MS,
+					timed_out);
+}
+
+static int run_test(const struct reuseport_migrate_case *test_case)
+{
+	struct accept_result result = {
+		.listener_fd = -1,
+		.started = 0,
+		.tid = -1,
+		.accepted_fd = -1,
+		.err = 0,
+	};
+	struct sockaddr_storage addr;
+	struct sigaction sa = {
+		.sa_handler = noop_handler,
+	};
+	bool thread_joined = false;
+	bool cleanup_timed_out;
+	int listener_a = -1;
+	int listener_b = -1;
+	int ret = KSFT_FAIL;
+	socklen_t addrlen;
+	pthread_t thread;
+	int client = -1;
+	bool timed_out;
+	int probe = -1;
+	int tid;
+
+	if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
+		ksft_print_msg("%s: failed to build socket address\n",
+			       test_case->name);
+		goto out;
+	}
+
+	if (sigemptyset(&sa.sa_mask)) {
+		ksft_perror("sigemptyset");
+		goto out;
+	}
+
+	if (sigaction(SIGUSR1, &sa, NULL)) {
+		ksft_perror("sigaction(SIGUSR1)");
+		goto out;
+	}
+
+	listener_a = create_reuseport_socket(test_case);
+	if (listener_a < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_a)");
+		goto out;
+	}
+
+	if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("bind(listener_a)");
+		goto out;
+	}
+
+	if (listen(listener_a, 1)) {
+		ksft_perror("listen(listener_a)");
+		goto out;
+	}
+
+	addrlen = sizeof(addr);
+	if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
+		ksft_perror("getsockname(listener_a)");
+		goto out;
+	}
+
+	listener_b = create_reuseport_socket(test_case);
+	if (listener_b < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_b)");
+		goto out;
+	}
+
+	if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
+		ksft_perror("bind(listener_b)");
+		goto out;
+	}
+
+	client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (client < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(client)");
+		goto out;
+	}
+
+	/* Connect while only listener_a is listening, ensuring the
+	 * child lands in listener_a's accept queue deterministically.
+	 */
+	if (connect(client, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("connect(client)");
+		goto out;
+	}
+
+	if (listen(listener_b, 1)) {
+		ksft_perror("listen(listener_b)");
+		goto out;
+	}
+
+	result.listener_fd = listener_b;
+	if (pthread_create(&thread, NULL, accept_thread, &result)) {
+		ksft_perror("pthread_create");
+		goto out;
+	}
+
+	while (!atomic_load_explicit(&result.started, memory_order_acquire))
+		sched_yield();
+
+	tid = atomic_load_explicit(&result.tid, memory_order_acquire);
+	if (wait_for_accept_to_block(test_case, tid))
+		goto out_with_thread;
+
+	close_fd(&listener_a);
+
+	ret = join_thread_with_timeout(thread, ACCEPT_WAKE_TIMEOUT_MS, &timed_out);
+	if (ret == KSFT_PASS) {
+		thread_joined = true;
+		if (result.accepted_fd < 0) {
+			ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
+				       test_case->name, result.err,
+				       strerror(result.err));
+			ret = KSFT_FAIL;
+		}
+
+		goto out_with_thread;
+	}
+
+	if (!timed_out) {
+		ksft_print_msg("%s: join_thread_with_timeout() failed\n",
+			       test_case->name);
+		goto out_with_thread;
+	}
+
+	if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
+		ksft_print_msg("%s: failed to stop blocking accept waiter\n",
+			       test_case->name);
+		goto out_with_thread;
+	}
+	thread_joined = true;
+
+	if (result.accepted_fd >= 0) {
+		ksft_print_msg("%s: blocking accept() completed only in cleanup\n",
+			       test_case->name);
+		goto out_with_thread;
+	}
+
+	if (result.err != EINTR) {
+		ksft_print_msg("%s: blocking accept() returned err=%d (%s)\n",
+			       test_case->name, result.err,
+			       strerror(result.err));
+		goto out_with_thread;
+	}
+
+	probe = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
+	if (probe >= 0) {
+		ksft_print_msg("%s: accept queue was populated, but blocking accept() timed out\n",
+			       test_case->name);
+	} else if (errno == EAGAIN || errno == EWOULDBLOCK) {
+		ksft_print_msg("%s: target listener had no queued child after migration\n",
+			       test_case->name);
+	} else {
+		ksft_perror("accept4(listener_b)");
+	}
+
+out_with_thread:
+	close_fd(&probe);
+	if (!thread_joined) {
+		if (stop_accept_thread(thread, &cleanup_timed_out) == KSFT_FAIL) {
+			ksft_print_msg("%s: failed to stop blocking accept waiter\n",
+				       test_case->name);
+			ret = KSFT_FAIL;
+			goto out;
+		}
+
+		thread_joined = true;
+	}
+	if (thread_joined)
+		close_fd(&result.accepted_fd);
+
+out:
+	close_fd(&client);
+	close_fd(&listener_b);
+	close_fd(&listener_a);
+
+	return ret;
+}
+
+int main(void)
+{
+	int status = KSFT_PASS;
+	int ret;
+	int i;
+
+	setup_netns();
+
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(test_cases));
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+		ret = run_test(&test_cases[i]);
+		ksft_test_result_code(ret, test_cases[i].name, NULL);
+
+		if (ret == KSFT_FAIL)
+			status = KSFT_FAIL;
+	}
+
+	if (status == KSFT_FAIL)
+		ksft_exit_fail();
+
+	ksft_finished();
+}
diff --git a/tools/testing/selftests/net/reuseport_migrate_epoll.c b/tools/testing/selftests/net/reuseport_migrate_epoll.c
new file mode 100644
index 000000000..9cbfb58c4
--- /dev/null
+++ b/tools/testing/selftests/net/reuseport_migrate_epoll.c
@@ -0,0 +1,353 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "../kselftest.h"
+
+#define EPOLL_TIMEOUT_MS 500
+#define TCP_MIGRATE_REQ_PATH "/proc/sys/net/ipv4/tcp_migrate_req"
+
+struct reuseport_migrate_case {
+	const char *name;
+	int family;
+	const char *addr;
+};
+
+static const struct reuseport_migrate_case test_cases[] = {
+	{
+		.name = "ipv4 epoll wake after reuseport migration",
+		.family = AF_INET,
+		.addr = "127.0.0.1",
+	},
+	{
+		.name = "ipv6 epoll wake after reuseport migration",
+		.family = AF_INET6,
+		.addr = "::1",
+	},
+};
+
+static void close_fd(int *fd)
+{
+	if (*fd >= 0) {
+		close(*fd);
+		*fd = -1;
+	}
+}
+
+static bool unsupported_addr_err(int family, int err)
+{
+	return family == AF_INET6 &&
+	       (err == EAFNOSUPPORT ||
+		err == EPROTONOSUPPORT ||
+		err == EADDRNOTAVAIL);
+}
+
+static int make_sockaddr(const struct reuseport_migrate_case *test_case,
+			 unsigned short port,
+			 struct sockaddr_storage *addr,
+			 socklen_t *addrlen)
+{
+	memset(addr, 0, sizeof(*addr));
+
+	if (test_case->family == AF_INET) {
+		struct sockaddr_in *addr4 = (struct sockaddr_in *)addr;
+
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(port);
+		if (inet_pton(AF_INET, test_case->addr, &addr4->sin_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr4);
+		return 0;
+	}
+
+	if (test_case->family == AF_INET6) {
+		struct sockaddr_in6 *addr6 = (struct sockaddr_in6 *)addr;
+
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(port);
+		if (inet_pton(AF_INET6, test_case->addr, &addr6->sin6_addr) != 1)
+			return -1;
+
+		*addrlen = sizeof(*addr6);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int create_reuseport_socket(const struct reuseport_migrate_case *test_case)
+{
+	int one = 1;
+	int fd;
+
+	fd = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (fd < 0)
+		return -1;
+
+	if (test_case->family == AF_INET6 &&
+	    setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one))) {
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int set_nonblocking(int fd)
+{
+	int flags;
+
+	flags = fcntl(fd, F_GETFL);
+	if (flags < 0)
+		return -1;
+
+	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
+}
+
+static int enable_tcp_migrate_req(void)
+{
+	int len;
+	int fd;
+
+	fd = open(TCP_MIGRATE_REQ_PATH, O_RDWR | O_CLOEXEC);
+	if (fd < 0) {
+		if (errno == ENOENT || errno == EACCES ||
+		    errno == EPERM || errno == EROFS)
+			return KSFT_SKIP;
+		return KSFT_FAIL;
+	}
+
+	len = write(fd, "1", 1);
+	if (len != 1) {
+		if (errno == EACCES || errno == EPERM || errno == EROFS) {
+			close(fd);
+			return KSFT_SKIP;
+		}
+
+		close(fd);
+		return KSFT_FAIL;
+	}
+
+	close(fd);
+	return KSFT_PASS;
+}
+
+static void setup_netns(void)
+{
+	int ret;
+
+	if (unshare(CLONE_NEWNET))
+		ksft_exit_skip("unshare(CLONE_NEWNET): %s\n", strerror(errno));
+
+	if (system("ip link set lo up"))
+		ksft_exit_skip("failed to bring up lo interface in netns\n");
+
+	ret = enable_tcp_migrate_req();
+	if (ret == KSFT_SKIP)
+		ksft_exit_skip("failed to enable tcp_migrate_req\n");
+	if (ret == KSFT_FAIL)
+		ksft_exit_fail_msg("failed to enable tcp_migrate_req\n");
+}
+
+static int run_test(const struct reuseport_migrate_case *test_case)
+{
+	struct sockaddr_storage addr;
+	struct epoll_event ev = {
+		.events = EPOLLIN,
+	};
+	int listener_a = -1;
+	int listener_b = -1;
+	int ret = KSFT_FAIL;
+	socklen_t addrlen;
+	int accepted = -1;
+	int client = -1;
+	int epfd = -1;
+	int n;
+
+	if (make_sockaddr(test_case, 0, &addr, &addrlen)) {
+		ksft_print_msg("%s: failed to build socket address\n",
+			       test_case->name);
+		goto out;
+	}
+
+	listener_a = create_reuseport_socket(test_case);
+	if (listener_a < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_a)");
+		goto out;
+	}
+
+	if (bind(listener_a, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("bind(listener_a)");
+		goto out;
+	}
+
+	if (listen(listener_a, 1)) {
+		ksft_perror("listen(listener_a)");
+		goto out;
+	}
+
+	addrlen = sizeof(addr);
+	if (getsockname(listener_a, (struct sockaddr *)&addr, &addrlen)) {
+		ksft_perror("getsockname(listener_a)");
+		goto out;
+	}
+
+	listener_b = create_reuseport_socket(test_case);
+	if (listener_b < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(listener_b)");
+		goto out;
+	}
+
+	if (bind(listener_b, (struct sockaddr *)&addr, addrlen)) {
+		ksft_perror("bind(listener_b)");
+		goto out;
+	}
+
+	client = socket(test_case->family, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (client < 0) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("socket(client)");
+		goto out;
+	}
+
+	/* Connect while only listener_a is listening, ensuring the
+	 * child lands in listener_a's accept queue deterministically.
+	 */
+	if (connect(client, (struct sockaddr *)&addr, addrlen)) {
+		if (unsupported_addr_err(test_case->family, errno)) {
+			ret = KSFT_SKIP;
+			goto out;
+		}
+
+		ksft_perror("connect(client)");
+		goto out;
+	}
+
+	if (listen(listener_b, 1)) {
+		ksft_perror("listen(listener_b)");
+		goto out;
+	}
+
+	if (set_nonblocking(listener_b)) {
+		ksft_perror("set_nonblocking(listener_b)");
+		goto out;
+	}
+
+	epfd = epoll_create1(EPOLL_CLOEXEC);
+	if (epfd < 0) {
+		ksft_perror("epoll_create1");
+		goto out;
+	}
+
+	ev.data.fd = listener_b;
+	if (epoll_ctl(epfd, EPOLL_CTL_ADD, listener_b, &ev)) {
+		ksft_perror("epoll_ctl(ADD listener_b)");
+		goto out;
+	}
+
+	close_fd(&listener_a);
+
+	n = epoll_wait(epfd, &ev, 1, EPOLL_TIMEOUT_MS);
+	if (n < 0) {
+		ksft_perror("epoll_wait");
+		goto out;
+	}
+
+	accepted = accept4(listener_b, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
+	if (accepted < 0) {
+		if (errno == EAGAIN || errno == EWOULDBLOCK) {
+			ksft_print_msg("%s: target listener had no queued child after migration\n",
+				       test_case->name);
+			goto out;
+		}
+
+		ksft_perror("accept4(listener_b)");
+		goto out;
+	}
+
+	if (n != 1) {
+		ksft_print_msg("%s: accept queue was populated, but epoll_wait() timed out\n",
+			       test_case->name);
+		goto out;
+	}
+
+	if (ev.data.fd != listener_b || !(ev.events & EPOLLIN)) {
+		ksft_print_msg("%s: unexpected epoll event fd=%d events=%#x\n",
+			       test_case->name, ev.data.fd, ev.events);
+		goto out;
+	}
+
+	ret = KSFT_PASS;
+
+out:
+	close_fd(&accepted);
+	close_fd(&epfd);
+	close_fd(&client);
+	close_fd(&listener_b);
+	close_fd(&listener_a);
+
+	return ret;
+}
+
+int main(void)
+{
+	int status = KSFT_PASS;
+	int ret;
+	int i;
+
+	setup_netns();
+
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(test_cases));
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+		ret = run_test(&test_cases[i]);
+		ksft_test_result_code(ret, test_cases[i].name, NULL);
+
+		if (ret == KSFT_FAIL)
+			status = KSFT_FAIL;
+	}
+
+	if (status == KSFT_FAIL)
+		ksft_exit_fail();
+
+	ksft_finished();
+}
-- 
2.43.0
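join_thread_with_timeout() in reuseport_migrate_accept.c builds an absolute CLOCK_REALTIME deadline for pthread_timedjoin_np() by adding a millisecond timeout to a struct timespec and carrying the overflow into tv_sec. The sketch below is one alternative way to write that arithmetic, not the patch's code. The name timespec_add_ms() is hypothetical, and the timeout is assumed to be non-negative.

```c
#include <time.h>

/*
 * Hypothetical helper: advance *ts by timeout_ms milliseconds and keep
 * tv_nsec normalised to the range [0, 1000000000).
 * Assumes timeout_ms >= 0 and that *ts is already normalised.
 */
void timespec_add_ms(struct timespec *ts, long long timeout_ms)
{
	/* Only the sub-second remainder is added to the nanosecond field. */
	long long nsec = ts->tv_nsec + (timeout_ms % 1000) * 1000000LL;

	/* Whole seconds from the timeout, plus at most one carried second. */
	ts->tv_sec += timeout_ms / 1000 + nsec / 1000000000LL;
	ts->tv_nsec = nsec % 1000000000LL;
}
```

Splitting the whole seconds out first keeps the intermediate nanosecond sum below two billion, so it still fits when tv_nsec is a 32-bit long. Adding the full timeout in nanoseconds to tv_nsec before normalising needs a wider type for timeouts of about a second or more.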