From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EB2FA9461; Wed, 21 May 2025 15:32:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841545; cv=none; b=kSTDdfuHUc81LFT9JM5YFPnQ5274XLIP5oWk2MvGTdhU56GhJRu7ufuPAjPDXHtY7qrH9qqz+ORoqZ1C8/8kL1WsGhnmKuw3BoVZa8TYppW7D2A8GvM+RC3t7t5J62gBYXvmNQyIo0/APndMrrAuNxCN5aJUjwBCSlcUikyFcrk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841545; c=relaxed/simple; bh=15lq1dewDEmZRAi/ruAvpBBjnmaSB3upAZ1I3QpCEos=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=H3CbqM9MzcemIm2u1EW5+5a828KkzqPNFqFT7oRk55fwf0y0Q4LbaqDgEl5KNEC856Kj6I92uudgThHt5extuFMAoQiiDvIEsgtqla4gWZE/sM/ZET/9J8RSmircr965H4L4/5lDBKnArDbGPs7yzSXiUhcCY4vDgVuAFp7Q+rY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=dO97HBnT; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="dO97HBnT" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 40FBFC4CEE7; Wed, 21 May 2025 15:32:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841544; bh=15lq1dewDEmZRAi/ruAvpBBjnmaSB3upAZ1I3QpCEos=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=dO97HBnTEObO2AIvtdgC8emxlz1lr9zp7b4iPXwxerWkLxkekPoJwitnoknJe1USR fpfiBSBUJohK9+gVttw2baTEaTfcvOrI8pQSXOU+bs8teKTPj+ut4dvuCNBkJGY9hO uyGuZd1mWsegzAt+nqbiDaCbYq7Hi/9zFgqtmM3FhYS5LiiXAHSNdIzNAKFMqHYCXP /DYx0xX4aBqvN2QZAF1dW2jJRSVdX1SNrxhhC7YdpOHkznmo5fAOQ1JypfKpdZEUif qZ4wEGwbTzrroGuKA7qSarwm9Vp5olCSBqw2l/sQapHPN3RK2bcT9TRhTI4SGGj6fV j+nnlwoiWhfiw== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org, Leon Romanovsky , David Ahern , Arnd Bergmann , Kees Cook , Lennart Poettering , Luca Boccassi , linux-arch@vger.kernel.org Subject: [PATCH v6.1 01/27] af_unix: Kconfig: make CONFIG_UNIX bool Date: Wed, 21 May 2025 16:27:00 +0100 Message-ID: <20250521152920.1116756-2-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Alexander Mikhalitsyn [ Upstream commit 97154bcf4d1b7cabefec8a72cff5fbb91d5afb7b ] Let's make CONFIG_UNIX a bool instead of a tristate. We've decided to do that during discussion about SCM_PIDFD patchset [1]. [1] https://lore.kernel.org/lkml/20230524081933.44dc8bea@kernel.org/ Cc: "David S. 
Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Leon Romanovsky Cc: David Ahern Cc: Arnd Bergmann Cc: Kees Cook Cc: Christian Brauner Cc: Kuniyuki Iwashima Cc: Lennart Poettering Cc: Luca Boccassi Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arch@vger.kernel.org Suggested-by: Jakub Kicinski Signed-off-by: Alexander Mikhalitsyn Acked-by: Christian Brauner Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller (cherry picked from commit 97154bcf4d1b7cabefec8a72cff5fbb91d5afb7b) Signed-off-by: Lee Jones --- net/unix/Kconfig | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/net/unix/Kconfig b/net/unix/Kconfig index b7f811216820..28b232f281ab 100644 --- a/net/unix/Kconfig +++ b/net/unix/Kconfig @@ -4,7 +4,7 @@ # =20 config UNIX - tristate "Unix domain sockets" + bool "Unix domain sockets" help If you say Y here, you will include support for Unix domain sockets; sockets are the standard Unix mechanism for establishing and @@ -14,10 +14,6 @@ config UNIX an embedded system or something similar, you therefore definitely want to say Y here. =20 - To compile this driver as a module, choose M here: the module will be - called unix. Note that several important services won't work - correctly if you say M here and then neglect to load the module. - Say Y unless you know what you are doing. =20 config UNIX_SCM --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 10C8B218EBF; Wed, 21 May 2025 15:32:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841552; cv=none; b=RjmNATmiDkgvgE9C4cLnMJVJ/C6AiGbQlGXVn1VhpmtUluMLTad69JibX7zdNS13Hf7Rp7KKJLqSqtmV6bVGRZJemfiM3/icupgQ5ry6jjoIyufe6BSdukWxLm6L2hsfLAGT+GCPZFthg5NcWk/GqXmSH1Hq+lD/WWW+IjV5svY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841552; c=relaxed/simple; bh=dl4/jVpEjwuOyn5OIckbpyVPjbDVzVffI1XGjQPselI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QtIS4qXhLhYKi5OHNcytaeomZcSdA6G9uKJPytIRe71hRjJXluQ0bAk1Jeu38CTYUAE+wH1zxP+1N9Mnm4xmjCXJpnq65zpRdAUn4DoQql1AIiR75A9BO39v7IyUUdebdryOwppt7dWB+vHVQpLNSK1rr/4l37tuVrdb9qz1ryA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=vAfo93rf; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="vAfo93rf" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7D567C4AF0C; Wed, 21 May 2025 15:32:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841551; bh=dl4/jVpEjwuOyn5OIckbpyVPjbDVzVffI1XGjQPselI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=vAfo93rfwi8tR5zMMbkAXxRq/O/qSPaWusN2ajbt3w2cPBmjq16hrYyybnUIklsUP xk5uByeilYrZE3S6GZc9c5Dl4FTykaiLs082mQG6iddGQUV4qDbOEm9TWU7gOkm18l PGQDXF08nZyYqZt3o0UFQMIH+8Qxt1zgS/YmxQ/pVAZHeuXJHpUAIVEOr8K70W3X+t PK8ZULVDYLHeQq5b57XW4CCRNv3c9X4brN1wCmvZqisl3AQ/CGJQNFXQgk/mQXrndS 6j0X1U5Vvb7M2HtRFM9ECN/qJOXlCPTWqELgMzPKBk4LOp+Yxjsu5Pl0/r0cxzb/wK +weNqO5upSi/A== From: Lee Jones To: lee@kernel.org, "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org, Simon Horman Subject: [PATCH v6.1 02/27] af_unix: Return struct unix_sock from unix_get_socket(). Date: Wed, 21 May 2025 16:27:01 +0100 Message-ID: <20250521152920.1116756-3-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 5b17307bd0789edea0675d524a2b277b93bbde62 ] Currently, unix_get_socket() returns struct sock, but after calling it, we always cast it to unix_sk(). Let's return struct unix_sock from unix_get_socket(). Signed-off-by: Kuniyuki Iwashima Acked-by: Pavel Begunkov Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20240123170856.41348-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 5b17307bd0789edea0675d524a2b277b93bbde62) Signed-off-by: Lee Jones --- include/net/af_unix.h | 2 +- net/unix/garbage.c | 19 +++++++------------ net/unix/scm.c | 19 +++++++------------ 3 files changed, 15 insertions(+), 25 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index e7d71a516bd4..52ae023c3e93 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -13,7 +13,7 @@ void unix_notinflight(struct user_struct *user, struct fi= le *fp); void unix_destruct_scm(struct sk_buff *skb); void unix_gc(void); void wait_for_unix_gc(void); -struct sock *unix_get_socket(struct file *filp); +struct unix_sock *unix_get_socket(struct file *filp); struct sock *unix_peer_get(struct sock *sk); =20 #define UNIX_HASH_MOD (256 - 1) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index d2fc795394a5..438f5d9b9173 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -105,20 +105,15 @@ static void scan_inflight(struct sock *x, void (*func= )(struct unix_sock *), =20 while (nfd--) { /* Get the socket the fd matches if it indeed does so */ - struct sock *sk =3D unix_get_socket(*fp++); + struct unix_sock *u =3D unix_get_socket(*fp++); =20 - if (sk) { - struct unix_sock *u =3D unix_sk(sk); + /* Ignore non-candidates, they could have been added + * to the queues after starting the garbage collection + */ + if (u && test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) { + hit =3D true; =20 - /* Ignore non-candidates, they could - * have been added to the queues after - * starting the garbage collection - */ - if (test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) { - hit =3D true; - - func(u); - } + func(u); } } if (hit && hitlist !=3D NULL) { diff --git a/net/unix/scm.c b/net/unix/scm.c index 4eff7da9f6f9..693817a31ad8 100644 --- a/net/unix/scm.c +++ b/net/unix/scm.c @@ -21,9 +21,8 @@ EXPORT_SYMBOL(gc_inflight_list); DEFINE_SPINLOCK(unix_gc_lock); EXPORT_SYMBOL(unix_gc_lock); =20 -struct sock *unix_get_socket(struct file *filp) +struct unix_sock *unix_get_socket(struct file *filp) { - struct sock *u_sock =3D NULL; struct inode *inode =3D file_inode(filp); =20 /* Socket ? */ @@ -33,10 +32,10 @@ struct sock *unix_get_socket(struct file *filp) =20 /* PF_UNIX ? 
*/ if (s && sock->ops && sock->ops->family =3D=3D PF_UNIX) - u_sock =3D s; + return unix_sk(s); } =20 - return u_sock; + return NULL; } EXPORT_SYMBOL(unix_get_socket); =20 @@ -45,13 +44,11 @@ EXPORT_SYMBOL(unix_get_socket); */ void unix_inflight(struct user_struct *user, struct file *fp) { - struct sock *s =3D unix_get_socket(fp); + struct unix_sock *u =3D unix_get_socket(fp); =20 spin_lock(&unix_gc_lock); =20 - if (s) { - struct unix_sock *u =3D unix_sk(s); - + if (u) { if (!u->inflight) { BUG_ON(!list_empty(&u->link)); list_add_tail(&u->link, &gc_inflight_list); @@ -68,13 +65,11 @@ void unix_inflight(struct user_struct *user, struct fil= e *fp) =20 void unix_notinflight(struct user_struct *user, struct file *fp) { - struct sock *s =3D unix_get_socket(fp); + struct unix_sock *u =3D unix_get_socket(fp); =20 spin_lock(&unix_gc_lock); =20 - if (s) { - struct unix_sock *u =3D unix_sk(s); - + if (u) { BUG_ON(!u->inflight); BUG_ON(list_empty(&u->link)); =20 --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 391C817BB0D; Wed, 21 May 2025 15:32:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841559; cv=none; b=QTpdkx3fA8qyUUkOGUvLJwwKa9Fx2q2B3yh1EDNt6hrBXoxXt+HboiPjPrWtscB12WLBUY4BM9klWO/ecL6okpgJFXGgKdoyhWRG4Wcfr/UPbmd1yM7F2V2iB42pFtK6Ig10ave2oCvGbCUinp2h5KcmpK52uWKsYUmsb2cHRwU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841559; c=relaxed/simple; bh=ZAXKZ42zBWZyJ7kSkE/8V0bVlCWESw/XuUcLoE5rsKw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=nFT/hVqx8z+W9x4FE176hZtec9llXEmzrhO5T6tBF/vK+biiyvJhtuEoUDc7xKlrau61/2zOPXHLmczM389NiXN82pQJRceLyWM4GnOaYnlrPUU0e16QBjYgvad/ievdlQ0acaty2P0dvs0eabq3CrRfGVLnh8rtwA5WguXRlSI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=VHicAW/I; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="VHicAW/I" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 928DDC4CEE7; Wed, 21 May 2025 15:32:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841558; bh=ZAXKZ42zBWZyJ7kSkE/8V0bVlCWESw/XuUcLoE5rsKw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=VHicAW/ICNtkp+fmKZ9/4QZpzC4GhxiUrtZbZfsEXmCNiUyQP7K1sXci9tQkSo1kF JIHUBSKbv6X9XlvzvRDayf1FrBrpuxXM8E17sD53ufPFrIBxYWrokQWWy+T9rkORYI egdZ+GQY1xobkRZCn5TWKm5LYTfSyOtxdx0AopaAg2YcfnRKVQW4cHf1ZKDH+5OXQB h4/xt+h1K3AHg+W2TE7c07DNKndW9etEYas7vOFFE6piunA4ZakCnV6Aw7JqRuLr7c GfiC54XVrXwLALvfKYa/jY75j+yBLDOnJ7wuRx79A2LBR1ICdO8cnlMLhIZc3W5bKz 3DvrJeLGcL+gQ== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 03/27] af_unix: Run GC on only one CPU. 
Date: Wed, 21 May 2025 16:27:02 +0100 Message-ID: <20250521152920.1116756-4-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 8b90a9f819dc2a06baae4ec1a64d875e53b824ec ] If more than 16000 inflight AF_UNIX sockets exist and the garbage collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc(). Also, they wait for unix_gc() to complete. In unix_gc(), all inflight AF_UNIX sockets are traversed at least once, and more if they are the GC candidate. Thus, sendmsg() significantly slows down with too many inflight AF_UNIX sockets. There is a small window to invoke multiple unix_gc() instances, which will then be blocked by the same spinlock except for one. Let's convert unix_gc() to use struct work so that it will not consume CPUs unnecessarily. Note WRITE_ONCE(gc_in_progress, true) is moved before running GC. If we leave the WRITE_ONCE() as is and use the following test to call flush_work(), a process might not call it. CPU 0 CPU 1 --- --- start work and call __unix_gc= () if (work_pending(&unix_gc_work) || <-- false READ_ONCE(gc_in_progress)) <-- false flush_work(); <-- missed! WRITE_ONCE(gc_in_progress, true) Signed-off-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20240123170856.41348-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 8b90a9f819dc2a06baae4ec1a64d875e53b824ec) Signed-off-by: Lee Jones --- net/unix/garbage.c | 54 +++++++++++++++++++++++----------------------- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 438f5d9b9173..bf628bfb6d35 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -86,7 +86,6 @@ /* Internal data structures and random procedures: */ =20 static LIST_HEAD(gc_candidates); -static DECLARE_WAIT_QUEUE_HEAD(unix_gc_wait); =20 static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), struct sk_buff_head *hitlist) @@ -182,23 +181,8 @@ static void inc_inflight_move_tail(struct unix_sock *u) } =20 static bool gc_in_progress; -#define UNIX_INFLIGHT_TRIGGER_GC 16000 - -void wait_for_unix_gc(void) -{ - /* If number of inflight sockets is insane, - * force a garbage collect right now. - * Paired with the WRITE_ONCE() in unix_inflight(), - * unix_notinflight() and gc_in_progress(). - */ - if (READ_ONCE(unix_tot_inflight) > UNIX_INFLIGHT_TRIGGER_GC && - !READ_ONCE(gc_in_progress)) - unix_gc(); - wait_event(unix_gc_wait, !READ_ONCE(gc_in_progress)); -} =20 -/* The external entry point: unix_gc() */ -void unix_gc(void) +static void __unix_gc(struct work_struct *work) { struct sk_buff *next_skb, *skb; struct unix_sock *u; @@ -209,13 +193,6 @@ void unix_gc(void) =20 spin_lock(&unix_gc_lock); =20 - /* Avoid a recursive GC. */ - if (gc_in_progress) - goto out; - - /* Paired with READ_ONCE() in wait_for_unix_gc(). */ - WRITE_ONCE(gc_in_progress, true); - /* First, select candidates for garbage collection. Only * in-flight sockets are considered, and from those only ones * which don't have any external reference. @@ -346,8 +323,31 @@ void unix_gc(void) /* Paired with READ_ONCE() in wait_for_unix_gc(). 
*/ WRITE_ONCE(gc_in_progress, false); =20 - wake_up(&unix_gc_wait); - - out: spin_unlock(&unix_gc_lock); } + +static DECLARE_WORK(unix_gc_work, __unix_gc); + +void unix_gc(void) +{ + WRITE_ONCE(gc_in_progress, true); + queue_work(system_unbound_wq, &unix_gc_work); +} + +#define UNIX_INFLIGHT_TRIGGER_GC 16000 + +void wait_for_unix_gc(void) +{ + /* If number of inflight sockets is insane, + * force a garbage collect right now. + * + * Paired with the WRITE_ONCE() in unix_inflight(), + * unix_notinflight(), and __unix_gc(). + */ + if (READ_ONCE(unix_tot_inflight) > UNIX_INFLIGHT_TRIGGER_GC && + !READ_ONCE(gc_in_progress)) + unix_gc(); + + if (READ_ONCE(gc_in_progress)) + flush_work(&unix_gc_work); +} --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B04D31607A4; Wed, 21 May 2025 15:32:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841565; cv=none; b=QqM8xHX4lcBoepHenY3mddOPbU1S6FqZ2ooEB9slGpEwW7TnecFF24w7mXYiPCX9jTfrj3H1cK97G2uK2OT36hIA7Acg2cMegNKac/xKAGDCGOK4LT3o3R6FSSXEryWUcDY3tQyiBuW2eq0yCGXVCdsQEBF3kAnsOhkvG734CNU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841565; c=relaxed/simple; bh=LkzJwLWakW+9qDe2Gz5b3l1g1z1NAf7iWtP6+sDm0ZA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pJw9cxZqaADk0DDW1asoFZsNMuFXQrDZ3MQxMOtJ/L6V2wFLHewl7mQqHsgRPeM+w6cDzyFcQV1D4O5RnoKZnMCSahiLQPLdCIKJSoNo8NCMmzvtuMVyrHdFHQiB2jZSO1w+n2tcqQpNNHumx5ep/A3/pna+al4SGU6xFa3jd24= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ugP9L4sd; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ugP9L4sd" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 964FCC4CEE4; Wed, 21 May 2025 15:32:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841565; bh=LkzJwLWakW+9qDe2Gz5b3l1g1z1NAf7iWtP6+sDm0ZA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ugP9L4sdt1CrtPS/yWJtyRvdDa1rwBVnXDgM8WDhHBRbda/l1fp5h5Jl+LPGHKLGT +21h00aRhGT0C8fIRb3wQFcjfdQE3kNfWpD1IZRznyHCyKhVMGfTOpiuoeNfbxUVC4 X981UkruXdeYWnm6hkNabqnkAvkHgtRBc5c/A2xgbtd9cY8i2SDUoin+C4gcLCQy/r fmbzggFQ+qQ1ijyiXFvVXwzBZt/qQ93BPRrlxvuKVSncpNWYc1iYTPMM3cLSlqPWlt 1C0T94pPSeb/pI7PP93ZtuJ/RmVJ+kaTkyqfdMFPdsUwR+cow+Rfs/fXbNOAsptN0h xhUvnYHygGQbA== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 04/27] af_unix: Try to run GC async. 
Date: Wed, 21 May 2025 16:27:03 +0100 Message-ID: <20250521152920.1116756-5-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit d9f21b3613337b55cc9d4a6ead484dca68475143 ] If more than 16000 inflight AF_UNIX sockets exist and the garbage collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc(). Also, they wait for unix_gc() to complete. In unix_gc(), all inflight AF_UNIX sockets are traversed at least once, and more if they are the GC candidate. Thus, sendmsg() significantly slows down with too many inflight AF_UNIX sockets. However, if a process sends data with no AF_UNIX FD, the sendmsg() call does not need to wait for GC. After this change, only the process that meets the condition below will be blocked under such a situation. 1) cmsg contains AF_UNIX socket 2) more than 32 AF_UNIX sent by the same user are still inflight Note that even a sendmsg() call that does not meet the condition but has AF_UNIX FD will be blocked later in unix_scm_to_skb() by the spinlock, but we allow that as a bonus for sane users. The results below are the time spent in unix_dgram_sendmsg() sending 1 byte of data with no FD 4096 times on a host where 32K inflight AF_UNIX sockets exist. Without series: the sane sendmsg() needs to wait gc unreasonably. $ sudo /usr/share/bcc/tools/funclatency -p 11165 unix_dgram_sendmsg Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end. ^C nsecs : count distribution [...] 524288 -> 1048575 : 0 | = | 1048576 -> 2097151 : 3881 |************************************= ****| 2097152 -> 4194303 : 214 |** = | 4194304 -> 8388607 : 1 | = | avg =3D 1825567 nsecs, total: 7477526027 nsecs, count: 4096 With series: the sane sendmsg() can finish much faster. $ sudo /usr/share/bcc/tools/funclatency -p 8702 unix_dgram_sendmsg Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end. ^C nsecs : count distribution [...] 
128 -> 255 : 0 | = | 256 -> 511 : 4092 |************************************= ****| 512 -> 1023 : 2 | = | 1024 -> 2047 : 0 | = | 2048 -> 4095 : 0 | = | 4096 -> 8191 : 1 | = | 8192 -> 16383 : 1 | = | avg =3D 410 nsecs, total: 1680510 nsecs, count: 4096 Signed-off-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20240123170856.41348-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit d9f21b3613337b55cc9d4a6ead484dca68475143) Signed-off-by: Lee Jones --- include/net/af_unix.h | 12 ++++++++++-- include/net/scm.h | 1 + net/core/scm.c | 5 +++++ net/unix/af_unix.c | 6 ++++-- net/unix/garbage.c | 10 +++++++++- 5 files changed, 29 insertions(+), 5 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 52ae023c3e93..be488d627531 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -8,12 +8,20 @@ #include #include =20 +#if IS_ENABLED(CONFIG_UNIX) +struct unix_sock *unix_get_socket(struct file *filp); +#else +static inline struct unix_sock *unix_get_socket(struct file *filp) +{ + return NULL; +} +#endif + void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); void unix_destruct_scm(struct sk_buff *skb); void unix_gc(void); -void wait_for_unix_gc(void); -struct unix_sock *unix_get_socket(struct file *filp); +void wait_for_unix_gc(struct scm_fp_list *fpl); struct sock *unix_peer_get(struct sock *sk); =20 #define UNIX_HASH_MOD (256 - 1) diff --git a/include/net/scm.h b/include/net/scm.h index 585adc1346bd..a5c26008fcec 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -23,6 +23,7 @@ struct scm_creds { =20 struct scm_fp_list { short count; + short count_unix; short max; struct user_struct *user; struct file *fp[SCM_MAX_FD]; diff --git a/net/core/scm.c b/net/core/scm.c index a877c4ef4c25..bb25052624ee 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -36,6 +36,7 @@ #include #include #include +#include =20 =20 /* @@ -85,6 +86,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_f= p_list **fplp) return -ENOMEM; *fplp =3D fpl; fpl->count =3D 0; + fpl->count_unix =3D 0; fpl->max =3D SCM_MAX_FD; fpl->user =3D NULL; } @@ -109,6 +111,9 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm= _fp_list **fplp) fput(file); return -EINVAL; } + if (unix_get_socket(file)) + fpl->count_unix++; + *fpp++ =3D file; fpl->count++; } diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 5ce60087086c..f74f7878b3fe 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1875,11 +1875,12 @@ static int unix_dgram_sendmsg(struct socket *sock, = struct msghdr *msg, long timeo; int err; =20 - wait_for_unix_gc(); err =3D scm_send(sock, msg, &scm, false); if (err < 0) return err; =20 + wait_for_unix_gc(scm.fp); + err =3D -EOPNOTSUPP; if (msg->msg_flags&MSG_OOB) goto out; @@ -2145,11 +2146,12 @@ static int unix_stream_sendmsg(struct socket *sock,= struct msghdr *msg, bool fds_sent =3D false; int data_len; =20 - wait_for_unix_gc(); err =3D scm_send(sock, msg, &scm, false); if (err < 0) return err; =20 + wait_for_unix_gc(scm.fp); + err =3D -EOPNOTSUPP; if (msg->msg_flags & MSG_OOB) { #if IS_ENABLED(CONFIG_AF_UNIX_OOB) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index bf628bfb6d35..2934d7b68036 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -335,8 +335,9 @@ void unix_gc(void) } =20 #define UNIX_INFLIGHT_TRIGGER_GC 16000 +#define UNIX_INFLIGHT_SANE_USER (SCM_MAX_FD * 8) =20 -void wait_for_unix_gc(void) +void wait_for_unix_gc(struct scm_fp_list *fpl) 
{ /* If number of inflight sockets is insane, * force a garbage collect right now. @@ -348,6 +349,13 @@ void wait_for_unix_gc(void) !READ_ONCE(gc_in_progress)) unix_gc(); =20 + /* Penalise users who want to send AF_UNIX sockets + * but whose sockets have not been received yet. + */ + if (!fpl || !fpl->count_unix || + READ_ONCE(fpl->user->unix_inflight) < UNIX_INFLIGHT_SANE_USER) + return; + if (READ_ONCE(gc_in_progress)) flush_work(&unix_gc_work); } --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B812B189513; Wed, 21 May 2025 15:32:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841572; cv=none; b=bHdtnsaijjekicstUlHW77qTEIURojhXto1KQnXEBYKOaARaDcy1PGM2EHo1qDV+rCcKuea4wx4wj9rUt0J+z2hlXyqr8eic4RLUJvtJBl55qetlsXsS13KH8+54j/qnkuMWjqaOnVXrbT8JLTra6nVQUq2LEiLKqbaC7uq2YxY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841572; c=relaxed/simple; bh=PQ8yqnHUEFa5BUreHYRwSgQIVySTh+UdDqRcgs6i9lI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=B+nCF8csFl0MZxU2LNF4h7OfYjXwI+busmGDf53fCj0DoMAa6dV5xpp06uPTBtCY/wvrZA9T13VrLJoocQtdbz0/78v/+101nrV8I6fpMvR5amZOqmq+kP/Q+J88K/P17AMEt4scw8HdQPNxhwEMPfYiERAc/HWFmuBDQ6YbNK8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=BAn5V7uA; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="BAn5V7uA" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8147EC4CEE4; Wed, 21 May 2025 15:32:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841572; bh=PQ8yqnHUEFa5BUreHYRwSgQIVySTh+UdDqRcgs6i9lI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=BAn5V7uAjpTI4MACXEixF5hQmEMuTIqpoql0l9KOsRg+qj7/C7xw8wcjPSOdg9zBi r/d7ADETVl+Pn3CJAwoqlcZer+JxFqJCVVLo46spb1BrZsnvwfJOGZtlTp7xx4lqOy jLdhqspMbwFfOEAGuXv7rqKz86FbHhhdPPr+OqKiXJxhkWrU9BBHE7f/vcK2ShYPkb 4gzOogxi3UKUe1TVfH5TlrhbMb+gOCStevxGUx/9rx4qHuA6lSbQ1MK1+j0CgZTEK9 nMC0/0jZK+qAr8a1BySWi9KkQkjnHwqKfSnHiUH45bh7fAwFAalcKATdmOPGqvif2B 28Kf4F7oi97MA== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 05/27] af_unix: Replace BUG_ON() with WARN_ON_ONCE(). 
Date: Wed, 21 May 2025 16:27:04 +0100 Message-ID: <20250521152920.1116756-6-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit d0f6dc26346863e1f4a23117f5468614e54df064 ] This is a prep patch for the last patch in this series so that checkpatch will not warn about BUG_ON(). Signed-off-by: Kuniyuki Iwashima Acked-by: Jens Axboe Link: https://lore.kernel.org/r/20240129190435.57228-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit d0f6dc26346863e1f4a23117f5468614e54df064) Signed-off-by: Lee Jones --- net/unix/garbage.c | 8 ++++---- net/unix/scm.c | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 2934d7b68036..7eeaac165e85 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -145,7 +145,7 @@ static void scan_children(struct sock *x, void (*func)(= struct unix_sock *), /* An embryo cannot be in-flight, so it's safe * to use the list link. */ - BUG_ON(!list_empty(&u->link)); + WARN_ON_ONCE(!list_empty(&u->link)); list_add_tail(&u->link, &embryos); } spin_unlock(&x->sk_receive_queue.lock); @@ -224,8 +224,8 @@ static void __unix_gc(struct work_struct *work) =20 total_refs =3D file_count(sk->sk_socket->file); =20 - BUG_ON(!u->inflight); - BUG_ON(total_refs < u->inflight); + WARN_ON_ONCE(!u->inflight); + WARN_ON_ONCE(total_refs < u->inflight); if (total_refs =3D=3D u->inflight) { list_move_tail(&u->link, &gc_candidates); __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags); @@ -318,7 +318,7 @@ static void __unix_gc(struct work_struct *work) list_move_tail(&u->link, &gc_inflight_list); =20 /* All candidates should have been detached by now. */ - BUG_ON(!list_empty(&gc_candidates)); + WARN_ON_ONCE(!list_empty(&gc_candidates)); =20 /* Paired with READ_ONCE() in wait_for_unix_gc(). 
*/ WRITE_ONCE(gc_in_progress, false); diff --git a/net/unix/scm.c b/net/unix/scm.c index 693817a31ad8..6f446dd2deed 100644 --- a/net/unix/scm.c +++ b/net/unix/scm.c @@ -50,10 +50,10 @@ void unix_inflight(struct user_struct *user, struct fil= e *fp) =20 if (u) { if (!u->inflight) { - BUG_ON(!list_empty(&u->link)); + WARN_ON_ONCE(!list_empty(&u->link)); list_add_tail(&u->link, &gc_inflight_list); } else { - BUG_ON(list_empty(&u->link)); + WARN_ON_ONCE(list_empty(&u->link)); } u->inflight++; /* Paired with READ_ONCE() in wait_for_unix_gc() */ @@ -70,8 +70,8 @@ void unix_notinflight(struct user_struct *user, struct fi= le *fp) spin_lock(&unix_gc_lock); =20 if (u) { - BUG_ON(!u->inflight); - BUG_ON(list_empty(&u->link)); + WARN_ON_ONCE(!u->inflight); + WARN_ON_ONCE(list_empty(&u->link)); =20 u->inflight--; if (!u->inflight) --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DFB8D189513; Wed, 21 May 2025 15:32:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841580; cv=none; b=Xn9E5JnZjcPQF0WfZXA5Zc7Q9LcLSSfjuSAFvDf0CvqvCyEEyk6zbQhSa5X2fRfMtLTKZ/npA/XOi8mEnrJi6+ZdvvT9m61zlj+U7TBCXI+RDMPogQoqx36JONUTfS3a/e6zS/huYoyKWR8GcdLy63gaMcXFv2iZ8TjY9DNjkm0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841580; c=relaxed/simple; bh=oef6HLrlbAgx8WrXcvkKFP3hlsXTMr+e/rWC4/VLf9c=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aiyPfq9W3Y5F/lRDoTI9nIxx9iu+8v1eLpLGir7SUxZcXLs7V0NmlzyhmuYwzphZpvcXtspawVz6btAP4XsmFiCMIBd+ShMm0lQKne6tWIt/hQRNPIYvwbrXqWVObbQPrL76tcqYMvmJBSUPFYH5o4imDK4jmRo/97TnLLeQADg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=np4WEJP1; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="np4WEJP1" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 69784C4CEE7; Wed, 21 May 2025 15:32:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841579; bh=oef6HLrlbAgx8WrXcvkKFP3hlsXTMr+e/rWC4/VLf9c=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=np4WEJP1m83ZWB/BspU69JbUCa34WhxzwD73n9Aesyr6EA5MadeMtDALnmavuIlLg s5r6u26/6/hNiGXra+FH6Yb4lXMhV9iOC7b3G7NX382NXGe7Ie/YOi0EdEiHhZtZQ/ 3/zTxwbdig1wNu4ZB6AzhSMxM2ICkwYxcZfM2QUFp6YnYRWd/Vtyv0VblzLbeT8ztV niM0j3bAj2mhmU8E7vBoFBqOe2BhBoLFVd+kcnZeyRuPcCz4TyMHwSw+Z6/gxaj+M/ pT102jKuc+FU5e6GliT/EpkbZ05EGMRG6IANdl0GRBPWw/q4MC41H/Y2kgA6ZY+hyF Gc2qzefWD9HaQ== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 06/27] af_unix: Remove io_uring code for GC. 
Date: Wed, 21 May 2025 16:27:05 +0100 Message-ID: <20250521152920.1116756-7-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 11498715f266a3fb4caabba9dd575636cbcaa8f1 ] Since commit 705318a99a13 ("io_uring/af_unix: disable sending io_uring over sockets"), io_uring's unix socket cannot be passed via SCM_RIGHTS, so it does not contribute to cyclic reference and no longer be candidate for garbage collection. Also, commit 6e5e6d274956 ("io_uring: drop any code related to SCM_RIGHTS") cleaned up SCM_RIGHTS code in io_uring. Let's do it in AF_UNIX as well by reverting commit 0091bfc81741 ("io_uring/af_unix: defer registered files gc to io_uring release") and commit 10369080454d ("net: reclaim skb->scm_io_uring bit"). Signed-off-by: Kuniyuki Iwashima Acked-by: Jens Axboe Link: https://lore.kernel.org/r/20240129190435.57228-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 11498715f266a3fb4caabba9dd575636cbcaa8f1) Signed-off-by: Lee Jones --- net/unix/garbage.c | 25 ++----------------------- 1 file changed, 2 insertions(+), 23 deletions(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 7eeaac165e85..c04f82489abb 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -184,12 +184,10 @@ static bool gc_in_progress; =20 static void __unix_gc(struct work_struct *work) { - struct sk_buff *next_skb, *skb; - struct unix_sock *u; - struct unix_sock *next; struct sk_buff_head hitlist; - struct list_head cursor; + struct unix_sock *u, *next; LIST_HEAD(not_cycle_list); + struct list_head cursor; =20 spin_lock(&unix_gc_lock); =20 @@ -293,30 +291,11 @@ static void __unix_gc(struct work_struct *work) =20 spin_unlock(&unix_gc_lock); =20 - /* We need io_uring to clean its registered files, ignore all io_uring - * originated skbs. It's fine as io_uring doesn't keep references to - * other io_uring instances and so killing all other files in the cycle - * will put all io_uring references forcing it to go through normal - * release.path eventually putting registered files. - */ - skb_queue_walk_safe(&hitlist, skb, next_skb) { - if (skb->scm_io_uring) { - __skb_unlink(skb, &hitlist); - skb_queue_tail(&skb->sk->sk_receive_queue, skb); - } - } - /* Here we are. Hitlist is filled. Die. */ __skb_queue_purge(&hitlist); =20 spin_lock(&unix_gc_lock); =20 - /* There could be io_uring registered files, just push them back to - * the inflight list - */ - list_for_each_entry_safe(u, next, &gc_candidates, link) - list_move_tail(&u->link, &gc_inflight_list); - /* All candidates should have been detached by now. 
*/ WARN_ON_ONCE(!list_empty(&gc_candidates)); =20 --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E29E91EA7C2; Wed, 21 May 2025 15:33:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841587; cv=none; b=tW8NHDocAN6mry3hxOulXg+OLdkpUK/tIfvIoVy+OQi6N98Jx9jwsnHx4JomLwgaVO2EIODhqEmjG/xzkaK+WWEG+nv/JRdLUwQF5t+b79vL1ug0UZDEX1gwkD53AcV6KFKvTAulS/G/RDdO8yhNZkjeQZQ0UlP1mkWmlu5NSek= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841587; c=relaxed/simple; bh=uy5+OGLquQ/hI58aGewGRzZaToGymlH4O2HPyG63V1g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WgqUQn6K0wx/yUcT2wBSbay+joVdbfQ/c7GXZMlDbwxVS9xJQn6DtZP+uhFpwGtBgGigvD561K0qNkajK/76IZnhdqkTdupaogHELH/ZHue9ewnUA5SQ5mC8/Aa71O/3mORnFhc8n99PCUVSIlTi7R1To3Fqn/NQKO3F05CDP1E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=u634/p1X; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="u634/p1X" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6490FC4CEE7; Wed, 21 May 2025 15:33:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841586; bh=uy5+OGLquQ/hI58aGewGRzZaToGymlH4O2HPyG63V1g=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=u634/p1Xku1Cpj5EXo9jf5o8fUeQW3kQGnhsLg5J5uyM9pZQHwY5jlcVlc7pFRUUK 80m1OPxP1UtkCaaRew4EvRwTT7IhjBFWmGMPlEjsVJjHOyvkBN5lu5k+Eu6lPqav7T vgEDoO+91lZheh6w4Uz4vgIu1tNDUOzoVGFgMdo4GqNbiXwUOZYhFZ0G1QUKkVk+mc AT/b1CEhUsQujhLk5A42Xxrk99mw4vhSPs0rbqZHk1OjHKqM9djVn1traUK38LxY+u KwodJBYdTyGPn6K7p/hNCWFqbA3mg3Z/qFqmPu1j/Q7jK93tHnc0iND1i56zJi2I1A qd3agsMoy/ePQ== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 07/27] af_unix: Remove CONFIG_UNIX_SCM. Date: Wed, 21 May 2025 16:27:06 +0100 Message-ID: <20250521152920.1116756-8-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 99a7a5b9943ea2d05fb0dee38e4ae2290477ed83 ] Originally, the code related to garbage collection was all in garbage.c. Commit f4e65870e5ce ("net: split out functions related to registering inflight socket files") moved some functions to scm.c for io_uring and added CONFIG_UNIX_SCM just in case AF_UNIX was built as module. However, since commit 97154bcf4d1b ("af_unix: Kconfig: make CONFIG_UNIX bool"), AF_UNIX is no longer built separately. Also, io_uring does not support SCM_RIGHTS now. 
Let's move the functions back to garbage.c Signed-off-by: Kuniyuki Iwashima Acked-by: Jens Axboe Link: https://lore.kernel.org/r/20240129190435.57228-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 99a7a5b9943ea2d05fb0dee38e4ae2290477ed83) Signed-off-by: Lee Jones --- include/net/af_unix.h | 7 +- net/Makefile | 2 +- net/unix/Kconfig | 5 -- net/unix/Makefile | 2 - net/unix/af_unix.c | 63 +++++++++++++++++- net/unix/garbage.c | 73 ++++++++++++++++++++- net/unix/scm.c | 149 ------------------------------------------ net/unix/scm.h | 10 --- 8 files changed, 137 insertions(+), 174 deletions(-) delete mode 100644 net/unix/scm.c delete mode 100644 net/unix/scm.h diff --git a/include/net/af_unix.h b/include/net/af_unix.h index be488d627531..91d2036fc182 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -17,19 +17,20 @@ static inline struct unix_sock *unix_get_socket(struct = file *filp) } #endif =20 +extern spinlock_t unix_gc_lock; +extern unsigned int unix_tot_inflight; + void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); -void unix_destruct_scm(struct sk_buff *skb); void unix_gc(void); void wait_for_unix_gc(struct scm_fp_list *fpl); + struct sock *unix_peer_get(struct sock *sk); =20 #define UNIX_HASH_MOD (256 - 1) #define UNIX_HASH_SIZE (256 * 2) #define UNIX_HASH_BITS 8 =20 -extern unsigned int unix_tot_inflight; - struct unix_address { refcount_t refcnt; int len; diff --git a/net/Makefile b/net/Makefile index 0914bea9c335..103cd8d61f68 100644 --- a/net/Makefile +++ b/net/Makefile @@ -17,7 +17,7 @@ obj-$(CONFIG_NETFILTER) +=3D netfilter/ obj-$(CONFIG_INET) +=3D ipv4/ obj-$(CONFIG_TLS) +=3D tls/ obj-$(CONFIG_XFRM) +=3D xfrm/ -obj-$(CONFIG_UNIX_SCM) +=3D unix/ +obj-$(CONFIG_UNIX) +=3D unix/ obj-y +=3D ipv6/ obj-$(CONFIG_BPFILTER) +=3D bpfilter/ obj-$(CONFIG_PACKET) +=3D packet/ diff --git a/net/unix/Kconfig b/net/unix/Kconfig index 28b232f281ab..8b5d04210d7c 100644 --- a/net/unix/Kconfig +++ b/net/unix/Kconfig @@ -16,11 +16,6 @@ config UNIX =20 Say Y unless you know what you are doing. =20 -config UNIX_SCM - bool - depends on UNIX - default y - config AF_UNIX_OOB bool depends on UNIX diff --git a/net/unix/Makefile b/net/unix/Makefile index 20491825b4d0..4ddd125c4642 100644 --- a/net/unix/Makefile +++ b/net/unix/Makefile @@ -11,5 +11,3 @@ unix-$(CONFIG_BPF_SYSCALL) +=3D unix_bpf.o =20 obj-$(CONFIG_UNIX_DIAG) +=3D unix_diag.o unix_diag-y :=3D diag.o - -obj-$(CONFIG_UNIX_SCM) +=3D scm.o diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index f74f7878b3fe..7bcc4c526274 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -116,8 +116,6 @@ #include #include =20 -#include "scm.h" - static atomic_long_t unix_nr_socks; static struct hlist_head bsd_socket_buckets[UNIX_HASH_SIZE / 2]; static spinlock_t bsd_socket_locks[UNIX_HASH_SIZE / 2]; @@ -1726,6 +1724,52 @@ static int unix_getname(struct socket *sock, struct = sockaddr *uaddr, int peer) return err; } =20 +/* The "user->unix_inflight" variable is protected by the garbage + * collection lock, and we just read it locklessly here. If you go + * over the limit, there might be a tiny race in actually noticing + * it across threads. Tough. 
+ */ +static inline bool too_many_unix_fds(struct task_struct *p) +{ + struct user_struct *user =3D current_user(); + + if (unlikely(READ_ONCE(user->unix_inflight) > task_rlimit(p, RLIMIT_NOFIL= E))) + return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN); + return false; +} + +static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) +{ + int i; + + if (too_many_unix_fds(current)) + return -ETOOMANYREFS; + + /* Need to duplicate file references for the sake of garbage + * collection. Otherwise a socket in the fps might become a + * candidate for GC while the skb is not yet queued. + */ + UNIXCB(skb).fp =3D scm_fp_dup(scm->fp); + if (!UNIXCB(skb).fp) + return -ENOMEM; + + for (i =3D scm->fp->count - 1; i >=3D 0; i--) + unix_inflight(scm->fp->user, scm->fp->fp[i]); + + return 0; +} + +static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) +{ + int i; + + scm->fp =3D UNIXCB(skb).fp; + UNIXCB(skb).fp =3D NULL; + + for (i =3D scm->fp->count - 1; i >=3D 0; i--) + unix_notinflight(scm->fp->user, scm->fp->fp[i]); +} + static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb) { scm->fp =3D scm_fp_dup(UNIXCB(skb).fp); @@ -1773,6 +1817,21 @@ static void unix_peek_fds(struct scm_cookie *scm, st= ruct sk_buff *skb) spin_unlock(&unix_gc_lock); } =20 +static void unix_destruct_scm(struct sk_buff *skb) +{ + struct scm_cookie scm; + + memset(&scm, 0, sizeof(scm)); + scm.pid =3D UNIXCB(skb).pid; + if (UNIXCB(skb).fp) + unix_detach_fds(&scm, skb); + + /* Alas, it calls VFS */ + /* So fscking what? fput() had been SMP-safe since the last Summer */ + scm_destroy(&scm); + sock_wfree(skb); +} + static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bo= ol send_fds) { int err =3D 0; diff --git a/net/unix/garbage.c b/net/unix/garbage.c index c04f82489abb..0104be9d4704 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -81,11 +81,80 @@ #include #include =20 -#include "scm.h" +struct unix_sock *unix_get_socket(struct file *filp) +{ + struct inode *inode =3D file_inode(filp); + + /* Socket ? */ + if (S_ISSOCK(inode->i_mode) && !(filp->f_mode & FMODE_PATH)) { + struct socket *sock =3D SOCKET_I(inode); + const struct proto_ops *ops; + struct sock *sk =3D sock->sk; =20 -/* Internal data structures and random procedures: */ + ops =3D READ_ONCE(sock->ops); =20 + /* PF_UNIX ? */ + if (sk && ops && ops->family =3D=3D PF_UNIX) + return unix_sk(sk); + } + + return NULL; +} + +DEFINE_SPINLOCK(unix_gc_lock); +unsigned int unix_tot_inflight; static LIST_HEAD(gc_candidates); +static LIST_HEAD(gc_inflight_list); + +/* Keep the number of times in flight count for the file + * descriptor if it is for an AF_UNIX socket. 
+ */ +void unix_inflight(struct user_struct *user, struct file *filp) +{ + struct unix_sock *u =3D unix_get_socket(filp); + + spin_lock(&unix_gc_lock); + + if (u) { + if (!u->inflight) { + WARN_ON_ONCE(!list_empty(&u->link)); + list_add_tail(&u->link, &gc_inflight_list); + } else { + WARN_ON_ONCE(list_empty(&u->link)); + } + u->inflight++; + + /* Paired with READ_ONCE() in wait_for_unix_gc() */ + WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1); + } + + WRITE_ONCE(user->unix_inflight, user->unix_inflight + 1); + + spin_unlock(&unix_gc_lock); +} + +void unix_notinflight(struct user_struct *user, struct file *filp) +{ + struct unix_sock *u =3D unix_get_socket(filp); + + spin_lock(&unix_gc_lock); + + if (u) { + WARN_ON_ONCE(!u->inflight); + WARN_ON_ONCE(list_empty(&u->link)); + + u->inflight--; + if (!u->inflight) + list_del_init(&u->link); + + /* Paired with READ_ONCE() in wait_for_unix_gc() */ + WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1); + } + + WRITE_ONCE(user->unix_inflight, user->unix_inflight - 1); + + spin_unlock(&unix_gc_lock); +} =20 static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), struct sk_buff_head *hitlist) diff --git a/net/unix/scm.c b/net/unix/scm.c deleted file mode 100644 index 6f446dd2deed..000000000000 --- a/net/unix/scm.c +++ /dev/null @@ -1,149 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "scm.h" - -unsigned int unix_tot_inflight; -EXPORT_SYMBOL(unix_tot_inflight); - -LIST_HEAD(gc_inflight_list); -EXPORT_SYMBOL(gc_inflight_list); - -DEFINE_SPINLOCK(unix_gc_lock); -EXPORT_SYMBOL(unix_gc_lock); - -struct unix_sock *unix_get_socket(struct file *filp) -{ - struct inode *inode =3D file_inode(filp); - - /* Socket ? */ - if (S_ISSOCK(inode->i_mode) && !(filp->f_mode & FMODE_PATH)) { - struct socket *sock =3D SOCKET_I(inode); - struct sock *s =3D sock->sk; - - /* PF_UNIX ? */ - if (s && sock->ops && sock->ops->family =3D=3D PF_UNIX) - return unix_sk(s); - } - - return NULL; -} -EXPORT_SYMBOL(unix_get_socket); - -/* Keep the number of times in flight count for the file - * descriptor if it is for an AF_UNIX socket. - */ -void unix_inflight(struct user_struct *user, struct file *fp) -{ - struct unix_sock *u =3D unix_get_socket(fp); - - spin_lock(&unix_gc_lock); - - if (u) { - if (!u->inflight) { - WARN_ON_ONCE(!list_empty(&u->link)); - list_add_tail(&u->link, &gc_inflight_list); - } else { - WARN_ON_ONCE(list_empty(&u->link)); - } - u->inflight++; - /* Paired with READ_ONCE() in wait_for_unix_gc() */ - WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1); - } - WRITE_ONCE(user->unix_inflight, user->unix_inflight + 1); - spin_unlock(&unix_gc_lock); -} - -void unix_notinflight(struct user_struct *user, struct file *fp) -{ - struct unix_sock *u =3D unix_get_socket(fp); - - spin_lock(&unix_gc_lock); - - if (u) { - WARN_ON_ONCE(!u->inflight); - WARN_ON_ONCE(list_empty(&u->link)); - - u->inflight--; - if (!u->inflight) - list_del_init(&u->link); - /* Paired with READ_ONCE() in wait_for_unix_gc() */ - WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1); - } - WRITE_ONCE(user->unix_inflight, user->unix_inflight - 1); - spin_unlock(&unix_gc_lock); -} - -/* - * The "user->unix_inflight" variable is protected by the garbage - * collection lock, and we just read it locklessly here. If you go - * over the limit, there might be a tiny race in actually noticing - * it across threads. Tough. 
- */ -static inline bool too_many_unix_fds(struct task_struct *p) -{ - struct user_struct *user =3D current_user(); - - if (unlikely(READ_ONCE(user->unix_inflight) > task_rlimit(p, RLIMIT_NOFIL= E))) - return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN); - return false; -} - -int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) -{ - int i; - - if (too_many_unix_fds(current)) - return -ETOOMANYREFS; - - /* - * Need to duplicate file references for the sake of garbage - * collection. Otherwise a socket in the fps might become a - * candidate for GC while the skb is not yet queued. - */ - UNIXCB(skb).fp =3D scm_fp_dup(scm->fp); - if (!UNIXCB(skb).fp) - return -ENOMEM; - - for (i =3D scm->fp->count - 1; i >=3D 0; i--) - unix_inflight(scm->fp->user, scm->fp->fp[i]); - return 0; -} -EXPORT_SYMBOL(unix_attach_fds); - -void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) -{ - int i; - - scm->fp =3D UNIXCB(skb).fp; - UNIXCB(skb).fp =3D NULL; - - for (i =3D scm->fp->count-1; i >=3D 0; i--) - unix_notinflight(scm->fp->user, scm->fp->fp[i]); -} -EXPORT_SYMBOL(unix_detach_fds); - -void unix_destruct_scm(struct sk_buff *skb) -{ - struct scm_cookie scm; - - memset(&scm, 0, sizeof(scm)); - scm.pid =3D UNIXCB(skb).pid; - if (UNIXCB(skb).fp) - unix_detach_fds(&scm, skb); - - /* Alas, it calls VFS */ - /* So fscking what? fput() had been SMP-safe since the last Summer */ - scm_destroy(&scm); - sock_wfree(skb); -} -EXPORT_SYMBOL(unix_destruct_scm); diff --git a/net/unix/scm.h b/net/unix/scm.h deleted file mode 100644 index 5a255a477f16..000000000000 --- a/net/unix/scm.h +++ /dev/null @@ -1,10 +0,0 @@ -#ifndef NET_UNIX_SCM_H -#define NET_UNIX_SCM_H - -extern struct list_head gc_inflight_list; -extern spinlock_t unix_gc_lock; - -int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb); -void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb); - -#endif --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0E6D81EB18D; Wed, 21 May 2025 15:33:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841595; cv=none; b=btbjdmR9i6O6M76qWCLuaZ8yN+CxoSiZKcniEs7DAWlBtCBoD5GG0zJ93arYIuG7o0kht3qH4NktfuR0srFk4CGYqnjSeQhnzd17fMXi4edHGOQ/fDtOnsBSbV6l/shLw7J9wP+F6OyacMHcHY+wCAeafuVMgMQ3gyLTFaP9sc8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841595; c=relaxed/simple; bh=0JYCZIROnsimerrNpFz6DlcLGnAeUCxZ4ayHUpKmd9A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UU5qJBabW687Qz4AfMENmnBwnhKuJU9hi9bexUu0xCCIUSj7WqK7ZlFNaR8Ei6B07UPHwoOsw193O58x/ZUx2+IDszzoa89PlsZRuoJyGVOgful7W2pRasZO3FEiMMXfOS8EnmUm0e2A7W0jk6LACaN4sMq6p4/ESYIeSvhaoos= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Sl5hWbMq; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Sl5hWbMq" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 945FEC4CEED; Wed, 21 May 2025 15:33:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; 
s=k20201202; t=1747841593; bh=0JYCZIROnsimerrNpFz6DlcLGnAeUCxZ4ayHUpKmd9A=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Sl5hWbMqrMZ5wWVbEKw5NNUAeVBUS2V34eOpje0ZmbQtB/bKSbpbkOYOMIUg7/yrT R6HKeae37nFQOta6Jkh39xXrKfwWpw4mPVkjEU1JPcdqtvfysd6QGFiIU1+dND2TVK P9UT+lzIP25a4W2bF+7EP5NphiD3pxP7FUmnAm+LUicaN2gTw/OcPW0GJ9KlZ0y70N yxXdDlVL6kYulCzB9sr1oxbuN6/W4ItjTqLvLjUkVAZpif12yG870UUFZ4IDWG6ZVZ iyOsSiMaJn+WjYBPZOpb+VaWeCQzJa4a2QL3TNhGUMeOA6xj41fuwZTWhrg7pNeare Vko9Q8FfJ0IdA== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 08/27] af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd. Date: Wed, 21 May 2025 16:27:07 +0100 Message-ID: <20250521152920.1116756-9-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 1fbfdfaa590248c1d86407f578e40e5c65136330 ] We will replace the garbage collection algorithm for AF_UNIX, where we will consider each inflight AF_UNIX socket as a vertex and its file descriptor as an edge in a directed graph. This patch introduces a new struct unix_vertex representing a vertex in the graph and adds its pointer to struct unix_sock. When we send a fd using the SCM_RIGHTS message, we allocate struct scm_fp_list to struct scm_cookie in scm_fp_copy(). Then, we bump each refcount of the inflight fds' struct file and save them in scm_fp_list.fp. After that, unix_attach_fds() inexplicably clones scm_fp_list of scm_cookie and sets it to skb. (We will remove this part after replacing GC.) Here, we add a new function call in unix_attach_fds() to preallocate struct unix_vertex per inflight AF_UNIX fd and link each vertex to skb's scm_fp_list.vertices. When sendmsg() succeeds later, if the socket of the inflight fd is still not inflight yet, we will set the preallocated vertex to struct unix_sock.vertex and link it to a global list unix_unvisited_vertices under spin_lock(&unix_gc_lock). If the socket is already inflight, we free the preallocated vertex. This is to avoid taking the lock unnecessarily when sendmsg() could fail later. In the following patch, we will similarly allocate another struct per edge, which will finally be linked to the inflight socket's unix_vertex.edges. And then, we will count the number of edges as unix_vertex.out_degree. 
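
To make the prealloc-then-adopt flow above easier to follow, here is a small
self-contained userspace sketch of the same idea. It is an illustration only,
not kernel code: the names (vertex, fake_sock, prepare, add_edge, unvisited)
are stand-ins for the structures named above, plain pointers replace struct
list_head, the per-edge struct added by the next patch is collapsed into the
out_degree counter, and no locking is shown.

#include <stdio.h>
#include <stdlib.h>

struct vertex {				/* stand-in for struct unix_vertex */
	struct vertex *next;
	unsigned long out_degree;
};

struct fake_sock {			/* stand-in for struct unix_sock */
	struct vertex *vertex;		/* NULL until the socket is inflight */
};

static struct vertex *unvisited;	/* stand-in for unix_unvisited_vertices */

/* Like unix_prepare_fpl(): allocate one spare vertex per AF_UNIX fd up
 * front, so the commit step below never has to allocate. */
static struct vertex *prepare(int count_unix)
{
	struct vertex *head = NULL;

	for (int i = 0; i < count_unix; i++) {
		struct vertex *v = calloc(1, sizeof(*v));

		if (!v) {
			while (head) {		/* roll back on failure */
				struct vertex *n = head->next;

				free(head);
				head = n;
			}
			return NULL;
		}
		v->next = head;
		head = v;
	}
	return head;
}

/* The step taken once sendmsg() has succeeded: adopt a spare vertex for a
 * socket that is not yet inflight, otherwise leave the spare untouched so
 * it is freed with the rest of the leftovers. */
static void add_edge(struct fake_sock *sk, struct vertex **spare)
{
	if (!sk->vertex) {
		sk->vertex = *spare;
		*spare = (*spare)->next;
		sk->vertex->next = unvisited;	/* link into the global list */
		unvisited = sk->vertex;
	}
	sk->vertex->out_degree++;		/* one more inflight reference */
}

int main(void)
{
	struct fake_sock sk = { .vertex = NULL };
	struct vertex *spare = prepare(2);	/* two AF_UNIX fds in one SCM_RIGHTS */

	if (!spare)
		return 1;

	add_edge(&sk, &spare);	/* first fd: spare vertex adopted */
	add_edge(&sk, &spare);	/* same socket again: spare left over */

	printf("out_degree=%lu, spare left=%s\n",
	       sk.vertex->out_degree, spare ? "yes" : "no");

	while (spare) {		/* like unix_free_vertices() on the leftovers */
		struct vertex *n = spare->next;

		free(spare);
		spare = n;
	}
	free(sk.vertex);
	return 0;
}
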
Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 1fbfdfaa590248c1d86407f578e40e5c65136330) Signed-off-by: Lee Jones --- include/net/af_unix.h | 9 +++++++++ include/net/scm.h | 3 +++ net/core/scm.c | 7 +++++++ net/unix/af_unix.c | 6 ++++++ net/unix/garbage.c | 38 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 63 insertions(+) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 91d2036fc182..b41aff1ac688 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -22,9 +22,17 @@ extern unsigned int unix_tot_inflight; =20 void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); +int unix_prepare_fpl(struct scm_fp_list *fpl); +void unix_destroy_fpl(struct scm_fp_list *fpl); void unix_gc(void); void wait_for_unix_gc(struct scm_fp_list *fpl); =20 +struct unix_vertex { + struct list_head edges; + struct list_head entry; + unsigned long out_degree; +}; + struct sock *unix_peer_get(struct sock *sk); =20 #define UNIX_HASH_MOD (256 - 1) @@ -62,6 +70,7 @@ struct unix_sock { struct path path; struct mutex iolock, bindlock; struct sock *peer; + struct unix_vertex *vertex; struct list_head link; unsigned long inflight; spinlock_t lock; diff --git a/include/net/scm.h b/include/net/scm.h index a5c26008fcec..4183495d1981 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -25,6 +25,9 @@ struct scm_fp_list { short count; short count_unix; short max; +#ifdef CONFIG_UNIX + struct list_head vertices; +#endif struct user_struct *user; struct file *fp[SCM_MAX_FD]; }; diff --git a/net/core/scm.c b/net/core/scm.c index bb25052624ee..09bacb3d36f2 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -89,6 +89,9 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_f= p_list **fplp) fpl->count_unix =3D 0; fpl->max =3D SCM_MAX_FD; fpl->user =3D NULL; +#if IS_ENABLED(CONFIG_UNIX) + INIT_LIST_HEAD(&fpl->vertices); +#endif } fpp =3D &fpl->fp[fpl->count]; =20 @@ -372,8 +375,12 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl) if (new_fpl) { for (i =3D 0; i < fpl->count; i++) get_file(fpl->fp[i]); + new_fpl->max =3D new_fpl->count; new_fpl->user =3D get_uid(fpl->user); +#if IS_ENABLED(CONFIG_UNIX) + INIT_LIST_HEAD(&new_fpl->vertices); +#endif } return new_fpl; } diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 7bcc4c526274..0d3ba0d210c0 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -955,6 +955,7 @@ static struct sock *unix_create1(struct net *net, struc= t socket *sock, int kern, sk->sk_destruct =3D unix_sock_destructor; u =3D unix_sk(sk); u->inflight =3D 0; + u->vertex =3D NULL; u->path.dentry =3D NULL; u->path.mnt =3D NULL; spin_lock_init(&u->lock); @@ -1756,6 +1757,9 @@ static int unix_attach_fds(struct scm_cookie *scm, st= ruct sk_buff *skb) for (i =3D scm->fp->count - 1; i >=3D 0; i--) unix_inflight(scm->fp->user, scm->fp->fp[i]); =20 + if (unix_prepare_fpl(UNIXCB(skb).fp)) + return -ENOMEM; + return 0; } =20 @@ -1766,6 +1770,8 @@ static void unix_detach_fds(struct scm_cookie *scm, s= truct sk_buff *skb) scm->fp =3D UNIXCB(skb).fp; UNIXCB(skb).fp =3D NULL; =20 + unix_destroy_fpl(scm->fp); + for (i =3D scm->fp->count - 1; i >=3D 0; i--) unix_notinflight(scm->fp->user, scm->fp->fp[i]); } diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 0104be9d4704..8ea7640e032e 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -101,6 +101,44 
@@ struct unix_sock *unix_get_socket(struct file *filp) return NULL; } =20 +static void unix_free_vertices(struct scm_fp_list *fpl) +{ + struct unix_vertex *vertex, *next_vertex; + + list_for_each_entry_safe(vertex, next_vertex, &fpl->vertices, entry) { + list_del(&vertex->entry); + kfree(vertex); + } +} + +int unix_prepare_fpl(struct scm_fp_list *fpl) +{ + struct unix_vertex *vertex; + int i; + + if (!fpl->count_unix) + return 0; + + for (i =3D 0; i < fpl->count_unix; i++) { + vertex =3D kmalloc(sizeof(*vertex), GFP_KERNEL); + if (!vertex) + goto err; + + list_add(&vertex->entry, &fpl->vertices); + } + + return 0; + +err: + unix_free_vertices(fpl); + return -ENOMEM; +} + +void unix_destroy_fpl(struct scm_fp_list *fpl) +{ + unix_free_vertices(fpl); +} + DEFINE_SPINLOCK(unix_gc_lock); unsigned int unix_tot_inflight; static LIST_HEAD(gc_candidates); --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5F02821C9F3; Wed, 21 May 2025 15:33:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841602; cv=none; b=fi6dDEXyu76ldX3MFbEJbTE9aniv1heycn7WEioNm8byUdHXsAyEYOLw/LNLEa8OFv4Gm/sZuZtulxaZJZ9memZrWx/VpLp2SafMRJnePZp5Y/436JcVK9z7FWZ3zb9MbV35MjKKwUyerCPXhA6/kV4GAb/Dfl6i+DTkUPFPT6w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841602; c=relaxed/simple; bh=aCmgMEOAM0j0CHxLV4UlaDTjhZ5Vvnt97mhZk29kpbs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=YjvWqr4xcINtS3pmpEg5M7WRC6AJDSXDEz0JAN3rguM9pf7+HqJn8G8VAbAYapUUWCWecN/QTKLVc2UaToqmtLxYTbAttWAp9cjOuWK0D9mVimSQGUt/2emqMtkyNr/wIW974DCJybtdlRCIMiwvFhiYJm7R+MlJRpB8ZkLA0AA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CgOWnU1o; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CgOWnU1o" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C7226C4CEE4; Wed, 21 May 2025 15:33:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841600; bh=aCmgMEOAM0j0CHxLV4UlaDTjhZ5Vvnt97mhZk29kpbs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=CgOWnU1ox6+9sZjix2vvQr60e3FocABop1an6Uf2uzXqGdKBfHJrndJOYfMLoKSh6 b9DMWudXyZ1ezwXaw1NBgL6aai08zsSS+t/G8CrX/dx57eUzoesRyk3zAIMTebBWn+ HOuK+eboPk2hRgBm/PUwdOo9hQOa9gkRZ0s7WUFCWRhpFegxY/8ZnYZEElTaXlb+IY 5OkPzMtulSuJOKMiPgxjmsDbbNzisoRl69iRMzYBShPhlTGimiZiWvJXY+UYLbH+hq 7VV/gpzQiIWlQmhn5p8GlRRO0IFQoQ36j+00w8e7wZdcYjTLipd57HhrO6jtkYizeu 2dh4dm6Y7XZgw== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 09/27] af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd. 
Date: Wed, 21 May 2025 16:27:08 +0100 Message-ID: <20250521152920.1116756-10-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 29b64e354029cfcf1eea4d91b146c7b769305930 ] As with the previous patch, we preallocate to skb's scm_fp_list an array of struct unix_edge in the number of inflight AF_UNIX fds. There we just preallocate memory and do not use immediately because sendmsg() could fail after this point. The actual use will be in the next patch. When we queue skb with inflight edges, we will set the inflight socket's unix_sock as unix_edge->predecessor and the receiver's unix_sock as successor, and then we will link the edge to the inflight socket's unix_vertex.edges. Note that we set NULL to cloned scm_fp_list.edges in scm_fp_dup() so that MSG_PEEK does not change the shape of the directed graph. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 29b64e354029cfcf1eea4d91b146c7b769305930) Signed-off-by: Lee Jones --- include/net/af_unix.h | 6 ++++++ include/net/scm.h | 5 +++++ net/core/scm.c | 2 ++ net/unix/garbage.c | 6 ++++++ 4 files changed, 19 insertions(+) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index b41aff1ac688..279087595966 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -33,6 +33,12 @@ struct unix_vertex { unsigned long out_degree; }; =20 +struct unix_edge { + struct unix_sock *predecessor; + struct unix_sock *successor; + struct list_head vertex_entry; +}; + struct sock *unix_peer_get(struct sock *sk); =20 #define UNIX_HASH_MOD (256 - 1) diff --git a/include/net/scm.h b/include/net/scm.h index 4183495d1981..19d7d802ed6c 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -21,12 +21,17 @@ struct scm_creds { kgid_t gid; }; =20 +#ifdef CONFIG_UNIX +struct unix_edge; +#endif + struct scm_fp_list { short count; short count_unix; short max; #ifdef CONFIG_UNIX struct list_head vertices; + struct unix_edge *edges; #endif struct user_struct *user; struct file *fp[SCM_MAX_FD]; diff --git a/net/core/scm.c b/net/core/scm.c index 09bacb3d36f2..4c343729f960 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -90,6 +90,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_f= p_list **fplp) fpl->max =3D SCM_MAX_FD; fpl->user =3D NULL; #if IS_ENABLED(CONFIG_UNIX) + fpl->edges =3D NULL; INIT_LIST_HEAD(&fpl->vertices); #endif } @@ -379,6 +380,7 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl) new_fpl->max =3D new_fpl->count; new_fpl->user =3D get_uid(fpl->user); #if IS_ENABLED(CONFIG_UNIX) + new_fpl->edges =3D NULL; INIT_LIST_HEAD(&new_fpl->vertices); #endif } diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 8ea7640e032e..912b7945692c 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -127,6 +127,11 @@ int unix_prepare_fpl(struct scm_fp_list *fpl) list_add(&vertex->entry, &fpl->vertices); } =20 + fpl->edges =3D kvmalloc_array(fpl->count_unix, sizeof(*fpl->edges), + GFP_KERNEL_ACCOUNT); + if (!fpl->edges) + goto err; + return 0; =20 err: @@ -136,6 +141,7 @@ int unix_prepare_fpl(struct scm_fp_list *fpl) =20 void 
unix_destroy_fpl(struct scm_fp_list *fpl) { + kvfree(fpl->edges); unix_free_vertices(fpl); } =20 --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 71E681C8FBA; Wed, 21 May 2025 15:33:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841608; cv=none; b=dOWytTnVbs2gzOObwj0RqNwHyR74GXuxVvElTkvIVg8ROL8Gyfaj1VUVsUIA45VG6lARIC8GHEWDrCqH6QNM7GUcu2pEssibDFv8ilSMQ78doEGTnQdpS7jBpExtaLZ/DLJnQdizJ+tw7kW2Y13m41Z0IRnsqlxS0uNsWVESgJI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841608; c=relaxed/simple; bh=tKOkanSvpRwyDrcb0yNwjPZo47TElqLpIaD2fxZ2JMU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jKBU7nXCaZ9zXGBpA8eWCqCQOoZ/HFgKgwmn0a3ufPuNduwFAhN/yIV+I6pfKkoRiXXWBy9zg0F+8c5qz9pLv4S+kO8PL2FrJBXDidZqb6P7X3e+9OWoNrXKrhizg/ABRmGPTx4YoIAJj1zBr3GuFsJHLdAJQe0NGHIqJT5PYZQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=TGm5nZ6y; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="TGm5nZ6y" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EFAA4C4CEE4; Wed, 21 May 2025 15:33:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841607; bh=tKOkanSvpRwyDrcb0yNwjPZo47TElqLpIaD2fxZ2JMU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=TGm5nZ6yc3aKF07onbNVoQADRfeCBsgQ5fJRJs4qs7cXazPAKFg1xeGObBu08NHR/ Gt1ja+PzslHtqLj2SV1Kt4KbiOT339sY2Ax9lY6LpmYSyPS847zwq1KO2dL9YKfAg7 8TcWP1+PyyXqNajckzIBtq/UphRQz0IC/vmJWiuEG33k2kcUyMM7bFCsCGPvP7GXm9 np3V504Aut32lw9Wprl4T7mKBVE6hjmDbJdpvc+C/2kRYmLG98kfz7tmx/EJQqSa9u HKLnKp+Zv4GoDQgEuUW9UlcfcM+gu9bQw/VBiHkO+9pNd2TlI+uv92GjJFabzMBoFX fZwN9slVXzWMA== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 10/27] af_unix: Link struct unix_edge when queuing skb. Date: Wed, 21 May 2025 16:27:09 +0100 Message-ID: <20250521152920.1116756-11-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 42f298c06b30bfe0a8cbee5d38644e618699e26e ] Just before queuing skb with inflight fds, we call scm_stat_add(), which is a good place to set up the preallocated struct unix_vertex and struct unix_edge in UNIXCB(skb).fp. Then, we call unix_add_edges() and construct the directed graph as follows: 1. Set the inflight socket's unix_sock to unix_edge.predecessor. 2. Set the receiver's unix_sock to unix_edge.successor. 3. 
Set the preallocated vertex to inflight socket's unix_sock.vertex. 4. Link inflight socket's unix_vertex.entry to unix_unvisited_vertices. 5. Link unix_edge.vertex_entry to the inflight socket's unix_vertex.edges. Let's say we pass the fd of AF_UNIX socket A to B and the fd of B to C. The graph looks like this: +-------------------------+ | unix_unvisited_vertices | <-------------------------. +-------------------------+ | + | | +--------------+ +--------------+ | +--------= ------+ | | unix_sock A | <---. .---> | unix_sock B | <-|-. .---> | unix_s= ock C | | +--------------+ | | +--------------+ | | | +--------= ------+ | .-+ | vertex | | | .-+ | vertex | | | | | vert= ex | | | +--------------+ | | | +--------------+ | | | +--------= ------+ | | | | | | | | | | +--------------+ | | | +--------------+ | | | | '-> | unix_vertex | | | '-> | unix_vertex | | | | | +--------------+ | | +--------------+ | | | `---> | entry | +---------> | entry | +-' | | |--------------| | | |--------------| | | | edges | <-. | | | edges | <-. | | +--------------+ | | | +--------------+ | | | | | | | | | .----------------------' | | .----------------------' | | | | | | | | | +--------------+ | | | +--------------+ | | | | unix_edge | | | | | unix_edge | | | | +--------------+ | | | +--------------+ | | `-> | vertex_entry | | | `-> | vertex_entry | | | |--------------| | | |--------------| | | | predecessor | +---' | | predecessor | +---' | |--------------| | |--------------| | | successor | +-----' | successor | +-----' +--------------+ +--------------+ Henceforth, we denote such a graph as A -> B (-> C). Now, we can express all inflight fd graphs that do not contain embryo sockets. We will support the particular case later. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 42f298c06b30bfe0a8cbee5d38644e618699e26e) Signed-off-by: Lee Jones --- include/net/af_unix.h | 2 + include/net/scm.h | 1 + net/core/scm.c | 2 + net/unix/af_unix.c | 8 +++- net/unix/garbage.c | 90 ++++++++++++++++++++++++++++++++++++++++++- 5 files changed, 100 insertions(+), 3 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 279087595966..08cc90348043 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -22,6 +22,8 @@ extern unsigned int unix_tot_inflight; =20 void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); +void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver); +void unix_del_edges(struct scm_fp_list *fpl); int unix_prepare_fpl(struct scm_fp_list *fpl); void unix_destroy_fpl(struct scm_fp_list *fpl); void unix_gc(void); diff --git a/include/net/scm.h b/include/net/scm.h index 19d7d802ed6c..19789096424d 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -30,6 +30,7 @@ struct scm_fp_list { short count_unix; short max; #ifdef CONFIG_UNIX + bool inflight; struct list_head vertices; struct unix_edge *edges; #endif diff --git a/net/core/scm.c b/net/core/scm.c index 4c343729f960..1ff78bd4ee83 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -90,6 +90,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_f= p_list **fplp) fpl->max =3D SCM_MAX_FD; fpl->user =3D NULL; #if IS_ENABLED(CONFIG_UNIX) + fpl->inflight =3D false; fpl->edges =3D NULL; INIT_LIST_HEAD(&fpl->vertices); #endif @@ -380,6 +381,7 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl) 
new_fpl->max =3D new_fpl->count; new_fpl->user =3D get_uid(fpl->user); #if IS_ENABLED(CONFIG_UNIX) + new_fpl->inflight =3D false; new_fpl->edges =3D NULL; INIT_LIST_HEAD(&new_fpl->vertices); #endif diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 0d3ba0d210c0..658a1680a92e 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1910,8 +1910,10 @@ static void scm_stat_add(struct sock *sk, struct sk_= buff *skb) struct scm_fp_list *fp =3D UNIXCB(skb).fp; struct unix_sock *u =3D unix_sk(sk); =20 - if (unlikely(fp && fp->count)) + if (unlikely(fp && fp->count)) { atomic_add(fp->count, &u->scm_stat.nr_fds); + unix_add_edges(fp, u); + } } =20 static void scm_stat_del(struct sock *sk, struct sk_buff *skb) @@ -1919,8 +1921,10 @@ static void scm_stat_del(struct sock *sk, struct sk_= buff *skb) struct scm_fp_list *fp =3D UNIXCB(skb).fp; struct unix_sock *u =3D unix_sk(sk); =20 - if (unlikely(fp && fp->count)) + if (unlikely(fp && fp->count)) { atomic_sub(fp->count, &u->scm_stat.nr_fds); + unix_del_edges(fp); + } } =20 /* diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 912b7945692c..b5b4a200dbf3 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -101,6 +101,38 @@ struct unix_sock *unix_get_socket(struct file *filp) return NULL; } =20 +static LIST_HEAD(unix_unvisited_vertices); + +static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) +{ + struct unix_vertex *vertex =3D edge->predecessor->vertex; + + if (!vertex) { + vertex =3D list_first_entry(&fpl->vertices, typeof(*vertex), entry); + vertex->out_degree =3D 0; + INIT_LIST_HEAD(&vertex->edges); + + list_move_tail(&vertex->entry, &unix_unvisited_vertices); + edge->predecessor->vertex =3D vertex; + } + + vertex->out_degree++; + list_add_tail(&edge->vertex_entry, &vertex->edges); +} + +static void unix_del_edge(struct scm_fp_list *fpl, struct unix_edge *edge) +{ + struct unix_vertex *vertex =3D edge->predecessor->vertex; + + list_del(&edge->vertex_entry); + vertex->out_degree--; + + if (!vertex->out_degree) { + edge->predecessor->vertex =3D NULL; + list_move_tail(&vertex->entry, &fpl->vertices); + } +} + static void unix_free_vertices(struct scm_fp_list *fpl) { struct unix_vertex *vertex, *next_vertex; @@ -111,6 +143,60 @@ static void unix_free_vertices(struct scm_fp_list *fpl) } } =20 +DEFINE_SPINLOCK(unix_gc_lock); + +void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver) +{ + int i =3D 0, j =3D 0; + + spin_lock(&unix_gc_lock); + + if (!fpl->count_unix) + goto out; + + do { + struct unix_sock *inflight =3D unix_get_socket(fpl->fp[j++]); + struct unix_edge *edge; + + if (!inflight) + continue; + + edge =3D fpl->edges + i++; + edge->predecessor =3D inflight; + edge->successor =3D receiver; + + unix_add_edge(fpl, edge); + } while (i < fpl->count_unix); + +out: + spin_unlock(&unix_gc_lock); + + fpl->inflight =3D true; + + unix_free_vertices(fpl); +} + +void unix_del_edges(struct scm_fp_list *fpl) +{ + int i =3D 0; + + spin_lock(&unix_gc_lock); + + if (!fpl->count_unix) + goto out; + + do { + struct unix_edge *edge =3D fpl->edges + i++; + + unix_del_edge(fpl, edge); + } while (i < fpl->count_unix); + +out: + spin_unlock(&unix_gc_lock); + + fpl->inflight =3D false; +} + int unix_prepare_fpl(struct scm_fp_list *fpl) { struct unix_vertex *vertex; @@ -141,11 +227,13 @@ int unix_prepare_fpl(struct scm_fp_list *fpl) =20 void unix_destroy_fpl(struct scm_fp_list *fpl) { + if (fpl->inflight) + unix_del_edges(fpl); + kvfree(fpl->edges); unix_free_vertices(fpl); } =20 
-DEFINE_SPINLOCK(unix_gc_lock); unsigned int unix_tot_inflight; static LIST_HEAD(gc_candidates); static LIST_HEAD(gc_inflight_list); --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 945AC1DF75C; Wed, 21 May 2025 15:33:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841616; cv=none; b=eoUJLauoQZjrxW+VmODgVhAxDGEHj+6F8HM6eCcnbAP0KUI21bOmoQvb/WKJzYajCrYVg4r08BidPIuKYerpVtxzGepEJfndCYnJuJV2ND8BdRT0GReMZVe/tdm1T1KJbX6CdiXINQYkubvMfg739mrijRF2VLigTmPt9gVnT6Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841616; c=relaxed/simple; bh=4XqPg+mh/kFe742e9TKTxwUvm/c1ssaaC/hHgrTNopA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=czVuti3OYx8S4t3BTnMyvtgHu0c/H+ePy+7EFjneWlARs6mbVuHVqAf48RXgbm0huIzsjNCfp+MsDPUxR9REXcFZlDR5FT4JmfguBbW7lscf0/fABuhRXz5XB2AT3EzBVWOdRFE+u+3oeqjf3EH61ifB9g1YxctNnfZSVkqCHCU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=h/S1YXgc; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="h/S1YXgc" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2A35DC4CEE4; Wed, 21 May 2025 15:33:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841615; bh=4XqPg+mh/kFe742e9TKTxwUvm/c1ssaaC/hHgrTNopA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=h/S1YXgc/RM92uPN5zmMIXHrjO8dfKaDoTX6HTkd/VTMLM86pahb3B0iuV+zSeAhg 6p0DJiqBexx3/yFV2+LDEEtA/T0dCaYYGilklJZeeSR6/iImOnfFAEXTNmV+aLBYXx A2IkPfgi3xZ0lRHxRhuFI0tsJ99w0K7X7zv3ojjlfXhnX511YHVUozCxfmaF2bUKlG eDJF1/gc/3iB6GgDyQxqB+1TECZZcqXba8nbembJGue6fXmXg8pfo/WFPTPij07/Oa Q6t4knaFtXLaaBv8gqbWWjOW22rKhT+VbSKHK/74KjQ+Yzzyl202OA0j8ryo8r9pee 5TPH2Ejaj7QJw== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 11/27] af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb. Date: Wed, 21 May 2025 16:27:10 +0100 Message-ID: <20250521152920.1116756-12-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 22c3c0c52d32f41cc38cd936ea0c93f22ced3315 ] Currently, we track the number of inflight sockets in two variables. unix_tot_inflight is the total number of inflight AF_UNIX sockets on the host, and user->unix_inflight is the number of inflight fds per user. We update them one by one in unix_inflight(), which can be done once in batch. 
Also, sendmsg() could fail even after unix_inflight(), then we need to acquire unix_gc_lock only to decrement the counters. Let's bulk update the counters in unix_add_edges() and unix_del_edges(), which is called only for successfully passed fds. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 22c3c0c52d32f41cc38cd936ea0c93f22ced3315) Signed-off-by: Lee Jones --- net/unix/garbage.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index b5b4a200dbf3..f7041fc23000 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -144,6 +144,7 @@ static void unix_free_vertices(struct scm_fp_list *fpl) } =20 DEFINE_SPINLOCK(unix_gc_lock); +unsigned int unix_tot_inflight; =20 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver) { @@ -168,7 +169,10 @@ void unix_add_edges(struct scm_fp_list *fpl, struct un= ix_sock *receiver) unix_add_edge(fpl, edge); } while (i < fpl->count_unix); =20 + WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + fpl->count_unix); out: + WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight + fpl->coun= t); + spin_unlock(&unix_gc_lock); =20 fpl->inflight =3D true; @@ -191,7 +195,10 @@ void unix_del_edges(struct scm_fp_list *fpl) unix_del_edge(fpl, edge); } while (i < fpl->count_unix); =20 + WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - fpl->count_unix); out: + WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight - fpl->coun= t); + spin_unlock(&unix_gc_lock); =20 fpl->inflight =3D false; @@ -234,7 +241,6 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) unix_free_vertices(fpl); } =20 -unsigned int unix_tot_inflight; static LIST_HEAD(gc_candidates); static LIST_HEAD(gc_inflight_list); =20 @@ -255,13 +261,8 @@ void unix_inflight(struct user_struct *user, struct fi= le *filp) WARN_ON_ONCE(list_empty(&u->link)); } u->inflight++; - - /* Paired with READ_ONCE() in wait_for_unix_gc() */ - WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1); } =20 - WRITE_ONCE(user->unix_inflight, user->unix_inflight + 1); - spin_unlock(&unix_gc_lock); } =20 @@ -278,13 +279,8 @@ void unix_notinflight(struct user_struct *user, struct= file *filp) u->inflight--; if (!u->inflight) list_del_init(&u->link); - - /* Paired with READ_ONCE() in wait_for_unix_gc() */ - WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1); } =20 - WRITE_ONCE(user->unix_inflight, user->unix_inflight - 1); - spin_unlock(&unix_gc_lock); } =20 --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 65FAD1DF75C; Wed, 21 May 2025 15:33:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841622; cv=none; b=UU+qQdLcL5A7kgoy9qQPSHTkCEyEcZCmG/pB6eb2Mt1p3bVR9We2g0nF19J+QoCTT/LDezSYC6Iy8euW03hYuBZer52p/wr9cqSz7086C1GWSsP0sfWG054NMf4r3TsHblDWL8gVviHFz+V7ADTYGpYdpjNqM3vYPFGXW2UhW5U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841622; c=relaxed/simple; bh=pivfaqLbjvxiTmCHNmYRHM0reWtCukqUacXbd9qDv+w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: 
MIME-Version; b=MREQuh+A5aNBbu3F16X5LdrDZDv8VHemaJv+IvfHCeAr1T8ujyzKtu6cqnea/+qSrD9fQ5d98CIxp4tly3vQ6uB3zDkhapGOh5SDos6jpehBjjo1Hvt3aZUs6FvYvFcLhGc+i2EtQtwgkWS+CabjoH+XzMSlT1phe2wnxcIhQx8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=FLQ7LHh6; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="FLQ7LHh6" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5197DC4CEE7; Wed, 21 May 2025 15:33:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841622; bh=pivfaqLbjvxiTmCHNmYRHM0reWtCukqUacXbd9qDv+w=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=FLQ7LHh6i6jOmYwzj14pa1emwXPUxmSvd8I35OwFuLcAfzbFvYQjPrNBe9G89WI0F XO5AznfmHLxYmTdoVsXrNluH+LSt2fcrvMKbIPbTgasnViPi5L4ZC5lGA2mn5WX0Tq 8ksQj/7FExqUOiZY7K5TBSRiWn4i7DGPi8s7dir0KraApLJvZD167BLnQb2vvEzIEe 9cF+YHAOOUfdcQS3hRVkqdkPzOp/ghOi1YwjXkFex98vAaAVjto5MgUfC2wqzKJNIM IhWh9xGX/tEpfDoNZYsKq3fgs5qzJJVC9r3f12EznE4uVnWcFXBPpbneszeEf6pp9P u2nTTET6oht7A== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 12/27] af_unix: Iterate all vertices by DFS. Date: Wed, 21 May 2025 16:27:11 +0100 Message-ID: <20250521152920.1116756-13-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 6ba76fd2848e107594ea4f03b737230f74bc23ea ] The new GC will use a depth first search graph algorithm to find cyclic references. The algorithm visits every vertex exactly once. Here, we implement the DFS part without recursion so that no one can abuse it. unix_walk_scc() marks every vertex unvisited by initialising index as UNIX_VERTEX_INDEX_UNVISITED and iterates inflight vertices in unix_unvisited_vertices and call __unix_walk_scc() to start DFS from an arbitrary vertex. __unix_walk_scc() iterates all edges starting from the vertex and explores the neighbour vertices with DFS using edge_stack. After visiting all neighbours, __unix_walk_scc() moves the visited vertex to unix_visited_vertices so that unix_walk_scc() will not restart DFS from the visited vertex. 
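A minimal userspace model of this pattern (illustrative only; the fixed-size array and the missing overflow check are simplifications, the kernel keeps the pending edges on a linked list instead) shows how the per-vertex iteration state lives on an explicit stack rather than on the call stack:

  #include <stdbool.h>
  #include <stddef.h>

  struct vertex {
      struct vertex **succ;        /* edges out of this vertex */
      size_t nsucc;
      bool visited;
  };

  /* Depth-first traversal without recursion: stack[] plays the role of
   * the call stack, remembering which successor to try next when we
   * backtrack (edge_stack in the patch). */
  static void dfs(struct vertex *start)
  {
      struct { struct vertex *v; size_t next; } stack[64];
      size_t top = 0;

      start->visited = true;
      stack[0].v = start;
      stack[0].next = 0;

      while (1) {
          struct vertex *v = stack[top].v;

          if (stack[top].next < v->nsucc) {
              struct vertex *w = v->succ[stack[top].next++];

              if (w->visited)
                  continue;
              /* "Recurse": push the successor instead of calling dfs(w). */
              w->visited = true;
              top++;
              stack[top].v = w;
              stack[top].next = 0;
          } else {
              /* All edges explored: "return" by backtracking. */
              if (top == 0)
                  break;
              top--;
          }
      }
  }

  int main(void)
  {
      struct vertex a, b, c;
      struct vertex *ab[] = { &b, &c }, *bc[] = { &c };

      a = (struct vertex){ .succ = ab, .nsucc = 2 };
      b = (struct vertex){ .succ = bc, .nsucc = 1 };
      c = (struct vertex){ 0 };

      dfs(&a);
      return 0;
  }

Keeping the iteration state explicit bounds kernel stack usage no matter how many inflight sockets a user chains together.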
Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 6ba76fd2848e107594ea4f03b737230f74bc23ea) Signed-off-by: Lee Jones --- include/net/af_unix.h | 2 ++ net/unix/garbage.c | 74 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 76 insertions(+) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 08cc90348043..9d51d675cc9f 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -33,12 +33,14 @@ struct unix_vertex { struct list_head edges; struct list_head entry; unsigned long out_degree; + unsigned long index; }; =20 struct unix_edge { struct unix_sock *predecessor; struct unix_sock *successor; struct list_head vertex_entry; + struct list_head stack_entry; }; =20 struct sock *unix_peer_get(struct sock *sk); diff --git a/net/unix/garbage.c b/net/unix/garbage.c index f7041fc23000..295dd1a7b8e0 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -103,6 +103,11 @@ struct unix_sock *unix_get_socket(struct file *filp) =20 static LIST_HEAD(unix_unvisited_vertices); =20 +enum unix_vertex_index { + UNIX_VERTEX_INDEX_UNVISITED, + UNIX_VERTEX_INDEX_START, +}; + static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) { struct unix_vertex *vertex =3D edge->predecessor->vertex; @@ -241,6 +246,73 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) unix_free_vertices(fpl); } =20 +static LIST_HEAD(unix_visited_vertices); + +static void __unix_walk_scc(struct unix_vertex *vertex) +{ + unsigned long index =3D UNIX_VERTEX_INDEX_START; + struct unix_edge *edge; + LIST_HEAD(edge_stack); + +next_vertex: + vertex->index =3D index; + index++; + + /* Explore neighbour vertices (receivers of the current vertex's fd). */ + list_for_each_entry(edge, &vertex->edges, vertex_entry) { + struct unix_vertex *next_vertex =3D edge->successor->vertex; + + if (!next_vertex) + continue; + + if (next_vertex->index =3D=3D UNIX_VERTEX_INDEX_UNVISITED) { + /* Iterative deepening depth first search + * + * 1. Push a forward edge to edge_stack and set + * the successor to vertex for the next iteration. + */ + list_add(&edge->stack_entry, &edge_stack); + + vertex =3D next_vertex; + goto next_vertex; + + /* 2. Pop the edge directed to the current vertex + * and restore the ancestor for backtracking. + */ +prev_vertex: + edge =3D list_first_entry(&edge_stack, typeof(*edge), stack_entry); + list_del_init(&edge->stack_entry); + + vertex =3D edge->predecessor->vertex; + } + } + + /* Don't restart DFS from this vertex in unix_walk_scc(). */ + list_move_tail(&vertex->entry, &unix_visited_vertices); + + /* Need backtracking ? */ + if (!list_empty(&edge_stack)) + goto prev_vertex; +} + +static void unix_walk_scc(void) +{ + struct unix_vertex *vertex; + + list_for_each_entry(vertex, &unix_unvisited_vertices, entry) + vertex->index =3D UNIX_VERTEX_INDEX_UNVISITED; + + /* Visit every vertex exactly once. + * __unix_walk_scc() moves visited vertices to unix_visited_vertices. + */ + while (!list_empty(&unix_unvisited_vertices)) { + vertex =3D list_first_entry(&unix_unvisited_vertices, typeof(*vertex), e= ntry); + __unix_walk_scc(vertex); + } + + list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); +} + static LIST_HEAD(gc_candidates); static LIST_HEAD(gc_inflight_list); =20 @@ -388,6 +460,8 @@ static void __unix_gc(struct work_struct *work) =20 spin_lock(&unix_gc_lock); =20 + unix_walk_scc(); + /* First, select candidates for garbage collection. 
Only * in-flight sockets are considered, and from those only ones * which don't have any external reference. --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E9788222596; Wed, 21 May 2025 15:33:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841630; cv=none; b=UVpUs5ty9RiIduP7EGBtXRDgQcCXUDVjFiTTbRn2eQTJ9PpQIxxdV/XAGOrVGyuC4tYnWku37WJbUtF1n1E5MLR/TGUyyNukw3Du/NE48jtq+wUZxARKan1LnpZOA5DIopUHxP18flHDhZp9Z6/OtOKgdjJJTEtl4B6OPGEs6HU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841630; c=relaxed/simple; bh=HI9OdU+GOMA19WqHEJ+k84G8liPCrsrz2n9jkc+dL7Q=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=B0gCmp6xjAxlkuBhp63Ymd+dLLDgvoV6kq7mgUjtFAl7CeNh4MoZLFQoENfarfxwzUjTW5WJwtv5lLSR/rJyNjfkvFRzYFp+pLuezX0ClGa5xuS+MyVzbwuK0F4gMcW6VtdZjS/mFr1gi/JgYY56gRneJ7TQULoHZEjYZDg57C8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=TU9HqmfK; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="TU9HqmfK" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 70AB0C4CEE7; Wed, 21 May 2025 15:33:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841629; bh=HI9OdU+GOMA19WqHEJ+k84G8liPCrsrz2n9jkc+dL7Q=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=TU9HqmfKtQnnVuEYeK4LThXwnFXPjSQ8OKC9gNQxOCV4MrgEXiUxDTPMmnFNh3Wgl 80RJr62BDiAQimwjkfH4kG+/0hKtULoN2Va1q6KimDM0swgdpMfVUGiN6MyBG6OCmb mbO3sj7yDqd6Sq0HOyY4hdODrCV89BspuXysFkC0fjbL8J1rhQlgpnh/zx5OvyAbke PIOOtVzGGc3Y0EFc2fIA1s66ta7ix9uqZzLE/8DV2c/f0zjp8zUzQnoNQXGbCtiQtm xkJzRT8JpEK/ug5mn8OnEMkflhpyOIgh7slw5Dshx3JMyIjn1CMOJeshwhriXilvJO eNY4XhhkVdAWw== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 13/27] af_unix: Detect Strongly Connected Components. Date: Wed, 21 May 2025 16:27:12 +0100 Message-ID: <20250521152920.1116756-14-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 3484f063172dd88776b062046d721d7c2ae1af7c ] In the new GC, we use a simple graph algorithm, Tarjan's Strongly Connected Components (SCC) algorithm, to find cyclic references. The algorithm visits every vertex exactly once using depth-first search (DFS). DFS starts by pushing an input vertex to a stack and assigning it a unique number. 
Two fields, index and lowlink, are initialised with the number, but lowlink could be updated later during DFS.

If a vertex has an edge to an unvisited inflight vertex, we visit it and do the same processing. So, we will have vertices in the stack in the order they appear and number them consecutively in the same order.

If a vertex has a back-edge to a visited vertex in the stack, we update the predecessor's lowlink with the successor's index.

After iterating edges from the vertex, we check if its index equals its lowlink.

If the lowlink is different from the index, it shows there was a back-edge. Then, we go backtracking and propagate the lowlink to its predecessor and resume the previous edge iteration from the next edge.

If the lowlink is the same as the index, we pop vertices before and including the vertex from the stack. Then, the set of vertices is an SCC, possibly forming a cycle. At the same time, we move the vertices to unix_visited_vertices.

When we finish the algorithm, all vertices in each SCC will be linked via unix_vertex.scc_entry.

Let's take an example. We have a graph including five inflight vertices (F is not inflight):

  A -> B -> C -> D -> E (-> F)
       ^         |
       `---------'

Suppose that we start DFS from C. We will visit C, D, and B first and initialise their index and lowlink. Then, the stack looks like this:

  > B = (3, 3) (index, lowlink)
    D = (2, 2)
    C = (1, 1)

When checking B's edge to C, we update B's lowlink with C's index and propagate it to D.

    B = (3, 1) (index, lowlink)
  > D = (2, 1)
    C = (1, 1)

Next, we visit E, which has no edge to an inflight vertex.

  > E = (4, 4) (index, lowlink)
    B = (3, 1)
    D = (2, 1)
    C = (1, 1)

When we leave from E, its index and lowlink are the same, so we pop E from the stack as a single-vertex SCC. Next, we leave from B and D but do nothing because their lowlinks are different from their indices.

    B = (3, 1) (index, lowlink)
    D = (2, 1)
  > C = (1, 1)

Then, we leave from C, whose index and lowlink are the same, so we pop B, D and C as an SCC.

Last, we do DFS for the rest of the vertices, A, which is also a single-vertex SCC.

Finally, each unix_vertex.scc_entry is linked as follows:

  A -.  B -> C -> D  E -.
  ^  |  ^         |  ^  |
  `--'  `---------'  `--'

We use SCC later to decide whether we can garbage-collect the sockets.

Note that we still cannot detect SCC properly if an edge points to an embryo socket. The following two patches will sort it out.
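The walk-through above can be reproduced with a plain userspace implementation of Tarjan's algorithm (recursive here for brevity, whereas the kernel code below is iterative; this is a model of the algorithm only, not the kernel code). The adjacency matrix encodes A -> B -> C -> D -> E plus the back-edge D -> B, with F omitted since it is not inflight; the loop in main() starts the DFS at A rather than C, so the absolute index numbers differ from the walk-through, but the resulting SCCs are the same:

  #include <stdio.h>

  #define NV 5                             /* A, B, C, D, E              */

  static int adj[NV][NV];                  /* adj[v][w]: edge v -> w     */
  static int idx[NV], low[NV], on_stack[NV];
  static int stack[NV], top, counter;

  static int min(int a, int b) { return a < b ? a : b; }

  static void tarjan(int v)
  {
      idx[v] = low[v] = ++counter;         /* 0 means "unvisited"        */
      stack[top++] = v;
      on_stack[v] = 1;

      for (int w = 0; w < NV; w++) {
          if (!adj[v][w])
              continue;
          if (!idx[w]) {                   /* unvisited: go deeper       */
              tarjan(w);
              low[v] = min(low[v], low[w]);
          } else if (on_stack[w]) {        /* back/cross edge into stack */
              low[v] = min(low[v], idx[w]);
          }
      }

      if (idx[v] == low[v]) {              /* v is the root of an SCC    */
          int w;

          printf("SCC:");
          do {
              w = stack[--top];
              on_stack[w] = 0;
              printf(" %c", 'A' + w);
          } while (w != v);
          printf("\n");
      }
  }

  int main(void)
  {
      adj[0][1] = adj[1][2] = adj[2][3] = adj[3][4] = 1;  /* A->B->C->D->E   */
      adj[3][1] = 1;                                      /* D->B (the loop) */

      for (int v = 0; v < NV; v++)
          if (!idx[v])
              tarjan(v);
      /* prints: "SCC: E", "SCC: D C B", "SCC: A" */
      return 0;
  }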
Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 3484f063172dd88776b062046d721d7c2ae1af7c) Signed-off-by: Lee Jones --- include/net/af_unix.h | 3 +++ net/unix/garbage.c | 46 +++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 47 insertions(+), 2 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 9d51d675cc9f..e6f7bba19152 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -32,8 +32,11 @@ void wait_for_unix_gc(struct scm_fp_list *fpl); struct unix_vertex { struct list_head edges; struct list_head entry; + struct list_head scc_entry; unsigned long out_degree; unsigned long index; + unsigned long lowlink; + bool on_stack; }; =20 struct unix_edge { diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 295dd1a7b8e0..cdeff548e130 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -251,11 +251,19 @@ static LIST_HEAD(unix_visited_vertices); static void __unix_walk_scc(struct unix_vertex *vertex) { unsigned long index =3D UNIX_VERTEX_INDEX_START; + LIST_HEAD(vertex_stack); struct unix_edge *edge; LIST_HEAD(edge_stack); =20 next_vertex: + /* Push vertex to vertex_stack. + * The vertex will be popped when finalising SCC later. + */ + vertex->on_stack =3D true; + list_add(&vertex->scc_entry, &vertex_stack); + vertex->index =3D index; + vertex->lowlink =3D index; index++; =20 /* Explore neighbour vertices (receivers of the current vertex's fd). */ @@ -283,12 +291,46 @@ static void __unix_walk_scc(struct unix_vertex *verte= x) edge =3D list_first_entry(&edge_stack, typeof(*edge), stack_entry); list_del_init(&edge->stack_entry); =20 + next_vertex =3D vertex; vertex =3D edge->predecessor->vertex; + + /* If the successor has a smaller lowlink, two vertices + * are in the same SCC, so propagate the smaller lowlink + * to skip SCC finalisation. + */ + vertex->lowlink =3D min(vertex->lowlink, next_vertex->lowlink); + } else if (next_vertex->on_stack) { + /* Loop detected by a back/cross edge. + * + * The successor is on vertex_stack, so two vertices are + * in the same SCC. If the successor has a smaller index, + * propagate it to skip SCC finalisation. + */ + vertex->lowlink =3D min(vertex->lowlink, next_vertex->index); + } else { + /* The successor was already grouped as another SCC */ } } =20 - /* Don't restart DFS from this vertex in unix_walk_scc(). */ - list_move_tail(&vertex->entry, &unix_visited_vertices); + if (vertex->index =3D=3D vertex->lowlink) { + struct list_head scc; + + /* SCC finalised. + * + * If the lowlink was not updated, all the vertices above on + * vertex_stack are in the same SCC. Group them using scc_entry. + */ + __list_cut_position(&scc, &vertex_stack, &vertex->scc_entry); + + list_for_each_entry_reverse(vertex, &scc, scc_entry) { + /* Don't restart DFS from this vertex in unix_walk_scc(). */ + list_move_tail(&vertex->entry, &unix_visited_vertices); + + vertex->on_stack =3D false; + } + + list_del(&scc); + } =20 /* Need backtracking ? 
*/ if (!list_empty(&edge_stack)) --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EB3EA1E2606; Wed, 21 May 2025 15:33:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841637; cv=none; b=EQ5KlwC/pi5XbjL1C0A0zX6jLKqX6SsGLoIeFJHfYuomzeON04b5OHssDSAmXHh5qY8L898Jib4ikoTJHk5SEj0gbDdXCdXVmAUE4izE5KaBqVGGE9LF6yRxzbujBwwmIOu89226sEgiTpjShpsoBeK8WRJPGqxAnJ4tuBWaXG4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841637; c=relaxed/simple; bh=BSy8Koe2LlIfi02MrdJ+5aDU4ObqpdpP/dGX8EvvOjE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=T+GqnHaJc7lbIHkfAJ0WebiMjyfI/Q9kri9X3f69VFv1oAOvuyUCtSybAa5cOWaeu89m+upAn4YjPr0sSEzqRYY76A9+nMYPflwJMNjCUBnT/kUwTa8i597ucdJnyG1hkVF6RPj4EcfPzJjdwnBbetN5MBVdwOYXTbGIjL79i58= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=U/hVibg9; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="U/hVibg9" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7DC52C4CEEB; Wed, 21 May 2025 15:33:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841636; bh=BSy8Koe2LlIfi02MrdJ+5aDU4ObqpdpP/dGX8EvvOjE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=U/hVibg9nuYnpKkpKuVlS/OFAvUOkLwN7QfLYvclxccteHzJxQtyZH4tBcDAcuqXF OXCm4arsAwcDfkfDBhzbVOZIIX9O3jzXuep9hXzC21mWUTxxV90/LenPw2Mtc+M4JE ZN5b+/b/98L6oNLQR+Twr+XjGYCf4FbB68LRDERhTPPq/gE6ItlaYvYbiblbrmCkPB ZgirHOFm4mmIc48rl1PGPo8arm7+tco+a7z5lIb9mDl8imOdjAFGbWh4TgG98HDimy WO5cxNG+rE3OXCASbD/rLL3uvv5BFem3QzWDOZ/dEEbUwsaYdoHsF0MP3muD2uoOxU Lb+rgPXaupRqg== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 14/27] af_unix: Save listener for embryo socket. Date: Wed, 21 May 2025 16:27:13 +0100 Message-ID: <20250521152920.1116756-15-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit aed6ecef55d70de3762ce41c561b7f547dbaf107 ] This is a prep patch for the following change, where we need to fetch the listening socket from the successor embryo socket during GC. We add a new field to struct unix_sock to save a pointer to a listening socket. We set it when connect() creates a new socket, and clear it when accept() is called. 
Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit aed6ecef55d70de3762ce41c561b7f547dbaf107) Signed-off-by: Lee Jones --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 5 ++++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index e6f7bba19152..624fea657518 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -83,6 +83,7 @@ struct unix_sock { struct path path; struct mutex iolock, bindlock; struct sock *peer; + struct sock *listener; struct unix_vertex *vertex; struct list_head link; unsigned long inflight; diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 658a1680a92e..6075ecbe40b2 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -954,6 +954,7 @@ static struct sock *unix_create1(struct net *net, struc= t socket *sock, int kern, sk->sk_max_ack_backlog =3D READ_ONCE(net->unx.sysctl_max_dgram_qlen); sk->sk_destruct =3D unix_sock_destructor; u =3D unix_sk(sk); + u->listener =3D NULL; u->inflight =3D 0; u->vertex =3D NULL; u->path.dentry =3D NULL; @@ -1558,6 +1559,7 @@ static int unix_stream_connect(struct socket *sock, s= truct sockaddr *uaddr, newsk->sk_type =3D sk->sk_type; init_peercred(newsk); newu =3D unix_sk(newsk); + newu->listener =3D other; RCU_INIT_POINTER(newsk->sk_wq, &newu->peer_wq); otheru =3D unix_sk(other); =20 @@ -1651,8 +1653,8 @@ static int unix_accept(struct socket *sock, struct so= cket *newsock, int flags, bool kern) { struct sock *sk =3D sock->sk; - struct sock *tsk; struct sk_buff *skb; + struct sock *tsk; int err; =20 err =3D -EOPNOTSUPP; @@ -1677,6 +1679,7 @@ static int unix_accept(struct socket *sock, struct so= cket *newsock, int flags, } =20 tsk =3D skb->sk; + unix_sk(tsk)->listener =3D NULL; skb_free_datagram(sk, skb); wake_up_interruptible(&unix_sk(sk)->peer_wait); =20 --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A9DFF2206AC; Wed, 21 May 2025 15:34:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841643; cv=none; b=DzRG0F49+Egji0E0NW08oqraEPJ4n1dQ11RLBvPc99HHpaZEgut6TrWKYlEa7QzqqO1PTSwZfCHJazKE1Wh+99jcnWnPRTu+D3LDhjx2VOH92QLslRZXznIuZ9lkk58EgIjRjpdchIr6vh2l/iYuU6C5qxfVMY+bcYq5c7bTWMw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841643; c=relaxed/simple; bh=eSfVZgrHS8VynzF1LFlvxACfI79Y+FAdwgbNMxCAqr4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=g6MIwot9rImO+bQ7HZKwRv1gHiSu+8MntI40UGGfZ/7GS/oMOB86oBu8GL26Ay2sfg9/nWhvNiVR21jYyPwEpZoZc7xwRDxiIBb7TRdDbZXy+5RW6AzubyFpsK0ABcT1Lf4ZRZAK12NdPSrarWhokPCuoYh6ezDtfmNgFXrisHk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=A1chswUg; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="A1chswUg" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8A1EAC4CEE4; Wed, 21 May 2025 15:34:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841643; bh=eSfVZgrHS8VynzF1LFlvxACfI79Y+FAdwgbNMxCAqr4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=A1chswUgNTH/Z4k/hLqCfpeiTQHBkTnA5bCOf20Eqv4/t1EETnCHW8VO8kL3VDhtW mBC79jmkest8lVadn6Gdm5GrOvSvpc4zAnSyODw1WNhnnxYXZGy6R48F5RNAaZCvaA GE8UVRtqR2IVH/IXBjenBv95RD+GoYNh5nBxXE4BxwfSeD4Hk9321RF37NYGfQtDck IYhOnUtcrHkmigDy952f1FUs6Vt9kS5pjaES23ThaasuvzxvNj7FrVaEGEH1PD0L2I 2wLRn4S4KiyMUTd/B9QwH5crYbTejxOcCl+VzkDhDrULTsloKBYBNnaS2YrLNMedjY qU0KiBFg3PDcg== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 15/27] af_unix: Fix up unix_edge.successor for embryo socket. Date: Wed, 21 May 2025 16:27:14 +0100 Message-ID: <20250521152920.1116756-16-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit dcf70df2048d27c5d186f013f101a4aefd63aa41 ] To garbage collect inflight AF_UNIX sockets, we must define the cyclic reference appropriately. This is a bit tricky if the loop consists of embryo sockets. Suppose that the fd of AF_UNIX socket A is passed to D and the fd B to C and that C and D are embryo sockets of A and B, respectively. It may appear that there are two separate graphs, A (-> D) and B (-> C), but this is not correct. A --. .-- B X C <-' `-> D Now, D holds A's refcount, and C has B's refcount, so unix_release() will never be called for A and B when we close() them. However, no one can call close() for D and C to free skbs holding refcounts of A and B because C/D is in A/B's receive queue, which should have been purged by unix_release() for A and B. So, here's another type of cyclic reference. When a fd of an AF_UNIX socket is passed to an embryo socket, the reference is indirectly held by its parent listening socket. .-> A .-> B | `- sk_receive_queue | `- sk_receive_queue | `- skb | `- skb | `- sk =3D=3D C | `- sk =3D=3D D | `- sk_receive_queue | `- sk_receive_queue | `- skb +---------' `- skb +-. | | `---------------------------------------------------------' Technically, the graph must be denoted as A <-> B instead of A (-> D) and B (-> C) to find such a cyclic reference without touching each socket's receive queue. .-> A --. .-- B <-. | X | =3D=3D A <-> B `-- C <-' `-> D --' We apply this fixup during GC by fetching the real successor by unix_edge_successor(). When we call accept(), we clear unix_sock.listener under unix_gc_lock not to confuse GC. 
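To make the scenario concrete, the userspace snippet below (illustrative only; error handling is omitted and the abstract socket name is arbitrary) builds the simplest instance of such an indirect cycle: a listener's own fd is passed to one of its never-accepted embryo children, so once both descriptors are closed the only remaining reference to the listener is the skb sitting in a receive queue that only the listener itself could have drained. With the successor fixup above, the GC sees this edge as a self-loop on the listener and can treat it as a cyclic reference:

  #include <string.h>
  #include <sys/socket.h>
  #include <sys/un.h>
  #include <unistd.h>

  /* Pass an fd over a connected AF_UNIX socket via SCM_RIGHTS. */
  static void send_fd(int via, int fd)
  {
      char data = 'x';
      struct iovec iov = { .iov_base = &data, .iov_len = 1 };
      union {
          char buf[CMSG_SPACE(sizeof(int))];
          struct cmsghdr align;
      } u;
      struct msghdr msg = {
          .msg_iov = &iov, .msg_iovlen = 1,
          .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
      };
      struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

      cmsg->cmsg_level = SOL_SOCKET;
      cmsg->cmsg_type = SCM_RIGHTS;
      cmsg->cmsg_len = CMSG_LEN(sizeof(int));
      memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
      sendmsg(via, &msg, 0);
  }

  int main(void)
  {
      struct sockaddr_un addr = { .sun_family = AF_UNIX };
      int listener = socket(AF_UNIX, SOCK_STREAM, 0);
      int client = socket(AF_UNIX, SOCK_STREAM, 0);

      /* Abstract address (leading NUL); the name is arbitrary. */
      strcpy(addr.sun_path + 1, "embryo-cycle-demo");
      bind(listener, (struct sockaddr *)&addr, sizeof(addr));
      listen(listener, 1);

      /* connect() creates an embryo in the listener's accept queue;
       * accept() is never called, so it stays an embryo. */
      connect(client, (struct sockaddr *)&addr, sizeof(addr));

      /* The listener's own fd now sits in the embryo's receive queue,
       * kept alive only through the listener itself. */
      send_fd(client, listener);

      close(client);
      close(listener);
      return 0;
  }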
Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit dcf70df2048d27c5d186f013f101a4aefd63aa41) Signed-off-by: Lee Jones --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 2 +- net/unix/garbage.c | 20 +++++++++++++++++++- 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 624fea657518..d7f589e14467 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -24,6 +24,7 @@ void unix_inflight(struct user_struct *user, struct file = *fp); void unix_notinflight(struct user_struct *user, struct file *fp); void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver); void unix_del_edges(struct scm_fp_list *fpl); +void unix_update_edges(struct unix_sock *receiver); int unix_prepare_fpl(struct scm_fp_list *fpl); void unix_destroy_fpl(struct scm_fp_list *fpl); void unix_gc(void); diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 6075ecbe40b2..4d8b2b2b9a70 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1679,7 +1679,7 @@ static int unix_accept(struct socket *sock, struct so= cket *newsock, int flags, } =20 tsk =3D skb->sk; - unix_sk(tsk)->listener =3D NULL; + unix_update_edges(unix_sk(tsk)); skb_free_datagram(sk, skb); wake_up_interruptible(&unix_sk(sk)->peer_wait); =20 diff --git a/net/unix/garbage.c b/net/unix/garbage.c index cdeff548e130..6ff7e0b5c544 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -101,6 +101,17 @@ struct unix_sock *unix_get_socket(struct file *filp) return NULL; } =20 +static struct unix_vertex *unix_edge_successor(struct unix_edge *edge) +{ + /* If an embryo socket has a fd, + * the listener indirectly holds the fd's refcnt. + */ + if (edge->successor->listener) + return unix_sk(edge->successor->listener)->vertex; + + return edge->successor->vertex; +} + static LIST_HEAD(unix_unvisited_vertices); =20 enum unix_vertex_index { @@ -209,6 +220,13 @@ void unix_del_edges(struct scm_fp_list *fpl) fpl->inflight =3D false; } =20 +void unix_update_edges(struct unix_sock *receiver) +{ + spin_lock(&unix_gc_lock); + receiver->listener =3D NULL; + spin_unlock(&unix_gc_lock); +} + int unix_prepare_fpl(struct scm_fp_list *fpl) { struct unix_vertex *vertex; @@ -268,7 +286,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex) =20 /* Explore neighbour vertices (receivers of the current vertex's fd). 
*/ list_for_each_entry(edge, &vertex->edges, vertex_entry) { - struct unix_vertex *next_vertex =3D edge->successor->vertex; + struct unix_vertex *next_vertex =3D unix_edge_successor(edge); =20 if (!next_vertex) continue; --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1D2D01607A4; Wed, 21 May 2025 15:34:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841652; cv=none; b=E69zIMC254M7jLclKZagHhpgvRsPVguKcubvLNxAdsQLBgi3ohuY4R2zPqs/lgMocCkAV9REN6UvD+koGywNJT+E2kB4MLFi0TB+fzUXmQppLH8f86nwiQ3lN0rXE80uBUuo1n7wx1dhnXvfZ2pLyMcecuSGbwMI6eap4FTqZTM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841652; c=relaxed/simple; bh=O03T3ChknsheJGHN0Jw6E9IVyQ1C+VFIXNJlYAu3PMo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rtr23Ev0H2lsK8ZSGe+UNBZiNEzMju0AhM+tphi8jHqH5dyyNVoM5DMLo2O6bxClw/pVHx4lZqxmGVmG0XLZt7CP+kNY10L+MTpe9h7agNvJruwgGJ6TOViCeTvBxtZJRKtXP6FI8f7vMmSVmmugi5pfpiPOI3Qc6L1oZ7bcui0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Pcz0BEPp; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Pcz0BEPp" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A00BAC4CEEB; Wed, 21 May 2025 15:34:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841650; bh=O03T3ChknsheJGHN0Jw6E9IVyQ1C+VFIXNJlYAu3PMo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Pcz0BEPpYj+99/HP0j/YUS8E7rAkEphaHTPZvCs/GyevM0WIdDzRLU5n0hQpfVIfC nCnM//FJLCsFNpP7LvyIq3x8rBYy+99crWZ9qK6hnH3nJI/fyC0NUxBF0NjTDWFzhV ATpOnsju4k+r0wqkLftO3yTuRzVYUAOuy45VLawAVwKhnN+AkJQp7BCxdQw813XfqM q4ILxrcLYOAKsit7dABugg/koZdEq0dE8w2Eq/7VVHM8JHZbyQVqb6F9R9Wf7GCHTg 9w4Cx/B6y/0+CuIcs9z1cbBlqnPG//0vQ7bsIkvRY3ingBE9gBU9cNM51lTOrpWyiT zxBjgxBMBoyAw== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 16/27] af_unix: Save O(n) setup of Tarjan's algo. Date: Wed, 21 May 2025 16:27:15 +0100 Message-ID: <20250521152920.1116756-17-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit ba31b4a4e1018f5844c6eb31734976e2184f2f9a ] Before starting Tarjan's algorithm, we need to mark all vertices as unvisited. We can save this O(n) setup by reserving two special indices (0, 1) and using two variables. The first time we link a vertex to unix_unvisited_vertices, we set unix_vertex_unvisited_index to index. 
During DFS, we can see that the index of unvisited vertices is the same as unix_vertex_unvisited_index. When we finalise SCC later, we set unix_vertex_grouped_index to each vertex's index. Then, we can know (i) that the vertex is on the stack if the index of a visited vertex is >=3D 2 and (ii) that it is not on the stack and belongs to a different SCC if the index is unix_vertex_grouped_index. After the whole algorithm, all indices of vertices are set as unix_vertex_grouped_index. Next time we start DFS, we know that all unvisited vertices have unix_vertex_grouped_index, and we can use unix_vertex_unvisited_index as the not-on-stack marker. To use the same variable in __unix_walk_scc(), we can swap unix_vertex_(grouped|unvisited)_index at the end of Tarjan's algorithm. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-10-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit ba31b4a4e1018f5844c6eb31734976e2184f2f9a) Signed-off-by: Lee Jones --- include/net/af_unix.h | 1 - net/unix/garbage.c | 26 +++++++++++++++----------- 2 files changed, 15 insertions(+), 12 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index d7f589e14467..ffbc7322e41b 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -37,7 +37,6 @@ struct unix_vertex { unsigned long out_degree; unsigned long index; unsigned long lowlink; - bool on_stack; }; =20 struct unix_edge { diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 6ff7e0b5c544..feae6c17b291 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -115,16 +115,20 @@ static struct unix_vertex *unix_edge_successor(struct= unix_edge *edge) static LIST_HEAD(unix_unvisited_vertices); =20 enum unix_vertex_index { - UNIX_VERTEX_INDEX_UNVISITED, + UNIX_VERTEX_INDEX_MARK1, + UNIX_VERTEX_INDEX_MARK2, UNIX_VERTEX_INDEX_START, }; =20 +static unsigned long unix_vertex_unvisited_index =3D UNIX_VERTEX_INDEX_MAR= K1; + static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) { struct unix_vertex *vertex =3D edge->predecessor->vertex; =20 if (!vertex) { vertex =3D list_first_entry(&fpl->vertices, typeof(*vertex), entry); + vertex->index =3D unix_vertex_unvisited_index; vertex->out_degree =3D 0; INIT_LIST_HEAD(&vertex->edges); =20 @@ -265,6 +269,7 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) } =20 static LIST_HEAD(unix_visited_vertices); +static unsigned long unix_vertex_grouped_index =3D UNIX_VERTEX_INDEX_MARK2; =20 static void __unix_walk_scc(struct unix_vertex *vertex) { @@ -274,10 +279,10 @@ static void __unix_walk_scc(struct unix_vertex *verte= x) LIST_HEAD(edge_stack); =20 next_vertex: - /* Push vertex to vertex_stack. + /* Push vertex to vertex_stack and mark it as on-stack + * (index >=3D UNIX_VERTEX_INDEX_START). * The vertex will be popped when finalising SCC later. */ - vertex->on_stack =3D true; list_add(&vertex->scc_entry, &vertex_stack); =20 vertex->index =3D index; @@ -291,7 +296,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex) if (!next_vertex) continue; =20 - if (next_vertex->index =3D=3D UNIX_VERTEX_INDEX_UNVISITED) { + if (next_vertex->index =3D=3D unix_vertex_unvisited_index) { /* Iterative deepening depth first search * * 1. Push a forward edge to edge_stack and set @@ -317,7 +322,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex) * to skip SCC finalisation. 
*/ vertex->lowlink =3D min(vertex->lowlink, next_vertex->lowlink); - } else if (next_vertex->on_stack) { + } else if (next_vertex->index !=3D unix_vertex_grouped_index) { /* Loop detected by a back/cross edge. * * The successor is on vertex_stack, so two vertices are @@ -344,7 +349,8 @@ static void __unix_walk_scc(struct unix_vertex *vertex) /* Don't restart DFS from this vertex in unix_walk_scc(). */ list_move_tail(&vertex->entry, &unix_visited_vertices); =20 - vertex->on_stack =3D false; + /* Mark vertex as off-stack. */ + vertex->index =3D unix_vertex_grouped_index; } =20 list_del(&scc); @@ -357,20 +363,18 @@ static void __unix_walk_scc(struct unix_vertex *verte= x) =20 static void unix_walk_scc(void) { - struct unix_vertex *vertex; - - list_for_each_entry(vertex, &unix_unvisited_vertices, entry) - vertex->index =3D UNIX_VERTEX_INDEX_UNVISITED; - /* Visit every vertex exactly once. * __unix_walk_scc() moves visited vertices to unix_visited_vertices. */ while (!list_empty(&unix_unvisited_vertices)) { + struct unix_vertex *vertex; + vertex =3D list_first_entry(&unix_unvisited_vertices, typeof(*vertex), e= ntry); __unix_walk_scc(vertex); } =20 list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); + swap(unix_vertex_unvisited_index, unix_vertex_grouped_index); } =20 static LIST_HEAD(gc_candidates); --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C7718217704; Wed, 21 May 2025 15:34:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841657; cv=none; b=sSG7VVGc0DSEpDytSKyzh3QerEPMMY+fhRa8sfVVcrI4pcbK73cO7z8wpuXKRjr/hoc1pyoMOkawd4vFwdahNMp5Y8x3a69gynGE2qRva2RRTnteHDpSZJ8k8/ARVTO4qFfq3YIkA5vwt8UbuXDyDmmflPvSYe8W1UfRTG4wBhQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841657; c=relaxed/simple; bh=uNlOJaA3Kj9exb8Mp76c1OZdZEWpdVv9D2KSAybA18Q=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WwjAeoTUcDr9DXYcgQyU7ODr9EZCot28O5EEoudm6C/gyQPsDn6zd9oFv5Jet1MFCxpfraU0RbLpn/bG4bROq03D5wmMQpN5N+UggQB+fbw+mP0YUD5fPmy+tBOFUfWIXNPwVpsDmb6mRJcAngsb2uMj+DUQjnkwa3NXhRp9ufA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=r/SiIbi8; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="r/SiIbi8" Received: by smtp.kernel.org (Postfix) with ESMTPSA id AC4F9C4CEE4; Wed, 21 May 2025 15:34:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841657; bh=uNlOJaA3Kj9exb8Mp76c1OZdZEWpdVv9D2KSAybA18Q=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=r/SiIbi87JL+YWRLguYSshkkgWbRjRQx6ooMpbxhPM02zq9suGVedlxL8OpsIIZoH 5T66yaq2/LAGg54Mlcp+AyHG+6nTk+aTZMu/Z15f+C50vuyA3B0dnAhma2TIhmCeGm wRewuomehLUdmTz4fsqA7arDDwep04K4rmqNNmNwdxbOVHUjJH8lf0quTSHQB9iVrf oDkUYdkAojPMNsU3ZrWJEnSyw+LwVmN7GqvofcPLAHcHB1AxQrgrGRUy+jqZnKhkFC FXLNiv3MJwpIDwkzi+jbVPlZtGNOgZPhBCG9y6B6WgOzYNgkEm4+aajDTPM9tvmopv w2iq2u1XbN3Zg== From: Lee Jones To: lee@kernel.org, "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 17/27] af_unix: Skip GC if no cycle exists. Date: Wed, 21 May 2025 16:27:16 +0100 Message-ID: <20250521152920.1116756-18-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 77e5593aebba823bcbcf2c4b58b07efcd63933b8 ] We do not need to run GC if there is no possible cyclic reference. We use unix_graph_maybe_cyclic to decide if we should run GC. If a fd of an AF_UNIX socket is passed to an already inflight AF_UNIX socket, they could form a cyclic reference. Then, we set true to unix_graph_maybe_cyclic and later run Tarjan's algorithm to group them into SCC. Once we run Tarjan's algorithm, we are 100% sure whether cyclic references exist or not. If there is no cycle, we set false to unix_graph_maybe_cyclic and can skip the entire garbage collection next time. When finalising SCC, we set true to unix_graph_maybe_cyclic if SCC consists of multiple vertices. Even if SCC is a single vertex, a cycle might exist as self-fd passing. Given the corner case is rare, we detect it by checking all edges of the vertex and set true to unix_graph_maybe_cyclic. With this change, __unix_gc() is just a spin_lock() dance in the normal usage. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-11-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 77e5593aebba823bcbcf2c4b58b07efcd63933b8) Signed-off-by: Lee Jones --- net/unix/garbage.c | 48 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 47 insertions(+), 1 deletion(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index feae6c17b291..8f0dc39bb72f 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -112,6 +112,19 @@ static struct unix_vertex *unix_edge_successor(struct = unix_edge *edge) return edge->successor->vertex; } =20 +static bool unix_graph_maybe_cyclic; + +static void unix_update_graph(struct unix_vertex *vertex) +{ + /* If the receiver socket is not inflight, no cyclic + * reference could be formed. 
+ */ + if (!vertex) + return; + + unix_graph_maybe_cyclic =3D true; +} + static LIST_HEAD(unix_unvisited_vertices); =20 enum unix_vertex_index { @@ -138,12 +151,16 @@ static void unix_add_edge(struct scm_fp_list *fpl, st= ruct unix_edge *edge) =20 vertex->out_degree++; list_add_tail(&edge->vertex_entry, &vertex->edges); + + unix_update_graph(unix_edge_successor(edge)); } =20 static void unix_del_edge(struct scm_fp_list *fpl, struct unix_edge *edge) { struct unix_vertex *vertex =3D edge->predecessor->vertex; =20 + unix_update_graph(unix_edge_successor(edge)); + list_del(&edge->vertex_entry); vertex->out_degree--; =20 @@ -227,6 +244,7 @@ void unix_del_edges(struct scm_fp_list *fpl) void unix_update_edges(struct unix_sock *receiver) { spin_lock(&unix_gc_lock); + unix_update_graph(unix_sk(receiver->listener)->vertex); receiver->listener =3D NULL; spin_unlock(&unix_gc_lock); } @@ -268,6 +286,26 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) unix_free_vertices(fpl); } =20 +static bool unix_scc_cyclic(struct list_head *scc) +{ + struct unix_vertex *vertex; + struct unix_edge *edge; + + /* SCC containing multiple vertices ? */ + if (!list_is_singular(scc)) + return true; + + vertex =3D list_first_entry(scc, typeof(*vertex), scc_entry); + + /* Self-reference or a embryo-listener circle ? */ + list_for_each_entry(edge, &vertex->edges, vertex_entry) { + if (unix_edge_successor(edge) =3D=3D vertex) + return true; + } + + return false; +} + static LIST_HEAD(unix_visited_vertices); static unsigned long unix_vertex_grouped_index =3D UNIX_VERTEX_INDEX_MARK2; =20 @@ -353,6 +391,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex) vertex->index =3D unix_vertex_grouped_index; } =20 + if (!unix_graph_maybe_cyclic) + unix_graph_maybe_cyclic =3D unix_scc_cyclic(&scc); + list_del(&scc); } =20 @@ -363,6 +404,8 @@ static void __unix_walk_scc(struct unix_vertex *vertex) =20 static void unix_walk_scc(void) { + unix_graph_maybe_cyclic =3D false; + /* Visit every vertex exactly once. * __unix_walk_scc() moves visited vertices to unix_visited_vertices. */ @@ -524,6 +567,9 @@ static void __unix_gc(struct work_struct *work) =20 spin_lock(&unix_gc_lock); =20 + if (!unix_graph_maybe_cyclic) + goto skip_gc; + unix_walk_scc(); =20 /* First, select candidates for garbage collection. Only @@ -633,7 +679,7 @@ static void __unix_gc(struct work_struct *work) =20 /* All candidates should have been detached by now. */ WARN_ON_ONCE(!list_empty(&gc_candidates)); - +skip_gc: /* Paired with READ_ONCE() in wait_for_unix_gc(). 
*/ WRITE_ONCE(gc_in_progress, false); =20 --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 51872220F4C; Wed, 21 May 2025 15:34:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841665; cv=none; b=ZDA6wT+GbPoym85xwhSHURfd1dEp2Tf5PG+fHBqUEZ1CYk9la0ANvYJCAz5oyVAk813yRk3pes2qhBWoa4y2NymnKoXCSUoeapLXSaczNlCJCSDPg2H/P/+L6Sv8Ut2lTIjVr9Nwe+yP1Q1ustkNs2uUuL7TinpqNQsvKSfI5eM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841665; c=relaxed/simple; bh=PEQNaL5xlKqj4lxc3qzk2ZmeiDwQy4A5YNUXp+GxMWI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=p9XTZfYVfCX9m4AqXHWs06eePfqhwQ6OUdmD0TvX5r1WPGXNWsFy/MMMmPmYZs31d4JM3ATOg+C8a26NJNHJoaglBIZ3v+10hO8rNiOZNeXE6j5bxoAQvDNWJLRcYsALugF5i03389uT26jxuaSznXSA1/7AmdozqT2XCwJmyC0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NKaa8S6T; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NKaa8S6T" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BD72DC4CEE4; Wed, 21 May 2025 15:34:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841664; bh=PEQNaL5xlKqj4lxc3qzk2ZmeiDwQy4A5YNUXp+GxMWI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NKaa8S6TCKtrW5AuS6+uVSvhHjGcuuCWlRayFbVQR/5nMMz165imPmRErOEsEpVk9 PEBDtfkDCCQSxfjBcCRRPQHitmhXSXQvxa02ri2/Iq5PppUxOoiI4jv86ypfQN5yIf nhPIEjIVo+H39WGTikOYGA9T7icpE81dBGXfZb4nK94jGnPAItEeJOlUZC/pKwBQZJ 9piSKK3b9AA6FGEMY7M/lXV0T4Rqx41liEuSmZYUZODPRCEzCoGGw2ZXa9/LtzNpYd UIseLoOBeYIK5SFu/IEPlrZKVkxN3Z80YN7LtkACOq3kv+fyxI7EKcfUV2aTKLO+xy Lbqyw9tgLswJA== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 18/27] af_unix: Avoid Tarjan's algorithm if unnecessary. Date: Wed, 21 May 2025 16:27:17 +0100 Message-ID: <20250521152920.1116756-19-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit ad081928a8b0f57f269df999a28087fce6f2b6ce ] Once a cyclic reference is formed, we need to run GC to check if there is dead SCC. However, we do not need to run Tarjan's algorithm if we know that the shape of the inflight graph has not been changed. If an edge is added/updated/deleted and the edge's successor is inflight, we set false to unix_graph_grouped, which means we need to re-classify SCC. Once we finalise SCC, we set true to unix_graph_grouped. 
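The flag effectively acts as a cache-validity bit for the SCC grouping; a rough standalone sketch of the pattern (hypothetical names, not the kernel functions) looks like this:

	#include <stdbool.h>

	static bool graph_grouped;	/* is the last SCC grouping still valid? */

	static void graph_changed(void)
	{
		/* called whenever an edge is added/updated/deleted */
		graph_grouped = false;
	}

	static void walk_scc(void)
	{
		/* full Tarjan pass; afterwards the grouping is trusted again */
		graph_grouped = true;
	}

	static void walk_scc_fast(void)
	{
		/* iterate the SCC lists cached by the previous full pass */
	}

	static void run_gc(void)
	{
		if (graph_grouped)
			walk_scc_fast();
		else
			walk_scc();
	}
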
While unix_graph_grouped is true, we can iterate the grouped SCC using vertex->scc_entry in unix_walk_scc_fast(). list_add() and list_for_each_entry_reverse() uses seem weird, but they are to keep the vertex order consistent and make writing test easier. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-12-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit ad081928a8b0f57f269df999a28087fce6f2b6ce) Signed-off-by: Lee Jones --- net/unix/garbage.c | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 8f0dc39bb72f..d25841ab2de4 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -113,6 +113,7 @@ static struct unix_vertex *unix_edge_successor(struct u= nix_edge *edge) } =20 static bool unix_graph_maybe_cyclic; +static bool unix_graph_grouped; =20 static void unix_update_graph(struct unix_vertex *vertex) { @@ -123,6 +124,7 @@ static void unix_update_graph(struct unix_vertex *verte= x) return; =20 unix_graph_maybe_cyclic =3D true; + unix_graph_grouped =3D false; } =20 static LIST_HEAD(unix_unvisited_vertices); @@ -144,6 +146,7 @@ static void unix_add_edge(struct scm_fp_list *fpl, stru= ct unix_edge *edge) vertex->index =3D unix_vertex_unvisited_index; vertex->out_degree =3D 0; INIT_LIST_HEAD(&vertex->edges); + INIT_LIST_HEAD(&vertex->scc_entry); =20 list_move_tail(&vertex->entry, &unix_unvisited_vertices); edge->predecessor->vertex =3D vertex; @@ -418,6 +421,26 @@ static void unix_walk_scc(void) =20 list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); swap(unix_vertex_unvisited_index, unix_vertex_grouped_index); + + unix_graph_grouped =3D true; +} + +static void unix_walk_scc_fast(void) +{ + while (!list_empty(&unix_unvisited_vertices)) { + struct unix_vertex *vertex; + struct list_head scc; + + vertex =3D list_first_entry(&unix_unvisited_vertices, typeof(*vertex), e= ntry); + list_add(&scc, &vertex->scc_entry); + + list_for_each_entry_reverse(vertex, &scc, scc_entry) + list_move_tail(&vertex->entry, &unix_visited_vertices); + + list_del(&scc); + } + + list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); } =20 static LIST_HEAD(gc_candidates); @@ -570,7 +593,10 @@ static void __unix_gc(struct work_struct *work) if (!unix_graph_maybe_cyclic) goto skip_gc; =20 - unix_walk_scc(); + if (unix_graph_grouped) + unix_walk_scc_fast(); + else + unix_walk_scc(); =20 /* First, select candidates for garbage collection. 
Only * in-flight sockets are considered, and from those only ones --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 27423217701; Wed, 21 May 2025 15:34:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841672; cv=none; b=GaVX9vdy0rKAQejtyf9gKlmSzoHXXr76S69/BVRpEVfVgQJeMAJ/18ilj86o8vE+dTqbcVbHiJc/giLBZtkBCbZPNOOiWQiYTRbEA7hJno+1IXfICfG2IPAWXAjP+LfLJowXc86KmmVwv1E5+iB8kWcBDunKqzR7oB3NB6zauxo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841672; c=relaxed/simple; bh=yu92KeGDMeX1CTsAO0+Ti1UIw/EsUEv/HsCIvlwqmuU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HkZ7qyb/KPuKIu76OKETPiQcmQ0YlOWACMrLsMq7DjkctZYlIjplH202aJaiDtC5g32eaJ2HE+3S4lsDPtpTtx5wIjQDLyBFWa1yb7uhLGEk95VjtQkbD5ugXnJVctgifxG/UOhZdWgZvnfaDqGaJgQFpT22upgECJFTELb64Zw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=oPs0WCLz; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="oPs0WCLz" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0819CC4CEE4; Wed, 21 May 2025 15:34:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841672; bh=yu92KeGDMeX1CTsAO0+Ti1UIw/EsUEv/HsCIvlwqmuU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=oPs0WCLzbY9CBXjRCeUhX3cIvRkUtAK4XkRN/HLSrXgidRtKmQ1tusEaboC3ViE5M GBEBb7gQscHsMacGlWKLkqSeuxHLEFYYiGhmRFNOvUhUhpxmXS6uJnuUoIEabz4sRP F16OGgv+XA1J9Zub0I8sjIdqgcAhqVyvP8VPnosO8hikhGxcWEb7KFmg88uC0OEMs8 8nLO2AfrobuzTFXT0hezNV1yUOFHDUWg0X+VrZQuPks5FtfIvS/AU6YIfrIFvhYx8u UUKroPVNcA0H1/7GwfHIq8651X2CaQlnC/CZbiLAVs8l7yyrAHd7LNmd0TkjlXi6cK 3Mnal3SfLZWWA== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 19/27] af_unix: Assign a unique index to SCC. Date: Wed, 21 May 2025 16:27:18 +0100 Message-ID: <20250521152920.1116756-20-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit bfdb01283ee8f2f3089656c3ff8f62bb072dabb2 ] The definition of the lowlink in Tarjan's algorithm is the smallest index of a vertex that is reachable with at most one back-edge in SCC. This is not useful for a cross-edge. If we start traversing from A in the following graph, the final lowlink of D is 3. The cross-edge here is one between D and C. 
A -> B -> D D =3D (4, 3) (index, lowlink) ^ | | C =3D (3, 1) | V | B =3D (2, 1) `--- C <--' A =3D (1, 1) This is because the lowlink of D is updated with the index of C. In the following patch, we detect a dead SCC by checking two conditions for each vertex. 1) vertex has no edge directed to another SCC (no bridge) 2) vertex's out_degree is the same as the refcount of its file If 1) is false, there is a receiver of all fds of the SCC and its ancestor SCC. To evaluate 1), we need to assign a unique index to each SCC and assign it to all vertices in the SCC. This patch changes the lowlink update logic for cross-edge so that in the example above, the lowlink of D is updated with the lowlink of C. A -> B -> D D =3D (4, 1) (index, lowlink) ^ | | C =3D (3, 1) | V | B =3D (2, 1) `--- C <--' A =3D (1, 1) Then, all vertices in the same SCC have the same lowlink, and we can quickly find the bridge connecting to different SCC if exists. However, it is no longer called lowlink, so we rename it to scc_index. (It's sometimes called lowpoint.) Also, we add a global variable to hold the last index used in DFS so that we do not reset the initial index in each DFS. This patch can be squashed to the SCC detection patch but is split deliberately for anyone wondering why lowlink is not used as used in the original Tarjan's algorithm and many reference implementations. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-13-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit bfdb01283ee8f2f3089656c3ff8f62bb072dabb2) Signed-off-by: Lee Jones --- include/net/af_unix.h | 2 +- net/unix/garbage.c | 29 +++++++++++++++-------------- 2 files changed, 16 insertions(+), 15 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index ffbc7322e41b..14d56b07a54d 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -36,7 +36,7 @@ struct unix_vertex { struct list_head scc_entry; unsigned long out_degree; unsigned long index; - unsigned long lowlink; + unsigned long scc_index; }; =20 struct unix_edge { diff --git a/net/unix/garbage.c b/net/unix/garbage.c index d25841ab2de4..2e66b57f3f0f 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -312,9 +312,8 @@ static bool unix_scc_cyclic(struct list_head *scc) static LIST_HEAD(unix_visited_vertices); static unsigned long unix_vertex_grouped_index =3D UNIX_VERTEX_INDEX_MARK2; =20 -static void __unix_walk_scc(struct unix_vertex *vertex) +static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *las= t_index) { - unsigned long index =3D UNIX_VERTEX_INDEX_START; LIST_HEAD(vertex_stack); struct unix_edge *edge; LIST_HEAD(edge_stack); @@ -326,9 +325,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex) */ list_add(&vertex->scc_entry, &vertex_stack); =20 - vertex->index =3D index; - vertex->lowlink =3D index; - index++; + vertex->index =3D *last_index; + vertex->scc_index =3D *last_index; + (*last_index)++; =20 /* Explore neighbour vertices (receivers of the current vertex's fd). */ list_for_each_entry(edge, &vertex->edges, vertex_entry) { @@ -358,30 +357,30 @@ static void __unix_walk_scc(struct unix_vertex *verte= x) next_vertex =3D vertex; vertex =3D edge->predecessor->vertex; =20 - /* If the successor has a smaller lowlink, two vertices - * are in the same SCC, so propagate the smaller lowlink + /* If the successor has a smaller scc_index, two vertices + * are in the same SCC, so propagate the smaller scc_index * to skip SCC finalisation. 
*/ - vertex->lowlink =3D min(vertex->lowlink, next_vertex->lowlink); + vertex->scc_index =3D min(vertex->scc_index, next_vertex->scc_index); } else if (next_vertex->index !=3D unix_vertex_grouped_index) { /* Loop detected by a back/cross edge. * - * The successor is on vertex_stack, so two vertices are - * in the same SCC. If the successor has a smaller index, + * The successor is on vertex_stack, so two vertices are in + * the same SCC. If the successor has a smaller *scc_index*, * propagate it to skip SCC finalisation. */ - vertex->lowlink =3D min(vertex->lowlink, next_vertex->index); + vertex->scc_index =3D min(vertex->scc_index, next_vertex->scc_index); } else { /* The successor was already grouped as another SCC */ } } =20 - if (vertex->index =3D=3D vertex->lowlink) { + if (vertex->index =3D=3D vertex->scc_index) { struct list_head scc; =20 /* SCC finalised. * - * If the lowlink was not updated, all the vertices above on + * If the scc_index was not updated, all the vertices above on * vertex_stack are in the same SCC. Group them using scc_entry. */ __list_cut_position(&scc, &vertex_stack, &vertex->scc_entry); @@ -407,6 +406,8 @@ static void __unix_walk_scc(struct unix_vertex *vertex) =20 static void unix_walk_scc(void) { + unsigned long last_index =3D UNIX_VERTEX_INDEX_START; + unix_graph_maybe_cyclic =3D false; =20 /* Visit every vertex exactly once. @@ -416,7 +417,7 @@ static void unix_walk_scc(void) struct unix_vertex *vertex; =20 vertex =3D list_first_entry(&unix_unvisited_vertices, typeof(*vertex), e= ntry); - __unix_walk_scc(vertex); + __unix_walk_scc(vertex, &last_index); } =20 list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4025D22127A; Wed, 21 May 2025 15:34:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841679; cv=none; b=qh2yhxsRWaDY5UMu022GpAdxD57iPC7dqEH2CBl4cfeNXV7Uz5QbJvp8YV+yvaHjEd714aAXQQ2qQWEIKQoN77PUBM+CmV9N3+K8E8Ee8pSfiqL3vrIbV/Cpn6Wd3LcU+QKbf5Qs/lBoZwfSiqJhX9/y+wmKUnTlREpSZ6EcAsQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841679; c=relaxed/simple; bh=HEKPamw4N0SlwDUEynkY5dfMAkCnnRNBCS9aAC+V1kg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kTpQZNfA0tuuqETiGgYljfc3ES5aoZvS/biKyeukpxJGrcaYojJXWQ1S9/qQdLPXHYs/imcDNqRXRh5Irl09d3SmAhYZp8LVDvebGQs50y9zl1stfgR2Vu6XaBbO8Xrp0o15fm+qPrBtoBzfP6ykQLFuc+JKy69E6Bmvk23Oc2s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NLkJs1ml; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NLkJs1ml" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2D4A9C4CEE7; Wed, 21 May 2025 15:34:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841679; bh=HEKPamw4N0SlwDUEynkY5dfMAkCnnRNBCS9aAC+V1kg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NLkJs1mlGZOcEy/Rd5EZWCQArgVcPmHGtil0q3tehtMMwKrpmFCmEjcE3YlErsjgY 
H4utljsy5djUbPijWEWwQBZk2p/IT+ZGsEWz90sABAkGcT19qhvkBtE7gijEIWK6JA Vek5jhv47XzspmDZgfQBv/DqK3A5gCXcfHVNXLQlzKaFlrVjc7pTLOkDAoueLtUoMU 9OaMhRUKdrNZJt/qp2OyEm+ODSFleJJi/LLeR+KD6i78Ycka/eMGrm9D5Vu52PddSX wCg/YV85iEoCm7XDa026Xrw0VLevXKsrEJirSVy1Vk5pnnjmr2N7aD8AE8kZ8ktJtT +4zbb3h8i20BA== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 20/27] af_unix: Detect dead SCC. Date: Wed, 21 May 2025 16:27:19 +0100 Message-ID: <20250521152920.1116756-21-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit a15702d8b3aad8ce5268c565bd29f0e02fd2db83 ] When iterating SCC, we call unix_vertex_dead() for each vertex to check if the vertex is close()d and has no bridge to another SCC. If both conditions are true for every vertex in SCC, we can execute garbage collection for all skb in the SCC. The actual garbage collection is done in the following patch, replacing the old implementation. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-14-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit a15702d8b3aad8ce5268c565bd29f0e02fd2db83) Signed-off-by: Lee Jones --- net/unix/garbage.c | 44 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 43 insertions(+), 1 deletion(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 2e66b57f3f0f..1f53c25fc71b 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -289,6 +289,39 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) unix_free_vertices(fpl); } =20 +static bool unix_vertex_dead(struct unix_vertex *vertex) +{ + struct unix_edge *edge; + struct unix_sock *u; + long total_ref; + + list_for_each_entry(edge, &vertex->edges, vertex_entry) { + struct unix_vertex *next_vertex =3D unix_edge_successor(edge); + + /* The vertex's fd can be received by a non-inflight socket. */ + if (!next_vertex) + return false; + + /* The vertex's fd can be received by an inflight socket in + * another SCC. + */ + if (next_vertex->scc_index !=3D vertex->scc_index) + return false; + } + + /* No receiver exists out of the same SCC. */ + + edge =3D list_first_entry(&vertex->edges, typeof(*edge), vertex_entry); + u =3D edge->predecessor; + total_ref =3D file_count(u->sk.sk_socket->file); + + /* If not close()d, total_ref > out_degree. */ + if (total_ref !=3D vertex->out_degree) + return false; + + return true; +} + static bool unix_scc_cyclic(struct list_head *scc) { struct unix_vertex *vertex; @@ -377,6 +410,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex,= unsigned long *last_inde =20 if (vertex->index =3D=3D vertex->scc_index) { struct list_head scc; + bool scc_dead =3D true; =20 /* SCC finalised. * @@ -391,6 +425,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex,= unsigned long *last_inde =20 /* Mark vertex as off-stack. 
*/ vertex->index =3D unix_vertex_grouped_index; + + if (scc_dead) + scc_dead =3D unix_vertex_dead(vertex); } =20 if (!unix_graph_maybe_cyclic) @@ -431,13 +468,18 @@ static void unix_walk_scc_fast(void) while (!list_empty(&unix_unvisited_vertices)) { struct unix_vertex *vertex; struct list_head scc; + bool scc_dead =3D true; =20 vertex =3D list_first_entry(&unix_unvisited_vertices, typeof(*vertex), e= ntry); list_add(&scc, &vertex->scc_entry); =20 - list_for_each_entry_reverse(vertex, &scc, scc_entry) + list_for_each_entry_reverse(vertex, &scc, scc_entry) { list_move_tail(&vertex->entry, &unix_visited_vertices); =20 + if (scc_dead) + scc_dead =3D unix_vertex_dead(vertex); + } + list_del(&scc); } =20 --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D0F94229B1A; Wed, 21 May 2025 15:34:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841686; cv=none; b=nwjAX1JV5blAi3HFrUrMa4VI4qNgp2HDI3vn1jZSs7diIYwjVKmYd/Mg/LdG19hNjYdnWFpCjQGUUcHI3SHOetc3WS2H1ve3B0Dk03JkHv4/nvuGHDR+86qEC7l6uNQ/6+b9C1DrH4wp/VYDiSzkBCzyZJHtRJeSvtTVj0XrMpo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841686; c=relaxed/simple; bh=Z4yeiZqWZpuNmbt9uzWikv0yXPuruF94qxI5kAwDCSg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XUsyepezwES8rAXAVSwAqIGNz4alQsDCacBBruFySQZ1NsgoB22Q13TS40YAm8IgawWhGHVp15RVe4Aip1JOvfyegwoKR4D2ayJ5ORi6Hq6Ifb8rm/o0ht75m3QhH7WXhusD1O0+dB+ODmNQQoHroTMUkjQZkS2mAkNQbrL1euA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=SR4w3AP3; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="SR4w3AP3" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 670B9C4CEEB; Wed, 21 May 2025 15:34:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841686; bh=Z4yeiZqWZpuNmbt9uzWikv0yXPuruF94qxI5kAwDCSg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=SR4w3AP3mCN08c3pDl31W71B4LJP/WgDfH6dJB/z9lVQX3Xkx2RGk2WbXaRR/S0ox I9vFhdoqpjzb6rSnpVIOUiYurvsgj+oRvPqR7cKeVAExs3LNK+MyuPWFsyoWijL5UO hWpAn8f6obkOmglDexHmP4uBv1C8bb7XgfJwId6vfqohxS+oUtFxV+Db1FSEnjPXYT MHs2qGwj34LMzEs/SWSmz3FcPSuWQE5UMP/LaBTCERql/HjKlYAlZQMjTvWvGGO2S2 tGPc/o5NA0iTXtaOzuurWhXMXQa2l4u5wxxYPgXT5+2jhLF+l5v3mepvvKAdx+iVXq Il0pX8lWPN2nw== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 21/27] af_unix: Replace garbage collection algorithm. 
Date: Wed, 21 May 2025 16:27:20 +0100 Message-ID: <20250521152920.1116756-22-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 4090fa373f0e763c43610853d2774b5979915959 ] If we find a dead SCC during iteration, we call unix_collect_skb() to splice all skb in the SCC to the global sk_buff_head, hitlist. After iterating all SCC, we unlock unix_gc_lock and purge the queue. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-15-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 4090fa373f0e763c43610853d2774b5979915959) Signed-off-by: Lee Jones --- include/net/af_unix.h | 8 -- net/unix/af_unix.c | 12 -- net/unix/garbage.c | 318 +++++++++--------------------------------- 3 files changed, 64 insertions(+), 274 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 14d56b07a54d..47808e366731 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -19,9 +19,6 @@ static inline struct unix_sock *unix_get_socket(struct fi= le *filp) =20 extern spinlock_t unix_gc_lock; extern unsigned int unix_tot_inflight; - -void unix_inflight(struct user_struct *user, struct file *fp); -void unix_notinflight(struct user_struct *user, struct file *fp); void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver); void unix_del_edges(struct scm_fp_list *fpl); void unix_update_edges(struct unix_sock *receiver); @@ -85,12 +82,7 @@ struct unix_sock { struct sock *peer; struct sock *listener; struct unix_vertex *vertex; - struct list_head link; - unsigned long inflight; spinlock_t lock; - unsigned long gc_flags; -#define UNIX_GC_CANDIDATE 0 -#define UNIX_GC_MAYBE_CYCLE 1 struct socket_wq peer_wq; wait_queue_entry_t peer_wake; struct scm_stat scm_stat; diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 4d8b2b2b9a70..25f66adf47d1 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -955,12 +955,10 @@ static struct sock *unix_create1(struct net *net, str= uct socket *sock, int kern, sk->sk_destruct =3D unix_sock_destructor; u =3D unix_sk(sk); u->listener =3D NULL; - u->inflight =3D 0; u->vertex =3D NULL; u->path.dentry =3D NULL; u->path.mnt =3D NULL; spin_lock_init(&u->lock); - INIT_LIST_HEAD(&u->link); mutex_init(&u->iolock); /* single task reading lock */ mutex_init(&u->bindlock); /* single task binding lock */ init_waitqueue_head(&u->peer_wait); @@ -1744,8 +1742,6 @@ static inline bool too_many_unix_fds(struct task_stru= ct *p) =20 static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) { - int i; - if (too_many_unix_fds(current)) return -ETOOMANYREFS; =20 @@ -1757,9 +1753,6 @@ static int unix_attach_fds(struct scm_cookie *scm, st= ruct sk_buff *skb) if (!UNIXCB(skb).fp) return -ENOMEM; =20 - for (i =3D scm->fp->count - 1; i >=3D 0; i--) - unix_inflight(scm->fp->user, scm->fp->fp[i]); - if (unix_prepare_fpl(UNIXCB(skb).fp)) return -ENOMEM; =20 @@ -1768,15 +1761,10 @@ static int unix_attach_fds(struct scm_cookie *scm, = struct sk_buff *skb) =20 static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) { - int i; - scm->fp =3D UNIXCB(skb).fp; UNIXCB(skb).fp =3D NULL; =20 
unix_destroy_fpl(scm->fp); - - for (i =3D scm->fp->count - 1; i >=3D 0; i--) - unix_notinflight(scm->fp->user, scm->fp->fp[i]); } =20 static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 1f53c25fc71b..89ea71d9297b 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -322,6 +322,52 @@ static bool unix_vertex_dead(struct unix_vertex *verte= x) return true; } =20 +enum unix_recv_queue_lock_class { + U_RECVQ_LOCK_NORMAL, + U_RECVQ_LOCK_EMBRYO, +}; + +static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *h= itlist) +{ + struct unix_vertex *vertex; + + list_for_each_entry_reverse(vertex, scc, scc_entry) { + struct sk_buff_head *queue; + struct unix_edge *edge; + struct unix_sock *u; + + edge =3D list_first_entry(&vertex->edges, typeof(*edge), vertex_entry); + u =3D edge->predecessor; + queue =3D &u->sk.sk_receive_queue; + + spin_lock(&queue->lock); + + if (u->sk.sk_state =3D=3D TCP_LISTEN) { + struct sk_buff *skb; + + skb_queue_walk(queue, skb) { + struct sk_buff_head *embryo_queue =3D &skb->sk->sk_receive_queue; + + /* listener -> embryo order, the inversion never happens. */ + spin_lock_nested(&embryo_queue->lock, U_RECVQ_LOCK_EMBRYO); + skb_queue_splice_init(embryo_queue, hitlist); + spin_unlock(&embryo_queue->lock); + } + } else { + skb_queue_splice_init(queue, hitlist); + +#if IS_ENABLED(CONFIG_AF_UNIX_OOB) + if (u->oob_skb) { + kfree_skb(u->oob_skb); + u->oob_skb =3D NULL; + } +#endif + } + + spin_unlock(&queue->lock); + } +} + static bool unix_scc_cyclic(struct list_head *scc) { struct unix_vertex *vertex; @@ -345,7 +391,8 @@ static bool unix_scc_cyclic(struct list_head *scc) static LIST_HEAD(unix_visited_vertices); static unsigned long unix_vertex_grouped_index =3D UNIX_VERTEX_INDEX_MARK2; =20 -static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *las= t_index) +static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *las= t_index, + struct sk_buff_head *hitlist) { LIST_HEAD(vertex_stack); struct unix_edge *edge; @@ -430,7 +477,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex,= unsigned long *last_inde scc_dead =3D unix_vertex_dead(vertex); } =20 - if (!unix_graph_maybe_cyclic) + if (scc_dead) + unix_collect_skb(&scc, hitlist); + else if (!unix_graph_maybe_cyclic) unix_graph_maybe_cyclic =3D unix_scc_cyclic(&scc); =20 list_del(&scc); @@ -441,7 +490,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex,= unsigned long *last_inde goto prev_vertex; } =20 -static void unix_walk_scc(void) +static void unix_walk_scc(struct sk_buff_head *hitlist) { unsigned long last_index =3D UNIX_VERTEX_INDEX_START; =20 @@ -454,7 +503,7 @@ static void unix_walk_scc(void) struct unix_vertex *vertex; =20 vertex =3D list_first_entry(&unix_unvisited_vertices, typeof(*vertex), e= ntry); - __unix_walk_scc(vertex, &last_index); + __unix_walk_scc(vertex, &last_index, hitlist); } =20 list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); @@ -463,7 +512,7 @@ static void unix_walk_scc(void) unix_graph_grouped =3D true; } =20 -static void unix_walk_scc_fast(void) +static void unix_walk_scc_fast(struct sk_buff_head *hitlist) { while (!list_empty(&unix_unvisited_vertices)) { struct unix_vertex *vertex; @@ -480,279 +529,40 @@ static void unix_walk_scc_fast(void) scc_dead =3D unix_vertex_dead(vertex); } =20 + if (scc_dead) + unix_collect_skb(&scc, hitlist); + list_del(&scc); } =20 list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); } =20 -static 
LIST_HEAD(gc_candidates); -static LIST_HEAD(gc_inflight_list); - -/* Keep the number of times in flight count for the file - * descriptor if it is for an AF_UNIX socket. - */ -void unix_inflight(struct user_struct *user, struct file *filp) -{ - struct unix_sock *u =3D unix_get_socket(filp); - - spin_lock(&unix_gc_lock); - - if (u) { - if (!u->inflight) { - WARN_ON_ONCE(!list_empty(&u->link)); - list_add_tail(&u->link, &gc_inflight_list); - } else { - WARN_ON_ONCE(list_empty(&u->link)); - } - u->inflight++; - } - - spin_unlock(&unix_gc_lock); -} - -void unix_notinflight(struct user_struct *user, struct file *filp) -{ - struct unix_sock *u =3D unix_get_socket(filp); - - spin_lock(&unix_gc_lock); - - if (u) { - WARN_ON_ONCE(!u->inflight); - WARN_ON_ONCE(list_empty(&u->link)); - - u->inflight--; - if (!u->inflight) - list_del_init(&u->link); - } - - spin_unlock(&unix_gc_lock); -} - -static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), - struct sk_buff_head *hitlist) -{ - struct sk_buff *skb; - struct sk_buff *next; - - spin_lock(&x->sk_receive_queue.lock); - skb_queue_walk_safe(&x->sk_receive_queue, skb, next) { - /* Do we have file descriptors ? */ - if (UNIXCB(skb).fp) { - bool hit =3D false; - /* Process the descriptors of this socket */ - int nfd =3D UNIXCB(skb).fp->count; - struct file **fp =3D UNIXCB(skb).fp->fp; - - while (nfd--) { - /* Get the socket the fd matches if it indeed does so */ - struct unix_sock *u =3D unix_get_socket(*fp++); - - /* Ignore non-candidates, they could have been added - * to the queues after starting the garbage collection - */ - if (u && test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) { - hit =3D true; - - func(u); - } - } - if (hit && hitlist !=3D NULL) { - __skb_unlink(skb, &x->sk_receive_queue); - __skb_queue_tail(hitlist, skb); - } - } - } - spin_unlock(&x->sk_receive_queue.lock); -} - -static void scan_children(struct sock *x, void (*func)(struct unix_sock *), - struct sk_buff_head *hitlist) -{ - if (x->sk_state !=3D TCP_LISTEN) { - scan_inflight(x, func, hitlist); - } else { - struct sk_buff *skb; - struct sk_buff *next; - struct unix_sock *u; - LIST_HEAD(embryos); - - /* For a listening socket collect the queued embryos - * and perform a scan on them as well. - */ - spin_lock(&x->sk_receive_queue.lock); - skb_queue_walk_safe(&x->sk_receive_queue, skb, next) { - u =3D unix_sk(skb->sk); - - /* An embryo cannot be in-flight, so it's safe - * to use the list link. 
- */ - WARN_ON_ONCE(!list_empty(&u->link)); - list_add_tail(&u->link, &embryos); - } - spin_unlock(&x->sk_receive_queue.lock); - - while (!list_empty(&embryos)) { - u =3D list_entry(embryos.next, struct unix_sock, link); - scan_inflight(&u->sk, func, hitlist); - list_del_init(&u->link); - } - } -} - -static void dec_inflight(struct unix_sock *usk) -{ - usk->inflight--; -} - -static void inc_inflight(struct unix_sock *usk) -{ - usk->inflight++; -} - -static void inc_inflight_move_tail(struct unix_sock *u) -{ - u->inflight++; - - /* If this still might be part of a cycle, move it to the end - * of the list, so that it's checked even if it was already - * passed over - */ - if (test_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags)) - list_move_tail(&u->link, &gc_candidates); -} - static bool gc_in_progress; =20 static void __unix_gc(struct work_struct *work) { struct sk_buff_head hitlist; - struct unix_sock *u, *next; - LIST_HEAD(not_cycle_list); - struct list_head cursor; =20 spin_lock(&unix_gc_lock); =20 - if (!unix_graph_maybe_cyclic) + if (!unix_graph_maybe_cyclic) { + spin_unlock(&unix_gc_lock); goto skip_gc; - - if (unix_graph_grouped) - unix_walk_scc_fast(); - else - unix_walk_scc(); - - /* First, select candidates for garbage collection. Only - * in-flight sockets are considered, and from those only ones - * which don't have any external reference. - * - * Holding unix_gc_lock will protect these candidates from - * being detached, and hence from gaining an external - * reference. Since there are no possible receivers, all - * buffers currently on the candidates' queues stay there - * during the garbage collection. - * - * We also know that no new candidate can be added onto the - * receive queues. Other, non candidate sockets _can_ be - * added to queue, so we must make sure only to touch - * candidates. - * - * Embryos, though never candidates themselves, affect which - * candidates are reachable by the garbage collector. Before - * being added to a listener's queue, an embryo may already - * receive data carrying SCM_RIGHTS, potentially making the - * passed socket a candidate that is not yet reachable by the - * collector. It becomes reachable once the embryo is - * enqueued. Therefore, we must ensure that no SCM-laden - * embryo appears in a (candidate) listener's queue between - * consecutive scan_children() calls. - */ - list_for_each_entry_safe(u, next, &gc_inflight_list, link) { - struct sock *sk =3D &u->sk; - long total_refs; - - total_refs =3D file_count(sk->sk_socket->file); - - WARN_ON_ONCE(!u->inflight); - WARN_ON_ONCE(total_refs < u->inflight); - if (total_refs =3D=3D u->inflight) { - list_move_tail(&u->link, &gc_candidates); - __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags); - __set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags); - - if (sk->sk_state =3D=3D TCP_LISTEN) { - unix_state_lock_nested(sk, U_LOCK_GC_LISTENER); - unix_state_unlock(sk); - } - } - } - - /* Now remove all internal in-flight reference to children of - * the candidates. - */ - list_for_each_entry(u, &gc_candidates, link) - scan_children(&u->sk, dec_inflight, NULL); - - /* Restore the references for children of all candidates, - * which have remaining references. Do this recursively, so - * only those remain, which form cyclic references. - * - * Use a "cursor" link, to make the list traversal safe, even - * though elements might be moved about. 
- */ - list_add(&cursor, &gc_candidates); - while (cursor.next !=3D &gc_candidates) { - u =3D list_entry(cursor.next, struct unix_sock, link); - - /* Move cursor to after the current position. */ - list_move(&cursor, &u->link); - - if (u->inflight) { - list_move_tail(&u->link, ¬_cycle_list); - __clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags); - scan_children(&u->sk, inc_inflight_move_tail, NULL); - } } - list_del(&cursor); =20 - /* Now gc_candidates contains only garbage. Restore original - * inflight counters for these as well, and remove the skbuffs - * which are creating the cycle(s). - */ - skb_queue_head_init(&hitlist); - list_for_each_entry(u, &gc_candidates, link) { - scan_children(&u->sk, inc_inflight, &hitlist); + __skb_queue_head_init(&hitlist); =20 -#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - if (u->oob_skb) { - kfree_skb(u->oob_skb); - u->oob_skb =3D NULL; - } -#endif - } - - /* not_cycle_list contains those sockets which do not make up a - * cycle. Restore these to the inflight list. - */ - while (!list_empty(¬_cycle_list)) { - u =3D list_entry(not_cycle_list.next, struct unix_sock, link); - __clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags); - list_move_tail(&u->link, &gc_inflight_list); - } + if (unix_graph_grouped) + unix_walk_scc_fast(&hitlist); + else + unix_walk_scc(&hitlist); =20 spin_unlock(&unix_gc_lock); =20 - /* Here we are. Hitlist is filled. Die. */ __skb_queue_purge(&hitlist); - - spin_lock(&unix_gc_lock); - - /* All candidates should have been detached by now. */ - WARN_ON_ONCE(!list_empty(&gc_candidates)); skip_gc: - /* Paired with READ_ONCE() in wait_for_unix_gc(). */ WRITE_ONCE(gc_in_progress, false); - - spin_unlock(&unix_gc_lock); } =20 static DECLARE_WORK(unix_gc_work, __unix_gc); --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AD63922157E; Wed, 21 May 2025 15:34:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841693; cv=none; b=qdOT9s8WuC8kINzQseoWvEwGFp5/346DeQ9v2yYZlqY791KUxjUFBqEvXQGR6D9flvIqzg/jT/Eru6lgtJje4YiAnIekcJcl7FoNsTjQYoUSghpmGO4YPBH65FnK5L0tEbpidEMkxJhl1p3xkYlAIffqcuForMXLmUPJg6FanYE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841693; c=relaxed/simple; bh=ERiCiUGqcRQ1G1WKLLAB8HnGpZKvb8P/HcN8e3Z+wu8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=azXJ0YY7BVgmWUUkknBlYBZFwwyqBRIrl1utp306ON7d5jUBaBsr7h9dDaPGJp7N0Mot9sKYHkTBC2KCKzBPKA5SvBc2KbqiO+2ZsRkBtvMXqcbqK4SDrr2VdDUyD/gYQAwQmuVFa/as1MRB2xAvgIaSttR2BYBXXcW5auuXvU4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=npabVo0V; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="npabVo0V" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 969CCC4CEE4; Wed, 21 May 2025 15:34:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841693; bh=ERiCiUGqcRQ1G1WKLLAB8HnGpZKvb8P/HcN8e3Z+wu8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; 
b=npabVo0VCqqr9MC01PtJU4I23drif1Zf7URdVhwAc+nC0Q07mCjuF3WqYNf1hVvkJ HNMGcNjvUo4pTuIS7aOQ4c2xauA0mPRLf91ePdXyMtNOH6H68nSjfqlT8xNHMtMN6z jU/DO5GI81fP7XAw2pnCpXeaxrp4KOPmJ99lc/cYJAMiifDXCZZmOwpiqXeAV/hXh4 zuW4xH0mIyMJTituBavBC48IMImsKOG5Qt1Alf5NTWOJCzmCinN9Ap5FT9MQzy/3Wp jVuhXM3/weCR4irf3NIWj05jRFofArqc63GDpd67EqDq5mQ37uPmBKTfkTGwackJSy xqmG2gVWd8IJA== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Alexander Mikhalitsyn , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 22/27] af_unix: Remove lock dance in unix_peek_fds(). Date: Wed, 21 May 2025 16:27:21 +0100 Message-ID: <20250521152920.1116756-23-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 118f457da9ed58a79e24b73c2ef0aa1987241f0e ] In the previous GC implementation, the shape of the inflight socket graph was not expected to change while GC was in progress. MSG_PEEK was tricky because it could install inflight fd silently and transform the graph. Let's say we peeked a fd, which was a listening socket, and accept()ed some embryo sockets from it. The garbage collection algorithm would have been confused because the set of sockets visited in scan_inflight() would change within the same GC invocation. That's why we placed spin_lock(&unix_gc_lock) and spin_unlock() in unix_peek_fds() with a fat comment. In the new GC implementation, we no longer garbage-collect the socket if it exists in another queue, that is, if it has a bridge to another SCC. Also, accept() will require the lock if it has edges. Thus, we need not do the complicated lock dance. Signed-off-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20240401173125.92184-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 118f457da9ed58a79e24b73c2ef0aa1987241f0e) Signed-off-by: Lee Jones --- include/net/af_unix.h | 1 - net/unix/af_unix.c | 42 ------------------------------------------ net/unix/garbage.c | 2 +- 3 files changed, 1 insertion(+), 44 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 47808e366731..4c726df56c0b 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -17,7 +17,6 @@ static inline struct unix_sock *unix_get_socket(struct fi= le *filp) } #endif =20 -extern spinlock_t unix_gc_lock; extern unsigned int unix_tot_inflight; void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver); void unix_del_edges(struct scm_fp_list *fpl); diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 25f66adf47d1..ce5b74dfd8ae 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1770,48 +1770,6 @@ static void unix_detach_fds(struct scm_cookie *scm, = struct sk_buff *skb) static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb) { scm->fp =3D scm_fp_dup(UNIXCB(skb).fp); - - /* - * Garbage collection of unix sockets starts by selecting a set of - * candidate sockets which have reference only from being in flight - * (total_refs =3D=3D inflight_refs). 
This condition is checked once dur= ing - * the candidate collection phase, and candidates are marked as such, so - * that non-candidates can later be ignored. While inflight_refs is - * protected by unix_gc_lock, total_refs (file count) is not, hence this - * is an instantaneous decision. - * - * Once a candidate, however, the socket must not be reinstalled into a - * file descriptor while the garbage collection is in progress. - * - * If the above conditions are met, then the directed graph of - * candidates (*) does not change while unix_gc_lock is held. - * - * Any operations that changes the file count through file descriptors - * (dup, close, sendmsg) does not change the graph since candidates are - * not installed in fds. - * - * Dequeing a candidate via recvmsg would install it into an fd, but - * that takes unix_gc_lock to decrement the inflight count, so it's - * serialized with garbage collection. - * - * MSG_PEEK is special in that it does not change the inflight count, - * yet does install the socket into an fd. The following lock/unlock - * pair is to ensure serialization with garbage collection. It must be - * done between incrementing the file count and installing the file into - * an fd. - * - * If garbage collection starts after the barrier provided by the - * lock/unlock, then it will see the elevated refcount and not mark this - * as a candidate. If a garbage collection is already in progress - * before the file count was incremented, then the lock/unlock pair will - * ensure that garbage collection is finished before progressing to - * installing the fd. - * - * (*) A -> B where B is on the queue of A or B is on the queue of C - * which is on the queue of listening socket A. - */ - spin_lock(&unix_gc_lock); - spin_unlock(&unix_gc_lock); } =20 static void unix_destruct_scm(struct sk_buff *skb) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 89ea71d9297b..12a4ec27e0d4 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -183,7 +183,7 @@ static void unix_free_vertices(struct scm_fp_list *fpl) } } =20 -DEFINE_SPINLOCK(unix_gc_lock); +static DEFINE_SPINLOCK(unix_gc_lock); unsigned int unix_tot_inflight; =20 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver) --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 06E2321A420; Wed, 21 May 2025 15:35:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841701; cv=none; b=ATR9gOxh+JC4kfA4hSrG4enLkRnboOmRhVkjiPFnB3NvQRoScZKfl2poE2T2UXMxBJarNxi5R4K37SRfrxLE+fgQoyIDh4IpNyssSM/7j7rUz1TLd5Gxn+xhmiowWT9piw4/W5irQuCRZdAxPEMdEAhCJpyAzUarEbgUEnsohv8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841701; c=relaxed/simple; bh=mi2zcw7qvB0tv8uFyF/DELKbUV30hjdh0e9RKqDxxGU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uT4SPG1avwVxw2HZXQ/FzA42exh7D4uq9gF0h97EvbiYIffa62X6vzKivVPBCvt6pKrXrSURk1PvWmCXAgsJ938IoVjS8/6kk92XBdKFheJNEOPzk2WEIGk6MqXnEHm7Hm8eoK+neZO8CKz0VKwCzn5dRpNBAUPRn3/OhRULAhc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=BCkT2Su1; arc=none 
smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="BCkT2Su1" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BD971C4CEE7; Wed, 21 May 2025 15:34:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841700; bh=mi2zcw7qvB0tv8uFyF/DELKbUV30hjdh0e9RKqDxxGU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=BCkT2Su1FhhgiFEpjoZL4y0uj+gfjB347kLWFMQGikn1WgY6rs5Q1NZnEvuNjF/wL 1v1vGIeapY77dt7I+24/Wm6Q0he3NCQYkTR6Nch077CYcScVGEaiGDJeKYspQg2Qlz y4inirtVxkeSKdReK2uBbgr1OIBH4KbsHlvPpqyD5/Nd9Axsu2yUCRdHlPp1xhZkoQ dAmqtoBIxKnA96EdLflwvSqD+fnSzvFYtF8JwVJBRk/AY1+iU1tZf3BVa2g9gg/kV2 fbyIStb+Sy76DBMSdNsaxSmSq7RzZGxnsqDjsg0vq5cA455fqRQZm5BcTFZtQQGIJC pYcvwNRyVgTDw== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org, kernel test robot Subject: [PATCH v6.1 23/27] af_unix: Try not to hold unix_gc_lock during accept(). Date: Wed, 21 May 2025 16:27:22 +0100 Message-ID: <20250521152920.1116756-24-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit fd86344823b521149bb31d91eba900ba3525efa6 ] Commit dcf70df2048d ("af_unix: Fix up unix_edge.successor for embryo socket.") added spin_lock(&unix_gc_lock) in accept() path, and it caused regression in a stress test as reported by kernel test robot. If the embryo socket is not part of the inflight graph, we need not hold the lock. To decide that in O(1) time and avoid the regression in the normal use case, 1. add a new stat unix_sk(sk)->scm_stat.nr_unix_fds 2. count the number of inflight AF_UNIX sockets in the receive queue under unix_state_lock() 3. move unix_update_edges() call under unix_state_lock() 4. 
avoid locking if nr_unix_fds is 0 in unix_update_edges() Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-lkp/202404101427.92a08551-oliver.sang@in= tel.com Signed-off-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20240413021928.20946-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni (cherry picked from commit fd86344823b521149bb31d91eba900ba3525efa6) Signed-off-by: Lee Jones --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 2 +- net/unix/garbage.c | 20 ++++++++++++++++---- 3 files changed, 18 insertions(+), 5 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 4c726df56c0b..b1f82d74339e 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -67,6 +67,7 @@ struct unix_skb_parms { =20 struct scm_stat { atomic_t nr_fds; + unsigned long nr_unix_fds; }; =20 #define UNIXCB(skb) (*(struct unix_skb_parms *)&((skb)->cb)) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index ce5b74dfd8ae..79b783a70c87 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1677,12 +1677,12 @@ static int unix_accept(struct socket *sock, struct = socket *newsock, int flags, } =20 tsk =3D skb->sk; - unix_update_edges(unix_sk(tsk)); skb_free_datagram(sk, skb); wake_up_interruptible(&unix_sk(sk)->peer_wait); =20 /* attach accepted sock to socket */ unix_state_lock(tsk); + unix_update_edges(unix_sk(tsk)); newsock->state =3D SS_CONNECTED; unix_sock_inherit_flags(sock, newsock); sock_graft(tsk, newsock); diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 12a4ec27e0d4..95240a59808f 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -209,6 +209,7 @@ void unix_add_edges(struct scm_fp_list *fpl, struct uni= x_sock *receiver) unix_add_edge(fpl, edge); } while (i < fpl->count_unix); =20 + receiver->scm_stat.nr_unix_fds +=3D fpl->count_unix; WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + fpl->count_unix); out: WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight + fpl->coun= t); @@ -222,6 +223,7 @@ void unix_add_edges(struct scm_fp_list *fpl, struct uni= x_sock *receiver) =20 void unix_del_edges(struct scm_fp_list *fpl) { + struct unix_sock *receiver; int i =3D 0; =20 spin_lock(&unix_gc_lock); @@ -235,6 +237,8 @@ void unix_del_edges(struct scm_fp_list *fpl) unix_del_edge(fpl, edge); } while (i < fpl->count_unix); =20 + receiver =3D fpl->edges[0].successor; + receiver->scm_stat.nr_unix_fds -=3D fpl->count_unix; WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - fpl->count_unix); out: WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight - fpl->coun= t); @@ -246,10 +250,18 @@ void unix_del_edges(struct scm_fp_list *fpl) =20 void unix_update_edges(struct unix_sock *receiver) { - spin_lock(&unix_gc_lock); - unix_update_graph(unix_sk(receiver->listener)->vertex); - receiver->listener =3D NULL; - spin_unlock(&unix_gc_lock); + /* nr_unix_fds is only updated under unix_state_lock(). + * If it's 0 here, the embryo socket is not part of the + * inflight graph, and GC will not see it, so no lock needed. 
+ */ + if (!receiver->scm_stat.nr_unix_fds) { + receiver->listener =3D NULL; + } else { + spin_lock(&unix_gc_lock); + unix_update_graph(unix_sk(receiver->listener)->vertex); + receiver->listener =3D NULL; + spin_unlock(&unix_gc_lock); + } } =20 int unix_prepare_fpl(struct scm_fp_list *fpl) --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0D2011EB9E1; Wed, 21 May 2025 15:35:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841709; cv=none; b=LS4/Q8FZtZQM8KTA1nXKCSdEldVFRTTnDSFyeQsITPb3vOsr3ZZyZtzkUofFhxBL+oVlRp8Jm+MB4rbP+vrR/YzNr05K6eVZ+ascLdAb5KoIdUdWXBxq0IEESZfXSYkI393qhiYgeuJrkoSkHKAceHjnf9YDEcLzDaquHlvmuc4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841709; c=relaxed/simple; bh=PT5Ok3WA0PklFQdEs6v5dKeUByJAMYK5CcQzWgFX76s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OI43OAxrc2FzDiPeHgTAmxMqcPgRd/7lwaLnnDdqQWrG81LlrSgZlJs4Y6954+jb6DwooLkF3uRS56Xan9FblMHZdt8ToAhdLBf1bvUuKw75dl9TFmUt75SN4JvR0q9wqbwCYsFI2Roa/hGQL6lT7bgOjnroy2jkj0touZsPcUo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=fZV+/0pE; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="fZV+/0pE" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 584DFC4CEEB; Wed, 21 May 2025 15:35:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841708; bh=PT5Ok3WA0PklFQdEs6v5dKeUByJAMYK5CcQzWgFX76s=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fZV+/0pEU7Ph/QKkON90pB+6Kxfc4TKYUYzKBuBNk+FMYPVKbxidM/Jq1GRkzNvJ9 zVBf9QModPwznwI6+oorV5ewq4eu++IBMIoJv4b4u1X/xchuQgoh5ya7GlKai5Ooao MQRrYRcR4dSSwNqXUCGMH6cr0FjQUwfdS0xI3CW0d0KqYFIZkYBK9nUXLI7rKyJFrC 5FyfwMNS/Ak8P1mBiWi3U2zWivoI4XMqjOJvcxjGTJCWC357KSWAs0UDVcbQufOLPS kPXRVZj5nDKlvosKcG7aLK2L7paLgjbTO4CyqIVynooESen6QZsOiEu/N7LKSSzcPM DCjs7g2cPmcgQ== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org, syzbot+f3f3eef1d2100200e593@syzkaller.appspotmail.com Subject: [PATCH v6.1 24/27] af_unix: Don't access successor in unix_del_edges() during GC. Date: Wed, 21 May 2025 16:27:23 +0100 Message-ID: <20250521152920.1116756-25-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 1af2dface5d286dd1f2f3405a0d6fa9f2c8fb998 ] syzbot reported use-after-free in unix_del_edges(). [0] What the repro does is basically repeat the following quickly. 1. 
pass a fd of an AF_UNIX socket to itself socketpair(AF_UNIX, SOCK_DGRAM, 0, [3, 4]) =3D 0 sendmsg(3, {..., msg_control=3D[{cmsg_len=3D20, cmsg_level=3DSOL_SOCKET, cmsg_type=3DSCM_RIGHTS, cmsg_data=3D[4]}= ], ...}, 0) =3D 0 2. pass other fds of AF_UNIX sockets to the socket above socketpair(AF_UNIX, SOCK_SEQPACKET, 0, [5, 6]) =3D 0 sendmsg(3, {..., msg_control=3D[{cmsg_len=3D48, cmsg_level=3DSOL_SOCKET, cmsg_type=3DSCM_RIGHTS, cmsg_data=3D[5, = 6]}], ...}, 0) =3D 0 3. close all sockets Here, two skb are created, and every unix_edge->successor is the first socket. Then, __unix_gc() will garbage-collect the two skb: (a) free skb with self-referencing fd (b) free skb holding other sockets After (a), the self-referencing socket will be scheduled to be freed later by the delayed_fput() task. syzbot repeated the sequences above (1. ~ 3.) quickly and triggered the task concurrently while GC was running. So, at (b), the socket was already freed, and accessing it was illegal. unix_del_edges() accesses the receiver socket as edge->successor to optimise GC. However, we should not do it during GC. Garbage-collecting sockets does not change the shape of the rest of the graph, so we need not call unix_update_graph() to update unix_graph_grouped when we purge skb. However, if we clean up all loops in the unix_walk_scc_fast() path, unix_graph_maybe_cyclic remains unchanged (true), and __unix_gc() will call unix_walk_scc_fast() continuously even though there is no socket to garbage-collect. To keep that optimisation while fixing UAF, let's add the same updating logic of unix_graph_maybe_cyclic in unix_walk_scc_fast() as done in unix_walk_scc() and __unix_walk_scc(). Note that when unix_del_edges() is called from other places, the receiver socket is always alive: - sendmsg: the successor's sk_refcnt is bumped by sock_hold() unix_find_other() for SOCK_DGRAM, connect() for SOCK_STREAM - recvmsg: the successor is the receiver, and its fd is alive [0]: BUG: KASAN: slab-use-after-free in unix_edge_successor net/unix/garbage.c:1= 09 [inline] BUG: KASAN: slab-use-after-free in unix_del_edge net/unix/garbage.c:165 [in= line] BUG: KASAN: slab-use-after-free in unix_del_edges+0x148/0x630 net/unix/garb= age.c:237 Read of size 8 at addr ffff888079c6e640 by task kworker/u8:6/1099 CPU: 0 PID: 1099 Comm: kworker/u8:6 Not tainted 6.9.0-rc4-next-20240418-syz= kaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Goo= gle 03/27/2024 Workqueue: events_unbound __unix_gc Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 unix_edge_successor net/unix/garbage.c:109 [inline] unix_del_edge net/unix/garbage.c:165 [inline] unix_del_edges+0x148/0x630 net/unix/garbage.c:237 unix_destroy_fpl+0x59/0x210 net/unix/garbage.c:298 unix_detach_fds net/unix/af_unix.c:1811 [inline] unix_destruct_scm+0x13e/0x210 net/unix/af_unix.c:1826 skb_release_head_state+0x100/0x250 net/core/skbuff.c:1127 skb_release_all net/core/skbuff.c:1138 [inline] __kfree_skb net/core/skbuff.c:1154 [inline] kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1190 __skb_queue_purge_reason include/linux/skbuff.h:3251 [inline] __skb_queue_purge include/linux/skbuff.h:3256 [inline] __unix_gc+0x1732/0x1830 net/unix/garbage.c:575 process_one_work kernel/workqueue.c:3218 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299 
worker_thread+0x86d/0xd70 kernel/workqueue.c:3380 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Allocated by task 14427: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:312 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3897 [inline] slab_alloc_node mm/slub.c:3957 [inline] kmem_cache_alloc_noprof+0x135/0x290 mm/slub.c:3964 sk_prot_alloc+0x58/0x210 net/core/sock.c:2074 sk_alloc+0x38/0x370 net/core/sock.c:2133 unix_create1+0xb4/0x770 unix_create+0x14e/0x200 net/unix/af_unix.c:1034 __sock_create+0x490/0x920 net/socket.c:1571 sock_create net/socket.c:1622 [inline] __sys_socketpair+0x33e/0x720 net/socket.c:1773 __do_sys_socketpair net/socket.c:1822 [inline] __se_sys_socketpair net/socket.c:1819 [inline] __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 1805: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2190 [inline] slab_free mm/slub.c:4393 [inline] kmem_cache_free+0x145/0x340 mm/slub.c:4468 sk_prot_free net/core/sock.c:2114 [inline] __sk_destruct+0x467/0x5f0 net/core/sock.c:2208 sock_put include/net/sock.h:1948 [inline] unix_release_sock+0xa8b/0xd20 net/unix/af_unix.c:665 unix_release+0x91/0xc0 net/unix/af_unix.c:1049 __sock_release net/socket.c:659 [inline] sock_close+0xbc/0x240 net/socket.c:1421 __fput+0x406/0x8b0 fs/file_table.c:422 delayed_fput+0x59/0x80 fs/file_table.c:445 process_one_work kernel/workqueue.c:3218 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299 worker_thread+0x86d/0xd70 kernel/workqueue.c:3380 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 The buggy address belongs to the object at ffff888079c6e000 which belongs to the cache UNIX of size 1920 The buggy address is located 1600 bytes inside of freed 1920-byte region [ffff888079c6e000, ffff888079c6e780) Reported-by: syzbot+f3f3eef1d2100200e593@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=3Df3f3eef1d2100200e593 Fixes: 77e5593aebba ("af_unix: Skip GC if no cycle exists.") Fixes: fd86344823b5 ("af_unix: Try not to hold unix_gc_lock during accept()= .") Signed-off-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20240419235102.31707-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni (cherry picked from commit 1af2dface5d286dd1f2f3405a0d6fa9f2c8fb998) Signed-off-by: Lee Jones --- net/unix/garbage.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 95240a59808f..d76450133e4f 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -158,11 +158,14 @@ static void unix_add_edge(struct scm_fp_list *fpl, st= ruct unix_edge *edge) unix_update_graph(unix_edge_successor(edge)); } =20 +static bool gc_in_progress; + static void unix_del_edge(struct scm_fp_list *fpl, struct unix_edge *edge) 
{ struct unix_vertex *vertex =3D edge->predecessor->vertex; =20 - unix_update_graph(unix_edge_successor(edge)); + if (!gc_in_progress) + unix_update_graph(unix_edge_successor(edge)); =20 list_del(&edge->vertex_entry); vertex->out_degree--; @@ -237,8 +240,10 @@ void unix_del_edges(struct scm_fp_list *fpl) unix_del_edge(fpl, edge); } while (i < fpl->count_unix); =20 - receiver =3D fpl->edges[0].successor; - receiver->scm_stat.nr_unix_fds -=3D fpl->count_unix; + if (!gc_in_progress) { + receiver =3D fpl->edges[0].successor; + receiver->scm_stat.nr_unix_fds -=3D fpl->count_unix; + } WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - fpl->count_unix); out: WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight - fpl->coun= t); @@ -526,6 +531,8 @@ static void unix_walk_scc(struct sk_buff_head *hitlist) =20 static void unix_walk_scc_fast(struct sk_buff_head *hitlist) { + unix_graph_maybe_cyclic =3D false; + while (!list_empty(&unix_unvisited_vertices)) { struct unix_vertex *vertex; struct list_head scc; @@ -543,6 +550,8 @@ static void unix_walk_scc_fast(struct sk_buff_head *hit= list) =20 if (scc_dead) unix_collect_skb(&scc, hitlist); + else if (!unix_graph_maybe_cyclic) + unix_graph_maybe_cyclic =3D unix_scc_cyclic(&scc); =20 list_del(&scc); } @@ -550,8 +559,6 @@ static void unix_walk_scc_fast(struct sk_buff_head *hit= list) list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); } =20 -static bool gc_in_progress; - static void __unix_gc(struct work_struct *work) { struct sk_buff_head hitlist; --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 35E1A21CA12; Wed, 21 May 2025 15:35:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841716; cv=none; b=i8luGjM7iJv86bjAllUoE8BGYM4Ds/HiyejGTnTF8t2OhHpi2FQGtY7rv2jXkCq3mu0VOPKT1+Ee1x8z7m74HhhPHAAgFZff0jSAdIUxgpVs4yNmDTP8Gl5YuKVnhx81B4XJMHgw6lMEeeIaOyrbqxtPkmIL0+5giMnSP/a3IMI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841716; c=relaxed/simple; bh=nFpv8UhQogZFwEw7DnfQ4TT/MPeOnir38BUAG2sE8mA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pGfgVz0NtbaJveDQBTQEmJxNKnZVbd8mZyBTzJ1vDlPz048+Emzu7PIKmhI5W8DzGqokhTcs1fFrT6ysrX4v6l8uhtXPc01lSuCNS9CAT994WyPNNna2wV6xa1vHrn46SUVfINLuvkaCa3aah7S3oP+DSFKAM9hgrMsv51EGoQ0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=XF0753SD; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="XF0753SD" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B0C58C4CEE4; Wed, 21 May 2025 15:35:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841715; bh=nFpv8UhQogZFwEw7DnfQ4TT/MPeOnir38BUAG2sE8mA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=XF0753SDmWZuRg6Sb/HL76J5YJxubvUwe/clloAepLadwa5XZ1Tf5RpSPkmBADHYA mP3WAtHrXstacimuADd1f9TgRRNjMA+WfHqNikTjvCdeq/0CelQ+1+vjlU+Pukvs71 td2GCIByqefukOrO/HA6f9dMphKqLeh4aY0H19ddcg96WVacl8x9qO2azmmntOKPmS 
bHYN4TqxNHHH34odTxUwPF0rBtS7xGNrkvjW+zt76zxpuR9ktAlcxdsVQpprNBVqug TnxcYsKx7jS/TqPTCiXNEd0/kNURINpH9huNLPX0mkYAlTVQBNXSUWimxfrrR2DZCI ORCgADxlOCxJg== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 25/27] af_unix: Add dead flag to struct scm_fp_list. Date: Wed, 21 May 2025 16:27:24 +0100 Message-ID: <20250521152920.1116756-26-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 7172dc93d621d5dc302d007e95ddd1311ec64283 ] Commit 1af2dface5d2 ("af_unix: Don't access successor in unix_del_edges() during GC.") fixed use-after-free by avoid accessing edge->successor while GC is in progress. However, there could be a small race window where another process could call unix_del_edges() while gc_in_progress is true and __skb_queue_purge() is on the way. So, we need another marker for struct scm_fp_list which indicates if the skb is garbage-collected. This patch adds dead flag in struct scm_fp_list and set it true before calling __skb_queue_purge(). Fixes: 1af2dface5d2 ("af_unix: Don't access successor in unix_del_edges() d= uring GC.") Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240508171150.50601-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 7172dc93d621d5dc302d007e95ddd1311ec64283) Signed-off-by: Lee Jones --- include/net/scm.h | 1 + net/core/scm.c | 1 + net/unix/garbage.c | 14 ++++++++++---- 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/include/net/scm.h b/include/net/scm.h index 19789096424d..0be0dc3eb1dc 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -31,6 +31,7 @@ struct scm_fp_list { short max; #ifdef CONFIG_UNIX bool inflight; + bool dead; struct list_head vertices; struct unix_edge *edges; #endif diff --git a/net/core/scm.c b/net/core/scm.c index 1ff78bd4ee83..cdd4e5befb14 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -91,6 +91,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_f= p_list **fplp) fpl->user =3D NULL; #if IS_ENABLED(CONFIG_UNIX) fpl->inflight =3D false; + fpl->dead =3D false; fpl->edges =3D NULL; INIT_LIST_HEAD(&fpl->vertices); #endif diff --git a/net/unix/garbage.c b/net/unix/garbage.c index d76450133e4f..1f8b8cdfcdc8 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -158,13 +158,11 @@ static void unix_add_edge(struct scm_fp_list *fpl, st= ruct unix_edge *edge) unix_update_graph(unix_edge_successor(edge)); } =20 -static bool gc_in_progress; - static void unix_del_edge(struct scm_fp_list *fpl, struct unix_edge *edge) { struct unix_vertex *vertex =3D edge->predecessor->vertex; =20 - if (!gc_in_progress) + if (!fpl->dead) unix_update_graph(unix_edge_successor(edge)); =20 list_del(&edge->vertex_entry); @@ -240,7 +238,7 @@ void unix_del_edges(struct scm_fp_list *fpl) unix_del_edge(fpl, edge); } while (i < fpl->count_unix); =20 - if (!gc_in_progress) { + if (!fpl->dead) { receiver =3D 
fpl->edges[0].successor; receiver->scm_stat.nr_unix_fds -=3D fpl->count_unix; } @@ -559,9 +557,12 @@ static void unix_walk_scc_fast(struct sk_buff_head *hi= tlist) list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); } =20 +static bool gc_in_progress; + static void __unix_gc(struct work_struct *work) { struct sk_buff_head hitlist; + struct sk_buff *skb; =20 spin_lock(&unix_gc_lock); =20 @@ -579,6 +580,11 @@ static void __unix_gc(struct work_struct *work) =20 spin_unlock(&unix_gc_lock); =20 + skb_queue_walk(&hitlist, skb) { + if (UNIXCB(skb).fp) + UNIXCB(skb).fp->dead =3D true; + } + __skb_queue_purge(&hitlist); skip_gc: WRITE_ONCE(gc_in_progress, false); --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D7B6122F16F; Wed, 21 May 2025 15:35:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841722; cv=none; b=qnT7RfORNrlFeVWkP1Ahbg0l6p0u9sxS8LYVy3GYmKtkdZ59lWUc1zofOX70/VbTCH6Vt9W/fcUHfJzfylW3uW6HabKamvFDw2gQ523P7imVWsrSXoKFkWKeqmC10QMsow1mhd3rD1rVHbvmX09unPlPF/heM5cGLzlNZo5KLg0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841722; c=relaxed/simple; bh=rEd3GIc9h95purN1mvqJDK9HVhwB0Wh8CFRZBngI7II=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=q8mYs+ccSanH6qf8q+UQxPC712wYfkiAd7fjF+27sxns2UYnrUiAeCWBQY7kioeXnwVySRwaVG4iCJjHAYjc61kvDPIOAGhR6oWaTEJDtJV6PWrkZ1KqVsJyJNAzntrTcvKfPrTr9LSZtbtYGF/OniKOYiE/SXpOUKxxsFvapzg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Tz3jmHst; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Tz3jmHst" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C3512C4CEE4; Wed, 21 May 2025 15:35:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841722; bh=rEd3GIc9h95purN1mvqJDK9HVhwB0Wh8CFRZBngI7II=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Tz3jmHstAkrGj/lMkMKrAHiTmXIfBzecXOUfFgFqyI+I3xqE0hazzQeikkvmPueOK DmiFea0evRcG77RncL6yDiT4n2MCs42FeIdFikLDRlM//lc+ozE1CezwwpNmTji2/P NXYzofCTYwsf0kFlApvjMCaBZ8OL1KV7+XL1b95n6EahhvUBRR/C1C+1D7VjgkACbM FoQO9W2DaM69rAFCsbYFOLhIoObLsN8d1fExgmQZ8P2+rXo2h0BHZ9Kb90H+277hyf 5ZMmGx53TPu1Odf8bdrgNJv4cDexd+tMLOlQ9UqRD27QG0A8Dp2ZoZOM1XumKmrlhG nXLkLPHe7DZYw== From: Lee Jones To: lee@kernel.org, "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.1 26/27] af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS Date: Wed, 21 May 2025 16:27:25 +0100 Message-ID: <20250521152920.1116756-27-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Michal Luczaj [ Upstream commit 041933a1ec7b4173a8e638cae4f8e394331d7e54 ] GC attempts to explicitly drop oob_skb's reference before purging the hit list. The problem is with embryos: kfree_skb(u->oob_skb) is never called on an embryo socket. The python script below [0] sends a listener's fd to its embryo as OOB data. While GC does collect the embryo's queue, it fails to drop the OOB skb's refcount. The skb which was in embryo's receive queue stays as unix_sk(sk)->oob_skb and keeps the listener's refcount [1]. Tell GC to dispose embryo's oob_skb. [0]: from array import array from socket import * addr =3D '\x00unix-oob' lis =3D socket(AF_UNIX, SOCK_STREAM) lis.bind(addr) lis.listen(1) s =3D socket(AF_UNIX, SOCK_STREAM) s.connect(addr) scm =3D (SOL_SOCKET, SCM_RIGHTS, array('i', [lis.fileno()])) s.sendmsg([b'x'], [scm], MSG_OOB) lis.close() [1] $ grep unix-oob /proc/net/unix $ ./unix-oob.py $ grep unix-oob /proc/net/unix 0000000000000000: 00000002 00000000 00000000 0001 02 0 @unix-oob 0000000000000000: 00000002 00000000 00010000 0001 01 6072 @unix-oob Fixes: 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") Signed-off-by: Michal Luczaj Reviewed-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni (cherry picked from commit 041933a1ec7b4173a8e638cae4f8e394331d7e54) Signed-off-by: Lee Jones --- net/unix/garbage.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 1f8b8cdfcdc8..dfe94a90ece4 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -342,6 +342,18 @@ enum unix_recv_queue_lock_class { U_RECVQ_LOCK_EMBRYO, }; =20 +static void unix_collect_queue(struct unix_sock *u, struct sk_buff_head *h= itlist) +{ + skb_queue_splice_init(&u->sk.sk_receive_queue, hitlist); + +#if IS_ENABLED(CONFIG_AF_UNIX_OOB) + if (u->oob_skb) { + WARN_ON_ONCE(skb_unref(u->oob_skb)); + u->oob_skb =3D NULL; + } +#endif +} + static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *h= itlist) { struct unix_vertex *vertex; @@ -365,18 +377,11 @@ static void unix_collect_skb(struct list_head *scc, s= truct sk_buff_head *hitlist =20 /* listener -> embryo order, the inversion never happens. 
*/ spin_lock_nested(&embryo_queue->lock, U_RECVQ_LOCK_EMBRYO); - skb_queue_splice_init(embryo_queue, hitlist); + unix_collect_queue(unix_sk(skb->sk), hitlist); spin_unlock(&embryo_queue->lock); } } else { - skb_queue_splice_init(queue, hitlist); - -#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - if (u->oob_skb) { - kfree_skb(u->oob_skb); - u->oob_skb =3D NULL; - } -#endif + unix_collect_queue(u, hitlist); } =20 spin_unlock(&queue->lock); --=20 2.49.0.1143.g0be31eac6b-goog From nobody Sun Dec 14 19:19:01 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C9580221DB5; Wed, 21 May 2025 15:35:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841730; cv=none; b=bWcyVB4a/VwcTI+0WWqXy7VogwJSV+gB50A1yPV+mvQCvxFbMmlg403gLifzuwqNFpvKQjeMtVjNOZn4ul6EM16RTpD8GllHCjJNGgLOgaWONzEwmcAVnWPJn5Uugc/q9aG2nfZOkVxOrcAvJSLAiJ9ogq4tLjafB060yhDNhYY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747841730; c=relaxed/simple; bh=SvoYm9cLZxsVGUhdWCKXpdKXEpXhlruEw3SXmHGIogA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OGjm5iKsCZt3HvMlCFiBjABhQfvE04fVcTYXEwLvz/MRytrQlTwR9MOf+GVdivevJGhCDyhIPvDXU1rwpqYb0jlbTNRMKZzMou8v6E4N28VU/3yCaQTg64632OLB2yRg+PgDR8/i2X5P3e/j5KnY4LEW6nlK7JzX20E6xam7Srk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=LHaSSF/F; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="LHaSSF/F" Received: by smtp.kernel.org (Postfix) with ESMTPSA id F383BC4CEE4; Wed, 21 May 2025 15:35:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747841730; bh=SvoYm9cLZxsVGUhdWCKXpdKXEpXhlruEw3SXmHGIogA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LHaSSF/Fy3guyqdDiWwPzUqDlUzYT6GKMDfJ389bO4fjwDTbytNCRXezZ7UHxA709 Sdb1e8VneNYpNnkWVceB+mqVblHC66pes2ZAV68PDMWohN6WZ+Albwsw71wvUGJ3tC +yV802jx2wBz4SazDoP1rvLJ9k2/QoclqlnKHElwtQN4UiIXgRErZk67qFFaMSEO6T /7Nldxppb/KMXbh23+w38saz3xkMcNEOdHgHiMVJOXHck7Nkn7E5P7YuxX47kIDsRz AG5tlA8FhKwuqpoRmW3Bn5Qx54IhD8v2QYnR0jPFiB5UFsB+VTwqjErNq6oz845m+x 9yqw0Hqj1T/hQ== From: Lee Jones To: lee@kernel.org, "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Kuniyuki Iwashima , Jens Axboe , Alexander Mikhalitsyn , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org, Shigeru Yoshida , syzkaller Subject: [PATCH v6.1 27/27] af_unix: Fix uninit-value in __unix_walk_scc() Date: Wed, 21 May 2025 16:27:26 +0100 Message-ID: <20250521152920.1116756-28-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1143.g0be31eac6b-goog In-Reply-To: <20250521152920.1116756-1-lee@kernel.org> References: <20250521152920.1116756-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Shigeru Yoshida [ Upstream commit 927fa5b3e4f52e0967bfc859afc98ad1c523d2d5 ] KMSAN reported uninit-value access in __unix_walk_scc() [1]. In the list_for_each_entry_reverse() loop, when the vertex's index equals it's scc_index, the loop uses the variable vertex as a temporary variable that points to a vertex in scc. And when the loop is finished, the variable vertex points to the list head, in this case scc, which is a local variable on the stack (more precisely, it's not even scc and might underflow the call stack of __unix_walk_scc(): container_of(&scc, struct unix_vertex, scc_entry)). However, the variable vertex is used under the label prev_vertex. So if the edge_stack is not empty and the function jumps to the prev_vertex label, the function will access invalid data on the stack. This causes the uninit-value access issue. Fix this by introducing a new temporary variable for the loop. [1] BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline] BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline] BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584 __unix_walk_scc net/unix/garbage.c:478 [inline] unix_walk_scc net/unix/garbage.c:526 [inline] __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393 kthread+0x3c4/0x530 kernel/kthread.c:389 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Uninit was stored to memory at: unix_walk_scc net/unix/garbage.c:526 [inline] __unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393 kthread+0x3c4/0x530 kernel/kthread.c:389 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Local variable entries created at: ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222 netdev_tracker_free include/linux/netdevice.h:4058 [inline] netdev_put include/linux/netdevice.h:4075 [inline] dev_put include/linux/netdevice.h:4101 [inline] update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_= mgmt.c:813 CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670= d65fc #32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 0= 4/01/2014 Workqueue: events_unbound __unix_gc Fixes: 3484f063172d ("af_unix: Detect Strongly Connected Components.") Reported-by: syzkaller Signed-off-by: Shigeru Yoshida Reviewed-by: 
Kuniyuki Iwashima Link: https://patch.msgid.link/20240702160428.10153-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski (cherry picked from commit 927fa5b3e4f52e0967bfc859afc98ad1c523d2d5) Signed-off-by: Lee Jones --- net/unix/garbage.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index dfe94a90ece4..23efb78fe9ef 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -476,6 +476,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex,= unsigned long *last_inde } =20 if (vertex->index =3D=3D vertex->scc_index) { + struct unix_vertex *v; struct list_head scc; bool scc_dead =3D true; =20 @@ -486,15 +487,15 @@ static void __unix_walk_scc(struct unix_vertex *verte= x, unsigned long *last_inde */ __list_cut_position(&scc, &vertex_stack, &vertex->scc_entry); =20 - list_for_each_entry_reverse(vertex, &scc, scc_entry) { + list_for_each_entry_reverse(v, &scc, scc_entry) { /* Don't restart DFS from this vertex in unix_walk_scc(). */ - list_move_tail(&vertex->entry, &unix_visited_vertices); + list_move_tail(&v->entry, &unix_visited_vertices); =20 /* Mark vertex as off-stack. */ - vertex->index =3D unix_vertex_grouped_index; + v->index =3D unix_vertex_grouped_index; =20 if (scc_dead) - scc_dead =3D unix_vertex_dead(vertex); + scc_dead =3D unix_vertex_dead(v); } =20 if (scc_dead) --=20 2.49.0.1143.g0be31eac6b-goog