From nobody Sun Dec 14 19:36:19 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D19BF1E51EF; Wed, 21 May 2025 14:49:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747838973; cv=none; b=p5V7ZwV7bPfquPxSTOpOjStXtr7pJccgBVCeTM3Sj33sgY36b45xUuMWILyVSim6NRDohC6KBax7j2hvN+yzp64kXxuKcUT+Ps6Q9BCNG6t3TOZ2FOg+Ckk1sxWnf8uAnCTdaYddGrWvZPXpvk/uaRA2cJVyPBw1ShlZbf52QO0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747838973; c=relaxed/simple; bh=68+ebJ4FubqnMCeq4jcRHGP5kbGYBkNvzs6YdR8rFLI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ek3WQL5xIgjaWiiCI+6yZz2mkvgvxFe1z7HSM7FayEPg6GBZ8MGa3ok491Ty3l5VTtS99j4GB3HpGRBb9trpV0XxyWefSwf6Ex45IvSf17jF9M64kTSef+HNN/SfYuNNBmWafYMk291kyMqQMJdG7n8vARl7IWBujqfUeH9ZYG4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=krf3vuOq; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="krf3vuOq" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4C0D3C4CEE7; Wed, 21 May 2025 14:49:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747838973; bh=68+ebJ4FubqnMCeq4jcRHGP5kbGYBkNvzs6YdR8rFLI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=krf3vuOqh6bceKHwkR17/SEjRqv1XUtGiqXBg6tv0GX01tR6Dojmk8SnfL2MJXD8E ndhUhLAOX3OnBVB1VSrexC1zL2mUlcTiCa9IXPkxH3KVwHvy6+41UaY9Y7my7ULqEn fo7J02WFmPNi+sUrcOShiIZeudH3v6y096I9ijPQIDh3gA+22WGpUyX3tpAsszPTyE PWzAd/QOJ+YOyYBbsWoxEc84a30wQoS0Qk7ro/Zkm9yqfsgrTK//ntOOtmtKJ6kqYe 
 lO+9a/NNArmZzy+iHThRi0mPGPZCWWEdaSOeg3WgdJXd+UZzIvCPaKueX+92funttl
 0a1UAa1bey+VA==
From: Lee Jones
To: lee@kernel.org, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Kuniyuki Iwashima, Jens Axboe, Sasha Levin,
    Michal Luczaj, Rao Shoaib, Pavel Begunkov,
    linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: stable@vger.kernel.org, Simon Horman
Subject: [PATCH v6.6 01/26] af_unix: Return struct unix_sock from unix_get_socket().
Date: Wed, 21 May 2025 14:45:09 +0000
Message-ID: <20250521144803.2050504-2-lee@kernel.org>
X-Mailer: git-send-email 2.49.0.1112.g889b7c5bd8-goog
In-Reply-To: <20250521144803.2050504-1-lee@kernel.org>
References: <20250521144803.2050504-1-lee@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Kuniyuki Iwashima

[ Upstream commit 5b17307bd0789edea0675d524a2b277b93bbde62 ]

Currently, unix_get_socket() returns struct sock, but after calling
it, we always cast it to unix_sk().

Let's return struct unix_sock from unix_get_socket().
Signed-off-by: Kuniyuki Iwashima
Acked-by: Pavel Begunkov
Reviewed-by: Simon Horman
Link: https://lore.kernel.org/r/20240123170856.41348-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski
(cherry picked from commit 5b17307bd0789edea0675d524a2b277b93bbde62)
Signed-off-by: Lee Jones
---
 include/net/af_unix.h |  2 +-
 net/unix/garbage.c    | 19 +++++++------------
 net/unix/scm.c        | 19 +++++++------------
 3 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 77bf30203d3cf..7a00d7ed527b6 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -14,7 +14,7 @@ void unix_destruct_scm(struct sk_buff *skb);
 void io_uring_destruct_scm(struct sk_buff *skb);
 void unix_gc(void);
 void wait_for_unix_gc(void);
-struct sock *unix_get_socket(struct file *filp);
+struct unix_sock *unix_get_socket(struct file *filp);
 struct sock *unix_peer_get(struct sock *sk);

 #define UNIX_HASH_MOD (256 - 1)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 2a758531e1027..38639766b9e7c 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -105,20 +105,15 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),

         while (nfd--) {
                 /* Get the socket the fd matches if it indeed does so */
-                struct sock *sk = unix_get_socket(*fp++);
+                struct unix_sock *u = unix_get_socket(*fp++);

-                if (sk) {
-                        struct unix_sock *u = unix_sk(sk);
+                /* Ignore non-candidates, they could have been added
+                 * to the queues after starting the garbage collection
+                 */
+                if (u && test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) {
+                        hit = true;

-                        /* Ignore non-candidates, they could
-                         * have been added to the queues after
-                         * starting the garbage collection
-                         */
-                        if (test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) {
-                                hit = true;
-
-                                func(u);
-                        }
+                        func(u);
                 }
         }
         if (hit && hitlist != NULL) {

diff --git a/net/unix/scm.c b/net/unix/scm.c
index e92f2fad64105..b5ae5ab167773 100644
--- a/net/unix/scm.c
+++ b/net/unix/scm.c
@@ -21,9 +21,8 @@ EXPORT_SYMBOL(gc_inflight_list);
 DEFINE_SPINLOCK(unix_gc_lock);
 EXPORT_SYMBOL(unix_gc_lock);

-struct sock *unix_get_socket(struct file *filp)
+struct unix_sock *unix_get_socket(struct file *filp)
 {
-        struct sock *u_sock = NULL;
         struct inode *inode = file_inode(filp);

         /* Socket ? */
@@ -34,10 +33,10 @@ struct sock *unix_get_socket(struct file *filp)

                 /* PF_UNIX ? */
                 if (s && ops && ops->family == PF_UNIX)
-                        u_sock = s;
+                        return unix_sk(s);
         }

-        return u_sock;
+        return NULL;
 }
 EXPORT_SYMBOL(unix_get_socket);

@@ -46,13 +45,11 @@ EXPORT_SYMBOL(unix_get_socket);
  */
 void unix_inflight(struct user_struct *user, struct file *fp)
 {
-        struct sock *s = unix_get_socket(fp);
+        struct unix_sock *u = unix_get_socket(fp);

         spin_lock(&unix_gc_lock);

-        if (s) {
-                struct unix_sock *u = unix_sk(s);
-
+        if (u) {
                 if (!u->inflight) {
                         BUG_ON(!list_empty(&u->link));
                         list_add_tail(&u->link, &gc_inflight_list);
@@ -69,13 +66,11 @@ void unix_inflight(struct user_struct *user, struct file *fp)

 void unix_notinflight(struct user_struct *user, struct file *fp)
 {
-        struct sock *s = unix_get_socket(fp);
+        struct unix_sock *u = unix_get_socket(fp);

         spin_lock(&unix_gc_lock);

-        if (s) {
-                struct unix_sock *u = unix_sk(s);
-
+        if (u) {
                 BUG_ON(!u->inflight);
                 BUG_ON(list_empty(&u->link));

-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones
Subject: [PATCH v6.6 02/26] af_unix: Run GC on only one CPU.
Date: Wed, 21 May 2025 14:45:10 +0000
Message-ID: <20250521144803.2050504-3-lee@kernel.org>

From: Kuniyuki Iwashima

[ Upstream commit 8b90a9f819dc2a06baae4ec1a64d875e53b824ec ]

If more than 16000 inflight AF_UNIX sockets exist and the garbage
collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc().
Also, they wait for unix_gc() to complete.

In unix_gc(), all inflight AF_UNIX sockets are traversed at least once,
and more if they are GC candidates.  Thus, sendmsg() significantly
slows down with too many inflight AF_UNIX sockets.

There is a small window to invoke multiple unix_gc() instances, which
will then be blocked by the same spinlock except for one.

Let's convert unix_gc() to use struct work so that it will not consume
CPUs unnecessarily.

Note WRITE_ONCE(gc_in_progress, true) is moved before running GC.
If we leave the WRITE_ONCE() as is and use the following test to
call flush_work(), a process might not call it.

  CPU 0                                    CPU 1
  ---                                      ---
                                           start work and call __unix_gc()
  if (work_pending(&unix_gc_work) ||       <-- false
      READ_ONCE(gc_in_progress))           <-- false
    flush_work();                          <-- missed!
                                           WRITE_ONCE(gc_in_progress, true)

Signed-off-by: Kuniyuki Iwashima
Link: https://lore.kernel.org/r/20240123170856.41348-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski
(cherry picked from commit 8b90a9f819dc2a06baae4ec1a64d875e53b824ec)
Signed-off-by: Lee Jones
---
 net/unix/garbage.c | 54 +++++++++++++++++++++++-----------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 38639766b9e7c..a2a8543613a52 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -86,7 +86,6 @@
 /* Internal data structures and random procedures: */

 static LIST_HEAD(gc_candidates);
-static DECLARE_WAIT_QUEUE_HEAD(unix_gc_wait);

 static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
                           struct sk_buff_head *hitlist)
@@ -182,23 +181,8 @@ static void inc_inflight_move_tail(struct unix_sock *u)
 }

 static bool gc_in_progress;
-#define UNIX_INFLIGHT_TRIGGER_GC 16000
-
-void wait_for_unix_gc(void)
-{
-        /* If number of inflight sockets is insane,
-         * force a garbage collect right now.
-         * Paired with the WRITE_ONCE() in unix_inflight(),
-         * unix_notinflight() and gc_in_progress().
-         */
-        if (READ_ONCE(unix_tot_inflight) > UNIX_INFLIGHT_TRIGGER_GC &&
-            !READ_ONCE(gc_in_progress))
-                unix_gc();
-        wait_event(unix_gc_wait, !READ_ONCE(gc_in_progress));
-}

-/* The external entry point: unix_gc() */
-void unix_gc(void)
+static void __unix_gc(struct work_struct *work)
 {
         struct sk_buff *next_skb, *skb;
         struct unix_sock *u;
@@ -209,13 +193,6 @@ void unix_gc(void)

         spin_lock(&unix_gc_lock);

-        /* Avoid a recursive GC. */
-        if (gc_in_progress)
-                goto out;
-
-        /* Paired with READ_ONCE() in wait_for_unix_gc(). */
-        WRITE_ONCE(gc_in_progress, true);
-
         /* First, select candidates for garbage collection.  Only
          * in-flight sockets are considered, and from those only ones
          * which don't have any external reference.
@@ -346,8 +323,31 @@ void unix_gc(void)
         /* Paired with READ_ONCE() in wait_for_unix_gc().
          */
         WRITE_ONCE(gc_in_progress, false);

-        wake_up(&unix_gc_wait);
-
- out:
         spin_unlock(&unix_gc_lock);
 }
+
+static DECLARE_WORK(unix_gc_work, __unix_gc);
+
+void unix_gc(void)
+{
+        WRITE_ONCE(gc_in_progress, true);
+        queue_work(system_unbound_wq, &unix_gc_work);
+}
+
+#define UNIX_INFLIGHT_TRIGGER_GC 16000
+
+void wait_for_unix_gc(void)
+{
+        /* If number of inflight sockets is insane,
+         * force a garbage collect right now.
+         *
+         * Paired with the WRITE_ONCE() in unix_inflight(),
+         * unix_notinflight(), and __unix_gc().
+         */
+        if (READ_ONCE(unix_tot_inflight) > UNIX_INFLIGHT_TRIGGER_GC &&
+            !READ_ONCE(gc_in_progress))
+                unix_gc();
+
+        if (READ_ONCE(gc_in_progress))
+                flush_work(&unix_gc_work);
+}
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones
Subject: [PATCH v6.6 03/26] af_unix: Try to run GC async.
Date: Wed, 21 May 2025 14:45:11 +0000
Message-ID: <20250521144803.2050504-4-lee@kernel.org>

From: Kuniyuki Iwashima

[ Upstream commit d9f21b3613337b55cc9d4a6ead484dca68475143 ]

If more than 16000 inflight AF_UNIX sockets exist and the garbage
collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc().
Also, they wait for unix_gc() to complete.
In unix_gc(), all inflight AF_UNIX sockets are traversed at least once,
and more if they are GC candidates.  Thus, sendmsg() significantly
slows down with too many inflight AF_UNIX sockets.

However, if a process sends data with no AF_UNIX FD, the sendmsg() call
does not need to wait for GC.  After this change, only the process that
meets the conditions below will be blocked under such a situation.

  1) cmsg contains AF_UNIX socket
  2) more than 32 AF_UNIX sent by the same user are still inflight

Note that even a sendmsg() call that does not meet the conditions but
has AF_UNIX FD will be blocked later in unix_scm_to_skb() by the
spinlock, but we allow that as a bonus for sane users.

The results below are the time spent in unix_dgram_sendmsg() sending
1 byte of data with no FD 4096 times on a host where 32K inflight
AF_UNIX sockets exist.

Without series: the sane sendmsg() needs to wait for GC unreasonably.

  $ sudo /usr/share/bcc/tools/funclatency -p 11165 unix_dgram_sendmsg
  Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end.
  ^C
       nsecs              : count    distribution
    [...]
      524288 -> 1048575  : 0        |                                        |
     1048576 -> 2097151  : 3881     |****************************************|
     2097152 -> 4194303  : 214      |**                                      |
     4194304 -> 8388607  : 1        |                                        |

  avg = 1825567 nsecs, total: 7477526027 nsecs, count: 4096

With series: the sane sendmsg() can finish much faster.

  $ sudo /usr/share/bcc/tools/funclatency -p 8702 unix_dgram_sendmsg
  Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end.
  ^C
       nsecs              : count    distribution
    [...]
         128 -> 255      : 0        |                                        |
         256 -> 511      : 4092     |****************************************|
         512 -> 1023     : 2        |                                        |
        1024 -> 2047     : 0        |                                        |
        2048 -> 4095     : 0        |                                        |
        4096 -> 8191     : 1        |                                        |
        8192 -> 16383    : 1        |                                        |

  avg = 410 nsecs, total: 1680510 nsecs, count: 4096

Signed-off-by: Kuniyuki Iwashima
Link: https://lore.kernel.org/r/20240123170856.41348-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski
(cherry picked from commit d9f21b3613337b55cc9d4a6ead484dca68475143)
Signed-off-by: Lee Jones
---
 include/net/af_unix.h | 12 ++++++++++--
 include/net/scm.h     |  1 +
 net/core/scm.c        |  5 +++++
 net/unix/af_unix.c    |  6 ++++--
 net/unix/garbage.c    | 10 +++++++++-
 5 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 7a00d7ed527b6..865e2f7bd67cf 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -8,13 +8,21 @@
 #include
 #include

+#if IS_ENABLED(CONFIG_UNIX)
+struct unix_sock *unix_get_socket(struct file *filp);
+#else
+static inline struct unix_sock *unix_get_socket(struct file *filp)
+{
+        return NULL;
+}
+#endif
+
 void unix_inflight(struct user_struct *user, struct file *fp);
 void unix_notinflight(struct user_struct *user, struct file *fp);
 void unix_destruct_scm(struct sk_buff *skb);
 void io_uring_destruct_scm(struct sk_buff *skb);
 void unix_gc(void);
-void wait_for_unix_gc(void);
-struct unix_sock *unix_get_socket(struct file *filp);
+void wait_for_unix_gc(struct scm_fp_list *fpl);
 struct sock *unix_peer_get(struct sock *sk);

 #define UNIX_HASH_MOD (256 - 1)

diff --git a/include/net/scm.h b/include/net/scm.h
index e8c76b4be2fe7..1ff6a28550644 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -24,6 +24,7 @@ struct scm_creds {

 struct scm_fp_list {
         short                   count;
+        short                   count_unix;
         short                   max;
         struct user_struct      *user;
         struct file             *fp[SCM_MAX_FD];

diff --git a/net/core/scm.c b/net/core/scm.c
index 737917c7ac627..574607b1c2d96 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include


 /*
@@ -85,6 +86,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
                        return -ENOMEM;
                *fplp = fpl;
                fpl->count = 0;
+               fpl->count_unix = 0;
                fpl->max = SCM_MAX_FD;
                fpl->user = NULL;
        }
@@ -109,6 +111,9 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
                        fput(file);
                        return -EINVAL;
                }
+               if (unix_get_socket(file))
+                       fpl->count_unix++;
+
                *fpp++ = file;
                fpl->count++;
        }

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index ab23c8d72122b..bb92b1ed94aaf 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1885,11 +1885,12 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
        long timeo;
        int err;

-       wait_for_unix_gc();
        err = scm_send(sock, msg, &scm, false);
        if (err < 0)
                return err;

+       wait_for_unix_gc(scm.fp);
+
        err = -EOPNOTSUPP;
        if (msg->msg_flags&MSG_OOB)
                goto out;
@@ -2157,11 +2158,12 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
        bool fds_sent = false;
        int data_len;

-       wait_for_unix_gc();
        err = scm_send(sock, msg, &scm, false);
        if (err < 0)
                return err;

+       wait_for_unix_gc(scm.fp);
+
        err = -EOPNOTSUPP;
        if (msg->msg_flags & MSG_OOB) {
 #if IS_ENABLED(CONFIG_AF_UNIX_OOB)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index a2a8543613a52..96cc6b7674333 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -335,8 +335,9 @@ void unix_gc(void)
 }

 #define UNIX_INFLIGHT_TRIGGER_GC 16000
+#define UNIX_INFLIGHT_SANE_USER (SCM_MAX_FD * 8)

-void wait_for_unix_gc(void)
+void wait_for_unix_gc(struct scm_fp_list *fpl)
 {
        /* If number of inflight sockets is insane,
         * force a garbage collect right now.
@@ -348,6 +349,13 @@ void wait_for_unix_gc(void)
            !READ_ONCE(gc_in_progress))
                unix_gc();

+       /* Penalise users who want to send AF_UNIX sockets
+        * but whose sockets have not been received yet.
+        */
+       if (!fpl || !fpl->count_unix ||
+           READ_ONCE(fpl->user->unix_inflight) < UNIX_INFLIGHT_SANE_USER)
+               return;
+
        if (READ_ONCE(gc_in_progress))
                flush_work(&unix_gc_work);
 }
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones
Subject: [PATCH v6.6 04/26] af_unix: Replace BUG_ON() with WARN_ON_ONCE().
Date: Wed, 21 May 2025 14:45:12 +0000
Message-ID: <20250521144803.2050504-5-lee@kernel.org>

From: Kuniyuki Iwashima

[ Upstream commit d0f6dc26346863e1f4a23117f5468614e54df064 ]

This is a prep patch for the last patch in this series so that
checkpatch will not warn about BUG_ON().
Signed-off-by: Kuniyuki Iwashima
Acked-by: Jens Axboe
Link: https://lore.kernel.org/r/20240129190435.57228-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski
(cherry picked from commit d0f6dc26346863e1f4a23117f5468614e54df064)
Signed-off-by: Lee Jones
---
 net/unix/garbage.c | 8 ++++----
 net/unix/scm.c     | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 96cc6b7674333..b4bf7f7538826 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -145,7 +145,7 @@ static void scan_children(struct sock *x, void (*func)(struct unix_sock *),
                        /* An embryo cannot be in-flight, so it's safe
                         * to use the list link.
                         */
-                       BUG_ON(!list_empty(&u->link));
+                       WARN_ON_ONCE(!list_empty(&u->link));
                        list_add_tail(&u->link, &embryos);
                }
                spin_unlock(&x->sk_receive_queue.lock);
@@ -224,8 +224,8 @@ static void __unix_gc(struct work_struct *work)

                total_refs = file_count(sk->sk_socket->file);

-               BUG_ON(!u->inflight);
-               BUG_ON(total_refs < u->inflight);
+               WARN_ON_ONCE(!u->inflight);
+               WARN_ON_ONCE(total_refs < u->inflight);
                if (total_refs == u->inflight) {
                        list_move_tail(&u->link, &gc_candidates);
                        __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
@@ -318,7 +318,7 @@ static void __unix_gc(struct work_struct *work)
                list_move_tail(&u->link, &gc_inflight_list);

        /* All candidates should have been detached by now. */
-       BUG_ON(!list_empty(&gc_candidates));
+       WARN_ON_ONCE(!list_empty(&gc_candidates));

        /* Paired with READ_ONCE() in wait_for_unix_gc().
         */
        WRITE_ONCE(gc_in_progress, false);

diff --git a/net/unix/scm.c b/net/unix/scm.c
index b5ae5ab167773..505e56cf02a21 100644
--- a/net/unix/scm.c
+++ b/net/unix/scm.c
@@ -51,10 +51,10 @@ void unix_inflight(struct user_struct *user, struct file *fp)

        if (u) {
                if (!u->inflight) {
-                       BUG_ON(!list_empty(&u->link));
+                       WARN_ON_ONCE(!list_empty(&u->link));
                        list_add_tail(&u->link, &gc_inflight_list);
                } else {
-                       BUG_ON(list_empty(&u->link));
+                       WARN_ON_ONCE(list_empty(&u->link));
                }
                u->inflight++;
                /* Paired with READ_ONCE() in wait_for_unix_gc() */
@@ -71,8 +71,8 @@ void unix_notinflight(struct user_struct *user, struct file *fp)
        spin_lock(&unix_gc_lock);

        if (u) {
-               BUG_ON(!u->inflight);
-               BUG_ON(list_empty(&u->link));
+               WARN_ON_ONCE(!u->inflight);
+               WARN_ON_ONCE(list_empty(&u->link));

                u->inflight--;
                if (!u->inflight)
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones
Subject: [PATCH v6.6 05/26] af_unix: Remove io_uring code for GC.
Date: Wed, 21 May 2025 14:45:13 +0000
Message-ID: <20250521144803.2050504-6-lee@kernel.org>

From: Kuniyuki Iwashima

[ Upstream commit 11498715f266a3fb4caabba9dd575636cbcaa8f1 ]

Since commit 705318a99a13 ("io_uring/af_unix: disable sending
io_uring over sockets"), io_uring's unix socket cannot be passed
via SCM_RIGHTS, so it does not contribute to cyclic references and
is no longer a candidate for garbage collection.

Also, commit 6e5e6d274956 ("io_uring: drop any code related to
SCM_RIGHTS") cleaned up SCM_RIGHTS code in io_uring.

Let's do it in AF_UNIX as well by reverting commit 0091bfc81741
("io_uring/af_unix: defer registered files gc to io_uring release")
and commit 10369080454d ("net: reclaim skb->scm_io_uring bit").
Signed-off-by: Kuniyuki Iwashima
Acked-by: Jens Axboe
Link: https://lore.kernel.org/r/20240129190435.57228-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski
(cherry picked from commit 11498715f266a3fb4caabba9dd575636cbcaa8f1)
Signed-off-by: Lee Jones
---
 include/net/af_unix.h |  1 -
 net/unix/garbage.c    | 25 ++-----------------------
 net/unix/scm.c        |  6 ------
 3 files changed, 2 insertions(+), 30 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 865e2f7bd67cf..4d35204c08570 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -20,7 +20,6 @@ static inline struct unix_sock *unix_get_socket(struct file *filp)
 void unix_inflight(struct user_struct *user, struct file *fp);
 void unix_notinflight(struct user_struct *user, struct file *fp);
 void unix_destruct_scm(struct sk_buff *skb);
-void io_uring_destruct_scm(struct sk_buff *skb);
 void unix_gc(void);
 void wait_for_unix_gc(struct scm_fp_list *fpl);
 struct sock *unix_peer_get(struct sock *sk);

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index b4bf7f7538826..c04f82489abb9 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -184,12 +184,10 @@ static bool gc_in_progress;

 static void __unix_gc(struct work_struct *work)
 {
-       struct sk_buff *next_skb, *skb;
-       struct unix_sock *u;
-       struct unix_sock *next;
        struct sk_buff_head hitlist;
-       struct list_head cursor;
+       struct unix_sock *u, *next;
        LIST_HEAD(not_cycle_list);
+       struct list_head cursor;

        spin_lock(&unix_gc_lock);

@@ -293,30 +291,11 @@ static void __unix_gc(struct work_struct *work)

        spin_unlock(&unix_gc_lock);

-       /* We need io_uring to clean its registered files, ignore all io_uring
-        * originated skbs. It's fine as io_uring doesn't keep references to
-        * other io_uring instances and so killing all other files in the cycle
-        * will put all io_uring references forcing it to go through normal
-        * release.path eventually putting registered files.
- */ - skb_queue_walk_safe(&hitlist, skb, next_skb) { - if (skb->destructor =3D=3D io_uring_destruct_scm) { - __skb_unlink(skb, &hitlist); - skb_queue_tail(&skb->sk->sk_receive_queue, skb); - } - } - /* Here we are. Hitlist is filled. Die. */ __skb_queue_purge(&hitlist); =20 spin_lock(&unix_gc_lock); =20 - /* There could be io_uring registered files, just push them back to - * the inflight list - */ - list_for_each_entry_safe(u, next, &gc_candidates, link) - list_move_tail(&u->link, &gc_inflight_list); - /* All candidates should have been detached by now. */ WARN_ON_ONCE(!list_empty(&gc_candidates)); =20 diff --git a/net/unix/scm.c b/net/unix/scm.c index 505e56cf02a21..db65b0ab59479 100644 --- a/net/unix/scm.c +++ b/net/unix/scm.c @@ -148,9 +148,3 @@ void unix_destruct_scm(struct sk_buff *skb) sock_wfree(skb); } EXPORT_SYMBOL(unix_destruct_scm); - -void io_uring_destruct_scm(struct sk_buff *skb) -{ - unix_destruct_scm(skb); -} -EXPORT_SYMBOL(io_uring_destruct_scm); --=20 2.49.0.1112.g889b7c5bd8-goog From nobody Sun Dec 14 19:36:19 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D6F2A1DE891; Wed, 21 May 2025 14:50:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839012; cv=none; b=Xv/L1q4QuFtxqo0vy3tOZTBb4J5sl0YNKX48ctw2UvzxX/Puu9DsWB7ZK8hlTjpO40DCjF7xhGhpyT7ZgByLLeoJDupl/sMcOncZ4EGewwHG5xx9Vtkd6KGlYjlqcm7VSQgc4q5BCYR2dabUNwTU2D6QoFkA78ni+4cgy6cBi+8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839012; c=relaxed/simple; bh=WYkzm1N5I3TPmGmQgs6HWj4fRNpemSWsJgscAjFgG2U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=PHdfyfMSeeqoOeoSo6LMfhvPDJv7k+KIF9eNaz2F0VpPAZgPwZ82mLnjl5u8KCVPfXg6U9GsNTmm9pmedIrC8yeObSot/gyGTFM6NBsTGnWkrkzdVl2TbJ5SbZEOLX39o4LGbAxvrNecp4djeSILVfEzVSnLBTAunnZe84yKalM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=E0EDk4AE; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="E0EDk4AE" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2559BC4CEE7; Wed, 21 May 2025 14:50:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747839011; bh=WYkzm1N5I3TPmGmQgs6HWj4fRNpemSWsJgscAjFgG2U=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=E0EDk4AEu6FWyxDydqaEG37ZDxO1ogJi/c3mqAEwpWfyRUgfuEZfCD+I6+XHW0/Cv YbxeANlqiSPVAPuv2VRK88gDI9Pp9SDIyC7bxOOZBld1AXgzwvcAnQv/xnMUMWmbfs peEI8agvC4/Mcr0UVRszCaCRcGdmgJ+Fsr6DIgZoha05PkuLBJuy9FJ9s+ZxYjIA3V CgAT6YCw3gmcUde7KRqYTFKv9kM1BjF2PM8pubMl7zVyycaO8qq7CTCHrDPW/0e3lo P/NTJZnGaKS4Ss+54GOe/WmZu7p5vTBDBRt8ZUF7bxUR8GHsy0pS0wQp1ENoBTASAu IlF4GwcOZrwtw== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Kuniyuki Iwashima , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.6 06/26] af_unix: Remove CONFIG_UNIX_SCM. 
Date: Wed, 21 May 2025 14:45:14 +0000 Message-ID: <20250521144803.2050504-7-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1112.g889b7c5bd8-goog In-Reply-To: <20250521144803.2050504-1-lee@kernel.org> References: <20250521144803.2050504-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 99a7a5b9943ea2d05fb0dee38e4ae2290477ed83 ] Originally, the code related to garbage collection was all in garbage.c. Commit f4e65870e5ce ("net: split out functions related to registering inflight socket files") moved some functions to scm.c for io_uring and added CONFIG_UNIX_SCM just in case AF_UNIX was built as module. However, since commit 97154bcf4d1b ("af_unix: Kconfig: make CONFIG_UNIX bool"), AF_UNIX is no longer built separately. Also, io_uring does not support SCM_RIGHTS now. Let's move the functions back to garbage.c Signed-off-by: Kuniyuki Iwashima Acked-by: Jens Axboe Link: https://lore.kernel.org/r/20240129190435.57228-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 99a7a5b9943ea2d05fb0dee38e4ae2290477ed83) Signed-off-by: Lee Jones --- include/net/af_unix.h | 7 +- net/Makefile | 2 +- net/unix/Kconfig | 5 -- net/unix/Makefile | 2 - net/unix/af_unix.c | 63 +++++++++++++++++- net/unix/garbage.c | 73 +++++++++++++++++++- net/unix/scm.c | 150 ------------------------------------------ net/unix/scm.h | 10 --- 8 files changed, 137 insertions(+), 175 deletions(-) delete mode 100644 net/unix/scm.c delete mode 100644 net/unix/scm.h diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 4d35204c08570..3dee0b2721aa4 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -17,19 +17,20 @@ static inline struct unix_sock *unix_get_socket(struct = file *filp) } #endif =20 +extern spinlock_t unix_gc_lock; +extern unsigned int 
unix_tot_inflight; + void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); -void unix_destruct_scm(struct sk_buff *skb); void unix_gc(void); void wait_for_unix_gc(struct scm_fp_list *fpl); + struct sock *unix_peer_get(struct sock *sk); =20 #define UNIX_HASH_MOD (256 - 1) #define UNIX_HASH_SIZE (256 * 2) #define UNIX_HASH_BITS 8 =20 -extern unsigned int unix_tot_inflight; - struct unix_address { refcount_t refcnt; int len; diff --git a/net/Makefile b/net/Makefile index 4c4dc535453df..45f3fbaae644e 100644 --- a/net/Makefile +++ b/net/Makefile @@ -17,7 +17,7 @@ obj-$(CONFIG_NETFILTER) +=3D netfilter/ obj-$(CONFIG_INET) +=3D ipv4/ obj-$(CONFIG_TLS) +=3D tls/ obj-$(CONFIG_XFRM) +=3D xfrm/ -obj-$(CONFIG_UNIX_SCM) +=3D unix/ +obj-$(CONFIG_UNIX) +=3D unix/ obj-y +=3D ipv6/ obj-$(CONFIG_BPFILTER) +=3D bpfilter/ obj-$(CONFIG_PACKET) +=3D packet/ diff --git a/net/unix/Kconfig b/net/unix/Kconfig index 28b232f281ab1..8b5d04210d7cf 100644 --- a/net/unix/Kconfig +++ b/net/unix/Kconfig @@ -16,11 +16,6 @@ config UNIX =20 Say Y unless you know what you are doing. 
=20 -config UNIX_SCM - bool - depends on UNIX - default y - config AF_UNIX_OOB bool depends on UNIX diff --git a/net/unix/Makefile b/net/unix/Makefile index 20491825b4d0d..4ddd125c4642c 100644 --- a/net/unix/Makefile +++ b/net/unix/Makefile @@ -11,5 +11,3 @@ unix-$(CONFIG_BPF_SYSCALL) +=3D unix_bpf.o =20 obj-$(CONFIG_UNIX_DIAG) +=3D unix_diag.o unix_diag-y :=3D diag.o - -obj-$(CONFIG_UNIX_SCM) +=3D scm.o diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index bb92b1ed94aaf..78758af2c6f38 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -117,8 +117,6 @@ #include #include =20 -#include "scm.h" - static atomic_long_t unix_nr_socks; static struct hlist_head bsd_socket_buckets[UNIX_HASH_SIZE / 2]; static spinlock_t bsd_socket_locks[UNIX_HASH_SIZE / 2]; @@ -1752,6 +1750,52 @@ static int unix_getname(struct socket *sock, struct = sockaddr *uaddr, int peer) return err; } =20 +/* The "user->unix_inflight" variable is protected by the garbage + * collection lock, and we just read it locklessly here. If you go + * over the limit, there might be a tiny race in actually noticing + * it across threads. Tough. + */ +static inline bool too_many_unix_fds(struct task_struct *p) +{ + struct user_struct *user =3D current_user(); + + if (unlikely(READ_ONCE(user->unix_inflight) > task_rlimit(p, RLIMIT_NOFIL= E))) + return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN); + return false; +} + +static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) +{ + int i; + + if (too_many_unix_fds(current)) + return -ETOOMANYREFS; + + /* Need to duplicate file references for the sake of garbage + * collection. Otherwise a socket in the fps might become a + * candidate for GC while the skb is not yet queued. 
+ */ + UNIXCB(skb).fp =3D scm_fp_dup(scm->fp); + if (!UNIXCB(skb).fp) + return -ENOMEM; + + for (i =3D scm->fp->count - 1; i >=3D 0; i--) + unix_inflight(scm->fp->user, scm->fp->fp[i]); + + return 0; +} + +static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) +{ + int i; + + scm->fp =3D UNIXCB(skb).fp; + UNIXCB(skb).fp =3D NULL; + + for (i =3D scm->fp->count - 1; i >=3D 0; i--) + unix_notinflight(scm->fp->user, scm->fp->fp[i]); +} + static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb) { scm->fp =3D scm_fp_dup(UNIXCB(skb).fp); @@ -1799,6 +1843,21 @@ static void unix_peek_fds(struct scm_cookie *scm, st= ruct sk_buff *skb) spin_unlock(&unix_gc_lock); } =20 +static void unix_destruct_scm(struct sk_buff *skb) +{ + struct scm_cookie scm; + + memset(&scm, 0, sizeof(scm)); + scm.pid =3D UNIXCB(skb).pid; + if (UNIXCB(skb).fp) + unix_detach_fds(&scm, skb); + + /* Alas, it calls VFS */ + /* So fscking what? fput() had been SMP-safe since the last Summer */ + scm_destroy(&scm); + sock_wfree(skb); +} + static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bo= ol send_fds) { int err =3D 0; diff --git a/net/unix/garbage.c b/net/unix/garbage.c index c04f82489abb9..0104be9d47045 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -81,11 +81,80 @@ #include #include =20 -#include "scm.h" +struct unix_sock *unix_get_socket(struct file *filp) +{ + struct inode *inode =3D file_inode(filp); + + /* Socket ? */ + if (S_ISSOCK(inode->i_mode) && !(filp->f_mode & FMODE_PATH)) { + struct socket *sock =3D SOCKET_I(inode); + const struct proto_ops *ops; + struct sock *sk =3D sock->sk; =20 -/* Internal data structures and random procedures: */ + ops =3D READ_ONCE(sock->ops); =20 + /* PF_UNIX ? 
*/ + if (sk && ops && ops->family =3D=3D PF_UNIX) + return unix_sk(sk); + } + + return NULL; +} + +DEFINE_SPINLOCK(unix_gc_lock); +unsigned int unix_tot_inflight; static LIST_HEAD(gc_candidates); +static LIST_HEAD(gc_inflight_list); + +/* Keep the number of times in flight count for the file + * descriptor if it is for an AF_UNIX socket. + */ +void unix_inflight(struct user_struct *user, struct file *filp) +{ + struct unix_sock *u =3D unix_get_socket(filp); + + spin_lock(&unix_gc_lock); + + if (u) { + if (!u->inflight) { + WARN_ON_ONCE(!list_empty(&u->link)); + list_add_tail(&u->link, &gc_inflight_list); + } else { + WARN_ON_ONCE(list_empty(&u->link)); + } + u->inflight++; + + /* Paired with READ_ONCE() in wait_for_unix_gc() */ + WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1); + } + + WRITE_ONCE(user->unix_inflight, user->unix_inflight + 1); + + spin_unlock(&unix_gc_lock); +} + +void unix_notinflight(struct user_struct *user, struct file *filp) +{ + struct unix_sock *u =3D unix_get_socket(filp); + + spin_lock(&unix_gc_lock); + + if (u) { + WARN_ON_ONCE(!u->inflight); + WARN_ON_ONCE(list_empty(&u->link)); + + u->inflight--; + if (!u->inflight) + list_del_init(&u->link); + + /* Paired with READ_ONCE() in wait_for_unix_gc() */ + WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1); + } + + WRITE_ONCE(user->unix_inflight, user->unix_inflight - 1); + + spin_unlock(&unix_gc_lock); +} =20 static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), struct sk_buff_head *hitlist) diff --git a/net/unix/scm.c b/net/unix/scm.c deleted file mode 100644 index db65b0ab59479..0000000000000 --- a/net/unix/scm.c +++ /dev/null @@ -1,150 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "scm.h" - -unsigned int unix_tot_inflight; -EXPORT_SYMBOL(unix_tot_inflight); - -LIST_HEAD(gc_inflight_list); -EXPORT_SYMBOL(gc_inflight_list); - 
-DEFINE_SPINLOCK(unix_gc_lock); -EXPORT_SYMBOL(unix_gc_lock); - -struct unix_sock *unix_get_socket(struct file *filp) -{ - struct inode *inode =3D file_inode(filp); - - /* Socket ? */ - if (S_ISSOCK(inode->i_mode) && !(filp->f_mode & FMODE_PATH)) { - struct socket *sock =3D SOCKET_I(inode); - const struct proto_ops *ops =3D READ_ONCE(sock->ops); - struct sock *s =3D sock->sk; - - /* PF_UNIX ? */ - if (s && ops && ops->family =3D=3D PF_UNIX) - return unix_sk(s); - } - - return NULL; -} -EXPORT_SYMBOL(unix_get_socket); - -/* Keep the number of times in flight count for the file - * descriptor if it is for an AF_UNIX socket. - */ -void unix_inflight(struct user_struct *user, struct file *fp) -{ - struct unix_sock *u =3D unix_get_socket(fp); - - spin_lock(&unix_gc_lock); - - if (u) { - if (!u->inflight) { - WARN_ON_ONCE(!list_empty(&u->link)); - list_add_tail(&u->link, &gc_inflight_list); - } else { - WARN_ON_ONCE(list_empty(&u->link)); - } - u->inflight++; - /* Paired with READ_ONCE() in wait_for_unix_gc() */ - WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1); - } - WRITE_ONCE(user->unix_inflight, user->unix_inflight + 1); - spin_unlock(&unix_gc_lock); -} - -void unix_notinflight(struct user_struct *user, struct file *fp) -{ - struct unix_sock *u =3D unix_get_socket(fp); - - spin_lock(&unix_gc_lock); - - if (u) { - WARN_ON_ONCE(!u->inflight); - WARN_ON_ONCE(list_empty(&u->link)); - - u->inflight--; - if (!u->inflight) - list_del_init(&u->link); - /* Paired with READ_ONCE() in wait_for_unix_gc() */ - WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1); - } - WRITE_ONCE(user->unix_inflight, user->unix_inflight - 1); - spin_unlock(&unix_gc_lock); -} - -/* - * The "user->unix_inflight" variable is protected by the garbage - * collection lock, and we just read it locklessly here. If you go - * over the limit, there might be a tiny race in actually noticing - * it across threads. Tough. 
- */ -static inline bool too_many_unix_fds(struct task_struct *p) -{ - struct user_struct *user =3D current_user(); - - if (unlikely(READ_ONCE(user->unix_inflight) > task_rlimit(p, RLIMIT_NOFIL= E))) - return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN); - return false; -} - -int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) -{ - int i; - - if (too_many_unix_fds(current)) - return -ETOOMANYREFS; - - /* - * Need to duplicate file references for the sake of garbage - * collection. Otherwise a socket in the fps might become a - * candidate for GC while the skb is not yet queued. - */ - UNIXCB(skb).fp =3D scm_fp_dup(scm->fp); - if (!UNIXCB(skb).fp) - return -ENOMEM; - - for (i =3D scm->fp->count - 1; i >=3D 0; i--) - unix_inflight(scm->fp->user, scm->fp->fp[i]); - return 0; -} -EXPORT_SYMBOL(unix_attach_fds); - -void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) -{ - int i; - - scm->fp =3D UNIXCB(skb).fp; - UNIXCB(skb).fp =3D NULL; - - for (i =3D scm->fp->count-1; i >=3D 0; i--) - unix_notinflight(scm->fp->user, scm->fp->fp[i]); -} -EXPORT_SYMBOL(unix_detach_fds); - -void unix_destruct_scm(struct sk_buff *skb) -{ - struct scm_cookie scm; - - memset(&scm, 0, sizeof(scm)); - scm.pid =3D UNIXCB(skb).pid; - if (UNIXCB(skb).fp) - unix_detach_fds(&scm, skb); - - /* Alas, it calls VFS */ - /* So fscking what? 
fput() had been SMP-safe since the last Summer */ - scm_destroy(&scm); - sock_wfree(skb); -} -EXPORT_SYMBOL(unix_destruct_scm); diff --git a/net/unix/scm.h b/net/unix/scm.h deleted file mode 100644 index 5a255a477f160..0000000000000 --- a/net/unix/scm.h +++ /dev/null @@ -1,10 +0,0 @@ -#ifndef NET_UNIX_SCM_H -#define NET_UNIX_SCM_H - -extern struct list_head gc_inflight_list; -extern spinlock_t unix_gc_lock; - -int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb); -void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb); - -#endif --=20 2.49.0.1112.g889b7c5bd8-goog From nobody Sun Dec 14 19:36:19 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1CD091DF72C; Wed, 21 May 2025 14:50:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839019; cv=none; b=QqWx+iv92uIWpCar0CtRB8tNycd0AjyA7Rw9GZUqhVY1kDanNeUo4H+gAsvo7hQ7o3pX1N/h5nRpIbfwCC1E4BRR+YWbh4FRss2tNFJ8pZrnsqoTxj7Cf5DLldPdU4sQBqUQqq+xgjkW+8pYIr8exxkUAjmvBh0i64bF50Rb6H4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839019; c=relaxed/simple; bh=TOLrDA2eeMqe0vtTBtCmPHLnSNOa4XzbiEXn1PfbsmI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=H8YPc8Vw9YlpWNzL/ElGF7AuryZgqt+GAO7N7Jy1BGBvyQKbrrzf/GcWAtRt6v4R+6EVvwBX7LmRO4sJQTVXfq2kAIEnJKx9wnfqEWO3xCFqeKmX1dLigie4HyoM52348Ar45Miky3zy3X8ImgpQiNft9yFasIj9mpvnCwx1ujQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DGMETCHK; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org 
header.b="DGMETCHK" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A5E9CC4CEE4; Wed, 21 May 2025 14:50:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747839018; bh=TOLrDA2eeMqe0vtTBtCmPHLnSNOa4XzbiEXn1PfbsmI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=DGMETCHKjyHaPOa5wxkCkC/AOqg/az0lIPJnyOVODKa1xqg1SWzWbNLO6zT3WNKsI 27ko9aiFCBK3/zLYNGw68+SiWM+OzTq6q7h6FXbBlq1IoWto4OQCKuUuRllPlxRMEK 103GF/LeohFTIWioFzhBYpCg+sr5K9cRGou2jDIn+0vxu9BZQU3ANyoUjhAmGi1yQP B5Af6pJNmS/qSavfREQZFCFZhyOb+9oLU/5zwtAoPNj1Jm/7U1AoDVzWAvHCQUWysP ZUdfuVEtVeQF1AtY4LkPCxHDR71j/6mMvuHu0MdKmgUV0Dm7HS19VujUvp1bVHi3Gu 27Pu8rqQS07dA== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Kuniyuki Iwashima , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.6 07/26] af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd. Date: Wed, 21 May 2025 14:45:15 +0000 Message-ID: <20250521144803.2050504-8-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1112.g889b7c5bd8-goog In-Reply-To: <20250521144803.2050504-1-lee@kernel.org> References: <20250521144803.2050504-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 1fbfdfaa590248c1d86407f578e40e5c65136330 ] We will replace the garbage collection algorithm for AF_UNIX, where we will consider each inflight AF_UNIX socket as a vertex and its file descriptor as an edge in a directed graph. This patch introduces a new struct unix_vertex representing a vertex in the graph and adds its pointer to struct unix_sock. 
When we send a fd using the SCM_RIGHTS message, we allocate struct scm_fp_list to struct scm_cookie in scm_fp_copy(). Then, we bump each refcount of the inflight fds' struct file and save them in scm_fp_list.fp. After that, unix_attach_fds() inexplicably clones scm_fp_list of scm_cookie and sets it to skb. (We will remove this part after replacing GC.) Here, we add a new function call in unix_attach_fds() to preallocate struct unix_vertex per inflight AF_UNIX fd and link each vertex to skb's scm_fp_list.vertices. When sendmsg() succeeds later, if the socket of the inflight fd is still not inflight yet, we will set the preallocated vertex to struct unix_sock.vertex and link it to a global list unix_unvisited_vertices under spin_lock(&unix_gc_lock). If the socket is already inflight, we free the preallocated vertex. This is to avoid taking the lock unnecessarily when sendmsg() could fail later. In the following patch, we will similarly allocate another struct per edge, which will finally be linked to the inflight socket's unix_vertex.edges. And then, we will count the number of edges as unix_vertex.out_degree. 
Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 1fbfdfaa590248c1d86407f578e40e5c65136330) Signed-off-by: Lee Jones --- include/net/af_unix.h | 9 +++++++++ include/net/scm.h | 3 +++ net/core/scm.c | 7 +++++++ net/unix/af_unix.c | 6 ++++++ net/unix/garbage.c | 38 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 63 insertions(+) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 3dee0b2721aa4..07f0f698c9490 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -22,9 +22,17 @@ extern unsigned int unix_tot_inflight; =20 void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); +int unix_prepare_fpl(struct scm_fp_list *fpl); +void unix_destroy_fpl(struct scm_fp_list *fpl); void unix_gc(void); void wait_for_unix_gc(struct scm_fp_list *fpl); =20 +struct unix_vertex { + struct list_head edges; + struct list_head entry; + unsigned long out_degree; +}; + struct sock *unix_peer_get(struct sock *sk); =20 #define UNIX_HASH_MOD (256 - 1) @@ -62,6 +70,7 @@ struct unix_sock { struct path path; struct mutex iolock, bindlock; struct sock *peer; + struct unix_vertex *vertex; struct list_head link; unsigned long inflight; spinlock_t lock; diff --git a/include/net/scm.h b/include/net/scm.h index 1ff6a28550644..11e86e55f332d 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -26,6 +26,9 @@ struct scm_fp_list { short count; short count_unix; short max; +#ifdef CONFIG_UNIX + struct list_head vertices; +#endif struct user_struct *user; struct file *fp[SCM_MAX_FD]; }; diff --git a/net/core/scm.c b/net/core/scm.c index 574607b1c2d96..27e5634c958e8 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -89,6 +89,9 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_f= p_list **fplp) fpl->count_unix =3D 0; fpl->max =3D SCM_MAX_FD; fpl->user =3D 
NULL; +#if IS_ENABLED(CONFIG_UNIX) + INIT_LIST_HEAD(&fpl->vertices); +#endif } fpp =3D &fpl->fp[fpl->count]; =20 @@ -376,8 +379,12 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl) if (new_fpl) { for (i =3D 0; i < fpl->count; i++) get_file(fpl->fp[i]); + new_fpl->max =3D new_fpl->count; new_fpl->user =3D get_uid(fpl->user); +#if IS_ENABLED(CONFIG_UNIX) + INIT_LIST_HEAD(&new_fpl->vertices); +#endif } return new_fpl; } diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 78758af2c6f38..6d62fa5b0e68d 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -979,6 +979,7 @@ static struct sock *unix_create1(struct net *net, struc= t socket *sock, int kern, sk->sk_destruct =3D unix_sock_destructor; u =3D unix_sk(sk); u->inflight =3D 0; + u->vertex =3D NULL; u->path.dentry =3D NULL; u->path.mnt =3D NULL; spin_lock_init(&u->lock); @@ -1782,6 +1783,9 @@ static int unix_attach_fds(struct scm_cookie *scm, st= ruct sk_buff *skb) for (i =3D scm->fp->count - 1; i >=3D 0; i--) unix_inflight(scm->fp->user, scm->fp->fp[i]); =20 + if (unix_prepare_fpl(UNIXCB(skb).fp)) + return -ENOMEM; + return 0; } =20 @@ -1792,6 +1796,8 @@ static void unix_detach_fds(struct scm_cookie *scm, s= truct sk_buff *skb) scm->fp =3D UNIXCB(skb).fp; UNIXCB(skb).fp =3D NULL; =20 + unix_destroy_fpl(scm->fp); + for (i =3D scm->fp->count - 1; i >=3D 0; i--) unix_notinflight(scm->fp->user, scm->fp->fp[i]); } diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 0104be9d47045..8ea7640e032e8 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -101,6 +101,44 @@ struct unix_sock *unix_get_socket(struct file *filp) return NULL; } =20 +static void unix_free_vertices(struct scm_fp_list *fpl) +{ + struct unix_vertex *vertex, *next_vertex; + + list_for_each_entry_safe(vertex, next_vertex, &fpl->vertices, entry) { + list_del(&vertex->entry); + kfree(vertex); + } +} + +int unix_prepare_fpl(struct scm_fp_list *fpl) +{ + struct unix_vertex *vertex; + int i; + + if (!fpl->count_unix) + return 
0; + + for (i =3D 0; i < fpl->count_unix; i++) { + vertex =3D kmalloc(sizeof(*vertex), GFP_KERNEL); + if (!vertex) + goto err; + + list_add(&vertex->entry, &fpl->vertices); + } + + return 0; + +err: + unix_free_vertices(fpl); + return -ENOMEM; +} + +void unix_destroy_fpl(struct scm_fp_list *fpl) +{ + unix_free_vertices(fpl); +} + DEFINE_SPINLOCK(unix_gc_lock); unsigned int unix_tot_inflight; static LIST_HEAD(gc_candidates); --=20 2.49.0.1112.g889b7c5bd8-goog From nobody Sun Dec 14 19:36:19 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B33971E1C09; Wed, 21 May 2025 14:50:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839026; cv=none; b=BwchxWb2c9tfeTEYg+eBU85Do5FpEfTs1CGjIlVYiyb+u5ggmV2woz9Ps3a11AvjIZcOtworzm03p4lddJbi9VkFvuNvy1wY90jmIPOwRH+xpIk7eU1NqZ8NIHQNGzTBtOAhxNoO7TszpFggLXJmrdLibyPqKelJEgQLa20p3+g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839026; c=relaxed/simple; bh=o/VWYxs0NRzWCaYUudCgt1Nw4n+OPgG9HufhAotjsSc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mHFEcB9+YF31+TC21nTksN79k3zrAeKQLX/g5cFP4z9ZK2JjA4C6FZTQ6tqqpXPnjMVQGUD410MkBTSc8tyqxw4fMQ/j5jqWVznjdazZdjtEukXSxtSJmAotqXuTQpOjwhQlGvPtBqVxhXR1Rjmnd9UqknMIcdt62TBjcua3Nvs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=oxWlsvIi; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="oxWlsvIi" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 523BAC4CEE4; Wed, 21 May 2025 14:50:24 +0000 (UTC) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747839026; bh=o/VWYxs0NRzWCaYUudCgt1Nw4n+OPgG9HufhAotjsSc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=oxWlsvIiqFqv1BLpNncPpRpgA2+jM/zYBEzw0zB9UwOWclLwDU21W8KUv8eqfyi1r wyOZzCoz/ouY4tohzFCAg8U0TeM0MARTgSM25F9B72sIyPieLCZq8QZiCuu4w+bpoS wLXNUSxLTNHOwC22xu6JPoF62NPI3g2VJP9Z2/5lwshw4FSdU/j/Yb0yX+HoTpJr7v 5Gs9z+4o0RfZ1q8IC9C4KW8+rF54jZ4BCIRCVxc+mlTzkQV/e7KmfFXb4yr6kRyRyD YvGN8YikGKO4HJbcopimoEG/32WdIJHRqc65zQct72kyx5u3n6H41PNQqaJ00o/GTd tiIcBPxL75e0g== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Kuniyuki Iwashima , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.6 08/26] af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd. Date: Wed, 21 May 2025 14:45:16 +0000 Message-ID: <20250521144803.2050504-9-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1112.g889b7c5bd8-goog In-Reply-To: <20250521144803.2050504-1-lee@kernel.org> References: <20250521144803.2050504-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 29b64e354029cfcf1eea4d91b146c7b769305930 ] As with the previous patch, we preallocate to skb's scm_fp_list an array of struct unix_edge in the number of inflight AF_UNIX fds. There we just preallocate memory and do not use immediately because sendmsg() could fail after this point. The actual use will be in the next patch. When we queue skb with inflight edges, we will set the inflight socket's unix_sock as unix_edge->predecessor and the receiver's unix_sock as successor, and then we will link the edge to the inflight socket's unix_vertex.edges. 
Note that we set NULL to cloned scm_fp_list.edges in scm_fp_dup() so that MSG_PEEK does not change the shape of the directed graph. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 29b64e354029cfcf1eea4d91b146c7b769305930) Signed-off-by: Lee Jones --- include/net/af_unix.h | 6 ++++++ include/net/scm.h | 5 +++++ net/core/scm.c | 2 ++ net/unix/garbage.c | 6 ++++++ 4 files changed, 19 insertions(+) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 07f0f698c9490..dd5750daf0b92 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -33,6 +33,12 @@ struct unix_vertex { unsigned long out_degree; }; =20 +struct unix_edge { + struct unix_sock *predecessor; + struct unix_sock *successor; + struct list_head vertex_entry; +}; + struct sock *unix_peer_get(struct sock *sk); =20 #define UNIX_HASH_MOD (256 - 1) diff --git a/include/net/scm.h b/include/net/scm.h index 11e86e55f332d..915c4c94306ec 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -22,12 +22,17 @@ struct scm_creds { kgid_t gid; }; =20 +#ifdef CONFIG_UNIX +struct unix_edge; +#endif + struct scm_fp_list { short count; short count_unix; short max; #ifdef CONFIG_UNIX struct list_head vertices; + struct unix_edge *edges; #endif struct user_struct *user; struct file *fp[SCM_MAX_FD]; diff --git a/net/core/scm.c b/net/core/scm.c index 27e5634c958e8..96e3d2785e509 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -90,6 +90,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_f= p_list **fplp) fpl->max =3D SCM_MAX_FD; fpl->user =3D NULL; #if IS_ENABLED(CONFIG_UNIX) + fpl->edges =3D NULL; INIT_LIST_HEAD(&fpl->vertices); #endif } @@ -383,6 +384,7 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl) new_fpl->max =3D new_fpl->count; new_fpl->user =3D get_uid(fpl->user); #if IS_ENABLED(CONFIG_UNIX) + new_fpl->edges =3D NULL; 
INIT_LIST_HEAD(&new_fpl->vertices); #endif } diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 8ea7640e032e8..912b7945692c9 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -127,6 +127,11 @@ int unix_prepare_fpl(struct scm_fp_list *fpl) list_add(&vertex->entry, &fpl->vertices); } =20 + fpl->edges =3D kvmalloc_array(fpl->count_unix, sizeof(*fpl->edges), + GFP_KERNEL_ACCOUNT); + if (!fpl->edges) + goto err; + return 0; =20 err: @@ -136,6 +141,7 @@ int unix_prepare_fpl(struct scm_fp_list *fpl) =20 void unix_destroy_fpl(struct scm_fp_list *fpl) { + kvfree(fpl->edges); unix_free_vertices(fpl); } =20 --=20 2.49.0.1112.g889b7c5bd8-goog From nobody Sun Dec 14 19:36:19 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B77BF1DFD84; Wed, 21 May 2025 14:50:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839035; cv=none; b=PcSQT07sRfS+tVVoudFNnIJx+4dkru9A91M2xLlVrbRvhSfeJ1zcYb9okp4RW4iXsIZxfaFw71IKBeeWKV7mjcBPyS2VjVwv5DbAedFXkeuhim+6dWnBOjQ61jJfEJv1TBJin/U3DehNnMBMwc4o0Fv7Kw1zfL7keIoQHnaTDWk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839035; c=relaxed/simple; bh=WMTznbR6ePA4gSowAlz0pUqMZN0xJeUBgveOaz/kO8w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=luWgOKtBolfd30wCfaq2RFG5K1522NOjkNZ9qaf0Cqxud3QznnRav3KNWXJcpPFez//c63Lb3hq2fQebIy7vkVsBg2bxg6Z5/JBIMFGyT4s9lizenr34GObhrPowdrMPzYdkwWxqcu7sbX2LhQRVFbyyTZjUpllgSfn95rgErcE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CIrmhVhh; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; 
From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Kuniyuki Iwashima , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.6 09/26] af_unix: Link struct unix_edge when queuing skb. Date: Wed, 21 May 2025 14:45:17 +0000 Message-ID: <20250521144803.2050504-10-lee@kernel.org> In-Reply-To: <20250521144803.2050504-1-lee@kernel.org> References: <20250521144803.2050504-1-lee@kernel.org> From: Kuniyuki Iwashima [ Upstream commit 42f298c06b30bfe0a8cbee5d38644e618699e26e ] Just before queuing skb with inflight fds, we call scm_stat_add(), which is a good place to set up the preallocated struct unix_vertex and struct unix_edge in UNIXCB(skb).fp. Then, we call unix_add_edges() and construct the directed graph as follows: 1. Set the inflight socket's unix_sock to unix_edge.predecessor. 2.
Set the receiver's unix_sock to unix_edge.successor. 3. Set the preallocated vertex to the inflight socket's unix_sock.vertex. 4. Link the inflight socket's unix_vertex.entry to unix_unvisited_vertices. 5. Link unix_edge.vertex_entry to the inflight socket's unix_vertex.edges. Let's say we pass the fd of AF_UNIX socket A to B and the fd of B to C. The graph looks like this: +-------------------------+ | unix_unvisited_vertices | <-------------------------. +-------------------------+ | + | | +--------------+ +--------------+ | +--------------+ | | unix_sock A | <---. .---> | unix_sock B | <-|-. .---> | unix_sock C | | +--------------+ | | +--------------+ | | | +--------------+ | .-+ | vertex | | | .-+ | vertex | | | | | vertex | | | +--------------+ | | | +--------------+ | | | +--------------+ | | | | | | | | | | +--------------+ | | | +--------------+ | | | | '-> | unix_vertex | | | '-> | unix_vertex | | | | | +--------------+ | | +--------------+ | | | `---> | entry | +---------> | entry | +-' | | |--------------| | | |--------------| | | | edges | <-. | | | edges | <-. | | +--------------+ | | | +--------------+ | | | | | | | | | .----------------------' | | .----------------------' | | | | | | | | | +--------------+ | | | +--------------+ | | | | unix_edge | | | | | unix_edge | | | | +--------------+ | | | +--------------+ | | `-> | vertex_entry | | | `-> | vertex_entry | | | |--------------| | | |--------------| | | | predecessor | +---' | | predecessor | +---' | |--------------| | |--------------| | | successor | +-----' | successor | +-----' +--------------+ +--------------+ Henceforth, we denote such a graph as A -> B (-> C). Now, we can express all inflight fd graphs that do not contain embryo sockets. We will support that particular case later.
Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 42f298c06b30bfe0a8cbee5d38644e618699e26e) Signed-off-by: Lee Jones --- include/net/af_unix.h | 2 + include/net/scm.h | 1 + net/core/scm.c | 2 + net/unix/af_unix.c | 8 +++- net/unix/garbage.c | 90 ++++++++++++++++++++++++++++++++++++++++++- 5 files changed, 100 insertions(+), 3 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index dd5750daf0b92..affcb990f95e2 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -22,6 +22,8 @@ extern unsigned int unix_tot_inflight; void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); +void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver); +void unix_del_edges(struct scm_fp_list *fpl); int unix_prepare_fpl(struct scm_fp_list *fpl); void unix_destroy_fpl(struct scm_fp_list *fpl); void unix_gc(void); diff --git a/include/net/scm.h b/include/net/scm.h index 915c4c94306ec..07d66c41cc33c 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -31,6 +31,7 @@ struct scm_fp_list { short count_unix; short max; #ifdef CONFIG_UNIX + bool inflight; struct list_head vertices; struct unix_edge *edges; #endif diff --git a/net/core/scm.c b/net/core/scm.c index 96e3d2785e509..1e47788379c2c 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -90,6 +90,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp) fpl->max = SCM_MAX_FD; fpl->user = NULL; #if IS_ENABLED(CONFIG_UNIX) + fpl->inflight = false; fpl->edges = NULL; INIT_LIST_HEAD(&fpl->vertices); #endif @@ -384,6 +385,7 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl) new_fpl->max = new_fpl->count; new_fpl->user = get_uid(fpl->user); #if IS_ENABLED(CONFIG_UNIX) + new_fpl->inflight = false; new_fpl->edges = NULL;
INIT_LIST_HEAD(&new_fpl->vertices); #endif diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 6d62fa5b0e68d..e54f54f9d9948 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1920,8 +1920,10 @@ static void scm_stat_add(struct sock *sk, struct sk_buff *skb) struct scm_fp_list *fp = UNIXCB(skb).fp; struct unix_sock *u = unix_sk(sk); - if (unlikely(fp && fp->count)) + if (unlikely(fp && fp->count)) { atomic_add(fp->count, &u->scm_stat.nr_fds); + unix_add_edges(fp, u); + } } static void scm_stat_del(struct sock *sk, struct sk_buff *skb) @@ -1929,8 +1931,10 @@ static void scm_stat_del(struct sock *sk, struct sk_buff *skb) struct scm_fp_list *fp = UNIXCB(skb).fp; struct unix_sock *u = unix_sk(sk); - if (unlikely(fp && fp->count)) + if (unlikely(fp && fp->count)) { atomic_sub(fp->count, &u->scm_stat.nr_fds); + unix_del_edges(fp); + } } /* diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 912b7945692c9..b5b4a200dbf3b 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -101,6 +101,38 @@ struct unix_sock *unix_get_socket(struct file *filp) return NULL; } +static LIST_HEAD(unix_unvisited_vertices); + +static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) +{ + struct unix_vertex *vertex = edge->predecessor->vertex; + + if (!vertex) { + vertex = list_first_entry(&fpl->vertices, typeof(*vertex), entry); + vertex->out_degree = 0; + INIT_LIST_HEAD(&vertex->edges); + + list_move_tail(&vertex->entry, &unix_unvisited_vertices); + edge->predecessor->vertex = vertex; + } + + vertex->out_degree++; + list_add_tail(&edge->vertex_entry, &vertex->edges); +} + +static void unix_del_edge(struct scm_fp_list *fpl, struct unix_edge *edge) +{ + struct unix_vertex *vertex = edge->predecessor->vertex; + + list_del(&edge->vertex_entry); + vertex->out_degree--; + + if (!vertex->out_degree) { + edge->predecessor->vertex = NULL; + list_move_tail(&vertex->entry, &fpl->vertices); + } +} + static void
unix_free_vertices(struct scm_fp_list *fpl) { struct unix_vertex *vertex, *next_vertex; @@ -111,6 +143,60 @@ static void unix_free_vertices(struct scm_fp_list *fpl) } } +DEFINE_SPINLOCK(unix_gc_lock); + +void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver) +{ + int i = 0, j = 0; + + spin_lock(&unix_gc_lock); + + if (!fpl->count_unix) + goto out; + + do { + struct unix_sock *inflight = unix_get_socket(fpl->fp[j++]); + struct unix_edge *edge; + + if (!inflight) + continue; + + edge = fpl->edges + i++; + edge->predecessor = inflight; + edge->successor = receiver; + + unix_add_edge(fpl, edge); + } while (i < fpl->count_unix); + +out: + spin_unlock(&unix_gc_lock); + + fpl->inflight = true; + + unix_free_vertices(fpl); +} + +void unix_del_edges(struct scm_fp_list *fpl) +{ + int i = 0; + + spin_lock(&unix_gc_lock); + + if (!fpl->count_unix) + goto out; + + do { + struct unix_edge *edge = fpl->edges + i++; + + unix_del_edge(fpl, edge); + } while (i < fpl->count_unix); + +out: + spin_unlock(&unix_gc_lock); + + fpl->inflight = false; +} + int unix_prepare_fpl(struct scm_fp_list *fpl) { struct unix_vertex *vertex; @@ -141,11 +227,13 @@ int unix_prepare_fpl(struct scm_fp_list *fpl) void unix_destroy_fpl(struct scm_fp_list *fpl) { + if (fpl->inflight) + unix_del_edges(fpl); + kvfree(fpl->edges); unix_free_vertices(fpl); } -DEFINE_SPINLOCK(unix_gc_lock); unsigned int unix_tot_inflight; static LIST_HEAD(gc_candidates); static LIST_HEAD(gc_inflight_list); -- 2.49.0.1112.g889b7c5bd8-goog From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones To: lee@kernel.org, "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Kuniyuki Iwashima , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.6 10/26] af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb. Date: Wed, 21 May 2025 14:45:18 +0000 Message-ID: <20250521144803.2050504-11-lee@kernel.org> In-Reply-To: <20250521144803.2050504-1-lee@kernel.org> References: <20250521144803.2050504-1-lee@kernel.org> From: Kuniyuki Iwashima [ Upstream commit 22c3c0c52d32f41cc38cd936ea0c93f22ced3315 ] Currently, we track the number of inflight sockets in two variables: unix_tot_inflight is the total number of inflight AF_UNIX sockets on the host, and user->unix_inflight is the number of inflight fds per user. We update them one by one in unix_inflight(), but this can be done once per batch. Also, sendmsg() can fail even after unix_inflight(), in which case we must acquire unix_gc_lock again only to decrement the counters. Let's bulk-update the counters in unix_add_edges() and unix_del_edges(), which are called only for successfully passed fds.
Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 22c3c0c52d32f41cc38cd936ea0c93f22ced3315) Signed-off-by: Lee Jones --- net/unix/garbage.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index b5b4a200dbf3b..f7041fc230008 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -144,6 +144,7 @@ static void unix_free_vertices(struct scm_fp_list *fpl) } DEFINE_SPINLOCK(unix_gc_lock); +unsigned int unix_tot_inflight; void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver) { @@ -168,7 +169,10 @@ void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver) unix_add_edge(fpl, edge); } while (i < fpl->count_unix); + WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + fpl->count_unix); out: + WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight + fpl->count); + spin_unlock(&unix_gc_lock); fpl->inflight = true; @@ -191,7 +195,10 @@ void unix_del_edges(struct scm_fp_list *fpl) unix_del_edge(fpl, edge); } while (i < fpl->count_unix); + WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - fpl->count_unix); out: + WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight - fpl->count); + spin_unlock(&unix_gc_lock); fpl->inflight = false; @@ -234,7 +241,6 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) unix_free_vertices(fpl); } -unsigned int unix_tot_inflight; static LIST_HEAD(gc_candidates); static LIST_HEAD(gc_inflight_list); @@ -255,13 +261,8 @@ void unix_inflight(struct user_struct *user, struct file *filp) WARN_ON_ONCE(list_empty(&u->link)); } u->inflight++; - - /* Paired with READ_ONCE() in wait_for_unix_gc() */ - WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1); } - WRITE_ONCE(user->unix_inflight, user->unix_inflight + 1); - spin_unlock(&unix_gc_lock); } @@ -278,13
+279,8 @@ void unix_notinflight(struct user_struct *user, struct file *filp) u->inflight--; if (!u->inflight) list_del_init(&u->link); - - /* Paired with READ_ONCE() in wait_for_unix_gc() */ - WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1); } - WRITE_ONCE(user->unix_inflight, user->unix_inflight - 1); - spin_unlock(&unix_gc_lock); } -- 2.49.0.1112.g889b7c5bd8-goog From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Kuniyuki Iwashima , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.6 11/26] af_unix: Iterate all vertices by DFS. Date: Wed, 21 May 2025 14:45:19 +0000 Message-ID: <20250521144803.2050504-12-lee@kernel.org> In-Reply-To: <20250521144803.2050504-1-lee@kernel.org> References: <20250521144803.2050504-1-lee@kernel.org> From: Kuniyuki Iwashima [ Upstream commit 6ba76fd2848e107594ea4f03b737230f74bc23ea ] The new GC will use a depth-first search (DFS) graph algorithm to find cyclic references. The algorithm visits every vertex exactly once. Here, we implement the DFS part without recursion so that no one can abuse it. unix_walk_scc() marks every vertex unvisited by initialising index as UNIX_VERTEX_INDEX_UNVISITED, iterates inflight vertices in unix_unvisited_vertices, and calls __unix_walk_scc() to start DFS from an arbitrary vertex. __unix_walk_scc() iterates all edges starting from the vertex and explores the neighbour vertices with DFS using edge_stack.
After visiting all neighbours, __unix_walk_scc() moves the visited vertex to unix_visited_vertices so that unix_walk_scc() will not restart DFS from the visited vertex. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 6ba76fd2848e107594ea4f03b737230f74bc23ea) Signed-off-by: Lee Jones --- include/net/af_unix.h | 2 ++ net/unix/garbage.c | 74 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 76 insertions(+) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index affcb990f95e2..9198735a6acb0 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -33,12 +33,14 @@ struct unix_vertex { struct list_head edges; struct list_head entry; unsigned long out_degree; + unsigned long index; }; struct unix_edge { struct unix_sock *predecessor; struct unix_sock *successor; struct list_head vertex_entry; + struct list_head stack_entry; }; struct sock *unix_peer_get(struct sock *sk); diff --git a/net/unix/garbage.c b/net/unix/garbage.c index f7041fc230008..295dd1a7b8e0f 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -103,6 +103,11 @@ struct unix_sock *unix_get_socket(struct file *filp) static LIST_HEAD(unix_unvisited_vertices); +enum unix_vertex_index { + UNIX_VERTEX_INDEX_UNVISITED, + UNIX_VERTEX_INDEX_START, +}; + static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) { struct unix_vertex *vertex = edge->predecessor->vertex; @@ -241,6 +246,73 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) unix_free_vertices(fpl); } +static LIST_HEAD(unix_visited_vertices); + +static void __unix_walk_scc(struct unix_vertex *vertex) +{ + unsigned long index = UNIX_VERTEX_INDEX_START; + struct unix_edge *edge; + LIST_HEAD(edge_stack); + +next_vertex: + vertex->index = index; + index++; + + /* Explore neighbour vertices (receivers of the current vertex's fd.
*/ + list_for_each_entry(edge, &vertex->edges, vertex_entry) { + struct unix_vertex *next_vertex = edge->successor->vertex; + + if (!next_vertex) + continue; + + if (next_vertex->index == UNIX_VERTEX_INDEX_UNVISITED) { + /* Iterative deepening depth first search + * + * 1. Push a forward edge to edge_stack and set + * the successor to vertex for the next iteration. + */ + list_add(&edge->stack_entry, &edge_stack); + + vertex = next_vertex; + goto next_vertex; + + /* 2. Pop the edge directed to the current vertex + * and restore the ancestor for backtracking. + */ +prev_vertex: + edge = list_first_entry(&edge_stack, typeof(*edge), stack_entry); + list_del_init(&edge->stack_entry); + + vertex = edge->predecessor->vertex; + } + } + + /* Don't restart DFS from this vertex in unix_walk_scc(). */ + list_move_tail(&vertex->entry, &unix_visited_vertices); + + /* Need backtracking ? */ + if (!list_empty(&edge_stack)) + goto prev_vertex; +} + +static void unix_walk_scc(void) +{ + struct unix_vertex *vertex; + + list_for_each_entry(vertex, &unix_unvisited_vertices, entry) + vertex->index = UNIX_VERTEX_INDEX_UNVISITED; + + /* Visit every vertex exactly once. + * __unix_walk_scc() moves visited vertices to unix_visited_vertices. + */ + while (!list_empty(&unix_unvisited_vertices)) { + vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry); + __unix_walk_scc(vertex); + } + + list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); +} + static LIST_HEAD(gc_candidates); static LIST_HEAD(gc_inflight_list); @@ -388,6 +460,8 @@ static void __unix_gc(struct work_struct *work) spin_lock(&unix_gc_lock); + unix_walk_scc(); + /* First, select candidates for garbage collection. Only * in-flight sockets are considered, and from those only ones * which don't have any external reference.
-- 2.49.0.1112.g889b7c5bd8-goog From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Kuniyuki Iwashima , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.6 12/26] af_unix: Detect Strongly Connected Components. Date: Wed, 21 May 2025 14:45:20 +0000 Message-ID: <20250521144803.2050504-13-lee@kernel.org> In-Reply-To: <20250521144803.2050504-1-lee@kernel.org> References: <20250521144803.2050504-1-lee@kernel.org> From: Kuniyuki Iwashima [ Upstream commit 3484f063172dd88776b062046d721d7c2ae1af7c ] In the new GC, we use a simple graph algorithm, Tarjan's Strongly Connected Components (SCC) algorithm, to find cyclic references. The algorithm visits every vertex exactly once using depth-first search (DFS). DFS starts by pushing an input vertex to a stack and assigning it a unique number. Two fields, index and lowlink, are initialised with the number, but lowlink could be updated later during DFS. If a vertex has an edge to an unvisited inflight vertex, we visit it and do the same processing. So, we will have vertices in the stack in the order they appear and number them consecutively in the same order. If a vertex has a back-edge to a visited vertex in the stack, we update the predecessor's lowlink with the successor's index. After iterating edges from the vertex, we check if its index equals its lowlink. If the lowlink is different from the index, it shows there was a back-edge.
Then, we backtrack, propagate the lowlink to the predecessor, and resume the previous edge iteration from the next edge. If the lowlink is the same as the index, we pop vertices before and including the vertex from the stack. Then, the set of vertices is an SCC, possibly forming a cycle. At the same time, we move the vertices to unix_visited_vertices. When we finish the algorithm, all vertices in each SCC will be linked via unix_vertex.scc_entry. Let's take an example. We have a graph including five inflight vertices (F is not inflight): A -> B -> C -> D -> E (-> F) ^ | `---------' Suppose that we start DFS from C. We will visit C, D, and B first and initialise their index and lowlink. Then, the stack looks like this: > B = (3, 3) (index, lowlink) D = (2, 2) C = (1, 1) When checking B's edge to C, we update B's lowlink with C's index and propagate it to D. B = (3, 1) (index, lowlink) > D = (2, 1) C = (1, 1) Next, we visit E, which has no edge to an inflight vertex. > E = (4, 4) (index, lowlink) B = (3, 1) D = (2, 1) C = (1, 1) When we leave E, its index and lowlink are the same, so we pop E from the stack as a single-vertex SCC. Next, we leave B and D but do nothing because their lowlinks differ from their indices. B = (3, 1) (index, lowlink) D = (2, 1) > C = (1, 1) Then, we leave C, whose index and lowlink are the same, so we pop B, D and C as an SCC. Last, we do DFS for the rest of the vertices, A, which is also a single-vertex SCC. Finally, each unix_vertex.scc_entry is linked as follows: A -. B -> C -> D E -. ^ | ^ | ^ | `--' `---------' `--' We use the SCCs later to decide whether we can garbage-collect the sockets. Note that we still cannot detect SCCs properly if an edge points to an embryo socket. The following two patches will sort it out.
Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 3484f063172dd88776b062046d721d7c2ae1af7c) Signed-off-by: Lee Jones --- include/net/af_unix.h | 3 +++ net/unix/garbage.c | 46 +++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 47 insertions(+), 2 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 9198735a6acb0..37171943fb542 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -32,8 +32,11 @@ void wait_for_unix_gc(struct scm_fp_list *fpl); struct unix_vertex { struct list_head edges; struct list_head entry; + struct list_head scc_entry; unsigned long out_degree; unsigned long index; + unsigned long lowlink; + bool on_stack; }; struct unix_edge { diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 295dd1a7b8e0f..cdeff548e1307 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -251,11 +251,19 @@ static LIST_HEAD(unix_visited_vertices); static void __unix_walk_scc(struct unix_vertex *vertex) { unsigned long index = UNIX_VERTEX_INDEX_START; + LIST_HEAD(vertex_stack); struct unix_edge *edge; LIST_HEAD(edge_stack); next_vertex: + /* Push vertex to vertex_stack. + * The vertex will be popped when finalising SCC later. + */ + vertex->on_stack = true; + list_add(&vertex->scc_entry, &vertex_stack); + vertex->index = index; + vertex->lowlink = index; index++; /* Explore neighbour vertices (receivers of the current vertex's fd). */ @@ -283,12 +291,46 @@ static void __unix_walk_scc(struct unix_vertex *vertex) edge = list_first_entry(&edge_stack, typeof(*edge), stack_entry); list_del_init(&edge->stack_entry); + next_vertex = vertex; vertex = edge->predecessor->vertex; + + /* If the successor has a smaller lowlink, two vertices + * are in the same SCC, so propagate the smaller lowlink + * to skip SCC finalisation.
+ */ + vertex->lowlink = min(vertex->lowlink, next_vertex->lowlink); + } else if (next_vertex->on_stack) { + /* Loop detected by a back/cross edge. + * + * The successor is on vertex_stack, so two vertices are + * in the same SCC. If the successor has a smaller index, + * propagate it to skip SCC finalisation. + */ + vertex->lowlink = min(vertex->lowlink, next_vertex->index); + } else { + /* The successor was already grouped as another SCC */ } } - /* Don't restart DFS from this vertex in unix_walk_scc(). */ - list_move_tail(&vertex->entry, &unix_visited_vertices); + if (vertex->index == vertex->lowlink) { + struct list_head scc; + + /* SCC finalised. + * + * If the lowlink was not updated, all the vertices above on + * vertex_stack are in the same SCC. Group them using scc_entry. + */ + __list_cut_position(&scc, &vertex_stack, &vertex->scc_entry); + + list_for_each_entry_reverse(vertex, &scc, scc_entry) { + /* Don't restart DFS from this vertex in unix_walk_scc(). */ + list_move_tail(&vertex->entry, &unix_visited_vertices); + + vertex->on_stack = false; + } + + list_del(&scc); + } /* Need backtracking ?
*/ if (!list_empty(&edge_stack)) -- 2.49.0.1112.g889b7c5bd8-goog From nobody Sun Dec 14 19:36:19 2025
McrmjDA21W/WGPul607OjXPSatq/T77+AlmAxkbEIzhvpvamYnJhyI1XeW0/GtkT2Q 1jh9X4lraHX+AMj7iditTyp1ReVU3VXjFVpUqwX7L4q/ynxi5H2Rn7lE3tinM4h/XP yeAcuHtaLo4AA== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Kuniyuki Iwashima , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.6 13/26] af_unix: Save listener for embryo socket. Date: Wed, 21 May 2025 14:45:21 +0000 Message-ID: <20250521144803.2050504-14-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1112.g889b7c5bd8-goog In-Reply-To: <20250521144803.2050504-1-lee@kernel.org> References: <20250521144803.2050504-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit aed6ecef55d70de3762ce41c561b7f547dbaf107 ] This is a prep patch for the following change, where we need to fetch the listening socket from the successor embryo socket during GC. We add a new field to struct unix_sock to save a pointer to a listening socket. We set it when connect() creates a new socket, and clear it when accept() is called. 
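As a userspace illustration of the bookkeeping this commit describes (not part of the patch; the `mini_*` names are hypothetical), the listener pointer's lifecycle is simply: connect() stores it in the embryo socket, accept() clears it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model, not kernel code: connect() records the
 * listener in the embryo socket, accept() clears it, mirroring the
 * u->listener assignments in this patch. */
struct mini_unix_sock {
	struct mini_unix_sock *listener; /* set while the socket is an embryo */
};

static void mini_connect(struct mini_unix_sock *embryo,
			 struct mini_unix_sock *listener)
{
	embryo->listener = listener;	/* cf. newu->listener = other */
}

static void mini_accept(struct mini_unix_sock *embryo)
{
	embryo->listener = NULL;	/* cf. unix_sk(tsk)->listener = NULL */
}
```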
Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit aed6ecef55d70de3762ce41c561b7f547dbaf107) Signed-off-by: Lee Jones --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 5 ++++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 37171943fb542..d6b755b254a17 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -83,6 +83,7 @@ struct unix_sock { struct path path; struct mutex iolock, bindlock; struct sock *peer; + struct sock *listener; struct unix_vertex *vertex; struct list_head link; unsigned long inflight; diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index e54f54f9d9948..4d4c035ba626d 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -978,6 +978,7 @@ static struct sock *unix_create1(struct net *net, struc= t socket *sock, int kern, sk->sk_max_ack_backlog =3D READ_ONCE(net->unx.sysctl_max_dgram_qlen); sk->sk_destruct =3D unix_sock_destructor; u =3D unix_sk(sk); + u->listener =3D NULL; u->inflight =3D 0; u->vertex =3D NULL; u->path.dentry =3D NULL; @@ -1582,6 +1583,7 @@ static int unix_stream_connect(struct socket *sock, s= truct sockaddr *uaddr, newsk->sk_type =3D sk->sk_type; init_peercred(newsk); newu =3D unix_sk(newsk); + newu->listener =3D other; RCU_INIT_POINTER(newsk->sk_wq, &newu->peer_wq); otheru =3D unix_sk(other); =20 @@ -1677,8 +1679,8 @@ static int unix_accept(struct socket *sock, struct so= cket *newsock, int flags, bool kern) { struct sock *sk =3D sock->sk; - struct sock *tsk; struct sk_buff *skb; + struct sock *tsk; int err; =20 err =3D -EOPNOTSUPP; @@ -1703,6 +1705,7 @@ static int unix_accept(struct socket *sock, struct so= cket *newsock, int flags, } =20 tsk =3D skb->sk; + unix_sk(tsk)->listener =3D NULL; skb_free_datagram(sk, skb); wake_up_interruptible(&unix_sk(sk)->peer_wait); =20 --=20 2.49.0.1112.g889b7c5bd8-goog From 
nobody Sun Dec 14 19:36:19 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5437B1E5B60; Wed, 21 May 2025 14:51:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839072; cv=none; b=H86nhlwSBbWY7DgiqD/Kjfy/rYEugquAIluCdS4PEiI2i2e+q3jAEPziEqUsYDtxgUJW5lUZsP/Y2jZ32cHEdYgagRzBUrdpxA91cUtbEW88kVE9LZcOMbtD0sHWiT139kuf7cVuWbrOqTZ8ajqI+RPAIbzpqvr2RdqTinQ+jHc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839072; c=relaxed/simple; bh=Z9SeKw4LkfELpZ/nUOPNNN0qUN4nI9p96rYaieXj5Sk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OvYSyWFAtiOGkaxY//oO5GuzymQNpi2EpXqGKumdWRoyRBY6szw8I4O7DV9zNz4qlF8VVGjsFDfi6COSZElbisUQ1Uv9kmD8Cz90omgcDUtTHHVaaaP91nMHZpssr1ihOtf8my1aVDrfPoQFHAziPd2tVrtUxPAzWwmwRIFXeMw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RvOOhQiS; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RvOOhQiS" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EFDDDC4CEF1; Wed, 21 May 2025 14:51:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747839072; bh=Z9SeKw4LkfELpZ/nUOPNNN0qUN4nI9p96rYaieXj5Sk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=RvOOhQiSh4bZ4gRUU/f8R+xs1V7LQWEsvyt/J/DeS/rUnoXZqkZIfRVJAoMw/Ix+l XIiNapFvBpWS6KrV+TtZ8Kh0CZvfitlzK9RuJTjXO8QIIHq+4vs/urF9mZNwnUT2CG OYXDutEhRMdsjGLGT4dlaw3gKm0J7cAqDh9JOuQhLPKLaJfKFmOwFvr1+C1arscMjC pqEO+DDnsnbD+9U6BYMITTKEn3IHUHPI9Hd50FLAAI9Zq5087hccZIhXEHlw9HS6PB 
XqqIydYHHX8rlJ1rvtBrf1OVxpO6zQv47KZ63xCmlsizo9ZG3WG99yWxOet1+2rvCL bMrjzPIeBJncg== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Kuniyuki Iwashima , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.6 14/26] af_unix: Fix up unix_edge.successor for embryo socket. Date: Wed, 21 May 2025 14:45:22 +0000 Message-ID: <20250521144803.2050504-15-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1112.g889b7c5bd8-goog In-Reply-To: <20250521144803.2050504-1-lee@kernel.org> References: <20250521144803.2050504-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit dcf70df2048d27c5d186f013f101a4aefd63aa41 ] To garbage collect inflight AF_UNIX sockets, we must define the cyclic reference appropriately. This is a bit tricky if the loop consists of embryo sockets. Suppose that the fd of AF_UNIX socket A is passed to D and the fd B to C and that C and D are embryo sockets of A and B, respectively. It may appear that there are two separate graphs, A (-> D) and B (-> C), but this is not correct. A --. .-- B X C <-' `-> D Now, D holds A's refcount, and C has B's refcount, so unix_release() will never be called for A and B when we close() them. However, no one can call close() for D and C to free skbs holding refcounts of A and B because C/D is in A/B's receive queue, which should have been purged by unix_release() for A and B. So, here's another type of cyclic reference. When a fd of an AF_UNIX socket is passed to an embryo socket, the reference is indirectly held by its parent listening socket. 
.-> A .-> B | `- sk_receive_queue | `- sk_receive_queue | `- skb | `- skb | `- sk =3D=3D C | `- sk =3D=3D D | `- sk_receive_queue | `- sk_receive_queue | `- skb +---------' `- skb +-. | | `---------------------------------------------------------' Technically, the graph must be denoted as A <-> B instead of A (-> D) and B (-> C) to find such a cyclic reference without touching each socket's receive queue. .-> A --. .-- B <-. | X | =3D=3D A <-> B `-- C <-' `-> D --' We apply this fixup during GC by fetching the real successor by unix_edge_successor(). When we call accept(), we clear unix_sock.listener under unix_gc_lock not to confuse GC. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit dcf70df2048d27c5d186f013f101a4aefd63aa41) Signed-off-by: Lee Jones --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 2 +- net/unix/garbage.c | 20 +++++++++++++++++++- 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index d6b755b254a17..9d92dd608fc42 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -24,6 +24,7 @@ void unix_inflight(struct user_struct *user, struct file = *fp); void unix_notinflight(struct user_struct *user, struct file *fp); void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver); void unix_del_edges(struct scm_fp_list *fpl); +void unix_update_edges(struct unix_sock *receiver); int unix_prepare_fpl(struct scm_fp_list *fpl); void unix_destroy_fpl(struct scm_fp_list *fpl); void unix_gc(void); diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 4d4c035ba626d..93316e9efc532 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1705,7 +1705,7 @@ static int unix_accept(struct socket *sock, struct so= cket *newsock, int flags, } =20 tsk =3D skb->sk; - unix_sk(tsk)->listener =3D NULL; + unix_update_edges(unix_sk(tsk)); 
skb_free_datagram(sk, skb); wake_up_interruptible(&unix_sk(sk)->peer_wait); =20 diff --git a/net/unix/garbage.c b/net/unix/garbage.c index cdeff548e1307..6ff7e0b5c5444 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -101,6 +101,17 @@ struct unix_sock *unix_get_socket(struct file *filp) return NULL; } =20 +static struct unix_vertex *unix_edge_successor(struct unix_edge *edge) +{ + /* If an embryo socket has a fd, + * the listener indirectly holds the fd's refcnt. + */ + if (edge->successor->listener) + return unix_sk(edge->successor->listener)->vertex; + + return edge->successor->vertex; +} + static LIST_HEAD(unix_unvisited_vertices); =20 enum unix_vertex_index { @@ -209,6 +220,13 @@ void unix_del_edges(struct scm_fp_list *fpl) fpl->inflight =3D false; } =20 +void unix_update_edges(struct unix_sock *receiver) +{ + spin_lock(&unix_gc_lock); + receiver->listener =3D NULL; + spin_unlock(&unix_gc_lock); +} + int unix_prepare_fpl(struct scm_fp_list *fpl) { struct unix_vertex *vertex; @@ -268,7 +286,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex) =20 /* Explore neighbour vertices (receivers of the current vertex's fd). 
*/ list_for_each_entry(edge, &vertex->edges, vertex_entry) { - struct unix_vertex *next_vertex =3D edge->successor->vertex; + struct unix_vertex *next_vertex =3D unix_edge_successor(edge); =20 if (!next_vertex) continue; --=20 2.49.0.1112.g889b7c5bd8-goog From nobody Sun Dec 14 19:36:19 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 312A31E1E1C; Wed, 21 May 2025 14:51:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839080; cv=none; b=uBaBoRnI8c1ssZzAPx9i6OIkstIPYCEbsEt8meS/T5u4IlpqTw0+CGw1pa2/oG3oCai1MWgpSuxOH2Ruo0UiCpnb71ja+CJDA4r5R51lu1EenuR5EdSJmoqrn8oG1fzjW0ssEySilcaY6qj/PM20OvocECp9sIof7UU4sPKmaVQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839080; c=relaxed/simple; bh=wRQQFqc9Q+OLVI5oXIa/+8+D65v+vcb0Zh7HW85KKHA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kjrof9FgPxRFIu3q7DF+0BxNa4m6si7uUl+ATfXto5nmedjqlNSh58av4YTgdRmss2gVkkm6tvZg9KSwGjHHpCyTNN4Ar/GNWj6N2ZmIOujRVdy22RJFqApPUOpUHmGBUY2pok7pCZIpzOojValNbJLEMb8a1ehkBwR1tlvCyHw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DA8eXBMz; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DA8eXBMz" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CD885C4CEE4; Wed, 21 May 2025 14:51:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747839080; bh=wRQQFqc9Q+OLVI5oXIa/+8+D65v+vcb0Zh7HW85KKHA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; 
b=DA8eXBMzI1PSW+QcxTgxDwb8qWW5aTeW/mXfohn6JRevEPD8V7Wz1g6o72F2UFbjV Rd1iXuvp8d7+eiKJqL3ipAeBXnjzCMvNJo6bQBGbWqEfTCKfXx51WgBtJYQzeqeywf JlBD6KHGI9bTiAsUmaCgPphUZkszuBDcUqfuKIyc0uUFzqVl7NSOycv1IiuEe1Z8op H8DkVB3Vt1vVbtwkxg/DDX478zKE14Ldolm/TGeztcOvomAqglFaj62FnEJWgodEIX oO3+J5qJxcH/RWdLNl1p+YmxyVT46GwCa+rcVFYE5+1Y5YYCEFxqCOttshYq29VhCk 3H0qnWABVwXvQ== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Kuniyuki Iwashima , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.6 15/26] af_unix: Save O(n) setup of Tarjan's algo. Date: Wed, 21 May 2025 14:45:23 +0000 Message-ID: <20250521144803.2050504-16-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1112.g889b7c5bd8-goog In-Reply-To: <20250521144803.2050504-1-lee@kernel.org> References: <20250521144803.2050504-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit ba31b4a4e1018f5844c6eb31734976e2184f2f9a ] Before starting Tarjan's algorithm, we need to mark all vertices as unvisited. We can save this O(n) setup by reserving two special indices (0, 1) and using two variables. The first time we link a vertex to unix_unvisited_vertices, we set unix_vertex_unvisited_index to index. During DFS, we can see that the index of unvisited vertices is the same as unix_vertex_unvisited_index. When we finalise SCC later, we set unix_vertex_grouped_index to each vertex's index. Then, we can know (i) that the vertex is on the stack if the index of a visited vertex is >=3D 2 and (ii) that it is not on the stack and belongs to a different SCC if the index is unix_vertex_grouped_index. 
After the whole algorithm, all indices of vertices are set as unix_vertex_grouped_index. Next time we start DFS, we know that all unvisited vertices have unix_vertex_grouped_index, and we can use unix_vertex_unvisited_index as the not-on-stack marker. To use the same variable in __unix_walk_scc(), we can swap unix_vertex_(grouped|unvisited)_index at the end of Tarjan's algorithm. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-10-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit ba31b4a4e1018f5844c6eb31734976e2184f2f9a) Signed-off-by: Lee Jones --- include/net/af_unix.h | 1 - net/unix/garbage.c | 26 +++++++++++++++----------- 2 files changed, 15 insertions(+), 12 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 9d92dd608fc42..053f67adb9f1b 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -37,7 +37,6 @@ struct unix_vertex { unsigned long out_degree; unsigned long index; unsigned long lowlink; - bool on_stack; }; =20 struct unix_edge { diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 6ff7e0b5c5444..feae6c17b2911 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -115,16 +115,20 @@ static struct unix_vertex *unix_edge_successor(struct= unix_edge *edge) static LIST_HEAD(unix_unvisited_vertices); =20 enum unix_vertex_index { - UNIX_VERTEX_INDEX_UNVISITED, + UNIX_VERTEX_INDEX_MARK1, + UNIX_VERTEX_INDEX_MARK2, UNIX_VERTEX_INDEX_START, }; =20 +static unsigned long unix_vertex_unvisited_index =3D UNIX_VERTEX_INDEX_MAR= K1; + static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) { struct unix_vertex *vertex =3D edge->predecessor->vertex; =20 if (!vertex) { vertex =3D list_first_entry(&fpl->vertices, typeof(*vertex), entry); + vertex->index =3D unix_vertex_unvisited_index; vertex->out_degree =3D 0; INIT_LIST_HEAD(&vertex->edges); =20 @@ -265,6 +269,7 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) } =20 
static LIST_HEAD(unix_visited_vertices); +static unsigned long unix_vertex_grouped_index =3D UNIX_VERTEX_INDEX_MARK2; =20 static void __unix_walk_scc(struct unix_vertex *vertex) { @@ -274,10 +279,10 @@ static void __unix_walk_scc(struct unix_vertex *verte= x) LIST_HEAD(edge_stack); =20 next_vertex: - /* Push vertex to vertex_stack. + /* Push vertex to vertex_stack and mark it as on-stack + * (index >=3D UNIX_VERTEX_INDEX_START). * The vertex will be popped when finalising SCC later. */ - vertex->on_stack =3D true; list_add(&vertex->scc_entry, &vertex_stack); =20 vertex->index =3D index; @@ -291,7 +296,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex) if (!next_vertex) continue; =20 - if (next_vertex->index =3D=3D UNIX_VERTEX_INDEX_UNVISITED) { + if (next_vertex->index =3D=3D unix_vertex_unvisited_index) { /* Iterative deepening depth first search * * 1. Push a forward edge to edge_stack and set @@ -317,7 +322,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex) * to skip SCC finalisation. */ vertex->lowlink =3D min(vertex->lowlink, next_vertex->lowlink); - } else if (next_vertex->on_stack) { + } else if (next_vertex->index !=3D unix_vertex_grouped_index) { /* Loop detected by a back/cross edge. * * The successor is on vertex_stack, so two vertices are @@ -344,7 +349,8 @@ static void __unix_walk_scc(struct unix_vertex *vertex) /* Don't restart DFS from this vertex in unix_walk_scc(). */ list_move_tail(&vertex->entry, &unix_visited_vertices); =20 - vertex->on_stack =3D false; + /* Mark vertex as off-stack. */ + vertex->index =3D unix_vertex_grouped_index; } =20 list_del(&scc); @@ -357,20 +363,18 @@ static void __unix_walk_scc(struct unix_vertex *verte= x) =20 static void unix_walk_scc(void) { - struct unix_vertex *vertex; - - list_for_each_entry(vertex, &unix_unvisited_vertices, entry) - vertex->index =3D UNIX_VERTEX_INDEX_UNVISITED; - /* Visit every vertex exactly once. * __unix_walk_scc() moves visited vertices to unix_visited_vertices. 
*/ while (!list_empty(&unix_unvisited_vertices)) { + struct unix_vertex *vertex; + vertex =3D list_first_entry(&unix_unvisited_vertices, typeof(*vertex), e= ntry); __unix_walk_scc(vertex); } =20 list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); + swap(unix_vertex_unvisited_index, unix_vertex_grouped_index); } =20 static LIST_HEAD(gc_candidates); --=20 2.49.0.1112.g889b7c5bd8-goog From nobody Sun Dec 14 19:36:19 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3833C219A6B; Wed, 21 May 2025 14:51:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839088; cv=none; b=r4RyvCuXW4wFv3DPNefB8gfX2mpM71XeqPF2Bv13AbhmG+6E8xEWEG8kHBHgnJQGPK2U1c0plaH4cRTQeonkF9v3sWV5ngf7xXWa9+KnjxYmEBC9PTpUtSaTflrerZPS9X6cEH+3HKzs7XdPU2kOpGI2TX9j0RiUMGnCr0N5vUM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839088; c=relaxed/simple; bh=4OcnMD5UfSaRziij14tKB31fCsUtEP1azZGsr3SMlRM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iZ2SVzPIpHvSJO0e9ROM0QY1faGnG4g+304z3q7hQ1Ri75B5jH17l9CKHUsywb2bAfvLWLAiRV1f6bFlfcgstN/RWNbUmvQPLz6pZGsfCyzAAzEBmK6pE9VFWZitNKPsdUZMd5ThXvGifAh7R/aTpUAEWtzUUH37x+KcTFShy+I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=P+FYoWCV; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="P+FYoWCV" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5A865C4CEE4; Wed, 21 May 2025 14:51:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; 
s=k20201202; t=1747839087; bh=4OcnMD5UfSaRziij14tKB31fCsUtEP1azZGsr3SMlRM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=P+FYoWCVKJiW6NBDu43O3B9Lg035kqQTAtpc6A7pnda0EPiMiiSpKquf5NWCWx34j ue9xwInn4GBNCrQfzuNLPAse27m+jWl8r0ds+WZsNhVLjnEHfwwc5oNx/mugwWX8ah adm96WNVnkkaaNporNyYyStU7mKFH38cXEtSMqiRJWWr+Dokk5ai/PSop6h8taN2Xm YBXHMdWh1Zd8XJib/8L3ro7+yx4egjf+RaE/p5XRBCgT7qEWpyXU6xthgaW026tw+y E90bkkHFBbIuWND7P57RpAUQc9tY9na47vl0mtiXMyugwMK5+Zgm8iAahy/0FLkUmi sz8Tz//aRUqZw== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Kuniyuki Iwashima , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Pavel Begunkov , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.6 16/26] af_unix: Skip GC if no cycle exists. Date: Wed, 21 May 2025 14:45:24 +0000 Message-ID: <20250521144803.2050504-17-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1112.g889b7c5bd8-goog In-Reply-To: <20250521144803.2050504-1-lee@kernel.org> References: <20250521144803.2050504-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit 77e5593aebba823bcbcf2c4b58b07efcd63933b8 ] We do not need to run GC if there is no possible cyclic reference. We use unix_graph_maybe_cyclic to decide if we should run GC. If a fd of an AF_UNIX socket is passed to an already inflight AF_UNIX socket, they could form a cyclic reference. Then, we set true to unix_graph_maybe_cyclic and later run Tarjan's algorithm to group them into SCC. Once we run Tarjan's algorithm, we are 100% sure whether cyclic references exist or not. If there is no cycle, we set false to unix_graph_maybe_cyclic and can skip the entire garbage collection next time. 
When finalising SCC, we set true to unix_graph_maybe_cyclic if SCC consists of multiple vertices. Even if SCC is a single vertex, a cycle might exist as self-fd passing. Given the corner case is rare, we detect it by checking all edges of the vertex and set true to unix_graph_maybe_cyclic. With this change, __unix_gc() is just a spin_lock() dance in the normal usage. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-11-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit 77e5593aebba823bcbcf2c4b58b07efcd63933b8) Signed-off-by: Lee Jones --- net/unix/garbage.c | 48 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 47 insertions(+), 1 deletion(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index feae6c17b2911..8f0dc39bb72fc 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -112,6 +112,19 @@ static struct unix_vertex *unix_edge_successor(struct = unix_edge *edge) return edge->successor->vertex; } =20 +static bool unix_graph_maybe_cyclic; + +static void unix_update_graph(struct unix_vertex *vertex) +{ + /* If the receiver socket is not inflight, no cyclic + * reference could be formed. 
+ */ + if (!vertex) + return; + + unix_graph_maybe_cyclic =3D true; +} + static LIST_HEAD(unix_unvisited_vertices); =20 enum unix_vertex_index { @@ -138,12 +151,16 @@ static void unix_add_edge(struct scm_fp_list *fpl, st= ruct unix_edge *edge) =20 vertex->out_degree++; list_add_tail(&edge->vertex_entry, &vertex->edges); + + unix_update_graph(unix_edge_successor(edge)); } =20 static void unix_del_edge(struct scm_fp_list *fpl, struct unix_edge *edge) { struct unix_vertex *vertex =3D edge->predecessor->vertex; =20 + unix_update_graph(unix_edge_successor(edge)); + list_del(&edge->vertex_entry); vertex->out_degree--; =20 @@ -227,6 +244,7 @@ void unix_del_edges(struct scm_fp_list *fpl) void unix_update_edges(struct unix_sock *receiver) { spin_lock(&unix_gc_lock); + unix_update_graph(unix_sk(receiver->listener)->vertex); receiver->listener =3D NULL; spin_unlock(&unix_gc_lock); } @@ -268,6 +286,26 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) unix_free_vertices(fpl); } =20 +static bool unix_scc_cyclic(struct list_head *scc) +{ + struct unix_vertex *vertex; + struct unix_edge *edge; + + /* SCC containing multiple vertices ? */ + if (!list_is_singular(scc)) + return true; + + vertex =3D list_first_entry(scc, typeof(*vertex), scc_entry); + + /* Self-reference or a embryo-listener circle ? 
*/ + list_for_each_entry(edge, &vertex->edges, vertex_entry) { + if (unix_edge_successor(edge) =3D=3D vertex) + return true; + } + + return false; +} + static LIST_HEAD(unix_visited_vertices); static unsigned long unix_vertex_grouped_index =3D UNIX_VERTEX_INDEX_MARK2; =20 @@ -353,6 +391,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex) vertex->index =3D unix_vertex_grouped_index; } =20 + if (!unix_graph_maybe_cyclic) + unix_graph_maybe_cyclic =3D unix_scc_cyclic(&scc); + list_del(&scc); } =20 @@ -363,6 +404,8 @@ static void __unix_walk_scc(struct unix_vertex *vertex) =20 static void unix_walk_scc(void) { + unix_graph_maybe_cyclic =3D false; + /* Visit every vertex exactly once. * __unix_walk_scc() moves visited vertices to unix_visited_vertices. */ @@ -524,6 +567,9 @@ static void __unix_gc(struct work_struct *work) =20 spin_lock(&unix_gc_lock); =20 + if (!unix_graph_maybe_cyclic) + goto skip_gc; + unix_walk_scc(); =20 /* First, select candidates for garbage collection. Only @@ -633,7 +679,7 @@ static void __unix_gc(struct work_struct *work) =20 /* All candidates should have been detached by now. */ WARN_ON_ONCE(!list_empty(&gc_candidates)); - +skip_gc: /* Paired with READ_ONCE() in wait_for_unix_gc(). 
*/ WRITE_ONCE(gc_in_progress, false); =20 --=20 2.49.0.1112.g889b7c5bd8-goog From nobody Sun Dec 14 19:36:19 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BFBBB1D63DF; Wed, 21 May 2025 14:51:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839095; cv=none; b=aEMfvcsqmOQZCHNmzHvQse15n65yf60r5zfvyBvyxm6Abftuc3NPWveKkA/QkrctBwP6uDdGvYdHnpwY4nlGkVjgT3e3C6K35wPtlXqToJrSVcyalsearSdaGu92gQDWgfnmY0UY/MATpxjHIM12bsVV5WQO6Dt+esBQbjzUXgo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839095; c=relaxed/simple; bh=H4UhcNGje5V+6UrQJrQ/fqXbkUaEHJufJVQE+hU3Mc4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sj6x6qgYAZ56iVKVVtE31wG0nuRsG+22D7R6nhnAZU47Ciyy5KSBMEkCSuBO3JThHVcbiuQy+iyAlNFLFdS9uwaDfMvCS6oOEWk1PHaR+vbWtYCHJFoURq8rphyoSYDMvTvios11E7pXq30kg+C8CS99yUnsfDWZqOT+zUA74oo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=nOwTX0P7; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="nOwTX0P7" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6075EC4CEE4; Wed, 21 May 2025 14:51:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747839095; bh=H4UhcNGje5V+6UrQJrQ/fqXbkUaEHJufJVQE+hU3Mc4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=nOwTX0P7cYynehac63VSlU0zvdZGZO6yOGQBnLgVOFadk8bJAe5K8I05yY/yERIM9 4o0VAZFrJprgFQoJ1+pUOFMUeIJDVHJLj6VasNu3j3FHRVxVxsr3IaMFV7ogr5AERW 
C1yavlPT9scgAVDFXmH41BA+rJrrtkJ9nx1ltoKIj4V4nvJSPCtAr6qTBJJjFAlMIK dYg0jFDIfuv/oiXcEVlGRgs44kBx0/2xJiu5vNKO43IGTzUYThLM1bJ1kXB4VbkvWS YnIGYkNiCC+xfisHkODODcNRIt/n2HCUR2V5wVHUrfmirSYzyWP0EPSJIgxko85Ser w+1Ou2Xz9tuGg== From: Lee Jones To: lee@kernel.org, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Kuniyuki Iwashima , Jens Axboe , Sasha Levin , Michal Luczaj , Rao Shoaib , Simon Horman , linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: stable@vger.kernel.org Subject: [PATCH v6.6 17/26] af_unix: Avoid Tarjan's algorithm if unnecessary. Date: Wed, 21 May 2025 14:45:25 +0000 Message-ID: <20250521144803.2050504-18-lee@kernel.org> X-Mailer: git-send-email 2.49.0.1112.g889b7c5bd8-goog In-Reply-To: <20250521144803.2050504-1-lee@kernel.org> References: <20250521144803.2050504-1-lee@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kuniyuki Iwashima [ Upstream commit ad081928a8b0f57f269df999a28087fce6f2b6ce ] Once a cyclic reference is formed, we need to run GC to check if there is dead SCC. However, we do not need to run Tarjan's algorithm if we know that the shape of the inflight graph has not been changed. If an edge is added/updated/deleted and the edge's successor is inflight, we set false to unix_graph_grouped, which means we need to re-classify SCC. Once we finalise SCC, we set true to unix_graph_grouped. While unix_graph_grouped is true, we can iterate the grouped SCC using vertex->scc_entry in unix_walk_scc_fast(). list_add() and list_for_each_entry_reverse() uses seem weird, but they are to keep the vertex order consistent and make writing test easier. 
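The dispatch this commit introduces can be modelled in a few lines of userspace C (a simplified sketch, not kernel code; in the real code unix_walk_scc() can additionally clear the maybe-cyclic flag when it proves the graph acyclic, which is omitted here):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the GC entry path: any edge change invalidates
 * the SCC grouping; while the grouping is still valid, a cheap walk
 * over the recorded SCCs replaces a full Tarjan pass. */
static bool graph_maybe_cyclic;
static bool graph_grouped;
static int tarjan_runs, fast_runs;

static void update_graph(void)		/* an edge was added/updated/deleted */
{
	graph_maybe_cyclic = true;
	graph_grouped = false;		/* SCC grouping is stale now */
}

static void run_gc(void)
{
	if (!graph_maybe_cyclic)
		return;			/* no cycle possible: skip entirely */

	if (graph_grouped) {
		fast_runs++;		/* stands in for unix_walk_scc_fast() */
	} else {
		tarjan_runs++;		/* stands in for unix_walk_scc() */
		graph_grouped = true;	/* grouping valid until next edge change */
	}
}
```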
Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240325202425.60930-12-kuniyu@amazon.com Signed-off-by: Jakub Kicinski (cherry picked from commit ad081928a8b0f57f269df999a28087fce6f2b6ce) Signed-off-by: Lee Jones --- net/unix/garbage.c | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 8f0dc39bb72fc..d25841ab2de40 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -113,6 +113,7 @@ static struct unix_vertex *unix_edge_successor(struct u= nix_edge *edge) } =20 static bool unix_graph_maybe_cyclic; +static bool unix_graph_grouped; =20 static void unix_update_graph(struct unix_vertex *vertex) { @@ -123,6 +124,7 @@ static void unix_update_graph(struct unix_vertex *verte= x) return; =20 unix_graph_maybe_cyclic =3D true; + unix_graph_grouped =3D false; } =20 static LIST_HEAD(unix_unvisited_vertices); @@ -144,6 +146,7 @@ static void unix_add_edge(struct scm_fp_list *fpl, stru= ct unix_edge *edge) vertex->index =3D unix_vertex_unvisited_index; vertex->out_degree =3D 0; INIT_LIST_HEAD(&vertex->edges); + INIT_LIST_HEAD(&vertex->scc_entry); =20 list_move_tail(&vertex->entry, &unix_unvisited_vertices); edge->predecessor->vertex =3D vertex; @@ -418,6 +421,26 @@ static void unix_walk_scc(void) =20 list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); swap(unix_vertex_unvisited_index, unix_vertex_grouped_index); + + unix_graph_grouped =3D true; +} + +static void unix_walk_scc_fast(void) +{ + while (!list_empty(&unix_unvisited_vertices)) { + struct unix_vertex *vertex; + struct list_head scc; + + vertex =3D list_first_entry(&unix_unvisited_vertices, typeof(*vertex), e= ntry); + list_add(&scc, &vertex->scc_entry); + + list_for_each_entry_reverse(vertex, &scc, scc_entry) + list_move_tail(&vertex->entry, &unix_visited_vertices); + + list_del(&scc); + } + + list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); } =20 
 static LIST_HEAD(gc_candidates);
@@ -570,7 +593,10 @@ static void __unix_gc(struct work_struct *work)
 	if (!unix_graph_maybe_cyclic)
 		goto skip_gc;
 
-	unix_walk_scc();
+	if (unix_graph_grouped)
+		unix_walk_scc_fast();
+	else
+		unix_walk_scc();
 
 	/* First, select candidates for garbage collection. Only
 	 * in-flight sockets are considered, and from those only ones
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones
Subject: [PATCH v6.6 18/26] af_unix: Assign a unique index to SCC.
Date: Wed, 21 May 2025 14:45:26 +0000
Message-ID: <20250521144803.2050504-19-lee@kernel.org>

From: Kuniyuki Iwashima

[ Upstream commit bfdb01283ee8f2f3089656c3ff8f62bb072dabb2 ]

The definition of the lowlink in Tarjan's algorithm is the smallest
index of a vertex that is reachable with at most one back-edge in SCC.
This is not useful for a cross-edge.

If we start traversing from A in the following graph, the final lowlink
of D is 3. The cross-edge here is one between D and C.

    A -> B -> D      D = (4, 3)  (index, lowlink)
    ^    |    |      C = (3, 1)
    |    V    |      B = (2, 1)
    `--- C <--'      A = (1, 1)

This is because the lowlink of D is updated with the index of C.

In the following patch, we detect a dead SCC by checking two conditions
for each vertex.
  1) vertex has no edge directed to another SCC (no bridge)
  2) vertex's out_degree is the same as the refcount of its file

If 1) is false, there is a receiver of all fds of the SCC and its
ancestor SCC.

To evaluate 1), we need to assign a unique index to each SCC and assign
it to all vertices in the SCC.

This patch changes the lowlink update logic for a cross-edge so that,
in the example above, the lowlink of D is updated with the lowlink of C.

    A -> B -> D      D = (4, 1)  (index, lowlink)
    ^    |    |      C = (3, 1)
    |    V    |      B = (2, 1)
    `--- C <--'      A = (1, 1)

Then, all vertices in the same SCC have the same lowlink, and we can
quickly find a bridge connecting to a different SCC, if one exists.

However, the value is no longer a lowlink, so we rename it to
scc_index. (It's sometimes called lowpoint.)

Also, we add a global variable to hold the last index used in DFS so
that we do not reset the initial index in each DFS.

This patch could be squashed into the SCC detection patch but is split
out deliberately for anyone wondering why lowlink is not used as in
the original Tarjan's algorithm and many reference implementations.
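The scc_index rule described above can be modelled outside the kernel. The sketch below is illustrative Python, not the kernel C (the function and variable names only mirror the patch, and the dict-based graph encoding is invented): it runs Tarjan's DFS on the commit message's graph, propagating the successor's scc_index on back/cross edges where classic Tarjan would propagate its index.

```python
def walk_scc(edges, start):
    """Tarjan's DFS with the patch's scc_index rule instead of lowlink."""
    index = {}          # DFS discovery index per vertex
    scc_index = {}      # the patch's replacement for lowlink
    stack = []          # Tarjan's vertex stack
    sccs = []
    counter = [1]

    def dfs(v):
        index[v] = scc_index[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        for w in edges.get(v, []):
            if w not in index:
                dfs(w)                                  # tree edge
                scc_index[v] = min(scc_index[v], scc_index[w])
            elif w in stack:
                # back/cross edge to an on-stack vertex: the patch
                # propagates scc_index[w]; classic Tarjan uses index[w]
                scc_index[v] = min(scc_index[v], scc_index[w])
            # else: successor already grouped into another SCC, skip
        if index[v] == scc_index[v]:
            # v is the SCC root: cut the stack down to v
            i = stack.index(v)
            sccs.append(stack[i:])
            del stack[i:]

    dfs(start)
    return scc_index, sccs

# The commit message's graph: A -> B, B -> C, B -> D, C -> A, D -> C.
edges = {"A": ["B"], "B": ["C", "D"], "C": ["A"], "D": ["C"]}
scc_index, sccs = walk_scc(edges, "A")
```

With the classic lowlink rule, D would finish with lowlink 3 even though it is in the same SCC as C; the modified rule gives every member of the SCC the same scc_index, which is what the bridge check in the next patch relies on.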
Signed-off-by: Kuniyuki Iwashima
Acked-by: Paolo Abeni
Link: https://lore.kernel.org/r/20240325202425.60930-13-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski
(cherry picked from commit bfdb01283ee8f2f3089656c3ff8f62bb072dabb2)
Signed-off-by: Lee Jones
---
 include/net/af_unix.h |  2 +-
 net/unix/garbage.c    | 29 +++++++++++++++--------------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 053f67adb9f1b..bb7f10aa01293 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -36,7 +36,7 @@ struct unix_vertex {
 	struct list_head scc_entry;
 	unsigned long out_degree;
 	unsigned long index;
-	unsigned long lowlink;
+	unsigned long scc_index;
 };
 
 struct unix_edge {
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index d25841ab2de40..2e66b57f3f0f6 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -312,9 +312,8 @@ static bool unix_scc_cyclic(struct list_head *scc)
 static LIST_HEAD(unix_visited_vertices);
 static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
 
-static void __unix_walk_scc(struct unix_vertex *vertex)
+static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_index)
 {
-	unsigned long index = UNIX_VERTEX_INDEX_START;
 	LIST_HEAD(vertex_stack);
 	struct unix_edge *edge;
 	LIST_HEAD(edge_stack);
@@ -326,9 +325,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 	 */
 	list_add(&vertex->scc_entry, &vertex_stack);
 
-	vertex->index = index;
-	vertex->lowlink = index;
-	index++;
+	vertex->index = *last_index;
+	vertex->scc_index = *last_index;
+	(*last_index)++;
 
 	/* Explore neighbour vertices (receivers of the current vertex's fd). */
 	list_for_each_entry(edge, &vertex->edges, vertex_entry) {
@@ -358,30 +357,30 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 		next_vertex = vertex;
 		vertex = edge->predecessor->vertex;
 
-		/* If the successor has a smaller lowlink, two vertices
-		 * are in the same SCC, so propagate the smaller lowlink
+		/* If the successor has a smaller scc_index, two vertices
+		 * are in the same SCC, so propagate the smaller scc_index
 		 * to skip SCC finalisation.
 		 */
-		vertex->lowlink = min(vertex->lowlink, next_vertex->lowlink);
+		vertex->scc_index = min(vertex->scc_index, next_vertex->scc_index);
 	} else if (next_vertex->index != unix_vertex_grouped_index) {
 		/* Loop detected by a back/cross edge.
 		 *
-		 * The successor is on vertex_stack, so two vertices are
-		 * in the same SCC. If the successor has a smaller index,
+		 * The successor is on vertex_stack, so two vertices are in
+		 * the same SCC. If the successor has a smaller *scc_index*,
 		 * propagate it to skip SCC finalisation.
 		 */
-		vertex->lowlink = min(vertex->lowlink, next_vertex->index);
+		vertex->scc_index = min(vertex->scc_index, next_vertex->scc_index);
 	} else {
 		/* The successor was already grouped as another SCC */
 	}
 }
 
-	if (vertex->index == vertex->lowlink) {
+	if (vertex->index == vertex->scc_index) {
 		struct list_head scc;
 
 		/* SCC finalised.
 		 *
-		 * If the lowlink was not updated, all the vertices above on
+		 * If the scc_index was not updated, all the vertices above on
 		 * vertex_stack are in the same SCC. Group them using scc_entry.
 		 */
 		__list_cut_position(&scc, &vertex_stack, &vertex->scc_entry);
@@ -407,6 +406,8 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
 
 static void unix_walk_scc(void)
 {
+	unsigned long last_index = UNIX_VERTEX_INDEX_START;
+
 	unix_graph_maybe_cyclic = false;
 
 	/* Visit every vertex exactly once.
@@ -416,7 +417,7 @@ static void unix_walk_scc(void)
 		struct unix_vertex *vertex;
 
 		vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry);
-		__unix_walk_scc(vertex);
+		__unix_walk_scc(vertex, &last_index);
 	}
 
 	list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices);
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones
Subject: [PATCH v6.6 19/26] af_unix: Detect dead SCC.
Date: Wed, 21 May 2025 14:45:27 +0000
Message-ID: <20250521144803.2050504-20-lee@kernel.org>

From: Kuniyuki Iwashima

[ Upstream commit a15702d8b3aad8ce5268c565bd29f0e02fd2db83 ]

When iterating SCC, we call unix_vertex_dead() for each vertex to
check if the vertex is close()d and has no bridge to another SCC.

If both conditions are true for every vertex in the SCC, we can
execute garbage collection for all skb in the SCC.

The actual garbage collection is done in the following patch,
replacing the old implementation.
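The two conditions can be modelled as follows. This is an illustrative Python sketch with an invented dict-based encoding (the kernel operates on struct unix_vertex and file_count(), not these names): a vertex is dead only if every successor stays inside its own SCC and its file refcount equals its out_degree.

```python
def vertex_dead(v, successors, scc_index, file_refs, out_degree):
    for w in successors[v]:
        # the fd is held by a socket that is not in flight at all
        if w is None:
            return False
        # the fd can be received by an in-flight socket in another SCC
        if scc_index[w] != scc_index[v]:
            return False
    # if the socket were not close()d, file_refs would exceed out_degree
    return file_refs[v] == out_degree[v]

def scc_dead(scc, successors, scc_index, file_refs, out_degree):
    return all(vertex_dead(v, successors, scc_index, file_refs, out_degree)
               for v in scc)

# Two close()d sockets that passed each other's fd: a dead cycle.
successors = {"X": ["Y"], "Y": ["X"]}
scc_index = {"X": 1, "Y": 1}
file_refs = {"X": 1, "Y": 1}      # only the in-flight reference remains
out_degree = {"X": 1, "Y": 1}
```

If either socket were still open (file_refs above out_degree) or forwarded an fd to a socket in a different SCC, the SCC would be kept.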
Signed-off-by: Kuniyuki Iwashima
Acked-by: Paolo Abeni
Link: https://lore.kernel.org/r/20240325202425.60930-14-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski
(cherry picked from commit a15702d8b3aad8ce5268c565bd29f0e02fd2db83)
Signed-off-by: Lee Jones
---
 net/unix/garbage.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 2e66b57f3f0f6..1f53c25fc71b6 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -289,6 +289,39 @@ void unix_destroy_fpl(struct scm_fp_list *fpl)
 	unix_free_vertices(fpl);
 }
 
+static bool unix_vertex_dead(struct unix_vertex *vertex)
+{
+	struct unix_edge *edge;
+	struct unix_sock *u;
+	long total_ref;
+
+	list_for_each_entry(edge, &vertex->edges, vertex_entry) {
+		struct unix_vertex *next_vertex = unix_edge_successor(edge);
+
+		/* The vertex's fd can be received by a non-inflight socket. */
+		if (!next_vertex)
+			return false;
+
+		/* The vertex's fd can be received by an inflight socket in
+		 * another SCC.
+		 */
+		if (next_vertex->scc_index != vertex->scc_index)
+			return false;
+	}
+
+	/* No receiver exists out of the same SCC. */
+
+	edge = list_first_entry(&vertex->edges, typeof(*edge), vertex_entry);
+	u = edge->predecessor;
+	total_ref = file_count(u->sk.sk_socket->file);
+
+	/* If not close()d, total_ref > out_degree. */
+	if (total_ref != vertex->out_degree)
+		return false;
+
+	return true;
+}
+
 static bool unix_scc_cyclic(struct list_head *scc)
 {
 	struct unix_vertex *vertex;
@@ -377,6 +410,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_index
 
 	if (vertex->index == vertex->scc_index) {
 		struct list_head scc;
+		bool scc_dead = true;
 
 		/* SCC finalised.
 		 *
@@ -391,6 +425,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_index
 
 			/* Mark vertex as off-stack. */
 			vertex->index = unix_vertex_grouped_index;
+
+			if (scc_dead)
+				scc_dead = unix_vertex_dead(vertex);
 		}
 
 		if (!unix_graph_maybe_cyclic)
@@ -431,13 +468,18 @@ static void unix_walk_scc_fast(void)
 	while (!list_empty(&unix_unvisited_vertices)) {
 		struct unix_vertex *vertex;
 		struct list_head scc;
+		bool scc_dead = true;
 
 		vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry);
 		list_add(&scc, &vertex->scc_entry);
 
-		list_for_each_entry_reverse(vertex, &scc, scc_entry)
+		list_for_each_entry_reverse(vertex, &scc, scc_entry) {
 			list_move_tail(&vertex->entry, &unix_visited_vertices);
 
+			if (scc_dead)
+				scc_dead = unix_vertex_dead(vertex);
+		}
+
 		list_del(&scc);
 	}
 
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones
Subject: [PATCH v6.6 20/26] af_unix: Replace garbage collection algorithm.
Date: Wed, 21 May 2025 14:45:28 +0000
Message-ID: <20250521144803.2050504-21-lee@kernel.org>

From: Kuniyuki Iwashima

[ Upstream commit 4090fa373f0e763c43610853d2774b5979915959 ]

If we find a dead SCC during iteration, we call unix_collect_skb()
to splice all skb in the SCC to the global sk_buff_head, hitlist.

After iterating all SCC, we unlock unix_gc_lock and purge the queue.
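The collect-then-purge split can be sketched like this (illustrative Python, not kernel API: threading.Lock stands in for unix_gc_lock and plain lists for sk_buff_head queues): skbs are moved to a private hitlist while the lock is held, and only freed after it is released.

```python
import threading

gc_lock = threading.Lock()   # stands in for unix_gc_lock

def run_gc(sccs, receive_queues, scc_is_dead):
    hitlist = []
    # Splice the victims out while the graph cannot change under us.
    with gc_lock:
        for scc in sccs:
            if scc_is_dead(scc):
                for sock in scc:
                    hitlist.extend(receive_queues[sock])
                    receive_queues[sock].clear()
    # The potentially slow freeing happens after the lock is dropped,
    # mirroring __skb_queue_purge(&hitlist) after spin_unlock().
    return hitlist

queues = {"X": ["skb1", "skb2"], "Y": ["skb3"]}
freed = run_gc([["X", "Y"]], queues, lambda scc: True)
```

Splicing under the lock keeps the decision consistent with the graph state, while freeing outside it keeps the critical section short.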
Signed-off-by: Kuniyuki Iwashima
Acked-by: Paolo Abeni
Link: https://lore.kernel.org/r/20240325202425.60930-15-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski
(cherry picked from commit 4090fa373f0e763c43610853d2774b5979915959)
Signed-off-by: Lee Jones
---
 include/net/af_unix.h |   8 --
 net/unix/af_unix.c    |  12 --
 net/unix/garbage.c    | 318 +++++++++---------------------------------
 3 files changed, 64 insertions(+), 274 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index bb7f10aa01293..d88ca51a9081d 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -19,9 +19,6 @@ static inline struct unix_sock *unix_get_socket(struct file *filp)
 
 extern spinlock_t unix_gc_lock;
 extern unsigned int unix_tot_inflight;
-
-void unix_inflight(struct user_struct *user, struct file *fp);
-void unix_notinflight(struct user_struct *user, struct file *fp);
 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver);
 void unix_del_edges(struct scm_fp_list *fpl);
 void unix_update_edges(struct unix_sock *receiver);
@@ -85,12 +82,7 @@ struct unix_sock {
 	struct sock		*peer;
 	struct sock		*listener;
 	struct unix_vertex	*vertex;
-	struct list_head	link;
-	unsigned long		inflight;
 	spinlock_t		lock;
-	unsigned long		gc_flags;
-#define UNIX_GC_CANDIDATE	0
-#define UNIX_GC_MAYBE_CYCLE	1
 	struct socket_wq	peer_wq;
 	wait_queue_entry_t	peer_wake;
 	struct scm_stat		scm_stat;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 93316e9efc532..eee0bccd7877b 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -979,12 +979,10 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern,
 	sk->sk_destruct		= unix_sock_destructor;
 	u = unix_sk(sk);
 	u->listener = NULL;
-	u->inflight = 0;
 	u->vertex = NULL;
 	u->path.dentry = NULL;
 	u->path.mnt = NULL;
 	spin_lock_init(&u->lock);
-	INIT_LIST_HEAD(&u->link);
 	mutex_init(&u->iolock); /* single task reading lock */
 	mutex_init(&u->bindlock); /* single task binding lock */
 	init_waitqueue_head(&u->peer_wait);
@@ -1770,8 +1768,6 @@ static inline bool too_many_unix_fds(struct task_struct *p)
 
 static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 {
-	int i;
-
 	if (too_many_unix_fds(current))
 		return -ETOOMANYREFS;
 
@@ -1783,9 +1779,6 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 	if (!UNIXCB(skb).fp)
 		return -ENOMEM;
 
-	for (i = scm->fp->count - 1; i >= 0; i--)
-		unix_inflight(scm->fp->user, scm->fp->fp[i]);
-
 	if (unix_prepare_fpl(UNIXCB(skb).fp))
 		return -ENOMEM;
 
@@ -1794,15 +1787,10 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 
 static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 {
-	int i;
-
 	scm->fp = UNIXCB(skb).fp;
 	UNIXCB(skb).fp = NULL;
 
 	unix_destroy_fpl(scm->fp);
-
-	for (i = scm->fp->count - 1; i >= 0; i--)
-		unix_notinflight(scm->fp->user, scm->fp->fp[i]);
 }
 
 static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 1f53c25fc71b6..89ea71d9297ba 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -322,6 +322,52 @@ static bool unix_vertex_dead(struct unix_vertex *vertex)
 	return true;
 }
 
+enum unix_recv_queue_lock_class {
+	U_RECVQ_LOCK_NORMAL,
+	U_RECVQ_LOCK_EMBRYO,
+};
+
+static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *hitlist)
+{
+	struct unix_vertex *vertex;
+
+	list_for_each_entry_reverse(vertex, scc, scc_entry) {
+		struct sk_buff_head *queue;
+		struct unix_edge *edge;
+		struct unix_sock *u;
+
+		edge = list_first_entry(&vertex->edges, typeof(*edge), vertex_entry);
+		u = edge->predecessor;
+		queue = &u->sk.sk_receive_queue;
+
+		spin_lock(&queue->lock);
+
+		if (u->sk.sk_state == TCP_LISTEN) {
+			struct sk_buff *skb;
+
+			skb_queue_walk(queue, skb) {
+				struct sk_buff_head *embryo_queue = &skb->sk->sk_receive_queue;
+
+				/* listener -> embryo order, the inversion never happens. */
+				spin_lock_nested(&embryo_queue->lock, U_RECVQ_LOCK_EMBRYO);
+				skb_queue_splice_init(embryo_queue, hitlist);
+				spin_unlock(&embryo_queue->lock);
+			}
+		} else {
+			skb_queue_splice_init(queue, hitlist);
+
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
+			if (u->oob_skb) {
+				kfree_skb(u->oob_skb);
+				u->oob_skb = NULL;
+			}
+#endif
+		}
+
+		spin_unlock(&queue->lock);
+	}
+}
+
 static bool unix_scc_cyclic(struct list_head *scc)
 {
 	struct unix_vertex *vertex;
@@ -345,7 +391,8 @@ static bool unix_scc_cyclic(struct list_head *scc)
 static LIST_HEAD(unix_visited_vertices);
 static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
 
-static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_index)
+static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_index,
+			    struct sk_buff_head *hitlist)
 {
 	LIST_HEAD(vertex_stack);
 	struct unix_edge *edge;
@@ -430,7 +477,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_index
 			scc_dead = unix_vertex_dead(vertex);
 		}
 
-		if (!unix_graph_maybe_cyclic)
+		if (scc_dead)
+			unix_collect_skb(&scc, hitlist);
+		else if (!unix_graph_maybe_cyclic)
 			unix_graph_maybe_cyclic = unix_scc_cyclic(&scc);
 
 		list_del(&scc);
@@ -441,7 +490,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_index
 	goto prev_vertex;
 }
 
-static void unix_walk_scc(void)
+static void unix_walk_scc(struct sk_buff_head *hitlist)
 {
 	unsigned long last_index = UNIX_VERTEX_INDEX_START;
 
@@ -454,7 +503,7 @@ static void unix_walk_scc(void)
 		struct unix_vertex *vertex;
 
 		vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry);
-		__unix_walk_scc(vertex, &last_index);
+		__unix_walk_scc(vertex, &last_index, hitlist);
 	}
 
 	list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices);
@@ -463,7 +512,7 @@ static void unix_walk_scc(void)
 	unix_graph_grouped = true;
 }
 
-static void unix_walk_scc_fast(void)
+static void unix_walk_scc_fast(struct sk_buff_head *hitlist)
 {
 	while (!list_empty(&unix_unvisited_vertices)) {
 		struct unix_vertex *vertex;
@@ -480,279 +529,40 @@ static void unix_walk_scc_fast(void)
 			scc_dead = unix_vertex_dead(vertex);
 		}
 
+		if (scc_dead)
+			unix_collect_skb(&scc, hitlist);
+
 		list_del(&scc);
 	}
 
 	list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices);
 }
 
-static LIST_HEAD(gc_candidates);
-static LIST_HEAD(gc_inflight_list);
-
-/* Keep the number of times in flight count for the file
- * descriptor if it is for an AF_UNIX socket.
- */
-void unix_inflight(struct user_struct *user, struct file *filp)
-{
-	struct unix_sock *u = unix_get_socket(filp);
-
-	spin_lock(&unix_gc_lock);
-
-	if (u) {
-		if (!u->inflight) {
-			WARN_ON_ONCE(!list_empty(&u->link));
-			list_add_tail(&u->link, &gc_inflight_list);
-		} else {
-			WARN_ON_ONCE(list_empty(&u->link));
-		}
-		u->inflight++;
-	}
-
-	spin_unlock(&unix_gc_lock);
-}
-
-void unix_notinflight(struct user_struct *user, struct file *filp)
-{
-	struct unix_sock *u = unix_get_socket(filp);
-
-	spin_lock(&unix_gc_lock);
-
-	if (u) {
-		WARN_ON_ONCE(!u->inflight);
-		WARN_ON_ONCE(list_empty(&u->link));
-
-		u->inflight--;
-		if (!u->inflight)
-			list_del_init(&u->link);
-	}
-
-	spin_unlock(&unix_gc_lock);
-}
-
-static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
-			  struct sk_buff_head *hitlist)
-{
-	struct sk_buff *skb;
-	struct sk_buff *next;
-
-	spin_lock(&x->sk_receive_queue.lock);
-	skb_queue_walk_safe(&x->sk_receive_queue, skb, next) {
-		/* Do we have file descriptors ? */
-		if (UNIXCB(skb).fp) {
-			bool hit = false;
-			/* Process the descriptors of this socket */
-			int nfd = UNIXCB(skb).fp->count;
-			struct file **fp = UNIXCB(skb).fp->fp;
-
-			while (nfd--) {
-				/* Get the socket the fd matches if it indeed does so */
-				struct unix_sock *u = unix_get_socket(*fp++);
-
-				/* Ignore non-candidates, they could have been added
-				 * to the queues after starting the garbage collection
-				 */
-				if (u && test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) {
-					hit = true;
-
-					func(u);
-				}
-			}
-			if (hit && hitlist != NULL) {
-				__skb_unlink(skb, &x->sk_receive_queue);
-				__skb_queue_tail(hitlist, skb);
-			}
-		}
-	}
-	spin_unlock(&x->sk_receive_queue.lock);
-}
-
-static void scan_children(struct sock *x, void (*func)(struct unix_sock *),
-			  struct sk_buff_head *hitlist)
-{
-	if (x->sk_state != TCP_LISTEN) {
-		scan_inflight(x, func, hitlist);
-	} else {
-		struct sk_buff *skb;
-		struct sk_buff *next;
-		struct unix_sock *u;
-		LIST_HEAD(embryos);
-
-		/* For a listening socket collect the queued embryos
-		 * and perform a scan on them as well.
-		 */
-		spin_lock(&x->sk_receive_queue.lock);
-		skb_queue_walk_safe(&x->sk_receive_queue, skb, next) {
-			u = unix_sk(skb->sk);
-
-			/* An embryo cannot be in-flight, so it's safe
-			 * to use the list link.
-			 */
-			WARN_ON_ONCE(!list_empty(&u->link));
-			list_add_tail(&u->link, &embryos);
-		}
-		spin_unlock(&x->sk_receive_queue.lock);
-
-		while (!list_empty(&embryos)) {
-			u = list_entry(embryos.next, struct unix_sock, link);
-			scan_inflight(&u->sk, func, hitlist);
-			list_del_init(&u->link);
-		}
-	}
-}
-
-static void dec_inflight(struct unix_sock *usk)
-{
-	usk->inflight--;
-}
-
-static void inc_inflight(struct unix_sock *usk)
-{
-	usk->inflight++;
-}
-
-static void inc_inflight_move_tail(struct unix_sock *u)
-{
-	u->inflight++;
-
-	/* If this still might be part of a cycle, move it to the end
-	 * of the list, so that it's checked even if it was already
-	 * passed over
-	 */
-	if (test_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags))
-		list_move_tail(&u->link, &gc_candidates);
-}
-
 static bool gc_in_progress;
 
 static void __unix_gc(struct work_struct *work)
 {
 	struct sk_buff_head hitlist;
-	struct unix_sock *u, *next;
-	LIST_HEAD(not_cycle_list);
-	struct list_head cursor;
 
 	spin_lock(&unix_gc_lock);
 
-	if (!unix_graph_maybe_cyclic)
+	if (!unix_graph_maybe_cyclic) {
+		spin_unlock(&unix_gc_lock);
 		goto skip_gc;
-
-	if (unix_graph_grouped)
-		unix_walk_scc_fast();
-	else
-		unix_walk_scc();
-
-	/* First, select candidates for garbage collection. Only
-	 * in-flight sockets are considered, and from those only ones
-	 * which don't have any external reference.
-	 *
-	 * Holding unix_gc_lock will protect these candidates from
-	 * being detached, and hence from gaining an external
-	 * reference. Since there are no possible receivers, all
-	 * buffers currently on the candidates' queues stay there
-	 * during the garbage collection.
-	 *
-	 * We also know that no new candidate can be added onto the
-	 * receive queues. Other, non candidate sockets _can_ be
-	 * added to queue, so we must make sure only to touch
-	 * candidates.
-	 *
-	 * Embryos, though never candidates themselves, affect which
-	 * candidates are reachable by the garbage collector. Before
-	 * being added to a listener's queue, an embryo may already
-	 * receive data carrying SCM_RIGHTS, potentially making the
-	 * passed socket a candidate that is not yet reachable by the
-	 * collector. It becomes reachable once the embryo is
-	 * enqueued. Therefore, we must ensure that no SCM-laden
-	 * embryo appears in a (candidate) listener's queue between
-	 * consecutive scan_children() calls.
-	 */
-	list_for_each_entry_safe(u, next, &gc_inflight_list, link) {
-		struct sock *sk = &u->sk;
-		long total_refs;
-
-		total_refs = file_count(sk->sk_socket->file);
-
-		WARN_ON_ONCE(!u->inflight);
-		WARN_ON_ONCE(total_refs < u->inflight);
-		if (total_refs == u->inflight) {
-			list_move_tail(&u->link, &gc_candidates);
-			__set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
-			__set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
-
-			if (sk->sk_state == TCP_LISTEN) {
-				unix_state_lock_nested(sk, U_LOCK_GC_LISTENER);
-				unix_state_unlock(sk);
-			}
-		}
-	}
-
-	/* Now remove all internal in-flight reference to children of
-	 * the candidates.
-	 */
-	list_for_each_entry(u, &gc_candidates, link)
-		scan_children(&u->sk, dec_inflight, NULL);
-
-	/* Restore the references for children of all candidates,
-	 * which have remaining references. Do this recursively, so
-	 * only those remain, which form cyclic references.
-	 *
-	 * Use a "cursor" link, to make the list traversal safe, even
-	 * though elements might be moved about.
-	 */
-	list_add(&cursor, &gc_candidates);
-	while (cursor.next != &gc_candidates) {
-		u = list_entry(cursor.next, struct unix_sock, link);
-
-		/* Move cursor to after the current position. */
-		list_move(&cursor, &u->link);
-
-		if (u->inflight) {
-			list_move_tail(&u->link, &not_cycle_list);
-			__clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
-			scan_children(&u->sk, inc_inflight_move_tail, NULL);
-		}
 	}
-	list_del(&cursor);
 
-	/* Now gc_candidates contains only garbage. Restore original
-	 * inflight counters for these as well, and remove the skbuffs
-	 * which are creating the cycle(s).
-	 */
-	skb_queue_head_init(&hitlist);
-	list_for_each_entry(u, &gc_candidates, link) {
-		scan_children(&u->sk, inc_inflight, &hitlist);
+	__skb_queue_head_init(&hitlist);
 
-#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
-		if (u->oob_skb) {
-			kfree_skb(u->oob_skb);
-			u->oob_skb = NULL;
-		}
-#endif
-	}
-
-	/* not_cycle_list contains those sockets which do not make up a
-	 * cycle. Restore these to the inflight list.
-	 */
-	while (!list_empty(&not_cycle_list)) {
-		u = list_entry(not_cycle_list.next, struct unix_sock, link);
-		__clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
-		list_move_tail(&u->link, &gc_inflight_list);
-	}
+	if (unix_graph_grouped)
+		unix_walk_scc_fast(&hitlist);
+	else
+		unix_walk_scc(&hitlist);
 
 	spin_unlock(&unix_gc_lock);
 
-	/* Here we are. Hitlist is filled. Die. */
 	__skb_queue_purge(&hitlist);
-
-	spin_lock(&unix_gc_lock);
-
-	/* All candidates should have been detached by now. */
-	WARN_ON_ONCE(!list_empty(&gc_candidates));
 skip_gc:
 	/* Paired with READ_ONCE() in wait_for_unix_gc().
 	 */
 	WRITE_ONCE(gc_in_progress, false);
-
-	spin_unlock(&unix_gc_lock);
 }
 
 static DECLARE_WORK(unix_gc_work, __unix_gc);
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones
Subject: [PATCH v6.6 21/26] af_unix: Remove lock dance in unix_peek_fds().
Date: Wed, 21 May 2025 14:45:29 +0000
Message-ID: <20250521144803.2050504-22-lee@kernel.org>

From: Kuniyuki Iwashima

[ Upstream commit 118f457da9ed58a79e24b73c2ef0aa1987241f0e ]

In the previous GC implementation, the shape of the inflight socket
graph was not expected to change while GC was in progress.

MSG_PEEK was tricky because it could install an inflight fd silently
and transform the graph.

Let's say we peeked a fd, which was a listening socket, and accept()ed
some embryo sockets from it. The garbage collection algorithm would
have been confused because the set of sockets visited in
scan_inflight() would change within the same GC invocation.

That's why we placed spin_lock(&unix_gc_lock) and spin_unlock() in
unix_peek_fds() with a fat comment.

In the new GC implementation, we no longer garbage-collect the socket
if it exists in another queue, that is, if it has a bridge to another
SCC. Also, accept() will require the lock if it has edges.
Thus, we need not do the complicated lock dance.

Signed-off-by: Kuniyuki Iwashima
Link: https://lore.kernel.org/r/20240401173125.92184-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski
(cherry picked from commit 118f457da9ed58a79e24b73c2ef0aa1987241f0e)
Signed-off-by: Lee Jones
---
 include/net/af_unix.h |  1 -
 net/unix/af_unix.c    | 42 ------------------------------------------
 net/unix/garbage.c    |  2 +-
 3 files changed, 1 insertion(+), 44 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index d88ca51a9081d..47042de4a2a9c 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -17,7 +17,6 @@ static inline struct unix_sock *unix_get_socket(struct file *filp)
 }
 #endif
 
-extern spinlock_t unix_gc_lock;
 extern unsigned int unix_tot_inflight;
 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver);
 void unix_del_edges(struct scm_fp_list *fpl);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index eee0bccd7877b..df70d8a7ee837 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1796,48 +1796,6 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
 {
 	scm->fp = scm_fp_dup(UNIXCB(skb).fp);
-
-	/*
-	 * Garbage collection of unix sockets starts by selecting a set of
-	 * candidate sockets which have reference only from being in flight
-	 * (total_refs == inflight_refs).  This condition is checked once during
-	 * the candidate collection phase, and candidates are marked as such, so
-	 * that non-candidates can later be ignored.  While inflight_refs is
-	 * protected by unix_gc_lock, total_refs (file count) is not, hence this
-	 * is an instantaneous decision.
-	 *
-	 * Once a candidate, however, the socket must not be reinstalled into a
-	 * file descriptor while the garbage collection is in progress.
-	 *
-	 * If the above conditions are met, then the directed graph of
-	 * candidates (*) does not change while unix_gc_lock is held.
-	 *
-	 * Any operations that changes the file count through file descriptors
-	 * (dup, close, sendmsg) does not change the graph since candidates are
-	 * not installed in fds.
-	 *
-	 * Dequeing a candidate via recvmsg would install it into an fd, but
-	 * that takes unix_gc_lock to decrement the inflight count, so it's
-	 * serialized with garbage collection.
-	 *
-	 * MSG_PEEK is special in that it does not change the inflight count,
-	 * yet does install the socket into an fd.  The following lock/unlock
-	 * pair is to ensure serialization with garbage collection.  It must be
-	 * done between incrementing the file count and installing the file into
-	 * an fd.
-	 *
-	 * If garbage collection starts after the barrier provided by the
-	 * lock/unlock, then it will see the elevated refcount and not mark this
-	 * as a candidate.  If a garbage collection is already in progress
-	 * before the file count was incremented, then the lock/unlock pair will
-	 * ensure that garbage collection is finished before progressing to
-	 * installing the fd.
-	 *
-	 * (*) A -> B where B is on the queue of A or B is on the queue of C
-	 * which is on the queue of listening socket A.
-	 */
-	spin_lock(&unix_gc_lock);
-	spin_unlock(&unix_gc_lock);
 }
 
 static void unix_destruct_scm(struct sk_buff *skb)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 89ea71d9297ba..12a4ec27e0d4d 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -183,7 +183,7 @@ static void unix_free_vertices(struct scm_fp_list *fpl)
 	}
 }
 
-DEFINE_SPINLOCK(unix_gc_lock);
+static DEFINE_SPINLOCK(unix_gc_lock);
 unsigned int unix_tot_inflight;
 
 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones
To: lee@kernel.org, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima, Jens Axboe, Sasha Levin, Michal Luczaj, Rao Shoaib, Pavel Begunkov, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: stable@vger.kernel.org, kernel test robot
Subject: [PATCH v6.6 22/26] af_unix: Try not to hold unix_gc_lock during accept().
Date: Wed, 21 May 2025 14:45:30 +0000
Message-ID: <20250521144803.2050504-23-lee@kernel.org>
In-Reply-To: <20250521144803.2050504-1-lee@kernel.org>
References: <20250521144803.2050504-1-lee@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Kuniyuki Iwashima

[ Upstream commit fd86344823b521149bb31d91eba900ba3525efa6 ]

Commit dcf70df2048d ("af_unix: Fix up unix_edge.successor for embryo
socket.") added spin_lock(&unix_gc_lock) in the accept() path, and it
caused a regression in a stress test as reported by kernel test robot.

If the embryo socket is not part of the inflight graph, we need not
hold the lock.

To decide that in O(1) time and avoid the regression in the normal
use case,

  1. add a new stat unix_sk(sk)->scm_stat.nr_unix_fds

  2. count the number of inflight AF_UNIX sockets in the receive
     queue under unix_state_lock()

  3. move the unix_update_edges() call under unix_state_lock()

  4. avoid locking if nr_unix_fds is 0 in unix_update_edges()

Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-lkp/202404101427.92a08551-oliver.sang@intel.com
Signed-off-by: Kuniyuki Iwashima
Link: https://lore.kernel.org/r/20240413021928.20946-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni
(cherry picked from commit fd86344823b521149bb31d91eba900ba3525efa6)
Signed-off-by: Lee Jones
---
 include/net/af_unix.h |  1 +
 net/unix/af_unix.c    |  2 +-
 net/unix/garbage.c    | 20 ++++++++++++++++----
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 47042de4a2a9c..b6eedf7650da5 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -67,6 +67,7 @@ struct unix_skb_parms {
 
 struct scm_stat {
 	atomic_t nr_fds;
+	unsigned long nr_unix_fds;
 };
 
 #define UNIXCB(skb)	(*(struct unix_skb_parms *)&((skb)->cb))
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index df70d8a7ee837..236a2cd2bc93d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1703,12 +1703,12 @@ static int unix_accept(struct socket *sock, struct socket *newsock, int flags,
 	}
 
 	tsk = skb->sk;
-	unix_update_edges(unix_sk(tsk));
 	skb_free_datagram(sk, skb);
 	wake_up_interruptible(&unix_sk(sk)->peer_wait);
 
 	/* attach accepted sock to socket */
 	unix_state_lock(tsk);
+	unix_update_edges(unix_sk(tsk));
 	newsock->state = SS_CONNECTED;
 	unix_sock_inherit_flags(sock, newsock);
 	sock_graft(tsk, newsock);
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 12a4ec27e0d4d..95240a59808f2 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -209,6 +209,7 @@ void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 		unix_add_edge(fpl, edge);
 	} while (i < fpl->count_unix);
 
+	receiver->scm_stat.nr_unix_fds += fpl->count_unix;
 	WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + fpl->count_unix);
 out:
 	WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight + fpl->count);
@@ -222,6 +223,7 @@ void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 
 void unix_del_edges(struct scm_fp_list *fpl)
 {
+	struct unix_sock *receiver;
 	int i = 0;
 
 	spin_lock(&unix_gc_lock);
@@ -235,6 +237,8 @@ void unix_del_edges(struct scm_fp_list *fpl)
 		unix_del_edge(fpl, edge);
 	} while (i < fpl->count_unix);
 
+	receiver = fpl->edges[0].successor;
+	receiver->scm_stat.nr_unix_fds -= fpl->count_unix;
 	WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - fpl->count_unix);
 out:
 	WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight - fpl->count);
@@ -246,10 +250,18 @@ void unix_del_edges(struct scm_fp_list *fpl)
 
 void unix_update_edges(struct unix_sock *receiver)
 {
-	spin_lock(&unix_gc_lock);
-	unix_update_graph(unix_sk(receiver->listener)->vertex);
-	receiver->listener = NULL;
-	spin_unlock(&unix_gc_lock);
+	/* nr_unix_fds is only updated under unix_state_lock().
+	 * If it's 0 here, the embryo socket is not part of the
+	 * inflight graph, and GC will not see it, so no lock needed.
+	 */
+	if (!receiver->scm_stat.nr_unix_fds) {
+		receiver->listener = NULL;
+	} else {
+		spin_lock(&unix_gc_lock);
+		unix_update_graph(unix_sk(receiver->listener)->vertex);
+		receiver->listener = NULL;
+		spin_unlock(&unix_gc_lock);
+	}
 }
 
 int unix_prepare_fpl(struct scm_fp_list *fpl)
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones
To: lee@kernel.org, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima, Jens Axboe, Sasha Levin, Michal Luczaj, Rao Shoaib, Simon Horman, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: stable@vger.kernel.org, syzbot+f3f3eef1d2100200e593@syzkaller.appspotmail.com
Subject: [PATCH v6.6 23/26] af_unix: Don't access successor in unix_del_edges() during GC.
Date: Wed, 21 May 2025 14:45:31 +0000
Message-ID: <20250521144803.2050504-24-lee@kernel.org>
In-Reply-To: <20250521144803.2050504-1-lee@kernel.org>
References: <20250521144803.2050504-1-lee@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Kuniyuki Iwashima

[ Upstream commit 1af2dface5d286dd1f2f3405a0d6fa9f2c8fb998 ]

syzbot reported use-after-free in unix_del_edges(). [0]

What the repro does is basically repeat the following quickly.

  1. pass a fd of an AF_UNIX socket to itself

    socketpair(AF_UNIX, SOCK_DGRAM, 0, [3, 4]) = 0
    sendmsg(3, {..., msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET,
                                   cmsg_type=SCM_RIGHTS, cmsg_data=[4]}], ...}, 0) = 0

  2. pass other fds of AF_UNIX sockets to the socket above

    socketpair(AF_UNIX, SOCK_SEQPACKET, 0, [5, 6]) = 0
    sendmsg(3, {..., msg_control=[{cmsg_len=48, cmsg_level=SOL_SOCKET,
                                   cmsg_type=SCM_RIGHTS, cmsg_data=[5, 6]}], ...}, 0) = 0

  3. close all sockets

Here, two skb are created, and every unix_edge->successor is the first
socket.  Then, __unix_gc() will garbage-collect the two skb:

  (a) free skb with self-referencing fd
  (b) free skb holding other sockets

After (a), the self-referencing socket will be scheduled to be freed
later by the delayed_fput() task.

syzbot repeated the sequences above (1. ~ 3.) quickly and triggered
the task concurrently while GC was running.

So, at (b), the socket was already freed, and accessing it was illegal.

unix_del_edges() accesses the receiver socket as edge->successor to
optimise GC.  However, we should not do it during GC.

Garbage-collecting sockets does not change the shape of the rest of
the graph, so we need not call unix_update_graph() to update
unix_graph_grouped when we purge an skb.

However, if we clean up all loops in the unix_walk_scc_fast() path,
unix_graph_maybe_cyclic remains unchanged (true), and __unix_gc()
will call unix_walk_scc_fast() continuously even though there is no
socket to garbage-collect.

To keep that optimisation while fixing UAF, let's add the same updating
logic of unix_graph_maybe_cyclic in unix_walk_scc_fast() as done in
unix_walk_scc() and __unix_walk_scc().
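[Editor's note, not part of the patch] The bookkeeping this patch adds to unix_walk_scc_fast() can be modelled in a few lines. This is a simplified sketch under the assumption of dict-based SCC records standing in for the kernel's vertex lists; it only shows the invariant: the maybe-cyclic hint is recomputed from surviving SCCs, so a pass that collects every cycle lets subsequent GC invocations skip the walk.

```python
# Simplified model of the unix_walk_scc_fast() change: dead SCCs go to
# the hitlist, and the "maybe cyclic" hint is recomputed only from the
# SCCs that survive the pass.
def walk_scc_fast(sccs):
    maybe_cyclic = False            # mirrors: unix_graph_maybe_cyclic = false;
    hitlist, survivors = [], []
    for scc in sccs:
        if scc["dead"]:
            hitlist.append(scc)     # mirrors unix_collect_skb()
        else:
            if not maybe_cyclic:
                maybe_cyclic = scc["cyclic"]  # mirrors unix_scc_cyclic()
            survivors.append(scc)
    return survivors, hitlist, maybe_cyclic

# All cyclic SCCs are dead: the hint drops to False, so the next GC
# invocation can skip the expensive walk entirely.
sccs = [{"dead": True, "cyclic": True}, {"dead": False, "cyclic": False}]
survivors, hitlist, hint = walk_scc_fast(sccs)
assert hint is False and len(hitlist) == 1 and len(survivors) == 1
```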
Note that when unix_del_edges() is called from other places, the
receiver socket is always alive:

  - sendmsg: the successor's sk_refcnt is bumped by sock_hold()
    unix_find_other() for SOCK_DGRAM, connect() for SOCK_STREAM

  - recvmsg: the successor is the receiver, and its fd is alive

[0]:
BUG: KASAN: slab-use-after-free in unix_edge_successor net/unix/garbage.c:109 [inline]
BUG: KASAN: slab-use-after-free in unix_del_edge net/unix/garbage.c:165 [inline]
BUG: KASAN: slab-use-after-free in unix_del_edges+0x148/0x630 net/unix/garbage.c:237
Read of size 8 at addr ffff888079c6e640 by task kworker/u8:6/1099

CPU: 0 PID: 1099 Comm: kworker/u8:6 Not tainted 6.9.0-rc4-next-20240418-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Workqueue: events_unbound __unix_gc
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 unix_edge_successor net/unix/garbage.c:109 [inline]
 unix_del_edge net/unix/garbage.c:165 [inline]
 unix_del_edges+0x148/0x630 net/unix/garbage.c:237
 unix_destroy_fpl+0x59/0x210 net/unix/garbage.c:298
 unix_detach_fds net/unix/af_unix.c:1811 [inline]
 unix_destruct_scm+0x13e/0x210 net/unix/af_unix.c:1826
 skb_release_head_state+0x100/0x250 net/core/skbuff.c:1127
 skb_release_all net/core/skbuff.c:1138 [inline]
 __kfree_skb net/core/skbuff.c:1154 [inline]
 kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1190
 __skb_queue_purge_reason include/linux/skbuff.h:3251 [inline]
 __skb_queue_purge include/linux/skbuff.h:3256 [inline]
 __unix_gc+0x1732/0x1830 net/unix/garbage.c:575
 process_one_work kernel/workqueue.c:3218 [inline]
 process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
 worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Allocated by task 14427:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 unpoison_slab_object mm/kasan/common.c:312 [inline]
 __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338
 kasan_slab_alloc include/linux/kasan.h:201 [inline]
 slab_post_alloc_hook mm/slub.c:3897 [inline]
 slab_alloc_node mm/slub.c:3957 [inline]
 kmem_cache_alloc_noprof+0x135/0x290 mm/slub.c:3964
 sk_prot_alloc+0x58/0x210 net/core/sock.c:2074
 sk_alloc+0x38/0x370 net/core/sock.c:2133
 unix_create1+0xb4/0x770
 unix_create+0x14e/0x200 net/unix/af_unix.c:1034
 __sock_create+0x490/0x920 net/socket.c:1571
 sock_create net/socket.c:1622 [inline]
 __sys_socketpair+0x33e/0x720 net/socket.c:1773
 __do_sys_socketpair net/socket.c:1822 [inline]
 __se_sys_socketpair net/socket.c:1819 [inline]
 __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 1805:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
 kasan_slab_free include/linux/kasan.h:184 [inline]
 slab_free_hook mm/slub.c:2190 [inline]
 slab_free mm/slub.c:4393 [inline]
 kmem_cache_free+0x145/0x340 mm/slub.c:4468
 sk_prot_free net/core/sock.c:2114 [inline]
 __sk_destruct+0x467/0x5f0 net/core/sock.c:2208
 sock_put include/net/sock.h:1948 [inline]
 unix_release_sock+0xa8b/0xd20 net/unix/af_unix.c:665
 unix_release+0x91/0xc0 net/unix/af_unix.c:1049
 __sock_release net/socket.c:659 [inline]
 sock_close+0xbc/0x240 net/socket.c:1421
 __fput+0x406/0x8b0 fs/file_table.c:422
 delayed_fput+0x59/0x80 fs/file_table.c:445
 process_one_work kernel/workqueue.c:3218 [inline]
 process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
 worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

The buggy address belongs to the object at ffff888079c6e000
 which belongs to the cache UNIX of size 1920
The buggy address is located 1600 bytes inside of
 freed 1920-byte region [ffff888079c6e000, ffff888079c6e780)

Reported-by: syzbot+f3f3eef1d2100200e593@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f3f3eef1d2100200e593
Fixes: 77e5593aebba ("af_unix: Skip GC if no cycle exists.")
Fixes: fd86344823b5 ("af_unix: Try not to hold unix_gc_lock during accept().")
Signed-off-by: Kuniyuki Iwashima
Link: https://lore.kernel.org/r/20240419235102.31707-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni
(cherry picked from commit 1af2dface5d286dd1f2f3405a0d6fa9f2c8fb998)
Signed-off-by: Lee Jones
---
 net/unix/garbage.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 95240a59808f2..d76450133e4f0 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -158,11 +158,14 @@ static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge)
 	unix_update_graph(unix_edge_successor(edge));
 }
 
+static bool gc_in_progress;
+
 static void unix_del_edge(struct scm_fp_list *fpl, struct unix_edge *edge)
 {
 	struct unix_vertex *vertex = edge->predecessor->vertex;
 
-	unix_update_graph(unix_edge_successor(edge));
+	if (!gc_in_progress)
+		unix_update_graph(unix_edge_successor(edge));
 
 	list_del(&edge->vertex_entry);
 	vertex->out_degree--;
@@ -237,8 +240,10 @@ void unix_del_edges(struct scm_fp_list *fpl)
 		unix_del_edge(fpl, edge);
 	} while (i < fpl->count_unix);
 
-	receiver = fpl->edges[0].successor;
-	receiver->scm_stat.nr_unix_fds -= fpl->count_unix;
+	if (!gc_in_progress) {
+		receiver = fpl->edges[0].successor;
+		receiver->scm_stat.nr_unix_fds -= fpl->count_unix;
+	}
 	WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - fpl->count_unix);
 out:
 	WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight - fpl->count);
@@ -526,6 +531,8 @@ static void unix_walk_scc(struct sk_buff_head *hitlist)
 
 static void unix_walk_scc_fast(struct sk_buff_head *hitlist)
 {
+	unix_graph_maybe_cyclic = false;
+
 	while (!list_empty(&unix_unvisited_vertices)) {
 		struct unix_vertex *vertex;
 		struct list_head scc;
@@ -543,6 +550,8 @@ static void unix_walk_scc_fast(struct sk_buff_head *hitlist)
 
 		if (scc_dead)
 			unix_collect_skb(&scc, hitlist);
+		else if (!unix_graph_maybe_cyclic)
+			unix_graph_maybe_cyclic = unix_scc_cyclic(&scc);
 
 		list_del(&scc);
 	}
@@ -550,8 +559,6 @@ static void unix_walk_scc_fast(struct sk_buff_head *hitlist)
 	list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices);
 }
 
-static bool gc_in_progress;
-
 static void __unix_gc(struct work_struct *work)
 {
 	struct sk_buff_head hitlist;
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones
To: lee@kernel.org, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima, Jens Axboe, Sasha Levin, Michal Luczaj, Rao Shoaib, Pavel Begunkov, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Subject: [PATCH v6.6 24/26] af_unix: Add dead flag to struct scm_fp_list.
Date: Wed, 21 May 2025 14:45:32 +0000
Message-ID: <20250521144803.2050504-25-lee@kernel.org>
In-Reply-To: <20250521144803.2050504-1-lee@kernel.org>
References: <20250521144803.2050504-1-lee@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Kuniyuki Iwashima

[ Upstream commit 7172dc93d621d5dc302d007e95ddd1311ec64283 ]

Commit 1af2dface5d2 ("af_unix: Don't access successor in
unix_del_edges() during GC.") fixed use-after-free by avoiding access
to edge->successor while GC is in progress.

However, there could be a small race window where another process
could call unix_del_edges() while gc_in_progress is true and
__skb_queue_purge() is on the way.

So, we need another marker for struct scm_fp_list which indicates
whether the skb is garbage-collected.

This patch adds a dead flag in struct scm_fp_list and sets it to true
before calling __skb_queue_purge().
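[Editor's note, not part of the patch] Why a per-list flag closes the race where the global flag could not can be modelled in miniature. The `FpList` class and `del_edges()` helper below are hypothetical stand-ins, not kernel types; the point is that the dead marker travels with the skb being purged, while `gc_in_progress` would also (wrongly) suppress the successor update for unrelated lists freed concurrently.

```python
# Toy model of moving from a global gc_in_progress flag to a per-list
# dead flag.  These classes are illustrative only.
class FpList:
    def __init__(self, successor_alive=True):
        self.dead = False                   # set by GC before purging
        self.successor_alive = successor_alive

def del_edges(fpl, gc_in_progress_global=None):
    # Old scheme: skip successor access whenever *any* GC is running.
    # New scheme: skip only if *this* list is being purged by GC.
    if gc_in_progress_global is not None:
        touch = not gc_in_progress_global
    else:
        touch = not fpl.dead
    if touch and not fpl.successor_alive:
        raise RuntimeError("use-after-free")
    return touch  # True means the successor's stats were updated

gc_list = FpList(successor_alive=False)     # skb on the GC hitlist
gc_list.dead = True                         # set before __skb_queue_purge()
live_list = FpList()                        # unrelated skb freed concurrently

assert del_edges(gc_list) is False          # UAF avoided
assert del_edges(live_list) is True         # stat still updated correctly
# Old global-flag scheme: an unrelated list freed during GC wrongly
# skips its successor update too.
assert del_edges(live_list, gc_in_progress_global=True) is False
```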
Fixes: 1af2dface5d2 ("af_unix: Don't access successor in unix_del_edges() during GC.")
Signed-off-by: Kuniyuki Iwashima
Acked-by: Paolo Abeni
Link: https://lore.kernel.org/r/20240508171150.50601-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski
(cherry picked from commit 7172dc93d621d5dc302d007e95ddd1311ec64283)
Signed-off-by: Lee Jones
---
 include/net/scm.h  |  1 +
 net/core/scm.c     |  1 +
 net/unix/garbage.c | 14 ++++++++++----
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/net/scm.h b/include/net/scm.h
index 07d66c41cc33c..059e287745dc3 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -32,6 +32,7 @@ struct scm_fp_list {
 	short			max;
 #ifdef CONFIG_UNIX
 	bool			inflight;
+	bool			dead;
 	struct list_head	vertices;
 	struct unix_edge	*edges;
 #endif
diff --git a/net/core/scm.c b/net/core/scm.c
index 1e47788379c2c..431bfb3ea3929 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -91,6 +91,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 	fpl->user = NULL;
 #if IS_ENABLED(CONFIG_UNIX)
 	fpl->inflight = false;
+	fpl->dead = false;
 	fpl->edges = NULL;
 	INIT_LIST_HEAD(&fpl->vertices);
 #endif
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index d76450133e4f0..1f8b8cdfcdc8d 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -158,13 +158,11 @@ static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge)
 	unix_update_graph(unix_edge_successor(edge));
 }
 
-static bool gc_in_progress;
-
 static void unix_del_edge(struct scm_fp_list *fpl, struct unix_edge *edge)
 {
 	struct unix_vertex *vertex = edge->predecessor->vertex;
 
-	if (!gc_in_progress)
+	if (!fpl->dead)
 		unix_update_graph(unix_edge_successor(edge));
 
 	list_del(&edge->vertex_entry);
@@ -240,7 +238,7 @@ void unix_del_edges(struct scm_fp_list *fpl)
 		unix_del_edge(fpl, edge);
 	} while (i < fpl->count_unix);
 
-	if (!gc_in_progress) {
+	if (!fpl->dead) {
 		receiver = fpl->edges[0].successor;
 		receiver->scm_stat.nr_unix_fds -= fpl->count_unix;
 	}
@@ -559,9 +557,12 @@ static void unix_walk_scc_fast(struct sk_buff_head *hitlist)
 	list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices);
 }
 
+static bool gc_in_progress;
+
 static void __unix_gc(struct work_struct *work)
 {
 	struct sk_buff_head hitlist;
+	struct sk_buff *skb;
 
 	spin_lock(&unix_gc_lock);
 
@@ -579,6 +580,11 @@ static void __unix_gc(struct work_struct *work)
 
 	spin_unlock(&unix_gc_lock);
 
+	skb_queue_walk(&hitlist, skb) {
+		if (UNIXCB(skb).fp)
+			UNIXCB(skb).fp->dead = true;
+	}
+
 	__skb_queue_purge(&hitlist);
 skip_gc:
 	WRITE_ONCE(gc_in_progress, false);
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Sun Dec 14 19:36:19 2025
From: Lee Jones
To: lee@kernel.org, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima, Jens Axboe, Sasha Levin, Michal Luczaj, Rao Shoaib, Pavel Begunkov, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Subject: [PATCH v6.6 25/26] af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
Date: Wed, 21 May 2025 14:45:33 +0000
Message-ID: <20250521144803.2050504-26-lee@kernel.org>
In-Reply-To: <20250521144803.2050504-1-lee@kernel.org>
References: <20250521144803.2050504-1-lee@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Michal Luczaj

[ Upstream commit 041933a1ec7b4173a8e638cae4f8e394331d7e54 ]

GC attempts to explicitly drop oob_skb's reference before purging the
hit list.

The problem is with embryos: kfree_skb(u->oob_skb) is never called on
an embryo socket.

The python script below [0] sends a listener's fd to its embryo as OOB
data.  While GC does collect the embryo's queue, it fails to drop the
OOB skb's refcount.  The skb which was in the embryo's receive queue
stays as unix_sk(sk)->oob_skb and keeps the listener's refcount [1].

Tell GC to dispose of the embryo's oob_skb.

[0]:
from array import array
from socket import *

addr = '\x00unix-oob'
lis = socket(AF_UNIX, SOCK_STREAM)
lis.bind(addr)
lis.listen(1)

s = socket(AF_UNIX, SOCK_STREAM)
s.connect(addr)

scm = (SOL_SOCKET, SCM_RIGHTS, array('i', [lis.fileno()]))
s.sendmsg([b'x'], [scm], MSG_OOB)

lis.close()

[1]:
$ grep unix-oob /proc/net/unix
$ ./unix-oob.py
$ grep unix-oob /proc/net/unix
0000000000000000: 00000002 00000000 00000000 0001 02 0 @unix-oob
0000000000000000: 00000002 00000000 00010000 0001 01 6072 @unix-oob

Fixes: 4090fa373f0e ("af_unix: Replace garbage collection algorithm.")
Signed-off-by: Michal Luczaj
Reviewed-by: Kuniyuki Iwashima
Signed-off-by: Paolo Abeni
(cherry picked from commit 041933a1ec7b4173a8e638cae4f8e394331d7e54)
Signed-off-by: Lee Jones
---
 net/unix/garbage.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 1f8b8cdfcdc8d..dfe94a90ece40 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -342,6 +342,18 @@ enum unix_recv_queue_lock_class {
 	U_RECVQ_LOCK_EMBRYO,
 };
 
+static void unix_collect_queue(struct unix_sock *u, struct sk_buff_head *hitlist)
+{
+	skb_queue_splice_init(&u->sk.sk_receive_queue, hitlist);
+
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
+	if (u->oob_skb) {
+		WARN_ON_ONCE(skb_unref(u->oob_skb));
+		u->oob_skb = NULL;
+	}
+#endif
+}
+
 static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *hitlist)
 {
 	struct unix_vertex *vertex;
@@ -365,18 +377,11 @@ static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *hitlist)
 
 			/* listener -> embryo order, the inversion never happens.
*/ spin_lock_nested(&embryo_queue->lock, U_RECVQ_LOCK_EMBRYO); - skb_queue_splice_init(embryo_queue, hitlist); + unix_collect_queue(unix_sk(skb->sk), hitlist); spin_unlock(&embryo_queue->lock); } } else { - skb_queue_splice_init(queue, hitlist); - -#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - if (u->oob_skb) { - kfree_skb(u->oob_skb); - u->oob_skb =3D NULL; - } -#endif + unix_collect_queue(u, hitlist); } =20 spin_unlock(&queue->lock); --=20 2.49.0.1112.g889b7c5bd8-goog From nobody Sun Dec 14 19:36:19 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6B592220F46; Wed, 21 May 2025 14:52:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839166; cv=none; b=T57xNlpmmSwUKUIcwPedOtec7bLPowr8RgXYVoYwEWzebjvFB2IM1g5g7w1F14shz3MJq3v8PprvLTxDf8BTXQrac5IZn2X6WhWdFKVPqol7WWKnf6cY4xx75+9PQhX2ayAZu5jC8ybxI6rWZ9W2ySWtpuo2GhV1iQOrYtaHglE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747839166; c=relaxed/simple; bh=aXs55NlSVM3VIwWtlSAUIA7UYBiL2MuTfKtNoqtLmI0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pMb0s8FDE45Bjp6MM5bhx8oRvpd5R/b37EV88/8U0Blw7tyvHTK+kMnYFCtuqSdeiXwHcyHs6qTLGww9LMytwGh0LxJDXqI/G0JRGE+HB8uYNNDYlZKwhDISKWV8KS5RXxbo/8LpB+7z0N0BoYVv/G7u9Vp3NUUEOadwrnf/u9M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=H56f4tLY; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="H56f4tLY" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6DD83C4CEE4; Wed, 21 May 2025 14:52:43 +0000 (UTC) 
From: Lee Jones
To: lee@kernel.org, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima, Jens Axboe, Sasha Levin, Michal Luczaj, Rao Shoaib, Simon Horman, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: stable@vger.kernel.org, Shigeru Yoshida, syzkaller
Subject: [PATCH v6.6 26/26] af_unix: Fix uninit-value in __unix_walk_scc()
Date: Wed, 21 May 2025 14:45:34 +0000
Message-ID: <20250521144803.2050504-27-lee@kernel.org>
In-Reply-To: <20250521144803.2050504-1-lee@kernel.org>
References: <20250521144803.2050504-1-lee@kernel.org>

From: Shigeru Yoshida

[ Upstream commit 927fa5b3e4f52e0967bfc859afc98ad1c523d2d5 ]

KMSAN reported uninit-value access in __unix_walk_scc() [1].

In the list_for_each_entry_reverse() loop, when the vertex's index
equals its scc_index, the loop uses the variable vertex as a temporary
variable that points to a vertex in scc. And when the loop is finished,
the variable vertex points to the list head, in this case scc, which is
a local variable on the stack (more precisely, it's not even scc and
might underflow the call stack of __unix_walk_scc():
container_of(&scc, struct unix_vertex, scc_entry)).
However, the variable vertex is used under the label prev_vertex. So if
the edge_stack is not empty and the function jumps to the prev_vertex
label, the function will access invalid data on the stack. This causes
the uninit-value access issue.

Fix this by introducing a new temporary variable for the loop.

[1]
BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline]
BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline]
BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
 __unix_walk_scc net/unix/garbage.c:478 [inline]
 unix_walk_scc net/unix/garbage.c:526 [inline]
 __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
 process_one_work kernel/workqueue.c:3231 [inline]
 process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
 kthread+0x3c4/0x530 kernel/kthread.c:389
 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Uninit was stored to memory at:
 unix_walk_scc net/unix/garbage.c:526 [inline]
 __unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584
 process_one_work kernel/workqueue.c:3231 [inline]
 process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
 kthread+0x3c4/0x530 kernel/kthread.c:389
 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Local variable entries created at:
 ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222
 netdev_tracker_free include/linux/netdevice.h:4058 [inline]
 netdev_put include/linux/netdevice.h:4075 [inline]
 dev_put include/linux/netdevice.h:4101 [inline]
 update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:813

CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Workqueue: events_unbound __unix_gc

Fixes: 3484f063172d ("af_unix: Detect Strongly Connected Components.")
Reported-by: syzkaller
Signed-off-by: Shigeru Yoshida
Reviewed-by: Kuniyuki Iwashima
Link: https://patch.msgid.link/20240702160428.10153-1-syoshida@redhat.com
Signed-off-by: Jakub Kicinski
(cherry picked from commit 927fa5b3e4f52e0967bfc859afc98ad1c523d2d5)
Signed-off-by: Lee Jones
---
 net/unix/garbage.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index dfe94a90ece40..23efb78fe9ef4 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -476,6 +476,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_index)
 	}
 
 	if (vertex->index == vertex->scc_index) {
+		struct unix_vertex *v;
 		struct list_head scc;
 		bool scc_dead = true;
 
@@ -486,15 +487,15 @@ static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_index)
 		 */
 		__list_cut_position(&scc, &vertex_stack, &vertex->scc_entry);
 
-		list_for_each_entry_reverse(vertex, &scc, scc_entry) {
+		list_for_each_entry_reverse(v, &scc, scc_entry) {
 			/* Don't restart DFS from this vertex in unix_walk_scc(). */
-			list_move_tail(&vertex->entry, &unix_visited_vertices);
+			list_move_tail(&v->entry, &unix_visited_vertices);
 
 			/* Mark vertex as off-stack. */
-			vertex->index = unix_vertex_grouped_index;
+			v->index = unix_vertex_grouped_index;
 
 			if (scc_dead)
-				scc_dead = unix_vertex_dead(vertex);
+				scc_dead = unix_vertex_dead(v);
 		}
 
 		if (scc_dead)
-- 
2.49.0.1112.g889b7c5bd8-goog
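The bug class in this last patch is easy to reproduce in any language: an inner loop that reuses its caller's iteration variable leaves that variable pointing somewhere else once the loop ends. A minimal Python analogy (invented names; Python merely clobbers the variable, whereas C's list_for_each_entry_reverse() leaves it aimed at the on-stack list head, which is not a vertex at all):

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        self.index = None


def collect_scc_buggy(vertex, scc):
    # Pre-fix shape: the walk reuses the caller's `vertex`, so after
    # the loop it no longer names the SCC root the caller was holding.
    for vertex in scc:
        vertex.index = "grouped"
    return vertex               # clobbered: the last element walked


def collect_scc_fixed(vertex, scc):
    # Post-fix shape: a dedicated temporary `v` leaves `vertex` intact,
    # so later code (the prev_vertex path) still sees a valid vertex.
    for v in scc:
        v.index = "grouped"
    return vertex               # still the caller's SCC root


root = Vertex("root")
scc = [Vertex("a"), Vertex("b")]
assert collect_scc_fixed(root, scc) is root
assert collect_scc_buggy(root, scc) is scc[-1]
```

Introducing the fresh temporary, as the diff above does with `v`, is the whole fix: the marking work is unchanged, only the lifetime of the caller's `vertex` is protected.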