From nobody Tue Oct 7 10:35:44 2025
Received: from m16.mail.163.com (m16.mail.163.com [220.197.31.5])
	by smtp.subspace.kernel.org (Postfix) with ESMTP id AF8791DF75C;
	Thu, 10 Jul 2025 10:27:11 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=220.197.31.5
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1752143234; cv=none;
	b=RIxQyIKKUyLZ9EPumllPs9DVKatZuTuTGFxJ1xt6chiQZT+IQoN66lS93NHt8vkpYSExQ/y6XzJ7RRoCusIfPuRIA+iA0nI/D4/QiaSu3veNjI314vajOSyl+OGX7aBd0THXc2zfiytL0VsjbXjLsaOAdmQ1KP0rIwV1qhXPnQw=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1752143234; c=relaxed/simple;
	bh=V7vFD7/Fb8G++Y1kaiWeRxlwbZhdhohmLHeU+PMkKdI=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
	b=OGE17o4IVhFcfuMbannYbMRJAXbwgr3/2rX8E85Nt8ZTgM7ir5jVkusvqYtk6fZov9a6Qzls+z6oanv5TtIr6WDnFjrAUckrLpgVCHjAQNvFi/Wlx75+ieB7AEypndQSaqI+ZCo40KPyXfwBYmSWtpho4nHdpxFctJ9GEK333jA=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dmarc=pass (p=none dis=none) header.from=163.com;
	spf=pass smtp.mailfrom=163.com;
	dkim=pass (1024-bit key) header.d=163.com header.i=@163.com header.b=M27Gz/d1;
	arc=none smtp.client-ip=220.197.31.5
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=163.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=163.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=163.com header.i=@163.com header.b="M27Gz/d1"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com;
	s=s110527; h=From:To:Subject:Date:Message-ID:MIME-Version; bh=9j
	sQItpI3KggjxElmfZojdWrsrVGwjxm/rP2m4AKOMA=; b=M27Gz/d1EGqoZdCuZt
	MzKgkwl7d22rG61D0Fy67NXLbNGCZq0pF6mD752l/1RAPbbxZ+D6WYS/uVAQsWOt
	GzSMALSMGF4VAtmL+uOWH/K3mCHT/pmM5cPNJYMy1ARi5ZoTfiPw/KJARHQDeMT2
	JGPwwyPa5LnVn8pzXM92awzFc=
Received: from kylin-ERAZER-H610M.. (unknown [])
	by gzga-smtp-mtada-g1-1 (Coremail) with SMTP id _____wAXowFflW9oGKJ1Dw--.26998S3;
	Thu, 10 Jul 2025 18:26:41 +0800 (CST)
From: Yun Lu
To: willemdebruijn.kernel@gmail.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, horms@kernel.org
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 1/3] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd()
Date: Thu, 10 Jul 2025 18:26:37 +0800
Message-ID: <20250710102639.280932-2-luyun_611@163.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250710102639.280932-1-luyun_611@163.com>
References: <20250710102639.280932-1-luyun_611@163.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-CM-TRANSID: _____wAXowFflW9oGKJ1Dw--.26998S3
X-Coremail-Antispam: 1Uf129KBjvJXoWxJr13Zr17KF13tF1fJryfXrb_yoW8tr4rpa
	y5K347XayrJr10gr1xJ3Z8X3W3X3y8JrZ3CryFv3Waywnxtr9aqF18t3yj9FyrZaykAa43
	JF1vvr45Aw1Uta7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2
	9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x07jsYFAUUUUU=
X-CM-SenderInfo: pox130jbwriqqrwthudrp/1tbiWwuGzmhvj82eWwABsZ
Content-Type: text/plain; charset="utf-8"

From: Yun Lu

Due to the changes in commit 581073f626e3 ("af_packet: do not call
packet_read_pending() from tpacket_destruct_skb()"), every time
tpacket_destruct_skb() is executed, skb_completion is marked as
completed. When wait_for_completion_interruptible_timeout() returns
because of a completion, pending_refcnt may not yet have been reduced
to zero. Therefore, when ph is NULL, the wait function may need to be
called multiple times until packet_read_pending() finally returns zero.

We should call sock_sndtimeo() only once. Otherwise every wait restarts
with the full timeout, and the SO_SNDTIMEO constraint could be way off.

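To make the timeout accounting concrete, here is a rough userspace
sketch. It is not kernel code: fake_wait() and total_wait() are
hypothetical helpers, and fake_wait() only imitates the "return the
remaining budget, or 0 on timeout" contract of
wait_for_completion_interruptible_timeout(). It compares re-reading the
timeout before every wait with reading it once and carrying the
remainder across waits.

```c
#include <stdbool.h>

/* Hypothetical stand-in for wait_for_completion_interruptible_timeout():
 * pretends a completion fires `elapsed` ticks after the wait starts and
 * returns the remaining budget, or 0 if the budget ran out first.
 */
static long fake_wait(long budget, long elapsed)
{
	return budget > elapsed ? budget - elapsed : 0;
}

/* Total ticks spent waiting for up to `wakeups` completions.
 * reset_each_time == true:  the budget is re-read before every wait.
 * reset_each_time == false: the budget is read once and the remainder
 *                           is carried into the next wait.
 */
static long total_wait(long sndtimeo, long elapsed, int wakeups,
		       bool reset_each_time)
{
	long timeo = sndtimeo;
	long spent = 0;

	for (int i = 0; i < wakeups; i++) {
		if (reset_each_time)
			timeo = sndtimeo;

		long left = fake_wait(timeo, elapsed);

		spent += timeo - left;
		timeo = left;
		if (timeo == 0)
			break;	/* corresponds to the -ETIMEDOUT exit */
	}
	return spent;
}
```

With sndtimeo = 100, elapsed = 60 and five wake-ups, the variant that
re-reads the timeout waits 300 ticks in total, while the variant that
reads it once stops after 100 ticks, which is the configured bound.
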
Fixes: 581073f626e3 ("af_packet: do not call packet_read_pending() from tpacket_destruct_skb()")
Cc: stable@kernel.org
Suggested-by: Eric Dumazet
Signed-off-by: Yun Lu
Reviewed-by: Eric Dumazet
Reviewed-by: Willem de Bruijn
---
 net/packet/af_packet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 3d43f3eae759..7089b8c2a655 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2785,7 +2785,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 	int len_sum = 0;
 	int status = TP_STATUS_AVAILABLE;
 	int hlen, tlen, copylen = 0;
-	long timeo = 0;
+	long timeo;
 
 	mutex_lock(&po->pg_vec_lock);
 
@@ -2839,6 +2839,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 	if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !vnet_hdr_sz)
 		size_max = dev->mtu + reserve + VLAN_HLEN;
 
+	timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
 	reinit_completion(&po->skb_completion);
 
 	do {
@@ -2846,7 +2847,6 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 					  TP_STATUS_SEND_REQUEST);
 		if (unlikely(ph == NULL)) {
 			if (need_wait && skb) {
-				timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
 				timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
 				if (timeo <= 0) {
 					err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
-- 
2.43.0

From nobody Tue Oct 7 10:35:44 2025
Received: from m16.mail.163.com (m16.mail.163.com [220.197.31.4])
	by smtp.subspace.kernel.org (Postfix) with ESMTP id 074C72E3371;
	Thu, 10 Jul 2025 10:27:13 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=220.197.31.4
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1752143236; cv=none;
	b=H2Od5ad4cgRUuqGCEDXIE0a0Vmch1etPjf/Mn2xJxcYUjA4HeSUq40Zw2z4AghM0nQfR+p2GjSFNOCWW7Rd7oXq16fNGDftHD9RTdmKFimCg27IPMil4ibxmGcgsugCOAuDTTeEnV0tXYHkt2ToL6GSZLt/47ClyfMY+MqCMswY=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1752143236; c=relaxed/simple;
	bh=zp37PlGkHfVmmdq48K1DE8QuYM9q30RGH3jGioaNtZM=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
	b=JmSr59C/6BzntC0K1B/pxYnDDF/PDnmpv9rVQVkLItMXGvU/GrwNsYUkZuxWYXDhzpIcAw84SPW3yi9sgV455WDio7IV6bCnHZMzyeClfTPjZLMrcIGj1f45R2cB4PZUNdOAuaz5oqI8XoIT6adxtj1DHXFIM//4LpGxZwtTv1M=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dmarc=pass (p=none dis=none) header.from=163.com;
	spf=pass smtp.mailfrom=163.com;
	dkim=pass (1024-bit key) header.d=163.com header.i=@163.com header.b=WRv5MeGD;
	arc=none smtp.client-ip=220.197.31.4
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=163.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=163.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=163.com header.i=@163.com header.b="WRv5MeGD"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com;
	s=s110527; h=From:To:Subject:Date:Message-ID:MIME-Version; bh=Jn
	IZhnuu7jsGc4BggO4443W9yHX2Lgf/XTKh0QwjRG8=; b=WRv5MeGDjuS06fDqyx
	N/atfiGN7KNPoBRLa32MbJ8OUaIXwmaMk271m1N6Zu168TM/G8Ete9qJDUxdW0r8
	RLNJCG5eabQ/3F408UwmlXRsMN3DYSbz6GKSoUuEoD4QYwBKiqjgMqOoc4XHxCvd
	Rm8rNshAehcjlqgd+21vVyrpc=
Received: from kylin-ERAZER-H610M..
	(unknown [])
	by gzga-smtp-mtada-g1-1 (Coremail) with SMTP id _____wAXowFflW9oGKJ1Dw--.26998S4;
	Thu, 10 Jul 2025 18:26:41 +0800 (CST)
From: Yun Lu
To: willemdebruijn.kernel@gmail.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, horms@kernel.org
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 2/3] af_packet: fix soft lockup issue caused by tpacket_snd()
Date: Thu, 10 Jul 2025 18:26:38 +0800
Message-ID: <20250710102639.280932-3-luyun_611@163.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250710102639.280932-1-luyun_611@163.com>
References: <20250710102639.280932-1-luyun_611@163.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-CM-TRANSID: _____wAXowFflW9oGKJ1Dw--.26998S4
X-Coremail-Antispam: 1Uf129KBjvJXoWxZF1xXFy7Zr4rKw17Aw1rtFb_yoW5urWxpa
	yYg347t3WDGr1Iqr18Ga1kJr12vw4rJFsrGrWkJ34SywnIyF9ayrWIkrWj9FyUZFWDta4a
	vF4qvr1UAa4DAaDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2
	9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x07jV1vsUUUUU=
X-CM-SenderInfo: pox130jbwriqqrwthudrp/1tbiQxqGzmhvkAiYvQAAsr
Content-Type: text/plain; charset="utf-8"

From: Yun Lu

When MSG_DONTWAIT is not set, tpacket_snd() waits for pending_refcnt to
drop to zero before returning. pending_refcnt is decremented by 1 when
the skb->destructor function is called, indicating that the skb has
been sent and is being destroyed.

If an error occurs during this process, tpacket_snd() exits and returns
the error, but pending_refcnt may not yet have dropped to zero. Suppose
the next send operation runs immediately, there are no frames available
to send in tx_ring (i.e., packet_current_frame() returns NULL), and skb
is also NULL. The function then does not call
wait_for_completion_interruptible_timeout() to yield the CPU. Instead,
it spins in the do-while loop, waiting for pending_refcnt to become
zero. Even if the previous skb has completed transmission, the
skb->destructor function can only be invoked in the ksoftirqd thread
(assuming NAPI threading is enabled). When the ksoftirqd thread and the
tpacket_snd() operation happen to run on the same CPU, and that CPU is
trapped in the do-while loop without yielding, the ksoftirqd thread
never gets scheduled. As a result, pending_refcnt is never reduced to
zero and the do-while loop cannot exit, eventually leading to a CPU
soft lockup.

In fact, skb is non-NULL for all but the first iteration of that loop,
and as long as pending_refcnt is not zero, even if it was incremented
by a previous call, wait_for_completion_interruptible_timeout() should
be executed to yield the CPU, allowing the ksoftirqd thread to be
scheduled. Therefore, the condition for calling this function should
check whether pending_refcnt is non-zero, instead of checking skb.

As a result, packet_read_pending() may be called twice in the loop.
This will be optimized in the following patch.

Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
Cc: stable@kernel.org
Suggested-by: LongJun Tang
Signed-off-by: Yun Lu

---
Changes in v4:
- Split to the fix alone. Thanks: Willem de Bruijn.
- Link to v3: https://lore.kernel.org/all/20250709095653.62469-3-luyun_611@163.com/

Changes in v3:
- Simplify the code and reuse ph to continue. Thanks: Eric Dumazet.
- Link to v2: https://lore.kernel.org/all/20250708020642.27838-1-luyun_611@163.com/

Changes in v2:
- Add a Fixes tag.
- Link to v1: https://lore.kernel.org/all/20250707081629.10344-1-luyun_611@163.com/
---
---
 net/packet/af_packet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 7089b8c2a655..581a96ec8e1a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2846,7 +2846,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		ph = packet_current_frame(po, &po->tx_ring,
 					  TP_STATUS_SEND_REQUEST);
 		if (unlikely(ph == NULL)) {
-			if (need_wait && skb) {
+			if (need_wait && packet_read_pending(&po->tx_ring)) {
 				timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
 				if (timeo <= 0) {
 					err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
-- 
2.43.0

From nobody Tue Oct 7 10:35:44 2025
Received: from m16.mail.163.com (m16.mail.163.com [117.135.210.3])
	by smtp.subspace.kernel.org (Postfix) with ESMTP id 30B7D2E5B13;
	Thu, 10 Jul 2025 10:27:14 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=117.135.210.3
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1752143237; cv=none;
	b=Ypnpea+jdRRf2MEv5HmuE4XV+ZR/2IUuvU9WWgriSklrQ64Y5ya6i5PgM4eyCfinpk6u6UKuXbZvFjiQkp4VC/LhDSce93lcbPB9JiWJyqAh2Z98sJmKEYoxU3RTzaStK0F917kZhqdKq2JsiQ88Y62O0AfEWp+nR52HzeIgrV8=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1752143237; c=relaxed/simple;
	bh=7/9imShxUWV2j7e1Dds7bGVkk++VkobjkiHuGilplNs=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
	b=hqHgg2yoW0HhoE7BIaJVxaj3J2XzT2vTsFnGhbSoXG1hhq9x7Q4u0Xt++ci9m2i94oX5fW7FwmfLzDeWW+shkAFtsYhpTRDOJ5rU0rSSMCs4Kkaf9KtbXAWZQsGq/V7P1g5jNOMRqE3hiChfZXwiwtXXn0NarKFNFD/EeXIRVM8=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dmarc=pass (p=none dis=none) header.from=163.com;
	spf=pass smtp.mailfrom=163.com;
	dkim=pass (1024-bit key) header.d=163.com header.i=@163.com header.b=HCK2S4QP;
	arc=none smtp.client-ip=117.135.210.3
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=163.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=163.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=163.com header.i=@163.com header.b="HCK2S4QP"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com;
	s=s110527; h=From:To:Subject:Date:Message-ID:MIME-Version; bh=gC
	mZPOJXf4vSfwRl47UYWa98GUAKgdHSUDNq8kWpIDA=; b=HCK2S4QP15bTo3bstN
	4B1VFKhoFWYI1BDJuE2j9qTAzPiwaHgvZteOPdlRIHXl/Rd7SXtcCC4NfFc4rf83
	aF3+vmDl93y+DohXna6Qcg1WGDZElLUdpCNJYCPSdT/t4WYZt0TkDYoReWsLbB/U
	pa/dTt7CcL8MgkyzafpZaNLJw=
Received: from kylin-ERAZER-H610M.. (unknown [])
	by gzga-smtp-mtada-g1-1 (Coremail) with SMTP id _____wAXowFflW9oGKJ1Dw--.26998S5;
	Thu, 10 Jul 2025 18:26:42 +0800 (CST)
From: Yun Lu
To: willemdebruijn.kernel@gmail.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, horms@kernel.org
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 3/3] af_packet: optimize the packet_read_pending function called on tpacket_snd()
Date: Thu, 10 Jul 2025 18:26:39 +0800
Message-ID: <20250710102639.280932-4-luyun_611@163.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250710102639.280932-1-luyun_611@163.com>
References: <20250710102639.280932-1-luyun_611@163.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-CM-TRANSID: _____wAXowFflW9oGKJ1Dw--.26998S5
X-Coremail-Antispam: 1Uf129KBjvJXoW7Aw1DWF43CFW5ArykuFWxWFg_yoW8ur13pa
	yF9r92qwn8Xr17tw4xAF1kJF1Yvw48JFZ5J395X3WaywnxJ3sYvryIyrWj9Fy8uFWxX3W2
	qF90yr15Cw1UtFDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2
	9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x07jPnYwUUUUU=
X-CM-SenderInfo: pox130jbwriqqrwthudrp/1tbiQxqGzmhvkAiYvQABsq
Content-Type: text/plain; charset="utf-8"

From: Yun Lu

Now packet_read_pending() may be called twice in the do-while loop, and
this function is very expensive on hosts with a large number of CPUs,
because it sums a per-CPU variable. In fact, the second call at the end
of the loop can be removed: after a successful wait, ph is set to a
non-NULL value so that the loop continues to the next iteration, where
ph is reassigned at the start.

Signed-off-by: Yun Lu
---
 net/packet/af_packet.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 581a96ec8e1a..ea7219e0c23a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2846,12 +2846,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		ph = packet_current_frame(po, &po->tx_ring,
 					  TP_STATUS_SEND_REQUEST);
 		if (unlikely(ph == NULL)) {
+			/* Note: packet_read_pending() might be slow if we
+			 * have to call it as it's per_cpu variable, but in
+			 * fast-path we don't have to call it, only when ph
+			 * is NULL, we need to check pending_refcnt.
+			 */
 			if (need_wait && packet_read_pending(&po->tx_ring)) {
 				timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
 				if (timeo <= 0) {
 					err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
 					goto out_put;
 				}
+				/* Just reuse ph to continue for the next iteration, and
+				 * ph will be reassigned at the start of the next iteration.
+				 */
+				ph = (void *)1;
 			}
 			/* check for additional frames */
 			continue;
@@ -2943,14 +2952,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		}
 		packet_increment_head(&po->tx_ring);
 		len_sum += tp_len;
-	} while (likely((ph != NULL) ||
-		/* Note: packet_read_pending() might be slow if we have
-		 * to call it as it's per_cpu variable, but in fast-path
-		 * we already short-circuit the loop with the first
-		 * condition, and luckily don't have to go that path
-		 * anyway.
-		 */
-		 (need_wait && packet_read_pending(&po->tx_ring))));
+	} while (likely(ph != NULL));
 
 	err = len_sum;
 	goto out_put;
-- 
2.43.0