From: Yun Lu <luyun@kylinos.cn>

When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
pending_refcnt to decrement to zero before returning. The pending_refcnt
is decremented by 1 when the skb->destructor function is called,
indicating that the skb has been successfully sent and needs to be
destroyed.

If an error occurs during this process, the tpacket_snd() function will
exit and return an error, but pending_refcnt may not yet have decremented
to zero. Assuming the next send operation is executed immediately, but
there are no available frames to be sent in tx_ring (i.e.,
packet_current_frame returns NULL), and skb is also NULL, the function
will not execute wait_for_completion_interruptible_timeout() to yield the
CPU. Instead, it will enter a do-while loop, waiting for pending_refcnt to
be zero. Even if the previous skb has completed transmission, the
skb->destructor function can only be invoked in the ksoftirqd thread
(assuming NAPI threading is enabled). When both the ksoftirqd thread and
the tpacket_snd operation happen to run on the same CPU, and the CPU is
trapped in the do-while loop without yielding, the ksoftirqd thread will
not get scheduled to run. As a result, pending_refcnt will never be
reduced to zero, and the do-while loop cannot exit, eventually leading to
a CPU soft lockup.

In fact, as long as pending_refcnt is not zero, even if skb is NULL,
wait_for_completion_interruptible_timeout() should be executed to yield
the CPU, allowing the ksoftirqd thread to be scheduled. Therefore, the
execution condition of this function should be modified to check whether
pending_refcnt is not zero.

Signed-off-by: Yun Lu <luyun@kylinos.cn>
---
net/packet/af_packet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 3d43f3eae759..7df96311adb8 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2845,7 +2845,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		ph = packet_current_frame(po, &po->tx_ring,
 					  TP_STATUS_SEND_REQUEST);
 		if (unlikely(ph == NULL)) {
-			if (need_wait && skb) {
+			if (need_wait && packet_read_pending(&po->tx_ring)) {
 				timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
 				timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
 				if (timeo <= 0) {
--
2.43.0
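
For readers following the one-line change, the two conditions can be compared in a small userspace model. This is only a sketch: struct tx_state and both helper names are hypothetical and do not exist in the kernel; the pending field merely stands in for the value of packet_read_pending(&po->tx_ring).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for what tpacket_snd() inspects when
 * packet_current_frame() returns NULL. Not a kernel structure. */
struct tx_state {
	bool need_wait;        /* true when MSG_DONTWAIT is not set */
	const void *skb;       /* skb built by this call, NULL if none yet */
	unsigned int pending;  /* models packet_read_pending(&po->tx_ring) */
};

/* Condition before the patch: sleep only if this call built an skb. */
static bool sleeps_before_patch(const struct tx_state *s)
{
	return s->need_wait && s->skb != NULL;
}

/* Condition after the patch: sleep whenever frames are still pending. */
static bool sleeps_after_patch(const struct tx_state *s)
{
	return s->need_wait && s->pending != 0;
}
```

In the scenario from the commit message (need_wait set, skb NULL, pending nonzero left over from an earlier failed call), the first helper returns false, so the caller would spin, while the second returns true, so the caller would sleep on the completion and let ksoftirqd run.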
On Mon, Jul 7, 2025 at 1:16 AM Yun Lu <luyun_611@163.com> wrote:
>
> From: Yun Lu <luyun@kylinos.cn>
>
> When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
> pending_refcnt to decrement to zero before returning. The pending_refcnt
> is decremented by 1 when the skb->destructor function is called,
> indicating that the skb has been successfully sent and needs to be
> destroyed.
>
> If an error occurs during this process, the tpacket_snd() function will
> exit and return error, but pending_refcnt may not yet have decremented to
> zero. Assuming the next send operation is executed immediately, but there
> are no available frames to be sent in tx_ring (i.e., packet_current_frame
> returns NULL), and skb is also NULL, the function will not execute
> wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
> will enter a do-while loop, waiting for pending_refcnt to be zero. Even
> if the previous skb has completed transmission, the skb->destructor
> function can only be invoked in the ksoftirqd thread (assuming NAPI
> threading is enabled). When both the ksoftirqd thread and the tpacket_snd
> operation happen to run on the same CPU, and the CPU trapped in the
> do-while loop without yielding, the ksoftirqd thread will not get
> scheduled to run. As a result, pending_refcnt will never be reduced to
> zero, and the do-while loop cannot exit, eventually leading to a CPU soft
> lockup issue.
>
> In fact, as long as pending_refcnt is not zero, even if skb is NULL,
> wait_for_completion_interruptible_timeout() should be executed to yield
> the CPU, allowing the ksoftirqd thread to be scheduled. Therefore, the
> execution condition of this function should be modified to check if
> pending_refcnt is not zero.
>
> Signed-off-by: Yun Lu <luyun@kylinos.cn>

I think you forgot a Fixes: tag.

Also it seems the soft lockup could happen if MSG_DONTWAIT is set?
On 2025/7/7 16:56, Eric Dumazet wrote:
> On Mon, Jul 7, 2025 at 1:16 AM Yun Lu <luyun_611@163.com> wrote:
>> From: Yun Lu <luyun@kylinos.cn>
>>
>> When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
>> pending_refcnt to decrement to zero before returning. The pending_refcnt
>> is decremented by 1 when the skb->destructor function is called,
>> indicating that the skb has been successfully sent and needs to be
>> destroyed.
>>
>> If an error occurs during this process, the tpacket_snd() function will
>> exit and return error, but pending_refcnt may not yet have decremented to
>> zero. Assuming the next send operation is executed immediately, but there
>> are no available frames to be sent in tx_ring (i.e., packet_current_frame
>> returns NULL), and skb is also NULL, the function will not execute
>> wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
>> will enter a do-while loop, waiting for pending_refcnt to be zero. Even
>> if the previous skb has completed transmission, the skb->destructor
>> function can only be invoked in the ksoftirqd thread (assuming NAPI
>> threading is enabled). When both the ksoftirqd thread and the tpacket_snd
>> operation happen to run on the same CPU, and the CPU trapped in the
>> do-while loop without yielding, the ksoftirqd thread will not get
>> scheduled to run. As a result, pending_refcnt will never be reduced to
>> zero, and the do-while loop cannot exit, eventually leading to a CPU soft
>> lockup issue.
>>
>> In fact, as long as pending_refcnt is not zero, even if skb is NULL,
>> wait_for_completion_interruptible_timeout() should be executed to yield
>> the CPU, allowing the ksoftirqd thread to be scheduled. Therefore, the
>> execution condition of this function should be modified to check if
>> pending_refcnt is not zero.
>>
>> Signed-off-by: Yun Lu <luyun@kylinos.cn>
> I think you forgot a Fixes: tag.

Thank you for your advice, I will add this tag in v2 later.

>
> Also it seems the soft lockup could happen if MSG_DONTWAIT is set ?

If MSG_DONTWAIT is set, need_wait will be false. In this case, once there
are no available frames to send (i.e., ph is NULL), the while loop
condition will not be satisfied, and the loop will exit and return
immediately without waiting for pending_refcnt to decrease to 0. The soft
lockup should therefore not occur in that case.

    while (likely((ph != NULL) ||
            (need_wait && packet_read_pending(&po->tx_ring))));
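
The exit behaviour described above can be checked against a tiny userspace model of the loop condition. This is only an illustrative sketch: loop_continues() is a hypothetical helper, have_frame models ph != NULL, and pending models the value of packet_read_pending(&po->tx_ring).

```c
#include <stdbool.h>

/* Models the do-while condition quoted above:
 *   (ph != NULL) || (need_wait && packet_read_pending(&po->tx_ring))
 * Returns true when the loop would run another iteration. */
static bool loop_continues(bool have_frame, bool need_wait,
			   unsigned int pending)
{
	return have_frame || (need_wait && pending != 0);
}
```

With MSG_DONTWAIT set (need_wait false) and no frame available, the model returns false no matter how many frames are pending, which matches the statement that the loop exits immediately. With need_wait true and pending nonzero it returns true, which is the case the patch makes sleep instead of spin.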