From: Bui Quang Minh
To: netdev@vger.kernel.org
Cc: "Michael S. Tsirkin", Jason Wang, Xuan Zhuo, Eugenio Pérez,
    Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
    Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
    virtualization@lists.linux.dev, linux-kernel@vger.kernel.org,
    bpf@vger.kernel.org, Bui Quang Minh, stable@vger.kernel.org
Subject: [PATCH net v2] virtio-net: enable all napis before scheduling refill work
Date: Fri, 12 Dec 2025 22:27:41 +0700
Message-ID: <20251212152741.11656-1-minhquangbui99@gmail.com>
X-Mailer: git-send-email 2.43.0
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Calling napi_disable() on an already disabled napi can cause a
deadlock. In commit 4bc12818b363 ("virtio-net: disable delayed refill
when pausing rx"), to avoid the deadlock, when pausing the RX in
virtnet_rx_pause[_all](), we disable and cancel the delayed refill
work. However, in virtnet_rx_resume_all(), we enable the delayed
refill work too early, before enabling all the receive queue napis.

The deadlock can be reproduced by running
selftests/drivers/net/hw/xsk_reconfig.py with a multiqueue virtio-net
device and inserting a cond_resched() inside the for loop in
virtnet_rx_resume_all() to increase the success rate.
Because the worker processing the delayed refill work runs on the
same CPU as virtnet_rx_resume_all(), a reschedule is needed to cause
the deadlock. In a real scenario, contention on netdev_lock can cause
the reschedule.

This fixes the deadlock by ensuring all receive queues' napis are
enabled before we enable the delayed refill work in
virtnet_rx_resume_all() and virtnet_open().

Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx")
Reported-by: Paolo Abeni
Closes: https://netdev-ctrl.bots.linux.dev/logs/vmksft/drv-hw-dbg/results/400961/3-xdp-py/stderr
Cc: stable@vger.kernel.org
Signed-off-by: Bui Quang Minh
---
Changes in v2:
- Move try_fill_recv() before rx napi_enable()
- Link to v1: https://lore.kernel.org/netdev/20251208153419.18196-1-minhquangbui99@gmail.com/
---
 drivers/net/virtio_net.c | 71 +++++++++++++++++++++++++---------------
 1 file changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8e04adb57f52..4e08880a9467 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3214,21 +3214,31 @@ static void virtnet_update_settings(struct virtnet_info *vi)
 static int virtnet_open(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
+	bool schedule_refill = false;
 	int i, err;
 
-	enable_delayed_refill(vi);
-
+	/* - We must call try_fill_recv before enabling napi of the same receive
+	 * queue so that it doesn't race with the call in virtnet_receive.
+	 * - We must enable and schedule delayed refill work only when we have
+	 * enabled all the receive queue's napi. Otherwise, in refill_work, we
+	 * have a deadlock when calling napi_disable on an already disabled
+	 * napi.
+	 */
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		if (i < vi->curr_queue_pairs)
 			/* Make sure we have some buffers: if oom use wq. */
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-				schedule_delayed_work(&vi->refill, 0);
+				schedule_refill = true;
 
 		err = virtnet_enable_queue_pair(vi, i);
 		if (err < 0)
 			goto err_enable_qp;
 	}
 
+	enable_delayed_refill(vi);
+	if (schedule_refill)
+		schedule_delayed_work(&vi->refill, 0);
+
 	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
 		if (vi->status & VIRTIO_NET_S_LINK_UP)
 			netif_carrier_on(vi->dev);
@@ -3463,39 +3473,48 @@ static void virtnet_rx_pause(struct virtnet_info *vi, struct receive_queue *rq)
 	__virtnet_rx_pause(vi, rq);
 }
 
-static void __virtnet_rx_resume(struct virtnet_info *vi,
-				struct receive_queue *rq,
-				bool refill)
+static void virtnet_rx_resume_all(struct virtnet_info *vi)
 {
-	bool running = netif_running(vi->dev);
 	bool schedule_refill = false;
+	int i;
 
-	if (refill && !try_fill_recv(vi, rq, GFP_KERNEL))
-		schedule_refill = true;
-	if (running)
-		virtnet_napi_enable(rq);
-
-	if (schedule_refill)
-		schedule_delayed_work(&vi->refill, 0);
-}
+	if (netif_running(vi->dev)) {
+		/* See the comment in virtnet_open for the ordering rule
+		 * of try_fill_recv, receive queue napi_enable and delayed
+		 * refill enable/schedule.
+		 */
+		for (i = 0; i < vi->max_queue_pairs; i++) {
+			if (i < vi->curr_queue_pairs)
+				if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
+					schedule_refill = true;
 
-static void virtnet_rx_resume_all(struct virtnet_info *vi)
-{
-	int i;
+			virtnet_napi_enable(&vi->rq[i]);
+		}
 
-	enable_delayed_refill(vi);
-	for (i = 0; i < vi->max_queue_pairs; i++) {
-		if (i < vi->curr_queue_pairs)
-			__virtnet_rx_resume(vi, &vi->rq[i], true);
-		else
-			__virtnet_rx_resume(vi, &vi->rq[i], false);
+		enable_delayed_refill(vi);
+		if (schedule_refill)
+			schedule_delayed_work(&vi->refill, 0);
 	}
 }
 
 static void virtnet_rx_resume(struct virtnet_info *vi, struct receive_queue *rq)
 {
-	enable_delayed_refill(vi);
-	__virtnet_rx_resume(vi, rq, true);
+	bool schedule_refill = false;
+
+	if (netif_running(vi->dev)) {
+		/* See the comment in virtnet_open for the ordering rule
+		 * of try_fill_recv, receive queue napi_enable and delayed
+		 * refill enable/schedule.
+		 */
+		if (!try_fill_recv(vi, rq, GFP_KERNEL))
+			schedule_refill = true;
+
+		virtnet_napi_enable(rq);
+
+		enable_delayed_refill(vi);
+		if (schedule_refill)
+			schedule_delayed_work(&vi->refill, 0);
+	}
 }
 
 static int virtnet_rx_resize(struct virtnet_info *vi,
-- 
2.43.0
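
For readers who want to reason about the ordering rule outside the kernel, the following is a minimal userspace sketch, not driver code. It assumes a toy state struct (model_dev) and hypothetical helper names (model_refill_work_would_deadlock, model_resume_all) that do not exist in virtio_net.c. It also assumes, as a simplification, that the refill worker may run at every loop iteration once the refill work is enabled, which stands in for the cond_resched() used in the reproducer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_QUEUES 8

/* Toy stand-in for driver state; NOT the real struct virtnet_info. */
struct model_dev {
	size_t nqueues;
	bool napi_enabled[MODEL_MAX_QUEUES];
	bool refill_enabled;
};

/* Models the hazard from the commit message: the refill worker calls
 * napi_disable() on every queue, and doing so on an already disabled
 * napi would block forever. Here the condition is only reported.
 */
static bool model_refill_work_would_deadlock(const struct model_dev *d)
{
	size_t i;

	if (!d->refill_enabled)
		return false;	/* the work cannot run yet */

	for (i = 0; i < d->nqueues; i++)
		if (!d->napi_enabled[i])
			return true;	/* napi_disable() on a disabled napi */
	return false;
}

/* Resume all queues. Returns true if the worker could have met a
 * disabled napi at any modeled preemption point.
 * enable_refill_first == true selects the pre-patch ordering.
 */
static bool model_resume_all(struct model_dev *d, bool enable_refill_first)
{
	bool hazard = false;
	size_t i;

	assert(d->nqueues <= MODEL_MAX_QUEUES);

	if (enable_refill_first)
		d->refill_enabled = true;

	for (i = 0; i < d->nqueues; i++) {
		/* Modeled preemption point: the worker may run here. */
		hazard |= model_refill_work_would_deadlock(d);
		d->napi_enabled[i] = true;
	}

	if (!enable_refill_first)
		d->refill_enabled = true;

	/* The worker may also run once the resume has finished. */
	hazard |= model_refill_work_would_deadlock(d);
	return hazard;
}
```

In this model, calling model_resume_all() with enable_refill_first set to true reports the hazard on a multiqueue device, while the patched ordering (false) does not, because the refill work only becomes runnable after every napi has been enabled.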