include/net/sch_generic.h | 2 +- net/core/dev.c | 2 +- net/sched/sch_generic.c | 12 +++++++++--- 3 files changed, 11 insertions(+), 5 deletions(-)
When changing the tx queue length via dev_qdisc_change_tx_queue_len(),
if one of the updates fails, the function currently exits
without rolling back previously modified queues. This can leave the
device and its qdiscs in an inconsistent state. This patch adds rollback logic
that restores the original dev->tx_queue_len and re-applies it to each previously
updated queue's qdisc by invoking qdisc_change_tx_queue_len() again.
To support this, dev_qdisc_change_tx_queue_len() now takes an additional
parameter old_len to remember the original tx_queue_len value.
Note: I have built the kernel with these changes to ensure it compiles, but I
have not tested the runtime behavior, as I am currently unsure how to test this
change.
Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
---
include/net/sch_generic.h | 2 +-
net/core/dev.c | 2 +-
net/sched/sch_generic.c | 12 +++++++++---
3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 638948be4c50..a4f59df2982f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -681,7 +681,7 @@ void qdisc_class_hash_remove(struct Qdisc_class_hash *,
void qdisc_class_hash_grow(struct Qdisc *, struct Qdisc_class_hash *);
void qdisc_class_hash_destroy(struct Qdisc_class_hash *);
-int dev_qdisc_change_tx_queue_len(struct net_device *dev);
+int dev_qdisc_change_tx_queue_len(struct net_device *dev, unsigned int old_len);
void dev_qdisc_change_real_num_tx(struct net_device *dev,
unsigned int new_real_tx);
void dev_init_scheduler(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index be97c440ecd5..afa3c5a9bba1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9630,7 +9630,7 @@ int netif_change_tx_queue_len(struct net_device *dev, unsigned long new_len)
res = notifier_to_errno(res);
if (res)
goto err_rollback;
- res = dev_qdisc_change_tx_queue_len(dev);
+ res = dev_qdisc_change_tx_queue_len(dev, orig_len);
if (res)
goto err_rollback;
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 16afb834fe4a..701dfbe722ed 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1445,7 +1445,7 @@ void mq_change_real_num_tx(struct Qdisc *sch, unsigned int new_real_tx)
}
EXPORT_SYMBOL(mq_change_real_num_tx);
-int dev_qdisc_change_tx_queue_len(struct net_device *dev)
+int dev_qdisc_change_tx_queue_len(struct net_device *dev, unsigned int old_len)
{
bool up = dev->flags & IFF_UP;
unsigned int i;
@@ -1456,12 +1456,18 @@ int dev_qdisc_change_tx_queue_len(struct net_device *dev)
for (i = 0; i < dev->num_tx_queues; i++) {
ret = qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
-
- /* TODO: revert changes on a partial failure */
if (ret)
break;
}
+ if (ret) {
+ dev->tx_queue_len = old_len;
+ while (i >= 0) {
+ qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
+ i--;
+ }
+ }
+
if (up)
dev_activate(dev);
return ret;
--
2.50.1
Hi Suchit, kernel test robot noticed the following build warnings: [auto build test WARNING on net-next/main] [also build test WARNING on net/main linus/master horms-ipvs/master v6.16-rc7 next-20250723] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Suchit-Karunakaran/net-Revert-tx-queue-length-on-partial-failure-in-dev_qdisc_change_tx_queue_len/20250722-151746 base: net-next/main patch link: https://lore.kernel.org/r/20250722071508.12497-1-suchitkarunakaran%40gmail.com patch subject: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len() config: parisc-randconfig-r072-20250723 (https://download.01.org/0day-ci/archive/20250723/202507232358.OuGr01a5-lkp@intel.com/config) compiler: hppa-linux-gcc (GCC) 9.5.0 If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202507232358.OuGr01a5-lkp@intel.com/ smatch warnings: net/sched/sch_generic.c:1465 dev_qdisc_change_tx_queue_len() warn: always true condition '(i >= 0) => (0-u32max >= 0)' net/sched/sch_generic.c:1465 dev_qdisc_change_tx_queue_len() warn: always true condition '(i >= 0) => (0-u32max >= 0)' vim +1465 net/sched/sch_generic.c 1447 1448 int dev_qdisc_change_tx_queue_len(struct net_device *dev, unsigned int old_len) 1449 { 1450 bool up = dev->flags & IFF_UP; 1451 unsigned int i; 1452 int ret = 0; 1453 1454 if (up) 1455 dev_deactivate(dev); 1456 1457 for (i = 0; i < dev->num_tx_queues; i++) { 1458 ret = qdisc_change_tx_queue_len(dev, &dev->_tx[i]); 1459 if (ret) 1460 break; 1461 } 1462 1463 if (ret) { 1464 dev->tx_queue_len = old_len; > 1465 while (i >= 0) { 1466 qdisc_change_tx_queue_len(dev, &dev->_tx[i]); 1467 i--; 1468 } 1469 } 1470 1471 if (up) 1472 dev_activate(dev); 1473 return ret; 1474 } 1475 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
On Tue, Jul 22, 2025 at 12:15 AM Suchit Karunakaran <suchitkarunakaran@gmail.com> wrote: > > When changing the tx queue length via dev_qdisc_change_tx_queue_len(), > if one of the updates fails, the function currently exits > without rolling back previously modified queues. This can leave the > device and its qdiscs in an inconsistent state. This patch adds rollback logic > that restores the original dev->tx_queue_len and re-applies it to each previously > updated queue's qdisc by invoking qdisc_change_tx_queue_len() again. > To support this, dev_qdisc_change_tx_queue_len() now takes an additional > parameter old_len to remember the original tx_queue_len value. > > Note: I have built the kernel with these changes to ensure it compiles, but I > have not tested the runtime behavior, as I am currently unsure how to test this > change. > > Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com> > --- > include/net/sch_generic.h | 2 +- > net/core/dev.c | 2 +- > net/sched/sch_generic.c | 12 +++++++++--- > 3 files changed, 11 insertions(+), 5 deletions(-) > > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h > index 638948be4c50..a4f59df2982f 100644 > --- a/include/net/sch_generic.h > +++ b/include/net/sch_generic.h > @@ -681,7 +681,7 @@ void qdisc_class_hash_remove(struct Qdisc_class_hash *, > void qdisc_class_hash_grow(struct Qdisc *, struct Qdisc_class_hash *); > void qdisc_class_hash_destroy(struct Qdisc_class_hash *); > > -int dev_qdisc_change_tx_queue_len(struct net_device *dev); > +int dev_qdisc_change_tx_queue_len(struct net_device *dev, unsigned int old_len); > void dev_qdisc_change_real_num_tx(struct net_device *dev, > unsigned int new_real_tx); > void dev_init_scheduler(struct net_device *dev); > diff --git a/net/core/dev.c b/net/core/dev.c > index be97c440ecd5..afa3c5a9bba1 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -9630,7 +9630,7 @@ int netif_change_tx_queue_len(struct net_device *dev, unsigned long new_len) > res = notifier_to_errno(res); > if (res) > goto err_rollback; > - res = dev_qdisc_change_tx_queue_len(dev); > + res = dev_qdisc_change_tx_queue_len(dev, orig_len); > if (res) > goto err_rollback; > } > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index 16afb834fe4a..701dfbe722ed 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -1445,7 +1445,7 @@ void mq_change_real_num_tx(struct Qdisc *sch, unsigned int new_real_tx) > } > EXPORT_SYMBOL(mq_change_real_num_tx); > > -int dev_qdisc_change_tx_queue_len(struct net_device *dev) > +int dev_qdisc_change_tx_queue_len(struct net_device *dev, unsigned int old_len) > { > bool up = dev->flags & IFF_UP; > unsigned int i; > @@ -1456,12 +1456,18 @@ int dev_qdisc_change_tx_queue_len(struct net_device *dev) > > for (i = 0; i < dev->num_tx_queues; i++) { > ret = qdisc_change_tx_queue_len(dev, &dev->_tx[i]); > - > - /* TODO: revert changes on a partial failure */ > if (ret) > break; > } > > + if (ret) { > + dev->tx_queue_len = old_len; WRITE_ONCE() is missing. > + while (i >= 0) { > + qdisc_change_tx_queue_len(dev, &dev->_tx[i]); What happens if one of these calls fails ? I think a fix will be more complicated...
> > WRITE_ONCE() is missing. > > > + while (i >= 0) { > > + qdisc_change_tx_queue_len(dev, &dev->_tx[i]); > > What happens if one of these calls fails ? > > I think a fix will be more complicated... Hi Eric, Given that pfifo_fast_change_tx_queue_len is currently the only implementation of change_tx_queue_len, would it be reasonable to handle partial failures solely within pfifo_fast_change_tx_queue_len (which in turn leads to skb_array_resize_multiple_bh)? In other words, is it sufficient to modify only the underlying low level implementation of pfifo_fast_change_tx_queue_len for partial failures, given that it's the sole implementation of change_tx_queue_len?
Hi Suchit, On Wed, Jul 23, 2025 at 11:47:09PM +0530, Suchit K wrote: > > > > WRITE_ONCE() is missing. > > > > > + while (i >= 0) { > > > + qdisc_change_tx_queue_len(dev, &dev->_tx[i]); > > > > What happens if one of these calls fails ? > > > > I think a fix will be more complicated... > > Hi Eric, > Given that pfifo_fast_change_tx_queue_len is currently the only > implementation of change_tx_queue_len, would it be reasonable to > handle partial failures solely within pfifo_fast_change_tx_queue_len > (which in turn leads to skb_array_resize_multiple_bh)? In other words, > is it sufficient to modify only the underlying low level > implementation of pfifo_fast_change_tx_queue_len for partial failures, > given that it's the sole implementation of change_tx_queue_len? Thanks for your patch. As you noticed it is tricky to handle the failure elegantly here, which was also the reason why I didn't do it. Did you observe any real issue? To answer your question above: I am not sure if we can do it in pfifo fast implementation since struct netdev_queue is not explicitly exposed to the lower Qdisc. On the other hand, although dev_qdisc_change_tx_queue_len() is generic, it is only called for this very specific code path, so changing it won't impact other code paths, IMHO. Regards, Cong Wang
On Fri, 25 Jul 2025 at 23:17, Cong Wang <xiyou.wangcong@gmail.com> wrote: > > Hi Suchit, > > On Wed, Jul 23, 2025 at 11:47:09PM +0530, Suchit K wrote: > > > > > > WRITE_ONCE() is missing. > > > > > > > + while (i >= 0) { > > > > + qdisc_change_tx_queue_len(dev, &dev->_tx[i]); > > > > > > What happens if one of these calls fails ? > > > > > > I think a fix will be more complicated... > > > > Hi Eric, > > Given that pfifo_fast_change_tx_queue_len is currently the only > > implementation of change_tx_queue_len, would it be reasonable to > > handle partial failures solely within pfifo_fast_change_tx_queue_len > > (which in turn leads to skb_array_resize_multiple_bh)? In other words, > > is it sufficient to modify only the underlying low level > > implementation of pfifo_fast_change_tx_queue_len for partial failures, > > given that it's the sole implementation of change_tx_queue_len? > > Thanks for your patch. > > As you noticed it is tricky to handle the failure elegantly here, which > was also the reason why I didn't do it. Did you observe any real issue? > > To answer your question above: I am not sure if we can do it in pfifo > fast implementation since struct netdev_queue is not explicitly exposed to > the lower Qdisc. > > On the other hand, although dev_qdisc_change_tx_queue_len() is generic, > it is only called for this very specific code path, so changing it won't > impact other code paths, IMHO. > > Regards, > Cong Wang Hi, Thanks for the feedback. I'll try to dig more into it and will post a patch if I find a solution. Thanks once again.
On Wed, Jul 23, 2025 at 11:17 AM Suchit K <suchitkarunakaran@gmail.com> wrote: > > > > > WRITE_ONCE() is missing. > > > > > + while (i >= 0) { > > > + qdisc_change_tx_queue_len(dev, &dev->_tx[i]); > > > > What happens if one of these calls fails ? > > > > I think a fix will be more complicated... > > Hi Eric, > Given that pfifo_fast_change_tx_queue_len is currently the only > implementation of change_tx_queue_len, would it be reasonable to > handle partial failures solely within pfifo_fast_change_tx_queue_len > (which in turn leads to skb_array_resize_multiple_bh)? In other words, > is it sufficient to modify only the underlying low level > implementation of pfifo_fast_change_tx_queue_len for partial failures, > given that it's the sole implementation of change_tx_queue_len? A generic solution will involve two phases... Lots of core changes, for a very narrow case. Most of us do not use pfifo_fast anyway.
> > WRITE_ONCE() is missing. Oops, I'm sorry about that. > > > + while (i >= 0) { > > + qdisc_change_tx_queue_len(dev, &dev->_tx[i]); > > What happens if one of these calls fails ? > > I think a fix will be more complicated... I did consider that, but since I didn’t have a solution, I assumed it wouldn’t fail. I also have a question. In the Qdisc_ops structure, there’s a function pointer for change_tx_queue_len, but I was only able to find a single implementation which is pfifo_fast_change_tx_queue_len. Is that the only one? Apologies if this isn’t the right place to ask such questions. I’d really appreciate any feedback. Thank you!
On Tue, Jul 22, 2025 at 9:22 AM Suchit K <suchitkarunakaran@gmail.com> wrote: > > > > > WRITE_ONCE() is missing. > > Oops, I'm sorry about that. > > > > > > + while (i >= 0) { > > > + qdisc_change_tx_queue_len(dev, &dev->_tx[i]); > > > > What happens if one of these calls fails ? > > > > I think a fix will be more complicated... > > I did consider that, but since I didn’t have a solution, I assumed it > wouldn’t fail. But this definitely could fail. Exactly the same way than the first time. I also have a question. In the Qdisc_ops structure, > there’s a function pointer for change_tx_queue_len, but I was only > able to find a single implementation which is > pfifo_fast_change_tx_queue_len. Is that the only one? Apologies if > this isn’t the right place to ask such questions. I’d really > appreciate any feedback. Thank you! I think only pfifo_fast has to re-allocate its data structures. Other qdiscs eventually dynamically read dev->tx_queue_len (thus the WRITE_ONCE() I mentioned to you)
On Tue, 22 Jul 2025 at 21:58, Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Jul 22, 2025 at 9:22 AM Suchit K <suchitkarunakaran@gmail.com> wrote: > > > > > > > > WRITE_ONCE() is missing. > > > > Oops, I'm sorry about that. > > > > > > > > > + while (i >= 0) { > > > > + qdisc_change_tx_queue_len(dev, &dev->_tx[i]); > > > > > > What happens if one of these calls fails ? > > > > > > I think a fix will be more complicated... > > > > I did consider that, but since I didn’t have a solution, I assumed it > > wouldn’t fail. > > But this definitely could fail. Exactly the same way than the first time. > Yeah, it makes sense. > I also have a question. In the Qdisc_ops structure, > > there’s a function pointer for change_tx_queue_len, but I was only > > able to find a single implementation which is > > pfifo_fast_change_tx_queue_len. Is that the only one? Apologies if > > this isn’t the right place to ask such questions. I’d really > > appreciate any feedback. Thank you! > > I think only pfifo_fast has to re-allocate its data structures. > > Other qdiscs eventually dynamically read dev->tx_queue_len (thus the > WRITE_ONCE() I mentioned to you) Yup got it. Thank you so much.
On Tue, 22 Jul 2025 12:45:08 +0530 Suchit Karunakaran wrote: > + while (i >= 0) { > + qdisc_change_tx_queue_len(dev, &dev->_tx[i]); > + i--; i is unsigned, this loop will never end -- pw-bot: cr
On Tue, 22 Jul 2025 at 19:07, Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 22 Jul 2025 12:45:08 +0530 Suchit Karunakaran wrote: > > + while (i >= 0) { > > + qdisc_change_tx_queue_len(dev, &dev->_tx[i]); > > + i--; > > i is unsigned, this loop will never end > -- > pw-bot: cr Hi Jakub, I'm sorry for the oversight. I'll send a v2 patch shortly to fix it. In the meantime, could you please give me some insights on testing this change? Also, apart from the unsigned int blunder, does the overall approach look reasonable? I’d be grateful for any suggestions or comments. Thank you.
On Tue, 22 Jul 2025 19:26:01 +0530 Suchit K wrote: > I'm sorry for the oversight. I'll send a v2 patch shortly to fix it. Please note that in networking we ask that new versions of patches not be resubmitted within 24h of previous posting: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html > In the meantime, could you please give me some insights on testing > this change? Also, apart from the unsigned int blunder, does the > overall approach look reasonable? I’d be grateful for any suggestions > or comments. Thank you. Hopefully someone reviews before you repost, I didn't look further once I noticed the static analysis warning.
> Please note that in networking we ask that new versions of patches not > be resubmitted within 24h of previous posting: > https://www.kernel.org/doc/html/next/process/maintainer-netdev.html > Yup, thanks for reminding. > > In the meantime, could you please give me some insights on testing > > this change? Also, apart from the unsigned int blunder, does the > > overall approach look reasonable? I’d be grateful for any suggestions > > or comments. Thank you. > > Hopefully someone reviews before you repost, I didn't look further once > I noticed the static analysis warning. Alright, I'll wait until someone reviews it. Thanks!
© 2016 - 2025 Red Hat, Inc.