net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

[PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Suchit Karunakaran 2 months, 2 weeks ago

When changing the tx queue length via dev_qdisc_change_tx_queue_len(),
if one of the updates fails, the function currently exits
without rolling back previously modified queues. This can leave the
device and its qdiscs in an inconsistent state. This patch adds rollback logic
that restores the original dev->tx_queue_len and re-applies it to each previously
updated queue's qdisc by invoking qdisc_change_tx_queue_len() again.
To support this, dev_qdisc_change_tx_queue_len() now takes an additional
parameter old_len to remember the original tx_queue_len value.

Note: I have built the kernel with these changes to ensure it compiles, but I
have not tested the runtime behavior, as I am currently unsure how to test this
change.

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
---
 include/net/sch_generic.h |  2 +-
 net/core/dev.c            |  2 +-
 net/sched/sch_generic.c   | 12 +++++++++---
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 638948be4c50..a4f59df2982f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -681,7 +681,7 @@ void qdisc_class_hash_remove(struct Qdisc_class_hash *,
 void qdisc_class_hash_grow(struct Qdisc *, struct Qdisc_class_hash *);
 void qdisc_class_hash_destroy(struct Qdisc_class_hash *);
 
-int dev_qdisc_change_tx_queue_len(struct net_device *dev);
+int dev_qdisc_change_tx_queue_len(struct net_device *dev, unsigned int old_len);
 void dev_qdisc_change_real_num_tx(struct net_device *dev,
 				  unsigned int new_real_tx);
 void dev_init_scheduler(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index be97c440ecd5..afa3c5a9bba1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9630,7 +9630,7 @@ int netif_change_tx_queue_len(struct net_device *dev, unsigned long new_len)
 		res = notifier_to_errno(res);
 		if (res)
 			goto err_rollback;
-		res = dev_qdisc_change_tx_queue_len(dev);
+		res = dev_qdisc_change_tx_queue_len(dev, orig_len);
 		if (res)
 			goto err_rollback;
 	}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 16afb834fe4a..701dfbe722ed 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1445,7 +1445,7 @@ void mq_change_real_num_tx(struct Qdisc *sch, unsigned int new_real_tx)
 }
 EXPORT_SYMBOL(mq_change_real_num_tx);
 
-int dev_qdisc_change_tx_queue_len(struct net_device *dev)
+int dev_qdisc_change_tx_queue_len(struct net_device *dev, unsigned int old_len)
 {
 	bool up = dev->flags & IFF_UP;
 	unsigned int i;
@@ -1456,12 +1456,18 @@ int dev_qdisc_change_tx_queue_len(struct net_device *dev)
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		ret = qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
-
-		/* TODO: revert changes on a partial failure */
 		if (ret)
 			break;
 	}
 
+	if (ret) {
+		dev->tx_queue_len = old_len;
+		while (i >= 0) {
+			qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
+			i--;
+		}
+	}
+
 	if (up)
 		dev_activate(dev);
 	return ret;
-- 
2.50.1

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by kernel test robot 2 months, 2 weeks ago

Hi Suchit,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]
[also build test WARNING on net/main linus/master horms-ipvs/master v6.16-rc7 next-20250723]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Suchit-Karunakaran/net-Revert-tx-queue-length-on-partial-failure-in-dev_qdisc_change_tx_queue_len/20250722-151746
base:   net-next/main
patch link:    https://lore.kernel.org/r/20250722071508.12497-1-suchitkarunakaran%40gmail.com
patch subject: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()
config: parisc-randconfig-r072-20250723 (https://download.01.org/0day-ci/archive/20250723/202507232358.OuGr01a5-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 9.5.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507232358.OuGr01a5-lkp@intel.com/

smatch warnings:
net/sched/sch_generic.c:1465 dev_qdisc_change_tx_queue_len() warn: always true condition '(i >= 0) => (0-u32max >= 0)'
net/sched/sch_generic.c:1465 dev_qdisc_change_tx_queue_len() warn: always true condition '(i >= 0) => (0-u32max >= 0)'

vim +1465 net/sched/sch_generic.c

  1447	
  1448	int dev_qdisc_change_tx_queue_len(struct net_device *dev, unsigned int old_len)
  1449	{
  1450		bool up = dev->flags & IFF_UP;
  1451		unsigned int i;
  1452		int ret = 0;
  1453	
  1454		if (up)
  1455			dev_deactivate(dev);
  1456	
  1457		for (i = 0; i < dev->num_tx_queues; i++) {
  1458			ret = qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
  1459			if (ret)
  1460				break;
  1461		}
  1462	
  1463		if (ret) {
  1464			dev->tx_queue_len = old_len;
> 1465			while (i >= 0) {
  1466				qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
  1467				i--;
  1468			}
  1469		}
  1470	
  1471		if (up)
  1472			dev_activate(dev);
  1473		return ret;
  1474	}
  1475	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Eric Dumazet 2 months, 2 weeks ago

On Tue, Jul 22, 2025 at 12:15 AM Suchit Karunakaran
<suchitkarunakaran@gmail.com> wrote:
>
> When changing the tx queue length via dev_qdisc_change_tx_queue_len(),
> if one of the updates fails, the function currently exits
> without rolling back previously modified queues. This can leave the
> device and its qdiscs in an inconsistent state. This patch adds rollback logic
> that restores the original dev->tx_queue_len and re-applies it to each previously
> updated queue's qdisc by invoking qdisc_change_tx_queue_len() again.
> To support this, dev_qdisc_change_tx_queue_len() now takes an additional
> parameter old_len to remember the original tx_queue_len value.
>
> Note: I have built the kernel with these changes to ensure it compiles, but I
> have not tested the runtime behavior, as I am currently unsure how to test this
> change.
>
> Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
> ---
>  include/net/sch_generic.h |  2 +-
>  net/core/dev.c            |  2 +-
>  net/sched/sch_generic.c   | 12 +++++++++---
>  3 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 638948be4c50..a4f59df2982f 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -681,7 +681,7 @@ void qdisc_class_hash_remove(struct Qdisc_class_hash *,
>  void qdisc_class_hash_grow(struct Qdisc *, struct Qdisc_class_hash *);
>  void qdisc_class_hash_destroy(struct Qdisc_class_hash *);
>
> -int dev_qdisc_change_tx_queue_len(struct net_device *dev);
> +int dev_qdisc_change_tx_queue_len(struct net_device *dev, unsigned int old_len);
>  void dev_qdisc_change_real_num_tx(struct net_device *dev,
>                                   unsigned int new_real_tx);
>  void dev_init_scheduler(struct net_device *dev);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index be97c440ecd5..afa3c5a9bba1 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9630,7 +9630,7 @@ int netif_change_tx_queue_len(struct net_device *dev, unsigned long new_len)
>                 res = notifier_to_errno(res);
>                 if (res)
>                         goto err_rollback;
> -               res = dev_qdisc_change_tx_queue_len(dev);
> +               res = dev_qdisc_change_tx_queue_len(dev, orig_len);
>                 if (res)
>                         goto err_rollback;
>         }
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 16afb834fe4a..701dfbe722ed 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1445,7 +1445,7 @@ void mq_change_real_num_tx(struct Qdisc *sch, unsigned int new_real_tx)
>  }
>  EXPORT_SYMBOL(mq_change_real_num_tx);
>
> -int dev_qdisc_change_tx_queue_len(struct net_device *dev)
> +int dev_qdisc_change_tx_queue_len(struct net_device *dev, unsigned int old_len)
>  {
>         bool up = dev->flags & IFF_UP;
>         unsigned int i;
> @@ -1456,12 +1456,18 @@ int dev_qdisc_change_tx_queue_len(struct net_device *dev)
>
>         for (i = 0; i < dev->num_tx_queues; i++) {
>                 ret = qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
> -
> -               /* TODO: revert changes on a partial failure */
>                 if (ret)
>                         break;
>         }
>
> +       if (ret) {
> +               dev->tx_queue_len = old_len;

WRITE_ONCE() is missing.

> +               while (i >= 0) {
> +                       qdisc_change_tx_queue_len(dev, &dev->_tx[i]);

What happens if one of these calls fails ?

I think a fix will be more complicated...

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Suchit K 2 months, 2 weeks ago

>
> WRITE_ONCE() is missing.
>
> > +               while (i >= 0) {
> > +                       qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
>
> What happens if one of these calls fails ?
>
> I think a fix will be more complicated...

Hi Eric,
Given that pfifo_fast_change_tx_queue_len is currently the only
implementation of change_tx_queue_len, would it be reasonable to
handle partial failures solely within pfifo_fast_change_tx_queue_len
(which in turn leads to skb_array_resize_multiple_bh)? In other words,
is it sufficient to modify only the underlying low level
implementation of pfifo_fast_change_tx_queue_len for partial failures,
given that it's the sole implementation of change_tx_queue_len?

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Cong Wang 2 months, 1 week ago

Hi Suchit,

On Wed, Jul 23, 2025 at 11:47:09PM +0530, Suchit K wrote:
> >
> > WRITE_ONCE() is missing.
> >
> > > +               while (i >= 0) {
> > > +                       qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
> >
> > What happens if one of these calls fails ?
> >
> > I think a fix will be more complicated...
> 
> Hi Eric,
> Given that pfifo_fast_change_tx_queue_len is currently the only
> implementation of change_tx_queue_len, would it be reasonable to
> handle partial failures solely within pfifo_fast_change_tx_queue_len
> (which in turn leads to skb_array_resize_multiple_bh)? In other words,
> is it sufficient to modify only the underlying low level
> implementation of pfifo_fast_change_tx_queue_len for partial failures,
> given that it's the sole implementation of change_tx_queue_len?

Thanks for your patch.

As you noticed it is tricky to handle the failure elegantly here, which
was also the reason why I didn't do it. Did you observe any real issue?

To answer your question above: I am not sure if we can do it in pfifo
fast implementation since struct netdev_queue is not explicitly exposed to
the lower Qdisc.

On the other hand, although dev_qdisc_change_tx_queue_len() is generic,
it is only called for this very specific code path, so changing it won't
impact other code paths, IMHO.

Regards,
Cong Wang

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Suchit K 2 months, 1 week ago

On Fri, 25 Jul 2025 at 23:17, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Hi Suchit,
>
> On Wed, Jul 23, 2025 at 11:47:09PM +0530, Suchit K wrote:
> > >
> > > WRITE_ONCE() is missing.
> > >
> > > > +               while (i >= 0) {
> > > > +                       qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
> > >
> > > What happens if one of these calls fails ?
> > >
> > > I think a fix will be more complicated...
> >
> > Hi Eric,
> > Given that pfifo_fast_change_tx_queue_len is currently the only
> > implementation of change_tx_queue_len, would it be reasonable to
> > handle partial failures solely within pfifo_fast_change_tx_queue_len
> > (which in turn leads to skb_array_resize_multiple_bh)? In other words,
> > is it sufficient to modify only the underlying low level
> > implementation of pfifo_fast_change_tx_queue_len for partial failures,
> > given that it's the sole implementation of change_tx_queue_len?
>
> Thanks for your patch.
>
> As you noticed it is tricky to handle the failure elegantly here, which
> was also the reason why I didn't do it. Did you observe any real issue?
>
> To answer your question above: I am not sure if we can do it in pfifo
> fast implementation since struct netdev_queue is not explicitly exposed to
> the lower Qdisc.
>
> On the other hand, although dev_qdisc_change_tx_queue_len() is generic,
> it is only called for this very specific code path, so changing it won't
> impact other code paths, IMHO.
>
> Regards,
> Cong Wang

Hi, Thanks for the feedback. I'll try to dig more into it and will
post a patch if I find a solution. Thanks once again.

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Eric Dumazet 2 months, 2 weeks ago

On Wed, Jul 23, 2025 at 11:17 AM Suchit K <suchitkarunakaran@gmail.com> wrote:
>
> >
> > WRITE_ONCE() is missing.
> >
> > > +               while (i >= 0) {
> > > +                       qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
> >
> > What happens if one of these calls fails ?
> >
> > I think a fix will be more complicated...
>
> Hi Eric,
> Given that pfifo_fast_change_tx_queue_len is currently the only
> implementation of change_tx_queue_len, would it be reasonable to
> handle partial failures solely within pfifo_fast_change_tx_queue_len
> (which in turn leads to skb_array_resize_multiple_bh)? In other words,
> is it sufficient to modify only the underlying low level
> implementation of pfifo_fast_change_tx_queue_len for partial failures,
> given that it's the sole implementation of change_tx_queue_len?

A generic solution will involve two phases...
Lots of core changes, for a very narrow case.

Most of us do not use pfifo_fast anyway.

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Suchit K 2 months, 2 weeks ago

>
> WRITE_ONCE() is missing.

Oops, I'm sorry about that.

>
> > +               while (i >= 0) {
> > +                       qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
>
> What happens if one of these calls fails ?
>
> I think a fix will be more complicated...

I did consider that, but since I didn’t have a solution, I assumed it
wouldn’t fail. I also have a question. In the Qdisc_ops structure,
there’s a function pointer for change_tx_queue_len, but I was only
able to find a single implementation which is
pfifo_fast_change_tx_queue_len. Is that the only one? Apologies if
this isn’t the right place to ask such questions. I’d really
appreciate any feedback. Thank you!

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Eric Dumazet 2 months, 2 weeks ago

On Tue, Jul 22, 2025 at 9:22 AM Suchit K <suchitkarunakaran@gmail.com> wrote:
>
> >
> > WRITE_ONCE() is missing.
>
> Oops, I'm sorry about that.
>
> >
> > > +               while (i >= 0) {
> > > +                       qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
> >
> > What happens if one of these calls fails ?
> >
> > I think a fix will be more complicated...
>
> I did consider that, but since I didn’t have a solution, I assumed it
> wouldn’t fail.

But this definitely could fail. Exactly the same way than the first time.

I also have a question. In the Qdisc_ops structure,
> there’s a function pointer for change_tx_queue_len, but I was only
> able to find a single implementation which is
> pfifo_fast_change_tx_queue_len. Is that the only one? Apologies if
> this isn’t the right place to ask such questions. I’d really
> appreciate any feedback. Thank you!

I think only pfifo_fast has to re-allocate its data structures.

Other qdiscs eventually dynamically read dev->tx_queue_len (thus the
WRITE_ONCE() I mentioned to you)

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Suchit K 2 months, 2 weeks ago

On Tue, 22 Jul 2025 at 21:58, Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Jul 22, 2025 at 9:22 AM Suchit K <suchitkarunakaran@gmail.com> wrote:
> >
> > >
> > > WRITE_ONCE() is missing.
> >
> > Oops, I'm sorry about that.
> >
> > >
> > > > +               while (i >= 0) {
> > > > +                       qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
> > >
> > > What happens if one of these calls fails ?
> > >
> > > I think a fix will be more complicated...
> >
> > I did consider that, but since I didn’t have a solution, I assumed it
> > wouldn’t fail.
>
> But this definitely could fail. Exactly the same way than the first time.
>

Yeah, it makes sense.

> I also have a question. In the Qdisc_ops structure,
> > there’s a function pointer for change_tx_queue_len, but I was only
> > able to find a single implementation which is
> > pfifo_fast_change_tx_queue_len. Is that the only one? Apologies if
> > this isn’t the right place to ask such questions. I’d really
> > appreciate any feedback. Thank you!
>
> I think only pfifo_fast has to re-allocate its data structures.
>
> Other qdiscs eventually dynamically read dev->tx_queue_len (thus the
> WRITE_ONCE() I mentioned to you)

Yup got it. Thank you so much.

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Jakub Kicinski 2 months, 2 weeks ago

On Tue, 22 Jul 2025 12:45:08 +0530 Suchit Karunakaran wrote:
> +		while (i >= 0) {
> +			qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
> +			i--;

i is unsigned, this loop will never end
-- 
pw-bot: cr

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Suchit K 2 months, 2 weeks ago

On Tue, 22 Jul 2025 at 19:07, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 22 Jul 2025 12:45:08 +0530 Suchit Karunakaran wrote:
> > +             while (i >= 0) {
> > +                     qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
> > +                     i--;
>
> i is unsigned, this loop will never end
> --
> pw-bot: cr

Hi Jakub,
I'm sorry for the oversight. I'll send a v2 patch shortly to fix it.
In the meantime, could you please give me some insights on testing
this change? Also, apart from the unsigned int blunder, does the
overall approach look reasonable? I’d be grateful for any suggestions
or comments. Thank you.

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Jakub Kicinski 2 months, 2 weeks ago

On Tue, 22 Jul 2025 19:26:01 +0530 Suchit K wrote:
> I'm sorry for the oversight. I'll send a v2 patch shortly to fix it.

Please note that in networking we ask that new versions of patches not
be resubmitted within 24h of previous posting:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html

> In the meantime, could you please give me some insights on testing
> this change? Also, apart from the unsigned int blunder, does the
> overall approach look reasonable? I’d be grateful for any suggestions
> or comments. Thank you.

Hopefully someone reviews before you repost, I didn't look further once
I noticed the static analysis warning.

Re: [PATCH] net: Revert tx queue length on partial failure in dev_qdisc_change_tx_queue_len()

Posted by Suchit K 2 months, 2 weeks ago

> Please note that in networking we ask that new versions of patches not
> be resubmitted within 24h of previous posting:
> https://www.kernel.org/doc/html/next/process/maintainer-netdev.html
>

Yup, thanks for reminding.

> > In the meantime, could you please give me some insights on testing
> > this change? Also, apart from the unsigned int blunder, does the
> > overall approach look reasonable? I’d be grateful for any suggestions
> > or comments. Thank you.
>
> Hopefully someone reviews before you repost, I didn't look further once
> I noticed the static analysis warning.

Alright, I'll wait until someone reviews it. Thanks!