[PATCH] net_sched: codel: fix stale state for empty flows in fq_codel

hawk@kernel.org posted 1 patch 1 week, 3 days ago
include/net/codel_impl.h | 1 +
1 file changed, 1 insertion(+)
[PATCH] net_sched: codel: fix stale state for empty flows in fq_codel
Posted by hawk@kernel.org 1 week, 3 days ago
From: Jonas Köppeler <j.koeppeler@tu-berlin.de>

When codel_dequeue() finds an empty queue, it resets vars->dropping
but does not reset vars->first_above_time.  The reference CoDel
algorithm (Nichols & Jacobson, ACM Queue 2012) resets both:

  dodeque_result codel_queue_t::dodeque(time_t now) {
      ...
      if (r.p == NULL) {
          first_above_time = 0;   // <-- Linux omits this
      }
      ...
  }

Note that codel_should_drop() does reset first_above_time when called
with a NULL skb, but codel_dequeue() returns early before ever calling
codel_should_drop() in the empty-queue case.  The post-drop code paths
do reach codel_should_drop(NULL) and correctly reset the timer, so a
dropped packet breaks the cycle -- but the next delivered packet
re-arms first_above_time and the cycle repeats.

For sparse flows such as ICMP ping (one packet every 200ms-1s), the
first packet arms first_above_time, the flow goes empty, and the
second packet arrives after the interval has elapsed and gets dropped.
The pattern repeats, producing sustained loss on flows that are not
actually congested.

Test: veth pair, fq_codel, BQL disabled, 30000 iptables rules in the
consumer namespace (NAPI-64 cycle ~14ms, well above fq_codel's 5ms
target), ping at 5 pps under UDP flood:

  Before fix:  26% ping packet loss
  After fix:    0% ping packet loss

Fix by resetting first_above_time to zero in the empty-queue path
of codel_dequeue(), matching the reference algorithm.

Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM")
Fixes: d068ca2ae2e6 ("codel: split into multiple files")
Co-developed-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Reported-by: Chris Arges <carges@cloudflare.com>
Tested-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/all/20260318134826.1281205-7-hawk@kernel.org/
---
 include/net/codel_impl.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/net/codel_impl.h b/include/net/codel_impl.h
index 78a27ac73070..b2c359c6dd1b 100644
--- a/include/net/codel_impl.h
+++ b/include/net/codel_impl.h
@@ -158,6 +158,7 @@ static struct sk_buff *codel_dequeue(void *ctx,
 	bool drop;
 
 	if (!skb) {
+		vars->first_above_time = 0;
 		vars->dropping = false;
 		return skb;
 	}
-- 
2.43.0

Re: [PATCH] net_sched: codel: fix stale state for empty flows in fq_codel
Posted by Eric Dumazet 1 week, 3 days ago
On Mon, Mar 23, 2026 at 10:49 AM <hawk@kernel.org> wrote:
>
> From: Jonas Köppeler <j.koeppeler@tu-berlin.de>
>
> When codel_dequeue() finds an empty queue, it resets vars->dropping
> but does not reset vars->first_above_time.  The reference CoDel
> algorithm (Nichols & Jacobson, ACM Queue 2012) resets both:
>
>   dodeque_result codel_queue_t::dodeque(time_t now) {
>       ...
>       if (r.p == NULL) {
>           first_above_time = 0;   // <-- Linux omits this
>       }
>       ...
>   }
>
> Note that codel_should_drop() does reset first_above_time when called
> with a NULL skb, but codel_dequeue() returns early before ever calling
> codel_should_drop() in the empty-queue case.  The post-drop code paths
> do reach codel_should_drop(NULL) and correctly reset the timer, so a
> dropped packet breaks the cycle -- but the next delivered packet
> re-arms first_above_time and the cycle repeats.
>
> For sparse flows such as ICMP ping (one packet every 200ms-1s), the
> first packet arms first_above_time, the flow goes empty, and the
> second packet arrives after the interval has elapsed and gets dropped.
> The pattern repeats, producing sustained loss on flows that are not
> actually congested.
>
> Test: veth pair, fq_codel, BQL disabled, 30000 iptables rules in the
> consumer namespace (NAPI-64 cycle ~14ms, well above fq_codel's 5ms
> target), ping at 5 pps under UDP flood:
>
>   Before fix:  26% ping packet loss
>   After fix:    0% ping packet loss
>
> Fix by resetting first_above_time to zero in the empty-queue path
> of codel_dequeue(), matching the reference algorithm.
>
> Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM")
> Fixes: d068ca2ae2e6 ("codel: split into multiple files")
> Co-developed-by: Jesper Dangaard Brouer <hawk@kernel.org>
> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
> Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
> Reported-by: Chris Arges <carges@cloudflare.com>
> Tested-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Link: https://lore.kernel.org/all/20260318134826.1281205-7-hawk@kernel.org/
> ---
>  include/net/codel_impl.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/net/codel_impl.h b/include/net/codel_impl.h
> index 78a27ac73070..b2c359c6dd1b 100644
> --- a/include/net/codel_impl.h
> +++ b/include/net/codel_impl.h
> @@ -158,6 +158,7 @@ static struct sk_buff *codel_dequeue(void *ctx,
>         bool drop;
>
>         if (!skb) {
> +               vars->first_above_time = 0;
>                 vars->dropping = false;
>                 return skb;
>         }

Nice catch, thanks.

I guess I wanted to avoid the ktime_get() cost as much as I could....

Reviewed-by: Eric Dumazet <edumazet@google.com>