improve hw flow offload byte accounting

[PATCH RFC net-next 0/4] improve hw flow offload byte accounting

Posted by Daniel Golle 2 months, 1 week ago

Hardware flow counters report raw byte counts whose semantics
vary by vendor -- some count ingress L2 frames, others egress
L2, others L3. The nf_flow_table framework currently passes
these bytes straight to conntrack without conversion, and
sub-interfaces (VLAN, PPPoE) that are bypassed by hw offload
never see any counter updates at all.

This series lets drivers declare what their counters represent,
so the framework can normalize to L3 for conntrack and
propagate per-layer stats to encap sub-interfaces.

Questions:
 - Sub-interface stats accesses vlan_dev_priv() directly --
   should there be a generic netdev callback instead?
 - Are there hw offload drivers whose counters do not fit the
   ingress-L2 / egress-L2 / L3 model?

Daniel Golle (4):
  net: flow_offload: let drivers report byte counter semantics
  nf_flow_table: track sub-interface and bridge ifindex in flow tuple
  nf_flow_table: convert hw byte counts and update sub-interface stats
  net: ethernet: mtk_eth_soc: report INGRESS_L2 byte_type in flow stats

 .../net/ethernet/mediatek/mtk_ppe_offload.c   |   1 +
 include/net/flow_offload.h                    |   7 +
 include/net/netfilter/nf_flow_table.h         |   5 +
 net/netfilter/nf_flow_table_core.c            |   2 +
 net/netfilter/nf_flow_table_offload.c         | 174 +++++++++++++++++-
 net/netfilter/nf_flow_table_path.c            |   8 +
 6 files changed, 195 insertions(+), 2 deletions(-)

-- 
2.53.0

Re: [PATCH RFC net-next 0/4] improve hw flow offload byte accounting

Posted by Pablo Neira Ayuso 2 months, 1 week ago

On Thu, Apr 09, 2026 at 02:07:22PM +0100, Daniel Golle wrote:
> Hardware flow counters report raw byte counts whose semantics
> vary by vendor -- some count ingress L2 frames, others egress
> L2, others L3. The nf_flow_table framework currently passes
> these bytes straight to conntrack without conversion, and
> sub-interfaces (VLAN, PPPoE) that are bypassed by hw offload
> never see any counter updates at all.

I see, but that is part of the feature itself? Why pretend that these
interface are really seeing traffic while they don't. This aspiration
of trying to do all hardware offload fully transparent (when it is not
the case, not mentioning semantic changes in how packet handling is
done compared to the software plane) does not sound convincing to me.

On top of this, this issue also exists in the software plane: Devices
that are bypasses do not get their counters bumped.

Maybe if this is really a requirement, then this should address the
issue for software too, but is it worth the effort to add
infrastructure for this purpose?

> This series lets drivers declare what their counters represent,
> so the framework can normalize to L3 for conntrack and
> propagate per-layer stats to encap sub-interfaces.
> 
> Questions:
>  - Sub-interface stats accesses vlan_dev_priv() directly --
>    should there be a generic netdev callback instead?
>  - Are there hw offload drivers whose counters do not fit the
>    ingress-L2 / egress-L2 / L3 model?
> 
> Daniel Golle (4):
>   net: flow_offload: let drivers report byte counter semantics
>   nf_flow_table: track sub-interface and bridge ifindex in flow tuple
>   nf_flow_table: convert hw byte counts and update sub-interface stats
>   net: ethernet: mtk_eth_soc: report INGRESS_L2 byte_type in flow stats
> 
>  .../net/ethernet/mediatek/mtk_ppe_offload.c   |   1 +
>  include/net/flow_offload.h                    |   7 +
>  include/net/netfilter/nf_flow_table.h         |   5 +
>  net/netfilter/nf_flow_table_core.c            |   2 +
>  net/netfilter/nf_flow_table_offload.c         | 174 +++++++++++++++++-
>  net/netfilter/nf_flow_table_path.c            |   8 +
>  6 files changed, 195 insertions(+), 2 deletions(-)
> 
> -- 
> 2.53.0

Re: [PATCH RFC net-next 0/4] improve hw flow offload byte accounting

Posted by Daniel Golle 2 months, 1 week ago

On Thu, Apr 09, 2026 at 03:52:41PM +0200, Pablo Neira Ayuso wrote:
> On Thu, Apr 09, 2026 at 02:07:22PM +0100, Daniel Golle wrote:
> > Hardware flow counters report raw byte counts whose semantics
> > vary by vendor -- some count ingress L2 frames, others egress
> > L2, others L3. The nf_flow_table framework currently passes
> > these bytes straight to conntrack without conversion, and
> > sub-interfaces (VLAN, PPPoE) that are bypassed by hw offload
> > never see any counter updates at all.
> 
> I see, but that is part of the feature itself? Why pretend that these
> interface are really seeing traffic while they don't. This aspiration
> of trying to do all hardware offload fully transparent (when it is not
> the case, not mentioning semantic changes in how packet handling is
> done compared to the software plane) does not sound convincing to me.

Please explain what you mean by offloading not being fully
transparent. If the MAC hardware offloads VLAN encap/decap, for
example, we also maintain the counters correctly (it just so happens),
just the flow-offloading case results in a weird overall picture:
hardware interface counters keep increasing, encap interfaces (802.1Q,
PPPoE) don't. That makes it confusing and hard to understand what's
happening when only looking at the interface counters (ie. "what is
all that traffic on my physical WAN interface which isn't PPPoE? Can't
be that all of that is the modems management interface, SNMP, ...")

> 
> On top of this, this issue also exists in the software plane: Devices
> that are bypasses do not get their counters bumped.
> 
> Maybe if this is really a requirement, then this should address the
> issue for software too, but is it worth the effort to add
> infrastructure for this purpose?

To me it would feel more correct to see counters increasing also
for offloaded traffic on software interfaces such as PPPoE or VLAN.

I honestly didn't think about the software fastpath, and yes, I think
it should be addressed there too.

> > This series lets drivers declare what their counters represent,
> > so the framework can normalize to L3 for conntrack and
> > propagate per-layer stats to encap sub-interfaces.

This part could also been seen as an independent fix as currently
conntrack stats for the same traffic differ in case of software
offloading (pure L3 bytes) and hardware offloading (L2 ingress bytes
in case of mtk_ppe).