net/netfilter/nf_flow_table_offload.c | 38 +++++++++++++++++++++++++++ 1 file changed, 38 insertions(+)
When offloading a flow via the default nft flowtable path,
append a FLOW_ACTION_CT_METADATA action if the flow is associated with a conntrack entry.
We do this in both IPv4 and IPv6 route action builders, after NAT mangles and before redirect.
This mirrors net/sched/act_ct.c’s tcf_ct_flow_table_add_action_meta() so drivers that already
parse FLOW_ACTION_CT_METADATA from TC offloads can reuse the same logic for nft flowtables.
Signed-off-by: Elad Yifee <eladwf@gmail.com>
---
net/netfilter/nf_flow_table_offload.c | 38 +++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index e06bc36f49fe..bccae4052319 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -12,6 +12,7 @@
#include <net/netfilter/nf_conntrack_acct.h>
#include <net/netfilter/nf_conntrack_core.h>
#include <net/netfilter/nf_conntrack_tuple.h>
+#include <net/netfilter/nf_conntrack_labels.h>
static struct workqueue_struct *nf_flow_offload_add_wq;
static struct workqueue_struct *nf_flow_offload_del_wq;
@@ -679,6 +680,41 @@ nf_flow_rule_route_common(struct net *net, const struct flow_offload *flow,
return 0;
}
+static void flow_offload_add_ct_metadata(const struct flow_offload *flow,
+ enum flow_offload_tuple_dir dir,
+ struct nf_flow_rule *flow_rule)
+{
+ struct nf_conn *ct = flow->ct;
+ struct flow_action_entry *entry;
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS)
+ u32 *dst_labels;
+ struct nf_conn_labels *labels;
+#endif
+
+ if (!ct)
+ return;
+
+ entry = flow_action_entry_next(flow_rule);
+ entry->id = FLOW_ACTION_CT_METADATA;
+
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK)
+ entry->ct_metadata.mark = READ_ONCE(ct->mark);
+#endif
+
+ entry->ct_metadata.orig_dir = (dir == FLOW_OFFLOAD_DIR_ORIGINAL);
+
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS)
+ dst_labels = entry->ct_metadata.labels;
+ labels = nf_ct_labels_find(ct);
+ if (labels)
+ memcpy(dst_labels, labels->bits, NF_CT_LABELS_MAX_SIZE);
+ else
+ memset(dst_labels, 0, NF_CT_LABELS_MAX_SIZE);
+#else
+ memset(entry->ct_metadata.labels, 0, NF_CT_LABELS_MAX_SIZE);
+#endif
+}
+
int nf_flow_rule_route_ipv4(struct net *net, struct flow_offload *flow,
enum flow_offload_tuple_dir dir,
struct nf_flow_rule *flow_rule)
@@ -698,6 +734,7 @@ int nf_flow_rule_route_ipv4(struct net *net, struct flow_offload *flow,
test_bit(NF_FLOW_DNAT, &flow->flags))
flow_offload_ipv4_checksum(net, flow, flow_rule);
+ flow_offload_add_ct_metadata(flow, dir, flow_rule);
flow_offload_redirect(net, flow, dir, flow_rule);
return 0;
@@ -720,6 +757,7 @@ int nf_flow_rule_route_ipv6(struct net *net, struct flow_offload *flow,
flow_offload_port_dnat(net, flow, dir, flow_rule);
}
+ flow_offload_add_ct_metadata(flow, dir, flow_rule);
flow_offload_redirect(net, flow, dir, flow_rule);
return 0;
--
2.48.1
Hi all, One caveat: this change will cause some existing drivers to start returning -EOPNOTSUPP, since they walk the action list and treat any unknown action as fatal in their default switch. In other words, adding CT metadata unconditionally would break offload in those drivers until they are updated. Follow-up patches will therefore be needed to make drivers either parse or safely ignore FLOW_ACTION_CT_METADATA. Because this action is only advisory, it should be harmless for drivers that don’t use it to simply accept and no-op it. Just flagging this up front: the core patch by itself will break some drivers, and additional work is required to make them tolerant of the new metadata. Thanks, Elad
On Tue, Sep 16, 2025 at 07:49:34PM +0300, Elad Yifee wrote: > Hi all, > > One caveat: this change will cause some existing drivers to start > returning -EOPNOTSUPP, since they walk the action list and treat any > unknown action as fatal in their default switch. In other words, adding > CT metadata unconditionally would break offload in those drivers until > they are updated. > > Follow-up patches will therefore be needed to make drivers either parse > or safely ignore FLOW_ACTION_CT_METADATA. Because this action is only > advisory, it should be harmless for drivers that don’t use it to simply > accept and no-op it. > > Just flagging this up front: the core patch by itself will break some > drivers, and additional work is required to make them tolerant of the > new metadata. May I ask, where is the software plane extension for this feature? Will you add it for net/sched/act_ct.c? Adding the hardware offload feature only is a deal breaker IMO.
On Wed, Sep 17, 2025 at 1:39 AM Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > May I ask, where is the software plane extension for this feature? > Will you add it for net/sched/act_ct.c? > > Adding the hardware offload feature only is a deal breaker IMO. Software plane: This doesn’t add a new user feature, it just surfaces existing CT state to offload so the software plane already exists today via nft/TC. In software you can already set/match ct mark/labels (e.g., tag flows), and once offloaded the exporter snapshots that so a driver can map the tag to a HW queue/class if it wants per-flow QoS in hardware. Drivers that don’t need it can simply accept and ignore the metadata. act_ct.c: Yes - I’ll include a small common helper so TC and nft flowtable offloads produce identical CT metadata. If there’s no objection to the direction, I’ll respin with: - the common helper - act_ct switched to it - nft flowtable exporter appending CT metadata
On Wed, Sep 17, 2025 at 06:10:10AM +0300, Elad Yifee wrote: > On Wed, Sep 17, 2025 at 1:39 AM Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > > May I ask, where is the software plane extension for this feature? > > Will you add it for net/sched/act_ct.c? > > > > > > Adding the hardware offload feature only is a deal breaker IMO. > Software plane: This doesn’t add a new user feature, it just surfaces > existing CT state to offload so the software plane already exists > today via nft/TC. In software you can already set/match ct mark/labels > (e.g., tag flows), and once offloaded the exporter snapshots that so a > driver can map the tag to a HW queue/class if it wants per-flow QoS in > hardware. Drivers that don’t need it can simply accept and ignore the > metadata. Hm, flowtable software datapath cannot do ct marks/label at this time. > act_ct.c: Yes - I’ll include a small common helper so TC and nft > flowtable offloads produce identical CT metadata. > > If there’s no objection to the direction, I’ll respin with: > - the common helper > - act_ct switched to it > - nft flowtable exporter appending CT metadata Just to make sure we are on the same page: Software plane has to match the capabilities of the hardware offload plan, new features must work first in the software plane, then extend the hardware offload plane to support it.
On Wed, Sep 17, 2025 at 11:18 AM Pablo Neira Ayuso <pablo@netfilter.org> wrote: > Just to make sure we are on the same page: Software plane has to match > the capabilities of the hardware offload plan, new features must work > first in the software plane, then extend the hardware offload plane to > support it. Thanks - I see what you meant now. This isn’t a new feature that needs to be implemented in software first. We’re not introducing new user semantics, matches, or actions in nft/TC. no datapath changes (including the flowtable software offload fast path). The change only surfaces existing CT state (mark/labels/dir) as FLOW_ACTION_CT_METADATA at the hardware offload boundary so drivers can use it for per-flow QoS, or simply ignore it. When a flow stays in software, behavior remains exactly as today, software QoS continues to use existing tools (nft/TC setting skb->priority/mark, qdiscs, etc.). There’s no SW-HW mismatch introduced here.
Elad Yifee <eladwf@gmail.com> wrote: > When offloading a flow via the default nft flowtable path, > append a FLOW_ACTION_CT_METADATA action if the flow is associated with a conntrack entry. > We do this in both IPv4 and IPv6 route action builders, after NAT mangles and before redirect. > This mirrors net/sched/act_ct.c’s tcf_ct_flow_table_add_action_meta() so drivers that already > parse FLOW_ACTION_CT_METADATA from TC offloads can reuse the same logic for nft flowtables. > > Signed-off-by: Elad Yifee <eladwf@gmail.com> > --- > net/netfilter/nf_flow_table_offload.c | 38 +++++++++++++++++++++++++++ > 1 file changed, 38 insertions(+) > > diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c > index e06bc36f49fe..bccae4052319 100644 > --- a/net/netfilter/nf_flow_table_offload.c > +++ b/net/netfilter/nf_flow_table_offload.c > @@ -12,6 +12,7 @@ > #include <net/netfilter/nf_conntrack_acct.h> > #include <net/netfilter/nf_conntrack_core.h> > #include <net/netfilter/nf_conntrack_tuple.h> > +#include <net/netfilter/nf_conntrack_labels.h> > > static struct workqueue_struct *nf_flow_offload_add_wq; > static struct workqueue_struct *nf_flow_offload_del_wq; > @@ -679,6 +680,41 @@ nf_flow_rule_route_common(struct net *net, const struct flow_offload *flow, > return 0; > } > > +static void flow_offload_add_ct_metadata(const struct flow_offload *flow, > + enum flow_offload_tuple_dir dir, > + struct nf_flow_rule *flow_rule) > +{ > + struct nf_conn *ct = flow->ct; > + struct flow_action_entry *entry; > +#if IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) > + u32 *dst_labels; > + struct nf_conn_labels *labels; > +#endif > + > + if (!ct) > + return; Under what circumstances can flow->ct be NULL? > + entry = flow_action_entry_next(flow_rule); > + entry->id = FLOW_ACTION_CT_METADATA; > + > +#if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) > + entry->ct_metadata.mark = READ_ONCE(ct->mark); > +#endif > + > + entry->ct_metadata.orig_dir = (dir == FLOW_OFFLOAD_DIR_ORIGINAL); > + > +#if IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) > + dst_labels = entry->ct_metadata.labels; > + labels = nf_ct_labels_find(ct); > + if (labels) > + memcpy(dst_labels, labels->bits, NF_CT_LABELS_MAX_SIZE); > + else > + memset(dst_labels, 0, NF_CT_LABELS_MAX_SIZE); > +#else > + memset(entry->ct_metadata.labels, 0, NF_CT_LABELS_MAX_SIZE); > +#endif > +} This looks almost identical tcf_ct_flow_table_add_action_meta(). Any chance to make it a common helper function? act_ct already depends on nf_flow_table anyway.
On Sat, Sep 13, 2025 at 11:52 PM Florian Westphal <fw@strlen.de> wrote: > > Under what circumstances can flow->ct be NULL? I thought it could be NULL in a few cases. I’ll verify this on the inet/IPv4/IPv6 path and report back in the next spin. In any case, the null-guard is harmless, so I kept it. > This looks almost identical tcf_ct_flow_table_add_action_meta(). > > Any chance to make it a common helper function? act_ct already depends > on nf_flow_table anyway. agreed. If there are no objections to the main idea (exporting CT metadata on the nft flowtable path), I’ll prepare a new series that factors the fill logic into a shared helper and converts both act_ct and the nft exporter to use it Thanks for your feedback
© 2016 - 2025 Red Hat, Inc.