[libvirt PATCH 00/28] native support for nftables in virtual network driver

Laine Stump posted 28 patches 11 months, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20230501031943.288145-1-laine@redhat.com
libvirt.spec.in                               |   5 +
meson.build                                   |   1 +
po/POTFILES                                   |   2 +
src/conf/virnetworkobj.c                      |  40 +
src/conf/virnetworkobj.h                      |  11 +
src/libvirt_private.syms                      |  68 +-
src/network/bridge_driver.c                   |  40 +-
src/network/bridge_driver_conf.c              |  44 +
src/network/bridge_driver_conf.h              |   3 +
src/network/bridge_driver_linux.c             | 241 +++--
src/network/bridge_driver_nop.c               |   6 +-
src/network/bridge_driver_platform.h          |   6 +-
src/network/libvirtd_network.aug              |  39 +
src/network/meson.build                       |  11 +
src/network/network.conf                      |  24 +
src/network/test_libvirtd_network.aug.in      |   5 +
src/nwfilter/nwfilter_ebiptables_driver.c     |  16 +-
src/util/meson.build                          |   2 +
src/util/virebtables.c                        |   4 +-
src/util/virfirewall.c                        | 490 ++++++++--
src/util/virfirewall.h                        |  51 +-
src/util/viriptables.c                        | 762 ++++-----------
src/util/viriptables.h                        | 222 ++---
src/util/virnetfilter.c                       | 892 ++++++++++++++++++
src/util/virnetfilter.h                       | 159 ++++
src/util/virnftables.c                        | 698 ++++++++++++++
src/util/virnftables.h                        | 118 +++
.../{base.args => base.iptables}              |   0
tests/networkxml2firewalldata/base.nftables   | 256 +++++
...-linux.args => nat-default-linux.iptables} |   0
.../nat-default-linux.nftables                | 248 +++++
...pv6-linux.args => nat-ipv6-linux.iptables} |   0
.../nat-ipv6-linux.nftables                   | 384 ++++++++
...rgs => nat-ipv6-masquerade-linux.iptables} |   0
.../nat-ipv6-masquerade-linux.nftables        | 456 +++++++++
...linux.args => nat-many-ips-linux.iptables} |   0
.../nat-many-ips-linux.nftables               | 472 +++++++++
...-linux.args => nat-no-dhcp-linux.iptables} |   0
.../nat-no-dhcp-linux.nftables                | 384 ++++++++
...ftp-linux.args => nat-tftp-linux.iptables} |   0
.../nat-tftp-linux.nftables                   | 274 ++++++
...inux.args => route-default-linux.iptables} |   0
.../route-default-linux.nftables              | 162 ++++
tests/networkxml2firewalltest.c               |  56 +-
tests/virfirewalltest.c                       |  20 +-
45 files changed, 5718 insertions(+), 954 deletions(-)
create mode 100644 src/network/libvirtd_network.aug
create mode 100644 src/network/network.conf
create mode 100644 src/network/test_libvirtd_network.aug.in
create mode 100644 src/util/virnetfilter.c
create mode 100644 src/util/virnetfilter.h
create mode 100644 src/util/virnftables.c
create mode 100644 src/util/virnftables.h
rename tests/networkxml2firewalldata/{base.args => base.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/base.nftables
rename tests/networkxml2firewalldata/{nat-default-linux.args => nat-default-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/nat-default-linux.nftables
rename tests/networkxml2firewalldata/{nat-ipv6-linux.args => nat-ipv6-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/nat-ipv6-linux.nftables
rename tests/networkxml2firewalldata/{nat-ipv6-masquerade-linux.args => nat-ipv6-masquerade-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.nftables
rename tests/networkxml2firewalldata/{nat-many-ips-linux.args => nat-many-ips-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/nat-many-ips-linux.nftables
rename tests/networkxml2firewalldata/{nat-no-dhcp-linux.args => nat-no-dhcp-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/nat-no-dhcp-linux.nftables
rename tests/networkxml2firewalldata/{nat-tftp-linux.args => nat-tftp-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/nat-tftp-linux.nftables
rename tests/networkxml2firewalldata/{route-default-linux.args => route-default-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/route-default-linux.nftables
[libvirt PATCH 00/28] native support for nftables in virtual network driver
Posted by Laine Stump 11 months, 3 weeks ago
This patch series enables libvirt to use nftables rules rather than
iptables *when setting up virtual networks* (it does *not* add
nftables support to the nwfilter driver). It accomplishes this by
abstracting several iptables functions (from viriptables.[ch] called
by the virtual network driver into a rudimentary "virNetfilter API"
(in virnetfilter.[ch], having the virtual network driver call the
virNetFilter API rather than calling the existing iptables functions
directly, and then finally adding an equivalent virNftables backend
that can be used instead of iptables (selected manually via a
network.conf setting, or automatically if iptables isn't found on the
host).

A first look at the result may have you thinking that it's filled with
a lot of bad decisions. While I would agree with that in many cases, I
think that overall they are the "least bad" decisions, or at least
"bad within acceptable limits / no worse than something else", and
point out that it's been done in a way that minimizes (actually
eliminates) the need for immediate changes to nwfilter (the other
consumer of iptables, which *also* needs to be updated to use native
nftables), and makes it much easier to change our mind about the
details in the future.

When I first started on this (long, protracted, repeatedly interrupted
for extended periods - many of these patches are > a year old) task, I
considered doing an all-at-once complete replacement of iptables with
nftables, since all the Linux distros we support have had nftables for
several years, and I'm pretty sure nobody has it disabled (not even
sure if it's possible to disable nftables while still enabling
iptables, since they both use xtables in the kernel). But due to
libvirt's use of "-t mangle -j CHECKSUM --checksum-fill" (see commit
fd5b15ff all the way back in July 2010 for details) which has no
equivalent in nftables rules (and we don't *want* it to!!), and the
desire to be able to easily switch back to iptables in case of an
unforeseen regression, we decided that both iptables and nftables need
to be supported (for now), with the default (for now) remaining as
iptables.

Just allowing for dual backends complicated matters, since it means
that we have to have a config file, a setting, detection of which
backends are available, and of course some sort of concept of an
abstracted frontend that can use either backend based on the config
setting (and/or auto-detection). Combining that with the fact that it
would just be "too big" of a project to switch over nwfilter's
iptables usage at the same time means that we have to keep around a
lot of existing code for compatibility's sake rather than just wiping
it all away and starting over.

So, what I've ended up with is:

1) a network.conf file (didn't exist before) with a single setting
"firewall_backend". If unset, the network driver tries to use iptables
on the backend, and if that's missing, then tries to use nftables.

2) a new (internal-only, so transient!) virNetFilterXXX API that is
used by the network driver in place of the iptablesXXX API, and calls
either iptablesXXX or:

3) a virNftablesXXX API that exactly replicates the filtering rules of
the existing iptablesXXX API (except in the custom "libvirt" base
table rather than the system "filter" and "nat" tables). This means
that:

4) when the nftables backend is used, the rules added are *exactly the
same* (functionally speaking) as we currently add for iptables (except
they are in the "libvirt" table).

We had spent some time in IRC discussing different ways of using new
functionality available in nftables to make a more
efficient/performant implemention of the desired filtering, and there
are some really great possibilities that need to be explored, but in
the end there were too many details up in the air, and I decided that
it would be more "accomplishable" (coined a new word there!) to first
replicate existing behavior with nftables, but do it inside a
framework that makes it easy to modify the details in the future (in
particular making it painless to switch back and forth between builds
with differing filter models at runtime) - this way we'll be able to
separate the infrastructure work from the details of the rules (which
we can then more easily work on and experiment with). (This implies
that the main objective right now is "get rid of iptables
dependencies", not "make the filtering faster and more efficient").

Notable features of this patchset:

* allows switching between iptables/nftables backends without
  rebooting or restarting networks/guests.

  Because the commands required to remove a network's filter rules are
  now saved in the network status XML, each time libvirtd (or
  virtnetworkd) is restarted, it will execute exactly the commands
  needed to remove the filter rules that had been added by the
  previous libvirtd/virtnetworkd (rather than just making a guess, as
  we've always done up until now), and then add new rules using the
  current backend+binary's set of rules (while also saving the info
  needed for future removal of these new rules back into the network's
  status XML).

* firewall_backend can be explicitly set in (new)
  /etc/libvirt/network.conf, but if it's not explicitly set, libvirt
  will default to the iptables backend if the iptables binary is
  found, and otherwise fall back to nftables as long as the nft
  binary is found; otherwise the first attempt to start a network will
  fail with an appropriate error.

Things that seem ugly / that I would like to clean up / that I think
are just fine as they are:

* virFirewall does *not* provide a backend-agnostic interface [this is fine]

  * We need to maintain a backward-compatible API for virFirewall so
    that we don't have to touch nwfilter code. Trying to make its API
    backend-agnostic would require individually considering/changing
    every nwfilter use of virFirewall.

  * instead virFirewall objects are just a way to build a collection
    of commands to execute to build a firewall, then execute them
    while collecting info for and building a collection of commands
    that will tear down that firewall in the future.

  Do I want to "fix" this in the future by making virFirewall a higher
  level interface that accepts tokens describing the type of rule to
  add (rather than backend-specific arguments to a backend-specific
  command)?  No. I think I like the way virFirewall works (as
  described in that previous bullet-point), instead I'm thinking that
  it is just slightly mis-named - I've lately been thinking of it as a
  "virNetFilterCmdList". Similarly, the virFirewallRules that it has a
  list of aren't really "rules", they are better described as commands
  or actions, so maybe they should be renamed to virNetfilterCmd or
  virNetfilterAction. But that is just cosmetic, so I didn't want to
  get into it in these patches (especially in case someone disagrees,
  or has a better idea for naming).

* Speaking of renaming - I should probably rename all the
  "iptablesXXX" functions to "virIptablesXXX" to be consistent with so
  much of our other code. I lost the ambition to deal with it right
  now though, so I'm leaving that for later cleanup (or I could do it
  now if it really makes someone's day :-).

* I could have chosen a higher place in the callchain to make the
  virNetfilter abstraction, e.g. at the level of
  "networkAddXXXFirewallRules()" rather than at the lower level of
  iptablesXXX(). That is actually probably what will happen in the
  future (since it will be necessary in order for an nftables-based
  firewall to be significantly different in structure from an
  iptables-based firewall). But that's the beauty of an API being
  private - we can freely add/remove things as needed. the important
  thing is that we now have the basic structure there.

  For now, the split is just above the existing iptablesXXX API
  (util/viriptables.[ch], which seems like a "narrow" enough
  place. Most iptablesXXX functions are written in terms of just 10
  *other* iptablesXXX functions that add iptables-specific commands -
  I've just moved those functions into virnetfilter.[ch]
  (appropriately renamed), and changed them to call the 10
  virNetfilterXXX functions that will in-turn call those 10
  iptablesXXX (or equivalent virNftablesXXX) functions.

* Some people may dislike that the 10 virNetfilterXXX functions are
  each written with a switch statement that has cases to directly call
  each backend, rather than each backend driver having a table of
  pointers to API functions, with the virNetfilter API function
  calling backends[fwBackend]->XXX() (ie the pattern for so many
  drivers in libvirt). But for just 2 backends, that really seemed
  like overkill and unnecessary obfuscation.

* As implemented here, I am storing a "<fwRemoval>" element in the
  network status XML - it contains a serialized virFirewall object
  that directly contains the commands necessary to remove the
  firewall. I could instead just store "<firewall>", which would
  include all the commands that were used to *create* the firewall in
  addition to the commands needed to remove the firewall. The way it's
  done currently takes up less space; switching to storing the full
  firewall *might* be more informative to somebody, but on the other
  hand would make the network status XML *very* long. If anybody has
  an opinion about this, now is the time to bring it up - do you think
  it's worth having a separate list of all the commands that were used
  to create a network's firewall (keeping in mind that there is no
  public API to access it)? Or is it enough to just store what's
  needed to remove the firewall?

* Several months ago Eric Garver posted patches for a pure firewalld
  backend, and I requested that they not be pushed because I wanted
  that to be integrated with my nftables backend support. Due to the
  fact that the firewalld backend is almost entirely implemented by
  putting the bridge into a new firewalld "zone", with no individual
  rules added, that won't happen as just another backend driver file
  in parallel to iptables and nftables; it will instead work by
  checking firewall_backend at a higher level in the network driver,
  thus avoiding the calls to virNetfilterXXX() entirely. I have
  locally merged Eric's patches over the top of these patches, and
  there are surprisingly few conflicts, but since his patches didn't
  account for a user-settable config (but instead just always used the
  firewalld backend if firewalld was active), some of the patches are
  going to require a bit of rework, which I'll take care of after
  getting these patches in.

Laine Stump (28):
  util: add -w/--concurrent when applying the rule rather than when
    building it
  util: new virFirewallRuleGet*() APIs
  util: determine ignoreErrors value when creating rule, not when
    applying
  util: rename iptables helpers that will become the frontend for
    ip&nftables
  util: move backend-agnostic virNetfilter*() functions to their own
    file
  util: make netfilter action a proper typedefed (virFirewall) enum
  util: #define the names used for private packet filter chains
  util: move/rename virFirewallApplyRuleDirect to
    virIptablesApplyFirewallRule
  util/network: reintroduce virFirewallBackend, but different
  network: add (empty) network.conf file to distribution files
  network: allow setting firewallBackend from network.conf
  network: do not add DHCP checksum mangle rule unless using iptables
  network: call backend agnostic function to init private filter chains
  util: setup functions in virnetfilter which will call appropriate
    backend
  build: add nft to the list of binaries we attempt to locate
  util: add nftables backend to virnetfilter API used by network driver
  tests: test cases for nftables backend
  util: new functions to support adding individual rollback rules
  util: check for 0 args when applying iptables rule
  util: implement rollback rule autosave for iptables backend
  util: implement rollback rule autosave for nftables backend
  network: turn on auto-rollback for the rules added for virtual
    networks
  util: new function virFirewallNewFromRollback()
  util: new functions virFirewallParseXML() and virFirewallFormat()
  conf: add a virFirewall object to virNetworkObj
  network: use previously saved list of firewall rules when removing
  network: save network status when firewall rules are reloaded
  network: improve log message when reloading virtual network firewall
    rules

 libvirt.spec.in                               |   5 +
 meson.build                                   |   1 +
 po/POTFILES                                   |   2 +
 src/conf/virnetworkobj.c                      |  40 +
 src/conf/virnetworkobj.h                      |  11 +
 src/libvirt_private.syms                      |  68 +-
 src/network/bridge_driver.c                   |  40 +-
 src/network/bridge_driver_conf.c              |  44 +
 src/network/bridge_driver_conf.h              |   3 +
 src/network/bridge_driver_linux.c             | 241 +++--
 src/network/bridge_driver_nop.c               |   6 +-
 src/network/bridge_driver_platform.h          |   6 +-
 src/network/libvirtd_network.aug              |  39 +
 src/network/meson.build                       |  11 +
 src/network/network.conf                      |  24 +
 src/network/test_libvirtd_network.aug.in      |   5 +
 src/nwfilter/nwfilter_ebiptables_driver.c     |  16 +-
 src/util/meson.build                          |   2 +
 src/util/virebtables.c                        |   4 +-
 src/util/virfirewall.c                        | 490 ++++++++--
 src/util/virfirewall.h                        |  51 +-
 src/util/viriptables.c                        | 762 ++++-----------
 src/util/viriptables.h                        | 222 ++---
 src/util/virnetfilter.c                       | 892 ++++++++++++++++++
 src/util/virnetfilter.h                       | 159 ++++
 src/util/virnftables.c                        | 698 ++++++++++++++
 src/util/virnftables.h                        | 118 +++
 .../{base.args => base.iptables}              |   0
 tests/networkxml2firewalldata/base.nftables   | 256 +++++
 ...-linux.args => nat-default-linux.iptables} |   0
 .../nat-default-linux.nftables                | 248 +++++
 ...pv6-linux.args => nat-ipv6-linux.iptables} |   0
 .../nat-ipv6-linux.nftables                   | 384 ++++++++
 ...rgs => nat-ipv6-masquerade-linux.iptables} |   0
 .../nat-ipv6-masquerade-linux.nftables        | 456 +++++++++
 ...linux.args => nat-many-ips-linux.iptables} |   0
 .../nat-many-ips-linux.nftables               | 472 +++++++++
 ...-linux.args => nat-no-dhcp-linux.iptables} |   0
 .../nat-no-dhcp-linux.nftables                | 384 ++++++++
 ...ftp-linux.args => nat-tftp-linux.iptables} |   0
 .../nat-tftp-linux.nftables                   | 274 ++++++
 ...inux.args => route-default-linux.iptables} |   0
 .../route-default-linux.nftables              | 162 ++++
 tests/networkxml2firewalltest.c               |  56 +-
 tests/virfirewalltest.c                       |  20 +-
 45 files changed, 5718 insertions(+), 954 deletions(-)
 create mode 100644 src/network/libvirtd_network.aug
 create mode 100644 src/network/network.conf
 create mode 100644 src/network/test_libvirtd_network.aug.in
 create mode 100644 src/util/virnetfilter.c
 create mode 100644 src/util/virnetfilter.h
 create mode 100644 src/util/virnftables.c
 create mode 100644 src/util/virnftables.h
 rename tests/networkxml2firewalldata/{base.args => base.iptables} (100%)
 create mode 100644 tests/networkxml2firewalldata/base.nftables
 rename tests/networkxml2firewalldata/{nat-default-linux.args => nat-default-linux.iptables} (100%)
 create mode 100644 tests/networkxml2firewalldata/nat-default-linux.nftables
 rename tests/networkxml2firewalldata/{nat-ipv6-linux.args => nat-ipv6-linux.iptables} (100%)
 create mode 100644 tests/networkxml2firewalldata/nat-ipv6-linux.nftables
 rename tests/networkxml2firewalldata/{nat-ipv6-masquerade-linux.args => nat-ipv6-masquerade-linux.iptables} (100%)
 create mode 100644 tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.nftables
 rename tests/networkxml2firewalldata/{nat-many-ips-linux.args => nat-many-ips-linux.iptables} (100%)
 create mode 100644 tests/networkxml2firewalldata/nat-many-ips-linux.nftables
 rename tests/networkxml2firewalldata/{nat-no-dhcp-linux.args => nat-no-dhcp-linux.iptables} (100%)
 create mode 100644 tests/networkxml2firewalldata/nat-no-dhcp-linux.nftables
 rename tests/networkxml2firewalldata/{nat-tftp-linux.args => nat-tftp-linux.iptables} (100%)
 create mode 100644 tests/networkxml2firewalldata/nat-tftp-linux.nftables
 rename tests/networkxml2firewalldata/{route-default-linux.args => route-default-linux.iptables} (100%)
 create mode 100644 tests/networkxml2firewalldata/route-default-linux.nftables

-- 
2.39.2
Re: [libvirt PATCH 00/28] native support for nftables in virtual network driver
Posted by Daniel P. Berrangé 11 months, 3 weeks ago
On Sun, Apr 30, 2023 at 11:19:15PM -0400, Laine Stump wrote:
> When I first started on this (long, protracted, repeatedly interrupted
> for extended periods - many of these patches are > a year old) task, I
> considered doing an all-at-once complete replacement of iptables with
> nftables, since all the Linux distros we support have had nftables for
> several years, and I'm pretty sure nobody has it disabled (not even
> sure if it's possible to disable nftables while still enabling
> iptables, since they both use xtables in the kernel). But due to
> libvirt's use of "-t mangle -j CHECKSUM --checksum-fill" (see commit
> fd5b15ff all the way back in July 2010 for details) which has no
> equivalent in nftables rules (and we don't *want* it to!!), and the
> desire to be able to easily switch back to iptables in case of an
> unforeseen regression, we decided that both iptables and nftables need
> to be supported (for now), with the default (for now) remaining as
> iptables.

What happens if you use iptables -j CHECKSUM --checksum-fill, and
this is just the iptables compat shim that is secretly talking to
NFT in the kernel ?  Is the checksum stuff just ignored entirely
then ?

If so, then the key decision factor for us, is whether any of the
supported distros still officially support use of the iptables KMOD
instead of nft KMOD.

> So, what I've ended up with is:
> 
> 1) a network.conf file (didn't exist before) with a single setting
> "firewall_backend". If unset, the network driver tries to use iptables
> on the backend, and if that's missing, then tries to use nftables.

For most distros I feel like nftables should be the default
and not require a user opt in.  That raises the question of
how do we deal with the upgrade path.

Historically when starting libvirt we'll blow away our existing
iptables rules and create them fresh. For an upgrade path we'll
need a variant which blows away iptables rules and instead creates
nftables rules. If that works, then we could default to nftables
without user config.


> We had spent some time in IRC discussing different ways of using new
> functionality available in nftables to make a more
> efficient/performant implemention of the desired filtering, and there
> are some really great possibilities that need to be explored, but in
> the end there were too many details up in the air, and I decided that
> it would be more "accomplishable" (coined a new word there!) to first
> replicate existing behavior with nftables, but do it inside a
> framework that makes it easy to modify the details in the future (in
> particular making it painless to switch back and forth between builds
> with differing filter models at runtime) - this way we'll be able to
> separate the infrastructure work from the details of the rules (which
> we can then more easily work on and experiment with). (This implies
> that the main objective right now is "get rid of iptables
> dependencies", not "make the filtering faster and more efficient").

I feel like filtering wrt virtual networks isn't our main
performance bottleneck. Most machines only have a couple
of virtual networks, so most host traffic only has to
match a few rules.

With nwfilter though, if we have 1000 VMs we'll have
1000 top level rules that need evaluating for every
packet. IOW, the performance optimizations allowed
by nftables are much more relevant to nwfilter.

> * allows switching between iptables/nftables backends without
>   rebooting or restarting networks/guests.
> 
>   Because the commands required to remove a network's filter rules are
>   now saved in the network status XML, each time libvirtd (or
>   virtnetworkd) is restarted, it will execute exactly the commands
>   needed to remove the filter rules that had been added by the
>   previous libvirtd/virtnetworkd (rather than just making a guess, as
>   we've always done up until now), and then add new rules using the
>   current backend+binary's set of rules (while also saving the info
>   needed for future removal of these new rules back into the network's
>   status XML).

Ok that's answering my question about upgrades then. It looks like we
should be able to default to nftables out of the box, if we assume
that absence of info in the network status XML implies iptables.

> * virFirewall does *not* provide a backend-agnostic interface [this is fine]
> 
>   * We need to maintain a backward-compatible API for virFirewall so
>     that we don't have to touch nwfilter code. Trying to make its API
>     backend-agnostic would require individually considering/changing
>     every nwfilter use of virFirewall.
> 
>   * instead virFirewall objects are just a way to build a collection
>     of commands to execute to build a firewall, then execute them
>     while collecting info for and building a collection of commands
>     that will tear down that firewall in the future.
> 
>   Do I want to "fix" this in the future by making virFirewall a higher
>   level interface that accepts tokens describing the type of rule to
>   add (rather than backend-specific arguments to a backend-specific
>   command)?  No. I think I like the way virFirewall works (as
>   described in that previous bullet-point), instead I'm thinking that
>   it is just slightly mis-named - I've lately been thinking of it as a
>   "virNetFilterCmdList". Similarly, the virFirewallRules that it has a
>   list of aren't really "rules", they are better described as commands
>   or actions, so maybe they should be renamed to virNetfilterCmd or
>   virNetfilterAction. But that is just cosmetic, so I didn't want to
>   get into it in these patches (especially in case someone disagrees,
>   or has a better idea for naming).

The main thing I'd like to see is to maintain the clean split of
code between what is general purpose, and what is specific to the
virtual network driver and what is specific to the nwfilter
driver. AFAICT, this series mixes that up with circular dependancies
between viriptables.c, virfirewall.c and virnetfilter.c.

I'd like virfirwall.c to remain indepent of anything related to
the virtual network driver.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [libvirt PATCH 00/28] native support for nftables in virtual network driver
Posted by Laine Stump 11 months, 2 weeks ago
On 5/3/23 11:40 AM, Daniel P. Berrangé wrote:
> On Sun, Apr 30, 2023 at 11:19:15PM -0400, Laine Stump wrote:
>> When I first started on this (long, protracted, repeatedly interrupted
>> for extended periods - many of these patches are > a year old) task, I
>> considered doing an all-at-once complete replacement of iptables with
>> nftables, since all the Linux distros we support have had nftables for
>> several years, and I'm pretty sure nobody has it disabled (not even
>> sure if it's possible to disable nftables while still enabling
>> iptables, since they both use xtables in the kernel). But due to
>> libvirt's use of "-t mangle -j CHECKSUM --checksum-fill" (see commit
>> fd5b15ff all the way back in July 2010 for details) which has no
>> equivalent in nftables rules (and we don't *want* it to!!), and the
>> desire to be able to easily switch back to iptables in case of an
>> unforeseen regression, we decided that both iptables and nftables need
>> to be supported (for now), with the default (for now) remaining as
>> iptables.
> 
> What happens if you use iptables -j CHECKSUM --checksum-fill, and
> this is just the iptables compat shim that is secretly talking to
> NFT in the kernel ?  Is the checksum stuff just ignored entirely
> then ?
> 
> If so, then the key decision factor for us, is whether any of the
> supported distros still officially support use of the iptables KMOD
> instead of nft KMOD.
> 
>> So, what I've ended up with is:
>>
>> 1) a network.conf file (didn't exist before) with a single setting
>> "firewall_backend". If unset, the network driver tries to use iptables
>> on the backend, and if that's missing, then tries to use nftables.
> 
> For most distros I feel like nftables should be the default
> and not require a user opt in.  That raises the question of
> how do we deal with the upgrade path.
> 
> Historically when starting libvirt we'll blow away our existing
> iptables rules and create them fresh. For an upgrade path we'll
> need a variant which blows away iptables rules and instead creates
> nftables rules. If that works, then we could default to nftables
> without user config.

Since my patches do that, I would be totally fine with preferring 
nftables over iptables when no config is present. It would definitely 
give a better feeling of "modernizing" if we could do that.

(Actually, I was considering that when I tweak Eric Garver's patches for 
native firewalld networks, I would prefer firewalld over nftables if 
firewalld.service is running. In that case, the order of preference 
(still overrideable, of course) would be firewalld->nftables->iptables, 
which from a functionality standpoint makes the most sense.)


> 
> 
>> We had spent some time in IRC discussing different ways of using new
>> functionality available in nftables to make a more
>> efficient/performant implemention of the desired filtering, and there
>> are some really great possibilities that need to be explored, but in
>> the end there were too many details up in the air, and I decided that
>> it would be more "accomplishable" (coined a new word there!) to first
>> replicate existing behavior with nftables, but do it inside a
>> framework that makes it easy to modify the details in the future (in
>> particular making it painless to switch back and forth between builds
>> with differing filter models at runtime) - this way we'll be able to
>> separate the infrastructure work from the details of the rules (which
>> we can then more easily work on and experiment with). (This implies
>> that the main objective right now is "get rid of iptables
>> dependencies", not "make the filtering faster and more efficient").
> 
> I feel like filtering wrt virtual networks isn't our main
> performance bottleneck. Most machines only have a couple
> of virtual networks, so most host traffic only has to
> match a few rules.
> 
> With nwfilter though, if we have 1000 VMs we'll have
> 1000 top level rules that need evaluating for every
> packet. IOW, the performance optimizations allowed
> by nftables are much more relevant to nwfilter.

That's a good point, and makes me feel better about not trying to 
immediately optimize the rules created here. (although I do recall 
someone once a long time ago who filed a bug about poor performance when 
they had > 500 virtual networks :-P)

(OTOH, I've been reading through your response to 18/28, and what you're 
suggesting there, in the name of simpler removal rather than runtime 
performance, is changing what rules are generated in the initial 
version. I'm still wrapping my head around that - I had confused "chain" 
with "hook" in a strange way, and thought that the behavior was 
different than it apparently is).

> 
>> * allows switching between iptables/nftables backends without
>>    rebooting or restarting networks/guests.
>>
>>    Because the commands required to remove a network's filter rules are
>>    now saved in the network status XML, each time libvirtd (or
>>    virtnetworkd) is restarted, it will execute exactly the commands
>>    needed to remove the filter rules that had been added by the
>>    previous libvirtd/virtnetworkd (rather than just making a guess, as
>>    we've always done up until now), and then add new rules using the
>>    current backend+binary's set of rules (while also saving the info
>>    needed for future removal of these new rules back into the network's
>>    status XML).
> 
> Ok that's answering my question about upgrades then. It looks like we
> should be able to default to nftables out of the box, if we assume
> that absence of info in the network status XML implies iptables.

Yep, that's what the code does - if there is no <fwRemoval> element, 
then we assume that the current network instance was setup with iptables 
rules.

> 
>> * virFirewall does *not* provide a backend-agnostic interface [this is fine]
>>
>>    * We need to maintain a backward-compatible API for virFirewall so
>>      that we don't have to touch nwfilter code. Trying to make its API
>>      backend-agnostic would require individually considering/changing
>>      every nwfilter use of virFirewall.
>>
>>    * instead virFirewall objects are just a way to build a collection
>>      of commands to execute to build a firewall, then execute them
>>      while collecting info for and building a collection of commands
>>      that will tear down that firewall in the future.
>>
>>    Do I want to "fix" this in the future by making virFirewall a higher
>>    level interface that accepts tokens describing the type of rule to
>>    add (rather than backend-specific arguments to a backend-specific
>>    command)?  No. I think I like the way virFirewall works (as
>>    described in that previous bullet-point), instead I'm thinking that
>>    it is just slightly mis-named - I've lately been thinking of it as a
>>    "virNetFilterCmdList". Similarly, the virFirewallRules that it has a
>>    list of aren't really "rules", they are better described as commands
>>    or actions, so maybe they should be renamed to virNetfilterCmd or
>>    virNetfilterAction. But that is just cosmetic, so I didn't want to
>>    get into it in these patches (especially in case someone disagrees,
>>    or has a better idea for naming).
> 
> The main thing I'd like to see is to maintain the clean split of
> code between what is general purpose, and what is specific to the
> virtual network driver and what is specific to the nwfilter
> driver. AFAICT, this series mixes that up with circular dependancies
> between viriptables.c, virfirewall.c and virnetfilter.c.

Yeah, after reading through your comments about that part, I agree. It 
all started because viriptables.[ch] were originally put in util instead 
of network, and I was too focused on maintaining the status quo where 
possible. As for the circular dependency, that was one of the bits where 
I felt a bit dirty writing the code, as mentioned earlier. I'm going to 
try and eliminate that.

> 
> I'd like virfirwall.c to remain indepent of anything related to
> the virtual network driver.

Definitely. And if viriptables is moved from util to network, then we 
*must* eliminate this bit.

Re: [libvirt PATCH 00/28] native support for nftables in virtual network driver
Posted by Ján Tomko 11 months, 3 weeks ago
On a Sunday in 2023, Laine Stump wrote:
>This patch series enables libvirt to use nftables rules rather than
>iptables *when setting up virtual networks* (it does *not* add
>nftables support to the nwfilter driver). It accomplishes this by
>  getting these patches in.

[... 150 lines delted ...]
>
>Laine Stump (28):
>  util: add -w/--concurrent when applying the rule rather than when
>    building it
>  util: new virFirewallRuleGet*() APIs
>  util: determine ignoreErrors value when creating rule, not when
>    applying
>  util: rename iptables helpers that will become the frontend for
>    ip&nftables
>  util: move backend-agnostic virNetfilter*() functions to their own
>    file
>  util: make netfilter action a proper typedefed (virFirewall) enum
>  util: #define the names used for private packet filter chains
>  util: move/rename virFirewallApplyRuleDirect to
>    virIptablesApplyFirewallRule
>  util/network: reintroduce virFirewallBackend, but different
>  network: add (empty) network.conf file to distribution files
>  network: allow setting firewallBackend from network.conf
>  network: do not add DHCP checksum mangle rule unless using iptables
>  network: call backend agnostic function to init private filter chains
>  util: setup functions in virnetfilter which will call appropriate
>    backend
>  build: add nft to the list of binaries we attempt to locate
>  util: add nftables backend to virnetfilter API used by network driver
>  tests: test cases for nftables backend
>  util: new functions to support adding individual rollback rules
>  util: check for 0 args when applying iptables rule
>  util: implement rollback rule autosave for iptables backend
>  util: implement rollback rule autosave for nftables backend
>  network: turn on auto-rollback for the rules added for virtual
>    networks
>  util: new function virFirewallNewFromRollback()
>  util: new functions virFirewallParseXML() and virFirewallFormat()
>  conf: add a virFirewall object to virNetworkObj
>  network: use previously saved list of firewall rules when removing
>  network: save network status when firewall rules are reloaded
>  network: improve log message when reloading virtual network firewall
>    rules
>
> libvirt.spec.in                               |   5 +
> meson.build                                   |   1 +
> po/POTFILES                                   |   2 +
> src/conf/virnetworkobj.c                      |  40 +
> src/conf/virnetworkobj.h                      |  11 +
> src/libvirt_private.syms                      |  68 +-
> src/network/bridge_driver.c                   |  40 +-
> src/network/bridge_driver_conf.c              |  44 +
> src/network/bridge_driver_conf.h              |   3 +
> src/network/bridge_driver_linux.c             | 241 +++--
> src/network/bridge_driver_nop.c               |   6 +-
> src/network/bridge_driver_platform.h          |   6 +-
> src/network/libvirtd_network.aug              |  39 +
> src/network/meson.build                       |  11 +
> src/network/network.conf                      |  24 +
> src/network/test_libvirtd_network.aug.in      |   5 +
> src/nwfilter/nwfilter_ebiptables_driver.c     |  16 +-
> src/util/meson.build                          |   2 +
> src/util/virebtables.c                        |   4 +-
> src/util/virfirewall.c                        | 490 ++++++++--
> src/util/virfirewall.h                        |  51 +-
> src/util/viriptables.c                        | 762 ++++-----------
> src/util/viriptables.h                        | 222 ++---
> src/util/virnetfilter.c                       | 892 ++++++++++++++++++
> src/util/virnetfilter.h                       | 159 ++++
> src/util/virnftables.c                        | 698 ++++++++++++++
> src/util/virnftables.h                        | 118 +++
> .../{base.args => base.iptables}              |   0
> tests/networkxml2firewalldata/base.nftables   | 256 +++++
> ...-linux.args => nat-default-linux.iptables} |   0
> .../nat-default-linux.nftables                | 248 +++++
> ...pv6-linux.args => nat-ipv6-linux.iptables} |   0
> .../nat-ipv6-linux.nftables                   | 384 ++++++++
> ...rgs => nat-ipv6-masquerade-linux.iptables} |   0
> .../nat-ipv6-masquerade-linux.nftables        | 456 +++++++++
> ...linux.args => nat-many-ips-linux.iptables} |   0
> .../nat-many-ips-linux.nftables               | 472 +++++++++
> ...-linux.args => nat-no-dhcp-linux.iptables} |   0
> .../nat-no-dhcp-linux.nftables                | 384 ++++++++
> ...ftp-linux.args => nat-tftp-linux.iptables} |   0
> .../nat-tftp-linux.nftables                   | 274 ++++++
> ...inux.args => route-default-linux.iptables} |   0
> .../route-default-linux.nftables              | 162 ++++
> tests/networkxml2firewalltest.c               |  56 +-
> tests/virfirewalltest.c                       |  20 +-
> 45 files changed, 5718 insertions(+), 954 deletions(-)

Reviewed-by: Ján Tomko <jtomko@redhat.com>

Jano
Re: [libvirt PATCH 00/28] native support for nftables in virtual network driver
Posted by Michal Prívozník 11 months, 3 weeks ago
On 5/1/23 05:19, Laine Stump wrote:
>
>  45 files changed, 5718 insertions(+), 954 deletions(-)
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>

Michal
Re: [libvirt PATCH 00/28] native support for nftables in virtual network driver
Posted by Daniel P. Berrangé 11 months, 2 weeks ago
On Sun, Apr 30, 2023 at 11:19:15PM -0400, Laine Stump wrote:
> This patch series enables libvirt to use nftables rules rather than
> iptables *when setting up virtual networks* (it does *not* add
> nftables support to the nwfilter driver). It accomplishes this by
> abstracting several iptables functions (from viriptables.[ch] called
> by the virtual network driver into a rudimentary "virNetfilter API"
> (in virnetfilter.[ch], having the virtual network driver call the
> virNetFilter API rather than calling the existing iptables functions
> directly, and then finally adding an equivalent virNftables backend
> that can be used instead of iptables (selected manually via a
> network.conf setting, or automatically if iptables isn't found on the
> host).
> 
> A first look at the result may have you thinking that it's filled with
> a lot of bad decisions. While I would agree with that in many cases, I
> think that overall they are the "least bad" decisions, or at least
> "bad within acceptable limits / no worse than something else", and
> point out that it's been done in a way that minimizes (actually
> eliminates) the need for immediate changes to nwfilter (the other
> consumer of iptables, which *also* needs to be updated to use native
> nftables), and makes it much easier to change our mind about the
> details in the future.
> 
> When I first started on this (long, protracted, repeatedly interrupted
> for extended periods - many of these patches are > a year old) task, I
> considered doing an all-at-once complete replacement of iptables with
> nftables, since all the Linux distros we support have had nftables for
> several years, and I'm pretty sure nobody has it disabled (not even
> sure if it's possible to disable nftables while still enabling
> iptables, since they both use xtables in the kernel). But due to
> libvirt's use of "-t mangle -j CHECKSUM --checksum-fill" (see commit
> fd5b15ff all the way back in July 2010 for details) which has no
> equivalent in nftables rules (and we don't *want* it to!!), and the
> desire to be able to easily switch back to iptables in case of an
> unforeseen regression, we decided that both iptables and nftables need
> to be supported (for now), with the default (for now) remaining as
> iptables.
> 
> Just allowing for dual backends complicated matters, since it means
> that we have to have a config file, a setting, detection of which
> backends are available, and of course some sort of concept of an
> abstracted frontend that can use either backend based on the config
> setting (and/or auto-detection). Combining that with the fact that it
> would just be "too big" of a project to switch over nwfilter's
> iptables usage at the same time means that we have to keep around a
> lot of existing code for compatibility's sake rather than just wiping
> it all away and starting over.
> 
> So, what I've ended up with is:
> 
> 1) a network.conf file (didn't exist before) with a single setting
> "firewall_backend". If unset, the network driver tries to use iptables
> on the backend, and if that's missing, then tries to use nftables.

When testing your git branch active-nft-10 leavnig it unset didn't
work:

Running './src/libvirtd'...
2023-05-04 10:16:11.447+0000: 115377: info : libvirt version: 9.3.0
2023-05-04 10:16:11.447+0000: 115377: info : hostname: localhost.localdomain
2023-05-04 10:16:11.447+0000: 115377: error : virFirewallNew:118 : internal error: firewall_backend wasn't set, and no usable setting could be auto-detected
2023-05-04 10:16:11.447+0000: 115377: error : virNetFilterBackendUnsetError:51 : internal error: firewall_backend wasn't set, and no usable setting could be auto-detected
2023-05-04 10:16:11.447+0000: 115377: error : virNetFilterBackendUnsetError:51 : internal error: firewall_backend wasn't set, and no usable setting could be auto-detected
2023-05-04 10:16:11.473+0000: 115377: error : virFirewallNew:118 : internal error: firewall_backend wasn't set, and no usable setting could be auto-detected


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [libvirt PATCH 00/28] native support for nftables in virtual network driver
Posted by Laine Stump 11 months, 2 weeks ago
On 5/4/23 6:47 AM, Daniel P. Berrangé wrote:
> On Sun, Apr 30, 2023 at 11:19:15PM -0400, Laine Stump wrote:
>> This patch series enables libvirt to use nftables rules rather than
>> iptables *when setting up virtual networks* (it does *not* add
>> nftables support to the nwfilter driver). It accomplishes this by
>> abstracting several iptables functions (from viriptables.[ch] called
>> by the virtual network driver into a rudimentary "virNetfilter API"
>> (in virnetfilter.[ch], having the virtual network driver call the
>> virNetFilter API rather than calling the existing iptables functions
>> directly, and then finally adding an equivalent virNftables backend
>> that can be used instead of iptables (selected manually via a
>> network.conf setting, or automatically if iptables isn't found on the
>> host).
>>
>> A first look at the result may have you thinking that it's filled with
>> a lot of bad decisions. While I would agree with that in many cases, I
>> think that overall they are the "least bad" decisions, or at least
>> "bad within acceptable limits / no worse than something else", and
>> point out that it's been done in a way that minimizes (actually
>> eliminates) the need for immediate changes to nwfilter (the other
>> consumer of iptables, which *also* needs to be updated to use native
>> nftables), and makes it much easier to change our mind about the
>> details in the future.
>>
>> When I first started on this (long, protracted, repeatedly interrupted
>> for extended periods - many of these patches are > a year old) task, I
>> considered doing an all-at-once complete replacement of iptables with
>> nftables, since all the Linux distros we support have had nftables for
>> several years, and I'm pretty sure nobody has it disabled (not even
>> sure if it's possible to disable nftables while still enabling
>> iptables, since they both use xtables in the kernel). But due to
>> libvirt's use of "-t mangle -j CHECKSUM --checksum-fill" (see commit
>> fd5b15ff all the way back in July 2010 for details) which has no
>> equivalent in nftables rules (and we don't *want* it to!!), and the
>> desire to be able to easily switch back to iptables in case of an
>> unforeseen regression, we decided that both iptables and nftables need
>> to be supported (for now), with the default (for now) remaining as
>> iptables.
>>
>> Just allowing for dual backends complicated matters, since it means
>> that we have to have a config file, a setting, detection of which
>> backends are available, and of course some sort of concept of an
>> abstracted frontend that can use either backend based on the config
>> setting (and/or auto-detection). Combining that with the fact that it
>> would just be "too big" of a project to switch over nwfilter's
>> iptables usage at the same time means that we have to keep around a
>> lot of existing code for compatibility's sake rather than just wiping
>> it all away and starting over.
>>
>> So, what I've ended up with is:
>>
>> 1) a network.conf file (didn't exist before) with a single setting
>> "firewall_backend". If unset, the network driver tries to use iptables
>> on the backend, and if that's missing, then tries to use nftables.
> 
> When testing your git branch active-nft-10 leavnig it unset didn't
> work:
> 
> Running './src/libvirtd'...
> 2023-05-04 10:16:11.447+0000: 115377: info : libvirt version: 9.3.0
> 2023-05-04 10:16:11.447+0000: 115377: info : hostname: localhost.localdomain
> 2023-05-04 10:16:11.447+0000: 115377: error : virFirewallNew:118 : internal error: firewall_backend wasn't set, and no usable setting could be auto-detected
> 2023-05-04 10:16:11.447+0000: 115377: error : virNetFilterBackendUnsetError:51 : internal error: firewall_backend wasn't set, and no usable setting could be auto-detected
> 2023-05-04 10:16:11.447+0000: 115377: error : virNetFilterBackendUnsetError:51 : internal error: firewall_backend wasn't set, and no usable setting could be auto-detected
> 2023-05-04 10:16:11.473+0000: 115377: error : virFirewallNew:118 : internal error: firewall_backend wasn't set, and no usable setting could be auto-detected

Ugh :-(

I guess I did brak something with all the pre-posting cleanups after my 
last round of testing, as that was working just fine as of the end of 
last week.. I'll see what I broke, and if it's simple I'll update the 
branch on gitlab so that you can try it out if you want (even though 
it's obvious I need to change some things)