From: Dion Bosschieter
To: devel@lists.libvirt.org
CC: jean-louis@dupond.be, Dion Bosschieter
Subject: [PATCH v3 3/5] nwfilter: add nwfilter nftables driver
Date: Mon, 9 Feb 2026 15:29:51 +0100
Message-ID: <20260209142953.1016258-4-dionbosschieter@gmail.com>
In-Reply-To: <20260209142953.1016258-1-dionbosschieter@gmail.com>
References: <20260209142953.1016258-1-dionbosschieter@gmail.com>
List-Id: Development discussions about the libvirt library & tools

Resolves issue: https://gitlab.com/libvirt/libvirt/-/issues/603

Benchmarks showed that the number of iifname jump rules, one per
interface, is the cause of the performance problem reported there.
Switched the nftables driver to a vmap (verdict map) so that a single
rule jumps to the correct root input/output chain for each interface.
This improves throughput, because throughput decreases as the number of
interface check and jump rules increases.

The issue describes that interface matching works on the interface name
and that the majority of the effort is the strncpy, so this commit also
switches nftables to an interface index compare. However, using the
interface index alone is not enough: the number of oif and iif jump
rules still causes a considerable performance problem, which the vmap
solves.

Split rules into separate tables, "libvirt_nwfilter_ethernet" and
"libvirt_nwfilter_inet", to preserve existing ebiptables firewall
behavior. Reworked chain logic for clarity, with root -input/-output
chains per interface.
Input into the VM interface is filtered in the -input chain(s); output
out of the VM interface is filtered in the -output chain(s).

Stuck with two tables for compatibility with ebiptables. Unifying into
a single table would break users’ firewall definitions, which depend on
being able to accept traffic at the Ethernet layer (currently defined
via ebtables) and apply additional filtering via IP rules (currently
defined via ip(6)tables).

The nwfilter_nftables_driver therefore splits the ethernet and
non-ethernet (inet) rules into separate tables,
"libvirt_nwfilter_ethernet" and "libvirt_nwfilter_inet". The _ethernet
and _inet tables follow the same style and hook in the same way.

Simplified conntrack handling: rules with accept+conntrack are
duplicated to the opposite chain for symmetric behavior, to support the
existing ebiptables logic.

Firewall updates continue to use tmp names for atomic replacement.
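As a rough illustration of the vmap reasoning above (this is not code
from the patch), the sketch below contrasts a linear list of per-interface
name rules with a lookup keyed on the interface index. All names in it
(the sketch_* functions, struct sketch_name_rule, SKETCH_IFNAMSIZ) are
hypothetical, and the plain array is only a stand-in for a verdict map;
how nftables stores a vmap internally is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_IFNAMSIZ 16   /* assumed name buffer size for this sketch */

/* Hypothetical model of one "iifname <name> jump <chain>" rule. */
struct sketch_name_rule {
    char ifname[SKETCH_IFNAMSIZ];
    int chain_id;
};

/* Linear rule list: one string compare per rule until a match is found.
 * Returns the chain id, or -1 when no rule matches. *compares receives
 * the number of string compares that were performed. */
static int
sketch_lookup_by_name(const struct sketch_name_rule *rules, size_t nrules,
                      const char *ifname, size_t *compares)
{
    size_t i;

    *compares = 0;
    for (i = 0; i < nrules; i++) {
        (*compares)++;
        if (strncmp(rules[i].ifname, ifname, SKETCH_IFNAMSIZ) == 0)
            return rules[i].chain_id;
    }
    return -1;
}

/* Verdict-map style: the interface index is the key and a single lookup
 * yields the chain. Returns -1 for an index outside the map. */
static int
sketch_lookup_by_index(const int *vmap, size_t vmap_len, unsigned int ifindex)
{
    if (ifindex >= vmap_len)
        return -1;
    return vmap[ifindex];
}
```

With N interfaces, the name-based list costs up to N string compares per
packet, while the index-keyed lookup costs the same regardless of how
many interfaces are registered. That is the effect the commit message
attributes to the vmap.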
Unsupported nwfilter features (for now):
- STP filtering
- Gratuitous ARP filtering
- IPSets (potential future support via nft sets)

Signed-off-by: Dion Bosschieter
---
 po/POTFILES                             |    2 +
 src/nwfilter/meson.build                |    1 +
 src/nwfilter/nwfilter_nftables_driver.c | 2495 +++++++++++++++++++++++
 src/nwfilter/nwfilter_nftables_driver.h |   28 +
 4 files changed, 2526 insertions(+)
 create mode 100644 src/nwfilter/nwfilter_nftables_driver.c
 create mode 100644 src/nwfilter/nwfilter_nftables_driver.h

diff --git a/po/POTFILES b/po/POTFILES
index c78d2b8000..5d4c27ac00 100644
--- a/po/POTFILES
+++ b/po/POTFILES
@@ -162,6 +162,8 @@ src/nwfilter/nwfilter_driver.c
 src/nwfilter/nwfilter_ebiptables_driver.c
 src/nwfilter/nwfilter_gentech_driver.c
 src/nwfilter/nwfilter_learnipaddr.c
+src/nwfilter/nwfilter_nftables_driver.c
+src/nwfilter/nwfilter_tech_driver.c
 src/openvz/openvz_conf.c
 src/openvz/openvz_driver.c
 src/openvz/openvz_util.c
diff --git a/src/nwfilter/meson.build b/src/nwfilter/meson.build
index 9e8a4797c5..a94d72d570 100644
--- a/src/nwfilter/meson.build
+++ b/src/nwfilter/meson.build
@@ -5,6 +5,7 @@ nwfilter_driver_sources = [
   'nwfilter_dhcpsnoop.c',
   'nwfilter_ebiptables_driver.c',
   'nwfilter_learnipaddr.c',
+  'nwfilter_nftables_driver.c',
 ]
 
 driver_source_files += files(nwfilter_driver_sources)
diff --git a/src/nwfilter/nwfilter_nftables_driver.c b/src/nwfilter/nwfilter_nftables_driver.c
new file mode 100644
index 0000000000..c7bacb11bb
--- /dev/null
+++ b/src/nwfilter/nwfilter_nftables_driver.c
@@ -0,0 +1,2495 @@
+/*
+ * nwfilter_nftables_driver.c: driver for nftables on tap devices
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library. If not, see
+ * .
+ */
+
+#include
+
+#include
+#include
+#include
+
+#include "internal.h"
+
+#include "virbuffer.h"
+#include "viralloc.h"
+#include "virlog.h"
+#include "virerror.h"
+#include "nwfilter_conf.h"
+#include "nwfilter_nftables_driver.h"
+#include "nwfilter_tech_driver.h"
+#include "virfile.h"
+#include "configmake.h"
+#include "virstring.h"
+#include "virfirewall.h"
+
+#define VIR_FROM_THIS VIR_FROM_NWFILTER
+
+/* define nftable root table */
+#define NF_ETHERNET_TABLE "libvirt_nwfilter_ethernet"
+#define NF_INET_TABLE "libvirt_nwfilter_inet"
+#define NF_COMMENT \
+    "{ comment \"Managed by libvirt for network filters: " \
+    "https://libvirt.org/firewall.html#the-network-filter-driver\"; }"
+/* nftables counter can be enabled for firewall transparency */
+#ifndef NF_COUNTER
+# define NF_COUNTER 0
+#endif
+
+/* define chains */
+#define IN_CHAIN "postrouting"
+#define OUT_CHAIN "prerouting"
+#define FORWARD_CHAIN "forward"
+
+/* Interface matches depend on interface index, in nftables you can supply
+ * an interface name as argument which will be turned into the interface index
+ * for matching purposes. oif / iif will throw an nft error if the specified
+ * interface doesn't exist */
+#define IN_IFMATCH "oif"
+#define OUT_IFMATCH "iif"
+/* depend on the ifname for a match during moments where the
+ * interface already has disappeared (dropAllRules) */
+#define IN_IFNAMEMATCH "oifname"
+#define OUT_IFNAMEMATCH "iifname"
+
+#define DEFAULT_POLICY "accept"
+
+#ifndef NF_TRACE
+# define NF_TRACE 0
+#endif
+#if NF_TRACE
+# define TRACE_SETTING "meta nftrace set 1;"
+#else
+# define TRACE_SETTING ""
+#endif
+
+#define CHAINSETTINGS "{ }"
+
+#define VMAP_IN "vmap-oif"
+#define VMAP_OUT "vmap-iif"
+#define VMAPSETTINGS "{ type iface_index: verdict; }"
+
+#define ROOT_CHAINSETTINGS(chain, defaultPolicy) \
+    "{ type filter hook "chain" priority %d;" \
+    " policy "defaultPolicy"; "TRACE_SETTING" }"
+
+VIR_LOG_INIT("nwfilter.nwfilter_nftables_driver");
+
+/* A lookup table for translating ethernet protocol IDs to human readable
+ * strings. None of the human readable strings must be found as a prefix
+ * in another entry here (example 'ab' would be found in 'abc') to allow
+ * for prefix matching.
+ */
+static const struct virNWFilterUShortMap l3_protocols[] = {
+    virNWFilterUShortMapEntryIdx(VIR_NWFILTER_PROTO_IDX_IPV4, ETHERTYPE_IP, "ipv4"),
+    virNWFilterUShortMapEntryIdx(VIR_NWFILTER_PROTO_IDX_IPV6, ETHERTYPE_IPV6, "ipv6"),
+    virNWFilterUShortMapEntryIdx(VIR_NWFILTER_PROTO_IDX_ARP, ETHERTYPE_ARP, "arp"),
+    virNWFilterUShortMapEntryIdx(VIR_NWFILTER_PROTO_IDX_RARP, ETHERTYPE_REVARP, "rarp"),
+    virNWFilterUShortMapEntryIdx(VIR_NWFILTER_PROTO_IDX_VLAN, ETHERTYPE_VLAN, "vlan"),
+    virNWFilterUShortMapEntryIdx(VIR_NWFILTER_PROTO_IDX_STP, 0, "stp"),
+    virNWFilterUShortMapEntryIdx(VIR_NWFILTER_PROTO_IDX_MAC, 0, "mac"),
+    virNWFilterUShortMapEntryIdx(VIR_NWFILTER_PROTO_IDX_LAST, 0, NULL),
+};
+
+/*
+ * Given a filtername determine the protocol it is used for evaluating
+ * We do prefix-matching to determine the protocol.
+ */
+static enum virNWFilterProtoIdx
+nftablesGetProtoIdxByFiltername(const char *filtername)
+{
+    enum virNWFilterProtoIdx idx;
+
+    for (idx = 0; idx < VIR_NWFILTER_PROTO_IDX_LAST; idx++) {
+        if (STRPREFIX(filtername, l3_protocols[idx].val))
+            return idx;
+    }
+
+    return -1;
+}
+
+static void nftablesCreateTable(virFirewall *fw,
+                                virFirewallLayer layer,
+                                const char *tableName)
+{
+    virFirewallCmd *fwrule = NULL;
+    int tablePriority = STREQ(tableName, NF_ETHERNET_TABLE) ? 0 : 1;
+
+    /* define table */
+    virFirewallAddCmd(fw, layer,
+                      "add", "table", "bridge",
+                      tableName, NF_COMMENT, NULL);
+
+    /* create vmap for iface matches */
+    virFirewallAddCmd(fw, layer, "add", "map", "bridge", tableName, VMAP_IN,
+                      VMAPSETTINGS, NULL);
+    virFirewallAddCmd(fw, layer, "add", "map", "bridge", tableName, VMAP_OUT,
+                      VMAPSETTINGS, NULL);
+
+    /* define default chains */
+    fwrule = virFirewallAddCmd(fw, layer, "add", "chain", "bridge",
+                               tableName, IN_CHAIN, NULL);
+    virFirewallCmdAddArgFormat(fw, fwrule,
+                               ROOT_CHAINSETTINGS(IN_CHAIN, DEFAULT_POLICY),
+                               tablePriority);
+    fwrule = virFirewallAddCmd(fw, layer, "add", "chain", "bridge",
+                               tableName, OUT_CHAIN, NULL);
+    virFirewallCmdAddArgFormat(fw, fwrule,
+                               ROOT_CHAINSETTINGS(OUT_CHAIN, DEFAULT_POLICY),
+                               tablePriority);
+
+    /* add the one jump rule based on the vmap */
+    fwrule = virFirewallAddCmd(fw, layer, "add", "rule", "bridge", tableName,
+                               IN_CHAIN, IN_IFMATCH, "vmap", NULL);
+    virFirewallCmdAddArgFormat(fw, fwrule, "@%s", VMAP_IN);
+    fwrule = virFirewallAddCmd(fw, layer, "add", "rule", "bridge", tableName,
+                               OUT_CHAIN, OUT_IFMATCH, "vmap", NULL);
+    virFirewallCmdAddArgFormat(fw, fwrule, "@%s", VMAP_OUT);
+}
+
+static int
+nftablesHandleCreateRootTables(virFirewall *fw,
+                               virFirewallLayer layer,
+                               const char *const *lines,
+                               void *opaque G_GNUC_UNUSED)
+{
+    bool ethernetTableDefined = false;
+    bool inetTableDefined = false;
+    size_t i;
+
+    /* parse nft tables list output to see if tables exist */
+    for (i = 0; lines[i] != NULL; i++) {
+        const char *line = lines[i];
+        if ((line = STRSKIP(line, "table bridge ")) == NULL) {
+            continue;
+        }
+
+        VIR_DEBUG("Considering table for comparison '%s'", lines[i]);
+
+        /* if chain matches basechain */
+        if (STRPREFIX(line, NF_ETHERNET_TABLE)) {
+            ethernetTableDefined = true;
+        } else if (STRPREFIX(line, NF_INET_TABLE)) {
+            inetTableDefined = true;
+        }
+    }
+
+    /* if the ethernet table doesn't exist,
+     * we create it including the default chains*/
+    if (!ethernetTableDefined)
+        nftablesCreateTable(fw, layer, NF_ETHERNET_TABLE);
+    /* if the inet table doesn't exist,
+     * we create it including the default chains */
+    if (!inetTableDefined)
+        nftablesCreateTable(fw, layer, NF_INET_TABLE);
+
+    return 0;
+}
+
+static void nftablesAddCmdAction(virFirewall *fw,
+                                 virFirewallCmd *fwrule,
+                                 virNWFilterRuleActionType action)
+{
+    switch (action) {
+    case VIR_NWFILTER_RULE_ACTION_ACCEPT:
+        virFirewallCmdAddArg(fw, fwrule, "accept");
+        break;
+    case VIR_NWFILTER_RULE_ACTION_DROP:
+        virFirewallCmdAddArg(fw, fwrule, "drop");
+        break;
+    case VIR_NWFILTER_RULE_ACTION_REJECT:
+        virFirewallCmdAddArg(fw, fwrule, "drop");
+        break;
+    case VIR_NWFILTER_RULE_ACTION_RETURN:
+        virFirewallCmdAddArg(fw, fwrule, "return");
+        break;
+    case VIR_NWFILTER_RULE_ACTION_CONTINUE:
+        virFirewallCmdAddArg(fw, fwrule, "continue");
+        break;
+    case VIR_NWFILTER_RULE_ACTION_LAST:
+    default:
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Unexpected action %1$d"), action);
+    }
+}
+
+static const char *nftablesGetProtocolType(int protocol)
+{
+    switch (protocol) {
+    case VIR_NWFILTER_RULE_PROTOCOL_TCP:
+    case VIR_NWFILTER_RULE_PROTOCOL_TCPoIPV6:
+        return "tcp";
+    case VIR_NWFILTER_RULE_PROTOCOL_UDP:
+    case VIR_NWFILTER_RULE_PROTOCOL_UDPoIPV6:
+        return "udp";
+    case VIR_NWFILTER_RULE_PROTOCOL_UDPLITE:
+    case VIR_NWFILTER_RULE_PROTOCOL_UDPLITEoIPV6:
+        return "udplite";
+    case VIR_NWFILTER_RULE_PROTOCOL_ESP:
+    case VIR_NWFILTER_RULE_PROTOCOL_ESPoIPV6:
+        return "esp";
+    case VIR_NWFILTER_RULE_PROTOCOL_AH:
+    case VIR_NWFILTER_RULE_PROTOCOL_AHoIPV6:
+        return "ah";
+    case VIR_NWFILTER_RULE_PROTOCOL_SCTP:
+    case VIR_NWFILTER_RULE_PROTOCOL_SCTPoIPV6:
+        return "sctp";
+    case VIR_NWFILTER_RULE_PROTOCOL_ICMP:
+        return "icmp";
+    case VIR_NWFILTER_RULE_PROTOCOL_ICMPV6:
+        return "icmpv6";
+    case VIR_NWFILTER_RULE_PROTOCOL_IGMP:
+        return "igmp";
+    case VIR_NWFILTER_RULE_PROTOCOL_ALL:
+    case VIR_NWFILTER_RULE_PROTOCOL_ALLoIPV6:
+        return "all";
+    default:
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Unexpected protocol %1$d"),
+                       protocol);
+        return "";
+    }
+}
+
+static const char *
+nftablesGetIpTypeByDataType(nwItemDesc *item)
+{
+    return (item->datatype == DATATYPE_IPV6ADDR) ? "ip6" : "ip";
+}
+
+static int
+nftablesHandleIPHdr(virFirewall *fw,
+                    virFirewallCmd *fwrule,
+                    virNWFilterVarCombIter *vars,
+                    ipHdrDataDef *ipHdr,
+                    bool reverseRule)
+{
+    char ipaddr[INET6_ADDRSTRLEN];
+    char ipaddralt[INET6_ADDRSTRLEN];
+    char number[VIR_INT64_STR_BUFLEN];
+    const char *ip = NULL;
+    const char *saddr = reverseRule ? "daddr" : "saddr";
+    const char *daddr = reverseRule ? "saddr" : "daddr";
+
+    if (HAS_ENTRY_ITEM(&ipHdr->dataSrcIPAddr)) {
+        ip = nftablesGetIpTypeByDataType(&ipHdr->dataSrcIPAddr);
+        virFirewallCmdAddArgList(fw, fwrule, ip, saddr, NULL);
+
+        if (virNWFilterPrintDataType(vars,
+                                     ipaddr, sizeof(ipaddr),
+                                     &ipHdr->dataSrcIPAddr) < 0)
+            return -1;
+
+        if (ENTRY_WANT_NEG_SIGN(&ipHdr->dataSrcIPAddr))
+            virFirewallCmdAddArg(fw, fwrule, "!");
+
+        if (HAS_ENTRY_ITEM(&ipHdr->dataSrcIPMask)) {
+            if (virNWFilterPrintDataType(vars,
+                                         number, sizeof(number),
+                                         &ipHdr->dataSrcIPMask) < 0)
+                return -1;
+
+            virFirewallCmdAddArgFormat(fw, fwrule,
+                                       "%s/%s", ipaddr, number);
+        } else {
+            virFirewallCmdAddArg(fw, fwrule, ipaddr);
+        }
+    } else if (HAS_ENTRY_ITEM(&ipHdr->dataSrcIPFrom)) {
+        ip = nftablesGetIpTypeByDataType(&ipHdr->dataSrcIPFrom);
+        virFirewallCmdAddArgList(fw, fwrule, ip, saddr, NULL);
+
+        if (virNWFilterPrintDataType(vars,
+                                     ipaddr, sizeof(ipaddr),
+                                     &ipHdr->dataSrcIPFrom) < 0)
+            return -1;
+
+        if (ENTRY_WANT_NEG_SIGN(&ipHdr->dataSrcIPFrom))
+            virFirewallCmdAddArg(fw, fwrule, "!");
+
+        if (HAS_ENTRY_ITEM(&ipHdr->dataSrcIPTo)) {
+
+            if (virNWFilterPrintDataType(vars,
+                                         ipaddralt, sizeof(ipaddralt),
+                                         &ipHdr->dataSrcIPTo) < 0)
+                return -1;
+
+            virFirewallCmdAddArgFormat(fw, fwrule,
+                                       "%s-%s", ipaddr, ipaddralt);
+        } else {
+            virFirewallCmdAddArg(fw, fwrule, ipaddr);
+        }
+    }
+
+    if (HAS_ENTRY_ITEM(&ipHdr->dataDstIPAddr)) {
+        ip = nftablesGetIpTypeByDataType(&ipHdr->dataDstIPAddr);
+        virFirewallCmdAddArgList(fw, fwrule, ip, daddr, NULL);
+
+        if (virNWFilterPrintDataType(vars,
+                                     ipaddr, sizeof(ipaddr),
+                                     &ipHdr->dataDstIPAddr) < 0)
+            return -1;
+
+        if (ENTRY_WANT_NEG_SIGN(&ipHdr->dataDstIPAddr))
+            virFirewallCmdAddArg(fw, fwrule, "!");
+
+        if (HAS_ENTRY_ITEM(&ipHdr->dataDstIPMask)) {
+            if (virNWFilterPrintDataType(vars,
+                                         number, sizeof(number),
+                                         &ipHdr->dataDstIPMask) < 0)
+                return -1;
+
+            virFirewallCmdAddArgFormat(fw, fwrule,
+                                       "%s/%s", ipaddr, number);
+        } else {
+            virFirewallCmdAddArg(fw, fwrule, ipaddr);
+        }
+    } else if (HAS_ENTRY_ITEM(&ipHdr->dataDstIPFrom)) {
+        ip = nftablesGetIpTypeByDataType(&ipHdr->dataDstIPFrom);
+        virFirewallCmdAddArgList(fw, fwrule, ip, daddr, NULL);
+
+        if (virNWFilterPrintDataType(vars,
+                                     ipaddr, sizeof(ipaddr),
+                                     &ipHdr->dataDstIPFrom) < 0)
+            return -1;
+
+        if (ENTRY_WANT_NEG_SIGN(&ipHdr->dataDstIPFrom))
+            virFirewallCmdAddArg(fw, fwrule, "!");
+
+        if (HAS_ENTRY_ITEM(&ipHdr->dataDstIPTo)) {
+            if (virNWFilterPrintDataType(vars,
+                                         ipaddralt, sizeof(ipaddralt),
+                                         &ipHdr->dataDstIPTo) < 0)
+                return -1;
+
+            virFirewallCmdAddArgFormat(fw, fwrule,
+                                       "%s-%s", ipaddr, ipaddralt);
+        } else {
+            virFirewallCmdAddArg(fw, fwrule, ipaddr);
+        }
+    }
+
+    if (HAS_ENTRY_ITEM(&ipHdr->dataDSCP)) {
+        if (!ip)
+            ip = nftablesGetIpTypeByDataType(&ipHdr->dataDSCP);
+
+        if (virNWFilterPrintDataType(vars,
+                                     number, sizeof(number),
+                                     &ipHdr->dataDSCP) < 0)
+            return -1;
+
+        virFirewallCmdAddArgList(fw, fwrule, ip, "dscp", NULL);
+        if (ENTRY_WANT_NEG_SIGN(&ipHdr->dataDSCP))
+            virFirewallCmdAddArg(fw, fwrule, "!");
+        virFirewallCmdAddArgList(fw, fwrule, number, NULL);
+    }
+
+    return 0;
+}
+
+static int
+nftablesHandleEthHdr(virFirewall *fw,
+                     virFirewallCmd *fwrule,
+                     virNWFilterVarCombIter *vars,
+                     ethHdrDataDef *ethHdr,
+                     bool reverseRule)
+{
+    char macaddr[VIR_MAC_STRING_BUFLEN];
+    char macmask[VIR_MAC_STRING_BUFLEN];
+    const char *saddr = reverseRule ? "daddr" : "saddr";
+    const char *daddr = reverseRule ? "saddr" : "daddr";
+
+    if (HAS_ENTRY_ITEM(&ethHdr->dataSrcMACAddr)) {
+        const char *comparison = NULL;
+        if (virNWFilterPrintDataType(vars,
+                                     macaddr, sizeof(macaddr),
+                                     &ethHdr->dataSrcMACAddr) < 0)
+            return -1;
+
+        virFirewallCmdAddArgList(fw, fwrule, "ether", saddr, NULL);
+        comparison = ENTRY_WANT_NEG_SIGN(&ethHdr->dataSrcMACAddr) ?
+ "!=3D" : "=3D=3D"; + + if (HAS_ENTRY_ITEM(ðHdr->dataSrcMACMask)) { + if (virNWFilterPrintDataType(vars, + macmask, sizeof(macmask), + ðHdr->dataSrcMACMask) < 0) + return -1; + + virFirewallCmdAddArgFormat(fw, fwrule, + "& %s %s %s", + macmask, comparison, macaddr); + } else { + virFirewallCmdAddArgList(fw, fwrule, comparison, macaddr, NULL= ); + } + } + + if (HAS_ENTRY_ITEM(ðHdr->dataDstMACAddr)) { + const char *comparison =3D NULL; + if (virNWFilterPrintDataType(vars, + macaddr, sizeof(macaddr), + ðHdr->dataDstMACAddr) < 0) + return -1; + + virFirewallCmdAddArgList(fw, fwrule, "ether", daddr, NULL); + comparison =3D ENTRY_WANT_NEG_SIGN(ðHdr->dataDstMACAddr) ? + "!=3D" : "=3D=3D"; + + if (HAS_ENTRY_ITEM(ðHdr->dataDstMACMask)) { + if (virNWFilterPrintDataType(vars, + macmask, sizeof(macmask), + ðHdr->dataDstMACMask) < 0) + return -1; + + virFirewallCmdAddArgFormat(fw, fwrule, + "& %s %s %s", + macmask, comparison, macaddr); + } else { + virFirewallCmdAddArgList(fw, fwrule, comparison, macaddr, NULL= ); + } + } + + return 0; +} + +static int +insertRuleArg2Param(virFirewall *fw, + virFirewallCmd *fwrule, + virNWFilterVarCombIter *vars, + nwItemDesc *itemLow, + nwItemDesc *itemHigh, + const char *argument, + const char *seperator) +{ + char field[VIR_INT64_STR_BUFLEN]; + char fieldalt[VIR_INT64_STR_BUFLEN]; + + if (HAS_ENTRY_ITEM(itemLow)) { + if (virNWFilterPrintDataType(vars, + field, sizeof(field), + itemLow) < 0) + return -1; + virFirewallCmdAddArg(fw, fwrule, argument); + if (ENTRY_WANT_NEG_SIGN(itemLow)) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + if (HAS_ENTRY_ITEM(itemHigh)) { + if (virNWFilterPrintDataType(vars, + fieldalt, sizeof(fieldalt), + itemHigh) < 0) + return -1; + virFirewallCmdAddArgFormat(fw, fwrule, + "%s%s%s", field, seperator, fieldal= t); + } else { + virFirewallCmdAddArg(fw, fwrule, field); + } + } + + return 0; +} + +static int +nftablesHandlePortData(virFirewall *fw, + virFirewallCmd *fwrule, + virNWFilterVarCombIter *vars, + const char 
*protocol, + portDataDef *portData, + bool reverseRule) +{ + char dport[VIR_INT64_STR_BUFLEN]; + char sport[VIR_INT64_STR_BUFLEN]; + + g_snprintf(dport, sizeof(dport), reverseRule ? "%s sport" : "%s dport", + protocol); + g_snprintf(sport, sizeof(sport), reverseRule ? "%s dport": "%s sport", + protocol); + + if (insertRuleArg2Param(fw, fwrule, vars, + &portData->dataDstPortStart, + &portData->dataDstPortEnd, dport, "-") < 0) + return -1; + if (insertRuleArg2Param(fw, fwrule, vars, + &portData->dataSrcPortStart, + &portData->dataSrcPortEnd, sport, "-") < 0) + return -1; + + return 0; +} + +static int +nftablesHandleMacAddr(virFirewall *fw, + virFirewallCmd *fwrule, + virNWFilterVarCombIter *vars, + nwItemDesc *macaddr, + const char *argument) +{ + char macstr[VIR_MAC_STRING_BUFLEN]; + + if (HAS_ENTRY_ITEM(macaddr)) { + if (virNWFilterPrintDataType(vars, + macstr, sizeof(macstr), + macaddr) < 0) + return -1; + + virFirewallCmdAddArg(fw, fwrule, argument); + if (ENTRY_WANT_NEG_SIGN(macaddr)) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + virFirewallCmdAddArg(fw, fwrule, macstr); + } + + return 0; +} + +static int +nftablesHandleSrcMacAddr(virFirewall *fw, + virFirewallCmd *fwrule, + virNWFilterVarCombIter *vars, + nwItemDesc *srcMacAddr) +{ + return nftablesHandleMacAddr(fw, fwrule, vars, srcMacAddr, "ether sadd= r"); +} + +static void +printStateMatchFlags(int32_t flags, char **bufptr) +{ + g_auto(virBuffer) buf =3D VIR_BUFFER_INITIALIZER; + virNWFilterPrintStateMatchFlags(&buf, "", flags, false); + + /* str to lower needed as nft doesn't accept upper case states */ + g_string_ascii_down(buf.str); + + *bufptr =3D virBufferContentAndReset(&buf); +} + +static bool +nftablesRuleNeedsConntrack(virNWFilterRuleDef *rule) +{ + /* ip only */ + if (virNWFilterRuleIsProtocolEthernet(rule)) { + return false; + } + + /* Skip conntrack if statematch=3Dfalse flag has been set */ + if (rule->flags & RULE_FLAG_NO_STATEMATCH) { + return false; + } + + /* If no state flags are set and 
rule->action is not accept, + * we should skip conntrack */ + if (!(rule->flags & IPTABLES_STATE_FLAGS) && + rule->action !=3D VIR_NWFILTER_RULE_ACTION_ACCEPT) { + return false; + } + + return true; +} + +static bool +nftablesRuleNeedsConnLimit(ipHdrDataDef *ipHdr, + bool directionIn) +{ + return HAS_ENTRY_ITEM(&ipHdr->dataConnlimitAbove) && !directionIn; +} + +static char * +nftablesPrintTCPFlags(uint8_t flags) +{ + g_auto(virBuffer) buf =3D VIR_BUFFER_INITIALIZER; + g_autofree char *flagsstr =3D NULL; + + if (flags =3D=3D 0) { + virBufferAddLit(&buf, "0"); + } else if (flags =3D=3D 0x3f) { + virBufferAddLit(&buf, "*"); + } else { + flagsstr =3D virNWFilterPrintTCPFlags(flags); + virBufferAdd(&buf, flagsstr, -1); + g_string_ascii_down(buf.str); + } + + return virBufferContentAndReset(&buf); +} + +/* + * nftablesHandleInetRule: + * @fw: the firewall ruleset to add to + * @fwrule: the firewall command to add arguments to + * @vars : A map containing the variables to resolve + * @rule: The rule of the filter to convert + * @directionIn: direction of the rule, true for in false for out + * directionIn is needed for additional conntrack logic + * @reverseRule: Whether to reverse src and dst attributes + * ethernet reverse flag is set conntrack requires a reverse + * rule on the opposite chain + * + * Set arguments on fwrule based on given struct *rule + * + */ +static int +nftablesHandleInetRule(virFirewall *fw, + virFirewallCmd *fwrule, + virNWFilterVarCombIter *vars, + virNWFilterRuleDef *rule, + bool directionIn, + bool reverseRule) +{ + char number[VIR_INT64_STR_BUFLEN]; + bool hasICMPType =3D false; + bool skipDirection =3D false; + g_autofree char *matchState =3D NULL; + ipHdrDataDef *ipHdr =3D NULL; + const char *protocol =3D nftablesGetProtocolType(rule->prtclType); + + virFirewallCmdAddArgList(fw, fwrule, "ether", "type", NULL); + if (virNWFilterRuleIsProtocolIPv6(rule) && + !virNWFilterRuleIsProtocolIPv4(rule)) { + virFirewallCmdAddArg(fw, fwrule, "ip6"); + } 
else if (virNWFilterRuleIsProtocolIPv4(rule) && + !virNWFilterRuleIsProtocolIPv6(rule)) { + virFirewallCmdAddArg(fw, fwrule, "ip"); + } + + switch ((int)rule->prtclType) { + case VIR_NWFILTER_RULE_PROTOCOL_TCP: + case VIR_NWFILTER_RULE_PROTOCOL_TCPoIPV6: + virFirewallCmdAddArgList(fw, fwrule, "meta", "l4proto", "tcp", NUL= L); + ipHdr =3D &rule->p.tcpHdrFilter.ipHdr; + + if (nftablesHandleSrcMacAddr(fw, fwrule, vars, + &rule->p.tcpHdrFilter.dataSrcMACAddr)= < 0) + return -1; + if (nftablesHandleIPHdr(fw, fwrule, vars, ipHdr, reverseRule) < 0) + return -1; + + if (HAS_ENTRY_ITEM(&rule->p.tcpHdrFilter.dataTCPFlags)) { + g_autofree char *mask =3D NULL; + g_autofree char *flags =3D NULL; + + /* flags & syn =3D=3D syn */ + virFirewallCmdAddArgList(fw, fwrule, "tcp", "flags", "&", NULL= ); + + if (!(mask =3D nftablesPrintTCPFlags( + rule->p.tcpHdrFilter.dataTCPFlags.u.tcpFlags.mas= k))) + return -1; + virFirewallCmdAddArgList(fw, fwrule, mask, ENTRY_WANT_NEG_SIGN( + &rule->p.tcpHdrFilter.dataTCPF= lags) + ? 
"!=3D" : "=3D=3D", NULL); + + if (!(flags =3D nftablesPrintTCPFlags( + rule->p.tcpHdrFilter.dataTCPFlags.u.tcpFlags.fl= ags))) + return -1; + virFirewallCmdAddArgList(fw, fwrule, "{", flags, "}", NULL); + } + + if (HAS_ENTRY_ITEM(&rule->p.tcpHdrFilter.dataTCPOption)) { + if (virNWFilterPrintDataType(vars, number, sizeof(number), + &rule->p.tcpHdrFilter.dataTCPOpti= on) < 0) + return -1; + + virFirewallCmdAddArgList(fw, fwrule, "tcp", "option", NULL); + if (ENTRY_WANT_NEG_SIGN(&rule->p.tcpHdrFilter.dataTCPOption)) + virFirewallCmdAddArg(fw, fwrule, "!"); + virFirewallCmdAddArg(fw, fwrule, number); + } + + if (nftablesHandlePortData(fw, fwrule, vars, protocol, + &rule->p.tcpHdrFilter.portData, reverseRule) < 0) + return -1; + + break; + case VIR_NWFILTER_RULE_PROTOCOL_UDP: + case VIR_NWFILTER_RULE_PROTOCOL_UDPoIPV6: + virFirewallCmdAddArgList(fw, fwrule, "meta", "l4proto", "udp", NUL= L); + ipHdr =3D &rule->p.udpHdrFilter.ipHdr; + + if (nftablesHandleSrcMacAddr(fw, fwrule, vars, + &rule->p.udpHdrFilter.dataSrcMACAddr)= < 0) + return -1; + if (nftablesHandleIPHdr(fw, fwrule, vars, ipHdr, reverseRule) < 0) + return -1; + if (nftablesHandlePortData(fw, fwrule, vars, protocol, + &rule->p.udpHdrFilter.portData, reverseRule) < 0) + return -1; + break; + case VIR_NWFILTER_RULE_PROTOCOL_UDPLITE: + case VIR_NWFILTER_RULE_PROTOCOL_UDPLITEoIPV6: + virFirewallCmdAddArgList(fw, fwrule, "meta", "l4proto", "udplite",= NULL); + ipHdr =3D &rule->p.udpliteHdrFilter.ipHdr; + + if (nftablesHandleSrcMacAddr(fw, fwrule, vars, + &rule->p.udpliteHdrFilter.dataSrcMACA= ddr) < 0) + return -1; + if (nftablesHandleIPHdr(fw, fwrule, vars, ipHdr, reverseRule) < 0) + return -1; + break; + case VIR_NWFILTER_RULE_PROTOCOL_ESP: + case VIR_NWFILTER_RULE_PROTOCOL_ESPoIPV6: + virFirewallCmdAddArgList(fw, fwrule, "meta", "l4proto", "esp", NUL= L); + ipHdr =3D &rule->p.espHdrFilter.ipHdr; + + if (nftablesHandleSrcMacAddr(fw, fwrule, vars, + &rule->p.espHdrFilter.dataSrcMACAddr) = < 0) + return -1; + if 
(nftablesHandleIPHdr(fw, fwrule, vars, ipHdr, reverseRule) < 0) + return -1; + break; + case VIR_NWFILTER_RULE_PROTOCOL_AH: + case VIR_NWFILTER_RULE_PROTOCOL_AHoIPV6: + virFirewallCmdAddArgList(fw, fwrule, "meta", "l4proto", "ah", NULL= ); + ipHdr =3D &rule->p.ahHdrFilter.ipHdr; + + if (nftablesHandleSrcMacAddr(fw, fwrule, vars, + &rule->p.ahHdrFilter.dataSrcMACAddr) <= 0) + return -1; + if (nftablesHandleIPHdr(fw, fwrule, vars, ipHdr, reverseRule) < 0) + return -1; + break; + case VIR_NWFILTER_RULE_PROTOCOL_SCTP: + case VIR_NWFILTER_RULE_PROTOCOL_SCTPoIPV6: + virFirewallCmdAddArgList(fw, fwrule, "meta", "l4proto", "sctp", NU= LL); + ipHdr =3D &rule->p.sctpHdrFilter.ipHdr; + + if (nftablesHandleSrcMacAddr(fw, fwrule, vars, + &rule->p.sctpHdrFilter.dataSrcMACAddr)= < 0) + return -1; + + if (nftablesHandleIPHdr(fw, fwrule, vars, ipHdr, reverseRule) < 0) + return -1; + + if (nftablesHandlePortData(fw, fwrule, vars, protocol, + &rule->p.sctpHdrFilter.portData, reverseRule) < 0) + return -1; + break; + case VIR_NWFILTER_RULE_PROTOCOL_ICMP: + case VIR_NWFILTER_RULE_PROTOCOL_ICMPV6: + if (rule->prtclType =3D=3D VIR_NWFILTER_RULE_PROTOCOL_ICMPV6) { + virFirewallCmdAddArgList(fw, fwrule, "ip6", "nexthdr", NULL); + } else { + virFirewallCmdAddArgList(fw, fwrule, "ip", "protocol", NULL); + } + virFirewallCmdAddArg(fw, fwrule, protocol); + + ipHdr =3D &rule->p.icmpHdrFilter.ipHdr; + hasICMPType =3D true; + + if (nftablesHandleSrcMacAddr(fw, fwrule, vars, + &rule->p.icmpHdrFilter.dataSrcMACAddr)= < 0) + return -1; + + if (nftablesHandleIPHdr(fw, fwrule, vars, ipHdr, reverseRule) < 0) + return -1; + + if (HAS_ENTRY_ITEM(&rule->p.icmpHdrFilter.dataICMPType)) { + virFirewallCmdAddArgList(fw, fwrule, protocol, "type", NULL); + + if (virNWFilterPrintDataType(vars, + number, sizeof(number), + &rule->p.icmpHdrFilter.dataICMPTy= pe) < 0) + return -1; + + if (ENTRY_WANT_NEG_SIGN(&rule->p.icmpHdrFilter.dataICMPType)) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + + 
virFirewallCmdAddArg(fw, fwrule, number); + + if (HAS_ENTRY_ITEM(&rule->p.icmpHdrFilter.dataICMPCode)) { + virFirewallCmdAddArgList(fw, fwrule, protocol, "code", NUL= L); + + if (virNWFilterPrintDataType(vars, + number, sizeof(number), + &rule->p.icmpHdrFilter.dataIC= MPCode) < 0) + return -1; + + if (ENTRY_WANT_NEG_SIGN(&rule->p.icmpHdrFilter.dataICMPCod= e)) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + + virFirewallCmdAddArg(fw, fwrule, number); + } + } + break; + case VIR_NWFILTER_RULE_PROTOCOL_IGMP: + virFirewallCmdAddArgList(fw, fwrule, "meta", "l4proto", "igmp", NU= LL); + ipHdr =3D &rule->p.igmpHdrFilter.ipHdr; + + if (nftablesHandleSrcMacAddr(fw, fwrule, vars, + &rule->p.igmpHdrFilter.dataSrcMACAddr)= < 0) + return -1; + + if (nftablesHandleIPHdr(fw, fwrule, vars, ipHdr, reverseRule) < 0) + return -1; + break; + case VIR_NWFILTER_RULE_PROTOCOL_ALL: + case VIR_NWFILTER_RULE_PROTOCOL_ALLoIPV6: + ipHdr =3D &rule->p.allHdrFilter.ipHdr; + if (nftablesHandleSrcMacAddr(fw, fwrule, vars, + &rule->p.allHdrFilter.dataSrcMACAddr) = < 0) + return -1; + + if (nftablesHandleIPHdr(fw, fwrule, vars, ipHdr, reverseRule) < 0) + return -1; + break; + default: + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unexpected protocol %1$d"), + rule->prtclType); + return -1; + } + + /* no support for ipset */ + if (HAS_ENTRY_ITEM(&ipHdr->dataIPSet) && + HAS_ENTRY_ITEM(&ipHdr->dataIPSetFlags)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Rule contains unsupported ipset flags")); + return -1; + } + + /* apply conn limit only to outgoing connections */ + if (nftablesRuleNeedsConnLimit(ipHdr, directionIn)) { + if (virNWFilterPrintDataType(vars, + number, sizeof(number), + &ipHdr->dataConnlimitAbove) < 0) + return -1; + + /* place connlimit after potential state logic + since this is the most useful order */ + virFirewallCmdAddArgList(fw, fwrule, "ct", "count", "over", NULL); + if (ENTRY_WANT_NEG_SIGN(&ipHdr->dataConnlimitAbove)) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + 
virFirewallCmdAddArgList(fw, fwrule, number, NULL); + } + + if (nftablesRuleNeedsConntrack(rule)) { + /* we skip direction when ct count is set or type is icmp */ + skipDirection =3D nftablesRuleNeedsConnLimit(ipHdr, directionIn) || + hasICMPType; + + /* no direction */ + if (!skipDirection) + /* reverse rules are replies, + * otherwise it is the originating direction */ + virFirewallCmdAddArgList(fw, fwrule, "ct", "direction", + (reverseRule ? "reply" : "original"), + NULL); + + if (rule->flags & IPTABLES_STATE_FLAGS && + !(rule->flags & RULE_FLAG_STATE_NONE)) { + printStateMatchFlags(rule->flags, &matchState); + } else { + /* static state match is needed because when no state flags + * have been set but statematch is enabled we need a default */ + /* reverse rules are established connections */ + matchState =3D g_strdup(reverseRule ? + "established" : + "new,established"); + } + virFirewallCmdAddArgList(fw, fwrule, "ct", "state", matchState, NU= LL); + } + + return 0; +} + +static int +insertRuleArgParam(virFirewall *fw, + virFirewallCmd *fwrule, + virNWFilterVarCombIter *vars, + nwItemDesc *item, + const char *argument) +{ + char field[VIR_INT64_STR_BUFLEN]; + + if (HAS_ENTRY_ITEM(item)) { + if (virNWFilterPrintDataType(vars, + field, sizeof(field), + item) < 0) + return -1; + virFirewallCmdAddArg(fw, fwrule, argument); + if (ENTRY_WANT_NEG_SIGN(item)) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + + virFirewallCmdAddArg(fw, fwrule, field); + } + + return 0; +} + +static int +insertRuleArgParamHex(virFirewall *fw, + virFirewallCmd *fwrule, + virNWFilterVarCombIter *vars, + nwItemDesc *item, + const char *argument) +{ + char field[VIR_INT64_STR_BUFLEN]; + + if (HAS_ENTRY_ITEM(item)) { + if (virNWFilterPrintDataTypeAsHex(vars, + field, sizeof(field), + item) < 0) + return -1; + virFirewallCmdAddArg(fw, fwrule, argument); + if (ENTRY_WANT_NEG_SIGN(item)) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + + virFirewallCmdAddArg(fw, fwrule, field); + } + + return 0; +} + 
+/* + * nftablesHandleEthernetRule: + * @fw: the firewall ruleset to add to + * @vars : A map containing the variables to resolve + * @rule: The rule of the filter to convert + * @reverseRule : Whether to reverse src and dst attributes + * ethernet reverse flag is set when direction=3D'inout' is= set + * + * Set arguments on fwrule based on given struct *rule + * + */ +static int +nftablesHandleEthernetRule(virFirewall *fw, + virFirewallCmd *fwrule, + virNWFilterVarCombIter *vars, + virNWFilterRuleDef *rule, + bool reverseRule) +{ + char number[VIR_INT64_STR_BUFLEN]; + char ipaddr[INET_ADDRSTRLEN]; + char ipmask[INET_ADDRSTRLEN]; + char ipv6addr[INET6_ADDRSTRLEN]; + bool hasMask =3D false; + const char *saddr =3D reverseRule ? "daddr" : "saddr"; + const char *daddr =3D reverseRule ? "saddr" : "daddr"; + + switch ((int)rule->prtclType) { + case VIR_NWFILTER_RULE_PROTOCOL_MAC: + if (nftablesHandleEthHdr(fw, fwrule, + vars, + &rule->p.ethHdrFilter.ethHdr, reverseRule= ) < 0) + return -1; + + if (insertRuleArgParamHex(fw, fwrule, vars, + &rule->p.ethHdrFilter.dataProtocolID, + "ether type") < 0) + return -1; + break; + case VIR_NWFILTER_RULE_PROTOCOL_IP: + virFirewallCmdAddArgList(fw, fwrule, "ether", "type", NULL); + if (ENTRY_WANT_NEG_SIGN(&rule->p.ipHdrFilter.ipHdr.dataProtocolID)) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + virFirewallCmdAddArg(fw, fwrule, "ip"); + + if (nftablesHandleEthHdr(fw, fwrule, + vars, + &rule->p.ipHdrFilter.ethHdr, reverseRule)= < 0) + return -1; + + if (HAS_ENTRY_ITEM(&rule->p.ipHdrFilter.ipHdr.dataSrcIPAddr)) { + if (virNWFilterPrintDataType(vars, + ipaddr, sizeof(ipaddr), + &rule->p.ipHdrFilter.ipHdr.dataSr= cIPAddr) < 0) + return -1; + + virFirewallCmdAddArgList(fw, fwrule, "ip", saddr, NULL); + if (ENTRY_WANT_NEG_SIGN(&rule->p.ipHdrFilter.ipHdr.dataSrcIPAd= dr)) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + + if (HAS_ENTRY_ITEM(&rule->p.ipHdrFilter.ipHdr.dataSrcIPMask)) { + if (virNWFilterPrintDataType(vars, + number, 
sizeof(number), + &rule->p.ipHdrFilter.ipHdr.da= taSrcIPMask) < 0) + return -1; + virFirewallCmdAddArgFormat(fw, fwrule, + "%s/%s", ipaddr, number); + } else { + virFirewallCmdAddArg(fw, fwrule, ipaddr); + } + } + + if (HAS_ENTRY_ITEM(&rule->p.ipHdrFilter.ipHdr.dataDstIPAddr)) { + if (virNWFilterPrintDataType(vars, + ipaddr, sizeof(ipaddr), + &rule->p.ipHdrFilter.ipHdr.dataDs= tIPAddr) < 0) + return -1; + + virFirewallCmdAddArgList(fw, fwrule, "ip", daddr, NULL); + if (ENTRY_WANT_NEG_SIGN(&rule->p.ipHdrFilter.ipHdr.dataDstIPAd= dr)) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + + if (HAS_ENTRY_ITEM(&rule->p.ipHdrFilter.ipHdr.dataDstIPMask)) { + if (virNWFilterPrintDataType(vars, + number, sizeof(number), + &rule->p.ipHdrFilter.ipHdr.da= taDstIPMask) < 0) + return -1; + virFirewallCmdAddArgFormat(fw, fwrule, + "%s/%s", ipaddr, number); + } else { + virFirewallCmdAddArg(fw, fwrule, ipaddr); + } + } + + if (insertRuleArgParam(fw, fwrule, vars, + &rule->p.ipHdrFilter.ipHdr.dataProtocolID, + "ip protocol") < 0) + return -1; + if (insertRuleArg2Param(fw, fwrule, vars, + &rule->p.ipHdrFilter.portData.dataSrcPortS= tart, + &rule->p.ipHdrFilter.portData.dataSrcPortE= nd, + reverseRule ? "th dport" : "th sport", "-"= ) < 0) + return -1; + if (insertRuleArg2Param(fw, fwrule, vars, + &rule->p.ipHdrFilter.portData.dataDstPortS= tart, + &rule->p.ipHdrFilter.portData.dataDstPortE= nd, + reverseRule ? "th sport" : "th dport", "-"= ) < 0) + return -1; + if (insertRuleArgParamHex(fw, fwrule, vars, + &rule->p.ipHdrFilter.ipHdr.dataDSCP, + "ip dscp") < 0) + return -1; + break; + case VIR_NWFILTER_RULE_PROTOCOL_ARP: + case VIR_NWFILTER_RULE_PROTOCOL_RARP: + if (nftablesHandleEthHdr(fw, fwrule, + vars, + &rule->p.arpHdrFilter.ethHdr, reverseRule= ) < 0) + return -1; + + virFirewallCmdAddArgList(fw, fwrule, "ether", "type", NULL); + virFirewallCmdAddArgFormat(fw, fwrule, "0x%x", + (rule->prtclType =3D=3D VIR_NWFILTER_RULE_PROTO= COL_ARP) + ? 
l3_protocols[VIR_NWFILTER_PROTO_IDX_ARP].attr + : l3_protocols[VIR_NWFILTER_PROTO_IDX_RARP].att= r); + + if (insertRuleArgParam(fw, fwrule, vars, + &rule->p.arpHdrFilter.dataHWType, + "arp htype") < 0) + return -1; + if (insertRuleArgParam(fw, fwrule, vars, + &rule->p.arpHdrFilter.dataOpcode, + "arp operation") < 0) + return -1; + if (insertRuleArgParamHex(fw, fwrule, vars, + &rule->p.arpHdrFilter.dataProtocolType, + "arp ptype") < 0) + return -1; + + if (HAS_ENTRY_ITEM(&rule->p.arpHdrFilter.dataARPSrcIPAddr)) { + if (virNWFilterPrintDataType(vars, + ipaddr, sizeof(ipaddr), + &rule->p.arpHdrFilter.dataARPSrcI= PAddr) < 0) + return -1; + + if (HAS_ENTRY_ITEM(&rule->p.arpHdrFilter.dataARPSrcIPMask)) { + if (virNWFilterPrintDataType(vars, + ipmask, sizeof(ipmask), + &rule->p.arpHdrFilter.dataARP= SrcIPMask) < 0) + return -1; + hasMask =3D true; + } + + virFirewallCmdAddArgList(fw, fwrule, "arp", saddr, "ip", NULL); + if (ENTRY_WANT_NEG_SIGN(&rule->p.arpHdrFilter.dataARPSrcIPAddr= )) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + virFirewallCmdAddArgFormat(fw, fwrule, + "%s/%s", ipaddr, hasMask ? ipmask := "32"); + } + + if (HAS_ENTRY_ITEM(&rule->p.arpHdrFilter.dataARPDstIPAddr)) { + /* do not reuse the source mask for the destination */ + hasMask =3D false; + + if (virNWFilterPrintDataType(vars, + ipaddr, sizeof(ipaddr), + &rule->p.arpHdrFilter.dataARPDstI= PAddr) < 0) + return -1; + + if (HAS_ENTRY_ITEM(&rule->p.arpHdrFilter.dataARPDstIPMask)) { + if (virNWFilterPrintDataType(vars, + ipmask, sizeof(ipmask), + &rule->p.arpHdrFilter.dataARP= DstIPMask) < 0) + return -1; + hasMask =3D true; + } + + virFirewallCmdAddArgList(fw, fwrule, "arp", daddr, "ip", NULL); + if (ENTRY_WANT_NEG_SIGN(&rule->p.arpHdrFilter.dataARPDstIPAddr= )) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + virFirewallCmdAddArgFormat(fw, fwrule, + "%s/%s", ipaddr, hasMask ? ipmask := "32"); + } + + if (nftablesHandleMacAddr(fw, fwrule, vars, + &rule->p.arpHdrFilter.dataARPSrcMACAddr, + reverseRule ? 
"ether daddr": "ether sadd= r") < 0) + return -1; + if (nftablesHandleMacAddr(fw, fwrule, vars, + &rule->p.arpHdrFilter.dataARPDstMACAddr, + reverseRule ? "ether saddr": "ether dadd= r") < 0) + return -1; + + if (HAS_ENTRY_ITEM(&rule->p.arpHdrFilter.dataGratuitousARP) && + rule->p.arpHdrFilter.dataGratuitousARP.u.boolean) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("GARP filtering in nftables is not supported"= )); + return -1; + } + break; + case VIR_NWFILTER_RULE_PROTOCOL_IPV6: + if (nftablesHandleEthHdr(fw, fwrule, + vars, + &rule->p.ipv6HdrFilter.ethHdr, reverseRul= e) < 0) + return -1; + + virFirewallCmdAddArgList(fw, fwrule, "ether", "type", "ip6", NULL); + + if (HAS_ENTRY_ITEM(&rule->p.ipv6HdrFilter.ipHdr.dataSrcIPAddr)) { + if (virNWFilterPrintDataType(vars, + ipv6addr, sizeof(ipv6addr), + &rule->p.ipv6HdrFilter.ipHdr.data= SrcIPAddr) < 0) + return -1; + + virFirewallCmdAddArgList(fw, fwrule, "ip6", saddr, NULL); + if (ENTRY_WANT_NEG_SIGN(&rule->p.ipv6HdrFilter.ipHdr.dataSrcIP= Addr)) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + + if (HAS_ENTRY_ITEM(&rule->p.ipv6HdrFilter.ipHdr.dataSrcIPMask)= ) { + if (virNWFilterPrintDataType(vars, + number, sizeof(number), + &rule->p.ipv6HdrFilter.ipHdr.= dataSrcIPMask) < 0) + return -1; + virFirewallCmdAddArgFormat(fw, fwrule, + "%s/%s", ipv6addr, number); + } else { + virFirewallCmdAddArg(fw, fwrule, ipv6addr); + } + } + + if (HAS_ENTRY_ITEM(&rule->p.ipv6HdrFilter.ipHdr.dataDstIPAddr)) { + + if (virNWFilterPrintDataType(vars, + ipv6addr, sizeof(ipv6addr), + &rule->p.ipv6HdrFilter.ipHdr.data= DstIPAddr) < 0) + return -1; + + virFirewallCmdAddArgList(fw, fwrule, "ip6", daddr, NULL); + if (ENTRY_WANT_NEG_SIGN(&rule->p.ipv6HdrFilter.ipHdr.dataDstIP= Addr)) + virFirewallCmdAddArg(fw, fwrule, "!=3D"); + + if (HAS_ENTRY_ITEM(&rule->p.ipv6HdrFilter.ipHdr.dataDstIPMask)= ) { + if (virNWFilterPrintDataType(vars, + number, sizeof(number), + &rule->p.ipv6HdrFilter.ipHdr.= dataDstIPMask) < 0) + return -1; + 
virFirewallCmdAddArgFormat(fw, fwrule, + "%s/%s", ipv6addr, number); + } else { + virFirewallCmdAddArg(fw, fwrule, ipv6addr); + } + } + + if (insertRuleArgParam(fw, fwrule, vars, + &rule->p.ipv6HdrFilter.ipHdr.dataProtocolID, + "ip6 nexthdr") < 0) + return -1; + if (insertRuleArg2Param(fw, fwrule, vars, + &rule->p.ipv6HdrFilter.portData.dataSrcPor= tStart, + &rule->p.ipv6HdrFilter.portData.dataSrcPor= tEnd, + reverseRule ? "th dport" : "th sport", "-"= ) < 0) + return -1; + if (insertRuleArg2Param(fw, fwrule, vars, + &rule->p.ipv6HdrFilter.portData.dataDstPor= tStart, + &rule->p.ipv6HdrFilter.portData.dataDstPor= tEnd, + reverseRule ? "th sport" : "th dport", "-"= ) < 0) + return -1; + if (HAS_ENTRY_ITEM(&rule->p.ipv6HdrFilter.dataICMPTypeStart) || + HAS_ENTRY_ITEM(&rule->p.ipv6HdrFilter.dataICMPCodeStart)) { + + if (insertRuleArgParam(fw, fwrule, vars, + &rule->p.ipv6HdrFilter.dataICMPTypeStar= t, + "icmpv6 type") < 0) + return -1; + if (insertRuleArgParam(fw, fwrule, vars, + &rule->p.ipv6HdrFilter.dataICMPCodeStar= t, + "icmpv6 code") < 0) + return -1; + } + break; + case VIR_NWFILTER_RULE_PROTOCOL_VLAN: + if (nftablesHandleEthHdr(fw, fwrule, + vars, + &rule->p.vlanHdrFilter.ethHdr, reverseRul= e) < 0) + return -1; + + virFirewallCmdAddArgList(fw, fwrule, "ether", "type", "0x8100", NU= LL); + + if (insertRuleArgParam(fw, fwrule, vars, + &rule->p.vlanHdrFilter.dataVlanID, + "vlan id") < 0) + return -1; + if (insertRuleArgParam(fw, fwrule, vars, + &rule->p.vlanHdrFilter.dataVlanEncap, + "vlan type") < 0) + return -1; + break; + case VIR_NWFILTER_RULE_PROTOCOL_STP: + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("STP filtering in nftables is not supported")); + return -1; + break; + case VIR_NWFILTER_RULE_PROTOCOL_NONE: + break; + default: + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unexpected rule protocol '%1$d', priority '%2$d'= "), + rule->prtclType, + rule->priority); + return -1; + } + + return 0; +} + +/* + * nftablesGetNFTable: + * + * @rule: The rule 
of the filter + * + * We have a separate table, due to eb/iptables compatibility + * Ideally we allow users to have only 1 table in which all rules are plac= ed + * We'll need to turn that into a nwfilter feature + */ +static const char *nftablesGetNFTable(virNWFilterRuleDef *rule) +{ + return virNWFilterRuleIsProtocolEthernet(rule) ? + NF_ETHERNET_TABLE : + NF_INET_TABLE; +} + +static void +nftablesAddCmdUserComment(virFirewall *fw, + virFirewallCmd *fwrule, + virNWFilterRuleDef *rule) +{ + g_autofree char *comment =3D NULL; + comment =3D virStringReplace( + rule->p.allHdrFilter.ipHdr.dataComment.u.string, + "\"", "'"); + + virFirewallCmdAddArgFormat(fw, fwrule, + "\"priority=3D%d,usercomment=3D%s\"", + rule->priority, comment); +} + +/* + * nftablesCreateRuleInstance: + * @fw: the firewall ruleset instance + * @layer: the firewall layer + * @chainPrefix: The suffix to put on the end of the name of the chain + * @rule: The rule of the filter to convert + * @ifname : The name of the interface to apply the rule to + * @vars : A map containing the variables to resolve + * @res : The data structure to store the result(s) into + * + * Convert a single rule into its representation for later instantiation + * + * Returns 0 in case of success with the result stored in the data structu= re + * pointed to by res, -1 otherwise + */ +static int +nftablesCreateRuleInstance(virFirewall *fw, + virFirewallLayer layer, + const char *chainPrefix, + virNWFilterRuleDef *rule, + const char *ifname, + virNWFilterVarCombIter *vars, + bool directionIn, + bool reverseRule) +{ + int ret =3D -1; + char chain[MAX_NF_CHAINNAME_LENGTH]; + virFirewallCmd *fwrule =3D NULL; + const char *root =3D virNWFilterChainSuffixTypeToString( + VIR_NWFILTER_CHAINSUFFIX_ROOT); + const char *nftablesRootTable =3D nftablesGetNFTable(rule); + + /* apply root rules directly on the root chain, for example: + * vnet1-in vnet1-out */ + if (STREQ(chainPrefix, root)) { + g_snprintf(chain, sizeof(chain), "n-%s-%s", 
ifname, + directionIn ? "in" : "out"); + } else { + g_snprintf(chain, sizeof(chain), "n-%s-%s-%s", ifname, chainPrefix, + directionIn ? "in" : "out"); + } + + fwrule =3D virFirewallAddCmd(fw, layer, + "add", "rule", "bridge", + nftablesRootTable, chain, NULL); + + if (virNWFilterRuleIsProtocolEthernet(rule)) { + if (nftablesHandleEthernetRule(fw, fwrule, vars, rule, reverseRule= ) < 0) + goto cleanup; + } else { + if (nftablesHandleInetRule(fw, fwrule, vars, rule, + directionIn, reverseRule) < 0) + goto cleanup; + } + + if (NF_COUNTER) + virFirewallCmdAddArg(fw, fwrule, "counter"); + + /* specify the action for this rule */ + nftablesAddCmdAction(fw, fwrule, rule->action); + + /* process rule comment */ + virFirewallCmdAddArg(fw, fwrule, "comment"); + + /* ethernet rules don't have the allHdrFilter */ + if (HAS_ENTRY_ITEM(&rule->p.allHdrFilter.ipHdr.dataComment) && + !virNWFilterRuleIsProtocolEthernet(rule)) { + nftablesAddCmdUserComment(fw, fwrule, rule); + } else { + virFirewallCmdAddArgFormat(fw, fwrule, "\"priority=3D%d\"", rule->= priority); + } + + ret =3D 0; + + cleanup: + if (ret =3D=3D -1) + virFirewallRemoveCmd(fw, fwrule); + + return ret; +} + +static int +nftablesRuleInstCommand(virFirewall *fw, + virFirewallLayer layer, + const char *ifname, + virNWFilterRuleInst *rule) +{ + int ret =3D -1; + virNWFilterVarCombIter *vciter; + virNWFilterVarCombIter *tmp; + virNWFilterRuleDirectionType direction =3D rule->def->tt; + + /* rule->vars holds all the variables names that this rule will access. + * iterate over all combinations of the variables' values and instanti= ate + * the filtering rule with each combination. 
+ */ + tmp =3D vciter =3D virNWFilterVarCombIterCreate(rule->vars, + rule->def->varAccess, + rule->def->nVarAccess); + if (!vciter) + return -1; + + do { + bool reverseRule =3D false; + + VIR_DEBUG("rule[chain=3D'%s', dir=3D'%d', prio=3D'%d', action=3D'%= d', chainPrio=3D'%d']", + rule->chainSuffix, + direction, + rule->priority, + rule->def->action, + rule->chainPriority); + + if (direction =3D=3D VIR_NWFILTER_RULE_DIRECTION_INOUT) { + /* for direction inout we run the create instance twice, + * with directionIn set to true and false */ + + /* in */ + if (nftablesCreateRuleInstance(fw, layer, rule->chainSuffix, + rule->def, ifname, tmp, + true, reverseRule) < 0) + goto cleanup; + + /* for ethernet rules, to comply to what ebiptables did, + * we set reverseRule to true on direction inout */ + reverseRule =3D virNWFilterRuleIsProtocolEthernet(rule->def); + + /* out */ + if (nftablesCreateRuleInstance(fw, layer, rule->chainSuffix, + rule->def, ifname, tmp, + false, reverseRule) < 0) + goto cleanup; + } else { + bool directionIn =3D direction =3D=3D VIR_NWFILTER_RULE_DIRECT= ION_IN; + /* otherwise we provide directionIn */ + if (nftablesCreateRuleInstance(fw, layer, rule->chainSuffix, + rule->def, ifname, tmp, + directionIn, reverseRule) < 0) + goto cleanup; + + /* rules that do conntrack matching and have action accept nee= d a + * reverse rule on the other chain to accept the reply directi= on + * so if we accept outbound we need an accept on the inbound f= or + * established connections */ + if (nftablesRuleNeedsConntrack(rule->def) && + rule->def->action =3D=3D VIR_NWFILTER_RULE_ACTION_ACCEPT) { + reverseRule =3D true; + if (nftablesCreateRuleInstance(fw, layer, rule->chainSuffi= x, + rule->def, ifname, tmp, + !directionIn, reverseRule) = < 0) + goto cleanup; + } + } + + tmp =3D virNWFilterVarCombIterNext(tmp); + } while (tmp !=3D NULL); + + ret =3D 0; + cleanup: + virNWFilterVarCombIterFree(vciter); + + return ret; +} + +/* + * nftablesCreateSubChain: + * @fw: 
the firewall ruleset instance + * @layer: the firewall layer + * @ifname : The name of the interface to apply the chain to + * @chainPrefix: The prefix to put on the beginning of the name of the cha= in + * @protoidx: Protocol id for conditional jump + * @rootChain: The chain to define the jump on + * @chainPostfix: The postfix to put at the end of the name of the chain + * + * Creates the user defined chain, chain=3D'mac', with chainPostfix set to= 'in' + * on vnet1 for example leads to: + * - vnet1-mac-in + * + * Rules get defined on the corresponding chain based on the chosen direct= ion, + * either in or out or both (in and out) when direction has been set to 'i= nout' + */ +static void +nftablesCreateSubChain(virFirewall *fw, + virFirewallLayer layer, + const char *nftablesTableName, + const char *chainPrefix, + enum virNWFilterProtoIdx protoidx, + const char *rootChain, + const char *chainPostfix) +{ + char chain[MAX_NF_CHAINNAME_LENGTH]; + virFirewallCmd *fwrule =3D NULL; + g_snprintf(chain, sizeof(chain), "%s-%s", chainPrefix, chainPostfix); + + VIR_DEBUG("Defining chain '%s'", chain); + + virFirewallAddCmd(fw, layer, "add", "chain", "bridge", + nftablesTableName, chain, CHAINSETTINGS, NULL); + + /* add VM interface jump */ + fwrule =3D virFirewallAddCmd(fw, layer, "add", "rule", "bridge", + nftablesTableName, rootChain, NULL); + if (protoidx !=3D -1 && l3_protocols[protoidx].attr) { + virFirewallCmdAddArgList(fw, fwrule, "ether", "type", NULL); + virFirewallCmdAddArgFormat(fw, fwrule, + "0x%04x", l3_protocols[protoidx].attr); + } + + virFirewallCmdAddArgList(fw, fwrule, "jump", chain, NULL); +} + +static void +nftablesCreateRootChainJump(virFirewall *fw, + virFirewallLayer layer, + const char *ifname, + const char *ifMatch, + const char *topChain, + const char *rootChain, + bool addTmpJump) +{ + virFirewallCmd *fwrule =3D NULL; + + /* add a tmp jump to avoid a firewall hole between + * updating vmap */ + if (addTmpJump) { + /* tmp iif oif jump */ + 
virFirewallAddCmd(fw, layer, "add", "rule", "bridge", NF_INET_TABL= E, + topChain, ifMatch, ifname, "jump", rootChain, NU= LL); + virFirewallAddCmd(fw, layer, "add", "rule", "bridge", NF_ETHERNET_= TABLE, + topChain, ifMatch, ifname, "jump", rootChain, NU= LL); + } + + /* remove VM interface jump */ + fwrule =3D virFirewallAddCmdFull(fw, layer, true, NULL, NULL, "delete", + "element", "bridge", NF_INET_TABLE, NUL= L); + virFirewallCmdAddArgFormat(fw, fwrule, "vmap-%s", ifMatch); + virFirewallCmdAddArgList(fw, fwrule, "{", ifname, "}", NULL); + /* add VM interface jump */ + fwrule =3D virFirewallAddCmd(fw, layer, "add", "element", "bridge", + NF_INET_TABLE, NULL); + virFirewallCmdAddArgFormat(fw, fwrule, "vmap-%s", ifMatch); + virFirewallCmdAddArgList(fw, fwrule, "{", ifname, ":", "jump", + rootChain, "}", NULL); + + /* remove VM interface jump */ + fwrule =3D virFirewallAddCmdFull(fw, layer, true, NULL, NULL, "delete", + "element", "bridge", + NF_ETHERNET_TABLE, NULL); + virFirewallCmdAddArgFormat(fw, fwrule, "vmap-%s", ifMatch); + virFirewallCmdAddArgList(fw, fwrule, "{", ifname, "}", NULL); + /* add VM interface jump */ + fwrule =3D virFirewallAddCmd(fw, layer, "add", "element", "bridge", + NF_ETHERNET_TABLE, NULL); + virFirewallCmdAddArgFormat(fw, fwrule, "vmap-%s", ifMatch); + virFirewallCmdAddArgList(fw, fwrule, "{", ifname, ":", "jump", rootCha= in, + "}", NULL); +} + +/* + * nftablesCreateRootChain: + * @fw: the firewall ruleset instance + * @layer: the firewall layer + * @ifname : The name of the interface to apply the chain to + * @ifMatch : The matcher to use for this root chain, iif/oif + * @chainPrefix: The prefix to put on the beginning of the name of the cha= in + * @protoidx: Protocol id for conditional jump + * @topChain: The chain to define the jump on + * @rootChain: The root chain for the interface to create + * + * Creates the interface root chain, chainPostfix set to 'in' + * on vnet1 for example, leads to: + * - vnet1-in + * + * These root 
chains are the chains where all the subchains jumps get adde= d to + * vnet1-in -> jump vnet-mac-in; ether type ip jump vnet-ip-in; + */ +static void +nftablesCreateRootChain(virFirewall *fw, + virFirewallLayer layer, + const char *rootChain) +{ + VIR_DEBUG("Defining root chain '%s'", rootChain); + + virFirewallAddCmd(fw, layer, "add", "chain", "bridge", + NF_ETHERNET_TABLE, rootChain, CHAINSETTINGS, NULL); + + virFirewallAddCmd(fw, layer, "add", "chain", "bridge", + NF_INET_TABLE, rootChain, CHAINSETTINGS, NULL); +} + +typedef struct _nftablesSubChain nftablesSubChain; +struct _nftablesSubChain { + /* we use the lowest rule priority in a chain to compare root rule ins= erts + * see nftablesHandleCreateChains for the explanation */ + virNWFilterRulePriority lowestRulePriority; + virNWFilterChainPriority priority; + enum virNWFilterProtoIdx protoidx; + char prefix[MAX_NF_CHAINNAME_LENGTH]; + const char *suffix; + bool hasEthernetRules; + bool hasInetRules; +}; + +static int nftablesChainCreateSort(const void *a, const void *b, + void *opaque G_GNUC_UNUSED) +{ + const nftablesSubChain *insta =3D *(const nftablesSubChain **)a; + const nftablesSubChain *instb =3D *(const nftablesSubChain **)b; + const char *root =3D virNWFilterChainSuffixTypeToString( + VIR_NWFILTER_CHAINSUFFIX_ROOT); + bool root_a =3D STREQ(insta->suffix, root); + bool root_b =3D STREQ(instb->suffix, root); + + /* ensure root chain commands appear before all others since + we will need them to create the child chains */ + if (root_a) { + if (!root_b) + return -1; /* a before b */ + } else if (root_b) { + return 1; /* b before a */ + } + + /* priorities are limited to range [-1000, 1000] */ + return insta->priority - instb->priority; +} + +static void +nftablesGetSubChains(nftablesSubChain ***chains, + size_t *nchains, + virNWFilterRuleInst **rules, + size_t nrules, + const char *ifname) +{ + size_t i, j; + + for (i =3D 0; i < nrules; i++) { + g_autofree nftablesSubChain *chain =3D NULL; + 
nftablesSubChain **chainst =3D *chains; + bool registered =3D false; + bool isEthernetRule =3D virNWFilterRuleIsProtocolEthernet( + rules[i]->def); + + for (j =3D 0; j < *nchains; j++) { + if (STREQ(rules[i]->chainSuffix, chainst[j]->suffix)) { + VIR_DEBUG("Chain already registered '%s'", chainst[j]->suf= fix); + + /* using ifs here as they are more readable */ + if (!chainst[j]->hasEthernetRules && isEthernetRule) + chainst[j]->hasEthernetRules =3D true; + if (!chainst[j]->hasInetRules && !isEthernetRule) + chainst[j]->hasInetRules =3D true; + + registered =3D true; + break; + } + } + + if (registered) + continue; + + /* filter out the root chain */ + if (STREQ(rules[i]->chainSuffix, + virNWFilterChainSuffixTypeToString(VIR_NWFILTER_CHAINSUFFIX_RO= OT))) + continue; + + /* register the chain for creation */ + chain =3D g_new0(nftablesSubChain, 1); + + chain->hasEthernetRules =3D isEthernetRule; + chain->hasInetRules =3D !chain->hasEthernetRules; + chain->priority =3D rules[i]->chainPriority; + chain->lowestRulePriority =3D rules[i]->priority; + chain->suffix =3D rules[i]->chainSuffix; + g_snprintf(chain->prefix, sizeof(chain->prefix), + "n-%s-%s", ifname, chain->suffix); + + VIR_APPEND_ELEMENT(*chains, *nchains, chain); + } +} + +static int +nftablesHandleCreateChains(virFirewall *fw, + virFirewallLayer layer, + const char *const *lines G_GNUC_UNUSED, + void *opaque) +{ + size_t i, j, nchains =3D 0; + size_t lastProcessedRootRuleIndex =3D 0; + int ret =3D -1; + virNWFilterChainCreateCallbackData *cbdata =3D opaque; + nftablesSubChain **chains =3D NULL; + char rootChainIn[MAX_NF_CHAINNAME_LENGTH]; + char rootChainOut[MAX_NF_CHAINNAME_LENGTH]; + bool ethernetRootRuleSorting, inetRootRuleSorting; + const char *rootChainName =3D virNWFilterChainSuffixTypeToString( + VIR_NWFILTER_CHAINSUFFIX_ROOT); + g_snprintf(rootChainIn, sizeof(rootChainIn), "n-%s-in", cbdata->ifname= ); + g_snprintf(rootChainOut, sizeof(rootChainOut), "n-%s-out", cbdata->ifn= ame); + + 
nftablesGetSubChains(&chains, + &nchains, + cbdata->rules, + cbdata->nrules, + cbdata->ifname); + + /* sort chains on their chain priority */ + g_qsort_with_data(chains, nchains, sizeof(chains[0]), + nftablesChainCreateSort, NULL); + + /* first we create the root interface in-out chains */ + nftablesCreateRootChain(fw, layer, rootChainIn); + nftablesCreateRootChain(fw, layer, rootChainOut); + + /* + * Root chain ordering is special. + * + * Filtering rules on the root chain must be interleaved with subchain + * definitions and jumps based on priority. This is required to stay + * compatible with behavior from the ebiptables driver, where root rul= es + * may need to appear before or after chain jumps depending on priorit= y. + * + * Historical note: + * - In the ebiptables driver, iptables/ip6tables had no subchains; + * all inet rules lived directly on the root chain. + * - To avoid duplicating logic for ethernet and inet, we now define + * chains and rules once and apply table-specific ordering instead. + * + * Only the root chain needs this handling. All other chains are alrea= dy + * sorted correctly. Chains cannot be created lazily during rule + * processing, as chains themselves have priorities. 
+ * + * Therefore we apply the following logic: + * - Create the root chain first + * - Process root rules and subchains in priority order + * - Root rules are inserted according to rule priority + * - Subchains are created (with their jump) when their priority req= uires it + * + * Table specific behavior (ethernetRootRuleSorting/inetRootRuleSortin= g): + * - inet: rule->priority vs chain->lowestRulePriority + * - enet: rule->priority vs chain->priority + */ + + /* create chain if it doesn't exist */ + /* define undefined sub chains */ + for (i =3D 0; i < nchains; i++) { + enum virNWFilterProtoIdx protoidx; + + /* root chain firewall rules, if there are root chain firewall rul= es + * with a lower priority than this chain's lowest rule priority */ + for (j =3D lastProcessedRootRuleIndex; j < cbdata->nrules; j++) { + /* as root rules are inserted before all other rules, + * we stop walking the rules list when we've hit a non-root ru= le */ + if (STRNEQ(cbdata->rules[j]->chainSuffix, rootChainName)) { + break; + } + + ethernetRootRuleSorting =3D virNWFilterRuleIsProtocolEthernet(= cbdata->rules[j]->def); + inetRootRuleSorting =3D !ethernetRootRuleSorting; + + lastProcessedRootRuleIndex =3D j; + if ((inetRootRuleSorting && chains[i]->lowestRulePriority > cb= data->rules[j]->priority) || + (ethernetRootRuleSorting && chains[i]->priority > cbdata->= rules[j]->priority)) { + if (nftablesRuleInstCommand(fw, layer, + cbdata->ifname, + cbdata->rules[j]) < 0) + goto cleanup; + } else { + break; + } + } + + protoidx =3D nftablesGetProtoIdxByFiltername(chains[i]->suffix); + if (chains[i]->hasEthernetRules) { + nftablesCreateSubChain(fw, layer, NF_ETHERNET_TABLE, + chains[i]->prefix, protoidx, + rootChainIn, "in"); + nftablesCreateSubChain(fw, layer, NF_ETHERNET_TABLE, + chains[i]->prefix, protoidx, + rootChainOut, "out"); + } + if (chains[i]->hasInetRules) { + nftablesCreateSubChain(fw, layer, NF_INET_TABLE, + chains[i]->prefix, protoidx, + rootChainIn, "in"); + 
nftablesCreateSubChain(fw, layer, NF_INET_TABLE, + chains[i]->prefix, protoidx, + rootChainOut, "out"); + } + } + + /* process the firewall rules and chains */ + /* everything before lastProcessedRootRuleIndex has been created */ + for (i =3D lastProcessedRootRuleIndex; i < cbdata->nrules; i++) { + if (nftablesRuleInstCommand(fw, layer, + cbdata->ifname, cbdata->rules[i]) < 0) + goto cleanup; + } + + /* creation of temp jumps is done as libvirt doesn't execute + * atomic nft changes (yet) */ + nftablesCreateRootChainJump(fw, layer, cbdata->ifname, IN_IFMATCH, + IN_CHAIN, rootChainIn, true); + nftablesCreateRootChainJump(fw, layer, cbdata->ifname, OUT_IFMATCH, + OUT_CHAIN, rootChainOut, true); + + ret =3D 0; + + cleanup: + for (i =3D 0; i < nchains; i++) + g_free(chains[i]); + + return ret; +} + +/** + * nftablesCreateRootTables + * + * @fw: the firewall instance + * + * Run nft list tables and parse if the table already exist + * skips creation of base table if possible + * see handler in nftablesHandleCreateRootTables + */ +static void nftablesCreateRootTables(virFirewall *fw) +{ + virFirewallAddCmdFull(fw, VIR_FIREWALL_LAYER_ETHERNET, + false, nftablesHandleCreateRootTables, + NULL, + "list", "tables", NULL); +} + +/** + * nftablesEnsureRootTablesExist + * + * Run nftablesCreateRootTables in a seperate transaction, + * Follow up commands like: + * - "nft list -a" commands in nftablesRemoveAllInterfaceChains + * - "add chain" commands in nftablesApplyBasicRules + * Can then run and be assured that the tables should exist. 
+ */
+static int nftablesEnsureRootTablesExist(void)
+{
+    g_autoptr(virFirewall) fw = virFirewallNew(VIR_FIREWALL_BACKEND_NFTABLES);
+    virFirewallStartTransaction(fw, 0);
+
+    /* create root tables if they don't exist already */
+    nftablesCreateRootTables(fw);
+
+    return virFirewallApply(fw);
+}
+
+/**
+ * nftablesCreateChains
+ *
+ * @fw: the firewall instance
+ * @cbdata: callback data struct which holds variables that
+ *          the callback handler needs in order to create
+ *          the base table and the dependent rules
+ *
+ * Run nft list table libvirt-nwfilter and check whether the chains already
+ * exist; skip creation of chains if possible.
+ * See the handler nftablesHandleCreateChains.
+ */
+static void nftablesCreateChains(virFirewall *fw,
+                                 virNWFilterChainCreateCallbackData *cbdata)
+{
+    virFirewallAddCmdFull(fw, VIR_FIREWALL_LAYER_ETHERNET,
+                          false, nftablesHandleCreateChains,
+                          (void *)cbdata,
+                          "list", "chains", NULL);
+}
+
+static const char *breakStrAt(const char *str, char untilc)
+{
+    const char *untilPtr = strchr(str, untilc);
+    if (untilPtr) {
+        *(char *)untilPtr = '\0';
+    }
+
+    return str;
+}
+
+static int
+nftablesHandleRenameChains(virFirewall *fw,
+                           virFirewallLayer layer,
+                           const char *const *lines,
+                           void *opaque)
+{
+    size_t i = 0;
+    const char *ifname = opaque;
+    const char *tableName = NULL;
+    const char *chain = NULL;
+    const char *newName = NULL;
+    char chainCompare[MAX_NF_CHAINNAME_LENGTH];
+    g_snprintf(chainCompare, sizeof(chainCompare), "n-%s-", ifname);
+
+    /* parse nft tables list output to see if chains exist */
+    for (i = 0; lines[i] != NULL; i++) {
+        const char *line = lines[i];
+
+        /* first we'll have to parse the table name */
+        if (tableName == NULL && STRPREFIX(line, "table bridge ")) {
+            line = STRSKIP(line, "table bridge ");
+            /* parse table that we want to clean */
+            tableName = breakStrAt(line, ' ');
+            continue;
+        }
+
+        virSkipSpaces(&line);
+
+        if ((line = STRSKIP(line, "chain ")) == NULL) {
+            continue;
+        }
+        chain = breakStrAt(line, ' ');
+
+        if (STRPREFIX(chain, chainCompare) && STRPREFIX(chain, "n-")) {
+            /* new name is name without n- at the prefix */
+            newName = chain + strlen("n-");
+            VIR_DEBUG("Scheduling chain rename '%s'->'%s' on table '%s'",
+                      chain, newName, tableName);
+            /* rename the chain */
+            virFirewallAddCmd(fw, layer,
+                              "rename", "chain", "bridge",
+                              tableName, chain, newName, NULL);
+        }
+    }
+
+    return 0;
+}
+
+static void
+nftablesRemoveVmapElementList(virFirewall *fw,
+                              virFirewallLayer layer,
+                              const char *line,
+                              const char *tableName,
+                              const char *vmapName,
+                              const char *chainCompare)
+{
+    const char *vmapKey = NULL;
+    if (STRPREFIX(line, "elements = {"))
+        line = STRSKIP(line, "elements = {");
+
+    /* skip spaces up to vmap key */
+    virSkipSpaces(&line);
+
+    /* walk the elements */
+    while (line && STRNEQ(line, "}") && STRNEQ(line, ",")) {
+        g_autofree char *vmap = g_strdup(line);
+        vmapKey = breakStrAt(vmap, ' ');
+
+        /* skip past this vmap key */
+        line = STRSKIP(line, vmapKey);
+
+        /* skip " : jump" or ":jump" */
+        virSkipSpaces(&line);
+        if ((line = STRSKIP(line, ":")) == NULL)
+            break;
+        virSkipSpaces(&line);
+        if ((line = STRSKIP(line, "jump")) == NULL)
+            break;
+        virSkipSpaces(&line);
+
+        if (STRPREFIX(line, chainCompare)) {
+            VIR_DEBUG("Scheduling vmap element '%s' on '%s' for deletion",
+                      vmapKey, vmapName);
+            virFirewallAddCmd(fw, layer,
+                              "delete", "element", "bridge", tableName,
+                              vmapName, "{", vmapKey, "}", NULL);
+        }
+
+        if (strchr(line, ',') != NULL)
+            line = strchr(line, ',');
+        if (strchr(line, ' ') != NULL)
+            line = strchr(line, ' ');
+
+        /* skip spaces up to next vmap key */
+        virSkipSpaces(&line);
+    }
+}
+
+static int
+nftablesHandleRemoveAll(virFirewall *fw,
+                        virFirewallLayer layer,
+                        const char *const *lines,
+                        void *opaque)
+{
+    size_t i = 0;
+    const char *ifname = opaque;
+    const char *tableName = NULL;
+    const char *chain = NULL;
+    const char *vmapName = NULL;
+    bool vmapParsing = false;
+    char chainCompare[MAX_NF_CHAINNAME_LENGTH];
+    char fwCompare[MAX_NF_CHAINNAME_LENGTH];
+    char tmpFwCompare[MAX_NF_CHAINNAME_LENGTH];
+    g_snprintf(chainCompare, sizeof(chainCompare), "%s-", ifname);
+    g_snprintf(fwCompare, sizeof(fwCompare), "\"%s\" jump %s-", ifname, ifname);
+    /* match possible tmp jump on tmp name: "\"vnet0\" jump n-vnet0-" */
+    g_snprintf(tmpFwCompare, sizeof(tmpFwCompare), "\"%s\" jump n-%s-", ifname,
+               ifname);
+
+    /* parse nft tables list output to see if chains exist */
+    for (i = 0; lines[i] != NULL; i++) {
+        const char *line = lines[i];
+
+        /* first we'll have to parse the table name */
+        if (tableName == NULL && STRPREFIX(line, "table bridge ")) {
+            line = STRSKIP(line, "table bridge ");
+            /* parse table that we want to clean */
+            tableName = breakStrAt(line, ' ');
+            continue;
+        }
+
+        virSkipSpaces(&line);
+
+        /* delete tmp jumps */
+        if (strstr(line, fwCompare) != NULL ||
+            strstr(line, tmpFwCompare) != NULL) {
+            line = strchr(line, '#');
+            if (line == NULL || (line = STRSKIP(line, "# handle ")) == NULL)
+                continue;
+
+            /* delete tmp jump */
+            virFirewallAddCmd(fw, layer,
+                              "delete", "rule", "bridge", tableName, chain,
+                              "handle", line, NULL);
+
+            continue;
+        }
+
+        /* parse current vmap name */
+        if (STRPREFIX(line, "map ") &&
+            (line = STRSKIP(line, "map ")) != NULL) {
+            vmapName = breakStrAt(line, ' ');
+            continue;
+        }
+
+        /* if we come across map elements, we enable element list parsing */
+        if (STRPREFIX(line, "elements = {"))
+            vmapParsing = true;
+
+        /* remove old map elements, if they exist */
+        /* we are in the process of parsing a vmap elements list */
+        if (vmapParsing) {
+            /* reached end of list */
+            if (strstr(line, "}") != NULL)
+                vmapParsing = false;
+
+            nftablesRemoveVmapElementList(fw, layer, line, tableName,
+                                          vmapName, chainCompare);
+
+            continue;
+        }
+
+        if ((line = STRSKIP(line, "chain ")) == NULL) {
+            continue;
+        }
+        chain = breakStrAt(line, ' ');
+
+        if (STRPREFIX(chain, chainCompare)) {
+            VIR_DEBUG("Scheduling chain '%s' on table '%s' for deletion",
+                      chain, tableName);
+            /* delete the chain */
+            virFirewallAddCmd(fw, layer,
+                              "delete", "chain", "bridge",
+                              tableName, chain, NULL);
+        }
+    }
+
+    return 0;
+}
+
+static void
+nftablesRemoveAllInterfaceChains(virFirewall *fw, const char *ifname)
+{
+    virFirewallAddCmdFull(fw, VIR_FIREWALL_LAYER_ETHERNET,
+                          false, nftablesHandleRemoveAll,
+                          (void *)ifname,
+                          "-a", "list", "table", "bridge",
+                          NF_ETHERNET_TABLE, NULL);
+
+    virFirewallAddCmdFull(fw, VIR_FIREWALL_LAYER_ETHERNET,
+                          false, nftablesHandleRemoveAll,
+                          (void *)ifname,
+                          "-a", "list", "table", "bridge",
+                          NF_INET_TABLE, NULL);
+}
+
+static void
+nftablesRenameAllInterfaceChains(virFirewall *fw, const char *ifname)
+{
+    virFirewallAddCmdFull(fw, VIR_FIREWALL_LAYER_ETHERNET,
+                          false, nftablesHandleRenameChains,
+                          (void *)ifname,
+                          "-a", "list", "table", "bridge",
+                          NF_ETHERNET_TABLE, NULL);
+
+    virFirewallAddCmdFull(fw, VIR_FIREWALL_LAYER_ETHERNET,
+                          false, nftablesHandleRenameChains,
+                          (void *)ifname,
+                          "-a", "list", "table", "bridge",
+                          NF_INET_TABLE, NULL);
+}
+
+static int
+nftablesApplyNewRules(const char *ifname,
+                      virNWFilterRuleInst **rules,
+                      size_t nrules)
+{
+    size_t i;
+    g_autoptr(GHashTable) chains_in_set = virHashNew(NULL);
+    g_autoptr(GHashTable) chains_out_set = virHashNew(NULL);
+    g_autoptr(virFirewall) fw = virFirewallNew(VIR_FIREWALL_BACKEND_NFTABLES);
+    virNWFilterChainCreateCallbackData chainCallbackData = {ifname, nrules, rules};
+
+    /* nwfilter_nftables applies new rules first, then removes old rules;
+     * in order to do this we:
+     * - place the new chains under a name prefixed with "n-"
+     * - create a tmp jump that catches the vmap switch moment:
+     *   traffic will temporarily not be matched by the vmap, as an entry is
+     *   deleted and then recreated; vmaps can't be updated atomically via a
+     *   single command
+     * - in the tearOldRules function, we also remove the tmp interface jump to
+     *   the new chains
+     * - in tearOldRules we remove the old chains
+     * - in tearOldRules we rename the "n-" chains by removing "n-" from the
+     *   chain name
+     *
+     * This allows us in a rollback scenario to simply remove the new chains
+     * and jumps
+     */
+    char tmpIfname[VIR_INT64_STR_BUFLEN];
+    g_snprintf(tmpIfname, sizeof(tmpIfname), "n-%s", ifname);
+
+    /* walk the list of rules and increase the priority
+     * of rules in case the chain priority is of higher value;
+     * this preserves the order of the rules and ensures that
+     * the chain will be created before the chain's rules
+     * are created; don't adjust rules in the root chain
+     * example: a rule of priority -510 will be adjusted to
+     * priority -500 and the chain with priority -500 will
+     * then be created before it.
+     */
+    for (i = 0; i < nrules; i++) {
+        if (rules[i]->chainPriority > rules[i]->priority &&
+            !strstr("root", rules[i]->chainSuffix)) {
+
+            rules[i]->priority = rules[i]->chainPriority;
+        }
+    }
+
+    /* sort rules */
+    if (nrules) {
+        g_qsort_with_data(rules, nrules, sizeof(rules[0]),
+                          virNWFilterRuleInstSortPtr, NULL);
+    }
+
+    virFirewallStartTransaction(fw, 0);
+
+    /* create root tables if they don't exist already */
+    nftablesCreateRootTables(fw);
+    /* create user chains and rules */
+    nftablesCreateChains(fw, &chainCallbackData);
+
+    /* rollback commands, if necessary */
+    virFirewallStartRollback(fw, 0);
+    nftablesRemoveAllInterfaceChains(fw, tmpIfname);
+
+    /* process rules and apply them */
+    return virFirewallApply(fw);
+}
+
+static int
+nftablesTeardownNewRules(const char *ifname)
+{
+    char matchIfname[VIR_INT64_STR_BUFLEN];
+    g_autoptr(virFirewall) fw = virFirewallNew(VIR_FIREWALL_BACKEND_NFTABLES);
+
+    g_snprintf(matchIfname, sizeof(matchIfname), "n-%s", ifname);
+
+    virFirewallStartTransaction(fw, 0);
+
+    /* remove tmp interface chains and rules */
+    nftablesRemoveAllInterfaceChains(fw, matchIfname);
+
+    return virFirewallApply(fw);
+}
+
+static int
+nftablesTeardownOldRules(const char *ifname)
+{
+    g_autoptr(virFirewall) fw = virFirewallNew(VIR_FIREWALL_BACKEND_NFTABLES);
+    virFirewallStartTransaction(fw, 0);
+
+    /* remove old interface chains and rules */
+    nftablesRemoveAllInterfaceChains(fw, ifname);
+
+    /* rename new temp interface chains and rules */
+    nftablesRenameAllInterfaceChains(fw, ifname);
+
+    return virFirewallApply(fw);
+}
+
+/**
+ * nftablesAllTeardown:
+ * @ifname : the name of the interface to which the rules apply
+ *
+ * Unconditionally remove all possible user defined tables and rules
+ * that were created for the given interface (ifname).
+ *
+ * Returns 0 on success, -1 on OOM
+ */
+static int
+nftablesAllTeardown(const char *ifname)
+{
+    g_autoptr(virFirewall) fw = virFirewallNew(VIR_FIREWALL_BACKEND_NFTABLES);
+    virFirewallStartTransaction(fw, 0);
+
+    /* remove interface chains and rules */
+    nftablesRemoveAllInterfaceChains(fw, ifname);
+
+    return virFirewallApply(fw);
+}
+
+/**
+ * nftablesCanApplyBasicRules
+ *
+ * Determine whether this driver can apply the basic rules, meaning
+ * run nftablesApplyBasicRules and nftablesApplyDHCPOnlyRules.
+ * In case of this driver we need the nft tool available.
+ */
+static bool nftablesCanApplyBasicRules(void)
+{
+    return true;
+}
+
+/**
+ * nftablesApplyBasicRules
+ *
+ * @ifname: name of the backend-interface to which to apply the rules
+ * @macaddr: MAC address the VM is using in packets sent through the
+ *    interface
+ *
+ * Returns 0 on success, -1 on failure with the rules removed
+ *
+ * Apply basic filtering rules on the given interface
+ * - filtering for MAC address spoofing
+ * - allowing IPv4 & ARP traffic
+ */
+static int
+nftablesApplyBasicRules(const char *ifname,
+                        const virMacAddr *macaddr)
+{
+    g_autoptr(virFirewall) fw = virFirewallNew(VIR_FIREWALL_BACKEND_NFTABLES);
+    char macaddr_str[VIR_MAC_STRING_BUFLEN];
+    char rootChainIn[MAX_NF_CHAINNAME_LENGTH];
+    char rootChainOut[MAX_NF_CHAINNAME_LENGTH];
+
+    virMacAddrFormat(macaddr, macaddr_str);
+
+    if (nftablesEnsureRootTablesExist() < 0)
+        return -1;
+
+    if (nftablesAllTeardown(ifname) < 0)
+        return -1;
+
+    virFirewallStartTransaction(fw, 0);
+
+    /* create root chain */
+    g_snprintf(rootChainIn, sizeof(rootChainIn), "%s-in", ifname);
+    g_snprintf(rootChainOut, sizeof(rootChainOut), "%s-out", ifname);
+    nftablesCreateRootChain(fw, VIR_FIREWALL_LAYER_ETHERNET, rootChainIn);
+    nftablesCreateRootChain(fw, VIR_FIREWALL_LAYER_ETHERNET, rootChainOut);
+
+
+    /* apply rules to root chain */
+    virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET, "add", "rule", "bridge",
+                      NF_ETHERNET_TABLE, rootChainOut, "ether", "saddr",
+                      "!=", macaddr_str, "drop", NULL);
+    virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET, "add", "rule", "bridge",
+                      NF_ETHERNET_TABLE, rootChainOut, "ether", "type", "ip",
+                      "accept", NULL);
+    virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET, "add", "rule", "bridge",
+                      NF_ETHERNET_TABLE, rootChainOut, "ether", "type", "arp",
+                      "accept", NULL);
+    virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET, "add", "rule", "bridge",
+                      NF_ETHERNET_TABLE, rootChainOut, "accept", NULL);
+
+    nftablesCreateRootChainJump(fw, VIR_FIREWALL_LAYER_ETHERNET, ifname,
+                                IN_IFMATCH, IN_CHAIN, rootChainIn, false);
+    nftablesCreateRootChainJump(fw, VIR_FIREWALL_LAYER_ETHERNET, ifname,
+                                OUT_IFMATCH, OUT_CHAIN, rootChainOut, false);
+
+    if (virFirewallApply(fw) < 0) {
+        nftablesAllTeardown(ifname);
+        return -1;
+    }
+
+    return 0;
+}
+
+/**
+ * nftablesApplyDHCPOnlyRules
+ *
+ * @ifname: name of the backend-interface to which to apply the rules
+ * @macaddr: MAC address the VM is using in packets sent through the
+ *    interface
+ * @dhcpsrvrs: The DHCP server(s) from which the VM may receive traffic;
+ *    may be NULL
+ * @leaveTemporary: Whether to leave the table names with their temporary
+ *    names (true) or also perform the renaming to their final names as
+ *    part of this call (false)
+ *
+ * Returns 0 on success, -1 on failure with the rules removed
+ *
+ * Apply filtering rules so that the VM can only send and receive
+ * DHCP traffic and nothing else.
+ */
+static int
+nftablesApplyDHCPOnlyRules(const char *ifname,
+                           const virMacAddr *macaddr,
+                           virNWFilterVarValue *dhcpsrvrs,
+                           bool leaveTemporary G_GNUC_UNUSED)
+{
+    char rootChainIn [MAX_NF_CHAINNAME_LENGTH],
+         rootChainOut[MAX_NF_CHAINNAME_LENGTH];
+    char macaddr_str[VIR_MAC_STRING_BUFLEN];
+    unsigned int idx = 0;
+    unsigned int num_dhcpsrvrs;
+    g_autoptr(virFirewall) fw = virFirewallNew(VIR_FIREWALL_BACKEND_NFTABLES);
+
+    virMacAddrFormat(macaddr, macaddr_str);
+
+    if (nftablesEnsureRootTablesExist() < 0)
+        return -1;
+
+    if (nftablesAllTeardown(ifname) < 0)
+        return -1;
+
+    virFirewallStartTransaction(fw, 0);
+
+    /* create root chain */
+    g_snprintf(rootChainIn, sizeof(rootChainIn), "%s-in", ifname);
+    g_snprintf(rootChainOut, sizeof(rootChainOut), "%s-out", ifname);
+    nftablesCreateRootChain(fw, VIR_FIREWALL_LAYER_ETHERNET, rootChainIn);
+    nftablesCreateRootChain(fw, VIR_FIREWALL_LAYER_ETHERNET, rootChainOut);
+
+    virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET, "add", "rule", "bridge",
+                      NF_ETHERNET_TABLE, rootChainOut, "ether", "saddr",
+                      macaddr_str, "ether", "type", "ip",
+                      "udp", "sport", "68", "udp", "dport", "67", "accept", NULL);
+
+    virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET, "add", "rule", "bridge",
+                      NF_ETHERNET_TABLE, rootChainOut, "drop", NULL);
+
+    num_dhcpsrvrs = (dhcpsrvrs != NULL)
+                    ? virNWFilterVarValueGetCardinality(dhcpsrvrs)
+                    : 0;
+
+    while (true) {
+        const char *dhcpserver = NULL;
+        int ctr;
+
+        if (idx < num_dhcpsrvrs)
+            dhcpserver = virNWFilterVarValueGetNthValue(dhcpsrvrs, idx);
+
+        /*
+         * create two rules allowing response to MAC address of VM
+         * or to broadcast MAC address
+         */
+        for (ctr = 0; ctr < 2; ctr++) {
+            if (dhcpserver)
+                virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET,
+                                  "add", "rule", "bridge",
+                                  NF_ETHERNET_TABLE, rootChainIn, "ether",
+                                  "daddr",
+                                  (ctr == 0) ? macaddr_str : "ff:ff:ff:ff:ff:ff",
+                                  "ether", "type", "ip",
+                                  "ip", "saddr", dhcpserver,
+                                  "udp", "sport", "67",
+                                  "udp", "dport", "68", "accept", NULL);
+            else
+                virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET,
+                                  "add", "rule", "bridge",
+                                  NF_ETHERNET_TABLE, rootChainIn, "ether",
+                                  "daddr",
+                                  (ctr == 0) ? macaddr_str : "ff:ff:ff:ff:ff:ff",
+                                  "ether", "type", "ip",
+                                  "udp", "sport", "67",
+                                  "udp", "dport", "68", "accept", NULL);
+        }
+
+        idx++;
+
+        if (idx >= num_dhcpsrvrs)
+            break;
+    }
+
+    virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET, "add", "rule", "bridge",
+                      NF_ETHERNET_TABLE, rootChainIn, "drop", NULL);
+
+    nftablesCreateRootChainJump(fw, VIR_FIREWALL_LAYER_ETHERNET, ifname,
+                                IN_IFMATCH, IN_CHAIN, rootChainIn, false);
+    nftablesCreateRootChainJump(fw, VIR_FIREWALL_LAYER_ETHERNET, ifname,
+                                OUT_IFMATCH, OUT_CHAIN, rootChainOut, false);
+
+    if (virFirewallApply(fw) < 0) {
+        nftablesAllTeardown(ifname);
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nftablesRemoveBasicRules(const char *ifname)
+{
+    return nftablesAllTeardown(ifname);
+}
+
+/**
+ * nftablesDropAllRules
+ *
+ * @ifname: name of the backend-interface to which to apply the rules
+ *
+ * Returns 0 on success, -1 on failure with the rules removed
+ *
+ * Apply filtering rules so that the VM cannot receive or send traffic.
+ */
+static int
+nftablesDropAllRules(const char *ifname)
+{
+    char rootChainIn [MAX_NF_CHAINNAME_LENGTH],
+         rootChainOut[MAX_NF_CHAINNAME_LENGTH];
+    g_autoptr(virFirewall) fw = virFirewallNew(VIR_FIREWALL_BACKEND_NFTABLES);
+
+    if (nftablesAllTeardown(ifname) < 0)
+        return -1;
+
+    virFirewallStartTransaction(fw, 0);
+
+    /* create root tables if they don't exist already */
+    nftablesCreateRootTables(fw);
+
+    /* create root chain */
+    g_snprintf(rootChainIn, sizeof(rootChainIn), "%s-in", ifname);
+    g_snprintf(rootChainOut, sizeof(rootChainOut), "%s-out", ifname);
+    nftablesCreateRootChain(fw, VIR_FIREWALL_LAYER_ETHERNET, rootChainIn);
+    nftablesCreateRootChain(fw, VIR_FIREWALL_LAYER_ETHERNET, rootChainOut);
+
+    virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET, "add", "rule", "bridge",
+                      NF_ETHERNET_TABLE, rootChainOut, "drop", NULL);
+    virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET, "add", "rule", "bridge",
+                      NF_ETHERNET_TABLE, rootChainIn, "drop", NULL);
+
+    /* tmp iif oif jump */
+    virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET, "add", "rule", "bridge",
+                      NF_ETHERNET_TABLE, IN_CHAIN, IN_IFNAMEMATCH, ifname,
+                      "jump", rootChainIn, NULL);
+    virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_ETHERNET, "add", "rule", "bridge",
+                      NF_ETHERNET_TABLE, OUT_CHAIN, OUT_IFNAMEMATCH, ifname,
+                      "jump", rootChainOut, NULL);
+
+    if (virFirewallApply(fw) < 0) {
+        nftablesAllTeardown(ifname);
+        return -1;
+    }
+
+    return 0;
+}
+
+static int
+nftablesDriverInit(bool privileged)
+{
+    if (!privileged)
+        return 0;
+
+    nftables_driver.flags = TECHDRV_FLAG_INITIALIZED;
+
+    return 0;
+}
+
+static void
+nftablesDriverShutdown(void)
+{
+    nftables_driver.flags = 0;
+}
+
+virNWFilterTechDriver nftables_driver = {
+    .name = NFTABLES_DRIVER_ID,
+    .flags = 0,
+
+    .init = nftablesDriverInit,
+    .shutdown = nftablesDriverShutdown,
+
+    .applyNewRules = nftablesApplyNewRules,
+    .tearNewRules = nftablesTeardownNewRules,
+    .tearOldRules = nftablesTeardownOldRules,
+    .allTeardown = nftablesAllTeardown,
+
+    .canApplyBasicRules = nftablesCanApplyBasicRules,
+    .applyBasicRules = nftablesApplyBasicRules,
+    .applyDHCPOnlyRules = nftablesApplyDHCPOnlyRules,
+    .applyDropAllRules = nftablesDropAllRules,
+    .removeBasicRules = nftablesRemoveBasicRules,
+};
diff --git a/src/nwfilter/nwfilter_nftables_driver.h b/src/nwfilter/nwfilter_nftables_driver.h
new file mode 100644
index 0000000000..a767413208
--- /dev/null
+++ b/src/nwfilter/nwfilter_nftables_driver.h
@@ -0,0 +1,28 @@
+/*
+ * nwfilter_nftables_driver.h: nftables driver support
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library.  If not, see
+ * .
+ */
+
+#pragma once
+
+#include "nwfilter_tech_driver.h"
+
+extern virNWFilterTechDriver nftables_driver;
+
+#define NFTABLES_DRIVER_ID "nftables"
+
+/* see source/include/uapi/linux/netfilter/nf_tables.h */
+#define MAX_NF_CHAINNAME_LENGTH 256
-- 
2.43.0
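
Illustration only, not part of the patch: the ordering described in the "Root chain ordering is special" comment can be sketched in isolation. The standalone C fragment below is a simplified model that assumes the ethernet-style comparison (rule->priority vs chain->priority) and inputs already sorted by ascending priority. The names sketch_rule, sketch_chain and sketch_interleave are hypothetical and do not exist in libvirt; the real code also tracks lastProcessedRootRuleIndex and emits nft commands rather than characters.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the patch's rule and chain data. */
struct sketch_rule { int priority; };
struct sketch_chain { int priority; };

/*
 * Interleave root rules with subchain creation (and its jump).
 * Both arrays must already be sorted by ascending priority.
 * A root rule is emitted before a chain only while the rule's priority
 * is strictly lower than that chain's priority; root rules that are left
 * over are emitted after the last chain.
 *
 * Writes 'R' per root rule and 'C' per subchain into out and
 * NUL-terminates it. Returns the number of characters written,
 * or 0 if out is too small.
 */
static size_t
sketch_interleave(const struct sketch_rule *rules, size_t nrules,
                  const struct sketch_chain *chains, size_t nchains,
                  char *out, size_t outlen)
{
    size_t r = 0;
    size_t c;
    size_t n = 0;

    if (outlen < nrules + nchains + 1)
        return 0;

    for (c = 0; c < nchains; c++) {
        /* root rules that must precede this subchain */
        while (r < nrules && rules[r].priority < chains[c].priority) {
            out[n++] = 'R';
            r++;
        }
        /* the subchain and its jump from the root chain */
        out[n++] = 'C';
    }

    /* remaining root rules come after all subchain jumps */
    while (r < nrules) {
        out[n++] = 'R';
        r++;
    }

    out[n] = '\0';
    return n;
}
```

With root rules at priorities -600, -400 and 100 and subchains at -500 and 0, this model yields the sequence rule, chain, rule, chain, rule. For inet rules the patch compares against chain->lowestRulePriority instead, which this sketch does not cover.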