From nobody Fri Dec 27 01:13:47 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.libvirt.org designates 8.43.85.245 as permitted sender) client-ip=8.43.85.245; envelope-from=devel-bounces@lists.libvirt.org; helo=lists.libvirt.org; Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of lists.libvirt.org designates 8.43.85.245 as permitted sender) smtp.mailfrom=devel-bounces@lists.libvirt.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.libvirt.org (lists.libvirt.org [8.43.85.245]) by mx.zohomail.com with SMTPS id 173259169853949.675737832728714; Mon, 25 Nov 2024 19:28:18 -0800 (PST) Received: by lists.libvirt.org (Postfix, from userid 996) id D620F18D7; Mon, 25 Nov 2024 22:28:17 -0500 (EST) Received: from lists.libvirt.org (localhost [IPv6:::1]) by lists.libvirt.org (Postfix) with ESMTP id 7E8A018AD; Mon, 25 Nov 2024 22:25:22 -0500 (EST) Received: by lists.libvirt.org (Postfix, from userid 996) id 046FA19AE; Mon, 25 Nov 2024 22:25:17 -0500 (EST) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.libvirt.org (Postfix) with ESMTPS id 1C2D419AE for ; Mon, 25 Nov 2024 22:25:03 -0500 (EST) Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-223-LsRltaMxMfSVT66hjxfJng-1; Mon, 25 Nov 2024 22:25:00 -0500 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id D573D1955EEA for ; Tue, 26 Nov 2024 03:24:59 +0000 (UTC) Received: from vhost3.router.laine.org (unknown [10.22.88.88]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id B3567195E483; Tue, 26 Nov 2024 03:24:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on lists.libvirt.org X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,RCVD_IN_VALIDITY_RPBL_BLOCKED, RCVD_IN_VALIDITY_SAFE_BLOCKED,SPF_HELO_NONE autolearn=unavailable autolearn_force=no version=3.4.4 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1732591502; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=saXCzlzOfPMXlk/Z/H1ZnD2QLWMiNMv4ACOZHtCjRZc=; b=HpLqFX5nc/O40MU30X74Gv8KKhD3SqZZ1SO+D9p/uDVkXGnYV1NxCdcNpHrF2TxxOF0naA zRus49OicMxRwgre6/CBaKjG+rydmq1XjjClbq94oFyWcwwTRgjSs9ZwoEOb8aFBieHIEJ k63HxEIvhfyrw6okqMBSZ3e1vULrES4= X-MC-Unique: LsRltaMxMfSVT66hjxfJng-1 X-Mimecast-MFC-AGG-ID: LsRltaMxMfSVT66hjxfJng From: Laine Stump To: devel@lists.libvirt.org Subject: [PATCH v2 6/6] network: add tc filter rule to nftables backend to fix checksum of DHCP responses Date: Mon, 25 Nov 2024 22:24:49 -0500 Message-ID: <20241126032449.912167-7-laine@redhat.com> In-Reply-To: <20241126032449.912167-1-laine@redhat.com> References: <20241126032449.912167-1-laine@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: SgQWnIcarprDI9CGSOsQ8pXcfAkEDGrXT5crf08nK40_1732591499 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Message-ID-Hash: VSHVSF6CKYYTCVOCK3CN7OILJ2QCITDA X-Message-ID-Hash: VSHVSF6CKYYTCVOCK3CN7OILJ2QCITDA X-MailFrom: laine@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-config-1; header-match-config-2; header-match-config-3; header-match-devel.lists.libvirt.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: egarver@redhat.com, mprivozn@redhat.com, psutter@redhat.com, abologna@redhat.com X-Mailman-Version: 3.2.2 Precedence: list List-Id: Development discussions about the libvirt library & tools Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1732591703370019100 Content-Type: text/plain; charset="utf-8"; x-default="true" Please see the commit log for commit v10.9.0-rc1-1-g42ab0148dd for the history and explanation of the problem that this patch is fixing. A shorter explanation is that when a guest is connected to a libvirt virtual network using a virtio-net adapter with in-kernel "vhost-net" packet processing enabled, it will fail to acquire an IP address from a DHCP seever running on the host. In commit v10.9.0-rc1-1-g42ab0148dd we tried fixing this by *zeroing out* the checksums of these packets with an nftables rule (nftables can't recompute the checksum, but it can set it to 0) . This *appeared* to work initially, but it turned out that zeroing the checksum ends up breaking dhcp packets on *non* virtio/vhost-net guest interfaces. That attempt was reverted in commit v10.9.0-rc2. Fortunately, there is an existing way to recompute the checksum of a packet as it leaves an interface - the "tc" (traffic control) utility that libvirt already uses for bandwidth management. This patch uses a tc filter rule to match dhcp response packets on the bridge and recompute their checksum. The filter rule must be attached to a tc qdisc, which may also have a filter attached for bandwidth management (in the element of the network config). Not only must we add the qdisc only once (which was already handled by the patch two prior to this one), but also the filter rule for checksum fixing and the filter rule for bandwidth management must be different priorities so they don't clash; this is solved by adding the checksum-fix filter with "priority 2", while the bandwidth management filter remains "priority 1" (both will always be evaluated anyway, it's just a matter of which is evaluated first). So far this method has worked with every different guest we could throw at it, including several that failed with the previous method. Fixes: b89c4991daa0ee9371f10937fab3b03c5ffdabc6 Reported-by: Rich Jones Reported-by: Andrea Bolognani Fix-Suggested-by: Eric Garver Fix-Suggested-by: Phil Sutter Signed-off-by: Laine Stump --- src/network/network_nftables.c | 68 +++++++++++++++++++ .../forward-dev-linux.nftables | 40 +++++++++++ .../isolated-linux.nftables | 40 +++++++++++ .../nat-default-linux.nftables | 40 +++++++++++ .../nat-ipv6-linux.nftables | 40 +++++++++++ .../nat-ipv6-masquerade-linux.nftables | 40 +++++++++++ .../nat-many-ips-linux.nftables | 40 +++++++++++ .../nat-no-dhcp-linux.nftables | 40 +++++++++++ .../nat-port-range-ipv6-linux.nftables | 40 +++++++++++ .../nat-port-range-linux.nftables | 40 +++++++++++ .../nat-tftp-linux.nftables | 40 +++++++++++ .../route-default-linux.nftables | 40 +++++++++++ 12 files changed, 508 insertions(+) diff --git a/src/network/network_nftables.c b/src/network/network_nftables.c index b3605bd40e..5d716264bf 100644 --- a/src/network/network_nftables.c +++ b/src/network/network_nftables.c @@ -29,6 +29,7 @@ =20 #include "internal.h" #include "virfirewalld.h" +#include "vircommand.h" #include "virerror.h" #include "virlog.h" #include "virhash.h" @@ -924,6 +925,67 @@ nftablesAddIPSpecificFirewallRules(virFirewall *fw, } =20 =20 +/** + * nftablesAddUdpChecksumFixWithTC: + * + * Add a tc filter rule to @ifname (the bridge device of this network) + * that will recompute the checksum of udp packets output from @iface with + * destination port @port. + * + * Normally the checksum should be filled by some part of the basic + * network stack, but there are cases (e.g. DHCP response packets sent + * from virtualization host to a QEMU guest when the guest NIC uses + * vhost-net packet processing) when the host (sender) thinks that + * packet checksums will be computed elsewhere (and so leaves a + * partially computed checksum in the packet header) while the guest + * (receiver) thinks that the checksum has already been fully + * computed; in the meantime none of the code in between has actually + * finished computing the checksum. + * + * An example of this is DHCP response packets from host to guest. If + * the checksum of each of these packets isn't properly computed, then + * many guests (e.g. FreeBSD) will drop them with reason BAD CHECKSUM; + * this tc filter rule will fix the ip and udp checksums, and the + * FreeBSD dhcp client will happily accept the packet. + * + * (NB: if you're wondering how the tc qdisc and filter are removed + * when the network is destroyed, the answer is that the kernel + * automatically (and properly) removes them for us, so we don't need + * to worry about keeping track/deleting as we do with nftables rules) + */ +static int +nftablesAddUdpChecksumFixWithTC(virFirewall *fw, + const char *iface, + int port) +{ + g_autofree char *portstr =3D g_strdup_printf("%d", port); + + /* this will add the qdisc (that the filter below is attached to) + * unless it already exists + */ + if (virNetDevBandWidthAddTxFilterParentQdisc(iface, true) < 0) + return -1; + + /* add a filter to catch all udp packets with dst "port" and + * recompute their checksum + */ + virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_TC, + "filter", "add", "dev", iface, + "prio", "2", "protocol", "ip", "parent", "1:", + "u32", "match", "ip", "dport", portstr, "ffff", + "action", "csum", "ip", "and", "udp", + NULL); + + virFirewallAddRollbackCmd(fw, VIR_FIREWALL_LAYER_TC, + "filter", "del", "dev", iface, + "prio", "2", "protocol", "ip", "parent", "1:= ", + "u32", "match", "ip", "dport", portstr, "fff= f", + "action", "csum", "ip", "and", "udp", + NULL); + return 0; +} + + /* nftablesAddFirewallrules: * * @def - the network that needs an nftables firewall added @@ -944,6 +1006,12 @@ nftablesAddFirewallRules(virNetworkDef *def, virFirew= all **fwRemoval) =20 virFirewallStartTransaction(fw, VIR_FIREWALL_TRANSACTION_AUTO_ROLLBACK= ); =20 + /* add the tc filter rule needed to fixup the checksum of dhcp + * response packets going from host to guest. + */ + if (nftablesAddUdpChecksumFixWithTC(fw, def->bridge, 68) < 0) + return -1; + nftablesAddGeneralFirewallRules(fw, def); =20 for (i =3D 0; diff --git a/tests/networkxml2firewalldata/forward-dev-linux.nftables b/tes= ts/networkxml2firewalldata/forward-dev-linux.nftables index 8badb74beb..6772383b37 100644 --- a/tests/networkxml2firewalldata/forward-dev-linux.nftables +++ b/tests/networkxml2firewalldata/forward-dev-linux.nftables @@ -1,3 +1,43 @@ +tc \ +qdisc \ +show \ +dev \ +virbr0 \ +handle \ +1: +tc \ +qdisc \ +add \ +dev \ +virbr0 \ +root \ +handle \ +1: \ +htb \ +default \ +2 +tc \ +filter \ +add \ +dev \ +virbr0 \ +prio \ +2 \ +protocol \ +ip \ +parent \ +1: \ +u32 \ +match \ +ip \ +dport \ +68 \ +ffff \ +action \ +csum \ +ip \ +and \ +udp nft \ -ae insert \ rule \ diff --git a/tests/networkxml2firewalldata/isolated-linux.nftables b/tests/= networkxml2firewalldata/isolated-linux.nftables index d1b4dac178..546a18b75a 100644 --- a/tests/networkxml2firewalldata/isolated-linux.nftables +++ b/tests/networkxml2firewalldata/isolated-linux.nftables @@ -1,3 +1,43 @@ +tc \ +qdisc \ +show \ +dev \ +virbr0 \ +handle \ +1: +tc \ +qdisc \ +add \ +dev \ +virbr0 \ +root \ +handle \ +1: \ +htb \ +default \ +2 +tc \ +filter \ +add \ +dev \ +virbr0 \ +prio \ +2 \ +protocol \ +ip \ +parent \ +1: \ +u32 \ +match \ +ip \ +dport \ +68 \ +ffff \ +action \ +csum \ +ip \ +and \ +udp nft \ -ae insert \ rule \ diff --git a/tests/networkxml2firewalldata/nat-default-linux.nftables b/tes= ts/networkxml2firewalldata/nat-default-linux.nftables index 28508292f9..08623c1381 100644 --- a/tests/networkxml2firewalldata/nat-default-linux.nftables +++ b/tests/networkxml2firewalldata/nat-default-linux.nftables @@ -1,3 +1,43 @@ +tc \ +qdisc \ +show \ +dev \ +virbr0 \ +handle \ +1: +tc \ +qdisc \ +add \ +dev \ +virbr0 \ +root \ +handle \ +1: \ +htb \ +default \ +2 +tc \ +filter \ +add \ +dev \ +virbr0 \ +prio \ +2 \ +protocol \ +ip \ +parent \ +1: \ +u32 \ +match \ +ip \ +dport \ +68 \ +ffff \ +action \ +csum \ +ip \ +and \ +udp nft \ -ae insert \ rule \ diff --git a/tests/networkxml2firewalldata/nat-ipv6-linux.nftables b/tests/= networkxml2firewalldata/nat-ipv6-linux.nftables index d8a9ba706d..3fd6b94eef 100644 --- a/tests/networkxml2firewalldata/nat-ipv6-linux.nftables +++ b/tests/networkxml2firewalldata/nat-ipv6-linux.nftables @@ -1,3 +1,43 @@ +tc \ +qdisc \ +show \ +dev \ +virbr0 \ +handle \ +1: +tc \ +qdisc \ +add \ +dev \ +virbr0 \ +root \ +handle \ +1: \ +htb \ +default \ +2 +tc \ +filter \ +add \ +dev \ +virbr0 \ +prio \ +2 \ +protocol \ +ip \ +parent \ +1: \ +u32 \ +match \ +ip \ +dport \ +68 \ +ffff \ +action \ +csum \ +ip \ +and \ +udp nft \ -ae insert \ rule \ diff --git a/tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.nftabl= es b/tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.nftables index a7f09cda59..2811e098d1 100644 --- a/tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.nftables +++ b/tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.nftables @@ -1,3 +1,43 @@ +tc \ +qdisc \ +show \ +dev \ +virbr0 \ +handle \ +1: +tc \ +qdisc \ +add \ +dev \ +virbr0 \ +root \ +handle \ +1: \ +htb \ +default \ +2 +tc \ +filter \ +add \ +dev \ +virbr0 \ +prio \ +2 \ +protocol \ +ip \ +parent \ +1: \ +u32 \ +match \ +ip \ +dport \ +68 \ +ffff \ +action \ +csum \ +ip \ +and \ +udp nft \ -ae insert \ rule \ diff --git a/tests/networkxml2firewalldata/nat-many-ips-linux.nftables b/te= sts/networkxml2firewalldata/nat-many-ips-linux.nftables index b826fe6134..5409d5b552 100644 --- a/tests/networkxml2firewalldata/nat-many-ips-linux.nftables +++ b/tests/networkxml2firewalldata/nat-many-ips-linux.nftables @@ -1,3 +1,43 @@ +tc \ +qdisc \ +show \ +dev \ +virbr0 \ +handle \ +1: +tc \ +qdisc \ +add \ +dev \ +virbr0 \ +root \ +handle \ +1: \ +htb \ +default \ +2 +tc \ +filter \ +add \ +dev \ +virbr0 \ +prio \ +2 \ +protocol \ +ip \ +parent \ +1: \ +u32 \ +match \ +ip \ +dport \ +68 \ +ffff \ +action \ +csum \ +ip \ +and \ +udp nft \ -ae insert \ rule \ diff --git a/tests/networkxml2firewalldata/nat-no-dhcp-linux.nftables b/tes= ts/networkxml2firewalldata/nat-no-dhcp-linux.nftables index d8a9ba706d..3fd6b94eef 100644 --- a/tests/networkxml2firewalldata/nat-no-dhcp-linux.nftables +++ b/tests/networkxml2firewalldata/nat-no-dhcp-linux.nftables @@ -1,3 +1,43 @@ +tc \ +qdisc \ +show \ +dev \ +virbr0 \ +handle \ +1: +tc \ +qdisc \ +add \ +dev \ +virbr0 \ +root \ +handle \ +1: \ +htb \ +default \ +2 +tc \ +filter \ +add \ +dev \ +virbr0 \ +prio \ +2 \ +protocol \ +ip \ +parent \ +1: \ +u32 \ +match \ +ip \ +dport \ +68 \ +ffff \ +action \ +csum \ +ip \ +and \ +udp nft \ -ae insert \ rule \ diff --git a/tests/networkxml2firewalldata/nat-port-range-ipv6-linux.nftabl= es b/tests/networkxml2firewalldata/nat-port-range-ipv6-linux.nftables index ceaed6fa40..d74417cdb3 100644 --- a/tests/networkxml2firewalldata/nat-port-range-ipv6-linux.nftables +++ b/tests/networkxml2firewalldata/nat-port-range-ipv6-linux.nftables @@ -1,3 +1,43 @@ +tc \ +qdisc \ +show \ +dev \ +virbr0 \ +handle \ +1: +tc \ +qdisc \ +add \ +dev \ +virbr0 \ +root \ +handle \ +1: \ +htb \ +default \ +2 +tc \ +filter \ +add \ +dev \ +virbr0 \ +prio \ +2 \ +protocol \ +ip \ +parent \ +1: \ +u32 \ +match \ +ip \ +dport \ +68 \ +ffff \ +action \ +csum \ +ip \ +and \ +udp nft \ -ae insert \ rule \ diff --git a/tests/networkxml2firewalldata/nat-port-range-linux.nftables b/= tests/networkxml2firewalldata/nat-port-range-linux.nftables index 1dc37a26ec..b55bb287a9 100644 --- a/tests/networkxml2firewalldata/nat-port-range-linux.nftables +++ b/tests/networkxml2firewalldata/nat-port-range-linux.nftables @@ -1,3 +1,43 @@ +tc \ +qdisc \ +show \ +dev \ +virbr0 \ +handle \ +1: +tc \ +qdisc \ +add \ +dev \ +virbr0 \ +root \ +handle \ +1: \ +htb \ +default \ +2 +tc \ +filter \ +add \ +dev \ +virbr0 \ +prio \ +2 \ +protocol \ +ip \ +parent \ +1: \ +u32 \ +match \ +ip \ +dport \ +68 \ +ffff \ +action \ +csum \ +ip \ +and \ +udp nft \ -ae insert \ rule \ diff --git a/tests/networkxml2firewalldata/nat-tftp-linux.nftables b/tests/= networkxml2firewalldata/nat-tftp-linux.nftables index 28508292f9..08623c1381 100644 --- a/tests/networkxml2firewalldata/nat-tftp-linux.nftables +++ b/tests/networkxml2firewalldata/nat-tftp-linux.nftables @@ -1,3 +1,43 @@ +tc \ +qdisc \ +show \ +dev \ +virbr0 \ +handle \ +1: +tc \ +qdisc \ +add \ +dev \ +virbr0 \ +root \ +handle \ +1: \ +htb \ +default \ +2 +tc \ +filter \ +add \ +dev \ +virbr0 \ +prio \ +2 \ +protocol \ +ip \ +parent \ +1: \ +u32 \ +match \ +ip \ +dport \ +68 \ +ffff \ +action \ +csum \ +ip \ +and \ +udp nft \ -ae insert \ rule \ diff --git a/tests/networkxml2firewalldata/route-default-linux.nftables b/t= ests/networkxml2firewalldata/route-default-linux.nftables index 282c9542a5..76d6902517 100644 --- a/tests/networkxml2firewalldata/route-default-linux.nftables +++ b/tests/networkxml2firewalldata/route-default-linux.nftables @@ -1,3 +1,43 @@ +tc \ +qdisc \ +show \ +dev \ +virbr0 \ +handle \ +1: +tc \ +qdisc \ +add \ +dev \ +virbr0 \ +root \ +handle \ +1: \ +htb \ +default \ +2 +tc \ +filter \ +add \ +dev \ +virbr0 \ +prio \ +2 \ +protocol \ +ip \ +parent \ +1: \ +u32 \ +match \ +ip \ +dport \ +68 \ +ffff \ +action \ +csum \ +ip \ +and \ +udp nft \ -ae insert \ rule \ --=20 2.47.0