From nobody Sun Apr 28 13:35:41 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; envelope-from=libvir-list-bounces@redhat.com; helo=mx1.redhat.com; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=libvir-list-bounces@redhat.com Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by mx.zohomail.com with SMTPS id 1524847520408614.5037062928147; Fri, 27 Apr 2018 09:45:20 -0700 (PDT) Received: from smtp.corp.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.27]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 62DF14900F; Fri, 27 Apr 2018 16:45:18 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 7C49B696D; Fri, 27 Apr 2018 16:45:17 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id 5A0571805971; Fri, 27 Apr 2018 16:45:16 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id w3RGiOnS014956 for ; Fri, 27 Apr 2018 12:44:25 -0400 Received: by smtp.corp.redhat.com (Postfix) id 7F752215CDCF; Fri, 27 Apr 2018 16:44:24 +0000 (UTC) Received: from vhost2.laine.org (ovpn-116-213.phx2.redhat.com [10.3.116.213]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9A0FA215CDCB; Fri, 27 Apr 2018 16:44:23 +0000 (UTC) From: Laine Stump To: libvir-list@redhat.com Date: Fri, 27 Apr 2018 12:44:19 -0400 Message-Id: <20180427164419.549123-1-laine@laine.org> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-loop: libvir-list@redhat.com Cc: Christian Ehrhardt , Christian Ehrhardt Subject: [libvirt] [PATCH v2] nwfilter: increase pcap buffer size to be compatible with TPACKET_V3 X-BeenThere: libvir-list@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: Development discussions about the libvirt library & tools List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: libvir-list-bounces@redhat.com Errors-To: libvir-list-bounces@redhat.com X-Scanned-By: MIMEDefang 2.84 on 10.5.11.27 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Fri, 27 Apr 2018 16:45:19 +0000 (UTC) X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Type: text/plain; charset="utf-8" When an nwfilter rule sets the parameter CTRL_IP_LEARNING to "dhcp", this turns on the "dhcpsnoop" thread, which uses libpcap to monitor traffic on the domain's tap device and extract the IP address from the DHCP response. If libpcap on the host is built with HAVE_TPACKET3 defined (to enable support for TPACKET_V3), the dhcpsnoop code's initialization of the libpcap socket would fail with the following error: virNWFilterSnoopDHCPOpen:1134 : internal error: pcap_setfilter: can't rem= ove kernel filter: Bad file descriptor It turns out that this was because TPACKET_V3 requires a larger buffer size than libvirt was setting (we were setting it to 128k). Changing the buffer size to 256k eliminates the error, and the dhcpsnoop thread once again works properly. A fuller explanation of why TPACKET_V3 requires such a large buffer, for future git spelunkers: libpcap calls setsockopt(... SOL_PACKET, PACKET_RX_RING...) to setup a ring buffer for receiving packets; two of the attributes sent to this API are called tp_frame_size, and tp_frame_nr. If libpcap was built with HAVE_TPACKET3 defined, tp_trame_size is set to MAXIMUM_SNAPLEN (defined in libpcap sources as 262144) and tp_frame_nr is set to: [the buffer size we set, i.e. PCAP_BUFFERSIZE i.e. 262144] / tp_frame_size. So if PCAP_BUFFERSIZE < MAXIMUM_SNAPLEN, then tp_frame_nr (the number of frames in the ring buffer) is 0, which is nonsensical. This same value is later used as a multiplier to determine the size for a call to malloc() (which would also fail). (NB: if HAVE_TPACKET3 is *not* defined, then tp_frame_size is set to the snaplen set by the user (in our case 576) plus a small amount to account for ethernet headers, so 256k is far more than adequate) Since the TPACKET_V3 code in libpcap actually reads multiple packets into each frame, it's not a problem to have only a single frame (especially when we are monitoring such infrequent traffic), so it's okay to set this relatively small buffer size (in comparison to the default, which is 2MB), which is important since every guest using dhcp snooping in a nwfilter rule will hold 2 of these buffers for the entire life of the guest. Thanks to Christian Ehrhardt for discovering that buffer size was the problem (this was not at all obvious from the error that was logged!) Resolves: https://bugzilla.redhat.com/1547237 Fixes: https://bugs.launchpad.net/libvirt/+bug/1758037 Signed-off-by: Laine Stump Reviewed-by: John Ferlan --- change from V1: you asked for "big comment", you got it! If you think this is overkill, or underkill, let me know. I decided that the details about proper buffersize were better sitting right next to the definition of PCAP_BUFFERSIZE, but that there should also be a comment down in the code where any potential future error would occur, as that would more easily catch the eye of the hapless keyboard jockey who gets to rediscover this problem the next time libpcap changes. Also, I'm planning to request that the libpcap developers check for a viable buffersize during the pcap_set_buffer_size() call so that in the future this error is reported immediately rather than just having some random-seeming failure at a later time when pcap_setfilter() is called. src/nwfilter/nwfilter_dhcpsnoop.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/src/nwfilter/nwfilter_dhcpsnoop.c b/src/nwfilter/nwfilter_dhcp= snoop.c index 6069e70460..50cfb944a2 100644 --- a/src/nwfilter/nwfilter_dhcpsnoop.c +++ b/src/nwfilter/nwfilter_dhcpsnoop.c @@ -256,10 +256,21 @@ struct _virNWFilterDHCPDecodeJob { # define DHCP_BURST_INTERVAL_S 10 /* sec */ =20 /* - * libpcap 1.5 requires a 128kb buffer - * 128 kb is bigger than (DHCP_PKT_BURST * PCAP_PBUFSIZE / 2) + * NB: Any libpcap built with HAVE_TPACKET3 will require + * PCAP_BUFFERSIZE to be at least 262144 (although + * pcap_set_buffer_size() with a lower value will succeed, and the + * error will only show up later when pcap_setfilter() is called). + * + * It is possible that in the future libpcap could increase the + * minimum size even further, but due to the fact that each guest + * using dhcp snooping keeps 2 pcap sockets open (and thus 2 buffers + * allocated) for the life of the guest, we want to minimize the + * length of the buffer, so instead of leaving it at the default size + * (2MB), we are setting it to the minimum viable size and including + * this clue in the source to help quickly resolve the problem when/if + * it reoccurs. */ -# define PCAP_BUFFERSIZE (128 * 1024) +# define PCAP_BUFFERSIZE (256 * 1024) =20 # define MAX_QUEUED_JOBS (DHCP_PKT_BURST + 2 * DHCP_PKT_RATE) =20 @@ -1114,6 +1125,11 @@ virNWFilterSnoopDHCPOpen(const char *ifname, virMacA= ddr *mac, goto cleanup_nohandle; } =20 + /* IMPORTANT: If there is any failure of *any* pcap_* function + * during setup of the socket, look to the comment where + * PCAP_BUFFERSIZE is defined. It may be too small, even if the + * generated error doesn't imply that. + */ if (pcap_set_snaplen(handle, PCAP_PBUFSIZE) < 0 || pcap_set_buffer_size(handle, PCAP_BUFFERSIZE) < 0 || pcap_activate(handle) < 0) { --=20 2.14.3 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list