Date: Wed, 3 Jul 2024 18:44:19 -0400
From: "Michael S. Tsirkin"
To: qemu-devel@nongnu.org
Cc: Peter Maydell, Si-Wei Liu, Joao Martins, Jason Wang
Subject: [PULL v3 02/85] vhost: Perform memory section dirty scans once per iteration

From: Si-Wei Liu

On setups with one or more virtio-net devices with vhost on, the cost of
each dirty-tracking iteration grows with the number of queues set up.
For example, during migration of an idle guest, the following is
observed with virtio-net and vhost=on:

48 queues           -> 78.11%  [.] vhost_dev_sync_region.isra.13
8 queues            -> 40.50%  [.] vhost_dev_sync_region.isra.13
1 queue             ->  6.89%  [.] vhost_dev_sync_region.isra.13
2 devices, 1 queue  -> 18.60%  [.] vhost_dev_sync_region.isra.14

With high memory dirtying rates, the symptom is a lack of convergence as
soon as there is a vhost device with a sufficiently high number of
queues, or a sufficient number of vhost devices.

On every migration iteration (every 100 ms), QEMU redundantly queries
the *shared log* as many times as there are vhost queues configured in
the guest. This is necessary for the virtqueue data, but not for the
memory sections, which are the same for every device. So essentially
we end up scanning the dirty log too often.

To fix that, select one vhost device to be responsible for scanning the
log for memory-section dirty tracking. It is selected when we enable
the logger (during migration) and cleared when we disable the logger.
If the vhost logger device goes away for some reason, the logger is
re-selected from the remaining vhost devices.

After making the mem-section logger a singleton instance, a constant
cost of 7%-9% (similar to the 1 queue report) is seen, no matter how
many queues or vhost devices are configured:

48 queues           -> 8.71%   [.] vhost_dev_sync_region.isra.13
2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14

Co-developed-by: Joao Martins
Signed-off-by: Joao Martins
Signed-off-by: Si-Wei Liu
Message-Id: <1710448055-11709-2-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
Acked-by: Jason Wang
---
 include/hw/virtio/vhost.h |  1 +
 hw/virtio/vhost.c         | 67 +++++++++++++++++++++++++++++++++++----
 2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 02477788df..d75faf46e9 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -129,6 +129,7 @@ struct vhost_dev {
     void *opaque;
     struct vhost_log *log;
     QLIST_ENTRY(vhost_dev) entry;
+    QLIST_ENTRY(vhost_dev) logdev_entry;
     QLIST_HEAD(, vhost_iommu) iommu_list;
     IOMMUNotifier n;
     const VhostDevConfigOps *config_ops;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index a1e8b79e1a..06fc71746e 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -45,6 +45,7 @@
 
 static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
 static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
+static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];
 
 /* Memslots used by backends that support private memslots (without an fd). */
 static unsigned int used_memslots;
@@ -149,6 +150,47 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
     }
 }
 
+static inline bool vhost_dev_should_log(struct vhost_dev *dev)
+{
+    assert(dev->vhost_ops);
+    assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE);
+    assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX);
+
+    return dev == QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);
+}
+
+static inline void vhost_dev_elect_mem_logger(struct vhost_dev *hdev, bool add)
+{
+    VhostBackendType backend_type;
+
+    assert(hdev->vhost_ops);
+
+    backend_type = hdev->vhost_ops->backend_type;
+    assert(backend_type > VHOST_BACKEND_TYPE_NONE);
+    assert(backend_type < VHOST_BACKEND_TYPE_MAX);
+
+    if (add && !QLIST_IS_INSERTED(hdev, logdev_entry)) {
+        if (QLIST_EMPTY(&vhost_log_devs[backend_type])) {
+            QLIST_INSERT_HEAD(&vhost_log_devs[backend_type],
+                              hdev, logdev_entry);
+        } else {
+            /*
+             * The first vhost_device in the list is selected as the shared
+             * logger to scan memory sections. Put new entry next to the head
+             * to avoid inadvertent change to the underlying logger device.
+             * This is done in order to get better cache locality and to avoid
+             * performance churn on the hot path for log scanning. Even when
+             * new devices come and go quickly, it wouldn't end up changing
+             * the active leading logger device at all.
+             */
+            QLIST_INSERT_AFTER(QLIST_FIRST(&vhost_log_devs[backend_type]),
+                               hdev, logdev_entry);
+        }
+    } else if (!add && QLIST_IS_INSERTED(hdev, logdev_entry)) {
+        QLIST_REMOVE(hdev, logdev_entry);
+    }
+}
+
 static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
                                    MemoryRegionSection *section,
                                    hwaddr first,
@@ -166,12 +208,14 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
     start_addr = MAX(first, start_addr);
     end_addr = MIN(last, end_addr);
 
-    for (i = 0; i < dev->mem->nregions; ++i) {
-        struct vhost_memory_region *reg = dev->mem->regions + i;
-        vhost_dev_sync_region(dev, section, start_addr, end_addr,
-                              reg->guest_phys_addr,
-                              range_get_last(reg->guest_phys_addr,
-                                             reg->memory_size));
+    if (vhost_dev_should_log(dev)) {
+        for (i = 0; i < dev->mem->nregions; ++i) {
+            struct vhost_memory_region *reg = dev->mem->regions + i;
+            vhost_dev_sync_region(dev, section, start_addr, end_addr,
+                                  reg->guest_phys_addr,
+                                  range_get_last(reg->guest_phys_addr,
+                                                 reg->memory_size));
+        }
     }
     for (i = 0; i < dev->nvqs; ++i) {
         struct vhost_virtqueue *vq = dev->vqs + i;
@@ -383,6 +427,7 @@ static void vhost_log_put(struct vhost_dev *dev, bool sync)
         g_free(log);
     }
 
+    vhost_dev_elect_mem_logger(dev, false);
     dev->log = NULL;
     dev->log_size = 0;
 }
@@ -998,6 +1043,15 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
             goto err_vq;
         }
     }
+
+    /*
+     * At log start we select our vhost_device logger that will scan the
+     * memory sections and skip for the others. This is possible because
+     * the log is shared amongst all vhost devices for a given type of
+     * backend.
+     */
+    vhost_dev_elect_mem_logger(dev, enable_log);
+
     return 0;
 err_vq:
     for (; i >= 0; --i) {
@@ -2075,6 +2129,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
             VHOST_OPS_DEBUG(r, "vhost_set_log_base failed");
             goto fail_log;
         }
+        vhost_dev_elect_mem_logger(hdev, true);
     }
     if (vrings) {
         r = vhost_dev_set_vring_enable(hdev, true);
-- 
MST
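
The logger election described in the commit message (the list head is the memory-section logger, later devices are inserted right behind the head, and removing the head promotes its successor) can be modelled outside QEMU. This is a hypothetical, self-contained sketch, not QEMU code: `struct toy_dev`, `toy_should_log()` and `toy_elect_mem_logger()` are made-up names, there is a single list instead of one per backend type, and a hand-rolled doubly linked list stands in for the `QLIST_*` macros the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vhost_dev: only list bookkeeping is kept. */
struct toy_dev {
    int id;
    struct toy_dev *next;   /* next device in the logger list */
    struct toy_dev *prev;   /* previous device, NULL for the head */
    bool inserted;          /* plays the role of QLIST_IS_INSERTED() */
};

/* Assumption: one shared list; QEMU keeps one per backend type. */
static struct toy_dev *log_devs;

/* The head of the list is the elected memory-section logger. */
static bool toy_should_log(const struct toy_dev *dev)
{
    return dev == log_devs;
}

static void toy_elect_mem_logger(struct toy_dev *dev, bool add)
{
    assert(dev != NULL);

    if (add && !dev->inserted) {
        if (log_devs == NULL) {
            /* First device to start logging becomes the logger. */
            dev->prev = NULL;
            dev->next = NULL;
            log_devs = dev;
        } else {
            /* Insert right after the head so the current logger is kept. */
            struct toy_dev *head = log_devs;
            dev->prev = head;
            dev->next = head->next;
            if (head->next != NULL) {
                head->next->prev = dev;
            }
            head->next = dev;
        }
        dev->inserted = true;
    } else if (!add && dev->inserted) {
        /* Unlink; if dev was the head, its successor becomes the logger. */
        if (dev->prev != NULL) {
            dev->prev->next = dev->next;
        } else {
            log_devs = dev->next;
        }
        if (dev->next != NULL) {
            dev->next->prev = dev->prev;
        }
        dev->next = NULL;
        dev->prev = NULL;
        dev->inserted = false;
    }
}
```

Adding devices a, b and c in that order yields the list a, c, b, so a remains the logger throughout; removing a then hands the role to c, the most recently added device. Repeated add or remove calls for the same device are no-ops, which is what lets the patch call the election from both `vhost_dev_set_log()` and `vhost_dev_start()`.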
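
Why the profile flattens to a roughly constant cost can be seen from a rough cost model of one sync pass. The function below is a hypothetical sketch, not QEMU code: it assumes `ndevs` vhost devices sharing one log, each covering the same `mem_bytes` of guest memory and owning `nvqs` used rings of `used_bytes` each, and it counts how many guest bytes have their dirty bits examined with and without a single elected logger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical cost model of one dirty-log sync pass over devices that
 * share a log. Returns the number of guest bytes whose dirty bits are
 * scanned. With elect_logger, only device 0 (the list head) scans memory.
 */
static uint64_t bytes_scanned_per_pass(unsigned ndevs, uint64_t mem_bytes,
                                       unsigned nvqs, uint64_t used_bytes,
                                       bool elect_logger)
{
    uint64_t total = 0;

    for (unsigned d = 0; d < ndevs; d++) {
        /* Memory sections: scanned by every device, or only by the logger. */
        if (!elect_logger || d == 0) {
            total += mem_bytes;
        }
        /* Virtqueue used rings are per device, so they are always synced. */
        total += (uint64_t)nvqs * used_bytes;
    }
    return total;
}
```

With illustrative (not measured) values of 24 devices, 4 GiB of guest memory and two 4 KiB used rings per device, the memory term drops from 24 × 4 GiB to 4 GiB per pass, while the small per-ring term is unchanged. That matches the shape of the report: the remaining cost is close to the single-queue case, independent of the number of queues or devices.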