From nobody Wed May 15 23:02:54 2024
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Paolo Bonzini, Igor Mammedov, Xiao Guangrong,
    "Michael S. Tsirkin", Peter Xu, Philippe Mathieu-Daudé, Eduardo Habkost,
    Marcel Apfelbaum, Yanan Wang, Michal Privoznik, Daniel P. Berrangé,
    Gavin Shan, Alex Williamson, Stefan Hajnoczi, "Maciej S. Szmigiero",
    kvm@vger.kernel.org, Tiwei Bie
Subject: [PATCH v3 01/16] vhost: Rework memslot filtering and fix "used_memslot" tracking
Date: Fri, 8 Sep 2023 16:21:21 +0200
Message-ID: <20230908142136.403541-2-david@redhat.com>
In-Reply-To: <20230908142136.403541-1-david@redhat.com>
References: <20230908142136.403541-1-david@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Having multiple vhost devices, some filtering out fd-less memslots and
some not, can mess up the "used_memslot" accounting. Consequently, our
"free memslot" checks become unreliable and we might run out of free
memslots at runtime later.

An example sequence that can trigger such an issue, involving different
vhost backends (vhost-kernel and vhost-user) and hotplugged memory
devices, can be found at [1].

Let's make the filtering mechanism less generic and instead distinguish
between backends that support private memslots (without an fd) and ones
that only support shared memslots (with an fd). Track the used_memslots
for both cases separately and use the corresponding value when required.

Note: Most probably we should filter out MAP_PRIVATE fd-based RAM regions
(for example, via memory-backend-memfd,...,shared=off or as the default
with memory-backend-file) as well. When not using MAP_SHARED, it might not
work as expected. Add a TODO for now.
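The accounting change at the heart of this patch boils down to the
following selection logic (illustrative sketch only, with a made-up helper
name; the patch itself open-codes this in vhost_has_free_slot(),
vhost_commit() and vhost_dev_init()):

    /*
     * Sketch, not part of the patch: pick the counter that matches the
     * backend's capabilities. Backends that cannot use fd-less (private)
     * memslots, such as vhost-user, are tracked via used_shared_memslots;
     * all others (e.g., vhost-kernel) via used_memslots.
     */
    static unsigned int *vhost_dev_used_memslots(struct vhost_dev *hdev)
    {
        if (hdev->vhost_ops->vhost_backend_no_private_memslots &&
            hdev->vhost_ops->vhost_backend_no_private_memslots(hdev)) {
            return &used_shared_memslots;
        }
        return &used_memslots;
    }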
[1] https://lkml.kernel.org/r/fad9136f-08d3-3fd9-71a1-502069c000cf@redhat.c= om Fixes: 988a27754bbb ("vhost: allow backends to filter memory sections") Cc: Tiwei Bie Acked-by: Igor Mammedov Reviewed-by: Peter Xu Signed-off-by: David Hildenbrand --- hw/virtio/vhost-user.c | 7 ++-- hw/virtio/vhost.c | 56 ++++++++++++++++++++++++++----- include/hw/virtio/vhost-backend.h | 5 ++- 3 files changed, 52 insertions(+), 16 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 8dcf049d42..1e7553352a 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -2500,10 +2500,9 @@ vhost_user_crypto_close_session(struct vhost_dev *de= v, uint64_t session_id) return 0; } =20 -static bool vhost_user_mem_section_filter(struct vhost_dev *dev, - MemoryRegionSection *section) +static bool vhost_user_no_private_memslots(struct vhost_dev *dev) { - return memory_region_get_fd(section->mr) >=3D 0; + return true; } =20 static int vhost_user_get_inflight_fd(struct vhost_dev *dev, @@ -2746,6 +2745,7 @@ const VhostOps user_ops =3D { .vhost_backend_init =3D vhost_user_backend_init, .vhost_backend_cleanup =3D vhost_user_backend_cleanup, .vhost_backend_memslots_limit =3D vhost_user_memslots_limit, + .vhost_backend_no_private_memslots =3D vhost_user_no_private_memsl= ots, .vhost_set_log_base =3D vhost_user_set_log_base, .vhost_set_mem_table =3D vhost_user_set_mem_table, .vhost_set_vring_addr =3D vhost_user_set_vring_addr, @@ -2772,7 +2772,6 @@ const VhostOps user_ops =3D { .vhost_set_config =3D vhost_user_set_config, .vhost_crypto_create_session =3D vhost_user_crypto_create_session, .vhost_crypto_close_session =3D vhost_user_crypto_close_session, - .vhost_backend_mem_section_filter =3D vhost_user_mem_section_filte= r, .vhost_get_inflight_fd =3D vhost_user_get_inflight_fd, .vhost_set_inflight_fd =3D vhost_user_set_inflight_fd, .vhost_dev_start =3D vhost_user_dev_start, diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index e2f6ffb446..c1e6148833 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -45,20 +45,33 @@ static struct vhost_log *vhost_log; static struct vhost_log *vhost_log_shm; =20 +/* Memslots used by backends that support private memslots (without an fd)= . */ static unsigned int used_memslots; + +/* Memslots used by backends that only support shared memslots (with an fd= ). */ +static unsigned int used_shared_memslots; + static QLIST_HEAD(, vhost_dev) vhost_devices =3D QLIST_HEAD_INITIALIZER(vhost_devices); =20 bool vhost_has_free_slot(void) { - unsigned int slots_limit =3D ~0U; + unsigned int free =3D UINT_MAX; struct vhost_dev *hdev; =20 QLIST_FOREACH(hdev, &vhost_devices, entry) { unsigned int r =3D hdev->vhost_ops->vhost_backend_memslots_limit(h= dev); - slots_limit =3D MIN(slots_limit, r); + unsigned int cur_free; + + if (hdev->vhost_ops->vhost_backend_no_private_memslots && + hdev->vhost_ops->vhost_backend_no_private_memslots(hdev)) { + cur_free =3D r - used_shared_memslots; + } else { + cur_free =3D r - used_memslots; + } + free =3D MIN(free, cur_free); } - return slots_limit > used_memslots; + return free > 0; } =20 static void vhost_dev_sync_region(struct vhost_dev *dev, @@ -474,8 +487,7 @@ static int vhost_verify_ring_mappings(struct vhost_dev = *dev, * vhost_section: identify sections needed for vhost access * * We only care about RAM sections here (where virtqueue and guest - * internals accessed by virtio might live). If we find one we still - * allow the backend to potentially filter it out of our list. + * internals accessed by virtio might live). 
*/ static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *sect= ion) { @@ -502,8 +514,16 @@ static bool vhost_section(struct vhost_dev *dev, Memor= yRegionSection *section) return false; } =20 - if (dev->vhost_ops->vhost_backend_mem_section_filter && - !dev->vhost_ops->vhost_backend_mem_section_filter(dev, section= )) { + /* + * Some backends (like vhost-user) can only handle memory regions + * that have an fd (can be mapped into a different process). Filter + * the ones without an fd out, if requested. + * + * TODO: we might have to limit to MAP_SHARED as well. + */ + if (memory_region_get_fd(section->mr) < 0 && + dev->vhost_ops->vhost_backend_no_private_memslots && + dev->vhost_ops->vhost_backend_no_private_memslots(dev)) { trace_vhost_reject_section(mr->name, 2); return false; } @@ -568,7 +588,14 @@ static void vhost_commit(MemoryListener *listener) dev->n_mem_sections * sizeof dev->mem->regions[0]; dev->mem =3D g_realloc(dev->mem, regions_size); dev->mem->nregions =3D dev->n_mem_sections; - used_memslots =3D dev->mem->nregions; + + if (dev->vhost_ops->vhost_backend_no_private_memslots && + dev->vhost_ops->vhost_backend_no_private_memslots(dev)) { + used_shared_memslots =3D dev->mem->nregions; + } else { + used_memslots =3D dev->mem->nregions; + } + for (i =3D 0; i < dev->n_mem_sections; i++) { struct vhost_memory_region *cur_vmr =3D dev->mem->regions + i; struct MemoryRegionSection *mrs =3D dev->mem_sections + i; @@ -1400,6 +1427,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaq= ue, VhostBackendType backend_type, uint32_t busyloop_timeou= t, Error **errp) { + unsigned int used; uint64_t features; int i, r, n_initialized_vqs =3D 0; =20 @@ -1495,7 +1523,17 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opa= que, memory_listener_register(&hdev->memory_listener, &address_space_memory= ); QLIST_INSERT_HEAD(&vhost_devices, hdev, entry); =20 - if (used_memslots > hdev->vhost_ops->vhost_backend_memslots_limit(hdev= )) { + /* + * The listener we registered properly updated the corresponding count= er. + * So we can trust that these values are accurate. 
+ */ + if (hdev->vhost_ops->vhost_backend_no_private_memslots && + hdev->vhost_ops->vhost_backend_no_private_memslots(hdev)) { + used =3D used_shared_memslots; + } else { + used =3D used_memslots; + } + if (used > hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) { error_setg(errp, "vhost backend memory slots limit is less" " than current number of present memory slots"); r =3D -EINVAL; diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-ba= ckend.h index 31a251a9f5..df2821ddae 100644 --- a/include/hw/virtio/vhost-backend.h +++ b/include/hw/virtio/vhost-backend.h @@ -108,8 +108,7 @@ typedef int (*vhost_crypto_create_session_op)(struct vh= ost_dev *dev, typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev, uint64_t session_id); =20 -typedef bool (*vhost_backend_mem_section_filter_op)(struct vhost_dev *dev, - MemoryRegionSection *secti= on); +typedef bool (*vhost_backend_no_private_memslots_op)(struct vhost_dev *dev= ); =20 typedef int (*vhost_get_inflight_fd_op)(struct vhost_dev *dev, uint16_t queue_size, @@ -138,6 +137,7 @@ typedef struct VhostOps { vhost_backend_init vhost_backend_init; vhost_backend_cleanup vhost_backend_cleanup; vhost_backend_memslots_limit vhost_backend_memslots_limit; + vhost_backend_no_private_memslots_op vhost_backend_no_private_memslots; vhost_net_set_backend_op vhost_net_set_backend; vhost_net_set_mtu_op vhost_net_set_mtu; vhost_scsi_set_endpoint_op vhost_scsi_set_endpoint; @@ -172,7 +172,6 @@ typedef struct VhostOps { vhost_set_config_op vhost_set_config; vhost_crypto_create_session_op vhost_crypto_create_session; vhost_crypto_close_session_op vhost_crypto_close_session; - vhost_backend_mem_section_filter_op vhost_backend_mem_section_filter; vhost_get_inflight_fd_op vhost_get_inflight_fd; vhost_set_inflight_fd_op vhost_set_inflight_fd; vhost_dev_start_op vhost_dev_start; --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694183001; cv=none; d=zohomail.com; s=zohoarc; b=Bgy6Jq1Z6GD6OgYCbU0EbnH20wN+762FY8bySSKq5JzjFAUMIWhWZqbXVQgSthu1xoa+FzV3k4rjfPqcPtkEPuUeVGpiSwX/F2ygzL5rGKna2CUpU3HML9M6y9+1MiINtwcvqUbeu3nuK4o+YQIUZTEcMGwtK306WVbxHjQdxWI= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694183001; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=5hGg5Jd2AwoQp/M7rl8YRge8IzBVWAeYBCSo0mtTyyc=; b=aC58YW+DyJqfHyU84cv4aPOV6WSsJNYHkns8XJhsZiyUHzgoortb4iBaU/5QUgXyADLDZD25QXbepMdCz5+3mDPCQmk4QS6ybVdCGI5L7jWS5KYWKfnkStFUASHQvM2s/K7oqwgJ04wk9HlFgWyRhK9S8Ne0BxRYHqCZ3l0l3aE= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694183001234475.28887209437914; Fri, 8 Sep 2023 07:23:21 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qecMv-0000fv-KI; Fri, 08 Sep 2023 
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Paolo Bonzini, Igor Mammedov, Xiao Guangrong,
    "Michael S. Tsirkin", Peter Xu, Philippe Mathieu-Daudé, Eduardo Habkost,
    Marcel Apfelbaum, Yanan Wang, Michal Privoznik, Daniel P. Berrangé,
    Gavin Shan, Alex Williamson, Stefan Hajnoczi, "Maciej S.
Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 02/16] vhost: Remove vhost_backend_can_merge() callback Date: Fri, 8 Sep 2023 16:21:22 +0200 Message-ID: <20230908142136.403541-3-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694183002765100003 Content-Type: text/plain; charset="utf-8" Checking whether the memory regions are equal is sufficient: if they are equal, then most certainly the contained fd is equal. The whole vhost-user memslot handling is suboptimal and overly complicated. We shouldn't have to lookup a RAM memory regions we got notified about in vhost_user_get_mr_data() using a host pointer. But that requires a bigger rework -- especially an alternative vhost_set_mem_table() backend call that simply consumes MemoryRegionSections. For now, let's just drop vhost_backend_can_merge(). 
Acked-by: Stefan Hajnoczi Reviewed-by: Igor Mammedov Acked-by: Igor Mammedov Reviewed-by: Peter Xu Signed-off-by: David Hildenbrand --- hw/virtio/vhost-user.c | 14 -------------- hw/virtio/vhost-vdpa.c | 1 - hw/virtio/vhost.c | 6 +----- include/hw/virtio/vhost-backend.h | 4 ---- 4 files changed, 1 insertion(+), 24 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 1e7553352a..e6de930872 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -2205,19 +2205,6 @@ static int vhost_user_migration_done(struct vhost_de= v *dev, char* mac_addr) return -ENOTSUP; } =20 -static bool vhost_user_can_merge(struct vhost_dev *dev, - uint64_t start1, uint64_t size1, - uint64_t start2, uint64_t size2) -{ - ram_addr_t offset; - int mfd, rfd; - - (void)vhost_user_get_mr_data(start1, &offset, &mfd); - (void)vhost_user_get_mr_data(start2, &offset, &rfd); - - return mfd =3D=3D rfd; -} - static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu) { VhostUserMsg msg; @@ -2764,7 +2751,6 @@ const VhostOps user_ops =3D { .vhost_set_vring_enable =3D vhost_user_set_vring_enable, .vhost_requires_shm_log =3D vhost_user_requires_shm_log, .vhost_migration_done =3D vhost_user_migration_done, - .vhost_backend_can_merge =3D vhost_user_can_merge, .vhost_net_set_mtu =3D vhost_user_net_set_mtu, .vhost_set_iotlb_callback =3D vhost_user_set_iotlb_callback, .vhost_send_device_iotlb_msg =3D vhost_user_send_device_iotlb_msg, diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 42f2a4bae9..8f07bee041 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -1508,7 +1508,6 @@ const VhostOps vdpa_ops =3D { .vhost_set_config =3D vhost_vdpa_set_config, .vhost_requires_shm_log =3D NULL, .vhost_migration_done =3D NULL, - .vhost_backend_can_merge =3D NULL, .vhost_net_set_mtu =3D NULL, .vhost_set_iotlb_callback =3D NULL, .vhost_send_device_iotlb_msg =3D NULL, diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index c1e6148833..c16ad14535 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -728,11 +728,7 @@ static void vhost_region_add_section(struct vhost_dev = *dev, size_t offset =3D mrs_gpa - prev_gpa_start; =20 if (prev_host_start + offset =3D=3D mrs_host && - section->mr =3D=3D prev_sec->mr && - (!dev->vhost_ops->vhost_backend_can_merge || - dev->vhost_ops->vhost_backend_can_merge(dev, - mrs_host, mrs_size, - prev_host_start, prev_size))) { + section->mr =3D=3D prev_sec->mr) { uint64_t max_end =3D MAX(prev_host_end, mrs_host + mrs_siz= e); need_add =3D false; prev_sec->offset_within_address_space =3D diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-ba= ckend.h index df2821ddae..12d578824b 100644 --- a/include/hw/virtio/vhost-backend.h +++ b/include/hw/virtio/vhost-backend.h @@ -86,9 +86,6 @@ typedef int (*vhost_set_vring_enable_op)(struct vhost_dev= *dev, typedef bool (*vhost_requires_shm_log_op)(struct vhost_dev *dev); typedef int (*vhost_migration_done_op)(struct vhost_dev *dev, char *mac_addr); -typedef bool (*vhost_backend_can_merge_op)(struct vhost_dev *dev, - uint64_t start1, uint64_t size1, - uint64_t start2, uint64_t size2= ); typedef int (*vhost_vsock_set_guest_cid_op)(struct vhost_dev *dev, uint64_t guest_cid); typedef int (*vhost_vsock_set_running_op)(struct vhost_dev *dev, int start= ); @@ -163,7 +160,6 @@ typedef struct VhostOps { vhost_set_vring_enable_op vhost_set_vring_enable; vhost_requires_shm_log_op vhost_requires_shm_log; vhost_migration_done_op vhost_migration_done; - vhost_backend_can_merge_op 
vhost_backend_can_merge; vhost_vsock_set_guest_cid_op vhost_vsock_set_guest_cid; vhost_vsock_set_running_op vhost_vsock_set_running; vhost_set_iotlb_callback_op vhost_set_iotlb_callback; --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694183036; cv=none; d=zohomail.com; s=zohoarc; b=Ik0U4F4BSdW3PNkNmo3O5tkNNmyIQNOVJ0R5nV9QHKvHbOU8lbKP5JB4laFBOvZOS0vph/m7lwr9v8nbM3gdWyQPo7Y5m6eWA/ZnVtidPe4GdXO1dQSe+UUOVCPyRl5bDV95ZRFJH7Kb9sZTGbTPYtTxTQFxbc34O3hb6QlOCjA= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694183036; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=IoJrZdWmbpI2ZSfLps/WZQB/6VAwvY+oBjrmf39TZMg=; b=P8WwNu1+l5YovVEgWFEliHB7C+ZaS/BQ4D+yw5rzI9lkMclCn+l8F5OAyBqUV6uodwSf7my9k1znmUlq8jhMUObRKLA02MEvhSzMiZkCl869oLCuo8rbdguBNiNnsltcLm1sRu9iZqpjwqIPf2UQMAmSrovcEzKhM9uF39q52Bs= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694183036305133.45301318676127; Fri, 8 Sep 2023 07:23:56 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qecMx-0000sN-Ct; Fri, 08 Sep 2023 10:21:59 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecMv-0000gt-QJ for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:21:57 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecMt-0001bb-Ja for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:21:57 -0400 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-172-cEeeHnTPPcu7AY8zVLiyEQ-1; Fri, 08 Sep 2023 10:21:49 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 558BD801FA9; Fri, 8 Sep 2023 14:21:49 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.194.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id BB5A1C03295; Fri, 8 Sep 2023 14:21:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694182915; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IoJrZdWmbpI2ZSfLps/WZQB/6VAwvY+oBjrmf39TZMg=; 
b=izenIfFtAj8BQ4w0GtHEj5fP1lYZM3vfZIeoMjbqqYm2RBUfW6cMN7CMvgZ98sRIr+1RSR LmpLOMZejCEdu31Q0uVIkn+AzvPcxbQezfTMYmpDY+9JBq1UcwsrVdp71Q49a/fuIci00h WRLjyuOwJlnYuw5UAVKVlq+yrvphhKQ= X-MC-Unique: cEeeHnTPPcu7AY8zVLiyEQ-1 From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , Gavin Shan , Alex Williamson , Stefan Hajnoczi , "Maciej S . Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 03/16] softmmu/physmem: Fixup qemu_ram_block_from_host() documentation Date: Fri, 8 Sep 2023 16:21:23 +0200 Message-ID: <20230908142136.403541-4-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694183037567100001 Let's fixup the documentation (e.g., removing traces of the ram_addr parameter that no longer exists) and move it to the header file while at it. Suggested-by: Igor Mammedov Acked-by: Igor Mammedov Reviewed-by: Peter Xu Reviewed-by: Philippe Mathieu-Daud=C3=A9 Signed-off-by: David Hildenbrand --- include/exec/cpu-common.h | 15 +++++++++++++++ softmmu/physmem.c | 17 ----------------- 2 files changed, 15 insertions(+), 17 deletions(-) diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h index 41788c0bdd..e29b13dc70 100644 --- a/include/exec/cpu-common.h +++ b/include/exec/cpu-common.h @@ -76,6 +76,21 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length); ram_addr_t qemu_ram_addr_from_host(void *ptr); ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr); RAMBlock *qemu_ram_block_by_name(const char *name); + +/* + * Translates a host ptr back to a RAMBlock and an offset in that RAMBlock. + * + * @ptr: The host pointer to translate. + * @round_offset: Whether to round the result offset down to a target page + * @offset: Will be set to the offset within the returned RAMBlock. + * + * Returns: RAMBlock (or NULL if not found) + * + * By the time this function returns, the returned pointer is not protected + * by RCU anymore. 
If the caller is not within an RCU critical section and + * does not hold the iothread lock, it must have other means of protecting= the + * pointer, such as a reference to the memory region that owns the RAMBloc= k. + */ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset, ram_addr_t *offset); ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host); diff --git a/softmmu/physmem.c b/softmmu/physmem.c index 18277ddd67..c893358923 100644 --- a/softmmu/physmem.c +++ b/softmmu/physmem.c @@ -2180,23 +2180,6 @@ ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, = void *host) return res; } =20 -/* - * Translates a host ptr back to a RAMBlock, a ram_addr and an offset - * in that RAMBlock. - * - * ptr: Host pointer to look up - * round_offset: If true round the result offset down to a page boundary - * *ram_addr: set to result ram_addr - * *offset: set to result offset within the RAMBlock - * - * Returns: RAMBlock (or NULL if not found) - * - * By the time this function returns, the returned pointer is not protected - * by RCU anymore. If the caller is not within an RCU critical section and - * does not hold the iothread lock, it must have other means of protecting= the - * pointer, such as a reference to the region that includes the incoming - * ram_addr_t. - */ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset, ram_addr_t *offset) { --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694183086; cv=none; d=zohomail.com; s=zohoarc; b=gOZqNpn0Yt05Q5FpDBuThZ264d8QbQkDZOWn/xz5WooKmDFxx88E2OTateT/FTHngQCekvPH/E9fkY+Lvo9gTFzmbOMv9DcaSxgjHOZPgvPloR5BfpwSkUFYggON0LciO0J0MWmOcK5OMXCVBD1/pQ7exppZWTsORQAN86jMVRU= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694183086; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=qnTHMLcRnBjSX0HX1pgBvV2KvHSENwYwwq39XL4vDpQ=; b=EWwzHP7CQC8PTad6ERAC6GHeVuHSmNP9hBWzm5+VjneeVochSe758Yu5axAn3/8/LiVgCqtqcF3kSchdBtsEHAV+bA8CenSGCDt7/IkeR75SY2ul/1i5G+PCc0VOIxSi+ma0c+ec/2buctiRzlnNktKlWW7O+E2oaQK9xyvCINo= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694183086975998.1630981002374; Fri, 8 Sep 2023 07:24:46 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qecMy-0000yS-B4; Fri, 08 Sep 2023 10:22:00 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecMw-0000pE-PI for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:21:58 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecMu-0001bm-Cn for qemu-devel@nongnu.org; Fri, 08 
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Paolo Bonzini, Igor Mammedov, Xiao Guangrong,
    "Michael S. Tsirkin", Peter Xu, Philippe Mathieu-Daudé, Eduardo Habkost,
    Marcel Apfelbaum, Yanan Wang, Michal Privoznik, Daniel P. Berrangé,
    Gavin Shan, Alex Williamson, Stefan Hajnoczi, "Maciej S. Szmigiero",
    kvm@vger.kernel.org
Subject: [PATCH v3 04/16] kvm: Return number of free memslots
Date: Fri, 8 Sep 2023 16:21:24 +0200
Message-ID: <20230908142136.403541-5-david@redhat.com>
In-Reply-To: <20230908142136.403541-1-david@redhat.com>
References: <20230908142136.403541-1-david@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Let's return the number of free slots instead of only checking whether
there is a free slot. While at it, check all address spaces, which also
handles SMM on x86 correctly. This is a preparation for memory devices
that consume multiple memslots.
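For context, a later patch in this series consumes the returned count
roughly as follows (illustrative excerpt; see patch 06/16 for the actual
hunk, where required_memslots is computed per memory device):

    /*
     * Compare the number of free memslots against the number the memory
     * device requires, instead of only testing for "at least one free
     * slot" as kvm_has_free_slot() did.
     */
    if (kvm_enabled() && kvm_get_free_memslots() < required_memslots) {
        error_setg(errp, "hypervisor has not enough free memory slots left");
        return;
    }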
Reviewed-by: Philippe Mathieu-Daud=C3=A9 Signed-off-by: David Hildenbrand Reviewed-by: Maciej S. Szmigiero --- accel/kvm/kvm-all.c | 33 ++++++++++++++++++++------------- accel/stubs/kvm-stub.c | 4 ++-- hw/mem/memory-device.c | 2 +- include/sysemu/kvm.h | 2 +- include/sysemu/kvm_int.h | 1 + 5 files changed, 25 insertions(+), 17 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 2ba7521695..a29906d441 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -181,6 +181,24 @@ int kvm_get_max_memslots(void) return s->nr_slots; } =20 +unsigned int kvm_get_free_memslots(void) +{ + unsigned int used_slots =3D 0; + KVMState *s =3D kvm_state; + int i; + + kvm_slots_lock(); + for (i =3D 0; i < s->nr_as; i++) { + if (!s->as[i].ml) { + continue; + } + used_slots =3D MAX(used_slots, s->as[i].ml->nr_used_slots); + } + kvm_slots_unlock(); + + return s->nr_slots - used_slots; +} + /* Called with KVMMemoryListener.slots_lock held */ static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml) { @@ -196,19 +214,6 @@ static KVMSlot *kvm_get_free_slot(KVMMemoryListener *k= ml) return NULL; } =20 -bool kvm_has_free_slot(MachineState *ms) -{ - KVMState *s =3D KVM_STATE(ms->accelerator); - bool result; - KVMMemoryListener *kml =3D &s->memory_listener; - - kvm_slots_lock(); - result =3D !!kvm_get_free_slot(kml); - kvm_slots_unlock(); - - return result; -} - /* Called with KVMMemoryListener.slots_lock held */ static KVMSlot *kvm_alloc_slot(KVMMemoryListener *kml) { @@ -1387,6 +1392,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, } start_addr +=3D slot_size; size -=3D slot_size; + kml->nr_used_slots--; } while (size); return; } @@ -1412,6 +1418,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, ram_start_offset +=3D slot_size; ram +=3D slot_size; size -=3D slot_size; + kml->nr_used_slots++; } while (size); } =20 diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c index 235dc661bc..a5d4442d8f 100644 --- a/accel/stubs/kvm-stub.c +++ b/accel/stubs/kvm-stub.c @@ -109,9 +109,9 @@ int kvm_irqchip_remove_irqfd_notifier_gsi(KVMState *s, = EventNotifier *n, return -ENOSYS; } =20 -bool kvm_has_free_slot(MachineState *ms) +unsigned int kvm_get_free_memslots(void) { - return false; + return 0; } =20 void kvm_init_cpu_signals(CPUState *cpu) diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c index 667d56bd29..98e355c960 100644 --- a/hw/mem/memory-device.c +++ b/hw/mem/memory-device.c @@ -59,7 +59,7 @@ static void memory_device_check_addable(MachineState *ms,= MemoryRegion *mr, const uint64_t size =3D memory_region_size(mr); =20 /* we will need a new memory slot for kvm and vhost */ - if (kvm_enabled() && !kvm_has_free_slot(ms)) { + if (kvm_enabled() && !kvm_get_free_memslots()) { error_setg(errp, "hypervisor has no free memory slots left"); return; } diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index ee9025f8e9..c3d831baef 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -215,7 +215,7 @@ typedef struct KVMRouteChange { =20 /* external API */ =20 -bool kvm_has_free_slot(MachineState *ms); +unsigned int kvm_get_free_memslots(void); bool kvm_has_sync_mmu(void); int kvm_has_vcpu_events(void); int kvm_has_robust_singlestep(void); diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index 511b42bde5..dba2f78fc3 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -40,6 +40,7 @@ typedef struct KVMMemoryUpdate { typedef struct KVMMemoryListener { MemoryListener listener; KVMSlot *slots; + unsigned int nr_used_slots; int 
as_id;
     QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_add;
     QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_del;
-- 
2.41.0

From nobody Wed May 15 23:02:54 2024
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Paolo Bonzini, Igor Mammedov, Xiao Guangrong,
    "Michael S. Tsirkin", Peter Xu, Philippe Mathieu-Daudé, Eduardo Habkost,
    Marcel Apfelbaum, Yanan Wang, Michal Privoznik, Daniel P. Berrangé,
    Gavin Shan, Alex Williamson, Stefan Hajnoczi, "Maciej S. Szmigiero",
    kvm@vger.kernel.org
Subject: [PATCH v3 05/16] vhost: Return number of free memslots
Date: Fri, 8 Sep 2023 16:21:25 +0200
Message-ID: <20230908142136.403541-6-david@redhat.com>
In-Reply-To: <20230908142136.403541-1-david@redhat.com>
References: <20230908142136.403541-1-david@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Let's return the number of free slots instead of only checking whether
there is a free slot. This is a preparation for memory devices that
consume multiple memslots.

Signed-off-by: David Hildenbrand
Reviewed-by: Maciej S.
Szmigiero --- hw/mem/memory-device.c | 2 +- hw/virtio/vhost-stub.c | 4 ++-- hw/virtio/vhost.c | 4 ++-- include/hw/virtio/vhost.h | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c index 98e355c960..e09960744d 100644 --- a/hw/mem/memory-device.c +++ b/hw/mem/memory-device.c @@ -63,7 +63,7 @@ static void memory_device_check_addable(MachineState *ms,= MemoryRegion *mr, error_setg(errp, "hypervisor has no free memory slots left"); return; } - if (!vhost_has_free_slot()) { + if (!vhost_get_free_memslots()) { error_setg(errp, "a used vhost backend has no free memory slots le= ft"); return; } diff --git a/hw/virtio/vhost-stub.c b/hw/virtio/vhost-stub.c index aa858ef3fb..d53dd9d288 100644 --- a/hw/virtio/vhost-stub.c +++ b/hw/virtio/vhost-stub.c @@ -2,9 +2,9 @@ #include "hw/virtio/vhost.h" #include "hw/virtio/vhost-user.h" =20 -bool vhost_has_free_slot(void) +unsigned int vhost_get_free_memslots(void) { - return true; + return UINT_MAX; } =20 bool vhost_user_init(VhostUserState *user, CharBackend *chr, Error **errp) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index c16ad14535..8e84dca246 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -54,7 +54,7 @@ static unsigned int used_shared_memslots; static QLIST_HEAD(, vhost_dev) vhost_devices =3D QLIST_HEAD_INITIALIZER(vhost_devices); =20 -bool vhost_has_free_slot(void) +unsigned int vhost_get_free_memslots(void) { unsigned int free =3D UINT_MAX; struct vhost_dev *hdev; @@ -71,7 +71,7 @@ bool vhost_has_free_slot(void) } free =3D MIN(free, cur_free); } - return free > 0; + return free; } =20 static void vhost_dev_sync_region(struct vhost_dev *dev, diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h index 6a173cb9fa..603bf834be 100644 --- a/include/hw/virtio/vhost.h +++ b/include/hw/virtio/vhost.h @@ -315,7 +315,7 @@ uint64_t vhost_get_features(struct vhost_dev *hdev, con= st int *feature_bits, */ void vhost_ack_features(struct vhost_dev *hdev, const int *feature_bits, uint64_t features); -bool vhost_has_free_slot(void); +unsigned int vhost_get_free_memslots(void); =20 int vhost_net_set_backend(struct vhost_dev *hdev, struct vhost_vring_file *file); --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694182950; cv=none; d=zohomail.com; s=zohoarc; b=QK/wKzI26L+doHjlWDL0gu+ssLTeGgH2X2b00FmblCqcDbbwLYB766bg0Xhz2c1Qjcyjf8Thv5lqXyq1T4ifKk8OLU/yqRaq+2NQlgtUvNgfbd+QAff5MenhCjM8RNP9l0ZRp8nysAn3KIz6Sog0NsCvAluXE2WjB4u2mI/DEuk= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694182950; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=JOJejEHn6Wgf5w7IjMwW18HG8VgX0SXNC6kCPLgj2zs=; b=gN4JOzCmAUPSq9SbfXX8eTY45H8IreYvCnCLjTl16INEV5YLwtD86Vsq6375pLi19RNj8ZlMyjc4EZfCxjlWPloMcYYZ7U8WHPzIbwMixeqYDFpDj22o6yWkTqrKD9O9oWmZnEz0Zdyh9UhsMlznmpOSf0lwFD+BtDVvtgFhJIE= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass 
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Paolo Bonzini, Igor Mammedov, Xiao Guangrong,
    "Michael S. Tsirkin", Peter Xu, Philippe Mathieu-Daudé, Eduardo Habkost,
    Marcel Apfelbaum, Yanan Wang, Michal Privoznik, Daniel P. Berrangé,
    Gavin Shan, Alex Williamson, Stefan Hajnoczi, "Maciej S.
Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 06/16] memory-device: Support memory devices with multiple memslots Date: Fri, 8 Sep 2023 16:21:26 +0200 Message-ID: <20230908142136.403541-7-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694182951087100004 Content-Type: text/plain; charset="utf-8" We want to support memory devices that have a memory region container as device memory region that maps multiple RAM memory regions. Let's start by supporting memory devices that statically map multiple RAM memory regions and, thereby, consume multiple memslots. We already have one device that uses a container as device memory region: NVDIMMs. However, a NVDIMM always ends up consuming exactly one memslot. Let's add support for that by asking the memory device via a new callback how many memslots it requires. Signed-off-by: David Hildenbrand Reviewed-by: Maciej S. 
Szmigiero --- hw/mem/memory-device.c | 27 +++++++++++++++++++-------- include/hw/mem/memory-device.h | 18 ++++++++++++++++++ 2 files changed, 37 insertions(+), 8 deletions(-) diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c index e09960744d..0eec0872a9 100644 --- a/hw/mem/memory-device.c +++ b/hw/mem/memory-device.c @@ -52,19 +52,30 @@ static int memory_device_build_list(Object *obj, void *= opaque) return 0; } =20 -static void memory_device_check_addable(MachineState *ms, MemoryRegion *mr, - Error **errp) +static unsigned int memory_device_get_memslots(MemoryDeviceState *md) +{ + const MemoryDeviceClass *mdc =3D MEMORY_DEVICE_GET_CLASS(md); + + if (mdc->get_memslots) { + return mdc->get_memslots(md); + } + return 1; +} + +static void memory_device_check_addable(MachineState *ms, MemoryDeviceStat= e *md, + MemoryRegion *mr, Error **errp) { const uint64_t used_region_size =3D ms->device_memory->used_region_siz= e; const uint64_t size =3D memory_region_size(mr); + const unsigned int required_memslots =3D memory_device_get_memslots(md= ); =20 - /* we will need a new memory slot for kvm and vhost */ - if (kvm_enabled() && !kvm_get_free_memslots()) { - error_setg(errp, "hypervisor has no free memory slots left"); + /* we will need memory slots for kvm and vhost */ + if (kvm_enabled() && kvm_get_free_memslots() < required_memslots) { + error_setg(errp, "hypervisor has not enough free memory slots left= "); return; } - if (!vhost_get_free_memslots()) { - error_setg(errp, "a used vhost backend has no free memory slots le= ft"); + if (vhost_get_free_memslots() < required_memslots) { + error_setg(errp, "a used vhost backend has not enough free memory = slots left"); return; } =20 @@ -233,7 +244,7 @@ void memory_device_pre_plug(MemoryDeviceState *md, Mach= ineState *ms, goto out; } =20 - memory_device_check_addable(ms, mr, &local_err); + memory_device_check_addable(ms, md, mr, &local_err); if (local_err) { goto out; } diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h index 48d2611fc5..b51a579fb9 100644 --- a/include/hw/mem/memory-device.h +++ b/include/hw/mem/memory-device.h @@ -41,6 +41,11 @@ typedef struct MemoryDeviceState MemoryDeviceState; * successive memory regions are used, a covering memory region has to * be provided. Scattered memory regions are not supported for single * devices. + * + * The device memory region returned via @get_memory_region may either be a + * single RAM memory region or a memory region container with subregions + * that are RAM memory regions or aliases to RAM memory regions. Other + * memory regions or subregions are not supported. */ struct MemoryDeviceClass { /* private */ @@ -88,6 +93,19 @@ struct MemoryDeviceClass { */ MemoryRegion *(*get_memory_region)(MemoryDeviceState *md, Error **errp= ); =20 + /* + * Optional for memory devices that require only a single memslot, + * required for all other memory devices: Return the number of memslots + * (distinct RAM memory regions in the device memory region) that are + * required by the device. + * + * If this function is not implemented, the assumption is "1". + * + * Called when (un)plugging the memory device, to check if the require= ments + * can be satisfied, and to do proper accounting. + */ + unsigned int (*get_memslots)(MemoryDeviceState *md); + /* * Optional: Return the desired minimum alignment of the device in gue= st * physical address space. 
The final alignment is computed based on th= is --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694182950; cv=none; d=zohomail.com; s=zohoarc; b=BFELG7E29wy1XuBamKBRRnn7sULyakpdOjUXye0RtC3uFXuvFX7+JPgHxxtXSOMsvyRcD417T3xZOCsMvj2SwiwWquR0+ut0H91DSwwEN4hviChm9of1RPA7h7q7KUpgKGOxYO4N0W6K3i7chruGidcfxJO8magX4t+5/Tprmkk= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694182950; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=Q8s7qWm1hXNJSjwsdxL6SMu6onsy76kevioaNWWVl20=; b=N/Gd2WKPcgjXYpy+NeUeYlr8nMFTQw2Ho87JRCifoUoHCB/eh6QPsmpYrxT+7dVpADW57DfhZQShFBL344P4Ae730BvxEr+kslsV7WjIWp1j9Rp40oDXg3TS+sRcLN24vDUlfrGI6y8SJ2hUsM/PcwimjOJ44G1VkEgiYxCAHHQ= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 169418295029750.04792371557596; Fri, 8 Sep 2023 07:22:30 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qecNA-0001Ju-6M; Fri, 08 Sep 2023 10:22:12 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecN7-0001JP-UW for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:09 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecN5-0001fG-RC for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:09 -0400 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-17-9rJQ9ttnMyCEKHvWFGitTA-1; Fri, 08 Sep 2023 10:22:01 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C6293800C78; Fri, 8 Sep 2023 14:22:00 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.194.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id 25895C03295; Fri, 8 Sep 2023 14:21:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694182926; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Q8s7qWm1hXNJSjwsdxL6SMu6onsy76kevioaNWWVl20=; b=YZJo+B2ll9bmYyr12GfqoytmGJmCfuQGJyMfEWaC1S5A1Mj0bGDDiFJYQl2KGnvK9QVvPI F1n0R0iMwaYXmbfr4a6WCzG84dydXNqIpMtoQe6dDkCYM5Ard/FZUzKzfqa2BUktCn5vS/ hCF/Z8yXPhbqZB0xDaKurbt9xnzjc60= X-MC-Unique: 9rJQ9ttnMyCEKHvWFGitTA-1 From: David Hildenbrand To: 
qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , Gavin Shan , Alex Williamson , Stefan Hajnoczi , "Maciej S . Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 07/16] stubs: Rename qmp_memory_device.c to memory_device.c Date: Fri, 8 Sep 2023 16:21:27 +0200 Message-ID: <20230908142136.403541-8-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694182951077100003 Content-Type: text/plain; charset="utf-8" We want to place non-qmp stubs in there, so let's rename it. While at it, put it into the MAINTAINERS file under "Memory devices". Signed-off-by: David Hildenbrand Reviewed-by: Maciej S. 
Szmigiero --- MAINTAINERS | 1 + stubs/{qmp_memory_device.c =3D> memory_device.c} | 0 stubs/meson.build | 2 +- 3 files changed, 2 insertions(+), 1 deletion(-) rename stubs/{qmp_memory_device.c =3D> memory_device.c} (100%) diff --git a/MAINTAINERS b/MAINTAINERS index b471973e1e..89b0093e81 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2852,6 +2852,7 @@ F: hw/mem/pc-dimm.c F: include/hw/mem/memory-device.h F: include/hw/mem/nvdimm.h F: include/hw/mem/pc-dimm.h +F: stubs/memory_device.c F: docs/nvdimm.txt =20 SPICE diff --git a/stubs/qmp_memory_device.c b/stubs/memory_device.c similarity index 100% rename from stubs/qmp_memory_device.c rename to stubs/memory_device.c diff --git a/stubs/meson.build b/stubs/meson.build index ef6e39a64d..cde44972bf 100644 --- a/stubs/meson.build +++ b/stubs/meson.build @@ -32,7 +32,7 @@ stub_ss.add(files('monitor.c')) stub_ss.add(files('monitor-core.c')) stub_ss.add(files('physmem.c')) stub_ss.add(files('qemu-timer-notify-cb.c')) -stub_ss.add(files('qmp_memory_device.c')) +stub_ss.add(files('memory_device.c')) stub_ss.add(files('qmp-command-available.c')) stub_ss.add(files('qmp-quit.c')) stub_ss.add(files('qtest.c')) --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694182992; cv=none; d=zohomail.com; s=zohoarc; b=D+GGokMc//LWhG3PhIGy8G9EAttQiOC1lfbN1PSL1oIot52M7sOtAzjd5sG9ASQzL6RC/mOUk2F+/JRwhp6CQPF7EC9JqifNYLSuY09u0fihe3vQ7o5EUOXVyo8EHtk73ri++Z36ZYqo9KQrR62CDBtfgxPWsADnifwOk9an5Wk= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694182992; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=IcjRBlX23B58QlEcv5JOZ+Baolj2SxlkLTfiV4y7RyI=; b=ANDb/XHtcH1oxAWhwiFrlrJ15+3IyBciNGJsev3DhMmoqmVMu6gosZQjmAa26ymtvGk4ai6GD98mwktxQVeTAxkKeFO4aGPjfjCeT5Nk6kKYETotpwuGlAIiEjwd9z7Cr85PTuOviiq1+I1yI9mtG7pVwBrurUdz/qixxgOeOfY= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694182992181736.950520602977; Fri, 8 Sep 2023 07:23:12 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qecNC-0001UK-Dk; Fri, 08 Sep 2023 10:22:14 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNB-0001QP-5w for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:13 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecN6-0001gx-6n for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:12 -0400 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-130-016hA699OFCXNjq0SAiorw-1; Fri, 08 Sep 2023 10:22:04 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id A0CAB816525; Fri, 8 Sep 2023 14:22:03 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.194.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0A20AC03295; Fri, 8 Sep 2023 14:22:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694182927; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IcjRBlX23B58QlEcv5JOZ+Baolj2SxlkLTfiV4y7RyI=; b=HMUGXfsIPG/mhdK9VP37Sw8ExfQLU6EkOeI6Ng/MPvJQ/PYvrE/Vj0AFoDhYSMDmGPPbBu dwx+YlVIAs7XBD12IcWS3n7Gm1qQ0zs39k2KNf/yQ2+UK8OrilsByyjC0Ke/z2AQ62Z1SE Ynd+DcM+NFEB6HfKHK8bkUorRfHShcw= X-MC-Unique: 016hA699OFCXNjq0SAiorw-1 From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , Gavin Shan , Alex Williamson , Stefan Hajnoczi , "Maciej S . Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 08/16] memory-device: Track required and actually used memslots in DeviceMemoryState Date: Fri, 8 Sep 2023 16:21:28 +0200 Message-ID: <20230908142136.403541-9-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694182994294100003 Content-Type: text/plain; charset="utf-8" Let's track how many memslots are required by plugged memory devices and how many are currently actually getting used by plugged memory devices. "required - used" is the number of reserved memslots. For now, the number of used and required memslots is always equal, and there are no reservations. This is a preparation for memory devices that want to dynamically consume memslots after initially specifying how many they require -- where we'll end up with reserved memslots. 
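As a rough sketch of that accounting (illustrative only, not code from this
patch; the field names mirror the DeviceMemoryState members added below):

    /*
     * Sketch: reserved memslots are those a plugged memory device asked for
     * (required) but does not currently back with a mapped RAM subregion
     * (used).
     */
    static unsigned int sketch_reserved_memslots(const DeviceMemoryState *dms)
    {
        if (dms->used_memslots > dms->required_memslots) {
            /* Unexpected; treat as "no reservations". */
            return 0;
        }
        return dms->required_memslots - dms->used_memslots;
    }
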
To track the number of used memslots, create a new address space for our device memory and register a memory listener (add/remove) for that address space. Signed-off-by: David Hildenbrand Reviewed-by: Maciej S. Szmigiero --- hw/mem/memory-device.c | 54 ++++++++++++++++++++++++++++++++++++++++++ include/hw/boards.h | 10 +++++++- 2 files changed, 63 insertions(+), 1 deletion(-) diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c index 0eec0872a9..d37cfbd65d 100644 --- a/hw/mem/memory-device.c +++ b/hw/mem/memory-device.c @@ -286,6 +286,7 @@ void memory_device_plug(MemoryDeviceState *md, MachineS= tate *ms) g_assert(ms->device_memory); =20 ms->device_memory->used_region_size +=3D memory_region_size(mr); + ms->device_memory->required_memslots +=3D memory_device_get_memslots(m= d); memory_region_add_subregion(&ms->device_memory->mr, addr - ms->device_memory->base, mr); trace_memory_device_plug(DEVICE(md)->id ? DEVICE(md)->id : "", addr); @@ -305,6 +306,7 @@ void memory_device_unplug(MemoryDeviceState *md, Machin= eState *ms) =20 memory_region_del_subregion(&ms->device_memory->mr, mr); ms->device_memory->used_region_size -=3D memory_region_size(mr); + ms->device_memory->required_memslots -=3D memory_device_get_memslots(m= d); trace_memory_device_unplug(DEVICE(md)->id ? DEVICE(md)->id : "", mdc->get_addr(md)); } @@ -324,6 +326,50 @@ uint64_t memory_device_get_region_size(const MemoryDev= iceState *md, return memory_region_size(mr); } =20 +static void memory_devices_region_mod(MemoryListener *listener, + MemoryRegionSection *mrs, bool add) +{ + DeviceMemoryState *dms =3D container_of(listener, DeviceMemoryState, + listener); + + if (!memory_region_is_ram(mrs->mr)) { + warn_report("Unexpected memory region mapped into device memory re= gion."); + return; + } + + /* + * The expectation is that each distinct RAM memory region section in + * our region for memory devices consumes exactly one memslot in KVM + * and in vhost. For vhost, this is true, except: + * * ROM memory regions don't consume a memslot. These get used very + * rarely for memory devices (R/O NVDIMMs). + * * Memslots without a fd (memory-backend-ram) don't necessarily + * consume a memslot. Such setups are quite rare and possibly bogus: + * the memory would be inaccessible by such vhost devices. + * + * So for vhost, in corner cases we might over-estimate the number of + * memslots that are currently used or that might still be reserved + * (required - used). + */ + dms->used_memslots +=3D add ? 1 : -1; + + if (dms->used_memslots > dms->required_memslots) { + warn_report("Memory devices use more memory slots than indicated a= s required."); + } +} + +static void memory_devices_region_add(MemoryListener *listener, + MemoryRegionSection *mrs) +{ + return memory_devices_region_mod(listener, mrs, true); +} + +static void memory_devices_region_del(MemoryListener *listener, + MemoryRegionSection *mrs) +{ + return memory_devices_region_mod(listener, mrs, false); +} + void machine_memory_devices_init(MachineState *ms, hwaddr base, uint64_t s= ize) { g_assert(size); @@ -333,8 +379,16 @@ void machine_memory_devices_init(MachineState *ms, hwa= ddr base, uint64_t size) =20 memory_region_init(&ms->device_memory->mr, OBJECT(ms), "device-memory", size); + address_space_init(&ms->device_memory->as, &ms->device_memory->mr, + "device-memory"); memory_region_add_subregion(get_system_memory(), ms->device_memory->ba= se, &ms->device_memory->mr); + + /* Track the number of memslots used by memory devices. 
*/ + ms->device_memory->listener.region_add =3D memory_devices_region_add; + ms->device_memory->listener.region_del =3D memory_devices_region_del; + memory_listener_register(&ms->device_memory->listener, + &ms->device_memory->as); } =20 static const TypeInfo memory_device_info =3D { diff --git a/include/hw/boards.h b/include/hw/boards.h index 3b541ffd24..e344ded607 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -296,15 +296,23 @@ struct MachineClass { * DeviceMemoryState: * @base: address in guest physical address space where the memory * address space for memory devices starts - * @mr: address space container for memory devices + * @mr: memory region container for memory devices + * @as: address space for memory devices + * @listener: memory listener used to track used memslots in the address s= pace * @dimm_size: the sum of plugged DIMMs' sizes * @used_region_size: the part of @mr already used by memory devices + * @required_memslots: the number of memslots required by memory devices + * @used_memslots: the number of memslots currently used by memory devices */ typedef struct DeviceMemoryState { hwaddr base; MemoryRegion mr; + AddressSpace as; + MemoryListener listener; uint64_t dimm_size; uint64_t used_region_size; + unsigned int required_memslots; + unsigned int used_memslots; } DeviceMemoryState; =20 /** --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694182953101749.4202278860158; Fri, 8 Sep 2023 07:22:33 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qecNE-0001Wz-42; Fri, 08 Sep 2023 10:22:16 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNC-0001Tq-93 for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:14 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecN9-0001jU-MN for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:14 -0400 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-658-dAuzfzXiOwKSFnAHbJ0VbQ-1; Fri, 08 Sep 2023 10:22:07 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 76A93181C283; Fri, 8 Sep 2023 14:22:06 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.194.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id DBAB2C03295; Fri, 8 Sep 2023 14:22:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694182930; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
bh=pjxTzzi+bUALobF5fcq+ZnLIGXFuzgDZ7k3W7Jwshcc=; b=OdQT8gbKAW4zGzvi71Uc07VAh//OskGsEnpAgVFW3oBrTViwprsaNq9dxygSzZwRIJ3GFf z0UJ1Dny4HBpx9JZ2Sk8LshVdGT5noLIFWHjCJhAgwpxWpjgdfA0vPf4MKDCLvH9LuLnwF /+9kkYXTQ5wJMZ5/GovtuDX4u857KZk= X-MC-Unique: dAuzfzXiOwKSFnAHbJ0VbQ-1 From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , Gavin Shan , Alex Williamson , Stefan Hajnoczi , "Maciej S . Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 09/16] memory-device, vhost: Support memory devices that dynamically consume memslots Date: Fri, 8 Sep 2023 16:21:29 +0200 Message-ID: <20230908142136.403541-10-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1694182955373100002 Content-Type: text/plain; charset="utf-8" We want to support memory devices that have a dynamically managed memory region container as device memory region. This device memory region maps multiple RAM memory subregions (e.g., aliases to the same RAM memory region), whereby these subregions can be (un)mapped on demand. Each RAM subregion will consume a memslot in KVM and vhost, resulting in such a new device consuming memslots dynamically, and initially usually 0. We already track the number of used vs. required memslots for all memslots. From that, we can derive the number of reserved memslots that must not be used otherwise. The target use case is virtio-mem and the hyper-v balloon, which will dynamically map aliases to RAM memory region into their device memory region container. Properly document what's supported and what's not and extend the vhost memslot check accordingly. Signed-off-by: David Hildenbrand Reviewed-by: Maciej S. 
Szmigiero --- hw/mem/memory-device.c | 29 +++++++++++++++++++++++++++-- hw/virtio/vhost.c | 18 ++++++++++++++---- include/hw/mem/memory-device.h | 7 +++++++ stubs/memory_device.c | 5 +++++ 4 files changed, 53 insertions(+), 6 deletions(-) diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c index d37cfbd65d..1b14ba5661 100644 --- a/hw/mem/memory-device.c +++ b/hw/mem/memory-device.c @@ -62,19 +62,44 @@ static unsigned int memory_device_get_memslots(MemoryDe= viceState *md) return 1; } =20 +/* + * Memslots that are reserved by memory devices (required but still report= ed + * as free from KVM / vhost). + */ +static unsigned int get_reserved_memslots(MachineState *ms) +{ + if (ms->device_memory->used_memslots > + ms->device_memory->required_memslots) { + /* This is unexpected, and we warned already in the memory notifie= r. */ + return 0; + } + return ms->device_memory->required_memslots - + ms->device_memory->used_memslots; +} + +unsigned int memory_devices_get_reserved_memslots(void) +{ + if (!current_machine->device_memory) { + return 0; + } + return get_reserved_memslots(current_machine); +} + static void memory_device_check_addable(MachineState *ms, MemoryDeviceStat= e *md, MemoryRegion *mr, Error **errp) { const uint64_t used_region_size =3D ms->device_memory->used_region_siz= e; const uint64_t size =3D memory_region_size(mr); const unsigned int required_memslots =3D memory_device_get_memslots(md= ); + const unsigned int reserved_memslots =3D get_reserved_memslots(ms); =20 /* we will need memory slots for kvm and vhost */ - if (kvm_enabled() && kvm_get_free_memslots() < required_memslots) { + if (kvm_enabled() && + kvm_get_free_memslots() < required_memslots + reserved_memslots) { error_setg(errp, "hypervisor has not enough free memory slots left= "); return; } - if (vhost_get_free_memslots() < required_memslots) { + if (vhost_get_free_memslots() < required_memslots + reserved_memslots)= { error_setg(errp, "a used vhost backend has not enough free memory = slots left"); return; } diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 8e84dca246..f7e1ac12a8 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -23,6 +23,7 @@ #include "qemu/log.h" #include "standard-headers/linux/vhost_types.h" #include "hw/virtio/virtio-bus.h" +#include "hw/mem/memory-device.h" #include "migration/blocker.h" #include "migration/qemu-file-types.h" #include "sysemu/dma.h" @@ -1423,7 +1424,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaq= ue, VhostBackendType backend_type, uint32_t busyloop_timeou= t, Error **errp) { - unsigned int used; + unsigned int used, reserved, limit; uint64_t features; int i, r, n_initialized_vqs =3D 0; =20 @@ -1529,9 +1530,18 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opa= que, } else { used =3D used_memslots; } - if (used > hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) { - error_setg(errp, "vhost backend memory slots limit is less" - " than current number of present memory slots"); + /* + * We assume that all reserved memslots actually require a real memslot + * in our vhost backend. This might not be true, for example, if the + * memslot would be ROM. If ever relevant, we can optimize for that -- + * but we'll need additional information about the reservations. 
+ */ + reserved =3D memory_devices_get_reserved_memslots(); + limit =3D hdev->vhost_ops->vhost_backend_memslots_limit(hdev); + if (used + reserved > limit) { + error_setg(errp, "vhost backend memory slots limit (%d) is less" + " than current number of used (%d) and reserved (%d)" + " memory slots for memory devices.", limit, used, reser= ved); r =3D -EINVAL; goto fail_busyloop; } diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h index b51a579fb9..c7b624da6a 100644 --- a/include/hw/mem/memory-device.h +++ b/include/hw/mem/memory-device.h @@ -46,6 +46,12 @@ typedef struct MemoryDeviceState MemoryDeviceState; * single RAM memory region or a memory region container with subregions * that are RAM memory regions or aliases to RAM memory regions. Other * memory regions or subregions are not supported. + * + * If the device memory region returned via @get_memory_region is a + * memory region container, it's supported to dynamically (un)map subregio= ns + * as long as the number of memslots returned by @get_memslots() won't + * be exceeded and as long as all memory regions are of the same kind (e.g= ., + * all RAM or all ROM). */ struct MemoryDeviceClass { /* private */ @@ -125,6 +131,7 @@ struct MemoryDeviceClass { =20 MemoryDeviceInfoList *qmp_memory_device_list(void); uint64_t get_plugged_memory_size(void); +unsigned int memory_devices_get_reserved_memslots(void); void memory_device_pre_plug(MemoryDeviceState *md, MachineState *ms, const uint64_t *legacy_align, Error **errp); void memory_device_plug(MemoryDeviceState *md, MachineState *ms); diff --git a/stubs/memory_device.c b/stubs/memory_device.c index e75cac62dc..318a5d4187 100644 --- a/stubs/memory_device.c +++ b/stubs/memory_device.c @@ -10,3 +10,8 @@ uint64_t get_plugged_memory_size(void) { return (uint64_t)-1; } + +unsigned int memory_devices_get_reserved_memslots(void) +{ + return 0; +} --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694183044; cv=none; d=zohomail.com; s=zohoarc; b=akoqxGdutAhqVHsjncmKdR4/Es+goSyWxoGx8l22wcm6fjqTK+dQC13b54XtflKUgY1LqYqbIg9Gsns5G9r///2js3hcaVeoZGpRdxt8KjxH6QcjEncRV5DlElldCcm2WhN3gfTZFL24/N87CJrIpjU/sL9anvfu5drUxPNNOkI= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694183044; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=76sLIvIi1LRlkl4IyII47l3BU7MLPwQ+dI9Px2upkuc=; b=ZPqlcYO94lKJuiIdo86w6i0IWcfnQkZlbhiiClFJfSE90YQg9eeO3F7wd98ThQ9DFRemIW7x8OmlkxVgDscjEKCbDogJXbwlubYksy03pV5lzk0DuFMspjP4FcTd2avVdu7Mk99ZiKC8aXkr5x1fj2BNJONMVZUctxzUpYHAtJw= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694183044930370.96171149921895; Fri, 8 Sep 2023 07:24:04 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 
1qecNI-0001hp-Qm; Fri, 08 Sep 2023 10:22:20 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNG-0001ce-MR for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:18 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNE-0001li-FT for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:18 -0400 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-385-yM6MKPrHNgmVi4RSUOESow-1; Fri, 08 Sep 2023 10:22:10 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 98E02181C280; Fri, 8 Sep 2023 14:22:09 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.194.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id B5BADC4F860; Fri, 8 Sep 2023 14:22:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694182935; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=76sLIvIi1LRlkl4IyII47l3BU7MLPwQ+dI9Px2upkuc=; b=KghJ0POrIIqI2wepGQ+D65C6lRG4DyMZ1fQ0sfWcXXx+lKFCdyBw0pB12efVCt/J36Q2pl ww1WhkhxAumvgYNFGtjHrjCG2ITgbY9YX0ynL97GNyALvZmoPOrLMAo2TmMQqhriA0U2zn WT9Jk9i/svNMTwqGMShKlMyb2CZQhPk= X-MC-Unique: yM6MKPrHNgmVi4RSUOESow-1 From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , Gavin Shan , Alex Williamson , Stefan Hajnoczi , "Maciej S . 
Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 10/16] kvm: Add stub for kvm_get_max_memslots() Date: Fri, 8 Sep 2023 16:21:30 +0200 Message-ID: <20230908142136.403541-11-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694183045663100003 Content-Type: text/plain; charset="utf-8" We'll need the stub soon from memory device context. While at it, use "unsigned int" as return value and place the declaration next to kvm_get_free_memslots(). Signed-off-by: David Hildenbrand Reviewed-by: Maciej S. Szmigiero --- accel/kvm/kvm-all.c | 2 +- accel/stubs/kvm-stub.c | 5 +++++ include/sysemu/kvm.h | 2 +- 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index a29906d441..5383bfddc3 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -174,7 +174,7 @@ void kvm_resample_fd_notify(int gsi) } } =20 -int kvm_get_max_memslots(void) +unsigned int kvm_get_max_memslots(void) { KVMState *s =3D KVM_STATE(current_accel()); =20 diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c index a5d4442d8f..51f522e52e 100644 --- a/accel/stubs/kvm-stub.c +++ b/accel/stubs/kvm-stub.c @@ -109,6 +109,11 @@ int kvm_irqchip_remove_irqfd_notifier_gsi(KVMState *s,= EventNotifier *n, return -ENOSYS; } =20 +unsigned int kvm_get_max_memslots(void) +{ + return 0; +} + unsigned int kvm_get_free_memslots(void) { return 0; diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index c3d831baef..97a8a4f201 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -215,6 +215,7 @@ typedef struct KVMRouteChange { =20 /* external API */ =20 +unsigned int kvm_get_max_memslots(void); unsigned int kvm_get_free_memslots(void); bool kvm_has_sync_mmu(void); int kvm_has_vcpu_events(void); @@ -552,7 +553,6 @@ int kvm_set_one_reg(CPUState *cs, uint64_t id, void *so= urce); */ int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target); struct ppc_radix_page_info *kvm_get_radix_page_info(void); -int kvm_get_max_memslots(void); =20 /* Notify resamplefd for EOI of specific interrupts. 
*/ void kvm_resample_fd_notify(int gsi); --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694183039; cv=none; d=zohomail.com; s=zohoarc; b=dw4j7vvSd2yH9k/EHNndaD6yPTjVvqdLaeNHfKBzDD7WSm0PVKbGuYAwElPsNHcNN1ja9wa7K2hmFpCYmdPzHlWsmnFlZpVy0jfTFCpf9QclADAuv1vQKRn7aw8S3/oNPJ5lVn+coiEb6yU0k5R+EOORZe3aOV3kyVSuQQxXrXA= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694183039; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=514wTQxaHYtnxSk3ThPQLF4tfMogtFuGARQoE0qmeVc=; b=PHekHrWwlBa+N35iaOvqNF+8tsiksaMZn18sgXbZQJfTSwsnhWsRIN0BD5NzfF2lq/whH+XH2VgxFuVtMqog/5QzIGnL+M0hL4A0Xdbn7rJWBbfx81icegtJNMOZg3bzrTxcEuQxIi8liRG0Q2GO3iqxHEVoBSmia/OI5JWWOz0= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694183039230387.1773545544272; Fri, 8 Sep 2023 07:23:59 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qecNK-0001py-PX; Fri, 08 Sep 2023 10:22:22 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNI-0001hd-Kw for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:20 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNG-0001lu-1e for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:20 -0400 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-501-D7Up91O8OEWekiN_z8Wg9w-1; Fri, 08 Sep 2023 10:22:13 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7649980D75F; Fri, 8 Sep 2023 14:22:12 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.194.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id D3CE1C03295; Fri, 8 Sep 2023 14:22:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694182937; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=514wTQxaHYtnxSk3ThPQLF4tfMogtFuGARQoE0qmeVc=; b=YsFRxCxQty2yplh83BFIA0Kqnxw2g76n5HC4ItfRS+V4j8Ewk6tAIMVablxIFxLB+49019 PZgDQzgpP7oJUkbG2pGAIsG/SqbG5DbmrKoL66sN9Jqh/VKiBV/JfYWeWkxkTPGUPyeVdr MDFcIw4mDiCi9DDCaG9vL0IMR2eGZx4= X-MC-Unique: D7Up91O8OEWekiN_z8Wg9w-1 From: David Hildenbrand To: 
qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , Gavin Shan , Alex Williamson , Stefan Hajnoczi , "Maciej S . Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 11/16] vhost: Add vhost_get_max_memslots() Date: Fri, 8 Sep 2023 16:21:31 +0200 Message-ID: <20230908142136.403541-12-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694183039577100009 Content-Type: text/plain; charset="utf-8" Let's add vhost_get_max_memslots(). Signed-off-by: David Hildenbrand Reviewed-by: Maciej S. 
Szmigiero --- hw/virtio/vhost-stub.c | 5 +++++ hw/virtio/vhost.c | 11 +++++++++++ include/hw/virtio/vhost.h | 1 + 3 files changed, 17 insertions(+) diff --git a/hw/virtio/vhost-stub.c b/hw/virtio/vhost-stub.c index d53dd9d288..52d42adab2 100644 --- a/hw/virtio/vhost-stub.c +++ b/hw/virtio/vhost-stub.c @@ -2,6 +2,11 @@ #include "hw/virtio/vhost.h" #include "hw/virtio/vhost-user.h" =20 +unsigned int vhost_get_max_memslots(void) +{ + return UINT_MAX; +} + unsigned int vhost_get_free_memslots(void) { return UINT_MAX; diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index f7e1ac12a8..ee193b07c7 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -55,6 +55,17 @@ static unsigned int used_shared_memslots; static QLIST_HEAD(, vhost_dev) vhost_devices =3D QLIST_HEAD_INITIALIZER(vhost_devices); =20 +unsigned int vhost_get_max_memslots(void) +{ + unsigned int max =3D UINT_MAX; + struct vhost_dev *hdev; + + QLIST_FOREACH(hdev, &vhost_devices, entry) { + max =3D MIN(max, hdev->vhost_ops->vhost_backend_memslots_limit(hde= v)); + } + return max; +} + unsigned int vhost_get_free_memslots(void) { unsigned int free =3D UINT_MAX; diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h index 603bf834be..c7e5467693 100644 --- a/include/hw/virtio/vhost.h +++ b/include/hw/virtio/vhost.h @@ -315,6 +315,7 @@ uint64_t vhost_get_features(struct vhost_dev *hdev, con= st int *feature_bits, */ void vhost_ack_features(struct vhost_dev *hdev, const int *feature_bits, uint64_t features); +unsigned int vhost_get_max_memslots(void); unsigned int vhost_get_free_memslots(void); =20 int vhost_net_set_backend(struct vhost_dev *hdev, --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694183051347786.0503108564518; Fri, 8 Sep 2023 07:24:11 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qecNM-00023i-FL; Fri, 08 Sep 2023 10:22:24 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNK-0001q0-Od for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:22 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNH-0001mX-TA for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:22 -0400 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-297-c3_h3CjrODCKZcbcDI1oVQ-1; Fri, 08 Sep 2023 10:22:16 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6CB3E816529; Fri, 8 Sep 2023 14:22:15 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.194.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id B0A44D47819; Fri, 8 Sep 2023 14:22:12 +0000 (UTC) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694182939; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=k14rdc0nJagTHK/ORPExT6NlcTj4QZ8GHl1RLO0mxwQ=; b=VrVdt/LFC1mpfgNDxjoMw3qQddTCgQTP4A0pS/bVt0tTz8GYjoE5Up6atSnuCbvbFcjVp3 yziGJNSQlHl9LTolM50+EX+kxh4/HgqU33h4ABmYg25WWUmti0paJzLV8aq7BLY25c19Rj 9poP1veec9frg3bm63f/3bPOyqj1X1w= X-MC-Unique: c3_h3CjrODCKZcbcDI1oVQ-1 From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , Gavin Shan , Alex Williamson , Stefan Hajnoczi , "Maciej S . Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 12/16] memory-device, vhost: Support automatic decision on the number of memslots Date: Fri, 8 Sep 2023 16:21:32 +0200 Message-ID: <20230908142136.403541-13-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1694183051775100004 Content-Type: text/plain; charset="utf-8" We want to support memory devices that can automatically decide how many memslots they will use. In the worst case, they have to use a single memslot. The target use cases are virtio-mem and the hyper-v balloon. Let's calculate a reasonable limit such a memory device may use, and instruct the device to make a decision based on that limit. Use a simple heuristic that considers: * A memslot soft-limit for all memory devices of 256; also, to not consume too many memslots -- which could harm performance. * Actually still free and unreserved memslots * The percentage of the remaining device memory region that memory device will occupy. Further, while we properly check before plugging a memory device whether there still is are free memslots, we have other memslot consumers (such as boot memory, PCI BARs) that don't perform any checks and might dynamically consume memslots without any prior reservation. So we might succeed in plugging a memory device, but once we dynamically map a PCI BAR we would be in trouble. 
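For instance (hypothetical numbers only): if KVM reports 10 free memslots and a
newly plugged memory device is allowed to consume all 10 of them, a PCI BAR that
gets mapped later would find no memslot left, even though plugging the memory
device itself succeeded.
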
Doing accounting / reservation / checks for all such users is problematic (e.g., sometimes we might temporarily split boot memory into two memslots, triggered by the BIOS). We use the historic magic memslot number of 509 as orientation to when supporting 256 memory devices -> memslots (leaving 253 for boot memory and other devices) has been proven to work reliable. We'll fallback to suggesting a single memslot if we don't have at least 509 total memslots. Plugging vhost devices with less than 509 memslots available while we have memory devices plugged that consume multiple memslots due to automatic decisions can be problematic. Most configurations might just fail due to "limit < used + reserved", however, it can also happen that these memory devices would suddenly consume memslots that would actually be required by other memslot consumers (boot, PCI BARs) later. Note that this has always been sketchy with vhost devices that support only a small number of memslots; but we don't want to make it any worse.So let's keep it simple and simply reject plugging such vhost devices in such a configuration. Eventually, all vhost devices that want to be fully compatible with such memory devices should support a decent number of memslots (>=3D 509). Signed-off-by: David Hildenbrand Reviewed-by: Maciej S. Szmigiero --- hw/mem/memory-device.c | 96 ++++++++++++++++++++++++++++++++-- hw/virtio/vhost.c | 14 ++++- include/hw/boards.h | 4 ++ include/hw/mem/memory-device.h | 32 ++++++++++++ stubs/memory_device.c | 5 ++ 5 files changed, 147 insertions(+), 4 deletions(-) diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c index 1b14ba5661..ae38f48f16 100644 --- a/hw/mem/memory-device.c +++ b/hw/mem/memory-device.c @@ -85,13 +85,93 @@ unsigned int memory_devices_get_reserved_memslots(void) return get_reserved_memslots(current_machine); } =20 +bool memory_devices_memslot_auto_decision_active(void) +{ + if (!current_machine->device_memory) { + return false; + } + + return current_machine->device_memory->memslot_auto_decision_active; +} + +static unsigned int memory_device_memslot_decision_limit(MachineState *ms, + MemoryRegion *mr) +{ + const unsigned int reserved =3D get_reserved_memslots(ms); + const uint64_t size =3D memory_region_size(mr); + unsigned int max =3D vhost_get_max_memslots(); + unsigned int free =3D vhost_get_free_memslots(); + uint64_t available_space; + unsigned int memslots; + + if (kvm_enabled()) { + max =3D MIN(max, kvm_get_max_memslots()); + free =3D MIN(free, kvm_get_free_memslots()); + } + + /* + * If we only have less overall memslots than what we consider reasona= ble, + * just keep it to a minimum. + */ + if (max < MEMORY_DEVICES_SAFE_MAX_MEMSLOTS) { + return 1; + } + + /* + * Consider our soft-limit across all memory devices. We don't really + * expect to exceed this limit in reasonable configurations. + */ + if (MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT <=3D + ms->device_memory->required_memslots) { + return 1; + } + memslots =3D MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT - + ms->device_memory->required_memslots; + + /* + * Consider the actually still free memslots. This is only relevant if + * other memslot consumers would consume *significantly* more memslots= than + * what we prepared for (> 253). Unlikely, but let's just handle it + * cleanly. + */ + memslots =3D MIN(memslots, free - reserved); + if (memslots < 1 || unlikely(free < reserved)) { + return 1; + } + + /* We cannot have any other memory devices? So give all to this device= . 
*/ + if (size =3D=3D ms->maxram_size - ms->ram_size) { + return memslots; + } + + /* + * Simple heuristic: equally distribute the memslots over the space + * still available for memory devices. + */ + available_space =3D ms->maxram_size - ms->ram_size - + ms->device_memory->used_region_size; + memslots =3D (double)memslots * size / available_space; + return memslots < 1 ? 1 : memslots; +} + static void memory_device_check_addable(MachineState *ms, MemoryDeviceStat= e *md, MemoryRegion *mr, Error **errp) { + const MemoryDeviceClass *mdc =3D MEMORY_DEVICE_GET_CLASS(md); const uint64_t used_region_size =3D ms->device_memory->used_region_siz= e; const uint64_t size =3D memory_region_size(mr); - const unsigned int required_memslots =3D memory_device_get_memslots(md= ); const unsigned int reserved_memslots =3D get_reserved_memslots(ms); + unsigned int required_memslots, memslot_limit; + + /* + * Instruct the device to decide how many memslots to use, if applicab= le, + * before we query the number of required memslots the first time. + */ + if (mdc->decide_memslots) { + memslot_limit =3D memory_device_memslot_decision_limit(ms, mr); + mdc->decide_memslots(md, memslot_limit); + } + required_memslots =3D memory_device_get_memslots(md); =20 /* we will need memory slots for kvm and vhost */ if (kvm_enabled() && @@ -300,6 +380,7 @@ out: void memory_device_plug(MemoryDeviceState *md, MachineState *ms) { const MemoryDeviceClass *mdc =3D MEMORY_DEVICE_GET_CLASS(md); + const unsigned int memslots =3D memory_device_get_memslots(md); const uint64_t addr =3D mdc->get_addr(md); MemoryRegion *mr; =20 @@ -311,7 +392,11 @@ void memory_device_plug(MemoryDeviceState *md, Machine= State *ms) g_assert(ms->device_memory); =20 ms->device_memory->used_region_size +=3D memory_region_size(mr); - ms->device_memory->required_memslots +=3D memory_device_get_memslots(m= d); + ms->device_memory->required_memslots +=3D memslots; + if (mdc->decide_memslots && memslots > 1) { + ms->device_memory->memslot_auto_decision_active++; + } + memory_region_add_subregion(&ms->device_memory->mr, addr - ms->device_memory->base, mr); trace_memory_device_plug(DEVICE(md)->id ? DEVICE(md)->id : "", addr); @@ -320,6 +405,7 @@ void memory_device_plug(MemoryDeviceState *md, MachineS= tate *ms) void memory_device_unplug(MemoryDeviceState *md, MachineState *ms) { const MemoryDeviceClass *mdc =3D MEMORY_DEVICE_GET_CLASS(md); + const unsigned int memslots =3D memory_device_get_memslots(md); MemoryRegion *mr; =20 /* @@ -330,8 +416,12 @@ void memory_device_unplug(MemoryDeviceState *md, Machi= neState *ms) g_assert(ms->device_memory); =20 memory_region_del_subregion(&ms->device_memory->mr, mr); + + if (mdc->decide_memslots && memslots > 1) { + ms->device_memory->memslot_auto_decision_active--; + } ms->device_memory->used_region_size -=3D memory_region_size(mr); - ms->device_memory->required_memslots -=3D memory_device_get_memslots(m= d); + ms->device_memory->required_memslots -=3D memslots; trace_memory_device_unplug(DEVICE(md)->id ? 
DEVICE(md)->id : "", mdc->get_addr(md)); } diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index ee193b07c7..24013b39d6 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -1462,6 +1462,19 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opa= que, goto fail; } =20 + limit =3D hdev->vhost_ops->vhost_backend_memslots_limit(hdev); + if (limit < MEMORY_DEVICES_SAFE_MAX_MEMSLOTS && + memory_devices_memslot_auto_decision_active()) { + error_setg(errp, "some memory device (like virtio-mem)" + " decided how many memory slots to use based on the overall" + " number of memory slots; this vhost backend would further" + " restricts the overall number of memory slots"); + error_append_hint(errp, "Try plugging this vhost backend before" + " plugging such memory devices.\n"); + r =3D -EINVAL; + goto fail; + } + for (i =3D 0; i < hdev->nvqs; ++i, ++n_initialized_vqs) { r =3D vhost_virtqueue_init(hdev, hdev->vqs + i, hdev->vq_index + i= ); if (r < 0) { @@ -1548,7 +1561,6 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaq= ue, * but we'll need additional information about the reservations. */ reserved =3D memory_devices_get_reserved_memslots(); - limit =3D hdev->vhost_ops->vhost_backend_memslots_limit(hdev); if (used + reserved > limit) { error_setg(errp, "vhost backend memory slots limit (%d) is less" " than current number of used (%d) and reserved (%d)" diff --git a/include/hw/boards.h b/include/hw/boards.h index e344ded607..c62641c92b 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -303,6 +303,9 @@ struct MachineClass { * @used_region_size: the part of @mr already used by memory devices * @required_memslots: the number of memslots required by memory devices * @used_memslots: the number of memslots currently used by memory devices + * @memslot_auto_decision_active: whether any plugged memory device + * automatically decided to use more than + * one memslot */ typedef struct DeviceMemoryState { hwaddr base; @@ -313,6 +316,7 @@ typedef struct DeviceMemoryState { uint64_t used_region_size; unsigned int required_memslots; unsigned int used_memslots; + unsigned int memslot_auto_decision_active; } DeviceMemoryState; =20 /** diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h index c7b624da6a..3354d6c166 100644 --- a/include/hw/mem/memory-device.h +++ b/include/hw/mem/memory-device.h @@ -14,6 +14,7 @@ #define MEMORY_DEVICE_H =20 #include "hw/qdev-core.h" +#include "qemu/typedefs.h" #include "qapi/qapi-types-machine.h" #include "qom/object.h" =20 @@ -99,6 +100,15 @@ struct MemoryDeviceClass { */ MemoryRegion *(*get_memory_region)(MemoryDeviceState *md, Error **errp= ); =20 + /* + * Optional: Instruct the memory device to decide how many memory slots + * it requires, not exceeding the given limit. + * + * Called exactly once when pre-plugging the memory device, before + * querying the number of memslots using @get_memslots the first time. + */ + void (*decide_memslots)(MemoryDeviceState *md, unsigned int limit); + /* * Optional for memory devices that require only a single memslot, * required for all other memory devices: Return the number of memslots @@ -129,9 +139,31 @@ struct MemoryDeviceClass { MemoryDeviceInfo *info); }; =20 +/* + * Traditionally, KVM/vhost in many setups supported 509 memslots, whereby + * 253 memslots were "reserved" for boot memory and other devices (such + * as PCI BARs, which can get mapped dynamically) and 256 memslots were + * dedicated for DIMMs. These magic numbers worked reliably in the past. 
+ * + * Further, using many memslots can negatively affect performance, so sett= ing + * the soft-limit of memslots used by memory devices to the traditional + * DIMM limit of 256 sounds reasonable. + * + * If we have less than 509 memslots, we will instruct memory devices that + * support automatically deciding how many memslots to use to only use a s= ingle + * one. + * + * Hotplugging vhost devices with at least 509 memslots is not expected to + * cause problems, not even when devices automatically decided how many me= mslots + * to use. + */ +#define MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT 256 +#define MEMORY_DEVICES_SAFE_MAX_MEMSLOTS 509 + MemoryDeviceInfoList *qmp_memory_device_list(void); uint64_t get_plugged_memory_size(void); unsigned int memory_devices_get_reserved_memslots(void); +bool memory_devices_memslot_auto_decision_active(void); void memory_device_pre_plug(MemoryDeviceState *md, MachineState *ms, const uint64_t *legacy_align, Error **errp); void memory_device_plug(MemoryDeviceState *md, MachineState *ms); diff --git a/stubs/memory_device.c b/stubs/memory_device.c index 318a5d4187..15fd93ff67 100644 --- a/stubs/memory_device.c +++ b/stubs/memory_device.c @@ -15,3 +15,8 @@ unsigned int memory_devices_get_reserved_memslots(void) { return 0; } + +bool memory_devices_memslot_auto_decision_active(void) +{ + return false; +} --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694183060; cv=none; d=zohomail.com; s=zohoarc; b=RdwK0v6Gri9LDnT3kfqiZ/Z8isDf/zK2BwucxVfLQCn4/93Ar7/082a2Y/Htak8zl946mOWNhUyHE8U/O1M+n3EXkwbDbOWzUV6+InSPkCComOQF13IuZtSDKPs/bHGCcfuszk79Pi7bWx2ZAUBgHhiuUVEzI01kKcTDBeOSyqU= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694183060; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=HHE7mkrehWHgWXyRLmrgc4bzKfsgE2qS/kJ9qG1wK7I=; b=CjkJ3LgBh4QR4sxc4wuTES/2d4iD/0SrASyfV65lbX491nWB7E8kiMxwSvBSsiCL7gGTACXn11450O/PAsfVY2TDQy6X/NY8Qpw+Q0CiQj4I/Vgb/fY95gzLx1fYwuyBUnrllxZ8uO6KsIyLVcdomEvSAN4jikc+GIK90QYcoBI= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694183060692685.2916663944394; Fri, 8 Sep 2023 07:24:20 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qecNP-0002R8-GP; Fri, 08 Sep 2023 10:22:27 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNM-00025D-Qr for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:24 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNK-0001nI-Fh for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:24 -0400 Received: from 
mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-96-mnRxv0qmMK-6G8ySNyzUgQ-1; Fri, 08 Sep 2023 10:22:18 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 45B08101CA83; Fri, 8 Sep 2023 14:22:18 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.194.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id A67FCC03295; Fri, 8 Sep 2023 14:22:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694182941; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HHE7mkrehWHgWXyRLmrgc4bzKfsgE2qS/kJ9qG1wK7I=; b=Ww5H7od7BcG1SwTOAz6pvxQKxjjpeIh+eM6XP7ugVNoKx36riXCAxXX9UXsRt0MVmBZJH5 nxrAopbLI95suG8HevSvIw8IWmNE0KZWJyYtFphWlYmJskl8E+MuNH8fjicqpCZViTd1sE HLcB2zjaaU9jeL9wI8VMopc7OmYRR3E= X-MC-Unique: mnRxv0qmMK-6G8ySNyzUgQ-1 From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , Gavin Shan , Alex Williamson , Stefan Hajnoczi , "Maciej S . Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 13/16] memory: Clarify mapping requirements for RamDiscardManager Date: Fri, 8 Sep 2023 16:21:33 +0200 Message-ID: <20230908142136.403541-14-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694183061920100001 Content-Type: text/plain; charset="utf-8" We really only care about the RAM memory region not being mapped into an address space yet as long as we're still setting up the RamDiscardManager. Once mapped into an address space, memory notifiers would get notified about such a region and any attempts to modify the RamDiscardManager would be wrong. 
While "mapped into an address space" is easy to check for RAM regions that are mapped directly (following the ->container links), it's harder to check when such regions are mapped indirectly via aliases. For now, we can only detect that a region is mapped through an alias (->mapped_via_alias), but we don't have a handle on these aliases to follow all their ->container links to test if they are eventually mapped into an address space. So relax the assertion in memory_region_set_ram_discard_manager(), remove the check in memory_region_get_ram_discard_manager() and clarify the doc. Signed-off-by: David Hildenbrand Reviewed-by: Maciej S. Szmigiero --- include/exec/memory.h | 5 +++-- softmmu/memory.c | 4 ++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index 68284428f8..5feb704585 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -593,8 +593,9 @@ typedef void (*ReplayRamDiscard)(MemoryRegionSection *s= ection, void *opaque); * populated (consuming memory), to be used/accessed by the VM. * * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the - * #MemoryRegion isn't mapped yet; it cannot change while the #MemoryRegio= n is - * mapped. + * #MemoryRegion isn't mapped into an address space yet (either directly + * or via an alias); it cannot change while the #MemoryRegion is + * mapped into an address space. * * The #RamDiscardManager is intended to be used by technologies that are * incompatible with discarding of RAM (e.g., VFIO, which may pin all diff --git a/softmmu/memory.c b/softmmu/memory.c index 7d9494ce70..c1e8aa133f 100644 --- a/softmmu/memory.c +++ b/softmmu/memory.c @@ -2081,7 +2081,7 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion= *iommu_mr) =20 RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr) { - if (!memory_region_is_mapped(mr) || !memory_region_is_ram(mr)) { + if (!memory_region_is_ram(mr)) { return NULL; } return mr->rdm; @@ -2090,7 +2090,7 @@ RamDiscardManager *memory_region_get_ram_discard_mana= ger(MemoryRegion *mr) void memory_region_set_ram_discard_manager(MemoryRegion *mr, RamDiscardManager *rdm) { - g_assert(memory_region_is_ram(mr) && !memory_region_is_mapped(mr)); + g_assert(memory_region_is_ram(mr)); g_assert(!rdm || !mr->rdm); mr->rdm =3D rdm; } --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694182956; cv=none; d=zohomail.com; s=zohoarc; b=V8kVusgBeV3OSl8Rs1SXIUGC3Nr191DxXGG+BqS+z5Ll9vaty0TOiRznOMnn8CsyK8nWUu4H79RNhH7dOejNtDO9WhtqF9HTm5wWbcc9PbQiJgd34hBwbeuF6yMFc2OPROpCY8iEV/4lj7+zNviSl3OD37zLAwXMkaFcWIYPqRc= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694182956; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=7LOIIoUbqvFVrDtHTmWzNfIshWtFm2CK8ylVK/Ns/Os=; b=OHYAxqiuFtyKwrMF05JFm0hcUjxf4ti74dezUs46dO6adFMIDnBrzr+eRAGQ/3mbnrowaUtf0CIn5ritStvGPeQNfoH/ZQw8Sk73t6Yjj618I8vMqNozeQs1Cv1v5PC3edofPT1gsjqLF0hwwcei2vlZhmhyyWEJVbmNlbtOOx8= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 
209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 16941829564151013.3365333205871; Fri, 8 Sep 2023 07:22:36 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qecNU-0002wN-Su; Fri, 08 Sep 2023 10:22:32 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNS-0002p6-Mq for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:30 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNO-0001nt-5I for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:30 -0400 Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-262-ndHNC8LiORm60kREOCdOpA-1; Fri, 08 Sep 2023 10:22:21 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 37C3838157C1; Fri, 8 Sep 2023 14:22:21 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.194.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id 81EE8D47819; Fri, 8 Sep 2023 14:22:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694182945; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7LOIIoUbqvFVrDtHTmWzNfIshWtFm2CK8ylVK/Ns/Os=; b=DxCd6ZAQ0ojXVd+SoGZzGVAI6IMISuYGVUowrBV2rJbEOSb9qcc9s0XV2cwyxMZGJ0zSAy yFTtiKnwmNsJbsv44yHwadQ9qPTfJ7K7llCFyspDsx7dRGBE9DVl4ktPhcU3ZYVLbuA0Hm gc1piCiRpfEhsS0Ux8Y4A2fXg9j468E= X-MC-Unique: ndHNC8LiORm60kREOCdOpA-1 From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , Gavin Shan , Alex Williamson , Stefan Hajnoczi , "Maciej S . 
Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 14/16] virtio-mem: Expose device memory via multiple memslots if enabled Date: Fri, 8 Sep 2023 16:21:34 +0200 Message-ID: <20230908142136.403541-15-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694182957293100005 Content-Type: text/plain; charset="utf-8" Having large virtio-mem devices that only expose little memory to a VM is currently a problem: we map the whole sparse memory region into the guest using a single memslot, resulting in one gigantic memslot in KVM. KVM allocates metadata for the whole memslot, which can result in quite some memory waste. Assuming we have a 1 TiB virtio-mem device and only expose little (e.g., 1 GiB) memory, we would create a single 1 TiB memslot and KVM has to allocate metadata for that 1 TiB memslot: on x86, this implies allocating a significant amount of memory for metadata: (1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB -> For 1 TiB: 2147483648 + 4194304 + 8192 =3D ~ 2 GiB (0.2 %) With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets allocated lazily when required for nested VMs (2) gfn_track: 2 bytes per 4 KiB -> For 1 TiB: 536870912 =3D ~512 MiB (0.05 %) (3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB -> For 1 TiB: 2097152 + 4096 =3D ~2 MiB (0.0002 %) (4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page -> For 1 TiB: 536870912 =3D 64 MiB (0.006 %) So we primarily care about (1) and (2). The bad thing is, that the memory consumption *doubles* once SMM is enabled, because we create the memslot once for !SMM and once for SMM. Having a 1 TiB memslot without the TDP MMU consumes around: * With SMM: 5 GiB * Without SMM: 2.5 GiB Having a 1 TiB memslot with the TDP MMU consumes around: * With SMM: 1 GiB * Without SMM: 512 MiB ... and that's really something we want to optimize, to be able to just start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device that can grow very large (e.g., 1 TiB). Consequently, using multiple memslots and only mapping the memslots we really need can significantly reduce memory waste and speed up memslot-related operations. Let's expose the sparse RAM memory region using multiple memslots, mapping only the memslots we currently need into our device memory region container. 
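For reference, the estimates above can be reproduced with a small standalone calculation. The following is only an illustrative sketch and not part of the patch; it plugs the per-page metadata costs quoted above into C, using the 1 TiB memslot from the example:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    #define KiB (1024ULL)
    #define MiB (1024 * KiB)
    #define GiB (1024 * MiB)
    #define TiB (1024 * GiB)

    int main(void)
    {
        const uint64_t slot = 1 * TiB;              /* example memslot size */
        const uint64_t pages_4k = slot / (4 * KiB);
        const uint64_t pages_2m = slot / (2 * MiB);
        const uint64_t pages_1g = slot / (1 * GiB);

        /* (1) RMAP: 8 bytes per 4 KiB, 2 MiB and 1 GiB page */
        const uint64_t rmap = 8 * (pages_4k + pages_2m + pages_1g);
        /* (2) gfn_track: 2 bytes per 4 KiB page */
        const uint64_t gfn_track = 2 * pages_4k;
        /* (3) lpage_info: 4 bytes per 2 MiB and 1 GiB page */
        const uint64_t lpage_info = 4 * (pages_2m + pages_1g);
        /* (4) 2x dirty bitmaps: 2 bits per 4 KiB page, expressed in bytes */
        const uint64_t bitmaps = 2 * pages_4k / 8;

        printf("rmap:       %" PRIu64 " bytes\n", rmap);       /* ~2 GiB   */
        printf("gfn_track:  %" PRIu64 " bytes\n", gfn_track);  /* ~512 MiB */
        printf("lpage_info: %" PRIu64 " bytes\n", lpage_info); /* ~2 MiB   */
        printf("bitmaps:    %" PRIu64 " bytes\n", bitmaps);    /* ~64 MiB  */
        return 0;
    }

Because KVM allocates this metadata per memslot, proportional to the memslot size, splitting the device region into smaller memslots and mapping only the ones that are needed shrinks the overhead roughly to the fraction of the region that is actually mapped.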
* With VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we only map the memslots that actually have memory plugged, and dynamically (un)map when (un)plugging memory blocks. * Without VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we always map the memslots covered by the usable region, and dynamically (un)map when resizing the usable region. We'll auto-detect the number of memslots to use based on the memslot limit provided by the core. We'll use at most 1 memslot per gigabyte. Note that our global limit of memslots across all memory devices is currently set to 256: even with multiple large virtio-mem devices, we'd still have a sane limit on the number of memslots used. The default is a single memslot for now ("multiple-memslots=3Doff"). The optimization must be enabled manually using "multiple-memslots=3Don", becau= se some vhost setups (e.g., hotplug of vhost-user devices) might be problematic until we support more memslots, especially in vhost-user backends. Note that "multiple-memslots=3Don" is just a hint that multiple memslots *may* be used for internal optimizations, not that multiple memslots *must* be used. The actual number of memslots that are used is an internal detail: for example, once memslot metadata is no longer an issue, we could simply stop optimizing for that. Migration source and destination can differ on the setting of "multiple-memslots". Signed-off-by: David Hildenbrand Reviewed-by: Maciej S. Szmigiero --- hw/virtio/virtio-mem-pci.c | 21 +++ hw/virtio/virtio-mem.c | 266 ++++++++++++++++++++++++++++++++- include/hw/virtio/virtio-mem.h | 23 ++- 3 files changed, 306 insertions(+), 4 deletions(-) diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c index c4597e029e..1b4e9a3284 100644 --- a/hw/virtio/virtio-mem-pci.c +++ b/hw/virtio/virtio-mem-pci.c @@ -48,6 +48,25 @@ static MemoryRegion *virtio_mem_pci_get_memory_region(Me= moryDeviceState *md, return vmc->get_memory_region(vmem, errp); } =20 +static void virtio_mem_pci_decide_memslots(MemoryDeviceState *md, + unsigned int limit) +{ + VirtIOMEMPCI *pci_mem =3D VIRTIO_MEM_PCI(md); + VirtIOMEM *vmem =3D VIRTIO_MEM(&pci_mem->vdev); + VirtIOMEMClass *vmc =3D VIRTIO_MEM_GET_CLASS(vmem); + + vmc->decide_memslots(vmem, limit); +} + +static unsigned int virtio_mem_pci_get_memslots(MemoryDeviceState *md) +{ + VirtIOMEMPCI *pci_mem =3D VIRTIO_MEM_PCI(md); + VirtIOMEM *vmem =3D VIRTIO_MEM(&pci_mem->vdev); + VirtIOMEMClass *vmc =3D VIRTIO_MEM_GET_CLASS(vmem); + + return vmc->get_memslots(vmem); +} + static uint64_t virtio_mem_pci_get_plugged_size(const MemoryDeviceState *m= d, Error **errp) { @@ -150,6 +169,8 @@ static void virtio_mem_pci_class_init(ObjectClass *klas= s, void *data) mdc->set_addr =3D virtio_mem_pci_set_addr; mdc->get_plugged_size =3D virtio_mem_pci_get_plugged_size; mdc->get_memory_region =3D virtio_mem_pci_get_memory_region; + mdc->decide_memslots =3D virtio_mem_pci_decide_memslots; + mdc->get_memslots =3D virtio_mem_pci_get_memslots; mdc->fill_device_info =3D virtio_mem_pci_fill_device_info; mdc->get_min_alignment =3D virtio_mem_pci_get_min_alignment; =20 diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index b6e781741e..724fcb189a 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -66,6 +66,13 @@ static uint32_t virtio_mem_default_thp_size(void) return default_thp_size; } =20 +/* + * The minimum memslot size depends on this setting ("sane default"), the + * device block size, and the memory backend page size. The last (or singl= e) + * memslot might be smaller than this constant.
+ */ +#define VIRTIO_MEM_MIN_MEMSLOT_SIZE (1 * GiB) + /* * We want to have a reasonable default block size such that * 1. We avoid splitting THPs when unplugging memory, which degrades @@ -483,6 +490,94 @@ static bool virtio_mem_valid_range(const VirtIOMEM *vm= em, uint64_t gpa, return true; } =20 +static void virtio_mem_activate_memslot(VirtIOMEM *vmem, unsigned int idx) +{ + const uint64_t memslot_offset =3D idx * vmem->memslot_size; + + /* + * Instead of enabling/disabling memslot, we add/remove them. This sho= uld + * make address space updates faster, because we don't have to loop ov= er + * many disabled subregions. + */ + if (memory_region_is_mapped(&vmem->memslots[idx])) { + return; + } + memory_region_add_subregion(vmem->mr, memslot_offset, &vmem->memslots[= idx]); +} + +static void virtio_mem_deactivate_memslot(VirtIOMEM *vmem, unsigned int id= x) +{ + if (!memory_region_is_mapped(&vmem->memslots[idx])) { + return; + } + memory_region_del_subregion(vmem->mr, &vmem->memslots[idx]); +} + +static void virtio_mem_activate_memslots_to_plug(VirtIOMEM *vmem, + uint64_t offset, uint64_t= size) +{ + const unsigned int start_idx =3D offset / vmem->memslot_size; + const unsigned int end_idx =3D (offset + size + vmem->memslot_size - 1= ) / + vmem->memslot_size; + unsigned int idx; + + if (vmem->unplugged_inaccessible =3D=3D ON_OFF_AUTO_OFF) { + /* All memslots covered by the usable region are always enabled. */ + return; + } + + /* Activate all involved memslots in a single transaction. */ + memory_region_transaction_begin(); + for (idx =3D start_idx; idx < end_idx; idx++) { + virtio_mem_activate_memslot(vmem, idx); + } + memory_region_transaction_commit(); +} + +static void virtio_mem_deactivate_unplugged_memslots(VirtIOMEM *vmem, + uint64_t offset, + uint64_t size) +{ + const uint64_t region_size =3D memory_region_size(&vmem->memdev->mr); + const unsigned int start_idx =3D offset / vmem->memslot_size; + const unsigned int end_idx =3D (offset + size + vmem->memslot_size - 1= ) / + vmem->memslot_size; + unsigned int idx; + + if (vmem->unplugged_inaccessible =3D=3D ON_OFF_AUTO_OFF) { + /* All memslots covered by the usable region are always enabled. */ + return; + } + + /* Deactivate all memslots with unplugged blocks in a single transacti= on. */ + memory_region_transaction_begin(); + for (idx =3D start_idx; idx < end_idx; idx++) { + const uint64_t memslot_offset =3D idx * vmem->memslot_size; + uint64_t memslot_size =3D vmem->memslot_size; + + /* The size of the last memslot might be smaller. */ + if (idx =3D=3D vmem->nb_memslots - 1) { + memslot_size =3D region_size - memslot_offset; + } + + /* + * Partially covered memslots might still have some blocks plugged= and + * have to remain enabled if that's the case. 
+ */ + if (offset > memslot_offset || + offset + size < memslot_offset + memslot_size) { + const uint64_t gpa =3D vmem->addr + memslot_offset; + + if (!virtio_mem_is_range_unplugged(vmem, gpa, memslot_size)) { + continue; + } + } + + virtio_mem_deactivate_memslot(vmem, idx); + } + memory_region_transaction_commit(); +} + static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa, uint64_t size, bool plug) { @@ -500,6 +595,8 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, = uint64_t start_gpa, } virtio_mem_notify_unplug(vmem, offset, size); virtio_mem_set_range_unplugged(vmem, start_gpa, size); + /* Disable completely unplugged memslots after updating the state.= */ + virtio_mem_deactivate_unplugged_memslots(vmem, offset, size); return 0; } =20 @@ -527,7 +624,20 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem,= uint64_t start_gpa, } =20 if (!ret) { + /* + * Activate before notifying and rollback in case of any errors. + * + * When enabling a yet disabled memslot, memory notifiers will get + * notified about the added memory region and can register with the + * RamDiscardManager; this will traverse all plugged blocks and sk= ip the + * blocks we are plugging here. The following notification will in= form + * registered listeners about the blocks we're plugging. + */ + virtio_mem_activate_memslots_to_plug(vmem, offset, size); ret =3D virtio_mem_notify_plug(vmem, offset, size); + if (ret) { + virtio_mem_deactivate_unplugged_memslots(vmem, offset, size); + } } if (ret) { /* Could be preallocation or a notifier populated memory. */ @@ -602,6 +712,7 @@ static void virtio_mem_resize_usable_region(VirtIOMEM *= vmem, { uint64_t newsize =3D MIN(memory_region_size(&vmem->memdev->mr), requested_size + VIRTIO_MEM_USABLE_EXTENT); + unsigned int idx; =20 /* The usable region size always has to be multiples of the block size= . */ newsize =3D QEMU_ALIGN_UP(newsize, vmem->block_size); @@ -616,12 +727,33 @@ static void virtio_mem_resize_usable_region(VirtIOMEM= *vmem, =20 trace_virtio_mem_resized_usable_region(vmem->usable_region_size, newsi= ze); vmem->usable_region_size =3D newsize; + + if (vmem->unplugged_inaccessible =3D=3D ON_OFF_AUTO_OFF) { + /* + * Activate all memslots covered by the usable region and deactiva= te the + * remaining ones in a single transaction. + */ + memory_region_transaction_begin(); + for (idx =3D 0; idx < vmem->nb_memslots; idx++) { + if (vmem->memslot_size * idx < vmem->usable_region_size) { + virtio_mem_activate_memslot(vmem, idx); + } else { + virtio_mem_deactivate_memslot(vmem, idx); + } + } + memory_region_transaction_commit(); + } } =20 static int virtio_mem_unplug_all(VirtIOMEM *vmem) { + const uint64_t region_size =3D memory_region_size(&vmem->memdev->mr); RAMBlock *rb =3D vmem->memdev->mr.ram_block; =20 + if (virtio_mem_is_busy()) { + return -EBUSY; + } + if (vmem->size) { if (virtio_mem_is_busy()) { return -EBUSY; @@ -634,6 +766,9 @@ static int virtio_mem_unplug_all(VirtIOMEM *vmem) bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size); vmem->size =3D 0; notifier_list_notify(&vmem->size_change_notifiers, &vmem->size); + + /* Deactivate all memslots after updating the state. 
*/ + virtio_mem_deactivate_unplugged_memslots(vmem, 0, region_size); } =20 trace_virtio_mem_unplugged_all(); @@ -790,6 +925,43 @@ static void virtio_mem_system_reset(void *opaque) virtio_mem_unplug_all(vmem); } =20 +static void virtio_mem_prepare_mr(VirtIOMEM *vmem) +{ + const uint64_t region_size =3D memory_region_size(&vmem->memdev->mr); + + g_assert(!vmem->mr); + vmem->mr =3D g_new0(MemoryRegion, 1); + memory_region_init(vmem->mr, OBJECT(vmem), "virtio-mem", + region_size); + vmem->mr->align =3D memory_region_get_alignment(&vmem->memdev->mr); +} + +static void virtio_mem_prepare_memslots(VirtIOMEM *vmem) +{ + const uint64_t region_size =3D memory_region_size(&vmem->memdev->mr); + unsigned int idx; + + g_assert(!vmem->memslots && vmem->nb_memslots); + vmem->memslots =3D g_new0(MemoryRegion, vmem->nb_memslots); + + /* Initialize our memslots, but don't map them yet. */ + for (idx =3D 0; idx < vmem->nb_memslots; idx++) { + const uint64_t memslot_offset =3D idx * vmem->memslot_size; + uint64_t memslot_size =3D vmem->memslot_size; + char name[20]; + + /* The size of the last memslot might be smaller. */ + if (idx =3D=3D vmem->nb_memslots - 1) { + memslot_size =3D region_size - memslot_offset; + } + + snprintf(name, sizeof(name), "memslot-%u", idx); + memory_region_init_alias(&vmem->memslots[idx], OBJECT(vmem), name, + &vmem->memdev->mr, memslot_offset, + memslot_size); + } +} + static void virtio_mem_device_realize(DeviceState *dev, Error **errp) { MachineState *ms =3D MACHINE(qdev_get_machine()); @@ -921,8 +1093,6 @@ static void virtio_mem_device_realize(DeviceState *dev= , Error **errp) } } =20 - virtio_mem_resize_usable_region(vmem, vmem->requested_size, true); - vmem->bitmap_size =3D memory_region_size(&vmem->memdev->mr) / vmem->block_size; vmem->bitmap =3D bitmap_new(vmem->bitmap_size); @@ -930,6 +1100,18 @@ static void virtio_mem_device_realize(DeviceState *de= v, Error **errp) virtio_init(vdev, VIRTIO_ID_MEM, sizeof(struct virtio_mem_config)); vmem->vq =3D virtio_add_queue(vdev, 128, virtio_mem_handle_request); =20 + if (!vmem->mr) { + virtio_mem_prepare_mr(vmem); + } + if (!vmem->nb_memslots || vmem->nb_memslots =3D=3D 1) { + vmem->nb_memslots =3D 1; + vmem->memslot_size =3D memory_region_size(&vmem->memdev->mr); + } + if (!vmem->memslots) { + virtio_mem_prepare_memslots(vmem); + } + + virtio_mem_resize_usable_region(vmem, vmem->requested_size, true); host_memory_backend_set_mapped(vmem->memdev, true); vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem)); if (vmem->early_migration) { @@ -963,6 +1145,7 @@ static void virtio_mem_device_unrealize(DeviceState *d= ev) } vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem)); host_memory_backend_set_mapped(vmem->memdev, false); + virtio_mem_resize_usable_region(vmem, 0, true); virtio_del_queue(vdev, 0); virtio_cleanup(vdev); g_free(vmem->bitmap); @@ -1235,9 +1418,66 @@ static MemoryRegion *virtio_mem_get_memory_region(Vi= rtIOMEM *vmem, Error **errp) if (!vmem->memdev) { error_setg(errp, "'%s' property must be set", VIRTIO_MEM_MEMDEV_PR= OP); return NULL; + } else if (!vmem->mr) { + virtio_mem_prepare_mr(vmem); } =20 - return &vmem->memdev->mr; + return vmem->mr; +} + +static void virtio_mem_decide_memslots(VirtIOMEM *vmem, unsigned int limit) +{ + uint64_t region_size, memslot_size, min_memslot_size; + unsigned int memslots; + RAMBlock *rb; + + /* We're called exactly once, before realizing the device. */ + g_assert(!vmem->nb_memslots); + + /* If realizing the device will fail, just assume a single memslot. 
*/ + if (limit <=3D 1 || !vmem->multiple_memslots || !vmem->memdev || + !vmem->memdev->mr.ram_block) { + vmem->nb_memslots =3D 1; + return; + } + + rb =3D vmem->memdev->mr.ram_block; + region_size =3D memory_region_size(&vmem->memdev->mr); + + /* + * Determine the default block size now, to determine the minimum mems= lot + * size. We want the minimum slot size to be at least the device block= size. + */ + if (!vmem->block_size) { + vmem->block_size =3D virtio_mem_default_block_size(rb); + } + /* If realizing the device will fail, just assume a single memslot. */ + if (vmem->block_size < qemu_ram_pagesize(rb) || + !QEMU_IS_ALIGNED(region_size, vmem->block_size)) { + vmem->nb_memslots =3D 1; + return; + } + + /* + * All memslots except the last one have a reasonable minimum size, and + * all memslot sizes are aligned to the device block size. + */ + memslot_size =3D QEMU_ALIGN_UP(region_size / limit, vmem->block_size); + min_memslot_size =3D MAX(vmem->block_size, VIRTIO_MEM_MIN_MEMSLOT_SIZE= ); + memslot_size =3D MAX(memslot_size, min_memslot_size); + + memslots =3D QEMU_ALIGN_UP(region_size, memslot_size) / memslot_size; + if (memslots !=3D 1) { + vmem->memslot_size =3D memslot_size; + } + vmem->nb_memslots =3D memslots; +} + +static unsigned int virtio_mem_get_memslots(VirtIOMEM *vmem) +{ + /* We're called after being instructed to make a decision. */ + g_assert(vmem->nb_memslots); + return vmem->nb_memslots; } =20 static void virtio_mem_add_size_change_notifier(VirtIOMEM *vmem, @@ -1377,6 +1617,21 @@ static void virtio_mem_instance_init(Object *obj) NULL, NULL); } =20 +static void virtio_mem_instance_finalize(Object *obj) +{ + VirtIOMEM *vmem =3D VIRTIO_MEM(obj); + + /* + * Note: the core already dropped the references on all memory regions + * (it's passed as the owner to memory_region_init_*()) and finalized + * these objects. We can simply free the memory.
+ */ + g_free(vmem->memslots); + vmem->memslots =3D NULL; + g_free(vmem->mr); + vmem->mr =3D NULL; +} + static Property virtio_mem_properties[] =3D { DEFINE_PROP_UINT64(VIRTIO_MEM_ADDR_PROP, VirtIOMEM, addr, 0), DEFINE_PROP_UINT32(VIRTIO_MEM_NODE_PROP, VirtIOMEM, node, 0), @@ -1389,6 +1644,8 @@ static Property virtio_mem_properties[] =3D { #endif DEFINE_PROP_BOOL(VIRTIO_MEM_EARLY_MIGRATION_PROP, VirtIOMEM, early_migration, true), + DEFINE_PROP_BOOL(VIRTIO_MEM_MULTIPLE_MEMSLOTS_PROP, VirtIOMEM, + multiple_memslots, false), DEFINE_PROP_END_OF_LIST(), }; =20 @@ -1556,6 +1813,8 @@ static void virtio_mem_class_init(ObjectClass *klass,= void *data) =20 vmc->fill_device_info =3D virtio_mem_fill_device_info; vmc->get_memory_region =3D virtio_mem_get_memory_region; + vmc->decide_memslots =3D virtio_mem_decide_memslots; + vmc->get_memslots =3D virtio_mem_get_memslots; vmc->add_size_change_notifier =3D virtio_mem_add_size_change_notifier; vmc->remove_size_change_notifier =3D virtio_mem_remove_size_change_not= ifier; vmc->unplug_request_check =3D virtio_mem_unplug_request_check; @@ -1573,6 +1832,7 @@ static const TypeInfo virtio_mem_info =3D { .parent =3D TYPE_VIRTIO_DEVICE, .instance_size =3D sizeof(VirtIOMEM), .instance_init =3D virtio_mem_instance_init, + .instance_finalize =3D virtio_mem_instance_finalize, .class_init =3D virtio_mem_class_init, .class_size =3D sizeof(VirtIOMEMClass), .interfaces =3D (InterfaceInfo[]) { diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h index ab0fe2b4f2..70096957db 100644 --- a/include/hw/virtio/virtio-mem.h +++ b/include/hw/virtio/virtio-mem.h @@ -33,6 +33,7 @@ OBJECT_DECLARE_TYPE(VirtIOMEM, VirtIOMEMClass, #define VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP "unplugged-inaccessible" #define VIRTIO_MEM_EARLY_MIGRATION_PROP "x-early-migration" #define VIRTIO_MEM_PREALLOC_PROP "prealloc" +#define VIRTIO_MEM_MULTIPLE_MEMSLOTS_PROP "multiple-memslots" =20 struct VirtIOMEM { VirtIODevice parent_obj; @@ -44,7 +45,22 @@ struct VirtIOMEM { int32_t bitmap_size; unsigned long *bitmap; =20 - /* assigned memory backend and memory region */ + /* Device memory region in which we map the individual memslots. */ + MemoryRegion *mr; + + /* The individual memslots (aliases into the memory backend). */ + MemoryRegion *memslots; + + /* The total number of memslots. */ + uint16_t nb_memslots; + + /* Size of one memslot (the last one can be smaller). */ + uint64_t memslot_size; + + /* + * Assigned memory backend with the RAM memory region we split into + * memslots, to map the individual memslots only on demand. + */ HostMemoryBackend *memdev; =20 /* NUMA node */ @@ -82,6 +98,9 @@ struct VirtIOMEM { */ bool early_migration; =20 + /* Whether we may use multiple memslots instead of only a single one. 
= */ + bool multiple_memslots; + /* notifiers to notify when "size" changes */ NotifierList size_change_notifiers; =20 @@ -96,6 +115,8 @@ struct VirtIOMEMClass { /* public */ void (*fill_device_info)(const VirtIOMEM *vmen, VirtioMEMDeviceInfo *v= i); MemoryRegion *(*get_memory_region)(VirtIOMEM *vmem, Error **errp); + void (*decide_memslots)(VirtIOMEM *vmem, unsigned int limit); + unsigned int (*get_memslots)(VirtIOMEM *vmem); void (*add_size_change_notifier)(VirtIOMEM *vmem, Notifier *notifier); void (*remove_size_change_notifier)(VirtIOMEM *vmem, Notifier *notifie= r); void (*unplug_request_check)(VirtIOMEM *vmem, Error **errp); --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694182981798163.8133811399182; Fri, 8 Sep 2023 07:23:01 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qecNW-000367-Rp; Fri, 08 Sep 2023 10:22:34 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNV-0002x5-1u for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:33 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNS-0001oO-A2 for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:32 -0400 Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-256-qtfm5iYsNya7i4_z3fsj9Q-1; Fri, 08 Sep 2023 10:22:26 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1EA0538157C1; Fri, 8 Sep 2023 14:22:24 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.194.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id 741FCC03295; Fri, 8 Sep 2023 14:22:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694182949; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KrwsrXg6CMtd0ieBjJ6SOmEJOtdWHLLQt0tHjCxhvDg=; b=CsIgr1Wsf+4T7R+aJOFLjPp9LZIxrjcvvBVtZ9Sf8vnv9K4ahd7nckTj7y6cxZlrIPmhpi 9TSIKPkDnl200r2KgyWT4mZ2TdXQLBS9cZjIRj/PaCzXopTfg5WY7s0xf9BaknjEwaqXIB zt0Ry3ZDyz/q2i07tKOVjp89nKoFpas= X-MC-Unique: qtfm5iYsNya7i4_z3fsj9Q-1 From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , Gavin Shan , Alex Williamson , Stefan Hajnoczi , "Maciej S . 
Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 15/16] memory, vhost: Allow for marking memory device memory regions unmergeable Date: Fri, 8 Sep 2023 16:21:35 +0200 Message-ID: <20230908142136.403541-16-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1694182982211100004 Let's allow for marking memory regions unmergeable, to teach flatview code and vhost to not merge adjacent aliases to the same memory region into a larger memory section; instead, we want separate aliases to stay separate such that we can atomically map/unmap aliases without affecting other aliases. This is desired for virtio-mem mapping device memory located on a RAM memory region via multiple aliases into a memory region container, resulting in separate memslots that can get (un)mapped atomically. As an example with virtio-mem, the layout would look something like this: [...] 0000000240000000-00000020bfffffff (prio 0, i/o): device-memory 0000000240000000-000000043fffffff (prio 0, i/o): virtio-mem 0000000240000000-000000027fffffff (prio 0, ram): alias memslot-0 @mem= 2 0000000000000000-000000003fffffff 0000000280000000-00000002bfffffff (prio 0, ram): alias memslot-1 @mem= 2 0000000040000000-000000007fffffff 00000002c0000000-00000002ffffffff (prio 0, ram): alias memslot-2 @mem= 2 0000000080000000-00000000bfffffff [...] Without unmergable memory regions, all three memslots would get merged into a single memory section. For example, when mapping another alias (e.g., virtio-mem-memslot-3) or when unmapping any of the mapped aliases, memory listeners will first get notified about the removal of the big memory section to then get notified about re-adding of the new (differently merged) memory section(s). In an ideal world, memory listeners would be able to deal with that atomically, like KVM nowadays does. However, (a) supporting this for other memory listeners (vhost-user, vfio) is fairly hard: temporary removal can result in all kinds of issues on concurrent access to guest memory; and (b) this handling is undesired, because temporarily removing+readding can consume quite some time on bigger memslots and is not efficient (e.g., vfio unpinning and repinning pages ...). 
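As a concrete illustration of the intended usage, below is a minimal sketch of hypothetical device code; the function name and sizes are made up, while the calls are the existing memory region API plus the memory_region_set_unmergeable() helper that this patch adds:

    #include "qemu/osdep.h"
    #include "qemu/units.h"
    #include "exec/memory.h"

    /*
     * Map two 1 GiB aliases of one backing RAM region into a container and
     * mark them unmergeable, so that flatview and vhost keep them as separate
     * sections and each alias can later be (un)mapped without temporarily
     * tearing down a larger merged section that also covers its neighbor.
     */
    static void map_aliases_unmergeable(Object *owner, MemoryRegion *container,
                                        MemoryRegion *backing)
    {
        static MemoryRegion aliases[2];
        unsigned int i;

        for (i = 0; i < 2; i++) {
            char name[20];

            snprintf(name, sizeof(name), "memslot-%u", i);
            memory_region_init_alias(&aliases[i], owner, name, backing,
                                     i * GiB, 1 * GiB);
            memory_region_set_unmergeable(&aliases[i], true);
            memory_region_add_subregion(container, i * GiB, &aliases[i]);
        }
    }

Without the memory_region_set_unmergeable() calls, the two adjacent aliases would be folded into a single flatview range, and unmapping either one would force listeners to first see the merged range disappear and then reappear in its reduced form.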
Let's allow for marking a memory region unmergeable, such that we can atomically (un)map aliases to the same memory region, similar to (un)mapping individual DIMMs. Similarly, teach vhost code to not redo what flatview core stopped doing: don't merge such sections. Merging in vhost code is really only relevant for handling random holes in boot memory where; without this merging, the vhost-user backend wouldn't be able to mmap() some boot memory backed on hugetlb. We'll use this for virtio-mem next. Reviewed-by: Philippe Mathieu-Daud=C3=A9 Signed-off-by: David Hildenbrand --- hw/virtio/vhost.c | 4 ++-- include/exec/memory.h | 22 ++++++++++++++++++++++ softmmu/memory.c | 31 +++++++++++++++++++++++++------ 3 files changed, 49 insertions(+), 8 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 24013b39d6..503a160c96 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -707,7 +707,7 @@ static void vhost_region_add_section(struct vhost_dev *= dev, mrs_size, mrs_host); } =20 - if (dev->n_tmp_sections) { + if (dev->n_tmp_sections && !section->unmergeable) { /* Since we already have at least one section, lets see if * this extends it; since we're scanning in order, we only * have to look at the last one, and the FlatView that calls @@ -740,7 +740,7 @@ static void vhost_region_add_section(struct vhost_dev *= dev, size_t offset =3D mrs_gpa - prev_gpa_start; =20 if (prev_host_start + offset =3D=3D mrs_host && - section->mr =3D=3D prev_sec->mr) { + section->mr =3D=3D prev_sec->mr && !prev_sec->unmergeable)= { uint64_t max_end =3D MAX(prev_host_end, mrs_host + mrs_siz= e); need_add =3D false; prev_sec->offset_within_address_space =3D diff --git a/include/exec/memory.h b/include/exec/memory.h index 5feb704585..916d565533 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -95,6 +95,7 @@ struct ReservedRegion { * relative to the region's address space * @readonly: writes to this section are ignored * @nonvolatile: this section is non-volatile + * @unmergeable: this section should not get merged with adjacent sections */ struct MemoryRegionSection { Int128 size; @@ -104,6 +105,7 @@ struct MemoryRegionSection { hwaddr offset_within_address_space; bool readonly; bool nonvolatile; + bool unmergeable; }; =20 typedef struct IOMMUTLBEntry IOMMUTLBEntry; @@ -767,6 +769,7 @@ struct MemoryRegion { bool nonvolatile; bool rom_device; bool flush_coalesced_mmio; + bool unmergeable; uint8_t dirty_log_mask; bool is_iommu; RAMBlock *ram_block; @@ -2344,6 +2347,25 @@ void memory_region_set_size(MemoryRegion *mr, uint64= _t size); void memory_region_set_alias_offset(MemoryRegion *mr, hwaddr offset); =20 +/* + * memory_region_set_unmergeable: Set a memory region unmergeable + * + * Mark a memory region unmergeable, resulting in the memory region (or + * everything contained in a memory region container) not getting merged w= hen + * simplifying the address space and notifying memory listeners. Consequen= tly, + * memory listeners will never get notified about ranges that are larger t= han + * the original memory regions. + * + * This is primarily useful when multiple aliases to a RAM memory region a= re + * mapped into a memory region container, and updates (e.g., enable/disabl= e or + * map/unmap) of individual memory region aliases are not supposed to affe= ct + * other memory regions in the same container. 
+ * + * @mr: the #MemoryRegion to be updated + * @unmergeable: whether to mark the #MemoryRegion unmergeable + */ +void memory_region_set_unmergeable(MemoryRegion *mr, bool unmergeable); + /** * memory_region_present: checks if an address relative to a @container * translates into #MemoryRegion within @container diff --git a/softmmu/memory.c b/softmmu/memory.c index c1e8aa133f..4e078c21af 100644 --- a/softmmu/memory.c +++ b/softmmu/memory.c @@ -224,6 +224,7 @@ struct FlatRange { bool romd_mode; bool readonly; bool nonvolatile; + bool unmergeable; }; =20 #define FOR_EACH_FLAT_RANGE(var, view) \ @@ -240,6 +241,7 @@ section_from_flat_range(FlatRange *fr, FlatView *fv) .offset_within_address_space =3D int128_get64(fr->addr.start), .readonly =3D fr->readonly, .nonvolatile =3D fr->nonvolatile, + .unmergeable =3D fr->unmergeable, }; } =20 @@ -250,7 +252,8 @@ static bool flatrange_equal(FlatRange *a, FlatRange *b) && a->offset_in_region =3D=3D b->offset_in_region && a->romd_mode =3D=3D b->romd_mode && a->readonly =3D=3D b->readonly - && a->nonvolatile =3D=3D b->nonvolatile; + && a->nonvolatile =3D=3D b->nonvolatile + && a->unmergeable =3D=3D b->unmergeable; } =20 static FlatView *flatview_new(MemoryRegion *mr_root) @@ -323,7 +326,8 @@ static bool can_merge(FlatRange *r1, FlatRange *r2) && r1->dirty_log_mask =3D=3D r2->dirty_log_mask && r1->romd_mode =3D=3D r2->romd_mode && r1->readonly =3D=3D r2->readonly - && r1->nonvolatile =3D=3D r2->nonvolatile; + && r1->nonvolatile =3D=3D r2->nonvolatile + && !r1->unmergeable && !r2->unmergeable; } =20 /* Attempt to simplify a view by merging adjacent ranges */ @@ -599,7 +603,8 @@ static void render_memory_region(FlatView *view, Int128 base, AddrRange clip, bool readonly, - bool nonvolatile) + bool nonvolatile, + bool unmergeable) { MemoryRegion *subregion; unsigned i; @@ -616,6 +621,7 @@ static void render_memory_region(FlatView *view, int128_addto(&base, int128_make64(mr->addr)); readonly |=3D mr->readonly; nonvolatile |=3D mr->nonvolatile; + unmergeable |=3D mr->unmergeable; =20 tmp =3D addrrange_make(base, mr->size); =20 @@ -629,14 +635,14 @@ static void render_memory_region(FlatView *view, int128_subfrom(&base, int128_make64(mr->alias->addr)); int128_subfrom(&base, int128_make64(mr->alias_offset)); render_memory_region(view, mr->alias, base, clip, - readonly, nonvolatile); + readonly, nonvolatile, unmergeable); return; } =20 /* Render subregions in priority order. */ QTAILQ_FOREACH(subregion, &mr->subregions, subregions_link) { render_memory_region(view, subregion, base, clip, - readonly, nonvolatile); + readonly, nonvolatile, unmergeable); } =20 if (!mr->terminates) { @@ -652,6 +658,7 @@ static void render_memory_region(FlatView *view, fr.romd_mode =3D mr->romd_mode; fr.readonly =3D readonly; fr.nonvolatile =3D nonvolatile; + fr.unmergeable =3D unmergeable; =20 /* Render the region itself into any gaps left by the current view. 
*/ for (i =3D 0; i < view->nr && int128_nz(remain); ++i) { @@ -753,7 +760,7 @@ static FlatView *generate_memory_topology(MemoryRegion = *mr) if (mr) { render_memory_region(view, mr, int128_zero(), addrrange_make(int128_zero(), int128_2_64()), - false, false); + false, false, false); } flatview_simplify(view); =20 @@ -2751,6 +2758,18 @@ void memory_region_set_alias_offset(MemoryRegion *mr= , hwaddr offset) memory_region_transaction_commit(); } =20 +void memory_region_set_unmergeable(MemoryRegion *mr, bool unmergeable) +{ + if (unmergeable =3D=3D mr->unmergeable) { + return; + } + + memory_region_transaction_begin(); + mr->unmergeable =3D unmergeable; + memory_region_update_pending |=3D mr->enabled; + memory_region_transaction_commit(); +} + uint64_t memory_region_get_alignment(const MemoryRegion *mr) { return mr->align; --=20 2.41.0 From nobody Wed May 15 23:02:54 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1694183018; cv=none; d=zohomail.com; s=zohoarc; b=XOMAXKrsGNEvlVIhA2p7h65eoJbbUrVJUQBCQ9sYbFqbPyNFMMhF7J3Z3+6B2XiNPzJXDzFOZkacQo0DsE2QaJKa7HXjqcx1uVneEEDaoCSfINZqfbdjN9cLsLCXlBB+vvpZGnmY47k5RO7gRq5S45S/oyRLNeDg1jaHE8xXdeE= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1694183018; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=R/r91JxNwqMaVpd/ZQFxQM/yxmFUNXIIPUEKh4lBVr8=; b=hhY6Z/Ft7hwD/4qhHuNbsqgxP3wWFr55PxHJUwfMSGcDMzXvj0wrZ8GJbJ5oKiE96zcSruwYhwgo3aZVYy7n1g5JKZngEA+5vAt6qSUC+q9E6gnHlBt0S320Klcu94E90eiqJf6vZ8QoniNHn2yoOHirnGaY5SKWb/KlWMZpcps= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1694183018574795.4176371757369; Fri, 8 Sep 2023 07:23:38 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qecNd-0003IR-13; Fri, 08 Sep 2023 10:22:41 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNb-0003F4-Mf for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:39 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qecNW-0001qe-E2 for qemu-devel@nongnu.org; Fri, 08 Sep 2023 10:22:39 -0400 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-657-XuMjOBYuPSC_TEBrzB_4QA-1; Fri, 08 Sep 2023 10:22:27 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id EB531816524; 
Fri, 8 Sep 2023 14:22:26 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.194.76]) by smtp.corp.redhat.com (Postfix) with ESMTP id 59380C03295; Fri, 8 Sep 2023 14:22:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694182953; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=R/r91JxNwqMaVpd/ZQFxQM/yxmFUNXIIPUEKh4lBVr8=; b=V9UUprQdkL57qa+m+/N/+dWChJf2ZCs1oDGJOlrv6NJP/YlQt/fEkG0Er3dkamZZvb6gUi uqQtxVFCONpeZs4RqjUrjJWe+g/0K7JZSqLFVqDsz/defmNUqzWAeOqdQSzsg3Jkwmh05T 8yAfTIlnoFEq5/TC+qME3G8vf12vlFQ= X-MC-Unique: XuMjOBYuPSC_TEBrzB_4QA-1 From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , Paolo Bonzini , Igor Mammedov , Xiao Guangrong , "Michael S. Tsirkin" , Peter Xu , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Eduardo Habkost , Marcel Apfelbaum , Yanan Wang , Michal Privoznik , =?UTF-8?q?Daniel=20P=20=2E=20Berrang=C3=A9?= , Gavin Shan , Alex Williamson , Stefan Hajnoczi , "Maciej S . Szmigiero" , kvm@vger.kernel.org Subject: [PATCH v3 16/16] virtio-mem: Mark memslot alias memory regions unmergeable Date: Fri, 8 Sep 2023 16:21:36 +0200 Message-ID: <20230908142136.403541-17-david@redhat.com> In-Reply-To: <20230908142136.403541-1-david@redhat.com> References: <20230908142136.403541-1-david@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, DKIM_INVALID=0.1, DKIM_SIGNED=0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1694183019317100002 Let's mark the memslot alias memory regions as unmergable, such that flatview and vhost won't merge adjacent memory region aliases and we can atomically map/unmap individual aliases without affecting adjacent alias memory regions. This handles vhost and vfio in multiple-memslot mode correctly (which do not support atomic memslot updates) and avoids the temporary removal of large memslots, which can be an expensive operation. For example, vfio might have to unpin + repin a lot of memory, which is undesired. 
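One way to observe the effect is a small debug memory listener; the following is a hypothetical aid and not part of this patch, using only the existing listener API. With the memslot aliases marked unmergeable, each "memslot-N" alias is added and removed as its own section; without it, adjacent aliases would be reported as one merged section whose removal and re-addition affects all of them:

    #include "qemu/osdep.h"
    #include "qemu/log.h"
    #include "exec/memory.h"

    static void debug_region_add(MemoryListener *listener,
                                 MemoryRegionSection *section)
    {
        qemu_log("add %s @ 0x%" PRIx64 " size 0x%" PRIx64 "\n",
                 memory_region_name(section->mr),
                 (uint64_t)section->offset_within_address_space,
                 (uint64_t)int128_get64(section->size));
    }

    static void debug_region_del(MemoryListener *listener,
                                 MemoryRegionSection *section)
    {
        qemu_log("del %s @ 0x%" PRIx64 " size 0x%" PRIx64 "\n",
                 memory_region_name(section->mr),
                 (uint64_t)section->offset_within_address_space,
                 (uint64_t)int128_get64(section->size));
    }

    static MemoryListener memslot_debug_listener = {
        .name = "memslot-debug",
        .region_add = debug_region_add,
        .region_del = debug_region_del,
    };

    /* e.g., register during setup with:
     * memory_listener_register(&memslot_debug_listener, &address_space_memory);
     */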
Reviewed-by: Philippe Mathieu-Daud=C3=A9 Signed-off-by: David Hildenbrand --- hw/virtio/virtio-mem.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index 724fcb189a..50770b577a 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -959,6 +959,12 @@ static void virtio_mem_prepare_memslots(VirtIOMEM *vme= m) memory_region_init_alias(&vmem->memslots[idx], OBJECT(vmem), name, &vmem->memdev->mr, memslot_offset, memslot_size); + /* + * We want to be able to atomically and efficiently activate/deact= ivate + * individual memslots without affecting adjacent memslots in memo= ry + * notifiers. + */ + memory_region_set_unmergeable(&vmem->memslots[idx], true); } } =20 --=20 2.41.0