From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, Paolo Bonzini, Igor Mammedov, Xiao Guangrong,
 "Michael S. Tsirkin", Peter Xu, Philippe Mathieu-Daudé, Eduardo Habkost,
 Marcel Apfelbaum, Yanan Wang, Michal Privoznik, Daniel P. Berrangé,
 Gavin Shan, Alex Williamson, Stefan Hajnoczi, Maciej S. Szmigiero,
 kvm@vger.kernel.org
Subject: [PATCH v2 14/16] virtio-mem: Expose device memory via multiple memslots if enabled
Date: Fri, 25 Aug 2023 15:21:47 +0200
Message-ID: <20230825132149.366064-15-david@redhat.com>
In-Reply-To: <20230825132149.366064-1-david@redhat.com>
References: <20230825132149.366064-1-david@redhat.com>

Having large virtio-mem devices that only expose little memory to a VM
is currently a problem: we map the whole sparse memory region into the
guest using a single memslot, resulting in one gigantic memslot in KVM.
KVM allocates metadata for the whole memslot, which can result in quite
some memory waste.

Assuming we have a 1 TiB virtio-mem device and only expose little (e.g.,
1 GiB) memory, we would create a single 1 TiB memslot and KVM has to
allocate metadata for that 1 TiB memslot: on x86, this implies allocating
a significant amount of memory for metadata:

(1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB
    -> For 1 TiB: 2147483648 + 4194304 + 8192 bytes = ~2 GiB (0.2 %)

    With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets
    allocated lazily when required for nested VMs
(2) gfn_track: 2 bytes per 4 KiB
    -> For 1 TiB: 536870912 bytes = ~512 MiB (0.05 %)
(3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB
    -> For 1 TiB: 2097152 + 4096 bytes = ~2 MiB (0.0002 %)
(4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page
    -> For 1 TiB: 536870912 bits = 64 MiB (0.006 %)

So we primarily care about (1) and (2). Worse, the memory consumption
*doubles* once SMM is enabled, because we create the memslot once for
!SMM and once for SMM.

Having a 1 TiB memslot without the TDP MMU consumes around:
* With SMM: 5 GiB
* Without SMM: 2.5 GiB
Having a 1 TiB memslot with the TDP MMU consumes around:
* With SMM: 1 GiB
* Without SMM: 512 MiB

... and that's really something we want to optimize, to be able to just
start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device
that can grow very large (e.g., 1 TiB).

Consequently, using multiple memslots and only mapping the memslots we
really need can significantly reduce memory waste and speed up
memslot-related operations. Let's expose the sparse RAM memory region using
multiple memslots, mapping only the memslots we currently need into our
device memory region container.

* With VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we only map the memslots that
  actually have memory plugged, and dynamically (un)map when
  (un)plugging memory blocks.

* Without VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we always map the memslots
  covered by the usable region, and dynamically (un)map when resizing the
  usable region.

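The x86 metadata figures (1) to (4) quoted earlier in this message can be
reproduced with a few lines of C. This is only an illustrative sketch of the
arithmetic, not KVM code: the helper names (rmap_bytes(), gfn_track_bytes(),
lpage_info_bytes(), dirty_bitmap_bytes(), memslot_metadata_bytes()) are made up
for this example, and it assumes that with the TDP MMU the RMAP is never
allocated (i.e., no nested VMs) and that SMM simply doubles everything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KiB (UINT64_C(1) << 10)
#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)
#define TiB (UINT64_C(1) << 40)

/* Hypothetical helper: metadata bytes for one page granularity. */
static uint64_t per_page_bytes(uint64_t slot_size, uint64_t page_size,
                               uint64_t bytes_per_page)
{
    assert(page_size != 0);
    return slot_size / page_size * bytes_per_page;
}

/* (1) RMAP: 8 bytes per 4 KiB, per 2 MiB and per 1 GiB page. */
static uint64_t rmap_bytes(uint64_t slot_size)
{
    return per_page_bytes(slot_size, 4 * KiB, 8) +
           per_page_bytes(slot_size, 2 * MiB, 8) +
           per_page_bytes(slot_size, 1 * GiB, 8);
}

/* (2) gfn_track: 2 bytes per 4 KiB page. */
static uint64_t gfn_track_bytes(uint64_t slot_size)
{
    return per_page_bytes(slot_size, 4 * KiB, 2);
}

/* (3) lpage_info: 4 bytes per 2 MiB and per 1 GiB page. */
static uint64_t lpage_info_bytes(uint64_t slot_size)
{
    return per_page_bytes(slot_size, 2 * MiB, 4) +
           per_page_bytes(slot_size, 1 * GiB, 4);
}

/* (4) Two dirty bitmaps with 1 bit per 4 KiB page, converted to bytes. */
static uint64_t dirty_bitmap_bytes(uint64_t slot_size)
{
    return 2 * (slot_size / (4 * KiB)) / 8;
}

/* Assumption: TDP MMU without nested VMs never allocates the RMAP. */
static uint64_t memslot_metadata_bytes(uint64_t slot_size, bool tdp_mmu,
                                       bool smm)
{
    uint64_t total = gfn_track_bytes(slot_size) +
                     lpage_info_bytes(slot_size) +
                     dirty_bitmap_bytes(slot_size);

    if (!tdp_mmu) {
        total += rmap_bytes(slot_size);
    }
    /* The memslot exists once for !SMM and once for SMM. */
    return smm ? 2 * total : total;
}
```

With these assumptions, memslot_metadata_bytes(TiB, false, true) comes out at
about 5.1 GiB, while memslot_metadata_bytes(GiB, false, true) is about 5.1 MiB,
which is the kind of saving expected when only a single 1 GiB memslot of a
1 TiB device is mapped.
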
We'll auto-detect the number of memslots to use based on the memslot limit
provided by the core. We'll use at most 1 memslot per GiB. Note that
our global limit of memslots across all memory devices is currently set to
256: even with multiple large virtio-mem devices, we'd still have a sane
limit on the number of memslots used.

The default is a single memslot for now ("multiple-memslots=off"). The
optimization must be enabled manually using "multiple-memslots=on", because
some vhost setups (e.g., hotplug of vhost-user devices) might be
problematic until we support more memslots, especially in vhost-user
backends.

Note that "multiple-memslots=on" is just a hint that multiple memslots
*may* be used for internal optimizations, not that multiple memslots
*must* be used. The actual number of memslots that are used is an
internal detail: for example, once memslot metadata is no longer an
issue, we could simply stop optimizing for that. Migration source and
destination can differ on the setting of "multiple-memslots".

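The sizing rule described above (memslot limit from the core, at most one
memslot per GiB) can be sketched as a standalone function. This is a
simplified rendering of what virtio_mem_decide_memslots() in this patch
computes; struct memslot_layout, decide_memslots(), align_up() and max_u64()
are names made up for this illustration, and the property and memory backend
checks of the real function are left out.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)
/* Mirrors VIRTIO_MEM_MIN_MEMSLOT_SIZE from the patch. */
#define MIN_MEMSLOT_SIZE (1 * GiB)

/* Hypothetical result type: number of memslots and size of each one. */
struct memslot_layout {
    unsigned int count;
    uint64_t size; /* the last memslot may be smaller */
};

static uint64_t align_up(uint64_t n, uint64_t m)
{
    return (n + m - 1) / m * m;
}

static uint64_t max_u64(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

static struct memslot_layout decide_memslots(uint64_t region_size,
                                             uint64_t block_size,
                                             unsigned int limit)
{
    /* Fallback: one memslot that spans the whole region. */
    struct memslot_layout layout = { 1, region_size };
    uint64_t size;
    unsigned int count;

    assert(block_size != 0 && region_size % block_size == 0);
    if (limit <= 1) {
        return layout;
    }

    /* Spread the region over "limit" memslots, aligned to the block size. */
    size = align_up(region_size / limit, block_size);
    /* Never go below the block size or the 1 GiB minimum. */
    size = max_u64(size, max_u64(block_size, MIN_MEMSLOT_SIZE));

    count = align_up(region_size, size) / size;
    if (count != 1) {
        layout.size = size;
    }
    layout.count = count;
    return layout;
}
```

For a 1 TiB region with 2 MiB blocks and a limit of 256, this yields 256
memslots of 4 GiB each; for an 8 GiB region with the same limit, the 1 GiB
minimum applies and 8 memslots are used.
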
Signed-off-by: David Hildenbrand
---
 hw/virtio/virtio-mem-pci.c     |  21 +++
 hw/virtio/virtio-mem.c         | 266 ++++++++++++++++++++++++++++++++-
 include/hw/virtio/virtio-mem.h |  23 ++-
 3 files changed, 306 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c
index c4597e029e..1b4e9a3284 100644
--- a/hw/virtio/virtio-mem-pci.c
+++ b/hw/virtio/virtio-mem-pci.c
@@ -48,6 +48,25 @@ static MemoryRegion *virtio_mem_pci_get_memory_region(MemoryDeviceState *md,
     return vmc->get_memory_region(vmem, errp);
 }
 
+static void virtio_mem_pci_decide_memslots(MemoryDeviceState *md,
+                                           unsigned int limit)
+{
+    VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md);
+    VirtIOMEM *vmem = VIRTIO_MEM(&pci_mem->vdev);
+    VirtIOMEMClass *vmc = VIRTIO_MEM_GET_CLASS(vmem);
+
+    vmc->decide_memslots(vmem, limit);
+}
+
+static unsigned int virtio_mem_pci_get_memslots(MemoryDeviceState *md)
+{
+    VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md);
+    VirtIOMEM *vmem = VIRTIO_MEM(&pci_mem->vdev);
+    VirtIOMEMClass *vmc = VIRTIO_MEM_GET_CLASS(vmem);
+
+    return vmc->get_memslots(vmem);
+}
+
 static uint64_t virtio_mem_pci_get_plugged_size(const MemoryDeviceState *md,
                                                 Error **errp)
 {
@@ -150,6 +169,8 @@ static void virtio_mem_pci_class_init(ObjectClass *klass, void *data)
     mdc->set_addr = virtio_mem_pci_set_addr;
     mdc->get_plugged_size = virtio_mem_pci_get_plugged_size;
     mdc->get_memory_region = virtio_mem_pci_get_memory_region;
+    mdc->decide_memslots = virtio_mem_pci_decide_memslots;
+    mdc->get_memslots = virtio_mem_pci_get_memslots;
     mdc->fill_device_info = virtio_mem_pci_fill_device_info;
     mdc->get_min_alignment = virtio_mem_pci_get_min_alignment;
 
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index b6e781741e..724fcb189a 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -66,6 +66,13 @@ static uint32_t virtio_mem_default_thp_size(void)
     return default_thp_size;
 }
 
+/*
+ * The minimum memslot size depends on this setting ("sane default"), the
+ * device block size, and the memory backend page size. The last (or single)
+ * memslot might be smaller than this constant.
+ */
+#define VIRTIO_MEM_MIN_MEMSLOT_SIZE (1 * GiB)
+
 /*
  * We want to have a reasonable default block size such that
  * 1. We avoid splitting THPs when unplugging memory, which degrades
@@ -483,6 +490,94 @@ static bool virtio_mem_valid_range(const VirtIOMEM *vmem, uint64_t gpa,
     return true;
 }
 
+static void virtio_mem_activate_memslot(VirtIOMEM *vmem, unsigned int idx)
+{
+    const uint64_t memslot_offset = idx * vmem->memslot_size;
+
+    /*
+     * Instead of enabling/disabling memslots, we add/remove them. This should
+     * make address space updates faster, because we don't have to loop over
+     * many disabled subregions.
+     */
+    if (memory_region_is_mapped(&vmem->memslots[idx])) {
+        return;
+    }
+    memory_region_add_subregion(vmem->mr, memslot_offset, &vmem->memslots[idx]);
+}
+
+static void virtio_mem_deactivate_memslot(VirtIOMEM *vmem, unsigned int idx)
+{
+    if (!memory_region_is_mapped(&vmem->memslots[idx])) {
+        return;
+    }
+    memory_region_del_subregion(vmem->mr, &vmem->memslots[idx]);
+}
+
+static void virtio_mem_activate_memslots_to_plug(VirtIOMEM *vmem,
+                                                 uint64_t offset, uint64_t size)
+{
+    const unsigned int start_idx = offset / vmem->memslot_size;
+    const unsigned int end_idx = (offset + size + vmem->memslot_size - 1) /
+                                 vmem->memslot_size;
+    unsigned int idx;
+
+    if (vmem->unplugged_inaccessible == ON_OFF_AUTO_OFF) {
+        /* All memslots covered by the usable region are always enabled. */
+        return;
+    }
+
+    /* Activate all involved memslots in a single transaction. */
+    memory_region_transaction_begin();
+    for (idx = start_idx; idx < end_idx; idx++) {
+        virtio_mem_activate_memslot(vmem, idx);
+    }
+    memory_region_transaction_commit();
+}
+
+static void virtio_mem_deactivate_unplugged_memslots(VirtIOMEM *vmem,
+                                                     uint64_t offset,
+                                                     uint64_t size)
+{
+    const uint64_t region_size = memory_region_size(&vmem->memdev->mr);
+    const unsigned int start_idx = offset / vmem->memslot_size;
+    const unsigned int end_idx = (offset + size + vmem->memslot_size - 1) /
+                                 vmem->memslot_size;
+    unsigned int idx;
+
+    if (vmem->unplugged_inaccessible == ON_OFF_AUTO_OFF) {
+        /* All memslots covered by the usable region are always enabled. */
+        return;
+    }
+
+    /* Deactivate all memslots with unplugged blocks in a single transaction. */
+    memory_region_transaction_begin();
+    for (idx = start_idx; idx < end_idx; idx++) {
+        const uint64_t memslot_offset = idx * vmem->memslot_size;
+        uint64_t memslot_size = vmem->memslot_size;
+
+        /* The size of the last memslot might be smaller. */
+        if (idx == vmem->nb_memslots - 1) {
+            memslot_size = region_size - memslot_offset;
+        }
+
+        /*
+         * Partially covered memslots might still have some blocks plugged and
+         * have to remain enabled if that's the case.
+         */
+        if (offset > memslot_offset ||
+            offset + size < memslot_offset + memslot_size) {
+            const uint64_t gpa = vmem->addr + memslot_offset;
+
+            if (!virtio_mem_is_range_unplugged(vmem, gpa, memslot_size)) {
+                continue;
+            }
+        }
+
+        virtio_mem_deactivate_memslot(vmem, idx);
+    }
+    memory_region_transaction_commit();
+}
+
 static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
                                       uint64_t size, bool plug)
 {
@@ -500,6 +595,8 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
         }
         virtio_mem_notify_unplug(vmem, offset, size);
         virtio_mem_set_range_unplugged(vmem, start_gpa, size);
+        /* Disable completely unplugged memslots after updating the state. */
+        virtio_mem_deactivate_unplugged_memslots(vmem, offset, size);
         return 0;
     }
 
@@ -527,7 +624,20 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
     }
 
     if (!ret) {
+        /*
+         * Activate before notifying and rollback in case of any errors.
+         *
+         * When enabling a yet disabled memslot, memory notifiers will get
+         * notified about the added memory region and can register with the
+         * RamDiscardManager; this will traverse all plugged blocks and skip the
+         * blocks we are plugging here. The following notification will inform
+         * registered listeners about the blocks we're plugging.
+         */
+        virtio_mem_activate_memslots_to_plug(vmem, offset, size);
         ret = virtio_mem_notify_plug(vmem, offset, size);
+        if (ret) {
+            virtio_mem_deactivate_unplugged_memslots(vmem, offset, size);
+        }
     }
     if (ret) {
         /* Could be preallocation or a notifier populated memory. */
@@ -602,6 +712,7 @@ static void virtio_mem_resize_usable_region(VirtIOMEM *vmem,
 {
     uint64_t newsize = MIN(memory_region_size(&vmem->memdev->mr),
                            requested_size + VIRTIO_MEM_USABLE_EXTENT);
+    unsigned int idx;
 
     /* The usable region size always has to be multiples of the block size. */
     newsize = QEMU_ALIGN_UP(newsize, vmem->block_size);
@@ -616,12 +727,33 @@ static void virtio_mem_resize_usable_region(VirtIOMEM *vmem,
 
     trace_virtio_mem_resized_usable_region(vmem->usable_region_size, newsize);
     vmem->usable_region_size = newsize;
+
+    if (vmem->unplugged_inaccessible == ON_OFF_AUTO_OFF) {
+        /*
+         * Activate all memslots covered by the usable region and deactivate the
+         * remaining ones in a single transaction.
+         */
+        memory_region_transaction_begin();
+        for (idx = 0; idx < vmem->nb_memslots; idx++) {
+            if (vmem->memslot_size * idx < vmem->usable_region_size) {
+                virtio_mem_activate_memslot(vmem, idx);
+            } else {
+                virtio_mem_deactivate_memslot(vmem, idx);
+            }
+        }
+        memory_region_transaction_commit();
+    }
 }
 
 static int virtio_mem_unplug_all(VirtIOMEM *vmem)
 {
+    const uint64_t region_size = memory_region_size(&vmem->memdev->mr);
     RAMBlock *rb = vmem->memdev->mr.ram_block;
 
+    if (virtio_mem_is_busy()) {
+        return -EBUSY;
+    }
+
     if (vmem->size) {
         if (virtio_mem_is_busy()) {
             return -EBUSY;
@@ -634,6 +766,9 @@ static int virtio_mem_unplug_all(VirtIOMEM *vmem)
         bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size);
         vmem->size = 0;
         notifier_list_notify(&vmem->size_change_notifiers, &vmem->size);
+
+        /* Deactivate all memslots after updating the state. */
+        virtio_mem_deactivate_unplugged_memslots(vmem, 0, region_size);
     }
 
     trace_virtio_mem_unplugged_all();
@@ -790,6 +925,43 @@ static void virtio_mem_system_reset(void *opaque)
     virtio_mem_unplug_all(vmem);
 }
 
+static void virtio_mem_prepare_mr(VirtIOMEM *vmem)
+{
+    const uint64_t region_size = memory_region_size(&vmem->memdev->mr);
+
+    g_assert(!vmem->mr);
+    vmem->mr = g_new0(MemoryRegion, 1);
+    memory_region_init(vmem->mr, OBJECT(vmem), "virtio-mem",
+                       region_size);
+    vmem->mr->align = memory_region_get_alignment(&vmem->memdev->mr);
+}
+
+static void virtio_mem_prepare_memslots(VirtIOMEM *vmem)
+{
+    const uint64_t region_size = memory_region_size(&vmem->memdev->mr);
+    unsigned int idx;
+
+    g_assert(!vmem->memslots && vmem->nb_memslots);
+    vmem->memslots = g_new0(MemoryRegion, vmem->nb_memslots);
+
+    /* Initialize our memslots, but don't map them yet. */
+    for (idx = 0; idx < vmem->nb_memslots; idx++) {
+        const uint64_t memslot_offset = idx * vmem->memslot_size;
+        uint64_t memslot_size = vmem->memslot_size;
+        char name[20];
+
+        /* The size of the last memslot might be smaller. */
+        if (idx == vmem->nb_memslots - 1) {
+            memslot_size = region_size - memslot_offset;
+        }
+
+        snprintf(name, sizeof(name), "memslot-%u", idx);
+        memory_region_init_alias(&vmem->memslots[idx], OBJECT(vmem), name,
+                                 &vmem->memdev->mr, memslot_offset,
+                                 memslot_size);
+    }
+}
+
 static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
@@ -921,8 +1093,6 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         }
     }
 
-    virtio_mem_resize_usable_region(vmem, vmem->requested_size, true);
-
     vmem->bitmap_size = memory_region_size(&vmem->memdev->mr) /
                         vmem->block_size;
     vmem->bitmap = bitmap_new(vmem->bitmap_size);
@@ -930,6 +1100,18 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     virtio_init(vdev, VIRTIO_ID_MEM, sizeof(struct virtio_mem_config));
     vmem->vq = virtio_add_queue(vdev, 128, virtio_mem_handle_request);
 
+    if (!vmem->mr) {
+        virtio_mem_prepare_mr(vmem);
+    }
+    if (!vmem->nb_memslots || vmem->nb_memslots == 1) {
+        vmem->nb_memslots = 1;
+        vmem->memslot_size = memory_region_size(&vmem->memdev->mr);
+    }
+    if (!vmem->memslots) {
+        virtio_mem_prepare_memslots(vmem);
+    }
+
+    virtio_mem_resize_usable_region(vmem, vmem->requested_size, true);
     host_memory_backend_set_mapped(vmem->memdev, true);
     vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem));
     if (vmem->early_migration) {
@@ -963,6 +1145,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     }
     vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem));
     host_memory_backend_set_mapped(vmem->memdev, false);
+    virtio_mem_resize_usable_region(vmem, 0, true);
     virtio_del_queue(vdev, 0);
     virtio_cleanup(vdev);
     g_free(vmem->bitmap);
@@ -1235,9 +1418,66 @@ static MemoryRegion *virtio_mem_get_memory_region(VirtIOMEM *vmem, Error **errp)
     if (!vmem->memdev) {
         error_setg(errp, "'%s' property must be set", VIRTIO_MEM_MEMDEV_PROP);
         return NULL;
+    } else if (!vmem->mr) {
+        virtio_mem_prepare_mr(vmem);
     }
 
-    return &vmem->memdev->mr;
+    return vmem->mr;
+}
+
+static void virtio_mem_decide_memslots(VirtIOMEM *vmem, unsigned int limit)
+{
+    uint64_t region_size, memslot_size, min_memslot_size;
+    unsigned int memslots;
+    RAMBlock *rb;
+
+    /* We're called exactly once, before realizing the device. */
+    g_assert(!vmem->nb_memslots);
+
+    /* If realizing the device will fail, just assume a single memslot. */
+    if (limit <= 1 || !vmem->multiple_memslots || !vmem->memdev ||
+        !vmem->memdev->mr.ram_block) {
+        vmem->nb_memslots = 1;
+        return;
+    }
+
+    rb = vmem->memdev->mr.ram_block;
+    region_size = memory_region_size(&vmem->memdev->mr);
+
+    /*
+     * Determine the default block size now, to determine the minimum memslot
+     * size. We want the minimum slot size to be at least the device block size.
+     */
+    if (!vmem->block_size) {
+        vmem->block_size = virtio_mem_default_block_size(rb);
+    }
+    /* If realizing the device will fail, just assume a single memslot. */
+    if (vmem->block_size < qemu_ram_pagesize(rb) ||
+        !QEMU_IS_ALIGNED(region_size, vmem->block_size)) {
+        vmem->nb_memslots = 1;
+        return;
+    }
+
+    /*
+     * All memslots except the last one have a reasonable minimum size,
+     * and all memslot sizes are aligned to the device block size.
+     */
+    memslot_size = QEMU_ALIGN_UP(region_size / limit, vmem->block_size);
+    min_memslot_size = MAX(vmem->block_size, VIRTIO_MEM_MIN_MEMSLOT_SIZE);
+    memslot_size = MAX(memslot_size, min_memslot_size);
+
+    memslots = QEMU_ALIGN_UP(region_size, memslot_size) / memslot_size;
+    if (memslots != 1) {
+        vmem->memslot_size = memslot_size;
+    }
+    vmem->nb_memslots = memslots;
+}
+
+static unsigned int virtio_mem_get_memslots(VirtIOMEM *vmem)
+{
+    /* We're called after instructed to make a decision. */
+    g_assert(vmem->nb_memslots);
+    return vmem->nb_memslots;
 }
 
 static void virtio_mem_add_size_change_notifier(VirtIOMEM *vmem,
@@ -1377,6 +1617,21 @@ static void virtio_mem_instance_init(Object *obj)
                         NULL, NULL);
 }
 
+static void virtio_mem_instance_finalize(Object *obj)
+{
+    VirtIOMEM *vmem = VIRTIO_MEM(obj);
+
+    /*
+     * Note: the core already dropped the references on all memory regions
+     * (it's passed as the owner to memory_region_init_*()) and finalized
+     * these objects. We can simply free the memory.
+     */
+    g_free(vmem->memslots);
+    vmem->memslots = NULL;
+    g_free(vmem->mr);
+    vmem->mr = NULL;
+}
+
 static Property virtio_mem_properties[] = {
     DEFINE_PROP_UINT64(VIRTIO_MEM_ADDR_PROP, VirtIOMEM, addr, 0),
     DEFINE_PROP_UINT32(VIRTIO_MEM_NODE_PROP, VirtIOMEM, node, 0),
@@ -1389,6 +1644,8 @@ static Property virtio_mem_properties[] = {
 #endif
     DEFINE_PROP_BOOL(VIRTIO_MEM_EARLY_MIGRATION_PROP, VirtIOMEM,
                      early_migration, true),
+    DEFINE_PROP_BOOL(VIRTIO_MEM_MULTIPLE_MEMSLOTS_PROP, VirtIOMEM,
+                     multiple_memslots, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1556,6 +1813,8 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data)
 
     vmc->fill_device_info = virtio_mem_fill_device_info;
     vmc->get_memory_region = virtio_mem_get_memory_region;
+    vmc->decide_memslots = virtio_mem_decide_memslots;
+    vmc->get_memslots = virtio_mem_get_memslots;
     vmc->add_size_change_notifier = virtio_mem_add_size_change_notifier;
     vmc->remove_size_change_notifier = virtio_mem_remove_size_change_notifier;
     vmc->unplug_request_check = virtio_mem_unplug_request_check;
@@ -1573,6 +1832,7 @@ static const TypeInfo virtio_mem_info = {
     .parent = TYPE_VIRTIO_DEVICE,
     .instance_size = sizeof(VirtIOMEM),
     .instance_init = virtio_mem_instance_init,
+    .instance_finalize = virtio_mem_instance_finalize,
     .class_init = virtio_mem_class_init,
     .class_size = sizeof(VirtIOMEMClass),
     .interfaces = (InterfaceInfo[]) {
diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index ab0fe2b4f2..70096957db 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -33,6 +33,7 @@ OBJECT_DECLARE_TYPE(VirtIOMEM, VirtIOMEMClass,
 #define VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP "unplugged-inaccessible"
 #define VIRTIO_MEM_EARLY_MIGRATION_PROP "x-early-migration"
 #define VIRTIO_MEM_PREALLOC_PROP "prealloc"
+#define VIRTIO_MEM_MULTIPLE_MEMSLOTS_PROP "multiple-memslots"
 
 struct VirtIOMEM {
     VirtIODevice parent_obj;
@@ -44,7 +45,22 @@ struct VirtIOMEM {
     int32_t bitmap_size;
     unsigned long *bitmap;
 
-    /* assigned memory backend and memory region */
+    /* Device memory region in which we map the individual memslots. */
+    MemoryRegion *mr;
+
+    /* The individual memslots (aliases into the memory backend). */
+    MemoryRegion *memslots;
+
+    /* The total number of memslots. */
+    uint16_t nb_memslots;
+
+    /* Size of one memslot (the last one can be smaller). */
+    uint64_t memslot_size;
+
+    /*
+     * Assigned memory backend with the RAM memory region we split into
+     * memslots, to map the individual memslots only on demand.
+     */
     HostMemoryBackend *memdev;
 
     /* NUMA node */
@@ -82,6 +98,9 @@ struct VirtIOMEM {
      */
     bool early_migration;
 
+    /* Whether we may use multiple memslots instead of only a single one. */
+    bool multiple_memslots;
+
     /* notifiers to notify when "size" changes */
     NotifierList size_change_notifiers;
 
@@ -96,6 +115,8 @@ struct VirtIOMEMClass {
     /* public */
     void (*fill_device_info)(const VirtIOMEM *vmen, VirtioMEMDeviceInfo *vi);
     MemoryRegion *(*get_memory_region)(VirtIOMEM *vmem, Error **errp);
+    void (*decide_memslots)(VirtIOMEM *vmem, unsigned int limit);
+    unsigned int (*get_memslots)(VirtIOMEM *vmem);
     void (*add_size_change_notifier)(VirtIOMEM *vmem, Notifier *notifier);
     void (*remove_size_change_notifier)(VirtIOMEM *vmem, Notifier *notifier);
     void (*unplug_request_check)(VirtIOMEM *vmem, Error **errp);
-- 
2.41.0