From: David Hildenbrand <david@redhat.com>
To: qemu-devel@nongnu.org
Cc: Michal Privoznik, Juan Quintela, "Michael S. Tsirkin",
    "Dr. David Alan Gilbert", David Hildenbrand
Subject: [PATCH v2 1/2] virtio-mem: Fail if a memory backend with
 "prealloc=on" is specified
Date: Tue, 25 Jan 2022 14:57:33 +0100
Message-Id: <20220125135734.134928-2-david@redhat.com>
In-Reply-To: <20220125135734.134928-1-david@redhat.com>

"prealloc=on" for the memory backend does not work as expected, as
virtio-mem will simply discard all preallocated memory immediately
again. In the best case, it's an expensive NOP. In the worst case, it's
an unexpected allocation error.

Instead, "prealloc=on" should be specified for the virtio-mem device
only, such that virtio-mem will try preallocating memory before plugging
memory dynamically into the guest. Fail if such a memory backend is
provided.

Tested-by: Michal Privoznik
Signed-off-by: David Hildenbrand
---
 hw/virtio/virtio-mem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index f55dcf61f2..b7bad6ef96 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -773,6 +773,12 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "'%s' property specifies an unsupported memdev",
                    VIRTIO_MEM_MEMDEV_PROP);
         return;
+    } else if (vmem->memdev->prealloc) {
+        error_setg(errp, "'%s' property specifies a memdev with preallocation"
+                   " enabled: %s. Instead, specify 'prealloc=on' for the"
+                   " virtio-mem device.", VIRTIO_MEM_MEMDEV_PROP,
+                   object_get_canonical_path_component(OBJECT(vmem->memdev)));
+        return;
     }
 
     if ((nb_numa_nodes && vmem->node >= nb_numa_nodes) ||
-- 
2.34.1
From: David Hildenbrand <david@redhat.com>
To: qemu-devel@nongnu.org
Cc: Michal Privoznik, Juan Quintela, "Michael S. Tsirkin",
    "Dr. David Alan Gilbert", David Hildenbrand
Subject: [PATCH v2 2/2] virtio-mem: Handle preallocation with migration
Date: Tue, 25 Jan 2022 14:57:34 +0100
Message-Id: <20220125135734.134928-3-david@redhat.com>
In-Reply-To: <20220125135734.134928-1-david@redhat.com>

During precopy we usually write all plugged areas and essentially
allocate them. However, there are two corner cases:

1) Migrating the zeropage

When the zeropage gets migrated, we first check if the destination range
is already zero and avoid performing a write in that case:
ram_handle_compressed(). If the memory backend, like anonymous RAM or
most filesystems, populates the shared zeropage when reading a (file)
hole, we don't preallocate backend memory. In that case, we have to
explicitly trigger the allocation of actual backend memory.

2) Excluding memory ranges during migration

For example, virtio-balloon free page hinting will exclude some pages
from getting migrated. In that case, we don't allocate memory for
plugged ranges when migrating.

So trigger allocation of all plugged ranges when restoring the device
state and fail gracefully if allocation fails.

Handling postcopy is a bit trickier, as postcopy and preallocation are
problematic in general. To at least mimic what ordinary preallocation
does, temporarily try allocating the requested amount of memory and fail
postcopy in case the requested size on source and destination doesn't
match. This way, we at least checked that there isn't a fundamental
configuration issue and that we were able to preallocate the required
amount of memory at least once, instead of failing unrecoverably during
postcopy later. However, just as with ordinary preallocation, it's racy
with postcopy.

Tested-by: Michal Privoznik
Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: David Hildenbrand
---
 hw/virtio/virtio-mem.c         | 136 +++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-mem.h |   6 ++
 2 files changed, 142 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index b7bad6ef96..226081fb63 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -27,6 +27,7 @@
 #include "qapi/visitor.h"
 #include "exec/ram_addr.h"
 #include "migration/misc.h"
+#include "migration/postcopy-ram.h"
 #include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include CONFIG_DEVICES
@@ -203,6 +204,30 @@ static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void *arg,
     return ret;
 }
 
+static int virtio_mem_for_each_plugged_range(const VirtIOMEM *vmem, void *arg,
+                                             virtio_mem_range_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_bit = find_first_bit(vmem->bitmap, vmem->bitmap_size);
+    while (first_bit < vmem->bitmap_size) {
+        offset = first_bit * vmem->block_size;
+        last_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
+                                      first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * vmem->block_size;
+
+        ret = cb(vmem, arg, offset, size);
+        if (ret) {
+            break;
+        }
+        first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
+                                  last_bit + 2);
+    }
+    return ret;
+}
+
 /*
  * Adjust the memory section to cover the intersection with the given range.
  *
@@ -828,6 +853,7 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     if (!vmem->block_size) {
         vmem->block_size = virtio_mem_default_block_size(rb);
     }
+    vmem->initial_requested_size = vmem->requested_size;
 
     if (vmem->block_size < page_size) {
         error_setg(errp, "'%s' property has to be at least the page size (0x%"
@@ -888,6 +914,7 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
      */
     memory_region_set_ram_discard_manager(&vmem->memdev->mr,
                                           RAM_DISCARD_MANAGER(vmem));
+    postcopy_add_notifier(&vmem->postcopy_notifier);
 }
 
 static void virtio_mem_device_unrealize(DeviceState *dev)
@@ -895,6 +922,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOMEM *vmem = VIRTIO_MEM(dev);
 
+    postcopy_remove_notifier(&vmem->postcopy_notifier);
     /*
      * The unplug handler unmapped the memory region, it cannot be
      * found via an address space anymore. Unset ourselves.
@@ -924,12 +952,119 @@ static int virtio_mem_restore_unplugged(VirtIOMEM *vmem)
                                                virtio_mem_discard_range_cb);
 }
 
+static int virtio_mem_prealloc_range(const VirtIOMEM *vmem, uint64_t offset,
+                                     uint64_t size)
+{
+    void *area = memory_region_get_ram_ptr(&vmem->memdev->mr) + offset;
+    int fd = memory_region_get_fd(&vmem->memdev->mr);
+    Error *local_err = NULL;
+
+    os_mem_prealloc(fd, area, size, 1, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -ENOMEM;
+    }
+    return 0;
+}
+
+static int virtio_mem_prealloc_range_cb(const VirtIOMEM *vmem, void *arg,
+                                        uint64_t offset, uint64_t size)
+{
+    return virtio_mem_prealloc_range(vmem, offset, size);
+}
+
+static int virtio_mem_restore_prealloc(VirtIOMEM *vmem)
+{
+    /*
+     * Make sure any preallocated memory is really preallocated. Migration
+     * might have skipped some pages or optimized for the zeropage.
+     */
+    return virtio_mem_for_each_plugged_range(vmem, NULL,
+                                             virtio_mem_prealloc_range_cb);
+}
+
+static int virtio_mem_postcopy_notify(NotifierWithReturn *notifier,
+                                      void *opaque)
+{
+    struct PostcopyNotifyData *pnd = opaque;
+    VirtIOMEM *vmem = container_of(notifier, VirtIOMEM, postcopy_notifier);
+    RAMBlock *rb = vmem->memdev->mr.ram_block;
+    int ret;
+
+    if (pnd->reason != POSTCOPY_NOTIFY_INBOUND_ADVISE || !vmem->prealloc ||
+        !vmem->initial_requested_size) {
+        return 0;
+    }
+    assert(!vmem->size);
+
+    /*
+     * When creating the device we discard all memory and we don't know
+     * which blocks the source has plugged (and should be preallocated)
+     * until we restore the device state. However, we cannot allocate when
+     * restoring the device state either if postcopy is already active.
+     *
+     * If we reach this point, postcopy is possible and we have
+     * preallocation enabled.
+     *
+     * Temporarily allocate the requested size to see if there is a
+     * fundamental configuration issue that would make postcopy fail
+     * because the memory backend is out of memory. While this increases
+     * reliability, prealloc+postcopy cannot be fully reliable: see the
+     * comment in virtio_mem_post_load().
+     */
+    ret = virtio_mem_prealloc_range(vmem, 0, vmem->initial_requested_size);
+    if (ram_block_discard_range(rb, 0, vmem->initial_requested_size)) {
+        ret = ret ? ret : -EINVAL;
+        return ret;
+    }
+    return 0;
+}
+
 static int virtio_mem_post_load(void *opaque, int version_id)
 {
     VirtIOMEM *vmem = VIRTIO_MEM(opaque);
     RamDiscardListener *rdl;
     int ret;
 
+    if (vmem->prealloc) {
+        if (migration_in_incoming_postcopy()) {
+            /*
+             * Prealloc with postcopy cannot possibly work fully reliably
+             * in general: preallocation has to populate all memory
+             * immediately and fail gracefully before the guest started
+             * running on the destination, while postcopy wants to discard
+             * memory and populate on demand after the guest started
+             * running on the destination.
+             *
+             * For ordinary memory backends, "prealloc=on" is essentially
+             * overridden by postcopy, which will simply discard
+             * preallocated pages and might fail later when running out of
+             * backend memory when trying to place a page: the earlier
+             * preallocation only makes it less likely to fail, but nothing
+             * (not even huge page reservation) will guarantee that
+             * postcopy will find a free page to place once the guest is
+             * running on the destination.
+             *
+             * We temporarily allocate "requested-size" during
+             * POSTCOPY_NOTIFY_INBOUND_ADVISE, before migrating any memory.
+             * This resembles what is done with ordinary memory backends.
+             *
+             * We need to have a matching requested size on source and
+             * destination so that we actually temporarily allocated the
+             * right amount of memory. As requested-size changed when
+             * restoring the state, check against the initial value.
+             */
+            if (vmem->requested_size != vmem->initial_requested_size) {
+                error_report("postcopy with 'prealloc=on' needs matching"
+                             " 'requested-size' on source and destination");
+                return -EINVAL;
+            }
+        } else {
+            ret = virtio_mem_restore_prealloc(vmem);
+            if (ret) {
+                return ret;
+            }
+        }
+    }
+
     /*
      * We started out with all memory discarded and our memory region is
      * mapped into an address space. Replay, now that we updated the bitmap.
@@ -1198,6 +1333,7 @@ static void virtio_mem_instance_init(Object *obj)
 
     notifier_list_init(&vmem->size_change_notifiers);
     QLIST_INIT(&vmem->rdl_list);
+    vmem->postcopy_notifier.notify = virtio_mem_postcopy_notify;
 
     object_property_add(obj, VIRTIO_MEM_SIZE_PROP, "size", virtio_mem_get_size,
                         NULL, NULL, NULL);
diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index 7745cfc1a3..45395152d2 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -61,6 +61,9 @@ struct VirtIOMEM {
     /* requested size */
     uint64_t requested_size;
 
+    /* initial requested size on startup */
+    uint64_t initial_requested_size;
+
     /* block size and alignment */
     uint64_t block_size;
 
@@ -77,6 +80,9 @@ struct VirtIOMEM {
     /* notifiers to notify when "size" changes */
     NotifierList size_change_notifiers;
 
+    /* notifier for postcopy events */
+    NotifierWithReturn postcopy_notifier;
+
     /* listeners to notify on plug/unplug activity. */
     QLIST_HEAD(, RamDiscardListener) rdl_list;
 };
-- 
2.34.1
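The core of patch 2 is the new virtio_mem_for_each_plugged_range() helper, which walks the plug bitmap as runs of consecutive set bits and invokes a callback with the (offset, size) of each run, so restore-time preallocation touches only plugged memory. The Python sketch below mirrors that find_first_bit/find_next_zero_bit/find_next_bit loop, substituting a plain list of booleans for the kernel-style bitmap helpers; it is an illustration of the iteration pattern, not QEMU code.

```python
def for_each_plugged_range(bitmap, block_size, cb):
    """Invoke cb(offset, size) for each run of consecutive set bits.

    Mirrors the loop in virtio_mem_for_each_plugged_range(): a nonzero
    return value from the callback aborts the walk early.
    """
    n = len(bitmap)
    # find_first_bit(): index of the first set bit, or n if there is none.
    first = next((i for i, b in enumerate(bitmap) if b), n)
    while first < n:
        # find_next_zero_bit(..., first + 1) - 1: last bit of this run.
        last = first
        while last + 1 < n and bitmap[last + 1]:
            last += 1
        ret = cb(first * block_size, (last - first + 1) * block_size)
        if ret:
            return ret
        # find_next_bit(..., last + 2): bit last + 1 is known to be zero
        # (it ended the run), so resume the search one past it.
        first = next((i for i in range(last + 2, n) if bitmap[i]), n)
    return 0


# Collect the plugged ranges of a small bitmap with 2 MiB blocks.
ranges = []
bitmap = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
for_each_plugged_range(bitmap, 2 << 20,
                       lambda off, size: ranges.append((off, size)) or 0)
```

For the bitmap above, the callback sees three runs: blocks 1-2, block 4, and blocks 7-9, each reported as a single contiguous (offset, size) range.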