From nobody Sat Nov 15 15:29:49 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=quarantine dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1750709321; cv=none; d=zohomail.com; s=zohoarc; b=OeOM7WYiR6t3oMwXPZhGeucGpgX6j54ZsbVAvlACz0cJBbPwS8/fRf/lyXCjSxHL1Bgl6sou6Ph4KBxNalTa81AAUQDcW3bEOtMYcheVJyC/HPyLNoFEzZJZV1JaLW8NL6K6My7E1nFmyiIxiiXlp6ykFGIv2pFu8Gl4kGpTs+s= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1750709321; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=5TZkEPcKUD0IF2Ws/fLavANEe8cHL59ROo6m22DohYw=; b=DK4vQnNRXPCegg6bQFU2qc/7pd64UU5vlyDRNP5nv/f34S9+95z1JylAXUehsaxaJtxpuqpWWCgpTpJOSWb2U1wVQNiLdWFnfwDF9qdISvMRvwOTrYHCGki21F2s1+XUc59r2dK/Ji3sSvG+2XKiq2HSgqrHTH2DwrL4PQjmTWA= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1750709321332507.2078428703952; Mon, 23 Jun 2025 13:08:41 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1uTnRt-00063r-L4; Mon, 23 Jun 2025 16:07:25 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uTnRq-00063D-T3 for qemu-devel@nongnu.org; Mon, 23 Jun 2025 16:07:22 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uTnRo-0003YV-9h for qemu-devel@nongnu.org; Mon, 23 Jun 2025 16:07:22 -0400 Received: from mail-qv1-f70.google.com (mail-qv1-f70.google.com [209.85.219.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-97-PAze08d5MLyDQ2nET_RuoQ-1; Mon, 23 Jun 2025 16:07:15 -0400 Received: by mail-qv1-f70.google.com with SMTP id 6a1803df08f44-6faeebe9c5bso39565096d6.2 for ; Mon, 23 Jun 2025 13:07:15 -0700 (PDT) Received: from x1.com ([85.131.185.92]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6fd095a775bsm48427976d6.112.2025.06.23.13.07.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 23 Jun 2025 13:07:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1750709238; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5TZkEPcKUD0IF2Ws/fLavANEe8cHL59ROo6m22DohYw=; b=RbkUstfi3Fo/A2Mo8LILuLDj48x+nQKBDIeyQc8MtBlCvbvGJ5hDPECAI0FM4xsj9O1N/d VRTET0NRFM7gO16nnfpuryG5ZlRVBzuaM8de8SoxBl8TfZ+snisNiOP79A2H71DAxYRGdb smQR5cyJp+xiapFAXIcRs2ozDbi4Lqk= X-MC-Unique: PAze08d5MLyDQ2nET_RuoQ-1 X-Mimecast-MFC-AGG-ID: PAze08d5MLyDQ2nET_RuoQ_1750709235 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1750709235; x=1751314035; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5TZkEPcKUD0IF2Ws/fLavANEe8cHL59ROo6m22DohYw=; b=aHfhajgFmS0Ai/rutBBCGuWpEZ4joiDMGkhIbeMKNS7+1/S/rvsfpKeyBTCXTd9snX iuCdtVvwy8ELt4fYCDDXQDdVk6hRBZqrhaP1zAl2UTB3iNA1VKo9wq1/G9g3JUJRRePD 7V07joAzPbrRopIpKqeEZvQPkmcr2eTGw+/qcApt5JtzLQjomoTltiyw7KNFN1TR8bBX yXOa3G7qgI0hrpHkD1SjuguPDa9ZOPewwusOoE1Cfm0hw6LDtBM4qoq4exiv6q3M/asL /1qsg2xh3AVIhloR1Cc6Am4ddG7ht0r4GObrz9xkUxRu6ACuZpSS3NwI8hqb8xZVs3YE +bWg== X-Gm-Message-State: AOJu0Yx3bl4rpMBQqZsevNs/7uhjE5x2VJEKIxijbVrORVJXwoTQmdHK Q2yh0OD9g+ZFboFV81umBe62edH2TwMq0dj8oCPVemcqCCqxO36xdj3LLuY2rCb/U4+NrAnzNaK je7pXxHKkm6cN2N/5I8sUNj1hlbtshzMZXOjP2QCmQNoBY7nEtTF8Efki/mwzFAd4UnOskjDOXa gF5vqUIANG5NpwvNw0BpiLI62n7xIN5kuzj6XX7g== X-Gm-Gg: ASbGnctMagorayO6BTaqTXS3IXsBVxOh3NYs8VyBPd2h1ioYJinqsREJtpcGGc4TJSp 1Y6R8iuzt1QGOlYi7LTGv2c5z6atDx+UIZg7YFjznefwqvakHBEJodqaOIePhUyl8eqEx+wC99l SF333VINpDStvO48RBGp0DAupoK8JdtISq/3L7pCQhFsGy+/R8j3rCSfvLLVEzTy8DzYbPon9dC SK0DRw7BSBqIsR4yhktUqvL01mqfvZfnAeD+GEhVWtICq28lERa4FEUxsl5pA/NXDOy9TrfVumD vRr4KhlfZdw= X-Received: by 2002:a05:6214:e4a:b0:6fa:ccb6:6036 with SMTP id 6a1803df08f44-6fd0a591efbmr225451886d6.35.1750709234574; Mon, 23 Jun 2025 13:07:14 -0700 (PDT) X-Google-Smtp-Source: AGHT+IH3TN3WkHGTyhYJHWfv4+tN4GuJlRo6W5GEPVNknNlrNa2p3zvRp/ov9wjd3lJAl1/zV8HjTQ== X-Received: by 2002:a05:6214:e4a:b0:6fa:ccb6:6036 with SMTP id 6a1803df08f44-6fd0a591efbmr225450936d6.35.1750709233687; Mon, 23 Jun 2025 13:07:13 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org, Stefan Hajnoczi Cc: peterx@redhat.com, Fabiano Rosas , Paolo Bonzini , David Hildenbrand , Chenyi Qiang , Alexey Kardashevskiy , Pankaj Gupta Subject: [PULL 10/11] ram-block-attributes: Introduce RamBlockAttributes to manage RAMBlock with guest_memfd Date: Mon, 23 Jun 2025 16:06:55 -0400 Message-ID: <20250623200656.1882674-11-peterx@redhat.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250623200656.1882674-1-peterx@redhat.com> References: <20250623200656.1882674-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1750709323768116600 Content-Type: text/plain; charset="utf-8" From: Chenyi Qiang Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated discard") highlighted that subsystems like VFIO may disable RAM block discard. However, guest_memfd relies on discard operations for page conversion between private and shared memory, potentially leading to the stale IOMMU mapping issue when assigning hardware devices to confidential VMs via shared memory. To address this and allow shared device assignement, it is crucial to ensure the VFIO system refreshes its IOMMU mappings. RamDiscardManager is an existing interface (used by virtio-mem) to adjust VFIO mappings in relation to VM page assignment. Effectively page conversion is similar to hot-removing a page in one mode and adding it back in the other. Therefore, similar actions are required for page conversion events. Introduce the RamDiscardManager to guest_memfd to facilitate this process. Since guest_memfd is not an object, it cannot directly implement the RamDiscardManager interface. Implementing it in HostMemoryBackend is not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks have a memory backend while others do not. Notably, virtual BIOS RAMBlocks using memory_region_init_ram_guest_memfd() do not have a backend. To manage RAMBlocks with guest_memfd, define a new object named RamBlockAttributes to implement the RamDiscardManager interface. This object can store the guest_memfd information such as the bitmap for shared memory and the registered listeners for event notifications. A new state_change() helper function is provided to notify listeners, such as VFIO, allowing VFIO to do dynamically DMA map and unmap for the shared memory according to conversion events. Note that in the current context of RamDiscardManager for guest_memfd, the shared state is analogous to being populated, while the private state can be considered discarded for simplicity. In the future, it would be more complicated if considering more states like private/shared/discarded at the same time. In current implementation, memory state tracking is performed at the host page size granularity, as the minimum conversion size can be one page per request. Additionally, VFIO expected the DMA mapping for a specific IOVA to be mapped and unmapped with the same granularity. Confidential VMs may perform partial conversions, such as conversions on small regions within a larger one. To prevent such invalid cases and until support for DMA mapping cut operations is available, all operations are performed with 4K granularity. In addition, memory conversion failures cause QEMU to quit rather than resuming the guest or retrying the operation at present. It would be future work to add more error handling or rollback mechanisms once conversion failures are allowed. For example, in-place conversion of guest_memfd could retry the unmap operation during the conversion from shared to private. For now, keep the complex error handling out of the picture as it is not required. Tested-by: Alexey Kardashevskiy Reviewed-by: Alexey Kardashevskiy Reviewed-by: Pankaj Gupta Signed-off-by: Chenyi Qiang Link: https://lore.kernel.org/r/20250612082747.51539-5-chenyi.qiang@intel.c= om [peterx: squash fixup from Chenyi to fix builds] Signed-off-by: Peter Xu --- MAINTAINERS | 1 + include/system/ramblock.h | 21 ++ system/ram-block-attributes.c | 444 ++++++++++++++++++++++++++++++++++ system/meson.build | 1 + system/trace-events | 3 + 5 files changed, 470 insertions(+) create mode 100644 system/ram-block-attributes.c diff --git a/MAINTAINERS b/MAINTAINERS index 94c4076127..27f4fe3f25 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3175,6 +3175,7 @@ F: system/memory.c F: system/memory_mapping.c F: system/physmem.c F: system/memory-internal.h +F: system/ram-block-attributes.c F: scripts/coccinelle/memory-region-housekeeping.cocci =20 Memory devices diff --git a/include/system/ramblock.h b/include/system/ramblock.h index d8a116ba99..1bab9e2dac 100644 --- a/include/system/ramblock.h +++ b/include/system/ramblock.h @@ -22,6 +22,10 @@ #include "exec/cpu-common.h" #include "qemu/rcu.h" #include "exec/ramlist.h" +#include "system/hostmem.h" + +#define TYPE_RAM_BLOCK_ATTRIBUTES "ram-block-attributes" +OBJECT_DECLARE_SIMPLE_TYPE(RamBlockAttributes, RAM_BLOCK_ATTRIBUTES) =20 struct RAMBlock { struct rcu_head rcu; @@ -91,4 +95,21 @@ struct RAMBlock { ram_addr_t postcopy_length; }; =20 +struct RamBlockAttributes { + Object parent; + + RAMBlock *ram_block; + + /* 1-setting of the bitmap represents ram is populated (shared) */ + unsigned bitmap_size; + unsigned long *bitmap; + + QLIST_HEAD(, RamDiscardListener) rdl_list; +}; + +RamBlockAttributes *ram_block_attributes_create(RAMBlock *ram_block); +void ram_block_attributes_destroy(RamBlockAttributes *attr); +int ram_block_attributes_state_change(RamBlockAttributes *attr, uint64_t o= ffset, + uint64_t size, bool to_discard); + #endif diff --git a/system/ram-block-attributes.c b/system/ram-block-attributes.c new file mode 100644 index 0000000000..68e8a02703 --- /dev/null +++ b/system/ram-block-attributes.c @@ -0,0 +1,444 @@ +/* + * QEMU ram block attributes + * + * Copyright Intel + * + * Author: + * Chenyi Qiang + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include "qemu/osdep.h" +#include "qemu/error-report.h" +#include "system/ramblock.h" +#include "trace.h" + +OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(RamBlockAttributes, + ram_block_attributes, + RAM_BLOCK_ATTRIBUTES, + OBJECT, + { TYPE_RAM_DISCARD_MANAGER }, + { }) + +static size_t +ram_block_attributes_get_block_size(const RamBlockAttributes *attr) +{ + /* + * Because page conversion could be manipulated in the size of at leas= t 4K + * or 4K aligned, Use the host page size as the granularity to track t= he + * memory attribute. + */ + g_assert(attr && attr->ram_block); + g_assert(attr->ram_block->page_size =3D=3D qemu_real_host_page_size()); + return attr->ram_block->page_size; +} + + +static bool +ram_block_attributes_rdm_is_populated(const RamDiscardManager *rdm, + const MemoryRegionSection *section) +{ + const RamBlockAttributes *attr =3D RAM_BLOCK_ATTRIBUTES(rdm); + const size_t block_size =3D ram_block_attributes_get_block_size(attr); + const uint64_t first_bit =3D section->offset_within_region / block_siz= e; + const uint64_t last_bit =3D + first_bit + int128_get64(section->size) / block_size - 1; + unsigned long first_discarded_bit; + + first_discarded_bit =3D find_next_zero_bit(attr->bitmap, last_bit + 1, + first_bit); + return first_discarded_bit > last_bit; +} + +typedef int (*ram_block_attributes_section_cb)(MemoryRegionSection *s, + void *arg); + +static int +ram_block_attributes_notify_populate_cb(MemoryRegionSection *section, + void *arg) +{ + RamDiscardListener *rdl =3D arg; + + return rdl->notify_populate(rdl, section); +} + +static int +ram_block_attributes_notify_discard_cb(MemoryRegionSection *section, + void *arg) +{ + RamDiscardListener *rdl =3D arg; + + rdl->notify_discard(rdl, section); + return 0; +} + +static int +ram_block_attributes_for_each_populated_section(const RamBlockAttributes *= attr, + MemoryRegionSection *secti= on, + void *arg, + ram_block_attributes_secti= on_cb cb) +{ + unsigned long first_bit, last_bit; + uint64_t offset, size; + const size_t block_size =3D ram_block_attributes_get_block_size(attr); + int ret =3D 0; + + first_bit =3D section->offset_within_region / block_size; + first_bit =3D find_next_bit(attr->bitmap, attr->bitmap_size, + first_bit); + + while (first_bit < attr->bitmap_size) { + MemoryRegionSection tmp =3D *section; + + offset =3D first_bit * block_size; + last_bit =3D find_next_zero_bit(attr->bitmap, attr->bitmap_size, + first_bit + 1) - 1; + size =3D (last_bit - first_bit + 1) * block_size; + + if (!memory_region_section_intersect_range(&tmp, offset, size)) { + break; + } + + ret =3D cb(&tmp, arg); + if (ret) { + error_report("%s: Failed to notify RAM discard listener: %s", + __func__, strerror(-ret)); + break; + } + + first_bit =3D find_next_bit(attr->bitmap, attr->bitmap_size, + last_bit + 2); + } + + return ret; +} + +static int +ram_block_attributes_for_each_discarded_section(const RamBlockAttributes *= attr, + MemoryRegionSection *secti= on, + void *arg, + ram_block_attributes_secti= on_cb cb) +{ + unsigned long first_bit, last_bit; + uint64_t offset, size; + const size_t block_size =3D ram_block_attributes_get_block_size(attr); + int ret =3D 0; + + first_bit =3D section->offset_within_region / block_size; + first_bit =3D find_next_zero_bit(attr->bitmap, attr->bitmap_size, + first_bit); + + while (first_bit < attr->bitmap_size) { + MemoryRegionSection tmp =3D *section; + + offset =3D first_bit * block_size; + last_bit =3D find_next_bit(attr->bitmap, attr->bitmap_size, + first_bit + 1) - 1; + size =3D (last_bit - first_bit + 1) * block_size; + + if (!memory_region_section_intersect_range(&tmp, offset, size)) { + break; + } + + ret =3D cb(&tmp, arg); + if (ret) { + error_report("%s: Failed to notify RAM discard listener: %s", + __func__, strerror(-ret)); + break; + } + + first_bit =3D find_next_zero_bit(attr->bitmap, + attr->bitmap_size, + last_bit + 2); + } + + return ret; +} + +static uint64_t +ram_block_attributes_rdm_get_min_granularity(const RamDiscardManager *rdm, + const MemoryRegion *mr) +{ + const RamBlockAttributes *attr =3D RAM_BLOCK_ATTRIBUTES(rdm); + + g_assert(mr =3D=3D attr->ram_block->mr); + return ram_block_attributes_get_block_size(attr); +} + +static void +ram_block_attributes_rdm_register_listener(RamDiscardManager *rdm, + RamDiscardListener *rdl, + MemoryRegionSection *section) +{ + RamBlockAttributes *attr =3D RAM_BLOCK_ATTRIBUTES(rdm); + int ret; + + g_assert(section->mr =3D=3D attr->ram_block->mr); + rdl->section =3D memory_region_section_new_copy(section); + + QLIST_INSERT_HEAD(&attr->rdl_list, rdl, next); + + ret =3D ram_block_attributes_for_each_populated_section(attr, section,= rdl, + ram_block_attributes_notify_populate_c= b); + if (ret) { + error_report("%s: Failed to register RAM discard listener: %s", + __func__, strerror(-ret)); + exit(1); + } +} + +static void +ram_block_attributes_rdm_unregister_listener(RamDiscardManager *rdm, + RamDiscardListener *rdl) +{ + RamBlockAttributes *attr =3D RAM_BLOCK_ATTRIBUTES(rdm); + int ret; + + g_assert(rdl->section); + g_assert(rdl->section->mr =3D=3D attr->ram_block->mr); + + if (rdl->double_discard_supported) { + rdl->notify_discard(rdl, rdl->section); + } else { + ret =3D ram_block_attributes_for_each_populated_section(attr, + rdl->section, rdl, ram_block_attributes_notify_discard_cb); + if (ret) { + error_report("%s: Failed to unregister RAM discard listener: %= s", + __func__, strerror(-ret)); + exit(1); + } + } + + memory_region_section_free_copy(rdl->section); + rdl->section =3D NULL; + QLIST_REMOVE(rdl, next); +} + +typedef struct RamBlockAttributesReplayData { + ReplayRamDiscardState fn; + void *opaque; +} RamBlockAttributesReplayData; + +static int ram_block_attributes_rdm_replay_cb(MemoryRegionSection *section, + void *arg) +{ + RamBlockAttributesReplayData *data =3D arg; + + return data->fn(section, data->opaque); +} + +static int +ram_block_attributes_rdm_replay_populated(const RamDiscardManager *rdm, + MemoryRegionSection *section, + ReplayRamDiscardState replay_fn, + void *opaque) +{ + RamBlockAttributes *attr =3D RAM_BLOCK_ATTRIBUTES(rdm); + RamBlockAttributesReplayData data =3D { .fn =3D replay_fn, .opaque =3D= opaque }; + + g_assert(section->mr =3D=3D attr->ram_block->mr); + return ram_block_attributes_for_each_populated_section(attr, section, = &data, + ram_block_attributes_rdm_repla= y_cb); +} + +static int +ram_block_attributes_rdm_replay_discarded(const RamDiscardManager *rdm, + MemoryRegionSection *section, + ReplayRamDiscardState replay_fn, + void *opaque) +{ + RamBlockAttributes *attr =3D RAM_BLOCK_ATTRIBUTES(rdm); + RamBlockAttributesReplayData data =3D { .fn =3D replay_fn, .opaque =3D= opaque }; + + g_assert(section->mr =3D=3D attr->ram_block->mr); + return ram_block_attributes_for_each_discarded_section(attr, section, = &data, + ram_block_attributes_rdm_repla= y_cb); +} + +static bool +ram_block_attributes_is_valid_range(RamBlockAttributes *attr, uint64_t off= set, + uint64_t size) +{ + MemoryRegion *mr =3D attr->ram_block->mr; + + g_assert(mr); + + uint64_t region_size =3D memory_region_size(mr); + const size_t block_size =3D ram_block_attributes_get_block_size(attr); + + if (!QEMU_IS_ALIGNED(offset, block_size) || + !QEMU_IS_ALIGNED(size, block_size)) { + return false; + } + if (offset + size <=3D offset) { + return false; + } + if (offset + size > region_size) { + return false; + } + return true; +} + +static void ram_block_attributes_notify_discard(RamBlockAttributes *attr, + uint64_t offset, + uint64_t size) +{ + RamDiscardListener *rdl; + + QLIST_FOREACH(rdl, &attr->rdl_list, next) { + MemoryRegionSection tmp =3D *rdl->section; + + if (!memory_region_section_intersect_range(&tmp, offset, size)) { + continue; + } + rdl->notify_discard(rdl, &tmp); + } +} + +static int +ram_block_attributes_notify_populate(RamBlockAttributes *attr, + uint64_t offset, uint64_t size) +{ + RamDiscardListener *rdl; + int ret =3D 0; + + QLIST_FOREACH(rdl, &attr->rdl_list, next) { + MemoryRegionSection tmp =3D *rdl->section; + + if (!memory_region_section_intersect_range(&tmp, offset, size)) { + continue; + } + ret =3D rdl->notify_populate(rdl, &tmp); + if (ret) { + break; + } + } + + return ret; +} + +int ram_block_attributes_state_change(RamBlockAttributes *attr, + uint64_t offset, uint64_t size, + bool to_discard) +{ + const size_t block_size =3D ram_block_attributes_get_block_size(attr); + const unsigned long first_bit =3D offset / block_size; + const unsigned long nbits =3D size / block_size; + const unsigned long last_bit =3D first_bit + nbits - 1; + const bool is_discarded =3D find_next_bit(attr->bitmap, attr->bitmap_s= ize, + first_bit) > last_bit; + const bool is_populated =3D find_next_zero_bit(attr->bitmap, + attr->bitmap_size, first_bit) > last_bit; + unsigned long bit; + int ret =3D 0; + + if (!ram_block_attributes_is_valid_range(attr, offset, size)) { + error_report("%s, invalid range: offset 0x%" PRIx64 ", size " + "0x%" PRIx64, __func__, offset, size); + return -EINVAL; + } + + trace_ram_block_attributes_state_change(offset, size, + is_discarded ? "discarded" : + is_populated ? "populated" : + "mixture", + to_discard ? "discarded" : + "populated"); + if (to_discard) { + if (is_discarded) { + /* Already private */ + } else if (is_populated) { + /* Completely shared */ + bitmap_clear(attr->bitmap, first_bit, nbits); + ram_block_attributes_notify_discard(attr, offset, size); + } else { + /* Unexpected mixture: process individual blocks */ + for (bit =3D first_bit; bit < first_bit + nbits; bit++) { + if (!test_bit(bit, attr->bitmap)) { + continue; + } + clear_bit(bit, attr->bitmap); + ram_block_attributes_notify_discard(attr, bit * block_size, + block_size); + } + } + } else { + if (is_populated) { + /* Already shared */ + } else if (is_discarded) { + /* Completely private */ + bitmap_set(attr->bitmap, first_bit, nbits); + ret =3D ram_block_attributes_notify_populate(attr, offset, siz= e); + } else { + /* Unexpected mixture: process individual blocks */ + for (bit =3D first_bit; bit < first_bit + nbits; bit++) { + if (test_bit(bit, attr->bitmap)) { + continue; + } + set_bit(bit, attr->bitmap); + ret =3D ram_block_attributes_notify_populate(attr, + bit * block_siz= e, + block_size); + if (ret) { + break; + } + } + } + } + + return ret; +} + +RamBlockAttributes *ram_block_attributes_create(RAMBlock *ram_block) +{ + const int block_size =3D qemu_real_host_page_size(); + RamBlockAttributes *attr; + MemoryRegion *mr =3D ram_block->mr; + + attr =3D RAM_BLOCK_ATTRIBUTES(object_new(TYPE_RAM_BLOCK_ATTRIBUTES)); + + attr->ram_block =3D ram_block; + if (memory_region_set_ram_discard_manager(mr, RAM_DISCARD_MANAGER(attr= ))) { + object_unref(OBJECT(attr)); + return NULL; + } + attr->bitmap_size =3D + ROUND_UP(int128_get64(mr->size), block_size) / block_size; + attr->bitmap =3D bitmap_new(attr->bitmap_size); + + return attr; +} + +void ram_block_attributes_destroy(RamBlockAttributes *attr) +{ + g_assert(attr); + + g_free(attr->bitmap); + memory_region_set_ram_discard_manager(attr->ram_block->mr, NULL); + object_unref(OBJECT(attr)); +} + +static void ram_block_attributes_init(Object *obj) +{ + RamBlockAttributes *attr =3D RAM_BLOCK_ATTRIBUTES(obj); + + QLIST_INIT(&attr->rdl_list); +} + +static void ram_block_attributes_finalize(Object *obj) +{ +} + +static void ram_block_attributes_class_init(ObjectClass *klass, + const void *data) +{ + RamDiscardManagerClass *rdmc =3D RAM_DISCARD_MANAGER_CLASS(klass); + + rdmc->get_min_granularity =3D ram_block_attributes_rdm_get_min_granula= rity; + rdmc->register_listener =3D ram_block_attributes_rdm_register_listener; + rdmc->unregister_listener =3D ram_block_attributes_rdm_unregister_list= ener; + rdmc->is_populated =3D ram_block_attributes_rdm_is_populated; + rdmc->replay_populated =3D ram_block_attributes_rdm_replay_populated; + rdmc->replay_discarded =3D ram_block_attributes_rdm_replay_discarded; +} diff --git a/system/meson.build b/system/meson.build index 7514bf3455..6d21ff9faa 100644 --- a/system/meson.build +++ b/system/meson.build @@ -17,6 +17,7 @@ system_ss.add(files( 'dma-helpers.c', 'globals.c', 'ioport.c', + 'ram-block-attributes.c', 'memory_mapping.c', 'memory.c', 'physmem.c', diff --git a/system/trace-events b/system/trace-events index be12ebfb41..82856e44f2 100644 --- a/system/trace-events +++ b/system/trace-events @@ -52,3 +52,6 @@ dirtylimit_state_finalize(void) dirtylimit_throttle_pct(int cpu_index, uint64_t pct, int64_t time_us) "CPU= [%d] throttle percent: %" PRIu64 ", throttle adjust time %"PRIi64 " us" dirtylimit_set_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set dirty page= rate limit %"PRIu64 dirtylimit_vcpu_execute(int cpu_index, int64_t sleep_time_us) "CPU[%d] sle= ep %"PRIi64 " us" + +# ram-block-attributes.c +ram_block_attributes_state_change(uint64_t offset, uint64_t size, const ch= ar *from, const char *to) "offset 0x%"PRIx64" size 0x%"PRIx64" from '%s' to= '%s'" --=20 2.49.0