From: Chenyi Qiang <chenyi.qiang@intel.com>
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Pankaj Gupta,
 Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
 Dan J Williams, Zhao Liu, Baolu Lu, Chao Gao, Yilun Xu, Xiaoyao Li,
 Cédric Le Goater, Alex Williamson
Subject: [PATCH v7 1/5] memory: Export a helper to get intersection of a
 MemoryRegionSection with a given range
Date: Thu, 12 Jun 2025 16:27:42 +0800
Message-ID: <20250612082747.51539-2-chenyi.qiang@intel.com>
In-Reply-To: <20250612082747.51539-1-chenyi.qiang@intel.com>
References: <20250612082747.51539-1-chenyi.qiang@intel.com>

Rename the helper to memory_region_section_intersect_range() to make it
more generic. Meanwhile, define @end as an Int128 and switch the related
operations to the int128_*() helpers, since the helper is now exported
as a wider API.

Suggested-by: Alexey Kardashevskiy
Reviewed-by: Alexey Kardashevskiy
Reviewed-by: Pankaj Gupta
Reviewed-by: David Hildenbrand
Reviewed-by: Zhao Liu
Reviewed-by: Xiaoyao Li
Signed-off-by: Chenyi Qiang
---
Changes in v7:
- Add Reviewed-by from Xiaoyao and Pankaj

Changes in v6:
- No change.
Changes in v5:
- Indent change for int128 ops to avoid lines over 80 characters
- Add two Reviewed-by from Alexey and Zhao
---
 hw/virtio/virtio-mem.c  | 32 +++++---------------------------
 include/system/memory.h | 30 ++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index a3d1a676e7..b3c126ea1e 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -244,28 +244,6 @@ static int virtio_mem_for_each_plugged_range(VirtIOMEM *vmem, void *arg,
     return ret;
 }
 
-/*
- * Adjust the memory section to cover the intersection with the given range.
- *
- * Returns false if the intersection is empty, otherwise returns true.
- */
-static bool virtio_mem_intersect_memory_section(MemoryRegionSection *s,
-                                                uint64_t offset, uint64_t size)
-{
-    uint64_t start = MAX(s->offset_within_region, offset);
-    uint64_t end = MIN(s->offset_within_region + int128_get64(s->size),
-                       offset + size);
-
-    if (end <= start) {
-        return false;
-    }
-
-    s->offset_within_address_space += start - s->offset_within_region;
-    s->offset_within_region = start;
-    s->size = int128_make64(end - start);
-    return true;
-}
-
 typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void *arg);
 
 static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
@@ -287,7 +265,7 @@ static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem,
                                   first_bit + 1) - 1;
         size = (last_bit - first_bit + 1) * vmem->block_size;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             break;
         }
         ret = cb(&tmp, arg);
@@ -319,7 +297,7 @@ static int virtio_mem_for_each_unplugged_section(const VirtIOMEM *vmem,
                                   first_bit + 1) - 1;
         size = (last_bit - first_bit + 1) * vmem->block_size;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             break;
         }
         ret = cb(&tmp, arg);
@@ -355,7 +333,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset,
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         MemoryRegionSection tmp = *rdl->section;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         rdl->notify_discard(rdl, &tmp);
@@ -371,7 +349,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
     QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
         MemoryRegionSection tmp = *rdl->section;
 
-        if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
             continue;
         }
         ret = rdl->notify_populate(rdl, &tmp);
@@ -388,7 +366,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset,
             if (rdl2 == rdl) {
                 break;
             }
-            if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
+            if (!memory_region_section_intersect_range(&tmp, offset, size)) {
                 continue;
             }
             rdl2->notify_discard(rdl2, &tmp);
diff --git a/include/system/memory.h b/include/system/memory.h
index 0848690ea4..da97753e28 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -1211,6 +1211,36 @@ MemoryRegionSection *memory_region_section_new_copy(MemoryRegionSection *s);
  */
 void memory_region_section_free_copy(MemoryRegionSection *s);
 
+/**
+ * memory_region_section_intersect_range: Adjust the memory section to cover
+ * the intersection with the given range.
+ *
+ * @s: the #MemoryRegionSection to be adjusted
+ * @offset: the offset of the given range in the memory region
+ * @size: the size of the given range
+ *
+ * Returns false if the intersection is empty, otherwise returns true.
+ */
+static inline bool memory_region_section_intersect_range(MemoryRegionSection *s,
+                                                         uint64_t offset,
+                                                         uint64_t size)
+{
+    uint64_t start = MAX(s->offset_within_region, offset);
+    Int128 end = int128_min(int128_add(int128_make64(s->offset_within_region),
+                                       s->size),
+                            int128_add(int128_make64(offset),
+                                       int128_make64(size)));
+
+    if (int128_le(end, int128_make64(start))) {
+        return false;
+    }
+
+    s->offset_within_address_space += start - s->offset_within_region;
+    s->offset_within_region = start;
+    s->size = int128_sub(end, int128_make64(start));
+    return true;
+}
+
 /**
  * memory_region_init: Initialize a memory region
  *
-- 
2.43.5


From: Chenyi Qiang <chenyi.qiang@intel.com>
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Pankaj Gupta,
 Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
 Dan J Williams, Zhao Liu, Baolu Lu, Chao Gao, Yilun Xu, Xiaoyao Li,
 Cédric Le Goater, Alex Williamson
Subject: [PATCH v7 2/5] memory: Change memory_region_set_ram_discard_manager()
 to return the result
Date: Thu, 12 Jun 2025 16:27:43 +0800
Message-ID: <20250612082747.51539-3-chenyi.qiang@intel.com>
In-Reply-To: <20250612082747.51539-1-chenyi.qiang@intel.com>
References: <20250612082747.51539-1-chenyi.qiang@intel.com>

Modify memory_region_set_ram_discard_manager() to return -EBUSY if a
RamDiscardManager is already set in the MemoryRegion. The caller must
handle this failure, such as having virtio-mem undo its actions and fail
the realize() process. Opportunistically move the call earlier to avoid
complex error handling.

This change is beneficial when introducing a new RamDiscardManager
instance besides virtio-mem. After ram_block_coordinated_discard_require(true)
unlocks all RamDiscardManager instances, only one instance is allowed to
be set for one MemoryRegion at present.

Suggested-by: David Hildenbrand
Reviewed-by: David Hildenbrand
Reviewed-by: Pankaj Gupta
Tested-by: Alexey Kardashevskiy
Reviewed-by: Alexey Kardashevskiy
Reviewed-by: Xiaoyao Li
Signed-off-by: Chenyi Qiang
---
Changes in v7:
- Add Reviewed-by from Pankaj, Alexey and Xiaoyao

Changes in v6:
- Add Reviewed-by from David.

Changes in v5:
- Nit in commit message (return false -> -EBUSY)
- Add set_ram_discard_manager(NULL) when ram_block_discard_range() fails.

Changes in v3:
- Move set_ram_discard_manager() up to avoid a g_free()
- Clean up set_ram_discard_manager() definition
---
 hw/virtio/virtio-mem.c  | 30 +++++++++++++++++-------------
 include/system/memory.h |  6 +++---
 system/memory.c         | 10 +++++++---
 3 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index b3c126ea1e..2e491e8c44 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -1047,6 +1047,17 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    /*
+     * Set ourselves as RamDiscardManager before the plug handler maps the
+     * memory region and exposes it via an address space.
+     */
+    if (memory_region_set_ram_discard_manager(&vmem->memdev->mr,
+                                              RAM_DISCARD_MANAGER(vmem))) {
+        error_setg(errp, "Failed to set RamDiscardManager");
+        ram_block_coordinated_discard_require(false);
+        return;
+    }
+
     /*
      * We don't know at this point whether shared RAM is migrated using
      * QEMU or migrated using the file content. "x-ignore-shared" will be
@@ -1061,6 +1072,7 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
         if (ret) {
             error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
+            memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
             ram_block_coordinated_discard_require(false);
             return;
         }
@@ -1122,13 +1134,6 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     vmem->system_reset = VIRTIO_MEM_SYSTEM_RESET(obj);
     vmem->system_reset->vmem = vmem;
     qemu_register_resettable(obj);
-
-    /*
-     * Set ourselves as RamDiscardManager before the plug handler maps the
-     * memory region and exposes it via an address space.
-     */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr,
-                                          RAM_DISCARD_MANAGER(vmem));
 }
 
 static void virtio_mem_device_unrealize(DeviceState *dev)
@@ -1136,12 +1141,6 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOMEM *vmem = VIRTIO_MEM(dev);
 
-    /*
-     * The unplug handler unmapped the memory region, it cannot be
-     * found via an address space anymore. Unset ourselves.
-     */
-    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
-
     qemu_unregister_resettable(OBJECT(vmem->system_reset));
     object_unref(OBJECT(vmem->system_reset));
 
@@ -1154,6 +1153,11 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     virtio_del_queue(vdev, 0);
     virtio_cleanup(vdev);
     g_free(vmem->bitmap);
+    /*
+     * The unplug handler unmapped the memory region, it cannot be
+     * found via an address space anymore. Unset ourselves.
+     */
+    memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
     ram_block_coordinated_discard_require(false);
 }
 
diff --git a/include/system/memory.h b/include/system/memory.h
index da97753e28..60983d4977 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -2499,13 +2499,13 @@ static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr)
  *
  * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion
  * that does not cover RAM, or a #MemoryRegion that already has a
- * #RamDiscardManager assigned.
+ * #RamDiscardManager assigned. Return 0 if the rdm is set successfully.
  *
  * @mr: the #MemoryRegion
  * @rdm: #RamDiscardManager to set
  */
-void memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                           RamDiscardManager *rdm);
+int memory_region_set_ram_discard_manager(MemoryRegion *mr,
+                                          RamDiscardManager *rdm);
 
 /**
  * memory_region_find: translate an address/size relative to a
diff --git a/system/memory.c b/system/memory.c
index 306e9ff9eb..d0c186e9f6 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2106,12 +2106,16 @@ RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
     return mr->rdm;
 }
 
-void memory_region_set_ram_discard_manager(MemoryRegion *mr,
-                                           RamDiscardManager *rdm)
+int memory_region_set_ram_discard_manager(MemoryRegion *mr,
+                                          RamDiscardManager *rdm)
 {
     g_assert(memory_region_is_ram(mr));
-    g_assert(!rdm || !mr->rdm);
+    if (mr->rdm && rdm) {
+        return -EBUSY;
+    }
+
     mr->rdm = rdm;
+    return 0;
 }
 
 uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
-- 
2.43.5

From: Chenyi Qiang <chenyi.qiang@intel.com>
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Pankaj Gupta,
 Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
 Dan J Williams, Zhao Liu, Baolu Lu, Chao Gao, Yilun Xu, Xiaoyao Li,
 Cédric Le Goater, Alex Williamson
Subject: [PATCH v7 3/5] memory: Unify the definition of ReplayRamPopulate()
 and ReplayRamDiscard()
Date: Thu, 12 Jun 2025 16:27:44 +0800
Message-ID: <20250612082747.51539-4-chenyi.qiang@intel.com>
In-Reply-To: <20250612082747.51539-1-chenyi.qiang@intel.com>
References: <20250612082747.51539-1-chenyi.qiang@intel.com>

Update the ReplayRamDiscard() function to return the result, and unify
ReplayRamPopulate() and ReplayRamDiscard() into ReplayRamDiscardState()
at the same time, since their definitions are now identical. This
unification simplifies related structures such as VirtIOMEMReplayData,
which makes the code cleaner.

Reviewed-by: David Hildenbrand
Reviewed-by: Pankaj Gupta
Reviewed-by: Xiaoyao Li
Signed-off-by: Chenyi Qiang
---
Changes in v7:
- Add Reviewed-by from Xiaoyao and Pankaj.

Changes in v6:
- Add Reviewed-by from David
- Add a documentation comment for the prototype change

Changes in v5:
- Rename ReplayRamStateChange to ReplayRamDiscardState (David)
- Return data->fn(s, data->opaque) instead of 0 in
  virtio_mem_rdm_replay_discarded_cb(). (Alexey)

Changes in v4:
- Modify the commit message. We won't use the Replay() operation when
  doing the attribute change like v3.
---
 hw/virtio/virtio-mem.c  | 21 +++++++-------
 include/system/memory.h | 64 ++++++++++++++++++++++++++++++-----------
 migration/ram.c         |  5 ++--
 system/memory.c         | 12 ++++----
 4 files changed, 66 insertions(+), 36 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 2e491e8c44..c46f6f9c3e 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -1732,7 +1732,7 @@ static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm,
 }
 
 struct VirtIOMEMReplayData {
-    void *fn;
+    ReplayRamDiscardState fn;
     void *opaque;
 };
 
@@ -1740,12 +1740,12 @@ static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg)
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    return ((ReplayRamPopulate)data->fn)(s, data->opaque);
+    return data->fn(s, data->opaque);
 }
 
 static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm,
                                            MemoryRegionSection *s,
-                                           ReplayRamPopulate replay_fn,
+                                           ReplayRamDiscardState replay_fn,
                                            void *opaque)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
@@ -1764,14 +1764,13 @@ static int virtio_mem_rdm_replay_discarded_cb(MemoryRegionSection *s,
 {
     struct VirtIOMEMReplayData *data = arg;
 
-    ((ReplayRamDiscard)data->fn)(s, data->opaque);
-    return 0;
+    return data->fn(s, data->opaque);
 }
 
-static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
-                                            MemoryRegionSection *s,
-                                            ReplayRamDiscard replay_fn,
-                                            void *opaque)
+static int virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
+                                           MemoryRegionSection *s,
+                                           ReplayRamDiscardState replay_fn,
+                                           void *opaque)
 {
     const VirtIOMEM *vmem = VIRTIO_MEM(rdm);
     struct VirtIOMEMReplayData data = {
@@ -1780,8 +1779,8 @@ static void virtio_mem_rdm_replay_discarded(const RamDiscardManager *rdm,
     };
 
     g_assert(s->mr == &vmem->memdev->mr);
-    virtio_mem_for_each_unplugged_section(vmem, s, &data,
-                                          virtio_mem_rdm_replay_discarded_cb);
+    return virtio_mem_for_each_unplugged_section(vmem, s, &data,
+                                                 virtio_mem_rdm_replay_discarded_cb);
 }
 
 static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm,
diff --git a/include/system/memory.h b/include/system/memory.h
index 60983d4977..eb2618e1b4 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -576,8 +576,20 @@ static inline void ram_discard_listener_init(RamDiscardListener *rdl,
     rdl->double_discard_supported = double_discard_supported;
 }
 
-typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque);
-typedef void (*ReplayRamDiscard)(MemoryRegionSection *section, void *opaque);
+/**
+ * ReplayRamDiscardState:
+ *
+ * The callback handler for #RamDiscardManagerClass.replay_populated/
+ * #RamDiscardManagerClass.replay_discarded to invoke on populated/discarded
+ * parts.
+ *
+ * @section: the #MemoryRegionSection of populated/discarded part
+ * @opaque: pointer to forward to the callback
+ *
+ * Returns 0 on success, or a negative error if failed.
+ */
+typedef int (*ReplayRamDiscardState)(MemoryRegionSection *section,
+                                     void *opaque);
 
 /*
  * RamDiscardManagerClass:
@@ -651,36 +663,38 @@ struct RamDiscardManagerClass {
     /**
     * @replay_populated:
     *
-     * Call the #ReplayRamPopulate callback for all populated parts within the
-     * #MemoryRegionSection via the #RamDiscardManager.
+     * Call the #ReplayRamDiscardState callback for all populated parts within
+     * the #MemoryRegionSection via the #RamDiscardManager.
     *
     * In case any call fails, no further calls are made.
     *
     * @rdm: the #RamDiscardManager
     * @section: the #MemoryRegionSection
-     * @replay_fn: the #ReplayRamPopulate callback
+     * @replay_fn: the #ReplayRamDiscardState callback
     * @opaque: pointer to forward to the callback
     *
     * Returns 0 on success, or a negative error if any notification failed.
     */
    int (*replay_populated)(const RamDiscardManager *rdm,
                            MemoryRegionSection *section,
-                            ReplayRamPopulate replay_fn, void *opaque);
+                            ReplayRamDiscardState replay_fn, void *opaque);
 
    /**
     * @replay_discarded:
     *
-     * Call the #ReplayRamDiscard callback for all discarded parts within the
-     * #MemoryRegionSection via the #RamDiscardManager.
+     * Call the #ReplayRamDiscardState callback for all discarded parts within
+     * the #MemoryRegionSection via the #RamDiscardManager.
     *
     * @rdm: the #RamDiscardManager
     * @section: the #MemoryRegionSection
-     * @replay_fn: the #ReplayRamDiscard callback
+     * @replay_fn: the #ReplayRamDiscardState callback
     * @opaque: pointer to forward to the callback
+     *
+     * Returns 0 on success, or a negative error if any notification failed.
     */
-    void (*replay_discarded)(const RamDiscardManager *rdm,
-                             MemoryRegionSection *section,
-                             ReplayRamDiscard replay_fn, void *opaque);
+    int (*replay_discarded)(const RamDiscardManager *rdm,
+                            MemoryRegionSection *section,
+                            ReplayRamDiscardState replay_fn, void *opaque);
 
    /**
     * @register_listener:
@@ -721,15 +735,31 @@ uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm,
 bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
                                       const MemoryRegionSection *section);
 
+/**
+ * ram_discard_manager_replay_populated:
+ *
+ * A wrapper to call the #RamDiscardManagerClass.replay_populated callback
+ * of the #RamDiscardManager.
+ *
+ * Returns 0 on success, or a negative error if any notification failed.
+ */
 int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
                                          MemoryRegionSection *section,
-                                         ReplayRamPopulate replay_fn,
+                                         ReplayRamDiscardState replay_fn,
                                          void *opaque);
 
-void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                          MemoryRegionSection *section,
-                                          ReplayRamDiscard replay_fn,
-                                          void *opaque);
+/**
+ * ram_discard_manager_replay_discarded:
+ *
+ * A wrapper to call the #RamDiscardManagerClass.replay_discarded callback
+ * of the #RamDiscardManager.
+ *
+ * Returns 0 on success, or a negative error if any notification failed.
+ */
+int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+                                         MemoryRegionSection *section,
+                                         ReplayRamDiscardState replay_fn,
+                                         void *opaque);
 
 void ram_discard_manager_register_listener(RamDiscardManager *rdm,
                                            RamDiscardListener *rdl,
diff --git a/migration/ram.c b/migration/ram.c
index d26dbd37c4..2d4af497b5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -848,8 +848,8 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
     return ret;
 }
 
-static void dirty_bitmap_clear_section(MemoryRegionSection *section,
-                                       void *opaque)
+static int dirty_bitmap_clear_section(MemoryRegionSection *section,
+                                      void *opaque)
 {
     const hwaddr offset = section->offset_within_region;
     const hwaddr size = int128_get64(section->size);
@@ -868,6 +868,7 @@ static void dirty_bitmap_clear_section(MemoryRegionSection *section,
     }
     *cleared_bits += bitmap_count_one_with_offset(rb->bmap, start, npages);
     bitmap_clear(rb->bmap, start, npages);
+    return 0;
 }
 
 /*
diff --git a/system/memory.c b/system/memory.c
index d0c186e9f6..76b44b8220 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2138,7 +2138,7 @@ bool ram_discard_manager_is_populated(const RamDiscardManager *rdm,
 
 int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
                                          MemoryRegionSection *section,
-                                         ReplayRamPopulate replay_fn,
+                                         ReplayRamDiscardState replay_fn,
                                          void *opaque)
 {
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
@@ -2147,15 +2147,15 @@ int ram_discard_manager_replay_populated(const RamDiscardManager *rdm,
     return rdmc->replay_populated(rdm, section, replay_fn, opaque);
 }
 
-void ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
-                                          MemoryRegionSection *section,
-                                          ReplayRamDiscard replay_fn,
-                                          void *opaque)
+int ram_discard_manager_replay_discarded(const RamDiscardManager *rdm,
+                                         MemoryRegionSection *section,
+                                         ReplayRamDiscardState replay_fn,
+                                         void *opaque)
 {
     RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm);
 
     g_assert(rdmc->replay_discarded);
-    rdmc->replay_discarded(rdm, section, replay_fn, opaque);
+    return rdmc->replay_discarded(rdm, section, replay_fn, opaque);
 }
 
 void ram_discard_manager_register_listener(RamDiscardManager *rdm,
-- 
2.43.5

From: Chenyi Qiang <chenyi.qiang@intel.com>
To: David Hildenbrand, Alexey Kardashevskiy, Peter Xu, Gupta Pankaj,
    Paolo Bonzini, Philippe Mathieu-Daudé, Michael Roth
Cc: Chenyi Qiang, qemu-devel@nongnu.org, kvm@vger.kernel.org,
    Williams Dan J, Zhao Liu, Baolu Lu, Gao Chao, Xu Yilun, Li Xiaoyao,
    Cédric Le Goater, Alex Williamson
Subject: [PATCH v7 4/5] ram-block-attributes: Introduce RamBlockAttributes
 to manage RAMBlock with guest_memfd
Date: Thu, 12 Jun 2025 16:27:45 +0800
Message-ID: <20250612082747.51539-5-chenyi.qiang@intel.com>
In-Reply-To: <20250612082747.51539-1-chenyi.qiang@intel.com>
References: <20250612082747.51539-1-chenyi.qiang@intel.com>

Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
discard") highlighted that subsystems like VFIO may disable RAM block
discard. However, guest_memfd relies on discard operations for page
conversion between private and shared memory, potentially leading to
stale IOMMU mappings when assigning hardware devices to confidential
VMs via shared memory. To address this and allow shared device
assignment, it is crucial to ensure that the VFIO subsystem refreshes
its IOMMU mappings.

RamDiscardManager is an existing interface (used by virtio-mem) to
adjust VFIO mappings in relation to VM page assignment. Effectively,
page conversion is similar to hot-removing a page in one mode and
adding it back in the other. Therefore, similar actions are required
for page conversion events. Introduce the RamDiscardManager to
guest_memfd to facilitate this process.

Since guest_memfd is not an object, it cannot directly implement the
RamDiscardManager interface. Implementing it in HostMemoryBackend is
not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
have a memory backend while others do not. Notably, virtual BIOS
RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
backend.

To manage RAMBlocks with guest_memfd, define a new object named
RamBlockAttributes to implement the RamDiscardManager interface.
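A minimal, self-contained sketch of the replay contract this object implements (plain C with simplified stand-in types, not QEMU's actual bitmap helpers or API): the replay walks contiguous runs of populated bits, invokes the callback once per run, and stops at the first failure, exactly as the ReplayRamDiscardState typedef requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for MemoryRegionSection: just an offset/size pair. */
typedef struct { uint64_t offset; uint64_t size; } Section;

/* Mirrors ReplayRamDiscardState: 0 on success, negative errno on failure. */
typedef int (*replay_fn_t)(const Section *section, void *opaque);

/*
 * Invoke cb once per contiguous run of 1-bits (populated blocks) in an
 * LSB-first byte-array bitmap; stop at the first failing callback.
 */
static int replay_populated(const unsigned char *bitmap, size_t nbits,
                            uint64_t block_size, replay_fn_t cb, void *opaque)
{
    size_t bit = 0;
    while (bit < nbits) {
        /* skip to the next set bit */
        while (bit < nbits && !(bitmap[bit / 8] & (1u << (bit % 8)))) {
            bit++;
        }
        if (bit == nbits) {
            break;
        }
        size_t first = bit;
        /* extend over the contiguous run of set bits */
        while (bit < nbits && (bitmap[bit / 8] & (1u << (bit % 8)))) {
            bit++;
        }
        Section s = { first * block_size, (bit - first) * block_size };
        int ret = cb(&s, opaque);
        if (ret) {
            return ret; /* no further calls once one fails */
        }
    }
    return 0;
}
```

QEMU's real implementation does the same walk with find_next_bit()/find_next_zero_bit() over an unsigned long bitmap and clips each run against the listener's section.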
This object can store the guest_memfd information, such as the bitmap
for shared memory and the registered listeners for event notifications.
A new state_change() helper function is provided to notify listeners,
such as VFIO, allowing VFIO to dynamically map and unmap DMA for the
shared memory according to conversion events. Note that in the current
context of RamDiscardManager for guest_memfd, the shared state is
analogous to being populated, while the private state can be considered
discarded, for simplicity. In the future, it would become more
complicated if more states, such as private/shared/discarded, had to be
considered at the same time.

In the current implementation, memory state tracking is performed at
host page size granularity, as the minimum conversion size can be one
page per request. Additionally, VFIO expects the DMA mapping for a
specific IOVA to be mapped and unmapped with the same granularity.
Confidential VMs may perform partial conversions, such as conversions
on small regions within larger ones. To prevent such invalid cases, and
until support for DMA mapping cut operations is available, all
operations are performed at 4K granularity.

In addition, memory conversion failures currently cause QEMU to quit
rather than resume the guest or retry the operation. Adding more error
handling or rollback mechanisms once conversion failures are allowed is
future work. For example, in-place conversion of guest_memfd could
retry the unmap operation during the conversion from shared to private.
For now, keep the complex error handling out of the picture, as it is
not required.

Tested-by: Alexey Kardashevskiy
Reviewed-by: Alexey Kardashevskiy
Reviewed-by: Pankaj Gupta
Signed-off-by: Chenyi Qiang
---
Changes in v7:
    - Unwrap the two helpers (is_range_populated() and
      is_range_discarded()). (Alexey)
    - Use the bit to do the iteration instead of the offset, without the
      additional "cur" and "end" variables. (Alexey)
    - Add Reviewed-by from Pankaj and Alexey.
Changes in v6:
    - Change the object type name from RamBlockAttribute to
      RamBlockAttributes. (David)
    - Save the associated RAMBlock instead of the MemoryRegion in
      RamBlockAttributes. (David)
    - Squash the state_change() helper introduction into this commit, as
      well as the handling of the mixture conversion case. (David)
    - Change the block_size type from int to size_t and clean up the
      validation checks. (Alexey)
    - Add a tracepoint to track the state changes. (Alexey)

Changes in v5:
    - Revert to the RamDiscardManager interface instead of introducing a
      new class hierarchy to manage private/shared state, and keep the
      new name RamBlockAttribute compared with the MemoryAttributeManager
      in v3.
    - Use the *simple* version of object_define and object_declare, since
      state_change() is changed to an exported function instead of a
      virtual function in a later patch.
    - Move the introduction of the RamBlockAttribute field to this patch
      and rename it to ram_shared. (Alexey)
    - Call exit() when register/unregister fails. (Zhao)
    - Add ram-block-attribute.c to the Memory API section in MAINTAINERS.

Changes in v4:
    - Change the name from memory-attribute-manager to
      ram-block-attribute.
    - Implement the newly-introduced PrivateSharedManager instead of
      RamDiscardManager and change the related commit message.
    - Define the new object in ramblock.h instead of adding a new file.
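The "completely shared / completely private / mixture" distinction that the state_change() helper relies on can be sketched with plain bit scans (an illustrative stand-alone sketch with hypothetical helper names, not the QEMU implementation, which uses find_next_bit()/find_next_zero_bit()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* LSB-first test of bit i in a byte-array bitmap (1 = populated/shared). */
static bool bit_is_set(const unsigned char *bm, size_t i)
{
    return bm[i / 8] & (1u << (i % 8));
}

/*
 * Classify a range of blocks [first, first + nbits):
 * returns 1 if all bits are set (fully shared), -1 if all bits are clear
 * (fully private/discarded), and 0 for a mixture, which must then be
 * walked and notified block by block.
 */
static int classify_range(const unsigned char *bm, size_t first, size_t nbits)
{
    bool any_set = false, any_clear = false;
    for (size_t i = first; i < first + nbits; i++) {
        if (bit_is_set(bm, i)) {
            any_set = true;
        } else {
            any_clear = true;
        }
    }
    if (!any_clear) {
        return 1;
    }
    if (!any_set) {
        return -1;
    }
    return 0;
}
```

In the fully-shared and fully-private cases a single bitmap_clear()/bitmap_set() plus one notification covers the whole range; only the mixture case needs the per-block loop.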
---
 MAINTAINERS                   |   1 +
 include/system/ramblock.h     |  21 ++
 system/meson.build            |   1 +
 system/ram-block-attributes.c | 442 ++++++++++++++++++++++++++++++++++
 system/trace-events           |   3 +
 5 files changed, 468 insertions(+)
 create mode 100644 system/ram-block-attributes.c

diff --git a/MAINTAINERS b/MAINTAINERS
index aa6763077e..6a86cee73a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3172,6 +3172,7 @@ F: system/memory.c
 F: system/memory_mapping.c
 F: system/physmem.c
 F: system/memory-internal.h
+F: system/ram-block-attributes.c
 F: scripts/coccinelle/memory-region-housekeeping.cocci
 
 Memory devices
diff --git a/include/system/ramblock.h b/include/system/ramblock.h
index d8a116ba99..1bab9e2dac 100644
--- a/include/system/ramblock.h
+++ b/include/system/ramblock.h
@@ -22,6 +22,10 @@
 #include "exec/cpu-common.h"
 #include "qemu/rcu.h"
 #include "exec/ramlist.h"
+#include "system/hostmem.h"
+
+#define TYPE_RAM_BLOCK_ATTRIBUTES "ram-block-attributes"
+OBJECT_DECLARE_SIMPLE_TYPE(RamBlockAttributes, RAM_BLOCK_ATTRIBUTES)
 
 struct RAMBlock {
     struct rcu_head rcu;
@@ -91,4 +95,21 @@ struct RAMBlock {
     ram_addr_t postcopy_length;
 };
 
+struct RamBlockAttributes {
+    Object parent;
+
+    RAMBlock *ram_block;
+
+    /* 1-setting of the bitmap represents ram is populated (shared) */
+    unsigned bitmap_size;
+    unsigned long *bitmap;
+
+    QLIST_HEAD(, RamDiscardListener) rdl_list;
+};
+
+RamBlockAttributes *ram_block_attributes_create(RAMBlock *ram_block);
+void ram_block_attributes_destroy(RamBlockAttributes *attr);
+int ram_block_attributes_state_change(RamBlockAttributes *attr, uint64_t offset,
+                                      uint64_t size, bool to_discard);
+
 #endif
diff --git a/system/meson.build b/system/meson.build
index 7514bf3455..6d21ff9faa 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -17,6 +17,7 @@ system_ss.add(files(
   'dma-helpers.c',
   'globals.c',
   'ioport.c',
+  'ram-block-attributes.c',
   'memory_mapping.c',
   'memory.c',
   'physmem.c',
diff --git a/system/ram-block-attributes.c b/system/ram-block-attributes.c
new file mode 100644
index 0000000000..dbb8c9675b
--- /dev/null
+++ b/system/ram-block-attributes.c
@@ -0,0 +1,442 @@
+/*
+ * QEMU ram block attributes
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Chenyi Qiang <chenyi.qiang@intel.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "system/ramblock.h"
+#include "trace.h"
+
+OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(RamBlockAttributes,
+                                          ram_block_attributes,
+                                          RAM_BLOCK_ATTRIBUTES,
+                                          OBJECT,
+                                          { TYPE_RAM_DISCARD_MANAGER },
+                                          { })
+
+static size_t
+ram_block_attributes_get_block_size(const RamBlockAttributes *attr)
+{
+    /*
+     * Because page conversion is performed at a granularity of at least
+     * 4K and 4K-aligned, use the host page size as the granularity to
+     * track the memory attribute.
+     */
+    g_assert(attr && attr->ram_block);
+    g_assert(attr->ram_block->page_size == qemu_real_host_page_size());
+    return attr->ram_block->page_size;
+}
+
+
+static bool
+ram_block_attributes_rdm_is_populated(const RamDiscardManager *rdm,
+                                      const MemoryRegionSection *section)
+{
+    const RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rdm);
+    const size_t block_size = ram_block_attributes_get_block_size(attr);
+    const uint64_t first_bit = section->offset_within_region / block_size;
+    const uint64_t last_bit = first_bit +
+                              int128_get64(section->size) / block_size - 1;
+    unsigned long first_discarded_bit;
+
+    first_discarded_bit = find_next_zero_bit(attr->bitmap, last_bit + 1,
+                                             first_bit);
+    return first_discarded_bit > last_bit;
+}
+
+typedef int (*ram_block_attributes_section_cb)(MemoryRegionSection *s,
+                                               void *arg);
+
+static int
+ram_block_attributes_notify_populate_cb(MemoryRegionSection *section,
+                                        void *arg)
+{
+    RamDiscardListener *rdl = arg;
+
+    return rdl->notify_populate(rdl, section);
+}
+
+static int
+ram_block_attributes_notify_discard_cb(MemoryRegionSection *section,
+                                       void *arg)
+{
+    RamDiscardListener *rdl = arg;
+
+    rdl->notify_discard(rdl, section);
+    return 0;
+}
+
+static int
+ram_block_attributes_for_each_populated_section(const RamBlockAttributes *attr,
+                                                MemoryRegionSection *section,
+                                                void *arg,
+                                                ram_block_attributes_section_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    const size_t block_size = ram_block_attributes_get_block_size(attr);
+    int ret = 0;
+
+    first_bit = section->offset_within_region / block_size;
+    first_bit = find_next_bit(attr->bitmap, attr->bitmap_size,
+                              first_bit);
+
+    while (first_bit < attr->bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_bit * block_size;
+        last_bit = find_next_zero_bit(attr->bitmap, attr->bitmap_size,
+                                      first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            error_report("%s: Failed to notify RAM discard listener: %s",
+                         __func__, strerror(-ret));
+            break;
+        }
+
+        first_bit = find_next_bit(attr->bitmap, attr->bitmap_size,
+                                  last_bit + 2);
+    }
+
+    return ret;
+}
+
+static int
+ram_block_attributes_for_each_discarded_section(const RamBlockAttributes *attr,
+                                                MemoryRegionSection *section,
+                                                void *arg,
+                                                ram_block_attributes_section_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    const size_t block_size = ram_block_attributes_get_block_size(attr);
+    int ret = 0;
+
+    first_bit = section->offset_within_region / block_size;
+    first_bit = find_next_zero_bit(attr->bitmap, attr->bitmap_size,
+                                   first_bit);
+
+    while (first_bit < attr->bitmap_size) {
+        MemoryRegionSection tmp = *section;
+
+        offset = first_bit * block_size;
+        last_bit = find_next_bit(attr->bitmap, attr->bitmap_size,
+                                 first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * block_size;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            break;
+        }
+
+        ret = cb(&tmp, arg);
+        if (ret) {
+            error_report("%s: Failed to notify RAM discard listener: %s",
+                         __func__, strerror(-ret));
+            break;
+        }
+
+        first_bit = find_next_zero_bit(attr->bitmap,
+                                       attr->bitmap_size,
+                                       last_bit + 2);
+    }
+
+    return ret;
+}
+
+static uint64_t
+ram_block_attributes_rdm_get_min_granularity(const RamDiscardManager *rdm,
+                                             const MemoryRegion *mr)
+{
+    const RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rdm);
+
+    g_assert(mr == attr->ram_block->mr);
+    return ram_block_attributes_get_block_size(attr);
+}
+
+static void
+ram_block_attributes_rdm_register_listener(RamDiscardManager *rdm,
+                                           RamDiscardListener *rdl,
+                                           MemoryRegionSection *section)
+{
+    RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rdm);
+    int ret;
+
+    g_assert(section->mr == attr->ram_block->mr);
+    rdl->section = memory_region_section_new_copy(section);
+
+    QLIST_INSERT_HEAD(&attr->rdl_list, rdl, next);
+
+    ret = ram_block_attributes_for_each_populated_section(attr, section, rdl,
+                                    ram_block_attributes_notify_populate_cb);
+    if (ret) {
+        error_report("%s: Failed to register RAM discard listener: %s",
+                     __func__, strerror(-ret));
+        exit(1);
+    }
+}
+
+static void
+ram_block_attributes_rdm_unregister_listener(RamDiscardManager *rdm,
+                                             RamDiscardListener *rdl)
+{
+    RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rdm);
+    int ret;
+
+    g_assert(rdl->section);
+    g_assert(rdl->section->mr == attr->ram_block->mr);
+
+    if (rdl->double_discard_supported) {
+        rdl->notify_discard(rdl, rdl->section);
+    } else {
+        ret = ram_block_attributes_for_each_populated_section(attr,
+                rdl->section, rdl, ram_block_attributes_notify_discard_cb);
+        if (ret) {
+            error_report("%s: Failed to unregister RAM discard listener: %s",
+                         __func__, strerror(-ret));
+            exit(1);
+        }
+    }
+
+    memory_region_section_free_copy(rdl->section);
+    rdl->section = NULL;
+    QLIST_REMOVE(rdl, next);
+}
+
+typedef struct RamBlockAttributesReplayData {
+    ReplayRamDiscardState fn;
+    void *opaque;
+} RamBlockAttributesReplayData;
+
+static int ram_block_attributes_rdm_replay_cb(MemoryRegionSection *section,
+                                              void *arg)
+{
+    RamBlockAttributesReplayData *data = arg;
+
+    return data->fn(section, data->opaque);
+}
+
+static int
+ram_block_attributes_rdm_replay_populated(const RamDiscardManager *rdm,
+                                          MemoryRegionSection *section,
+                                          ReplayRamDiscardState replay_fn,
+                                          void *opaque)
+{
+    RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rdm);
+    RamBlockAttributesReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == attr->ram_block->mr);
+    return ram_block_attributes_for_each_populated_section(attr, section, &data,
+                                           ram_block_attributes_rdm_replay_cb);
+}
+
+static int
+ram_block_attributes_rdm_replay_discarded(const RamDiscardManager *rdm,
+                                          MemoryRegionSection *section,
+                                          ReplayRamDiscardState replay_fn,
+                                          void *opaque)
+{
+    RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(rdm);
+    RamBlockAttributesReplayData data = { .fn = replay_fn, .opaque = opaque };
+
+    g_assert(section->mr == attr->ram_block->mr);
+    return ram_block_attributes_for_each_discarded_section(attr, section, &data,
+                                           ram_block_attributes_rdm_replay_cb);
+}
+
+static bool
+ram_block_attributes_is_valid_range(RamBlockAttributes *attr, uint64_t offset,
+                                    uint64_t size)
+{
+    MemoryRegion *mr = attr->ram_block->mr;
+
+    g_assert(mr);
+
+    uint64_t region_size = memory_region_size(mr);
+    const size_t block_size = ram_block_attributes_get_block_size(attr);
+
+    if (!QEMU_IS_ALIGNED(offset, block_size) ||
+        !QEMU_IS_ALIGNED(size, block_size)) {
+        return false;
+    }
+    if (offset + size <= offset) {
+        return false;
+    }
+    if (offset + size > region_size) {
+        return false;
+    }
+    return true;
+}
+
+static void ram_block_attributes_notify_discard(RamBlockAttributes *attr,
+                                                uint64_t offset,
+                                                uint64_t size)
+{
+    RamDiscardListener *rdl;
+
+    QLIST_FOREACH(rdl, &attr->rdl_list, next) {
+        MemoryRegionSection tmp = *rdl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+        rdl->notify_discard(rdl, &tmp);
+    }
+}
+
+static int
+ram_block_attributes_notify_populate(RamBlockAttributes *attr,
+                                     uint64_t offset, uint64_t size)
+{
+    RamDiscardListener *rdl;
+    int ret = 0;
+
+    QLIST_FOREACH(rdl, &attr->rdl_list, next) {
+        MemoryRegionSection tmp = *rdl->section;
+
+        if (!memory_region_section_intersect_range(&tmp, offset, size)) {
+            continue;
+        }
+        ret = rdl->notify_populate(rdl, &tmp);
+        if (ret) {
+            break;
+        }
+    }
+
+    return ret;
+}
+
+int ram_block_attributes_state_change(RamBlockAttributes *attr,
+                                      uint64_t offset, uint64_t size,
+                                      bool to_discard)
+{
+    const size_t block_size = ram_block_attributes_get_block_size(attr);
+    const unsigned long first_bit = offset / block_size;
+    const unsigned long nbits = size / block_size;
+    const unsigned long last_bit = first_bit + nbits - 1;
+    const bool is_discarded = find_next_bit(attr->bitmap, attr->bitmap_size,
+                                            first_bit) > last_bit;
+    const bool is_populated = find_next_zero_bit(attr->bitmap,
+                                  attr->bitmap_size, first_bit) > last_bit;
+    unsigned long bit;
+    int ret = 0;
+
+    if (!ram_block_attributes_is_valid_range(attr, offset, size)) {
+        error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
+                     __func__, offset, size);
+        return -EINVAL;
+    }
+
+    trace_ram_block_attributes_state_change(offset, size,
+                                            is_discarded ? "discarded" :
+                                            is_populated ? "populated" :
+                                            "mixture",
+                                            to_discard ? "discarded" :
+                                            "populated");
+    if (to_discard) {
+        if (is_discarded) {
+            /* Already private */
+        } else if (is_populated) {
+            /* Completely shared */
+            bitmap_clear(attr->bitmap, first_bit, nbits);
+            ram_block_attributes_notify_discard(attr, offset, size);
+        } else {
+            /* Unexpected mixture: process individual blocks */
+            for (bit = first_bit; bit < first_bit + nbits; bit++) {
+                if (!test_bit(bit, attr->bitmap)) {
+                    continue;
+                }
+                clear_bit(bit, attr->bitmap);
+                ram_block_attributes_notify_discard(attr, bit * block_size,
+                                                    block_size);
+            }
+        }
+    } else {
+        if (is_populated) {
+            /* Already shared */
+        } else if (is_discarded) {
+            /* Completely private */
+            bitmap_set(attr->bitmap, first_bit, nbits);
+            ret = ram_block_attributes_notify_populate(attr, offset, size);
+        } else {
+            /* Unexpected mixture: process individual blocks */
+            for (bit = first_bit; bit < first_bit + nbits; bit++) {
+                if (test_bit(bit, attr->bitmap)) {
+                    continue;
+                }
+                set_bit(bit, attr->bitmap);
+                ret = ram_block_attributes_notify_populate(attr,
+                                                           bit * block_size,
+                                                           block_size);
+                if (ret) {
+                    break;
+                }
+            }
+        }
+    }
+
+    return ret;
+}
+
+RamBlockAttributes *ram_block_attributes_create(RAMBlock *ram_block)
+{
+    const int block_size = qemu_real_host_page_size();
+    RamBlockAttributes *attr;
+    MemoryRegion *mr = ram_block->mr;
+
+    attr = RAM_BLOCK_ATTRIBUTES(object_new(TYPE_RAM_BLOCK_ATTRIBUTES));
+
+    attr->ram_block = ram_block;
+    if (memory_region_set_ram_discard_manager(mr, RAM_DISCARD_MANAGER(attr))) {
+        object_unref(OBJECT(attr));
+        return NULL;
+    }
+    attr->bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
+    attr->bitmap = bitmap_new(attr->bitmap_size);
+
+    return attr;
+}
+
+void ram_block_attributes_destroy(RamBlockAttributes *attr)
+{
+    g_assert(attr);
+
+    g_free(attr->bitmap);
+    memory_region_set_ram_discard_manager(attr->ram_block->mr, NULL);
+    object_unref(OBJECT(attr));
+}
+
+static void ram_block_attributes_init(Object *obj)
+{
+    RamBlockAttributes *attr = RAM_BLOCK_ATTRIBUTES(obj);
+
+    QLIST_INIT(&attr->rdl_list);
+}
+
+static void ram_block_attributes_finalize(Object *obj)
+{
+}
+
+static void ram_block_attributes_class_init(ObjectClass *klass,
+                                            const void *data)
+{
+    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(klass);
+
+    rdmc->get_min_granularity = ram_block_attributes_rdm_get_min_granularity;
+    rdmc->register_listener = ram_block_attributes_rdm_register_listener;
+    rdmc->unregister_listener = ram_block_attributes_rdm_unregister_listener;
+    rdmc->is_populated = ram_block_attributes_rdm_is_populated;
+    rdmc->replay_populated = ram_block_attributes_rdm_replay_populated;
+    rdmc->replay_discarded = ram_block_attributes_rdm_replay_discarded;
+}
diff --git a/system/trace-events b/system/trace-events
index be12ebfb41..82856e44f2 100644
--- a/system/trace-events
+++ b/system/trace-events
@@ -52,3 +52,6 @@ dirtylimit_state_finalize(void)
 dirtylimit_throttle_pct(int cpu_index, uint64_t pct, int64_t time_us) "CPU[%d] throttle percent: %" PRIu64 ", throttle adjust time %"PRIi64 " us"
 dirtylimit_set_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set dirty page rate limit %"PRIu64
 dirtylimit_vcpu_execute(int cpu_index, int64_t sleep_time_us) "CPU[%d] sleep %"PRIi64 " us"
+
+# ram-block-attributes.c
+ram_block_attributes_state_change(uint64_t offset, uint64_t size, const char *from, const char *to) "offset 0x%"PRIx64" size 0x%"PRIx64" from '%s' to '%s'"
-- 
2.43.5
b=S0+mE7KCF6/LzY9DuJ+MORaoLJTzZVNr12bvO9rd6J5MBDnQ9iz62KP1lq1KzgWgX8B+0M26BieXlX5ej2XCdjFvxX4KwrOEweMpTOo2F2zOPCYaV2wO9BeWCiifiYlIOR1CxqY4fhdciEvo2nuuPw6uG/tbG9H+PO6TfX6lQ84= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1749716990; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=tMrz1Y4jnujL/E91IfoqHFMSaB64T462nYyJXsq1PhM=; b=U611TlamlXr+N0nFmFmxDIrR1ed1LNG0lpQGcnwI86Y6wE4OyhEh5WQMkiou0eCXP98QRzcJFGjswMxmjboE/pS/hpBEEcJfcfZ86x4bimszb8WiaQas4hoAxriCqOE3pIymy49foF/Q9sgQjsih+i35ubYbuXGqgILFfhRcRug= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass header.i=@intel.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1749716990361539.0390930331048; Thu, 12 Jun 2025 01:29:50 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1uPdIR-0008A4-Ct; Thu, 12 Jun 2025 04:28:27 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uPdIL-000852-4r for qemu-devel@nongnu.org; Thu, 12 Jun 2025 04:28:21 -0400 Received: from mgamail.intel.com ([192.198.163.8]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uPdIJ-0004Fi-5r for qemu-devel@nongnu.org; Thu, 12 Jun 2025 04:28:20 -0400 Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2025 01:28:18 -0700 Received: from emr-bkc.sh.intel.com 
([10.112.230.82]) by orviesa004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Jun 2025 01:28:14 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1749716899; x=1781252899; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2q50YptO0BwNcKJJ1GYtGtPlphWJ7IdYPt7wdCRzghY=; b=DEfbGYJx+pYQ4Z7mkbbelg421f2jPNSguk8YO3Bt0UHOc9/3sXToVjd0 p+MJJ5yYTY165QeIwBLPaiDe7kGS7LQyd4zLuOy35qYsoG8BV5BD++a4i LdSgeT0H7lBEzdZz2Rg8LKlkOy9+Sm6pzeXZrIlyL/qc+gmtoGhCyHvvK ASvvYd7Mo+WGlzAeuTrg6T66Bt8OYctox6YlBoFi5nv7MHcDvvY1j2Ho7 4Vc0diykrrwsTtiV1AQaYEe/Q+LQB3A9frrkb4elEUCWks53i7zDNDS8O imEUA8DWlaB2q59dt2lixAhLBSsz28XjMmhnW43R3C+dj+78eepYqgC60 Q==; X-CSE-ConnectionGUID: UIyApgjqQhegiqlT8p27mg== X-CSE-MsgGUID: rO8e3gziQWek7TmzjTiCGA== X-IronPort-AV: E=McAfee;i="6800,10657,11461"; a="69453464" X-IronPort-AV: E=Sophos;i="6.16,230,1744095600"; d="scan'208";a="69453464" X-CSE-ConnectionGUID: echUqQE1SK64eZ7/AldPRw== X-CSE-MsgGUID: fz3XyqcsTkGUeb1OAGLQ2g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,230,1744095600"; d="scan'208";a="152442064" From: Chenyi Qiang To: David Hildenbrand , Alexey Kardashevskiy , Peter Xu , Gupta Pankaj , Paolo Bonzini , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Michael Roth Cc: Chenyi Qiang , qemu-devel@nongnu.org, kvm@vger.kernel.org, Williams Dan J , Zhao Liu , Baolu Lu , Gao Chao , Xu Yilun , Li Xiaoyao , =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= , Alex Williamson Subject: [PATCH v7 5/5] physmem: Support coordinated discarding of RAM with guest_memfd Date: Thu, 12 Jun 2025 16:27:46 +0800 Message-ID: <20250612082747.51539-6-chenyi.qiang@intel.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250612082747.51539-1-chenyi.qiang@intel.com> References: <20250612082747.51539-1-chenyi.qiang@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted 
A new field, attributes, was introduced in RAMBlock to link to a
RamBlockAttributes object, which centralizes all guest_memfd related
information (such as the fd and the status bitmap) within a RAMBlock.

Create and initialize the RamBlockAttributes object upon ram_block_add().
Meanwhile, register the object in the target RAMBlock's MemoryRegion.
After that, a guest_memfd-backed RAMBlock is associated with the
RamDiscardManager interface, and users can perform RamDiscardManager
specific handling. For example, VFIO can register a RamDiscardListener
and receive notifications when the state_change() helper is invoked.

As coordinated discard of RAM with guest_memfd is now supported, block
only uncoordinated discard.

Tested-by: Alexey Kardashevskiy
Reviewed-by: Alexey Kardashevskiy
Acked-by: David Hildenbrand
Signed-off-by: Chenyi Qiang
---
Changes in v7:
- Add some documentation about the attribute and status consistency (David).
- Add Reviewed-by and Acked-by from Alexey and David.

Changes in v6:
- Squash the unblocking of coordinated discard into this commit.
- Remove the checks in the migration path.

Changes in v5:
- Revert to using the RamDiscardManager interface.
- Move the object_new() into the ram_block_attribute_create() helper.
- Add some checks in the migration path.

Changes in v4:
- Remove the replay operations for attribute changes, which will be
  handled in a listener in following patches.
- Add a comment in the error path of realize() to remind future
  development of the unified error path.

Changes in v3:
- Use ram_discard_manager_reply_populated/discarded() to set the memory
  attribute and add undo support if state_change() failed.
- Didn't add Reviewed-by from Alexey due to the new changes in this
  commit.
---
 accel/kvm/kvm-all.c       |  9 +++++++++
 include/system/ramblock.h |  1 +
 system/physmem.c          | 23 +++++++++++++++++++++--
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 51526d301b..3b390bbb09 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3089,6 +3089,15 @@ int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
     addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
     rb = qemu_ram_block_from_host(addr, false, &offset);

+    ret = ram_block_attributes_state_change(RAM_BLOCK_ATTRIBUTES(mr->rdm),
+                                            offset, size, to_private);
+    if (ret) {
+        error_report("Failed to notify the listener the state change of "
+                     "(0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s",
+                     start, size, to_private ?
"private" : "shared"); + goto out_unref; + } + if (to_private) { if (rb->page_size !=3D qemu_real_host_page_size()) { /* diff --git a/include/system/ramblock.h b/include/system/ramblock.h index 1bab9e2dac..87e847e184 100644 --- a/include/system/ramblock.h +++ b/include/system/ramblock.h @@ -46,6 +46,7 @@ struct RAMBlock { int fd; uint64_t fd_offset; int guest_memfd; + RamBlockAttributes *attributes; size_t page_size; /* dirty bitmap used during migration */ unsigned long *bmap; diff --git a/system/physmem.c b/system/physmem.c index a8a9ca309e..ff0ca40222 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -1916,7 +1916,7 @@ static void ram_block_add(RAMBlock *new_block, Error = **errp) } assert(new_block->guest_memfd < 0); =20 - ret =3D ram_block_discard_require(true); + ret =3D ram_block_coordinated_discard_require(true); if (ret < 0) { error_setg_errno(errp, -ret, "cannot set up private guest memory: discard = currently blocked"); @@ -1931,6 +1931,24 @@ static void ram_block_add(RAMBlock *new_block, Error= **errp) goto out_free; } =20 + /* + * The attribute bitmap of the RamBlockAttributes is default to + * discarded, which mimics the behavior of kvm_set_phys_mem() when= it + * calls kvm_set_memory_attributes_private(). This leads to a brief + * period of inconsistency between the creation of the RAMBlock an= d its + * mapping into the physical address space. However, this is not + * problematic, as no users rely on the attribute status to perform + * any actions during this interval. + */ + new_block->attributes =3D ram_block_attributes_create(new_block); + if (!new_block->attributes) { + error_setg(errp, "Failed to create ram block attribute"); + close(new_block->guest_memfd); + ram_block_coordinated_discard_require(false); + qemu_mutex_unlock_ramlist(); + goto out_free; + } + /* * Add a specific guest_memfd blocker if a generic one would not be * added by ram_block_add_cpr_blocker. 
@@ -2287,8 +2305,9 @@ static void reclaim_ramblock(RAMBlock *block)
     }

     if (block->guest_memfd >= 0) {
+        ram_block_attributes_destroy(block->attributes);
         close(block->guest_memfd);
-        ram_block_discard_require(false);
+        ram_block_coordinated_discard_require(false);
     }

     g_free(block);
--
2.43.5