From: Tamas K Lengyel
To: xen-devel@lists.xenproject.org
Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall, Tamas K Lengyel, Jan Beulich, Roger Pau Monné
Date: Wed, 25 Sep 2019 08:48:55 -0700
Subject: [Xen-devel] [RFC PATCH for-next 17/18] xen/mem_sharing: VM forking

VM forking is the process of creating a domain with an empty memory space and a parent domain specified from which to populate the memory when necessary. For the new domain to be functional the VM state is copied over as part of the fork operation (HVM params, hap allocation, etc).
Signed-off-by: Tamas K Lengyel
---
 xen/arch/x86/hvm/hvm.c            |   2 +-
 xen/arch/x86/mm/mem_sharing.c     | 235 ++++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c             |  11 +-
 xen/include/asm-x86/mem_sharing.h |  20 ++-
 xen/include/public/memory.h       |   5 +
 xen/include/xen/sched.h           |   1 +
 6 files changed, 270 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 2af2f936a5..872bd112ba 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1890,7 +1890,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     }
 #endif
 
-    /* Spurious fault? PoD and log-dirty also take this path. */
+    /* Spurious fault? PoD, log-dirty and VM forking also take this path. */
     if ( p2m_is_ram(p2mt) )
     {
         rc = 1;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index f54969bcad..64b9723f8c 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -22,11 +22,13 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -36,6 +38,9 @@
 #include
 #include
 #include
+#include
+#include
+#include
 #include
 
 #include "mm-locks.h"
@@ -1423,6 +1428,207 @@ static inline int mem_sharing_control(struct domain *d, bool enable)
     return 0;
 }
 
+/*
+ * Forking a page only gets called when the VM faults due to no entry being
+ * in the EPT for the access. Depending on the type of access we either
+ * populate the physmap with a shared entry for read-only access or
+ * fork the page if it's a write access.
+ *
+ * The client p2m is already locked so we only need to lock
+ * the parent's here.
+ */
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
+{
+    int rc = -ENOENT;
+    shr_handle_t handle;
+    struct domain *parent;
+    struct p2m_domain *p2m;
+    unsigned long gfn_l = gfn_x(gfn);
+    mfn_t mfn, new_mfn;
+    p2m_type_t p2mt;
+    struct page_info *page;
+
+    if ( !mem_sharing_is_fork(d) )
+        return -ENOENT;
+
+    parent = d->parent;
+
+    if ( !unsharing )
+    {
+        /* For read-only accesses we just add a shared entry to the physmap */
+        while ( parent )
+        {
+            if ( !(rc = nominate_page(parent, gfn, 0, &handle)) )
+                break;
+
+            parent = parent->parent;
+        }
+
+        if ( !rc )
+        {
+            /* The client's p2m is already locked */
+            struct p2m_domain *pp2m = p2m_get_hostp2m(parent);
+
+            p2m_lock(pp2m);
+            rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false);
+            p2m_unlock(pp2m);
+
+            if ( !rc )
+                return 0;
+        }
+    }
+
+    /*
+     * If it's a write access (ie. unsharing) or if adding a shared entry to
+     * the physmap failed we'll fork the page directly.
+     */
+    p2m = p2m_get_hostp2m(d);
+    parent = d->parent;
+
+    while ( parent )
+    {
+        mfn = get_gfn_query(parent, gfn_l, &p2mt);
+
+        if ( mfn_valid(mfn) && p2m_is_any_ram(p2mt) )
+            break;
+
+        put_gfn(parent, gfn_l);
+        parent = parent->parent;
+    }
+
+    if ( !parent )
+        return -ENOENT;
+
+    if ( !(page = alloc_domheap_page(d, 0)) )
+    {
+        put_gfn(parent, gfn_l);
+        return -ENOMEM;
+    }
+
+    new_mfn = page_to_mfn(page);
+    copy_domain_page(new_mfn, mfn);
+    set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l);
+
+    put_gfn(parent, gfn_l);
+
+    return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw,
+                          p2m->default_access, -1);
+}
+
+static int bring_up_vcpus(struct domain *cd, struct cpupool *cpupool)
+{
+    int ret;
+    unsigned int i, cpu;
+    cpumask_t *online;
+
+    if ( (ret = cpupool_move_domain(cd, cpupool)) )
+        return ret;
+
+    for ( i = 0; i < cd->max_vcpus; i++ )
+    {
+        if ( cd->vcpu[i] )
+            continue;
+
+        online = cpupool_domain_cpumask(cd);
+
+        cpu = (i == 0) ?
+            cpumask_any(online) :
+            cpumask_cycle(cd->vcpu[i-1]->processor, online);
+
+        if ( !vcpu_create(cd, i, cpu) )
+            return -EINVAL;
+    }
+
+    domain_update_node_affinity(cd);
+    return 0;
+}
+
+static int fork_hap_allocation(struct domain *d, struct domain *cd)
+{
+    int rc;
+    bool preempted;
+    unsigned long mb = hap_get_allocation(d);
+
+    if ( mb == hap_get_allocation(cd) )
+        return 0;
+
+    paging_lock(cd);
+    rc = hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), &preempted);
+    paging_unlock(cd);
+
+    if ( rc )
+        return rc;
+
+    if ( preempted )
+        return -ERESTART;
+
+    return 0;
+}
+
+static int fork_hvm(struct domain *d, struct domain *cd)
+{
+    int rc, i;
+    struct hvm_domain_context c = { 0 };
+    uint32_t tsc_mode;
+    uint32_t gtsc_khz;
+    uint32_t incarnation;
+    uint64_t elapsed_nsec;
+
+    c.size = hvm_save_size(d);
+    if ( (c.data = xmalloc_bytes(c.size)) == NULL )
+        return -ENOMEM;
+
+    for ( i = 0; i < HVM_NR_PARAMS; i++ )
+    {
+        uint64_t value = 0;
+
+        if ( hvm_get_param(d, i, &value) || !value )
+            continue;
+
+        if ( (rc = hvm_set_param(cd, i, value)) )
+            goto out;
+    }
+
+    tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
+    tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation);
+
+    if ( (rc = hvm_save(d, &c)) )
+        goto out;
+
+    c.cur = 0;
+    rc = hvm_load(cd, &c);
+
+out:
+    xfree(c.data);
+    return rc;
+}
+
+static int mem_sharing_fork(struct domain *d, struct domain *cd)
+{
+    int rc;
+
+    if ( !d->controller_pause_count &&
+         (rc = domain_pause_by_systemcontroller(d)) )
+        return rc;
+
+    cd->max_pages = d->max_pages;
+    cd->max_vcpus = d->max_vcpus;
+
+    /* this is preemptible so it's the first to get done */
+    if ( (rc = fork_hap_allocation(d, cd)) )
+        return rc;
+
+    if ( (rc = bring_up_vcpus(cd, d->cpupool)) )
+        return rc;
+
+    if ( (rc = fork_hvm(d, cd)) )
+        return rc;
+
+    cd->parent = d;
+
+    return 0;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1677,6 +1883,35 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         rc = debug_gref(d, mso.u.debug.u.gref);
         break;
 
+    case XENMEM_sharing_op_fork:
+    {
+        struct domain *pd;
+
+        rc = -EINVAL;
+        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
+             mso.u.fork._pad[2] )
+            goto out;
+
+        rc = rcu_lock_live_remote_domain_by_id(mso.u.fork.parent_domain,
+                                               &pd);
+        if ( rc )
+            break;
+
+        if ( !mem_sharing_enabled(pd) )
+        {
+            if ( (rc = mem_sharing_control(pd, true)) )
+                return rc;
+        }
+
+        rc = mem_sharing_fork(pd, d);
+
+        if ( rc == -ERESTART )
+            rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
+                                               "lh", XENMEM_sharing_op,
+                                               arg);
+        rcu_unlock_domain(pd);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index aee0347785..97872a7cc4 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -503,6 +503,14 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
 
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 
+    /* Check if we need to fork the page */
+    if ( (q & P2M_ALLOC) && p2m_is_hole(*t) &&
+         !mem_sharing_fork_page(p2m->domain, gfn, !!(q & P2M_UNSHARE)) )
+    {
+        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
+    }
+
+    /* Check if we need to unshare the page */
     if ( (q & P2M_UNSHARE) && p2m_is_shared(*t) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
@@ -581,7 +589,8 @@ struct page_info *p2m_get_page_from_gfn(
             return page;
 
         /* Error path: not a suitable GFN at all */
-        if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) )
+        if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) &&
+             !mem_sharing_is_fork(p2m->domain) )
             return NULL;
     }
 
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index 18302adbfa..a5617c87dd 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -26,8 +26,7 @@
 
 #ifdef CONFIG_MEM_SHARING
 
-struct mem_sharing_domain
-{
+struct mem_sharing_domain {
     bool enabled;
 
     /*
@@ -40,6 +39,9 @@ struct mem_sharing_domain
 #define mem_sharing_enabled(d) \
     (hap_enabled(d) && (d)->arch.hvm.mem_sharing.enabled)
 
+#define mem_sharing_is_fork(d) \
+    (mem_sharing_enabled(d) && !!((d)->parent))
+
 /* Auditing of memory sharing code? */
 #ifndef NDEBUG
 #define MEM_SHARING_AUDIT 1
@@ -90,6 +92,9 @@ int mem_sharing_unshare_page(struct domain *d,
     return rc;
 }
 
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn,
+                          bool unsharing);
+
 /*
  * If called by a foreign domain, possible errors are
  *   -EBUSY -> ring full
@@ -119,6 +124,7 @@ int relinquish_shared_pages(struct domain *d);
 #else
 
 #define mem_sharing_enabled(d) false
+#define mem_sharing_is_fork(p2m) false
 
 static inline unsigned int mem_sharing_get_nr_saved_mfns(void)
 {
@@ -145,6 +151,16 @@ int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
     return -EOPNOTSUPP;
 }
 
+static inline int mem_sharing_fork(struct domain *d, struct domain *cd, bool vcpu)
+{
+    return -EOPNOTSUPP;
+}
+
+static inline int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool lock)
+{
+    return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* __MEM_SHARING_H__ */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index cfdda6e2a8..90a3f4498e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -482,6 +482,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_add_physmap       6
 #define XENMEM_sharing_op_audit             7
 #define XENMEM_sharing_op_range_share       8
+#define XENMEM_sharing_op_fork              9
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
@@ -532,6 +533,10 @@ struct xen_mem_sharing_op {
                 uint32_t gref;     /* IN: gref to debug         */
             } u;
         } debug;
+        struct mem_sharing_op_fork {
+            domid_t parent_domain;
+            uint16_t _pad[3];      /* Must be set to 0 */
+        } fork;
     } u;
 };
 typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 2d17c84915..dad6715d14 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -455,6 +455,7 @@ struct domain
     /* Memory sharing support */
 #ifdef CONFIG_MEM_SHARING
     struct vm_event_domain *vm_event_share;
+    struct domain *parent; /* VM fork parent */
 #endif
     /* Memory paging support */
 #ifdef CONFIG_HAS_MEM_PAGING
-- 
2.20.1