From nobody Mon Feb  9 14:08:32 2026
From: Tamas K Lengyel
To: xen-devel@lists.xenproject.org
Date: Tue, 25 Feb 2020 11:17:55 -0800
Message-Id: <8df741964b56c10ed912f9187dcb31aae7251085.1582658216.git.tamas.lengyel@intel.com>
X-Mailer: git-send-email 2.20.1
Subject: [Xen-devel] [PATCH v10 1/3] xen/mem_sharing: VM forking
Cc: Stefano Stabellini, Tamas K Lengyel, Wei Liu, Konrad Rzeszutek Wilk,
 Andrew Cooper, Ian Jackson, George Dunlap, Tamas K Lengyel, Jan Beulich,
 Julien Grall, Roger Pau Monné

VM forking is the process of creating a domain with an empty memory space and
a parent domain specified from which to populate the memory when necessary.
For the new domain to be functional the VM state is copied over as part of the
fork operation (HVM params, hap allocation, etc).

Signed-off-by: Tamas K Lengyel
---
v10: setup vcpu_info pages for vCPUs in the fork if the parent has them
     setup pages for special HVM PFNs if the parent has them
     minor adjustments based on Roger's comments
---
 xen/arch/x86/domain.c             |  11 ++
 xen/arch/x86/hvm/hvm.c            |   4 +-
 xen/arch/x86/mm/hap/hap.c         |   3 +-
 xen/arch/x86/mm/mem_sharing.c     | 287 ++++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c             |   9 +-
 xen/common/domain.c               |   3 +
 xen/include/asm-x86/hap.h         |   1 +
 xen/include/asm-x86/hvm/hvm.h     |   2 +
 xen/include/asm-x86/mem_sharing.h |  17 ++
 xen/include/public/memory.h       |   5 +
 xen/include/xen/sched.h           |   5 +
 11 files changed, 342 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fe63c23676..1ab0ca0942 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2203,6 +2203,17 @@ int domain_relinquish_resources(struct domain *d)
         ret = relinquish_shared_pages(d);
         if ( ret )
             return ret;
+
+        /*
+         * If the domain is forked, decrement the parent's pause count
+         * and release the domain.
+         */
+        if ( d->parent )
+        {
+            domain_unpause(d->parent);
+            put_domain(d->parent);
+            d->parent = NULL;
+        }
     }
 #endif
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index a339b36a0d..c284f3cf5f 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1915,7 +1915,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     }
 #endif
 
-    /* Spurious fault? PoD and log-dirty also take this path. */
+    /* Spurious fault? PoD, log-dirty and VM forking also take this path. */
     if ( p2m_is_ram(p2mt) )
     {
         rc = 1;
@@ -4429,7 +4429,7 @@ static int hvm_allow_get_param(struct domain *d,
     return rc;
 }
 
-static int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value)
+int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value)
 {
     int rc;
 
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 3d93f3451c..c7c7ff6e99 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -321,8 +321,7 @@ static void hap_free_p2m_page(struct domain *d, struct page_info *pg)
 }
 
 /* Return the size of the pool, rounded up to the nearest MB */
-static unsigned int
-hap_get_allocation(struct domain *d)
+unsigned int hap_get_allocation(struct domain *d)
 {
     unsigned int pg = d->arch.paging.hap.total_pages
         + d->arch.paging.hap.p2m_pages;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 3835bc928f..8ee37e6943 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -22,6 +22,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -36,6 +37,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 
 #include "mm-locks.h"
@@ -1444,6 +1447,263 @@ static inline int mem_sharing_control(struct domain *d, bool enable)
     return 0;
 }
 
+/*
+ * Forking a page only gets called when the VM faults due to no entry being
+ * in the EPT for the access. Depending on the type of access we either
+ * populate the physmap with a shared entry for read-only access or
+ * fork the page if it's a write access.
+ *
+ * The client p2m is already locked so we only need to lock
+ * the parent's here.
+ */
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
+{
+    int rc = -ENOENT;
+    shr_handle_t handle;
+    struct domain *parent = d->parent;
+    struct p2m_domain *p2m;
+    unsigned long gfn_l = gfn_x(gfn);
+    mfn_t mfn, new_mfn;
+    p2m_type_t p2mt;
+    struct page_info *page;
+
+    if ( !mem_sharing_is_fork(d) )
+        return -ENOENT;
+
+    if ( !unsharing )
+    {
+        /* For read-only accesses we just add a shared entry to the physmap */
+        while ( parent )
+        {
+            if ( !(rc = nominate_page(parent, gfn, 0, &handle)) )
+                break;
+
+            parent = parent->parent;
+        }
+
+        if ( !rc )
+        {
+            /* The client's p2m is already locked */
+            struct p2m_domain *pp2m = p2m_get_hostp2m(parent);
+
+            p2m_lock(pp2m);
+            rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false);
+            p2m_unlock(pp2m);
+
+            if ( !rc )
+                return 0;
+        }
+    }
+
+    /*
+     * If it's a write access (i.e. unsharing) or if adding a shared entry to
+     * the physmap failed we'll fork the page directly.
+     */
+    p2m = p2m_get_hostp2m(d);
+    parent = d->parent;
+
+    while ( parent )
+    {
+        mfn = get_gfn_query(parent, gfn_l, &p2mt);
+
+        /*
+         * We can't fork grant memory from the parent, only regular ram.
+         */
+        if ( mfn_valid(mfn) && p2m_is_ram(p2mt) )
+            break;
+
+        put_gfn(parent, gfn_l);
+        parent = parent->parent;
+    }
+
+    if ( !parent )
+        return -ENOENT;
+
+    if ( !(page = alloc_domheap_page(d, 0)) )
+    {
+        put_gfn(parent, gfn_l);
+        return -ENOMEM;
+    }
+
+    new_mfn = page_to_mfn(page);
+    copy_domain_page(new_mfn, mfn);
+    set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l);
+
+    put_gfn(parent, gfn_l);
+
+    return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw,
+                          p2m->default_access, -1);
+}
+
+static int bring_up_vcpus(struct domain *cd, struct domain *d)
+{
+    unsigned int i;
+    struct p2m_domain *p2m = p2m_get_hostp2m(cd);
+    int ret = -EINVAL;
+
+    if ( d->max_vcpus != cd->max_vcpus )
+        return ret;
+
+    if ( (ret = cpupool_move_domain(cd, d->cpupool)) )
+        return ret;
+
+    for ( i = 0; i < cd->max_vcpus; i++ )
+    {
+        mfn_t vcpu_info_mfn;
+
+        if ( !d->vcpu[i] || cd->vcpu[i] )
+            continue;
+
+        if ( !vcpu_create(cd, i) )
+            return -EINVAL;
+
+        /*
+         * Map in a page for the vcpu_info if the guest uses one to the exact
+         * same spot.
+         */
+        vcpu_info_mfn = d->vcpu[i]->vcpu_info_mfn;
+        if ( !mfn_eq(vcpu_info_mfn, INVALID_MFN) )
+        {
+            struct page_info *page;
+            mfn_t new_mfn;
+            gfn_t gfn = mfn_to_gfn(d, vcpu_info_mfn);
+            unsigned long gfn_l = gfn_x(gfn);
+
+            if ( !(page = alloc_domheap_page(cd, 0)) )
+                return -ENOMEM;
+
+            new_mfn = page_to_mfn(page);
+            set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l);
+
+            if ( (ret = p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K,
+                                       p2m_ram_rw, p2m->default_access, -1)) )
+                return ret;
+
+            if ( (ret = map_vcpu_info(cd->vcpu[i], gfn_l,
+                                      d->vcpu[i]->vcpu_info_offset)) )
+                return ret;
+        }
+    }
+
+    domain_update_node_affinity(cd);
+    return 0;
+}
+
+static int fork_hap_allocation(struct domain *cd, struct domain *d)
+{
+    int rc;
+    bool preempted;
+    unsigned long mb = hap_get_allocation(d);
+
+    if ( mb == hap_get_allocation(cd) )
+        return 0;
+
+    paging_lock(cd);
+    rc = hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), &preempted);
+    paging_unlock(cd);
+
+    return preempted ? -ERESTART : rc;
+}
+
+static void fork_tsc(struct domain *cd, struct domain *d)
+{
+    uint32_t tsc_mode;
+    uint32_t gtsc_khz;
+    uint32_t incarnation;
+    uint64_t elapsed_nsec;
+
+    tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
+    /* Don't bump incarnation on set */
+    tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation - 1);
+}
+
+static int populate_special_pages(struct domain *cd)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(cd);
+    static const unsigned int params[] =
+    {
+        HVM_PARAM_STORE_PFN,
+        HVM_PARAM_IOREQ_PFN,
+        HVM_PARAM_BUFIOREQ_PFN,
+        HVM_PARAM_CONSOLE_PFN
+    };
+    unsigned int i;
+    int rc;
+
+    for ( i = 0; i < 4; i++ )
+    {
+        uint64_t value = 0;
+        mfn_t new_mfn;
+        struct page_info *page;
+
+        if ( hvm_get_param(cd, params[i], &value) || !value )
+            continue;
+
+        if ( !(page = alloc_domheap_page(cd, 0)) )
+            return -ENOMEM;
+
+        new_mfn = page_to_mfn(page);
+        set_gpfn_from_mfn(mfn_x(new_mfn), value);
+
+        rc = p2m->set_entry(p2m, _gfn(value), new_mfn, PAGE_ORDER_4K,
+                            p2m_ram_rw, p2m->default_access, -1);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
+static int fork(struct domain *d, struct domain *cd)
+{
+    int rc = -EBUSY;
+
+    if ( !cd->controller_pause_count )
+        return rc;
+
+    /*
+     * We only want to get and pause the parent once, not each time this
+     * operation is restarted due to preemption.
+     */
+    if ( !cd->parent_paused )
+    {
+        if ( !get_domain(d) )
+        {
+            ASSERT_UNREACHABLE();
+            return -EBUSY;
+        }
+
+        domain_pause(d);
+        cd->parent_paused = true;
+        cd->max_pages = d->max_pages;
+        cd->max_vcpus = d->max_vcpus;
+    }
+
+    /* This is preemptible so it's the first to get done */
+    if ( (rc = fork_hap_allocation(cd, d)) )
+        goto done;
+
+    if ( (rc = bring_up_vcpus(cd, d)) )
+        goto done;
+
+    if ( (rc = hvm_copy_context_and_params(cd, d)) )
+        goto done;
+
+    if ( (rc = populate_special_pages(cd)) )
+        goto done;
+
+    fork_tsc(cd, d);
+
+    cd->parent = d;
+
+ done:
+    if ( rc && rc != -ERESTART )
+    {
+        domain_unpause(d);
+        put_domain(d);
+        cd->parent_paused = false;
+    }
+
+    return rc;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1698,6 +1958,33 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         rc = debug_gref(d, mso.u.debug.u.gref);
         break;
 
+    case XENMEM_sharing_op_fork:
+    {
+        struct domain *pd;
+
+        rc = -EINVAL;
+        if ( mso.u.fork._pad[0] || mso.u.fork._pad[1] ||
+             mso.u.fork._pad[2] )
+            goto out;
+
+        rc = rcu_lock_live_remote_domain_by_id(mso.u.fork.parent_domain,
+                                               &pd);
+        if ( rc )
+            goto out;
+
+        if ( !mem_sharing_enabled(pd) && (rc = mem_sharing_control(pd, true)) )
+            goto out;
+
+        rc = fork(pd, d);
+
+        if ( rc == -ERESTART )
+            rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
+                                               "lh", XENMEM_sharing_op,
+                                               arg);
+        rcu_unlock_domain(pd);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c5f428d67c..2358808227 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -509,6 +509,12 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
 
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 
+    /* Check if we need to fork the page */
+    if ( (q & P2M_ALLOC) && p2m_is_hole(*t) &&
+         !mem_sharing_fork_page(p2m->domain, gfn, !!(q & P2M_UNSHARE)) )
+        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
+
+    /* Check if we need to unshare the page */
     if ( (q & P2M_UNSHARE) && p2m_is_shared(*t) )
     {
         ASSERT(p2m_is_hostp2m(p2m));
@@ -588,7 +594,8 @@ struct page_info *p2m_get_page_from_gfn(
             return page;
 
         /* Error path: not a suitable GFN at all */
-        if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) )
+        if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) &&
+             !mem_sharing_is_fork(p2m->domain) )
             return NULL;
     }
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 6ad458fa6b..02998235dd 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1269,6 +1269,9 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset)
 
     v->vcpu_info = new_info;
     v->vcpu_info_mfn = page_to_mfn(page);
+#ifdef CONFIG_MEM_SHARING
+    v->vcpu_info_offset = offset;
+#endif
 
     /* Set new vcpu_info pointer /before/ setting pending flags. */
     smp_wmb();
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index b94bfb4ed0..1bf07e49fe 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -45,6 +45,7 @@ int hap_track_dirty_vram(struct domain *d,
 
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
 int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempted);
+unsigned int hap_get_allocation(struct domain *d);
 
 #endif /* XEN_HAP_H */
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 24da824cbf..35e970b030 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -339,6 +339,8 @@ bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
 
 int hvm_copy_context_and_params(struct domain *src, struct domain *dst);
 
+int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value);
+
 #ifdef CONFIG_HVM
 
 #define hvm_get_guest_tsc(v) hvm_get_guest_tsc_fixed(v, 0)
diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sharing.h
index 53760a2896..ac968fae3f 100644
--- a/xen/include/asm-x86/mem_sharing.h
+++ b/xen/include/asm-x86/mem_sharing.h
@@ -39,6 +39,9 @@ struct mem_sharing_domain
 
 #define mem_sharing_enabled(d) ((d)->arch.hvm.mem_sharing.enabled)
 
+#define mem_sharing_is_fork(d) \
+    (mem_sharing_enabled(d) && !!((d)->parent))
+
 /* Auditing of memory sharing code? */
 #ifndef NDEBUG
 #define MEM_SHARING_AUDIT 1
@@ -88,6 +91,9 @@ static inline int mem_sharing_unshare_page(struct domain *d,
     return rc;
 }
 
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn,
+                          bool unsharing);
+
 /*
  * If called by a foreign domain, possible errors are
  *   -EBUSY -> ring full
@@ -117,6 +123,7 @@ int relinquish_shared_pages(struct domain *d);
 #else
 
 #define mem_sharing_enabled(d) false
+#define mem_sharing_is_fork(p2m) false
 
 static inline unsigned int mem_sharing_get_nr_saved_mfns(void)
 {
@@ -141,6 +148,16 @@ static inline int mem_sharing_notify_enomem(struct domain *d, unsigned long gfn,
     return -EOPNOTSUPP;
 }
 
+static inline int mem_sharing_fork(struct domain *d, struct domain *cd, bool vcpu)
+{
+    return -EOPNOTSUPP;
+}
+
+static inline int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool lock)
+{
+    return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* __MEM_SHARING_H__ */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 126d0ff06e..c1dbad060e 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -482,6 +482,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_add_physmap       6
 #define XENMEM_sharing_op_audit             7
 #define XENMEM_sharing_op_range_share       8
+#define XENMEM_sharing_op_fork              9
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
@@ -532,6 +533,10 @@ struct xen_mem_sharing_op {
                 uint32_t gref;     /* IN: gref to debug         */
             } u;
         } debug;
+        struct mem_sharing_op_fork {   /* OP_FORK */
+            domid_t parent_domain;     /* IN: parent's domain id */
+            uint16_t _pad[3];          /* Must be set to 0 */
+        } fork;
     } u;
 };
 typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 3a4f43098c..c6ba5a52a4 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -248,6 +248,9 @@ struct vcpu
 
     /* Guest-specified relocation of vcpu_info. */
     mfn_t vcpu_info_mfn;
+#ifdef CONFIG_MEM_SHARING
+    uint32_t vcpu_info_offset;
+#endif
 
     struct evtchn_fifo_vcpu *evtchn_fifo;
 
@@ -503,6 +506,8 @@ struct domain
     /* Memory sharing support */
 #ifdef CONFIG_MEM_SHARING
     struct vm_event_domain *vm_event_share;
+    struct domain *parent; /* VM fork parent */
+    bool parent_paused;
 #endif
     /* Memory paging support */
 #ifdef CONFIG_HAS_MEM_PAGING
-- 
2.20.1

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel