From: Tamas K Lengyel
To: xen-devel@lists.xenproject.org
Date: Mon, 23 Mar 2020 10:04:35 -0700
Subject: [Xen-devel] [PATCH v12 1/3] xen/mem_sharing: VM forking
Cc: Stefano Stabellini,
 Tamas K Lengyel, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
 Tamas K Lengyel, Jan Beulich, Julien Grall, Roger Pau Monné

VM forking is the process of creating a domain with an empty memory space and
a parent domain specified from which to populate the memory when necessary.
For the new domain to be functional the VM state is copied over as part of the
fork operation (HVM params, hap allocation, etc.).

Signed-off-by: Tamas K Lengyel
Acked-by: Jan Beulich
---
v12: Minor style adjustments Jan pointed out
     Convert mem_sharing_is_fork to inline function
v11: Fully copy vcpu_info pages
     Setup vcpu_runstate for forks
     Added TODO note for PV timers
     Copy shared_info page
     Add copy_settings function, to be shared with fork_reset in the next patch
---
 xen/arch/x86/domain.c             |  11 +
 xen/arch/x86/hvm/hvm.c            |   4 +-
 xen/arch/x86/mm/hap/hap.c         |   3 +-
 xen/arch/x86/mm/mem_sharing.c     | 368 ++++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c             |   9 +-
 xen/common/domain.c               |   3 +
 xen/include/asm-x86/hap.h         |   1 +
 xen/include/asm-x86/hvm/hvm.h     |   2 +
 xen/include/asm-x86/mem_sharing.h |  18 ++
 xen/include/public/memory.h       |   5 +
 xen/include/xen/sched.h           |   5 +
 11 files changed, 424 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index caf2ecad7e..11d3c2216e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2202,6 +2202,17 @@ int domain_relinquish_resources(struct domain *d)
             ret = relinquish_shared_pages(d);
             if ( ret )
                 return ret;
+
+            /*
+             * If the domain is forked, decrement the parent's pause count
+             * and release the domain.
+             */
+            if ( mem_sharing_is_fork(d) )
+            {
+                domain_unpause(d->parent);
+                put_domain(d->parent);
+                d->parent = NULL;
+            }
         }
 #endif
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index a3d115b650..304b3d1562 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1917,7 +1917,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     }
 #endif
 
-    /* Spurious fault? PoD and log-dirty also take this path. */
+    /* Spurious fault? PoD, log-dirty and VM forking also take this path.
= */ if ( p2m_is_ram(p2mt) ) { rc =3D 1; @@ -4377,7 +4377,7 @@ static int hvm_allow_get_param(struct domain *d, return rc; } =20 -static int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value) +int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value) { int rc; =20 diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c index a6d5e39b02..814d0c3253 100644 --- a/xen/arch/x86/mm/hap/hap.c +++ b/xen/arch/x86/mm/hap/hap.c @@ -321,8 +321,7 @@ static void hap_free_p2m_page(struct domain *d, struct = page_info *pg) } =20 /* Return the size of the pool, rounded up to the nearest MB */ -static unsigned int -hap_get_allocation(struct domain *d) +unsigned int hap_get_allocation(struct domain *d) { unsigned int pg =3D d->arch.paging.hap.total_pages + d->arch.paging.hap.p2m_pages; diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c index 3835bc928f..23deeddff2 100644 --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -22,6 +22,7 @@ =20 #include #include +#include #include #include #include @@ -36,6 +37,8 @@ #include #include #include +#include +#include #include =20 #include "mm-locks.h" @@ -1444,6 +1447,334 @@ static inline int mem_sharing_control(struct domain= *d, bool enable) return 0; } =20 +/* + * Forking a page only gets called when the VM faults due to no entry being + * in the EPT for the access. Depending on the type of access we either + * populate the physmap with a shared entry for read-only access or + * fork the page if its a write access. + * + * The client p2m is already locked so we only need to lock + * the parent's here. + */ +int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing) +{ + int rc =3D -ENOENT; + shr_handle_t handle; + struct domain *parent =3D d->parent; + struct p2m_domain *p2m; + unsigned long gfn_l =3D gfn_x(gfn); + mfn_t mfn, new_mfn; + p2m_type_t p2mt; + struct page_info *page; + + if ( !mem_sharing_is_fork(d) ) + return -ENOENT; + + if ( !unsharing ) + { + /* For read-only accesses we just add a shared entry to the physma= p */ + while ( parent ) + { + if ( !(rc =3D nominate_page(parent, gfn, 0, &handle)) ) + break; + + parent =3D parent->parent; + } + + if ( !rc ) + { + /* The client's p2m is already locked */ + struct p2m_domain *pp2m =3D p2m_get_hostp2m(parent); + + p2m_lock(pp2m); + rc =3D add_to_physmap(parent, gfn_l, handle, d, gfn_l, false); + p2m_unlock(pp2m); + + if ( !rc ) + return 0; + } + } + + /* + * If it's a write access (ie. unsharing) or if adding a shared entry = to + * the physmap failed we'll fork the page directly. + */ + p2m =3D p2m_get_hostp2m(d); + parent =3D d->parent; + + while ( parent ) + { + mfn =3D get_gfn_query(parent, gfn_l, &p2mt); + + /* + * We can't fork grant memory from the parent, only regular ram. 
+ */ + if ( mfn_valid(mfn) && p2m_is_ram(p2mt) ) + break; + + put_gfn(parent, gfn_l); + parent =3D parent->parent; + } + + if ( !parent ) + return -ENOENT; + + if ( !(page =3D alloc_domheap_page(d, 0)) ) + { + put_gfn(parent, gfn_l); + return -ENOMEM; + } + + new_mfn =3D page_to_mfn(page); + copy_domain_page(new_mfn, mfn); + set_gpfn_from_mfn(mfn_x(new_mfn), gfn_l); + + put_gfn(parent, gfn_l); + + return p2m->set_entry(p2m, gfn, new_mfn, PAGE_ORDER_4K, p2m_ram_rw, + p2m->default_access, -1); +} + +static int bring_up_vcpus(struct domain *cd, struct domain *d) +{ + unsigned int i; + int ret =3D -EINVAL; + + if ( d->max_vcpus !=3D cd->max_vcpus || + (ret =3D cpupool_move_domain(cd, d->cpupool)) ) + return ret; + + for ( i =3D 0; i < cd->max_vcpus; i++ ) + { + if ( !d->vcpu[i] || cd->vcpu[i] ) + continue; + + if ( !vcpu_create(cd, i) ) + return -EINVAL; + } + + domain_update_node_affinity(cd); + return 0; +} + +static int copy_vcpu_settings(struct domain *cd, struct domain *d) +{ + unsigned int i; + struct p2m_domain *p2m =3D p2m_get_hostp2m(cd); + int ret =3D -EINVAL; + + for ( i =3D 0; i < cd->max_vcpus; i++ ) + { + const struct vcpu *d_vcpu =3D d->vcpu[i]; + struct vcpu *cd_vcpu =3D cd->vcpu[i]; + struct vcpu_runstate_info runstate; + mfn_t vcpu_info_mfn; + + if ( !d_vcpu || !cd_vcpu ) + continue; + + /* + * Copy & map in the vcpu_info page if the guest uses one + */ + vcpu_info_mfn =3D d_vcpu->vcpu_info_mfn; + if ( !mfn_eq(vcpu_info_mfn, INVALID_MFN) ) + { + mfn_t new_vcpu_info_mfn =3D cd_vcpu->vcpu_info_mfn; + + /* + * Allocate & map the page for it if it hasn't been already + */ + if ( mfn_eq(new_vcpu_info_mfn, INVALID_MFN) ) + { + gfn_t gfn =3D mfn_to_gfn(d, vcpu_info_mfn); + unsigned long gfn_l =3D gfn_x(gfn); + struct page_info *page; + + if ( !(page =3D alloc_domheap_page(cd, 0)) ) + return -ENOMEM; + + new_vcpu_info_mfn =3D page_to_mfn(page); + set_gpfn_from_mfn(mfn_x(new_vcpu_info_mfn), gfn_l); + + ret =3D p2m->set_entry(p2m, gfn, new_vcpu_info_mfn, PAGE_O= RDER_4K, + p2m_ram_rw, p2m->default_access, -1); + if ( ret ) + return ret; + + ret =3D map_vcpu_info(cd_vcpu, gfn_l, + d_vcpu->vcpu_info_offset); + if ( ret ) + return ret; + } + + copy_domain_page(new_vcpu_info_mfn, vcpu_info_mfn); + } + + /* + * Setup the vCPU runstate area + */ + if ( guest_handle_is_null(runstate_guest(cd_vcpu)) ) + { + runstate_guest(cd_vcpu) =3D runstate_guest(d_vcpu); + vcpu_runstate_get(cd_vcpu, &runstate); + __copy_to_guest(runstate_guest(cd_vcpu), &runstate, 1); + } + + /* + * TODO: to support VMs with PV interfaces copy additional + * settings here, such as PV timers. + */ + } + + return 0; +} + +static int fork_hap_allocation(struct domain *cd, struct domain *d) +{ + int rc; + bool preempted; + unsigned long mb =3D hap_get_allocation(d); + + if ( mb =3D=3D hap_get_allocation(cd) ) + return 0; + + paging_lock(cd); + rc =3D hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), &preempted); + paging_unlock(cd); + + return preempted ? 
-ERESTART : rc; +} + +static void copy_tsc(struct domain *cd, struct domain *d) +{ + uint32_t tsc_mode; + uint32_t gtsc_khz; + uint32_t incarnation; + uint64_t elapsed_nsec; + + tsc_get_info(d, &tsc_mode, &elapsed_nsec, >sc_khz, &incarnation); + /* Don't bump incarnation on set */ + tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation - 1); +} + +static int copy_special_pages(struct domain *cd, struct domain *d) +{ + mfn_t new_mfn, old_mfn; + struct p2m_domain *p2m =3D p2m_get_hostp2m(cd); + static const unsigned int params[] =3D + { + HVM_PARAM_STORE_PFN, + HVM_PARAM_IOREQ_PFN, + HVM_PARAM_BUFIOREQ_PFN, + HVM_PARAM_CONSOLE_PFN + }; + unsigned int i; + int rc; + + for ( i =3D 0; i < 4; i++ ) + { + p2m_type_t t; + uint64_t value =3D 0; + struct page_info *page; + + if ( hvm_get_param(cd, params[i], &value) || !value ) + continue; + + old_mfn =3D get_gfn_query_unlocked(d, value, &t); + new_mfn =3D get_gfn_query_unlocked(cd, value, &t); + + /* + * Allocate the page and map it in if it's not present + */ + if ( mfn_eq(new_mfn, INVALID_MFN) ) + { + if ( !(page =3D alloc_domheap_page(cd, 0)) ) + return -ENOMEM; + + new_mfn =3D page_to_mfn(page); + set_gpfn_from_mfn(mfn_x(new_mfn), value); + + rc =3D p2m->set_entry(p2m, _gfn(value), new_mfn, PAGE_ORDER_4K, + p2m_ram_rw, p2m->default_access, -1); + if ( rc ) + return rc; + } + + copy_domain_page(new_mfn, old_mfn); + } + + old_mfn =3D _mfn(virt_to_mfn(d->shared_info)); + new_mfn =3D _mfn(virt_to_mfn(cd->shared_info)); + copy_domain_page(new_mfn, old_mfn); + + return 0; +} + +static int copy_settings(struct domain *cd, struct domain *d) +{ + int rc; + + if ( (rc =3D copy_vcpu_settings(cd, d)) ) + return rc; + + if ( (rc =3D hvm_copy_context_and_params(cd, d)) ) + return rc; + + if ( (rc =3D copy_special_pages(cd, d)) ) + return rc; + + copy_tsc(cd, d); + + return rc; +} + +static int fork(struct domain *cd, struct domain *d) +{ + int rc =3D -EBUSY; + + if ( !cd->controller_pause_count ) + return rc; + + /* + * We only want to get and pause the parent once, not each time this + * operation is restarted due to preemption. 
+ */ + if ( !cd->parent_paused ) + { + if ( !get_domain(d) ) + { + ASSERT_UNREACHABLE(); + return -EBUSY; + } + + domain_pause(d); + cd->parent_paused =3D true; + cd->max_pages =3D d->max_pages; + } + + /* this is preemptible so it's the first to get done */ + if ( (rc =3D fork_hap_allocation(cd, d)) ) + goto done; + + if ( (rc =3D bring_up_vcpus(cd, d)) ) + goto done; + + if ( (rc =3D copy_settings(cd, d)) ) + goto done; + + cd->parent =3D d; + + done: + if ( rc && rc !=3D -ERESTART ) + { + domain_unpause(d); + put_domain(d); + cd->parent_paused =3D false; + } + + return rc; +} + int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg) { int rc; @@ -1698,6 +2029,43 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem= _sharing_op_t) arg) rc =3D debug_gref(d, mso.u.debug.u.gref); break; =20 + case XENMEM_sharing_op_fork: + { + struct domain *pd; + + rc =3D -EINVAL; + if ( mso.u.fork.pad[0] || mso.u.fork.pad[1] || + mso.u.fork.pad[2] ) + goto out; + + rc =3D rcu_lock_live_remote_domain_by_id(mso.u.fork.parent_domain, + &pd); + if ( rc ) + goto out; + + rc =3D -EINVAL; + if ( pd->max_vcpus !=3D d->max_vcpus ) + { + rcu_unlock_domain(pd); + goto out; + } + + if ( !mem_sharing_enabled(pd) && (rc =3D mem_sharing_control(pd, t= rue)) ) + { + rcu_unlock_domain(pd); + goto out; + } + + rc =3D fork(d, pd); + + if ( rc =3D=3D -ERESTART ) + rc =3D hypercall_create_continuation(__HYPERVISOR_memory_op, + "lh", XENMEM_sharing_op, + arg); + rcu_unlock_domain(pd); + break; + } + default: rc =3D -ENOSYS; break; diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 9f51370327..1ed7d13084 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -509,6 +509,12 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, un= signed long gfn_l, =20 mfn =3D p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL); =20 + /* Check if we need to fork the page */ + if ( (q & P2M_ALLOC) && p2m_is_hole(*t) && + !mem_sharing_fork_page(p2m->domain, gfn, q & P2M_UNSHARE) ) + mfn =3D p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL); + + /* Check if we need to unshare the page */ if ( (q & P2M_UNSHARE) && p2m_is_shared(*t) ) { ASSERT(p2m_is_hostp2m(p2m)); @@ -588,7 +594,8 @@ struct page_info *p2m_get_page_from_gfn( return page; =20 /* Error path: not a suitable GFN at all */ - if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) ) + if ( !p2m_is_ram(*t) && !p2m_is_paging(*t) && !p2m_is_pod(*t) && + !mem_sharing_is_fork(p2m->domain) ) return NULL; } =20 diff --git a/xen/common/domain.c b/xen/common/domain.c index b4eb476a9c..62aed53a16 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -1270,6 +1270,9 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, = unsigned offset) =20 v->vcpu_info =3D new_info; v->vcpu_info_mfn =3D page_to_mfn(page); +#ifdef CONFIG_MEM_SHARING + v->vcpu_info_offset =3D offset; +#endif =20 /* Set new vcpu_info pointer /before/ setting pending flags. 
*/ smp_wmb(); diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h index b94bfb4ed0..1bf07e49fe 100644 --- a/xen/include/asm-x86/hap.h +++ b/xen/include/asm-x86/hap.h @@ -45,6 +45,7 @@ int hap_track_dirty_vram(struct domain *d, =20 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *); int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempt= ed); +unsigned int hap_get_allocation(struct domain *d); =20 #endif /* XEN_HAP_H */ =20 diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h index b007b2e343..f283c7d187 100644 --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -336,6 +336,8 @@ unsigned long hvm_cr4_guest_valid_bits(const struct dom= ain *d, bool restore); =20 int hvm_copy_context_and_params(struct domain *src, struct domain *dst); =20 +int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value); + #ifdef CONFIG_HVM =20 #define hvm_get_guest_tsc(v) hvm_get_guest_tsc_fixed(v, 0) diff --git a/xen/include/asm-x86/mem_sharing.h b/xen/include/asm-x86/mem_sh= aring.h index 53b7929d0e..78c3a2c343 100644 --- a/xen/include/asm-x86/mem_sharing.h +++ b/xen/include/asm-x86/mem_sharing.h @@ -77,6 +77,14 @@ static inline int mem_sharing_unshare_page(struct domain= *d, return rc; } =20 +static inline bool mem_sharing_is_fork(struct domain *d) +{ + return d->parent; +} + +int mem_sharing_fork_page(struct domain *d, gfn_t gfn, + bool unsharing); + /* * If called by a foreign domain, possible errors are * -EBUSY -> ring full @@ -130,6 +138,16 @@ static inline int mem_sharing_notify_enomem(struct dom= ain *d, unsigned long gfn, return -EOPNOTSUPP; } =20 +static inline bool mem_sharing_is_fork(struct domain *d) +{ + return false; +} + +static inline int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool = lock) +{ + return -EOPNOTSUPP; +} + #endif =20 #endif /* __MEM_SHARING_H__ */ diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h index 126d0ff06e..5ee4e0da12 100644 --- a/xen/include/public/memory.h +++ b/xen/include/public/memory.h @@ -482,6 +482,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t); #define XENMEM_sharing_op_add_physmap 6 #define XENMEM_sharing_op_audit 7 #define XENMEM_sharing_op_range_share 8 +#define XENMEM_sharing_op_fork 9 =20 #define XENMEM_SHARING_OP_S_HANDLE_INVALID (-10) #define XENMEM_SHARING_OP_C_HANDLE_INVALID (-9) @@ -532,6 +533,10 @@ struct xen_mem_sharing_op { uint32_t gref; /* IN: gref to debug */ } u; } debug; + struct mem_sharing_op_fork { /* OP_FORK */ + domid_t parent_domain; /* IN: parent's domain id */ + uint16_t pad[3]; /* Must be set to 0 */ + } fork; } u; }; typedef struct xen_mem_sharing_op xen_mem_sharing_op_t; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index e6813288ab..881f2bb0c2 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -247,6 +247,9 @@ struct vcpu =20 /* Guest-specified relocation of vcpu_info. 
 */
     mfn_t vcpu_info_mfn;
+#ifdef CONFIG_MEM_SHARING
+    unsigned short vcpu_info_offset;
+#endif
 
     struct evtchn_fifo_vcpu *evtchn_fifo;
 
@@ -480,6 +483,8 @@ struct domain
     /* Memory sharing support */
 #ifdef CONFIG_MEM_SHARING
     struct vm_event_domain *vm_event_share;
+    struct domain *parent; /* VM fork parent */
+    bool parent_paused;
 #endif
     /* Memory paging support */
 #ifdef CONFIG_HAS_MEM_PAGING
--
2.20.1
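
As a usage illustration of the interface added above: a fork is created by
issuing XENMEM_sharing_op_fork against an already created (empty) client
domain, naming the running parent. The minimal sketch below assumes the
xc_memshr_fork() libxc wrapper introduced in patch 3 of this series; domain
IDs are placeholders, and the client domain must have been created with the
same max_vcpus as the parent.

#include <stdio.h>
#include <xenctrl.h>

/* Sketch only: fork 'child_domid' from the running 'parent_domid'. The
 * wrapper fills xen_mem_sharing_op with op = XENMEM_sharing_op_fork,
 * u.fork.parent_domain = parent_domid and the pad fields zeroed. */
static int do_fork(uint32_t parent_domid, uint32_t child_domid)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    int rc;

    if ( !xch )
        return -1;

    rc = xc_memshr_fork(xch, parent_domid, child_domid);
    if ( rc )
        fprintf(stderr, "VM fork failed: %d\n", rc);

    xc_interface_close(xch);
    return rc;
}

The hypervisor side enables mem_sharing on the parent automatically and keeps
the parent paused for as long as forks of it exist.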

From: Tamas K Lengyel
To: xen-devel@lists.xenproject.org
Date: Mon, 23 Mar 2020 10:04:36 -0700
Message-Id: <46457bd6e877abe12a8c005c23f0f1aab13afd24.1584981438.git.tamas.lengyel@intel.com>
Subject: [Xen-devel] [PATCH v12 2/3] x86/mem_sharing: reset a fork
Cc: Tamas K Lengyel, Tamas K Lengyel, Wei Liu, Andrew Cooper, Ian Jackson,
 George Dunlap, Stefano Stabellini, Jan Beulich, Julien Grall,
 Roger Pau Monné

Implement a hypercall that allows a fork to shed all memory that got allocated
for it during its execution and reload its vCPU context from the parent VM.
This allows the forked VM to reset into the same state the parent VM is in,
faster than creating a new fork would be. Measurements show about a 2x speedup
during normal fuzzing operations. Performance may vary depending on how much
memory got allocated for the forked VM. If it has been completely deduplicated
from the parent VM then creating a new fork would likely be more performant.

Signed-off-by: Tamas K Lengyel
Reviewed-by: Roger Pau Monné
---
v12: remove continuation & add comment back
     address style issues pointed out by Jan
---
 xen/arch/x86/mm/mem_sharing.c | 77 +++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h   |  1 +
 2 files changed, 78 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 23deeddff2..930a5f58ef 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1775,6 +1775,60 @@ static int fork(struct domain *cd, struct domain *d)
     return rc;
 }
 
+/*
+ * The fork reset operation is intended to be used on short-lived forks only.
+ * There is no hypercall continuation operation implemented for this reason.
+ * For forks that obtain a larger memory footprint it is likely going to be
+ * more performant to create a new fork instead of resetting an existing one.
+ *
+ * TODO: In case this hypercall would become useful on forks with larger memory
+ * footprints the hypercall continuation should be implemented (or if this
+ * feature needs to become "stable").
+ */ +static int mem_sharing_fork_reset(struct domain *d, struct domain *pd) +{ + int rc; + struct p2m_domain *p2m =3D p2m_get_hostp2m(d); + struct page_info *page, *tmp; + + spin_lock(&d->page_alloc_lock); + domain_pause(d); + + page_list_for_each_safe(page, tmp, &d->page_list) + { + p2m_type_t p2mt; + p2m_access_t p2ma; + mfn_t mfn =3D page_to_mfn(page); + gfn_t gfn =3D mfn_to_gfn(d, mfn); + + mfn =3D __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma, + 0, NULL, false); + + /* only reset pages that are sharable */ + if ( !p2m_is_sharable(p2mt) ) + continue; + + /* take an extra reference or just skip if can't for whatever reas= on */ + if ( !get_page(page, d) ) + continue; + + /* forked memory is 4k, not splitting large pages so this must wor= k */ + rc =3D p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K, + p2m_invalid, p2m_access_rwx, -1); + ASSERT(!rc); + + put_page_alloc_ref(page); + put_page(page); + } + + rc =3D copy_settings(d, pd); + + domain_unpause(d); + spin_unlock(&d->page_alloc_lock); + + return rc; +} + int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg) { int rc; @@ -2066,6 +2120,29 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem= _sharing_op_t) arg) break; } =20 + case XENMEM_sharing_op_fork_reset: + { + struct domain *pd; + + rc =3D -EINVAL; + if ( mso.u.fork.pad[0] || mso.u.fork.pad[1] || + mso.u.fork.pad[2] ) + goto out; + + rc =3D -ENOSYS; + if ( !d->parent ) + goto out; + + rc =3D rcu_lock_live_remote_domain_by_id(d->parent->domain_id, &pd= ); + if ( rc ) + goto out; + + rc =3D mem_sharing_fork_reset(d, pd); + + rcu_unlock_domain(pd); + break; + } + default: rc =3D -ENOSYS; break; diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h index 5ee4e0da12..d36d64b8dc 100644 --- a/xen/include/public/memory.h +++ b/xen/include/public/memory.h @@ -483,6 +483,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t); #define XENMEM_sharing_op_audit 7 #define XENMEM_sharing_op_range_share 8 #define XENMEM_sharing_op_fork 9 +#define XENMEM_sharing_op_fork_reset 10 =20 #define XENMEM_SHARING_OP_S_HANDLE_INVALID (-10) #define XENMEM_SHARING_OP_C_HANDLE_INVALID (-9) --=20 2.20.1 From nobody Fri Apr 26 13:40:48 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=fail(p=none dis=none) header.from=intel.com ARC-Seal: i=1; a=rsa-sha256; t=1584983149; cv=none; d=zohomail.com; s=zohoarc; b=iwxJUFNkbBAKV/uRvlGNNOPQ37wQWKuc8EA/b9r5REzq4N6dE2PfOI1W2mkqnCWzEHAemcIY/09yF3Xi6l4p1mdTkXo2Q8an48IIEkPtzgYxqb0jA91lwO0aPbHHOxU+gZ+uxSQyQ57rxfHnHSyN06XrFfZi96qJrX1QR2uZmYY= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1584983149; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=2dGQi6OrlIT6lAlFQf3f84FOl0fylrANpjJGNhmXyOc=; b=aP2cJbwxLwz4Ca79aQffldGLPffvaP52Z05ktnQ5jh48CoaQC8gHIVr+NNlry48OyZjBW1ONjYEtAkEvcjclr9RndO6C5vXK0XdIfyqFrO47RMIdjMUAWz5RqhIQIfK0mUzZLtRJ3buYT30RVMUsgVcIsxXUG2RzS99uzo/D5aw= ARC-Authentication-Results: i=1; mx.zohomail.com; spf=pass (zohomail.com: domain of 
lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=fail header.from= (p=none dis=none) header.from= Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1584983149722153.8097749574589; Mon, 23 Mar 2020 10:05:49 -0700 (PDT) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1jGQVf-0003KU-Qk; Mon, 23 Mar 2020 17:05:07 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1jGQVe-0003KG-JT for xen-devel@lists.xenproject.org; Mon, 23 Mar 2020 17:05:06 +0000 Received: from mga03.intel.com (unknown [134.134.136.65]) by us1-rack-iad1.inumbo.com (Halon) with ESMTPS id 69533ff0-6d28-11ea-bec1-bc764e2007e4; Mon, 23 Mar 2020 17:04:52 +0000 (UTC) Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 10:04:51 -0700 Received: from chengwei-mobl2.amr.corp.intel.com (HELO localhost.localdomain) ([10.251.233.37]) by orsmga004.jf.intel.com with ESMTP; 23 Mar 2020 10:04:50 -0700 X-Inumbo-ID: 69533ff0-6d28-11ea-bec1-bc764e2007e4 IronPort-SDR: /BuS0QpbaLtr0WUW63bAMtosZUOd5hZqIWPhCtnez9WurDoDW4aT4EJGcCYcNBp7c3UurPBfuW RYH4RlDI33wA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False IronPort-SDR: wl+4/9tEKCKjPQlHi5d86EdiZ+O4W1eCPTGVOgA7jUDjpBlpq2IB+K2jWw7MOIlLPhfWxDk60D KHL10TWeHJzA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,297,1580803200"; d="scan'208";a="392975023" From: Tamas K Lengyel To: xen-devel@lists.xenproject.org Date: Mon, 23 Mar 2020 10:04:37 -0700 Message-Id: <65b4006fab035a89d7731fa16bae642e4c19e8ad.1584981438.git.tamas.lengyel@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Subject: [Xen-devel] [PATCH v12 3/3] xen/tools: VM forking toolstack side X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Anthony PERARD , Ian Jackson , Tamas K Lengyel , Wei Liu Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" Content-Type: text/plain; charset="utf-8" Add necessary bits to implement "xl fork-vm" commands. The command allows t= he user to specify how to launch the device model allowing for a late-launch m= odel in which the user can execute the fork without the device model and decide = to only later launch it. Signed-off-by: Tamas K Lengyel --- docs/man/xl.1.pod.in | 44 +++++ tools/libxc/include/xenctrl.h | 13 ++ tools/libxc/xc_memshr.c | 22 +++ tools/libxl/libxl.h | 11 ++ tools/libxl/libxl_create.c | 361 +++++++++++++++++++--------------- tools/libxl/libxl_dm.c | 2 +- tools/libxl/libxl_dom.c | 43 +++- tools/libxl/libxl_internal.h | 7 + tools/libxl/libxl_types.idl | 1 + tools/libxl/libxl_x86.c | 41 ++++ tools/xl/Makefile | 2 +- tools/xl/xl.h | 5 + tools/xl/xl_cmdtable.c | 15 ++ tools/xl/xl_forkvm.c | 147 ++++++++++++++ tools/xl/xl_vmcontrol.c | 14 ++ 15 files changed, 562 insertions(+), 166 deletions(-) create mode 100644 tools/xl/xl_forkvm.c diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in index 09339282e6..59c03c6427 100644 --- a/docs/man/xl.1.pod.in +++ b/docs/man/xl.1.pod.in @@ -708,6 +708,50 @@ above). 
=20 =3Dback =20 +=3Ditem B [I] I + +Create a fork of a running VM. The domain will be paused after the operat= ion +and remains paused while forks of it exist. Experimental and x86 only. +Forks can only be made of domains with HAP enabled and on Intel hardware. = The +parent domain must be created with the xl toolstack and its configuration = must +not manually define max_grant_frames, max_maptrack_frames or max_event_cha= nnels. + +B + +=3Dover 4 + +=3Ditem B<-p> + +Leave the fork paused after creating it. + +=3Ditem B<--launch-dm> + +Specify whether the device model (QEMU) should be launched for the fork. L= ate +launch allows to start the device model for an already running fork. + +=3Ditem B<-C> + +The config file to use when launching the device model. Currently require= d when +launching the device model. Most config settings MUST match the parent do= main +exactly, only change VM name, disk path and network configurations. + +=3Ditem B<-Q> + +The path to the qemu save file to use when launching the device model. Cu= rrently +required when launching the device model. + +=3Ditem B<--fork-reset> + +Perform a reset operation of an already running fork. Note that resetting= may +be less performant then creating a new fork depending on how much memory t= he +fork has deduplicated during its runtime. + +=3Ditem B<--max-vcpus> + +Specify the max-vcpus matching the parent domain when not launching the dm. + +=3Dback + =3Ditem B [I] =20 Display the number of shared pages for a specified domain. If no domain is diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index fc6e57a1a0..00cb4cf1f7 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2225,6 +2225,19 @@ int xc_memshr_range_share(xc_interface *xch, uint64_t first_gfn, uint64_t last_gfn); =20 +int xc_memshr_fork(xc_interface *xch, + uint32_t source_domain, + uint32_t client_domain); + +/* + * Note: this function is only intended to be used on short-lived forks th= at + * haven't yet aquired a lot of memory. In case the fork has a lot of memo= ry + * it is likely more performant to create a new fork with xc_memshr_fork. + * + * With VMs that have a lot of memory this call may block for a long time. + */ +int xc_memshr_fork_reset(xc_interface *xch, uint32_t forked_domain); + /* Debug calls: return the number of pages referencing the shared frame ba= cking * the input argument. Should be one or greater. 
* diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c index 97e2e6a8d9..d0e4ee225b 100644 --- a/tools/libxc/xc_memshr.c +++ b/tools/libxc/xc_memshr.c @@ -239,6 +239,28 @@ int xc_memshr_debug_gref(xc_interface *xch, return xc_memshr_memop(xch, domid, &mso); } =20 +int xc_memshr_fork(xc_interface *xch, uint32_t pdomid, uint32_t domid) +{ + xen_mem_sharing_op_t mso; + + memset(&mso, 0, sizeof(mso)); + + mso.op =3D XENMEM_sharing_op_fork; + mso.u.fork.parent_domain =3D pdomid; + + return xc_memshr_memop(xch, domid, &mso); +} + +int xc_memshr_fork_reset(xc_interface *xch, uint32_t domid) +{ + xen_mem_sharing_op_t mso; + + memset(&mso, 0, sizeof(mso)); + mso.op =3D XENMEM_sharing_op_fork_reset; + + return xc_memshr_memop(xch, domid, &mso); +} + int xc_memshr_audit(xc_interface *xch) { xen_mem_sharing_op_t mso; diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 71709dc585..088e81c78b 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -2666,6 +2666,17 @@ int libxl_psr_get_hw_info(libxl_ctx *ctx, libxl_psr_= feat_type type, unsigned int lvl, unsigned int *nr, libxl_psr_hw_info **info); void libxl_psr_hw_info_list_free(libxl_psr_hw_info *list, unsigned int nr); + +int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t max_vcp= us, uint32_t *domid) + LIBXL_EXTERNAL_CALLERS_ONLY; + +int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config *d_con= fig, + uint32_t domid, + const libxl_asyncprogress_how *aop_console= _how) + LIBXL_EXTERNAL_CALLERS_ONLY; + +int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid) + LIBXL_EXTERNAL_CALLERS_ONLY; #endif =20 /* misc */ diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index e7cb2dbc2b..5705b6e3a5 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -538,12 +538,12 @@ out: return ret; } =20 -int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config, - libxl__domain_build_state *state, - uint32_t *domid, bool soft_reset) +static int libxl__domain_make_xs_entries(libxl__gc *gc, libxl_domain_confi= g *d_config, + libxl__domain_build_state *state, + uint32_t domid) { libxl_ctx *ctx =3D libxl__gc_owner(gc); - int ret, rc, nb_vm; + int rc, nb_vm; const char *dom_type; char *uuid_string; char *dom_path, *vm_path, *libxl_path; @@ -555,9 +555,6 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_conf= ig *d_config, =20 /* convenience aliases */ libxl_domain_create_info *info =3D &d_config->c_info; - libxl_domain_build_info *b_info =3D &d_config->b_info; - - assert(soft_reset || *domid =3D=3D INVALID_DOMID); =20 uuid_string =3D libxl__uuid2string(gc, info->uuid); if (!uuid_string) { @@ -565,137 +562,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_co= nfig *d_config, goto out; } =20 - if (!soft_reset) { - struct xen_domctl_createdomain create =3D { - .ssidref =3D info->ssidref, - .max_vcpus =3D b_info->max_vcpus, - .max_evtchn_port =3D b_info->event_channels, - .max_grant_frames =3D b_info->max_grant_frames, - .max_maptrack_frames =3D b_info->max_maptrack_frames, - }; - - if (info->type !=3D LIBXL_DOMAIN_TYPE_PV) { - create.flags |=3D XEN_DOMCTL_CDF_hvm; - create.flags |=3D - libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0; - create.flags |=3D - libxl_defbool_val(info->oos) ? 
0 : XEN_DOMCTL_CDF_oos_off; - } - - assert(info->passthrough !=3D LIBXL_PASSTHROUGH_DEFAULT); - LOG(DETAIL, "passthrough: %s", - libxl_passthrough_to_string(info->passthrough)); - - if (info->passthrough !=3D LIBXL_PASSTHROUGH_DISABLED) - create.flags |=3D XEN_DOMCTL_CDF_iommu; - - if (info->passthrough =3D=3D LIBXL_PASSTHROUGH_SYNC_PT) - create.iommu_opts |=3D XEN_DOMCTL_IOMMU_no_sharept; - - /* Ultimately, handle is an array of 16 uint8_t, same as uuid */ - libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid); - - ret =3D libxl__arch_domain_prepare_config(gc, d_config, &create); - if (ret < 0) { - LOGED(ERROR, *domid, "fail to get domain config"); - rc =3D ERROR_FAIL; - goto out; - } - - for (;;) { - uint32_t local_domid; - bool recent; - - if (info->domid =3D=3D RANDOM_DOMID) { - uint16_t v; - - ret =3D libxl__random_bytes(gc, (void *)&v, sizeof(v)); - if (ret < 0) - break; - - v &=3D DOMID_MASK; - if (!libxl_domid_valid_guest(v)) - continue; - - local_domid =3D v; - } else { - local_domid =3D info->domid; /* May not be valid */ - } - - ret =3D xc_domain_create(ctx->xch, &local_domid, &create); - if (ret < 0) { - /* - * If we generated a random domid and creation failed - * because that domid already exists then simply try - * again. - */ - if (errno =3D=3D EEXIST && info->domid =3D=3D RANDOM_DOMID) - continue; - - LOGED(ERROR, local_domid, "domain creation fail"); - rc =3D ERROR_FAIL; - goto out; - } - - /* A new domain now exists */ - *domid =3D local_domid; - - rc =3D libxl__is_domid_recent(gc, local_domid, &recent); - if (rc) - goto out; - - /* The domid is not recent, so we're done */ - if (!recent) - break; - - /* - * If the domid was specified then there's no point in - * trying again. - */ - if (libxl_domid_valid_guest(info->domid)) { - LOGED(ERROR, local_domid, "domain id recently used"); - rc =3D ERROR_FAIL; - goto out; - } - - /* - * The domain is recent and so cannot be used. Clear domid - * here since, if xc_domain_destroy() fails below there is - * little point calling it again in the error path. - */ - *domid =3D INVALID_DOMID; - - ret =3D xc_domain_destroy(ctx->xch, local_domid); - if (ret < 0) { - LOGED(ERROR, local_domid, "domain destroy fail"); - rc =3D ERROR_FAIL; - goto out; - } - - /* The domain was successfully destroyed, so we can try again = */ - } - - rc =3D libxl__arch_domain_save_config(gc, d_config, state, &create= ); - if (rc < 0) - goto out; - } - - /* - * If soft_reset is set the the domid will have been valid on entry. - * If it was not set then xc_domain_create() should have assigned a - * valid value. Either way, if we reach this point, domid should be - * valid. 
- */ - assert(libxl_domid_valid_guest(*domid)); - - ret =3D xc_cpupool_movedomain(ctx->xch, info->poolid, *domid); - if (ret < 0) { - LOGED(ERROR, *domid, "domain move fail"); - rc =3D ERROR_FAIL; - goto out; - } - - dom_path =3D libxl__xs_get_dompath(gc, *domid); + dom_path =3D libxl__xs_get_dompath(gc, domid); if (!dom_path) { rc =3D ERROR_FAIL; goto out; @@ -703,12 +570,12 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_co= nfig *d_config, =20 vm_path =3D GCSPRINTF("/vm/%s", uuid_string); if (!vm_path) { - LOGD(ERROR, *domid, "cannot allocate create paths"); + LOGD(ERROR, domid, "cannot allocate create paths"); rc =3D ERROR_FAIL; goto out; } =20 - libxl_path =3D libxl__xs_libxl_path(gc, *domid); + libxl_path =3D libxl__xs_libxl_path(gc, domid); if (!libxl_path) { rc =3D ERROR_FAIL; goto out; @@ -719,10 +586,10 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_co= nfig *d_config, =20 roperm[0].id =3D 0; roperm[0].perms =3D XS_PERM_NONE; - roperm[1].id =3D *domid; + roperm[1].id =3D domid; roperm[1].perms =3D XS_PERM_READ; =20 - rwperm[0].id =3D *domid; + rwperm[0].id =3D domid; rwperm[0].perms =3D XS_PERM_NONE; =20 retry_transaction: @@ -740,7 +607,7 @@ retry_transaction: noperm, ARRAY_SIZE(noperm)); =20 xs_write(ctx->xsh, t, GCSPRINTF("%s/vm", dom_path), vm_path, strlen(vm= _path)); - rc =3D libxl__domain_rename(gc, *domid, 0, info->name, t); + rc =3D libxl__domain_rename(gc, domid, 0, info->name, t); if (rc) goto out; =20 @@ -830,7 +697,7 @@ retry_transaction: =20 vm_list =3D libxl_list_vm(ctx, &nb_vm); if (!vm_list) { - LOGD(ERROR, *domid, "cannot get number of running guests"); + LOGD(ERROR, domid, "cannot get number of running guests"); rc =3D ERROR_FAIL; goto out; } @@ -854,7 +721,7 @@ retry_transaction: t =3D 0; goto retry_transaction; } - LOGED(ERROR, *domid, "domain creation ""xenstore transaction commi= t failed"); + LOGED(ERROR, domid, "domain creation ""xenstore transaction commit= failed"); rc =3D ERROR_FAIL; goto out; } @@ -866,6 +733,155 @@ retry_transaction: return rc; } =20 +int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config, + libxl__domain_build_state *state, + uint32_t *domid, bool soft_reset) +{ + libxl_ctx *ctx =3D libxl__gc_owner(gc); + int ret, rc; + + /* convenience aliases */ + libxl_domain_create_info *info =3D &d_config->c_info; + libxl_domain_build_info *b_info =3D &d_config->b_info; + + assert(soft_reset || *domid =3D=3D INVALID_DOMID); + + if (!soft_reset) { + struct xen_domctl_createdomain create =3D { + .ssidref =3D info->ssidref, + .max_vcpus =3D b_info->max_vcpus, + .max_evtchn_port =3D b_info->event_channels, + .max_grant_frames =3D b_info->max_grant_frames, + .max_maptrack_frames =3D b_info->max_maptrack_frames, + }; + + if (info->type !=3D LIBXL_DOMAIN_TYPE_PV) { + create.flags |=3D XEN_DOMCTL_CDF_hvm; + create.flags |=3D + libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0; + create.flags |=3D + libxl_defbool_val(info->oos) ? 
0 : XEN_DOMCTL_CDF_oos_off; + } + + assert(info->passthrough !=3D LIBXL_PASSTHROUGH_DEFAULT); + LOG(DETAIL, "passthrough: %s", + libxl_passthrough_to_string(info->passthrough)); + + if (info->passthrough !=3D LIBXL_PASSTHROUGH_DISABLED) + create.flags |=3D XEN_DOMCTL_CDF_iommu; + + if (info->passthrough =3D=3D LIBXL_PASSTHROUGH_SYNC_PT) + create.iommu_opts |=3D XEN_DOMCTL_IOMMU_no_sharept; + + /* Ultimately, handle is an array of 16 uint8_t, same as uuid */ + libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid); + + ret =3D libxl__arch_domain_prepare_config(gc, d_config, &create); + if (ret < 0) { + LOGED(ERROR, *domid, "fail to get domain config"); + rc =3D ERROR_FAIL; + goto out; + } + + for (;;) { + uint32_t local_domid; + bool recent; + + if (info->domid =3D=3D RANDOM_DOMID) { + uint16_t v; + + ret =3D libxl__random_bytes(gc, (void *)&v, sizeof(v)); + if (ret < 0) + break; + + v &=3D DOMID_MASK; + if (!libxl_domid_valid_guest(v)) + continue; + + local_domid =3D v; + } else { + local_domid =3D info->domid; /* May not be valid */ + } + + ret =3D xc_domain_create(ctx->xch, &local_domid, &create); + if (ret < 0) { + /* + * If we generated a random domid and creation failed + * because that domid already exists then simply try + * again. + */ + if (errno =3D=3D EEXIST && info->domid =3D=3D RANDOM_DOMID) + continue; + + LOGED(ERROR, local_domid, "domain creation fail"); + rc =3D ERROR_FAIL; + goto out; + } + + /* A new domain now exists */ + *domid =3D local_domid; + + rc =3D libxl__is_domid_recent(gc, local_domid, &recent); + if (rc) + goto out; + + /* The domid is not recent, so we're done */ + if (!recent) + break; + + /* + * If the domid was specified then there's no point in + * trying again. + */ + if (libxl_domid_valid_guest(info->domid)) { + LOGED(ERROR, local_domid, "domain id recently used"); + rc =3D ERROR_FAIL; + goto out; + } + + /* + * The domain is recent and so cannot be used. Clear domid + * here since, if xc_domain_destroy() fails below there is + * little point calling it again in the error path. + */ + *domid =3D INVALID_DOMID; + + ret =3D xc_domain_destroy(ctx->xch, local_domid); + if (ret < 0) { + LOGED(ERROR, local_domid, "domain destroy fail"); + rc =3D ERROR_FAIL; + goto out; + } + + /* The domain was successfully destroyed, so we can try again = */ + } + + rc =3D libxl__arch_domain_save_config(gc, d_config, state, &create= ); + if (rc < 0) + goto out; + } + + /* + * If soft_reset is set the the domid will have been valid on entry. + * If it was not set then xc_domain_create() should have assigned a + * valid value. Either way, if we reach this point, domid should be + * valid. 
+ */ + assert(libxl_domid_valid_guest(*domid)); + + ret =3D xc_cpupool_movedomain(ctx->xch, info->poolid, *domid); + if (ret < 0) { + LOGED(ERROR, *domid, "domain move fail"); + rc =3D ERROR_FAIL; + goto out; + } + + rc =3D libxl__domain_make_xs_entries(gc, d_config, state, *domid); + +out: + return rc; +} + static int store_libxl_entry(libxl__gc *gc, uint32_t domid, libxl_domain_build_info *b_info) { @@ -1191,16 +1207,32 @@ static void initiate_domain_create(libxl__egc *egc, ret =3D libxl__domain_config_setdefault(gc,d_config,domid); if (ret) goto error_out; =20 - ret =3D libxl__domain_make(gc, d_config, &dcs->build_state, &domid, - dcs->soft_reset); - if (ret) { - LOGD(ERROR, domid, "cannot make domain: %d", ret); + if ( !d_config->dm_restore_file ) + { + ret =3D libxl__domain_make(gc, d_config, &dcs->build_state, &domid, + dcs->soft_reset); dcs->guest_domid =3D domid; + + if (ret) { + LOGD(ERROR, domid, "cannot make domain: %d", ret); + ret =3D ERROR_FAIL; + goto error_out; + } + } else if ( dcs->guest_domid !=3D INVALID_DOMID ) { + domid =3D dcs->guest_domid; + + ret =3D libxl__domain_make_xs_entries(gc, d_config, &dcs->build_st= ate, domid); + if (ret) { + LOGD(ERROR, domid, "cannot make domain: %d", ret); + ret =3D ERROR_FAIL; + goto error_out; + } + } else { + LOGD(ERROR, domid, "cannot make domain"); ret =3D ERROR_FAIL; goto error_out; } =20 - dcs->guest_domid =3D domid; dcs->sdss.dm.guest_domid =3D 0; /* means we haven't spawned */ =20 /* post-4.13 todo: move these next bits of defaulting to @@ -1236,7 +1268,7 @@ static void initiate_domain_create(libxl__egc *egc, if (ret) goto error_out; =20 - if (restore_fd >=3D 0 || dcs->soft_reset) { + if (restore_fd >=3D 0 || dcs->soft_reset || d_config->dm_restore_file)= { LOGD(DEBUG, domid, "restoring, not running bootloader"); domcreate_bootloader_done(egc, &dcs->bl, 0); } else { @@ -1312,7 +1344,16 @@ static void domcreate_bootloader_done(libxl__egc *eg= c, dcs->sdss.dm.callback =3D domcreate_devmodel_started; dcs->sdss.callback =3D domcreate_devmodel_started; =20 - if (restore_fd < 0 && !dcs->soft_reset) { + if (restore_fd < 0 && !dcs->soft_reset && !d_config->dm_restore_file) { + rc =3D libxl__domain_build(gc, d_config, domid, state); + domcreate_rebuild_done(egc, dcs, rc); + return; + } + + if ( d_config->dm_restore_file ) { + dcs->srs.dcs =3D dcs; + dcs->srs.ao =3D ao; + state->forked_vm =3D true; rc =3D libxl__domain_build(gc, d_config, domid, state); domcreate_rebuild_done(egc, dcs, rc); return; @@ -1510,6 +1551,7 @@ static void domcreate_rebuild_done(libxl__egc *egc, /* convenience aliases */ const uint32_t domid =3D dcs->guest_domid; libxl_domain_config *const d_config =3D dcs->guest_config; + libxl__domain_build_state *const state =3D &dcs->build_state; =20 if (ret) { LOGD(ERROR, domid, "cannot (re-)build domain: %d", ret); @@ -1517,6 +1559,9 @@ static void domcreate_rebuild_done(libxl__egc *egc, goto error_out; } =20 + if ( d_config->dm_restore_file ) + state->saved_state =3D GCSPRINTF("%s", d_config->dm_restore_file); + store_libxl_entry(gc, domid, &d_config->b_info); =20 libxl__multidev_begin(ao, &dcs->multidev); @@ -1947,7 +1992,7 @@ static void domain_create_cb(libxl__egc *egc, libxl__domain_create_state *dcs, int rc, uint32_t domid); =20 -static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config, +int libxl__do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config, uint32_t *domid, int restore_fd, int send_back= _fd, const libxl_domain_restore_params *params, const libxl_asyncop_how *ao_how, @@ -1960,6 
+2005,8 @@ static int do_domain_create(libxl_ctx *ctx, libxl_dom= ain_config *d_config, GCNEW(cdcs); cdcs->dcs.ao =3D ao; cdcs->dcs.guest_config =3D d_config; + cdcs->dcs.guest_domid =3D *domid; + libxl_domain_config_init(&cdcs->dcs.guest_config_saved); libxl_domain_config_copy(ctx, &cdcs->dcs.guest_config_saved, d_config); cdcs->dcs.restore_fd =3D cdcs->dcs.libxc_fd =3D restore_fd; @@ -2204,8 +2251,8 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_dom= ain_config *d_config, const libxl_asyncprogress_how *aop_console_how) { unset_disk_colo_restore(d_config); - return do_domain_create(ctx, d_config, domid, -1, -1, NULL, - ao_how, aop_console_how); + return libxl__do_domain_create(ctx, d_config, domid, -1, -1, NULL, + ao_how, aop_console_how); } =20 int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_con= fig, @@ -2221,8 +2268,8 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl= _domain_config *d_config, unset_disk_colo_restore(d_config); } =20 - return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd, - params, ao_how, aop_console_how); + return libxl__do_domain_create(ctx, d_config, domid, restore_fd, send_= back_fd, + params, ao_how, aop_console_how); } =20 int libxl_domain_soft_reset(libxl_ctx *ctx, diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index f4007bbe50..b615f1fc88 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -2803,7 +2803,7 @@ static void device_model_spawn_outcome(libxl__egc *eg= c, =20 libxl__domain_build_state *state =3D dmss->build_state; =20 - if (state->saved_state) { + if (state->saved_state && !state->forked_vm) { ret2 =3D unlink(state->saved_state); if (ret2) { LOGED(ERROR, dmss->guest_domid, "%s: failed to remove device-m= odel state %s", diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 71cb578923..3bc7117b99 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -249,9 +249,12 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, libxl_domain_build_info *const info =3D &d_config->b_info; libxl_ctx *ctx =3D libxl__gc_owner(gc); char *xs_domid, *con_domid; - int rc; + int rc =3D 0; uint64_t size; =20 + if ( state->forked_vm ) + goto skip_fork; + if (xc_domain_max_vcpus(ctx->xch, domid, info->max_vcpus) !=3D 0) { LOG(ERROR, "Couldn't set max vcpu count"); return ERROR_FAIL; @@ -362,7 +365,6 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, } } =20 - rc =3D libxl__arch_extra_memory(gc, info, &size); if (rc < 0) { LOGE(ERROR, "Couldn't get arch extra constant memory size"); @@ -374,6 +376,11 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, return ERROR_FAIL; } =20 + rc =3D libxl__arch_domain_create(gc, d_config, domid); + if ( rc ) + goto out; + +skip_fork: xs_domid =3D xs_read(ctx->xsh, XBT_NULL, "/tool/xenstored/domid", NULL= ); state->store_domid =3D xs_domid ? 
atoi(xs_domid) : 0; free(xs_domid); @@ -385,8 +392,7 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, state->store_port =3D xc_evtchn_alloc_unbound(ctx->xch, domid, state->= store_domid); state->console_port =3D xc_evtchn_alloc_unbound(ctx->xch, domid, state= ->console_domid); =20 - rc =3D libxl__arch_domain_create(gc, d_config, domid); - +out: return rc; } =20 @@ -444,6 +450,9 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid, char **ents; int i, rc; =20 + if ( state->forked_vm ) + goto skip_fork; + if (info->num_vnuma_nodes && !info->num_vcpu_soft_affinity) { rc =3D set_vnuma_affinity(gc, domid, info); if (rc) @@ -466,6 +475,7 @@ int libxl__build_post(libxl__gc *gc, uint32_t domid, } } =20 +skip_fork: ents =3D libxl__calloc(gc, 12 + (info->max_vcpus * 2) + 2, sizeof(char= *)); ents[0] =3D "memory/static-max"; ents[1] =3D GCSPRINTF("%"PRId64, info->max_memkb); @@ -728,14 +738,16 @@ static int hvm_build_set_params(xc_interface *handle,= uint32_t domid, libxl_domain_build_info *info, int store_evtchn, unsigned long *store_mfn, int console_evtchn, unsigned long *console= _mfn, - domid_t store_domid, domid_t console_domid) + domid_t store_domid, domid_t console_domid, + bool forked_vm) { struct hvm_info_table *va_hvm; uint8_t *va_map, sum; uint64_t str_mfn, cons_mfn; int i; =20 - if (info->type =3D=3D LIBXL_DOMAIN_TYPE_HVM) { + if ( info->type =3D=3D LIBXL_DOMAIN_TYPE_HVM && !forked_vm ) + { va_map =3D xc_map_foreign_range(handle, domid, XC_PAGE_SIZE, PROT_READ | PROT_WRITE, HVM_INFO_PFN); @@ -1051,6 +1063,23 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, struct xc_dom_image *dom =3D NULL; bool device_model =3D info->type =3D=3D LIBXL_DOMAIN_TYPE_HVM ? true := false; =20 + if ( state->forked_vm ) + { + rc =3D hvm_build_set_params(ctx->xch, domid, info, state->store_po= rt, + &state->store_mfn, state->console_port, + &state->console_mfn, state->store_domid, + state->console_domid, state->forked_vm); + + if ( rc ) + return rc; + + return xc_dom_gnttab_seed(ctx->xch, domid, true, + state->console_mfn, + state->store_mfn, + state->console_domid, + state->store_domid); + } + xc_dom_loginit(ctx->xch); =20 /* @@ -1175,7 +1204,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, rc =3D hvm_build_set_params(ctx->xch, domid, info, state->store_port, &state->store_mfn, state->console_port, &state->console_mfn, state->store_domid, - state->console_domid); + state->console_domid, false); if (rc !=3D 0) { LOG(ERROR, "hvm build set params failed"); goto out; diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 5f39e44cb9..d05ff31e83 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -1374,6 +1374,7 @@ typedef struct { =20 char *saved_state; int dm_monitor_fd; + bool forked_vm; =20 libxl__file_reference pv_kernel; libxl__file_reference pv_ramdisk; @@ -4818,6 +4819,12 @@ _hidden int libxl__domain_pvcontrol(libxl__egc *egc, /* Check whether a domid is recent */ int libxl__is_domid_recent(libxl__gc *gc, uint32_t domid, bool *recent); =20 +_hidden int libxl__do_domain_create(libxl_ctx *ctx, libxl_domain_config *d= _config, + uint32_t *domid, int restore_fd, int s= end_back_fd, + const libxl_domain_restore_params *par= ams, + const libxl_asyncop_how *ao_how, + const libxl_asyncprogress_how *aop_con= sole_how); + #endif =20 /* diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index f7c473be74..2bb5e6319e 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -958,6 +958,7 @@ libxl_domain_config 
=3D Struct("domain_config", [ ("on_watchdog", libxl_action_on_shutdown), ("on_crash", libxl_action_on_shutdown), ("on_soft_reset", libxl_action_on_shutdown), + ("dm_restore_file", string, {'const': True}), ], dir=3DDIR_IN) =20 libxl_diskinfo =3D Struct("diskinfo", [ diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c index f8bc828e62..f4312411fc 100644 --- a/tools/libxl/libxl_x86.c +++ b/tools/libxl/libxl_x86.c @@ -2,6 +2,7 @@ #include "libxl_arch.h" =20 #include +#include =20 int libxl__arch_domain_prepare_config(libxl__gc *gc, libxl_domain_config *d_config, @@ -842,6 +843,46 @@ int libxl__arch_passthrough_mode_setdefault(libxl__gc = *gc, return rc; } =20 +/* + * The parent domain is expected to be created with default settings for + * - max_evtch_port + * - max_grant_frames + * - max_maptrack_frames + */ +int libxl_domain_fork_vm(libxl_ctx *ctx, uint32_t pdomid, uint32_t max_vcp= us, uint32_t *domid) +{ + int rc; + struct xen_domctl_createdomain create =3D {0}; + create.flags |=3D XEN_DOMCTL_CDF_hvm; + create.flags |=3D XEN_DOMCTL_CDF_hap; + create.flags |=3D XEN_DOMCTL_CDF_oos_off; + create.arch.emulation_flags =3D (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI); + create.ssidref =3D SECINITSID_DOMU; + create.max_vcpus =3D max_vcpus; + create.max_evtchn_port =3D 1023; + create.max_grant_frames =3D LIBXL_MAX_GRANT_FRAMES_DEFAULT; + create.max_maptrack_frames =3D LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT; + + if ( (rc =3D xc_domain_create(ctx->xch, domid, &create)) ) + return rc; + + if ( (rc =3D xc_memshr_fork(ctx->xch, pdomid, *domid)) ) + xc_domain_destroy(ctx->xch, *domid); + + return rc; +} + +int libxl_domain_fork_launch_dm(libxl_ctx *ctx, libxl_domain_config *d_con= fig, + uint32_t domid, + const libxl_asyncprogress_how *aop_console= _how) +{ + return libxl__do_domain_create(ctx, d_config, &domid, -1, -1, 0, 0, ao= p_console_how); +} + +int libxl_domain_fork_reset(libxl_ctx *ctx, uint32_t domid) +{ + return xc_memshr_fork_reset(ctx->xch, domid); +} =20 /* * Local variables: diff --git a/tools/xl/Makefile b/tools/xl/Makefile index af4912e67a..073222233b 100644 --- a/tools/xl/Makefile +++ b/tools/xl/Makefile @@ -15,7 +15,7 @@ LDFLAGS +=3D $(PTHREAD_LDFLAGS) CFLAGS_XL +=3D $(CFLAGS_libxenlight) CFLAGS_XL +=3D -Wshadow =20 -XL_OBJS-$(CONFIG_X86) =3D xl_psr.o +XL_OBJS-$(CONFIG_X86) =3D xl_psr.o xl_forkvm.o XL_OBJS =3D xl.o xl_cmdtable.o xl_sxp.o xl_utils.o $(XL_OBJS-y) XL_OBJS +=3D xl_parse.o xl_cpupool.o xl_flask.o XL_OBJS +=3D xl_vtpm.o xl_block.o xl_nic.o xl_usb.o diff --git a/tools/xl/xl.h b/tools/xl/xl.h index 06569c6c4a..1105c34b15 100644 --- a/tools/xl/xl.h +++ b/tools/xl/xl.h @@ -31,6 +31,7 @@ struct cmd_spec { }; =20 struct domain_create { + uint32_t ddomid; /* fork launch dm for this domid */ int debug; int daemonize; int monitor; /* handle guest reboots etc */ @@ -45,6 +46,7 @@ struct domain_create { const char *config_file; char *extra_config; /* extra config string */ const char *restore_file; + const char *dm_restore_file; char *colo_proxy_script; bool userspace_colo_proxy; int migrate_fd; /* -1 means none */ @@ -128,6 +130,8 @@ int main_pciassignable_remove(int argc, char **argv); int main_pciassignable_list(int argc, char **argv); #ifndef LIBXL_HAVE_NO_SUSPEND_RESUME int main_restore(int argc, char **argv); +int main_fork_launch_dm(int argc, char **argv); +int main_fork_reset(int argc, char **argv); int main_migrate_receive(int argc, char **argv); int main_save(int argc, char **argv); int main_migrate(int argc, char **argv); @@ -212,6 +216,7 @@ int main_psr_cat_cbm_set(int 
 int main_psr_cat_show(int argc, char **argv);
 int main_psr_mba_set(int argc, char **argv);
 int main_psr_mba_show(int argc, char **argv);
+int main_fork_vm(int argc, char **argv);
 #endif
 int main_qemu_monitor_command(int argc, char **argv);
 
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 08335394e5..ef634abf32 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -187,6 +187,21 @@ struct cmd_spec cmd_table[] = {
       "Restore a domain from a saved state",
       "- for internal use only",
     },
+#if defined(__i386__) || defined(__x86_64__)
+    { "fork-vm",
+      &main_fork_vm, 0, 1,
+      "Fork a domain from the running parent domid. Experimental. Most config settings must match parent.",
+      "[options] ",
+      "-h                    Print this help.\n"
+      "-C                    Use config file for VM fork.\n"
+      "-Q                    Use qemu save file for VM fork.\n"
+      "--launch-dm           Launch device model (QEMU) for VM fork.\n"
+      "--fork-reset          Reset VM fork.\n"
+      "--max-vcpus           Specify max-vcpus matching the parent domain when not launching dm\n"
+      "-p                    Do not unpause fork VM after operation.\n"
+      "-d                    Enable debug messages.\n"
+    },
+#endif
 #endif
     { "dump-core",
       &main_dump_core, 0, 1,
diff --git a/tools/xl/xl_forkvm.c b/tools/xl/xl_forkvm.c
new file mode 100644
index 0000000000..a7ee5b4771
--- /dev/null
+++ b/tools/xl/xl_forkvm.c
@@ -0,0 +1,147 @@
+/*
+ * Copyright 2020 Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include "xl.h"
+#include "xl_utils.h"
+#include "xl_parse.h"
+
+int main_fork_vm(int argc, char **argv)
+{
+    int rc, debug = 0;
+    uint32_t domid_in = INVALID_DOMID, domid_out = INVALID_DOMID;
+    int launch_dm = 1;
+    bool reset = 0;
+    bool pause = 0;
+    const char *config_file = NULL;
+    const char *dm_restore_file = NULL;
+    uint32_t max_vcpus = 0;
+
+    int opt;
+    static struct option opts[] = {
+        {"launch-dm", 1, 0, 'l'},
+        {"fork-reset", 0, 0, 'r'},
+        {"max-vcpus", 1, 0, 'm'},
+        COMMON_LONG_OPTS
+    };
+
+    SWITCH_FOREACH_OPT(opt, "phdC:Q:l:rm:N:D:B:V:", opts, "fork-vm", 1) {
+    case 'd':
+        debug = 1;
+        break;
+    case 'p':
+        pause = 1;
+        break;
+    case 'm':
+        max_vcpus = atoi(optarg);
+        break;
+    case 'C':
+        config_file = optarg;
+        break;
+    case 'Q':
+        dm_restore_file = optarg;
+        break;
+    case 'l':
+        if ( !strcmp(optarg, "no") )
+            launch_dm = 0;
+        if ( !strcmp(optarg, "yes") )
+            launch_dm = 1;
+        if ( !strcmp(optarg, "late") )
+            launch_dm = 2;
+        break;
+    case 'r':
+        reset = 1;
+        break;
+    case 'N': /* fall-through */
+    case 'D': /* fall-through */
+    case 'B': /* fall-through */
+    case 'V':
+        fprintf(stderr, "Unimplemented option(s)\n");
+        return EXIT_FAILURE;
+    }
+
+    if (argc-optind == 1) {
+        domid_in = atoi(argv[optind]);
+    } else {
+        help("fork-vm");
+        return EXIT_FAILURE;
+    }
+
+    if (launch_dm && (!config_file || !dm_restore_file)) {
+        fprintf(stderr, "Currently you must provide both -C and -Q options\n");
+        return EXIT_FAILURE;
+    }
+
+    if (reset) {
+        domid_out = domid_in;
+        if (libxl_domain_fork_reset(ctx, domid_in) == EXIT_FAILURE)
+            return EXIT_FAILURE;
+    }
+
+    if (launch_dm == 2 || reset) {
+        domid_out = domid_in;
+        rc = EXIT_SUCCESS;
+    } else {
+        if ( !max_vcpus )
+        {
+            fprintf(stderr, "Currently you must specify the parent's max_vcpus for this option\n");
+            return EXIT_FAILURE;
+        }
+
+        rc = libxl_domain_fork_vm(ctx, domid_in, max_vcpus, &domid_out);
+    }
+
+    if (rc == EXIT_SUCCESS) {
+        if ( launch_dm ) {
+            struct domain_create dom_info;
+            memset(&dom_info, 0, sizeof(dom_info));
+            dom_info.ddomid = domid_out;
+            dom_info.dm_restore_file = dm_restore_file;
+            dom_info.debug = debug;
+            dom_info.paused = pause;
+            dom_info.config_file = config_file;
+            dom_info.migrate_fd = -1;
+            dom_info.send_back_fd = -1;
+            rc = create_domain(&dom_info) < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
+        } else if ( !pause )
+            rc = libxl_domain_unpause(ctx, domid_out, NULL);
+    }
+
+    if (rc == EXIT_SUCCESS)
+        fprintf(stderr, "fork-vm command successfully returned domid: %u\n", domid_out);
+    else if ( domid_out != INVALID_DOMID )
+        libxl_domain_destroy(ctx, domid_out, 0);
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 2e2d427492..782fbbc24b 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -676,6 +676,12 @@ int create_domain(struct domain_create *dom_info)
 
     int restoring = (restore_file || (migrate_fd >= 0));
 
+#if defined(__i386__) || defined(__x86_64__)
+    /* VM forking */
+    uint32_t ddomid = dom_info->ddomid; // launch dm for this domain iff set
+    const char *dm_restore_file = dom_info->dm_restore_file;
+#endif
+
     libxl_domain_config_init(&d_config);
 
     if (restoring) {
@@ -926,6 +932,14 @@ start:
          * restore/migrate-receive it again.
          */
         restoring = 0;
+#if defined(__i386__) || defined(__x86_64__)
+    } else if ( ddomid ) {
+        d_config.dm_restore_file = dm_restore_file;
+        ret = libxl_domain_fork_launch_dm(ctx, &d_config, ddomid,
+                                          autoconnect_console_how);
+        domid = ddomid;
+        ddomid = INVALID_DOMID;
+#endif
     } else if (domid_soft_reset != INVALID_DOMID) {
        /* Do soft reset. */
        ret = libxl_domain_soft_reset(ctx, &d_config, domid_soft_reset,
-- 
2.20.1
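
As a usage illustration of the libxl interface added above, here is a minimal sketch of how a caller could create a fork of a running, paused parent domain and later reset and destroy it. It assumes only the prototypes introduced in this patch; PARENT_DOMID and MAX_VCPUS are illustrative placeholders, error handling is reduced to early exits, and launching a device model for the fork is omitted since that additionally requires a populated libxl_domain_config passed to libxl_domain_fork_launch_dm().

/* Sketch: create a VM fork via libxl, reset it, then tear it down.
 * PARENT_DOMID and MAX_VCPUS are placeholders, not values from this series. */
#include <stdint.h>
#include <stdio.h>
#include <libxl.h>

#define PARENT_DOMID 1u  /* placeholder: domid of the parent domain */
#define MAX_VCPUS    2u  /* placeholder: must match the parent's max_vcpus */

int main(void)
{
    libxl_ctx *ctx = NULL;
    uint32_t fork_domid = 0;
    int rc;

    if (libxl_ctx_alloc(&ctx, LIBXL_VERSION, 0, NULL))
        return 1;

    /* Create the empty fork domain and hook it up to its parent. */
    rc = libxl_domain_fork_vm(ctx, PARENT_DOMID, MAX_VCPUS, &fork_domid);
    if (!rc) {
        printf("forked parent %u into domid %u\n", PARENT_DOMID, fork_domid);

        /* Drop the fork's private memory and start over from the parent. */
        rc = libxl_domain_fork_reset(ctx, fork_domid);

        libxl_domain_destroy(ctx, fork_domid, 0);
    }

    libxl_ctx_free(ctx);
    return rc ? 1 : 0;
}

Pairing this with the fork-vm command added to xl above gives the same create/reset lifecycle from the command line.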