From: Paul Durrant
Date: Thu, 5 Mar 2020 17:30:41 +0000
Message-ID: <20200305173041.5141-3-pdurrant@amzn.com>
In-Reply-To: <20200305173041.5141-1-pdurrant@amzn.com>
References: <20200305173041.5141-1-pdurrant@amzn.com>
Subject: [Xen-devel] [PATCH v6 2/2] docs/designs: Add a design document for migration of xenstore data

This patch proposes extra migration data and xenstore protocol
extensions to support non-cooperative live migration of guests.

NOTE: docs/misc/xenstore.txt is also amended to replace the term <mfn>
      for the INTRODUCE operation with <gfn>, since this is what it
      actually is.

Signed-off-by: Paul Durrant
---
Cc: Andrew Cooper
Cc: George Dunlap
Cc: Ian Jackson
Cc: Jan Beulich
Cc: Julien Grall
Cc: Konrad Rzeszutek Wilk
Cc: Stefano Stabellini
Cc: Wei Liu

v6:
 - Addressed comments from Julien

v5:
 - Add QUIESCE
 - Make semantics of <index> in GET_DOMAIN_WATCHES more clear

v4:
 - Drop the restrictions on special paths

v3:
 - New in v3
---
 docs/designs/xenstore-migration.md | 171 +++++++++++++++++++++++++++++
 docs/misc/xenstore.txt             |   6 +-
 2 files changed, 174 insertions(+), 3 deletions(-)
 create mode 100644 docs/designs/xenstore-migration.md

diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
new file mode 100644
index 0000000000..7e61f462f0
--- /dev/null
+++ b/docs/designs/xenstore-migration.md
@@ -0,0 +1,171 @@
+# Xenstore Migration
+
+## Background
+
+The design for *Non-Cooperative Migration of Guests*[1] explains that extra
+save records are required in the migration stream to allow a guest running
+PV drivers to be migrated without its co-operation. Moreover the save
+records must include details of registered xenstore watches as well as
+content; information that cannot currently be recovered from `xenstored`,
+and hence some extension to the xenstore protocol[2] will also be required.
+
+The *libxenlight Domain Image Format* specification[3] already defines a
+record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
+transferring xenstore data pertaining to the domain directly as it is
+specified such that keys are relative to the path
+`/local/domain/$dm_domid/device-model/$domid`. Thus it is necessary to
+define at least one new save record type.
+
+## Proposal
+
+### New Save Record
+
+A new mandatory record type should be defined within the libxenlight Domain
+Image Format:
+
+`0x00000007: DOMAIN_XENSTORE_DATA`
+
+The format of each of these new records should be as follows:
+
+
+```
+0     1     2     3     4     5     6     7 octet
++------------------------+------------------------+
+|          type          |  record specific data  |
++------------------------+                        |
+...
++-------------------------------------------------+
+```
+
+NB: The record data does not contain a length because the libxenlight record
+header specifies the length.
+
+
+| Field  | Description                                      |
+|--------|--------------------------------------------------|
+| `type` | 0x00000000: invalid                              |
+|        | 0x00000001: node data                            |
+|        | 0x00000002: watch data                           |
+|        | 0x00000003: transaction data                     |
+|        | 0x00000004 - 0xFFFFFFFF: reserved for future use |
+
+
+where data is always in the form of a tuple as follows
+
+
+**node data**
+
+
+`<path>|<perm-count>|<perm-as-string>|+<value>`
+
+
+`<path>` and `<value>` should be suitable to formulate a `WRITE` operation
+to the receiving xenstored and the `<perm-as-string>|+` list should be
+similarly suitable to formulate a subsequent `SET_PERMS` operation.
+`<perm-count>` specifies the number of entries in the `<perm-as-string>|+`
+list and `<value>` must be placed at the end because it may contain NUL
+octets.
+
+
+**watch data**
+
+
+`<path>|<token>|`
+
+`<path>` again is absolute and, together with `<token>`, should
+be suitable to formulate an `ADD_DOMAIN_WATCHES` operation (see below).
+
+
+**transaction data**
+
+
+`<tx-count>|<tx-id>|+`
+
+Each `<tx-id>` should be a uint32_t value represented as unsigned decimal
+suitable for passing as a *tx_id* to the re-defined `TRANSACTION_START`
+operation (see below). `<tx-count>` is the number of entries in the
+`<tx-id>|+` list.
+
+
+### Protocol Extension
+
+Before xenstore state is migrated it is necessary to wait for any pending
+reads, writes, watch registrations etc. to complete, and also to make sure
+that xenstored does not start processing any new requests (so that new
+requests remain pending on the shared ring for subsequent processing on the
+new host). Hence the following operation is needed:
+
+```
+QUIESCE                 <domid>|
+
+Complete processing of any request issued by the specified domain, and
+do not process any further requests from the shared ring.
+```
+
+The `WATCH` operation does not allow specification of a `<domid>`; it is
+assumed that the watch pertains to the domain that owns the shared ring
+over which the operation is passed. Hence, for the tool-stack to be able
+to register a watch on behalf of a domain a new operation is needed:
+
+```
+ADD_DOMAIN_WATCHES      <domid>|<watch>|+
+
+Adds watches on behalf of the specified domain.
+
+<watch> is a NUL separated tuple of <path>|<token>. The semantics of this
+operation are identical to the domain issuing WATCH <path>|<token>| for
+each <watch>.
+```
+
+The watch information for a domain also needs to be extracted from the
+sending xenstored so the following operation is also needed:
+
+```
+GET_DOMAIN_WATCHES      <domid>|<index>   <gencnt>|<watch>|*
+
+Gets the list of watches that are currently registered for the domain.
+
+<watch> is a NUL separated tuple of <path>|<token>. The sub-list returned
+will start at <index> items into the overall list of watches and may
+be truncated (at a <watch> boundary) such that the returned data fits
+within XENSTORE_PAYLOAD_MAX.
+
+If <index> is beyond the end of the overall list then the returned
+sub-list will be empty. If the value of <gencnt> changes then it indicates
+that the overall watch list has changed and thus it may be necessary
+to re-issue the operation for previous values of <index>.
+```
+
+To deal with transactions that were pending when the domain is migrated
+it is necessary to start transactions with the same `<tx-id>` in the
+receiving xenstored but for them to result in an `EAGAIN` when the
+`TRANSACTION_END` operation is performed. Thus the `TRANSACTION_START`
+operation needs to be re-defined as follows:
+
+```
+TRANSACTION_START       |                       <transid>|
+        <transid> is an opaque uint32_t represented as unsigned decimal.
+        If tx_id is 0 for this operation then a new transaction will be started
+        with a tx_id allocated by xenstored. If a non-0 tx_id is specified then
+        a transaction with that tx_id will be started and automatically marked
+        `conflicting'. The tx_id will always be passed back in <transid>.
+        After this, the tx_id may be used in the request header field for
+        other operations.
+        When a transaction is started whole db is copied; reads and writes
+        happen on the copy.
+```
+
+It may also be desirable to state in the protocol specification that
+the `INTRODUCE` operation should not clear the `<gfn>` specified such that
+a `RELEASE` operation followed by an `INTRODUCE` operation form an
+idempotent pair. The current implementation of *C xenstored* does this
+(in the `domain_conn_reset()` function) but this could be dropped as this
+behaviour is not currently specified and the page will always be zeroed
+for a newly created domain.
+
+
+* * *
+
+[1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
+[2] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore.txt
+[3] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/specs/libxl-migration-stream.pandoc
diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index 6f8569d576..51e6b12931 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -254,7 +254,7 @@ TRANSACTION_END		F|
 
 ---------- Domain management and xenstored communications ----------
 
-INTRODUCE		<domid>|<mfn>|<evtchn>|?
+INTRODUCE		<domid>|<gfn>|<evtchn>|?
 	Notifies xenstored to communicate with this domain.
 
 	INTRODUCE is currently only used by xend (during domain
@@ -262,12 +262,12 @@ INTRODUCE		<domid>|<mfn>|<evtchn>|?
 	xenstored prevents its use other than by dom0.
 
 	<domid> must be a real domain id (not 0 and not a special
-	DOMID_... value).  <mfn> must be a machine page in that domain
+	DOMID_... value).  <gfn> must be a machine page in that domain
 	represented in signed decimal (!).  <evtchn> must be event
 	channel is an unbound event channel in <domid> (likewise in
 	decimal), on which xenstored will call bind_interdomain.
 	Violations of these rules may result in undefined behaviour;
-	for example passing a high-bit-set 32-bit mfn as an unsigned
+	for example passing a high-bit-set 32-bit gfn as an unsigned
 	decimal will attempt to use 0x7fffffff instead (!).
 
 RELEASE			<domid>|
-- 
2.20.1
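
For illustration only (this is not part of the patch), the node data tuple described in the design document could be packed and unpacked roughly as sketched below. The sketch assumes that `|` denotes a single NUL octet, as in docs/misc/xenstore.txt, that the permission count is carried as unsigned decimal text, and that the field order is path, permission count, permission strings, then the value last. The function names `encode_node_data` and `decode_node_data` are hypothetical and do not exist in libxl or xenstored.

```python
from typing import List, Tuple

NUL = b"\x00"  # assumption: '|' in the design document stands for a NUL octet


def encode_node_data(path: str, perms: List[str], value: bytes) -> bytes:
    """Pack path, permission count, permission strings and value (hypothetical helper)."""
    fields = [path.encode(), str(len(perms)).encode()]
    fields += [perm.encode() for perm in perms]
    # Every field before the value is NUL-terminated. The value goes last and
    # is left unterminated because it may itself contain NUL octets.
    return b"".join(field + NUL for field in fields) + value


def decode_node_data(payload: bytes) -> Tuple[str, List[str], bytes]:
    """Inverse of encode_node_data(); raises ValueError on a truncated payload."""
    path, rest = payload.split(NUL, 1)
    count_raw, rest = rest.split(NUL, 1)
    perms = []
    for _ in range(int(count_raw)):
        perm, rest = rest.split(NUL, 1)
        perms.append(perm.decode())
    # Whatever remains is the value, embedded NUL octets included.
    return path.decode(), perms, rest


if __name__ == "__main__":
    blob = encode_node_data("/local/domain/5/data/x", ["n5", "r0"], b"a\x00b")
    print(decode_node_data(blob))
```

Because the permission count precedes the permission list, the decoder knows exactly how many NUL-terminated strings to consume before treating the remainder as the value, which is why a value containing NUL octets survives the round trip.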