[PATCH] docs: update xenstore migration design document

Juergen Gross posted 1 patch 4 years ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/20200414155942.3347-1-jgross@suse.com
Maintainers: Wei Liu <wl@xen.org>, Andrew Cooper <andrew.cooper3@citrix.com>, Jan Beulich <jbeulich@suse.com>, Ian Jackson <ian.jackson@eu.citrix.com>, Julien Grall <julien@xen.org>, George Dunlap <george.dunlap@citrix.com>, Stefano Stabellini <sstabellini@kernel.org>
docs/designs/xenstore-migration.md | 90 ++++++++++++++++++++++++++++++++++++--
1 file changed, 87 insertions(+), 3 deletions(-)
[PATCH] docs: update xenstore migration design document
Posted by Juergen Gross 4 years ago
In the past there have been several attempts to make Xenstore
restartable. This requires to transfer the internal Xenstore state to
the new instance. With the Xenstore migration protocol added recently
to Xen's documentation a first base has been defined to represent the
state of Xenstore. This can be expanded a little bit in order to have
a full state representation which is needed as a first step for live
updating Xenstore.

Add some definitions to designs/xenstore-migration.md which are needed
for live update of xenstored.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 docs/designs/xenstore-migration.md | 90 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 87 insertions(+), 3 deletions(-)

diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
index 6ab351e8fe..09bb4700b4 100644
--- a/docs/designs/xenstore-migration.md
+++ b/docs/designs/xenstore-migration.md
@@ -9,6 +9,10 @@ records must include details of registered xenstore watches as well as
 content; information that cannot currently be recovered from `xenstored`,
 and hence some extension to the xenstore protocol[2] will also be required.
 
+As a similar set of data is needed for transferring xenstore data from
+one instance to another when live updating xenstored the same definitions
+are being used.
+
 The *libxenlight Domain Image Format* specification[3] already defines a
 record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
 transferring xenstore data pertaining to the domain directly as it is
@@ -48,7 +52,10 @@ where type is one of the following values
 |        | 0x00000001: NODE_DATA                            |
 |        | 0x00000002: WATCH_DATA                           |
 |        | 0x00000003: TRANSACTION_DATA                     |
-|        | 0x00000004 - 0xFFFFFFFF: reserved for future use |
+|        | 0x00000004: TRANSACTION_NODE_DATA                |
+|        | 0x00000005: GUEST_RING_DATA                      |
+|        | 0x00000006: DOMAIN_START (live update only)      |
+|        | 0x00000007 - 0xFFFFFFFF: reserved for future use |
 
 
 and data is one of the record data formats described in the following
@@ -79,7 +86,7 @@ as follows:
 +-------------------------------+
 | perm count (N)                |
 +-------------------------------+
-| perm0                         |
+| perm1                         |
 +-------------------------------+
 ...
 +-------------------------------+
@@ -93,7 +100,7 @@ as follows:
 +-------------------------------+
 ```
 
-where perm0..N are formatted as follows:
+where perm1..N are formatted as follows:
 
 
 ```
@@ -164,6 +171,83 @@ as follows:
 where tx_id is the non-zero identifier values of an open transaction.
 
 
+**TRANSACTION_NODE_DATA**
+
+
+Each TRANSACTION_NODE_DATA record specifies a transaction local xenstore
+node. Its is similar to the NODE_DATA record with the addition of a
+transaction id:
+
+```
+    0       1       2       3     octet
++-------+-------+-------+-------+
+| TRANSACTION_NODE_DATA         |
++-------------------------------+
+| tx_id                         |
++-------------------------------+
+| path length                   |
++-------------------------------+
+| path data                     |
+...
+| pad (0 to 3 octets)           |
++-------------------------------+
+| perm count (N)                |
++-------------------------------+
+| perm1                         |
++-------------------------------+
+...
++-------------------------------+
+| permN                         |
++-------------------------------+
+| value length                  |
++-------------------------------+
+| value data                    |
+...
+| pad (0 to 3 octets)           |
++-------------------------------+
+```
+
+where perm1..N are formatted as specified in the NODE_DATA record. A perm
+count of 0 denotes a node having been deleted in the transaction.
+
+
+**GUEST_RING_DATA**
+
+
+The GUEST_RING_DATA record is used to transfer data which is pending to be
+written to the guest's xenstore ring buffer. It si formatted as follows:
+
+
+```
++-------+-------+-------+-------+
+| GUEST_RING_DATA               |
++-------------------------------+
+| value length                  |
++-------------------------------+
+| value data                    |
+...
+| pad (0 to 3 octets)           |
++-------------------------------+
+```
+
+**DOMAIN_START**
+
+
+For live updating xenstored data of multiple domains needs to be transferred.
+For this purpose the DOMAIN_START record is being used. All records of types
+other than NODE_DATA always relate to the last DOMAIN_START record in the
+stream. A DOMAIN_START record just contains a domain-id:
+
+
+```
++-------+-------+-------+-------+
+| DOMAIN_START                  |
++-------------------------------+
+| domid         | pad           |
++-------------------------------+
+```
+
+
 ### Protocol Extension
 
 Before xenstore state is migrated it is necessary to wait for any pending
-- 
2.16.4


RE: [PATCH] docs: update xenstore migration design document
Posted by Paul Durrant 4 years ago
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Juergen Gross
> Sent: 14 April 2020 17:00
> To: xen-devel@lists.xenproject.org
> Cc: Juergen Gross <jgross@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Julien Grall
> <julien@xen.org>; Wei Liu <wl@xen.org>; Andrew Cooper <andrew.cooper3@citrix.com>; Ian Jackson
> <ian.jackson@eu.citrix.com>; George Dunlap <george.dunlap@citrix.com>; Jan Beulich <jbeulich@suse.com>
> Subject: [PATCH] docs: update xenstore migration design document
> 
> In the past there have been several attempts to make Xenstore
> restartable. This requires to transfer the internal Xenstore state to
> the new instance. With the Xenstore migration protocol added recently
> to Xen's documentation a first base has been defined to represent the
> state of Xenstore. This can be expanded a little bit in order to have
> a full state representation which is needed as a first step for live
> updating Xenstore.
> 
> Add some definitions to designs/xenstore-migration.md which are needed
> for live update of xenstored.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  docs/designs/xenstore-migration.md | 90 ++++++++++++++++++++++++++++++++++++--
>  1 file changed, 87 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
> index 6ab351e8fe..09bb4700b4 100644
> --- a/docs/designs/xenstore-migration.md
> +++ b/docs/designs/xenstore-migration.md
> @@ -9,6 +9,10 @@ records must include details of registered xenstore watches as well as
>  content; information that cannot currently be recovered from `xenstored`,
>  and hence some extension to the xenstore protocol[2] will also be required.
> 
> +As a similar set of data is needed for transferring xenstore data from
> +one instance to another when live updating xenstored the same definitions
> +are being used.
> +

That makes sense, but using an external entity to extract the state makes less sense in the context of live update so, going
forward, I suggest dropping the section on extra commands.
Also, we should define a separate 'xenstore domain image format' which can be included as an opaque entity in the migration stream,
in the same way as the qemu save image.

>  The *libxenlight Domain Image Format* specification[3] already defines a
>  record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
>  transferring xenstore data pertaining to the domain directly as it is
> @@ -48,7 +52,10 @@ where type is one of the following values
>  |        | 0x00000001: NODE_DATA                            |
>  |        | 0x00000002: WATCH_DATA                           |
>  |        | 0x00000003: TRANSACTION_DATA                     |
> -|        | 0x00000004 - 0xFFFFFFFF: reserved for future use |
> +|        | 0x00000004: TRANSACTION_NODE_DATA                |
> +|        | 0x00000005: GUEST_RING_DATA                      |
> +|        | 0x00000006: DOMAIN_START (live update only)      |
> +|        | 0x00000007 - 0xFFFFFFFF: reserved for future use |
> 
> 
>  and data is one of the record data formats described in the following
> @@ -79,7 +86,7 @@ as follows:
>  +-------------------------------+
>  | perm count (N)                |
>  +-------------------------------+
> -| perm0                         |
> +| perm1                         |
>  +-------------------------------+
>  ...
>  +-------------------------------+
> @@ -93,7 +100,7 @@ as follows:
>  +-------------------------------+
>  ```
> 
> -where perm0..N are formatted as follows:
> +where perm1..N are formatted as follows:
> 
> 
>  ```
> @@ -164,6 +171,83 @@ as follows:
>  where tx_id is the non-zero identifier values of an open transaction.
> 
> 
> +**TRANSACTION_NODE_DATA**
> +
> +
> +Each TRANSACTION_NODE_DATA record specifies a transaction local xenstore
> +node. Its is similar to the NODE_DATA record with the addition of a
> +transaction id:
> +
> +```
> +    0       1       2       3     octet
> ++-------+-------+-------+-------+
> +| TRANSACTION_NODE_DATA         |
> ++-------------------------------+
> +| tx_id                         |
> ++-------------------------------+
> +| path length                   |
> ++-------------------------------+
> +| path data                     |
> +...
> +| pad (0 to 3 octets)           |
> ++-------------------------------+
> +| perm count (N)                |
> ++-------------------------------+
> +| perm1                         |
> ++-------------------------------+
> +...
> ++-------------------------------+
> +| permN                         |
> ++-------------------------------+
> +| value length                  |
> ++-------------------------------+
> +| value data                    |
> +...
> +| pad (0 to 3 octets)           |
> ++-------------------------------+
> +```
> +
> +where perm1..N are formatted as specified in the NODE_DATA record. A perm
> +count of 0 denotes a node having been deleted in the transaction.
> +

Transferring this state means we can presumably drop the TRANSACTION_DATA, since we can infer open transactions from the presence of
these records?

> +
> +**GUEST_RING_DATA**
> +
> +
> +The GUEST_RING_DATA record is used to transfer data which is pending to be
> +written to the guest's xenstore ring buffer. It si formatted as follows:
> +
> +
> +```
> ++-------+-------+-------+-------+
> +| GUEST_RING_DATA               |
> ++-------------------------------+
> +| value length                  |
> ++-------------------------------+
> +| value data                    |
> +...
> +| pad (0 to 3 octets)           |
> ++-------------------------------+
> +```
> +
> +**DOMAIN_START**
> +
> +
> +For live updating xenstored data of multiple domains needs to be transferred.
> +For this purpose the DOMAIN_START record is being used. All records of types
> +other than NODE_DATA always relate to the last DOMAIN_START record in the
> +stream. A DOMAIN_START record just contains a domain-id:

If we define a separate stream format as I suggest above, I'd expect the domain-id to be part of the header.

> +
> +
> +```
> ++-------+-------+-------+-------+
> +| DOMAIN_START                  |
> ++-------------------------------+
> +| domid         | pad           |
> ++-------------------------------+
> +```
> +
> +
>  ### Protocol Extension

As mentioned above, this section ought to be dropped for the moment. We should define the ways in which xenstored should dump state
in various scenarios: e.g. for LU it will likely be serialize to memory (possibly dev/shmem?), for migration/save-restore it will
probably be serialize to fd (socket/file). We should also define how dumping state will be triggered.

  Paul


Re: [PATCH] docs: update xenstore migration design document
Posted by Jürgen Groß 4 years ago
On 15.04.20 10:09, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Juergen Gross
>> Sent: 14 April 2020 17:00
>> To: xen-devel@lists.xenproject.org
>> Cc: Juergen Gross <jgross@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Julien Grall
>> <julien@xen.org>; Wei Liu <wl@xen.org>; Andrew Cooper <andrew.cooper3@citrix.com>; Ian Jackson
>> <ian.jackson@eu.citrix.com>; George Dunlap <george.dunlap@citrix.com>; Jan Beulich <jbeulich@suse.com>
>> Subject: [PATCH] docs: update xenstore migration design document
>>
>> In the past there have been several attempts to make Xenstore
>> restartable. This requires to transfer the internal Xenstore state to
>> the new instance. With the Xenstore migration protocol added recently
>> to Xen's documentation a first base has been defined to represent the
>> state of Xenstore. This can be expanded a little bit in order to have
>> a full state representation which is needed as a first step for live
>> updating Xenstore.
>>
>> Add some definitions to designs/xenstore-migration.md which are needed
>> for live update of xenstored.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   docs/designs/xenstore-migration.md | 90 ++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 87 insertions(+), 3 deletions(-)
>>
>> diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
>> index 6ab351e8fe..09bb4700b4 100644
>> --- a/docs/designs/xenstore-migration.md
>> +++ b/docs/designs/xenstore-migration.md
>> @@ -9,6 +9,10 @@ records must include details of registered xenstore watches as well as
>>   content; information that cannot currently be recovered from `xenstored`,
>>   and hence some extension to the xenstore protocol[2] will also be required.
>>
>> +As a similar set of data is needed for transferring xenstore data from
>> +one instance to another when live updating xenstored the same definitions
>> +are being used.
>> +
> 
> That makes sense, but using an external entity to extract the state makes less sense in the context of live update so, going
> forward, I suggest dropping the section on extra commands.

Right, this (if still needed) should rather go to docs/misc/xenstore.txt

> Also, we should define a separate 'xenstore domain image format' which can be included as an opaque entity in the migration stream,
> in the same way as the qemu save image.

Fine with me.

> 
>>   The *libxenlight Domain Image Format* specification[3] already defines a
>>   record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
>>   transferring xenstore data pertaining to the domain directly as it is
>> @@ -48,7 +52,10 @@ where type is one of the following values
>>   |        | 0x00000001: NODE_DATA                            |
>>   |        | 0x00000002: WATCH_DATA                           |
>>   |        | 0x00000003: TRANSACTION_DATA                     |
>> -|        | 0x00000004 - 0xFFFFFFFF: reserved for future use |
>> +|        | 0x00000004: TRANSACTION_NODE_DATA                |
>> +|        | 0x00000005: GUEST_RING_DATA                      |
>> +|        | 0x00000006: DOMAIN_START (live update only)      |
>> +|        | 0x00000007 - 0xFFFFFFFF: reserved for future use |
>>
>>
>>   and data is one of the record data formats described in the following
>> @@ -79,7 +86,7 @@ as follows:
>>   +-------------------------------+
>>   | perm count (N)                |
>>   +-------------------------------+
>> -| perm0                         |
>> +| perm1                         |
>>   +-------------------------------+
>>   ...
>>   +-------------------------------+
>> @@ -93,7 +100,7 @@ as follows:
>>   +-------------------------------+
>>   ```
>>
>> -where perm0..N are formatted as follows:
>> +where perm1..N are formatted as follows:
>>
>>
>>   ```
>> @@ -164,6 +171,83 @@ as follows:
>>   where tx_id is the non-zero identifier values of an open transaction.
>>
>>
>> +**TRANSACTION_NODE_DATA**
>> +
>> +
>> +Each TRANSACTION_NODE_DATA record specifies a transaction local xenstore
>> +node. Its is similar to the NODE_DATA record with the addition of a
>> +transaction id:
>> +
>> +```
>> +    0       1       2       3     octet
>> ++-------+-------+-------+-------+
>> +| TRANSACTION_NODE_DATA         |
>> ++-------------------------------+
>> +| tx_id                         |
>> ++-------------------------------+
>> +| path length                   |
>> ++-------------------------------+
>> +| path data                     |
>> +...
>> +| pad (0 to 3 octets)           |
>> ++-------------------------------+
>> +| perm count (N)                |
>> ++-------------------------------+
>> +| perm1                         |
>> ++-------------------------------+
>> +...
>> ++-------------------------------+
>> +| permN                         |
>> ++-------------------------------+
>> +| value length                  |
>> ++-------------------------------+
>> +| value data                    |
>> +...
>> +| pad (0 to 3 octets)           |
>> ++-------------------------------+
>> +```
>> +
>> +where perm1..N are formatted as specified in the NODE_DATA record. A perm
>> +count of 0 denotes a node having been deleted in the transaction.
>> +
> 
> Transferring this state means we can presumably drop the TRANSACTION_DATA, since we can infer open transactions from the presence of
> these records?

No, I don't think so. Imagine a case where just a transaction has
been started without any further activity in this transaction having
happened yet.

> 
>> +
>> +**GUEST_RING_DATA**
>> +
>> +
>> +The GUEST_RING_DATA record is used to transfer data which is pending to be
>> +written to the guest's xenstore ring buffer. It si formatted as follows:
>> +
>> +
>> +```
>> ++-------+-------+-------+-------+
>> +| GUEST_RING_DATA               |
>> ++-------------------------------+
>> +| value length                  |
>> ++-------------------------------+
>> +| value data                    |
>> +...
>> +| pad (0 to 3 octets)           |
>> ++-------------------------------+
>> +```
>> +
>> +**DOMAIN_START**
>> +
>> +
>> +For live updating xenstored data of multiple domains needs to be transferred.
>> +For this purpose the DOMAIN_START record is being used. All records of types
>> +other than NODE_DATA always relate to the last DOMAIN_START record in the
>> +stream. A DOMAIN_START record just contains a domain-id:
> 
> If we define a separate stream format as I suggest above, I'd expect the domain-id to be part of the header.

Yes.

> 
>> +
>> +
>> +```
>> ++-------+-------+-------+-------+
>> +| DOMAIN_START                  |
>> ++-------------------------------+
>> +| domid         | pad           |
>> ++-------------------------------+
>> +```
>> +
>> +
>>   ### Protocol Extension
> 
> As mentioned above, this section ought to be dropped for the moment. We should define the ways in which xenstored should dump state
> in various scenarios: e.g. for LU it will likely be serialize to memory (possibly dev/shmem?), for migration/save-restore it will
> probably be serialize to fd (socket/file). We should also define how dumping state will be triggered.

I was planning to use XS_CONTROL for this purpose. But I'm open for
other solutions, too. And even using XS_CONTROL might want us to
have a library routine for that purpose encapsulating the way how the
extracted state is being transferred.

In the LU case the state dumping will need to be handled Xenstore
internally (especially in stubdom this will just be a memory blob inside
the stubdom, while in the daemon case it could easily be a file).


Juergen

Re: [PATCH] docs: update xenstore migration design document
Posted by Edwin Torok 4 years ago
On Tue, 2020-04-14 at 17:59 +0200, Juergen Gross wrote:
> In the past there have been several attempts to make Xenstore
> restartable. This requires to transfer the internal Xenstore state to
> the new instance. With the Xenstore migration protocol added recently
> to Xen's documentation a first base has been defined to represent the
> state of Xenstore. This can be expanded a little bit in order to have
> a full state representation which is needed as a first step for live
> updating Xenstore.
> 
> Add some definitions to designs/xenstore-migration.md which are
> needed
> for live update of xenstored.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  docs/designs/xenstore-migration.md | 90
> ++++++++++++++++++++++++++++++++++++--
>  1 file changed, 87 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/designs/xenstore-migration.md
> b/docs/designs/xenstore-migration.md
> index 6ab351e8fe..09bb4700b4 100644
> --- a/docs/designs/xenstore-migration.md
> +++ b/docs/designs/xenstore-migration.md
> @@ -9,6 +9,10 @@ records must include details of registered xenstore
> watches as well as
>  content; information that cannot currently be recovered from
> `xenstored`,
>  and hence some extension to the xenstore protocol[2] will also be
> required.
>  
> +As a similar set of data is needed for transferring xenstore data
> from
> +one instance to another when live updating xenstored the same
> definitions
> +are being used.
> +
>  The *libxenlight Domain Image Format* specification[3] already
> defines a
>  record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
>  transferring xenstore data pertaining to the domain directly as it
> is
> @@ -48,7 +52,10 @@ where type is one of the following values
>  |        | 0x00000001: NODE_DATA                            |
>  |        | 0x00000002: WATCH_DATA                           |
>  |        | 0x00000003: TRANSACTION_DATA                     |
> -|        | 0x00000004 - 0xFFFFFFFF: reserved for future use |
> +|        | 0x00000004: TRANSACTION_NODE_DATA                |
> +|        | 0x00000005: GUEST_RING_DATA                      |
> +|        | 0x00000006: DOMAIN_START (live update only)      |
> +|        | 0x00000007 - 0xFFFFFFFF: reserved for future use |
>  
>  
>  and data is one of the record data formats described in the
> following
> @@ -79,7 +86,7 @@ as follows:
>  +-------------------------------+
>  | perm count (N)                |
>  +-------------------------------+
> -| perm0                         |
> +| perm1                         |
>  +-------------------------------+
>  ...
>  +-------------------------------+
> @@ -93,7 +100,7 @@ as follows:
>  +-------------------------------+
>  ```
>  
> -where perm0..N are formatted as follows:
> +where perm1..N are formatted as follows:
>  
>  
>  ```
> @@ -164,6 +171,83 @@ as follows:
>  where tx_id is the non-zero identifier values of an open
> transaction.
>  
>  
> +**TRANSACTION_NODE_DATA**
> +
> +
> +Each TRANSACTION_NODE_DATA record specifies a transaction local
> xenstore
> +node. Its is similar to the NODE_DATA record with the addition of a
> +transaction id:
> +
> +```
> +    0       1       2       3     octet
> ++-------+-------+-------+-------+
> +| TRANSACTION_NODE_DATA         |
> ++-------------------------------+
> +| tx_id                         |
> ++-------------------------------+
> +| path length                   |
> ++-------------------------------+
> +| path data                     |
> +...
> +| pad (0 to 3 octets)           |
> ++-------------------------------+
> +| perm count (N)                |
> ++-------------------------------+
> +| perm1                         |
> ++-------------------------------+
> +...
> ++-------------------------------+
> +| permN                         |
> ++-------------------------------+
> +| value length                  |
> ++-------------------------------+
> +| value data                    |
> +...
> +| pad (0 to 3 octets)           |
> ++-------------------------------+
> +```
> +
> +where perm1..N are formatted as specified in the NODE_DATA record. A
> perm
> +count of 0 denotes a node having been deleted in the transaction.


oxenstored also tracks the number of operations that a transaction has
performed, which includes readonly operations AFAICT, which cannot be
inferred from counting TRANSACTION_NODE_DATA entries.
I think the operation count would have to be serialized as part of
TRANSACTION_DATA.

> +
> +
> +**GUEST_RING_DATA**
> +
> +
> +The GUEST_RING_DATA record is used to transfer data which is pending
> to be
> +written to the guest's xenstore ring buffer. It si formatted as 

typo: s/si/is/

> follows:
> +
> +
> +```
> ++-------+-------+-------+-------+
> +| GUEST_RING_DATA               |
> ++-------------------------------+
> +| value length                  |
> ++-------------------------------+
> +| value data                    |
> +...
> +| pad (0 to 3 octets)           |
> ++-------------------------------+
> +```
> +
> +**DOMAIN_START**
> +
> +
> +For live updating xenstored data of multiple domains needs to be
> transferred.
> +For this purpose the DOMAIN_START record is being used. All records
> of types
> +other than NODE_DATA always relate to the last DOMAIN_START record
> in the
> +stream. A DOMAIN_START record just contains a domain-id:
> +
> +
> +```
> ++-------+-------+-------+-------+
> +| DOMAIN_START                  |
> ++-------------------------------+
> +| domid         | pad           |
> ++-------------------------------+
> +```

There is some more information that might be useful here: mfn and
remote_port. (based on the information that INTRODUCE needs)

Best regards,
--Edwin
Re: [PATCH] docs: update xenstore migration design document
Posted by Jürgen Groß 4 years ago
On 15.04.20 12:16, Edwin Torok wrote:
> On Tue, 2020-04-14 at 17:59 +0200, Juergen Gross wrote:
>> In the past there have been several attempts to make Xenstore
>> restartable. This requires to transfer the internal Xenstore state to
>> the new instance. With the Xenstore migration protocol added recently
>> to Xen's documentation a first base has been defined to represent the
>> state of Xenstore. This can be expanded a little bit in order to have
>> a full state representation which is needed as a first step for live
>> updating Xenstore.
>>
>> Add some definitions to designs/xenstore-migration.md which are
>> needed
>> for live update of xenstored.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   docs/designs/xenstore-migration.md | 90
>> ++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 87 insertions(+), 3 deletions(-)
>>
>> diff --git a/docs/designs/xenstore-migration.md
>> b/docs/designs/xenstore-migration.md
>> index 6ab351e8fe..09bb4700b4 100644
>> --- a/docs/designs/xenstore-migration.md
>> +++ b/docs/designs/xenstore-migration.md
>> @@ -9,6 +9,10 @@ records must include details of registered xenstore
>> watches as well as
>>   content; information that cannot currently be recovered from
>> `xenstored`,
>>   and hence some extension to the xenstore protocol[2] will also be
>> required.
>>   
>> +As a similar set of data is needed for transferring xenstore data
>> from
>> +one instance to another when live updating xenstored the same
>> definitions
>> +are being used.
>> +
>>   The *libxenlight Domain Image Format* specification[3] already
>> defines a
>>   record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
>>   transferring xenstore data pertaining to the domain directly as it
>> is
>> @@ -48,7 +52,10 @@ where type is one of the following values
>>   |        | 0x00000001: NODE_DATA                            |
>>   |        | 0x00000002: WATCH_DATA                           |
>>   |        | 0x00000003: TRANSACTION_DATA                     |
>> -|        | 0x00000004 - 0xFFFFFFFF: reserved for future use |
>> +|        | 0x00000004: TRANSACTION_NODE_DATA                |
>> +|        | 0x00000005: GUEST_RING_DATA                      |
>> +|        | 0x00000006: DOMAIN_START (live update only)      |
>> +|        | 0x00000007 - 0xFFFFFFFF: reserved for future use |
>>   
>>   
>>   and data is one of the record data formats described in the
>> following
>> @@ -79,7 +86,7 @@ as follows:
>>   +-------------------------------+
>>   | perm count (N)                |
>>   +-------------------------------+
>> -| perm0                         |
>> +| perm1                         |
>>   +-------------------------------+
>>   ...
>>   +-------------------------------+
>> @@ -93,7 +100,7 @@ as follows:
>>   +-------------------------------+
>>   ```
>>   
>> -where perm0..N are formatted as follows:
>> +where perm1..N are formatted as follows:
>>   
>>   
>>   ```
>> @@ -164,6 +171,83 @@ as follows:
>>   where tx_id is the non-zero identifier values of an open
>> transaction.
>>   
>>   
>> +**TRANSACTION_NODE_DATA**
>> +
>> +
>> +Each TRANSACTION_NODE_DATA record specifies a transaction local
>> xenstore
>> +node. Its is similar to the NODE_DATA record with the addition of a
>> +transaction id:
>> +
>> +```
>> +    0       1       2       3     octet
>> ++-------+-------+-------+-------+
>> +| TRANSACTION_NODE_DATA         |
>> ++-------------------------------+
>> +| tx_id                         |
>> ++-------------------------------+
>> +| path length                   |
>> ++-------------------------------+
>> +| path data                     |
>> +...
>> +| pad (0 to 3 octets)           |
>> ++-------------------------------+
>> +| perm count (N)                |
>> ++-------------------------------+
>> +| perm1                         |
>> ++-------------------------------+
>> +...
>> ++-------------------------------+
>> +| permN                         |
>> ++-------------------------------+
>> +| value length                  |
>> ++-------------------------------+
>> +| value data                    |
>> +...
>> +| pad (0 to 3 octets)           |
>> ++-------------------------------+
>> +```
>> +
>> +where perm1..N are formatted as specified in the NODE_DATA record. A
>> perm
>> +count of 0 denotes a node having been deleted in the transaction.
> 
> 
> oxenstored also tracks the number of operations that a transaction has
> performed, which includes readonly operations AFAICT, which cannot be
> inferred from counting TRANSACTION_NODE_DATA entries.
> I think the operation count would have to be serialized as part of
> TRANSACTION_DATA.

No, I don't think this is necessary. The read nodes can be included in
the TRANSACTION_NODE_DATA entries, too, as long as the transaction is
terminated with failure (EAGAIN). In case oxenstored is needing more,
e.g. access types, we can include that.

The TRANSACTION_NODE_DATA entries are primarily needed to ensure
returning consistent data in case of reads of nodes after having them
accessed before in the same transaction.

> 
>> +
>> +
>> +**GUEST_RING_DATA**
>> +
>> +
>> +The GUEST_RING_DATA record is used to transfer data which is pending
>> to be
>> +written to the guest's xenstore ring buffer. It si formatted as
> 
> typo: s/si/is/

Thanks.

> 
>> follows:
>> +
>> +
>> +```
>> ++-------+-------+-------+-------+
>> +| GUEST_RING_DATA               |
>> ++-------------------------------+
>> +| value length                  |
>> ++-------------------------------+
>> +| value data                    |
>> +...
>> +| pad (0 to 3 octets)           |
>> ++-------------------------------+
>> +```
>> +
>> +**DOMAIN_START**
>> +
>> +
>> +For live updating xenstored data of multiple domains needs to be
>> transferred.
>> +For this purpose the DOMAIN_START record is being used. All records
>> of types
>> +other than NODE_DATA always relate to the last DOMAIN_START record
>> in the
>> +stream. A DOMAIN_START record just contains a domain-id:
>> +
>> +
>> +```
>> ++-------+-------+-------+-------+
>> +| DOMAIN_START                  |
>> ++-------------------------------+
>> +| domid         | pad           |
>> ++-------------------------------+
>> +```
> 
> There is some more information that might be useful here: mfn and
> remote_port. (based on the information that INTRODUCE needs)

Oh yes, indeed. And additionally we need something like SOCKET_START
with the file descriptor of a socket based connection, and a global
entry with the main socket file descriptor used for connecting.


Juergen