[PATCH] tools/xenstore: add error indicator to ring page

Juergen Gross posted 1 patch 2 years, 2 months ago
Test gitlab-ci passed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20220210111620.5256-1-jgross@suse.com
There is a newer version of this series
docs/misc/xenstore-ring.txt       | 29 +++++++++++++++++++++
tools/xenstore/xenstored_core.c   | 43 +++++++------------------------
tools/xenstore/xenstored_core.h   |  1 -
tools/xenstore/xenstored_domain.c | 34 +++++++++++++++++++++++-
tools/xenstore/xenstored_domain.h |  1 +
xen/include/public/io/xs_wire.h   |  9 +++++++
6 files changed, 82 insertions(+), 35 deletions(-)
[PATCH] tools/xenstore: add error indicator to ring page
Posted by Juergen Gross 2 years, 2 months ago
In case Xenstore is detecting a malicious ring page modification (e.g.
an invalid producer or consumer index set by a guest) it will ignore
the connection of that guest in future.

Add a new error field to the ring page indicating that case. Add a new
feature bit in order to signal the presence of that error field.

Move the ignore_connection() function to xenstored_domain.c in order
to be able to access the ring page for setting the error indicator.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 docs/misc/xenstore-ring.txt       | 29 +++++++++++++++++++++
 tools/xenstore/xenstored_core.c   | 43 +++++++------------------------
 tools/xenstore/xenstored_core.h   |  1 -
 tools/xenstore/xenstored_domain.c | 34 +++++++++++++++++++++++-
 tools/xenstore/xenstored_domain.h |  1 +
 xen/include/public/io/xs_wire.h   |  9 +++++++
 6 files changed, 82 insertions(+), 35 deletions(-)

diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
index 16b4d0f5ac..ec5b8eb4b9 100644
--- a/docs/misc/xenstore-ring.txt
+++ b/docs/misc/xenstore-ring.txt
@@ -22,6 +22,7 @@ Offset  Length  Description
 2060    4       Output producer offset
 2064    4       Server feature bitmap
 2068    4       Connection state
+2072    4       Connection error indicator
 
 The Input data and Output data are circular buffers. Each buffer is
 associated with a pair of free-running offsets labelled "consumer" and
@@ -66,6 +67,7 @@ The following features are defined:
 Mask    Description
 -----------------------------------------------------------------
 1       Ring reconnection (see the ring reconnection feature below)
+2       Connection error indicator (see connection error feature below)
 
 The "Connection state" field is used to request a ring close and reconnect.
 The "Connection state" field only contains valid data if the server has
@@ -78,6 +80,18 @@ Value   Description
 1       Ring close and reconnect is in progress (see the "ring
         reconnection feature" described below)
 
+The "Connection error indicator" is used to let the server indicate it has
+detected some error that led to deactivation of the connection by the server.
+If the feature has been advertised then the "Connection error indicator" may
+take the following values:
+
+Value   Description
+-----------------------------------------------------------------
+0       No error, connection is valid
+1       Communication problems (event channel not functional)
+2       Inconsistent producer or consumer offset
+3       Protocol violation (client data package too long)
+
 The ring reconnection feature
 =============================
 
@@ -114,3 +128,18 @@ packet boundary.
 
 Note that only the guest may set the Connection state to 1 and only the
 server may set it back to 0.
+
+The connection error feature
+============================
+
+The connection error feature allows the server to signal error conditions
+leading to a stop of the communication with the client. In case such an error
+condition has occurred, the server will set the appropriate error condition in
+the Connection error indicator and will stop communication with the client.
+
+The server will discard any already read or written packets, in-flight
+requests, watches and transactions associated with the connection.
+
+Depending on the error cause it might be possible that a reconnect via the
+ring reconnection feature (if present) can be performed. There is no guarantee
+this will succeed.
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 91d3adccb1..6e4022e5da 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1455,35 +1455,6 @@ static struct {
 	[XS_DIRECTORY_PART]    = { "DIRECTORY_PART",    send_directory_part },
 };
 
-/*
- * Keep the connection alive but stop processing any new request or sending
- * reponse. This is to allow sending @releaseDomain watch event at the correct
- * moment and/or to allow the connection to restart (not yet implemented).
- *
- * All watches, transactions, buffers will be freed.
- */
-void ignore_connection(struct connection *conn)
-{
-	struct buffered_data *out, *tmp;
-
-	trace("CONN %p ignored\n", conn);
-
-	conn->is_ignored = true;
-	conn_delete_all_watches(conn);
-	conn_delete_all_transactions(conn);
-
-	list_for_each_entry_safe(out, tmp, &conn->out_list, list) {
-		list_del(&out->list);
-		talloc_free(out);
-	}
-
-	talloc_free(conn->in);
-	conn->in = NULL;
-	/* if this is a socket connection, drop it now */
-	if (conn->fd >= 0)
-		talloc_free(conn);
-}
-
 static const char *sockmsg_string(enum xsd_sockmsg_type type)
 {
 	if ((unsigned int)type < ARRAY_SIZE(wire_funcs) && wire_funcs[type].str)
@@ -1598,6 +1569,7 @@ static void handle_input(struct connection *conn)
 {
 	int bytes;
 	struct buffered_data *in;
+	unsigned int err;
 
 	if (!conn->in) {
 		conn->in = new_buffer(conn);
@@ -1612,8 +1584,10 @@ static void handle_input(struct connection *conn)
 		if (in->used != sizeof(in->hdr)) {
 			bytes = conn->funcs->read(conn, in->hdr.raw + in->used,
 						  sizeof(in->hdr) - in->used);
-			if (bytes < 0)
+			if (bytes < 0) {
+				err = XENSTORE_ERROR_RINGIDX;
 				goto bad_client;
+			}
 			in->used += bytes;
 			if (in->used != sizeof(in->hdr))
 				return;
@@ -1621,6 +1595,7 @@ static void handle_input(struct connection *conn)
 			if (in->hdr.msg.len > XENSTORE_PAYLOAD_MAX) {
 				syslog(LOG_ERR, "Client tried to feed us %i",
 				       in->hdr.msg.len);
+				err = XENSTORE_ERROR_PROTO;
 				goto bad_client;
 			}
 		}
@@ -1638,8 +1613,10 @@ static void handle_input(struct connection *conn)
 
 	bytes = conn->funcs->read(conn, in->buffer + in->used,
 				  in->hdr.msg.len - in->used);
-	if (bytes < 0)
+	if (bytes < 0) {
+		err = XENSTORE_ERROR_RINGIDX;
 		goto bad_client;
+	}
 
 	in->used += bytes;
 	if (in->used != in->hdr.msg.len)
@@ -1649,14 +1626,14 @@ static void handle_input(struct connection *conn)
 	return;
 
 bad_client:
-	ignore_connection(conn);
+	ignore_connection(conn, err);
 }
 
 static void handle_output(struct connection *conn)
 {
 	/* Ignore the connection if an error occured */
 	if (!write_messages(conn))
-		ignore_connection(conn);
+		ignore_connection(conn, XENSTORE_ERROR_RINGIDX);
 }
 
 struct connection *new_connection(const struct interface_funcs *funcs)
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 190d2447cd..742812a974 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -206,7 +206,6 @@ struct node *read_node(struct connection *conn, const void *ctx,
 
 struct connection *new_connection(const struct interface_funcs *funcs);
 struct connection *get_connection_by_id(unsigned int conn_id);
-void ignore_connection(struct connection *conn);
 void check_store(void);
 void corrupt(struct connection *conn, const char *fmt, ...);
 
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index d03c7d93a9..ae065fcbee 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -427,6 +427,38 @@ static void domain_conn_reset(struct domain *domain)
 	domain->interface->rsp_cons = domain->interface->rsp_prod = 0;
 }
 
+/*
+ * Keep the connection alive but stop processing any new request or sending
+ * reponse. This is to allow sending @releaseDomain watch event at the correct
+ * moment and/or to allow the connection to restart (not yet implemented).
+ *
+ * All watches, transactions, buffers will be freed.
+ */
+void ignore_connection(struct connection *conn, unsigned int err)
+{
+	struct buffered_data *out, *tmp;
+
+	trace("CONN %p ignored, reason %u\n", conn, err);
+
+	if (conn->domain && conn->domain->interface)
+		conn->domain->interface->error = err;
+
+	conn->is_ignored = true;
+	conn_delete_all_watches(conn);
+	conn_delete_all_transactions(conn);
+
+	list_for_each_entry_safe(out, tmp, &conn->out_list, list) {
+		list_del(&out->list);
+		talloc_free(out);
+	}
+
+	talloc_free(conn->in);
+	conn->in = NULL;
+	/* if this is a socket connection, drop it now */
+	if (conn->fd >= 0)
+		talloc_free(conn);
+}
+
 static struct domain *introduce_domain(const void *ctx,
 				       unsigned int domid,
 				       evtchn_port_t port, bool restore)
@@ -1305,7 +1337,7 @@ void read_state_connection(const void *ctx, const void *state)
 		 * dead. So mark it as ignored.
 		 */
 		if (!domain->port || !domain->interface)
-			ignore_connection(conn);
+			ignore_connection(conn, XENSTORE_ERROR_COMM);
 
 		if (sc->spec.ring.tdomid != DOMID_INVALID) {
 			tdomain = find_or_alloc_domain(ctx,
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 1e929b8f8c..4a37de67a0 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -47,6 +47,7 @@ int do_reset_watches(struct connection *conn, struct buffered_data *in);
 void domain_init(int evtfd);
 void dom0_init(void);
 void domain_deinit(void);
+void ignore_connection(struct connection *conn, unsigned int err);
 
 /* Returns the implicit path of a connection (only domains have this) */
 const char *get_implicit_path(const struct connection *conn);
diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
index 4dd6632669..953a0050a3 100644
--- a/xen/include/public/io/xs_wire.h
+++ b/xen/include/public/io/xs_wire.h
@@ -124,6 +124,7 @@ struct xenstore_domain_interface {
     XENSTORE_RING_IDX rsp_cons, rsp_prod;
     uint32_t server_features; /* Bitmap of features supported by the server */
     uint32_t connection;
+    uint32_t error;
 };
 
 /* Violating this is very bad.  See docs/misc/xenstore.txt. */
@@ -135,11 +136,19 @@ struct xenstore_domain_interface {
 
 /* The ability to reconnect a ring */
 #define XENSTORE_SERVER_FEATURE_RECONNECTION 1
+/* The presence of the "error" field in the ring page */
+#define XENSTORE_SERVER_FEATURE_ERROR        2
 
 /* Valid values for the connection field */
 #define XENSTORE_CONNECTED 0 /* the steady-state */
 #define XENSTORE_RECONNECT 1 /* guest has initiated a reconnect */
 
+/* Valid values for the error field */
+#define XENSTORE_ERROR_NONE    0 /* No error */
+#define XENSTORE_ERROR_COMM    1 /* Communication problem */
+#define XENSTORE_ERROR_RINGIDX 2 /* Invalid ring index */
+#define XENSTORE_ERROR_PROTO   3 /* Protocol violation (payload too long) */
+
 #endif /* _XS_WIRE_H */
 
 /*
-- 
2.34.1


Re: [PATCH] tools/xenstore: add error indicator to ring page
Posted by Anthony PERARD 2 years, 2 months ago
On Thu, Feb 10, 2022 at 12:16:20PM +0100, Juergen Gross wrote:
> +The "Connection error indicator" is used to let the server indicate it has
> +detected some error that led to deactivation of the connection by the server.
> +If the feature has been advertised then the "Connection error indicator" may
> +take the following values:
> +
> +Value   Description
> +-----------------------------------------------------------------
> +0       No error, connection is valid
> +1       Communication problems (event channel not functional)
> +2       Inconsistent producer or consumer offset
> +3       Protocol violation (client data package too long)

Is this meant to be the only possible error value? If in the future we
want to add more possible error, does it going to need a new feature
bit and maybe a new error field?

Thanks,

-- 
Anthony PERARD

Re: [PATCH] tools/xenstore: add error indicator to ring page
Posted by Juergen Gross 2 years, 2 months ago
On 15.02.22 16:42, Anthony PERARD wrote:
> On Thu, Feb 10, 2022 at 12:16:20PM +0100, Juergen Gross wrote:
>> +The "Connection error indicator" is used to let the server indicate it has
>> +detected some error that led to deactivation of the connection by the server.
>> +If the feature has been advertised then the "Connection error indicator" may
>> +take the following values:
>> +
>> +Value   Description
>> +-----------------------------------------------------------------
>> +0       No error, connection is valid
>> +1       Communication problems (event channel not functional)
>> +2       Inconsistent producer or consumer offset
>> +3       Protocol violation (client data package too long)
> 
> Is this meant to be the only possible error value? If in the future we
> want to add more possible error, does it going to need a new feature
> bit and maybe a new error field?

No, as the guest is not opting into this feature, but just gets it
presented, there is no need to have another bit for new error values.

Note that this is a purely informational interface. The error value
(other than 0) is only for diagnostic purposes, there is no way a
guest could react in a sane way to a specific error case.


Juergen
Re: [PATCH] tools/xenstore: add error indicator to ring page
Posted by Anthony PERARD 2 years, 2 months ago
On Tue, Feb 15, 2022 at 04:45:28PM +0100, Juergen Gross wrote:
> On 15.02.22 16:42, Anthony PERARD wrote:
> > On Thu, Feb 10, 2022 at 12:16:20PM +0100, Juergen Gross wrote:
> > > +The "Connection error indicator" is used to let the server indicate it has
> > > +detected some error that led to deactivation of the connection by the server.
> > > +If the feature has been advertised then the "Connection error indicator" may
> > > +take the following values:
> > > +
> > > +Value   Description
> > > +-----------------------------------------------------------------
> > > +0       No error, connection is valid
> > > +1       Communication problems (event channel not functional)
> > > +2       Inconsistent producer or consumer offset
> > > +3       Protocol violation (client data package too long)
> > 
> > Is this meant to be the only possible error value? If in the future we
> > want to add more possible error, does it going to need a new feature
> > bit and maybe a new error field?
> 
> No, as the guest is not opting into this feature, but just gets it
> presented, there is no need to have another bit for new error values.

This probably needs to be spelled out in the documents that.

> Note that this is a purely informational interface. The error value
> (other than 0) is only for diagnostic purposes, there is no way a
> guest could react in a sane way to a specific error case.

This could also be spelled out in the document, in the new section "The
connection error feature", that a value other than 0 is a connection
error even if the error number isn't known to the client.

Thanks,

-- 
Anthony PERARD

Re: [PATCH] tools/xenstore: add error indicator to ring page
Posted by Juergen Gross 2 years, 2 months ago
On 17.02.22 11:56, Anthony PERARD wrote:
> On Tue, Feb 15, 2022 at 04:45:28PM +0100, Juergen Gross wrote:
>> On 15.02.22 16:42, Anthony PERARD wrote:
>>> On Thu, Feb 10, 2022 at 12:16:20PM +0100, Juergen Gross wrote:
>>>> +The "Connection error indicator" is used to let the server indicate it has
>>>> +detected some error that led to deactivation of the connection by the server.
>>>> +If the feature has been advertised then the "Connection error indicator" may
>>>> +take the following values:
>>>> +
>>>> +Value   Description
>>>> +-----------------------------------------------------------------
>>>> +0       No error, connection is valid
>>>> +1       Communication problems (event channel not functional)
>>>> +2       Inconsistent producer or consumer offset
>>>> +3       Protocol violation (client data package too long)
>>>
>>> Is this meant to be the only possible error value? If in the future we
>>> want to add more possible error, does it going to need a new feature
>>> bit and maybe a new error field?
>>
>> No, as the guest is not opting into this feature, but just gets it
>> presented, there is no need to have another bit for new error values.
> 
> This probably needs to be spelled out in the documents that.
> 
>> Note that this is a purely informational interface. The error value
>> (other than 0) is only for diagnostic purposes, there is no way a
>> guest could react in a sane way to a specific error case.
> 
> This could also be spelled out in the document, in the new section "The
> connection error feature", that a value other than 0 is a connection
> error even if the error number isn't known to the client.

Okay, will add something to the doc (for both issues).


Juergen