Xen Security Advisory 419 v2 (CVE-2022-42322,CVE-2022-42323) - Xenstore: Cooperating guests can create arbitrary numbers of nodes

Xen.org security team posted 1 patch 1 year, 6 months ago
Failed in applying to current master (apply log)
Xen Security Advisory 419 v2 (CVE-2022-42322,CVE-2022-42323) - Xenstore: Cooperating guests can create arbitrary numbers of nodes
Posted by Xen.org security team 1 year, 6 months ago
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

     Xen Security Advisory CVE-2022-42322,CVE-2022-42323 / XSA-419
                               version 2

   Xenstore: Cooperating guests can create arbitrary numbers of nodes

UPDATES IN VERSION 2
====================

Public release.

ISSUE DESCRIPTION
=================

Since the fix of XSA-322 any Xenstore node owned by a removed domain
will be modified to be owned by Dom0.  This will allow two malicious
guests working together to create an arbitrary number of Xenstore
nodes.

This is possible by domain A letting domain B write into domain A's
local Xenstore tree.  Domain B can then create many nodes and reboot.
The nodes created by domain B will now be owned by Dom0.  By repeating
this process over and over again an arbitrary number of nodes can be
created, as Dom0's number of nodes isn't limited by Xenstore quota.

IMPACT
======

Two malicious guests working together can drive xenstored into an
out of memory situation, resulting in a Denial of Service (DoS) of
xenstored.

This inhibits creation of new guests and changing the configuration of
already running guests.

VULNERABLE SYSTEMS
==================

All versions of Xen with the fix for XSA-322 are in principle vulnerable.

Both Xenstore implementations (C and Ocaml) are vulnerable.

MITIGATION
==========

There is no mitigation available.

CREDITS
=======

This issue was discovered by Jürgen Groß of SUSE.

RESOLUTION
==========

Applying the appropriate attached patch resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa419/xsa419-oxenstored.patch             xen-unstable
xsa419/xsa419-xenstored-??.patch           xen-unstable, Xen 4.16.x
xsa419/xsa419-4.15-oxenstored.patch        Xen 4.15.x
xsa419/xsa419-4.15-xenstored-??.patch      Xen 4.15.x
xsa419/xsa419-4.14-oxenstored.patch        Xen 4.14.x
xsa419/xsa419-4.14-xenstored-??.patch      Xen 4.14.x
xsa419/xsa419-4.13-oxenstored.patch        Xen 4.13.x
xsa419/xsa419-4.13-xenstored-??.patch      Xen 4.13.x

$ sha256sum xsa419* xsa419*/*
eaeb2a67accac70743cd9bed055b31bee2402600b7452f79da4bb969d7b5607f  xsa419.meta
34abd947ceaf1251afc81356a3ff374bc06c046651f5f9d0894d90c93295d1ca  xsa419/xsa419-4.13-oxenstored.patch
713eea1d9be7a5bef7a681a10648d2ea7db36c961cc8a9c55e147db14f59fbc2  xsa419/xsa419-4.13-xenstored-01.patch
d7b0369ee1c87a08783c0484ae5f62f1c61be9c405e6568085052867bb520b2a  xsa419/xsa419-4.13-xenstored-02.patch
f6e0cd7491d602db3a7ac9b9e94afb59c30bf8690cd116850d8eafc481f022a9  xsa419/xsa419-4.13-xenstored-03.patch
18daa2d6b9d243bfd81e221af9ae1d74cbc621614b78dc751bb6ccdba3d66451  xsa419/xsa419-4.14-oxenstored.patch
d631f12da2a8fcf674aeed33d0037bfff4b11587d6d52e4709739a8d1e90f33a  xsa419/xsa419-4.14-xenstored-01.patch
dc3834b30ac15d31ad1a13a8b4925229f13ce7955f2cc2223651764c55d41e64  xsa419/xsa419-4.14-xenstored-02.patch
f15d02bfc9ee5119347708fd2e4d26c6b4aa18827afab1a10b9139344ca88861  xsa419/xsa419-4.14-xenstored-03.patch
95c35f32cf64229df2768109acc360a6f6ec4ddfdcbde4f0d8165f67432d3eef  xsa419/xsa419-4.15-oxenstored.patch
773e98ee9ddb37e4a743d4435340066aabdc5fb41b6ff12e6b91c709484204ab  xsa419/xsa419-4.15-xenstored-01.patch
4d1f9135be43e121576909787a6403aa1c1e5fa72ead8764326e21beb48d83d4  xsa419/xsa419-4.15-xenstored-02.patch
484bdddae7a750cbddeddb93be5840e3cfdda5799f667c6b5d66c3c9b7217d55  xsa419/xsa419-4.15-xenstored-03.patch
1c790ddc8cbabb32012c7636c46e017b0cbdd1cc23c56baabda4d5dca9531454  xsa419/xsa419-oxenstored.patch
3c53e103f7927ae28ab5c7a3954c7d0a6fbbdce0340816936adb5938cd48c776  xsa419/xsa419-xenstored-01.patch
978e3100b20e0126ee238d3e1c2036b25582b1c3333a028e120d700bac8d2a13  xsa419/xsa419-xenstored-02.patch
57f7015289a940e7f2dc66aedb1c04d37d0aef687a7b91453582e960b7f93076  xsa419/xsa419-xenstored-03.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of patches or mitigations is NOT permitted (except where
all the affected systems and VMs are administered and used only by
organisations which are members of the Xen Project Security Issues
Predisclosure List).  Specifically, deployment on public cloud systems
is NOT permitted.

This is because the patches will result in a guest visible change of
behavior of Xenstore.

Deployment is permitted only AFTER the embargo ends.


(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmNg+64MHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZR4gIAI+TcWgMKtaJ6G6MeakBbgxliMCO7+C01+94H6ZH
7dC57n3Qm12t3q6WGnPG1YYzKGWT2hsSU8/JtIkuZFe2qyvuG5cVhVcrdOGGkhsZ
4ui517R76Ldb/cBtraX6QRJni+T58ZdQGAChr6KuD8cyMgXAl1gtto9+/rQtsDzr
7XxKcR/+CcNWAOpZTNJ6DtS8V1RuRNtMSuoTCpC3Ph+Edir05bPlz6BF8EnV0cqU
+Jk9nYHGg5H11L0K+yx4cIAWfaP/n/Z2AoND23tV3T4o3U3zQR9xNBvcY1zuvioG
rEfzGNQx9ECQLHYPTzHNNWs+9Fb8eHyiRvkS6SKGolxpj9A=
=nNvr
-----END PGP SIGNATURE-----
{
  "XSA": 419,
  "SupportedVersions": [
    "master",
    "4.16",
    "4.15",
    "4.14",
    "4.13"
  ],
  "Trees": [
    "xen"
  ],
  "Recipes": {
    "4.13": {
      "Recipes": {
        "xen": {
          "StableRef": "0be63c2615b268001f7cc9b72ce25eed952737dc",
          "Prereqs": [
            414,
            415,
            326,
            416,
            417,
            418
          ],
          "Patches": [
            "xsa419/xsa419-4.13-xenstored-??.patch",
            "xsa419/xsa419-4.13-oxenstored.patch"
          ]
        }
      }
    },
    "4.14": {
      "Recipes": {
        "xen": {
          "StableRef": "016de62747b26ead5a5c763b640fe8e205cd182b",
          "Prereqs": [
            414,
            415,
            326,
            416,
            417,
            418
          ],
          "Patches": [
            "xsa419/xsa419-4.14-xenstored-??.patch",
            "xsa419/xsa419-4.14-oxenstored.patch"
          ]
        }
      }
    },
    "4.15": {
      "Recipes": {
        "xen": {
          "StableRef": "816580afdd1730d4f85f64477a242a439af1cdf8",
          "Prereqs": [
            414,
            415,
            326,
            416,
            417,
            418
          ],
          "Patches": [
            "xsa419/xsa419-4.15-xenstored-??.patch",
            "xsa419/xsa419-4.15-oxenstored.patch"
          ]
        }
      }
    },
    "4.16": {
      "Recipes": {
        "xen": {
          "StableRef": "1bce7fb1f702da4f7a749c6f1457ecb20bf74fca",
          "Prereqs": [
            412,
            414,
            415,
            326,
            416,
            417,
            418
          ],
          "Patches": [
            "xsa419/xsa419-xenstored-??.patch",
            "xsa419/xsa419-oxenstored.patch"
          ]
        }
      }
    },
    "master": {
      "Recipes": {
        "xen": {
          "StableRef": "cc4747be8ba157a3b310921e9ee07fb8545aa206",
          "Prereqs": [
            412,
            414,
            415,
            326,
            416,
            417,
            418
          ],
          "Patches": [
            "xsa419/xsa419-xenstored-??.patch",
            "xsa419/xsa419-oxenstored.patch"
          ]
        }
      }
    }
  }
}From a21dcd5dedc7af0e73395e8bd6af3dcdef1dc9e5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
Date: Wed, 12 Oct 2022 19:13:06 +0100
Subject: tools/ocaml/xenstored: Fix quota bypass on domain shutdown
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

XSA-322 fixed a domid reuse vulnerability by assigning Dom0 as the owner of
any nodes left after a domain is shutdown (e.g. outside its /local/domain/N
tree).

However Dom0 has no quota on purpose, so this opened up another potential
attack vector. Avoid it by deleting these nodes instead of assigning them to
Dom0.

This is part of XSA-419 / CVE-2022-42323.

Reported-by: Juergen Gross <jgross@suse.com>
Fixes: c46eff921209 ("tools/ocaml/xenstored: clean up permissions for dead domains")
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

diff --git a/tools/ocaml/xenstored/perms.ml b/tools/ocaml/xenstored/perms.ml
index e8a16221f8fa..84f2503e8e29 100644
--- a/tools/ocaml/xenstored/perms.ml
+++ b/tools/ocaml/xenstored/perms.ml
@@ -64,8 +64,7 @@ let get_owner perm = perm.owner
 * *)
 let remove_domid ~domid perm =
 	let acl = List.filter (fun (acl_domid, _) -> acl_domid <> domid) perm.acl in
-	let owner = if perm.owner = domid then 0 else perm.owner in
-	{ perm with acl; owner }
+	if perm.owner = domid then None else Some { perm with acl; owner = perm.owner }
 
 let default0 = create 0 NONE []
 
diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml
index 328d3a51984f..d82764f60fb4 100644
--- a/tools/ocaml/xenstored/store.ml
+++ b/tools/ocaml/xenstored/store.ml
@@ -89,10 +89,21 @@ let check_owner node connection =
 
 let rec recurse fct node = fct node; List.iter (recurse fct) node.children
 
-(** [recurse_map f tree] applies [f] on each node in the tree recursively *)
-let recurse_map f =
+(** [recurse_filter_map f tree] applies [f] on each node in the tree recursively,
+    possibly removing some nodes.
+    Note that the nodes removed this way won't generate watch events.
+*)
+let recurse_filter_map f =
+	let invalid = -1 in
+	let is_valid node = node.perms.owner <> invalid in
 	let rec walk node =
-		f { node with children = List.rev_map walk node.children |> List.rev }
+		(* Map.filter_map is Ocaml 4.11+ only *)
+		let node =
+		{ node with children =
+			List.rev_map walk node.children |> List.filter is_valid |> List.rev } in
+		match f node with
+		| Some keep -> keep
+		| None -> { node with perms = {node.perms with owner = invalid } }
 	in
 	walk
 
@@ -446,11 +457,13 @@ let setperms store perm path nperms =
 
 let reset_permissions store domid =
 	Logging.info "store|node" "Cleaning up xenstore ACLs for domid %d" domid;
-	store.root <- Node.recurse_map (fun node ->
-		let perms = Perms.Node.remove_domid ~domid node.perms in
-		if perms <> node.perms then
-			Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node);
-		{ node with perms }
+	store.root <- Node.recurse_filter_map (fun node ->
+		match Perms.Node.remove_domid ~domid node.perms with
+		| None -> None
+		| Some perms ->
+			if perms <> node.perms then
+				Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node);
+			Some { node with perms }
 	) store.root
 
 type ops = {
From 104afb886f1d9243f99e0fd998d7c898bc834831 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: remove nodes owned by destroyed domain

In case a domain is removed from Xenstore, remove all nodes owned by
it per default.

This tackles the problem that nodes might be created by a domain
outside its home path in Xenstore, leading to Xenstore hogging more
and more memory. Domain quota don't work in this case if the guest is
rebooting in between.

Since XSA-322 ownership of such stale nodes is transferred to dom0,
which is helping against unintended access, but not against OOM of
Xenstore.

As a fallback for weird cases add a Xenstore start parameter for
keeping today's way to handle stale nodes, adding the risk of Xenstore
hitting an OOM situation.

This is part of XSA-419 / CVE-2022-42322.

Reported-by: Juergen Gross <jgross@suse.com>
Fixes: 496306324d8d ("tools/xenstore: revoke access rights for removed domains")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 9655af4c01fa..7eb698e1fc22 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -76,6 +76,7 @@ static bool verbose = false;
 LIST_HEAD(connections);
 int tracefd = -1;
 static bool recovery = true;
+bool keep_orphans = false;
 static int reopen_log_pipe[2];
 static int reopen_log_pipe0_pollfd_idx = -1;
 char *tracefile = NULL;
@@ -662,7 +663,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
 	node->perms.p = hdr->perms;
 	node->acc.domid = node->perms.p[0].id;
 	node->acc.memory = data.dsize;
-	if (domain_adjust_node_perms(conn, node))
+	if (domain_adjust_node_perms(node))
 		goto error;
 
 	/* If owner is gone reset currently accounted memory size. */
@@ -705,7 +706,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
 	void *p;
 	struct xs_tdb_record_hdr *hdr;
 
-	if (domain_adjust_node_perms(conn, node))
+	if (domain_adjust_node_perms(node))
 		return errno;
 
 	data.dsize = sizeof(*hdr)
@@ -1546,7 +1547,7 @@ static int delnode_sub(const void *ctx, struct connection *conn,
 	return WALK_TREE_RM_CHILDENTRY;
 }
 
-static int _rm(struct connection *conn, const void *ctx, const char *name)
+int rm_node(struct connection *conn, const void *ctx, const char *name)
 {
 	struct node *parent;
 	char *parentname = get_parent(ctx, name);
@@ -1608,7 +1609,7 @@ static int do_rm(const void *ctx, struct connection *conn,
 	if (streq(name, "/"))
 		return EINVAL;
 
-	ret = _rm(conn, ctx, name);
+	ret = rm_node(conn, ctx, name);
 	if (ret)
 		return ret;
 
@@ -2461,6 +2462,8 @@ static void usage(void)
 "  -R, --no-recovery       to request that no recovery should be attempted when\n"
 "                          the store is corrupted (debug only),\n"
 "  -I, --internal-db       store database in memory, not on disk\n"
+"  -K, --keep-orphans      don't delete nodes owned by a domain when the\n"
+"                          domain is deleted (this is a security risk!)\n"
 "  -V, --verbose           to request verbose execution.\n");
 }
 
@@ -2484,6 +2487,7 @@ static struct option options[] = {
 	{ "timeout", 1, NULL, 'w' },
 	{ "no-recovery", 0, NULL, 'R' },
 	{ "internal-db", 0, NULL, 'I' },
+	{ "keep-orphans", 0, NULL, 'K' },
 	{ "verbose", 0, NULL, 'V' },
 	{ "watch-nb", 1, NULL, 'W' },
 	{ NULL, 0, NULL, 0 } };
@@ -2558,7 +2562,7 @@ int main(int argc, char *argv[])
 	int timeout;
 
 
-	while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:Q:q:T:RVW:w:", options,
+	while ((opt = getopt_long(argc, argv, "DE:F:HKNPS:t:A:Q:q:T:RVW:w:", options,
 				  NULL)) != -1) {
 		switch (opt) {
 		case 'D':
@@ -2594,6 +2598,9 @@ int main(int argc, char *argv[])
 		case 'I':
 			tdb_flags = TDB_INTERNAL|TDB_NOLOCK;
 			break;
+		case 'K':
+			keep_orphans = true;
+			break;
 		case 'V':
 			verbose = true;
 			break;
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index fc9882ac37d5..ec24c27ac240 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -204,6 +204,9 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
 struct node *read_node(struct connection *conn, const void *ctx,
 		       const char *name);
 
+/* Remove a node and its children. */
+int rm_node(struct connection *conn, const void *ctx, const char *name);
+
 void setup_structure(void);
 struct connection *new_connection(connwritefn_t *write, connreadfn_t *read);
 void check_store(void);
@@ -242,6 +245,7 @@ extern int quota_req_outstanding;
 extern int quota_trans_nodes;
 extern int quota_memory_per_domain_soft;
 extern int quota_memory_per_domain_hard;
+extern bool keep_orphans;
 
 extern unsigned int timeout_watch_event_msec;
 
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 4a0e6a05b8cb..ca639cf433cf 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -204,10 +204,65 @@ static void unmap_interface(void *interface)
 	xengnttab_unmap(*xgt_handle, interface, 1);
 }
 
+static int domain_tree_remove_sub(const void *ctx, struct connection *conn,
+				  struct node *node, void *arg)
+{
+	struct domain *domain = arg;
+	TDB_DATA key;
+	int ret = WALK_TREE_OK;
+
+	if (node->perms.p[0].id != domain->domid)
+		return WALK_TREE_OK;
+
+	if (keep_orphans) {
+		key.dptr = (char *)node->name;
+		key.dsize = strlen(node->name);
+		domain->nbentry--;
+		node->perms.p[0].id = priv_domid;
+		node->acc.memory = 0;
+		domain_entry_inc(NULL, node);
+		if (write_node_raw(NULL, &key, node, true)) {
+			/* That's unfortunate. We only can try to continue. */
+			syslog(LOG_ERR,
+			       "error when moving orphaned node %s to dom0\n",
+			       node->name);
+		} else
+			trace("orphaned node %s moved to dom0\n", node->name);
+	} else {
+		if (rm_node(NULL, ctx, node->name)) {
+			/* That's unfortunate. We only can try to continue. */
+			syslog(LOG_ERR,
+			       "error when deleting orphaned node %s\n",
+			       node->name);
+		} else
+			trace("orphaned node %s deleted\n", node->name);
+
+		/* Skip children in all cases in order to avoid more errors. */
+		ret = WALK_TREE_SKIP_CHILDREN;
+	}
+
+	return domain->nbentry > 0 ? ret : WALK_TREE_SUCCESS_STOP;
+}
+
+static void domain_tree_remove(struct domain *domain)
+{
+	int ret;
+	struct walk_funcs walkfuncs = { .enter = domain_tree_remove_sub };
+
+	if (domain->nbentry > 0) {
+		ret = walk_node_tree(domain, NULL, "/", &walkfuncs, domain);
+		if (ret == WALK_TREE_ERROR_STOP)
+			syslog(LOG_ERR,
+			       "error when looking for orphaned nodes\n");
+	}
+}
+
 static int destroy_domain(void *_domain)
 {
 	struct domain *domain = _domain;
 
+	domain_tree_remove(domain);
+
 	list_del(&domain->list);
 
 	if (!domain->introduced)
@@ -845,15 +900,15 @@ int domain_entry_inc(struct connection *conn, struct node *node)
 	struct domain *d;
 	unsigned int domid;
 
-	if (!conn)
+	if (!node->perms.p)
 		return 0;
 
-	domid = node->perms.p ? node->perms.p[0].id : conn->id;
+	domid = node->perms.p[0].id;
 
-	if (conn->transaction) {
+	if (conn && conn->transaction) {
 		transaction_entry_inc(conn->transaction, domid);
 	} else {
-		d = (domid == conn->id && conn->domain) ? conn->domain
+		d = (conn && domid == conn->id && conn->domain) ? conn->domain
 		    : find_or_alloc_existing_domain(domid);
 		if (d)
 			d->nbentry++;
@@ -914,23 +969,11 @@ int domain_alloc_permrefs(struct node_perms *perms)
  * Remove permissions for no longer existing domains in order to avoid a new
  * domain with the same domid inheriting the permissions.
  */
-int domain_adjust_node_perms(struct connection *conn, struct node *node)
+int domain_adjust_node_perms(struct node *node)
 {
 	unsigned int i;
 	int ret;
 
-	ret = chk_domain_generation(node->perms.p[0].id, node->generation);
-
-	/* If the owner doesn't exist any longer give it to priv domain. */
-	if (!ret) {
-		/*
-		 * In theory we'd need to update the number of dom0 nodes here,
-		 * but we could be called for a read of the node. So better
-		 * avoid the risk to overflow the node count of dom0.
-		 */
-		node->perms.p[0].id = priv_domid;
-	}
-
 	for (i = 1; i < node->perms.num; i++) {
 		if (node->perms.p[i].perms & XS_PERM_IGNORE)
 			continue;
@@ -948,15 +991,15 @@ void domain_entry_dec(struct connection *conn, struct node *node)
 	struct domain *d;
 	unsigned int domid;
 
-	if (!conn)
+	if (!node->perms.p)
 		return;
 
 	domid = node->perms.p ? node->perms.p[0].id : conn->id;
 
-	if (conn->transaction) {
+	if (conn && conn->transaction) {
 		transaction_entry_dec(conn->transaction, domid);
 	} else {
-		d = (domid == conn->id && conn->domain) ? conn->domain
+		d = (conn && domid == conn->id && conn->domain) ? conn->domain
 		    : find_domain_struct(domid);
 		if (d) {
 			d->nbentry--;
@@ -1075,7 +1118,7 @@ int domain_memory_add(unsigned int domid, int mem, bool no_quota_check)
 		 * exist, as accounting is done either for a domain related to
 		 * the current connection, or for the domain owning a node
 		 * (which is always existing, as the owner of the node is
-		 * tested to exist and replaced by domid 0 if not).
+		 * tested to exist and deleted or replaced by domid 0 if not).
 		 * So not finding the related domain MUST be an error in the
 		 * data base.
 		 */
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index bab405209e2a..5bd253395d77 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -64,7 +64,7 @@ bool domain_can_write(struct connection *conn);
 bool domain_is_unprivileged(struct connection *conn);
 
 /* Remove node permissions for no longer existing domains. */
-int domain_adjust_node_perms(struct connection *conn, struct node *node);
+int domain_adjust_node_perms(struct node *node);
 int domain_alloc_permrefs(struct node_perms *perms);
 
 /* Quota manipulation */
From 656848a5dc314d830038df2ac6c86e42a6a89557 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:13 +0200
Subject: tools/xenstore: make the internal memory data base the default

Having a file backed data base has the only advantage of being capable
to dump the contents of it while Xenstore is running, and potentially
using less swap space in case the data base can't be kept in memory.

It has the major disadvantage of a huge performance overhead: switching
to keep the data base in memory only speeds up live update of xenstored
with 120000 nodes from 20 minutes to 11 seconds. A complete tree walk
of this configuration will be reduced from 7 seconds to 280 msecs
(measured by "xenstore-control check").

So make the internal memory data base the default and enhance the
"--internal-db" command line parameter to take an optional parameter
allowing to switch the internal data base back to the file based one.

This is part of XSA-419.

Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/helpers/init-xenstore-domain.c b/tools/helpers/init-xenstore-domain.c
index adb8408b63c2..b4d7ed573ecc 100644
--- a/tools/helpers/init-xenstore-domain.c
+++ b/tools/helpers/init-xenstore-domain.c
@@ -128,9 +128,9 @@ static int build(xc_interface *xch)
     }
 
     if ( param )
-        snprintf(cmdline, 512, "--event %d --internal-db %s", rv, param);
+        snprintf(cmdline, 512, "--event %d %s", rv, param);
     else
-        snprintf(cmdline, 512, "--event %d --internal-db", rv);
+        snprintf(cmdline, 512, "--event %d", rv);
 
     dom = xc_dom_allocate(xch, cmdline, NULL);
     rv = xc_dom_kernel_file(dom, kernel);
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 7eb698e1fc22..683554ff2cde 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2132,7 +2132,7 @@ static void accept_connection(int sock, bool canwrite)
 }
 #endif
 
-static int tdb_flags;
+static int tdb_flags = TDB_INTERNAL | TDB_NOLOCK;
 
 /* We create initial nodes manually. */
 static void manual_node(const char *name, const char *child)
@@ -2461,7 +2461,8 @@ static void usage(void)
 "                          watch-event: time a watch-event is kept pending\n"
 "  -R, --no-recovery       to request that no recovery should be attempted when\n"
 "                          the store is corrupted (debug only),\n"
-"  -I, --internal-db       store database in memory, not on disk\n"
+"  -I, --internal-db [on|off] store database in memory, not on disk, default is\n"
+"                          memory, with \"--internal-db off\" it is on disk\n"
 "  -K, --keep-orphans      don't delete nodes owned by a domain when the\n"
 "                          domain is deleted (this is a security risk!)\n"
 "  -V, --verbose           to request verbose execution.\n");
@@ -2486,7 +2487,7 @@ static struct option options[] = {
 	{ "quota-soft", 1, NULL, 'q' },
 	{ "timeout", 1, NULL, 'w' },
 	{ "no-recovery", 0, NULL, 'R' },
-	{ "internal-db", 0, NULL, 'I' },
+	{ "internal-db", 2, NULL, 'I' },
 	{ "keep-orphans", 0, NULL, 'K' },
 	{ "verbose", 0, NULL, 'V' },
 	{ "watch-nb", 1, NULL, 'W' },
@@ -2562,7 +2563,8 @@ int main(int argc, char *argv[])
 	int timeout;
 
 
-	while ((opt = getopt_long(argc, argv, "DE:F:HKNPS:t:A:Q:q:T:RVW:w:", options,
+	while ((opt = getopt_long(argc, argv,
+				  "DE:F:HI::KNPS:t:A:Q:q:T:RVW:w:", options,
 				  NULL)) != -1) {
 		switch (opt) {
 		case 'D':
@@ -2596,7 +2598,8 @@ int main(int argc, char *argv[])
 			tracefile = optarg;
 			break;
 		case 'I':
-			tdb_flags = TDB_INTERNAL|TDB_NOLOCK;
+			if (optarg && !strcmp(optarg, "off"))
+				tdb_flags = 0;
 			break;
 		case 'K':
 			keep_orphans = true;
From da4e33921c48ba5f7b206e3034b21db5586a6900 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:13 +0200
Subject: docs: enhance xenstore.txt with permissions description
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The permission scheme of Xenstore nodes is not really covered by
docs/misc/xenstore.txt, other than referring to the Xen wiki.

Add a paragraph explaining the permissions of nodes, and especially
mentioning removal of nodes when a domain has been removed from
Xenstore.

This is part of XSA-419.

Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>

diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index 0dbac442d79d..d5084bcbaa01 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -43,6 +43,17 @@ bytes are forbidden; clients specifying relative paths should keep
 them to within 2048 bytes.  (See XENSTORE_*_PATH_MAX in xs_wire.h.)
 
 
+Each node has one or multiple permission entries.  Permissions are
+granted by domain-id, the first permission entry of each node specifies
+the owner of the node.  Permissions of a node can be changed by the
+owner of the node, the owner can only be modified by the control
+domain (usually domain id 0).  The owner always has the right to read
+and write the node, while other permissions can be setup to allow
+read and/or write access.  When a domain is being removed from Xenstore
+nodes owned by that domain will be removed together with all of those
+nodes' children.
+
+
 Communication with xenstore is via either sockets, or event channel
 and shared memory, as specified in io/xs_wire.h: each message in
 either direction is a header formatted as a struct xsd_sockmsg
From 2d6fe43936ab00bef9a84b0a60b1546731433920 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
Date: Wed, 12 Oct 2022 19:13:06 +0100
Subject: tools/ocaml/xenstored: Fix quota bypass on domain shutdown
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

XSA-322 fixed a domid reuse vulnerability by assigning Dom0 as the owner of
any nodes left after a domain is shutdown (e.g. outside its /local/domain/N
tree).

However Dom0 has no quota on purpose, so this opened up another potential
attack vector. Avoid it by deleting these nodes instead of assigning them to
Dom0.

This is part of XSA-419 / CVE-2022-42323.

Reported-by: Juergen Gross <jgross@suse.com>
Fixes: c46eff921209 ("tools/ocaml/xenstored: clean up permissions for dead domains")
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

diff --git a/tools/ocaml/xenstored/perms.ml b/tools/ocaml/xenstored/perms.ml
index e8a16221f8fa..84f2503e8e29 100644
--- a/tools/ocaml/xenstored/perms.ml
+++ b/tools/ocaml/xenstored/perms.ml
@@ -64,8 +64,7 @@ let get_owner perm = perm.owner
 * *)
 let remove_domid ~domid perm =
 	let acl = List.filter (fun (acl_domid, _) -> acl_domid <> domid) perm.acl in
-	let owner = if perm.owner = domid then 0 else perm.owner in
-	{ perm with acl; owner }
+	if perm.owner = domid then None else Some { perm with acl; owner = perm.owner }
 
 let default0 = create 0 NONE []
 
diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml
index 328d3a51984f..d82764f60fb4 100644
--- a/tools/ocaml/xenstored/store.ml
+++ b/tools/ocaml/xenstored/store.ml
@@ -89,10 +89,21 @@ let check_owner node connection =
 
 let rec recurse fct node = fct node; List.iter (recurse fct) node.children
 
-(** [recurse_map f tree] applies [f] on each node in the tree recursively *)
-let recurse_map f =
+(** [recurse_filter_map f tree] applies [f] on each node in the tree recursively,
+    possibly removing some nodes.
+    Note that the nodes removed this way won't generate watch events.
+*)
+let recurse_filter_map f =
+	let invalid = -1 in
+	let is_valid node = node.perms.owner <> invalid in
 	let rec walk node =
-		f { node with children = List.rev_map walk node.children |> List.rev }
+		(* Map.filter_map is Ocaml 4.11+ only *)
+		let node =
+		{ node with children =
+			List.rev_map walk node.children |> List.filter is_valid |> List.rev } in
+		match f node with
+		| Some keep -> keep
+		| None -> { node with perms = {node.perms with owner = invalid } }
 	in
 	walk
 
@@ -446,11 +457,13 @@ let setperms store perm path nperms =
 
 let reset_permissions store domid =
 	Logging.info "store|node" "Cleaning up xenstore ACLs for domid %d" domid;
-	store.root <- Node.recurse_map (fun node ->
-		let perms = Perms.Node.remove_domid ~domid node.perms in
-		if perms <> node.perms then
-			Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node);
-		{ node with perms }
+	store.root <- Node.recurse_filter_map (fun node ->
+		match Perms.Node.remove_domid ~domid node.perms with
+		| None -> None
+		| Some perms ->
+			if perms <> node.perms then
+				Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node);
+			Some { node with perms }
 	) store.root
 
 type ops = {
From 54f77772e8afcc15e3a384216641eabbe9a08da8 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: remove nodes owned by destroyed domain

In case a domain is removed from Xenstore, remove all nodes owned by
it per default.

This tackles the problem that nodes might be created by a domain
outside its home path in Xenstore, leading to Xenstore hogging more
and more memory. Domain quota don't work in this case if the guest is
rebooting in between.

Since XSA-322 ownership of such stale nodes is transferred to dom0,
which is helping against unintended access, but not against OOM of
Xenstore.

As a fallback for weird cases add a Xenstore start parameter for
keeping today's way to handle stale nodes, adding the risk of Xenstore
hitting an OOM situation.

This is part of XSA-419 / CVE-2022-42322.

Reported-by: Juergen Gross <jgross@suse.com>
Fixes: 496306324d8d ("tools/xenstore: revoke access rights for removed domains")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 5bb7b8521324..eb1d7c2b3b9b 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -79,6 +79,7 @@ static bool verbose = false;
 LIST_HEAD(connections);
 int tracefd = -1;
 static bool recovery = true;
+bool keep_orphans = false;
 static int reopen_log_pipe[2];
 static int reopen_log_pipe0_pollfd_idx = -1;
 char *tracefile = NULL;
@@ -664,7 +665,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
 	node->perms.p = hdr->perms;
 	node->acc.domid = node->perms.p[0].id;
 	node->acc.memory = data.dsize;
-	if (domain_adjust_node_perms(conn, node))
+	if (domain_adjust_node_perms(node))
 		goto error;
 
 	/* If owner is gone reset currently accounted memory size. */
@@ -707,7 +708,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
 	void *p;
 	struct xs_tdb_record_hdr *hdr;
 
-	if (domain_adjust_node_perms(conn, node))
+	if (domain_adjust_node_perms(node))
 		return errno;
 
 	data.dsize = sizeof(*hdr)
@@ -1548,7 +1549,7 @@ static int delnode_sub(const void *ctx, struct connection *conn,
 	return WALK_TREE_RM_CHILDENTRY;
 }
 
-static int _rm(struct connection *conn, const void *ctx, const char *name)
+int rm_node(struct connection *conn, const void *ctx, const char *name)
 {
 	struct node *parent;
 	char *parentname = get_parent(ctx, name);
@@ -1610,7 +1611,7 @@ static int do_rm(const void *ctx, struct connection *conn,
 	if (streq(name, "/"))
 		return EINVAL;
 
-	ret = _rm(conn, ctx, name);
+	ret = rm_node(conn, ctx, name);
 	if (ret)
 		return ret;
 
@@ -2446,6 +2447,8 @@ static void usage(void)
 "  -R, --no-recovery       to request that no recovery should be attempted when\n"
 "                          the store is corrupted (debug only),\n"
 "  -I, --internal-db       store database in memory, not on disk\n"
+"  -K, --keep-orphans      don't delete nodes owned by a domain when the\n"
+"                          domain is deleted (this is a security risk!)\n"
 "  -V, --verbose           to request verbose execution.\n");
 }
 
@@ -2469,6 +2472,7 @@ static struct option options[] = {
 	{ "timeout", 1, NULL, 'w' },
 	{ "no-recovery", 0, NULL, 'R' },
 	{ "internal-db", 0, NULL, 'I' },
+	{ "keep-orphans", 0, NULL, 'K' },
 	{ "verbose", 0, NULL, 'V' },
 	{ "watch-nb", 1, NULL, 'W' },
 	{ NULL, 0, NULL, 0 } };
@@ -2543,7 +2547,7 @@ int main(int argc, char *argv[])
 	int timeout;
 
 
-	while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:Q:q:T:RVW:w:", options,
+	while ((opt = getopt_long(argc, argv, "DE:F:HKNPS:t:A:Q:q:T:RVW:w:", options,
 				  NULL)) != -1) {
 		switch (opt) {
 		case 'D':
@@ -2579,6 +2583,9 @@ int main(int argc, char *argv[])
 		case 'I':
 			tdb_flags = TDB_INTERNAL|TDB_NOLOCK;
 			break;
+		case 'K':
+			keep_orphans = true;
+			break;
 		case 'V':
 			verbose = true;
 			break;
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index fc9882ac37d5..ec24c27ac240 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -204,6 +204,9 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
 struct node *read_node(struct connection *conn, const void *ctx,
 		       const char *name);
 
+/* Remove a node and its children. */
+int rm_node(struct connection *conn, const void *ctx, const char *name);
+
 void setup_structure(void);
 struct connection *new_connection(connwritefn_t *write, connreadfn_t *read);
 void check_store(void);
@@ -242,6 +245,7 @@ extern int quota_req_outstanding;
 extern int quota_trans_nodes;
 extern int quota_memory_per_domain_soft;
 extern int quota_memory_per_domain_hard;
+extern bool keep_orphans;
 
 extern unsigned int timeout_watch_event_msec;
 
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 1c29f6854119..9e63ec23a922 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -200,10 +200,65 @@ static void unmap_interface(void *interface)
 	xengnttab_unmap(*xgt_handle, interface, 1);
 }
 
+static int domain_tree_remove_sub(const void *ctx, struct connection *conn,
+				  struct node *node, void *arg)
+{
+	struct domain *domain = arg;
+	TDB_DATA key;
+	int ret = WALK_TREE_OK;
+
+	if (node->perms.p[0].id != domain->domid)
+		return WALK_TREE_OK;
+
+	if (keep_orphans) {
+		key.dptr = (char *)node->name;
+		key.dsize = strlen(node->name);
+		domain->nbentry--;
+		node->perms.p[0].id = priv_domid;
+		node->acc.memory = 0;
+		domain_entry_inc(NULL, node);
+		if (write_node_raw(NULL, &key, node, true)) {
+			/* That's unfortunate. We only can try to continue. */
+			syslog(LOG_ERR,
+			       "error when moving orphaned node %s to dom0\n",
+			       node->name);
+		} else
+			trace("orphaned node %s moved to dom0\n", node->name);
+	} else {
+		if (rm_node(NULL, ctx, node->name)) {
+			/* That's unfortunate. We only can try to continue. */
+			syslog(LOG_ERR,
+			       "error when deleting orphaned node %s\n",
+			       node->name);
+		} else
+			trace("orphaned node %s deleted\n", node->name);
+
+		/* Skip children in all cases in order to avoid more errors. */
+		ret = WALK_TREE_SKIP_CHILDREN;
+	}
+
+	return domain->nbentry > 0 ? ret : WALK_TREE_SUCCESS_STOP;
+}
+
+static void domain_tree_remove(struct domain *domain)
+{
+	int ret;
+	struct walk_funcs walkfuncs = { .enter = domain_tree_remove_sub };
+
+	if (domain->nbentry > 0) {
+		ret = walk_node_tree(domain, NULL, "/", &walkfuncs, domain);
+		if (ret == WALK_TREE_ERROR_STOP)
+			syslog(LOG_ERR,
+			       "error when looking for orphaned nodes\n");
+	}
+}
+
 static int destroy_domain(void *_domain)
 {
 	struct domain *domain = _domain;
 
+	domain_tree_remove(domain);
+
 	list_del(&domain->list);
 
 	if (!domain->introduced)
@@ -838,15 +893,15 @@ int domain_entry_inc(struct connection *conn, struct node *node)
 	struct domain *d;
 	unsigned int domid;
 
-	if (!conn)
+	if (!node->perms.p)
 		return 0;
 
-	domid = node->perms.p ? node->perms.p[0].id : conn->id;
+	domid = node->perms.p[0].id;
 
-	if (conn->transaction) {
+	if (conn && conn->transaction) {
 		transaction_entry_inc(conn->transaction, domid);
 	} else {
-		d = (domid == conn->id && conn->domain) ? conn->domain
+		d = (conn && domid == conn->id && conn->domain) ? conn->domain
 		    : find_or_alloc_existing_domain(domid);
 		if (d)
 			d->nbentry++;
@@ -907,23 +962,11 @@ int domain_alloc_permrefs(struct node_perms *perms)
  * Remove permissions for no longer existing domains in order to avoid a new
  * domain with the same domid inheriting the permissions.
  */
-int domain_adjust_node_perms(struct connection *conn, struct node *node)
+int domain_adjust_node_perms(struct node *node)
 {
 	unsigned int i;
 	int ret;
 
-	ret = chk_domain_generation(node->perms.p[0].id, node->generation);
-
-	/* If the owner doesn't exist any longer give it to priv domain. */
-	if (!ret) {
-		/*
-		 * In theory we'd need to update the number of dom0 nodes here,
-		 * but we could be called for a read of the node. So better
-		 * avoid the risk to overflow the node count of dom0.
-		 */
-		node->perms.p[0].id = priv_domid;
-	}
-
 	for (i = 1; i < node->perms.num; i++) {
 		if (node->perms.p[i].perms & XS_PERM_IGNORE)
 			continue;
@@ -941,15 +984,15 @@ void domain_entry_dec(struct connection *conn, struct node *node)
 	struct domain *d;
 	unsigned int domid;
 
-	if (!conn)
+	if (!node->perms.p)
 		return;
 
 	domid = node->perms.p ? node->perms.p[0].id : conn->id;
 
-	if (conn->transaction) {
+	if (conn && conn->transaction) {
 		transaction_entry_dec(conn->transaction, domid);
 	} else {
-		d = (domid == conn->id && conn->domain) ? conn->domain
+		d = (conn && domid == conn->id && conn->domain) ? conn->domain
 		    : find_domain_struct(domid);
 		if (d) {
 			d->nbentry--;
@@ -1068,7 +1111,7 @@ int domain_memory_add(unsigned int domid, int mem, bool no_quota_check)
 		 * exist, as accounting is done either for a domain related to
 		 * the current connection, or for the domain owning a node
 		 * (which is always existing, as the owner of the node is
-		 * tested to exist and replaced by domid 0 if not).
+		 * tested to exist and deleted or replaced by domid 0 if not).
 		 * So not finding the related domain MUST be an error in the
 		 * data base.
 		 */
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index bab405209e2a..5bd253395d77 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -64,7 +64,7 @@ bool domain_can_write(struct connection *conn);
 bool domain_is_unprivileged(struct connection *conn);
 
 /* Remove node permissions for no longer existing domains. */
-int domain_adjust_node_perms(struct connection *conn, struct node *node);
+int domain_adjust_node_perms(struct node *node);
 int domain_alloc_permrefs(struct node_perms *perms);
 
 /* Quota manipulation */
From 46ae93fdfdac7dc088a5f7b11b7cb25c48717727 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:13 +0200
Subject: tools/xenstore: make the internal memory data base the default

Having a file backed data base has the only advantage of being capable
to dump the contents of it while Xenstore is running, and potentially
using less swap space in case the data base can't be kept in memory.

It has the major disadvantage of a huge performance overhead: switching
to keep the data base in memory only speeds up live update of xenstored
with 120000 nodes from 20 minutes to 11 seconds. A complete tree walk
of this configuration will be reduced from 7 seconds to 280 msecs
(measured by "xenstore-control check").

So make the internal memory data base the default and enhance the
"--internal-db" command line parameter to take an optional parameter
allowing to switch the internal data base back to the file based one.

This is part of XSA-419.

Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/helpers/init-xenstore-domain.c b/tools/helpers/init-xenstore-domain.c
index 4ce8299c3c36..2dfc78d4de56 100644
--- a/tools/helpers/init-xenstore-domain.c
+++ b/tools/helpers/init-xenstore-domain.c
@@ -137,9 +137,9 @@ static int build(xc_interface *xch)
     }
 
     if ( param )
-        snprintf(cmdline, 512, "--event %d --internal-db %s", rv, param);
+        snprintf(cmdline, 512, "--event %d %s", rv, param);
     else
-        snprintf(cmdline, 512, "--event %d --internal-db", rv);
+        snprintf(cmdline, 512, "--event %d", rv);
 
     dom = xc_dom_allocate(xch, cmdline, NULL);
     if ( !dom )
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index eb1d7c2b3b9b..7ed852fd7958 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2134,7 +2134,7 @@ static void accept_connection(int sock, bool canwrite)
 }
 #endif
 
-static int tdb_flags;
+static int tdb_flags = TDB_INTERNAL | TDB_NOLOCK;
 
 /* We create initial nodes manually. */
 static void manual_node(const char *name, const char *child)
@@ -2446,7 +2446,8 @@ static void usage(void)
 "                          watch-event: time a watch-event is kept pending\n"
 "  -R, --no-recovery       to request that no recovery should be attempted when\n"
 "                          the store is corrupted (debug only),\n"
-"  -I, --internal-db       store database in memory, not on disk\n"
+"  -I, --internal-db [on|off] store database in memory, not on disk, default is\n"
+"                          memory, with \"--internal-db off\" it is on disk\n"
 "  -K, --keep-orphans      don't delete nodes owned by a domain when the\n"
 "                          domain is deleted (this is a security risk!)\n"
 "  -V, --verbose           to request verbose execution.\n");
@@ -2471,7 +2472,7 @@ static struct option options[] = {
 	{ "quota-soft", 1, NULL, 'q' },
 	{ "timeout", 1, NULL, 'w' },
 	{ "no-recovery", 0, NULL, 'R' },
-	{ "internal-db", 0, NULL, 'I' },
+	{ "internal-db", 2, NULL, 'I' },
 	{ "keep-orphans", 0, NULL, 'K' },
 	{ "verbose", 0, NULL, 'V' },
 	{ "watch-nb", 1, NULL, 'W' },
@@ -2547,7 +2548,8 @@ int main(int argc, char *argv[])
 	int timeout;
 
 
-	while ((opt = getopt_long(argc, argv, "DE:F:HKNPS:t:A:Q:q:T:RVW:w:", options,
+	while ((opt = getopt_long(argc, argv,
+				  "DE:F:HI::KNPS:t:A:Q:q:T:RVW:w:", options,
 				  NULL)) != -1) {
 		switch (opt) {
 		case 'D':
@@ -2581,7 +2583,8 @@ int main(int argc, char *argv[])
 			tracefile = optarg;
 			break;
 		case 'I':
-			tdb_flags = TDB_INTERNAL|TDB_NOLOCK;
+			if (optarg && !strcmp(optarg, "off"))
+				tdb_flags = 0;
 			break;
 		case 'K':
 			keep_orphans = true;
From be8aca4b46bddc66e8596b6dd920c23d5339e2f0 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:13 +0200
Subject: docs: enhance xenstore.txt with permissions description
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The permission scheme of Xenstore nodes is not really covered by
docs/misc/xenstore.txt, other than referring to the Xen wiki.

Add a paragraph explaining the permissions of nodes, and especially
mentioning removal of nodes when a domain has been removed from
Xenstore.

This is part of XSA-419.

Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>

diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index 1f42a377c10f..6aa07c5ed80b 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -43,6 +43,17 @@ bytes are forbidden; clients specifying relative paths should keep
 them to within 2048 bytes.  (See XENSTORE_*_PATH_MAX in xs_wire.h.)
 
 
+Each node has one or multiple permission entries.  Permissions are
+granted by domain-id, the first permission entry of each node specifies
+the owner of the node.  Permissions of a node can be changed by the
+owner of the node, the owner can only be modified by the control
+domain (usually domain id 0).  The owner always has the right to read
+and write the node, while other permissions can be setup to allow
+read and/or write access.  When a domain is being removed from Xenstore
+nodes owned by that domain will be removed together with all of those
+nodes' children.
+
+
 Communication with xenstore is via either sockets, or event channel
 and shared memory, as specified in io/xs_wire.h: each message in
 either direction is a header formatted as a struct xsd_sockmsg
From 04d04545ca1b281596e1e13b39113f6a37259ab1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
Date: Wed, 12 Oct 2022 19:13:06 +0100
Subject: tools/ocaml/xenstored: Fix quota bypass on domain shutdown
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

XSA-322 fixed a domid reuse vulnerability by assigning Dom0 as the owner of
any nodes left after a domain is shutdown (e.g. outside its /local/domain/N
tree).

However Dom0 has no quota on purpose, so this opened up another potential
attack vector. Avoid it by deleting these nodes instead of assigning them to
Dom0.

This is part of XSA-419 / CVE-2022-42323.

Reported-by: Juergen Gross <jgross@suse.com>
Fixes: c46eff921209 ("tools/ocaml/xenstored: clean up permissions for dead domains")
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

diff --git a/tools/ocaml/xenstored/perms.ml b/tools/ocaml/xenstored/perms.ml
index e8a16221f8fa..84f2503e8e29 100644
--- a/tools/ocaml/xenstored/perms.ml
+++ b/tools/ocaml/xenstored/perms.ml
@@ -64,8 +64,7 @@ let get_owner perm = perm.owner
 * *)
 let remove_domid ~domid perm =
 	let acl = List.filter (fun (acl_domid, _) -> acl_domid <> domid) perm.acl in
-	let owner = if perm.owner = domid then 0 else perm.owner in
-	{ perm with acl; owner }
+	if perm.owner = domid then None else Some { perm with acl; owner = perm.owner }
 
 let default0 = create 0 NONE []
 
diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml
index 20e67b142746..70f0c83de404 100644
--- a/tools/ocaml/xenstored/store.ml
+++ b/tools/ocaml/xenstored/store.ml
@@ -87,10 +87,21 @@ let check_owner node connection =
 
 let rec recurse fct node = fct node; SymbolMap.iter (fun _ -> recurse fct) node.children
 
-(** [recurse_map f tree] applies [f] on each node in the tree recursively *)
-let recurse_map f =
+(** [recurse_filter_map f tree] applies [f] on each node in the tree recursively,
+    possibly removing some nodes.
+    Note that the nodes removed this way won't generate watch events.
+*)
+let recurse_filter_map f =
+	let invalid = -1 in
+	let is_valid _ node = node.perms.owner <> invalid in
 	let rec walk node =
-		f { node with children = SymbolMap.map walk node.children }
+		(* Map.filter_map is Ocaml 4.11+ only *)
+		let node =
+		{ node with children =
+			SymbolMap.map walk node.children |> SymbolMap.filter is_valid } in
+		match f node with
+		| Some keep -> keep
+		| None -> { node with perms = {node.perms with owner = invalid } }
 	in
 	walk
 
@@ -444,11 +455,13 @@ let setperms store perm path nperms =
 
 let reset_permissions store domid =
 	Logging.info "store|node" "Cleaning up xenstore ACLs for domid %d" domid;
-	store.root <- Node.recurse_map (fun node ->
-		let perms = Perms.Node.remove_domid ~domid node.perms in
-		if perms <> node.perms then
-			Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node);
-		{ node with perms }
+	store.root <- Node.recurse_filter_map (fun node ->
+		match Perms.Node.remove_domid ~domid node.perms with
+		| None -> None
+		| Some perms ->
+			if perms <> node.perms then
+				Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node);
+			Some { node with perms }
 	) store.root
 
 type ops = {
From 49637f211a88672706c1c980f0348bf836d516bd Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: remove nodes owned by destroyed domain

In case a domain is removed from Xenstore, remove all nodes owned by
it per default.

This tackles the problem that nodes might be created by a domain
outside its home path in Xenstore, leading to Xenstore hogging more
and more memory. Domain quota don't work in this case if the guest is
rebooting in between.

Since XSA-322 ownership of such stale nodes is transferred to dom0,
which is helping against unintended access, but not against OOM of
Xenstore.

As a fallback for weird cases add a Xenstore start parameter for
keeping today's way to handle stale nodes, adding the risk of Xenstore
hitting an OOM situation.

This is part of XSA-419 / CVE-2022-42322.

Reported-by: Juergen Gross <jgross@suse.com>
Fixes: 496306324d8d ("tools/xenstore: revoke access rights for removed domains")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index e8cdfeef50c7..d5b2e59b0db6 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -80,6 +80,7 @@ static bool verbose = false;
 LIST_HEAD(connections);
 int tracefd = -1;
 static bool recovery = true;
+bool keep_orphans = false;
 static int reopen_log_pipe[2];
 static int reopen_log_pipe0_pollfd_idx = -1;
 char *tracefile = NULL;
@@ -722,7 +723,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
 	node->perms.p = hdr->perms;
 	node->acc.domid = node->perms.p[0].id;
 	node->acc.memory = data.dsize;
-	if (domain_adjust_node_perms(conn, node))
+	if (domain_adjust_node_perms(node))
 		goto error;
 
 	/* If owner is gone reset currently accounted memory size. */
@@ -765,7 +766,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
 	void *p;
 	struct xs_tdb_record_hdr *hdr;
 
-	if (domain_adjust_node_perms(conn, node))
+	if (domain_adjust_node_perms(node))
 		return errno;
 
 	data.dsize = sizeof(*hdr)
@@ -1617,7 +1618,7 @@ static int delnode_sub(const void *ctx, struct connection *conn,
 	return WALK_TREE_RM_CHILDENTRY;
 }
 
-static int _rm(struct connection *conn, const void *ctx, const char *name)
+int rm_node(struct connection *conn, const void *ctx, const char *name)
 {
 	struct node *parent;
 	char *parentname = get_parent(ctx, name);
@@ -1681,7 +1682,7 @@ static int do_rm(const void *ctx, struct connection *conn,
 	if (streq(name, "/"))
 		return EINVAL;
 
-	ret = _rm(conn, ctx, name);
+	ret = rm_node(conn, ctx, name);
 	if (ret)
 		return ret;
 
@@ -2537,6 +2538,8 @@ static void usage(void)
 "  -R, --no-recovery       to request that no recovery should be attempted when\n"
 "                          the store is corrupted (debug only),\n"
 "  -I, --internal-db       store database in memory, not on disk\n"
+"  -K, --keep-orphans      don't delete nodes owned by a domain when the\n"
+"                          domain is deleted (this is a security risk!)\n"
 "  -V, --verbose           to request verbose execution.\n");
 }
 
@@ -2561,6 +2564,7 @@ static struct option options[] = {
 	{ "timeout", 1, NULL, 'w' },
 	{ "no-recovery", 0, NULL, 'R' },
 	{ "internal-db", 0, NULL, 'I' },
+	{ "keep-orphans", 0, NULL, 'K' },
 	{ "verbose", 0, NULL, 'V' },
 	{ "watch-nb", 1, NULL, 'W' },
 #ifndef NO_LIVE_UPDATE
@@ -2641,7 +2645,7 @@ int main(int argc, char *argv[])
 	orig_argc = argc;
 	orig_argv = argv;
 
-	while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:Q:q:T:RVW:w:U",
+	while ((opt = getopt_long(argc, argv, "DE:F:HKNPS:t:A:M:Q:q:T:RVW:w:U",
 				  options, NULL)) != -1) {
 		switch (opt) {
 		case 'D':
@@ -2677,6 +2681,9 @@ int main(int argc, char *argv[])
 		case 'I':
 			tdb_flags = TDB_INTERNAL|TDB_NOLOCK;
 			break;
+		case 'K':
+			keep_orphans = true;
+			break;
 		case 'V':
 			verbose = true;
 			break;
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 3190494bbeb5..9a9dbb2c3c86 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -233,6 +233,9 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
 struct node *read_node(struct connection *conn, const void *ctx,
 		       const char *name);
 
+/* Remove a node and its children. */
+int rm_node(struct connection *conn, const void *ctx, const char *name);
+
 void setup_structure(bool live_update);
 struct connection *new_connection(connwritefn_t *write, connreadfn_t *read);
 struct connection *get_connection_by_id(unsigned int conn_id);
@@ -279,6 +282,7 @@ extern int quota_req_outstanding;
 extern int quota_trans_nodes;
 extern int quota_memory_per_domain_soft;
 extern int quota_memory_per_domain_hard;
+extern bool keep_orphans;
 
 extern unsigned int timeout_watch_event_msec;
 
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index a91cc75ab59b..ee4b19387db8 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -196,10 +196,64 @@ static void unmap_interface(void *interface)
 	xengnttab_unmap(*xgt_handle, interface, 1);
 }
 
+static int domain_tree_remove_sub(const void *ctx, struct connection *conn,
+				  struct node *node, void *arg)
+{
+	struct domain *domain = arg;
+	TDB_DATA key;
+	int ret = WALK_TREE_OK;
+
+	if (node->perms.p[0].id != domain->domid)
+		return WALK_TREE_OK;
+
+	if (keep_orphans) {
+		set_tdb_key(node->name, &key);
+		domain->nbentry--;
+		node->perms.p[0].id = priv_domid;
+		node->acc.memory = 0;
+		domain_entry_inc(NULL, node);
+		if (write_node_raw(NULL, &key, node, true)) {
+			/* That's unfortunate. We only can try to continue. */
+			syslog(LOG_ERR,
+			       "error when moving orphaned node %s to dom0\n",
+			       node->name);
+		} else
+			trace("orphaned node %s moved to dom0\n", node->name);
+	} else {
+		if (rm_node(NULL, ctx, node->name)) {
+			/* That's unfortunate. We only can try to continue. */
+			syslog(LOG_ERR,
+			       "error when deleting orphaned node %s\n",
+			       node->name);
+		} else
+			trace("orphaned node %s deleted\n", node->name);
+
+		/* Skip children in all cases in order to avoid more errors. */
+		ret = WALK_TREE_SKIP_CHILDREN;
+	}
+
+	return domain->nbentry > 0 ? ret : WALK_TREE_SUCCESS_STOP;
+}
+
+static void domain_tree_remove(struct domain *domain)
+{
+	int ret;
+	struct walk_funcs walkfuncs = { .enter = domain_tree_remove_sub };
+
+	if (domain->nbentry > 0) {
+		ret = walk_node_tree(domain, NULL, "/", &walkfuncs, domain);
+		if (ret == WALK_TREE_ERROR_STOP)
+			syslog(LOG_ERR,
+			       "error when looking for orphaned nodes\n");
+	}
+}
+
 static int destroy_domain(void *_domain)
 {
 	struct domain *domain = _domain;
 
+	domain_tree_remove(domain);
+
 	list_del(&domain->list);
 
 	if (!domain->introduced)
@@ -857,15 +911,15 @@ int domain_entry_inc(struct connection *conn, struct node *node)
 	struct domain *d;
 	unsigned int domid;
 
-	if (!conn)
+	if (!node->perms.p)
 		return 0;
 
-	domid = node->perms.p ? node->perms.p[0].id : conn->id;
+	domid = node->perms.p[0].id;
 
-	if (conn->transaction) {
+	if (conn && conn->transaction) {
 		transaction_entry_inc(conn->transaction, domid);
 	} else {
-		d = (domid == conn->id && conn->domain) ? conn->domain
+		d = (conn && domid == conn->id && conn->domain) ? conn->domain
 		    : find_or_alloc_existing_domain(domid);
 		if (d)
 			d->nbentry++;
@@ -926,23 +980,11 @@ int domain_alloc_permrefs(struct node_perms *perms)
  * Remove permissions for no longer existing domains in order to avoid a new
  * domain with the same domid inheriting the permissions.
  */
-int domain_adjust_node_perms(struct connection *conn, struct node *node)
+int domain_adjust_node_perms(struct node *node)
 {
 	unsigned int i;
 	int ret;
 
-	ret = chk_domain_generation(node->perms.p[0].id, node->generation);
-
-	/* If the owner doesn't exist any longer give it to priv domain. */
-	if (!ret) {
-		/*
-		 * In theory we'd need to update the number of dom0 nodes here,
-		 * but we could be called for a read of the node. So better
-		 * avoid the risk to overflow the node count of dom0.
-		 */
-		node->perms.p[0].id = priv_domid;
-	}
-
 	for (i = 1; i < node->perms.num; i++) {
 		if (node->perms.p[i].perms & XS_PERM_IGNORE)
 			continue;
@@ -960,15 +1002,15 @@ void domain_entry_dec(struct connection *conn, struct node *node)
 	struct domain *d;
 	unsigned int domid;
 
-	if (!conn)
+	if (!node->perms.p)
 		return;
 
 	domid = node->perms.p ? node->perms.p[0].id : conn->id;
 
-	if (conn->transaction) {
+	if (conn && conn->transaction) {
 		transaction_entry_dec(conn->transaction, domid);
 	} else {
-		d = (domid == conn->id && conn->domain) ? conn->domain
+		d = (conn && domid == conn->id && conn->domain) ? conn->domain
 		    : find_domain_struct(domid);
 		if (d) {
 			d->nbentry--;
@@ -1087,7 +1129,7 @@ int domain_memory_add(unsigned int domid, int mem, bool no_quota_check)
 		 * exist, as accounting is done either for a domain related to
 		 * the current connection, or for the domain owning a node
 		 * (which is always existing, as the owner of the node is
-		 * tested to exist and replaced by domid 0 if not).
+		 * tested to exist and deleted or replaced by domid 0 if not).
 		 * So not finding the related domain MUST be an error in the
 		 * data base.
 		 */
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 0b4f56b8146c..491d7a325bd3 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -65,7 +65,7 @@ bool domain_can_write(struct connection *conn);
 bool domain_is_unprivileged(struct connection *conn);
 
 /* Remove node permissions for no longer existing domains. */
-int domain_adjust_node_perms(struct connection *conn, struct node *node);
+int domain_adjust_node_perms(struct node *node);
 int domain_alloc_permrefs(struct node_perms *perms);
 
 /* Quota manipulation */
From 95a6c85b0a4bd7828cf9860770254cee7810e8b0 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:13 +0200
Subject: tools/xenstore: make the internal memory data base the default

Having a file backed data base has the only advantage of being capable
to dump the contents of it while Xenstore is running, and potentially
using less swap space in case the data base can't be kept in memory.

It has the major disadvantage of a huge performance overhead: switching
to keep the data base in memory only speeds up live update of xenstored
with 120000 nodes from 20 minutes to 11 seconds. A complete tree walk
of this configuration will be reduced from 7 seconds to 280 msecs
(measured by "xenstore-control check").

So make the internal memory data base the default and enhance the
"--internal-db" command line parameter to take an optional parameter
allowing to switch the internal data base back to the file based one.

This is part of XSA-419.

Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/helpers/init-xenstore-domain.c b/tools/helpers/init-xenstore-domain.c
index 32689abd7479..d080dae5d3b8 100644
--- a/tools/helpers/init-xenstore-domain.c
+++ b/tools/helpers/init-xenstore-domain.c
@@ -214,9 +214,9 @@ static int build(xc_interface *xch)
     }
 
     if ( param )
-        snprintf(cmdline, 512, "--event %d --internal-db %s", rv, param);
+        snprintf(cmdline, 512, "--event %d %s", rv, param);
     else
-        snprintf(cmdline, 512, "--event %d --internal-db", rv);
+        snprintf(cmdline, 512, "--event %d", rv);
 
     dom->cmdline = xc_dom_strdup(dom, cmdline);
     dom->xenstore_domid = domid;
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index d5b2e59b0db6..9ddbd934f794 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2230,7 +2230,7 @@ static void accept_connection(int sock)
 }
 #endif
 
-static int tdb_flags;
+static int tdb_flags = TDB_INTERNAL | TDB_NOLOCK;
 
 /* We create initial nodes manually. */
 static void manual_node(const char *name, const char *child)
@@ -2537,7 +2537,8 @@ static void usage(void)
 "                          watch-event: time a watch-event is kept pending\n"
 "  -R, --no-recovery       to request that no recovery should be attempted when\n"
 "                          the store is corrupted (debug only),\n"
-"  -I, --internal-db       store database in memory, not on disk\n"
+"  -I, --internal-db [on|off] store database in memory, not on disk, default is\n"
+"                          memory, with \"--internal-db off\" it is on disk\n"
 "  -K, --keep-orphans      don't delete nodes owned by a domain when the\n"
 "                          domain is deleted (this is a security risk!)\n"
 "  -V, --verbose           to request verbose execution.\n");
@@ -2563,7 +2564,7 @@ static struct option options[] = {
 	{ "quota-soft", 1, NULL, 'q' },
 	{ "timeout", 1, NULL, 'w' },
 	{ "no-recovery", 0, NULL, 'R' },
-	{ "internal-db", 0, NULL, 'I' },
+	{ "internal-db", 2, NULL, 'I' },
 	{ "keep-orphans", 0, NULL, 'K' },
 	{ "verbose", 0, NULL, 'V' },
 	{ "watch-nb", 1, NULL, 'W' },
@@ -2645,7 +2646,8 @@ int main(int argc, char *argv[])
 	orig_argc = argc;
 	orig_argv = argv;
 
-	while ((opt = getopt_long(argc, argv, "DE:F:HKNPS:t:A:M:Q:q:T:RVW:w:U",
+	while ((opt = getopt_long(argc, argv,
+				  "DE:F:HI::KNPS:t:A:M:Q:q:T:RVW:w:U",
 				  options, NULL)) != -1) {
 		switch (opt) {
 		case 'D':
@@ -2679,7 +2681,8 @@ int main(int argc, char *argv[])
 			tracefile = optarg;
 			break;
 		case 'I':
-			tdb_flags = TDB_INTERNAL|TDB_NOLOCK;
+			if (optarg && !strcmp(optarg, "off"))
+				tdb_flags = 0;
 			break;
 		case 'K':
 			keep_orphans = true;
From d4f7b150d38c644a6efb4a19385f9bb92c76842d Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:13 +0200
Subject: docs: enhance xenstore.txt with permissions description
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The permission scheme of Xenstore nodes is not really covered by
docs/misc/xenstore.txt, other than referring to the Xen wiki.

Add a paragraph explaining the permissions of nodes, and especially
mentioning removal of nodes when a domain has been removed from
Xenstore.

This is part of XSA-419.

Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>

diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index a7d006519ae8..eccd596ee38c 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -43,6 +43,17 @@ bytes are forbidden; clients specifying relative paths should keep
 them to within 2048 bytes.  (See XENSTORE_*_PATH_MAX in xs_wire.h.)
 
 
+Each node has one or multiple permission entries.  Permissions are
+granted by domain-id, the first permission entry of each node specifies
+the owner of the node.  Permissions of a node can be changed by the
+owner of the node, the owner can only be modified by the control
+domain (usually domain id 0).  The owner always has the right to read
+and write the node, while other permissions can be setup to allow
+read and/or write access.  When a domain is being removed from Xenstore
+nodes owned by that domain will be removed together with all of those
+nodes' children.
+
+
 Communication with xenstore is via either sockets, or event channel
 and shared memory, as specified in io/xs_wire.h: each message in
 either direction is a header formatted as a struct xsd_sockmsg
From 09228369a549427294febe351372d7227e624da1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
Date: Wed, 12 Oct 2022 19:13:06 +0100
Subject: tools/ocaml/xenstored: Fix quota bypass on domain shutdown
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

XSA-322 fixed a domid reuse vulnerability by assigning Dom0 as the owner of
any nodes left after a domain is shutdown (e.g. outside its /local/domain/N
tree).

However Dom0 has no quota on purpose, so this opened up another potential
attack vector. Avoid it by deleting these nodes instead of assigning them to
Dom0.

This is part of XSA-419 / CVE-2022-42323.

Reported-by: Juergen Gross <jgross@suse.com>
Fixes: c46eff921209 ("tools/ocaml/xenstored: clean up permissions for dead domains")
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

diff --git a/tools/ocaml/xenstored/perms.ml b/tools/ocaml/xenstored/perms.ml
index e8a16221f8fa..84f2503e8e29 100644
--- a/tools/ocaml/xenstored/perms.ml
+++ b/tools/ocaml/xenstored/perms.ml
@@ -64,8 +64,7 @@ let get_owner perm = perm.owner
 * *)
 let remove_domid ~domid perm =
 	let acl = List.filter (fun (acl_domid, _) -> acl_domid <> domid) perm.acl in
-	let owner = if perm.owner = domid then 0 else perm.owner in
-	{ perm with acl; owner }
+	if perm.owner = domid then None else Some { perm with acl; owner = perm.owner }
 
 let default0 = create 0 NONE []
 
diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml
index 20e67b142746..70f0c83de404 100644
--- a/tools/ocaml/xenstored/store.ml
+++ b/tools/ocaml/xenstored/store.ml
@@ -87,10 +87,21 @@ let check_owner node connection =
 
 let rec recurse fct node = fct node; SymbolMap.iter (fun _ -> recurse fct) node.children
 
-(** [recurse_map f tree] applies [f] on each node in the tree recursively *)
-let recurse_map f =
+(** [recurse_filter_map f tree] applies [f] on each node in the tree recursively,
+    possibly removing some nodes.
+    Note that the nodes removed this way won't generate watch events.
+*)
+let recurse_filter_map f =
+	let invalid = -1 in
+	let is_valid _ node = node.perms.owner <> invalid in
 	let rec walk node =
-		f { node with children = SymbolMap.map walk node.children }
+		(* Map.filter_map is Ocaml 4.11+ only *)
+		let node =
+		{ node with children =
+			SymbolMap.map walk node.children |> SymbolMap.filter is_valid } in
+		match f node with
+		| Some keep -> keep
+		| None -> { node with perms = {node.perms with owner = invalid } }
 	in
 	walk
 
@@ -444,11 +455,13 @@ let setperms store perm path nperms =
 
 let reset_permissions store domid =
 	Logging.info "store|node" "Cleaning up xenstore ACLs for domid %d" domid;
-	store.root <- Node.recurse_map (fun node ->
-		let perms = Perms.Node.remove_domid ~domid node.perms in
-		if perms <> node.perms then
-			Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node);
-		{ node with perms }
+	store.root <- Node.recurse_filter_map (fun node ->
+		match Perms.Node.remove_domid ~domid node.perms with
+		| None -> None
+		| Some perms ->
+			if perms <> node.perms then
+				Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node);
+			Some { node with perms }
 	) store.root
 
 type ops = {
From e4250bf8b39ed73623c75b0f1436ac7c1d45aba7 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: remove nodes owned by destroyed domain

In case a domain is removed from Xenstore, remove all nodes owned by
it per default.

This tackles the problem that nodes might be created by a domain
outside its home path in Xenstore, leading to Xenstore hogging more
and more memory. Domain quota don't work in this case if the guest is
rebooting in between.

Since XSA-322 ownership of such stale nodes is transferred to dom0,
which is helping against unintended access, but not against OOM of
Xenstore.

As a fallback for weird cases add a Xenstore start parameter for
keeping today's way to handle stale nodes, adding the risk of Xenstore
hitting an OOM situation.

This is part of XSA-419 / CVE-2022-42322.

Reported-by: Juergen Gross <jgross@suse.com>
Fixes: 496306324d8d ("tools/xenstore: revoke access rights for removed domains")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index bdc14679adf5..13e48aaa731c 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -80,6 +80,7 @@ static bool verbose = false;
 LIST_HEAD(connections);
 int tracefd = -1;
 static bool recovery = true;
+bool keep_orphans = false;
 static int reopen_log_pipe[2];
 static int reopen_log_pipe0_pollfd_idx = -1;
 char *tracefile = NULL;
@@ -753,7 +754,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
 	node->perms.p = hdr->perms;
 	node->acc.domid = node->perms.p[0].id;
 	node->acc.memory = data.dsize;
-	if (domain_adjust_node_perms(conn, node))
+	if (domain_adjust_node_perms(node))
 		goto error;
 
 	/* If owner is gone reset currently accounted memory size. */
@@ -796,7 +797,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
 	void *p;
 	struct xs_tdb_record_hdr *hdr;
 
-	if (domain_adjust_node_perms(conn, node))
+	if (domain_adjust_node_perms(node))
 		return errno;
 
 	data.dsize = sizeof(*hdr)
@@ -1647,7 +1648,7 @@ static int delnode_sub(const void *ctx, struct connection *conn,
 	return WALK_TREE_RM_CHILDENTRY;
 }
 
-static int _rm(struct connection *conn, const void *ctx, const char *name)
+int rm_node(struct connection *conn, const void *ctx, const char *name)
 {
 	struct node *parent;
 	char *parentname = get_parent(ctx, name);
@@ -1711,7 +1712,7 @@ static int do_rm(const void *ctx, struct connection *conn,
 	if (streq(name, "/"))
 		return EINVAL;
 
-	ret = _rm(conn, ctx, name);
+	ret = rm_node(conn, ctx, name);
 	if (ret)
 		return ret;
 
@@ -2618,6 +2619,8 @@ static void usage(void)
 "  -R, --no-recovery       to request that no recovery should be attempted when\n"
 "                          the store is corrupted (debug only),\n"
 "  -I, --internal-db       store database in memory, not on disk\n"
+"  -K, --keep-orphans      don't delete nodes owned by a domain when the\n"
+"                          domain is deleted (this is a security risk!)\n"
 "  -V, --verbose           to request verbose execution.\n");
 }
 
@@ -2642,6 +2645,7 @@ static struct option options[] = {
 	{ "timeout", 1, NULL, 'w' },
 	{ "no-recovery", 0, NULL, 'R' },
 	{ "internal-db", 0, NULL, 'I' },
+	{ "keep-orphans", 0, NULL, 'K' },
 	{ "verbose", 0, NULL, 'V' },
 	{ "watch-nb", 1, NULL, 'W' },
 #ifndef NO_LIVE_UPDATE
@@ -2721,7 +2725,7 @@ int main(int argc, char *argv[])
 	orig_argc = argc;
 	orig_argv = argv;
 
-	while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:Q:q:T:RVW:w:U",
+	while ((opt = getopt_long(argc, argv, "DE:F:HKNPS:t:A:M:Q:q:T:RVW:w:U",
 				  options, NULL)) != -1) {
 		switch (opt) {
 		case 'D':
@@ -2757,6 +2761,9 @@ int main(int argc, char *argv[])
 		case 'I':
 			tdb_flags = TDB_INTERNAL|TDB_NOLOCK;
 			break;
+		case 'K':
+			keep_orphans = true;
+			break;
 		case 'V':
 			verbose = true;
 			break;
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index acb00ad96914..37006d508dbf 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -240,6 +240,9 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
 struct node *read_node(struct connection *conn, const void *ctx,
 		       const char *name);
 
+/* Remove a node and its children. */
+int rm_node(struct connection *conn, const void *ctx, const char *name);
+
 void setup_structure(bool live_update);
 struct connection *new_connection(const struct interface_funcs *funcs);
 struct connection *get_connection_by_id(unsigned int conn_id);
@@ -284,6 +287,7 @@ extern int quota_req_outstanding;
 extern int quota_trans_nodes;
 extern int quota_memory_per_domain_soft;
 extern int quota_memory_per_domain_hard;
+extern bool keep_orphans;
 
 extern unsigned int timeout_watch_event_msec;
 
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 98b401fdec30..84b7817cd5e6 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -227,10 +227,64 @@ static void unmap_interface(void *interface)
 	xengnttab_unmap(*xgt_handle, interface, 1);
 }
 
+static int domain_tree_remove_sub(const void *ctx, struct connection *conn,
+				  struct node *node, void *arg)
+{
+	struct domain *domain = arg;
+	TDB_DATA key;
+	int ret = WALK_TREE_OK;
+
+	if (node->perms.p[0].id != domain->domid)
+		return WALK_TREE_OK;
+
+	if (keep_orphans) {
+		set_tdb_key(node->name, &key);
+		domain->nbentry--;
+		node->perms.p[0].id = priv_domid;
+		node->acc.memory = 0;
+		domain_entry_inc(NULL, node);
+		if (write_node_raw(NULL, &key, node, true)) {
+			/* That's unfortunate. We only can try to continue. */
+			syslog(LOG_ERR,
+			       "error when moving orphaned node %s to dom0\n",
+			       node->name);
+		} else
+			trace("orphaned node %s moved to dom0\n", node->name);
+	} else {
+		if (rm_node(NULL, ctx, node->name)) {
+			/* That's unfortunate. We only can try to continue. */
+			syslog(LOG_ERR,
+			       "error when deleting orphaned node %s\n",
+			       node->name);
+		} else
+			trace("orphaned node %s deleted\n", node->name);
+
+		/* Skip children in all cases in order to avoid more errors. */
+		ret = WALK_TREE_SKIP_CHILDREN;
+	}
+
+	return domain->nbentry > 0 ? ret : WALK_TREE_SUCCESS_STOP;
+}
+
+static void domain_tree_remove(struct domain *domain)
+{
+	int ret;
+	struct walk_funcs walkfuncs = { .enter = domain_tree_remove_sub };
+
+	if (domain->nbentry > 0) {
+		ret = walk_node_tree(domain, NULL, "/", &walkfuncs, domain);
+		if (ret == WALK_TREE_ERROR_STOP)
+			syslog(LOG_ERR,
+			       "error when looking for orphaned nodes\n");
+	}
+}
+
 static int destroy_domain(void *_domain)
 {
 	struct domain *domain = _domain;
 
+	domain_tree_remove(domain);
+
 	list_del(&domain->list);
 
 	if (!domain->introduced)
@@ -883,15 +937,15 @@ int domain_entry_inc(struct connection *conn, struct node *node)
 	struct domain *d;
 	unsigned int domid;
 
-	if (!conn)
+	if (!node->perms.p)
 		return 0;
 
-	domid = node->perms.p ? node->perms.p[0].id : conn->id;
+	domid = node->perms.p[0].id;
 
-	if (conn->transaction) {
+	if (conn && conn->transaction) {
 		transaction_entry_inc(conn->transaction, domid);
 	} else {
-		d = (domid == conn->id && conn->domain) ? conn->domain
+		d = (conn && domid == conn->id && conn->domain) ? conn->domain
 		    : find_or_alloc_existing_domain(domid);
 		if (d)
 			d->nbentry++;
@@ -952,23 +1006,11 @@ int domain_alloc_permrefs(struct node_perms *perms)
  * Remove permissions for no longer existing domains in order to avoid a new
  * domain with the same domid inheriting the permissions.
  */
-int domain_adjust_node_perms(struct connection *conn, struct node *node)
+int domain_adjust_node_perms(struct node *node)
 {
 	unsigned int i;
 	int ret;
 
-	ret = chk_domain_generation(node->perms.p[0].id, node->generation);
-
-	/* If the owner doesn't exist any longer give it to priv domain. */
-	if (!ret) {
-		/*
-		 * In theory we'd need to update the number of dom0 nodes here,
-		 * but we could be called for a read of the node. So better
-		 * avoid the risk to overflow the node count of dom0.
-		 */
-		node->perms.p[0].id = priv_domid;
-	}
-
 	for (i = 1; i < node->perms.num; i++) {
 		if (node->perms.p[i].perms & XS_PERM_IGNORE)
 			continue;
@@ -986,15 +1028,15 @@ void domain_entry_dec(struct connection *conn, struct node *node)
 	struct domain *d;
 	unsigned int domid;
 
-	if (!conn)
+	if (!node->perms.p)
 		return;
 
 	domid = node->perms.p ? node->perms.p[0].id : conn->id;
 
-	if (conn->transaction) {
+	if (conn && conn->transaction) {
 		transaction_entry_dec(conn->transaction, domid);
 	} else {
-		d = (domid == conn->id && conn->domain) ? conn->domain
+		d = (conn && domid == conn->id && conn->domain) ? conn->domain
 		    : find_domain_struct(domid);
 		if (d) {
 			d->nbentry--;
@@ -1113,7 +1155,7 @@ int domain_memory_add(unsigned int domid, int mem, bool no_quota_check)
 		 * exist, as accounting is done either for a domain related to
 		 * the current connection, or for the domain owning a node
 		 * (which is always existing, as the owner of the node is
-		 * tested to exist and replaced by domid 0 if not).
+		 * tested to exist and deleted or replaced by domid 0 if not).
 		 * So not finding the related domain MUST be an error in the
 		 * data base.
 		 */
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 7fe0a21d9e45..b38c82991dc6 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -62,7 +62,7 @@ const char *get_implicit_path(const struct connection *conn);
 bool domain_is_unprivileged(struct connection *conn);
 
 /* Remove node permissions for no longer existing domains. */
-int domain_adjust_node_perms(struct connection *conn, struct node *node);
+int domain_adjust_node_perms(struct node *node);
 int domain_alloc_permrefs(struct node_perms *perms);
 
 /* Quota manipulation */
From 929da557efea6c7d2340467d9a7fdae7fda6d2b1 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:13 +0200
Subject: tools/xenstore: make the internal memory data base the default

Having a file backed data base has the only advantage of being capable
to dump the contents of it while Xenstore is running, and potentially
using less swap space in case the data base can't be kept in memory.

It has the major disadvantage of a huge performance overhead: switching
to keep the data base in memory only speeds up live update of xenstored
with 120000 nodes from 20 minutes to 11 seconds. A complete tree walk
of this configuration will be reduced from 7 seconds to 280 msecs
(measured by "xenstore-control check").

So make the internal memory data base the default and enhance the
"--internal-db" command line parameter to take an optional parameter
allowing to switch the internal data base back to the file based one.

This is part of XSA-419.

Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/helpers/init-xenstore-domain.c b/tools/helpers/init-xenstore-domain.c
index 2d9ab6f1c583..04e351ca29a8 100644
--- a/tools/helpers/init-xenstore-domain.c
+++ b/tools/helpers/init-xenstore-domain.c
@@ -222,9 +222,9 @@ static int build(xc_interface *xch)
     }
 
     if ( param )
-        snprintf(cmdline, 512, "--event %d --internal-db %s", rv, param);
+        snprintf(cmdline, 512, "--event %d %s", rv, param);
     else
-        snprintf(cmdline, 512, "--event %d --internal-db", rv);
+        snprintf(cmdline, 512, "--event %d", rv);
 
     dom->guest_domid = domid;
     dom->cmdline = xc_dom_strdup(dom, cmdline);
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 13e48aaa731c..36fb4a832834 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2308,7 +2308,7 @@ static void accept_connection(int sock)
 }
 #endif
 
-static int tdb_flags;
+static int tdb_flags = TDB_INTERNAL | TDB_NOLOCK;
 
 /* We create initial nodes manually. */
 static void manual_node(const char *name, const char *child)
@@ -2618,7 +2618,8 @@ static void usage(void)
 "                          watch-event: time a watch-event is kept pending\n"
 "  -R, --no-recovery       to request that no recovery should be attempted when\n"
 "                          the store is corrupted (debug only),\n"
-"  -I, --internal-db       store database in memory, not on disk\n"
+"  -I, --internal-db [on|off] store database in memory, not on disk, default is\n"
+"                          memory, with \"--internal-db off\" it is on disk\n"
 "  -K, --keep-orphans      don't delete nodes owned by a domain when the\n"
 "                          domain is deleted (this is a security risk!)\n"
 "  -V, --verbose           to request verbose execution.\n");
@@ -2644,7 +2645,7 @@ static struct option options[] = {
 	{ "quota-soft", 1, NULL, 'q' },
 	{ "timeout", 1, NULL, 'w' },
 	{ "no-recovery", 0, NULL, 'R' },
-	{ "internal-db", 0, NULL, 'I' },
+	{ "internal-db", 2, NULL, 'I' },
 	{ "keep-orphans", 0, NULL, 'K' },
 	{ "verbose", 0, NULL, 'V' },
 	{ "watch-nb", 1, NULL, 'W' },
@@ -2725,7 +2726,8 @@ int main(int argc, char *argv[])
 	orig_argc = argc;
 	orig_argv = argv;
 
-	while ((opt = getopt_long(argc, argv, "DE:F:HKNPS:t:A:M:Q:q:T:RVW:w:U",
+	while ((opt = getopt_long(argc, argv,
+				  "DE:F:HI::KNPS:t:A:M:Q:q:T:RVW:w:U",
 				  options, NULL)) != -1) {
 		switch (opt) {
 		case 'D':
@@ -2759,7 +2761,8 @@ int main(int argc, char *argv[])
 			tracefile = optarg;
 			break;
 		case 'I':
-			tdb_flags = TDB_INTERNAL|TDB_NOLOCK;
+			if (optarg && !strcmp(optarg, "off"))
+				tdb_flags = 0;
 			break;
 		case 'K':
 			keep_orphans = true;
From 54e63b7e7c42e4f975163809a01574e78552a6ab Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:13 +0200
Subject: docs: enhance xenstore.txt with permissions description
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The permission scheme of Xenstore nodes is not really covered by
docs/misc/xenstore.txt, other than referring to the Xen wiki.

Add a paragraph explaining the permissions of nodes, and especially
mentioning removal of nodes when a domain has been removed from
Xenstore.

This is part of XSA-419.

Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>

diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index 988ef89cba2d..44428ae3a755 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -43,6 +43,17 @@ bytes are forbidden; clients specifying relative paths should keep
 them to within 2048 bytes.  (See XENSTORE_*_PATH_MAX in xs_wire.h.)
 
 
+Each node has one or multiple permission entries.  Permissions are
+granted by domain-id, the first permission entry of each node specifies
+the owner of the node.  Permissions of a node can be changed by the
+owner of the node, the owner can only be modified by the control
+domain (usually domain id 0).  The owner always has the right to read
+and write the node, while other permissions can be setup to allow
+read and/or write access.  When a domain is being removed from Xenstore
+nodes owned by that domain will be removed together with all of those
+nodes' children.
+
+
 Communication with xenstore is via either sockets, or event channel
 and shared memory, as specified in io/xs_wire.h: each message in
 either direction is a header formatted as a struct xsd_sockmsg