Xen Security Advisory 418 v2 (CVE-2022-42321) - Xenstore: Guests can crash xenstored via exhausting the stack

Xen.org security team posted 1 patch 1 year, 6 months ago
Failed in applying to current master (apply log)
Xen Security Advisory 418 v2 (CVE-2022-42321) - Xenstore: Guests can crash xenstored via exhausting the stack
Posted by Xen.org security team 1 year, 6 months ago
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2022-42321 / XSA-418
                               version 2

      Xenstore: Guests can crash xenstored via exhausting the stack

UPDATES IN VERSION 2
====================

Public release.

ISSUE DESCRIPTION
=================

Xenstored is using recursion for some Xenstore operations (e.g. for
deleting a sub-tree of Xenstore nodes). With sufficiently deep nesting
levels this can result in stack exhaustion on xenstored, leading to a
crash of xenstored.

IMPACT
======

A malicious guest creating very deep nesting levels of Xenstore nodes
might be able to crash xenstored, resulting in a Denial of Service (DoS)
of Xenstore.

This will inhibit creation of new guests or changing the configuration of
already running guests.

VULNERABLE SYSTEMS
==================

All versions of Xen are affected.

Only systems running the C variant of Xenstore (xenstored or xenstore-
stubdom) are vulnerable.

Systems using the Ocaml variant of Xenstore (oxenstored) are not vulnerable.

MITIGATION
==========

Running oxenstored instead of xenstored will avoid the vulnerability.

CREDITS
=======

This issue was discovered by David Vrabel of Amazon.

RESOLUTION
==========

Applying the appropriate attached patch resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa418/xsa418-??.patch           xen-unstable
xsa418/xsa418-4.16-??.patch      Xen 4.16.x
xsa418/xsa418-4.15-??.patch      Xen 4.15.x
xsa418/xsa418-4.14-??.patch      Xen 4.14.x - 4.13.x

$ sha256sum xsa418* xsa418*/*
dba8cf354728d5b9248d9649d042835b2f5f96dd995d0fe23a07a157cba68500  xsa418.meta
d13f084bbca78d35b991fe5347297d13f77b4e49ad816344363a61a8335e6632  xsa418/xsa418-01.patch
ac9acb8cda844e3873ec0a77fb9bd58581d6f1084f8a38fa494bff548c9232ae  xsa418/xsa418-02.patch
bc29743d71eed3ba41d1ec732e5c0011107dcc06d945ec554ef04314e0272898  xsa418/xsa418-03.patch
bba67ab17c8c132258b0cfbc701e2b79ae6ea5ef507f4c09e103c19a9c729b03  xsa418/xsa418-4.14-01.patch
79eadfee1eeae340256331b5e189f1c8514106dae5ca208b0f4965ba6f6e9e51  xsa418/xsa418-4.14-02.patch
6a96c8636fc3c2a1539b9c21d3af4e0a68124dc4a7219c5eacd685f7d0543dd7  xsa418/xsa418-4.14-03.patch
fe4ad75c34ceba6427c6f2ea7ad86af4a25ba3f5f9dc42fdd4ef7bf4fa60d39d  xsa418/xsa418-4.14-04.patch
7884b7850d991d098409a3a9a27050f3d34486a3b459e0c2047d1dc43e13515f  xsa418/xsa418-4.14-05.patch
27c070655bf27a2ca84506703d76ab5b3c9fd22155a29af5c882013cd5580640  xsa418/xsa418-4.14-06.patch
313707f2b0738680015a38ec50d93f149c386c72c809cd17de8f52e2d883b8e0  xsa418/xsa418-4.15-01.patch
4628506b3f4407034b7c6e0159a6719225f6a4c70fe12b30375f515bb6ce5d93  xsa418/xsa418-4.15-02.patch
a59fed27d614de06a8d508da6345dda7260d2ac7ff9762372b34c4e6a5dfa432  xsa418/xsa418-4.15-03.patch
99ea45e5f877afe01af189ebfe3114edc8d3283829424adc53760d385b8a202e  xsa418/xsa418-4.15-04.patch
dd10d3c3af942fd941604029a5b5262ae6d8f7c7a9071b243904bc34c8d14ab2  xsa418/xsa418-4.15-05.patch
1a50edee9d3a04a982ba22bcf150475f396494c03b4b6eaf18b45561f0d005fc  xsa418/xsa418-4.15-06.patch
042cf55472e911b871a8062613b604e7a4641505bae4e6505a176b2976906739  xsa418/xsa418-4.15-07.patch
669e8fc1637b92846ad7b72eb510c05920b267bc54340e83b3f1c8df2092ecbc  xsa418/xsa418-4.16-01.patch
b382431343ab873d6ab88557b09891dc821a497200c1b61e7b64286bba899ea9  xsa418/xsa418-4.16-02.patch
27737bfa0d3e475ba0e468ab3dcf0274bde40948e5f669f179d2964f6cfab4cf  xsa418/xsa418-4.16-03.patch
5677156c12063d0cbad273d45800bb25176308ffd7b660d73aac3a36e4099055  xsa418/xsa418-4.16-04.patch
3c3b0282cbc50da485f6b7a871e0cc318725db2b3debf098b0fc6d0598488a48  xsa418/xsa418-4.16-05.patch
d871d0e38f6db4cc86591c63cb37c63aed9ed0ba88429236eb91d142090da529  xsa418/xsa418-4.16-06.patch
145a98f2540b5c17c7d262e1df80103c4478d622a4eeba07d1566679d81a4542  xsa418/xsa418-4.16-07.patch
70874f345806b376fea1b02b0ed4d493d792a43f5c6fc29c13e0658350086f92  xsa418/xsa418-04.patch
8d94a7c6e9e484569c6eb98f274fa7489e68a9f16d12092839bb519cfc32a7b3  xsa418/xsa418-05.patch
e5ecc6d3756a485114b57e0d02ff53d6eb3b312fac117a99c05bc392faa45d27  xsa418/xsa418-06.patch
77695fa2f1bfeee051d4a0e0d1e0b654f5177ce104a72635c2f1bafb1d6631cb  xsa418/xsa418-07.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.


(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmNg+6sMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZ0wAH/36wusPv68bogxxnnNwL6eFmZZ1Rd90mAMfw6Qyt
OYo3tOWhnZVVH3uC84S7s/zWsZWJaaWxTnGW03Gxnep3GstufnWnV0m/VsmXsI9L
/W0C23SgWxao+Bc819TRWF3JTcSb/wdbBbgHOJbu8gzLQc7T8xsgUeOr34fpAtZv
qr2fExhKrlxdWYodDJLdZryZRBQ1ZKbO+Rihpv23FKst4HhlQvCvWr99oK6/ubkp
2mzLjeotWxT2G+RnQNJp4JqgXaYr6972/Q5h75lCxQZWxw7baIS62gTaFfK8cD4p
j4gVo2zYtMBivUZngmTF36iRN743NAOz3HsvU1pEphbc24o=
=6SQq
-----END PGP SIGNATURE-----
{
  "XSA": 418,
  "SupportedVersions": [
    "master",
    "4.16",
    "4.15",
    "4.14",
    "4.13"
  ],
  "Trees": [
    "xen"
  ],
  "Recipes": {
    "4.13": {
      "Recipes": {
        "xen": {
          "StableRef": "0be63c2615b268001f7cc9b72ce25eed952737dc",
          "Prereqs": [
            414,
            415,
            326,
            416,
            417
          ],
          "Patches": [
            "xsa418/xsa418-4.14-??.patch"
          ]
        }
      }
    },
    "4.14": {
      "Recipes": {
        "xen": {
          "StableRef": "016de62747b26ead5a5c763b640fe8e205cd182b",
          "Prereqs": [
            414,
            415,
            326,
            416,
            417
          ],
          "Patches": [
            "xsa418/xsa418-4.14-??.patch"
          ]
        }
      }
    },
    "4.15": {
      "Recipes": {
        "xen": {
          "StableRef": "816580afdd1730d4f85f64477a242a439af1cdf8",
          "Prereqs": [
            414,
            415,
            326,
            416,
            417
          ],
          "Patches": [
            "xsa418/xsa418-4.15-??.patch"
          ]
        }
      }
    },
    "4.16": {
      "Recipes": {
        "xen": {
          "StableRef": "1bce7fb1f702da4f7a749c6f1457ecb20bf74fca",
          "Prereqs": [
            412,
            414,
            415,
            326,
            416,
            417
          ],
          "Patches": [
            "xsa418/xsa418-4.16-??.patch"
          ]
        }
      }
    },
    "master": {
      "Recipes": {
        "xen": {
          "StableRef": "cc4747be8ba157a3b310921e9ee07fb8545aa206",
          "Prereqs": [
            412,
            414,
            415,
            326,
            416,
            417
          ],
          "Patches": [
            "xsa418/xsa418-??.patch"
          ]
        }
      }
    }
  }
}From 9923c06a4f19c5a46927e35f103ec65b65e416c2 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:11 +0200
Subject: tools/xenstore: remove recursion from construct_node()

In order to reduce stack usage due to recursion, switch
construct_node() to use a loop instead.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index a0c176fa203e..cb52f68d4ddf 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1373,45 +1373,69 @@ static int add_child(const void *ctx, struct node *parent, const char *name)
 static struct node *construct_node(struct connection *conn, const void *ctx,
 				   const char *name)
 {
-	struct node *parent, *node;
-	char *parentname = get_parent(ctx, name);
+	const char **names = NULL;
+	unsigned int levels = 0;
+	struct node *node = NULL;
+	struct node *parent = NULL;
+	const char *parentname = talloc_strdup(ctx, name);
 
 	if (!parentname)
 		return NULL;
 
-	/* If parent doesn't exist, create it. */
-	parent = read_node(conn, parentname, parentname);
-	if (!parent && errno == ENOENT)
-		parent = construct_node(conn, ctx, parentname);
-	if (!parent)
-		return NULL;
+	/* Walk the path up until an existing node is found. */
+	while (!parent) {
+		names = talloc_realloc(ctx, names, const char *, levels + 1);
+		if (!names)
+			goto nomem;
 
-	/* Add child to parent. */
-	if (add_child(ctx, parent, name))
-		goto nomem;
+		/*
+		 * names[0] is the name of the node to construct initially,
+		 * names[1] is its parent, and so on.
+		 */
+		names[levels] = parentname;
+		parentname = get_parent(ctx, parentname);
+		if (!parentname)
+			return NULL;
 
-	/* Allocate node */
-	node = talloc(ctx, struct node);
-	if (!node)
-		goto nomem;
-	node->name = talloc_strdup(node, name);
-	if (!node->name)
-		goto nomem;
+		/* Try to read parent node until we found an existing one. */
+		parent = read_node(conn, ctx, parentname);
+		if (!parent && (errno != ENOENT || !strcmp(parentname, "/")))
+			return NULL;
 
-	/* Inherit permissions, except unprivileged domains own what they create */
-	node->perms.num = parent->perms.num;
-	node->perms.p = talloc_memdup(node, parent->perms.p,
-				      node->perms.num * sizeof(*node->perms.p));
-	if (!node->perms.p)
-		goto nomem;
-	if (domain_is_unprivileged(conn))
-		node->perms.p[0].id = conn->id;
+		levels++;
+	}
+
+	/* Walk the path down again constructing the missing nodes. */
+	for (; levels > 0; levels--) {
+		/* Add child to parent. */
+		if (add_child(ctx, parent, names[levels - 1]))
+			goto nomem;
+
+		/* Allocate node */
+		node = talloc(ctx, struct node);
+		if (!node)
+			goto nomem;
+		node->name = talloc_steal(node, names[levels - 1]);
+
+		/* Inherit permissions, unpriv domains own what they create. */
+		node->perms.num = parent->perms.num;
+		node->perms.p = talloc_memdup(node, parent->perms.p,
+					      node->perms.num *
+					      sizeof(*node->perms.p));
+		if (!node->perms.p)
+			goto nomem;
+		if (domain_is_unprivileged(conn))
+			node->perms.p[0].id = conn->id;
+
+		/* No children, no data */
+		node->children = node->data = NULL;
+		node->childlen = node->datalen = 0;
+		node->acc.memory = 0;
+		node->parent = parent;
+
+		parent = node;
+	}
 
-	/* No children, no data */
-	node->children = node->data = NULL;
-	node->childlen = node->datalen = 0;
-	node->acc.memory = 0;
-	node->parent = parent;
 	return node;
 
 nomem:
From 86c6ee483547a61355b32117e439d47b4daccbdc Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:11 +0200
Subject: tools/xenstore: don't let remove_child_entry() call corrupt()

In case of write_node() returning an error, remove_child_entry() will
call corrupt() today. This could result in an endless recursion, as
remove_child_entry() is called by corrupt(), too:

corrupt()
  check_store()
    check_store_()
      remove_child_entry()

Fix that by letting remove_child_entry() return an error instead and
let the caller decide what to do.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index cb52f68d4ddf..5642917e67db 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1604,15 +1604,15 @@ static void memdel(void *mem, unsigned off, unsigned len, unsigned total)
 	memmove(mem + off, mem + off + len, total - off - len);
 }
 
-static void remove_child_entry(struct connection *conn, struct node *node,
-			       size_t offset)
+static int remove_child_entry(struct connection *conn, struct node *node,
+			      size_t offset)
 {
 	size_t childlen = strlen(node->children + offset);
 
 	memdel(node->children, offset, childlen + 1, node->childlen);
 	node->childlen -= childlen + 1;
-	if (write_node(conn, node, true))
-		corrupt(conn, "Can't update parent node '%s'", node->name);
+
+	return write_node(conn, node, true);
 }
 
 static void delete_child(struct connection *conn,
@@ -1622,7 +1622,9 @@ static void delete_child(struct connection *conn,
 
 	for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
 		if (streq(node->children+i, childname)) {
-			remove_child_entry(conn, node, i);
+			if (remove_child_entry(conn, node, i))
+				corrupt(conn, "Can't update parent node '%s'",
+					node->name);
 			return;
 		}
 	}
@@ -2304,6 +2306,17 @@ int remember_string(struct hashtable *hash, const char *str)
 	return hashtable_insert(hash, k, (void *)1);
 }
 
+static int rm_child_entry(struct node *node, size_t off, size_t len)
+{
+	if (!recovery)
+		return off;
+
+	if (remove_child_entry(NULL, node, off))
+		log("check_store: child entry could not be removed from '%s'",
+		    node->name);
+
+	return off - len - 1;
+}
 
 /**
  * A node has a children field that names the children of the node, separated
@@ -2356,12 +2369,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
 				if (hashtable_search(children, childname)) {
 					log("check_store: '%s' is duplicated!",
 					    childname);
-
-					if (recovery) {
-						remove_child_entry(NULL, node,
-								   i);
-						i -= childlen + 1;
-					}
+					i = rm_child_entry(node, i, childlen);
 				}
 				else {
 					if (!remember_string(children,
@@ -2378,11 +2386,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
 			} else if (errno != ENOMEM) {
 				log("check_store: No child '%s' found!\n",
 				    childname);
-
-				if (recovery) {
-					remove_child_entry(NULL, node, i);
-					i -= childlen + 1;
-				}
+				i = rm_child_entry(node, i, childlen);
 			} else {
 				log("check_store: ENOMEM");
 				ret = ENOMEM;
From 567ac251a657f8c1847abb5a15ffa686b5aed6fb Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:11 +0200
Subject: tools/xenstore: add generic treewalk function

Add a generic function to walk the complete node tree. It will start
at "/" and descend recursively into each child, calling a function
specified by the caller. Depending on the return value of the user
specified function the walk will be aborted, continued, or the current
child will be skipped by not descending into its children.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 5642917e67db..f78faf0c3e06 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1834,6 +1834,135 @@ static int do_set_perms(const void *ctx, struct connection *conn,
 	return 0;
 }
 
+static char *child_name(const void *ctx, const char *s1, const char *s2)
+{
+	if (strcmp(s1, "/"))
+		return talloc_asprintf(ctx, "%s/%s", s1, s2);
+	return talloc_asprintf(ctx, "/%s", s2);
+}
+
+static int rm_from_parent(struct connection *conn, struct node *parent,
+			  const char *name)
+{
+	size_t off;
+
+	if (!parent)
+		return WALK_TREE_ERROR_STOP;
+
+	for (off = parent->childoff - 1; off && parent->children[off - 1];
+	     off--);
+	if (remove_child_entry(conn, parent, off)) {
+		log("treewalk: child entry could not be removed from '%s'",
+		    parent->name);
+		return WALK_TREE_ERROR_STOP;
+	}
+	parent->childoff = off;
+
+	return WALK_TREE_OK;
+}
+
+static int walk_call_func(const void *ctx, struct connection *conn,
+			  struct node *node, struct node *parent, void *arg,
+			  int (*func)(const void *ctx, struct connection *conn,
+				      struct node *node, void *arg))
+{
+	int ret;
+
+	if (!func)
+		return WALK_TREE_OK;
+
+	ret = func(ctx, conn, node, arg);
+	if (ret == WALK_TREE_RM_CHILDENTRY && parent)
+		ret = rm_from_parent(conn, parent, node->name);
+
+	return ret;
+}
+
+int walk_node_tree(const void *ctx, struct connection *conn, const char *root,
+		   struct walk_funcs *funcs, void *arg)
+{
+	int ret = 0;
+	void *tmpctx;
+	char *name;
+	struct node *node = NULL;
+	struct node *parent = NULL;
+
+	tmpctx = talloc_new(ctx);
+	if (!tmpctx) {
+		errno = ENOMEM;
+		return WALK_TREE_ERROR_STOP;
+	}
+	name = talloc_strdup(tmpctx, root);
+	if (!name) {
+		errno = ENOMEM;
+		talloc_free(tmpctx);
+		return WALK_TREE_ERROR_STOP;
+	}
+
+	/* Continue the walk until an error is returned. */
+	while (ret >= 0) {
+		/* node == NULL possible only for the initial loop iteration. */
+		if (node) {
+			/* Go one step up if ret or if last child finished. */
+			if (ret || node->childoff >= node->childlen) {
+				parent = node->parent;
+				/* Call function AFTER processing a node. */
+				ret = walk_call_func(ctx, conn, node, parent,
+						     arg, funcs->exit);
+				/* Last node, so exit loop. */
+				if (!parent)
+					break;
+				talloc_free(node);
+				/* Continue with parent. */
+				node = parent;
+				continue;
+			}
+			/* Get next child of current node. */
+			name = child_name(tmpctx, node->name,
+					  node->children + node->childoff);
+			if (!name) {
+				ret = WALK_TREE_ERROR_STOP;
+				break;
+			}
+			/* Point to next child. */
+			node->childoff += strlen(node->children +
+						 node->childoff) + 1;
+			/* Descent into children. */
+			parent = node;
+		}
+		/* Read next node (root node or next child). */
+		node = read_node(conn, tmpctx, name);
+		if (!node) {
+			/* Child not found - should not happen! */
+			/* ENOENT case can be handled by supplied function. */
+			if (errno == ENOENT && funcs->enoent)
+				ret = funcs->enoent(ctx, conn, parent, name,
+						    arg);
+			else
+				ret = WALK_TREE_ERROR_STOP;
+			if (!parent)
+				break;
+			if (ret == WALK_TREE_RM_CHILDENTRY)
+				ret = rm_from_parent(conn, parent, name);
+			if (ret < 0)
+				break;
+			talloc_free(name);
+			node = parent;
+			continue;
+		}
+		talloc_free(name);
+		node->parent = parent;
+		node->childoff = 0;
+		/* Call function BEFORE processing a node. */
+		ret = walk_call_func(ctx, conn, node, parent, arg,
+				     funcs->enter);
+	}
+
+	talloc_free(tmpctx);
+
+	return ret < 0 ? ret : WALK_TREE_OK;
+}
+
 static struct {
 	const char *str;
 	int (*func)(const void *ctx, struct connection *conn,
@@ -2284,18 +2413,6 @@ static int keys_equal_fn(void *key1, void *key2)
 	return 0 == strcmp((char *)key1, (char *)key2);
 }
 
-
-static char *child_name(const char *s1, const char *s2)
-{
-	if (strcmp(s1, "/")) {
-		return talloc_asprintf(NULL, "%s/%s", s1, s2);
-	}
-	else {
-		return talloc_asprintf(NULL, "/%s", s2);
-	}
-}
-
-
 int remember_string(struct hashtable *hash, const char *str)
 {
 	char *k = malloc(strlen(str) + 1);
@@ -2355,7 +2472,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
 		while (i < node->childlen && !ret) {
 			struct node *childnode;
 			size_t childlen = strlen(node->children + i);
-			char * childname = child_name(node->name,
+			char * childname = child_name(NULL, node->name,
 						      node->children + i);
 
 			if (!childname) {
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index f7c37fe3b565..acb00ad96914 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -202,6 +202,7 @@ struct node {
 
 	/* Children, each nul-terminated. */
 	unsigned int childlen;
+	unsigned int childoff;	/* Used by walk_node_tree() internally. */
 	char *children;
 
 	/* Allocation information for node currently in store. */
@@ -333,6 +334,45 @@ void read_state_buffered_data(const void *ctx, struct connection *conn,
 			      const struct xs_state_connection *sc);
 void read_state_node(const void *ctx, const void *state);
 
+/*
+ * Walk the node tree below root calling funcs->enter() and funcs->exit() for
+ * each node. funcs->enter() is being called when entering a node, so before
+ * any of the children of the node is processed. funcs->exit() is being
+ * called when leaving the node, so after all children have been processed.
+ * funcs->enoent() is being called when a node isn't existing.
+ * funcs->*() return values:
+ *  < 0: tree walk is stopped, walk_node_tree() returns funcs->*() return value
+ *       in case WALK_TREE_ERROR_STOP is returned, errno should be set
+ *  WALK_TREE_OK: tree walk is continuing
+ *  WALK_TREE_SKIP_CHILDREN: tree walk won't descend below current node, but
+ *       walk continues
+ *  WALK_TREE_RM_CHILDENTRY: Remove the child entry from its parent and write
+ *       the modified parent node back to the data base, implies to not descend
+ *       below the current node, but to continue the walk
+ * funcs->*() is allowed to modify the node it is called for in the data base.
+ * In case funcs->enter() is deleting the node, it must not return WALK_TREE_OK
+ * in order to avoid descending into no longer existing children.
+ */
+/* Return values for funcs->*() and walk_node_tree(). */
+#define WALK_TREE_SUCCESS_STOP  -100    /* Stop walk early, no error. */
+#define WALK_TREE_ERROR_STOP    -1      /* Stop walk due to error. */
+#define WALK_TREE_OK            0       /* No error. */
+/* Return value for funcs->*() only. */
+#define WALK_TREE_SKIP_CHILDREN 1       /* Don't recurse below current node. */
+#define WALK_TREE_RM_CHILDENTRY 2       /* Remove child entry from parent. */
+
+struct walk_funcs {
+	int (*enter)(const void *ctx, struct connection *conn,
+		     struct node *node, void *arg);
+	int (*exit)(const void *ctx, struct connection *conn,
+		    struct node *node, void *arg);
+	int (*enoent)(const void *ctx, struct connection *conn,
+		      struct node *parent, char *name, void *arg);
+};
+
+int walk_node_tree(const void *ctx, struct connection *conn, const char *root,
+		   struct walk_funcs *funcs, void *arg);
+
 #endif /* _XENSTORED_CORE_H */
 
 /*
From 0d4f442b883a27986111fc524d4575a356959bae Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:11 +0200
Subject: tools/xenstore: remove recursion from construct_node()

In order to reduce stack usage due to recursion, switch
construct_node() to use a loop instead.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 34a8469dd69d..e7971c828e8b 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1254,57 +1254,91 @@ static char *basename(const char *name)
 	return strrchr(name, '/') + 1;
 }
 
-static struct node *construct_node(struct connection *conn, const void *ctx,
-				   const char *name)
+static int add_child(const void *ctx, struct node *parent, const char *name)
 {
 	const char *base;
 	unsigned int baselen;
-	struct node *parent, *node;
-	char *children, *parentname = get_parent(ctx, name);
-
-	if (!parentname)
-		return NULL;
+	char *children;
 
-	/* If parent doesn't exist, create it. */
-	parent = read_node(conn, parentname, parentname);
-	if (!parent && errno == ENOENT)
-		parent = construct_node(conn, ctx, parentname);
-	if (!parent)
-		return NULL;
-
-	/* Add child to parent. */
 	base = basename(name);
 	baselen = strlen(base) + 1;
 	children = talloc_array(ctx, char, parent->childlen + baselen);
 	if (!children)
-		goto nomem;
+		return ENOMEM;
 	memcpy(children, parent->children, parent->childlen);
 	memcpy(children + parent->childlen, base, baselen);
 	parent->children = children;
 	parent->childlen += baselen;
 
-	/* Allocate node */
-	node = talloc(ctx, struct node);
-	if (!node)
-		goto nomem;
-	node->name = talloc_strdup(node, name);
-	if (!node->name)
-		goto nomem;
-
-	/* Inherit permissions, except unprivileged domains own what they create */
-	node->perms.num = parent->perms.num;
-	node->perms.p = talloc_memdup(node, parent->perms.p,
-				      node->perms.num * sizeof(*node->perms.p));
-	if (!node->perms.p)
-		goto nomem;
-	if (domain_is_unprivileged(conn))
-		node->perms.p[0].id = conn->id;
-
-	/* No children, no data */
-	node->children = node->data = NULL;
-	node->childlen = node->datalen = 0;
-	node->acc.memory = 0;
-	node->parent = parent;
+	return 0;
+}
+
+static struct node *construct_node(struct connection *conn, const void *ctx,
+				   const char *name)
+{
+	const char **names = NULL;
+	unsigned int levels = 0;
+	struct node *node = NULL;
+	struct node *parent = NULL;
+	const char *parentname = talloc_strdup(ctx, name);
+
+	if (!parentname)
+		return NULL;
+
+	/* Walk the path up until an existing node is found. */
+	while (!parent) {
+		names = talloc_realloc(ctx, names, const char *, levels + 1);
+		if (!names)
+			goto nomem;
+
+		/*
+		 * names[0] is the name of the node to construct initially,
+		 * names[1] is its parent, and so on.
+		 */
+		names[levels] = parentname;
+		parentname = get_parent(ctx, parentname);
+		if (!parentname)
+			return NULL;
+
+		/* Try to read parent node until we found an existing one. */
+		parent = read_node(conn, ctx, parentname);
+		if (!parent && (errno != ENOENT || !strcmp(parentname, "/")))
+			return NULL;
+
+		levels++;
+	}
+
+	/* Walk the path down again constructing the missing nodes. */
+	for (; levels > 0; levels--) {
+		/* Add child to parent. */
+		if (add_child(ctx, parent, names[levels - 1]))
+			goto nomem;
+
+		/* Allocate node */
+		node = talloc(ctx, struct node);
+		if (!node)
+			goto nomem;
+		node->name = talloc_steal(node, names[levels - 1]);
+
+		/* Inherit permissions, unpriv domains own what they create. */
+		node->perms.num = parent->perms.num;
+		node->perms.p = talloc_memdup(node, parent->perms.p,
+					      node->perms.num *
+					      sizeof(*node->perms.p));
+		if (!node->perms.p)
+			goto nomem;
+		if (domain_is_unprivileged(conn))
+			node->perms.p[0].id = conn->id;
+
+		/* No children, no data */
+		node->children = node->data = NULL;
+		node->childlen = node->datalen = 0;
+		node->acc.memory = 0;
+		node->parent = parent;
+
+		parent = node;
+	}
+
 	return node;
 
 nomem:
From b8075553eab9edd8e40f4c6da6c4439835aa1eea Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:11 +0200
Subject: tools/xenstore: don't let remove_child_entry() call corrupt()

In case of write_node() returning an error, remove_child_entry() will
call corrupt() today. This could result in an endless recursion, as
remove_child_entry() is called by corrupt(), too:

corrupt()
  check_store()
    check_store_()
      remove_child_entry()

Fix that by letting remove_child_entry() return an error instead and
let the caller decide what to do.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index e7971c828e8b..cef9fc1c1e12 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1505,15 +1505,15 @@ static void memdel(void *mem, unsigned off, unsigned len, unsigned total)
 	memmove(mem + off, mem + off + len, total - off - len);
 }
 
-static void remove_child_entry(struct connection *conn, struct node *node,
-			       size_t offset)
+static int remove_child_entry(struct connection *conn, struct node *node,
+			      size_t offset)
 {
 	size_t childlen = strlen(node->children + offset);
 
 	memdel(node->children, offset, childlen + 1, node->childlen);
 	node->childlen -= childlen + 1;
-	if (write_node(conn, node, true))
-		corrupt(conn, "Can't update parent node '%s'", node->name);
+
+	return write_node(conn, node, true);
 }
 
 static void delete_child(struct connection *conn,
@@ -1523,7 +1523,9 @@ static void delete_child(struct connection *conn,
 
 	for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
 		if (streq(node->children+i, childname)) {
-			remove_child_entry(conn, node, i);
+			if (remove_child_entry(conn, node, i))
+				corrupt(conn, "Can't update parent node '%s'",
+					node->name);
 			return;
 		}
 	}
@@ -2125,6 +2127,17 @@ int remember_string(struct hashtable *hash, const char *str)
 	return hashtable_insert(hash, k, (void *)1);
 }
 
+static int rm_child_entry(struct node *node, size_t off, size_t len)
+{
+	if (!recovery)
+		return off;
+
+	if (remove_child_entry(NULL, node, off))
+		log("check_store: child entry could not be removed from '%s'",
+		    node->name);
+
+	return off - len - 1;
+}
 
 /**
  * A node has a children field that names the children of the node, separated
@@ -2173,12 +2186,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
 				if (hashtable_search(children, childname)) {
 					log("check_store: '%s' is duplicated!",
 					    childname);
-
-					if (recovery) {
-						remove_child_entry(NULL, node,
-								   i);
-						i -= childlen + 1;
-					}
+					i = rm_child_entry(node, i, childlen);
 				}
 				else {
 					if (!remember_string(children,
@@ -2195,11 +2203,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
 			} else if (errno != ENOMEM) {
 				log("check_store: No child '%s' found!\n",
 				    childname);
-
-				if (recovery) {
-					remove_child_entry(NULL, node, i);
-					i -= childlen + 1;
-				}
+				i = rm_child_entry(node, i, childlen);
 			} else {
 				log("check_store: ENOMEM");
 				ret = ENOMEM;
From 60e5a1dd436a0d5778208df734044e6c610b8a81 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:11 +0200
Subject: tools/xenstore: add generic treewalk function

Add a generic function to walk the complete node tree. It will start
at "/" and descend recursively into each child, calling a function
specified by the caller. Depending on the return value of the user
specified function the walk will be aborted, continued, or the current
child will be skipped by not descending into its children.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index cef9fc1c1e12..47559a85d4c2 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1733,6 +1733,135 @@ static int do_set_perms(const void *ctx, struct connection *conn,
 	return 0;
 }
 
+static char *child_name(const void *ctx, const char *s1, const char *s2)
+{
+	if (strcmp(s1, "/"))
+		return talloc_asprintf(ctx, "%s/%s", s1, s2);
+	return talloc_asprintf(ctx, "/%s", s2);
+}
+
+static int rm_from_parent(struct connection *conn, struct node *parent,
+			  const char *name)
+{
+	size_t off;
+
+	if (!parent)
+		return WALK_TREE_ERROR_STOP;
+
+	for (off = parent->childoff - 1; off && parent->children[off - 1];
+	     off--);
+	if (remove_child_entry(conn, parent, off)) {
+		log("treewalk: child entry could not be removed from '%s'",
+		    parent->name);
+		return WALK_TREE_ERROR_STOP;
+	}
+	parent->childoff = off;
+
+	return WALK_TREE_OK;
+}
+
+static int walk_call_func(const void *ctx, struct connection *conn,
+			  struct node *node, struct node *parent, void *arg,
+			  int (*func)(const void *ctx, struct connection *conn,
+				      struct node *node, void *arg))
+{
+	int ret;
+
+	if (!func)
+		return WALK_TREE_OK;
+
+	ret = func(ctx, conn, node, arg);
+	if (ret == WALK_TREE_RM_CHILDENTRY && parent)
+		ret = rm_from_parent(conn, parent, node->name);
+
+	return ret;
+}
+
+int walk_node_tree(const void *ctx, struct connection *conn, const char *root,
+		   struct walk_funcs *funcs, void *arg)
+{
+	int ret = 0;
+	void *tmpctx;
+	char *name;
+	struct node *node = NULL;
+	struct node *parent = NULL;
+
+	tmpctx = talloc_new(ctx);
+	if (!tmpctx) {
+		errno = ENOMEM;
+		return WALK_TREE_ERROR_STOP;
+	}
+	name = talloc_strdup(tmpctx, root);
+	if (!name) {
+		errno = ENOMEM;
+		talloc_free(tmpctx);
+		return WALK_TREE_ERROR_STOP;
+	}
+
+	/* Continue the walk until an error is returned. */
+	while (ret >= 0) {
+		/* node == NULL possible only for the initial loop iteration. */
+		if (node) {
+			/* Go one step up if ret or if last child finished. */
+			if (ret || node->childoff >= node->childlen) {
+				parent = node->parent;
+				/* Call function AFTER processing a node. */
+				ret = walk_call_func(ctx, conn, node, parent,
+						     arg, funcs->exit);
+				/* Last node, so exit loop. */
+				if (!parent)
+					break;
+				talloc_free(node);
+				/* Continue with parent. */
+				node = parent;
+				continue;
+			}
+			/* Get next child of current node. */
+			name = child_name(tmpctx, node->name,
+					  node->children + node->childoff);
+			if (!name) {
+				ret = WALK_TREE_ERROR_STOP;
+				break;
+			}
+			/* Point to next child. */
+			node->childoff += strlen(node->children +
+						 node->childoff) + 1;
+			/* Descent into children. */
+			parent = node;
+		}
+		/* Read next node (root node or next child). */
+		node = read_node(conn, tmpctx, name);
+		if (!node) {
+			/* Child not found - should not happen! */
+			/* ENOENT case can be handled by supplied function. */
+			if (errno == ENOENT && funcs->enoent)
+				ret = funcs->enoent(ctx, conn, parent, name,
+						    arg);
+			else
+				ret = WALK_TREE_ERROR_STOP;
+			if (!parent)
+				break;
+			if (ret == WALK_TREE_RM_CHILDENTRY)
+				ret = rm_from_parent(conn, parent, name);
+			if (ret < 0)
+				break;
+			talloc_free(name);
+			node = parent;
+			continue;
+		}
+		talloc_free(name);
+		node->parent = parent;
+		node->childoff = 0;
+		/* Call function BEFORE processing a node. */
+		ret = walk_call_func(ctx, conn, node, parent, arg,
+				     funcs->enter);
+	}
+
+	talloc_free(tmpctx);
+
+	return ret < 0 ? ret : WALK_TREE_OK;
+}
+
 static struct {
 	const char *str;
 	int (*func)(const void *ctx, struct connection *conn,
@@ -2105,18 +2234,6 @@ static int keys_equal_fn(void *key1, void *key2)
 	return 0 == strcmp((char *)key1, (char *)key2);
 }
 
-
-static char *child_name(const char *s1, const char *s2)
-{
-	if (strcmp(s1, "/")) {
-		return talloc_asprintf(NULL, "%s/%s", s1, s2);
-	}
-	else {
-		return talloc_asprintf(NULL, "/%s", s2);
-	}
-}
-
-
 int remember_string(struct hashtable *hash, const char *str)
 {
 	char *k = malloc(strlen(str) + 1);
@@ -2172,7 +2289,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
 		while (i < node->childlen && !ret) {
 			struct node *childnode;
 			size_t childlen = strlen(node->children + i);
-			char * childname = child_name(node->name,
+			char * childname = child_name(NULL, node->name,
 						      node->children + i);
 
 			if (!childname) {
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 5abf06c21c98..fc9882ac37d5 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -167,6 +167,7 @@ struct node {
 
 	/* Children, each nul-terminated. */
 	unsigned int childlen;
+	unsigned int childoff;	/* Used by walk_node_tree() internally. */
 	char *children;
 
 	/* Allocation information for node currently in store. */
@@ -278,6 +279,45 @@ int do_tdb_delete(struct connection *conn, TDB_DATA *key,
 
 void conn_free_buffered_data(struct connection *conn);
 
+/*
+ * Walk the node tree below root calling funcs->enter() and funcs->exit() for
+ * each node. funcs->enter() is being called when entering a node, so before
+ * any of the children of the node is processed. funcs->exit() is being
+ * called when leaving the node, so after all children have been processed.
+ * funcs->enoent() is being called when a node isn't existing.
+ * funcs->*() return values:
+ *  < 0: tree walk is stopped, walk_node_tree() returns funcs->*() return value
+ *       in case WALK_TREE_ERROR_STOP is returned, errno should be set
+ *  WALK_TREE_OK: tree walk is continuing
+ *  WALK_TREE_SKIP_CHILDREN: tree walk won't descend below current node, but
+ *       walk continues
+ *  WALK_TREE_RM_CHILDENTRY: Remove the child entry from its parent and write
+ *       the modified parent node back to the data base, implies to not descend
+ *       below the current node, but to continue the walk
+ * funcs->*() is allowed to modify the node it is called for in the data base.
+ * In case funcs->enter() is deleting the node, it must not return WALK_TREE_OK
+ * in order to avoid descending into no longer existing children.
+ */
+/* Return values for funcs->*() and walk_node_tree(). */
+#define WALK_TREE_SUCCESS_STOP  -100    /* Stop walk early, no error. */
+#define WALK_TREE_ERROR_STOP    -1      /* Stop walk due to error. */
+#define WALK_TREE_OK            0       /* No error. */
+/* Return value for funcs->*() only. */
+#define WALK_TREE_SKIP_CHILDREN 1       /* Don't recurse below current node. */
+#define WALK_TREE_RM_CHILDENTRY 2       /* Remove child entry from parent. */
+
+struct walk_funcs {
+	int (*enter)(const void *ctx, struct connection *conn,
+		     struct node *node, void *arg);
+	int (*exit)(const void *ctx, struct connection *conn,
+		    struct node *node, void *arg);
+	int (*enoent)(const void *ctx, struct connection *conn,
+		      struct node *parent, char *name, void *arg);
+};
+
+int walk_node_tree(const void *ctx, struct connection *conn, const char *root,
+		   struct walk_funcs *funcs, void *arg);
+
 #endif /* _XENSTORED_CORE_H */
 
 /*
From d58d069b0af5e64b667609723ba24ad5242fc920 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: simplify check_store()

check_store() is using a hash table for storing all node names it has
found via walking the tree. Additionally it using another hash table
for all children of a node to detect duplicate child names.

Simplify that by dropping the second hash table as the first one is
already holding all the needed information.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 47559a85d4c2..302fb5b93c9b 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2277,46 +2277,34 @@ static int check_store_(const char *name, struct hashtable *reachable)
 	if (node) {
 		size_t i = 0;
 
-		struct hashtable * children =
-			create_hashtable(16, hash_from_key_fn, keys_equal_fn);
-
 		if (!remember_string(reachable, name)) {
-			hashtable_destroy(children, 0);
 			log("check_store: ENOMEM");
 			return ENOMEM;
 		}
 
 		while (i < node->childlen && !ret) {
-			struct node *childnode;
+			struct node *childnode = NULL;
 			size_t childlen = strlen(node->children + i);
-			char * childname = child_name(NULL, node->name,
-						      node->children + i);
+			char *childname = child_name(NULL, node->name,
+						     node->children + i);
 
 			if (!childname) {
 				log("check_store: ENOMEM");
 				ret = ENOMEM;
 				break;
 			}
+
+			if (hashtable_search(reachable, childname)) {
+				log("check_store: '%s' is duplicated!",
+				    childname);
+				i = rm_child_entry(node, i, childlen);
+				goto next;
+			}
+
 			childnode = read_node(NULL, childname, childname);
-			
+
 			if (childnode) {
-				if (hashtable_search(children, childname)) {
-					log("check_store: '%s' is duplicated!",
-					    childname);
-					i = rm_child_entry(node, i, childlen);
-				}
-				else {
-					if (!remember_string(children,
-							     childname)) {
-						log("check_store: ENOMEM");
-						talloc_free(childnode);
-						talloc_free(childname);
-						ret = ENOMEM;
-						break;
-					}
-					ret = check_store_(childname,
-							   reachable);
-				}
+				ret = check_store_(childname, reachable);
 			} else if (errno != ENOMEM) {
 				log("check_store: No child '%s' found!\n",
 				    childname);
@@ -2326,19 +2314,18 @@ static int check_store_(const char *name, struct hashtable *reachable)
 				ret = ENOMEM;
 			}
 
+ next:
 			talloc_free(childnode);
 			talloc_free(childname);
 			i += childlen + 1;
 		}
 
-		hashtable_destroy(children, 0 /* Don't free values (they are
-						 all (void *)1) */);
 		talloc_free(node);
 	} else if (errno != ENOMEM) {
 		/* Impossible, because no database should ever be without the
 		   root, and otherwise, we've just checked in our caller
 		   (which made a recursive call to get here). */
-		   
+
 		log("check_store: No child '%s' found: impossible!", name);
 	} else {
 		log("check_store: ENOMEM");
From ac24da3fde16215b4ba5f0cc8d9686881e97bb42 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: use treewalk for check_store()

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when checking the store for inconsistencies.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 302fb5b93c9b..1cc11d0b1c30 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2244,18 +2244,6 @@ int remember_string(struct hashtable *hash, const char *str)
 	return hashtable_insert(hash, k, (void *)1);
 }
 
-static int rm_child_entry(struct node *node, size_t off, size_t len)
-{
-	if (!recovery)
-		return off;
-
-	if (remove_child_entry(NULL, node, off))
-		log("check_store: child entry could not be removed from '%s'",
-		    node->name);
-
-	return off - len - 1;
-}
-
 /**
  * A node has a children field that names the children of the node, separated
  * by NULs.  We check whether there are entries in there that are duplicated
@@ -2269,70 +2257,29 @@ static int rm_child_entry(struct node *node, size_t off, size_t len)
  * As we go, we record each node in the given reachable hashtable.  These
  * entries will be used later in clean_store.
  */
-static int check_store_(const char *name, struct hashtable *reachable)
+static int check_store_step(const void *ctx, struct connection *conn,
+			    struct node *node, void *arg)
 {
-	struct node *node = read_node(NULL, name, name);
-	int ret = 0;
-
-	if (node) {
-		size_t i = 0;
-
-		if (!remember_string(reachable, name)) {
-			log("check_store: ENOMEM");
-			return ENOMEM;
-		}
-
-		while (i < node->childlen && !ret) {
-			struct node *childnode = NULL;
-			size_t childlen = strlen(node->children + i);
-			char *childname = child_name(NULL, node->name,
-						     node->children + i);
-
-			if (!childname) {
-				log("check_store: ENOMEM");
-				ret = ENOMEM;
-				break;
-			}
-
-			if (hashtable_search(reachable, childname)) {
-				log("check_store: '%s' is duplicated!",
-				    childname);
-				i = rm_child_entry(node, i, childlen);
-				goto next;
-			}
+	struct hashtable *reachable = arg;
 
-			childnode = read_node(NULL, childname, childname);
-
-			if (childnode) {
-				ret = check_store_(childname, reachable);
-			} else if (errno != ENOMEM) {
-				log("check_store: No child '%s' found!\n",
-				    childname);
-				i = rm_child_entry(node, i, childlen);
-			} else {
-				log("check_store: ENOMEM");
-				ret = ENOMEM;
-			}
+	if (hashtable_search(reachable, (void *)node->name)) {
+		log("check_store: '%s' is duplicated!", node->name);
+		return recovery ? WALK_TREE_RM_CHILDENTRY
+				: WALK_TREE_SKIP_CHILDREN;
+	}
 
- next:
-			talloc_free(childnode);
-			talloc_free(childname);
-			i += childlen + 1;
-		}
+	if (!remember_string(reachable, node->name))
+		return WALK_TREE_ERROR_STOP;
 
-		talloc_free(node);
-	} else if (errno != ENOMEM) {
-		/* Impossible, because no database should ever be without the
-		   root, and otherwise, we've just checked in our caller
-		   (which made a recursive call to get here). */
+	return WALK_TREE_OK;
+}
 
-		log("check_store: No child '%s' found: impossible!", name);
-	} else {
-		log("check_store: ENOMEM");
-		ret = ENOMEM;
-	}
+static int check_store_enoent(const void *ctx, struct connection *conn,
+			      struct node *parent, char *name, void *arg)
+{
+	log("check_store: node '%s' not found", name);
 
-	return ret;
+	return recovery ? WALK_TREE_RM_CHILDENTRY : WALK_TREE_OK;
 }
 
 
@@ -2381,24 +2328,28 @@ static void clean_store(struct hashtable *reachable)
 
 void check_store(void)
 {
-	char * root = talloc_strdup(NULL, "/");
-	struct hashtable * reachable =
-		create_hashtable(16, hash_from_key_fn, keys_equal_fn);
- 
+	struct hashtable *reachable;
+	struct walk_funcs walkfuncs = {
+		.enter = check_store_step,
+		.enoent = check_store_enoent,
+	};
+
+	reachable = create_hashtable(16, hash_from_key_fn, keys_equal_fn);
 	if (!reachable) {
 		log("check_store: ENOMEM");
 		return;
 	}
 
 	log("Checking store ...");
-	if (!check_store_(root, reachable) &&
-	    !check_transactions(reachable))
+	if (walk_node_tree(NULL, NULL, "/", &walkfuncs, reachable)) {
+		if (errno == ENOMEM)
+			log("check_store: ENOMEM");
+	} else if (!check_transactions(reachable))
 		clean_store(reachable);
 	log("Checking store complete.");
 
 	hashtable_destroy(reachable, 0 /* Don't free values (they are all
 					  (void *)1) */);
-	talloc_free(root);
 }
 
 
From 56d02eea4e7740e4e140cf317c6ec3daa74a3e89 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: use treewalk for deleting nodes

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when deleting a sub-tree of nodes.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 1cc11d0b1c30..5bb7b8521324 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1233,21 +1233,6 @@ static int do_read(const void *ctx, struct connection *conn,
 	return 0;
 }
 
-static void delete_node_single(struct connection *conn, struct node *node)
-{
-	TDB_DATA key;
-
-	if (access_node(conn, node, NODE_ACCESS_DELETE, &key))
-		return;
-
-	if (do_tdb_delete(conn, &key, &node->acc) != 0) {
-		corrupt(conn, "Could not delete '%s'", node->name);
-		return;
-	}
-
-	domain_entry_dec(conn, node);
-}
-
 /* Must not be / */
 static char *basename(const char *name)
 {
@@ -1516,69 +1501,59 @@ static int remove_child_entry(struct connection *conn, struct node *node,
 	return write_node(conn, node, true);
 }
 
-static void delete_child(struct connection *conn,
-			 struct node *node, const char *childname)
+static int delete_child(struct connection *conn,
+			struct node *node, const char *childname)
 {
 	unsigned int i;
 
 	for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
 		if (streq(node->children+i, childname)) {
-			if (remove_child_entry(conn, node, i))
-				corrupt(conn, "Can't update parent node '%s'",
-					node->name);
-			return;
+			errno = remove_child_entry(conn, node, i) ? EIO : 0;
+			return errno;
 		}
 	}
 	corrupt(conn, "Can't find child '%s' in %s", childname, node->name);
+
+	errno = EIO;
+	return errno;
 }
 
-static int delete_node(struct connection *conn, const void *ctx,
-		       struct node *parent, struct node *node, bool watch_exact)
+static int delnode_sub(const void *ctx, struct connection *conn,
+		       struct node *node, void *arg)
 {
-	char *name;
+	const char *root = arg;
+	bool watch_exact;
+	int ret;
+	TDB_DATA key;
 
-	/* Delete children. */
-	while (node->childlen) {
-		struct node *child;
+	/* Any error here will probably be repeated for all following calls. */
+	ret = access_node(conn, node, NODE_ACCESS_DELETE, &key);
+	if (ret > 0)
+		return WALK_TREE_SUCCESS_STOP;
 
-		name = talloc_asprintf(node, "%s/%s", node->name,
-				       node->children);
-		child = name ? read_node(conn, node, name) : NULL;
-		if (child) {
-			if (delete_node(conn, ctx, node, child, true))
-				return errno;
-		} else {
-			trace("delete_node: Error deleting child '%s/%s'!\n",
-			      node->name, node->children);
-			/* Quit deleting. */
-			errno = ENOMEM;
-			return errno;
-		}
-		talloc_free(name);
-	}
+	/* In case of error stop the walk. */
+	if (!ret && do_tdb_delete(conn, &key, &node->acc))
+		return WALK_TREE_SUCCESS_STOP;
 
 	/*
 	 * Fire the watches now, when we can still see the node permissions.
 	 * This fine as we are single threaded and the next possible read will
 	 * be handled only after the node has been really removed.
-	 */
+	*/
+	watch_exact = strcmp(root, node->name);
 	fire_watches(conn, ctx, node->name, node, watch_exact, NULL);
-	delete_node_single(conn, node);
-	delete_child(conn, parent, basename(node->name));
-	talloc_free(node);
 
-	return 0;
+	domain_entry_dec(conn, node);
+
+	return WALK_TREE_RM_CHILDENTRY;
 }
 
-static int _rm(struct connection *conn, const void *ctx, struct node *node,
-	       const char *name)
+static int _rm(struct connection *conn, const void *ctx, const char *name)
 {
-	/*
-	 * Deleting node by node, so the result is always consistent even in
-	 * case of a failure.
-	 */
 	struct node *parent;
 	char *parentname = get_parent(ctx, name);
+	struct walk_funcs walkfuncs = { .exit = delnode_sub };
+	int ret;
 
 	if (!parentname)
 		return errno;
@@ -1586,9 +1561,21 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node,
 	parent = read_node(conn, ctx, parentname);
 	if (!parent)
 		return read_node_can_propagate_errno() ? errno : EINVAL;
-	node->parent = parent;
 
-	return delete_node(conn, ctx, parent, node, false);
+	ret = walk_node_tree(ctx, conn, name, &walkfuncs, (void *)name);
+	if (ret < 0) {
+		if (ret == WALK_TREE_ERROR_STOP) {
+			corrupt(conn, "error when deleting sub-nodes of %s\n",
+				name);
+			errno = EIO;
+		}
+		return errno;
+	}
+
+	if (delete_child(conn, parent, basename(name)))
+		return errno;
+
+	return 0;
 }
 
 
@@ -1623,7 +1610,7 @@ static int do_rm(const void *ctx, struct connection *conn,
 	if (streq(name, "/"))
 		return EINVAL;
 
-	ret = _rm(conn, ctx, node, name);
+	ret = _rm(conn, ctx, name);
 	if (ret)
 		return ret;
 
From 92fa48921c17cebe6b61c161f7ff5dfa27975a18 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:11 +0200
Subject: tools/xenstore: remove recursion from construct_node()

In order to reduce stack usage due to recursion, switch
construct_node() to use a loop instead.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 8aecd425f274..46a37e5257e5 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1343,45 +1343,69 @@ static int add_child(const void *ctx, struct node *parent, const char *name)
 static struct node *construct_node(struct connection *conn, const void *ctx,
 				   const char *name)
 {
-	struct node *parent, *node;
-	char *parentname = get_parent(ctx, name);
+	const char **names = NULL;
+	unsigned int levels = 0;
+	struct node *node = NULL;
+	struct node *parent = NULL;
+	const char *parentname = talloc_strdup(ctx, name);
 
 	if (!parentname)
 		return NULL;
 
-	/* If parent doesn't exist, create it. */
-	parent = read_node(conn, parentname, parentname);
-	if (!parent && errno == ENOENT)
-		parent = construct_node(conn, ctx, parentname);
-	if (!parent)
-		return NULL;
+	/* Walk the path up until an existing node is found. */
+	while (!parent) {
+		names = talloc_realloc(ctx, names, const char *, levels + 1);
+		if (!names)
+			goto nomem;
 
-	/* Add child to parent. */
-	if (add_child(ctx, parent, name))
-		goto nomem;
+		/*
+		 * names[0] is the name of the node to construct initially,
+		 * names[1] is its parent, and so on.
+		 */
+		names[levels] = parentname;
+		parentname = get_parent(ctx, parentname);
+		if (!parentname)
+			return NULL;
 
-	/* Allocate node */
-	node = talloc(ctx, struct node);
-	if (!node)
-		goto nomem;
-	node->name = talloc_strdup(node, name);
-	if (!node->name)
-		goto nomem;
+		/* Try to read parent node until we found an existing one. */
+		parent = read_node(conn, ctx, parentname);
+		if (!parent && (errno != ENOENT || !strcmp(parentname, "/")))
+			return NULL;
 
-	/* Inherit permissions, except unprivileged domains own what they create */
-	node->perms.num = parent->perms.num;
-	node->perms.p = talloc_memdup(node, parent->perms.p,
-				      node->perms.num * sizeof(*node->perms.p));
-	if (!node->perms.p)
-		goto nomem;
-	if (domain_is_unprivileged(conn))
-		node->perms.p[0].id = conn->id;
+		levels++;
+	}
+
+	/* Walk the path down again constructing the missing nodes. */
+	for (; levels > 0; levels--) {
+		/* Add child to parent. */
+		if (add_child(ctx, parent, names[levels - 1]))
+			goto nomem;
+
+		/* Allocate node */
+		node = talloc(ctx, struct node);
+		if (!node)
+			goto nomem;
+		node->name = talloc_steal(node, names[levels - 1]);
+
+		/* Inherit permissions, unpriv domains own what they create. */
+		node->perms.num = parent->perms.num;
+		node->perms.p = talloc_memdup(node, parent->perms.p,
+					      node->perms.num *
+					      sizeof(*node->perms.p));
+		if (!node->perms.p)
+			goto nomem;
+		if (domain_is_unprivileged(conn))
+			node->perms.p[0].id = conn->id;
+
+		/* No children, no data */
+		node->children = node->data = NULL;
+		node->childlen = node->datalen = 0;
+		node->acc.memory = 0;
+		node->parent = parent;
+
+		parent = node;
+	}
 
-	/* No children, no data */
-	node->children = node->data = NULL;
-	node->childlen = node->datalen = 0;
-	node->acc.memory = 0;
-	node->parent = parent;
 	return node;
 
 nomem:
From df45ea379c68e675ce88bbe25e17cfe49ec108b7 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:11 +0200
Subject: tools/xenstore: don't let remove_child_entry() call corrupt()

In case of write_node() returning an error, remove_child_entry() will
call corrupt() today. This could result in an endless recursion, as
remove_child_entry() is called by corrupt(), too:

corrupt()
  check_store()
    check_store_()
      remove_child_entry()

Fix that by letting remove_child_entry() return an error instead and
let the caller decide what to do.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 46a37e5257e5..4c3897721bdd 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1574,15 +1574,15 @@ static void memdel(void *mem, unsigned off, unsigned len, unsigned total)
 	memmove(mem + off, mem + off + len, total - off - len);
 }
 
-static void remove_child_entry(struct connection *conn, struct node *node,
-			       size_t offset)
+static int remove_child_entry(struct connection *conn, struct node *node,
+			      size_t offset)
 {
 	size_t childlen = strlen(node->children + offset);
 
 	memdel(node->children, offset, childlen + 1, node->childlen);
 	node->childlen -= childlen + 1;
-	if (write_node(conn, node, true))
-		corrupt(conn, "Can't update parent node '%s'", node->name);
+
+	return write_node(conn, node, true);
 }
 
 static void delete_child(struct connection *conn,
@@ -1592,7 +1592,9 @@ static void delete_child(struct connection *conn,
 
 	for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
 		if (streq(node->children+i, childname)) {
-			remove_child_entry(conn, node, i);
+			if (remove_child_entry(conn, node, i))
+				corrupt(conn, "Can't update parent node '%s'",
+					node->name);
 			return;
 		}
 	}
@@ -2226,6 +2228,17 @@ int remember_string(struct hashtable *hash, const char *str)
 	return hashtable_insert(hash, k, (void *)1);
 }
 
+static int rm_child_entry(struct node *node, size_t off, size_t len)
+{
+	if (!recovery)
+		return off;
+
+	if (remove_child_entry(NULL, node, off))
+		log("check_store: child entry could not be removed from '%s'",
+		    node->name);
+
+	return off - len - 1;
+}
 
 /**
  * A node has a children field that names the children of the node, separated
@@ -2278,12 +2291,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
 				if (hashtable_search(children, childname)) {
 					log("check_store: '%s' is duplicated!",
 					    childname);
-
-					if (recovery) {
-						remove_child_entry(NULL, node,
-								   i);
-						i -= childlen + 1;
-					}
+					i = rm_child_entry(node, i, childlen);
 				}
 				else {
 					if (!remember_string(children,
@@ -2300,11 +2308,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
 			} else if (errno != ENOMEM) {
 				log("check_store: No child '%s' found!\n",
 				    childname);
-
-				if (recovery) {
-					remove_child_entry(NULL, node, i);
-					i -= childlen + 1;
-				}
+				i = rm_child_entry(node, i, childlen);
 			} else {
 				log("check_store: ENOMEM");
 				ret = ENOMEM;
From 78d86d76345df44d41fad8659f637a93ccd849b7 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:11 +0200
Subject: tools/xenstore: add generic treewalk function

Add a generic function to walk the complete node tree. It will start
at "/" and descend recursively into each child, calling a function
specified by the caller. Depending on the return value of the user
specified function the walk will be aborted, continued, or the current
child will be skipped by not descending into its children.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 4c3897721bdd..7463d0a002d7 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1804,6 +1804,135 @@ static int do_set_perms(const void *ctx, struct connection *conn,
 	return 0;
 }
 
+static char *child_name(const void *ctx, const char *s1, const char *s2)
+{
+	if (strcmp(s1, "/"))
+		return talloc_asprintf(ctx, "%s/%s", s1, s2);
+	return talloc_asprintf(ctx, "/%s", s2);
+}
+
+static int rm_from_parent(struct connection *conn, struct node *parent,
+			  const char *name)
+{
+	size_t off;
+
+	if (!parent)
+		return WALK_TREE_ERROR_STOP;
+
+	for (off = parent->childoff - 1; off && parent->children[off - 1];
+	     off--);
+	if (remove_child_entry(conn, parent, off)) {
+		log("treewalk: child entry could not be removed from '%s'",
+		    parent->name);
+		return WALK_TREE_ERROR_STOP;
+	}
+	parent->childoff = off;
+
+	return WALK_TREE_OK;
+}
+
+static int walk_call_func(const void *ctx, struct connection *conn,
+			  struct node *node, struct node *parent, void *arg,
+			  int (*func)(const void *ctx, struct connection *conn,
+				      struct node *node, void *arg))
+{
+	int ret;
+
+	if (!func)
+		return WALK_TREE_OK;
+
+	ret = func(ctx, conn, node, arg);
+	if (ret == WALK_TREE_RM_CHILDENTRY && parent)
+		ret = rm_from_parent(conn, parent, node->name);
+
+	return ret;
+}
+
+int walk_node_tree(const void *ctx, struct connection *conn, const char *root,
+		   struct walk_funcs *funcs, void *arg)
+{
+	int ret = 0;
+	void *tmpctx;
+	char *name;
+	struct node *node = NULL;
+	struct node *parent = NULL;
+
+	tmpctx = talloc_new(ctx);
+	if (!tmpctx) {
+		errno = ENOMEM;
+		return WALK_TREE_ERROR_STOP;
+	}
+	name = talloc_strdup(tmpctx, root);
+	if (!name) {
+		errno = ENOMEM;
+		talloc_free(tmpctx);
+		return WALK_TREE_ERROR_STOP;
+	}
+
+	/* Continue the walk until an error is returned. */
+	while (ret >= 0) {
+		/* node == NULL possible only for the initial loop iteration. */
+		if (node) {
+			/* Go one step up if ret or if last child finished. */
+			if (ret || node->childoff >= node->childlen) {
+				parent = node->parent;
+				/* Call function AFTER processing a node. */
+				ret = walk_call_func(ctx, conn, node, parent,
+						     arg, funcs->exit);
+				/* Last node, so exit loop. */
+				if (!parent)
+					break;
+				talloc_free(node);
+				/* Continue with parent. */
+				node = parent;
+				continue;
+			}
+			/* Get next child of current node. */
+			name = child_name(tmpctx, node->name,
+					  node->children + node->childoff);
+			if (!name) {
+				ret = WALK_TREE_ERROR_STOP;
+				break;
+			}
+			/* Point to next child. */
+			node->childoff += strlen(node->children +
+						 node->childoff) + 1;
+			/* Descent into children. */
+			parent = node;
+		}
+		/* Read next node (root node or next child). */
+		node = read_node(conn, tmpctx, name);
+		if (!node) {
+			/* Child not found - should not happen! */
+			/* ENOENT case can be handled by supplied function. */
+			if (errno == ENOENT && funcs->enoent)
+				ret = funcs->enoent(ctx, conn, parent, name,
+						    arg);
+			else
+				ret = WALK_TREE_ERROR_STOP;
+			if (!parent)
+				break;
+			if (ret == WALK_TREE_RM_CHILDENTRY)
+				ret = rm_from_parent(conn, parent, name);
+			if (ret < 0)
+				break;
+			talloc_free(name);
+			node = parent;
+			continue;
+		}
+		talloc_free(name);
+		node->parent = parent;
+		node->childoff = 0;
+		/* Call function BEFORE processing a node. */
+		ret = walk_call_func(ctx, conn, node, parent, arg,
+				     funcs->enter);
+	}
+
+	talloc_free(tmpctx);
+
+	return ret < 0 ? ret : WALK_TREE_OK;
+}
+
 static struct {
 	const char *str;
 	int (*func)(const void *ctx, struct connection *conn,
@@ -2206,18 +2335,6 @@ static int keys_equal_fn(void *key1, void *key2)
 	return 0 == strcmp((char *)key1, (char *)key2);
 }
 
-
-static char *child_name(const char *s1, const char *s2)
-{
-	if (strcmp(s1, "/")) {
-		return talloc_asprintf(NULL, "%s/%s", s1, s2);
-	}
-	else {
-		return talloc_asprintf(NULL, "/%s", s2);
-	}
-}
-
-
 int remember_string(struct hashtable *hash, const char *str)
 {
 	char *k = malloc(strlen(str) + 1);
@@ -2277,7 +2394,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
 		while (i < node->childlen && !ret) {
 			struct node *childnode;
 			size_t childlen = strlen(node->children + i);
-			char * childname = child_name(node->name,
+			char * childname = child_name(NULL, node->name,
 						      node->children + i);
 
 			if (!childname) {
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index 1eb3708f82dd..f0fd8c352857 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -195,6 +195,7 @@ struct node {
 
 	/* Children, each nul-terminated. */
 	unsigned int childlen;
+	unsigned int childoff;	/* Used by walk_node_tree() internally. */
 	char *children;
 
 	/* Allocation information for node currently in store. */
@@ -334,6 +335,45 @@ void read_state_buffered_data(const void *ctx, struct connection *conn,
 			      const struct xs_state_connection *sc);
 void read_state_node(const void *ctx, const void *state);
 
+/*
+ * Walk the node tree below root calling funcs->enter() and funcs->exit() for
+ * each node. funcs->enter() is being called when entering a node, so before
+ * any of the children of the node is processed. funcs->exit() is being
+ * called when leaving the node, so after all children have been processed.
+ * funcs->enoent() is being called when a node isn't existing.
+ * funcs->*() return values:
+ *  < 0: tree walk is stopped, walk_node_tree() returns funcs->*() return value
+ *       in case WALK_TREE_ERROR_STOP is returned, errno should be set
+ *  WALK_TREE_OK: tree walk is continuing
+ *  WALK_TREE_SKIP_CHILDREN: tree walk won't descend below current node, but
+ *       walk continues
+ *  WALK_TREE_RM_CHILDENTRY: Remove the child entry from its parent and write
+ *       the modified parent node back to the data base, implies to not descend
+ *       below the current node, but to continue the walk
+ * funcs->*() is allowed to modify the node it is called for in the data base.
+ * In case funcs->enter() is deleting the node, it must not return WALK_TREE_OK
+ * in order to avoid descending into no longer existing children.
+ */
+/* Return values for funcs->*() and walk_node_tree(). */
+#define WALK_TREE_SUCCESS_STOP  -100    /* Stop walk early, no error. */
+#define WALK_TREE_ERROR_STOP    -1      /* Stop walk due to error. */
+#define WALK_TREE_OK            0       /* No error. */
+/* Return value for funcs->*() only. */
+#define WALK_TREE_SKIP_CHILDREN 1       /* Don't recurse below current node. */
+#define WALK_TREE_RM_CHILDENTRY 2       /* Remove child entry from parent. */
+
+struct walk_funcs {
+	int (*enter)(const void *ctx, struct connection *conn,
+		     struct node *node, void *arg);
+	int (*exit)(const void *ctx, struct connection *conn,
+		    struct node *node, void *arg);
+	int (*enoent)(const void *ctx, struct connection *conn,
+		      struct node *parent, char *name, void *arg);
+};
+
+int walk_node_tree(const void *ctx, struct connection *conn, const char *root,
+		   struct walk_funcs *funcs, void *arg);
+
 #endif /* _XENSTORED_CORE_H */
 
 /*
From 1e72aa3c507c9e9bb495d12907189034bc8b1e7a Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: simplify check_store()

check_store() is using a hash table for storing all node names it has
found via walking the tree. Additionally it using another hash table
for all children of a node to detect duplicate child names.

Simplify that by dropping the second hash table as the first one is
already holding all the needed information.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 7463d0a002d7..a48255c64cad 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2378,50 +2378,34 @@ static int check_store_(const char *name, struct hashtable *reachable)
 	if (node) {
 		size_t i = 0;
 
-		struct hashtable * children =
-			create_hashtable(16, hash_from_key_fn, keys_equal_fn);
-		if (!children) {
-			log("check_store create table: ENOMEM");
-			return ENOMEM;
-		}
-
 		if (!remember_string(reachable, name)) {
-			hashtable_destroy(children, 0);
 			log("check_store: ENOMEM");
 			return ENOMEM;
 		}
 
 		while (i < node->childlen && !ret) {
-			struct node *childnode;
+			struct node *childnode = NULL;
 			size_t childlen = strlen(node->children + i);
-			char * childname = child_name(NULL, node->name,
-						      node->children + i);
+			char *childname = child_name(NULL, node->name,
+						     node->children + i);
 
 			if (!childname) {
 				log("check_store: ENOMEM");
 				ret = ENOMEM;
 				break;
 			}
+
+			if (hashtable_search(reachable, childname)) {
+				log("check_store: '%s' is duplicated!",
+				    childname);
+				i = rm_child_entry(node, i, childlen);
+				goto next;
+			}
+
 			childnode = read_node(NULL, childname, childname);
-			
+
 			if (childnode) {
-				if (hashtable_search(children, childname)) {
-					log("check_store: '%s' is duplicated!",
-					    childname);
-					i = rm_child_entry(node, i, childlen);
-				}
-				else {
-					if (!remember_string(children,
-							     childname)) {
-						log("check_store: ENOMEM");
-						talloc_free(childnode);
-						talloc_free(childname);
-						ret = ENOMEM;
-						break;
-					}
-					ret = check_store_(childname,
-							   reachable);
-				}
+				ret = check_store_(childname, reachable);
 			} else if (errno != ENOMEM) {
 				log("check_store: No child '%s' found!\n",
 				    childname);
@@ -2431,19 +2415,18 @@ static int check_store_(const char *name, struct hashtable *reachable)
 				ret = ENOMEM;
 			}
 
+ next:
 			talloc_free(childnode);
 			talloc_free(childname);
 			i += childlen + 1;
 		}
 
-		hashtable_destroy(children, 0 /* Don't free values (they are
-						 all (void *)1) */);
 		talloc_free(node);
 	} else if (errno != ENOMEM) {
 		/* Impossible, because no database should ever be without the
 		   root, and otherwise, we've just checked in our caller
 		   (which made a recursive call to get here). */
-		   
+
 		log("check_store: No child '%s' found: impossible!", name);
 	} else {
 		log("check_store: ENOMEM");
From 358c392359a29d28751907303dc181073a42ab33 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: use treewalk for check_store()

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when checking the store for inconsistencies.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index a48255c64cad..ed8bc9b02ed2 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2345,18 +2345,6 @@ int remember_string(struct hashtable *hash, const char *str)
 	return hashtable_insert(hash, k, (void *)1);
 }
 
-static int rm_child_entry(struct node *node, size_t off, size_t len)
-{
-	if (!recovery)
-		return off;
-
-	if (remove_child_entry(NULL, node, off))
-		log("check_store: child entry could not be removed from '%s'",
-		    node->name);
-
-	return off - len - 1;
-}
-
 /**
  * A node has a children field that names the children of the node, separated
  * by NULs.  We check whether there are entries in there that are duplicated
@@ -2370,70 +2358,29 @@ static int rm_child_entry(struct node *node, size_t off, size_t len)
  * As we go, we record each node in the given reachable hashtable.  These
  * entries will be used later in clean_store.
  */
-static int check_store_(const char *name, struct hashtable *reachable)
+static int check_store_step(const void *ctx, struct connection *conn,
+			    struct node *node, void *arg)
 {
-	struct node *node = read_node(NULL, name, name);
-	int ret = 0;
-
-	if (node) {
-		size_t i = 0;
-
-		if (!remember_string(reachable, name)) {
-			log("check_store: ENOMEM");
-			return ENOMEM;
-		}
-
-		while (i < node->childlen && !ret) {
-			struct node *childnode = NULL;
-			size_t childlen = strlen(node->children + i);
-			char *childname = child_name(NULL, node->name,
-						     node->children + i);
-
-			if (!childname) {
-				log("check_store: ENOMEM");
-				ret = ENOMEM;
-				break;
-			}
-
-			if (hashtable_search(reachable, childname)) {
-				log("check_store: '%s' is duplicated!",
-				    childname);
-				i = rm_child_entry(node, i, childlen);
-				goto next;
-			}
-
-			childnode = read_node(NULL, childname, childname);
+	struct hashtable *reachable = arg;
 
-			if (childnode) {
-				ret = check_store_(childname, reachable);
-			} else if (errno != ENOMEM) {
-				log("check_store: No child '%s' found!\n",
-				    childname);
-				i = rm_child_entry(node, i, childlen);
-			} else {
-				log("check_store: ENOMEM");
-				ret = ENOMEM;
-			}
+	if (hashtable_search(reachable, (void *)node->name)) {
+		log("check_store: '%s' is duplicated!", node->name);
+		return recovery ? WALK_TREE_RM_CHILDENTRY
+				: WALK_TREE_SKIP_CHILDREN;
+	}
 
- next:
-			talloc_free(childnode);
-			talloc_free(childname);
-			i += childlen + 1;
-		}
+	if (!remember_string(reachable, node->name))
+		return WALK_TREE_ERROR_STOP;
 
-		talloc_free(node);
-	} else if (errno != ENOMEM) {
-		/* Impossible, because no database should ever be without the
-		   root, and otherwise, we've just checked in our caller
-		   (which made a recursive call to get here). */
+	return WALK_TREE_OK;
+}
 
-		log("check_store: No child '%s' found: impossible!", name);
-	} else {
-		log("check_store: ENOMEM");
-		ret = ENOMEM;
-	}
+static int check_store_enoent(const void *ctx, struct connection *conn,
+			      struct node *parent, char *name, void *arg)
+{
+	log("check_store: node '%s' not found", name);
 
-	return ret;
+	return recovery ? WALK_TREE_RM_CHILDENTRY : WALK_TREE_OK;
 }
 
 
@@ -2482,24 +2429,28 @@ static void clean_store(struct hashtable *reachable)
 
 void check_store(void)
 {
-	char * root = talloc_strdup(NULL, "/");
-	struct hashtable * reachable =
-		create_hashtable(16, hash_from_key_fn, keys_equal_fn);
- 
+	struct hashtable *reachable;
+	struct walk_funcs walkfuncs = {
+		.enter = check_store_step,
+		.enoent = check_store_enoent,
+	};
+
+	reachable = create_hashtable(16, hash_from_key_fn, keys_equal_fn);
 	if (!reachable) {
 		log("check_store: ENOMEM");
 		return;
 	}
 
 	log("Checking store ...");
-	if (!check_store_(root, reachable) &&
-	    !check_transactions(reachable))
+	if (walk_node_tree(NULL, NULL, "/", &walkfuncs, reachable)) {
+		if (errno == ENOMEM)
+			log("check_store: ENOMEM");
+	} else if (!check_transactions(reachable))
 		clean_store(reachable);
 	log("Checking store complete.");
 
 	hashtable_destroy(reachable, 0 /* Don't free values (they are all
 					  (void *)1) */);
-	talloc_free(root);
 }
 
 
From c47828c7286478a9dc04010ae4191c4c023f2fa8 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: use treewalk for deleting nodes

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when deleting a sub-tree of nodes.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index ed8bc9b02ed2..9576411757fa 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1300,21 +1300,6 @@ static int do_read(const void *ctx, struct connection *conn,
 	return 0;
 }
 
-static void delete_node_single(struct connection *conn, struct node *node)
-{
-	TDB_DATA key;
-
-	if (access_node(conn, node, NODE_ACCESS_DELETE, &key))
-		return;
-
-	if (do_tdb_delete(conn, &key, &node->acc) != 0) {
-		corrupt(conn, "Could not delete '%s'", node->name);
-		return;
-	}
-
-	domain_entry_dec(conn, node);
-}
-
 /* Must not be / */
 static char *basename(const char *name)
 {
@@ -1585,69 +1570,59 @@ static int remove_child_entry(struct connection *conn, struct node *node,
 	return write_node(conn, node, true);
 }
 
-static void delete_child(struct connection *conn,
-			 struct node *node, const char *childname)
+static int delete_child(struct connection *conn,
+			struct node *node, const char *childname)
 {
 	unsigned int i;
 
 	for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
 		if (streq(node->children+i, childname)) {
-			if (remove_child_entry(conn, node, i))
-				corrupt(conn, "Can't update parent node '%s'",
-					node->name);
-			return;
+			errno = remove_child_entry(conn, node, i) ? EIO : 0;
+			return errno;
 		}
 	}
 	corrupt(conn, "Can't find child '%s' in %s", childname, node->name);
+
+	errno = EIO;
+	return errno;
 }
 
-static int delete_node(struct connection *conn, const void *ctx,
-		       struct node *parent, struct node *node, bool watch_exact)
+static int delnode_sub(const void *ctx, struct connection *conn,
+		       struct node *node, void *arg)
 {
-	char *name;
+	const char *root = arg;
+	bool watch_exact;
+	int ret;
+	TDB_DATA key;
 
-	/* Delete children. */
-	while (node->childlen) {
-		struct node *child;
+	/* Any error here will probably be repeated for all following calls. */
+	ret = access_node(conn, node, NODE_ACCESS_DELETE, &key);
+	if (ret > 0)
+		return WALK_TREE_SUCCESS_STOP;
 
-		name = talloc_asprintf(node, "%s/%s", node->name,
-				       node->children);
-		child = name ? read_node(conn, node, name) : NULL;
-		if (child) {
-			if (delete_node(conn, ctx, node, child, true))
-				return errno;
-		} else {
-			trace("delete_node: Error deleting child '%s/%s'!\n",
-			      node->name, node->children);
-			/* Quit deleting. */
-			errno = ENOMEM;
-			return errno;
-		}
-		talloc_free(name);
-	}
+	/* In case of error stop the walk. */
+	if (!ret && do_tdb_delete(conn, &key, &node->acc))
+		return WALK_TREE_SUCCESS_STOP;
 
 	/*
 	 * Fire the watches now, when we can still see the node permissions.
 	 * This fine as we are single threaded and the next possible read will
 	 * be handled only after the node has been really removed.
-	 */
+	*/
+	watch_exact = strcmp(root, node->name);
 	fire_watches(conn, ctx, node->name, node, watch_exact, NULL);
-	delete_node_single(conn, node);
-	delete_child(conn, parent, basename(node->name));
-	talloc_free(node);
 
-	return 0;
+	domain_entry_dec(conn, node);
+
+	return WALK_TREE_RM_CHILDENTRY;
 }
 
-static int _rm(struct connection *conn, const void *ctx, struct node *node,
-	       const char *name)
+static int _rm(struct connection *conn, const void *ctx, const char *name)
 {
-	/*
-	 * Deleting node by node, so the result is always consistent even in
-	 * case of a failure.
-	 */
 	struct node *parent;
 	char *parentname = get_parent(ctx, name);
+	struct walk_funcs walkfuncs = { .exit = delnode_sub };
+	int ret;
 
 	if (!parentname)
 		return errno;
@@ -1655,9 +1630,21 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node,
 	parent = read_node(conn, ctx, parentname);
 	if (!parent)
 		return read_node_can_propagate_errno() ? errno : EINVAL;
-	node->parent = parent;
 
-	return delete_node(conn, ctx, parent, node, false);
+	ret = walk_node_tree(ctx, conn, name, &walkfuncs, (void *)name);
+	if (ret < 0) {
+		if (ret == WALK_TREE_ERROR_STOP) {
+			corrupt(conn, "error when deleting sub-nodes of %s\n",
+				name);
+			errno = EIO;
+		}
+		return errno;
+	}
+
+	if (delete_child(conn, parent, basename(name)))
+		return errno;
+
+	return 0;
 }
 
 
@@ -1694,7 +1681,7 @@ static int do_rm(const void *ctx, struct connection *conn,
 	if (streq(name, "/"))
 		return EINVAL;
 
-	ret = _rm(conn, ctx, node, name);
+	ret = _rm(conn, ctx, name);
 	if (ret)
 		return ret;
 
From a4a0cd435a2d74fc8901cc7979cc37745389f96e Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: use treewalk for creating node records

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when creating the node records during a live update.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 9576411757fa..e8cdfeef50c7 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2990,132 +2990,109 @@ const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
 	return NULL;
 }
 
-const char *dump_state_node_perms(FILE *fp, struct xs_state_node *sn,
-				  const struct xs_permissions *perms,
+const char *dump_state_node_perms(FILE *fp, const struct xs_permissions *perms,
 				  unsigned int n_perms)
 {
 	unsigned int p;
 
 	for (p = 0; p < n_perms; p++) {
+		struct xs_state_node_perm sp;
+
 		switch ((int)perms[p].perms & ~XS_PERM_IGNORE) {
 		case XS_PERM_READ:
-			sn->perms[p].access = XS_STATE_NODE_PERM_READ;
+			sp.access = XS_STATE_NODE_PERM_READ;
 			break;
 		case XS_PERM_WRITE:
-			sn->perms[p].access = XS_STATE_NODE_PERM_WRITE;
+			sp.access = XS_STATE_NODE_PERM_WRITE;
 			break;
 		case XS_PERM_READ | XS_PERM_WRITE:
-			sn->perms[p].access = XS_STATE_NODE_PERM_BOTH;
+			sp.access = XS_STATE_NODE_PERM_BOTH;
 			break;
 		default:
-			sn->perms[p].access = XS_STATE_NODE_PERM_NONE;
+			sp.access = XS_STATE_NODE_PERM_NONE;
 			break;
 		}
-		sn->perms[p].flags = (perms[p].perms & XS_PERM_IGNORE)
+		sp.flags = (perms[p].perms & XS_PERM_IGNORE)
 				     ? XS_STATE_NODE_PERM_IGNORE : 0;
-		sn->perms[p].domid = perms[p].id;
-	}
+		sp.domid = perms[p].id;
 
-	if (fwrite(sn->perms, sizeof(*sn->perms), n_perms, fp) != n_perms)
-		return "Dump node permissions error";
+		if (fwrite(&sp, sizeof(sp), 1, fp) != 1)
+			return "Dump node permissions error";
+	}
 
 	return NULL;
 }
 
-static const char *dump_state_node_tree(FILE *fp, char *path)
+struct dump_node_data {
+	FILE *fp;
+	const char *err;
+};
+
+static int dump_state_node_err(struct dump_node_data *data, const char *err)
+{
+	data->err = err;
+	return WALK_TREE_ERROR_STOP;
+}
+
+static int dump_state_node(const void *ctx, struct connection *conn,
+			   struct node *node, void *arg)
 {
-	unsigned int pathlen, childlen, p = 0;
+	struct dump_node_data *data = arg;
+	FILE *fp = data->fp;
+	unsigned int pathlen;
 	struct xs_state_record_header head;
 	struct xs_state_node sn;
-	TDB_DATA key, data;
-	const struct xs_tdb_record_hdr *hdr;
-	const char *child;
 	const char *ret;
 
-	pathlen = strlen(path) + 1;
-
-	set_tdb_key(path, &key);
-	data = tdb_fetch(tdb_ctx, key);
-	if (data.dptr == NULL)
-		return "Error reading node";
-
-	/* Clean up in case of failure. */
-	talloc_steal(path, data.dptr);
-
-	hdr = (void *)data.dptr;
+	pathlen = strlen(node->name) + 1;
 
 	head.type = XS_STATE_TYPE_NODE;
 	head.length = sizeof(sn);
 	sn.conn_id = 0;
 	sn.ta_id = 0;
 	sn.ta_access = 0;
-	sn.perm_n = hdr->num_perms;
+	sn.perm_n = node->perms.num;
 	sn.path_len = pathlen;
-	sn.data_len = hdr->datalen;
-	head.length += hdr->num_perms * sizeof(*sn.perms);
+	sn.data_len = node->datalen;
+	head.length += node->perms.num * sizeof(*sn.perms);
 	head.length += pathlen;
-	head.length += hdr->datalen;
+	head.length += node->datalen;
 	head.length = ROUNDUP(head.length, 3);
 
 	if (fwrite(&head, sizeof(head), 1, fp) != 1)
-		return "Dump node state error";
+		return dump_state_node_err(data, "Dump node head error");
 	if (fwrite(&sn, sizeof(sn), 1, fp) != 1)
-		return "Dump node state error";
+		return dump_state_node_err(data, "Dump node state error");
 
-	ret = dump_state_node_perms(fp, &sn, hdr->perms, hdr->num_perms);
+	ret = dump_state_node_perms(fp, node->perms.p, node->perms.num);
 	if (ret)
-		return ret;
+		return dump_state_node_err(data, ret);
 
-	if (fwrite(path, pathlen, 1, fp) != 1)
-		return "Dump node path error";
-	if (hdr->datalen &&
-	    fwrite(hdr->perms + hdr->num_perms, hdr->datalen, 1, fp) != 1)
-		return "Dump node data error";
+	if (fwrite(node->name, pathlen, 1, fp) != 1)
+		return dump_state_node_err(data, "Dump node path error");
+
+	if (node->datalen && fwrite(node->data, node->datalen, 1, fp) != 1)
+		return dump_state_node_err(data, "Dump node data error");
 
 	ret = dump_state_align(fp);
 	if (ret)
-		return ret;
-
-	child = (char *)(hdr->perms + hdr->num_perms) + hdr->datalen;
-
-	/*
-	 * Use path for constructing children paths.
-	 * As we don't write out nodes without having written their parent
-	 * already we will never clobber a part of the path we'll need later.
-	 */
-	pathlen--;
-	if (path[pathlen - 1] != '/') {
-		path[pathlen] = '/';
-		pathlen++;
-	}
-	while (p < hdr->childlen) {
-		childlen = strlen(child) + 1;
-		if (pathlen + childlen > XENSTORE_ABS_PATH_MAX)
-			return "Dump node path length error";
-		strcpy(path + pathlen, child);
-		ret = dump_state_node_tree(fp, path);
-		if (ret)
-			return ret;
-		p += childlen;
-		child += childlen;
-	}
-
-	talloc_free(data.dptr);
+		return dump_state_node_err(data, ret);
 
-	return NULL;
+	return WALK_TREE_OK;
 }
 
 const char *dump_state_nodes(FILE *fp, const void *ctx)
 {
-	char *path;
-
-	path = talloc_size(ctx, XENSTORE_ABS_PATH_MAX);
-	if (!path)
-		return "Path buffer allocation error";
+	struct dump_node_data data = {
+		.fp = fp,
+		.err = "Dump node walk error"
+	};
+	struct walk_funcs walkfuncs = { .enter = dump_state_node };
 
-	strcpy(path, "/");
+	if (walk_node_tree(ctx, NULL, "/", &walkfuncs, &data))
+		return data.err;
 
-	return dump_state_node_tree(fp, path);
+	return NULL;
 }
 
 void read_state_global(const void *ctx, const void *state)
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index f0fd8c352857..3190494bbeb5 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -326,8 +326,7 @@ const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
 				     const struct connection *conn,
 				     struct xs_state_connection *sc);
 const char *dump_state_nodes(FILE *fp, const void *ctx);
-const char *dump_state_node_perms(FILE *fp, struct xs_state_node *sn,
-				  const struct xs_permissions *perms,
+const char *dump_state_node_perms(FILE *fp, const struct xs_permissions *perms,
 				  unsigned int n_perms);
 
 void read_state_global(const void *ctx, const void *state);
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 8b503c2dfe07..a91cc75ab59b 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -1449,7 +1449,7 @@ static const char *dump_state_special_node(FILE *fp, const char *name,
 	if (fwrite(&sn, sizeof(sn), 1, fp) != 1)
 		return "Dump special node error";
 
-	ret = dump_state_node_perms(fp, &sn, perms->p, perms->num);
+	ret = dump_state_node_perms(fp, perms->p, perms->num);
 	if (ret)
 		return ret;
 
From d1e6dca486599ab914af7b38b3782b237d3d603b Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:11 +0200
Subject: tools/xenstore: remove recursion from construct_node()

In order to reduce stack usage due to recursion, switch
construct_node() to use a loop instead.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index c676ee4e4e4f..3907c35643e9 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1377,45 +1377,69 @@ static int add_child(const void *ctx, struct node *parent, const char *name)
 static struct node *construct_node(struct connection *conn, const void *ctx,
 				   const char *name)
 {
-	struct node *parent, *node;
-	char *parentname = get_parent(ctx, name);
+	const char **names = NULL;
+	unsigned int levels = 0;
+	struct node *node = NULL;
+	struct node *parent = NULL;
+	const char *parentname = talloc_strdup(ctx, name);
 
 	if (!parentname)
 		return NULL;
 
-	/* If parent doesn't exist, create it. */
-	parent = read_node(conn, parentname, parentname);
-	if (!parent && errno == ENOENT)
-		parent = construct_node(conn, ctx, parentname);
-	if (!parent)
-		return NULL;
+	/* Walk the path up until an existing node is found. */
+	while (!parent) {
+		names = talloc_realloc(ctx, names, const char *, levels + 1);
+		if (!names)
+			goto nomem;
 
-	/* Add child to parent. */
-	if (add_child(ctx, parent, name))
-		goto nomem;
+		/*
+		 * names[0] is the name of the node to construct initially,
+		 * names[1] is its parent, and so on.
+		 */
+		names[levels] = parentname;
+		parentname = get_parent(ctx, parentname);
+		if (!parentname)
+			return NULL;
 
-	/* Allocate node */
-	node = talloc(ctx, struct node);
-	if (!node)
-		goto nomem;
-	node->name = talloc_strdup(node, name);
-	if (!node->name)
-		goto nomem;
+		/* Try to read parent node until we found an existing one. */
+		parent = read_node(conn, ctx, parentname);
+		if (!parent && (errno != ENOENT || !strcmp(parentname, "/")))
+			return NULL;
 
-	/* Inherit permissions, except unprivileged domains own what they create */
-	node->perms.num = parent->perms.num;
-	node->perms.p = talloc_memdup(node, parent->perms.p,
-				      node->perms.num * sizeof(*node->perms.p));
-	if (!node->perms.p)
-		goto nomem;
-	if (domain_is_unprivileged(conn))
-		node->perms.p[0].id = conn->id;
+		levels++;
+	}
+
+	/* Walk the path down again constructing the missing nodes. */
+	for (; levels > 0; levels--) {
+		/* Add child to parent. */
+		if (add_child(ctx, parent, names[levels - 1]))
+			goto nomem;
+
+		/* Allocate node */
+		node = talloc(ctx, struct node);
+		if (!node)
+			goto nomem;
+		node->name = talloc_steal(node, names[levels - 1]);
+
+		/* Inherit permissions, unpriv domains own what they create. */
+		node->perms.num = parent->perms.num;
+		node->perms.p = talloc_memdup(node, parent->perms.p,
+					      node->perms.num *
+					      sizeof(*node->perms.p));
+		if (!node->perms.p)
+			goto nomem;
+		if (domain_is_unprivileged(conn))
+			node->perms.p[0].id = conn->id;
+
+		/* No children, no data */
+		node->children = node->data = NULL;
+		node->childlen = node->datalen = 0;
+		node->acc.memory = 0;
+		node->parent = parent;
+
+		parent = node;
+	}
 
-	/* No children, no data */
-	node->children = node->data = NULL;
-	node->childlen = node->datalen = 0;
-	node->acc.memory = 0;
-	node->parent = parent;
 	return node;
 
 nomem:
From c13d85a2fe94bbf3cb8186b89324c5d1b4f9a61f Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:11 +0200
Subject: tools/xenstore: don't let remove_child_entry() call corrupt()

In case of write_node() returning an error, remove_child_entry() will
call corrupt() today. This could result in an endless recursion, as
remove_child_entry() is called by corrupt(), too:

corrupt()
  check_store()
    check_store_()
      remove_child_entry()

Fix that by letting remove_child_entry() return an error instead and
let the caller decide what to do.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 3907c35643e9..f433a45dc217 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1608,15 +1608,15 @@ static void memdel(void *mem, unsigned off, unsigned len, unsigned total)
 	memmove(mem + off, mem + off + len, total - off - len);
 }
 
-static void remove_child_entry(struct connection *conn, struct node *node,
-			       size_t offset)
+static int remove_child_entry(struct connection *conn, struct node *node,
+			      size_t offset)
 {
 	size_t childlen = strlen(node->children + offset);
 
 	memdel(node->children, offset, childlen + 1, node->childlen);
 	node->childlen -= childlen + 1;
-	if (write_node(conn, node, true))
-		corrupt(conn, "Can't update parent node '%s'", node->name);
+
+	return write_node(conn, node, true);
 }
 
 static void delete_child(struct connection *conn,
@@ -1626,7 +1626,9 @@ static void delete_child(struct connection *conn,
 
 	for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
 		if (streq(node->children+i, childname)) {
-			remove_child_entry(conn, node, i);
+			if (remove_child_entry(conn, node, i))
+				corrupt(conn, "Can't update parent node '%s'",
+					node->name);
 			return;
 		}
 	}
@@ -2325,6 +2327,17 @@ int remember_string(struct hashtable *hash, const char *str)
 	return hashtable_insert(hash, k, (void *)1);
 }
 
+static int rm_child_entry(struct node *node, size_t off, size_t len)
+{
+	if (!recovery)
+		return off;
+
+	if (remove_child_entry(NULL, node, off))
+		log("check_store: child entry could not be removed from '%s'",
+		    node->name);
+
+	return off - len - 1;
+}
 
 /**
  * A node has a children field that names the children of the node, separated
@@ -2377,12 +2390,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
 				if (hashtable_search(children, childname)) {
 					log("check_store: '%s' is duplicated!",
 					    childname);
-
-					if (recovery) {
-						remove_child_entry(NULL, node,
-								   i);
-						i -= childlen + 1;
-					}
+					i = rm_child_entry(node, i, childlen);
 				}
 				else {
 					if (!remember_string(children,
@@ -2399,11 +2407,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
 			} else if (errno != ENOMEM) {
 				log("check_store: No child '%s' found!\n",
 				    childname);
-
-				if (recovery) {
-					remove_child_entry(NULL, node, i);
-					i -= childlen + 1;
-				}
+				i = rm_child_entry(node, i, childlen);
 			} else {
 				log("check_store: ENOMEM");
 				ret = ENOMEM;
From aac9b51b6fbbbd16c910f69365345528c5bec106 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:11 +0200
Subject: tools/xenstore: add generic treewalk function

Add a generic function to walk the complete node tree. It will start
at "/" and descend recursively into each child, calling a function
specified by the caller. Depending on the return value of the user
specified function the walk will be aborted, continued, or the current
child will be skipped by not descending into its children.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index f433a45dc217..2cda3ee375ab 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1838,6 +1838,135 @@ static int do_set_perms(const void *ctx, struct connection *conn,
 	return 0;
 }
 
+static char *child_name(const void *ctx, const char *s1, const char *s2)
+{
+	if (strcmp(s1, "/"))
+		return talloc_asprintf(ctx, "%s/%s", s1, s2);
+	return talloc_asprintf(ctx, "/%s", s2);
+}
+
+static int rm_from_parent(struct connection *conn, struct node *parent,
+			  const char *name)
+{
+	size_t off;
+
+	if (!parent)
+		return WALK_TREE_ERROR_STOP;
+
+	for (off = parent->childoff - 1; off && parent->children[off - 1];
+	     off--);
+	if (remove_child_entry(conn, parent, off)) {
+		log("treewalk: child entry could not be removed from '%s'",
+		    parent->name);
+		return WALK_TREE_ERROR_STOP;
+	}
+	parent->childoff = off;
+
+	return WALK_TREE_OK;
+}
+
+static int walk_call_func(const void *ctx, struct connection *conn,
+			  struct node *node, struct node *parent, void *arg,
+			  int (*func)(const void *ctx, struct connection *conn,
+				      struct node *node, void *arg))
+{
+	int ret;
+
+	if (!func)
+		return WALK_TREE_OK;
+
+	ret = func(ctx, conn, node, arg);
+	if (ret == WALK_TREE_RM_CHILDENTRY && parent)
+		ret = rm_from_parent(conn, parent, node->name);
+
+	return ret;
+}
+
+int walk_node_tree(const void *ctx, struct connection *conn, const char *root,
+		   struct walk_funcs *funcs, void *arg)
+{
+	int ret = 0;
+	void *tmpctx;
+	char *name;
+	struct node *node = NULL;
+	struct node *parent = NULL;
+
+	tmpctx = talloc_new(ctx);
+	if (!tmpctx) {
+		errno = ENOMEM;
+		return WALK_TREE_ERROR_STOP;
+	}
+	name = talloc_strdup(tmpctx, root);
+	if (!name) {
+		errno = ENOMEM;
+		talloc_free(tmpctx);
+		return WALK_TREE_ERROR_STOP;
+	}
+
+	/* Continue the walk until an error is returned. */
+	while (ret >= 0) {
+		/* node == NULL possible only for the initial loop iteration. */
+		if (node) {
+			/* Go one step up if ret or if last child finished. */
+			if (ret || node->childoff >= node->childlen) {
+				parent = node->parent;
+				/* Call function AFTER processing a node. */
+				ret = walk_call_func(ctx, conn, node, parent,
+						     arg, funcs->exit);
+				/* Last node, so exit loop. */
+				if (!parent)
+					break;
+				talloc_free(node);
+				/* Continue with parent. */
+				node = parent;
+				continue;
+			}
+			/* Get next child of current node. */
+			name = child_name(tmpctx, node->name,
+					  node->children + node->childoff);
+			if (!name) {
+				ret = WALK_TREE_ERROR_STOP;
+				break;
+			}
+			/* Point to next child. */
+			node->childoff += strlen(node->children +
+						 node->childoff) + 1;
+			/* Descent into children. */
+			parent = node;
+		}
+		/* Read next node (root node or next child). */
+		node = read_node(conn, tmpctx, name);
+		if (!node) {
+			/* Child not found - should not happen! */
+			/* ENOENT case can be handled by supplied function. */
+			if (errno == ENOENT && funcs->enoent)
+				ret = funcs->enoent(ctx, conn, parent, name,
+						    arg);
+			else
+				ret = WALK_TREE_ERROR_STOP;
+			if (!parent)
+				break;
+			if (ret == WALK_TREE_RM_CHILDENTRY)
+				ret = rm_from_parent(conn, parent, name);
+			if (ret < 0)
+				break;
+			talloc_free(name);
+			node = parent;
+			continue;
+		}
+		talloc_free(name);
+		node->parent = parent;
+		node->childoff = 0;
+		/* Call function BEFORE processing a node. */
+		ret = walk_call_func(ctx, conn, node, parent, arg,
+				     funcs->enter);
+	}
+
+	talloc_free(tmpctx);
+
+	return ret < 0 ? ret : WALK_TREE_OK;
+}
+
 static struct {
 	const char *str;
 	int (*func)(const void *ctx, struct connection *conn,
@@ -2305,18 +2434,6 @@ static int keys_equal_fn(void *key1, void *key2)
 	return 0 == strcmp((char *)key1, (char *)key2);
 }
 
-
-static char *child_name(const char *s1, const char *s2)
-{
-	if (strcmp(s1, "/")) {
-		return talloc_asprintf(NULL, "%s/%s", s1, s2);
-	}
-	else {
-		return talloc_asprintf(NULL, "/%s", s2);
-	}
-}
-
-
 int remember_string(struct hashtable *hash, const char *str)
 {
 	char *k = malloc(strlen(str) + 1);
@@ -2376,7 +2493,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
 		while (i < node->childlen && !ret) {
 			struct node *childnode;
 			size_t childlen = strlen(node->children + i);
-			char * childname = child_name(node->name,
+			char * childname = child_name(NULL, node->name,
 						      node->children + i);
 
 			if (!childname) {
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index bfd3fc1e9df3..2d9942171d92 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -202,6 +202,7 @@ struct node {
 
 	/* Children, each nul-terminated. */
 	unsigned int childlen;
+	unsigned int childoff;	/* Used by walk_node_tree() internally. */
 	char *children;
 
 	/* Allocation information for node currently in store. */
@@ -338,6 +339,45 @@ void read_state_buffered_data(const void *ctx, struct connection *conn,
 			      const struct xs_state_connection *sc);
 void read_state_node(const void *ctx, const void *state);
 
+/*
+ * Walk the node tree below root calling funcs->enter() and funcs->exit() for
+ * each node. funcs->enter() is being called when entering a node, so before
+ * any of the children of the node is processed. funcs->exit() is being
+ * called when leaving the node, so after all children have been processed.
+ * funcs->enoent() is being called when a node isn't existing.
+ * funcs->*() return values:
+ *  < 0: tree walk is stopped, walk_node_tree() returns funcs->*() return value
+ *       in case WALK_TREE_ERROR_STOP is returned, errno should be set
+ *  WALK_TREE_OK: tree walk is continuing
+ *  WALK_TREE_SKIP_CHILDREN: tree walk won't descend below current node, but
+ *       walk continues
+ *  WALK_TREE_RM_CHILDENTRY: Remove the child entry from its parent and write
+ *       the modified parent node back to the data base, implies to not descend
+ *       below the current node, but to continue the walk
+ * funcs->*() is allowed to modify the node it is called for in the data base.
+ * In case funcs->enter() is deleting the node, it must not return WALK_TREE_OK
+ * in order to avoid descending into no longer existing children.
+ */
+/* Return values for funcs->*() and walk_node_tree(). */
+#define WALK_TREE_SUCCESS_STOP  -100    /* Stop walk early, no error. */
+#define WALK_TREE_ERROR_STOP    -1      /* Stop walk due to error. */
+#define WALK_TREE_OK            0       /* No error. */
+/* Return value for funcs->*() only. */
+#define WALK_TREE_SKIP_CHILDREN 1       /* Don't recurse below current node. */
+#define WALK_TREE_RM_CHILDENTRY 2       /* Remove child entry from parent. */
+
+struct walk_funcs {
+	int (*enter)(const void *ctx, struct connection *conn,
+		     struct node *node, void *arg);
+	int (*exit)(const void *ctx, struct connection *conn,
+		    struct node *node, void *arg);
+	int (*enoent)(const void *ctx, struct connection *conn,
+		      struct node *parent, char *name, void *arg);
+};
+
+int walk_node_tree(const void *ctx, struct connection *conn, const char *root,
+		   struct walk_funcs *funcs, void *arg);
+
 #endif /* _XENSTORED_CORE_H */
 
 /*
From bdc931fb5dcebbd8d0e44b5d8bd3fb9106ee8596 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: simplify check_store()

check_store() is using a hash table for storing all node names it has
found via walking the tree. Additionally it using another hash table
for all children of a node to detect duplicate child names.

Simplify that by dropping the second hash table as the first one is
already holding all the needed information.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 2cda3ee375ab..760f3c16c794 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2477,50 +2477,34 @@ static int check_store_(const char *name, struct hashtable *reachable)
 	if (node) {
 		size_t i = 0;
 
-		struct hashtable * children =
-			create_hashtable(16, hash_from_key_fn, keys_equal_fn);
-		if (!children) {
-			log("check_store create table: ENOMEM");
-			return ENOMEM;
-		}
-
 		if (!remember_string(reachable, name)) {
-			hashtable_destroy(children, 0);
 			log("check_store: ENOMEM");
 			return ENOMEM;
 		}
 
 		while (i < node->childlen && !ret) {
-			struct node *childnode;
+			struct node *childnode = NULL;
 			size_t childlen = strlen(node->children + i);
-			char * childname = child_name(NULL, node->name,
-						      node->children + i);
+			char *childname = child_name(NULL, node->name,
+						     node->children + i);
 
 			if (!childname) {
 				log("check_store: ENOMEM");
 				ret = ENOMEM;
 				break;
 			}
+
+			if (hashtable_search(reachable, childname)) {
+				log("check_store: '%s' is duplicated!",
+				    childname);
+				i = rm_child_entry(node, i, childlen);
+				goto next;
+			}
+
 			childnode = read_node(NULL, childname, childname);
-			
+
 			if (childnode) {
-				if (hashtable_search(children, childname)) {
-					log("check_store: '%s' is duplicated!",
-					    childname);
-					i = rm_child_entry(node, i, childlen);
-				}
-				else {
-					if (!remember_string(children,
-							     childname)) {
-						log("check_store: ENOMEM");
-						talloc_free(childnode);
-						talloc_free(childname);
-						ret = ENOMEM;
-						break;
-					}
-					ret = check_store_(childname,
-							   reachable);
-				}
+				ret = check_store_(childname, reachable);
 			} else if (errno != ENOMEM) {
 				log("check_store: No child '%s' found!\n",
 				    childname);
@@ -2530,19 +2514,18 @@ static int check_store_(const char *name, struct hashtable *reachable)
 				ret = ENOMEM;
 			}
 
+ next:
 			talloc_free(childnode);
 			talloc_free(childname);
 			i += childlen + 1;
 		}
 
-		hashtable_destroy(children, 0 /* Don't free values (they are
-						 all (void *)1) */);
 		talloc_free(node);
 	} else if (errno != ENOMEM) {
 		/* Impossible, because no database should ever be without the
 		   root, and otherwise, we've just checked in our caller
 		   (which made a recursive call to get here). */
-		   
+
 		log("check_store: No child '%s' found: impossible!", name);
 	} else {
 		log("check_store: ENOMEM");
From 27817f0a7d6802be04e8f43a0900b02f881b28b2 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: use treewalk for check_store()

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when checking the store for inconsistencies.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 760f3c16c794..efdd1888fd78 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2444,18 +2444,6 @@ int remember_string(struct hashtable *hash, const char *str)
 	return hashtable_insert(hash, k, (void *)1);
 }
 
-static int rm_child_entry(struct node *node, size_t off, size_t len)
-{
-	if (!recovery)
-		return off;
-
-	if (remove_child_entry(NULL, node, off))
-		log("check_store: child entry could not be removed from '%s'",
-		    node->name);
-
-	return off - len - 1;
-}
-
 /**
  * A node has a children field that names the children of the node, separated
  * by NULs.  We check whether there are entries in there that are duplicated
@@ -2469,70 +2457,29 @@ static int rm_child_entry(struct node *node, size_t off, size_t len)
  * As we go, we record each node in the given reachable hashtable.  These
  * entries will be used later in clean_store.
  */
-static int check_store_(const char *name, struct hashtable *reachable)
+static int check_store_step(const void *ctx, struct connection *conn,
+			    struct node *node, void *arg)
 {
-	struct node *node = read_node(NULL, name, name);
-	int ret = 0;
-
-	if (node) {
-		size_t i = 0;
-
-		if (!remember_string(reachable, name)) {
-			log("check_store: ENOMEM");
-			return ENOMEM;
-		}
-
-		while (i < node->childlen && !ret) {
-			struct node *childnode = NULL;
-			size_t childlen = strlen(node->children + i);
-			char *childname = child_name(NULL, node->name,
-						     node->children + i);
-
-			if (!childname) {
-				log("check_store: ENOMEM");
-				ret = ENOMEM;
-				break;
-			}
+	struct hashtable *reachable = arg;
 
-			if (hashtable_search(reachable, childname)) {
-				log("check_store: '%s' is duplicated!",
-				    childname);
-				i = rm_child_entry(node, i, childlen);
-				goto next;
-			}
-
-			childnode = read_node(NULL, childname, childname);
-
-			if (childnode) {
-				ret = check_store_(childname, reachable);
-			} else if (errno != ENOMEM) {
-				log("check_store: No child '%s' found!\n",
-				    childname);
-				i = rm_child_entry(node, i, childlen);
-			} else {
-				log("check_store: ENOMEM");
-				ret = ENOMEM;
-			}
+	if (hashtable_search(reachable, (void *)node->name)) {
+		log("check_store: '%s' is duplicated!", node->name);
+		return recovery ? WALK_TREE_RM_CHILDENTRY
+				: WALK_TREE_SKIP_CHILDREN;
+	}
 
- next:
-			talloc_free(childnode);
-			talloc_free(childname);
-			i += childlen + 1;
-		}
+	if (!remember_string(reachable, node->name))
+		return WALK_TREE_ERROR_STOP;
 
-		talloc_free(node);
-	} else if (errno != ENOMEM) {
-		/* Impossible, because no database should ever be without the
-		   root, and otherwise, we've just checked in our caller
-		   (which made a recursive call to get here). */
+	return WALK_TREE_OK;
+}
 
-		log("check_store: No child '%s' found: impossible!", name);
-	} else {
-		log("check_store: ENOMEM");
-		ret = ENOMEM;
-	}
+static int check_store_enoent(const void *ctx, struct connection *conn,
+			      struct node *parent, char *name, void *arg)
+{
+	log("check_store: node '%s' not found", name);
 
-	return ret;
+	return recovery ? WALK_TREE_RM_CHILDENTRY : WALK_TREE_OK;
 }
 
 
@@ -2581,24 +2528,28 @@ static void clean_store(struct hashtable *reachable)
 
 void check_store(void)
 {
-	char * root = talloc_strdup(NULL, "/");
-	struct hashtable * reachable =
-		create_hashtable(16, hash_from_key_fn, keys_equal_fn);
- 
+	struct hashtable *reachable;
+	struct walk_funcs walkfuncs = {
+		.enter = check_store_step,
+		.enoent = check_store_enoent,
+	};
+
+	reachable = create_hashtable(16, hash_from_key_fn, keys_equal_fn);
 	if (!reachable) {
 		log("check_store: ENOMEM");
 		return;
 	}
 
 	log("Checking store ...");
-	if (!check_store_(root, reachable) &&
-	    !check_transactions(reachable))
+	if (walk_node_tree(NULL, NULL, "/", &walkfuncs, reachable)) {
+		if (errno == ENOMEM)
+			log("check_store: ENOMEM");
+	} else if (!check_transactions(reachable))
 		clean_store(reachable);
 	log("Checking store complete.");
 
 	hashtable_destroy(reachable, 0 /* Don't free values (they are all
 					  (void *)1) */);
-	talloc_free(root);
 }
 
 
From 6ea0ffbd88b11f23779d763501ec1370b590bb2a Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: use treewalk for deleting nodes

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when deleting a sub-tree of nodes.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index efdd1888fd78..58fb651542ec 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1334,21 +1334,6 @@ static int do_read(const void *ctx, struct connection *conn,
 	return 0;
 }
 
-static void delete_node_single(struct connection *conn, struct node *node)
-{
-	TDB_DATA key;
-
-	if (access_node(conn, node, NODE_ACCESS_DELETE, &key))
-		return;
-
-	if (do_tdb_delete(conn, &key, &node->acc) != 0) {
-		corrupt(conn, "Could not delete '%s'", node->name);
-		return;
-	}
-
-	domain_entry_dec(conn, node);
-}
-
 /* Must not be / */
 static char *basename(const char *name)
 {
@@ -1619,69 +1604,59 @@ static int remove_child_entry(struct connection *conn, struct node *node,
 	return write_node(conn, node, true);
 }
 
-static void delete_child(struct connection *conn,
-			 struct node *node, const char *childname)
+static int delete_child(struct connection *conn,
+			struct node *node, const char *childname)
 {
 	unsigned int i;
 
 	for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
 		if (streq(node->children+i, childname)) {
-			if (remove_child_entry(conn, node, i))
-				corrupt(conn, "Can't update parent node '%s'",
-					node->name);
-			return;
+			errno = remove_child_entry(conn, node, i) ? EIO : 0;
+			return errno;
 		}
 	}
 	corrupt(conn, "Can't find child '%s' in %s", childname, node->name);
+
+	errno = EIO;
+	return errno;
 }
 
-static int delete_node(struct connection *conn, const void *ctx,
-		       struct node *parent, struct node *node, bool watch_exact)
+static int delnode_sub(const void *ctx, struct connection *conn,
+		       struct node *node, void *arg)
 {
-	char *name;
+	const char *root = arg;
+	bool watch_exact;
+	int ret;
+	TDB_DATA key;
 
-	/* Delete children. */
-	while (node->childlen) {
-		struct node *child;
+	/* Any error here will probably be repeated for all following calls. */
+	ret = access_node(conn, node, NODE_ACCESS_DELETE, &key);
+	if (ret > 0)
+		return WALK_TREE_SUCCESS_STOP;
 
-		name = talloc_asprintf(node, "%s/%s", node->name,
-				       node->children);
-		child = name ? read_node(conn, node, name) : NULL;
-		if (child) {
-			if (delete_node(conn, ctx, node, child, true))
-				return errno;
-		} else {
-			trace("delete_node: Error deleting child '%s/%s'!\n",
-			      node->name, node->children);
-			/* Quit deleting. */
-			errno = ENOMEM;
-			return errno;
-		}
-		talloc_free(name);
-	}
+	/* In case of error stop the walk. */
+	if (!ret && do_tdb_delete(conn, &key, &node->acc))
+		return WALK_TREE_SUCCESS_STOP;
 
 	/*
 	 * Fire the watches now, when we can still see the node permissions.
 	 * This fine as we are single threaded and the next possible read will
 	 * be handled only after the node has been really removed.
-	 */
+	*/
+	watch_exact = strcmp(root, node->name);
 	fire_watches(conn, ctx, node->name, node, watch_exact, NULL);
-	delete_node_single(conn, node);
-	delete_child(conn, parent, basename(node->name));
-	talloc_free(node);
 
-	return 0;
+	domain_entry_dec(conn, node);
+
+	return WALK_TREE_RM_CHILDENTRY;
 }
 
-static int _rm(struct connection *conn, const void *ctx, struct node *node,
-	       const char *name)
+static int _rm(struct connection *conn, const void *ctx, const char *name)
 {
-	/*
-	 * Deleting node by node, so the result is always consistent even in
-	 * case of a failure.
-	 */
 	struct node *parent;
 	char *parentname = get_parent(ctx, name);
+	struct walk_funcs walkfuncs = { .exit = delnode_sub };
+	int ret;
 
 	if (!parentname)
 		return errno;
@@ -1689,9 +1664,21 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node,
 	parent = read_node(conn, ctx, parentname);
 	if (!parent)
 		return read_node_can_propagate_errno() ? errno : EINVAL;
-	node->parent = parent;
 
-	return delete_node(conn, ctx, parent, node, false);
+	ret = walk_node_tree(ctx, conn, name, &walkfuncs, (void *)name);
+	if (ret < 0) {
+		if (ret == WALK_TREE_ERROR_STOP) {
+			corrupt(conn, "error when deleting sub-nodes of %s\n",
+				name);
+			errno = EIO;
+		}
+		return errno;
+	}
+
+	if (delete_child(conn, parent, basename(name)))
+		return errno;
+
+	return 0;
 }
 
 
@@ -1728,7 +1715,7 @@ static int do_rm(const void *ctx, struct connection *conn,
 	if (streq(name, "/"))
 		return EINVAL;
 
-	ret = _rm(conn, ctx, node, name);
+	ret = _rm(conn, ctx, name);
 	if (ret)
 		return ret;
 
From 1ee281b18b52bec87335ea64ee74cc159e63d036 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: use treewalk for creating node records

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when creating the node records during a live update.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 58fb651542ec..05d349778bb4 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -3120,101 +3120,76 @@ const char *dump_state_node_perms(FILE *fp, const struct xs_permissions *perms,
 	return NULL;
 }
 
-static const char *dump_state_node_tree(FILE *fp, char *path,
-					unsigned int path_max_len)
+struct dump_node_data {
+	FILE *fp;
+	const char *err;
+};
+
+static int dump_state_node_err(struct dump_node_data *data, const char *err)
 {
-	unsigned int pathlen, childlen, p = 0;
+	data->err = err;
+	return WALK_TREE_ERROR_STOP;
+}
+
+static int dump_state_node(const void *ctx, struct connection *conn,
+			   struct node *node, void *arg)
+{
+	struct dump_node_data *data = arg;
+	FILE *fp = data->fp;
+	unsigned int pathlen;
 	struct xs_state_record_header head;
 	struct xs_state_node sn;
-	TDB_DATA key, data;
-	const struct xs_tdb_record_hdr *hdr;
-	const char *child;
 	const char *ret;
 
-	pathlen = strlen(path) + 1;
-
-	set_tdb_key(path, &key);
-	data = tdb_fetch(tdb_ctx, key);
-	if (data.dptr == NULL)
-		return "Error reading node";
-
-	/* Clean up in case of failure. */
-	talloc_steal(path, data.dptr);
-
-	hdr = (void *)data.dptr;
+	pathlen = strlen(node->name) + 1;
 
 	head.type = XS_STATE_TYPE_NODE;
 	head.length = sizeof(sn);
 	sn.conn_id = 0;
 	sn.ta_id = 0;
 	sn.ta_access = 0;
-	sn.perm_n = hdr->num_perms;
+	sn.perm_n = node->perms.num;
 	sn.path_len = pathlen;
-	sn.data_len = hdr->datalen;
-	head.length += hdr->num_perms * sizeof(*sn.perms);
+	sn.data_len = node->datalen;
+	head.length += node->perms.num * sizeof(*sn.perms);
 	head.length += pathlen;
-	head.length += hdr->datalen;
+	head.length += node->datalen;
 	head.length = ROUNDUP(head.length, 3);
 
 	if (fwrite(&head, sizeof(head), 1, fp) != 1)
-		return "Dump node state error";
+		return dump_state_node_err(data, "Dump node head error");
 	if (fwrite(&sn, sizeof(sn), 1, fp) != 1)
-		return "Dump node state error";
+		return dump_state_node_err(data, "Dump node state error");
 
-	ret = dump_state_node_perms(fp, hdr->perms, hdr->num_perms);
+	ret = dump_state_node_perms(fp, node->perms.p, node->perms.num);
 	if (ret)
-		return ret;
+		return dump_state_node_err(data, ret);
+
+	if (fwrite(node->name, pathlen, 1, fp) != 1)
+		return dump_state_node_err(data, "Dump node path error");
 
-	if (fwrite(path, pathlen, 1, fp) != 1)
-		return "Dump node path error";
-	if (hdr->datalen &&
-	    fwrite(hdr->perms + hdr->num_perms, hdr->datalen, 1, fp) != 1)
-		return "Dump node data error";
+	if (node->datalen && fwrite(node->data, node->datalen, 1, fp) != 1)
+		return dump_state_node_err(data, "Dump node data error");
 
 	ret = dump_state_align(fp);
 	if (ret)
-		return ret;
+		return dump_state_node_err(data, ret);
 
-	child = (char *)(hdr->perms + hdr->num_perms) + hdr->datalen;
-
-	/*
-	 * Use path for constructing children paths.
-	 * As we don't write out nodes without having written their parent
-	 * already we will never clobber a part of the path we'll need later.
-	 */
-	pathlen--;
-	if (path[pathlen - 1] != '/') {
-		path[pathlen] = '/';
-		pathlen++;
-	}
-	while (p < hdr->childlen) {
-		childlen = strlen(child) + 1;
-		if (pathlen + childlen > path_max_len)
-			return "Dump node path length error";
-		strcpy(path + pathlen, child);
-		ret = dump_state_node_tree(fp, path, path_max_len);
-		if (ret)
-			return ret;
-		p += childlen;
-		child += childlen;
-	}
-
-	talloc_free(data.dptr);
-
-	return NULL;
+	return WALK_TREE_OK;
 }
 
 const char *dump_state_nodes(FILE *fp, const void *ctx)
 {
-	char *path;
-
-	path = talloc_size(ctx, XENSTORE_ABS_PATH_MAX + 1);
-	if (!path)
-		return "Path buffer allocation error";
+	struct dump_node_data data = {
+		.fp = fp,
+		.err = "Dump node walk error"
+	};
+	struct walk_funcs walkfuncs = { .enter = dump_state_node };
 
-	strcpy(path, "/");
+	if (walk_node_tree(ctx, NULL, "/", &walkfuncs, &data))
+		return data.err;
 
-	return dump_state_node_tree(fp, path, XENSTORE_ABS_PATH_MAX + 1);
+	return NULL;
 }
 
 void read_state_global(const void *ctx, const void *state)
From ffd1ba6b97f24c3bf29b49557c137d70a3819d07 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: simplify check_store()

check_store() is using a hash table for storing all node names it has
found via walking the tree. Additionally it using another hash table
for all children of a node to detect duplicate child names.

Simplify that by dropping the second hash table as the first one is
already holding all the needed information.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index f78faf0c3e06..b3596c7eeb99 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2456,50 +2456,34 @@ static int check_store_(const char *name, struct hashtable *reachable)
 	if (node) {
 		size_t i = 0;
 
-		struct hashtable * children =
-			create_hashtable(16, hash_from_key_fn, keys_equal_fn);
-		if (!children) {
-			log("check_store create table: ENOMEM");
-			return ENOMEM;
-		}
-
 		if (!remember_string(reachable, name)) {
-			hashtable_destroy(children, 0);
 			log("check_store: ENOMEM");
 			return ENOMEM;
 		}
 
 		while (i < node->childlen && !ret) {
-			struct node *childnode;
+			struct node *childnode = NULL;
 			size_t childlen = strlen(node->children + i);
-			char * childname = child_name(NULL, node->name,
-						      node->children + i);
+			char *childname = child_name(NULL, node->name,
+						     node->children + i);
 
 			if (!childname) {
 				log("check_store: ENOMEM");
 				ret = ENOMEM;
 				break;
 			}
+
+			if (hashtable_search(reachable, childname)) {
+				log("check_store: '%s' is duplicated!",
+				    childname);
+				i = rm_child_entry(node, i, childlen);
+				goto next;
+			}
+
 			childnode = read_node(NULL, childname, childname);
-			
+
 			if (childnode) {
-				if (hashtable_search(children, childname)) {
-					log("check_store: '%s' is duplicated!",
-					    childname);
-					i = rm_child_entry(node, i, childlen);
-				}
-				else {
-					if (!remember_string(children,
-							     childname)) {
-						log("check_store: ENOMEM");
-						talloc_free(childnode);
-						talloc_free(childname);
-						ret = ENOMEM;
-						break;
-					}
-					ret = check_store_(childname,
-							   reachable);
-				}
+				ret = check_store_(childname, reachable);
 			} else if (errno != ENOMEM) {
 				log("check_store: No child '%s' found!\n",
 				    childname);
@@ -2509,19 +2493,18 @@ static int check_store_(const char *name, struct hashtable *reachable)
 				ret = ENOMEM;
 			}
 
+ next:
 			talloc_free(childnode);
 			talloc_free(childname);
 			i += childlen + 1;
 		}
 
-		hashtable_destroy(children, 0 /* Don't free values (they are
-						 all (void *)1) */);
 		talloc_free(node);
 	} else if (errno != ENOMEM) {
 		/* Impossible, because no database should ever be without the
 		   root, and otherwise, we've just checked in our caller
 		   (which made a recursive call to get here). */
-		   
+
 		log("check_store: No child '%s' found: impossible!", name);
 	} else {
 		log("check_store: ENOMEM");
From 47854a52eb8a5c31e6c0c150a87c86787ecde1c6 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: use treewalk for check_store()

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when checking the store for inconsistencies.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index b3596c7eeb99..b0889186b61a 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2423,18 +2423,6 @@ int remember_string(struct hashtable *hash, const char *str)
 	return hashtable_insert(hash, k, (void *)1);
 }
 
-static int rm_child_entry(struct node *node, size_t off, size_t len)
-{
-	if (!recovery)
-		return off;
-
-	if (remove_child_entry(NULL, node, off))
-		log("check_store: child entry could not be removed from '%s'",
-		    node->name);
-
-	return off - len - 1;
-}
-
 /**
  * A node has a children field that names the children of the node, separated
  * by NULs.  We check whether there are entries in there that are duplicated
@@ -2448,70 +2436,29 @@ static int rm_child_entry(struct node *node, size_t off, size_t len)
  * As we go, we record each node in the given reachable hashtable.  These
  * entries will be used later in clean_store.
  */
-static int check_store_(const char *name, struct hashtable *reachable)
+static int check_store_step(const void *ctx, struct connection *conn,
+			    struct node *node, void *arg)
 {
-	struct node *node = read_node(NULL, name, name);
-	int ret = 0;
-
-	if (node) {
-		size_t i = 0;
-
-		if (!remember_string(reachable, name)) {
-			log("check_store: ENOMEM");
-			return ENOMEM;
-		}
-
-		while (i < node->childlen && !ret) {
-			struct node *childnode = NULL;
-			size_t childlen = strlen(node->children + i);
-			char *childname = child_name(NULL, node->name,
-						     node->children + i);
-
-			if (!childname) {
-				log("check_store: ENOMEM");
-				ret = ENOMEM;
-				break;
-			}
-
-			if (hashtable_search(reachable, childname)) {
-				log("check_store: '%s' is duplicated!",
-				    childname);
-				i = rm_child_entry(node, i, childlen);
-				goto next;
-			}
+	struct hashtable *reachable = arg;
 
-			childnode = read_node(NULL, childname, childname);
-
-			if (childnode) {
-				ret = check_store_(childname, reachable);
-			} else if (errno != ENOMEM) {
-				log("check_store: No child '%s' found!\n",
-				    childname);
-				i = rm_child_entry(node, i, childlen);
-			} else {
-				log("check_store: ENOMEM");
-				ret = ENOMEM;
-			}
+	if (hashtable_search(reachable, (void *)node->name)) {
+		log("check_store: '%s' is duplicated!", node->name);
+		return recovery ? WALK_TREE_RM_CHILDENTRY
+				: WALK_TREE_SKIP_CHILDREN;
+	}
 
- next:
-			talloc_free(childnode);
-			talloc_free(childname);
-			i += childlen + 1;
-		}
+	if (!remember_string(reachable, node->name))
+		return WALK_TREE_ERROR_STOP;
 
-		talloc_free(node);
-	} else if (errno != ENOMEM) {
-		/* Impossible, because no database should ever be without the
-		   root, and otherwise, we've just checked in our caller
-		   (which made a recursive call to get here). */
+	return WALK_TREE_OK;
+}
 
-		log("check_store: No child '%s' found: impossible!", name);
-	} else {
-		log("check_store: ENOMEM");
-		ret = ENOMEM;
-	}
+static int check_store_enoent(const void *ctx, struct connection *conn,
+			      struct node *parent, char *name, void *arg)
+{
+	log("check_store: node '%s' not found", name);
 
-	return ret;
+	return recovery ? WALK_TREE_RM_CHILDENTRY : WALK_TREE_OK;
 }
 
 
@@ -2560,29 +2507,28 @@ static void clean_store(struct hashtable *reachable)
 
 void check_store(void)
 {
-	char *root = talloc_strdup(NULL, "/");
 	struct hashtable *reachable;
+	struct walk_funcs walkfuncs = {
+		.enter = check_store_step,
+		.enoent = check_store_enoent,
+	};
 
-	if (!root) {
-		log("check_store: ENOMEM");
-		return;
-	}
 	reachable = create_hashtable(16, hash_from_key_fn, keys_equal_fn);
 	if (!reachable) {
 		log("check_store: ENOMEM");
-		goto out_root;
+		return;
 	}
 
 	log("Checking store ...");
-	if (!check_store_(root, reachable) &&
-	    !check_transactions(reachable))
+	if (walk_node_tree(NULL, NULL, "/", &walkfuncs, reachable)) {
+		if (errno == ENOMEM)
+			log("check_store: ENOMEM");
+	} else if (!check_transactions(reachable))
 		clean_store(reachable);
 	log("Checking store complete.");
 
 	hashtable_destroy(reachable, 0 /* Don't free values (they are all
 					  (void *)1) */);
- out_root:
-	talloc_free(root);
 }
 
 
From 06c89dd48c08f1eaa2e5bcc1407554ee3e0b4e27 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: use treewalk for deleting nodes

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when deleting a sub-tree of nodes.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index b0889186b61a..fa24bcfea4fb 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1330,21 +1330,6 @@ static int do_read(const void *ctx, struct connection *conn,
 	return 0;
 }
 
-static void delete_node_single(struct connection *conn, struct node *node)
-{
-	TDB_DATA key;
-
-	if (access_node(conn, node, NODE_ACCESS_DELETE, &key))
-		return;
-
-	if (do_tdb_delete(conn, &key, &node->acc) != 0) {
-		corrupt(conn, "Could not delete '%s'", node->name);
-		return;
-	}
-
-	domain_entry_dec(conn, node);
-}
-
 /* Must not be / */
 static char *basename(const char *name)
 {
@@ -1615,69 +1600,59 @@ static int remove_child_entry(struct connection *conn, struct node *node,
 	return write_node(conn, node, true);
 }
 
-static void delete_child(struct connection *conn,
-			 struct node *node, const char *childname)
+static int delete_child(struct connection *conn,
+			struct node *node, const char *childname)
 {
 	unsigned int i;
 
 	for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
 		if (streq(node->children+i, childname)) {
-			if (remove_child_entry(conn, node, i))
-				corrupt(conn, "Can't update parent node '%s'",
-					node->name);
-			return;
+			errno = remove_child_entry(conn, node, i) ? EIO : 0;
+			return errno;
 		}
 	}
 	corrupt(conn, "Can't find child '%s' in %s", childname, node->name);
+
+	errno = EIO;
+	return errno;
 }
 
-static int delete_node(struct connection *conn, const void *ctx,
-		       struct node *parent, struct node *node, bool watch_exact)
+static int delnode_sub(const void *ctx, struct connection *conn,
+		       struct node *node, void *arg)
 {
-	char *name;
+	const char *root = arg;
+	bool watch_exact;
+	int ret;
+	TDB_DATA key;
 
-	/* Delete children. */
-	while (node->childlen) {
-		struct node *child;
+	/* Any error here will probably be repeated for all following calls. */
+	ret = access_node(conn, node, NODE_ACCESS_DELETE, &key);
+	if (ret > 0)
+		return WALK_TREE_SUCCESS_STOP;
 
-		name = talloc_asprintf(node, "%s/%s", node->name,
-				       node->children);
-		child = name ? read_node(conn, node, name) : NULL;
-		if (child) {
-			if (delete_node(conn, ctx, node, child, true))
-				return errno;
-		} else {
-			trace("delete_node: Error deleting child '%s/%s'!\n",
-			      node->name, node->children);
-			/* Quit deleting. */
-			errno = ENOMEM;
-			return errno;
-		}
-		talloc_free(name);
-	}
+	/* In case of error stop the walk. */
+	if (!ret && do_tdb_delete(conn, &key, &node->acc))
+		return WALK_TREE_SUCCESS_STOP;
 
 	/*
 	 * Fire the watches now, when we can still see the node permissions.
 	 * This fine as we are single threaded and the next possible read will
 	 * be handled only after the node has been really removed.
-	 */
+	*/
+	watch_exact = strcmp(root, node->name);
 	fire_watches(conn, ctx, node->name, node, watch_exact, NULL);
-	delete_node_single(conn, node);
-	delete_child(conn, parent, basename(node->name));
-	talloc_free(node);
 
-	return 0;
+	domain_entry_dec(conn, node);
+
+	return WALK_TREE_RM_CHILDENTRY;
 }
 
-static int _rm(struct connection *conn, const void *ctx, struct node *node,
-	       const char *name)
+static int _rm(struct connection *conn, const void *ctx, const char *name)
 {
-	/*
-	 * Deleting node by node, so the result is always consistent even in
-	 * case of a failure.
-	 */
 	struct node *parent;
 	char *parentname = get_parent(ctx, name);
+	struct walk_funcs walkfuncs = { .exit = delnode_sub };
+	int ret;
 
 	if (!parentname)
 		return errno;
@@ -1685,9 +1660,21 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node,
 	parent = read_node(conn, ctx, parentname);
 	if (!parent)
 		return read_node_can_propagate_errno() ? errno : EINVAL;
-	node->parent = parent;
 
-	return delete_node(conn, ctx, parent, node, false);
+	ret = walk_node_tree(ctx, conn, name, &walkfuncs, (void *)name);
+	if (ret < 0) {
+		if (ret == WALK_TREE_ERROR_STOP) {
+			corrupt(conn, "error when deleting sub-nodes of %s\n",
+				name);
+			errno = EIO;
+		}
+		return errno;
+	}
+
+	if (delete_child(conn, parent, basename(name)))
+		return errno;
+
+	return 0;
 }
 
 
@@ -1724,7 +1711,7 @@ static int do_rm(const void *ctx, struct connection *conn,
 	if (streq(name, "/"))
 		return EINVAL;
 
-	ret = _rm(conn, ctx, node, name);
+	ret = _rm(conn, ctx, name);
 	if (ret)
 		return ret;
 
From 742b8b3e4c2f775e47819754ca914e61aaa078c3 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 13 Sep 2022 07:35:12 +0200
Subject: tools/xenstore: use treewalk for creating node records

Instead of doing an open tree walk using call recursion, use
walk_node_tree() when creating the node records during a live update.

This will reduce code size and avoid many nesting levels of function
calls which could potentially exhaust the stack.

This is part of XSA-418 / CVE-2022-42321.

Reported-by: Julien Grall <dvrabel@amazon.co.uk>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index fa24bcfea4fb..bdc14679adf5 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -3088,101 +3088,76 @@ const char *dump_state_node_perms(FILE *fp, const struct xs_permissions *perms,
 	return NULL;
 }
 
-static const char *dump_state_node_tree(FILE *fp, char *path,
-					unsigned int path_max_len)
+struct dump_node_data {
+	FILE *fp;
+	const char *err;
+};
+
+static int dump_state_node_err(struct dump_node_data *data, const char *err)
 {
-	unsigned int pathlen, childlen, p = 0;
+	data->err = err;
+	return WALK_TREE_ERROR_STOP;
+}
+
+static int dump_state_node(const void *ctx, struct connection *conn,
+			   struct node *node, void *arg)
+{
+	struct dump_node_data *data = arg;
+	FILE *fp = data->fp;
+	unsigned int pathlen;
 	struct xs_state_record_header head;
 	struct xs_state_node sn;
-	TDB_DATA key, data;
-	const struct xs_tdb_record_hdr *hdr;
-	const char *child;
 	const char *ret;
 
-	pathlen = strlen(path) + 1;
-
-	set_tdb_key(path, &key);
-	data = tdb_fetch(tdb_ctx, key);
-	if (data.dptr == NULL)
-		return "Error reading node";
-
-	/* Clean up in case of failure. */
-	talloc_steal(path, data.dptr);
-
-	hdr = (void *)data.dptr;
+	pathlen = strlen(node->name) + 1;
 
 	head.type = XS_STATE_TYPE_NODE;
 	head.length = sizeof(sn);
 	sn.conn_id = 0;
 	sn.ta_id = 0;
 	sn.ta_access = 0;
-	sn.perm_n = hdr->num_perms;
+	sn.perm_n = node->perms.num;
 	sn.path_len = pathlen;
-	sn.data_len = hdr->datalen;
-	head.length += hdr->num_perms * sizeof(*sn.perms);
+	sn.data_len = node->datalen;
+	head.length += node->perms.num * sizeof(*sn.perms);
 	head.length += pathlen;
-	head.length += hdr->datalen;
+	head.length += node->datalen;
 	head.length = ROUNDUP(head.length, 3);
 
 	if (fwrite(&head, sizeof(head), 1, fp) != 1)
-		return "Dump node state error";
+		return dump_state_node_err(data, "Dump node head error");
 	if (fwrite(&sn, sizeof(sn), 1, fp) != 1)
-		return "Dump node state error";
+		return dump_state_node_err(data, "Dump node state error");
 
-	ret = dump_state_node_perms(fp, hdr->perms, hdr->num_perms);
+	ret = dump_state_node_perms(fp, node->perms.p, node->perms.num);
 	if (ret)
-		return ret;
+		return dump_state_node_err(data, ret);
+
+	if (fwrite(node->name, pathlen, 1, fp) != 1)
+		return dump_state_node_err(data, "Dump node path error");
 
-	if (fwrite(path, pathlen, 1, fp) != 1)
-		return "Dump node path error";
-	if (hdr->datalen &&
-	    fwrite(hdr->perms + hdr->num_perms, hdr->datalen, 1, fp) != 1)
-		return "Dump node data error";
+	if (node->datalen && fwrite(node->data, node->datalen, 1, fp) != 1)
+		return dump_state_node_err(data, "Dump node data error");
 
 	ret = dump_state_align(fp);
 	if (ret)
-		return ret;
+		return dump_state_node_err(data, ret);
 
-	child = (char *)(hdr->perms + hdr->num_perms) + hdr->datalen;
-
-	/*
-	 * Use path for constructing children paths.
-	 * As we don't write out nodes without having written their parent
-	 * already we will never clobber a part of the path we'll need later.
-	 */
-	pathlen--;
-	if (path[pathlen - 1] != '/') {
-		path[pathlen] = '/';
-		pathlen++;
-	}
-	while (p < hdr->childlen) {
-		childlen = strlen(child) + 1;
-		if (pathlen + childlen > path_max_len)
-			return "Dump node path length error";
-		strcpy(path + pathlen, child);
-		ret = dump_state_node_tree(fp, path, path_max_len);
-		if (ret)
-			return ret;
-		p += childlen;
-		child += childlen;
-	}
-
-	talloc_free(data.dptr);
-
-	return NULL;
+	return WALK_TREE_OK;
 }
 
 const char *dump_state_nodes(FILE *fp, const void *ctx)
 {
-	char *path;
-
-	path = talloc_size(ctx, XENSTORE_ABS_PATH_MAX + 1);
-	if (!path)
-		return "Path buffer allocation error";
+	struct dump_node_data data = {
+		.fp = fp,
+		.err = "Dump node walk error"
+	};
+	struct walk_funcs walkfuncs = { .enter = dump_state_node };
 
-	strcpy(path, "/");
+	if (walk_node_tree(ctx, NULL, "/", &walkfuncs, &data))
+		return data.err;
 
-	return dump_state_node_tree(fp, path, XENSTORE_ABS_PATH_MAX + 1);
+	return NULL;
 }
 
 void read_state_global(const void *ctx, const void *state)