Xen Security Advisory 425 v1 (CVE-2022-42330) - Guests can cause Xenstore crash via soft reset

Posted by Xen.org security team 1 year, 3 months ago
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2022-42330 / XSA-425

            Guests can cause Xenstore crash via soft reset

ISSUE DESCRIPTION
=================

When a guest issues a "Soft Reset" (e.g. for performing a kexec) the
libxl based Xen toolstack will normally perform a XS_RELEASE Xenstore
operation.

Due to a bug in xenstored this can result in a crash of xenstored.

Any other use of XS_RELEASE will have the same impact.

IMPACT
======

A malicious guest could try to kexec until it hits the xenstored bug,
resulting in the inability to perform any further domain administration
like starting new guests, or adding/removing resources to or from any
existing guest.

VULNERABLE SYSTEMS
==================

Only Xen version 4.17 is vulnerable. Systems running an older version
of Xen are not vulnerable.

All Xen systems using C xenstored are vulnerable. Systems using the
OCaml variant of xenstored are not vulnerable.

Systems running only PV guests (x86 only) are not vulnerable, as long as
they are using a libxl based toolstack.

MITIGATION
==========

The problem can be avoided by either:

- - using the OCaml xenstored variant

- - explicitly configuring guests to NOT perform the "Soft Reset" action
  by adding:
    on_soft_reset="reboot"
  or similar to the guest's configuration. This will break kexec in the
  guest, though.

NOTE REGARDING LACK OF EMBARGO
==============================

This issue was discussed in public already.

RESOLUTION
==========

Applying the attached patch resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa425.patch           xen-unstable, Xen 4.17.x

$ sha256sum xsa425*
49f322c955fe7857cc824bba80625e56f582fdf0a4b244f513b6750e15ba5e48  xsa425.patch
$

-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmPRQroMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZEpsIAJmIVB2lvqT2Qdp0pPSoaJIxXxuGE320kVTWmudB
F2WbRCxeubqoOC/MyHTLOujMix6wBHnbm1cMQo0r4Vah/KX34vPS3wYqDZQYZtES
aEkOQ+214QLAS2futcT0gde9idKpShI9jjWSRwcH01a7V6tlwwidc4V0luUFV0iX
EKHPJ89rbbCMP1fOq5B+C7UP8oyiHItNWPWPFBwtUeXKvFiPOoyUPCoTHG8CCYHG
WiVbeaZab7x/9+WUwXJ6hZqZiVr6NqoaItOx9Nbw4yCHwJlAj2UfA9skmqtGbPbB
vxhkbIgOeiWoPvZgTGQjzZLosWO5+y30Fv5QYIbjA2/1OSQ=
=7kiM
-----END PGP SIGNATURE-----
From: Jason Andryuk <jandryuk@gmail.com>
Subject: Revert "tools/xenstore: simplify loop handling connection I/O"

I'm observing guest kexec trigger xenstored to abort on a double free.

gdb output:
Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140645614258112) at ./nptl/pthread_kill.c:44
44    ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
    at ./nptl/pthread_kill.c:44
    at ./nptl/pthread_kill.c:78
    at ./nptl/pthread_kill.c:89
    at ../sysdeps/posix/raise.c:26
    at talloc.c:119
    ptr=ptr@entry=0x559fae724290) at talloc.c:232
    at xenstored_core.c:2945
(gdb) frame 5
    at talloc.c:119
119            TALLOC_ABORT("Bad talloc magic value - double free");
(gdb) frame 7
    at xenstored_core.c:2945
2945                talloc_increase_ref_count(conn);
(gdb) p conn
$1 = (struct connection *) 0x559fae724290

Looking at a xenstore trace, we have:
IN 0x559fae71f250 20230120 17:40:53 READ (/local/domain/3/image/device-model-dom
id )
wrl: dom    0      1  msec      10000 credit     1000000 reserve        100 disc
ard
wrl: dom    3      1  msec      10000 credit     1000000 reserve        100 disc
ard
wrl: dom    0      0  msec      10000 credit     1000000 reserve          0 disc
ard
wrl: dom    3      0  msec      10000 credit     1000000 reserve          0 disc
ard
OUT 0x559fae71f250 20230120 17:40:53 ERROR (ENOENT )
wrl: dom    0      1  msec      10000 credit     1000000 reserve        100 disc
ard
wrl: dom    3      1  msec      10000 credit     1000000 reserve        100 disc
ard
IN 0x559fae71f250 20230120 17:40:53 RELEASE (3 )
DESTROY watch 0x559fae73f630
DESTROY watch 0x559fae75ddf0
DESTROY watch 0x559fae75ec30
DESTROY watch 0x559fae75ea60
DESTROY watch 0x559fae732c00
DESTROY watch 0x559fae72cea0
DESTROY watch 0x559fae728fc0
DESTROY watch 0x559fae729570
DESTROY connection 0x559fae724290
orphaned node /local/domain/3/device/suspend/event-channel deleted
orphaned node /local/domain/3/device/vbd/51712 deleted
orphaned node /local/domain/3/device/vkbd/0 deleted
orphaned node /local/domain/3/device/vif/0 deleted
orphaned node /local/domain/3/control/shutdown deleted
orphaned node /local/domain/3/control/feature-poweroff deleted
orphaned node /local/domain/3/control/feature-reboot deleted
orphaned node /local/domain/3/control/feature-suspend deleted
orphaned node /local/domain/3/control/feature-s3 deleted
orphaned node /local/domain/3/control/feature-s4 deleted
orphaned node /local/domain/3/control/sysrq deleted
orphaned node /local/domain/3/data deleted
orphaned node /local/domain/3/drivers deleted
orphaned node /local/domain/3/feature deleted
orphaned node /local/domain/3/attr deleted
orphaned node /local/domain/3/error deleted
orphaned node /local/domain/3/console/backend-id deleted

and no further output.

The trace shows that DESTROY was called for connection 0x559fae724290,
but that is the same pointer (conn) main() was looping through from
connections.  So it wasn't actually removed from the connections list?

Reverting commit e8e6e42279a5 "tools/xenstore: simplify loop handling
connection I/O" fixes the abort/double free.  I think the use of
list_for_each_entry_safe is incorrect.  list_for_each_entry_safe makes
traversal safe for deleting the current iterator, but RELEASE/do_release
will delete some other entry in the connections list.  I think the
observed abort is because list_for_each_entry has next pointing to the
deleted connection, and it is used in the subsequent iteration.

Add a comment explaining the unsuitability of list_for_each_entry_safe.
Also notice that the old code takes a reference on next which would
prevents a use-after-free.

This reverts commit e8e6e42279a5723239c5c40ba4c7f579a979465d.

This is XSA-425/CVE-2022-42330.

Fixes: e8e6e42279a5 ("tools/xenstore: simplify loop handling connection I/O")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
---
 tools/xenstore/xenstored_core.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 78a3edaa4e..029e3852fc 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -2941,8 +2941,23 @@ int main(int argc, char *argv[])
 			}
 		}
 
-		list_for_each_entry_safe(conn, next, &connections, list) {
-			talloc_increase_ref_count(conn);
+		/*
+		 * list_for_each_entry_safe is not suitable here because
+		 * handle_input may delete entries besides the current one, but
+		 * those may be in the temporary next which would trigger a
+		 * use-after-free.  list_for_each_entry_safe is only safe for
+		 * deleting the current entry.
+		 */
+		next = list_entry(connections.next, typeof(*conn), list);
+		if (&next->list != &connections)
+			talloc_increase_ref_count(next);
+		while (&next->list != &connections) {
+			conn = next;
+
+			next = list_entry(conn->list.next,
+					  typeof(*conn), list);
+			if (&next->list != &connections)
+				talloc_increase_ref_count(next);
 
 			if (conn_can_read(conn))
 				handle_input(conn);
-- 
2.34.1