Xen Security Advisory 477 v2 (CVE-2025-58150) - x86: buffer overrun with shadow paging + tracing

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2025-58150 / XSA-477
                               version 2

           x86: buffer overrun with shadow paging + tracing

UPDATES IN VERSION 2
====================

Public release.

ISSUE DESCRIPTION
=================

The shadow mode tracing code uses a set of per-CPU variables to avoid
cumbersome parameter passing.  Some of these variables are written
with guest-controlled data of guest-controllable size.  That size can
exceed the size of the variable, and bounding of the writes was
missing.
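
The pattern, and the bounding applied by the patches below, can be
illustrated with a minimal standalone C sketch.  This is illustrative
only, not Xen code: the names trace_val and record_write, and the
16-byte write, are hypothetical stand-ins for the per-CPU tracing
variable and the emulated-write hook.

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>
  #include <stdio.h>

  /* Fixed-size helper, standing in for the per-CPU tracing variable. */
  static uint64_t trace_val;

  /* 'bytes' is guest controllable and may exceed sizeof(trace_val). */
  static void record_write(const void *src, size_t bytes)
  {
      /* The fix: clamp the copy to the space actually available. */
      if ( bytes > sizeof(trace_val) )
          bytes = sizeof(trace_val);

      /* Without the clamp above, this memcpy() overruns trace_val. */
      memcpy(&trace_val, src, bytes);
  }

  int main(void)
  {
      uint8_t guest_data[16] = { 0x11, 0x22, 0x33, 0x44 };

      /* A wider-than-variable write, as a guest could trigger via tracing. */
      record_write(guest_data, sizeof(guest_data));
      printf("trace_val = %#llx\n", (unsigned long long)trace_val);

      return 0;
  }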

IMPACT
======

The exact effects depend on what's adjacent to the variables in
question.  The most likely effect is bogus trace data, but none of
privilege escalation, information leaks, or Denial of Service (DoS)
can be excluded without detailed analysis of the particular build of
Xen.

VULNERABLE SYSTEMS
==================

Only x86 systems are vulnerable.  Arm systems are not vulnerable.

Only HVM guests running in shadow paging mode and with tracing enabled
can leverage the vulnerability.

MITIGATION
==========

Running HVM guests exclusively in HAP mode avoids the vulnerability.
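
For illustration, whether a given HVM guest uses HAP can be requested
from its xl domain configuration.  A minimal, hypothetical fragment
(assuming the hardware and hypervisor support HAP; see xl.cfg(5) for
details):

  type = "hvm"
  hap  = 1       # use hardware assisted paging rather than shadow paging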

Not enabling tracing also avoids the vulnerability.  Tracing is
enabled by the "tbuf_size=" command line option, or by running tools
like xentrace or xenbaked in Dom0.  Note that, on a running system,
stopping xentrace / xenbaked disables tracing.  For xentrace, however,
this additionally requires that it wasn't started with the -x option.
Stopping previously enabled tracing can of course only prevent future
damage; prior damage may already have occurred and may manifest only
later.

CREDITS
=======

This issue was discovered by Jan Beulich of SUSE.

RESOLUTION
==========

Applying the appropriate attached patch resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa477.patch           xen-unstable - Xen 4.19.x
xsa477-4.18.patch      Xen 4.18.x

$ sha256sum xsa477*
025783441d7db846e717a1e48547b0db7a36fcc6af652b688524c684f0c3d2a7  xsa477.patch
194da830e15195873456b145a8df83af43aaae7a82fa6cb6852928d75c68909c  xsa477-4.18.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations should contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decision-making.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAml4qLYMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZ+IkH/jgVtAAifglnIrxstdAUXMritwnXvcrIJaKjG7yj
8980GavdbttObFRL+d2XvPXAQLRWCbgMNgNFA9s/6EhH2cCMF9mmeYxxU9zqG9qi
MQyfp1v/UpNrvD4hdHIXhohMELF6IdXQkrRvnB0hJwSPsDEzMZyofTOKppmSqSE1
tIdFXD1R845KTl9eG1lX4uwr2KhAjAgk4DrpIvxmtkiz3yF8kznjAGDSA7luKkTU
XBSlBe9u/9Yg5cspQrh7tVQ0K+6wDR6f4bCq26P/VCDUjwRIzHDhdP+RzKaumLGn
nTU0aAuIBlXYCa+8HB5c9vf/yLldKflYZ4Qmb3jGD4GYZrQ=
=nlvD
-----END PGP SIGNATURE-----
From: Jan Beulich <jbeulich@suse.com>
Subject: x86/shadow: don't overrun trace_emul_write_val

Guests can do wider-than-PTE-size writes on page tables. The tracing
helper variable, however, only offers space for a single PTE (and it is
being switched to the more correct type right here). Therefore bound
incoming write sizes to the amount of space available.

To not leave dead code (which is a Misra concern), drop the now unused
guest_pa_t as well.

Also move and adjust GUEST_PTE_SIZE: Derive it rather than using hard-
coded numbers, and put it in the sole source file where it's actually
needed. This then also addresses a Misra rule 20.9 ("All identifiers
used in the controlling expression of #if or #elif preprocessing
directives shall be #define'd before evaluation") violation:
GUEST_PAGING_LEVELS is #define'd only in multi.c.

This is XSA-477 / CVE-2025-58150.

Fixes: 9a86ac1aa3d2 ("xentrace 5/7: Additional tracing for the shadow code")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -1970,15 +1970,15 @@ static void sh_prefetch(struct vcpu *v,
 
 #if GUEST_PAGING_LEVELS == 4
 typedef u64 guest_va_t;
-typedef u64 guest_pa_t;
 #elif GUEST_PAGING_LEVELS == 3
 typedef u32 guest_va_t;
-typedef u64 guest_pa_t;
 #else
 typedef u32 guest_va_t;
-typedef u32 guest_pa_t;
 #endif
 
+/* Size (in bytes) of a guest PTE */
+#define GUEST_PTE_SIZE sizeof(guest_l1e_t)
+
 /* Shadow trace event with GUEST_PAGING_LEVELS folded into the event field. */
 static void sh_trace(uint32_t event, unsigned int extra, const void *extra_data)
 {
@@ -2048,11 +2048,14 @@ static void __maybe_unused sh_trace_gfn_
 static DEFINE_PER_CPU(guest_va_t,trace_emulate_initial_va);
 static DEFINE_PER_CPU(int,trace_extra_emulation_count);
 #endif
-static DEFINE_PER_CPU(guest_pa_t,trace_emulate_write_val);
+static DEFINE_PER_CPU(guest_l1e_t, trace_emulate_write_val);
 
 static void cf_check trace_emulate_write_val(
     const void *ptr, unsigned long vaddr, const void *src, unsigned int bytes)
 {
+    if ( bytes > sizeof(this_cpu(trace_emulate_write_val)) )
+        bytes = sizeof(this_cpu(trace_emulate_write_val));
+
 #if GUEST_PAGING_LEVELS == 3
     if ( vaddr == this_cpu(trace_emulate_initial_va) )
         memcpy(&this_cpu(trace_emulate_write_val), src, bytes);
@@ -2077,13 +2080,16 @@ static inline void sh_trace_emulate(gues
             /*
              * For GUEST_PAGING_LEVELS=3 (PAE paging), guest_l1e is 64 while
              * guest_va is 32.  Put it first to avoid padding.
+             *
+             * Note: .write_val is an arbitrary set of written bytes, possibly
+             * misaligned and possibly spanning the next gl1e.
              */
             guest_l1e_t gl1e, write_val;
             guest_va_t va;
             uint32_t flags:29, emulation_count:3;
         } d = {
             .gl1e            = gl1e,
-            .write_val.l1    = this_cpu(trace_emulate_write_val),
+            .write_val       = this_cpu(trace_emulate_write_val),
             .va              = va,
 #if GUEST_PAGING_LEVELS == 3
             .emulation_count = this_cpu(trace_extra_emulation_count),
@@ -2672,7 +2677,7 @@ static int cf_check sh_page_fault(
     paging_unlock(d);
     put_gfn(d, gfn_x(gfn));
 
-    this_cpu(trace_emulate_write_val) = 0;
+    this_cpu(trace_emulate_write_val) = (guest_l1e_t){};
 
 #if SHADOW_OPTIMIZATIONS & SHOPT_FAST_EMULATION
  early_emulation:
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -120,14 +120,6 @@ enum {
     TRCE_SFLAG_OOS_FIXUP_EVICT,
 };
 
-
-/* Size (in bytes) of a guest PTE */
-#if GUEST_PAGING_LEVELS >= 3
-# define GUEST_PTE_SIZE 8
-#else
-# define GUEST_PTE_SIZE 4
-#endif
-
 /******************************************************************************
  * Auditing routines
  */
From: Jan Beulich <jbeulich@suse.com>
Subject: x86/shadow: don't overrun trace_emul_write_val

Guests can do wider-than-PTE-size writes on page tables. The tracing
helper variable, however, only offers space for a single PTE (and it is
being switched to the more correct type right here). Therefore bound
incoming write sizes to the amount of space available.

To not leave dead code (which is a Misra concern), drop the now unused
guest_pa_t as well.

Also move and adjust GUEST_PTE_SIZE: Derive it rather than using hard-
coded numbers, and put it in the sole source file where it's actually
needed. This then also addresses a Misra rule 20.9 ("All identifiers
used in the controlling expression of #if or #elif preprocessing
directives shall be #define'd before evaluation") violation:
GUEST_PAGING_LEVELS is #define'd only in multi.c.

This is XSA-477 / CVE-2025-58150.

Fixes: 9a86ac1aa3d2 ("xentrace 5/7: Additional tracing for the shadow code")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -1965,15 +1965,15 @@ static void sh_prefetch(struct vcpu *v,
 
 #if GUEST_PAGING_LEVELS == 4
 typedef u64 guest_va_t;
-typedef u64 guest_pa_t;
 #elif GUEST_PAGING_LEVELS == 3
 typedef u32 guest_va_t;
-typedef u64 guest_pa_t;
 #else
 typedef u32 guest_va_t;
-typedef u32 guest_pa_t;
 #endif
 
+/* Size (in bytes) of a guest PTE */
+#define GUEST_PTE_SIZE sizeof(guest_l1e_t)
+
 static inline void trace_shadow_gen(u32 event, guest_va_t va)
 {
     if ( tb_init_done )
@@ -2062,11 +2062,14 @@ static inline void trace_shadow_emulate_
 static DEFINE_PER_CPU(guest_va_t,trace_emulate_initial_va);
 static DEFINE_PER_CPU(int,trace_extra_emulation_count);
 #endif
-static DEFINE_PER_CPU(guest_pa_t,trace_emulate_write_val);
+static DEFINE_PER_CPU(guest_l1e_t, trace_emulate_write_val);
 
 static void cf_check trace_emulate_write_val(
     const void *ptr, unsigned long vaddr, const void *src, unsigned int bytes)
 {
+    if ( bytes > sizeof(this_cpu(trace_emulate_write_val)) )
+        bytes = sizeof(this_cpu(trace_emulate_write_val));
+
 #if GUEST_PAGING_LEVELS == 3
     if ( vaddr == this_cpu(trace_emulate_initial_va) )
         memcpy(&this_cpu(trace_emulate_write_val), src, bytes);
@@ -2088,8 +2091,13 @@ static inline void trace_shadow_emulate(
     if ( tb_init_done )
     {
         struct __packed {
-            /* for PAE, guest_l1e may be 64 while guest_va may be 32;
-               so put it first for alignment sake. */
+            /*
+             * For GUEST_PAGING_LEVELS=3 (PAE paging), guest_l1e is 64 while
+             * guest_va is 32.  Put it first to avoid padding.
+             *
+             * Note: .write_val is an arbitrary set of written bytes, possibly
+             * misaligned and possibly spanning the next gl1e.
+             */
             guest_l1e_t gl1e, write_val;
             guest_va_t va;
             uint32_t flags:29, emulation_count:3;
@@ -2099,7 +2107,7 @@ static inline void trace_shadow_emulate(
         event = TRC_SHADOW_EMULATE | ((GUEST_PAGING_LEVELS-2)<<8);
 
         d.gl1e = gl1e;
-        d.write_val.l1 = this_cpu(trace_emulate_write_val);
+        d.write_val = this_cpu(trace_emulate_write_val);
         d.va = va;
 #if GUEST_PAGING_LEVELS == 3
         d.emulation_count = this_cpu(trace_extra_emulation_count);
@@ -2680,7 +2688,7 @@ static int cf_check sh_page_fault(
     paging_unlock(d);
     put_gfn(d, gfn_x(gfn));
 
-    this_cpu(trace_emulate_write_val) = 0;
+    this_cpu(trace_emulate_write_val) = (guest_l1e_t){};
 
 #if SHADOW_OPTIMIZATIONS & SHOPT_FAST_EMULATION
  early_emulation:
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -120,14 +120,6 @@ enum {
     TRCE_SFLAG_OOS_FIXUP_EVICT,
 };
 
-
-/* Size (in bytes) of a guest PTE */
-#if GUEST_PAGING_LEVELS >= 3
-# define GUEST_PTE_SIZE 8
-#else
-# define GUEST_PTE_SIZE 4
-#endif
-
 /******************************************************************************
  * Auditing routines
  */