Xen Security Advisory 451 v2 (CVE-2023-46841) - x86: shadow stack vs exceptions from emulation stubs

Xen.org security team posted 1 patch 2 months ago
Failed in applying to current master (apply log)
Xen Security Advisory 451 v2 (CVE-2023-46841) - x86: shadow stack vs exceptions from emulation stubs
Posted by Xen.org security team 2 months ago
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2023-46841 / XSA-451
                               version 2

         x86: shadow stack vs exceptions from emulation stubs

UPDATES IN VERSION 2
====================

Largely cosmetic adjustment in patches.

Public release.

ISSUE DESCRIPTION
=================

Recent x86 CPUs offer functionality named Control-flow Enforcement
Technology (CET).  A sub-feature of this are Shadow Stacks (CET-SS).
CET-SS is a hardware feature designed to protect against Return Oriented
Programming attacks. When enabled, traditional stacks holding both data
and return addresses are accompanied by so called "shadow stacks",
holding little more than return addresses.  Shadow stacks aren't
writable by normal instructions, and upon function returns their
contents are used to check for possible manipulation of a return address
coming from the traditional stack.

In particular certain memory accesses need intercepting by Xen.  In
various cases the necessary emulation involves kind of replaying of
the instruction.  Such replaying typically involves filling and then
invoking of a stub.  Such a replayed instruction may raise an
exceptions, which is expected and dealt with accordingly.

Unfortunately the interaction of both of the above wasn't right:
Recovery involves removal of a call frame from the (traditional) stack.
The counterpart of this operation for the shadow stack was missing.

IMPACT
======

An unprivileged guest can cause a hypervisor crash, causing a Denial of
Service (DoS) of the entire host.

VULNERABLE SYSTEMS
==================

Xen 4.14 and onwards are vulnerable.  Xen 4.13 and older are not
vulnerable.

Only x86 systems with CET-SS enabled are vulnerable.  x86 systems with
CET-SS unavailable or disabled are not vulnerable.  Arm systems are not
vulnerable.  See
https://xenbits.xen.org/docs/latest/faq.html#tell-if-cet-is-active
for how to determine whether CET-SS is active.

Only HVM or PVH guests can leverage the vulnerability.  PV guests cannot
leverage the vulnerability.

MITIGATION
==========

While in principle it is possible to disable use of CET on capable
systems using the "cet=no-shstk" command line option, doing so disables
an important security feature and may therefore not be advisable.

CREDITS
=======

This issue was discovered by Jan Beulich of SUSE.

RESOLUTION
==========

Applying the appropriate (set of) attached patch(es) resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa451-?.patch         xen-unstable
xsa451-4.18.patch      Xen 4.18.x
xsa451-4.17.patch      Xen 4.17.x
xsa451-4.16.patch      Xen 4.16.x
xsa451-4.15.patch      Xen 4.15.x

$ sha256sum xsa451*
446178a9a37646e62622988efffa3d1ffa0b579fc089ab79138507acfd3440c0  xsa451-1.patch
614ab6925ea60f36212f0cd01929f3a97161de1828040770792e146c170bfea2  xsa451-2.patch
ad529273d7dc97bff239f1727a9702eb24d41b723d2a3077a1fecc4684900f91  xsa451-3.patch
2c68480657220cfab92fe9821ce201ff7c9e0b541619a1add541f3d66fa13e9d  xsa451-4.15.patch
fa8ab72e61fae0130fb81b0a7ce508fdb3bcb3c800b0ab7684aa6595cbad88ea  xsa451-4.16.patch
e41cab6471586a5f50e10eb26895fec624cc6d8fd3b4ff71495466df8aaa19e5  xsa451-4.17.patch
d6b76a8db6c80c0684fc94becc2e23091c8f1dcbebc726438dbb1a6cde543335  xsa451-4.18.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmXdu4UMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZApoIAMmKsIAqbt/QlUZFUXYx+DAW20Bl7DUGjJlFv6kx
pBDSxW3a2evYo+CTTapVeRfosI+/kI61pcyFd19EGdthVcgufPQOC7yVxmu8j7Wi
s6lb/h0b6vKFOUubKN+EtaVRR34acqmQwSq668AjcyL8M5xIdWfYDpKHVft29x8i
QwKdKnvsWwaFrUathVTlspqcHLkNWf7+nsTVapMG2O15UrqYdJPErhL/Bh+iwSih
exc/fRFyQuqFL7qHnvPXz+AhajjHmDO+1Z3OCir9MleyZ3JJvIq6Vnje75+DFHeT
n9kFt29LJMvRzlDzIdfUy9R98h0r3WIQBaicFO2pBKlp6i8=
=JJb5
-----END PGP SIGNATURE-----
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86: document how stub exception recovery works

Describe how it is meant to work, even if one aspect of it will only be
taken care of subsequently.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -94,6 +94,22 @@ search_exception_table(const struct cpu_
     if ( region && region->ex )
         return search_one_extable(region->ex, region->ex_end, regs->rip);
 
+    /*
+     * Emulation stubs (which are per-CPU) are constructed with a RET at the
+     * end, and are CALLed by the invoking code.
+     *
+     * An exception in the stubs may occur anywhere, so we first match any
+     * %rip in the correct stub, with a sanity check on %rsp too.  But, an
+     * entry in ex_table[] needs to be compile-time constant, so we register
+     * the fixup address using the invoking CALL's return address.
+     *
+     * To recover, we:
+     * 1) Emulate a pseudo-RET to get out of the stub.  We POP the return
+     *    address off the stack(s), use it to look up the fixup address, and
+     *    JMP there, then
+     * 2) Emulate a PUSH of 'token' onto the data stack to pass information
+     *    about the exception back to the invoking code.
+     */
     if ( regs->rip >= stub + STUB_BUF_SIZE / 2 &&
          regs->rip < stub + STUB_BUF_SIZE &&
          regs->rsp > (unsigned long)regs &&
From: Jan Beulich <jbeulich@suse.com>
Subject: x86: account for shadow stack in exception-from-stub recovery

Dealing with exceptions raised from within emulation stubs involves
discarding return address (replaced by exception related information).
Such discarding of course also requires removing the corresponding entry
from the shadow stack.

Also amend the comment in fixup_exception_return(), to further clarify
why use of ptr[1] can't be an out-of-bounds access.

While touching do_invalid_op() also add a missing fall-through
annotation.

This is CVE-2023-46841 / XSA-451.

Fixes: 209fb9919b50 ("x86/extable: Adjust extable handling to be shadow stack compatible")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -86,13 +86,16 @@ search_one_extable(const struct exceptio
 }
 
 unsigned long
-search_exception_table(const struct cpu_user_regs *regs)
+search_exception_table(const struct cpu_user_regs *regs, unsigned long *stub_ra)
 {
     const struct virtual_region *region = find_text_region(regs->rip);
     unsigned long stub = this_cpu(stubs.addr);
 
     if ( region && region->ex )
+    {
+        *stub_ra = 0;
         return search_one_extable(region->ex, region->ex_end, regs->rip);
+    }
 
     /*
      * Emulation stubs (which are per-CPU) are constructed with a RET at the
@@ -115,13 +118,13 @@ search_exception_table(const struct cpu_
          regs->rsp > (unsigned long)regs &&
          regs->rsp < (unsigned long)get_cpu_info() )
     {
-        unsigned long retptr = *(unsigned long *)regs->rsp;
+        unsigned long retaddr = *(unsigned long *)regs->rsp, fixup;
 
-        region = find_text_region(retptr);
-        retptr = region && region->ex
-                 ? search_one_extable(region->ex, region->ex_end, retptr)
-                 : 0;
-        if ( retptr )
+        region = find_text_region(retaddr);
+        fixup = region && region->ex
+                ? search_one_extable(region->ex, region->ex_end, retaddr)
+                : 0;
+        if ( fixup )
         {
             /*
              * Put trap number and error code on the stack (in place of the
@@ -133,7 +136,8 @@ search_exception_table(const struct cpu_
             };
 
             *(unsigned long *)regs->rsp = token.raw;
-            return retptr;
+            *stub_ra = retaddr;
+            return fixup;
         }
     }
 
--- a/xen/arch/x86/include/asm/uaccess.h
+++ b/xen/arch/x86/include/asm/uaccess.h
@@ -421,7 +421,8 @@ union stub_exception_token {
     unsigned long raw;
 };
 
-extern unsigned long search_exception_table(const struct cpu_user_regs *regs);
+extern unsigned long search_exception_table(const struct cpu_user_regs *regs,
+                                            unsigned long *stub_ra);
 extern void sort_exception_tables(void);
 extern void sort_exception_table(struct exception_table_entry *start,
                                  const struct exception_table_entry *stop);
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -838,7 +838,7 @@ void asmlinkage do_unhandled_trap(struct
 }
 
 static void fixup_exception_return(struct cpu_user_regs *regs,
-                                   unsigned long fixup)
+                                   unsigned long fixup, unsigned long stub_ra)
 {
     if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
     {
@@ -855,7 +855,8 @@ static void fixup_exception_return(struc
             /*
              * Search for %rip.  The shstk currently looks like this:
              *
-             *   ...  [Likely pointed to by SSP]
+             *   tok  [Supervisor token, == &tok | BUSY, only with FRED inactive]
+             *   ...  [Pointed to by SSP for most exceptions, empty in IST cases]
              *   %cs  [== regs->cs]
              *   %rip [== regs->rip]
              *   SSP  [Likely points to 3 slots higher, above %cs]
@@ -873,7 +874,56 @@ static void fixup_exception_return(struc
              */
             if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
             {
+                unsigned long primary_shstk =
+                    (ssp & ~(STACK_SIZE - 1)) +
+                    (PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8;
+
                 wrss(fixup, ptr);
+
+                if ( !stub_ra )
+                    goto shstk_done;
+
+                /*
+                 * Stub recovery ought to happen only when the outer context
+                 * was on the main shadow stack.  We need to also "pop" the
+                 * stub's return address from the interrupted context's shadow
+                 * stack.  That is,
+                 * - if we're still on the main stack, we need to move the
+                 *   entire stack (up to and including the exception frame)
+                 *   up by one slot, incrementing the original SSP in the
+                 *   exception frame,
+                 * - if we're on an IST stack, we need to increment the
+                 *   original SSP.
+                 */
+                BUG_ON((ptr[-1] ^ primary_shstk) >> PAGE_SHIFT);
+
+                if ( (ssp ^ primary_shstk) >> PAGE_SHIFT )
+                {
+                    /*
+                     * We're on an IST stack.  First make sure the two return
+                     * addresses actually match.  Then increment the interrupted
+                     * context's SSP.
+                     */
+                    BUG_ON(stub_ra != *(unsigned long*)ptr[-1]);
+                    wrss(ptr[-1] + 8, &ptr[-1]);
+                    goto shstk_done;
+                }
+
+                /* Make sure the two return addresses actually match. */
+                BUG_ON(stub_ra != ptr[2]);
+
+                /* Move exception frame, updating SSP there. */
+                wrss(ptr[1], &ptr[2]); /* %cs */
+                wrss(ptr[0], &ptr[1]); /* %rip */
+                wrss(ptr[-1] + 8, &ptr[0]); /* SSP */
+
+                /* Move all newer entries. */
+                while ( --ptr != _p(ssp) )
+                    wrss(ptr[-1], &ptr[0]);
+
+                /* Finally account for our own stack having shifted up. */
+                asm volatile ( "incsspd %0" :: "r" (2) );
+
                 goto shstk_done;
             }
         }
@@ -894,7 +944,8 @@ static void fixup_exception_return(struc
 
 static bool extable_fixup(struct cpu_user_regs *regs, bool print)
 {
-    unsigned long fixup = search_exception_table(regs);
+    unsigned long stub_ra = 0;
+    unsigned long fixup = search_exception_table(regs, &stub_ra);
 
     if ( unlikely(fixup == 0) )
         return false;
@@ -908,7 +959,7 @@ static bool extable_fixup(struct cpu_use
                vector_name(regs->entry_vector), regs->error_code,
                _p(regs->rip), _p(regs->rip), _p(fixup));
 
-    fixup_exception_return(regs, fixup);
+    fixup_exception_return(regs, fixup, stub_ra);
     this_cpu(last_extable_addr) = regs->rip;
 
     return true;
@@ -1171,7 +1222,8 @@ void asmlinkage do_invalid_op(struct cpu
     {
     case BUGFRAME_run_fn:
     case BUGFRAME_warn:
-        fixup_exception_return(regs, (unsigned long)eip);
+        fixup_exception_return(regs, (unsigned long)eip, 0);
+        fallthrough;
     case BUGFRAME_bug:
     case BUGFRAME_assert:
         return;
From: Jan Beulich <jbeulich@suse.com>
Subject: x86: re-run exception-from-stub recovery selftests with CET-SS enabled

On the BSP, shadow stacks are enabled only relatively late in the
booting process. They in particular aren't active yet when initcalls are
run. Keep the testing there, but invoke that testing a 2nd time when
shadow stacks are active, to make sure we won't regress that case after
addressing XSA-451.

While touching this code, switch the guard from NDEBUG to CONFIG_DEBUG,
such that IS_ENABLED() can validly be used at the new call site.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -144,10 +144,11 @@ search_exception_table(const struct cpu_
     return 0;
 }
 
-#ifndef NDEBUG
+#ifdef CONFIG_DEBUG
+#include <asm/setup.h>
 #include <asm/traps.h>
 
-static int __init cf_check stub_selftest(void)
+int __init cf_check stub_selftest(void)
 {
     static const struct {
         uint8_t opc[8];
@@ -171,7 +172,8 @@ static int __init cf_check stub_selftest
     unsigned int i;
     bool fail = false;
 
-    printk("Running stub recovery selftests...\n");
+    printk("%s stub recovery selftests...\n",
+           system_state < SYS_STATE_active ? "Running" : "Re-running");
 
     for ( i = 0; i < ARRAY_SIZE(tests); ++i )
     {
--- a/xen/arch/x86/include/asm/setup.h
+++ b/xen/arch/x86/include/asm/setup.h
@@ -38,6 +38,8 @@ void *bootstrap_map(const module_t *mod)
 
 int remove_xen_ranges(struct rangeset *r);
 
+int cf_check stub_selftest(void);
+
 extern uint8_t kbd_shift_flags;
 
 #ifdef NDEBUG
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -740,6 +740,10 @@ static void noreturn init_done(void)
 
     system_state = SYS_STATE_active;
 
+    /* Re-run stub recovery self-tests with CET-SS active. */
+    if ( IS_ENABLED(CONFIG_DEBUG) && cpu_has_xen_shstk )
+        stub_selftest();
+
     domain_unpause_by_systemcontroller(dom0);
 
     /* MUST be done prior to removing .init data. */
From: Jan Beulich <jbeulich@suse.com>
Subject: x86: account for shadow stack in exception-from-stub recovery

Dealing with exceptions raised from within emulation stubs involves
discarding return address (replaced by exception related information).
Such discarding of course also requires removing the corresponding entry
from the shadow stack.

Also amend the comment in fixup_exception_return(), to further clarify
why use of ptr[1] can't be an out-of-bounds access.

This is CVE-2023-46841 / XSA-451.

Fixes: 209fb9919b50 ("x86/extable: Adjust extable handling to be shadow stack compatible")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -85,26 +85,29 @@ search_one_extable(const struct exceptio
 }
 
 unsigned long
-search_exception_table(const struct cpu_user_regs *regs)
+search_exception_table(const struct cpu_user_regs *regs, unsigned long *stub_ra)
 {
     const struct virtual_region *region = find_text_region(regs->rip);
     unsigned long stub = this_cpu(stubs.addr);
 
     if ( region && region->ex )
+    {
+        *stub_ra = 0;
         return search_one_extable(region->ex, region->ex_end - 1, regs->rip);
+    }
 
     if ( regs->rip >= stub + STUB_BUF_SIZE / 2 &&
          regs->rip < stub + STUB_BUF_SIZE &&
          regs->rsp > (unsigned long)regs &&
          regs->rsp < (unsigned long)get_cpu_info() )
     {
-        unsigned long retptr = *(unsigned long *)regs->rsp;
+        unsigned long retaddr = *(unsigned long *)regs->rsp, fixup;
 
-        region = find_text_region(retptr);
-        retptr = region && region->ex
-                 ? search_one_extable(region->ex, region->ex_end - 1, retptr)
-                 : 0;
-        if ( retptr )
+        region = find_text_region(retaddr);
+        fixup = region && region->ex
+                ? search_one_extable(region->ex, region->ex_end - 1, retaddr)
+                : 0;
+        if ( fixup )
         {
             /*
              * Put trap number and error code on the stack (in place of the
@@ -116,7 +119,8 @@ search_exception_table(const struct cpu_
             };
 
             *(unsigned long *)regs->rsp = token.raw;
-            return retptr;
+            *stub_ra = retaddr;
+            return fixup;
         }
     }
 
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -783,7 +783,7 @@ static void do_reserved_trap(struct cpu_
 }
 
 static void fixup_exception_return(struct cpu_user_regs *regs,
-                                   unsigned long fixup)
+                                   unsigned long fixup, unsigned long stub_ra)
 {
     if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
     {
@@ -800,7 +800,8 @@ static void fixup_exception_return(struc
             /*
              * Search for %rip.  The shstk currently looks like this:
              *
-             *   ...  [Likely pointed to by SSP]
+             *   tok  [Supervisor token, == &tok | BUSY, only with FRED inactive]
+             *   ...  [Pointed to by SSP for most exceptions, empty in IST cases]
              *   %cs  [== regs->cs]
              *   %rip [== regs->rip]
              *   SSP  [Likely points to 3 slots higher, above %cs]
@@ -818,7 +819,56 @@ static void fixup_exception_return(struc
              */
             if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
             {
+                unsigned long primary_shstk =
+                    (ssp & ~(STACK_SIZE - 1)) +
+                    (PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8;
+
                 wrss(fixup, ptr);
+
+                if ( !stub_ra )
+                    goto shstk_done;
+
+                /*
+                 * Stub recovery ought to happen only when the outer context
+                 * was on the main shadow stack.  We need to also "pop" the
+                 * stub's return address from the interrupted context's shadow
+                 * stack.  That is,
+                 * - if we're still on the main stack, we need to move the
+                 *   entire stack (up to and including the exception frame)
+                 *   up by one slot, incrementing the original SSP in the
+                 *   exception frame,
+                 * - if we're on an IST stack, we need to increment the
+                 *   original SSP.
+                 */
+                BUG_ON((ptr[-1] ^ primary_shstk) >> PAGE_SHIFT);
+
+                if ( (ssp ^ primary_shstk) >> PAGE_SHIFT )
+                {
+                    /*
+                     * We're on an IST stack.  First make sure the two return
+                     * addresses actually match.  Then increment the interrupted
+                     * context's SSP.
+                     */
+                    BUG_ON(stub_ra != *(unsigned long*)ptr[-1]);
+                    wrss(ptr[-1] + 8, &ptr[-1]);
+                    goto shstk_done;
+                }
+
+                /* Make sure the two return addresses actually match. */
+                BUG_ON(stub_ra != ptr[2]);
+
+                /* Move exception frame, updating SSP there. */
+                wrss(ptr[1], &ptr[2]); /* %cs */
+                wrss(ptr[0], &ptr[1]); /* %rip */
+                wrss(ptr[-1] + 8, &ptr[0]); /* SSP */
+
+                /* Move all newer entries. */
+                while ( --ptr != _p(ssp) )
+                    wrss(ptr[-1], &ptr[0]);
+
+                /* Finally account for our own stack having shifted up. */
+                asm volatile ( "incsspd %0" :: "r" (2) );
+
                 goto shstk_done;
             }
         }
@@ -839,7 +889,8 @@ static void fixup_exception_return(struc
 
 static bool extable_fixup(struct cpu_user_regs *regs, bool print)
 {
-    unsigned long fixup = search_exception_table(regs);
+    unsigned long stub_ra = 0;
+    unsigned long fixup = search_exception_table(regs, &stub_ra);
 
     if ( unlikely(fixup == 0) )
         return false;
@@ -853,7 +904,7 @@ static bool extable_fixup(struct cpu_use
                vec_name(regs->entry_vector), regs->error_code,
                _p(regs->rip), _p(regs->rip), _p(fixup));
 
-    fixup_exception_return(regs, fixup);
+    fixup_exception_return(regs, fixup, stub_ra);
     this_cpu(last_extable_addr) = regs->rip;
 
     return true;
@@ -1144,7 +1195,7 @@ void do_invalid_op(struct cpu_user_regs
         void (*fn)(struct cpu_user_regs *) = bug_ptr(bug);
 
         fn(regs);
-        fixup_exception_return(regs, (unsigned long)eip);
+        fixup_exception_return(regs, (unsigned long)eip, 0);
         return;
     }
 
@@ -1165,7 +1216,7 @@ void do_invalid_op(struct cpu_user_regs
     case BUGFRAME_warn:
         printk("Xen WARN at %s%s:%d\n", prefix, filename, lineno);
         show_execution_state(regs);
-        fixup_exception_return(regs, (unsigned long)eip);
+        fixup_exception_return(regs, (unsigned long)eip, 0);
         return;
 
     case BUGFRAME_bug:
--- a/xen/include/asm-x86/uaccess.h
+++ b/xen/include/asm-x86/uaccess.h
@@ -421,7 +421,8 @@ union stub_exception_token {
     unsigned long raw;
 };
 
-extern unsigned long search_exception_table(const struct cpu_user_regs *regs);
+extern unsigned long search_exception_table(const struct cpu_user_regs *regs,
+                                            unsigned long *stub_ra);
 extern void sort_exception_tables(void);
 extern void sort_exception_table(struct exception_table_entry *start,
                                  const struct exception_table_entry *stop);
From: Jan Beulich <jbeulich@suse.com>
Subject: x86: account for shadow stack in exception-from-stub recovery

Dealing with exceptions raised from within emulation stubs involves
discarding return address (replaced by exception related information).
Such discarding of course also requires removing the corresponding entry
from the shadow stack.

Also amend the comment in fixup_exception_return(), to further clarify
why use of ptr[1] can't be an out-of-bounds access.

This is CVE-2023-46841 / XSA-451.

Fixes: 209fb9919b50 ("x86/extable: Adjust extable handling to be shadow stack compatible")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -86,26 +86,29 @@ search_one_extable(const struct exceptio
 }
 
 unsigned long
-search_exception_table(const struct cpu_user_regs *regs)
+search_exception_table(const struct cpu_user_regs *regs, unsigned long *stub_ra)
 {
     const struct virtual_region *region = find_text_region(regs->rip);
     unsigned long stub = this_cpu(stubs.addr);
 
     if ( region && region->ex )
+    {
+        *stub_ra = 0;
         return search_one_extable(region->ex, region->ex_end, regs->rip);
+    }
 
     if ( regs->rip >= stub + STUB_BUF_SIZE / 2 &&
          regs->rip < stub + STUB_BUF_SIZE &&
          regs->rsp > (unsigned long)regs &&
          regs->rsp < (unsigned long)get_cpu_info() )
     {
-        unsigned long retptr = *(unsigned long *)regs->rsp;
+        unsigned long retaddr = *(unsigned long *)regs->rsp, fixup;
 
-        region = find_text_region(retptr);
-        retptr = region && region->ex
-                 ? search_one_extable(region->ex, region->ex_end, retptr)
-                 : 0;
-        if ( retptr )
+        region = find_text_region(retaddr);
+        fixup = region && region->ex
+                ? search_one_extable(region->ex, region->ex_end, retaddr)
+                : 0;
+        if ( fixup )
         {
             /*
              * Put trap number and error code on the stack (in place of the
@@ -117,7 +120,8 @@ search_exception_table(const struct cpu_
             };
 
             *(unsigned long *)regs->rsp = token.raw;
-            return retptr;
+            *stub_ra = retaddr;
+            return fixup;
         }
     }
 
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -895,7 +895,7 @@ static void do_reserved_trap(struct cpu_
 }
 
 static void fixup_exception_return(struct cpu_user_regs *regs,
-                                   unsigned long fixup)
+                                   unsigned long fixup, unsigned long stub_ra)
 {
     if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
     {
@@ -912,7 +912,8 @@ static void fixup_exception_return(struc
             /*
              * Search for %rip.  The shstk currently looks like this:
              *
-             *   ...  [Likely pointed to by SSP]
+             *   tok  [Supervisor token, == &tok | BUSY, only with FRED inactive]
+             *   ...  [Pointed to by SSP for most exceptions, empty in IST cases]
              *   %cs  [== regs->cs]
              *   %rip [== regs->rip]
              *   SSP  [Likely points to 3 slots higher, above %cs]
@@ -930,7 +931,56 @@ static void fixup_exception_return(struc
              */
             if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
             {
+                unsigned long primary_shstk =
+                    (ssp & ~(STACK_SIZE - 1)) +
+                    (PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8;
+
                 wrss(fixup, ptr);
+
+                if ( !stub_ra )
+                    goto shstk_done;
+
+                /*
+                 * Stub recovery ought to happen only when the outer context
+                 * was on the main shadow stack.  We need to also "pop" the
+                 * stub's return address from the interrupted context's shadow
+                 * stack.  That is,
+                 * - if we're still on the main stack, we need to move the
+                 *   entire stack (up to and including the exception frame)
+                 *   up by one slot, incrementing the original SSP in the
+                 *   exception frame,
+                 * - if we're on an IST stack, we need to increment the
+                 *   original SSP.
+                 */
+                BUG_ON((ptr[-1] ^ primary_shstk) >> PAGE_SHIFT);
+
+                if ( (ssp ^ primary_shstk) >> PAGE_SHIFT )
+                {
+                    /*
+                     * We're on an IST stack.  First make sure the two return
+                     * addresses actually match.  Then increment the interrupted
+                     * context's SSP.
+                     */
+                    BUG_ON(stub_ra != *(unsigned long*)ptr[-1]);
+                    wrss(ptr[-1] + 8, &ptr[-1]);
+                    goto shstk_done;
+                }
+
+                /* Make sure the two return addresses actually match. */
+                BUG_ON(stub_ra != ptr[2]);
+
+                /* Move exception frame, updating SSP there. */
+                wrss(ptr[1], &ptr[2]); /* %cs */
+                wrss(ptr[0], &ptr[1]); /* %rip */
+                wrss(ptr[-1] + 8, &ptr[0]); /* SSP */
+
+                /* Move all newer entries. */
+                while ( --ptr != _p(ssp) )
+                    wrss(ptr[-1], &ptr[0]);
+
+                /* Finally account for our own stack having shifted up. */
+                asm volatile ( "incsspd %0" :: "r" (2) );
+
                 goto shstk_done;
             }
         }
@@ -951,7 +1001,8 @@ static void fixup_exception_return(struc
 
 static bool extable_fixup(struct cpu_user_regs *regs, bool print)
 {
-    unsigned long fixup = search_exception_table(regs);
+    unsigned long stub_ra = 0;
+    unsigned long fixup = search_exception_table(regs, &stub_ra);
 
     if ( unlikely(fixup == 0) )
         return false;
@@ -965,7 +1016,7 @@ static bool extable_fixup(struct cpu_use
                vec_name(regs->entry_vector), regs->error_code,
                _p(regs->rip), _p(regs->rip), _p(fixup));
 
-    fixup_exception_return(regs, fixup);
+    fixup_exception_return(regs, fixup, stub_ra);
     this_cpu(last_extable_addr) = regs->rip;
 
     return true;
@@ -1256,7 +1307,7 @@ void do_invalid_op(struct cpu_user_regs
         void (*fn)(struct cpu_user_regs *) = bug_ptr(bug);
 
         fn(regs);
-        fixup_exception_return(regs, (unsigned long)eip);
+        fixup_exception_return(regs, (unsigned long)eip, 0);
         return;
     }
 
@@ -1277,7 +1328,7 @@ void do_invalid_op(struct cpu_user_regs
     case BUGFRAME_warn:
         printk("Xen WARN at %s%s:%d\n", prefix, filename, lineno);
         show_execution_state(regs);
-        fixup_exception_return(regs, (unsigned long)eip);
+        fixup_exception_return(regs, (unsigned long)eip, 0);
         return;
 
     case BUGFRAME_bug:
--- a/xen/include/asm-x86/uaccess.h
+++ b/xen/include/asm-x86/uaccess.h
@@ -421,7 +421,8 @@ union stub_exception_token {
     unsigned long raw;
 };
 
-extern unsigned long search_exception_table(const struct cpu_user_regs *regs);
+extern unsigned long search_exception_table(const struct cpu_user_regs *regs,
+                                            unsigned long *stub_ra);
 extern void sort_exception_tables(void);
 extern void sort_exception_table(struct exception_table_entry *start,
                                  const struct exception_table_entry *stop);
From: Jan Beulich <jbeulich@suse.com>
Subject: x86: account for shadow stack in exception-from-stub recovery

Dealing with exceptions raised from within emulation stubs involves
discarding return address (replaced by exception related information).
Such discarding of course also requires removing the corresponding entry
from the shadow stack.

Also amend the comment in fixup_exception_return(), to further clarify
why use of ptr[1] can't be an out-of-bounds access.

This is CVE-2023-46841 / XSA-451.

Fixes: 209fb9919b50 ("x86/extable: Adjust extable handling to be shadow stack compatible")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -86,26 +86,29 @@ search_one_extable(const struct exceptio
 }
 
 unsigned long
-search_exception_table(const struct cpu_user_regs *regs)
+search_exception_table(const struct cpu_user_regs *regs, unsigned long *stub_ra)
 {
     const struct virtual_region *region = find_text_region(regs->rip);
     unsigned long stub = this_cpu(stubs.addr);
 
     if ( region && region->ex )
+    {
+        *stub_ra = 0;
         return search_one_extable(region->ex, region->ex_end, regs->rip);
+    }
 
     if ( regs->rip >= stub + STUB_BUF_SIZE / 2 &&
          regs->rip < stub + STUB_BUF_SIZE &&
          regs->rsp > (unsigned long)regs &&
          regs->rsp < (unsigned long)get_cpu_info() )
     {
-        unsigned long retptr = *(unsigned long *)regs->rsp;
+        unsigned long retaddr = *(unsigned long *)regs->rsp, fixup;
 
-        region = find_text_region(retptr);
-        retptr = region && region->ex
-                 ? search_one_extable(region->ex, region->ex_end, retptr)
-                 : 0;
-        if ( retptr )
+        region = find_text_region(retaddr);
+        fixup = region && region->ex
+                ? search_one_extable(region->ex, region->ex_end, retaddr)
+                : 0;
+        if ( fixup )
         {
             /*
              * Put trap number and error code on the stack (in place of the
@@ -117,7 +120,8 @@ search_exception_table(const struct cpu_
             };
 
             *(unsigned long *)regs->rsp = token.raw;
-            return retptr;
+            *stub_ra = retaddr;
+            return fixup;
         }
     }
 
--- a/xen/arch/x86/include/asm/uaccess.h
+++ b/xen/arch/x86/include/asm/uaccess.h
@@ -421,7 +421,8 @@ union stub_exception_token {
     unsigned long raw;
 };
 
-extern unsigned long search_exception_table(const struct cpu_user_regs *regs);
+extern unsigned long search_exception_table(const struct cpu_user_regs *regs,
+                                            unsigned long *stub_ra);
 extern void sort_exception_tables(void);
 extern void sort_exception_table(struct exception_table_entry *start,
                                  const struct exception_table_entry *stop);
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -856,7 +856,7 @@ void do_unhandled_trap(struct cpu_user_r
 }
 
 static void fixup_exception_return(struct cpu_user_regs *regs,
-                                   unsigned long fixup)
+                                   unsigned long fixup, unsigned long stub_ra)
 {
     if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
     {
@@ -873,7 +873,8 @@ static void fixup_exception_return(struc
             /*
              * Search for %rip.  The shstk currently looks like this:
              *
-             *   ...  [Likely pointed to by SSP]
+             *   tok  [Supervisor token, == &tok | BUSY, only with FRED inactive]
+             *   ...  [Pointed to by SSP for most exceptions, empty in IST cases]
              *   %cs  [== regs->cs]
              *   %rip [== regs->rip]
              *   SSP  [Likely points to 3 slots higher, above %cs]
@@ -891,7 +892,56 @@ static void fixup_exception_return(struc
              */
             if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
             {
+                unsigned long primary_shstk =
+                    (ssp & ~(STACK_SIZE - 1)) +
+                    (PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8;
+
                 wrss(fixup, ptr);
+
+                if ( !stub_ra )
+                    goto shstk_done;
+
+                /*
+                 * Stub recovery ought to happen only when the outer context
+                 * was on the main shadow stack.  We need to also "pop" the
+                 * stub's return address from the interrupted context's shadow
+                 * stack.  That is,
+                 * - if we're still on the main stack, we need to move the
+                 *   entire stack (up to and including the exception frame)
+                 *   up by one slot, incrementing the original SSP in the
+                 *   exception frame,
+                 * - if we're on an IST stack, we need to increment the
+                 *   original SSP.
+                 */
+                BUG_ON((ptr[-1] ^ primary_shstk) >> PAGE_SHIFT);
+
+                if ( (ssp ^ primary_shstk) >> PAGE_SHIFT )
+                {
+                    /*
+                     * We're on an IST stack.  First make sure the two return
+                     * addresses actually match.  Then increment the interrupted
+                     * context's SSP.
+                     */
+                    BUG_ON(stub_ra != *(unsigned long*)ptr[-1]);
+                    wrss(ptr[-1] + 8, &ptr[-1]);
+                    goto shstk_done;
+                }
+
+                /* Make sure the two return addresses actually match. */
+                BUG_ON(stub_ra != ptr[2]);
+
+                /* Move exception frame, updating SSP there. */
+                wrss(ptr[1], &ptr[2]); /* %cs */
+                wrss(ptr[0], &ptr[1]); /* %rip */
+                wrss(ptr[-1] + 8, &ptr[0]); /* SSP */
+
+                /* Move all newer entries. */
+                while ( --ptr != _p(ssp) )
+                    wrss(ptr[-1], &ptr[0]);
+
+                /* Finally account for our own stack having shifted up. */
+                asm volatile ( "incsspd %0" :: "r" (2) );
+
                 goto shstk_done;
             }
         }
@@ -912,7 +962,8 @@ static void fixup_exception_return(struc
 
 static bool extable_fixup(struct cpu_user_regs *regs, bool print)
 {
-    unsigned long fixup = search_exception_table(regs);
+    unsigned long stub_ra = 0;
+    unsigned long fixup = search_exception_table(regs, &stub_ra);
 
     if ( unlikely(fixup == 0) )
         return false;
@@ -926,7 +977,7 @@ static bool extable_fixup(struct cpu_use
                vector_name(regs->entry_vector), regs->error_code,
                _p(regs->rip), _p(regs->rip), _p(fixup));
 
-    fixup_exception_return(regs, fixup);
+    fixup_exception_return(regs, fixup, stub_ra);
     this_cpu(last_extable_addr) = regs->rip;
 
     return true;
@@ -1214,7 +1265,7 @@ void do_invalid_op(struct cpu_user_regs
         void (*fn)(struct cpu_user_regs *) = bug_ptr(bug);
 
         fn(regs);
-        fixup_exception_return(regs, (unsigned long)eip);
+        fixup_exception_return(regs, (unsigned long)eip, 0);
         return;
     }
 
@@ -1235,7 +1286,7 @@ void do_invalid_op(struct cpu_user_regs
     case BUGFRAME_warn:
         printk("Xen WARN at %s%s:%d\n", prefix, filename, lineno);
         show_execution_state(regs);
-        fixup_exception_return(regs, (unsigned long)eip);
+        fixup_exception_return(regs, (unsigned long)eip, 0);
         return;
 
     case BUGFRAME_bug:
From: Jan Beulich <jbeulich@suse.com>
Subject: x86: account for shadow stack in exception-from-stub recovery

Dealing with exceptions raised from within emulation stubs involves
discarding return address (replaced by exception related information).
Such discarding of course also requires removing the corresponding entry
from the shadow stack.

Also amend the comment in fixup_exception_return(), to further clarify
why use of ptr[1] can't be an out-of-bounds access.

While touching do_invalid_op() also add a missing fall-through
annotation.

This is CVE-2023-46841 / XSA-451.

Fixes: 209fb9919b50 ("x86/extable: Adjust extable handling to be shadow stack compatible")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -86,26 +86,29 @@ search_one_extable(const struct exceptio
 }
 
 unsigned long
-search_exception_table(const struct cpu_user_regs *regs)
+search_exception_table(const struct cpu_user_regs *regs, unsigned long *stub_ra)
 {
     const struct virtual_region *region = find_text_region(regs->rip);
     unsigned long stub = this_cpu(stubs.addr);
 
     if ( region && region->ex )
+    {
+        *stub_ra = 0;
         return search_one_extable(region->ex, region->ex_end, regs->rip);
+    }
 
     if ( regs->rip >= stub + STUB_BUF_SIZE / 2 &&
          regs->rip < stub + STUB_BUF_SIZE &&
          regs->rsp > (unsigned long)regs &&
          regs->rsp < (unsigned long)get_cpu_info() )
     {
-        unsigned long retptr = *(unsigned long *)regs->rsp;
+        unsigned long retaddr = *(unsigned long *)regs->rsp, fixup;
 
-        region = find_text_region(retptr);
-        retptr = region && region->ex
-                 ? search_one_extable(region->ex, region->ex_end, retptr)
-                 : 0;
-        if ( retptr )
+        region = find_text_region(retaddr);
+        fixup = region && region->ex
+                ? search_one_extable(region->ex, region->ex_end, retaddr)
+                : 0;
+        if ( fixup )
         {
             /*
              * Put trap number and error code on the stack (in place of the
@@ -117,7 +120,8 @@ search_exception_table(const struct cpu_
             };
 
             *(unsigned long *)regs->rsp = token.raw;
-            return retptr;
+            *stub_ra = retaddr;
+            return fixup;
         }
     }
 
--- a/xen/arch/x86/include/asm/uaccess.h
+++ b/xen/arch/x86/include/asm/uaccess.h
@@ -421,7 +421,8 @@ union stub_exception_token {
     unsigned long raw;
 };
 
-extern unsigned long search_exception_table(const struct cpu_user_regs *regs);
+extern unsigned long search_exception_table(const struct cpu_user_regs *regs,
+                                            unsigned long *stub_ra);
 extern void sort_exception_tables(void);
 extern void sort_exception_table(struct exception_table_entry *start,
                                  const struct exception_table_entry *stop);
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -845,7 +845,7 @@ void do_unhandled_trap(struct cpu_user_r
 }
 
 static void fixup_exception_return(struct cpu_user_regs *regs,
-                                   unsigned long fixup)
+                                   unsigned long fixup, unsigned long stub_ra)
 {
     if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
     {
@@ -862,7 +862,8 @@ static void fixup_exception_return(struc
             /*
              * Search for %rip.  The shstk currently looks like this:
              *
-             *   ...  [Likely pointed to by SSP]
+             *   tok  [Supervisor token, == &tok | BUSY, only with FRED inactive]
+             *   ...  [Pointed to by SSP for most exceptions, empty in IST cases]
              *   %cs  [== regs->cs]
              *   %rip [== regs->rip]
              *   SSP  [Likely points to 3 slots higher, above %cs]
@@ -880,7 +881,56 @@ static void fixup_exception_return(struc
              */
             if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
             {
+                unsigned long primary_shstk =
+                    (ssp & ~(STACK_SIZE - 1)) +
+                    (PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8;
+
                 wrss(fixup, ptr);
+
+                if ( !stub_ra )
+                    goto shstk_done;
+
+                /*
+                 * Stub recovery ought to happen only when the outer context
+                 * was on the main shadow stack.  We need to also "pop" the
+                 * stub's return address from the interrupted context's shadow
+                 * stack.  That is,
+                 * - if we're still on the main stack, we need to move the
+                 *   entire stack (up to and including the exception frame)
+                 *   up by one slot, incrementing the original SSP in the
+                 *   exception frame,
+                 * - if we're on an IST stack, we need to increment the
+                 *   original SSP.
+                 */
+                BUG_ON((ptr[-1] ^ primary_shstk) >> PAGE_SHIFT);
+
+                if ( (ssp ^ primary_shstk) >> PAGE_SHIFT )
+                {
+                    /*
+                     * We're on an IST stack.  First make sure the two return
+                     * addresses actually match.  Then increment the interrupted
+                     * context's SSP.
+                     */
+                    BUG_ON(stub_ra != *(unsigned long*)ptr[-1]);
+                    wrss(ptr[-1] + 8, &ptr[-1]);
+                    goto shstk_done;
+                }
+
+                /* Make sure the two return addresses actually match. */
+                BUG_ON(stub_ra != ptr[2]);
+
+                /* Move exception frame, updating SSP there. */
+                wrss(ptr[1], &ptr[2]); /* %cs */
+                wrss(ptr[0], &ptr[1]); /* %rip */
+                wrss(ptr[-1] + 8, &ptr[0]); /* SSP */
+
+                /* Move all newer entries. */
+                while ( --ptr != _p(ssp) )
+                    wrss(ptr[-1], &ptr[0]);
+
+                /* Finally account for our own stack having shifted up. */
+                asm volatile ( "incsspd %0" :: "r" (2) );
+
                 goto shstk_done;
             }
         }
@@ -901,7 +951,8 @@ static void fixup_exception_return(struc
 
 static bool extable_fixup(struct cpu_user_regs *regs, bool print)
 {
-    unsigned long fixup = search_exception_table(regs);
+    unsigned long stub_ra = 0;
+    unsigned long fixup = search_exception_table(regs, &stub_ra);
 
     if ( unlikely(fixup == 0) )
         return false;
@@ -915,7 +966,7 @@ static bool extable_fixup(struct cpu_use
                vector_name(regs->entry_vector), regs->error_code,
                _p(regs->rip), _p(regs->rip), _p(fixup));
 
-    fixup_exception_return(regs, fixup);
+    fixup_exception_return(regs, fixup, stub_ra);
     this_cpu(last_extable_addr) = regs->rip;
 
     return true;
@@ -1183,7 +1234,8 @@ void do_invalid_op(struct cpu_user_regs
     {
     case BUGFRAME_run_fn:
     case BUGFRAME_warn:
-        fixup_exception_return(regs, (unsigned long)eip);
+        fixup_exception_return(regs, (unsigned long)eip, 0);
+        fallthrough;
     case BUGFRAME_bug:
     case BUGFRAME_assert:
         return;