Xen Security Advisory 427 v2 (CVE-2022-42332) - x86 shadow plus log-dirty mode use-after-free

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2022-42332 / XSA-427
                               version 2

             x86 shadow plus log-dirty mode use-after-free

UPDATES IN VERSION 2
====================

Public release.

ISSUE DESCRIPTION
=================

In environments where host assisted address translation is necessary
but Hardware Assisted Paging (HAP) is unavailable, Xen will run guests
in so called shadow mode.  Shadow mode maintains a pool of memory used
for both shadow page tables as well as auxiliary data structures.  To
migrate or snapshot guests, Xen additionally runs them in so called
log-dirty mode.  The data structures needed by the log-dirty tracking
are part of the aforementioned auxiliary data.

In order to keep error handling efforts within reasonable bounds, for
operations which may require memory allocations the shadow mode logic
ensures up front that enough memory is available for the worst-case
requirements.  Unfortunately, while page table memory is properly
accounted for on the code path that may need to establish new shadows,
the demands of the log-dirty infrastructure were not taken into
consideration.  As a result, just-established shadow page tables could
be freed again immediately, while other code is still accessing them on
the assumption that they would remain allocated.
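
To illustrate (a toy model only, not Xen code; all names and numbers
are made up): the first pre-allocation below covers only the shadows
themselves, so the log-dirty allocation which follows finds the pool
empty and has to reclaim -- i.e. free -- pages that were only just
turned into shadows.

/* Toy model of the accounting gap described above.  NOT Xen code. */
#include <stdio.h>

static unsigned int pool_free = 4;   /* pages currently free in the pool */

static void prealloc(unsigned int pages, const char *who)
{
    if ( pool_free < pages )
    {
        printf("%s: reclaiming %u page(s) - a just-made shadow may be freed\n",
               who, pages - pool_free);
        pool_free = pages;           /* models reclaim of in-use pages */
    }
}

int main(void)
{
    prealloc(4, "sh_page_fault");    /* accounts for shadows only */
    pool_free -= 4;                  /* the shadows are established */

    /*
     * Log-dirty marking then allocates too; with no headroom left, reclaim
     * kicks in and can free the shadows established just above.  The fix
     * is to fold the log-dirty worst case into the first pre-allocation.
     */
    prealloc(4, "log-dirty");

    return 0;
}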

IMPACT
======

Guests running in shadow mode and being subject to migration or
snapshotting may be able to cause Denial of Service and other problems,
including escalation of privilege.

VULNERABLE SYSTEMS
==================

All Xen versions from at least 3.2 onwards are vulnerable.  Earlier
versions have not been inspected.

Only x86 systems are vulnerable.  The vulnerability is limited to the
migration and snapshotting of guests, and applies only to PV guests as
well as to HVM or PVH guests run with shadow paging.

MITIGATION
==========

Not migrating or snapshotting guests will avoid the vulnerability.

Running only HVM or PVH guests and only in HAP (Hardware Assisted
Paging) mode will also avoid the vulnerability.
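
As a rough sketch with the xl toolstack (hap is a standard xl.cfg
boolean, but defaults and fallback behaviour depend on hardware support
and the Xen version in use), such a guest would be configured along the
lines of:

type = "hvm"   # or "pvh"
hap = 1        # use Hardware Assisted Paging rather than shadow paging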

CREDITS
=======

This issue was discovered by Jan Beulich of SUSE.

RESOLUTION
==========

Applying the appropriate attached patch resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa427.patch           xen-unstable - Xen 4.17.x
xsa427-4.16.patch      Xen 4.16.x
xsa427-4.15.patch      Xen 4.15.x
xsa427-4.14.patch      Xen 4.14.x

$ sha256sum xsa427*
5ebcdc495ba6f439e47be7e17dbb8fbdecf4de66d2fac560d460f6841bd3bef3  xsa427.meta
aa39316cbb81478c62b3d5c5aea7edfb6ade3bc5e6d0aa8c4677f9ea7bcc97a2  xsa427.patch
5ba679bc2170b0d9cd4c6ce139057e3287a6ee00434fa0e9a7a02441420030ff  xsa427-4.14.patch
410ee6be28412841ab5aba1131f7dd7b84b9983f6c93974605f196fd278562e1  xsa427-4.15.patch
76c1850eb9a274c1feb5a8645f61ecf394a0551278f4e40e123ec07ea307f101  xsa427-4.16.patch
$
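
As a sketch of the usual workflow (URL, branch name and patch choice
shown here for Xen 4.16; adjust as appropriate):

$ git clone -b stable-4.16 https://xenbits.xen.org/git-http/xen.git
$ cd xen
$ git am ../xsa427-4.16.patch    # or: patch -p1 < ../xsa427-4.16.patch
$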

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmQZlVkMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZgRMH/RU6mB8M/feJeZDkYbrLPmT3yLiw6BpWroMTUTpv
5kIlixxlfQqyv8gqd25p5WMMKUsZlPZdLCT0iOlyMTNz6EUPRBME2Yb3ByiM7O7/
kFtlFDk5ZY5c/Vk1w0XuLm+YcABj0xnsn003YvgknmZfBJ2HWdR2iIayT/NjfQ+u
twErqUqa7il2Em5M8ZwHZeJjCUN9t+g2sv5sdI/rQeRge8ofjsquLubpgUVMGjiV
xwwUPCn3co0/2WArB4mHjWCNcoATk1NVZ3CTUyKGl5Mr+EvdmYWvzmlDa4wc8QPV
tNoASqXw0MbOOTy+RnZQHwappCDP371MirPq4IaTwiXy7eo=
=0flx
-----END PGP SIGNATURE-----
{
  "XSA": 427,
  "SupportedVersions": [
    "master",
    "4.17",
    "4.16",
    "4.15",
    "4.14"
  ],
  "Trees": [
    "xen"
  ],
  "Recipes": {
    "4.14": {
      "Recipes": {
        "xen": {
          "StableRef": "c267abfaf2d8176371eda037f9b9152458e0656d",
          "Prereqs": [],
          "Patches": [
            "xsa427-4.14.patch"
          ]
        }
      }
    },
    "4.15": {
      "Recipes": {
        "xen": {
          "StableRef": "fa875574b73618daf3bc70e6ff4d342493fa11d9",
          "Prereqs": [],
          "Patches": [
            "xsa427-4.15.patch"
          ]
        }
      }
    },
    "4.16": {
      "Recipes": {
        "xen": {
          "StableRef": "84dfe7a56f04a7412fa4869b3e756c49e1cfbe75",
          "Prereqs": [],
          "Patches": [
            "xsa427-4.16.patch"
          ]
        }
      }
    },
    "4.17": {
      "Recipes": {
        "xen": {
          "StableRef": "ec5b058d2a6436a2e180315522fcf1645a8153b4",
          "Prereqs": [],
          "Patches": [
            "xsa427.patch"
          ]
        }
      }
    },
    "master": {
      "Recipes": {
        "xen": {
          "StableRef": "31270f11a96ebb875cd70661e2df9e5c6edd7564",
          "Prereqs": [],
          "Patches": [
            "xsa427.patch"
          ]
        }
      }
    }
  }
}
From: Jan Beulich <jbeulich@suse.com>
Subject: x86/shadow: account for log-dirty mode when pre-allocating

Pre-allocation is intended to ensure that in the course of constructing
or updating shadows there won't be any risk that just-made shadows, or
shadows being acted upon, can disappear under our feet.  The number of
pages pre-allocated then, however, needs to account for all possible
subsequent allocations. While the use in sh_page_fault() accounts for
all shadows which may need making, so far it didn't account for
allocations coming from log-dirty tracking (which piggybacks onto the
P2M allocation functions).

Since shadow_prealloc() takes a count of shadows (or other data
structures) rather than a count of pages, putting the adjustment at the
call site of this function won't work very well: We simply can't express
the correct count that way in all cases. Instead take care of this in
the function itself, by "snooping" for L1 type requests. (While not
applicable right now, future new request sites of L1 tables would then
also be covered right away.)
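
(As a cross-check of the adjustment introduced below: with the usual
x86-64 constants, PADDR_BITS = 52, PAGE_SHIFT = 12 and
ilog2(sizeof(mfn_t)) = 3, the new macro evaluates to

  paging_logdirty_levels()
    = DIV_ROUND_UP(52 - 12 - (12 + 3), 12 - 3) + 1
    = DIV_ROUND_UP(25, 9) + 1
    = 4

which is exactly what the BUILD_BUG_ON() added to
paging_mark_pfn_dirty() asserts.)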

It is relevant to note here that pre-allocations like the one done from
shadow_alloc_p2m_page() are benign when they fall in the "scope" of an
earlier pre-alloc which already included that count: The inner call will
simply find enough pages available then; it'll bail right away.

This is CVE-2022-42332 / XSA-427.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
---
v2: Entirely different approach.

--- a/xen/arch/x86/include/asm/paging.h
+++ b/xen/arch/x86/include/asm/paging.h
@@ -189,6 +189,10 @@ bool paging_mfn_is_dirty(const struct do
 #define L4_LOGDIRTY_IDX(pfn) ((pfn_x(pfn) >> (PAGE_SHIFT + 3 + PAGETABLE_ORDER * 2)) & \
                               (LOGDIRTY_NODE_ENTRIES-1))
 
+#define paging_logdirty_levels() \
+    (DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3), \
+                  PAGE_SHIFT - ilog2(sizeof(mfn_t))) + 1)
+
 #ifdef CONFIG_HVM
 /* VRAM dirty tracking support */
 struct sh_dirty_vram {
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -282,6 +282,7 @@ void paging_mark_pfn_dirty(struct domain
     if ( unlikely(!VALID_M2P(pfn_x(pfn))) )
         return;
 
+    BUILD_BUG_ON(paging_logdirty_levels() != 4);
     i1 = L1_LOGDIRTY_IDX(pfn);
     i2 = L2_LOGDIRTY_IDX(pfn);
     i3 = L3_LOGDIRTY_IDX(pfn);
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1011,7 +1011,17 @@ bool shadow_prealloc(struct domain *d, u
     if ( unlikely(d->is_dying) )
        return false;
 
-    ret = _shadow_prealloc(d, shadow_size(type) * count);
+    count *= shadow_size(type);
+    /*
+     * Log-dirty handling may result in allocations when populating its
+     * tracking structures.  Tie this to the caller requesting space for L1
+     * shadows.
+     */
+    if ( paging_mode_log_dirty(d) &&
+         ((SHF_L1_ANY | SHF_FL1_ANY) & (1u << type)) )
+        count += paging_logdirty_levels();
+
+    ret = _shadow_prealloc(d, count);
     if ( !ret && (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
         /*
          * Failing to allocate memory required for shadow usage can only result in
From: Jan Beulich <jbeulich@suse.com>
Subject: x86/shadow: account for log-dirty mode when pre-allocating

Pre-allocation is intended to ensure that in the course of constructing
or updating shadows there won't be any risk that just-made shadows, or
shadows being acted upon, can disappear under our feet.  The number of
pages pre-allocated then, however, needs to account for all possible
subsequent allocations. While the use in sh_page_fault() accounts for
all shadows which may need making, so far it didn't account for
allocations coming from log-dirty tracking (which piggybacks onto the
P2M allocation functions).

Since shadow_prealloc() takes a count of shadows (or other data
structures) rather than a count of pages, putting the adjustment at the
call site of this function won't work very well: We simply can't express
the correct count that way in all cases. Instead take care of this in
the function itself, by "snooping" for L1 type requests. (While not
applicable right now, future new request sites of L1 tables would then
also be covered right away.)

It is relevant to note here that pre-allocations like the one done from
shadow_alloc_p2m_page() are benign when they fall in the "scope" of an
earlier pre-alloc which already included that count: The inner call will
simply find enough pages available then; it'll bail right away.

This is CVE-2022-42332 / XSA-427.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>

--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -190,6 +190,10 @@ int paging_mfn_is_dirty(struct domain *d
 #define L4_LOGDIRTY_IDX(pfn) ((pfn_x(pfn) >> (PAGE_SHIFT + 3 + PAGETABLE_ORDER * 2)) & \
                               (LOGDIRTY_NODE_ENTRIES-1))
 
+#define paging_logdirty_levels() \
+    (DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3), \
+                  PAGE_SHIFT - ilog2(sizeof(mfn_t))) + 1)
+
 /* VRAM dirty tracking support */
 struct sh_dirty_vram {
     unsigned long begin_pfn;
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -278,6 +278,7 @@ void paging_mark_pfn_dirty(struct domain
     if ( unlikely(!VALID_M2P(pfn_x(pfn))) )
         return;
 
+    BUILD_BUG_ON(paging_logdirty_levels() != 4);
     i1 = L1_LOGDIRTY_IDX(pfn);
     i2 = L2_LOGDIRTY_IDX(pfn);
     i3 = L3_LOGDIRTY_IDX(pfn);
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1012,7 +1012,17 @@ bool shadow_prealloc(struct domain *d, u
     if ( unlikely(d->is_dying) )
        return false;
 
-    ret = _shadow_prealloc(d, shadow_size(type) * count);
+    count *= shadow_size(type);
+    /*
+     * Log-dirty handling may result in allocations when populating its
+     * tracking structures.  Tie this to the caller requesting space for L1
+     * shadows.
+     */
+    if ( paging_mode_log_dirty(d) &&
+         ((SHF_L1_ANY | SHF_FL1_ANY) & (1u << type)) )
+        count += paging_logdirty_levels();
+
+    ret = _shadow_prealloc(d, count);
     if ( !ret && (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
         /*
          * Failing to allocate memory required for shadow usage can only result in
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -269,6 +269,7 @@ static inline void sh_terminate_list(str
 #define SHF_64  (SHF_L1_64|SHF_FL1_64|SHF_L2_64|SHF_L2H_64|SHF_L3_64|SHF_L4_64)
 
 #define SHF_L1_ANY  (SHF_L1_32|SHF_L1_PAE|SHF_L1_64)
+#define SHF_FL1_ANY (SHF_FL1_32|SHF_FL1_PAE|SHF_FL1_64)
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
 /* Marks a guest L1 page table which is shadowed but not write-protected.
From: Jan Beulich <jbeulich@suse.com>
Subject: x86/shadow: account for log-dirty mode when pre-allocating

Pre-allocation is intended to ensure that in the course of constructing
or updating shadows there won't be any risk that just-made shadows, or
shadows being acted upon, can disappear under our feet.  The number of
pages pre-allocated then, however, needs to account for all possible
subsequent allocations. While the use in sh_page_fault() accounts for
all shadows which may need making, so far it didn't account for
allocations coming from log-dirty tracking (which piggybacks onto the
P2M allocation functions).

Since shadow_prealloc() takes a count of shadows (or other data
structures) rather than a count of pages, putting the adjustment at the
call site of this function won't work very well: We simply can't express
the correct count that way in all cases. Instead take care of this in
the function itself, by "snooping" for L1 type requests. (While not
applicable right now, future new request sites of L1 tables would then
also be covered right away.)

It is relevant to note here that pre-allocations like the one done from
shadow_alloc_p2m_page() are benign when they fall in the "scope" of an
earlier pre-alloc which already included that count: The inner call will
simply find enough pages available then; it'll bail right away.

This is CVE-2022-42332 / XSA-427.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>

--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -190,6 +190,10 @@ int paging_mfn_is_dirty(struct domain *d
 #define L4_LOGDIRTY_IDX(pfn) ((pfn_x(pfn) >> (PAGE_SHIFT + 3 + PAGETABLE_ORDER * 2)) & \
                               (LOGDIRTY_NODE_ENTRIES-1))
 
+#define paging_logdirty_levels() \
+    (DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3), \
+                  PAGE_SHIFT - ilog2(sizeof(mfn_t))) + 1)
+
 #ifdef CONFIG_HVM
 /* VRAM dirty tracking support */
 struct sh_dirty_vram {
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -280,6 +280,7 @@ void paging_mark_pfn_dirty(struct domain
     if ( unlikely(!VALID_M2P(pfn_x(pfn))) )
         return;
 
+    BUILD_BUG_ON(paging_logdirty_levels() != 4);
     i1 = L1_LOGDIRTY_IDX(pfn);
     i2 = L2_LOGDIRTY_IDX(pfn);
     i3 = L3_LOGDIRTY_IDX(pfn);
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1014,7 +1014,17 @@ bool shadow_prealloc(struct domain *d, u
     if ( unlikely(d->is_dying) )
        return false;
 
-    ret = _shadow_prealloc(d, shadow_size(type) * count);
+    count *= shadow_size(type);
+    /*
+     * Log-dirty handling may result in allocations when populating its
+     * tracking structures.  Tie this to the caller requesting space for L1
+     * shadows.
+     */
+    if ( paging_mode_log_dirty(d) &&
+         ((SHF_L1_ANY | SHF_FL1_ANY) & (1u << type)) )
+        count += paging_logdirty_levels();
+
+    ret = _shadow_prealloc(d, count);
     if ( !ret && (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
         /*
          * Failing to allocate memory required for shadow usage can only result in
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -269,6 +269,7 @@ static inline void sh_terminate_list(str
 #define SHF_64  (SHF_L1_64|SHF_FL1_64|SHF_L2_64|SHF_L2H_64|SHF_L3_64|SHF_L4_64)
 
 #define SHF_L1_ANY  (SHF_L1_32|SHF_L1_PAE|SHF_L1_64)
+#define SHF_FL1_ANY (SHF_FL1_32|SHF_FL1_PAE|SHF_FL1_64)
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
 /* Marks a guest L1 page table which is shadowed but not write-protected.
From: Jan Beulich <jbeulich@suse.com>
Subject: x86/shadow: account for log-dirty mode when pre-allocating

Pre-allocation is intended to ensure that in the course of constructing
or updating shadows there won't be any risk that just-made shadows, or
shadows being acted upon, can disappear under our feet.  The number of
pages pre-allocated then, however, needs to account for all possible
subsequent allocations. While the use in sh_page_fault() accounts for
all shadows which may need making, so far it didn't account for
allocations coming from log-dirty tracking (which piggybacks onto the
P2M allocation functions).

Since shadow_prealloc() takes a count of shadows (or other data
structures) rather than a count of pages, putting the adjustment at the
call site of this function won't work very well: We simply can't express
the correct count that way in all cases. Instead take care of this in
the function itself, by "snooping" for L1 type requests. (While not
applicable right now, future new request sites of L1 tables would then
also be covered right away.)

It is relevant to note here that pre-allocations like the one done from
shadow_alloc_p2m_page() are benign when they fall in the "scope" of an
earlier pre-alloc which already included that count: The inner call will
simply find enough pages available then; it'll bail right away.

This is CVE-2022-42332 / XSA-427.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>

--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -192,6 +192,10 @@ int paging_mfn_is_dirty(struct domain *d
 #define L4_LOGDIRTY_IDX(pfn) ((pfn_x(pfn) >> (PAGE_SHIFT + 3 + PAGETABLE_ORDER * 2)) & \
                               (LOGDIRTY_NODE_ENTRIES-1))
 
+#define paging_logdirty_levels() \
+    (DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3), \
+                  PAGE_SHIFT - ilog2(sizeof(mfn_t))) + 1)
+
 #ifdef CONFIG_HVM
 /* VRAM dirty tracking support */
 struct sh_dirty_vram {
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -280,6 +280,7 @@ void paging_mark_pfn_dirty(struct domain
     if ( unlikely(!VALID_M2P(pfn_x(pfn))) )
         return;
 
+    BUILD_BUG_ON(paging_logdirty_levels() != 4);
     i1 = L1_LOGDIRTY_IDX(pfn);
     i2 = L2_LOGDIRTY_IDX(pfn);
     i3 = L3_LOGDIRTY_IDX(pfn);
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1015,7 +1015,17 @@ bool shadow_prealloc(struct domain *d, u
     if ( unlikely(d->is_dying) )
        return false;
 
-    ret = _shadow_prealloc(d, shadow_size(type) * count);
+    count *= shadow_size(type);
+    /*
+     * Log-dirty handling may result in allocations when populating its
+     * tracking structures.  Tie this to the caller requesting space for L1
+     * shadows.
+     */
+    if ( paging_mode_log_dirty(d) &&
+         ((SHF_L1_ANY | SHF_FL1_ANY) & (1u << type)) )
+        count += paging_logdirty_levels();
+
+    ret = _shadow_prealloc(d, count);
     if ( !ret && (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
         /*
          * Failing to allocate memory required for shadow usage can only result in