Xen Security Advisory 399 v2 (CVE-2022-26357) - race in VT-d domain ID cleanup

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2022-26357 / XSA-399
                               version 2

                    race in VT-d domain ID cleanup

UPDATES IN VERSION 2
====================

Public release.

ISSUE DESCRIPTION
=================

Xen domain IDs are up to 15 bits wide.  VT-d hardware may provide fewer
than 15 bits to hold the domain ID which associates a physical device
with a particular domain.  Xen therefore maps its domain IDs internally
to this smaller value range.  The cleanup of the housekeeping structures
involved has a race, allowing VT-d domain IDs to be leaked and flushes
to be bypassed.
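
A minimal, hypothetical sketch of the race (modelled on the
domid_bitmap[]/domid_map[] bookkeeping visible in the patches below,
with NR_DIDS as an assumed constant; not the actual Xen code):

    /* Per-IOMMU tracking: which hardware DIDs are in use, and which
     * Xen domain each one currently stands for. */
    #define NR_DIDS 64 /* hypothetical: e.g. hardware with 6-bit DIDs */
    static unsigned long domid_bitmap[BITS_TO_LONGS(NR_DIDS)];
    static domid_t domid_map[NR_DIDS];

    static void did_free_buggy(unsigned int did)
    {
        clear_bit(did, domid_bitmap); /* DID becomes visible as free... */
        /*
         * ...so a concurrent context_set_domain_id() on another CPU may
         * re-allocate it right here:
         *     set_bit(did, domid_bitmap);
         *     domid_map[did] = new_domid;
         * The write below then wipes the new entry: the DID is leaked,
         * and flushes keyed on new_domid are bypassed.
         */
        domid_map[did] = 0;
    }

    static void did_free_fixed(unsigned int did)
    {
        /* Invalidate the map slot /before/ freeing the DID, so readers
         * racing with the clear_bit() see DOMID_INVALID rather than a
         * stale or wiped mapping. */
        domid_map[did] = DOMID_INVALID;
        clear_bit(did, domid_bitmap);
    }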

IMPACT
======

The precise impact is system specific, but would typically be a Denial
of Service (DoS) affecting the entire host.  Privilege escalation and
information leaks cannot be ruled out.

VULNERABLE SYSTEMS
==================

Xen versions 4.11 through 4.16 are vulnerable.  Xen versions 4.10 and
earlier are not vulnerable.

Only x86 systems with VT-d IOMMU hardware are vulnerable.  Arm systems
as well as x86 systems without VT-d hardware or without any IOMMUs in
use are not vulnerable.

Only x86 guests which have physical devices passed through to them can
leverage the vulnerability.

MITIGATION
==========

Not passing through physical devices to untrusted guests will avoid
the vulnerability.
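
With the xl toolstack, for example, this means leaving PCI passthrough
entries out of guest configurations (illustrative snippet; the device
address is hypothetical):

    # A line like this in a guest's xl configuration passes a physical
    # device through to the guest -- remove it to apply the mitigation:
    # pci = [ '0000:03:00.0' ]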

CREDITS
=======

This issue was discovered by Jan Beulich of SUSE.

RESOLUTION
==========

Applying the appropriate attached patch resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa399.patch           xen-unstable
xsa399-4.16.patch      Xen 4.16.x - Xen 4.13.x
xsa399-4.12.patch      Xen 4.12.x
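
For example, to apply the 4.16 patch on top of the current stable
branch (illustrative commands; local paths are assumed):

    $ git clone -b stable-4.16 https://xenbits.xen.org/git-http/xen.git
    $ cd xen
    $ patch -p1 < ../xsa399-4.16.patch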

$ sha256sum xsa399*
53b9745564eb21f70dbb7bd7194ff3518f29cd9715c68e9dd7eff25812968019  xsa399.patch
16c3327a60d8ab6c3524f10f57d63efaf2e3e54b807bc285a749cd1a94392a30  xsa399-4.12.patch
79d0f5a0442dec0a806d77a722a1d2c04793572fe0b564bf86dcd1c6d992a679  xsa399-4.16.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches described above (or others which are
substantially similar) is permitted during the embargo, even on
public-facing systems with untrusted guest users and administrators.

HOWEVER, deployment of the mitigation is NOT permitted (except where
all the affected systems and VMs are administered and used only by
organisations which are members of the Xen Project Security Issues
Predisclosure List).  Specifically, deployment on public cloud systems
is NOT permitted.

This is because removal of pass-through devices or their replacement by
emulated devices is a guest visible configuration change, which may lead
to re-discovery of the issue.

Deployment of this mitigation is permitted only AFTER the embargo ends.

AND: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmJMJDcMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZpo8H/AqiAS0l5WJWl00bTQ4Q69REzd83m9Y3+UnUqRaf
JUFWo4R1m4V2zJlq0E3TR/2ZS1RkXFJxlmXQyzueFmDEvMV2oKB0ids5ta1oUO2E
eiQxdSFbTLrLnhI+4IxbTHHy+ovSHT/SKPeo1Zd1tXHfZ35g1OgGTYHHqj7RKJHp
SyZT4iuAKjIr61M4NBKJcycpfRidlXEDvAotDX3jBQ06t3vgs/12nwe5LzzeV2V4
sIDjpeDGNKzgT2NgLP2b+XMEUg1259iWb19tS3PPNJaLKSvQqTBOFjK+sqh7ACXV
v6ph2Yy0Q/ZP+N9DvCeBCPEU9A9RhmPYzobU+Lc/T85SrQ4=
=sp/Q
-----END PGP SIGNATURE-----
From: Jan Beulich <jbeulich@suse.com>
Subject: VT-d: correct ordering of operations in cleanup_domid_map()

The function may be called without any locks held (leaving aside the
domctl one, which we surely don't want to depend on here), so needs to
play safe wrt other accesses to domid_map[] and domid_bitmap[]. This is
to avoid context_set_domain_id()'s writing of domid_map[] being reset
to zero right away when it races the freeing of a DID.

For the interaction with context_set_domain_id() and did_to_domain_id()
see the code comment.

{check_,}cleanup_domid_map() are called with pcidevs_lock held or during
domain cleanup only (and pcidevs_lock is also held around
context_set_domain_id()), i.e. racing calls with the same (dom, iommu)
tuple cannot occur.

domain_iommu_domid(), besides its use by cleanup_domid_map(), has its
result used only to control flushing, and hence a stale result would
only lead to a stray extra flush.

This is CVE-2022-26357 / XSA-399.

Fixes: b9c20c78789f ("VT-d: per-iommu domain-id")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -175,8 +175,14 @@ static void cleanup_domid_map(struct dom
 
     if ( iommu_domid >= 0 )
     {
+        /*
+         * Update domid_map[] /before/ domid_bitmap[] to avoid a race with
+         * context_set_domain_id(), setting the slot to DOMID_INVALID for
+         * did_to_domain_id() to return a suitable value while the bit is
+         * still set.
+         */
+        iommu->domid_map[iommu_domid] = DOMID_INVALID;
         clear_bit(iommu_domid, iommu->domid_bitmap);
-        iommu->domid_map[iommu_domid] = 0;
     }
 }
 
From: Jan Beulich <jbeulich@suse.com>
Subject: VT-d: correct ordering of operations in cleanup_domid_map()

The function may be called without any locks held (leaving aside the
domctl one, which we surely don't want to depend on here), so needs to
play safe wrt other accesses to domid_map[] and domid_bitmap[]. This is
to avoid context_set_domain_id()'s writing of domid_map[] being reset
to zero right away when it races the freeing of a DID.

For the interaction with context_set_domain_id() and ->domid_map[] reads
see the code comment.

{check_,}cleanup_domid_map() are called with pcidevs_lock held or during
domain cleanup only (and pcidevs_lock is also held around
context_set_domain_id()), i.e. racing calls with the same (dom, iommu)
tuple cannot occur.

domain_iommu_domid(), besides its use by cleanup_domid_map(), has its
result used only to control flushing, and hence a stale result would
only lead to a stray extra flush.

This is CVE-2022-26357 / XSA-399.

Fixes: b9c20c78789f ("VT-d: per-iommu domain-id")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1770,8 +1770,14 @@ static int domain_context_unmap(struct d
             goto out;
         }
 
+        /*
+         * Update domid_map[] /before/ domid_bitmap[] to avoid a race with
+         * context_set_domain_id(), setting the slot to DOMID_INVALID for
+         * ->domid_map[] reads to produce a suitable value while the bit is
+         * still set.
+         */
+        iommu->domid_map[iommu_domid] = DOMID_INVALID;
         clear_bit(iommu_domid, iommu->domid_bitmap);
-        iommu->domid_map[iommu_domid] = 0;
     }
 
 out:
From: Jan Beulich <jbeulich@suse.com>
Subject: VT-d: correct ordering of operations in cleanup_domid_map()

The function may be called without any locks held (leaving aside the
domctl one, which we surely don't want to depend on here), so needs to
play safe wrt other accesses to domid_map[] and domid_bitmap[]. This is
to avoid context_set_domain_id()'s writing of domid_map[] being reset
to zero right away when it races the freeing of a DID.

For the interaction with context_set_domain_id() and ->domid_map[] reads
see the code comment.

{check_,}cleanup_domid_map() are called with pcidevs_lock held or during
domain cleanup only (and pcidevs_lock is also held around
context_set_domain_id()), i.e. racing calls with the same (dom, iommu)
tuple cannot occur.

domain_iommu_domid(), besides its use by cleanup_domid_map(), has its
result used only to control flushing, and hence a stale result would
only lead to a stray extra flush.

This is CVE-2022-26357 / XSA-399.

Fixes: b9c20c78789f ("VT-d: per-iommu domain-id")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -152,8 +152,14 @@ static void cleanup_domid_map(struct dom
 
     if ( iommu_domid >= 0 )
     {
+        /*
+         * Update domid_map[] /before/ domid_bitmap[] to avoid a race with
+         * context_set_domain_id(), setting the slot to DOMID_INVALID for
+         * ->domid_map[] reads to produce a suitable value while the bit is
+         * still set.
+         */
+        iommu->domid_map[iommu_domid] = DOMID_INVALID;
         clear_bit(iommu_domid, iommu->domid_bitmap);
-        iommu->domid_map[iommu_domid] = 0;
     }
 }