From nobody Mon Feb  9 04:53:00 2026
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id F053CCDB465
	for <linux-kernel@archiver.kernel.org>; Mon, 16 Oct 2023 16:36:23 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S234415AbjJPQgW (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Mon, 16 Oct 2023 12:36:22 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41198 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S234345AbjJPQfn (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 16 Oct 2023 12:35:43 -0400
Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7989E6F89;
        Mon, 16 Oct 2023 09:22:51 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1697473371; x=1729009371;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=gDn2qlctsRhiSfk4B8miWiyBiv/7pxkEkr9nT08qBkA=;
  b=Z5Wh+bkbNlcyh/iBByb+uF8mG+BepxMoaapFb+Yl3S8CUU+1e3sZ1C0F
   FQn7aQNfmBEaJBzF1c29V2kDjl/co1ONM9w6Z+yTCodRASsfAAOXT4LnR
   akP0rQyahEs3u6/oJlOqQ0xvwuQeAa/ILYH0anpYVo2OToZAsiZN0piS9
   DdHtrxMUh8IXKKzxWz4+MD18peXdgFFCNpsAiIQRmTjB1/rJ4OHweInU1
   ZFkr15DF6QXqy3SVO8jF+hdI1UOAvlUW76WOZVBHJ0V7soVyfSyCGUeWo
   Iff/m+mYW2wfcAFYkGZfEPY1MBEnlEfmf57Qww5uDdcUeFI2sSGmbm1+T
   A==;
X-IronPort-AV: E=McAfee;i="6600,9927,10865"; a="364922164"
X-IronPort-AV: E=Sophos;i="6.03,229,1694761200";
   d="scan'208";a="364922164"
Received: from fmsmga006.fm.intel.com ([10.253.24.20])
  by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 16 Oct 2023 09:16:13 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6600,9927,10865"; a="1003006490"
X-IronPort-AV: E=Sophos;i="6.03,229,1694761200";
   d="scan'208";a="1003006490"
Received: from ls.sc.intel.com (HELO localhost) ([172.25.112.31])
  by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 16 Oct 2023 09:16:12 -0700
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com,
        Paolo Bonzini <pbonzini@redhat.com>, erdemaktas@google.com,
        Sean Christopherson <seanjc@google.com>,
        Sagi Shahar <sagis@google.com>,
        David Matlack <dmatlack@google.com>,
        Kai Huang <kai.huang@intel.com>,
        Zhi Wang <zhi.wang.linux@gmail.com>, chen.bo@intel.com,
        hang.yuan@intel.com, tina.zhang@intel.com
Subject: [PATCH v16 112/116] KVM: x86: design documentation on TDX support of
 x86 KVM TDP MMU
Date: Mon, 16 Oct 2023 09:15:04 -0700
Message-Id: 
 <44763e315fdacf093a45caeac21732f5ec437912.1697471314.git.isaku.yamahata@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <cover.1697471314.git.isaku.yamahata@intel.com>
References: <cover.1697471314.git.isaku.yamahata@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add a high level design document on TDX changes to TDP MMU.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
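Note for reviewers (not part of the commit): the freeze-then-update
protocol that tdx-tdp-mmu.rst describes under "Concurrent zapping" and
"TDX concurrent populating" can be sketched in user-space C11 roughly as
follows.  This is a minimal illustration under assumptions, not KVM code:
FROZEN_SPTE is a stand-in for REMOVED_SPTE, and spte_freeze(),
spte_thaw(), spte_update() and the hook signature are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for REMOVED_SPTE; the present bit (bit 0) is 0. */
#define FROZEN_SPTE UINT64_C(0x5a0)

typedef _Atomic uint64_t spte_t;

/*
 * Step 2: atomically swap old_val for the frozen marker.  Returns false
 * when another vcpu changed or froze the entry first; the caller is then
 * expected to restart the page fault.
 */
static bool spte_freeze(spte_t *sptep, uint64_t old_val)
{
	if (old_val == FROZEN_SPTE)
		return false;
	return atomic_compare_exchange_strong(sptep, &old_val, FROZEN_SPTE);
}

/* Step 4: publish the final value.  Only the freezer may call this. */
static void spte_thaw(spte_t *sptep, uint64_t new_val)
{
	assert(atomic_load(sptep) == FROZEN_SPTE);
	atomic_store(sptep, new_val);
}

/*
 * Steps 2-4 for one entry.  hook() models the TDX backend work (SEAMCALLs
 * and TLB shootdown) and runs while the entry is frozen.
 */
static bool spte_update(spte_t *sptep, uint64_t old_val, uint64_t new_val,
			void (*hook)(uint64_t old_val, uint64_t new_val))
{
	if (!spte_freeze(sptep, old_val))
		return false;
	if (hook)
		hook(old_val, new_val);
	spte_thaw(sptep, new_val);
	return true;
}
```

In this sketch, a caller that gets false from spte_update() would restart
the page fault, which matches step 2 of the concurrent zapping procedure.
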
 Documentation/virt/kvm/x86/index.rst       |   1 +
 Documentation/virt/kvm/x86/tdx-tdp-mmu.rst | 443 +++++++++++++++++++++
 2 files changed, 444 insertions(+)
 create mode 100644 Documentation/virt/kvm/x86/tdx-tdp-mmu.rst
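
Note for reviewers (not part of the commit): the TLB tracking sequence in
"Dropping private page and TLB shootdown" (one vcpu issues
TDH.MEM.TRACK(), remote vcpus are kicked, and TDH.VP.ENTER() compares
epochs) can be modelled with a toy epoch counter.  This is a simplified
sketch under assumptions; the epoch handling in the real TDX module is
more involved, and struct td_model, td_mem_track() and td_vp_enter() are
hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_VCPUS 4	/* hypothetical number of vcpus in the TD */

struct td_model {
	uint64_t epoch;			/* global TLB epoch counter */
	uint64_t seen[NR_VCPUS];	/* epoch seen at the last entry */
	unsigned int flushes[NR_VCPUS];	/* flush count per vcpu */
};

/* Models TDH.MEM.TRACK(): one vcpu bumps the global epoch. */
static void td_mem_track(struct td_model *td)
{
	td->epoch++;
}

/*
 * Models the check in TDH.VP.ENTER(): a vcpu whose recorded epoch is
 * stale flushes its TLB before entering.  Returns true if it flushed.
 */
static bool td_vp_enter(struct td_model *td, unsigned int vcpu)
{
	if (td->seen[vcpu] == td->epoch)
		return false;
	td->flushes[vcpu]++;
	td->seen[vcpu] = td->epoch;
	return true;
}
```

The IPI in the documented sequence only forces remote vcpus out of the
guest so that their next td_vp_enter() observes the new epoch.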

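Note for reviewers (not part of the commit): the check described in "The
conversion of private GPA and shared GPA" (compare the faulting GPA's
SHARED bit with the recorded memory attribute and exit to user space on a
mismatch) might look like the sketch below.  Assumptions: the SHARED bit
is modelled as GPA bit 51, although the real position depends on the
guest physical address width, and enum fault_action, gpa_is_private(),
gpa_strip_shared() and classify_fault() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed SHARED bit position; it depends on the TD configuration. */
#define GPA_SHARED_BIT (UINT64_C(1) << 51)

enum fault_action {
	FAULT_MAP_SHARED,	/* handle via the shared EPT */
	FAULT_MAP_PRIVATE,	/* handle via the mirrored private EPT */
	FAULT_EXIT_TO_USER,	/* models KVM_EXIT_MEMORY_FAULT */
};

/* A GPA is private when its SHARED bit is clear. */
static bool gpa_is_private(uint64_t gpa)
{
	return !(gpa & GPA_SHARED_BIT);
}

/* Drop the SHARED bit to get the address used for attribute lookup. */
static uint64_t gpa_strip_shared(uint64_t gpa)
{
	return gpa & ~GPA_SHARED_BIT;
}

/*
 * attr_private is the attribute user space last recorded for the page.
 * A mismatch with the faulting GPA is reported to user space.
 */
static enum fault_action classify_fault(uint64_t gpa, bool attr_private)
{
	bool fault_private = gpa_is_private(gpa);

	if (fault_private != attr_private)
		return FAULT_EXIT_TO_USER;
	return fault_private ? FAULT_MAP_PRIVATE : FAULT_MAP_SHARED;
}
```

FAULT_EXIT_TO_USER corresponds to step 8 of the sequence in that section,
where the user space VMM decides how to proceed.
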
diff --git a/Documentation/virt/kvm/x86/index.rst b/Documentation/virt/kvm/=
x86/index.rst
index 851e99174762..63a78bd41b16 100644
--- a/Documentation/virt/kvm/x86/index.rst
+++ b/Documentation/virt/kvm/x86/index.rst
@@ -16,4 +16,5 @@ KVM for x86 systems
    msr
    nested-vmx
    running-nested-guests
+   tdx-tdp-mmu
    timekeeping
diff --git a/Documentation/virt/kvm/x86/tdx-tdp-mmu.rst b/Documentation/vir=
t/kvm/x86/tdx-tdp-mmu.rst
new file mode 100644
index 000000000000..49d103720272
--- /dev/null
+++ b/Documentation/virt/kvm/x86/tdx-tdp-mmu.rst
@@ -0,0 +1,443 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Design of TDP MMU for TDX support
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D
+This document describes a high-level design for TDX support in the TDP MMU
+of x86 KVM.
+
+In this document, we use "TD" or "guest TD" to differentiate it from the c=
urrent
+"VM" (Virtual Machine), which is supported by KVM today.
+
+
+Background of TDX
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+TD private memory is designed to hold TD private content, encrypted by the=
 CPU
+using the TD ephemeral key.  An encryption engine holds a table of encrypt=
ion
+keys, and an encryption key is selected for each memory transaction based =
on a
+Host Key Identifier (HKID).  By design, the host VMM does not have access =
to the
+encryption keys.
+
+In the first generation of MKTME, HKID is "stolen" from the physical addre=
ss by
+allocating a configurable number of bits from the top of the physical addr=
ess.
+The HKID space is partitioned into shared HKIDs for legacy MKTME accesses =
and
+private HKIDs for SEAM-mode-only accesses.  We use 0 for the shared HKID o=
n the
+host so that MKTME can be opaque or bypassed on the host.
+
+During TDX non-root operation (i.e. guest TD), memory accesses can be qual=
ified
+as either shared or private, based on the value of a new SHARED bit in the=
 Guest
+Physical Address (GPA).  The CPU translates shared GPAs using the usual VM=
X EPT
+(Extended Page Table) or "Shared EPT" (in this document), which resides in=
 the
+host VMM memory.  The Shared EPT is directly managed by the host VMM - the=
 same
+as with the current VMX.  Since guest TDs usually require I/O, and the data
+exchange needs to be done via shared memory, KVM needs to use the current
+EPT functionality even for TDs.
+
+The CPU translates private GPAs using a separate Secure EPT.  The Secure E=
PT
+pages are encrypted and integrity-protected with the TD's ephemeral privat=
e key.
+Secure EPT can be managed *indirectly* by the host VMM, using the TDX inte=
rface
+functions (SEAMCALLs), and thus conceptually Secure EPT is a subset of EPT
+because not all functionalities are available.
+
+Since executing such interface functions takes much longer than accessing
+memory directly, KVM uses the existing TDP code to mirror the Secure EPT
+for the TD.  There are at least two options for when to execute such
+SEAMCALLs:
+
+1. synchronous, i.e. while walking the TDP page tables, or
+2. post-walk, i.e. record what needs to be done to the real Secure EPT dur=
ing
+   the walk, and execute SEAMCALLs later.
+
+Option 1 seems more intuitive and simpler, but the Secure EPT concurrency
+rules differ from those of the TDP or EPT.  For example, TDH.MEM.SEPT.RD()
+acquires shared access to the whole Secure EPT tree of the target TD.
+
+Secure EPT(SEPT) operations
+---------------------------
+Secure EPT is an Extended Page Table for GPA-to-HPA translation of TD priv=
ate
+GPAs.  A Secure EPT is designed to be encrypted with the TD's ephemeral pr=
ivate
+key. SEPT pages are allocated by the host VMM via Intel TDX functions, but=
 their
+content is intended to be hidden and is not architectural.
+
+Unlike the conventional EPT, the host VMM can't directly read or write its
+entries.  Instead, the TDX SEAMCALL API is used.  Several SEAMCALLs
+correspond to operations on the EPT entry.
+
+* TDH.MEM.SEPT.ADD():
+
+  Add a secure EPT page to the secure EPT tree.  This corresponds to
+  updating the non-leaf EPT entry with the present bit set.
+
+* TDH.MEM.SEPT.REMOVE():
+
+  Remove the secure EPT page from the secure EPT tree.  There is no
+  corresponding EPT operation.
+
+* TDH.MEM.SEPT.RD():
+
+  Read the secure EPT entry.  This corresponds to reading the EPT entry as
+  memory.  Please note that this is much slower than direct memory reading.
+
+* TDH.MEM.PAGE.ADD() and TDH.MEM.PAGE.AUG():
+
+  Add a private page to the secure EPT tree.  This corresponds to updating=
 the
+  leaf EPT entry with present bit set.
+
+* TDH.MEM.PAGE.REMOVE():
+
+  Remove a private page from the secure EPT tree.  There is no
+  corresponding EPT operation.
+
+* TDH.MEM.RANGE.BLOCK():
+
+  This (mostly) corresponds to clearing the present bit of the leaf EPT en=
try.
+  Note that the private page is still linked in the secure EPT.  To remove=
 it
+  from the secure EPT, TDH.MEM.SEPT.REMOVE() and TDH.MEM.PAGE.REMOVE()
+  need to be called.
+
+* TDH.MEM.TRACK():
+
+  Increment the TLB epoch counter. This (mostly) corresponds to EPT TLB fl=
ush.
+  Note that the private page is still linked in the secure EPT.  To remove=
 it
+  from the secure EPT, TDH.MEM.PAGE.REMOVE() needs to be called.
+
+
+Adding private page
+-------------------
+The procedure of populating the private page looks as follows.
+
+1. TDH.MEM.SEPT.ADD(512G level)
+2. TDH.MEM.SEPT.ADD(1G level)
+3. TDH.MEM.SEPT.ADD(2M level)
+4. TDH.MEM.PAGE.AUG(4K level)
+
+Those operations correspond to updating the EPT entries.
+
+Dropping private page and TLB shootdown
+---------------------------------------
+The procedure of dropping the private page looks as follows.
+
+1. TDH.MEM.RANGE.BLOCK(4K level)
+
+   This mostly corresponds to clearing the present bit in the EPT entry.
+   It prevents (or blocks) new TLB entries from being created.  Note that
+   the private page is still linked in the secure EPT tree and existing
+   cached entries in the TLB aren't flushed.
+
+2. TDH.MEM.TRACK(range) and TLB shootdown
+
+   This mostly corresponds to the EPT TLB shootdown.  Because all vcpus sh=
are
+   the same Secure EPT, all vcpus need to flush TLB.
+
+   * TDH.MEM.TRACK(range) by one vcpu.  It increments the global internal =
TLB
+     epoch counter.
+
+   * Send IPIs to remote vcpus.
+   * Remote vcpus exit to the VMM and re-enter the TD via TDH.VP.ENTER().
+   * TDH.VP.ENTER() checks the TLB epoch counter and flushes the TLB if it
+     is stale.
+
+   Note that only a single vcpu issues TDH.MEM.TRACK().
+
+   Note that the private page is still linked in the secure EPT tree, unli=
ke the
+   conventional EPT.
+
+3. TDH.MEM.PAGE.PROMOTE(), TDH.MEM.PAGE.DEMOTE(), TDH.MEM.PAGE.RELOCATE(),
+   or TDH.MEM.PAGE.REMOVE()
+
+   There is no corresponding operation in the conventional EPT.
+
+   * When changing page size (e.g. 4K <-> 2M), TDH.MEM.PAGE.PROMOTE() or
+     TDH.MEM.PAGE.DEMOTE() is used.  During those operations, the guest
+     page is kept referenced in the Secure EPT.
+
+   * When migrating a page, TDH.MEM.PAGE.RELOCATE() is used.  This
+     requires both a source page and a destination page.
+   * When destroying a TD, TDH.MEM.PAGE.REMOVE() removes the private page
+     from the secure EPT tree.  In this case a TLB shootdown is not needed
+     because vcpus don't run any more.
+
+The basic idea for TDX support
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D
+Because shared EPT is the same as the existing EPT, use the existing logic=
 for
+shared EPT.  On the other hand, secure EPT requires additional operations
+instead of directly reading/writing the EPT entry.
+
+On an EPT violation, the KVM MMU walks down the EPT tree from the root,
+determines the EPT entry to operate on, and updates the entry.  If
+necessary, a TLB shootdown is done.  Because walking the secure EPT with
+TDH.MEM.SEPT.RD() is very slow, a mirror of the secure EPT is created and
+maintained.  Hooks are added to the KVM MMU to reuse the existing code.
+
+EPT violation on shared GPA
+---------------------------
+(1) EPT violation on shared GPA or zapping shared GPA
+    ::
+
+        walk down shared EPT tree (the existing code)
+                |
+                |
+                V
+        shared EPT tree (CPU refers.)
+
+(2) update the EPT entry. (the existing code)
+
+    TLB shootdown in the case of zapping.
+
+
+EPT violation on private GPA
+----------------------------
+(1) EPT violation on private GPA or zapping private GPA
+    ::
+
+        walk down the mirror of secure EPT tree (mostly same as the existi=
ng code)
+            |
+            |
+            V
+        mirror of secure EPT tree (KVM MMU software only. reuse of the exi=
sting code)
+
+(2) update the (mirrored) EPT entry. (mostly same as the existing code)
+
+(3) call the hooks with what EPT entry is changed
+    ::
+
+           |
+        NEW: hooks in KVM MMU
+           |
+           V
+        secure EPT root(CPU refers)
+
+(4) the TDX backend calls necessary TDX SEAMCALLs to update real secure EP=
T.
+
+The major modification is to add hooks for the TDX backend for additional
+operations, to pass down which EPT (shared or private) is used, and to
+adjust the behavior when operating on the private EPT.
+
+The following depicts the relationship.
+::
+
+                    KVM                             |       TDX module
+                     |                              |           |
+        -------------+----------                    |           |
+        |                      |                    |           |
+        V                      V                    |           |
+     shared GPA           private GPA               |           V
+  CPU shared EPT pointer  KVM private EPT pointer   |  CPU secure EPT poin=
ter
+        |                      |                    |           |
+        |                      |                    |           |
+        V                      V                    |           V
+  shared EPT                private EPT<-------mirror----->Secure EPT
+        |                      |                    |           |
+        |                      \--------------------+------\    |
+        |                                           |      |    |
+        V                                           |      V    V
+  shared guest page                                 |    private guest page
+                                                    |
+                                                    |
+                              non-encrypted memory  |    encrypted memory
+                                                    |
+
+shared EPT: CPU and KVM walk with shared GPA
+            Maintained by the existing code
+private EPT: KVM walks with private GPA
+             Maintained by the modified existing code
+secure EPT: CPU walks with private GPA.
+            Maintained by TDX module with TDX SEAMCALLs via hooks
+
+
+Tracking private EPT page
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D
+Shared EPT pages are managed by struct kvm_mmu_page and are linked in a
+list.  When necessary, the list is traversed to operate on the pages.
+Private EPT pages have different characteristics.  For example, private
+pages can't be swapped out.  When shrinking memory, we'd like to traverse
+only shared EPT pages and skip private EPT pages.  Likewise, page
+migration isn't supported for private pages (yet).  Introduce an
+additional list to track shared and private EPT pages independently.
+
+At the beginning of an EPT violation, the fault handler knows the faulting
+GPA, and thus which EPT to operate on, private or shared.  For the private
+EPT, an additional task is done: "if (private) { call a hook }".  Since
+the fault handler has deep function calls, it's cumbersome to carry the
+information about which EPT is being operated on.  Options are:
+
+1. Pass the information as an argument for the function call.
+2. Record the information in struct kvm_mmu_page somehow.
+3. Record the information in vcpu structure.
+
+Option 2 was chosen.  Option 1 requires modifying all the functions, which
+would badly affect the normal case.  Option 3 doesn't work well because in
+some cases, we need to walk both private and shared EPT.
+
+The role of the EPT page can be utilized and one bit can be carved out from
+unused bits in struct kvm_mmu_page_role.  When allocating the EPT page,
+initialize the information. Mostly struct kvm_mmu_page is available because
+we're operating on EPT pages.
+
+
+The conversion of private GPA and shared GPA
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+A page of a given GPA can be assigned to only private GPA xor shared GPA a=
t one
+time.  (This is a restriction of the KVM implementation to avoid doubling
+guest memory usage, not of the TDX architecture.)  The GPA can't be acce=
ssed
+simultaneously via both private GPA and shared GPA.  On guest startup, all=
 the
+GPAs are assigned as private.  Guest converts the range of GPA to shared (=
or
+private) from private (or shared) by MapGPA hypercall.  MapGPA hypercall t=
akes
+the start GPA and the size of the region.  If the given start GPA is shared
+(shared bit set), VMM converts the region into shared (if it's already sha=
red,
+nop).
+
+If the guest TD triggers an EPT violation on an already converted region,
+i.e. on a private (or shared) GPA when the page is shared (or private),
+the access won't be allowed.  KVM_EXIT_MEMORY_FAULT is triggered.  The user
+space VMM will decide how to handle it.
+
+If the guest accesses a private (or shared) GPA after the conversion to
+shared (or private), the following sequence will be observed:
+
+1. MapGPA(shared GPA: shared bit set) hypercall
+2. KVM causes KVM_TDX_EXIT with hypercall to the user space VMM.
+3. The user space VMM converts the GPA with KVM_SET_MEMORY_ATTRIBUTES(shar=
ed).
+4. The user space VMM resumes vcpu execution with KVM_RUN
+5. Guest TD accesses private GPA (shared bit cleared)
+6. KVM gets EPT violation on private GPA (shared bit cleared)
+7. KVM finds the GPA was set to be shared in the xarray while the faulting=
 GPA
+   is private (shared bit cleared)
+8. KVM_EXIT_MEMORY_FAULT.  User space VMM, e.g. qemu, decides what to do.
+   Typically it requests KVM conversion of GPA without MapGPA hypercall.
+9. KVM converts GPA from shared to private with
+   KVM_SET_MEMORY_ATTRIBUTES(private)
+10. Resume vcpu execution
+
+At step 9, the user space VMM may decide that such a memory access is due
+to a race and let the vcpu resume without conversion, expecting another
+vcpu to issue MapGPA.  Or it may consider the access suspicious, i.e. the
+guest is trying to attack the VMM, and throttle vcpu execution as
+mitigation or finally kill the guest.  Or it may treat the access as a bug
+in the guest TD and kill the guest TD.
+
+This sequence is not efficient.  Guest TD shouldn't access private (or sha=
red)
+GPA after converting GPA to shared (or private).  Although KVM can handle =
it,
+it's sub-optimal and won't be optimized.
+
+The original TDP MMU and race condition
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+Because vcpus share the EPT, a TLB shootdown is needed once an EPT entry
+is zapped: IPIs are sent to remote vcpus, which flush their own TLBs.
+Until TLB shootdown is done, vcpus may reference the zapped guest page.
+
+TDP MMU uses the read lock of mmu_lock to mitigate vcpu contention.  With
+the read lock held, it relies on atomic updates of the EPT entry.  (The
+legacy MMU, on the other hand, uses the write lock.)  While a vcpu is
+populating/zapping an EPT entry with the read lock held, another vcpu may
+be populating or zapping the same EPT entry at the same time.
+
+To avoid the race condition, the entry is frozen: the EPT entry is set to
+the special value REMOVED_SPTE, which clears the present bit.  Then, after
+the TLB shootdown, the EPT entry is updated to the final value.
+
+Concurrent zapping
+------------------
+1. read lock
+2. freeze the EPT entry (atomically set the value to REMOVED_SPTE)
+   If another vcpu froze the entry, restart the page fault.
+3. TLB shootdown
+
+   * send IPI to remote vcpus
+   * TLB flush (local and remote)
+
+   For each entry update, TLB shootdown is needed because of the
+   concurrency.
+4. atomically set the EPT entry to the final value
+5. read unlock
+
+Concurrent populating
+---------------------
+In the case of populating the non-present EPT entry, atomically update the=
 EPT
+entry.
+
+1. read lock
+
+2. atomically update the EPT entry
+   If another vcpu froze or updated the entry, restart the page fault.
+
+3. read unlock
+
+In the case of updating the present EPT entry (e.g. page migration), the
+operation is split into two: zapping the entry and populating the entry.
+
+1. read lock
+2. zap the EPT entry, following the concurrent zapping case.
+3. populate the non-present EPT entry.
+4. read unlock
+
+Non-concurrent batched zapping
+------------------------------
+In some cases, zapping the ranges is done exclusively with a write lock he=
ld.
+In this case, the TLB shootdown is batched into one.
+
+1. write lock
+2. zap the EPT entries by traversing them
+3. TLB shootdown
+4. write unlock
+
+For Secure EPT, TDX SEAMCALLs are needed in addition to updating the mirro=
red
+EPT entry.
+
+TDX concurrent zapping
+----------------------
+Add a hook for TDX SEAMCALLs at the step of the TLB shootdown.
+
+1. read lock
+2. freeze the EPT entry (set the value to REMOVED_SPTE)
+3. TLB shootdown via a hook
+
+   * TDH.MEM.RANGE.BLOCK()
+   * TDH.MEM.TRACK()
+   * send IPI to remote vcpus
+
+4. set the EPT entry to the final value
+5. read unlock
+
+TDX concurrent populating
+-------------------------
+TDX SEAMCALLs are required in addition to updating the mirrored EPT entry.
+As in the zapping case, the entry is frozen to avoid the race condition,
+and a hook is added.
+
+1. read lock
+2. freeze the EPT entry
+3. hook
+
+   * TDH.MEM.SEPT.ADD() for non-leaf or TDH.MEM.PAGE.AUG() for leaf.
+
+4. set the EPT entry to the final value
+5. read unlock
+
+Without freezing the entry, the following race can happen.  Suppose two vc=
pus
+are faulting on the same GPA and the 2M and 4K level entries aren't popula=
ted
+yet.
+
+* vcpu 1: update 2M level EPT entry
+* vcpu 2: update 4K level EPT entry
+* vcpu 2: TDX SEAMCALL to update 4K secure EPT entry =3D> error
+* vcpu 1: TDX SEAMCALL to update 2M secure EPT entry
+
+
+TDX non-concurrent batched zapping
+----------------------------------
+For simplicity, the procedure of concurrent populating is utilized.  The
+procedure can be optimized later.
+
+
+Co-existing with unmapping guest private memory
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
+TODO.  This needs to be addressed.
+
+
+Restrictions or future work
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D
+The following features aren't supported yet.
+
+* optimizing non-concurrent zap
+* Large page
+* Page migration
--=20
2.25.1