[PATCH 3/5] x86/ucode: Refine TLB flush fix for AMD Fam17h CPUs

Andrew Cooper posted 5 patches 1 week, 2 days ago
There is a newer version of this series
Posted by Andrew Cooper 1 week, 2 days ago
In the time since Xen discovered this, Linux stumbled on it too and AMD
produced a narrower fix, limited to Fam17h CPUs only.  To my knowledge,
there's no erratum or other public statement from AMD on the matter.

Adjust Xen to match the narrower fix.

Link: https://lore.kernel.org/lkml/ZyulbYuvrkshfsd2@antipodes/T/#u
Fixes: f19a199281a2 ("x86/AMD: flush TLB after ucode update")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

There is a difference in memory clobber with the invlpg() wrapper.
apply_microcode() specifically does not want a memory clobber, whereas
flush_area_local() doesn't need it as far as I can tell (there's nothing
unsafe to move across this instruction).
---
 xen/arch/x86/cpu/microcode/amd.c    | 14 +++++++++++---
 xen/arch/x86/flushtlb.c             |  3 +--
 xen/arch/x86/include/asm/flushtlb.h |  5 +++++
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index 59332da2b827..7ff702c06caf 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -306,10 +306,18 @@ static int cf_check apply_microcode(const struct microcode_patch *patch,
     sig->rev = rev;
 
     /*
-     * Some processors leave the ucode blob mapping as UC after the update.
-     * Flush the mapping to regain normal cacheability.
+     * Family 0x17 processors leave the mapping of the ucode as UC after the
+     * update.  Flush the mapping to regain normal cacheability.
+     *
+     * We do not know the granularity of mapping, and at 3200 bytes in size
+     * there is a good chance of crossing a 4k page boundary.  Shoot-down the
+     * start and end just to be safe.
      */
-    flush_area_local(patch, FLUSH_TLB_GLOBAL | FLUSH_ORDER(0));
+    if ( boot_cpu_data.family == 0x17 )
+    {
+        invlpg(patch);
+        invlpg((const void *)patch + F17H_MPB_MAX_SIZE - 1);
+    }
 
     /* check current patch id and patch's id for match */
     if ( hw_err || (rev != patch->patch_id) )
diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 94b2a30e8d30..09e676c151fa 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -222,8 +222,7 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
                 }
             }
             else
-                asm volatile ( "invlpg %0"
-                               : : "m" (*(const char *)(va)) : "memory" );
+                invlpg(va);
         }
         else
             do_tlb_flush();
diff --git a/xen/arch/x86/include/asm/flushtlb.h b/xen/arch/x86/include/asm/flushtlb.h
index 019d886f2b80..37bc203652b3 100644
--- a/xen/arch/x86/include/asm/flushtlb.h
+++ b/xen/arch/x86/include/asm/flushtlb.h
@@ -98,6 +98,11 @@ static inline unsigned long read_cr3(void)
     return cr3;
 }
 
+static inline void invlpg(const void *p)
+{
+    asm volatile ( "invlpg %0" :: "m" (*(const char *)p) );
+}
+
 /* Write pagetable base and implicitly tick the tlbflush clock. */
 void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
 
-- 
2.39.5
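
The comment added to apply_microcode() argues that a 3200-byte blob has a good chance of straddling a 4k page boundary, which is why both the first and the last byte are passed to invlpg(). The arithmetic can be sketched as below. This is an illustration only, not Xen code: PAGE_SIZE and F17H_MPB_MAX_SIZE are redefined locally with the values quoted in the comment, and page_index(), crosses_page() and pages_spanned() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE         4096u  /* 4k pages, as in the code comment */
#define F17H_MPB_MAX_SIZE 3200u  /* blob size quoted in the code comment */

/* Index of the 4k page containing addr (hypothetical helper). */
static uintptr_t page_index(uintptr_t addr)
{
    return addr / PAGE_SIZE;
}

/* True if [start, start + size) touches more than one 4k page. */
static bool crosses_page(uintptr_t start, size_t size)
{
    assert(size > 0);
    return page_index(start) != page_index(start + size - 1);
}

/* Number of 4k pages touched by [start, start + size). */
static size_t pages_spanned(uintptr_t start, size_t size)
{
    assert(size > 0);
    return page_index(start + size - 1) - page_index(start) + 1;
}
```

Because the blob is smaller than one page, it can touch at most two 4k pages, so flushing the addresses of the first and last byte reaches every page involved. With these definitions, a blob starting at page offset 896 still fits in one page, while one starting at offset 897 spills into the next.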


Re: [PATCH 3/5] x86/ucode: Refine TLB flush fix for AMD Fam17h CPUs
Posted by Jan Beulich 1 week, 1 day ago
On 20.10.2025 15:19, Andrew Cooper wrote:
> In the time since Xen discovered this, Linux stumbled on it too and AMD
> produced a narrower fix, limited to Fam17h CPUs only.  To my knowledge,
> there's no erratum or other public statement from AMD on the matter.
> 
> Adjust Xen to match the narrower fix.
> 
> Link: https://lore.kernel.org/lkml/ZyulbYuvrkshfsd2@antipodes/T/#u
> Fixes: f19a199281a2 ("x86/AMD: flush TLB after ucode update")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> There is a difference in memory clobber with the invlpg() wrapper.
> apply_microcode() specifically does not want a memory clobber, whereas
> flush_area_local() doesn't need it as far as I can tell (there's nothing
> unsafe to move across this instruction).

The memory access(es) it would not want moving across would be page table
writes. With link-time optimization, wouldn't it in principle be possible
for flush_area_local() to be inlined, and the invlpg() then be moved?
Potentially ahead of a PTE write, seeing that read_cr4() is merely a
simple memory read, and hence the compiler could utilize knowledge it has
to short-circuit that as well?
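
The concern can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not Xen code: demo_pte, flush_no_clobber(), flush_with_clobber() and update_and_flush() are hypothetical names, and the asm bodies are left empty so the example runs unprivileged, where the real wrapper would emit "invlpg %0". Only the compiler-level ordering constraint is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a page table entry; not Xen's type. */
static uint64_t demo_pte;

/*
 * Without a "memory" clobber the compiler only knows the asm reads one
 * byte at p, so it may move unrelated loads and stores across it.
 * (__asm__ __volatile__ is the same construct as "asm volatile".)
 */
static inline void flush_no_clobber(const void *p)
{
    __asm__ __volatile__ ( "" :: "m" (*(const char *)p) );
}

/* With the clobber, no memory access may be moved across the asm. */
static inline void flush_with_clobber(const void *p)
{
    __asm__ __volatile__ ( "" :: "m" (*(const char *)p) : "memory" );
}

/* Write the entry, then flush the address it maps. */
static uint64_t update_and_flush(uint64_t new_pte, const void *va)
{
    demo_pte = new_pte;      /* page table write */
    flush_with_clobber(va);  /* the store above must stay before this */
    return demo_pte;
}
```

Swapping flush_with_clobber() for flush_no_clobber() in update_and_flush() would still return the same value, but the compiler would then be free to place the store to demo_pte after the asm statement, which is the reordering being asked about.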

For the ucode case things can't move unduly due to both rdmsrl() and
invlpg() using "asm volatile()".

With the clobber re-added
Acked-by: Jan Beulich <jbeulich@suse.com>

Otherwise I need to be further educated as to why omitting the clobber is
safe in all (present and future) cases.

Jan