In the time since Xen discovered this, Linux stumbled on it too and AMD
produced a narrower fix, limited to Fam17h CPUs only. To my knowledge,
there's no erratum or other public statement from AMD on the matter.
Adjust Xen to match the narrower fix.
Link: https://lore.kernel.org/lkml/ZyulbYuvrkshfsd2@antipodes/T/#u
Fixes: f19a199281a2 ("x86/AMD: flush TLB after ucode update")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
There is a difference in the memory clobber with the invlpg() wrapper.
apply_microcode() specifically does not want a memory clobber, whereas
flush_area_local() doesn't need one as far as I can tell (there's nothing
unsafe to move across this instruction).
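
For reference, the two forms side by side (a compile-only sketch; INVLPG
is a ring-0 instruction so this can't run from userspace, and
invlpg_clobber() is just an illustrative name, not something the patch
adds):

/* As introduced below: the "m" operand ties the instruction to the byte
 * at p, but with no "memory" clobber the compiler remains free to move
 * unrelated memory accesses across it. */
static inline void invlpg(const void *p)
{
    asm volatile ( "invlpg %0" :: "m" (*(const char *)p) );
}

/* Equivalent of the form previously open-coded in flush_area_local():
 * the "memory" clobber makes the asm a compiler-level memory barrier,
 * so no load or store can be reordered across the flush. */
static inline void invlpg_clobber(const void *p)
{
    asm volatile ( "invlpg %0" :: "m" (*(const char *)p) : "memory" );
}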
---
xen/arch/x86/cpu/microcode/amd.c | 14 +++++++++++---
xen/arch/x86/flushtlb.c | 3 +--
xen/arch/x86/include/asm/flushtlb.h | 5 +++++
3 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index 59332da2b827..7ff702c06caf 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -306,10 +306,18 @@ static int cf_check apply_microcode(const struct microcode_patch *patch,
     sig->rev = rev;
 
     /*
-     * Some processors leave the ucode blob mapping as UC after the update.
-     * Flush the mapping to regain normal cacheability.
+     * Family 0x17 processors leave the mapping of the ucode as UC after the
+     * update.  Flush the mapping to regain normal cacheability.
+     *
+     * We do not know the granularity of the mapping, and at 3200 bytes in
+     * size there is a good chance of crossing a 4k page boundary.  Shoot
+     * down the start and end just to be safe.
      */
-    flush_area_local(patch, FLUSH_TLB_GLOBAL | FLUSH_ORDER(0));
+    if ( boot_cpu_data.family == 0x17 )
+    {
+        invlpg(patch);
+        invlpg((const void *)patch + F17H_MPB_MAX_SIZE - 1);
+    }
 
     /* check current patch id and patch's id for match */
     if ( hw_err || (rev != patch->patch_id) )
diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 94b2a30e8d30..09e676c151fa 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -222,8 +222,7 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
             }
         }
         else
-            asm volatile ( "invlpg %0"
-                           : : "m" (*(const char *)(va)) : "memory" );
+            invlpg(va);
     }
     else
         do_tlb_flush();
diff --git a/xen/arch/x86/include/asm/flushtlb.h b/xen/arch/x86/include/asm/flushtlb.h
index 019d886f2b80..37bc203652b3 100644
--- a/xen/arch/x86/include/asm/flushtlb.h
+++ b/xen/arch/x86/include/asm/flushtlb.h
@@ -98,6 +98,11 @@ static inline unsigned long read_cr3(void)
     return cr3;
 }
 
+static inline void invlpg(const void *p)
+{
+    asm volatile ( "invlpg %0" :: "m" (*(const char *)p) );
+}
+
 /* Write pagetable base and implicitly tick the tlbflush clock. */
 void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
--
2.39.5
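
As a side note on the "crossing a 4k page boundary" remark in the first
hunk, a quick standalone check (hypothetical program, not Xen code) shows
why flushing just the first and last byte is sufficient: at 3200 bytes,
the blob can span at most two 4k pages.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE         4096u
#define F17H_MPB_MAX_SIZE 3200u   /* value cited in the comment above */

int main(void)
{
    /* Worst case: the blob starts one byte before a page boundary. */
    uintptr_t start = PAGE_SIZE - 1;
    uintptr_t end   = start + F17H_MPB_MAX_SIZE - 1;

    /* Distinct pages covered, inclusive of both ends. */
    unsigned int pages = (end / PAGE_SIZE) - (start / PAGE_SIZE) + 1;

    printf("pages spanned: %u\n", pages);   /* prints 2 */
    return 0;
}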
On 20.10.2025 15:19, Andrew Cooper wrote:
> There is a difference in the memory clobber with the invlpg() wrapper.
> apply_microcode() specifically does not want a memory clobber, whereas
> flush_area_local() doesn't need one as far as I can tell (there's nothing
> unsafe to move across this instruction).
The memory access(es) it would not want moving across would be page table
writes. With link-time optimization, wouldn't it in principle be possible
for flush_area_local() to be inlined, and the invlpg() then be moved?
Potentially ahead of a PTE write, seeing that read_cr4() is merely a
simple memory read, and hence the compiler could utilize knowledge it has
to short-circuit that as well?
For the ucode case things can't move unduly due to both rdmsrl() and
invlpg() using "asm volatile()".
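
For illustration, the wrapper with the clobber re-added would read (a
sketch of the suggestion, not a final form):

static inline void invlpg(const void *p)
{
    /* "memory" clobber restored: the compiler may not move page table
     * writes (or any other memory access) across the flush. */
    asm volatile ( "invlpg %0" :: "m" (*(const char *)p) : "memory" );
}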
With the clobber re-added:
Acked-by: Jan Beulich <jbeulich@suse.com>
Otherwise I need to be further educated as to why omitting the clobber is
safe in all (present and future) cases.
Jan