[PATCH v2] x86: prefer RDTSCP in rdtsc_ordered()

Jan Beulich posted 1 patch 1 month, 3 weeks ago
Failed in applying to current master (apply log)
[PATCH v2] x86: prefer RDTSCP in rdtsc_ordered()
Posted by Jan Beulich 1 month, 3 weeks ago
If available, its use is supposed to be cheaper than LFENCE+RDTSC, and
is virtually guaranteed to be cheaper than MFENCE+RDTSC.

Update commentary (and indentation) while there.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Replace original part of the comment with more up-to-date info.

--- a/xen/arch/x86/include/asm/msr.h
+++ b/xen/arch/x86/include/asm/msr.h
@@ -108,18 +108,24 @@ static inline uint64_t rdtsc(void)
 
 static inline uint64_t rdtsc_ordered(void)
 {
-	/*
-	 * The RDTSC instruction is not ordered relative to memory access.
-	 * The Intel SDM and the AMD APM are both vague on this point, but
-	 * empirically an RDTSC instruction can be speculatively executed
-	 * before prior loads.  An RDTSC immediately after an appropriate
-	 * barrier appears to be ordered as a normal load, that is, it
-	 * provides the same ordering guarantees as reading from a global
-	 * memory location that some other imaginary CPU is updating
-	 * continuously with a time stamp.
-	 */
-	alternative("lfence", "mfence", X86_FEATURE_MFENCE_RDTSC);
-	return rdtsc();
+    uint64_t low, high, aux;
+
+    /*
+     * The RDTSC instruction is not serializing.  Make it dispatch serializing
+     * for the purposes here by issuing LFENCE (or MFENCE if necessary) ahead
+     * of it.
+     *
+     * RDTSCP, otoh, "does wait until all previous instructions have executed
+     * and all previous loads are globally visible" (SDM) / "forces all older
+     * instructions to retire before reading the timestamp counter" (APM).
+     */
+    alternative_io_2("lfence; rdtsc",
+                     "mfence; rdtsc", X86_FEATURE_MFENCE_RDTSC,
+                     "rdtscp",        X86_FEATURE_RDTSCP,
+                     ASM_OUTPUT2("=a" (low), "=d" (high), "=c" (aux)),
+                     /* no inputs */);
+
+    return (high << 32) | low;
 }
 
 #define __write_tsc(val) wrmsrl(MSR_IA32_TSC, val)
Re: [PATCH v2] x86: prefer RDTSCP in rdtsc_ordered()
Posted by Andrew Cooper 1 month, 3 weeks ago
On 01/10/2024 9:15 am, Jan Beulich wrote:
> If available, its use is supposed to be cheaper than LFENCE+RDTSC, and
> is virtually guaranteed to be cheaper than MFENCE+RDTSC.
>
> Update commentary (and indentation) while there.
>
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>