-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Xen Security Advisory CVE-2024-53241 / XSA-466
version 3
Xen hypercall page unsafe against speculative attacks
UPDATES IN VERSION 3
====================
Update of patch 5, public release.
ISSUE DESCRIPTION
=================
Xen guests need to use different processor instructions to make explicit
calls into the Xen hypervisor depending on guest type and/or CPU
vendor. In order to hide those differences, the hypervisor can fill a
hypercall page with the needed instruction sequences, allowing the guest
operating system to call into the hypercall page instead of having to
choose the correct instructions.
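For illustration (a minimal sketch, not part of the advisory, based on the
Linux __HYPERCALL macro visible in the patches below): a 64-bit guest making
a hypercall through the page simply calls into a fixed 32-byte slot:

/* Sketch only: calling Xen through the hypervisor-written hypercall page.
 * Each hypercall number owns one 32-byte slot; __HYPERVISOR_xen_version
 * comes from the Xen interface headers.
 */
extern struct { char _entry[32]; } hypercall_page[];   /* filled in by Xen */

static inline long xen_version_via_page(void)
{
        long res;

        /* Hypercall arguments would go in %rdi, %rsi, ... per the ABI. */
        asm volatile("call hypercall_page + %c[offset]"
                     : "=a" (res)
                     : [offset] "i" (__HYPERVISOR_xen_version * 32)
                     : "memory");
        return res;
}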
The hypercall page contains whole functions, which are written by the
hypervisor and executed by the guest. Because there is no interface
between the guest OS and the hypervisor specifying how a potential
modification of those functions would have to look, the Xen hypervisor
has no way of knowing what any potential mitigation should look like or
which hardening features should be put into place.
This results in potential vulnerabilities if the guest OS is using any
speculative mitigation that performs a compiler transform on "ret"
instructions in order to work (e.g. the Linux kernel rethunk or safe-ret
mitigations).
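As a concrete illustration (assuming a kernel built with
CONFIG_MITIGATION_RETHUNK, which passes -mfunction-return=thunk-extern to the
compiler), compiled kernel code never returns with a bare "ret", while the
hypervisor-written page does:

/* Sketch: effect of the rethunk/safe-ret transform on compiled kernel code. */
long some_kernel_function(long x)
{
        return x + 1;
        /*
         * With CONFIG_MITIGATION_RETHUNK the compiler emits roughly:
         *     lea  0x1(%rdi), %rax
         *     jmp  __x86_return_thunk      <- instead of a bare "ret"
         * so every return can be handled by one patched/untrained thunk.
         *
         * The hypercall page is written by Xen at run time; its stubs end
         * in a literal "ret" the compiler and objtool never saw, which
         * therefore escapes the mitigation.
         */
}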
Furthermore, the hypercall page has no provision for Control-flow
Integrity schemes (e.g. kCFI/CET-IBT/FineIBT), and will simply
malfunction in such configurations.
IMPACT
======
Some mitigations for hardware vulnerabilities that the guest OS relies on
might not be fully functional, resulting in e.g. guest user processes
being able to read data they ought not to have access to.
VULNERABLE SYSTEMS
==================
Only x86 systems are potentially vulnerable; Arm systems are not vulnerable.
All guest types (PV, PVH and HVM) are potentially vulnerable.
Linux guests are known to be vulnerable; guests using other operating
systems might be vulnerable, too.
MITIGATION
==========
Running only Linux guest kernels that do not rely on "ret" assembler
instruction patching (kernel config option
CONFIG_MITIGATION_RETHUNK/CONFIG_RETHUNK disabled) avoids the vulnerability,
as long as that option is not required for safety on the underlying hardware.
CREDITS
=======
This issue was discovered by Andrew Cooper of XenServer.
RESOLUTION
==========
Applying the set of attached patches resolves this issue.
The patch to Xen is simply a documentation update to clarify that an OS author
might not want to use a hypercall page.
xsa466-linux-*.patch Linux
xsa466-xen.patch xen-unstable
$ sha256sum xsa466*
498fb2538f650d694bbd6b7d2333dcf9a12d0bdfcba65257a7d14c88f5b86801 xsa466-linux-01.patch
1e0d5f68d1cb4a0ef8914ae6bdeb4e18bae94c6d19659708ad707da784c0aa5c xsa466-linux-02.patch
b3056b34c1565f901cb4ba11c03a51d4f045b5de7cd16c6e510e0bcee8cc6cd7 xsa466-linux-03.patch
0215e56739ab5b0d0ec0125f3d1806c3a0a0dcb3f562014f59b5145184a41467 xsa466-linux-04.patch
314e67060ab4f47883cf2b124d54ce3cd4b0363f0545ad907a7b754a4405aacd xsa466-linux-05.patch
adbef75416379d96ebb72463872f993e9d8b7d119091480ad1e70fd448481733 xsa466-linux-06.patch
36874014cee5d5213610a6ffdd0e3e67d0258d28f2587b8470fdd0cef96e5013 xsa466-linux-07.patch
367f981ef8adc11b99cc6999b784305bcdcd55db0358fd6a2171509bf7f64345 xsa466-xen.patch
$
DEPLOYMENT DURING EMBARGO
=========================
Deployment of patches or mitigations is NOT permitted (except where
all the affected systems and VMs are administered and used only by
organisations which are members of the Xen Project Security Issues
Predisclosure List). Specifically, deployment on public cloud systems
is NOT permitted.
This is because the mitigation or patches need to be applied to the guests.
Deployment is permitted only AFTER the embargo ends.
(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable. This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)
For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----
iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmdhaxAMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZUeMH/0Qkn9G8iQWJ0fHMCxvd1lcr3RNWK5GfXqlAZvuJ
YRQMDslYCzuvzrkLnoe/P/zSSs+omEYMcOVsJCBkTqePs8yIdqwBvBfZ79I1htIu
IdyJt8SE5+b70ZlumQJ8ef1Za3lp8bxvEZVa8GIokOu0Ef1iqUKNl7tQgIoQjOUH
bV/1sFN5MNFsUshOW5DnLiRrE8j0/0nfbzHPu5H9S2B4eN38oPmTabvZG/IHky8R
VFyTvqFrKZONDDhdxyFE9PBOtP6Bu3EV+Emmxb3Q84l2oEIqgab0Xxj4QGBTuLMn
PPcU5/D6Giqx3jBMdrkMAXtBuXBYO/inqsX1IJLic9W13+A=
=wlOW
-----END PGP SIGNATURE-----
From efbcd61d9bebb771c836a3b8bfced8165633db7c Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Thu, 17 Oct 2024 08:29:48 +0200
Subject: x86: make get_cpu_vendor() accessible from Xen code
In order to be able to differentiate between AMD and Intel based
systems for very early hypercalls without having to rely on the Xen
hypercall page, make get_cpu_vendor() non-static.
Refactor early_cpu_init() for the same reason by splitting out the
loop initializing the cpu_devs[] array into an externally callable function.
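Later in this series the Xen code uses exactly these helpers for early vendor
detection; roughly (sketch mirroring the xen_get_vendor() helper added in the
"add central hypercall functions" patch):

/* Early vendor detection without relying on early_cpu_init() having run. */
static __ref void xen_get_vendor(void)
{
        init_cpu_devs();                /* populate cpu_devs[] */
        cpu_detect(&boot_cpu_data);     /* read the CPUID vendor string */
        get_cpu_vendor(&boot_cpu_data); /* set boot_cpu_data.x86_vendor */
}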
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/include/asm/processor.h | 2 ++
arch/x86/kernel/cpu/common.c | 38 ++++++++++++++++++--------------
2 files changed, 24 insertions(+), 16 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c0975815980c..20e6009381ed 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -230,6 +230,8 @@ static inline unsigned long long l1tf_pfn_limit(void)
return BIT_ULL(boot_cpu_data.x86_cache_bits - 1 - PAGE_SHIFT);
}
+void init_cpu_devs(void);
+void get_cpu_vendor(struct cpuinfo_x86 *c);
extern void early_cpu_init(void);
extern void identify_secondary_cpu(struct cpuinfo_x86 *);
extern void print_cpu_info(struct cpuinfo_x86 *);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a5c28975c608..3e9037690814 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -867,7 +867,7 @@ static void cpu_detect_tlb(struct cpuinfo_x86 *c)
tlb_lld_4m[ENTRIES], tlb_lld_1g[ENTRIES]);
}
-static void get_cpu_vendor(struct cpuinfo_x86 *c)
+void get_cpu_vendor(struct cpuinfo_x86 *c)
{
char *v = c->x86_vendor_id;
int i;
@@ -1649,15 +1649,11 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
detect_nopl();
}
-void __init early_cpu_init(void)
+void __init init_cpu_devs(void)
{
const struct cpu_dev *const *cdev;
int count = 0;
-#ifdef CONFIG_PROCESSOR_SELECT
- pr_info("KERNEL supported cpus:\n");
-#endif
-
for (cdev = __x86_cpu_dev_start; cdev < __x86_cpu_dev_end; cdev++) {
const struct cpu_dev *cpudev = *cdev;
@@ -1665,20 +1661,30 @@ void __init early_cpu_init(void)
break;
cpu_devs[count] = cpudev;
count++;
+ }
+}
+void __init early_cpu_init(void)
+{
#ifdef CONFIG_PROCESSOR_SELECT
- {
- unsigned int j;
-
- for (j = 0; j < 2; j++) {
- if (!cpudev->c_ident[j])
- continue;
- pr_info(" %s %s\n", cpudev->c_vendor,
- cpudev->c_ident[j]);
- }
- }
+ unsigned int i, j;
+
+ pr_info("KERNEL supported cpus:\n");
#endif
+
+ init_cpu_devs();
+
+#ifdef CONFIG_PROCESSOR_SELECT
+ for (i = 0; i < X86_VENDOR_NUM && cpu_devs[i]; i++) {
+ for (j = 0; j < 2; j++) {
+ if (!cpu_devs[i]->c_ident[j])
+ continue;
+ pr_info(" %s %s\n", cpu_devs[i]->c_vendor,
+ cpu_devs[i]->c_ident[j]);
+ }
}
+#endif
+
early_identify_cpu(&boot_cpu_data);
}
--
2.43.0
From dda014ba59331dee4f3b773a020e109932f4bd24 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Fri, 29 Nov 2024 15:47:49 +0100
Subject: objtool/x86: allow syscall instruction
The syscall instruction is used in Xen PV mode for doing hypercalls.
Allow syscall to be used in the kernel in case it is tagged with an
unwind hint for objtool.
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
---
tools/objtool/check.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 4ce176ad411f..76060da755b5 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3820,9 +3820,12 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
break;
case INSN_CONTEXT_SWITCH:
- if (func && (!next_insn || !next_insn->hint)) {
- WARN_INSN(insn, "unsupported instruction in callable function");
- return 1;
+ if (func) {
+ if (!next_insn || !next_insn->hint) {
+ WARN_INSN(insn, "unsupported instruction in callable function");
+ return 1;
+ }
+ break;
}
return 0;
--
2.43.0
From 0ef8047b737d7480a5d4c46d956e97c190f13050 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Fri, 29 Nov 2024 16:15:54 +0100
Subject: x86/static-call: provide a way to do very early
static-call updates
Add static_call_update_early() for updating static-call targets in
very early boot.
This will be needed for support of Xen guest type specific hypercall
functions.
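Usage then looks roughly like this (a sketch based on the Xen patches later
in this series; the caller function name is hypothetical):

/* Default target until the vendor is known (from the Xen patches below). */
DEFINE_STATIC_CALL(xen_hypercall, xen_hypercall_hvm);

void __init example_early_setup(void)          /* hypothetical caller */
{
        /*
         * Usable even before static_call_init(): the macro falls back to
         * __static_call_update_early(), which rewrites the trampoline
         * directly and serializes with sync_core().  Interrupts must
         * still be off (early_boot_irqs_disabled).
         */
        static_call_update_early(xen_hypercall, xen_hypercall_pv);

        static_call(xen_hypercall)();   /* later calls go to the new target */
}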
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
arch/x86/include/asm/static_call.h | 15 ++++++++++++
arch/x86/include/asm/sync_core.h | 6 ++---
arch/x86/kernel/static_call.c | 9 ++++++++
include/linux/compiler.h | 37 +++++++++++++++++++++---------
include/linux/static_call.h | 1 +
kernel/static_call_inline.c | 2 +-
6 files changed, 55 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/static_call.h b/arch/x86/include/asm/static_call.h
index 125c407e2abe..41502bd2afd6 100644
--- a/arch/x86/include/asm/static_call.h
+++ b/arch/x86/include/asm/static_call.h
@@ -65,4 +65,19 @@
extern bool __static_call_fixup(void *tramp, u8 op, void *dest);
+extern void __static_call_update_early(void *tramp, void *func);
+
+#define static_call_update_early(name, _func) \
+({ \
+ typeof(&STATIC_CALL_TRAMP(name)) __F = (_func); \
+ if (static_call_initialized) { \
+ __static_call_update(&STATIC_CALL_KEY(name), \
+ STATIC_CALL_TRAMP_ADDR(name), __F);\
+ } else { \
+ WRITE_ONCE(STATIC_CALL_KEY(name).func, _func); \
+ __static_call_update_early(STATIC_CALL_TRAMP_ADDR(name),\
+ __F); \
+ } \
+})
+
#endif /* _ASM_STATIC_CALL_H */
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index ab7382f92aff..96bda43538ee 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -8,7 +8,7 @@
#include <asm/special_insns.h>
#ifdef CONFIG_X86_32
-static inline void iret_to_self(void)
+static __always_inline void iret_to_self(void)
{
asm volatile (
"pushfl\n\t"
@@ -19,7 +19,7 @@ static inline void iret_to_self(void)
: ASM_CALL_CONSTRAINT : : "memory");
}
#else
-static inline void iret_to_self(void)
+static __always_inline void iret_to_self(void)
{
unsigned int tmp;
@@ -55,7 +55,7 @@ static inline void iret_to_self(void)
* Like all of Linux's memory ordering operations, this is a
* compiler barrier as well.
*/
-static inline void sync_core(void)
+static __always_inline void sync_core(void)
{
/*
* The SERIALIZE instruction is the most straightforward way to
diff --git a/arch/x86/kernel/static_call.c b/arch/x86/kernel/static_call.c
index 4eefaac64c6c..9eed0c144dad 100644
--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -172,6 +172,15 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
}
EXPORT_SYMBOL_GPL(arch_static_call_transform);
+noinstr void __static_call_update_early(void *tramp, void *func)
+{
+ BUG_ON(system_state != SYSTEM_BOOTING);
+ BUG_ON(!early_boot_irqs_disabled);
+ BUG_ON(static_call_initialized);
+ __text_gen_insn(tramp, JMP32_INSN_OPCODE, tramp, func, JMP32_INSN_SIZE);
+ sync_core();
+}
+
#ifdef CONFIG_MITIGATION_RETHUNK
/*
* This is called by apply_returns() to fix up static call trampolines,
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 469a64dd6495..240c632c5b95 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -216,28 +216,43 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
#endif /* __KERNEL__ */
+/**
+ * offset_to_ptr - convert a relative memory offset to an absolute pointer
+ * @off: the address of the 32-bit offset value
+ */
+static inline void *offset_to_ptr(const int *off)
+{
+ return (void *)((unsigned long)off + *off);
+}
+
+#endif /* __ASSEMBLY__ */
+
+#ifdef CONFIG_64BIT
+#define ARCH_SEL(a,b) a
+#else
+#define ARCH_SEL(a,b) b
+#endif
+
/*
* Force the compiler to emit 'sym' as a symbol, so that we can reference
* it from inline assembler. Necessary in case 'sym' could be inlined
* otherwise, or eliminated entirely due to lack of references that are
* visible to the compiler.
*/
-#define ___ADDRESSABLE(sym, __attrs) \
- static void * __used __attrs \
+#define ___ADDRESSABLE(sym, __attrs) \
+ static void * __used __attrs \
__UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)(uintptr_t)&sym;
+
#define __ADDRESSABLE(sym) \
___ADDRESSABLE(sym, __section(".discard.addressable"))
-/**
- * offset_to_ptr - convert a relative memory offset to an absolute pointer
- * @off: the address of the 32-bit offset value
- */
-static inline void *offset_to_ptr(const int *off)
-{
- return (void *)((unsigned long)off + *off);
-}
+#define __ADDRESSABLE_ASM(sym) \
+ .pushsection .discard.addressable,"aw"; \
+ .align ARCH_SEL(8,4); \
+ ARCH_SEL(.quad, .long) __stringify(sym); \
+ .popsection;
-#endif /* __ASSEMBLY__ */
+#define __ADDRESSABLE_ASM_STR(sym) __stringify(__ADDRESSABLE_ASM(sym))
#ifdef __CHECKER__
#define __BUILD_BUG_ON_ZERO_MSG(e, msg) (0)
diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 141e6b176a1b..785980af8972 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -138,6 +138,7 @@
#ifdef CONFIG_HAVE_STATIC_CALL
#include <asm/static_call.h>
+extern int static_call_initialized;
/*
* Either @site or @tramp can be NULL.
*/
diff --git a/kernel/static_call_inline.c b/kernel/static_call_inline.c
index 5259cda486d0..bb7d066a7c39 100644
--- a/kernel/static_call_inline.c
+++ b/kernel/static_call_inline.c
@@ -15,7 +15,7 @@ extern struct static_call_site __start_static_call_sites[],
extern struct static_call_tramp_key __start_static_call_tramp_key[],
__stop_static_call_tramp_key[];
-static int static_call_initialized;
+int static_call_initialized;
/*
* Must be called before early_initcall() to be effective.
--
2.43.0
From a2796dff62d6c6bfc5fbebdf2bee0d5ac0438906 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Wed, 16 Oct 2024 10:40:26 +0200
Subject: x86/xen: don't do PV iret hypercall through hypercall
page
Instead of jumping to the Xen hypercall page for doing the iret
hypercall, directly code the required sequence in xen-asm.S.
This is done in preparation for no longer using the hypercall page at
all, as it has been shown to cause problems with speculation mitigations.
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
arch/x86/xen/xen-asm.S | 27 ++++++++++++++++++---------
1 file changed, 18 insertions(+), 9 deletions(-)
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 83189cf5cdce..ca6edfe4c14b 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -176,7 +176,6 @@ SYM_CODE_START(xen_early_idt_handler_array)
SYM_CODE_END(xen_early_idt_handler_array)
__FINIT
-hypercall_iret = hypercall_page + __HYPERVISOR_iret * 32
/*
* Xen64 iret frame:
*
@@ -186,17 +185,28 @@ hypercall_iret = hypercall_page + __HYPERVISOR_iret * 32
* cs
* rip <-- standard iret frame
*
- * flags
+ * flags <-- xen_iret must push from here on
*
- * rcx }
- * r11 }<-- pushed by hypercall page
- * rsp->rax }
+ * rcx
+ * r11
+ * rsp->rax
*/
+.macro xen_hypercall_iret
+ pushq $0 /* Flags */
+ push %rcx
+ push %r11
+ push %rax
+ mov $__HYPERVISOR_iret, %eax
+ syscall /* Do the IRET. */
+#ifdef CONFIG_MITIGATION_SLS
+ int3
+#endif
+.endm
+
SYM_CODE_START(xen_iret)
UNWIND_HINT_UNDEFINED
ANNOTATE_NOENDBR
- pushq $0
- jmp hypercall_iret
+ xen_hypercall_iret
SYM_CODE_END(xen_iret)
/*
@@ -301,8 +311,7 @@ SYM_CODE_START(xen_entry_SYSENTER_compat)
ENDBR
lea 16(%rsp), %rsp /* strip %rcx, %r11 */
mov $-ENOSYS, %rax
- pushq $0
- jmp hypercall_iret
+ xen_hypercall_iret
SYM_CODE_END(xen_entry_SYSENTER_compat)
SYM_CODE_END(xen_entry_SYSCALL_compat)
--
2.43.0
From b4845bb6383821a9516ce30af3a27dc873e37fd4 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Thu, 17 Oct 2024 11:00:52 +0200
Subject: x86/xen: add central hypercall functions
Add generic hypercall functions usable for all normal (i.e. not iret)
hypercalls. Depending on the guest type and the processor vendor,
different functions need to be used, because the instruction for
entering the hypervisor differs:
- PV guests need to use syscall
- HVM/PVH guests on Intel need to use vmcall
- HVM/PVH guests on AMD and Hygon need to use vmmcall
As PVH guests need to issue hypercalls very early during boot, a 4th
hypercall function is needed for HVM/PVH which can be used on both
Intel and AMD processors. It checks the vendor type and then sets the
Intel or AMD specific function to use via static_call().
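In summary (a simplified sketch based on this and the following patches; the
xen_hypercall_* names match the patches, the two caller functions are
simplified stand-ins for xen_start_kernel() and the HVM platform detection
path):

/* Which function ends up behind the xen_hypercall static call:
 *   PV guests:                xen_hypercall_pv     (syscall)
 *   HVM/PVH on Intel:         xen_hypercall_intel  (vmcall)
 *   HVM/PVH on AMD/Hygon:     xen_hypercall_amd    (vmmcall)
 *   HVM/PVH very early boot:  xen_hypercall_hvm    (detects the vendor on
 *                             first use, then forwards to amd/intel)
 */
void __init xen_pv_select_hypercall(void)       /* simplified: xen_start_kernel() */
{
        early_boot_irqs_disabled = true;
        static_call_update_early(xen_hypercall, xen_hypercall_pv);
}

void __init xen_hvm_select_hypercall(void)      /* simplified: platform detection */
{
        /* Replaces the boot-time xen_hypercall_hvm stub once the vendor is known. */
        xen_hypercall_setfunc();
}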
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/include/asm/xen/hypercall.h | 3 +
arch/x86/xen/enlighten.c | 65 ++++++++++++++++++++++
arch/x86/xen/enlighten_hvm.c | 4 ++
arch/x86/xen/enlighten_pv.c | 4 +-
arch/x86/xen/xen-asm.S | 23 ++++++++
arch/x86/xen/xen-head.S | 83 ++++++++++++++++++++++++++++
arch/x86/xen/xen-ops.h | 9 +++
7 files changed, 190 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index a2dd24947eb8..6b4dd4de08a6 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -88,6 +88,9 @@ struct xen_dm_op_buf;
extern struct { char _entry[32]; } hypercall_page[];
+void xen_hypercall_func(void);
+DECLARE_STATIC_CALL(xen_hypercall, xen_hypercall_func);
+
#define __HYPERCALL "call hypercall_page+%c[offset]"
#define __HYPERCALL_ENTRY(x) \
[offset] "i" (__HYPERVISOR_##x * sizeof(hypercall_page[0]))
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 84e5adbd0925..1887435af2fb 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -2,6 +2,7 @@
#include <linux/console.h>
#include <linux/cpu.h>
+#include <linux/instrumentation.h>
#include <linux/kexec.h>
#include <linux/memblock.h>
#include <linux/slab.h>
@@ -23,6 +24,9 @@
EXPORT_SYMBOL_GPL(hypercall_page);
+DEFINE_STATIC_CALL(xen_hypercall, xen_hypercall_hvm);
+EXPORT_STATIC_CALL_TRAMP(xen_hypercall);
+
/*
* Pointer to the xen_vcpu_info structure or
* &HYPERVISOR_shared_info->vcpu_info[cpu]. See xen_hvm_init_shared_info
@@ -68,6 +72,67 @@ EXPORT_SYMBOL(xen_start_flags);
*/
struct shared_info *HYPERVISOR_shared_info = &xen_dummy_shared_info;
+static __ref void xen_get_vendor(void)
+{
+ init_cpu_devs();
+ cpu_detect(&boot_cpu_data);
+ get_cpu_vendor(&boot_cpu_data);
+}
+
+void xen_hypercall_setfunc(void)
+{
+ if (static_call_query(xen_hypercall) != xen_hypercall_hvm)
+ return;
+
+ if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+ boot_cpu_data.x86_vendor == X86_VENDOR_HYGON))
+ static_call_update(xen_hypercall, xen_hypercall_amd);
+ else
+ static_call_update(xen_hypercall, xen_hypercall_intel);
+}
+
+/*
+ * Evaluate processor vendor in order to select the correct hypercall
+ * function for HVM/PVH guests.
+ * Might be called very early in boot before vendor has been set by
+ * early_cpu_init().
+ */
+noinstr void *__xen_hypercall_setfunc(void)
+{
+ void (*func)(void);
+
+ /*
+ * Xen is supported only on CPUs with CPUID, so testing for
+ * X86_FEATURE_CPUID is a test for early_cpu_init() having been
+ * run.
+ *
+ * Note that __xen_hypercall_setfunc() is noinstr only due to a nasty
+ * dependency chain: it is being called via the xen_hypercall static
+ * call when running as a PVH or HVM guest. Hypercalls need to be
+ * noinstr due to PV guests using hypercalls in noinstr code. So we
+ * can safely tag the function body as "instrumentation ok", since
+ * the PV guest requirement is not of interest here (xen_get_vendor()
+ * calls noinstr functions, and static_call_update_early() might do
+ * so, too).
+ */
+ instrumentation_begin();
+
+ if (!boot_cpu_has(X86_FEATURE_CPUID))
+ xen_get_vendor();
+
+ if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+ boot_cpu_data.x86_vendor == X86_VENDOR_HYGON))
+ func = xen_hypercall_amd;
+ else
+ func = xen_hypercall_intel;
+
+ static_call_update_early(xen_hypercall, func);
+
+ instrumentation_end();
+
+ return func;
+}
+
static int xen_cpu_up_online(unsigned int cpu)
{
xen_init_lock_cpu(cpu);
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 24d2957a4726..973a74fc966a 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -300,6 +300,10 @@ static uint32_t __init xen_platform_hvm(void)
if (xen_pv_domain())
return 0;
+ /* Set correct hypercall function. */
+ if (xen_domain)
+ xen_hypercall_setfunc();
+
if (xen_pvh_domain() && nopv) {
/* Guest booting via the Xen-PVH boot entry goes here */
pr_info("\"nopv\" parameter is ignored in PVH guest\n");
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index d6818c6cafda..a8eb7e0c473c 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1341,6 +1341,9 @@ asmlinkage __visible void __init xen_start_kernel(struct start_info *si)
xen_domain_type = XEN_PV_DOMAIN;
xen_start_flags = xen_start_info->flags;
+ /* Interrupts are guaranteed to be off initially. */
+ early_boot_irqs_disabled = true;
+ static_call_update_early(xen_hypercall, xen_hypercall_pv);
xen_setup_features();
@@ -1431,7 +1434,6 @@ asmlinkage __visible void __init xen_start_kernel(struct start_info *si)
WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_pv, xen_cpu_dead_pv));
local_irq_disable();
- early_boot_irqs_disabled = true;
xen_raw_console_write("mapping kernel into physical memory\n");
xen_setup_kernel_pagetable((pgd_t *)xen_start_info->pt_base,
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index ca6edfe4c14b..b518f36d1ca2 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -20,9 +20,32 @@
#include <linux/init.h>
#include <linux/linkage.h>
+#include <linux/objtool.h>
#include <../entry/calling.h>
.pushsection .noinstr.text, "ax"
+/*
+ * PV hypercall interface to the hypervisor.
+ *
+ * Called via inline asm(), so better preserve %rcx and %r11.
+ *
+ * Input:
+ * %eax: hypercall number
+ * %rdi, %rsi, %rdx, %r10, %r8: args 1..5 for the hypercall
+ * Output: %rax
+ */
+SYM_FUNC_START(xen_hypercall_pv)
+ ANNOTATE_NOENDBR
+ push %rcx
+ push %r11
+ UNWIND_HINT_SAVE
+ syscall
+ UNWIND_HINT_RESTORE
+ pop %r11
+ pop %rcx
+ RET
+SYM_FUNC_END(xen_hypercall_pv)
+
/*
* Disabling events is simply a matter of making the event mask
* non-zero.
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 7f6c69dbb816..c173ba6740e9 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -6,9 +6,11 @@
#include <linux/elfnote.h>
#include <linux/init.h>
+#include <linux/instrumentation.h>
#include <asm/boot.h>
#include <asm/asm.h>
+#include <asm/frame.h>
#include <asm/msr.h>
#include <asm/page_types.h>
#include <asm/percpu.h>
@@ -87,6 +89,87 @@ SYM_CODE_END(xen_cpu_bringup_again)
#endif
#endif
+ .pushsection .noinstr.text, "ax"
+/*
+ * Xen hypercall interface to the hypervisor.
+ *
+ * Input:
+ * %eax: hypercall number
+ * 32-bit:
+ * %ebx, %ecx, %edx, %esi, %edi: args 1..5 for the hypercall
+ * 64-bit:
+ * %rdi, %rsi, %rdx, %r10, %r8: args 1..5 for the hypercall
+ * Output: %[er]ax
+ */
+SYM_FUNC_START(xen_hypercall_hvm)
+ ENDBR
+ FRAME_BEGIN
+ /* Save all relevant registers (caller save and arguments). */
+#ifdef CONFIG_X86_32
+ push %eax
+ push %ebx
+ push %ecx
+ push %edx
+ push %esi
+ push %edi
+#else
+ push %rax
+ push %rcx
+ push %rdx
+ push %rdi
+ push %rsi
+ push %r11
+ push %r10
+ push %r9
+ push %r8
+#ifdef CONFIG_FRAME_POINTER
+ pushq $0 /* Dummy push for stack alignment. */
+#endif
+#endif
+ /* Set the vendor specific function. */
+ call __xen_hypercall_setfunc
+ /* Set ZF = 1 if AMD, Restore saved registers. */
+#ifdef CONFIG_X86_32
+ lea xen_hypercall_amd, %ebx
+ cmp %eax, %ebx
+ pop %edi
+ pop %esi
+ pop %edx
+ pop %ecx
+ pop %ebx
+ pop %eax
+#else
+ lea xen_hypercall_amd(%rip), %rbx
+ cmp %rax, %rbx
+#ifdef CONFIG_FRAME_POINTER
+ pop %rax /* Dummy pop. */
+#endif
+ pop %r8
+ pop %r9
+ pop %r10
+ pop %r11
+ pop %rsi
+ pop %rdi
+ pop %rdx
+ pop %rcx
+ pop %rax
+#endif
+ /* Use correct hypercall function. */
+ jz xen_hypercall_amd
+ jmp xen_hypercall_intel
+SYM_FUNC_END(xen_hypercall_hvm)
+
+SYM_FUNC_START(xen_hypercall_amd)
+ vmmcall
+ RET
+SYM_FUNC_END(xen_hypercall_amd)
+
+SYM_FUNC_START(xen_hypercall_intel)
+ vmcall
+ RET
+SYM_FUNC_END(xen_hypercall_intel)
+ .popsection
+
ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz "linux")
ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz "2.6")
ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz "xen-3.0")
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index e1b782e823e6..63c13a2ccf55 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -326,4 +326,13 @@ static inline void xen_smp_intr_free_pv(unsigned int cpu) {}
static inline void xen_smp_count_cpus(void) { }
#endif /* CONFIG_SMP */
+#ifdef CONFIG_XEN_PV
+void xen_hypercall_pv(void);
+#endif
+void xen_hypercall_hvm(void);
+void xen_hypercall_amd(void);
+void xen_hypercall_intel(void);
+void xen_hypercall_setfunc(void);
+void *__xen_hypercall_setfunc(void);
+
#endif /* XEN_OPS_H */
--
2.43.0
From b1c2cb86f4a7861480ad54bb9a58df3cbebf8e92 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Thu, 17 Oct 2024 14:47:13 +0200
Subject: x86/xen: use new hypercall functions instead of hypercall
page
Call the Xen hypervisor via the new xen_hypercall_func static-call
instead of the hypercall page.
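After this change a hypercall call site expands roughly as follows (a sketch
of the new __HYPERCALL/__HYPERCALL_ENTRY expansion; the real macros
additionally set up the argument registers and clobbers):

/* Sketch: the hypercall number now goes into %eax and the call is routed
 * through the static-call trampoline instead of the hypercall page.
 */
static inline long xen_version_via_static_call(void)
{
        long res;

        asm volatile(__ADDRESSABLE_xen_hypercall   /* keep the key referenced */
                     "call __SCT__xen_hypercall"
                     : "=a" (res)
                     : "a" (__HYPERVISOR_xen_version)
                     : "memory");
        return res;
}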
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
arch/x86/include/asm/xen/hypercall.h | 33 +++++++++++++++++-----------
1 file changed, 20 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 6b4dd4de08a6..7d5f8ad66774 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -39,9 +39,11 @@
#include <linux/string.h>
#include <linux/types.h>
#include <linux/pgtable.h>
+#include <linux/instrumentation.h>
#include <trace/events/xen.h>
+#include <asm/alternative.h>
#include <asm/page.h>
#include <asm/smap.h>
#include <asm/nospec-branch.h>
@@ -91,9 +93,17 @@ extern struct { char _entry[32]; } hypercall_page[];
void xen_hypercall_func(void);
DECLARE_STATIC_CALL(xen_hypercall, xen_hypercall_func);
-#define __HYPERCALL "call hypercall_page+%c[offset]"
-#define __HYPERCALL_ENTRY(x) \
- [offset] "i" (__HYPERVISOR_##x * sizeof(hypercall_page[0]))
+#ifdef MODULE
+#define __ADDRESSABLE_xen_hypercall
+#else
+#define __ADDRESSABLE_xen_hypercall __ADDRESSABLE_ASM_STR(__SCK__xen_hypercall)
+#endif
+
+#define __HYPERCALL \
+ __ADDRESSABLE_xen_hypercall \
+ "call __SCT__xen_hypercall"
+
+#define __HYPERCALL_ENTRY(x) "a" (x)
#ifdef CONFIG_X86_32
#define __HYPERCALL_RETREG "eax"
@@ -151,7 +161,7 @@ DECLARE_STATIC_CALL(xen_hypercall, xen_hypercall_func);
__HYPERCALL_0ARG(); \
asm volatile (__HYPERCALL \
: __HYPERCALL_0PARAM \
- : __HYPERCALL_ENTRY(name) \
+ : __HYPERCALL_ENTRY(__HYPERVISOR_ ## name) \
: __HYPERCALL_CLOBBER0); \
(type)__res; \
})
@@ -162,7 +172,7 @@ DECLARE_STATIC_CALL(xen_hypercall, xen_hypercall_func);
__HYPERCALL_1ARG(a1); \
asm volatile (__HYPERCALL \
: __HYPERCALL_1PARAM \
- : __HYPERCALL_ENTRY(name) \
+ : __HYPERCALL_ENTRY(__HYPERVISOR_ ## name) \
: __HYPERCALL_CLOBBER1); \
(type)__res; \
})
@@ -173,7 +183,7 @@ DECLARE_STATIC_CALL(xen_hypercall, xen_hypercall_func);
__HYPERCALL_2ARG(a1, a2); \
asm volatile (__HYPERCALL \
: __HYPERCALL_2PARAM \
- : __HYPERCALL_ENTRY(name) \
+ : __HYPERCALL_ENTRY(__HYPERVISOR_ ## name) \
: __HYPERCALL_CLOBBER2); \
(type)__res; \
})
@@ -184,7 +194,7 @@ DECLARE_STATIC_CALL(xen_hypercall, xen_hypercall_func);
__HYPERCALL_3ARG(a1, a2, a3); \
asm volatile (__HYPERCALL \
: __HYPERCALL_3PARAM \
- : __HYPERCALL_ENTRY(name) \
+ : __HYPERCALL_ENTRY(__HYPERVISOR_ ## name) \
: __HYPERCALL_CLOBBER3); \
(type)__res; \
})
@@ -195,7 +205,7 @@ DECLARE_STATIC_CALL(xen_hypercall, xen_hypercall_func);
__HYPERCALL_4ARG(a1, a2, a3, a4); \
asm volatile (__HYPERCALL \
: __HYPERCALL_4PARAM \
- : __HYPERCALL_ENTRY(name) \
+ : __HYPERCALL_ENTRY(__HYPERVISOR_ ## name) \
: __HYPERCALL_CLOBBER4); \
(type)__res; \
})
@@ -209,12 +219,9 @@ xen_single_call(unsigned int call,
__HYPERCALL_DECLS;
__HYPERCALL_5ARG(a1, a2, a3, a4, a5);
- if (call >= PAGE_SIZE / sizeof(hypercall_page[0]))
- return -EINVAL;
-
- asm volatile(CALL_NOSPEC
+ asm volatile(__HYPERCALL
: __HYPERCALL_5PARAM
- : [thunk_target] "a" (&hypercall_page[call])
+ : __HYPERCALL_ENTRY(call)
: __HYPERCALL_CLOBBER5);
return (long)__res;
--
2.43.0
From 7fa0da5373685e7ed249af3fa317ab1e1ba8b0a6 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Thu, 17 Oct 2024 15:27:31 +0200
Subject: x86/xen: remove hypercall page
The hypercall page is no longer needed. It can be removed, as from the
Xen perspective it is optional.
But, from Linux's perspective, it removes naked RET instructions that
escape the speculative protections that Call Depth Tracking and/or
Untrain Ret are trying to achieve.
This is part of XSA-466 / CVE-2024-53241.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
arch/x86/include/asm/xen/hypercall.h | 2 --
arch/x86/kernel/callthunks.c | 5 -----
arch/x86/kernel/vmlinux.lds.S | 4 ----
arch/x86/xen/enlighten.c | 2 --
arch/x86/xen/enlighten_hvm.c | 9 +--------
arch/x86/xen/enlighten_pvh.c | 7 -------
arch/x86/xen/xen-head.S | 24 ------------------------
7 files changed, 1 insertion(+), 52 deletions(-)
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 7d5f8ad66774..97771b9d33af 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -88,8 +88,6 @@ struct xen_dm_op_buf;
* there aren't more than 5 arguments...)
*/
-extern struct { char _entry[32]; } hypercall_page[];
-
void xen_hypercall_func(void);
DECLARE_STATIC_CALL(xen_hypercall, xen_hypercall_func);
diff --git a/arch/x86/kernel/callthunks.c b/arch/x86/kernel/callthunks.c
index 465647456753..f17d16607882 100644
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -142,11 +142,6 @@ static bool skip_addr(void *dest)
if (dest >= (void *)relocate_kernel &&
dest < (void*)relocate_kernel + KEXEC_CONTROL_CODE_MAX_SIZE)
return true;
-#endif
-#ifdef CONFIG_XEN
- if (dest >= (void *)hypercall_page &&
- dest < (void*)hypercall_page + PAGE_SIZE)
- return true;
#endif
return false;
}
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index fab3ac9a4574..6a17396c8174 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -519,14 +519,10 @@ INIT_PER_CPU(irq_stack_backing_store);
* linker will never mark as relocatable. (Using just ABSOLUTE() is not
* sufficient for that).
*/
-#ifdef CONFIG_XEN
#ifdef CONFIG_XEN_PV
xen_elfnote_entry_value =
ABSOLUTE(xen_elfnote_entry) + ABSOLUTE(startup_xen);
#endif
-xen_elfnote_hypercall_page_value =
- ABSOLUTE(xen_elfnote_hypercall_page) + ABSOLUTE(hypercall_page);
-#endif
#ifdef CONFIG_PVH
xen_elfnote_phys32_entry_value =
ABSOLUTE(xen_elfnote_phys32_entry) + ABSOLUTE(pvh_start_xen - LOAD_OFFSET);
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 1887435af2fb..43dcd8c7badc 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -22,8 +22,6 @@
#include "xen-ops.h"
-EXPORT_SYMBOL_GPL(hypercall_page);
-
DEFINE_STATIC_CALL(xen_hypercall, xen_hypercall_hvm);
EXPORT_STATIC_CALL_TRAMP(xen_hypercall);
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 973a74fc966a..fe57ff85d004 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -106,15 +106,8 @@ static void __init init_hvm_pv_info(void)
/* PVH set up hypercall page in xen_prepare_pvh(). */
if (xen_pvh_domain())
pv_info.name = "Xen PVH";
- else {
- u64 pfn;
- uint32_t msr;
-
+ else
pv_info.name = "Xen HVM";
- msr = cpuid_ebx(base + 2);
- pfn = __pa(hypercall_page);
- wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
- }
xen_setup_features();
diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index bf68c329fc01..0e3d930bcb89 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -129,17 +129,10 @@ static void __init pvh_arch_setup(void)
void __init xen_pvh_init(struct boot_params *boot_params)
{
- u32 msr;
- u64 pfn;
-
xen_pvh = 1;
xen_domain_type = XEN_HVM_DOMAIN;
xen_start_flags = pvh_start_info.flags;
- msr = cpuid_ebx(xen_cpuid_base() + 2);
- pfn = __pa(hypercall_page);
- wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
-
x86_init.oem.arch_setup = pvh_arch_setup;
x86_init.oem.banner = xen_banner;
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index c173ba6740e9..9252652afe59 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -22,28 +22,6 @@
#include <xen/interface/xen-mca.h>
#include <asm/xen/interface.h>
-.pushsection .noinstr.text, "ax"
- .balign PAGE_SIZE
-SYM_CODE_START(hypercall_page)
- .rept (PAGE_SIZE / 32)
- UNWIND_HINT_FUNC
- ANNOTATE_NOENDBR
- ANNOTATE_UNRET_SAFE
- ret
- /*
- * Xen will write the hypercall page, and sort out ENDBR.
- */
- .skip 31, 0xcc
- .endr
-
-#define HYPERCALL(n) \
- .equ xen_hypercall_##n, hypercall_page + __HYPERVISOR_##n * 32; \
- .type xen_hypercall_##n, @function; .size xen_hypercall_##n, 32
-#include <asm/xen-hypercalls.h>
-#undef HYPERCALL
-SYM_CODE_END(hypercall_page)
-.popsection
-
#ifdef CONFIG_XEN_PV
__INIT
SYM_CODE_START(startup_xen)
@@ -199,8 +177,6 @@ SYM_FUNC_END(xen_hypercall_intel)
#else
# define FEATURES_DOM0 0
#endif
- ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .globl xen_elfnote_hypercall_page;
- xen_elfnote_hypercall_page: _ASM_PTR xen_elfnote_hypercall_page_value - .)
ELFNOTE(Xen, XEN_ELFNOTE_SUPPORTED_FEATURES,
.long FEATURES_PV | FEATURES_PVH | FEATURES_DOM0)
ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz "generic")
--
2.43.0
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: docs/guest-guide: Discuss when not to use a hypercall page
The Linux rethunk and safe-ret speculative safety techniques involve
transforming `ret` to `jmp __x86_return_thunk` at compile time. Placing naked
`ret`s back in executable .text breaks these mitigations.
CET-IBT requires ENDBR instructions, and while we could in principle fix that,
the need to select between ENDBR32 or ENDBR64 means that the contents of the
hypercall page would need to become more mode-specific than it currently
is (HVM hypercall pages are currently 32bit and 64bit compatible). However,
there's no feasible way to make a hypercall page compatible with fine-grain
CFI schemes such as FineIBT.
OSes which care about either of these things are better off avoiding the
hypercall page.
This is part of XSA-466.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
diff --git a/docs/guest-guide/x86/hypercall-abi.rst b/docs/guest-guide/x86/hypercall-abi.rst
index 8004122ca49d..745fbbb64a26 100644
--- a/docs/guest-guide/x86/hypercall-abi.rst
+++ b/docs/guest-guide/x86/hypercall-abi.rst
@@ -82,6 +82,13 @@ The hypercall page is a page of guest RAM into which Xen will write suitable
transfer stubs. It is intended as a convenience for guests, but use of the
hypercall page is not mandatory for making hypercalls to Xen.
+.. note::
+
+ There are cases where a hypercall page should not be used. It contains
+ ``ret`` instructions which are not compatible with certain speculative
+ security techniques, and it does not contain ``endbr`` instructions which
+ are necessary for certain Control-flow Integrity schemes.
+
Creating a hypercall page is an isolated operation from Xen's point of view.
It is the guests responsibility to ensure that the hypercall page, once
written by Xen, is mapped with executable permissions so it may be used.
On Tue, 2024-12-17 at 12:18 +0000, Xen.org security team wrote:
> Xen Security Advisory CVE-2024-53241 / XSA-466
> version 3
>
> Xen hypercall page unsafe against speculative attacks
>
> UPDATES IN VERSION 3
> ====================
>
> Update of patch 5, public release.

Can't we even use the hypercall page early in boot? Surely we have to
know whether we're running on an Intel or AMD CPU before we get to the
point where we can enable any of the new control-flow integrity
support? Do we need to jump through those hoops to do that early
detection and setup?

Enabling the hypercall page is also one of the two points where Xen
will 'latch' that the guest is 64-bit, which affects the layout of the
shared_info, vcpu_info and runstate structures.

The other such latching point is when the guest sets
HVM_PARAM_CALLBACK_IRQ, and I *think* that should work in all
implementations of the Xen ABI (including QEMU/KVM and EC2). But would
want to test.

But perhaps it wouldn't hurt for maximal compatibility for Linux to set
the hypercall page *anyway*, even if Linux doesn't then use it — or
only uses it during early boot?
On 23.12.24 15:24, David Woodhouse wrote:
> On Tue, 2024-12-17 at 12:18 +0000, Xen.org security team wrote:
>> Xen Security Advisory CVE-2024-53241 / XSA-466
>> version 3
>>
>> Xen hypercall page unsafe against speculative attacks
>>
>> UPDATES IN VERSION 3
>> ====================
>>
>> Update of patch 5, public release.
>
> Can't we even use the hypercall page early in boot? Surely we have to
> know whether we're running on an Intel or AMD CPU before we get to the
> point where we can enable any of the new control-flow integrity
> support? Do we need to jump through those hoops do do that early
> detection and setup?

The downside of this approach would be to have another variant to do
hypercalls. So you'd have to replace the variant being able to use AMD
or INTEL specific instructions with a function doing the hypercall via
the hypercall page.

I'm planning to send patches for Xen and the kernel to add CPUID feature
bits indicating which instruction to use. This will make life much easier.

> Enabling the hypercall page is also one of the two points where Xen
> will 'latch' that the guest is 64-bit, which affects the layout of the
> shared_info, vcpu_info and runstate structures.
>
> The other such latching point is when the guest sets
> HVM_PARAM_CALLBACK_IRQ, and I *think* that should work in all
> implementations of the Xen ABI (including QEMU/KVM and EC2). But would
> want to test.
>
> But perhaps it wouldn't hurt for maximal compatibility for Linux to set
> the hypercall page *anyway*, even if Linux doesn't then use it — or
> only uses it during early boot?

I'm seeing potential problems with that approach when someone is using
an out-of-tree module doing hypercalls.

With having the hypercall page present such a module would add a way to do
speculative attacks, while deleting the hypercall page would result in a
failure trying to load such a module.


Juergen
On Thu, 2025-01-02 at 13:07 +0100, Jürgen Groß wrote:
> On 23.12.24 15:24, David Woodhouse wrote:
> > On Tue, 2024-12-17 at 12:18 +0000, Xen.org security team wrote:
> > > Xen Security Advisory CVE-2024-53241 / XSA-466
> > > version 3
> > >
> > > Xen hypercall page unsafe against speculative attacks
> > >
> > > UPDATES IN VERSION 3
> > > ====================
> > >
> > > Update of patch 5, public release.
> >
> > Can't we even use the hypercall page early in boot? Surely we have to
> > know whether we're running on an Intel or AMD CPU before we get to the
> > point where we can enable any of the new control-flow integrity
> > support? Do we need to jump through those hoops do do that early
> > detection and setup?
>
> The downside of this approach would be to have another variant to do
> hypercalls. So you'd have to replace the variant being able to use AMD
> or INTEL specific instructions with a function doing the hypercall via
> the hypercall page.

You'd probably start with the hypercall function just jumping directly
into the temporary hypercall page during early boot, and then you'd
update them to use the natively prepared vmcall/vmmcall version later.

All the complexity of patching and CPU detection in early boot seems to
be somewhat gratuitous and even counter-productive given the change it
introduces to 64-bit latching.

And even if the 64-bit latch does happen when HVM_PARAM_CALLBACK_IRQ is
set, isn't that potentially a lot later in boot? Xen will be treating
this guest as 32-bit until then, so won't all the vcpu_info and
runstate structures be wrong even as the secondary CPUs are already up
and running?

> I'm planning to send patches for Xen and the kernel to add CPUID feature
> bits indicating which instruction to use. This will make life much easier.
>
> > Enabling the hypercall page is also one of the two points where Xen
> > will 'latch' that the guest is 64-bit, which affects the layout of the
> > shared_info, vcpu_info and runstate structures.
> >
> > The other such latching point is when the guest sets
> > HVM_PARAM_CALLBACK_IRQ, and I *think* that should work in all
> > implementations of the Xen ABI (including QEMU/KVM and EC2). But would
> > want to test.
> >
> > But perhaps it wouldn't hurt for maximal compatibility for Linux to set
> > the hypercall page *anyway*, even if Linux doesn't then use it — or
> > only uses it during early boot?
>
> I'm seeing potential problems with that approach when someone is using
> an out-of-tree module doing hypercalls.
>
> With having the hypercall page present such a module would add a way to do
> speculative attacks, while deleting the hypercall page would result in a
> failure trying to load such a module.

Is that a response to the original patch series, or to my suggestion?

If we temporarily ask Xen to populate a hypercall page which is used
during early boot (or even if it's *not* used, and only used to make
sure Xen latches 64-bit mode early)... I don't see why that makes any
difference to modules. I wasn't suggesting we keep it around and
*export* it.
On 02.01.25 13:53, David Woodhouse wrote:
> On Thu, 2025-01-02 at 13:07 +0100, Jürgen Groß wrote:
>> On 23.12.24 15:24, David Woodhouse wrote:
>>> On Tue, 2024-12-17 at 12:18 +0000, Xen.org security team wrote:
>>>> Xen Security Advisory CVE-2024-53241 / XSA-466
>>>> version 3
>>>>
>>>> Xen hypercall page unsafe against speculative attacks
>>>>
>>>> UPDATES IN VERSION 3
>>>> ====================
>>>>
>>>> Update of patch 5, public release.
>>>
>>> Can't we even use the hypercall page early in boot? Surely we have to
>>> know whether we're running on an Intel or AMD CPU before we get to the
>>> point where we can enable any of the new control-flow integrity
>>> support? Do we need to jump through those hoops do do that early
>>> detection and setup?
>>
>> The downside of this approach would be to have another variant to do
>> hypercalls. So you'd have to replace the variant being able to use AMD
>> or INTEL specific instructions with a function doing the hypercall via
>> the hypercall page.
>
> You'd probably start with the hypercall function just jumping directly
> into the temporary hypercall page during early boot, and then you'd
> update them to use the natively prepared vmcall/vmmcall version later.
>
> All the complexity of patching and CPU detection in early boot seems to
> be somewhat gratuitous and even counter-productive given the change it
> introduces to 64-bit latching.
>
> And even if the 64-bit latch does happen when HVM_PARAM_CALLBACK_IRQ is
> set, isn't that potentially a lot later in boot? Xen will be treating
> this guest as 32-bit until then, so won't all the vcpu_info and
> runstate structures be wrong even as the secondary CPUs are already up
> and running?

What I don't get is why this latching isn't done when the shared info
page is mapped into the guest via the XENMAPSPACE_shared_info hypercall
or maybe additionally when VCPUOP_register_runstate_memory_area is being
used by the guest.

These are the earliest possible cases where the guest is able to access
this data.

>
>> I'm planning to send patches for Xen and the kernel to add CPUID feature
>> bits indicating which instruction to use. This will make life much easier.
>>
>>> Enabling the hypercall page is also one of the two points where Xen
>>> will 'latch' that the guest is 64-bit, which affects the layout of the
>>> shared_info, vcpu_info and runstate structures.
>>>
>>> The other such latching point is when the guest sets
>>> HVM_PARAM_CALLBACK_IRQ, and I *think* that should work in all
>>> implementations of the Xen ABI (including QEMU/KVM and EC2). But would
>>> want to test.
>>>
>>> But perhaps it wouldn't hurt for maximal compatibility for Linux to set
>>> the hypercall page *anyway*, even if Linux doesn't then use it — or
>>> only uses it during early boot?
>>
>> I'm seeing potential problems with that approach when someone is using
>> an out-of-tree module doing hypercalls.
>>
>> With having the hypercall page present such a module would add a way to do
>> speculative attacks, while deleting the hypercall page would result in a
>> failure trying to load such a module.
>
> Is that a response to the original patch series, or to my suggestion?
>
> If we temporarily ask Xen to populate a hypercall page which is used
> during early boot (or even if it's *not* used, and only used to make
> sure Xen latches 64-bit mode early)... I don't see why that makes any
> difference to modules. I wasn't suggesting we keep it around and
> *export* it.

Ah, I didn't read your suggestion that way.

Still I believe using the hypercall page is not a good idea, especially as
we'd add a hard dependency on the ability to enable CFI in the kernel related
to the switch from the hypercall page to the new direct hypercall functions.


Juergen
On 02.01.2025 14:38, Jürgen Groß wrote:
> On 02.01.25 13:53, David Woodhouse wrote:
>> On Thu, 2025-01-02 at 13:07 +0100, Jürgen Groß wrote:
>>> On 23.12.24 15:24, David Woodhouse wrote:
>>>> On Tue, 2024-12-17 at 12:18 +0000, Xen.org security team wrote:
>>>>> Xen Security Advisory CVE-2024-53241 / XSA-466
>>>>> version 3
>>>>>
>>>>> Xen hypercall page unsafe against speculative attacks
>>>>>
>>>>> UPDATES IN VERSION 3
>>>>> ====================
>>>>>
>>>>> Update of patch 5, public release.
>>>>
>>>> Can't we even use the hypercall page early in boot? Surely we have to
>>>> know whether we're running on an Intel or AMD CPU before we get to the
>>>> point where we can enable any of the new control-flow integrity
>>>> support? Do we need to jump through those hoops do do that early
>>>> detection and setup?
>>>
>>> The downside of this approach would be to have another variant to do
>>> hypercalls. So you'd have to replace the variant being able to use AMD
>>> or INTEL specific instructions with a function doing the hypercall via
>>> the hypercall page.
>>
>> You'd probably start with the hypercall function just jumping directly
>> into the temporary hypercall page during early boot, and then you'd
>> update them to use the natively prepared vmcall/vmmcall version later.
>>
>> All the complexity of patching and CPU detection in early boot seems to
>> be somewhat gratuitous and even counter-productive given the change it
>> introduces to 64-bit latching.
>>
>> And even if the 64-bit latch does happen when HVM_PARAM_CALLBACK_IRQ is
>> set, isn't that potentially a lot later in boot? Xen will be treating
>> this guest as 32-bit until then, so won't all the vcpu_info and
>> runstate structures be wrong even as the secondary CPUs are already up
>> and running?
>
> What I don't get is why this latching isn't done when the shared info
> page is mapped into the guest via the XENMAPSPACE_shared_info hypercall
> or maybe additionally when VCPUOP_register_runstate_memory_area is being
> used by the guest.

The respective commit (6c13b7b80f02) lacking details, my guess is that
because at that point both operations you mention didn't have
HVM-specific logic (yet), the first HVM-specific operation used by the
PV ("unmodified") drivers was selected. pv-ops (having a different init
sequence) appeared only later, and was then (seemingly) sufficiently
covered by the latching done when the hypercall page was initialized
(which was added a few months after said commit).

Jan
On Thu, 2025-01-02 at 14:38 +0100, Jürgen Groß wrote:
> On 02.01.25 13:53, David Woodhouse wrote:
> > On Thu, 2025-01-02 at 13:07 +0100, Jürgen Groß wrote:
> > > On 23.12.24 15:24, David Woodhouse wrote:
> > > > On Tue, 2024-12-17 at 12:18 +0000, Xen.org security team wrote:
> > > > > Xen Security Advisory CVE-2024-53241 / XSA-466
> > > > > version 3
> > > > >
> > > > > Xen hypercall page unsafe against speculative attacks
> > > > >
> > > > > UPDATES IN VERSION 3
> > > > > ====================
> > > > >
> > > > > Update of patch 5, public release.
> > > >
> > > > Can't we even use the hypercall page early in boot? Surely we have to
> > > > know whether we're running on an Intel or AMD CPU before we get to the
> > > > point where we can enable any of the new control-flow integrity
> > > > support? Do we need to jump through those hoops do do that early
> > > > detection and setup?
> > >
> > > The downside of this approach would be to have another variant to do
> > > hypercalls. So you'd have to replace the variant being able to use AMD
> > > or INTEL specific instructions with a function doing the hypercall via
> > > the hypercall page.
> >
> > You'd probably start with the hypercall function just jumping directly
> > into the temporary hypercall page during early boot, and then you'd
> > update them to use the natively prepared vmcall/vmmcall version later.
> >
> > All the complexity of patching and CPU detection in early boot seems to
> > be somewhat gratuitous and even counter-productive given the change it
> > introduces to 64-bit latching.
> >
> > And even if the 64-bit latch does happen when HVM_PARAM_CALLBACK_IRQ is
> > set, isn't that potentially a lot later in boot? Xen will be treating
> > this guest as 32-bit until then, so won't all the vcpu_info and
> > runstate structures be wrong even as the secondary CPUs are already up
> > and running?
>
> What I don't get is why this latching isn't done when the shared info
> page is mapped into the guest via the XENMAPSPACE_shared_info hypercall
> or maybe additionally when VCPUOP_register_runstate_memory_area is being
> used by the guest.
>
> These are the earliest possible cases where the guest is able to access
> this data.

Well, that's a great idea. Got a time machine? If you have, I have some
comments on the MSI→PIRQ mapping nonsense too... :)

> >
> > > I'm planning to send patches for Xen and the kernel to add CPUID feature
> > > bits indicating which instruction to use. This will make life much easier.
> > >
> > > > Enabling the hypercall page is also one of the two points where Xen
> > > > will 'latch' that the guest is 64-bit, which affects the layout of the
> > > > shared_info, vcpu_info and runstate structures.
> > > >
> > > > The other such latching point is when the guest sets
> > > > HVM_PARAM_CALLBACK_IRQ, and I *think* that should work in all
> > > > implementations of the Xen ABI (including QEMU/KVM and EC2). But would
> > > > want to test.
> > > >
> > > > But perhaps it wouldn't hurt for maximal compatibility for Linux to set
> > > > the hypercall page *anyway*, even if Linux doesn't then use it — or
> > > > only uses it during early boot?
> > >
> > > I'm seeing potential problems with that approach when someone is using
> > > an out-of-tree module doing hypercalls.
> > >
> > > With having the hypercall page present such a module would add a way to do
> > > speculative attacks, while deleting the hypercall page would result in a
> > > failure trying to load such a module.
> >
> > Is that a response to the original patch series, or to my suggestion?
> >
> > If we temporarily ask Xen to populate a hypercall page which is used
> > during early boot (or even if it's *not* used, and only used to make
> > sure Xen latches 64-bit mode early)... I don't see why that makes any
> > difference to modules. I wasn't suggesting we keep it around and
> > *export* it.
>
> Ah, I didn't read your suggestion that way.
>
> Still I believe using the hypercall page is not a good idea, especially as
> we'd add a hard dependency on the ability to enable CFI in the kernel related
> to the switch from the hypercall page to the new direct hypercall functions.

Are you suggesting that you're able to enable the CPU-specific CFI
protections before you even know whether it's an Intel or AMD CPU?
On 02.01.25 14:40, David Woodhouse wrote: > On Thu, 2025-01-02 at 14:38 +0100, Jürgen Groß wrote: >> On 02.01.25 13:53, David Woodhouse wrote: >>> On Thu, 2025-01-02 at 13:07 +0100, Jürgen Groß wrote: >>>> On 23.12.24 15:24, David Woodhouse wrote: >>>>> On Tue, 2024-12-17 at 12:18 +0000, Xen.org security team wrote: >>>>>> Xen Security Advisory CVE-2024-53241 / XSA-466 >>>>>> version 3 >>>>>> >>>>>> Xen hypercall page unsafe against speculative attacks >>>>>> >>>>>> UPDATES IN VERSION 3 >>>>>> ==================== >>>>>> >>>>>> Update of patch 5, public release. >>>>> >>>>> Can't we even use the hypercall page early in boot? Surely we have to >>>>> know whether we're running on an Intel or AMD CPU before we get to the >>>>> point where we can enable any of the new control-flow integrity >>>>> support? Do we need to jump through those hoops do do that early >>>>> detection and setup? >>>> >>>> The downside of this approach would be to have another variant to do >>>> hypercalls. So you'd have to replace the variant being able to use AMD >>>> or INTEL specific instructions with a function doing the hypercall via >>>> the hypercall page. >>> >>> You'd probably start with the hypercall function just jumping directly >>> into the temporary hypercall page during early boot, and then you'd >>> update them to use the natively prepared vmcall/vmmcall version later. >>> >>> All the complexity of patching and CPU detection in early boot seems to >>> be somewhat gratuitous and even counter-productive given the change it >>> introduces to 64-bit latching. >>> >>> And even if the 64-bit latch does happen when HVM_PARAM_CALLBACK_IRQ is >>> set, isn't that potentially a lot later in boot? Xen will be treating >>> this guest as 32-bit until then, so won't all the vcpu_info and >>> runstate structures be wrong even as the secondary CPUs are already up >>> and running? >> >> What I don't get is why this latching isn't done when the shared info >> page is mapped into the guest via the XENMAPSPACE_shared_info hypercall >> or maybe additionally when VCPUOP_register_runstate_memory_area is being >> used by the guest. >> >> These are the earliest possible cases where the guest is able to access >> this data. > > Well, that's a great idea. Got a time machine? If you have, I have some > comments on the MSI→PIRQ mapping nonsense too... :) > >>> >>>> I'm planning to send patches for Xen and the kernel to add CPUID feature >>>> bits indicating which instruction to use. This will make life much easier. >>>> >>>>> Enabling the hypercall page is also one of the two points where Xen >>>>> will 'latch' that the guest is 64-bit, which affects the layout of the >>>>> shared_info, vcpu_info and runstate structures. >>>>> >>>>> The other such latching point is when the guest sets >>>>> HVM_PARAM_CALLBACK_IRQ, and I *think* that should work in all >>>>> implementations of the Xen ABI (including QEMU/KVM and EC2). But would >>>>> want to test. >>>>> >>>>> But perhaps it wouldn't hurt for maximal compatibility for Linux to set >>>>> the hypercall page *anyway*, even if Linux doesn't then use it — or >>>>> only uses it during early boot? >>>> >>>> I'm seeing potential problems with that approach when someone is using >>>> an out-of-tree module doing hypercalls. >>>> >>>> With having the hypercall page present such a module would add a way to do >>>> speculative attacks, while deleting the hypercall page would result in a >>>> failure trying to load such a module. 
>>>
>>> Is that a response to the original patch series, or to my suggestion?
>>>
>>> If we temporarily ask Xen to populate a hypercall page which is used
>>> during early boot (or even if it's *not* used, and only used to make
>>> sure Xen latches 64-bit mode early)... I don't see why that makes any
>>> difference to modules. I wasn't suggesting we keep it around and
>>> *export* it.
>>
>> Ah, I didn't read your suggestion that way.
>>
>> Still I believe using the hypercall page is not a good idea, especially as
>> we'd add a hard dependency on the ability to enable CFI in the kernel related
>> to the switch from the hypercall page to the new direct hypercall functions.
>
> Are you suggesting that you're able to enable the CPU-specific CFI
> protections before you even know whether it's an Intel or AMD CPU?

Not before that, but maybe rather soon afterwards. And the hypercall page
needs to be decommissioned before the next hypercall is happening. The question
is whether we have a hook in place to do that switch between cpu identification
and CFI enabling.


Juergen
On Thu, 2025-01-02 at 15:02 +0100, Jürgen Groß wrote:
> > Are you suggesting that you're able to enable the CPU-specific CFI
> > protections before you even know whether it's an Intel or AMD CPU?
>
> Not before that, but maybe rather soon afterwards. And the hypercall page
> needs to be decommissioned before the next hypercall is happening. The question
> is whether we have a hook in place to do that switch between cpu identification
> and CFI enabling.

Not sure that's how I'd phrase it. Even if we have to add a hook at the
right time to switch from the Xen-populated hypercall page to the one
filled in by Linux, the question is whether adding that hook is simpler
than all this early static_call stuff that's been thrown together, and
the open questions about the 64-bit latching.
On 02.01.25 15:06, David Woodhouse wrote:
> On Thu, 2025-01-02 at 15:02 +0100, Jürgen Groß wrote:
>>> Are you suggesting that you're able to enable the CPU-specific CFI
>>> protections before you even know whether it's an Intel or AMD CPU?
>>
>> Not before that, but maybe rather soon afterwards. And the hypercall page
>> needs to be decommissioned before the next hypercall is happening. The question
>> is whether we have a hook in place to do that switch between cpu identification
>> and CFI enabling.
>
> Not sure that's how I'd phrase it. Even if we have to add a hook at the
> right time to switch from the Xen-populated hypercall page to the one
> filled in by Linux, the question is whether adding that hook is simpler
> than all this early static_call stuff that's been thrown together, and
> the open questions about the 64-bit latching.

This is a valid question, yes. My first version of these patches didn't
work with static_call, but used the paravirt call patching mechanism
replacing an indirect call with a direct one via ALTERNATIVEs. That
version was disliked by some involved x86 maintainers, resulting in the
addition of the early static_call update mechanism.

One thing to mention regarding the 64-bit latching: what would you do
with HVM domains? Those are setting up the hypercall page rather late.
In case the kernel would use CFI, enabling would happen way before the
guest would issue any hypercall, so I guess the latching needs to happen
by other means anyway. Or would you want to register the hypercall page
without ever intending to use it?


Juergen
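For readers not familiar with the mechanism being discussed, here is a rough,
hypothetical sketch of the shape a static_call-based hypercall path can take.
It is not the actual patch series; all function names are invented for
illustration, and the real series additionally needs the "early" static_call
update mechanism mentioned above because the vendor-specific update can be
wanted before the normal static_call machinery is fully initialised.

/*
 * Illustrative sketch only: route hypercalls through a static_call that
 * initially points at a generic entry stub and is re-pointed at the
 * vendor-specific direct-call variant once the CPU has been identified.
 */
#include <linux/init.h>
#include <linux/static_call.h>

/* Direct-call hypercall entry points, e.g. implemented in assembly. */
extern long xen_hypercall_intel(unsigned int op, unsigned long a1,
				unsigned long a2);	/* VMCALL  */
extern long xen_hypercall_amd(unsigned int op, unsigned long a1,
			      unsigned long a2);	/* VMMCALL */
extern long xen_hypercall_generic(unsigned int op, unsigned long a1,
				  unsigned long a2);	/* early fallback */

DEFINE_STATIC_CALL(my_xen_hypercall, xen_hypercall_generic);

/* Called once the CPU vendor is known. */
static void __init my_xen_select_hypercall(bool amd_like)
{
	static_call_update(my_xen_hypercall,
			   amd_like ? xen_hypercall_amd : xen_hypercall_intel);
}

/* All hypercall wrappers then funnel through the patchable direct call. */
static inline long my_xen_hypercall2(unsigned int op,
				     unsigned long a1, unsigned long a2)
{
	return static_call(my_xen_hypercall)(op, a1, a2);
}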
On Thu, 2025-01-02 at 15:16 +0100, Jürgen Groß wrote: > On 02.01.25 15:06, David Woodhouse wrote: > > On Thu, 2025-01-02 at 15:02 +0100, Jürgen Groß wrote: > > > > Are you suggesting that you're able to enable the CPU-specific CFI > > > > protections before you even know whether it's an Intel or AMD CPU? > > > > > > Not before that, but maybe rather soon afterwards. And the hypercall page > > > needs to be decommissioned before the next hypercall is happening. The question > > > is whether we have a hook in place to do that switch between cpu identification > > > and CFI enabling. > > > > Not sure that's how I'd phrase it. Even if we have to add a hook at the > > right time to switch from the Xen-populated hypercall page to the one > > filled in by Linux, the question is whether adding that hook is simpler > > than all this early static_call stuff that's been thrown together, and > > the open questions about the 64-bit latching. > > This is a valid question, yes. My first version of these patches didn't > work with static_call, but used the paravirt call patching mechanism > replacing an indirect call with a direct one via ALTERNATIVEs. That > version was disliked by some involved x86 maintainers, resulting in the > addition of the early static_call update mechanism. > > One thing to mention regarding the 64-bit latching: what would you do > with HVM domains? Those are setting up the hypercall page rather late. > In case the kernel would use CFI, enabling would happen way before the > guest would issue any hypercall, so I guess the latching needs to happen > by other means anyway. Or would you want to register the hypercall page > without ever intending to use it? With xen_no_vector_callback on the command line, the hypervisor doesn't realise that the guest is 64-bit until long after all the CPUs are brought up. It does boot (and hey, QEMU does get this right!) but I'm still concerned that all those shared structures are 32-bit for that long. I do think the guest kernel should either set the hypercall page, or HVM_PARAM_CALLBACK_IRQ, as early as possible. $ ./qemu-system-x86_64 -display none -vga none -serial mon:stdio -accel kvm,xen-version=0x4000a,kernel-irqchip=split -smp 2 -kernel ~/git/linux-2.6/arch/x86/boot/bzImage -append 'root=/dev/xvda1 console=ttyS0 xen_no_vector_callback earlyprintk=serial' -drive file=/var/lib/libvirt/images/fedora28.qcow2,if=xen -trace kvm_xen\* -m 2g kvm_xen_soft_reset kvm_xen_set_vcpu_attr vcpu attr cpu 0 type 0 gpa 0xffffffffffffffff kvm_xen_set_vcpu_attr vcpu attr cpu 0 type 1 gpa 0xffffffffffffffff kvm_xen_set_vcpu_attr vcpu attr cpu 0 type 2 gpa 0xffffffffffffffff kvm_xen_set_vcpu_callback callback vcpu 0 vector 0 kvm_xen_set_vcpu_attr vcpu attr cpu 1 type 0 gpa 0xffffffffffffffff kvm_xen_set_vcpu_attr vcpu attr cpu 1 type 1 gpa 0xffffffffffffffff kvm_xen_set_vcpu_attr vcpu attr cpu 1 type 2 gpa 0xffffffffffffffff kvm_xen_set_vcpu_callback callback vcpu 1 vector 0 Probing EDD (edd=off to disable)... 
ok [ 0.000000] Linux version 6.13.0-rc2+ (dwoodhou@i7.infradead.org) (gcc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-3), GNU ld version 2.43.1-2.fc41) #2210 SMP PREEMPT_DYNAMIC Mon Jan 6 17:10:02 GMT 2025 [ 0.000000] Command line: root=/dev/xvda1 console=ttyS0 xen_no_vector_callback earlyprintk=serial [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffdffff] usable [ 0.000000] BIOS-e820: [mem 0x000000007ffe0000-0x000000007fffffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000feff8000-0x00000000feffffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved [ 0.000000] printk: legacy bootconsole [earlyser0] enabled [ 0.000000] NX (Execute Disable) protection: active [ 0.000000] APIC: Static calls initialized [ 0.000000] SMBIOS 2.8 present. [ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 0.000000] DMI: Memory slots populated: 1/1 [ 0.000000] Hypervisor detected: Xen HVM [ 0.000000] Xen version 4.10. kvm_xen_hypercall xen_hypercall: cpu 0 cpl 0 input 17 a0 0x6 a1 0xffffffff8a003e30 a2 0x0 ret 0x0 kvm_xen_set_shared_info shared info at gfn 0x10 kvm_xen_hypercall xen_hypercall: cpu 0 cpl 0 input 12 a0 0x7 a1 0xffffffff8a003e30 a2 0x8000000000000163 ret 0x0 kvm_xen_set_vcpu_attr vcpu attr cpu 0 type 0 gpa 0x10000 kvm_xen_set_vcpu_attr vcpu attr cpu 1 type 0 gpa 0x10040 [ 0.000000] platform_pci_unplug: Netfront and the Xen platform PCI driver have been compiled for this kernel: unplug emulated NICs. [ 0.000000] platform_pci_unplug: Blkfront and the Xen platform PCI driver have been compiled for this kernel: unplug emulated disks. [ 0.000000] You might have to change the root device [ 0.000000] from /dev/hd[a-d] to /dev/xvd[a-d] [ 0.000000] in your root= kernel command line option kvm_xen_hypercall xen_hypercall: cpu 0 cpl 0 input 34 a0 0x9 a1 0xffffffff8a003e38 a2 0x0 ret 0xffffffffffffffda [ 0.000000] last_pfn = 0x7ffe0 max_arch_pfn = 0x400000000 [ 0.000000] MTRR map: 4 entries (3 fixed + 1 variable; max 19), built from 8 variable MTRRs [ 0.000000] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT Memory KASLR using RDTSC... 
[ 0.000000] found SMP MP-table at [mem 0x000f5480-0x000f548f] [ 0.000000] ACPI: Early table checksum verification disabled [ 0.000000] ACPI: RSDP 0x00000000000F52A0 000014 (v00 BOCHS ) [ 0.000000] ACPI: RSDT 0x000000007FFE2379 000034 (v01 BOCHS BXPC 00000001 BXPC 00000001) [ 0.000000] ACPI: FACP 0x000000007FFE2225 000074 (v01 BOCHS BXPC 00000001 BXPC 00000001) [ 0.000000] ACPI: DSDT 0x000000007FFE0040 0021E5 (v01 BOCHS BXPC 00000001 BXPC 00000001) [ 0.000000] ACPI: FACS 0x000000007FFE0000 000040 [ 0.000000] ACPI: APIC 0x000000007FFE2299 000080 (v03 BOCHS BXPC 00000001 BXPC 00000001) [ 0.000000] ACPI: HPET 0x000000007FFE2319 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001) [ 0.000000] ACPI: WAET 0x000000007FFE2351 000028 (v01 BOCHS BXPC 00000001 BXPC 00000001) [ 0.000000] ACPI: Reserving FACP table memory at [mem 0x7ffe2225-0x7ffe2298] [ 0.000000] ACPI: Reserving DSDT table memory at [mem 0x7ffe0040-0x7ffe2224] [ 0.000000] ACPI: Reserving FACS table memory at [mem 0x7ffe0000-0x7ffe003f] [ 0.000000] ACPI: Reserving APIC table memory at [mem 0x7ffe2299-0x7ffe2318] [ 0.000000] ACPI: Reserving HPET table memory at [mem 0x7ffe2319-0x7ffe2350] [ 0.000000] ACPI: Reserving WAET table memory at [mem 0x7ffe2351-0x7ffe2378] [ 0.000000] No NUMA configuration found [ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000007ffdffff] [ 0.000000] NODE_DATA(0) allocated [mem 0x7ffb4c00-0x7ffdffff] [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff] [ 0.000000] DMA32 [mem 0x0000000001000000-0x000000007ffdffff] [ 0.000000] Normal empty [ 0.000000] Device empty [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff] [ 0.000000] node 0: [mem 0x0000000000100000-0x000000007ffdffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000007ffdffff] [ 0.000000] On node 0, zone DMA: 1 pages in unavailable ranges [ 0.000000] On node 0, zone DMA: 97 pages in unavailable ranges [ 0.000000] On node 0, zone DMA32: 32 pages in unavailable ranges [ 0.000000] ACPI: PM-Timer IO Port: 0x608 [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1]) [ 0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23 [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level) [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level) [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level) [ 0.000000] ACPI: Using ACPI (MADT) for SMP configuration information [ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000 [ 0.000000] CPU topo: Max. logical packages: 1 [ 0.000000] CPU topo: Max. logical dies: 1 [ 0.000000] CPU topo: Max. dies per package: 1 [ 0.000000] CPU topo: Max. threads per core: 1 [ 0.000000] CPU topo: Num. cores per package: 2 [ 0.000000] CPU topo: Num. 
threads per package: 2 [ 0.000000] CPU topo: Allowing 2 present CPUs plus 0 hotplug CPUs [ 0.000000] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff] [ 0.000000] PM: hibernation: Registered nosave memory: [mem 0x0009f000-0x0009ffff] [ 0.000000] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000effff] [ 0.000000] PM: hibernation: Registered nosave memory: [mem 0x000f0000-0x000fffff] [ 0.000000] [mem 0x80000000-0xfeff7fff] available for PCI devices [ 0.000000] Booting paravirtualized kernel on Xen HVM [ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns [ 0.000000] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1 [ 0.000000] percpu: Embedded 69 pages/cpu s245760 r8192 d28672 u1048576 kvm_xen_hypercall xen_hypercall: cpu 0 cpl 0 input 24 a0 0xa a1 0x0 a2 0xffffffff8a003e90 ret 0x0 kvm_xen_set_vcpu_attr vcpu attr cpu 0 type 0 gpa 0x7dc37c40 [ 0.000000] Kernel command line: root=/dev/xvda1 console=ttyS0 xen_no_vector_callback earlyprintk=serial [ 0.000000] printk: log buffer data + meta data: 262144 + 917504 = 1179648 bytes [ 0.000000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear) [ 0.000000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear) [ 0.000000] Fallback order for Node 0: 0 [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 524158 [ 0.000000] Policy zone: DMA32 [ 0.000000] mem auto-init: stack:all(zero), heap alloc:on, heap free:off [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1 [ 0.000000] Kernel/User page tables isolation: enabled Poking KASLR using RDTSC... [ 0.000000] ftrace: allocating 57063 entries in 223 pages [ 0.000000] ftrace: allocated 223 pages with 7 groups [ 0.000000] Dynamic Preempt: voluntary [ 0.000000] Running RCU self tests [ 0.000000] Running RCU synchronous self tests [ 0.000000] rcu: Preemptible hierarchical RCU implementation. [ 0.000000] rcu: RCU lockdep checking is enabled. [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=2. [ 0.000000] Trampoline variant of Tasks RCU enabled. [ 0.000000] Rude variant of Tasks RCU enabled. [ 0.000000] Tracing variant of Tasks RCU enabled. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies. [ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2 [ 0.000000] Running RCU synchronous self tests [ 0.000000] RCU Tasks: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=2. [ 0.000000] RCU Tasks Rude: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=2. [ 0.000000] RCU Tasks Trace: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=2. [ 0.000000] NR_IRQS: 524544, nr_irqs: 440, preallocated irqs: 16 kvm_xen_hypercall xen_hypercall: cpu 0 cpl 0 input 32 a0 0xb a1 0xffffffff8a003e10 a2 0x0 ret 0xffffffffffffffda [ 0.000000] xen:events: Using 2-level ABI [ 0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention. [ 0.000000] kfence: initialized - using 2097152 bytes for 255 objects at 0x(____ptrval____)-0x(____ptrval____) [ 0.000000] Console: colour *CGA 80x25 kvm_xen_hypercall xen_hypercall: cpu 0 cpl 0 input 34 a0 0x1 a1 0xffffffff8a003e90 a2 0x7ff0 ret 0xffffffffffffffea [ 0.000000] Cannot get hvm parameter CONSOLE_EVTCHN (18): -22! 
[ 0.000000] printk: legacy console [ttyS0] enabled [ 0.000000] printk: legacy console [ttyS0] enabled [ 0.000000] printk: legacy bootconsole [earlyser0] disabled [ 0.000000] printk: legacy bootconsole [earlyser0] disabled [ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar [ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8 [ 0.000000] ... MAX_LOCK_DEPTH: 48 [ 0.000000] ... MAX_LOCKDEP_KEYS: 8192 [ 0.000000] ... CLASSHASH_SIZE: 4096 [ 0.000000] ... MAX_LOCKDEP_ENTRIES: 32768 [ 0.000000] ... MAX_LOCKDEP_CHAINS: 65536 [ 0.000000] ... CHAINHASH_SIZE: 32768 [ 0.000000] memory used by lock dependency info: 6429 kB [ 0.000000] memory used for stack traces: 4224 kB [ 0.000000] per task-struct memory footprint: 1920 bytes [ 0.000000] ACPI: Core revision 20240827 [ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns [ 0.001000] APIC: Switch to symmetric I/O mode setup [ 0.002000] x2apic enabled [ 0.004000] APIC: Switched APIC routing to: physical x2apic [ 0.004000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 [ 0.012000] tsc: Unable to calibrate against PIT [ 0.013000] tsc: using HPET reference calibration [ 0.013000] tsc: Detected 2592.897 MHz processor [ 0.000012] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2560065444d, max_idle_ns: 440795301426 ns [ 0.001491] Calibrating delay loop (skipped), value calculated using timer frequency.. 5185.79 BogoMIPS (lpj=2592897) [ 0.002655] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.003489] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 [ 0.004222] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization [ 0.004490] Spectre V2 : Mitigation: Retpolines [ 0.005070] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch [ 0.005489] Spectre V2 : Spectre v2 / SpectreRSB : Filling RSB on VMEXIT [ 0.006489] Speculative Store Bypass: Vulnerable [ 0.007489] MDS: Vulnerable: Clear CPU buffers attempted, no microcode [ 0.008489] MMIO Stale Data: Unknown: No mitigations [ 0.009133] x86/fpu: x87 FPU will use FXSAVE [ 0.038149] Freeing SMP alternatives memory: 48K [ 0.038496] pid_max: default: 32768 minimum: 301 [ 0.039991] LSM: initializing lsm=lockdown,capability,yama,selinux,bpf,landlock,ima,evm [ 0.040774] Yama: becoming mindful. [ 0.041533] SELinux: Initializing. [ 0.044715] LSM support for eBPF active [ 0.045317] landlock: Up and running. [ 0.045832] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear) [ 0.046494] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear) [ 0.049687] Running RCU synchronous self tests [ 0.050288] Running RCU synchronous self tests [ 0.153191] smpboot: CPU0: Intel QEMU Virtual CPU version 2.5+ (family: 0xf, model: 0x6b, stepping: 0x1) [ 0.154918] Running RCU Tasks wait API self tests [ 0.258712] Running RCU Tasks Rude wait API self tests [ 0.259499] Running RCU Tasks Trace wait API self tests [ 0.260607] Performance Events: unsupported Netburst CPU model 107 no PMU driver, software events only. [ 0.261553] signal: max sigframe size: 1440 [ 0.262654] rcu: Hierarchical SRCU implementation. [ 0.263493] rcu: Max phase no-delay instances is 400. [ 0.264678] Timer migration: 1 hierarchy levels; 8 children per group; 1 crossnode level [ 0.266509] Callback from call_rcu_tasks_trace() invoked. [ 0.267676] NMI watchdog: Perf NMI watchdog permanently disabled [ 0.268688] smp: Bringing up secondary CPUs ... 
kvm_xen_hypercall xen_hypercall: cpu 0 cpl 0 input 24 a0 0xa a1 0x1 a2 0xffffa662c0013c90 ret 0x0 kvm_xen_set_vcpu_attr vcpu attr cpu 1 type 0 gpa 0x7dd37c40 [ 0.270151] smpboot: x86: Booting SMP configuration: [ 0.270501] .... node #0, CPUs: #1 [ 0.283862] smp: Brought up 1 node, 2 CPUs [ 0.285133] smpboot: Total of 2 processors activated (10371.58 BogoMIPS) [ 0.286883] Memory: 1985476K/2096632K available (22528K kernel code, 4967K rwdata, 10120K rodata, 5116K init, 16104K bss, 105184K reserved, 0K cma-reserved) [ 0.289089] devtmpfs: initialized [ 0.290565] x86/mm: Memory block size: 128MB [ 0.295795] Running RCU synchronous self tests [ 0.296526] Running RCU synchronous self tests [ 0.298579] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns [ 0.300510] futex hash table entries: 512 (order: 4, 65536 bytes, linear) [ 0.303381] pinctrl core: initialized pinctrl subsystem [ 0.304923] PM: RTC time: 17:13:37, date: 2025-01-06 [ 0.307951] NET: Registered PF_NETLINK/PF_ROUTE protocol family [ 0.310030] DMA: preallocated 256 KiB GFP_KERNEL pool for atomic allocations [ 0.310667] DMA: preallocated 256 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations [ 0.312515] DMA: preallocated 256 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations [ 0.313836] audit: initializing netlink subsys (disabled) kvm_xen_hypercall xen_hypercall: cpu 0 cpl 0 input 34 a0 0x1 a1 0xffffa662c0013d90 a2 0x0 ret 0x0 kvm_xen_hypercall xen_hypercall: cpu 0 cpl 0 input 34 a0 0x1 a1 0xffffa662c0013d90 a2 0x0 ret 0x0 [ 0.314652] audit: type=2000 audit(1736183617.328:1): state=initialized audit_enabled=0 res=1 [ 0.315932] thermal_sys: Registered thermal governor 'fair_share' [ 0.316492] thermal_sys: Registered thermal governor 'bang_bang' [ 0.317508] thermal_sys: Registered thermal governor 'step_wise' [ 0.319506] thermal_sys: Registered thermal governor 'user_space' [ 0.321642] cpuidle: using governor menu [ 0.325622] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5 [ 0.329027] PCI: Using configuration type 1 for base access [ 0.330824] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible. 
[ 0.339561] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages [ 0.340495] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page [ 0.363514] cryptd: max_cpu_qlen set to 1000 [ 0.365598] raid6: skipped pq benchmark and selected sse2x4 [ 0.366469] raid6: using intx1 recovery algorithm [ 0.368039] ACPI: Added _OSI(Module Device) [ 0.368493] ACPI: Added _OSI(Processor Device) [ 0.369190] ACPI: Added _OSI(3.0 _SCP Extensions) [ 0.369492] ACPI: Added _OSI(Processor Aggregator Device) [ 0.373105] ACPI: 1 ACPI AML tables successfully acquired and loaded [ 0.376550] ACPI: Interpreter enabled [ 0.377182] ACPI: PM: (supports S0 S3 S4 S5) [ 0.377507] ACPI: Using IOAPIC for interrupt routing [ 0.378972] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug [ 0.379491] PCI: Using E820 reservations for host bridge windows [ 0.381586] ACPI: Enabled 2 GPEs in block 00 to 0F [ 0.390118] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff]) [ 0.390499] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI EDR HPX-Type3] [ 0.391493] acpi PNP0A03:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI] [ 0.393553] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended configuration space under this bridge [ 0.397001] acpiphp: Slot [2] registered [ 0.397539] acpiphp: Slot [3] registered [ 0.398194] acpiphp: Slot [4] registered [ 0.398530] acpiphp: Slot [5] registered [ 0.399528] acpiphp: Slot [6] registered [ 0.400150] acpiphp: Slot [7] registered [ 0.400534] acpiphp: Slot [8] registered [ 0.401144] acpiphp: Slot [9] registered [ 0.401530] acpiphp: Slot [10] registered [ 0.402528] acpiphp: Slot [11] registered [ 0.403185] acpiphp: Slot [12] registered [ 0.403530] acpiphp: Slot [13] registered [ 0.404534] acpiphp: Slot [14] registered [ 0.405193] acpiphp: Slot [15] registered [ 0.405528] acpiphp: Slot [16] registered [ 0.406534] acpiphp: Slot [17] registered [ 0.407208] acpiphp: Slot [18] registered [ 0.407535] acpiphp: Slot [19] registered [ 0.408533] acpiphp: Slot [20] registered [ 0.409175] acpiphp: Slot [21] registered [ 0.409528] acpiphp: Slot [22] registered [ 0.410531] acpiphp: Slot [23] registered [ 0.411183] acpiphp: Slot [24] registered [ 0.411528] acpiphp: Slot [25] registered [ 0.412494] acpiphp: Slot [26] registered [ 0.413150] acpiphp: Slot [27] registered [ 0.413528] acpiphp: Slot [28] registered [ 0.414197] acpiphp: Slot [29] registered [ 0.414529] acpiphp: Slot [30] registered [ 0.415536] acpiphp: Slot [31] registered [ 0.416192] PCI host bridge to bus 0000:00 [ 0.416499] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window] [ 0.417492] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window] [ 0.418491] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window] [ 0.419492] pci_bus 0000:00: root bus resource [mem 0x80000000-0xfebfffff window] [ 0.420491] pci_bus 0000:00: root bus resource [mem 0x100000000-0x17fffffff window] [ 0.422492] pci_bus 0000:00: root bus resource [bus 00-ff] [ 0.423427] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000 conventional PCI endpoint [ 0.425569] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100 conventional PCI endpoint [ 0.428423] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180 conventional PCI endpoint [ 0.433418] pci 0000:00:01.1: BAR 4 [io 0xc100-0xc10f] [ 0.436000] pci 0000:00:01.1: BAR 0 [io 0x01f0-0x01f7]: legacy IDE quirk [ 0.436493] pci 0000:00:01.1: BAR 1 [io 0x03f6]: legacy IDE quirk [ 0.437492] pci 0000:00:01.1: BAR 
2 [io 0x0170-0x0177]: legacy IDE quirk [ 0.438491] pci 0000:00:01.1: BAR 3 [io 0x0376]: legacy IDE quirk [ 0.440728] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000 conventional PCI endpoint [ 0.442546] pci 0000:00:01.3: quirk: [io 0x0600-0x063f] claimed by PIIX4 ACPI [ 0.443503] pci 0000:00:01.3: quirk: [io 0x0700-0x070f] claimed by PIIX4 SMB [ 0.444900] pci 0000:00:02.0: [5853:0001] type 00 class 0xff8000 conventional PCI endpoint [ 0.446744] pci 0000:00:02.0: BAR 0 [io 0xc000-0xc0ff] [ 0.447993] pci 0000:00:02.0: BAR 1 [mem 0xfd000000-0xfdffffff pref] [ 0.455675] ACPI: PCI: Interrupt link LNKA configured for IRQ 10 [ 0.456685] ACPI: PCI: Interrupt link LNKB configured for IRQ 10 [ 0.459712] ACPI: PCI: Interrupt link LNKC configured for IRQ 11 [ 0.460680] ACPI: PCI: Interrupt link LNKD configured for IRQ 11 [ 0.461576] ACPI: PCI: Interrupt link LNKS configured for IRQ 9 [ 0.463986] xen:balloon: Initialising balloon driver [ 0.465678] iommu: Default domain type: Translated [ 0.466501] iommu: DMA domain TLB invalidation policy: lazy mode [ 0.468566] Callback from call_rcu_tasks() invoked. [ 0.469004] SCSI subsystem initialized [ 0.470795] ACPI: bus type USB registered [ 0.472546] usbcore: registered new interface driver usbfs [ 0.473436] usbcore: registered new interface driver hub [ 0.473525] usbcore: registered new device driver usb [ 0.474597] pps_core: LinuxPPS API ver. 1 registered [ 0.475493] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it> [ 0.476501] PTP clock support registered [ 0.477636] EDAC MC: Ver: 3.0.0 [ 0.481290] NetLabel: Initializing [ 0.481501] NetLabel: domain hash size = 128 [ 0.482500] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO [ 0.483585] NetLabel: unlabeled traffic allowed by default [ 0.484524] mctp: management component transport protocol core [ 0.485492] NET: Registered PF_MCTP protocol family [ 0.486540] PCI: Using ACPI for IRQ routing [ 0.487795] vgaarb: loaded [ 0.488698] hpet: 3 channels of 0 reserved for per-cpu timers [ 0.489549] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 [ 0.490496] hpet0: 3 comparators, 64-bit 100.000000 MHz counter [ 0.494755] clocksource: Switched to clocksource tsc-early [ 0.501201] VFS: Disk quotas dquot_6.6.0 [ 0.501853] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 0.503199] pnp: PnP ACPI init [ 0.504524] pnp: PnP ACPI: found 6 devices [ 0.519976] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns [ 0.521639] NET: Registered PF_INET protocol family [ 0.522542] IP idents hash table entries: 32768 (order: 6, 262144 bytes, linear) [ 0.524887] tcp_listen_portaddr_hash hash table entries: 1024 (order: 4, 73728 bytes, linear) [ 0.526620] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear) [ 0.528105] TCP established hash table entries: 16384 (order: 5, 131072 bytes, linear) [ 0.532557] TCP bind hash table entries: 16384 (order: 9, 2359296 bytes, linear) [ 0.536739] TCP: Hash tables configured (established 16384 bind 16384) [ 0.539128] MPTCP token hash table entries: 2048 (order: 5, 180224 bytes, linear) [ 0.540371] UDP hash table entries: 1024 (order: 6, 262144 bytes, linear) [ 0.541517] UDP-Lite hash table entries: 1024 (order: 6, 262144 bytes, linear) [ 0.542936] NET: Registered PF_UNIX/PF_LOCAL protocol family [ 0.543851] NET: Registered PF_XDP protocol family [ 0.544592] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window] [ 0.545510] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window] [ 0.546432] pci_bus 
0000:00: resource 6 [mem 0x000a0000-0x000bffff window] [ 0.547435] pci_bus 0000:00: resource 7 [mem 0x80000000-0xfebfffff window] [ 0.548457] pci_bus 0000:00: resource 8 [mem 0x100000000-0x17fffffff window] [ 0.549597] pci 0000:00:01.0: PIIX3: Enabling Passive Release [ 0.550482] pci 0000:00:00.0: Limiting direct PCI/PCI transfers [ 0.551425] PCI: CLS 0 bytes, default 64 [ 0.554908] Initialise system trusted keyrings [ 0.556537] Key type blacklist registered [ 0.558226] workingset: timestamp_bits=36 max_order=19 bucket_order=0 [ 0.560381] zbud: loaded [ 0.566463] integrity: Platform Keyring initialized [ 0.590614] NET: Registered PF_ALG protocol family [ 0.591432] xor: measuring software checksum speed [ 0.592372] prefetch64-sse : 22503 MB/sec [ 0.593475] generic_sse : 15243 MB/sec [ 0.594241] xor: using function: prefetch64-sse (22503 MB/sec) [ 0.595246] Key type asymmetric registered [ 0.595958] Asymmetric key parser 'x509' registered [ 0.599978] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245) [ 0.602692] io scheduler mq-deadline registered [ 0.603828] io scheduler kyber registered [ 0.604967] io scheduler bfq registered [ 0.610614] atomic64_test: passed for x86-64 platform with CX8 and with SSE [ 0.612803] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4 [ 0.614863] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 [ 0.617292] ACPI: button: Power Button [PWRF] [ 0.622286] ACPI: \_SB_.LNKB: Enabled at IRQ 10 Set long mode 1 kvm_xen_hypercall xen_hypercall: cpu 0 cpl 0 input 34 a0 0x0 a1 0xffffa662c0013c10 a2 0x0 ret 0x0 kvm_xen_hypercall xen_hypercall: cpu 1 cpl 0 input 20 a0 0x6 a1 0xffffa662c0013c18 a2 0x1 ret 0x0 kvm_xen_hypercall xen_hypercall: cpu 1 cpl 0 input 20 a0 0x6 a1 0xffffa662c0013c18 a2 0x1 ret 0x0 kvm_xen_hypercall xen_hypercall: cpu 1 cpl 0 input 20 a0 0x6 a1 0xffffa662c0013c00 a2 0x1 ret 0x0 kvm_xen_hypercall xen_hypercall: cpu 1 cpl 0 input 20 a0 0x8 a1 0xffffa662c0013ba8 a2 0x1 ret 0x0 [ 0.628919] xen:grant_table: Grant tables using version 1 layout kvm_xen_hypercall xen_hypercall: cpu 1 cpl 0 input 20 a0 0x6 a1 0xffffa662c0013be0 a2 0x1 ret 0x0 kvm_xen_hypercall xen_hypercall: cpu 1 cpl 0 input 20 a0 0x6 a1 0xffffa662c0013bd8 a2 0x1 ret 0x0 kvm_xen_hypercall xen_hypercall: cpu 1 cpl 0 input 12 a0 0x7 a1 0xffffa662c0013bc0 a2 0x7ff0 ret 0x0 [ 0.631024] Grant table initialized kvm_xen_hypercall xen_hypercall: cpu 1 cpl 0 input 34 a0 0x1 a1 0xffffa662c0013da0 a2 0x7ff0 ret 0xffffffffffffffea
On 06.01.2025 18:19, David Woodhouse wrote:
> On Thu, 2025-01-02 at 15:16 +0100, Jürgen Groß wrote:
>> On 02.01.25 15:06, David Woodhouse wrote:
>>> On Thu, 2025-01-02 at 15:02 +0100, Jürgen Groß wrote:
>>>>> Are you suggesting that you're able to enable the CPU-specific CFI
>>>>> protections before you even know whether it's an Intel or AMD CPU?
>>>>
>>>> Not before that, but maybe rather soon afterwards. And the hypercall page
>>>> needs to be decommissioned before the next hypercall is happening. The question
>>>> is whether we have a hook in place to do that switch between cpu identification
>>>> and CFI enabling.
>>>
>>> Not sure that's how I'd phrase it. Even if we have to add a hook at the
>>> right time to switch from the Xen-populated hypercall page to the one
>>> filled in by Linux, the question is whether adding that hook is simpler
>>> than all this early static_call stuff that's been thrown together, and
>>> the open questions about the 64-bit latching.
>>
>> This is a valid question, yes. My first version of these patches didn't
>> work with static_call, but used the paravirt call patching mechanism
>> replacing an indirect call with a direct one via ALTERNATIVEs. That
>> version was disliked by some involved x86 maintainers, resulting in the
>> addition of the early static_call update mechanism.
>>
>> One thing to mention regarding the 64-bit latching: what would you do
>> with HVM domains? Those are setting up the hypercall page rather late.
>> In case the kernel would use CFI, enabling would happen way before the
>> guest would issue any hypercall, so I guess the latching needs to happen
>> by other means anyway. Or would you want to register the hypercall page
>> without ever intending to use it?
>
> With xen_no_vector_callback on the command line, the hypervisor doesn't
> realise that the guest is 64-bit until long after all the CPUs are
> brought up.
>
> It does boot (and hey, QEMU does get this right!) but I'm still
> concerned that all those shared structures are 32-bit for that long. I
> do think the guest kernel should either set the hypercall page, or
> HVM_PARAM_CALLBACK_IRQ, as early as possible.

How about we adjust the behavior in Xen instead: We could latch the size
on every hypercall, making sure to invoke update_domain_wallclock_time()
only when the size actually changed (to not incur the extra overhead),
unless originating from the two places the latching is currently done at
(to avoid altering existing behavior)?

Then again latching more frequently (as suggested above or by any other
model) also comes with the risk of causing issues, at the very least for
"exotic" guests. E.g. with two vCPU-s in different modes, we'd ping-pong
the guest between both formats then.

Jan
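As a concrete reading of that proposal (invented structure and field names,
not actual Xen code), the per-hypercall work would only be a compare in the
common case:

/*
 * Hypothetical sketch of the behaviour described above: re-latch the
 * guest's width on every hypercall, but only pay the
 * update_domain_wallclock_time() cost when the width actually changes.
 */
static void relatch_guest_width(struct domain *d, bool is_64bit)
{
    if ( d->arch.latched_64bit == is_64bit )
        return;                             /* common case: nothing to do */

    d->arch.latched_64bit = is_64bit;       /* switch shared-struct layout */
    update_domain_wallclock_time(d);        /* re-publish in the new format */
}

The ping-pong concern mentioned above is visible here too: with vCPUs in
different modes, the compare would keep failing and the domain would keep
flipping layouts.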
On Tue, 2025-01-07 at 09:20 +0100, Jan Beulich wrote:
>
> How about we adjust the behavior in Xen instead: We could latch the size
> on every hypercall, making sure to invoke update_domain_wallclock_time()
> only when the size actually changed (to not incur the extra overhead),
> unless originating from the two places the latching is currently done at
> (to avoid altering existing behavior)?
>
> Then again latching more frequently (as suggested above or by any other
> model) also comes with the risk of causing issues, at the very least for
> "exotic" guests. E.g. with two vCPU-s in different modes, we'd ping-pong
> the guest between both formats then.

Indeed. I think it's much better for the guest to just write to the
hypercall page MSR early, like it always did. It doesn't even *need* to
be an executable page; just a data page which is then freed/overwritten.

But if we *want* to use it during early boot so that we don't need all
that early CPU detection and static_call complexity, that's fine too.
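For concreteness, a minimal sketch of that "throwaway registration" idea,
assuming the usual Xen CPUID/MSR discovery protocol (leaf base+2 returns the
hypercall-page MSR index in EBX) and Linux's existing xen_cpuid_base() helper.
It is not part of the posted series, error handling is elided, and whether the
page allocator is usable at the point this would need to run is exactly the
kind of detail such an approach would have to sort out:

#include <linux/gfp.h>
#include <linux/init.h>
#include <asm/msr.h>
#include <asm/page.h>
#include <asm/processor.h>
#include <asm/xen/hypervisor.h>

static void __init xen_latch_64bit_early(void)
{
	u32 base = xen_cpuid_base();	/* 0x400000xx "XenVMMXenVMM" leaf */
	u32 pages, msr, ecx, edx;
	void *scratch;

	if (!base)
		return;

	cpuid(base + 2, &pages, &msr, &ecx, &edx);

	scratch = (void *)get_zeroed_page(GFP_KERNEL);	/* plain data page */
	if (!scratch)
		return;

	wrmsrl(msr, __pa(scratch));	/* Xen fills the page and latches the
					   guest's (64-bit) mode */
	free_page((unsigned long)scratch);	/* never executed or kept */
}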
On Thu, 2025-01-02 at 15:16 +0100, Jürgen Groß wrote:
> On 02.01.25 15:06, David Woodhouse wrote:
> > On Thu, 2025-01-02 at 15:02 +0100, Jürgen Groß wrote:
> > > > Are you suggesting that you're able to enable the CPU-specific CFI
> > > > protections before you even know whether it's an Intel or AMD CPU?
> > >
> > > Not before that, but maybe rather soon afterwards. And the hypercall page
> > > needs to be decommissioned before the next hypercall is happening. The question
> > > is whether we have a hook in place to do that switch between cpu identification
> > > and CFI enabling.
> >
> > Not sure that's how I'd phrase it. Even if we have to add a hook at the
> > right time to switch from the Xen-populated hypercall page to the one
> > filled in by Linux, the question is whether adding that hook is simpler
> > than all this early static_call stuff that's been thrown together, and
> > the open questions about the 64-bit latching.
>
> This is a valid question, yes. My first version of these patches didn't
> work with static_call, but used the paravirt call patching mechanism
> replacing an indirect call with a direct one via ALTERNATIVEs. That
> version was disliked by some involved x86 maintainers, resulting in the
> addition of the early static_call update mechanism.
>
> One thing to mention regarding the 64-bit latching: what would you do
> with HVM domains? Those are setting up the hypercall page rather late.

In the HVM case it's from init_hypervisor_platform which is called from
slightly later in setup_arch(), so it's after static_call_init(). But
still long before HVM_PARAM_CALLBACK_IRQ is set in some cases, I think.

> In case the kernel would use CFI, enabling would happen way before the
> guest would issue any hypercall, so I guess the latching needs to happen
> by other means anyway. Or would you want to register the hypercall page
> without ever intending to use it?

I'd be tempted to do so without using it, yes. You only need to allocate
a 4KiB page, ask Xen to populate it, then free it immediately.

Or maybe just set HVM_PARAM_CALLBACK_IRQ instead, to make sure it's done?
When xen_set_upcall_vector() is called for CPU0 it does that:

	/* Trick toolstack to think we are enlightened. */
	if (!cpu)
		rc = xen_set_callback_via(1);

Maybe we just lift that out and do it somewhere unconditional, earlier?

But for PVH I'd still be inclined to set up a hypercall page early and
use it until we are able to switch.
On 23.12.2024 15:24, David Woodhouse wrote: > On Tue, 2024-12-17 at 12:18 +0000, Xen.org security team wrote: >> Xen Security Advisory CVE-2024-53241 / XSA-466 >> version 3 >> >> Xen hypercall page unsafe against speculative attacks >> >> UPDATES IN VERSION 3 >> ==================== >> >> Update of patch 5, public release. > > Can't we even use the hypercall page early in boot? Surely we have to > know whether we're running on an Intel or AMD CPU before we get to the > point where we can enable any of the new control-flow integrity > support? Do we need to jump through those hoops do do that early > detection and setup? Yes, putting it e.g. in .init.text ought to be possible and not violate security requirements. > Enabling the hypercall page is also one of the two points where Xen > will 'latch' that the guest is 64-bit, which affects the layout of the > shared_info, vcpu_info and runstate structures. Hmm, this is a side effect which I fear wasn't considered when putting together all of this. Making ourselves dependent upon ... > The other such latching point is when the guest sets > HVM_PARAM_CALLBACK_IRQ, and I *think* that should work in all > implementations of the Xen ABI (including QEMU/KVM and EC2). But would > want to test. ... just this may end up too little, especially when considering transitions between OSes / OS-like environments (boot loader -> OS, OS -> kexec-ed OS). > But perhaps it wouldn't hurt for maximal compatibility for Linux to set > the hypercall page *anyway*, even if Linux doesn't then use it — or > only uses it during early boot? Indeed. Jan
Hello, Le 17/12/2024 à 13:18, Xen.org security team a écrit : > Xen guests need to use different processor instructions to make explicit > calls into the Xen hypervisor depending on guest type and/or CPU > vendor. In order to hide those differences, the hypervisor can fill a > hypercall page with the needed instruction sequences, allowing the guest > operating system to call into the hypercall page instead of having to > choose the correct instructions. > > The hypercall page contains whole functions, which are written by the > hypervisor and executed by the guest. With the lack of an interface > between the guest OS and the hypervisor specifying how a potential > modification of those functions should look like, the Xen hypervisor has > no knowledge how any potential mitigation should look like or which > hardening features should be put into place. > Should we consider adding a interface to know how to the guest is supposed to make hypercalls (what hypercall instruction/flavor) ? Such as the guest can have its own hypercall implementations but knows which one to use. > This results in potential vulnerabilities if the guest OS is using any > speculative mitigation that performs a compiler transform on "ret" > instructions in order to work (e.g. the Linux kernel rethunk or safe-ret > mitigations). > > Furthermore, the hypercall page has no provision for Control-flow > Integrity schemes (e.g. kCFI/CET-IBT/FineIBT), and will simply > malfunction in such configurations. > > IMPACT > ====== > > Some mitigations for hardware vulnerabilities the guest OS is relying on to > work might not be fully functional, resulting in e.g. guest user processes > being able to read data they ought not have access to. > > VULNERABLE SYSTEMS > ================== > > Only x86 systems are potentially vulnerable, Arm systems are not vulnerable. > > All guest types (PV, PVH and HVM) are potentially vulnerable. > > Linux guests are known to be vulnerable, guests using other operating > systems might be vulnerable, too. > > MITIGATION > ========== > > Running only Linux guest kernels not relying on "ret" assembler instruction > patching (kernel config option CONFIG_MITIGATION_RETHUNK/CONFIG_RETHUNK > disabled) will avoid the vulnerability, as long as this option isn't > required to be safe on the underlying hardware. > > CREDITS > ======= > > This issue was discovered by Andrew Cooper of XenServer. > > RESOLUTION > ========== > > Applying the set of attached patches resolves this issue. > > The patch to Xen is simply a documentation update to clarify that an OS author > might not want to use a hypercall page. 
> > xsa466-linux-*.patch Linux > xsa466-xen.patch xen-unstable > > $ sha256sum xsa466* > 498fb2538f650d694bbd6b7d2333dcf9a12d0bdfcba65257a7d14c88f5b86801 xsa466-linux-01.patch > 1e0d5f68d1cb4a0ef8914ae6bdeb4e18bae94c6d19659708ad707da784c0aa5c xsa466-linux-02.patch > b3056b34c1565f901cb4ba11c03a51d4f045b5de7cd16c6e510e0bcee8cc6cd7 xsa466-linux-03.patch > 0215e56739ab5b0d0ec0125f3d1806c3a0a0dcb3f562014f59b5145184a41467 xsa466-linux-04.patch > 314e67060ab4f47883cf2b124d54ce3cd4b0363f0545ad907a7b754a4405aacd xsa466-linux-05.patch > adbef75416379d96ebb72463872f993e9d8b7d119091480ad1e70fd448481733 xsa466-linux-06.patch > 36874014cee5d5213610a6ffdd0e3e67d0258d28f2587b8470fdd0cef96e5013 xsa466-linux-07.patch > 367f981ef8adc11b99cc6999b784305bcdcd55db0358fd6a2171509bf7f64345 xsa466-xen.patch > $ > > DEPLOYMENT DURING EMBARGO > ========================= > > Deployment of patches or mitigations is NOT permitted (except where > all the affected systems and VMs are administered and used only by > organisations which are members of the Xen Project Security Issues > Predisclosure List). Specifically, deployment on public cloud systems > is NOT permitted. > > This is because the mitigation or patches need to be applied to the guests. > > Deployment is permitted only AFTER the embargo ends. > > (Note: this during-embargo deployment notice is retained in > post-embargo publicly released Xen Project advisories, even though it > is then no longer applicable. This is to enable the community to have > oversight of the Xen Project Security Team's decisionmaking.) > > For more information about permissible uses of embargoed information, > consult the Xen Project community's agreed Security Policy: > http://www.xenproject.org/security-policy.html Teddy Astie | Vates XCP-ng Developer XCP-ng & Xen Orchestra - Vates solutions web: https://vates.tech
On 17/12/2024 1:21 pm, Teddy Astie wrote: > Hello, > > Le 17/12/2024 à 13:18, Xen.org security team a écrit : >> Xen guests need to use different processor instructions to make explicit >> calls into the Xen hypervisor depending on guest type and/or CPU >> vendor. In order to hide those differences, the hypervisor can fill a >> hypercall page with the needed instruction sequences, allowing the guest >> operating system to call into the hypercall page instead of having to >> choose the correct instructions. >> >> The hypercall page contains whole functions, which are written by the >> hypervisor and executed by the guest. With the lack of an interface >> between the guest OS and the hypervisor specifying how a potential >> modification of those functions should look like, the Xen hypervisor has >> no knowledge how any potential mitigation should look like or which >> hardening features should be put into place. >> > Should we consider adding a interface to know how to the guest is > supposed to make hypercalls (what hypercall instruction/flavor) ? Such > as the guest can have its own hypercall implementations but knows which > one to use. Better enumeration is coming with the hypercall API/ABI changes, but a guest already has enough information to correctly issue hypercalls to the current ABI. Hence why we didn't make this fix in Linux depend on matching change in Xen. ~Andrew
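To illustrate that last point for HVM/PVH guests: the hypercall instruction
simply follows the CPU vendor (VMCALL on Intel and most other vendors,
VMMCALL on AMD/Hygon), and the guest can read the vendor from CPUID leaf 0
itself. A simplified, freestanding sketch of the two-argument case (a real
guest kernel would run this at CPL0 and cover the full five-argument ABI):

#include <stdbool.h>
#include <stdint.h>

static inline void cpuid_raw(uint32_t leaf, uint32_t *a, uint32_t *b,
			     uint32_t *c, uint32_t *d)
{
	asm volatile("cpuid"
		     : "=a" (*a), "=b" (*b), "=c" (*c), "=d" (*d)
		     : "0" (leaf));
}

static bool cpu_is_amd_or_hygon(void)
{
	uint32_t a, b, c, d;

	cpuid_raw(0, &a, &b, &c, &d);
	/* "AuthenticAMD" -> "Auth" "enti" "cAMD";
	 * "HygonGenuine" -> "Hygo" "nGen" "uine" */
	return (b == 0x68747541 && d == 0x69746e65 && c == 0x444d4163) ||
	       (b == 0x6f677948 && d == 0x6e65476e && c == 0x656e6975);
}

/* Xen 64-bit hypercall ABI: number in RAX, args in RDI, RSI, ...; result in RAX. */
static long xen_hypercall2(unsigned long op, unsigned long a1, unsigned long a2)
{
	long ret;

	if (cpu_is_amd_or_hygon())
		asm volatile("vmmcall"
			     : "=a" (ret)
			     : "a" (op), "D" (a1), "S" (a2)
			     : "memory");
	else
		asm volatile("vmcall"
			     : "=a" (ret)
			     : "a" (op), "D" (a1), "S" (a2)
			     : "memory");
	return ret;
}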