Hi all,
Here is a revision following up on feedback from the last posting [1]:
* Instead of embedding staging logic directly into load_late_locked(),
it was suggested [3] to place it within load_late_stop_cpus(). This
change ensures that staging is treated as a preparatory step
before CPUs are stopped.
* Rather than introducing a separate .c file only for staging, the
recommendation was to consolidate all staging-related code within
intel.c [4].
* The previous implementation lacked clarity in explaining key aspects
of the mailbox and staging handler. Improving documentation and
readability in these areas was suggested as a necessary refinement
[5,6], as I understood it.
* It was also requested to fold MSR definitions into their usage
patches [7].
In addition to addressing these points, I’ve considered a unified staging
state struct (patch 2), primarily to simplify the staging handler loop
while also improving overall code organization.
This series is based on the tip/master branch. You can also find it in
this repo:
git://github.com/intel-staging/microcode.git staging_v2
I suspect the maintainers will only be able to take another look after the
upcoming merge window. In the meantime, I would appreciate any additional
feedback from those interested in this feature.
The original cover letter, which provides some background on enabling
this feature and its initial integration considerations, can be found in the
previous postings [1,2]. The relevant specification has also been posted
[8].
Thanks,
Chang
[1] Last posting: https://lore.kernel.org/lkml/20241211014213.3671-1-chang.seok.bae@intel.com/
[2] RFC: https://lore.kernel.org/lkml/20241001161042.465584-1-chang.seok.bae@intel.com/
[3] https://lore.kernel.org/lkml/20250218113634.GGZ7RwwkrrXADX0eRo@fat_crate.local/
[4] https://lore.kernel.org/lkml/20250226175642.GOZ79V2jWQTH5rbuXo@fat_crate.local/
[5] https://lore.kernel.org/lkml/fac46937-e0a5-42c1-96ee-65fec4e17551@intel.com/
[6] https://lore.kernel.org/lkml/1aee0888-b87b-443c-84fa-3bc000cbebcf@intel.com/
[7] https://lore.kernel.org/lkml/20250226171923.GMZ79NG_8wDtZ8vyWH@fat_crate.local/
[8] Staging Spec: https://cdrdv2.intel.com/v1/dl/getContent/782715
Chang S. Bae (6):
x86/microcode: Introduce staging step to reduce late-loading time
x86/microcode/intel: Define staging state struct
x86/microcode/intel: Establish staging control logic
x86/microcode/intel: Implement staging handler
x86/microcode/intel: Support mailbox transfer
x86/microcode/intel: Enable staging when available
arch/x86/include/asm/msr-index.h | 9 +
arch/x86/kernel/cpu/microcode/core.c | 11 +
arch/x86/kernel/cpu/microcode/intel.c | 341 +++++++++++++++++++++++
arch/x86/kernel/cpu/microcode/internal.h | 4 +-
4 files changed, 364 insertions(+), 1 deletion(-)
base-commit: 758ea9705c51865858c612f591c6e6950dcafccf
--
2.45.2
Hi all,
This is another revision addressing feedback from the last posting [1]
over the past few weeks:
* Dave suggested using a local variable, passed as a function argument,
to clarify the scope handled by each helper [2]. Accordingly, the patch
order was adjusted (patch 2 <-> patch 3).
* Chao provided helpful feedback, leading to additional refinements.
This round contains relatively small changes compared to the last one,
but I hope this iteration provides a chance to draw more reviews or even
collect a few tags.
As before, the patch series is based on tip/master and is also available
in this repo:
git://github.com/intel-staging/microcode.git staging_v3
Thanks,
Chang
[1]: V2: https://lore.kernel.org/lkml/Z+O8DK5NZJL43Nt6@intel.com/
[2]: https://lore.kernel.org/lkml/b01224ee-c935-4b08-a76f-5dc49341182d@intel.com/
Chang S. Bae (6):
x86/microcode: Introduce staging step to reduce late-loading time
x86/microcode/intel: Establish staging control logic
x86/microcode/intel: Define staging state struct
x86/microcode/intel: Implement staging handler
x86/microcode/intel: Support mailbox transfer
x86/microcode/intel: Enable staging when available
arch/x86/include/asm/msr-index.h | 9 +
arch/x86/include/asm/topology.h | 1 +
arch/x86/kernel/cpu/microcode/core.c | 11 +
arch/x86/kernel/cpu/microcode/intel.c | 338 +++++++++++++++++++++++
arch/x86/kernel/cpu/microcode/internal.h | 4 +-
5 files changed, 362 insertions(+), 1 deletion(-)
base-commit: 033163360247053bec75b81ac2d34aeb9d994e59
--
2.45.2
Dear x86 maintainers,
This is another posting of the series, now including a Tested-by tag.
There are no code changes.
The use case for this feature is late-loading with increased load times,
most likely in cloud service environments. I reached out to potential users
for reviews and feedback, and Anselm Busse has confirmed his testing.
While additional reviews or endorsements could help ease the maintainers'
burden and build confidence, there hasn't been much recent traction. At this
point, I believe it's better to move forward and get a clear assessment from
the maintainers -- either to prompt more engagement from prospective users
or to determine its readiness for merging. Of course, I appreciate any
further feedback on this set.
As usual, the patch series is based on tip/master and is also available in
this repo:
git://github.com/intel-staging/microcode.git staging_v4
Last posting (V3): https://lore.kernel.org/lkml/20250409232713.4536-1-chang.seok.bae@intel.com/
Thanks,
Chang
Chang S. Bae (6):
x86/microcode: Introduce staging step to reduce late-loading time
x86/microcode/intel: Establish staging control logic
x86/microcode/intel: Define staging state struct
x86/microcode/intel: Implement staging handler
x86/microcode/intel: Support mailbox transfer
x86/microcode/intel: Enable staging when available
arch/x86/include/asm/msr-index.h | 9 +
arch/x86/include/asm/topology.h | 1 +
arch/x86/kernel/cpu/microcode/core.c | 11 +
arch/x86/kernel/cpu/microcode/intel.c | 338 +++++++++++++++++++++++
arch/x86/kernel/cpu/microcode/internal.h | 4 +-
5 files changed, 362 insertions(+), 1 deletion(-)
base-commit: ca76508b9352e8c770b58213cc6c4700e459b7c2
--
2.48.1
Hi all,
This is another iteration -- changes since v4 [*]:
* Reworked the preparatory change before referencing
cpu_primary_thread_mask, based on Dave’s feedback. This is now patch1.
* Incorporated further feedback from Dave on the staging code, mainly
to improve clarity, reduce ambiguity, and fix minor issues. Each
patch includes some details.
* Collected Chao’s review tag (thanks!) for the first and last patches.
As usual, the series is also available here:
git://github.com/intel-staging/microcode.git staging_v5
[*] https://lore.kernel.org/lkml/20250813172649.15474-1-chang.seok.bae@intel.com/
Thanks,
Chang
Chang S. Bae (7):
x86/cpu/topology: Make primary thread mask available with SMP=n
x86/microcode: Introduce staging step to reduce late-loading time
x86/microcode/intel: Establish staging control logic
x86/microcode/intel: Define staging state struct
x86/microcode/intel: Implement staging handler
x86/microcode/intel: Support mailbox transfer
x86/microcode/intel: Enable staging when available
arch/x86/include/asm/msr-index.h | 9 +
arch/x86/include/asm/topology.h | 12 +-
arch/x86/kernel/cpu/microcode/core.c | 11 +
arch/x86/kernel/cpu/microcode/intel.c | 386 +++++++++++++++++++++++
arch/x86/kernel/cpu/microcode/internal.h | 4 +-
arch/x86/kernel/cpu/topology.c | 4 -
arch/x86/kernel/cpu/topology_common.c | 3 +
arch/x86/kernel/smpboot.c | 3 -
8 files changed, 418 insertions(+), 14 deletions(-)
base-commit: 7182bf4176f93be42225d2ef983894febfa4a1b1
--
2.48.1
On Sat, Aug 23, 2025 at 08:52:03AM -0700, Chang S. Bae wrote:
> Hi all,
>
> This is another iteration -- changes since v4 [*]:
>
> * Reworked the preparatory change before referencing
> cpu_primary_thread_mask, based on Dave’s feedback. This is now patch1.
>
> * Incorporated further feedback from Dave on the staging code, mainly
> to improve clarity, reduce ambiguity, and fix minor issues. Each
> patch includes some details.
>
> * Collected Chao’s review tag (thanks!) for the first and last patches.
>
> As usual, the series is also available here:
> git://github.com/intel-staging/microcode.git staging_v5
>
> [*] https://lore.kernel.org/lkml/20250813172649.15474-1-chang.seok.bae@intel.com/
>
> Thanks,
> Chang
Reviewed-by: Tony Luck <tony.luck@intel.com>
-Tony
On 8/26/2025 3:13 PM, Luck, Tony wrote:
>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
Thanks!
cpu_primary_thread_mask is only defined when CONFIG_SMP=y. However, even
in UP kernels there is always exactly one CPU, which can reasonably be
treated as the primary thread.
Historically, topology_is_primary_thread() always returned true with
CONFIG_SMP=n. A recent commit:
4b455f59945aa ("cpu/SMT: Provide a default topology_is_primary_thread()")
replaced it with a generic implementation with the note:
"When disabling SMT, the primary thread of the SMT will remain
enabled/active. Architectures that have a special primary thread (e.g.
x86) need to override this function. ..."
For consistency and clarity, make the primary thread mask available
regardless of SMP, similar to cpu_possible_mask and cpu_present_mask.
Move __cpu_primary_thread_mask into common code to prevent build issues.
Let cpu_mark_primary_thread() configure the mask even for UP kernels,
alongside other masks. Then, topology_is_primary_thread() can
consistently reference it.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V4 -> V5: New patch
Preparatory patch to set up the mask correctly before its new use in
patch3.
---
arch/x86/include/asm/topology.h | 12 ++++++------
arch/x86/kernel/cpu/topology.c | 4 ----
arch/x86/kernel/cpu/topology_common.c | 3 +++
arch/x86/kernel/smpboot.c | 3 ---
4 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 6c79ee7c0957..281252af6e9d 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -218,6 +218,12 @@ static inline unsigned int topology_amd_nodes_per_pkg(void)
return __amd_nodes_per_pkg;
}
+#else /* CONFIG_SMP */
+static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
+static inline int topology_max_smt_threads(void) { return 1; }
+static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
+#endif /* !CONFIG_SMP */
+
extern struct cpumask __cpu_primary_thread_mask;
#define cpu_primary_thread_mask ((const struct cpumask *)&__cpu_primary_thread_mask)
@@ -231,12 +237,6 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
}
#define topology_is_primary_thread topology_is_primary_thread
-#else /* CONFIG_SMP */
-static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
-static inline int topology_max_smt_threads(void) { return 1; }
-static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
-#endif /* !CONFIG_SMP */
-
static inline void arch_fix_phys_package_id(int num, u32 slot)
{
}
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index e35ccdc84910..f083023f7dd9 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -75,15 +75,11 @@ bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
return phys_id == (u64)cpuid_to_apicid[cpu];
}
-#ifdef CONFIG_SMP
static void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid)
{
if (!(apicid & (__max_threads_per_core - 1)))
cpumask_set_cpu(cpu, &__cpu_primary_thread_mask);
}
-#else
-static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { }
-#endif
/*
* Convert the APIC ID to a domain level ID by masking out the low bits
diff --git a/arch/x86/kernel/cpu/topology_common.c b/arch/x86/kernel/cpu/topology_common.c
index b5a5e1411469..71625795d711 100644
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -16,6 +16,9 @@ EXPORT_SYMBOL_GPL(x86_topo_system);
unsigned int __amd_nodes_per_pkg __ro_after_init;
EXPORT_SYMBOL_GPL(__amd_nodes_per_pkg);
+/* CPUs which are the primary SMT threads */
+struct cpumask __cpu_primary_thread_mask __read_mostly;
+
void topology_set_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
unsigned int shift, unsigned int ncpus)
{
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 33e166f6ab12..7804175d2d87 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -103,9 +103,6 @@ EXPORT_PER_CPU_SYMBOL(cpu_core_map);
DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
EXPORT_PER_CPU_SYMBOL(cpu_die_map);
-/* CPUs which are the primary SMT threads */
-struct cpumask __cpu_primary_thread_mask __read_mostly;
-
/* Representing CPUs for which sibling maps can be computed */
static cpumask_var_t cpu_sibling_setup_mask;
--
2.48.1
As microcode patch sizes continue to grow, late-loading latency spikes
can lead to timeouts and disruptions in running workloads. This trend of
increasing patch sizes is expected to continue, so a foundational
solution is needed to address the issue.
To mitigate the problem, a new staging feature is introduced. This option
processes most of the microcode update (excluding activation) on a
non-critical path, allowing CPUs to remain operational during the
majority of the update. By offloading work from the critical path,
staging can significantly reduces latency spikes.
Integrate staging as a preparatory step in late-loading. Introduce a new
callback for staging, which is invoked at the beginning of
load_late_stop_cpus(), before CPUs enter the rendezvous phase.
Staging follows an opportunistic model:
* If successful, it reduces CPU rendezvous time
* Even if it fails, the process falls back to the legacy path to
finish the loading process, but with potentially higher latency.
Extend struct microcode_ops to incorporate staging properties, which will
be implemented in the vendor code separately.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Reviewed-by: Chao Gao <chao.gao@intel.com>
---
V4 -> V5:
* Collect Chao's review tag
V1 -> V2:
* Move invocation inside of load_late_stop_cpus() (Boris)
* Add more note about staging (Dave)
There were discussions about whether staging success should be enforced
by a configurable option. That topic is identified as follow-up work,
separate from this series.
https://lore.kernel.org/lkml/54308373-7867-4b76-be34-63730953f83c@intel.com/
---
arch/x86/kernel/cpu/microcode/core.c | 11 +++++++++++
arch/x86/kernel/cpu/microcode/internal.h | 4 +++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index b92e09a87c69..34e569ee1db2 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -552,6 +552,17 @@ static int load_late_stop_cpus(bool is_safe)
pr_err("You should switch to early loading, if possible.\n");
}
+ /*
+ * Pre-load the microcode image into a staging device. This
+ * process is preemptible and does not require stopping CPUs.
+ * Successful staging simplifies the subsequent late-loading
+ * process, reducing rendezvous time.
+ *
+ * Even if the transfer fails, the update will proceed as usual.
+ */
+ if (microcode_ops->use_staging)
+ microcode_ops->stage_microcode();
+
atomic_set(&late_cpus_in, num_online_cpus());
atomic_set(&offline_in_nmi, 0);
loops_per_usec = loops_per_jiffy / (TICK_NSEC / 1000);
diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
index 50a9702ae4e2..adf02ebbf7a3 100644
--- a/arch/x86/kernel/cpu/microcode/internal.h
+++ b/arch/x86/kernel/cpu/microcode/internal.h
@@ -31,10 +31,12 @@ struct microcode_ops {
* See also the "Synchronization" section in microcode_core.c.
*/
enum ucode_state (*apply_microcode)(int cpu);
+ void (*stage_microcode)(void);
int (*collect_cpu_info)(int cpu, struct cpu_signature *csig);
void (*finalize_late_load)(int result);
unsigned int nmi_safe : 1,
- use_nmi : 1;
+ use_nmi : 1,
+ use_staging : 1;
};
struct early_load_data {
--
2.48.1
On Sat, Aug 23, 2025 at 08:52:05AM -0700, Chang S. Bae wrote:
> As microcode patch sizes continue to grow, late-loading latency spikes
> can lead to timeouts and disruptions in running workloads. This trend of
> increasing patch sizes is expected to continue, so a foundational
> solution is needed to address the issue.
>
> To mitigate the problem, a new staging feature is introduced. This option
> processes most of the microcode update (excluding activation) on a
> non-critical path, allowing CPUs to remain operational during the
> majority of the update. By offloading work from the critical path,
> staging can significantly reduces latency spikes.
"reduce"
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 9/4/2025 5:08 AM, Borislav Petkov wrote:
> On Sat, Aug 23, 2025 at 08:52:05AM -0700, Chang S. Bae wrote:
>>
>> staging can significantly reduces latency spikes.
>
> "reduce"
Sorry for the typo. Fixed.
When microcode staging is initiated, operations are carried out through
an MMIO interface. Each package has a unique interface specified by the
IA32_MCU_STAGING_MBOX_ADDR MSR, which maps to a set of 32-bit registers.
Prepare staging with the following steps:
1. Ensure the microcode image is 32-bit aligned to match the MMIO
register size.
2. Identify each MMIO interface based on its per-package scope.
3. Invoke the staging function for each identified interface, which
will be implemented separately.
Also, define cpu_primary_thread_mask for the CONFIG_SMP=n case, allowing
consistent use when narrowing down primary threads to locate the
per-package interface.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/all/871pznq229.ffs@tglx
---
V4 -> V5:
* Rebase on the primary thread cpumask fix (Dave)
* Clean up the rev print code (Dave)
* rdmsrl_on_cpu() -> rdmsrq_on_cpu (Chao)
V2 -> V3:
* Remove a global variable and adjust stage_microcode() (Dave).
* Simplify for_each_cpu() loop control code
* Handle rdmsrl_on_cpu() return code explicitly (Chao)
V1 -> V2:
* Adjust to reference the staging_state struct.
* Add lockdep_assert_cpus_held() (Boris)
---
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/cpu/microcode/intel.c | 49 +++++++++++++++++++++++++++
2 files changed, 51 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b65c3ba5fa14..0356155f9264 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -913,6 +913,8 @@
#define MSR_IA32_UCODE_WRITE 0x00000079
#define MSR_IA32_UCODE_REV 0x0000008b
+#define MSR_IA32_MCU_STAGING_MBOX_ADDR 0x000007a5
+
/* Intel SGX Launch Enclave Public Key Hash MSRs */
#define MSR_IA32_SGXLEPUBKEYHASH0 0x0000008C
#define MSR_IA32_SGXLEPUBKEYHASH1 0x0000008D
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 371ca6eac00e..d309fb1f058f 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -299,6 +299,54 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
return size ? NULL : patch;
}
+/*
+ * Handle the staging process using the mailbox MMIO interface.
+ * Return the result state.
+ */
+static enum ucode_state do_stage(u64 mmio_pa)
+{
+ pr_debug_once("Staging implementation is pending.\n");
+ return UCODE_ERROR;
+}
+
+static void stage_microcode(void)
+{
+ unsigned int pkg_id = UINT_MAX;
+ enum ucode_state ret;
+ int cpu, err;
+ u64 mmio_pa;
+
+ if (!IS_ALIGNED(get_totalsize(&ucode_patch_late->hdr), sizeof(u32)))
+ return;
+
+ lockdep_assert_cpus_held();
+
+ /*
+ * The MMIO address is unique per package, and all the SMT
+ * primary threads are online here. Find each MMIO space by
+ * their package ids to avoid duplicate staging.
+ */
+ for_each_cpu(cpu, cpu_primary_thread_mask) {
+ if (topology_logical_package_id(cpu) == pkg_id)
+ continue;
+ pkg_id = topology_logical_package_id(cpu);
+
+ err = rdmsrq_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &mmio_pa);
+ if (WARN_ON_ONCE(err))
+ return;
+
+ ret = do_stage(mmio_pa);
+ if (ret != UCODE_OK) {
+ pr_err("Error: staging failed with %s for CPU%d at package %u.\n",
+ ret == UCODE_TIMEOUT ? "timeout" : "error state",
+ cpu, pkg_id);
+ return;
+ }
+ }
+
+ pr_info("Staging of patch revision 0x%x succeeded.\n", ucode_patch_late->hdr.rev);
+}
+
static enum ucode_state __apply_microcode(struct ucode_cpu_info *uci,
struct microcode_intel *mc,
u32 *cur_rev)
@@ -627,6 +675,7 @@ static struct microcode_ops microcode_intel_ops = {
.collect_cpu_info = collect_cpu_info,
.apply_microcode = apply_microcode_late,
.finalize_late_load = finalize_late_load,
+ .stage_microcode = stage_microcode,
.use_nmi = IS_ENABLED(CONFIG_X86_64),
};
--
2.48.1
On Sat, Aug 23, 2025 at 08:52:06AM -0700, Chang S. Bae wrote:
> When microcode staging is initiated, operations are carried out through
> an MMIO interface. Each package has a unique interface specified by the
> IA32_MCU_STAGING_MBOX_ADDR MSR, which maps to a set of 32-bit registers.
>
> Prepare staging with the following steps:
>
> 1. Ensure the microcode image is 32-bit aligned to match the MMIO
> register size.
>
> 2. Identify each MMIO interface based on its per-package scope.
>
> 3. Invoke the staging function for each identified interface, which
> will be implemented separately.
>
> Also, define cpu_primary_thread_mask for the CONFIG_SMP=n case, allowing
> consistent use when narrowing down primary threads to locate the
> per-package interface.
This paragraph is stale now and can go.
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index b65c3ba5fa14..0356155f9264 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -913,6 +913,8 @@
> #define MSR_IA32_UCODE_WRITE 0x00000079
> #define MSR_IA32_UCODE_REV 0x0000008b
>
> +#define MSR_IA32_MCU_STAGING_MBOX_ADDR 0x000007a5
Doesn't look sorted to me.
> +
> /* Intel SGX Launch Enclave Public Key Hash MSRs */
> #define MSR_IA32_SGXLEPUBKEYHASH0 0x0000008C
> #define MSR_IA32_SGXLEPUBKEYHASH1 0x0000008D
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index 371ca6eac00e..d309fb1f058f 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -299,6 +299,54 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
> return size ? NULL : patch;
> }
>
> +/*
> + * Handle the staging process using the mailbox MMIO interface.
> + * Return the result state.
> + */
> +static enum ucode_state do_stage(u64 mmio_pa)
> +{
> + pr_debug_once("Staging implementation is pending.\n");
> + return UCODE_ERROR;
> +}
> +
> +static void stage_microcode(void)
> +{
> + unsigned int pkg_id = UINT_MAX;
> + enum ucode_state ret;
> + int cpu, err;
> + u64 mmio_pa;
> +
> + if (!IS_ALIGNED(get_totalsize(&ucode_patch_late->hdr), sizeof(u32)))
> + return;
> +
> + lockdep_assert_cpus_held();
> +
> + /*
> + * The MMIO address is unique per package, and all the SMT
> + * primary threads are online here. Find each MMIO space by
> + * their package ids to avoid duplicate staging.
> + */
> + for_each_cpu(cpu, cpu_primary_thread_mask) {
> + if (topology_logical_package_id(cpu) == pkg_id)
> + continue;
<---- newline here.
> + pkg_id = topology_logical_package_id(cpu);
> +
> + err = rdmsrq_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &mmio_pa);
> + if (WARN_ON_ONCE(err))
> + return;
> +
> + ret = do_stage(mmio_pa);
> + if (ret != UCODE_OK) {
> + pr_err("Error: staging failed with %s for CPU%d at package %u.\n",
> + ret == UCODE_TIMEOUT ? "timeout" : "error state",
What does "error state" mean?
Are we going to dump additional error state so that it is clear why it failed?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 9/4/2025 5:13 AM, Borislav Petkov wrote:
> On Sat, Aug 23, 2025 at 08:52:06AM -0700, Chang S. Bae wrote:
>>
>> Also, define cpu_primary_thread_mask for the CONFIG_SMP=n case, allowing
>> consistent use when narrowing down primary threads to locate the
>> per-package interface.
>
> This paragraph is stale now and can go.
Ah, right. Sorry, I should have chopped it out.
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index b65c3ba5fa14..0356155f9264 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -913,6 +913,8 @@
>> #define MSR_IA32_UCODE_WRITE 0x00000079
>> #define MSR_IA32_UCODE_REV 0x0000008b
>>
>> +#define MSR_IA32_MCU_STAGING_MBOX_ADDR 0x000007a5
>
> Doesn't look sorted to me.
Okay, I think this 'MCU' looks a bit fuzzy. Intel folks already use this
acronym elsewhere (e.g. MSR_IA32_MCU_OPT_CTRL). It might be clearer to
consolidate them under one section, like:
/* Intel microcode update MSRs */
#define MSR_IA32_MCU_OPT_CTRL 0x00000123
#define MSR_IA32_MCU_ENUMERATION 0x0000007b <- patch7 adds this
#define MSR_IA32_MCU_STAGING_MBOX_ADDR 0x000007a5
...
>> + /*
>> + * The MMIO address is unique per package, and all the SMT
>> + * primary threads are online here. Find each MMIO space by
>> + * their package ids to avoid duplicate staging.
>> + */
>> + for_each_cpu(cpu, cpu_primary_thread_mask) {
>> + if (topology_logical_package_id(cpu) == pkg_id)
>> + continue;
>
> <---- newline here.
Fixed. Thanks.
>> + ret = do_stage(mmio_pa);
>> + if (ret != UCODE_OK) {
>> + pr_err("Error: staging failed with %s for CPU%d at package %u.\n",
>> + ret == UCODE_TIMEOUT ? "timeout" : "error state",
>
> What does "error state" mean?
>
> Are we going to dump additional error state so that it is clear why it failed?
Yeah, right. The wording "error state" is vague.
The next two patches in this series introduce helpers that return an
error code and *also* update this ucode_state. The latter could go away.
Instead, the error code could just be passed down through here and decoded like:
static const char *staging_errstr(int code)
{
switch (code) {
case -ETIMEDOUT: return "timeout";
...
default: return "unknown error";
}
}
static void stage_microcode(void)
{
...
err = do_stage(mmio_pa);
if (err) {
pr_err("Error: staging failed with %s for CPU%d at package %u.\n",
staging_errstr(err), cpu, pkg_id);
return;
}
}
I've attached a diff on top of V5 to picture what it will look like.
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 455b50514d87..ae4b08e71e20 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -98,7 +98,6 @@ struct extended_sigtable {
* @chunk_size: Size of each data piece
* @bytes_sent: Total bytes transmitted so far
* @offset: Current offset in the microcode image
- * @state: Current state of the staging process
*/
struct staging_state {
void __iomem *mmio_base;
@@ -106,7 +105,6 @@ struct staging_state {
unsigned int chunk_size;
unsigned int bytes_sent;
unsigned int offset;
- enum ucode_state state;
};
#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + MC_HEADER_SIZE)
@@ -425,7 +423,7 @@ static inline unsigned int calc_next_chunk_size(unsigned int ucode_len, unsigned
* Update the chunk size and decide whether another chunk can be sent.
* This accounts for remaining data and retry limits.
*/
-static bool can_send_next_chunk(struct staging_state *ss)
+static bool can_send_next_chunk(struct staging_state *ss, int *err)
{
ss->chunk_size = calc_next_chunk_size(ss->ucode_len, ss->offset);
/*
@@ -445,10 +443,11 @@ static bool can_send_next_chunk(struct staging_state *ss)
* likely stuck and mark the state as timeout.
*/
if (ss->bytes_sent + ss->chunk_size > ss->ucode_len * 2) {
- ss->state = UCODE_TIMEOUT;
+ *err = -ETIMEDOUT;
return false;
}
+ *err = 0;
return true;
}
@@ -465,9 +464,9 @@ static inline bool is_end_offset(u32 offset)
* the end offset, or no more transactions are permitted (retry limit
* reached).
*/
-static inline bool staging_is_complete(struct staging_state *ss)
+static inline bool staging_is_complete(struct staging_state *ss, int *err)
{
- return is_end_offset(ss->offset) || !can_send_next_chunk(ss);
+ return is_end_offset(ss->offset) || !can_send_next_chunk(ss, err);
}
/*
@@ -489,21 +488,16 @@ static int wait_for_transaction(struct staging_state *ss)
}
/* Check for explicit error response */
- if (status & MASK_MBOX_STATUS_ERROR) {
- ss->state = UCODE_ERROR;
- return -EPROTO;
- }
+ if (status & MASK_MBOX_STATUS_ERROR)
+ return -EIO;
/*
* Hardware has neither responded to the action nor signaled any
* error. Treat this as timeout.
*/
- if (!(status & MASK_MBOX_STATUS_READY)) {
- ss->state = UCODE_TIMEOUT;
+ if (!(status & MASK_MBOX_STATUS_READY))
return -ETIMEDOUT;
- }
- ss->state = UCODE_OK;
return 0;
}
@@ -545,7 +539,6 @@ static int fetch_next_offset(struct staging_state *ss)
const u64 expected_header = MBOX_HEADER(MBOX_HEADER_SIZE + MBOX_RESPONSE_SIZE);
u32 offset, status;
u64 header;
- int err;
/*
* The 'response' mailbox returns three fields, in order:
@@ -558,55 +551,42 @@ static int fetch_next_offset(struct staging_state *ss)
status = read_mbox_dword(ss->mmio_base);
/* All valid responses must start with the expected header. */
- if (header != expected_header) {
- pr_err_once("staging: invalid response header\n");
- err = -EINVAL;
- goto err_out;
- }
+ if (header != expected_header)
+ return -EBADR;
/*
* Verify the offset: If not at the end marker, it must not
* exceed the microcode image length
*/
- if (!is_end_offset(offset) && offset > ss->ucode_len) {
- pr_err_once("staging: invalid response offset\n");
- err = -EINVAL;
- goto err_out;
- }
+ if (!is_end_offset(offset) && offset > ss->ucode_len)
+ return -EINVAL;
/* Hardware may report errors explicitly in the status field */
- if (status & MASK_MBOX_RESP_ERROR) {
- err = -EPROTO;
- goto err_out;
- }
+ if (status & MASK_MBOX_RESP_ERROR)
+ return -EPROTO;
ss->offset = offset;
- ss->state = UCODE_OK;
return 0;
-
-err_out:
- ss->state = UCODE_ERROR;
- return err;
}
/*
* Handle the staging process using the mailbox MMIO interface. The
* microcode image is transferred in chunks until completion. Return the
- * result state.
+ * error code.
*/
-static enum ucode_state do_stage(u64 mmio_pa)
+static int do_stage(u64 mmio_pa)
{
struct staging_state ss = {};
int err;
ss.mmio_base = ioremap(mmio_pa, MBOX_REG_NUM * MBOX_REG_SIZE);
if (WARN_ON_ONCE(!ss.mmio_base))
- return UCODE_ERROR;
+ return -EADDRNOTAVAIL;
init_stage(&ss);
/* Perform the staging process while within the retry limit */
- while (!staging_is_complete(&ss)) {
+ while (!staging_is_complete(&ss, &err)) {
/* Send a chunk of microcode each time: */
err = send_data_chunk(&ss, ucode_patch_late);
if (err)
@@ -622,17 +602,25 @@ static enum ucode_state do_stage(u64 mmio_pa)
iounmap(ss.mmio_base);
- /*
- * The helpers update ss.state on error. The final state is
- * returned to the caller.
- */
- return ss.state;
+ return err;
+}
+
+static const char *staging_errstr(int code)
+{
+ switch (code) {
+ case -EBADR: return "invalid response header";
+ case -EINVAL: return "invalid next offset";
+ case -EIO: return "transaction error";
+ case -ETIMEDOUT: return "timeout";
+ case -EPROTO: return "response error";
+ case -EADDRNOTAVAIL: return "ioremap() failure";
+ default: return "unknown error";
+ };
}
static void stage_microcode(void)
{
unsigned int pkg_id = UINT_MAX;
- enum ucode_state ret;
int cpu, err;
u64 mmio_pa;
@@ -656,11 +644,10 @@ static void stage_microcode(void)
if (WARN_ON_ONCE(err))
return;
- ret = do_stage(mmio_pa);
- if (ret != UCODE_OK) {
+ err = do_stage(mmio_pa);
+ if (err) {
pr_err("Error: staging failed with %s for CPU%d at package %u.\n",
- ret == UCODE_TIMEOUT ? "timeout" : "error state",
- cpu, pkg_id);
+ staging_errstr(err), cpu, pkg_id);
return;
}
}
On Thu, Sep 04, 2025 at 05:04:24PM -0700, Chang S. Bae wrote:
> Okay, I think this 'MCU' looks a bit fuzzy.
I mean we should try to sort them by MSR *number* so that they're easier to
find. I know, I know, the whole file needs sorting but we don't need the churn
right now. So let's put new ones where they should be, i.e.:
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6eb0c62c7fbd..87be1f815211 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -922,8 +922,6 @@
#define MSR_IA32_UCODE_WRITE 0x00000079
#define MSR_IA32_UCODE_REV 0x0000008b
-#define MSR_IA32_MCU_STAGING_MBOX_ADDR 0x000007a5
-
/* Intel SGX Launch Enclave Public Key Hash MSRs */
#define MSR_IA32_SGXLEPUBKEYHASH0 0x0000008C
#define MSR_IA32_SGXLEPUBKEYHASH1 0x0000008D
@@ -1219,6 +1217,8 @@
#define MSR_IA32_VMX_VMFUNC 0x00000491
#define MSR_IA32_VMX_PROCBASED_CTLS3 0x00000492
+#define MSR_IA32_MCU_STAGING_MBOX_ADDR 0x000007a5
+
/* Resctrl MSRs: */
/* - Intel: */
#define MSR_IA32_L3_QOS_CFG 0xc81
> The next two patches in this series introduce helpers that return an error
> code and *also* update this ucode_state. The latter could go away. Instead,
> the error code could be just down through here and decoded like:
You don't need the strings decoding... yet. This stuff is going to be looked
at only by us so there's no problem with simply dumping the errval and then
going to consult the sources where that value comes from.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 9/5/2025 4:13 AM, Borislav Petkov wrote:
>
> So let's put new ones where they should be
I see.
> You don't need the strings decoding... yet. This stuff is going to be looked
> at only by us so there's no problem with simply dumping the errval and then
> going to consult the sources where that value comes from.
Makes sense. Then, it could be just like this:
pr_err("Error: staging failed (code = %d) for CPU%d at package %u.\n",
err, cpu, pkg_id);
Thanks,
Chang
To facilitate a structured staging handler, a set of functions will be
introduced. Define staging_state struct to simplify function prototypes
by consolidating relevant data, instead of passing multiple local
variables.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
---
V4 -> V5: Drop the ucode_ptr field (Dave)
V1 -> V2: New patch
Prior to V2, local variables were used to track state values, with the
intention of improving readability by explicitly passing them between
functions. However, given the feedback, introducing a dedicated data
structure provides a benefit by simplifying the main loop.
---
arch/x86/kernel/cpu/microcode/intel.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index d309fb1f058f..3ca22457d839 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -54,6 +54,25 @@ struct extended_sigtable {
struct extended_signature sigs[];
};
+/**
+ * struct staging_state - Tracks the current staging process state
+ *
+ * @mmio_base: MMIO base address for staging
+ * @ucode_len: Total size of the microcode image
+ * @chunk_size: Size of each data piece
+ * @bytes_sent: Total bytes transmitted so far
+ * @offset: Current offset in the microcode image
+ * @state: Current state of the staging process
+ */
+struct staging_state {
+ void __iomem *mmio_base;
+ unsigned int ucode_len;
+ unsigned int chunk_size;
+ unsigned int bytes_sent;
+ unsigned int offset;
+ enum ucode_state state;
+};
+
#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + MC_HEADER_SIZE)
#define EXT_HEADER_SIZE (sizeof(struct extended_sigtable))
#define EXT_SIGNATURE_SIZE (sizeof(struct extended_signature))
--
2.48.1
On Sat, Aug 23, 2025 at 08:52:07AM -0700, Chang S. Bae wrote:
> To facilitate a structured staging handler, a set of functions will be
> introduced.
No need for that sentence.
> Define staging_state struct to simplify function prototypes
> by consolidating relevant data, instead of passing multiple local
> variables.
That's clear enough.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 9/4/2025 6:48 AM, Borislav Petkov wrote:
> On Sat, Aug 23, 2025 at 08:52:07AM -0700, Chang S. Bae wrote:
>> To facilitate a structured staging handler, a set of functions will be
>> introduced.
>
> No need for that sentence.
Fixed. Thanks
Previously, per-package staging invocations and their associated state
data were established. The next step is to implement the actual staging
handler according to the specified protocol. Below are key aspects to
note:
(a) Each staging process must begin by resetting the staging hardware.
(b) The staging hardware processes up to a page-sized chunk of the
microcode image per iteration, requiring software to submit data
incrementally.
(c) Once a data chunk is processed, the hardware responds with an
offset in the image for the next chunk.
(d) The offset may indicate completion or request retransmission of an
already transferred chunk. As long as the total transferred data
remains within the predefined limit (twice the image size),
retransmissions should be acceptable.
With that, incorporate these code sequences into the staging handler:
1. Initialization: Map the MMIO space via ioremap(). Reset the staging
hardware and initialize software state, ensuring a fresh staging
process aligned with (a).
2. Processing Loop: Introduce a loop iterating over data chunks,
following (b), with proper termination conditions established from
(d) -- stop staging when the hardware signals completion, or if the
total transmitted data exceeds the predefined limit.
3. Loop Body: Finally, compose the loop body with two steps --
transmitting a data chunk and retrieving the next offset from the
hardware response, aligning with (b) and (c).
Since data transmission and mailbox format handling require additional
details, they are implemented separately in subsequent changes.
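In outline, the handler described above boils down to the following loop
(a condensed sketch of the code added below, not a separate implementation):
	ss.mmio_base = ioremap(mmio_pa, MBOX_REG_NUM * MBOX_REG_SIZE);
	init_stage(&ss);			/* (a) reset the staging hardware */
	while (!staging_is_complete(&ss)) {	/* (d) end offset or retry limit */
		err = send_data_chunk(&ss, ucode_patch_late);	/* (b) one chunk per iteration */
		if (err)
			break;
		err = fetch_next_offset(&ss);	/* (c) hardware returns the next offset */
		if (err)
			break;
	}
	iounmap(ss.mmio_base);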
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
---
V4 -> V5:
* Convert helper functions to return error codes (Dave)
* Consolidate loop-control logic
* Refactor next-chunk calculation/check for clarity
* Remove offset sanity check (moved to next patch)
V2 -> V3:
* Rework code to eliminate global variables (Dave)
* Remove redundant variable resets (Chao)
V1 -> V2:
* Re-write the changelog for clarity (Dave).
* Move staging handling code into intel.c (Boris).
* Add extensive comments to clarify staging logic and hardware
interactions, along with function renaming (Dave).
---
arch/x86/kernel/cpu/microcode/intel.c | 137 +++++++++++++++++++++++++-
1 file changed, 133 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 3ca22457d839..a1b13202330d 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -20,6 +20,8 @@
#include <linux/cpu.h>
#include <linux/uio.h>
#include <linux/mm.h>
+#include <linux/delay.h>
+#include <linux/io.h>
#include <asm/cpu_device_id.h>
#include <asm/processor.h>
@@ -33,6 +35,16 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
#define UCODE_BSP_LOADED ((struct microcode_intel *)0x1UL)
+/* Defines for the microcode staging mailbox interface */
+
+#define MBOX_REG_NUM 4
+#define MBOX_REG_SIZE sizeof(u32)
+
+#define MBOX_CONTROL_OFFSET 0x0
+#define MBOX_STATUS_OFFSET 0x4
+
+#define MASK_MBOX_CTRL_ABORT BIT(0)
+
/* Current microcode patch used in early patching on the APs. */
static struct microcode_intel *ucode_patch_va __read_mostly;
static struct microcode_intel *ucode_patch_late __read_mostly;
@@ -319,13 +331,130 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
}
/*
- * Handle the staging process using the mailbox MMIO interface.
- * Return the result state.
+ * Prepare for a new microcode transfer: reset hardware and record the
+ * image size.
+ */
+static void init_stage(struct staging_state *ss)
+{
+ ss->ucode_len = get_totalsize(&ucode_patch_late->hdr);
+
+ /*
+ * Abort any ongoing process, effectively resetting the device.
+ * Unlike regular mailbox data processing requests, this
+ * operation does not require a status check.
+ */
+ writel(MASK_MBOX_CTRL_ABORT, ss->mmio_base + MBOX_CONTROL_OFFSET);
+}
+
+/*
+ * Return PAGE_SIZE, or remaining bytes if this is the final chunk
+ */
+static inline unsigned int calc_next_chunk_size(unsigned int ucode_len, unsigned int offset)
+{
+ return min(PAGE_SIZE, ucode_len - offset);
+}
+
+/*
+ * Update the chunk size and decide whether another chunk can be sent.
+ * This accounts for remaining data and retry limits.
+ */
+static bool can_send_next_chunk(struct staging_state *ss)
+{
+ ss->chunk_size = calc_next_chunk_size(ss->ucode_len, ss->offset);
+ /*
+ * Each microcode image is divided into chunks, each at most
+ * one page size. A 10-chunk image would typically require 10
+ * transactions.
+ *
+ * However, the hardware managing the mailbox has limited
+ * resources and may not cache the entire image, potentially
+ * requesting the same chunk multiple times.
+ *
+ * To tolerate this behavior, allow up to twice the expected
+ * number of transactions (i.e., a 10-chunk image can take up to
+ * 20 attempts).
+ *
+ * If the number of attempts exceeds this limit, the hardware is
+ * likely stuck and mark the state as timeout.
+ */
+ if (ss->bytes_sent + ss->chunk_size > ss->ucode_len * 2) {
+ ss->state = UCODE_TIMEOUT;
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Determine whether staging is complete: either the hardware signaled
+ * the end offset, or no more transactions are permitted (retry limit
+ * reached).
+ */
+static inline bool staging_is_complete(struct staging_state *ss)
+{
+ return (ss->offset == UINT_MAX) || !can_send_next_chunk(ss);
+}
+
+/*
+ * Transmit a chunk of the microcode image to the hardware.
+ * Return 0 on success, or an error code on failure.
+ */
+static int send_data_chunk(struct staging_state *ss, void *ucode_ptr __maybe_unused)
+{
+ pr_debug_once("Staging mailbox loading code needs to be implemented.\n");
+ ss->state = UCODE_ERROR;
+ return -EPROTONOSUPPORT;
+}
+
+/*
+ * Retrieve the next offset from the hardware response.
+ * Return 0 on success, or an error code on failure.
+ */
+static int fetch_next_offset(struct staging_state *ss)
+{
+ pr_debug_once("Staging mailbox response handling code needs to be implemented.\n\n");
+ ss->state = UCODE_ERROR;
+ return -EPROTONOSUPPORT;
+}
+
+/*
+ * Handle the staging process using the mailbox MMIO interface. The
+ * microcode image is transferred in chunks until completion. Return the
+ * result state.
*/
static enum ucode_state do_stage(u64 mmio_pa)
{
- pr_debug_once("Staging implementation is pending.\n");
- return UCODE_ERROR;
+ struct staging_state ss = {};
+ int err;
+
+ ss.mmio_base = ioremap(mmio_pa, MBOX_REG_NUM * MBOX_REG_SIZE);
+ if (WARN_ON_ONCE(!ss.mmio_base))
+ return UCODE_ERROR;
+
+ init_stage(&ss);
+
+ /* Perform the staging process while within the retry limit */
+ while (!staging_is_complete(&ss)) {
+ /* Send a chunk of microcode each time: */
+ err = send_data_chunk(&ss, ucode_patch_late);
+ if (err)
+ break;
+ /*
+ * Then, ask the hardware which piece of the image it
+ * needs next. The same piece may be sent more than once.
+ */
+ err = fetch_next_offset(&ss);
+ if (err)
+ break;
+ }
+
+ iounmap(ss.mmio_base);
+
+ /*
+ * The helpers update ss.state on error. The final state is
+ * returned to the caller.
+ */
+ return ss.state;
}
static void stage_microcode(void)
--
2.48.1
On Sat, Aug 23, 2025 at 08:52:08AM -0700, Chang S. Bae wrote:
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index 3ca22457d839..a1b13202330d 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -20,6 +20,8 @@
> #include <linux/cpu.h>
> #include <linux/uio.h>
> #include <linux/mm.h>
> +#include <linux/delay.h>
> +#include <linux/io.h>
You do see those are sorted by header name length in a reverse order, right?
>
> #include <asm/cpu_device_id.h>
> #include <asm/processor.h>
> @@ -33,6 +35,16 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
>
> #define UCODE_BSP_LOADED ((struct microcode_intel *)0x1UL)
>
> +/* Defines for the microcode staging mailbox interface */
> +
^ Superfluous newline.
> +#define MBOX_REG_NUM 4
> +#define MBOX_REG_SIZE sizeof(u32)
> +
> +#define MBOX_CONTROL_OFFSET 0x0
> +#define MBOX_STATUS_OFFSET 0x4
> +
> +#define MASK_MBOX_CTRL_ABORT BIT(0)
> +
> /* Current microcode patch used in early patching on the APs. */
> static struct microcode_intel *ucode_patch_va __read_mostly;
> static struct microcode_intel *ucode_patch_late __read_mostly;
...
> +/*
> + * Return PAGE_SIZE, or remaining bytes if this is the final chunk
> + */
> +static inline unsigned int calc_next_chunk_size(unsigned int ucode_len, unsigned int offset)
> +{
> + return min(PAGE_SIZE, ucode_len - offset);
> +}
That oneliner looks useless - sticking a comment over the min() and putting it
at the single callsite below is good enough.
> +
> +/*
> + * Update the chunk size and decide whether another chunk can be sent.
> + * This accounts for remaining data and retry limits.
> + */
> +static bool can_send_next_chunk(struct staging_state *ss)
> +{
> + ss->chunk_size = calc_next_chunk_size(ss->ucode_len, ss->offset);
> + /*
> + * Each microcode image is divided into chunks, each at most
> + * one page size. A 10-chunk image would typically require 10
^^^^
> + * transactions.
> + *
> + * However, the hardware managing the mailbox has limited
> + * resources and may not cache the entire image, potentially
> + * requesting the same chunk multiple times.
> + *
> + * To tolerate this behavior, allow up to twice the expected
> + * number of transactions (i.e., a 10-chunk image can take up to
> + * 20 attempts).
Looks quirky but ok, let's try it in practice first...
> + *
> + * If the number of attempts exceeds this limit, the hardware is
> + * likely stuck and mark the state as timeout.
> + */
> + if (ss->bytes_sent + ss->chunk_size > ss->ucode_len * 2) {
> + ss->state = UCODE_TIMEOUT;
> + return false;
> + }
> +
> + return true;
> +}
...
> static enum ucode_state do_stage(u64 mmio_pa)
> {
> - pr_debug_once("Staging implementation is pending.\n");
> - return UCODE_ERROR;
> + struct staging_state ss = {};
> + int err;
> +
> + ss.mmio_base = ioremap(mmio_pa, MBOX_REG_NUM * MBOX_REG_SIZE);
> + if (WARN_ON_ONCE(!ss.mmio_base))
> + return UCODE_ERROR;
> +
> + init_stage(&ss);
> +
> + /* Perform the staging process while within the retry limit */
> + while (!staging_is_complete(&ss)) {
> + /* Send a chunk of microcode each time: */
> + err = send_data_chunk(&ss, ucode_patch_late);
> + if (err)
> + break;
> + /*
> + * Then, ask the hardware which piece of the image it
> + * needs next. The same piece may be sent more than once.
If this is part of normal operation, your send-max-2x-the-size heuristic might
fail quickly here. I'd track the number of chunks it wants you to send and
then set a per-chunk limit and when it reaches that limit, then cancel the
transaction. Dunno, let's try the simple scheme first...
> + */
> + err = fetch_next_offset(&ss);
> + if (err)
> + break;
> + }
> +
> + iounmap(ss.mmio_base);
> +
> + /*
> + * The helpers update ss.state on error. The final state is
> + * returned to the caller.
> + */
> + return ss.state;
> }
>
> static void stage_microcode(void)
> --
> 2.48.1
>
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
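For reference, the per-chunk limit suggested above could take roughly this
shape (a minimal sketch only; the prev_offset/chunk_retries fields and the
MAX_CHUNK_RETRIES constant are hypothetical and not part of this series):
	/* Hypothetical per-chunk retry accounting, as an alternative to the 2x total-size cap */
	static bool chunk_retry_exceeded(struct staging_state *ss)
	{
		if (ss->offset != ss->prev_offset) {
			/* A new offset was requested: reset the per-chunk counter */
			ss->prev_offset = ss->offset;
			ss->chunk_retries = 0;
			return false;
		}
		/* The same chunk was requested again: count it against the limit */
		return ++ss->chunk_retries > MAX_CHUNK_RETRIES;
	}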
On 9/10/2025 11:33 AM, Borislav Petkov wrote:
> On Sat, Aug 23, 2025 at 08:52:08AM -0700, Chang S. Bae wrote:
>> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
>> index 3ca22457d839..a1b13202330d 100644
>> --- a/arch/x86/kernel/cpu/microcode/intel.c
>> +++ b/arch/x86/kernel/cpu/microcode/intel.c
>> @@ -20,6 +20,8 @@
>> #include <linux/cpu.h>
>> #include <linux/uio.h>
>> #include <linux/mm.h>
>> +#include <linux/delay.h>
>> +#include <linux/io.h>
>
> You do see those are sorted by header name length in a reverse order, right?
Okay, fixed -- and I adjusted patch6 as well:
#include <linux/initrd.h>
#include <linux/io.h>
#include <linux/kernel.h>
+#include <linux/pci_ids.h>
#include <linux/slab.h>
#include <linux/cpu.h>
#include <linux/uio.h>
>> +/* Defines for the microcode staging mailbox interface */
>> +
>
> ^ Superfluous newline.
Dropped.
>> +/*
>> + * Return PAGE_SIZE, or remaining bytes if this is the final chunk
>> + */
>> +static inline unsigned int calc_next_chunk_size(unsigned int ucode_len, unsigned int offset)
>> +{
>> + return min(PAGE_SIZE, ucode_len - offset);
>> +}
>
> That oneliner looks useless - sticking a comment over the min() and putting it
> at the single callsite below is good enough.
Agreed -- removed the helper and moved them.
>> +/*
>> + * Update the chunk size and decide whether another chunk can be sent.
>> + * This accounts for remaining data and retry limits.
>> + */
>> +static bool can_send_next_chunk(struct staging_state *ss)
>> +{
>> + ss->chunk_size = calc_next_chunk_size(ss->ucode_len, ss->offset);
>> + /*
>> + * Each microcode image is divided into chunks, each at most
>> + * one page size. A 10-chunk image would typically require 10
> ^^^^
Fixed.
Just to make sure, I've included the diff here.
Thanks for the careful review and for sticking with this set.
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index c7a75afd2b9a..4d663cab4f48 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -12,16 +12,16 @@
*/
#define pr_fmt(fmt) "microcode: " fmt
#include <linux/earlycpio.h>
+#include <linux/delay.h>
#include <linux/firmware.h>
#include <linux/uaccess.h>
#include <linux/initrd.h>
+#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/cpu.h>
#include <linux/uio.h>
#include <linux/mm.h>
-#include <linux/delay.h>
-#include <linux/io.h>
#include <asm/cpu_device_id.h>
#include <asm/processor.h>
@@ -36,7 +36,6 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
#define UCODE_BSP_LOADED ((struct microcode_intel *)0x1UL)
/* Defines for the microcode staging mailbox interface */
-
#define MBOX_REG_NUM 4
#define MBOX_REG_SIZE sizeof(u32)
@@ -344,24 +343,18 @@ static void init_stage(struct staging_state *ss)
writel(MASK_MBOX_CTRL_ABORT, ss->mmio_base + MBOX_CONTROL_OFFSET);
}
-/*
- * Return PAGE_SIZE, or remaining bytes if this is the final chunk
- */
-static inline unsigned int calc_next_chunk_size(unsigned int ucode_len, unsigned int offset)
-{
- return min(PAGE_SIZE, ucode_len - offset);
-}
-
/*
* Update the chunk size and decide whether another chunk can be sent.
* This accounts for remaining data and retry limits.
*/
static bool can_send_next_chunk(struct staging_state *ss, int *err)
{
- ss->chunk_size = calc_next_chunk_size(ss->ucode_len, ss->offset);
+ /* a page size or remaining bytes if this is the final chunk */
+ ss->chunk_size = min(PAGE_SIZE, ss->ucode_len - ss->offset);
+
/*
* Each microcode image is divided into chunks, each at most
- * one page size. A 10-chunk image would typically require 10
+ * one page size. A 10-chunk image would typically require 10
* transactions.
*
* However, the hardware managing the mailbox has limited
Previously, the functions for sending microcode data and retrieving the
next offset were placeholders, as they required handling the specific
mailbox format. Implement them as follows:
== Mailbox Format ==
The staging mailbox consists of two primary sections: 'header' and
'data'. While the microcode must be transferred following this format,
the actual data transfer mechanism involves reading and writing to
specific MMIO registers.
== Mailbox Data Registers ==
Unlike conventional interfaces that allocate MMIO space for each data
chunk, the staging interface features a "narrow" interface, using only
two dword-sized registers for read and write operations.
For example, consider writing 2 dwords of data to a device. Typically, the
device would expose 2 dwords of "wide" MMIO space. To send the data to
the device:
writel(buf[0], io_addr + 0);
writel(buf[1], io_addr + 1);
But, this interface is a bit different. Instead of having a "wide"
interface where there is separate MMIO space for each word in a
transaction, it has a "narrow" interface where several words are written
to the same spot in MMIO space:
writel(buf[0], io_addr);
writel(buf[1], io_addr);
The same goes for the read side.
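The read side follows the same pattern, with successive dwords read back
from one register (a minimal sketch, reusing the illustrative buf/io_addr
names from above):
	buf[0] = readl(io_addr);
	buf[1] = readl(io_addr);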
== Implementation Summary ==
Given that, first introduce two layers of helper functions:
* Low-level helpers for reading and writing to data registers directly.
* Wrapper functions for handling mailbox header and data sections.
Using them, implement the send_data_chunk() and fetch_next_offset()
functions. Add an explicit error and timeout handling routine,
wait_for_transaction(), to finish up the transfer.
Both hardware error states and implicit errors -- invalid header or
offset -- result in UCODE_ERROR. Emit a clear message for the latter.
Note: The kernel has support for similar mailboxes. But none of them are
compatible with this one. Trying to share code resulted in a bloated
mess, so this code is standalone.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
---
V4 -> V5: Addressed Dave's feedback
* fetch_next_offset():
- Make dword reads explicit
- Consolidate offset validation -- adding another user for the
end-offset checker
- Convert WARN_* with pr_err_once()
* Simplify transaction waiting logic a bit
V2 -> V3:
* Update code to reflect the removal of a global variable (Dave).
V1 -> V2:
* Add lots of code comments and edit the changelog (Dave).
* Encapsulate register read/write operations for processing header and
data sections.
---
arch/x86/kernel/cpu/microcode/intel.c | 186 +++++++++++++++++++++++++-
1 file changed, 179 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index a1b13202330d..f6b365eba6a2 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -22,6 +22,7 @@
#include <linux/mm.h>
#include <linux/delay.h>
#include <linux/io.h>
+#include <linux/pci_ids.h>
#include <asm/cpu_device_id.h>
#include <asm/processor.h>
@@ -42,8 +43,31 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
#define MBOX_CONTROL_OFFSET 0x0
#define MBOX_STATUS_OFFSET 0x4
+#define MBOX_WRDATA_OFFSET 0x8
+#define MBOX_RDDATA_OFFSET 0xc
#define MASK_MBOX_CTRL_ABORT BIT(0)
+#define MASK_MBOX_CTRL_GO BIT(31)
+
+#define MASK_MBOX_STATUS_ERROR BIT(2)
+#define MASK_MBOX_STATUS_READY BIT(31)
+
+#define MASK_MBOX_RESP_SUCCESS BIT(0)
+#define MASK_MBOX_RESP_PROGRESS BIT(1)
+#define MASK_MBOX_RESP_ERROR BIT(2)
+
+#define MBOX_CMD_LOAD 0x3
+#define MBOX_OBJ_STAGING 0xb
+#define MBOX_HEADER(size) ((PCI_VENDOR_ID_INTEL) | \
+ (MBOX_OBJ_STAGING << 16) | \
+ ((u64)((size) / sizeof(u32)) << 32))
+
+/* The size of each mailbox header */
+#define MBOX_HEADER_SIZE sizeof(u64)
+/* The size of staging hardware response */
+#define MBOX_RESPONSE_SIZE sizeof(u64)
+
+#define MBOX_XACTION_TIMEOUT_MS (10 * MSEC_PER_SEC)
/* Current microcode patch used in early patching on the APs. */
static struct microcode_intel *ucode_patch_va __read_mostly;
@@ -330,6 +354,49 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
return size ? NULL : patch;
}
+static inline u32 read_mbox_dword(void __iomem *mmio_base)
+{
+ u32 dword = readl(mmio_base + MBOX_RDDATA_OFFSET);
+
+ /* Acknowledge read completion to the staging hardware */
+ writel(0, mmio_base + MBOX_RDDATA_OFFSET);
+ return dword;
+}
+
+static inline void write_mbox_dword(void __iomem *mmio_base, u32 dword)
+{
+ writel(dword, mmio_base + MBOX_WRDATA_OFFSET);
+}
+
+static inline u64 read_mbox_header(void __iomem *mmio_base)
+{
+ u32 high, low;
+
+ low = read_mbox_dword(mmio_base);
+ high = read_mbox_dword(mmio_base);
+
+ return ((u64)high << 32) | low;
+}
+
+static inline void write_mbox_header(void __iomem *mmio_base, u64 value)
+{
+ write_mbox_dword(mmio_base, value);
+ write_mbox_dword(mmio_base, value >> 32);
+}
+
+static void write_mbox_data(void __iomem *mmio_base, u32 *chunk, unsigned int chunk_bytes)
+{
+ int i;
+
+ /*
+ * The MMIO space is mapped as Uncached (UC). Each write arrives
+ * at the device as an individual transaction in program order.
+ * The device can then reassemble the sequence accordingly.
+ */
+ for (i = 0; i < chunk_bytes / sizeof(u32); i++)
+ write_mbox_dword(mmio_base, chunk[i]);
+}
+
/*
* Prepare for a new microcode transfer: reset hardware and record the
* image size.
@@ -385,6 +452,14 @@ static bool can_send_next_chunk(struct staging_state *ss)
return true;
}
+/*
+ * The hardware indicates completion by returning a sentinel end offset
+ */
+static inline bool is_end_offset(u32 offset)
+{
+ return offset == UINT_MAX;
+}
+
/*
* Determine whether staging is complete: either the hardware signaled
* the end offset, or no more transactions are permitted (retry limit
@@ -392,18 +467,73 @@ static bool can_send_next_chunk(struct staging_state *ss)
*/
static inline bool staging_is_complete(struct staging_state *ss)
{
- return (ss->offset == UINT_MAX) || !can_send_next_chunk(ss);
+ return is_end_offset(ss->offset) || !can_send_next_chunk(ss);
+}
+
+/*
+ * Wait for the hardware to complete a transaction.
+ * Return 0 on success, or an error code on failure.
+ */
+static int wait_for_transaction(struct staging_state *ss)
+{
+ u32 timeout, status;
+
+ /* Allow time for hardware to complete the operation: */
+ for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++) {
+ msleep(1);
+
+ status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
+ /* Break out early if the hardware is ready: */
+ if (status & MASK_MBOX_STATUS_READY)
+ break;
+ }
+
+ /* Check for explicit error response */
+ if (status & MASK_MBOX_STATUS_ERROR) {
+ ss->state = UCODE_ERROR;
+ return -EPROTO;
+ }
+
+ /*
+ * Hardware is neither responded to the action nor signaled any
+ * error. Treat this as timeout.
+ */
+ if (!(status & MASK_MBOX_STATUS_READY)) {
+ ss->state = UCODE_TIMEOUT;
+ return -ETIMEDOUT;
+ }
+
+ ss->state = UCODE_OK;
+ return 0;
}
/*
* Transmit a chunk of the microcode image to the hardware.
* Return 0 on success, or an error code on failure.
*/
-static int send_data_chunk(struct staging_state *ss, void *ucode_ptr __maybe_unused)
+static int send_data_chunk(struct staging_state *ss, void *ucode_ptr)
{
- pr_debug_once("Staging mailbox loading code needs to be implemented.\n");
- ss->state = UCODE_ERROR;
- return -EPROTONOSUPPORT;
+ u32 *src_chunk = ucode_ptr + ss->offset;
+ u16 mbox_size;
+
+ /*
+ * Write a 'request' mailbox object in this order:
+ * 1. Mailbox header includes total size
+ * 2. Command header specifies the load operation
+ * 3. Data section contains a microcode chunk
+ *
+ * Thus, the mailbox size is two headers plus the chunk size.
+ */
+ mbox_size = MBOX_HEADER_SIZE * 2 + ss->chunk_size;
+ write_mbox_header(ss->mmio_base, MBOX_HEADER(mbox_size));
+ write_mbox_header(ss->mmio_base, MBOX_CMD_LOAD);
+ write_mbox_data(ss->mmio_base, src_chunk, ss->chunk_size);
+ ss->bytes_sent += ss->chunk_size;
+
+ /* Notify the hardware that the mailbox is ready for processing. */
+ writel(MASK_MBOX_CTRL_GO, ss->mmio_base + MBOX_CONTROL_OFFSET);
+
+ return wait_for_transaction(ss);
}
/*
@@ -412,9 +542,51 @@ static int send_data_chunk(struct staging_state *ss, void *ucode_ptr __maybe_unu
*/
static int fetch_next_offset(struct staging_state *ss)
{
- pr_debug_once("Staging mailbox response handling code needs to be implemented.\n\n");
+ const u64 expected_header = MBOX_HEADER(MBOX_HEADER_SIZE + MBOX_RESPONSE_SIZE);
+ u32 offset, status;
+ u64 header;
+ int err;
+
+ /*
+ * The 'response' mailbox returns three fields, in order:
+ * 1. Header
+ * 2. Next offset in the microcode image
+ * 3. Status flags
+ */
+ header = read_mbox_header(ss->mmio_base);
+ offset = read_mbox_dword(ss->mmio_base);
+ status = read_mbox_dword(ss->mmio_base);
+
+ /* All valid responses must start with the expected header. */
+ if (header != expected_header) {
+ pr_err_once("staging: invalid response header\n");
+ err = -EINVAL;
+ goto err_out;
+ }
+
+ /*
+ * Verify the offset: If not at the end marker, it must not
+ * exceed the microcode image length
+ */
+ if (!is_end_offset(offset) && offset > ss->ucode_len) {
+ pr_err_once("staging: invalid response offset\n");
+ err = -EINVAL;
+ goto err_out;
+ }
+
+ /* Hardware may report errors explicitly in the status field */
+ if (status & MASK_MBOX_RESP_ERROR) {
+ err = -EPROTO;
+ goto err_out;
+ }
+
+ ss->offset = offset;
+ ss->state = UCODE_OK;
+ return 0;
+
+err_out:
ss->state = UCODE_ERROR;
- return -EPROTONOSUPPORT;
+ return err;
}
/*
--
2.48.1
On Sat, Aug 23, 2025 at 08:52:09AM -0700, Chang S. Bae wrote:
> Previously, the functions for sending microcode data and retrieving the
> next offset were placeholders, as they required handling the specific
> mailbox format. Implement them as following:
>
> == Mailbox Format ==
>
> The staging mailbox consists of two primary sections: 'header' and
> 'data'. While the microcode must be transferred following this format,
> the actual data transfer mechanism involves reading and writing to
> specific MMIO registers.
>
> == Mailbox Data Registers ==
>
> Unlike conventional interfaces that allocate MMIO space for each data
> chunk, the staging interface features a "narrow" interface, using only
> two dword-sized registers for read and write operations.
>
> For example, if writing 2 dwords of data to a device. Typically, the
^
|
abrupt sentence end. Looks like the idea was to not have a fullstop there but
continue...
> device would expose 2 dwords of "wide" MMIO space. To send the data to
> the device:
>
> writel(buf[0], io_addr + 0);
> writel(buf[1], io_addr + 1);
>
> But, this interface is a bit different. Instead of having a "wide"
> interface where there is separate MMIO space for each word in a
> transaction, it has a "narrow" interface where several words are written
> to the same spot in MMIO space:
>
> writel(buf[0], io_addr);
> writel(buf[1], io_addr);
>
> The same goes for the read side.
I don't understand what the point of this explanation is if all you end up
doing is writing to "io_addr". Why introduce the unnecessary confusion?
>
> == Implementation Summary ==
>
> Given that, introduce two layers of helper functions at first:
>
> * Low-level helpers for reading and writing to data registers directly.
> * Wrapper functions for handling mailbox header and data sections.
>
> Using them, implement send_data_chunk() and fetch_next_offset()
> functions. Add explicit error and timeout handling routine in
> wait_for_transaction(), finishing up the transfer.
>
> Both hardware error states and implicit errors -- invalid header or
> offset -- result in UCODE_ERROR. Emit a clear message for the latter.
Can we pu-lease stop explaining what the code does?!
Do not talk about *what* the patch is doing in the commit message - that
should be obvious from the diff itself. Rather, concentrate on the *why*
it needs to be done.
> Note: The kernel has support for similar mailboxes. But none of them are
> compatible with this one. Trying to share code resulted in a bloated
> mess, so this code is standalone.
Now that belongs in a commit message. Stuff which is non-obvious etc.
> +static void write_mbox_data(void __iomem *mmio_base, u32 *chunk, unsigned int chunk_bytes)
> +{
> + int i;
> +
> + /*
> + * The MMIO space is mapped as Uncached (UC). Each write arrives
> + * at the device as an individual transaction in program order.
> + * The device can then resemble the sequence accordingly.
reassemble?
> + */
> + for (i = 0; i < chunk_bytes / sizeof(u32); i++)
> + write_mbox_dword(mmio_base, chunk[i]);
> +}
> +
> /*
> * Prepare for a new microcode transfer: reset hardware and record the
> * image size.
...
> +static int wait_for_transaction(struct staging_state *ss)
> +{
> + u32 timeout, status;
> +
> + /* Allow time for hardware to complete the operation: */
> + for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++) {
> + msleep(1);
> +
> + status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
> + /* Break out early if the hardware is ready: */
> + if (status & MASK_MBOX_STATUS_READY)
> + break;
> + }
> +
> + /* Check for explicit error response */
> + if (status & MASK_MBOX_STATUS_ERROR) {
> + ss->state = UCODE_ERROR;
> + return -EPROTO;
> + }
> +
> + /*
> + * Hardware is neither responded to the action nor signaled any
s/is/has/
> + * error. Treat this as timeout.
> + */
> + if (!(status & MASK_MBOX_STATUS_READY)) {
> + ss->state = UCODE_TIMEOUT;
> + return -ETIMEDOUT;
> + }
> +
> + ss->state = UCODE_OK;
> + return 0;
> }
...
> @@ -412,9 +542,51 @@ static int send_data_chunk(struct staging_state *ss, void *ucode_ptr __maybe_unu
> */
> static int fetch_next_offset(struct staging_state *ss)
> {
> - pr_debug_once("Staging mailbox response handling code needs to be implemented.\n\n");
> + const u64 expected_header = MBOX_HEADER(MBOX_HEADER_SIZE + MBOX_RESPONSE_SIZE);
> + u32 offset, status;
> + u64 header;
> + int err;
> +
> + /*
> + * The 'response' mailbox returns three fields, in order:
> + * 1. Header
> + * 2. Next offset in the microcode image
> + * 3. Status flags
> + */
> + header = read_mbox_header(ss->mmio_base);
> + offset = read_mbox_dword(ss->mmio_base);
> + status = read_mbox_dword(ss->mmio_base);
> +
> + /* All valid responses must start with the expected header. */
> + if (header != expected_header) {
> + pr_err_once("staging: invalid response header\n");
> + err = -EINVAL;
> + goto err_out;
> + }
> +
> + /*
> + * Verify the offset: If not at the end marker, it must not
> + * exceed the microcode image length
> + */
> + if (!is_end_offset(offset) && offset > ss->ucode_len) {
> + pr_err_once("staging: invalid response offset\n");
You might want to dump some of the actual values in those pr_* statements so
that it dumps more breadcrumbs and those prints are more useful when debugging
issues.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 9/12/2025 9:34 AM, Borislav Petkov wrote:
> On Sat, Aug 23, 2025 at 08:52:09AM -0700, Chang S. Bae wrote:
>
> I don't understand what the point of this explanation is if all you end up
> doing is writing to "io_addr". Why introduce the unnecessary confusion?
Yikes, it looks like describing the format and interface went into too
much detail when that's already covered in the spec.
>> Note: The kernel has support for similar mailboxes. But none of them are
>> compatible with this one. Trying to share code resulted in a bloated
>> mess, so this code is standalone.
>
> Now that belongs in a commit message. Stuff which is non-obvious etc.
Okay, maybe the changelog could be condensed to something like this:
The functions for sending microcode data and retrieving the next offset
were previously placeholders, as they need to handle a specific mailbox
format.
While the kernel supports similar mailboxes, none of them are compatible
with this one. Attempts to share code led to unnecessary complexity, so
add a dedicated implementation instead.
>> + /*
>> + * The MMIO space is mapped as Uncached (UC). Each write arrives
>> + * at the device as an individual transaction in program order.
>> + * The device can then resemble the sequence accordingly.
>
> reassemble?
Yep, fixed.
>> + /*
>> + * Hardware is neither responded to the action nor signaled any
>
> s/is/has/
Also fixed -- and I'll do a more careful pass to catch typos like that.
> You might want to dump some of the actual values in those pr_* statements so
> that it dumps more breadcrumbs and those prints are more useful when debugging
> issues.
Yeah, good suggestion -- maybe it could dump invalid header and offset
values like these:
pr_err_once("staging: invalid response header (0x%llx)\n", header);
pr_err_once("staging: invalid offset (%u) after the image end (%u)",
offset, ss->ucode_len);
On Fri, Sep 12, 2025 at 05:51:49PM -0700, Chang S. Bae wrote:
> The functions for sending microcode data and retrieving the next offset
> were previously placeholders, as they need to handle a specific mailbox
> format.
>
> While the kernel supports similar mailboxes, none of them are compatible
> with this one. Attempts to share code led to unnecessary complexity, so
> add a dedicated implementation instead.
Better.
> Yeah, good suggestion -- maybe it could dump invalid header and offset
> values like these:
>
> pr_err_once("staging: invalid response header (0x%llx)\n", header);
>
> pr_err_once("staging: invalid offset (%u) after the image end (%u)",
> offset, ss->ucode_len);
Should be more helpful than before, yeah.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
With the staging code ready, check for staging availability by reading
the following MSRs:
* IA32_ARCH_CAPABILITIES (bit 16) for the presence of
IA32_MCU_ENUMERATION
* IA32_MCU_ENUMERATION (bit 4) for the staging feature availability.
When available, enable it by setting the feature bit.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
---
V4 -> V5:
* Collect Chao's Review tag
* rdmsrl() -> rdmsrq() (Chao)
V1 -> V2:
* Fold MSR definitions (Boris).
---
arch/x86/include/asm/msr-index.h | 7 +++++++
arch/x86/kernel/cpu/microcode/intel.c | 17 +++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 0356155f9264..2b7b598c414f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -166,6 +166,10 @@
* Processor MMIO stale data
* vulnerabilities.
*/
+#define ARCH_CAP_MCU_ENUM BIT(16) /*
+ * Indicates the presence of microcode update
+ * feature enumeration and status information
+ */
#define ARCH_CAP_FB_CLEAR BIT(17) /*
* VERW clears CPU fill buffer
* even on MDS_NO CPUs.
@@ -913,6 +917,9 @@
#define MSR_IA32_UCODE_WRITE 0x00000079
#define MSR_IA32_UCODE_REV 0x0000008b
+#define MSR_IA32_MCU_ENUMERATION 0x0000007b
+#define MCU_STAGING BIT(4)
+
#define MSR_IA32_MCU_STAGING_MBOX_ADDR 0x000007a5
/* Intel SGX Launch Enclave Public Key Hash MSRs */
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index f6b365eba6a2..7d6582f8a80f 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -1007,6 +1007,18 @@ static __init void calc_llc_size_per_core(struct cpuinfo_x86 *c)
llc_size_per_core = (unsigned int)llc_size;
}
+static __init bool staging_available(void)
+{
+ u64 val;
+
+ val = x86_read_arch_cap_msr();
+ if (!(val & ARCH_CAP_MCU_ENUM))
+ return false;
+
+ rdmsrq(MSR_IA32_MCU_ENUMERATION, val);
+ return !!(val & MCU_STAGING);
+}
+
struct microcode_ops * __init init_intel_microcode(void)
{
struct cpuinfo_x86 *c = &boot_cpu_data;
@@ -1017,6 +1029,11 @@ struct microcode_ops * __init init_intel_microcode(void)
return NULL;
}
+ if (staging_available()) {
+ microcode_intel_ops.use_staging = true;
+ pr_info("Enabled staging feature.\n");
+ }
+
calc_llc_size_per_core(c);
return µcode_intel_ops;
--
2.48.1
On 8/13/25 10:26, Chang S. Bae wrote:
> The use case for this feature is late-loading with increased load times,
> most likely in cloud service environments. I reached out to potential
> users for reviews and feedback, and Anselm Busse has confirmed his
> testing.
I'm glad someone tested this, but it would also be ideal to have some code
reviews from the folks that are actually going to use this.
From the lack of reviews, I'm going to assume that folks don't really want
this merged any time soon.
As microcode patch sizes continue to grow, late-loading latency spikes
can lead to timeouts and disruptions in running workloads. This trend of
increasing patch sizes is expected to continue, so a foundational
solution is needed to address the issue.
To mitigate the problem, a new staging feature is introduced. This option
processes most of the microcode update (excluding activation) on a
non-critical path, allowing CPUs to remain operational during the
majority of the update. By offloading work from the critical path,
staging can significantly reduce latency spikes.
Integrate staging as a preparatory step in late-loading. Introduce a new
callback for staging, which is invoked at the beginning of
load_late_stop_cpus(), before CPUs enter the rendezvous phase.
Staging follows an opportunistic model:
* If successful, it reduces CPU rendezvous time
* Even if it fails, the process falls back to the legacy path to
  finish loading, but with potentially higher latency.
Extend struct microcode_ops to incorporate staging properties, which will
be implemented in the vendor code separately.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
---
There were discussions about whether staging success should be enforced
by a configurable option. That topic is identified as follow-up work,
separate from this series.
https://lore.kernel.org/lkml/54308373-7867-4b76-be34-63730953f83c@intel.com/
V1 -> V2:
* Move invocation inside of load_late_stop_cpus() (Boris)
* Add more note about staging (Dave)
---
arch/x86/kernel/cpu/microcode/core.c | 11 +++++++++++
arch/x86/kernel/cpu/microcode/internal.h | 4 +++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index b92e09a87c69..34e569ee1db2 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -552,6 +552,17 @@ static int load_late_stop_cpus(bool is_safe)
pr_err("You should switch to early loading, if possible.\n");
}
+ /*
+ * Pre-load the microcode image into a staging device. This
+ * process is preemptible and does not require stopping CPUs.
+ * Successful staging simplifies the subsequent late-loading
+ * process, reducing rendezvous time.
+ *
+ * Even if the transfer fails, the update will proceed as usual.
+ */
+ if (microcode_ops->use_staging)
+ microcode_ops->stage_microcode();
+
atomic_set(&late_cpus_in, num_online_cpus());
atomic_set(&offline_in_nmi, 0);
loops_per_usec = loops_per_jiffy / (TICK_NSEC / 1000);
diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
index 50a9702ae4e2..adf02ebbf7a3 100644
--- a/arch/x86/kernel/cpu/microcode/internal.h
+++ b/arch/x86/kernel/cpu/microcode/internal.h
@@ -31,10 +31,12 @@ struct microcode_ops {
* See also the "Synchronization" section in microcode_core.c.
*/
enum ucode_state (*apply_microcode)(int cpu);
+ void (*stage_microcode)(void);
int (*collect_cpu_info)(int cpu, struct cpu_signature *csig);
void (*finalize_late_load)(int result);
unsigned int nmi_safe : 1,
- use_nmi : 1;
+ use_nmi : 1,
+ use_staging : 1;
};
struct early_load_data {
--
2.48.1
>+ /*
>+ * Pre-load the microcode image into a staging device. This
>+ * process is preemptible and does not require stopping CPUs.
>+ * Successful staging simplifies the subsequent late-loading
>+ * process, reducing rendezvous time.
>+ *
>+ * Even if the transfer fails, the update will proceed as usual.
>+ */
>+ if (microcode_ops->use_staging)
>+ microcode_ops->stage_microcode();
->use_staging is not necessary. we can directly check if ->stage_microcode()
is NULL, or even add a stub function and then call ->stage_microcode()
unconditionally here. But since this pattern occurs only once in the series,
I don't have a strong preference.
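For reference, the NULL-check variant mentioned above would look roughly
like this (a sketch only, not part of this series):

	/* Hypothetical: drop use_staging and test the callback instead. */
	if (microcode_ops->stage_microcode)
		microcode_ops->stage_microcode();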
If my review counts for anything,
Reviewed-by: Chao Gao <chao.gao@intel.com>
>+
> atomic_set(&late_cpus_in, num_online_cpus());
> atomic_set(&offline_in_nmi, 0);
> loops_per_usec = loops_per_jiffy / (TICK_NSEC / 1000);
>diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
>index 50a9702ae4e2..adf02ebbf7a3 100644
>--- a/arch/x86/kernel/cpu/microcode/internal.h
>+++ b/arch/x86/kernel/cpu/microcode/internal.h
>@@ -31,10 +31,12 @@ struct microcode_ops {
> * See also the "Synchronization" section in microcode_core.c.
> */
> enum ucode_state (*apply_microcode)(int cpu);
>+ void (*stage_microcode)(void);
> int (*collect_cpu_info)(int cpu, struct cpu_signature *csig);
> void (*finalize_late_load)(int result);
> unsigned int nmi_safe : 1,
>- use_nmi : 1;
>+ use_nmi : 1,
>+ use_staging : 1;
> };
>
> struct early_load_data {
>--
>2.48.1
>
When microcode staging is initiated, operations are carried out through
an MMIO interface. Each package has a unique interface specified by the
IA32_MCU_STAGING_MBOX_ADDR MSR, which maps to a set of 32-bit registers.
Prepare staging with the following steps:
1. Ensure the microcode image is 32-bit aligned to match the MMIO
register size.
2. Identify each MMIO interface based on its per-package scope.
3. Invoke the staging function for each identified interface, which
will be implemented separately.
Also, define cpu_primary_thread_mask for the CONFIG_SMP=n case, allowing
consistent use when narrowing down primary threads to locate the
per-package interface.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/all/871pznq229.ffs@tglx
---
V2 -> V3:
* Remove a global variable and adjust stage_microcode() (Dave).
* Simplify for_each_cpu() loop control code
* Handle rdmsrl_on_cpu() return code explicitly (Chao)
V1 -> V2:
* Adjust to reference the staging_state struct.
* Add lockdep_assert_cpus_held() (Boris)
---
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/include/asm/topology.h | 1 +
arch/x86/kernel/cpu/microcode/intel.c | 50 +++++++++++++++++++++++++++
3 files changed, 53 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b65c3ba5fa14..0356155f9264 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -913,6 +913,8 @@
#define MSR_IA32_UCODE_WRITE 0x00000079
#define MSR_IA32_UCODE_REV 0x0000008b
+#define MSR_IA32_MCU_STAGING_MBOX_ADDR 0x000007a5
+
/* Intel SGX Launch Enclave Public Key Hash MSRs */
#define MSR_IA32_SGXLEPUBKEYHASH0 0x0000008C
#define MSR_IA32_SGXLEPUBKEYHASH1 0x0000008D
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 6c79ee7c0957..91b5fc44ca62 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -235,6 +235,7 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
static inline int topology_max_smt_threads(void) { return 1; }
static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
+#define cpu_primary_thread_mask cpu_none_mask
#endif /* !CONFIG_SMP */
static inline void arch_fix_phys_package_id(int num, u32 slot)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 371ca6eac00e..468c4d3d5d66 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -299,6 +299,55 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
return size ? NULL : patch;
}
+/*
+ * Handle the staging process using the mailbox MMIO interface.
+ * Return the result state.
+ */
+static enum ucode_state do_stage(u64 mmio_pa)
+{
+ pr_debug_once("Staging implementation is pending.\n");
+ return UCODE_ERROR;
+}
+
+static void stage_microcode(void)
+{
+ unsigned int pkg_id = UINT_MAX;
+ enum ucode_state ret;
+ int cpu, err;
+ u64 mmio_pa;
+
+ if (!IS_ALIGNED(get_totalsize(&ucode_patch_late->hdr), sizeof(u32)))
+ return;
+
+ lockdep_assert_cpus_held();
+
+ /*
+ * The MMIO address is unique per package, and all the SMT
+ * primary threads are online here. Find each MMIO space by
+ * their package ids to avoid duplicate staging.
+ */
+ for_each_cpu(cpu, cpu_primary_thread_mask) {
+ if (topology_logical_package_id(cpu) == pkg_id)
+ continue;
+ pkg_id = topology_logical_package_id(cpu);
+
+ err = rdmsrl_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &mmio_pa);
+ if (WARN_ON_ONCE(err))
+ return;
+
+ ret = do_stage(mmio_pa);
+ if (ret != UCODE_OK) {
+ pr_err("Error: staging failed with %s for CPU%d at package %u.\n",
+ ret == UCODE_TIMEOUT ? "timeout" : "error state",
+ cpu, pkg_id);
+ return;
+ }
+ }
+
+ pr_info("Staging of patch revision 0x%x succeeded.\n",
+ ((struct microcode_header_intel *)ucode_patch_late)->rev);
+}
+
static enum ucode_state __apply_microcode(struct ucode_cpu_info *uci,
struct microcode_intel *mc,
u32 *cur_rev)
@@ -627,6 +676,7 @@ static struct microcode_ops microcode_intel_ops = {
.collect_cpu_info = collect_cpu_info,
.apply_microcode = apply_microcode_late,
.finalize_late_load = finalize_late_load,
+ .stage_microcode = stage_microcode,
.use_nmi = IS_ENABLED(CONFIG_X86_64),
};
--
2.48.1
On 8/13/25 10:26, Chang S. Bae wrote:
> When microcode staging is initiated, operations are carried out through
> an MMIO interface. Each package has a unique interface specified by the
> IA32_MCU_STAGING_MBOX_ADDR MSR, which maps to a set of 32-bit registers.
>
> Prepare staging with the following steps:
>
> 1. Ensure the microcode image is 32-bit aligned to match the MMIO
> register size.
>
> 2. Identify each MMIO interface based on its per-package scope.
>
> 3. Invoke the staging function for each identified interface, which
> will be implemented separately.
>
> Also, define cpu_primary_thread_mask for the CONFIG_SMP=n case, allowing
> consistent use when narrowing down primary threads to locate the
> per-package interface.
...
> static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
> +#define cpu_primary_thread_mask cpu_none_mask
> #endif /* !CONFIG_SMP */
Isn't 'cpu_none_mask' a mask containing no CPUs? How can that possibly
work here:
for_each_cpu(cpu, cpu_primary_thread_mask) {
? Wouldn't it just not run through the for loop at all on CONFIG_SMP=n?
Is that what we want for some reason? I would have thought that we'd
still want to find the MMIO address for CPU 0, the one and only CPU.
> static inline void arch_fix_phys_package_id(int num, u32 slot)
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index 371ca6eac00e..468c4d3d5d66 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -299,6 +299,55 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
> return size ? NULL : patch;
> }
>
> +/*
> + * Handle the staging process using the mailbox MMIO interface.
> + * Return the result state.
> + */
> +static enum ucode_state do_stage(u64 mmio_pa)
> +{
> + pr_debug_once("Staging implementation is pending.\n");
> + return UCODE_ERROR;
> +}
> +
> +static void stage_microcode(void)
> +{
> + unsigned int pkg_id = UINT_MAX;
> + enum ucode_state ret;
> + int cpu, err;
> + u64 mmio_pa;
> +
> + if (!IS_ALIGNED(get_totalsize(&ucode_patch_late->hdr), sizeof(u32)))
> + return;
> +
> + lockdep_assert_cpus_held();
> +
> + /*
> + * The MMIO address is unique per package, and all the SMT
> + * primary threads are online here. Find each MMIO space by
> + * their package ids to avoid duplicate staging.
> + */
> + for_each_cpu(cpu, cpu_primary_thread_mask) {
> + if (topology_logical_package_id(cpu) == pkg_id)
> + continue;
> + pkg_id = topology_logical_package_id(cpu);
> +
> + err = rdmsrl_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &mmio_pa);
> + if (WARN_ON_ONCE(err))
> + return;
> +
> + ret = do_stage(mmio_pa);
> + if (ret != UCODE_OK) {
> + pr_err("Error: staging failed with %s for CPU%d at package %u.\n",
> + ret == UCODE_TIMEOUT ? "timeout" : "error state",
> + cpu, pkg_id);
> + return;
> + }
> + }
> +
> + pr_info("Staging of patch revision 0x%x succeeded.\n",
> + ((struct microcode_header_intel *)ucode_patch_late)->rev);
> +}
Hmmm. Consider:
static struct microcode_intel *ucode_patch_late __read_mostly;
and:
struct microcode_intel {
struct microcode_header_intel hdr;
unsigned int bits[];
};
So isn't this whole ugly cast thing equivalent to:
ucode_patch_late->hdr.rev
?
On 8/13/2025 11:21 AM, Dave Hansen wrote:
> On 8/13/25 10:26, Chang S. Bae wrote:
>>
>> Also, define cpu_primary_thread_mask for the CONFIG_SMP=n case, allowing
>> consistent use when narrowing down primary threads to locate the
>> per-package interface.
> ...
>> static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
>> +#define cpu_primary_thread_mask cpu_none_mask
>> #endif /* !CONFIG_SMP */
>
> Isn't 'cpu_none_mask' a mask containing no CPUs? How can that possibly
> work here:
>
> for_each_cpu(cpu, cpu_primary_thread_mask) {
>
> ? Wouldn't it just not run through the for loop at all on CONFIG_SMP=n?
> Is that what we want for some reason? I would have thought that we'd
> still want to find the MMIO address for CPU 0, the one and only CPU.
Yeah, right.
Then, looking at it again, I see this:
config MICROCODE_LATE_LOADING
bool "Late microcode loading (DANGEROUS)"
default n
depends on MICROCODE && SMP
This optimization only applies to the late-loading path. But, today I
also had to clarify this dependency for myself. At least, my changelog
could've made it clearer, sorry.
>> +
>> + pr_info("Staging of patch revision 0x%x succeeded.\n",
>> + ((struct microcode_header_intel *)ucode_patch_late)->rev);
>> +}
> Hmmm. Consider:
>
> static struct microcode_intel *ucode_patch_late __read_mostly;
>
> and:
>
> struct microcode_intel {
> struct microcode_header_intel hdr;
> unsigned int bits[];
> };
>
> So isn't this whole ugly cast thing equivalent to:
>
> ucode_patch_late->hdr.rev
>
> ?
Indeed. I must have been blind to that bit of ugliness. Thanks for
spotting it!
Chang
On 8/13/25 13:46, Chang S. Bae wrote:
>> Isn't 'cpu_none_mask' a mask containing no CPUs? How can that possibly
>> work here:
>>
>> for_each_cpu(cpu, cpu_primary_thread_mask) {
>>
>> ? Wouldn't it just not run through the for loop at all on CONFIG_SMP=n?
>> Is that what we want for some reason? I would have thought that we'd
>> still want to find the MMIO address for CPU 0, the one and only CPU.
>
> Yeah, right.
>
> Then, looking at it again, I see this:
>
> config MICROCODE_LATE_LOADING
> bool "Late microcode loading (DANGEROUS)"
> default n
> depends on MICROCODE && SMP
>
> This optimization only applies to the late-loading path. But, today I
> also had to clarify this dependency for myself. At least, my changelog
> could've made it clearer, sorry.
I'm not following.
I _think_ you're trying to say that it's a "no harm, no foul" situation
because this user of 'cpu_primary_thread_mask' won't even get compiled
in the buggy !SMP case.
But that's not the problem. The issue is that this line of code:
#define cpu_primary_thread_mask cpu_none_mask
reads as 100% bogus to me. Even on !SMP kernels,
'cpu_primary_thread_mask' should have one CPU in it. Right? The _one_
thread that's present is a primary thread. If this were a mask for
secondary threads, 'cpu_none_mask' would make sense. But it's not.
So could we please make sure 'cpu_primary_thread_mask' is getting defined
correctly, whether it's really getting compiled into the end image or not?
On 8/13/2025 1:55 PM, Dave Hansen wrote:
>
> But that's not the problem. The issue is that this line of code:
>
> #define cpu_primary_thread_mask cpu_none_mask
With CONFIG_SMP=n, on the core side (include/linux/cpu_smt.h), the code
clarifies there is no SMT:
# define cpu_smt_control (CPU_SMT_NOT_IMPLEMENTED)
This leads kernel/cpu.c to return an empty mask:
static inline const struct cpumask *cpuhp_get_primary_thread_mask(void)
{
return cpu_none_mask;
}
On the x86 side, the definition is explicit that “primary threads” are
SMT threads (arch/x86/kernel/smpboot.c):
/* CPUs which are the primary SMT threads */
struct cpumask __cpu_primary_thread_mask __read_mostly;
And via ifdeffery, this mask is only available to SMP kernels.
So it seems I had been subscribing to this model -- no primary threads
without SMP.
> reads as 100% bogus to me. Even on !SMP kernels,
> 'cpu_primary_thread_mask' should have one CPU in it. Right? The _one_
> thread that's present is a primary thread. If this were a mask for
> secondary threads, 'cpu_none_mask' would make sense. But it's not.
Your confidence made me take another look.
Digging into the history, I found that x86 used to have this in the !SMP
case:
static inline bool topology_is_primary_thread(unsigned int cpu)
{
return true;
}
That stayed until the recent commit 4b455f59945aa ("cpu/SMT: Provide a
default topology_is_primary_thread()"), which now defines it in
include/linux/topology.h with this telling comment:
/*
* When disabling SMT, the primary thread of the SMT will remain
* enabled/active. Architectures that have a special primary thread
* (e.g. x86) need to override this function. ...
*/
This comment basically supports your point.
> So could we please make sure 'cpu_primary_thread_mask' is getting defined
> correctly, whether it's really getting compiled into the end image or not?
Given that, I’m thinking of simplifying the x86 side a bit -- by making
the primary thread mask configured and available regardless of
CONFIG_SMP, matching the behavior of other cpumasks. And its relevant
helpers are also available, like in the attached diff.
I think the change still aligns x86 with the core code -- especially
with the note in topology_is_primary_thread(). With that, the user may
be introduced here.
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 6c79ee7c0957..281252af6e9d 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -218,6 +218,12 @@ static inline unsigned int topology_amd_nodes_per_pkg(void)
return __amd_nodes_per_pkg;
}
+#else /* CONFIG_SMP */
+static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
+static inline int topology_max_smt_threads(void) { return 1; }
+static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
+#endif /* !CONFIG_SMP */
+
extern struct cpumask __cpu_primary_thread_mask;
#define cpu_primary_thread_mask ((const struct cpumask *)&__cpu_primary_thread_mask)
@@ -231,12 +237,6 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
}
#define topology_is_primary_thread topology_is_primary_thread
-#else /* CONFIG_SMP */
-static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
-static inline int topology_max_smt_threads(void) { return 1; }
-static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
-#endif /* !CONFIG_SMP */
-
static inline void arch_fix_phys_package_id(int num, u32 slot)
{
}
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index e35ccdc84910..946004d7dd1d 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -52,6 +52,9 @@ u32 cpuid_to_apicid[] __ro_after_init = { [0 ... NR_CPUS - 1] = BAD_APICID, };
/* Bitmaps to mark registered APICs at each topology domain */
static struct { DECLARE_BITMAP(map, MAX_LOCAL_APIC); } apic_maps[TOPO_MAX_DOMAIN] __ro_after_init;
+/* CPUs which are the primary SMT threads */
+struct cpumask __cpu_primary_thread_mask __read_mostly;
+
/*
* Keep track of assigned, disabled and rejected CPUs. Present assigned
* with 1 as CPU #0 is reserved for the boot CPU.
@@ -75,15 +78,11 @@ bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
return phys_id == (u64)cpuid_to_apicid[cpu];
}
-#ifdef CONFIG_SMP
static void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid)
{
if (!(apicid & (__max_threads_per_core - 1)))
cpumask_set_cpu(cpu, &__cpu_primary_thread_mask);
}
-#else
-static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { }
-#endif
/*
* Convert the APIC ID to a domain level ID by masking out the low bits
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 33e166f6ab12..7804175d2d87 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -103,9 +103,6 @@ EXPORT_PER_CPU_SYMBOL(cpu_core_map);
DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
EXPORT_PER_CPU_SYMBOL(cpu_die_map);
-/* CPUs which are the primary SMT threads */
-struct cpumask __cpu_primary_thread_mask __read_mostly;
-
/* Representing CPUs for which sibling maps can be computed */
static cpumask_var_t cpu_sibling_setup_mask;
cpu_primary_thread_mask is only defined when CONFIG_SMP=y. However, even
in UP kernels there is always exactly one CPU, which can reasonably be
treated as the primary thread.
Historically, topology_is_primary_thread() always returned true with
CONFIG_SMP=n. A recent commit:
4b455f59945aa ("cpu/SMT: Provide a default topology_is_primary_thread()")
replaced it with a generic implementation with the note:
"When disabling SMT, the primary thread of the SMT will remain
enabled/active. Architectures that have a special primary thread (e.g.
x86) need to override this function. ..."
For consistency and clarity, make the primary thread mask available
regardless of SMP, similar to cpu_possible_mask and cpu_present_mask.
Move __cpu_primary_thread_mask into common code to prevent build issues.
Let cpu_mark_primary_thread() configure the mask even for UP kernels,
alongside other masks. Then, topology_is_primary_thread() can
consistently reference it.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
This improvement was identified from feedback on a feature-enabling
series [*], where a user of this mask is introduced. It is posted here as
a standalone patch for clarity and self-containment. The next revision of
that series will depend on this.
[*] https://lore.kernel.org/lkml/20250813172649.15474-1-chang.seok.bae@intel.com/
---
arch/x86/include/asm/topology.h | 12 ++++++------
arch/x86/kernel/cpu/topology.c | 4 ----
arch/x86/kernel/cpu/topology_common.c | 3 +++
arch/x86/kernel/smpboot.c | 3 ---
4 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 6c79ee7c0957..281252af6e9d 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -218,6 +218,12 @@ static inline unsigned int topology_amd_nodes_per_pkg(void)
return __amd_nodes_per_pkg;
}
+#else /* CONFIG_SMP */
+static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
+static inline int topology_max_smt_threads(void) { return 1; }
+static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
+#endif /* !CONFIG_SMP */
+
extern struct cpumask __cpu_primary_thread_mask;
#define cpu_primary_thread_mask ((const struct cpumask *)&__cpu_primary_thread_mask)
@@ -231,12 +237,6 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
}
#define topology_is_primary_thread topology_is_primary_thread
-#else /* CONFIG_SMP */
-static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
-static inline int topology_max_smt_threads(void) { return 1; }
-static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
-#endif /* !CONFIG_SMP */
-
static inline void arch_fix_phys_package_id(int num, u32 slot)
{
}
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index dcf05c64dd82..6a76caf813d6 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -75,15 +75,11 @@ bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
return phys_id == (u64)cpuid_to_apicid[cpu];
}
-#ifdef CONFIG_SMP
static void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid)
{
if (!(apicid & (__max_threads_per_core - 1)))
cpumask_set_cpu(cpu, &__cpu_primary_thread_mask);
}
-#else
-static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { }
-#endif
/*
* Convert the APIC ID to a domain level ID by masking out the low bits
diff --git a/arch/x86/kernel/cpu/topology_common.c b/arch/x86/kernel/cpu/topology_common.c
index b5a5e1411469..71625795d711 100644
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -16,6 +16,9 @@ EXPORT_SYMBOL_GPL(x86_topo_system);
unsigned int __amd_nodes_per_pkg __ro_after_init;
EXPORT_SYMBOL_GPL(__amd_nodes_per_pkg);
+/* CPUs which are the primary SMT threads */
+struct cpumask __cpu_primary_thread_mask __read_mostly;
+
void topology_set_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
unsigned int shift, unsigned int ncpus)
{
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 33e166f6ab12..7804175d2d87 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -103,9 +103,6 @@ EXPORT_PER_CPU_SYMBOL(cpu_core_map);
DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
EXPORT_PER_CPU_SYMBOL(cpu_die_map);
-/* CPUs which are the primary SMT threads */
-struct cpumask __cpu_primary_thread_mask __read_mostly;
-
/* Representing CPUs for which sibling maps can be computed */
static cpumask_var_t cpu_sibling_setup_mask;
--
2.48.1
On 8/22/2025 3:39 PM, Chang S. Bae wrote:
> ... The next revision of
> that series will depend on this.
Folded into the V5 posting now:
https://lore.kernel.org/lkml/20250823155214.17465-2-chang.seok.bae@intel.com/
Thanks,
Chang
When microcode staging is initiated, operations are carried out through
an MMIO interface. Each package has a unique interface specified by the
IA32_MCU_STAGING_MBOX_ADDR MSR, which maps to a set of 32-bit registers.
Prepare staging with the following steps:
1. Ensure the microcode image is 32-bit aligned to match the MMIO
register size.
2. Identify each MMIO interface based on its per-package scope.
3. Invoke the staging function for each identified interface, which
will be implemented separately.
Also, define cpu_primary_thread_mask for the CONFIG_SMP=n case, allowing
consistent use when narrowing down primary threads to locate the
per-package interface.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/all/871pznq229.ffs@tglx
---
V4 -> V4a:
* Rebase on the primary thread cpumask fix (Dave)
* Clean up the rev print code (Dave)
* rdmsrl_on_cpu() -> rdmsrq_on_cpu() (Chao)
---
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/cpu/microcode/intel.c | 49 +++++++++++++++++++++++++++
2 files changed, 51 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b65c3ba5fa14..0356155f9264 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -913,6 +913,8 @@
#define MSR_IA32_UCODE_WRITE 0x00000079
#define MSR_IA32_UCODE_REV 0x0000008b
+#define MSR_IA32_MCU_STAGING_MBOX_ADDR 0x000007a5
+
/* Intel SGX Launch Enclave Public Key Hash MSRs */
#define MSR_IA32_SGXLEPUBKEYHASH0 0x0000008C
#define MSR_IA32_SGXLEPUBKEYHASH1 0x0000008D
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 371ca6eac00e..d309fb1f058f 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -299,6 +299,54 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
return size ? NULL : patch;
}
+/*
+ * Handle the staging process using the mailbox MMIO interface.
+ * Return the result state.
+ */
+static enum ucode_state do_stage(u64 mmio_pa)
+{
+ pr_debug_once("Staging implementation is pending.\n");
+ return UCODE_ERROR;
+}
+
+static void stage_microcode(void)
+{
+ unsigned int pkg_id = UINT_MAX;
+ enum ucode_state ret;
+ int cpu, err;
+ u64 mmio_pa;
+
+ if (!IS_ALIGNED(get_totalsize(&ucode_patch_late->hdr), sizeof(u32)))
+ return;
+
+ lockdep_assert_cpus_held();
+
+ /*
+ * The MMIO address is unique per package, and all the SMT
+ * primary threads are online here. Find each MMIO space by
+ * their package ids to avoid duplicate staging.
+ */
+ for_each_cpu(cpu, cpu_primary_thread_mask) {
+ if (topology_logical_package_id(cpu) == pkg_id)
+ continue;
+ pkg_id = topology_logical_package_id(cpu);
+
+ err = rdmsrq_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &mmio_pa);
+ if (WARN_ON_ONCE(err))
+ return;
+
+ ret = do_stage(mmio_pa);
+ if (ret != UCODE_OK) {
+ pr_err("Error: staging failed with %s for CPU%d at package %u.\n",
+ ret == UCODE_TIMEOUT ? "timeout" : "error state",
+ cpu, pkg_id);
+ return;
+ }
+ }
+
+ pr_info("Staging of patch revision 0x%x succeeded.\n", ucode_patch_late->hdr.rev);
+}
+
static enum ucode_state __apply_microcode(struct ucode_cpu_info *uci,
struct microcode_intel *mc,
u32 *cur_rev)
@@ -627,6 +675,7 @@ static struct microcode_ops microcode_intel_ops = {
.collect_cpu_info = collect_cpu_info,
.apply_microcode = apply_microcode_late,
.finalize_late_load = finalize_late_load,
+ .stage_microcode = stage_microcode,
.use_nmi = IS_ENABLED(CONFIG_X86_64),
};
--
2.48.1
What's with the "v4a" stuff? Is this an attempt to torture test b4? ;) Seriously, though, it's been more than a week since the last series. Just send a new series, please.
To facilitate a structured staging handler, a set of functions will be
introduced. Define staging_state struct to simplify function prototypes
by consolidating relevant data, instead of passing multiple local
variables.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
---
V1 -> V2: New patch
Prior to V2, local variables were used to track state values, with the
intention of improving readability by explicitly passing them between
functions. However, given the feedback, introducing a dedicated data
structure provides a clear benefit by simplifying the main loop.
---
arch/x86/kernel/cpu/microcode/intel.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 468c4d3d5d66..36b426a90844 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -54,6 +54,27 @@ struct extended_sigtable {
struct extended_signature sigs[];
};
+/**
+ * struct staging_state - Tracks the current staging process state
+ *
+ * @mmio_base: MMIO base address for staging
+ * @ucode_ptr: Pointer to the microcode image
+ * @ucode_len: Total size of the microcode image
+ * @chunk_size: Size of each data piece
+ * @bytes_sent: Total bytes transmitted so far
+ * @offset: Current offset in the microcode image
+ * @state: Current state of the staging process
+ */
+struct staging_state {
+ void __iomem *mmio_base;
+ void *ucode_ptr;
+ unsigned int ucode_len;
+ unsigned int chunk_size;
+ unsigned int bytes_sent;
+ unsigned int offset;
+ enum ucode_state state;
+};
+
#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + MC_HEADER_SIZE)
#define EXT_HEADER_SIZE (sizeof(struct extended_sigtable))
#define EXT_SIGNATURE_SIZE (sizeof(struct extended_signature))
--
2.48.1
> +/**
> + * struct staging_state - Tracks the current staging process state
> + *
> + * @mmio_base: MMIO base address for staging
> + * @ucode_ptr: Pointer to the microcode image
> + * @ucode_len: Total size of the microcode image
> + * @chunk_size: Size of each data piece
> + * @bytes_sent: Total bytes transmitted so far
> + * @offset: Current offset in the microcode image
> + * @state: Current state of the staging process
> + */
> +struct staging_state {
> + void __iomem *mmio_base;
> + void *ucode_ptr;
This is assigned a single time to ucode_patch_late. Shouldn't it share
the same type? Or, maybe not even get a structure member since it can
only have one value.
To facilitate a structured staging handler, a set of functions will be
introduced. Define staging_state struct to simplify function prototypes
by consolidating relevant data, instead of passing multiple local
variables.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
---
V4 -> V4a: Drop the ucode_ptr field (Dave)
---
arch/x86/kernel/cpu/microcode/intel.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index d309fb1f058f..3ca22457d839 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -54,6 +54,25 @@ struct extended_sigtable {
struct extended_signature sigs[];
};
+/**
+ * struct staging_state - Tracks the current staging process state
+ *
+ * @mmio_base: MMIO base address for staging
+ * @ucode_len: Total size of the microcode image
+ * @chunk_size: Size of each data piece
+ * @bytes_sent: Total bytes transmitted so far
+ * @offset: Current offset in the microcode image
+ * @state: Current state of the staging process
+ */
+struct staging_state {
+ void __iomem *mmio_base;
+ unsigned int ucode_len;
+ unsigned int chunk_size;
+ unsigned int bytes_sent;
+ unsigned int offset;
+ enum ucode_state state;
+};
+
#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + MC_HEADER_SIZE)
#define EXT_HEADER_SIZE (sizeof(struct extended_sigtable))
#define EXT_SIGNATURE_SIZE (sizeof(struct extended_signature))
--
2.48.1
Previously, per-package staging invocations and their associated state
data were established. The next step is to implement the actual staging
handler according to the specified protocol. Below are key aspects to
note:
(a) Each staging process must begin by resetting the staging hardware.
(b) The staging hardware processes up to a page-sized chunk of the
microcode image per iteration, requiring software to submit data
incrementally.
(c) Once a data chunk is processed, the hardware responds with an
offset in the image for the next chunk.
(d) The offset may indicate completion or request retransmission of an
already transferred chunk. As long as the total transferred data
remains within the predefined limit (twice the image size),
retransmissions should be acceptable.
With that, incorporate these code sequences into the staging handler:
1. Initialization: Map the MMIO space via ioremap(). Reset the staging
hardware and initialize software state, ensuring a fresh staging
process aligned with (a).
2. Processing Loop: Introduce a loop iterating over data chunks,
following (b), with proper termination conditions established from
(d) -- stop staging when the hardware signals completion, or if the
total transmitted data exceeds the predefined limit.
3. Loop Body: Finally, compose the loop body with two steps --
transmitting a data chunk and retrieving the next offset from the
hardware response, aligning with (b) and (c).
Since data transmission and mailbox format handling require additional
details, they are implemented separately in the next changes.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
---
V2 -> V3:
* Rework code to eliminate global variables (Dave)
* Remove redundant variable resets (Chao)
V1 -> V2:
* Re-write the changelog for clarity (Dave).
* Move staging handling code into intel.c (Boris).
* Add extensive comments to clarify staging logic and hardware
interactions, along with function renaming (Dave).
---
arch/x86/kernel/cpu/microcode/intel.c | 120 +++++++++++++++++++++++++-
1 file changed, 116 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 36b426a90844..3eb3331c5012 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -20,6 +20,8 @@
#include <linux/cpu.h>
#include <linux/uio.h>
#include <linux/mm.h>
+#include <linux/delay.h>
+#include <linux/io.h>
#include <asm/cpu_device_id.h>
#include <asm/processor.h>
@@ -33,6 +35,29 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
#define UCODE_BSP_LOADED ((struct microcode_intel *)0x1UL)
+/* Defines for the microcode staging mailbox interface */
+
+#define MBOX_REG_NUM 4
+#define MBOX_REG_SIZE sizeof(u32)
+
+#define MBOX_CONTROL_OFFSET 0x0
+#define MBOX_STATUS_OFFSET 0x4
+
+#define MASK_MBOX_CTRL_ABORT BIT(0)
+
+/*
+ * Each microcode image is divided into chunks, each at most
+ * MBOX_XACTION_SIZE in size. A 10-chunk image would typically require
+ * 10 transactions. However, the hardware managing the mailbox has
+ * limited resources and may not cache the entire image, potentially
+ * requesting the same chunk multiple times.
+ *
+ * To accommodate this behavior, allow up to twice the expected number of
+ * transactions (i.e., a 10-chunk image can take up to 20 attempts).
+ */
+#define MBOX_XACTION_SIZE PAGE_SIZE
+#define MBOX_XACTION_MAX(imgsz) ((imgsz) * 2)
+
/* Current microcode patch used in early patching on the APs. */
static struct microcode_intel *ucode_patch_va __read_mostly;
static struct microcode_intel *ucode_patch_late __read_mostly;
@@ -321,13 +346,100 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
}
/*
- * Handle the staging process using the mailbox MMIO interface.
- * Return the result state.
+ * Prepare for a new microcode transfer by resetting hardware and
+ * configuring microcode image info.
+ */
+static void init_stage(struct staging_state *ss)
+{
+ ss->ucode_ptr = ucode_patch_late;
+ ss->ucode_len = get_totalsize(&ucode_patch_late->hdr);
+
+ /*
+ * Abort any ongoing process, effectively resetting the device.
+ * Unlike regular mailbox data processing requests, this
+ * operation does not require a status check.
+ */
+ writel(MASK_MBOX_CTRL_ABORT, ss->mmio_base + MBOX_CONTROL_OFFSET);
+}
+
+/*
+ * Check if the staging process has completed. The hardware signals
+ * completion by setting a unique end offset.
+ */
+static inline bool is_stage_complete(unsigned int offset)
+{
+ return offset == UINT_MAX;
+}
+
+/*
+ * Determine if the next data chunk can be sent. Each chunk is typically
+ * one page unless the remaining data is smaller. If the total
+ * transmitted data exceeds the defined limit, a timeout occurs.
+ */
+static bool can_send_next_chunk(struct staging_state *ss)
+{
+ WARN_ON_ONCE(ss->ucode_len < ss->offset);
+ ss->chunk_size = min(MBOX_XACTION_SIZE, ss->ucode_len - ss->offset);
+
+ if (ss->bytes_sent + ss->chunk_size > MBOX_XACTION_MAX(ss->ucode_len)) {
+ ss->state = UCODE_TIMEOUT;
+ return false;
+ }
+ return true;
+}
+
+/*
+ * Transmit a chunk of the microcode image to the hardware.
+ * Return true if the chunk is processed successfully.
+ */
+static bool send_data_chunk(struct staging_state *ss)
+{
+ pr_debug_once("Staging mailbox loading code needs to be implemented.\n");
+ ss->state = UCODE_ERROR;
+ return false;
+}
+
+/*
+ * Retrieve the next offset from the hardware response.
+ * Return true if the response is valid, false otherwise.
+ */
+static bool fetch_next_offset(struct staging_state *ss)
+{
+ pr_debug_once("Staging mailbox response handling code needs to be implemented.\n\n");
+ ss->state = UCODE_ERROR;
+ return false;
+}
+
+/*
+ * Handle the staging process using the mailbox MMIO interface. The
+ * microcode image is transferred in chunks until completion. Return the
+ * result state.
*/
static enum ucode_state do_stage(u64 mmio_pa)
{
- pr_debug_once("Staging implementation is pending.\n");
- return UCODE_ERROR;
+ struct staging_state ss = {};
+
+ ss.mmio_base = ioremap(mmio_pa, MBOX_REG_NUM * MBOX_REG_SIZE);
+ if (WARN_ON_ONCE(!ss.mmio_base))
+ return UCODE_ERROR;
+
+ init_stage(&ss);
+
+ /* Perform the staging process while within the retry limit */
+ while (!is_stage_complete(ss.offset) && can_send_next_chunk(&ss)) {
+ /* Send a chunk of microcode each time: */
+ if (!send_data_chunk(&ss))
+ break;
+ /*
+ * Then, ask the hardware which piece of the image it
+ * needs next. The same piece may be sent more than once.
+ */
+ if (!fetch_next_offset(&ss))
+ break;
+ }
+
+ iounmap(ss.mmio_base);
+ return ss.state;
}
static void stage_microcode(void)
--
2.48.1
> +/*
> + * Determine if the next data chunk can be sent. Each chunk is typically
> + * one page unless the remaining data is smaller. If the total
> + * transmitted data exceeds the defined limit, a timeout occurs.
> + */
This comment isn't really telling the whole story. It's not just
determining if the chunk can be sent, it's calculating it and filling it in.
> +static bool can_send_next_chunk(struct staging_state *ss)
> +{
> + WARN_ON_ONCE(ss->ucode_len < ss->offset);
Please don't WARN_ON(); they can be fatal because of panic_on_warn. Also
I think this is the wrong spot for this. We should enforce this at the
time ss->offset is _established_ which is oddly enough in the next patch.
ss->offset = read_mbox_dword(ss->mmio_base);
if (ss->offset > ss->ucode_len)
// error out
> + ss->chunk_size = min(MBOX_XACTION_SIZE, ss->ucode_len - ss->offset);
It's a _little_ non-obvious that "can_send_next_chunk()" is also setting
->chunk_size. It would be easier to grok if it was something like:
ok = calc_next_chunk_size(&ss);
if (!ok)
// error out
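For illustration, such a helper could look roughly like this (the name is
taken from the suggestion above; the exact shape is only a sketch):

	/* Sketch: fill in ->chunk_size and check the transfer limit. */
	static bool calc_next_chunk_size(struct staging_state *ss)
	{
		ss->chunk_size = min(MBOX_XACTION_SIZE, ss->ucode_len - ss->offset);

		if (ss->bytes_sent + ss->chunk_size > MBOX_XACTION_MAX(ss->ucode_len)) {
			ss->state = UCODE_TIMEOUT;
			return false;
		}
		return true;
	}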
> + if (ss->bytes_sent + ss->chunk_size > MBOX_XACTION_MAX(ss->ucode_len)) {
> + ss->state = UCODE_TIMEOUT;
> + return false;
> + }
"TIMEOUT" seems like an odd thing to call this failure. Can you explain
the choice of this error code a bit, please?
> +/*
> + * Handle the staging process using the mailbox MMIO interface. The
> + * microcode image is transferred in chunks until completion. Return the
> + * result state.
> */
> static enum ucode_state do_stage(u64 mmio_pa)
> {
> - pr_debug_once("Staging implementation is pending.\n");
> - return UCODE_ERROR;
> + struct staging_state ss = {};
> +
> + ss.mmio_base = ioremap(mmio_pa, MBOX_REG_NUM * MBOX_REG_SIZE);
> + if (WARN_ON_ONCE(!ss.mmio_base))
> + return UCODE_ERROR;
> +
> + init_stage(&ss);
> +
> + /* Perform the staging process while within the retry limit */
> + while (!is_stage_complete(ss.offset) && can_send_next_chunk(&ss)) {
> + /* Send a chunk of microcode each time: */
> + if (!send_data_chunk(&ss))
> + break;
> + /*
> + * Then, ask the hardware which piece of the image it
> + * needs next. The same piece may be sent more than once.
> + */
> + if (!fetch_next_offset(&ss))
> + break;
> + }
The return types here are a _bit_ wonky. The 'bool' returns make sense
for things like is_stage_complete(). But they don't look right for:
if (!send_data_chunk(&ss))
break;
where we'd typically use an -ERRNO and where 0 means success. It would
look something like this:
while (!staging_is_complete(&ss)) {
err = send_data_chunk(&ss);
if (err)
break;
err = fetch_next_offset(&ss);
if (err)
break;
}
That's utterly unambiguous about the intent and what types the send and
fetch functions _must_ have.
Note I also moved the can_send_next_chunk() call into
staging_is_complete(). I think that makes sense as well for the
top-level loop.
Previously, per-package staging invocations and their associated state
data were established. The next step is to implement the actual staging
handler according to the specified protocol. Below are key aspects to
note:
(a) Each staging process must begin by resetting the staging hardware.
(b) The staging hardware processes up to a page-sized chunk of the
microcode image per iteration, requiring software to submit data
incrementally.
(c) Once a data chunk is processed, the hardware responds with an
offset in the image for the next chunk.
(d) The offset may indicate completion or request retransmission of an
already transferred chunk. As long as the total transferred data
remains within the predefined limit (twice the image size),
retransmissions should be acceptable.
With that, incorporate these code sequences into the staging handler:
1. Initialization: Map the MMIO space via ioremap(). Reset the staging
hardware and initialize software state, ensuring a fresh staging
process aligned with (a).
2. Processing Loop: Introduce a loop iterating over data chunks,
following (b), with proper termination conditions established from
(d) -- stop staging when the hardware signals completion, or if the
total transmitted data exceeds the predefined limit.
3. Loop Body: Finally, compose the loop body with two steps --
transmitting a data chunk and retrieving the next offset from the
hardware response, aligning with (b) and (c).
Since data transmission and mailbox format handling require additional
details, they are implemented separately in the next changes.
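For reference, a condensed sketch of how steps 1-3 map onto the handler
in the diff below (simplified; the send/fetch helpers are still stubs at
this point in the series):

	static enum ucode_state do_stage(u64 mmio_pa)
	{
		struct staging_state ss = {};
		int err;

		/* 1. Initialization: map MMIO and reset the hardware (a) */
		ss.mmio_base = ioremap(mmio_pa, MBOX_REG_NUM * MBOX_REG_SIZE);
		if (!ss.mmio_base)
			return UCODE_ERROR;
		init_stage(&ss);

		/* 2. Processing loop: stop on completion or retry limit (b, d) */
		while (!staging_is_complete(&ss)) {
			/* 3. Loop body: send a chunk, then get the next offset (b, c) */
			err = send_data_chunk(&ss, ucode_patch_late);
			if (err)
				break;
			err = fetch_next_offset(&ss);
			if (err)
				break;
		}

		iounmap(ss.mmio_base);
		return ss.state;
	}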
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
---
V4 -> V4a:
* Convert helper functions to return error codes (Dave)
* Consolidate loop-control logic
* Refactor next-chunk calculation/check for clarity
* Remove offset sanity check (moved to next patch)
---
arch/x86/kernel/cpu/microcode/intel.c | 137 +++++++++++++++++++++++++-
1 file changed, 133 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 3ca22457d839..a1b13202330d 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -20,6 +20,8 @@
#include <linux/cpu.h>
#include <linux/uio.h>
#include <linux/mm.h>
+#include <linux/delay.h>
+#include <linux/io.h>
#include <asm/cpu_device_id.h>
#include <asm/processor.h>
@@ -33,6 +35,16 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
#define UCODE_BSP_LOADED ((struct microcode_intel *)0x1UL)
+/* Defines for the microcode staging mailbox interface */
+
+#define MBOX_REG_NUM 4
+#define MBOX_REG_SIZE sizeof(u32)
+
+#define MBOX_CONTROL_OFFSET 0x0
+#define MBOX_STATUS_OFFSET 0x4
+
+#define MASK_MBOX_CTRL_ABORT BIT(0)
+
/* Current microcode patch used in early patching on the APs. */
static struct microcode_intel *ucode_patch_va __read_mostly;
static struct microcode_intel *ucode_patch_late __read_mostly;
@@ -319,13 +331,130 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
}
/*
- * Handle the staging process using the mailbox MMIO interface.
- * Return the result state.
+ * Prepare for a new microcode transfer: reset hardware and record the
+ * image size.
+ */
+static void init_stage(struct staging_state *ss)
+{
+ ss->ucode_len = get_totalsize(&ucode_patch_late->hdr);
+
+ /*
+ * Abort any ongoing process, effectively resetting the device.
+ * Unlike regular mailbox data processing requests, this
+ * operation does not require a status check.
+ */
+ writel(MASK_MBOX_CTRL_ABORT, ss->mmio_base + MBOX_CONTROL_OFFSET);
+}
+
+/*
+ * Return PAGE_SIZE, or remaining bytes if this is the final chunk
+ */
+static inline unsigned int calc_next_chunk_size(unsigned int ucode_len, unsigned int offset)
+{
+ return min(PAGE_SIZE, ucode_len - offset);
+}
+
+/*
+ * Update the chunk size and decide whether another chunk can be sent.
+ * This accounts for remaining data and retry limits.
+ */
+static bool can_send_next_chunk(struct staging_state *ss)
+{
+ ss->chunk_size = calc_next_chunk_size(ss->ucode_len, ss->offset);
+ /*
+ * Each microcode image is divided into chunks, each at most
+ * one page size. A 10-chunk image would typically require 10
+ * transactions.
+ *
+ * However, the hardware managing the mailbox has limited
+ * resources and may not cache the entire image, potentially
+ * requesting the same chunk multiple times.
+ *
+ * To tolerate this behavior, allow up to twice the expected
+ * number of transactions (i.e., a 10-chunk image can take up to
+ * 20 attempts).
+ *
+ * If the number of attempts exceeds this limit, the hardware is
+	 * likely stuck; mark the state as timeout.
+ */
+ if (ss->bytes_sent + ss->chunk_size > ss->ucode_len * 2) {
+ ss->state = UCODE_TIMEOUT;
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Determine whether staging is complete: either the hardware signaled
+ * the end offset, or no more transactions are permitted (retry limit
+ * reached).
+ */
+static inline bool staging_is_complete(struct staging_state *ss)
+{
+ return (ss->offset == UINT_MAX) || !can_send_next_chunk(ss);
+}
+
+/*
+ * Transmit a chunk of the microcode image to the hardware.
+ * Return 0 on success, or an error code on failure.
+ */
+static int send_data_chunk(struct staging_state *ss, void *ucode_ptr __maybe_unused)
+{
+ pr_debug_once("Staging mailbox loading code needs to be implemented.\n");
+ ss->state = UCODE_ERROR;
+ return -EPROTONOSUPPORT;
+}
+
+/*
+ * Retrieve the next offset from the hardware response.
+ * Return 0 on success, or an error code on failure.
+ */
+static int fetch_next_offset(struct staging_state *ss)
+{
+ pr_debug_once("Staging mailbox response handling code needs to be implemented.\n\n");
+ ss->state = UCODE_ERROR;
+ return -EPROTONOSUPPORT;
+}
+
+/*
+ * Handle the staging process using the mailbox MMIO interface. The
+ * microcode image is transferred in chunks until completion. Return the
+ * result state.
*/
static enum ucode_state do_stage(u64 mmio_pa)
{
- pr_debug_once("Staging implementation is pending.\n");
- return UCODE_ERROR;
+ struct staging_state ss = {};
+ int err;
+
+ ss.mmio_base = ioremap(mmio_pa, MBOX_REG_NUM * MBOX_REG_SIZE);
+ if (WARN_ON_ONCE(!ss.mmio_base))
+ return UCODE_ERROR;
+
+ init_stage(&ss);
+
+ /* Perform the staging process while within the retry limit */
+ while (!staging_is_complete(&ss)) {
+ /* Send a chunk of microcode each time: */
+ err = send_data_chunk(&ss, ucode_patch_late);
+ if (err)
+ break;
+ /*
+ * Then, ask the hardware which piece of the image it
+ * needs next. The same piece may be sent more than once.
+ */
+ err = fetch_next_offset(&ss);
+ if (err)
+ break;
+ }
+
+ iounmap(ss.mmio_base);
+
+ /*
+ * The helpers update ss.state on error. The final state is
+ * returned to the caller.
+ */
+ return ss.state;
}
static void stage_microcode(void)
--
2.48.1
Previously, the functions for sending microcode data and retrieving the
next offset were placeholders, as they required handling the specific
mailbox format. Implement them as follows:
== Mailbox Format ==
The staging mailbox consists of two primary sections: 'header' and
'data'. While the microcode must be transferred following this format,
the actual data transfer mechanism involves reading and writing to
specific MMIO registers.
== Mailbox Data Registers ==
Unlike conventional interfaces that allocate MMIO space for each data
chunk, the staging interface features a "narrow" interface, using only
two dword-sized registers for read and write operations.
For example, consider writing 2 dwords of data to a device. Typically, the
device would expose 2 dwords of "wide" MMIO space. To send the data to
the device:
writel(buf[0], io_addr + 0);
writel(buf[1], io_addr + 1);
But, this interface is a bit different. Instead of having a "wide"
interface where there is separate MMIO space for each word in a
transaction, it has a "narrow" interface where several words are written
to the same spot in MMIO space:
writel(buf[0], io_addr);
writel(buf[1], io_addr);
The same goes for the read side.
== Implementation Summary ==
Given that, introduce two layers of helper functions at first:
* Low-level helpers for reading and writing to data registers directly.
* Wrapper functions for handling mailbox header and data sections.
Using them, implement send_data_chunk() and fetch_next_offset()
functions. Add an explicit error- and timeout-handling routine,
wait_for_transaction(), to finish up the transfer.
Note: The kernel has support for similar mailboxes. But none of them are
compatible with this one. Trying to share code resulted in a bloated
mess, so this code is standalone.
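For reference, a condensed sketch of one request/response exchange as
implemented in the diff below (error handling elided; mbox_size covers
the two headers plus the chunk bytes, and 'header'/'status' are locals):

	/* Request: mailbox header, command header, then the data chunk */
	write_mbox_header(ss->mmio_base, MBOX_HEADER(mbox_size));
	write_mbox_header(ss->mmio_base, MBOX_CMD_LOAD);
	write_mbox_data(ss->mmio_base, chunk, ss->chunk_size);

	/* Kick the hardware, then poll the status register until READY */
	writel(MASK_MBOX_CTRL_GO, ss->mmio_base + MBOX_CONTROL_OFFSET);
	wait_for_transaction(ss);

	/* Response: header, next offset in the image, then status flags */
	header = read_mbox_header(ss->mmio_base);
	ss->offset = read_mbox_dword(ss->mmio_base);
	status = read_mbox_dword(ss->mmio_base);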
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
---
V2 -> V3:
* Update code to reflect the removal of a global variable (Dave).
V1 -> V2:
* Add lots of code comments and edit the changelog (Dave).
* Encapsulate register read/write operations for processing header and
data sections.
---
arch/x86/kernel/cpu/microcode/intel.c | 150 ++++++++++++++++++++++++--
1 file changed, 144 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 3eb3331c5012..5402bcb96f47 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -22,6 +22,7 @@
#include <linux/mm.h>
#include <linux/delay.h>
#include <linux/io.h>
+#include <linux/pci_ids.h>
#include <asm/cpu_device_id.h>
#include <asm/processor.h>
@@ -42,8 +43,30 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
#define MBOX_CONTROL_OFFSET 0x0
#define MBOX_STATUS_OFFSET 0x4
+#define MBOX_WRDATA_OFFSET 0x8
+#define MBOX_RDDATA_OFFSET 0xc
#define MASK_MBOX_CTRL_ABORT BIT(0)
+#define MASK_MBOX_CTRL_GO BIT(31)
+
+#define MASK_MBOX_STATUS_ERROR BIT(2)
+#define MASK_MBOX_STATUS_READY BIT(31)
+
+#define MASK_MBOX_RESP_SUCCESS BIT(0)
+#define MASK_MBOX_RESP_PROGRESS BIT(1)
+#define MASK_MBOX_RESP_ERROR BIT(2)
+
+#define MBOX_CMD_LOAD 0x3
+#define MBOX_OBJ_STAGING 0xb
+#define DWORD_ALIGN(size) ((size) / sizeof(u32))
+#define MBOX_HEADER(mbox_size) (PCI_VENDOR_ID_INTEL | \
+ (MBOX_OBJ_STAGING << 16) | \
+ ((u64)DWORD_ALIGN(mbox_size) << 32))
+
+/* The size of each mailbox header */
+#define MBOX_HEADER_SIZE sizeof(u64)
+/* The size of staging hardware response */
+#define MBOX_RESPONSE_SIZE sizeof(u64)
/*
* Each microcode image is divided into chunks, each at most
@@ -57,6 +80,7 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
*/
#define MBOX_XACTION_SIZE PAGE_SIZE
#define MBOX_XACTION_MAX(imgsz) ((imgsz) * 2)
+#define MBOX_XACTION_TIMEOUT_MS (10 * MSEC_PER_SEC)
/* Current microcode patch used in early patching on the APs. */
static struct microcode_intel *ucode_patch_va __read_mostly;
@@ -345,6 +369,49 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
return size ? NULL : patch;
}
+static inline u32 read_mbox_dword(void __iomem *mmio_base)
+{
+ u32 dword = readl(mmio_base + MBOX_RDDATA_OFFSET);
+
+ /* Acknowledge read completion to the staging hardware */
+ writel(0, mmio_base + MBOX_RDDATA_OFFSET);
+ return dword;
+}
+
+static inline void write_mbox_dword(void __iomem *mmio_base, u32 dword)
+{
+ writel(dword, mmio_base + MBOX_WRDATA_OFFSET);
+}
+
+static inline u64 read_mbox_header(void __iomem *mmio_base)
+{
+ u32 high, low;
+
+ low = read_mbox_dword(mmio_base);
+ high = read_mbox_dword(mmio_base);
+
+ return ((u64)high << 32) | low;
+}
+
+static inline void write_mbox_header(void __iomem *mmio_base, u64 value)
+{
+ write_mbox_dword(mmio_base, value);
+ write_mbox_dword(mmio_base, value >> 32);
+}
+
+static void write_mbox_data(void __iomem *mmio_base, u32 *chunk, unsigned int chunk_size)
+{
+ int i;
+
+ /*
+ * The MMIO space is mapped as Uncached (UC). Each write arrives
+ * at the device as an individual transaction in program order.
+	 * The device can then reassemble the sequence accordingly.
+ */
+ for (i = 0; i < DWORD_ALIGN(chunk_size); i++)
+ write_mbox_dword(mmio_base, chunk[i]);
+}
+
/*
* Prepare for a new microcode transfer by resetting hardware and
* configuring microcode image info.
@@ -388,15 +455,71 @@ static bool can_send_next_chunk(struct staging_state *ss)
return true;
}
+/*
+ * Wait for the hardware to complete a transaction.
+ * Return true on success, false on failure.
+ */
+static bool wait_for_transaction(struct staging_state *ss)
+{
+ u32 timeout, status;
+
+ /* Allow time for hardware to complete the operation: */
+ for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++) {
+ msleep(1);
+
+ status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
+ /* Break out early if the hardware is ready: */
+ if (status & MASK_MBOX_STATUS_READY)
+ break;
+ }
+
+ status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
+
+ /* Check for explicit error response */
+ if (status & MASK_MBOX_STATUS_ERROR) {
+ ss->state = UCODE_ERROR;
+ return false;
+ }
+
+ /*
+	 * Hardware has neither responded to the action nor
+	 * signaled any error. Treat this case as a timeout.
+ */
+ if (!(status & MASK_MBOX_STATUS_READY)) {
+ ss->state = UCODE_TIMEOUT;
+ return false;
+ }
+
+ ss->state = UCODE_OK;
+ return true;
+}
+
/*
* Transmit a chunk of the microcode image to the hardware.
* Return true if the chunk is processed successfully.
*/
static bool send_data_chunk(struct staging_state *ss)
{
- pr_debug_once("Staging mailbox loading code needs to be implemented.\n");
- ss->state = UCODE_ERROR;
- return false;
+ u16 mbox_size = MBOX_HEADER_SIZE * 2 + ss->chunk_size;
+ u32 *chunk = ss->ucode_ptr + ss->offset;
+
+ /*
+ * Write 'request' mailbox object in the following order:
+ * - Mailbox header includes total size
+ * - Command header specifies the load operation
+ * - Data section contains a microcode chunk
+ */
+ write_mbox_header(ss->mmio_base, MBOX_HEADER(mbox_size));
+ write_mbox_header(ss->mmio_base, MBOX_CMD_LOAD);
+ write_mbox_data(ss->mmio_base, chunk, ss->chunk_size);
+ ss->bytes_sent += ss->chunk_size;
+
+ /*
+ * Notify the hardware that the mailbox is ready for processing.
+ * The staging hardware will process the request asynchronously.
+ */
+ writel(MASK_MBOX_CTRL_GO, ss->mmio_base + MBOX_CONTROL_OFFSET);
+ return wait_for_transaction(ss);
}
/*
@@ -405,9 +528,24 @@ static bool send_data_chunk(struct staging_state *ss)
*/
static bool fetch_next_offset(struct staging_state *ss)
{
- pr_debug_once("Staging mailbox response handling code needs to be implemented.\n\n");
- ss->state = UCODE_ERROR;
- return false;
+ const u16 mbox_size = MBOX_HEADER_SIZE + MBOX_RESPONSE_SIZE;
+
+ /* All responses begin with the same header value: */
+ WARN_ON_ONCE(read_mbox_header(ss->mmio_base) != MBOX_HEADER(mbox_size));
+
+ /*
+ * The 'response' mailbox contains two dword data:
+ * - First has next offset in microcode image
+ * - Second delivers status flag
+ */
+ ss->offset = read_mbox_dword(ss->mmio_base);
+ if (read_mbox_dword(ss->mmio_base) & MASK_MBOX_RESP_ERROR) {
+ ss->state = UCODE_ERROR;
+ return false;
+ }
+
+ ss->state = UCODE_OK;
+ return true;
}
/*
--
2.48.1
On 8/13/25 10:26, Chang S. Bae wrote:
> Previously, the functions for sending microcode data and retrieving the
> next offset were placeholders, as they required handling the specific
> mailbox format. Implement them as following:
>
> == Mailbox Format ==
>
> The staging mailbox consists of two primary sections: 'header' and
> 'data'. While the microcode must be transferred following this format,
> the actual data transfer mechanism involves reading and writing to
> specific MMIO registers.
>
> == Mailbox Data Registers ==
>
> Unlike conventional interfaces that allocate MMIO space for each data
> chunk, the staging interface features a "narrow" interface, using only
> two dword-sized registers for read and write operations.
>
> For example, if writing 2 dwords of data to a device. Typically, the
> device would expose 2 dwords of "wide" MMIO space. To send the data to
> the device:
>
> writel(buf[0], io_addr + 0);
> writel(buf[1], io_addr + 1);
>
> But, this interface is a bit different. Instead of having a "wide"
> interface where there is separate MMIO space for each word in a
> transaction, it has a "narrow" interface where several words are written
> to the same spot in MMIO space:
>
> writel(buf[0], io_addr);
> writel(buf[1], io_addr);
>
> The same goes for the read side.
>
> == Implementation Summary ==
>
> Given that, introduce two layers of helper functions at first:
>
> * Low-level helpers for reading and writing to data registers directly.
> * Wrapper functions for handling mailbox header and data sections.
>
> Using them, implement send_data_chunk() and fetch_next_offset()
> functions. Add explicit error and timeout handling routine in
> wait_for_transaction(), finishing up the transfer.
>
> Note: The kernel has support for similar mailboxes. But none of them are
> compatible with this one. Trying to share code resulted in a bloated
> mess, so this code is standalone.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Tested-by: Anselm Busse <abusse@amazon.de>
> ---
> V2 -> V3:
> * Update code to reflect the removal of a global variable (Dave).
>
> V1 -> V2:
> * Add lots of code comments and edit the changelog (Dave).
> * Encapsulate register read/write operations for processing header and
> data sections.
> ---
> arch/x86/kernel/cpu/microcode/intel.c | 150 ++++++++++++++++++++++++--
> 1 file changed, 144 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index 3eb3331c5012..5402bcb96f47 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -22,6 +22,7 @@
> #include <linux/mm.h>
> #include <linux/delay.h>
> #include <linux/io.h>
> +#include <linux/pci_ids.h>
>
> #include <asm/cpu_device_id.h>
> #include <asm/processor.h>
> @@ -42,8 +43,30 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
>
> #define MBOX_CONTROL_OFFSET 0x0
> #define MBOX_STATUS_OFFSET 0x4
> +#define MBOX_WRDATA_OFFSET 0x8
> +#define MBOX_RDDATA_OFFSET 0xc
>
> #define MASK_MBOX_CTRL_ABORT BIT(0)
> +#define MASK_MBOX_CTRL_GO BIT(31)
> +
> +#define MASK_MBOX_STATUS_ERROR BIT(2)
> +#define MASK_MBOX_STATUS_READY BIT(31)
> +
> +#define MASK_MBOX_RESP_SUCCESS BIT(0)
> +#define MASK_MBOX_RESP_PROGRESS BIT(1)
> +#define MASK_MBOX_RESP_ERROR BIT(2)
> +
> +#define MBOX_CMD_LOAD 0x3
> +#define MBOX_OBJ_STAGING 0xb
> +#define DWORD_ALIGN(size) ((size) / sizeof(u32))
> +#define MBOX_HEADER(mbox_size) (PCI_VENDOR_ID_INTEL | \
> + (MBOX_OBJ_STAGING << 16) | \
> + ((u64)DWORD_ALIGN(mbox_size) << 32))
> +
> +/* The size of each mailbox header */
> +#define MBOX_HEADER_SIZE sizeof(u64)
> +/* The size of staging hardware response */
> +#define MBOX_RESPONSE_SIZE sizeof(u64)
>
> /*
> * Each microcode image is divided into chunks, each at most
> @@ -57,6 +80,7 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
> */
> #define MBOX_XACTION_SIZE PAGE_SIZE
> #define MBOX_XACTION_MAX(imgsz) ((imgsz) * 2)
> +#define MBOX_XACTION_TIMEOUT_MS (10 * MSEC_PER_SEC)
>
> /* Current microcode patch used in early patching on the APs. */
> static struct microcode_intel *ucode_patch_va __read_mostly;
> @@ -345,6 +369,49 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
> return size ? NULL : patch;
> }
>
> +static inline u32 read_mbox_dword(void __iomem *mmio_base)
> +{
> + u32 dword = readl(mmio_base + MBOX_RDDATA_OFFSET);
> +
> + /* Acknowledge read completion to the staging hardware */
> + writel(0, mmio_base + MBOX_RDDATA_OFFSET);
> + return dword;
> +}
> +
> +static inline void write_mbox_dword(void __iomem *mmio_base, u32 dword)
> +{
> + writel(dword, mmio_base + MBOX_WRDATA_OFFSET);
> +}
> +
> +static inline u64 read_mbox_header(void __iomem *mmio_base)
> +{
> + u32 high, low;
> +
> + low = read_mbox_dword(mmio_base);
> + high = read_mbox_dword(mmio_base);
> +
> + return ((u64)high << 32) | low;
> +}
> +
> +static inline void write_mbox_header(void __iomem *mmio_base, u64 value)
> +{
> + write_mbox_dword(mmio_base, value);
> + write_mbox_dword(mmio_base, value >> 32);
> +}
> +
> +static void write_mbox_data(void __iomem *mmio_base, u32 *chunk, unsigned int chunk_size)
> +{
> + int i;
> +
> + /*
> + * The MMIO space is mapped as Uncached (UC). Each write arrives
> + * at the device as an individual transaction in program order.
> + * The device can then resemble the sequence accordingly.
> + */
> + for (i = 0; i < DWORD_ALIGN(chunk_size); i++)
I don't really like the DWORD_ALIGN(). To me, this is much more clear:
for (i = 0; i < chunk_size / sizeof(u32); i++)
> + write_mbox_dword(mmio_base, chunk[i]);
> +}
I'd _maybe_ make it "chunk_bytes" instead of "chunk_size", but that's
just a naming nit.
> /*
> * Prepare for a new microcode transfer by resetting hardware and
> * configuring microcode image info.
> @@ -388,15 +455,71 @@ static bool can_send_next_chunk(struct staging_state *ss)
> return true;
> }
>
> +/*
> + * Wait for the hardware to complete a transaction.
> + * Return true on success, false on failure.
> + */
> +static bool wait_for_transaction(struct staging_state *ss)
> +{
> + u32 timeout, status;
> +
> + /* Allow time for hardware to complete the operation: */
> + for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++) {
> + msleep(1);
> +
> + status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
> + /* Break out early if the hardware is ready: */
> + if (status & MASK_MBOX_STATUS_READY)
> + break;
> + }
> +
> + status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
Why is another read needed here? It had to go through the loop above at
_least_ once, right?
> /*
> * Transmit a chunk of the microcode image to the hardware.
> * Return true if the chunk is processed successfully.
> */
> static bool send_data_chunk(struct staging_state *ss)
> {
> - pr_debug_once("Staging mailbox loading code needs to be implemented.\n");
> - ss->state = UCODE_ERROR;
> - return false;
> + u16 mbox_size = MBOX_HEADER_SIZE * 2 + ss->chunk_size;
The "* 2" needs to be explained.
> + u32 *chunk = ss->ucode_ptr + ss->offset;
I think calling this "src_chunk" would make a lot of sense.
> + /*
> + * Write 'request' mailbox object in the following order:
> + * - Mailbox header includes total size
> + * - Command header specifies the load operation
> + * - Data section contains a microcode chunk
> + */
> + write_mbox_header(ss->mmio_base, MBOX_HEADER(mbox_size));
> + write_mbox_header(ss->mmio_base, MBOX_CMD_LOAD);
> + write_mbox_data(ss->mmio_base, chunk, ss->chunk_size);
> + ss->bytes_sent += ss->chunk_size;
> +
> + /*
> + * Notify the hardware that the mailbox is ready for processing.
> + * The staging hardware will process the request asynchronously.
> + */
> + writel(MASK_MBOX_CTRL_GO, ss->mmio_base + MBOX_CONTROL_OFFSET);
> + return wait_for_transaction(ss);
> }
Why is it important that "The staging hardware will process the request
asynchronously."? We *could* go off and do other things, but we don't.
We sit here and poll. So what does it matter?
>
> /*
> @@ -405,9 +528,24 @@ static bool send_data_chunk(struct staging_state *ss)
> */
> static bool fetch_next_offset(struct staging_state *ss)
> {
> - pr_debug_once("Staging mailbox response handling code needs to be implemented.\n\n");
> - ss->state = UCODE_ERROR;
> - return false;
> + const u16 mbox_size = MBOX_HEADER_SIZE + MBOX_RESPONSE_SIZE;
> +
> + /* All responses begin with the same header value: */
> + WARN_ON_ONCE(read_mbox_header(ss->mmio_base) != MBOX_HEADER(mbox_size));
Again, this seems like a pr_warn() and then 'return false', not a "panic
the system" kind of error.
We should not be panic'ing the system in code that's purely an
optimization at this point.
It's also, IMNHO, bad form to do important actions inside of a WARN_ON()
condition. I'd much rather it be:
/*
* All responses start with the same mailbox header. Consume
* it and ensure that it has the expected contents.
*/
static inline int consume_mbox_header(struct staging_state *ss)
{
	u64 header = read_mbox_header(ss->mmio_base);
	u64 expected_header = MBOX_HEADER(mbox_size);

	if (header != expected_header) {
		pr_warn_once(...);
		return -EINVAL;
	}
	return 0;
}
The other way to do it is to do the three reads next to each other and
make it obvious that there are 3 dwords coming out of the device:
u32 header = read_mbox_dword(ss->mmio_base);
u32 offset = read_mbox_dword(ss->mmio_base);
u32 err_word = read_mbox_dword(ss->mmio_base);
... then go do all the checking after that.
Previously, the functions for sending microcode data and retrieving the
next offset were placeholders, as they required handling the specific
mailbox format. Implement them as follows:
== Mailbox Format ==
The staging mailbox consists of two primary sections: 'header' and
'data'. While the microcode must be transferred following this format,
the actual data transfer mechanism involves reading and writing to
specific MMIO registers.
== Mailbox Data Registers ==
Unlike conventional interfaces that allocate MMIO space for each data
chunk, the staging interface features a "narrow" interface, using only
two dword-sized registers for read and write operations.
For example, consider writing 2 dwords of data to a device. Typically, the
device would expose 2 dwords of "wide" MMIO space. To send the data to
the device:
writel(buf[0], io_addr + 0);
writel(buf[1], io_addr + 1);
But, this interface is a bit different. Instead of having a "wide"
interface where there is separate MMIO space for each word in a
transaction, it has a "narrow" interface where several words are written
to the same spot in MMIO space:
writel(buf[0], io_addr);
writel(buf[1], io_addr);
The same goes for the read side.
== Implementation Summary ==
Given that, introduce two layers of helper functions at first:
* Low-level helpers for reading and writing to data registers directly.
* Wrapper functions for handling mailbox header and data sections.
Using them, implement send_data_chunk() and fetch_next_offset()
functions. Add an explicit error- and timeout-handling routine,
wait_for_transaction(), to finish up the transfer.
Both hardware error states and implicit errors -- invalid header or
offset -- result in UCODE_ERROR. Emit a clear message for the latter.
Note: The kernel has support for similar mailboxes. But none of them are
compatible with this one. Trying to share code resulted in a bloated
mess, so this code is standalone.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
---
V4 -> V4a: Addressed Dave's feedback
* fetch_next_offset():
- Make dword reads explicit
- Consolidate offset validation -- adding another user for the
end-offset checker
- Convert WARN_* with pr_err_once()
* Simplify transaction waiting logic a bit
---
arch/x86/kernel/cpu/microcode/intel.c | 186 +++++++++++++++++++++++++-
1 file changed, 179 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index a1b13202330d..f6b365eba6a2 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -22,6 +22,7 @@
#include <linux/mm.h>
#include <linux/delay.h>
#include <linux/io.h>
+#include <linux/pci_ids.h>
#include <asm/cpu_device_id.h>
#include <asm/processor.h>
@@ -42,8 +43,31 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
#define MBOX_CONTROL_OFFSET 0x0
#define MBOX_STATUS_OFFSET 0x4
+#define MBOX_WRDATA_OFFSET 0x8
+#define MBOX_RDDATA_OFFSET 0xc
#define MASK_MBOX_CTRL_ABORT BIT(0)
+#define MASK_MBOX_CTRL_GO BIT(31)
+
+#define MASK_MBOX_STATUS_ERROR BIT(2)
+#define MASK_MBOX_STATUS_READY BIT(31)
+
+#define MASK_MBOX_RESP_SUCCESS BIT(0)
+#define MASK_MBOX_RESP_PROGRESS BIT(1)
+#define MASK_MBOX_RESP_ERROR BIT(2)
+
+#define MBOX_CMD_LOAD 0x3
+#define MBOX_OBJ_STAGING 0xb
+#define MBOX_HEADER(size) ((PCI_VENDOR_ID_INTEL) | \
+ (MBOX_OBJ_STAGING << 16) | \
+ ((u64)((size) / sizeof(u32)) << 32))
+
+/* The size of each mailbox header */
+#define MBOX_HEADER_SIZE sizeof(u64)
+/* The size of staging hardware response */
+#define MBOX_RESPONSE_SIZE sizeof(u64)
+
+#define MBOX_XACTION_TIMEOUT_MS (10 * MSEC_PER_SEC)
/* Current microcode patch used in early patching on the APs. */
static struct microcode_intel *ucode_patch_va __read_mostly;
@@ -330,6 +354,49 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
return size ? NULL : patch;
}
+static inline u32 read_mbox_dword(void __iomem *mmio_base)
+{
+ u32 dword = readl(mmio_base + MBOX_RDDATA_OFFSET);
+
+ /* Acknowledge read completion to the staging hardware */
+ writel(0, mmio_base + MBOX_RDDATA_OFFSET);
+ return dword;
+}
+
+static inline void write_mbox_dword(void __iomem *mmio_base, u32 dword)
+{
+ writel(dword, mmio_base + MBOX_WRDATA_OFFSET);
+}
+
+static inline u64 read_mbox_header(void __iomem *mmio_base)
+{
+ u32 high, low;
+
+ low = read_mbox_dword(mmio_base);
+ high = read_mbox_dword(mmio_base);
+
+ return ((u64)high << 32) | low;
+}
+
+static inline void write_mbox_header(void __iomem *mmio_base, u64 value)
+{
+ write_mbox_dword(mmio_base, value);
+ write_mbox_dword(mmio_base, value >> 32);
+}
+
+static void write_mbox_data(void __iomem *mmio_base, u32 *chunk, unsigned int chunk_bytes)
+{
+ int i;
+
+ /*
+ * The MMIO space is mapped as Uncached (UC). Each write arrives
+ * at the device as an individual transaction in program order.
+	 * The device can then reassemble the sequence accordingly.
+ */
+ for (i = 0; i < chunk_bytes / sizeof(u32); i++)
+ write_mbox_dword(mmio_base, chunk[i]);
+}
+
/*
* Prepare for a new microcode transfer: reset hardware and record the
* image size.
@@ -385,6 +452,14 @@ static bool can_send_next_chunk(struct staging_state *ss)
return true;
}
+/*
+ * The hardware indicates completion by returning a sentinel end offset
+ */
+static inline bool is_end_offset(u32 offset)
+{
+ return offset == UINT_MAX;
+}
+
/*
* Determine whether staging is complete: either the hardware signaled
* the end offset, or no more transactions are permitted (retry limit
@@ -392,18 +467,73 @@ static bool can_send_next_chunk(struct staging_state *ss)
*/
static inline bool staging_is_complete(struct staging_state *ss)
{
- return (ss->offset == UINT_MAX) || !can_send_next_chunk(ss);
+ return is_end_offset(ss->offset) || !can_send_next_chunk(ss);
+}
+
+/*
+ * Wait for the hardware to complete a transaction.
+ * Return 0 on success, or an error code on failure.
+ */
+static int wait_for_transaction(struct staging_state *ss)
+{
+ u32 timeout, status;
+
+ /* Allow time for hardware to complete the operation: */
+ for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++) {
+ msleep(1);
+
+ status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
+ /* Break out early if the hardware is ready: */
+ if (status & MASK_MBOX_STATUS_READY)
+ break;
+ }
+
+ /* Check for explicit error response */
+ if (status & MASK_MBOX_STATUS_ERROR) {
+ ss->state = UCODE_ERROR;
+ return -EPROTO;
+ }
+
+ /*
+	 * Hardware has neither responded to the action nor signaled any
+	 * error. Treat this as a timeout.
+ */
+ if (!(status & MASK_MBOX_STATUS_READY)) {
+ ss->state = UCODE_TIMEOUT;
+ return -ETIMEDOUT;
+ }
+
+ ss->state = UCODE_OK;
+ return 0;
}
/*
* Transmit a chunk of the microcode image to the hardware.
* Return 0 on success, or an error code on failure.
*/
-static int send_data_chunk(struct staging_state *ss, void *ucode_ptr __maybe_unused)
+static int send_data_chunk(struct staging_state *ss, void *ucode_ptr)
{
- pr_debug_once("Staging mailbox loading code needs to be implemented.\n");
- ss->state = UCODE_ERROR;
- return -EPROTONOSUPPORT;
+ u32 *src_chunk = ucode_ptr + ss->offset;
+ u16 mbox_size;
+
+ /*
+ * Write a 'request' mailbox object in this order:
+ * 1. Mailbox header includes total size
+ * 2. Command header specifies the load operation
+ * 3. Data section contains a microcode chunk
+ *
+ * Thus, the mailbox size is two headers plus the chunk size.
+ */
+ mbox_size = MBOX_HEADER_SIZE * 2 + ss->chunk_size;
+ write_mbox_header(ss->mmio_base, MBOX_HEADER(mbox_size));
+ write_mbox_header(ss->mmio_base, MBOX_CMD_LOAD);
+ write_mbox_data(ss->mmio_base, src_chunk, ss->chunk_size);
+ ss->bytes_sent += ss->chunk_size;
+
+ /* Notify the hardware that the mailbox is ready for processing. */
+ writel(MASK_MBOX_CTRL_GO, ss->mmio_base + MBOX_CONTROL_OFFSET);
+
+ return wait_for_transaction(ss);
}
/*
@@ -412,9 +542,51 @@ static int send_data_chunk(struct staging_state *ss, void *ucode_ptr __maybe_unu
*/
static int fetch_next_offset(struct staging_state *ss)
{
- pr_debug_once("Staging mailbox response handling code needs to be implemented.\n\n");
+ const u64 expected_header = MBOX_HEADER(MBOX_HEADER_SIZE + MBOX_RESPONSE_SIZE);
+ u32 offset, status;
+ u64 header;
+ int err;
+
+ /*
+ * The 'response' mailbox returns three fields, in order:
+ * 1. Header
+ * 2. Next offset in the microcode image
+ * 3. Status flags
+ */
+ header = read_mbox_header(ss->mmio_base);
+ offset = read_mbox_dword(ss->mmio_base);
+ status = read_mbox_dword(ss->mmio_base);
+
+ /* All valid responses must start with the expected header. */
+ if (header != expected_header) {
+ pr_err_once("staging: invalid response header\n");
+ err = -EINVAL;
+ goto err_out;
+ }
+
+ /*
+ * Verify the offset: If not at the end marker, it must not
+ * exceed the microcode image length
+ */
+ if (!is_end_offset(offset) && offset > ss->ucode_len) {
+ pr_err_once("staging: invalid response offset\n");
+ err = -EINVAL;
+ goto err_out;
+ }
+
+ /* Hardware may report errors explicitly in the status field */
+ if (status & MASK_MBOX_RESP_ERROR) {
+ err = -EPROTO;
+ goto err_out;
+ }
+
+ ss->offset = offset;
+ ss->state = UCODE_OK;
+ return 0;
+
+err_out:
ss->state = UCODE_ERROR;
- return -EPROTONOSUPPORT;
+ return err;
}
/*
--
2.48.1
With the staging code in place, check for staging availability by
reading the following MSRs:
* IA32_ARCH_CAPABILITIES (bit 16) for the presence of
IA32_MCU_ENUMERATION
* IA32_MCU_ENUMERATION (bit 4) for the staging feature availability.
When available, enable it by setting the feature bit.
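For reference, a condensed view of the check added in the diff below
(rdmsrl() as posted; see the follow-up review about the rdmsrq rename):

	u64 val = x86_read_arch_cap_msr();

	/* IA32_MCU_ENUMERATION is only architectural if bit 16 is set */
	if (!(val & ARCH_CAP_MCU_ENUM))
		return false;

	rdmsrl(MSR_IA32_MCU_ENUMERATION, val);
	return !!(val & MCU_STAGING);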
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
---
V1 -> V2: Fold MSR definitions (Boris).
---
arch/x86/include/asm/msr-index.h | 7 +++++++
arch/x86/kernel/cpu/microcode/intel.c | 17 +++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 0356155f9264..2b7b598c414f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -166,6 +166,10 @@
* Processor MMIO stale data
* vulnerabilities.
*/
+#define ARCH_CAP_MCU_ENUM BIT(16) /*
+ * Indicates the presence of microcode update
+ * feature enumeration and status information
+ */
#define ARCH_CAP_FB_CLEAR BIT(17) /*
* VERW clears CPU fill buffer
* even on MDS_NO CPUs.
@@ -913,6 +917,9 @@
#define MSR_IA32_UCODE_WRITE 0x00000079
#define MSR_IA32_UCODE_REV 0x0000008b
+#define MSR_IA32_MCU_ENUMERATION 0x0000007b
+#define MCU_STAGING BIT(4)
+
#define MSR_IA32_MCU_STAGING_MBOX_ADDR 0x000007a5
/* Intel SGX Launch Enclave Public Key Hash MSRs */
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 5402bcb96f47..4da5f3581a94 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -959,6 +959,18 @@ static __init void calc_llc_size_per_core(struct cpuinfo_x86 *c)
llc_size_per_core = (unsigned int)llc_size;
}
+static __init bool staging_available(void)
+{
+ u64 val;
+
+ val = x86_read_arch_cap_msr();
+ if (!(val & ARCH_CAP_MCU_ENUM))
+ return false;
+
+ rdmsrl(MSR_IA32_MCU_ENUMERATION, val);
+ return !!(val & MCU_STAGING);
+}
+
struct microcode_ops * __init init_intel_microcode(void)
{
struct cpuinfo_x86 *c = &boot_cpu_data;
@@ -969,6 +981,11 @@ struct microcode_ops * __init init_intel_microcode(void)
return NULL;
}
+ if (staging_available()) {
+ microcode_intel_ops.use_staging = true;
+ pr_info("Enabled staging feature.\n");
+ }
+
calc_llc_size_per_core(c);
return µcode_intel_ops;
--
2.48.1
On Wed, Aug 13, 2025 at 10:26:49AM -0700, Chang S. Bae wrote:
>With the staging code being ready, check for staging availability by
>reading these following MSRs:
>
> * IA32_ARCH_CAPABILITIES (bit 16) for the presence of
> IA32_MCU_ENUMERATION
>
> * IA32_MCU_ENUMERATION (bit 4) for the staging feature availability.
>
>When available, enable it by setting the feature bit.
>
>Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
>Tested-by: Anselm Busse <abusse@amazon.de>
Reviewed-by: Chao Gao <chao.gao@intel.com>
<snip>
>+static __init bool staging_available(void)
>+{
>+ u64 val;
>+
>+ val = x86_read_arch_cap_msr();
>+ if (!(val & ARCH_CAP_MCU_ENUM))
>+ return false;
>+
>+ rdmsrl(MSR_IA32_MCU_ENUMERATION, val);
nit: s/rdmsrl/rdmsrq
rdmsrl has been renamed to rdmsrq.
>+ return !!(val & MCU_STAGING);
>+}
>+
On 8/18/2025 1:35 AM, Chao Gao wrote:
>
> Reviewed-by: Chao Gao <chao.gao@intel.com>

Thanks.

>> +	rdmsrl(MSR_IA32_MCU_ENUMERATION, val);
>
> nit: s/rdmsrl/rdmsrq
>
> rdmsrl has been renamed to rdmsrq.

Yeah, right. Thanks for the catch!
As microcode patch sizes continue to grow, late-loading latency spikes
can lead to timeouts and disruptions in running workloads. This trend of
increasing patch sizes is expected to continue, so a foundational
solution is needed to address the issue.
To mitigate the problem, a new staging feature is introduced. This option
processes most of the microcode update (excluding activation) on a
non-critical path, allowing CPUs to remain operational during the
majority of the update. By offloading work from the critical path,
staging can significantly reduce latency spikes.
Integrate staging as a preparatory step in late-loading. Introduce a new
callback for staging, which is invoked at the beginning of
load_late_stop_cpus(), before CPUs enter the rendezvous phase.
Staging follows an opportunistic model:
* If successful, it reduces CPU rendezvous time
* Even if it fails, the process falls back to the legacy path to
  finish the loading process, though with potentially higher latency.
Extend struct microcode_ops to incorporate staging properties, which will
be implemented in the vendor code separately.
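For reference, a condensed sketch of how the hook slots into
load_late_stop_cpus() in the diff below:

	/* Stage the image opportunistically before stopping the CPUs */
	if (microcode_ops->use_staging)
		microcode_ops->stage_microcode();

	/* ...then continue with the regular rendezvous and activation */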
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V2 -> V3: No change
Note:
There were discussions about whether staging success should be enforced
by a configurable option. That topic is identified as follow-up work,
separate from this series.
https://lore.kernel.org/lkml/54308373-7867-4b76-be34-63730953f83c@intel.com/
V1 -> V2:
* Move invocation inside of load_late_stop_cpus() (Boris)
* Add more note about staging (Dave)
---
arch/x86/kernel/cpu/microcode/core.c | 11 +++++++++++
arch/x86/kernel/cpu/microcode/internal.h | 4 +++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index b3658d11e7b6..c4aff44a7ffc 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -541,6 +541,17 @@ static int load_late_stop_cpus(bool is_safe)
pr_err("You should switch to early loading, if possible.\n");
}
+ /*
+ * Pre-load the microcode image into a staging device. This
+ * process is preemptible and does not require stopping CPUs.
+ * Successful staging simplifies the subsequent late-loading
+ * process, reducing rendezvous time.
+ *
+ * Even if the transfer fails, the update will proceed as usual.
+ */
+ if (microcode_ops->use_staging)
+ microcode_ops->stage_microcode();
+
atomic_set(&late_cpus_in, num_online_cpus());
atomic_set(&offline_in_nmi, 0);
loops_per_usec = loops_per_jiffy / (TICK_NSEC / 1000);
diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
index 5df621752fef..4b983b4cddbd 100644
--- a/arch/x86/kernel/cpu/microcode/internal.h
+++ b/arch/x86/kernel/cpu/microcode/internal.h
@@ -31,10 +31,12 @@ struct microcode_ops {
* See also the "Synchronization" section in microcode_core.c.
*/
enum ucode_state (*apply_microcode)(int cpu);
+ void (*stage_microcode)(void);
int (*collect_cpu_info)(int cpu, struct cpu_signature *csig);
void (*finalize_late_load)(int result);
unsigned int nmi_safe : 1,
- use_nmi : 1;
+ use_nmi : 1,
+ use_staging : 1;
};
struct early_load_data {
--
2.45.2
When microcode staging is initiated, operations are carried out through
an MMIO interface. Each package has a unique interface specified by the
IA32_MCU_STAGING_MBOX_ADDR MSR, which maps to a set of 32-bit registers.
Prepare staging with the following steps:
1. Ensure the microcode image is 32-bit aligned to match the MMIO
register size.
2. Identify each MMIO interface based on its per-package scope.
3. Invoke the staging function for each identified interface, which
will be implemented separately.
Also, define cpu_primary_thread_mask for the CONFIG_SMP=n case, allowing
consistent use when narrowing down primary threads to locate the
per-package interface.
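For reference, a condensed sketch of the per-package iteration in the
diff below (setup checks and error handling elided):

	unsigned int pkg_id = UINT_MAX;
	u64 mmio_pa;
	int cpu;

	for_each_cpu(cpu, cpu_primary_thread_mask) {
		/* Stage once per package, not once per primary thread */
		if (topology_logical_package_id(cpu) == pkg_id)
			continue;
		pkg_id = topology_logical_package_id(cpu);

		rdmsrl_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &mmio_pa);
		do_stage(mmio_pa);
	}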
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Link: https://lore.kernel.org/all/871pznq229.ffs@tglx
---
V2 -> V3:
* Remove a global variable and adjust stage_microcode() (Dave).
* Simplify for_each_cpu() loop control code
* Handle rdmsrl_on_cpu() return code explicitly (Chao)
V1 -> V2:
* Adjust to reference the staging_state struct.
* Add lockdep_assert_cpus_held() (Boris)
---
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/include/asm/topology.h | 1 +
arch/x86/kernel/cpu/microcode/intel.c | 50 +++++++++++++++++++++++++++
3 files changed, 53 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 53da787b9326..f5b05e4f1e27 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -897,6 +897,8 @@
#define MSR_IA32_UCODE_WRITE 0x00000079
#define MSR_IA32_UCODE_REV 0x0000008b
+#define MSR_IA32_MCU_STAGING_MBOX_ADDR 0x000007a5
+
/* Intel SGX Launch Enclave Public Key Hash MSRs */
#define MSR_IA32_SGXLEPUBKEYHASH0 0x0000008C
#define MSR_IA32_SGXLEPUBKEYHASH1 0x0000008D
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 6c79ee7c0957..91b5fc44ca62 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -235,6 +235,7 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
static inline int topology_max_smt_threads(void) { return 1; }
static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
+#define cpu_primary_thread_mask cpu_none_mask
#endif /* !CONFIG_SMP */
static inline void arch_fix_phys_package_id(int num, u32 slot)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 819199bc0119..a791d5dc2cef 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -299,6 +299,55 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
return size ? NULL : patch;
}
+/*
+ * Handle the staging process using the mailbox MMIO interface.
+ * Return the result state.
+ */
+static enum ucode_state do_stage(u64 mmio_pa)
+{
+ pr_debug_once("Staging implementation is pending.\n");
+ return UCODE_ERROR;
+}
+
+static void stage_microcode(void)
+{
+ unsigned int pkg_id = UINT_MAX;
+ enum ucode_state ret;
+ int cpu, err;
+ u64 mmio_pa;
+
+ if (!IS_ALIGNED(get_totalsize(&ucode_patch_late->hdr), sizeof(u32)))
+ return;
+
+ lockdep_assert_cpus_held();
+
+ /*
+ * The MMIO address is unique per package, and all the SMT
+ * primary threads are online here. Find each MMIO space by
+ * their package ids to avoid duplicate staging.
+ */
+ for_each_cpu(cpu, cpu_primary_thread_mask) {
+ if (topology_logical_package_id(cpu) == pkg_id)
+ continue;
+ pkg_id = topology_logical_package_id(cpu);
+
+ err = rdmsrl_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &mmio_pa);
+ if (WARN_ON_ONCE(err))
+ return;
+
+ ret = do_stage(mmio_pa);
+ if (ret != UCODE_OK) {
+ pr_err("Error: staging failed with %s for CPU%d at package %u.\n",
+ ret == UCODE_TIMEOUT ? "timeout" : "error state",
+ cpu, pkg_id);
+ return;
+ }
+ }
+
+ pr_info("Staging of patch revision 0x%x succeeded.\n",
+ ((struct microcode_header_intel *)ucode_patch_late)->rev);
+}
+
static enum ucode_state __apply_microcode(struct ucode_cpu_info *uci,
struct microcode_intel *mc,
u32 *cur_rev)
@@ -627,6 +676,7 @@ static struct microcode_ops microcode_intel_ops = {
.collect_cpu_info = collect_cpu_info,
.apply_microcode = apply_microcode_late,
.finalize_late_load = finalize_late_load,
+ .stage_microcode = stage_microcode,
.use_nmi = IS_ENABLED(CONFIG_X86_64),
};
--
2.45.2
To facilitate a structured staging handler, a set of functions will be
introduced. Define staging_state struct to simplify function prototypes
by consolidating relevant data, instead of passing multiple local
variables.
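For illustration, this keeps the helper prototypes introduced later in
the series down to a single argument, e.g.:

	static bool send_data_chunk(struct staging_state *ss);
	static bool fetch_next_offset(struct staging_state *ss);

rather than passing the MMIO base, offsets, and counters individually.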
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V2 -> V3: No change
V1 -> V2: New patch
Prior to V2, local variables were used to track state values, with the
intention of improving readability by explicitly passing them between
functions. However, given the feedback, introducing a dedicated data
structure provides a benefit by simplifying the main loop.
---
arch/x86/kernel/cpu/microcode/intel.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index a791d5dc2cef..d8ea172d90e2 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -54,6 +54,27 @@ struct extended_sigtable {
struct extended_signature sigs[];
};
+/**
+ * struct staging_state - Tracks the current staging process state
+ *
+ * @mmio_base: MMIO base address for staging
+ * @ucode_ptr: Pointer to the microcode image
+ * @ucode_len: Total size of the microcode image
+ * @chunk_size: Size of each data piece
+ * @bytes_sent: Total bytes transmitted so far
+ * @offset: Current offset in the microcode image
+ * @state: Current state of the staging process
+ */
+struct staging_state {
+ void __iomem *mmio_base;
+ void *ucode_ptr;
+ unsigned int ucode_len;
+ unsigned int chunk_size;
+ unsigned int bytes_sent;
+ unsigned int offset;
+ enum ucode_state state;
+};
+
#define DEFAULT_UCODE_TOTALSIZE (DEFAULT_UCODE_DATASIZE + MC_HEADER_SIZE)
#define EXT_HEADER_SIZE (sizeof(struct extended_sigtable))
#define EXT_SIGNATURE_SIZE (sizeof(struct extended_signature))
--
2.45.2
Previously, per-package staging invocations and their associated state
data were established. The next step is to implement the actual staging
handler according to the specified protocol. Below are key aspects to
note:
(a) Each staging process must begin by resetting the staging hardware.
(b) The staging hardware processes up to a page-sized chunk of the
microcode image per iteration, requiring software to submit data
incrementally.
(c) Once a data chunk is processed, the hardware responds with an
offset in the image for the next chunk.
(d) The offset may indicate completion or request retransmission of an
already transferred chunk. As long as the total transferred data
remains within the predefined limit (twice the image size),
retransmissions should be acceptable.
With that, incorporate these code sequences into the staging handler:
1. Initialization: Map the MMIO space via ioremap(). Reset the staging
hardware and initialize software state, ensuring a fresh staging
process aligned with (a).
2. Processing Loop: Introduce a loop iterating over data chunks,
following (b), with proper termination conditions established from
(d) -- stop staging when the hardware signals completion, or if the
total transmitted data exceeds the predefined limit.
3. Loop Body: Finally, compose the loop body with two steps --
transmitting a data chunk and retrieving the next offset from the
hardware response, aligning with (b) and (c).
Since data transmission and mailbox format handling require additional
details, they are implemented separately in the next changes.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V2 -> V3:
* Rework code to eliminate global variables (Dave)
* Remove redundant variable resets (Chao)
V1 -> V2:
* Re-write the changelog for clarity (Dave).
* Move staging handling code into intel.c (Boris).
* Add extensive comments to clarify staging logic and hardware
interactions, along with function renaming (Dave).
---
arch/x86/kernel/cpu/microcode/intel.c | 120 +++++++++++++++++++++++++-
1 file changed, 116 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index d8ea172d90e2..12910b8f9f8a 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -20,6 +20,8 @@
#include <linux/cpu.h>
#include <linux/uio.h>
#include <linux/mm.h>
+#include <linux/delay.h>
+#include <linux/io.h>
#include <asm/cpu_device_id.h>
#include <asm/processor.h>
@@ -33,6 +35,29 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
#define UCODE_BSP_LOADED ((struct microcode_intel *)0x1UL)
+/* Defines for the microcode staging mailbox interface */
+
+#define MBOX_REG_NUM 4
+#define MBOX_REG_SIZE sizeof(u32)
+
+#define MBOX_CONTROL_OFFSET 0x0
+#define MBOX_STATUS_OFFSET 0x4
+
+#define MASK_MBOX_CTRL_ABORT BIT(0)
+
+/*
+ * Each microcode image is divided into chunks, each at most
+ * MBOX_XACTION_SIZE in size. A 10-chunk image would typically require
+ * 10 transactions. However, the hardware managing the mailbox has
+ * limited resources and may not cache the entire image, potentially
+ * requesting the same chunk multiple times.
+ *
+ * To accommodate this behavior, allow up to twice the expected number of
+ * transactions (i.e., a 10-chunk image can take up to 20 attempts).
+ */
+#define MBOX_XACTION_SIZE PAGE_SIZE
+#define MBOX_XACTION_MAX(imgsz) ((imgsz) * 2)
+
/* Current microcode patch used in early patching on the APs. */
static struct microcode_intel *ucode_patch_va __read_mostly;
static struct microcode_intel *ucode_patch_late __read_mostly;
@@ -321,13 +346,100 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
}
/*
- * Handle the staging process using the mailbox MMIO interface.
- * Return the result state.
+ * Prepare for a new microcode transfer by resetting hardware and
+ * configuring microcode image info.
+ */
+static void init_stage(struct staging_state *ss)
+{
+ ss->ucode_ptr = ucode_patch_late;
+ ss->ucode_len = get_totalsize(&ucode_patch_late->hdr);
+
+ /*
+ * Abort any ongoing process, effectively resetting the device.
+ * Unlike regular mailbox data processing requests, this
+ * operation does not require a status check.
+ */
+ writel(MASK_MBOX_CTRL_ABORT, ss->mmio_base + MBOX_CONTROL_OFFSET);
+}
+
+/*
+ * Check if the staging process has completed. The hardware signals
+ * completion by setting a unique end offset.
+ */
+static inline bool is_stage_complete(unsigned int offset)
+{
+ return offset == UINT_MAX;
+}
+
+/*
+ * Determine if the next data chunk can be sent. Each chunk is typically
+ * one page unless the remaining data is smaller. If the total
+ * transmitted data exceeds the defined limit, a timeout occurs.
+ */
+static bool can_send_next_chunk(struct staging_state *ss)
+{
+ WARN_ON_ONCE(ss->ucode_len < ss->offset);
+ ss->chunk_size = min(MBOX_XACTION_SIZE, ss->ucode_len - ss->offset);
+
+ if (ss->bytes_sent + ss->chunk_size > MBOX_XACTION_MAX(ss->ucode_len)) {
+ ss->state = UCODE_TIMEOUT;
+ return false;
+ }
+ return true;
+}
+
+/*
+ * Transmit a chunk of the microcode image to the hardware.
+ * Return true if the chunk is processed successfully.
+ */
+static bool send_data_chunk(struct staging_state *ss)
+{
+ pr_debug_once("Staging mailbox loading code needs to be implemented.\n");
+ ss->state = UCODE_ERROR;
+ return false;
+}
+
+/*
+ * Retrieve the next offset from the hardware response.
+ * Return true if the response is valid, false otherwise.
+ */
+static bool fetch_next_offset(struct staging_state *ss)
+{
+ pr_debug_once("Staging mailbox response handling code needs to be implemented.\n\n");
+ ss->state = UCODE_ERROR;
+ return false;
+}
+
+/*
+ * Handle the staging process using the mailbox MMIO interface. The
+ * microcode image is transferred in chunks until completion. Return the
+ * result state.
*/
static enum ucode_state do_stage(u64 mmio_pa)
{
- pr_debug_once("Staging implementation is pending.\n");
- return UCODE_ERROR;
+ struct staging_state ss = {};
+
+ ss.mmio_base = ioremap(mmio_pa, MBOX_REG_NUM * MBOX_REG_SIZE);
+ if (WARN_ON_ONCE(!ss.mmio_base))
+ return UCODE_ERROR;
+
+ init_stage(&ss);
+
+ /* Perform the staging process while within the retry limit */
+ while (!is_stage_complete(ss.offset) && can_send_next_chunk(&ss)) {
+ /* Send a chunk of microcode each time: */
+ if (!send_data_chunk(&ss))
+ break;
+ /*
+ * Then, ask the hardware which piece of the image it
+ * needs next. The same piece may be sent more than once.
+ */
+ if (!fetch_next_offset(&ss))
+ break;
+ }
+
+ iounmap(ss.mmio_base);
+ return ss.state;
}
static void stage_microcode(void)
--
2.45.2
Previously, the functions for sending microcode data and retrieving the
next offset were placeholders, as they required handling the specific
mailbox format. Implement them as follows:
== Mailbox Format ==
The staging mailbox consists of two primary sections: 'header' and
'data'. While the microcode must be transferred following this format,
the actual data transfer mechanism involves reading and writing to
specific MMIO registers.
== Mailbox Data Registers ==
Unlike conventional interfaces that allocate MMIO space for each data
chunk, the staging interface features a "narrow" interface, using only
two dword-sized registers for read and write operations.
For example, consider writing 2 dwords of data to a device. Typically, the
device would expose 2 dwords of "wide" MMIO space. To send the data to
the device:
writel(buf[0], io_addr + 0);
writel(buf[1], io_addr + 1);
But, this interface is a bit different. Instead of having a "wide"
interface where there is separate MMIO space for each word in a
transaction, it has a "narrow" interface where several words are written
to the same spot in MMIO space:
writel(buf[0], io_addr);
writel(buf[1], io_addr);
The same goes for the read side.
== Implementation Summary ==
Given that, introduce two layers of helper functions at first:
* Low-level helpers for reading and writing to data registers directly.
* Wrapper functions for handling mailbox header and data sections.
Using them, implement send_data_chunk() and fetch_next_offset()
functions. Add an explicit error- and timeout-handling routine,
wait_for_transaction(), to finish up the transfer.
Note: The kernel has support for similar mailboxes. But none of them are
compatible with this one. Trying to share code resulted in a bloated
mess, so this code is standalone.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V2 -> V3:
* Update code to reflect the removal of a global variable (Dave).
V1 -> V2:
* Add lots of code comments and edit the changelog (Dave).
* Encapsulate register read/write operations for processing header and
data sections.
---
arch/x86/kernel/cpu/microcode/intel.c | 150 ++++++++++++++++++++++++--
1 file changed, 144 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 12910b8f9f8a..542b3abad8e3 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -22,6 +22,7 @@
#include <linux/mm.h>
#include <linux/delay.h>
#include <linux/io.h>
+#include <linux/pci_ids.h>
#include <asm/cpu_device_id.h>
#include <asm/processor.h>
@@ -42,8 +43,30 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
#define MBOX_CONTROL_OFFSET 0x0
#define MBOX_STATUS_OFFSET 0x4
+#define MBOX_WRDATA_OFFSET 0x8
+#define MBOX_RDDATA_OFFSET 0xc
#define MASK_MBOX_CTRL_ABORT BIT(0)
+#define MASK_MBOX_CTRL_GO BIT(31)
+
+#define MASK_MBOX_STATUS_ERROR BIT(2)
+#define MASK_MBOX_STATUS_READY BIT(31)
+
+#define MASK_MBOX_RESP_SUCCESS BIT(0)
+#define MASK_MBOX_RESP_PROGRESS BIT(1)
+#define MASK_MBOX_RESP_ERROR BIT(2)
+
+#define MBOX_CMD_LOAD 0x3
+#define MBOX_OBJ_STAGING 0xb
+#define DWORD_ALIGN(size) ((size) / sizeof(u32))
+#define MBOX_HEADER(mbox_size) (PCI_VENDOR_ID_INTEL | \
+ (MBOX_OBJ_STAGING << 16) | \
+ ((u64)DWORD_ALIGN(mbox_size) << 32))
+
+/* The size of each mailbox header */
+#define MBOX_HEADER_SIZE sizeof(u64)
+/* The size of staging hardware response */
+#define MBOX_RESPONSE_SIZE sizeof(u64)
/*
* Each microcode image is divided into chunks, each at most
@@ -57,6 +80,7 @@ static const char ucode_path[] = "kernel/x86/microcode/GenuineIntel.bin";
*/
#define MBOX_XACTION_SIZE PAGE_SIZE
#define MBOX_XACTION_MAX(imgsz) ((imgsz) * 2)
+#define MBOX_XACTION_TIMEOUT_MS (10 * MSEC_PER_SEC)
/* Current microcode patch used in early patching on the APs. */
static struct microcode_intel *ucode_patch_va __read_mostly;
@@ -345,6 +369,49 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
return size ? NULL : patch;
}
+static inline u32 read_mbox_dword(void __iomem *mmio_base)
+{
+ u32 dword = readl(mmio_base + MBOX_RDDATA_OFFSET);
+
+ /* Acknowledge read completion to the staging hardware */
+ writel(0, mmio_base + MBOX_RDDATA_OFFSET);
+ return dword;
+}
+
+static inline void write_mbox_dword(void __iomem *mmio_base, u32 dword)
+{
+ writel(dword, mmio_base + MBOX_WRDATA_OFFSET);
+}
+
+static inline u64 read_mbox_header(void __iomem *mmio_base)
+{
+ u32 high, low;
+
+ low = read_mbox_dword(mmio_base);
+ high = read_mbox_dword(mmio_base);
+
+ return ((u64)high << 32) | low;
+}
+
+static inline void write_mbox_header(void __iomem *mmio_base, u64 value)
+{
+ write_mbox_dword(mmio_base, value);
+ write_mbox_dword(mmio_base, value >> 32);
+}
+
+static void write_mbox_data(void __iomem *mmio_base, u32 *chunk, unsigned int chunk_size)
+{
+ int i;
+
+ /*
+ * The MMIO space is mapped as Uncached (UC). Each write arrives
+ * at the device as an individual transaction in program order.
+ * The device can then reassemble the sequence accordingly.
+ */
+ for (i = 0; i < DWORD_ALIGN(chunk_size); i++)
+ write_mbox_dword(mmio_base, chunk[i]);
+}
+
/*
* Prepare for a new microcode transfer by resetting hardware and
* configuring microcode image info.
@@ -388,15 +455,71 @@ static bool can_send_next_chunk(struct staging_state *ss)
return true;
}
+/*
+ * Wait for the hardware to complete a transaction.
+ * Return true on success, false on failure.
+ */
+static bool wait_for_transaction(struct staging_state *ss)
+{
+ u32 timeout, status;
+
+ /* Allow time for hardware to complete the operation: */
+ for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++) {
+ msleep(1);
+
+ status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
+ /* Break out early if the hardware is ready: */
+ if (status & MASK_MBOX_STATUS_READY)
+ break;
+ }
+
+ status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
+
+ /* Check for explicit error response */
+ if (status & MASK_MBOX_STATUS_ERROR) {
+ ss->state = UCODE_ERROR;
+ return false;
+ }
+
+ /*
+ * The hardware has neither responded to the action nor
+ * signaled any error. Treat this case as a timeout.
+ */
+ if (!(status & MASK_MBOX_STATUS_READY)) {
+ ss->state = UCODE_TIMEOUT;
+ return false;
+ }
+
+ ss->state = UCODE_OK;
+ return true;
+}
+
/*
* Transmit a chunk of the microcode image to the hardware.
* Return true if the chunk is processed successfully.
*/
static bool send_data_chunk(struct staging_state *ss)
{
- pr_debug_once("Staging mailbox loading code needs to be implemented.\n");
- ss->state = UCODE_ERROR;
- return false;
+ u16 mbox_size = MBOX_HEADER_SIZE * 2 + ss->chunk_size;
+ u32 *chunk = ss->ucode_ptr + ss->offset;
+
+ /*
+ * Write 'request' mailbox object in the following order:
+ * - Mailbox header includes total size
+ * - Command header specifies the load operation
+ * - Data section contains a microcode chunk
+ */
+ write_mbox_header(ss->mmio_base, MBOX_HEADER(mbox_size));
+ write_mbox_header(ss->mmio_base, MBOX_CMD_LOAD);
+ write_mbox_data(ss->mmio_base, chunk, ss->chunk_size);
+ ss->bytes_sent += ss->chunk_size;
+
+ /*
+ * Notify the hardware that the mailbox is ready for processing.
+ * The staging hardware will process the request asynchronously.
+ */
+ writel(MASK_MBOX_CTRL_GO, ss->mmio_base + MBOX_CONTROL_OFFSET);
+ return wait_for_transaction(ss);
}
/*
@@ -405,9 +528,24 @@ static bool send_data_chunk(struct staging_state *ss)
*/
static bool fetch_next_offset(struct staging_state *ss)
{
- pr_debug_once("Staging mailbox response handling code needs to be implemented.\n\n");
- ss->state = UCODE_ERROR;
- return false;
+ const u16 mbox_size = MBOX_HEADER_SIZE + MBOX_RESPONSE_SIZE;
+
+ /* All responses begin with the same header value: */
+ WARN_ON_ONCE(read_mbox_header(ss->mmio_base) != MBOX_HEADER(mbox_size));
+
+ /*
+ * The 'response' mailbox contains two dwords of data:
+ * - The first holds the next offset in the microcode image
+ * - The second delivers status flags
+ */
+ ss->offset = read_mbox_dword(ss->mmio_base);
+ if (read_mbox_dword(ss->mmio_base) & MASK_MBOX_RESP_ERROR) {
+ ss->state = UCODE_ERROR;
+ return false;
+ }
+
+ ss->state = UCODE_OK;
+ return true;
}
/*
--
2.45.2
>+/*
>+ * Wait for the hardware to complete a transaction.
>+ * Return true on success, false on failure.
>+ */
>+static bool wait_for_transaction(struct staging_state *ss)
>+{
>+ u32 timeout, status;
>+
>+ /* Allow time for hardware to complete the operation: */
>+ for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++) {
>+ msleep(1);
>+
>+ status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
>+ /* Break out early if the hardware is ready: */
>+ if (status & MASK_MBOX_STATUS_READY)
>+ break;
Shouldn't we exit the loop if the MASK_MBOX_STATUS_ERROR is set, or is the
ERROR bit always set in conjunction with the READY bit?
>+ }
>+
>+ status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
I still think this read is not needed.
>+
>+ /* Check for explicit error response */
>+ if (status & MASK_MBOX_STATUS_ERROR) {
>+ ss->state = UCODE_ERROR;
>+ return false;
>+ }
>+
>+ /*
>+ * Hardware is neither responded to the action nor
>+ * signaled any error. Treat the case as timeout.
>+ */
>+ if (!(status & MASK_MBOX_STATUS_READY)) {
>+ ss->state = UCODE_TIMEOUT;
>+ return false;
>+ }
>+
>+ ss->state = UCODE_OK;
>+ return true;
>+}
On 4/16/2025 7:14 AM, Chao Gao wrote:
>> +/*
>> + * Wait for the hardware to complete a transaction.
>> + * Return true on success, false on failure.
>> + */
>> +static bool wait_for_transaction(struct staging_state *ss)
>> +{
>> + u32 timeout, status;
>> +
>> + /* Allow time for hardware to complete the operation: */
>> + for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++) {
>> + msleep(1);
>> +
>> + status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
>> + /* Break out early if the hardware is ready: */
>> + if (status & MASK_MBOX_STATUS_READY)
>> + break;
>
> Shouldn't we exit the loop if the MASK_MBOX_STATUS_ERROR is set, or is the
> ERROR bit always set in conjunction with the READY bit?
>
>> + }
>> +
>> + status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
>
> I still think this read is not needed.
No, if it times out, it could exit without having read the status.
But this is a slow path — instead of trying to save a few cycles, I
wanted to present the logic more clearly:
* Give the hardware enough time, but not forever.
* Oh, we don't need to wait anymore if it's done
* After waiting, check the status and handle it properly.
Isn’t it clearer to read the status after the wait, right before the
error handling?
Also, reading the status inside the loop is just a way to determine
whether to break — not to interpret or act on it. That should come after
the wait completes.
On 4/16/25 10:22, Chang S. Bae wrote:
> On 4/16/2025 7:14 AM, Chao Gao wrote:
>>> +/*
>>> + * Wait for the hardware to complete a transaction.
>>> + * Return true on success, false on failure.
>>> + */
>>> +static bool wait_for_transaction(struct staging_state *ss)
>>> +{
>>> + u32 timeout, status;
>>> +
>>> + /* Allow time for hardware to complete the operation: */
>>> + for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++) {
>>> + msleep(1);
>>> +
>>> + status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
>>> + /* Break out early if the hardware is ready: */
>>> + if (status & MASK_MBOX_STATUS_READY)
>>> + break;
>>
>> Shouldn't we exit the loop if the MASK_MBOX_STATUS_ERROR is set, or is
>> the
>> ERROR bit always set in conjunction with the READY bit?
>>
>>> + }
>>> +
>>> + status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
>>
>> I still think this read is not needed.
>
> No, if it times out, it could exit without having read the status.
Can it? There are only two ways out of the loop: the explicit break and
the implicit break from the 'timeout' check. Both happen at the "end" of
the loop, unless the 'timeout <' check failed up front:
	for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++) {
		msleep(1);

		status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
		if (status & MASK_MBOX_STATUS_READY)
			break;
	}
> But this is a slow path — instead of trying to save a few cycles, I
> wanted to present the logic more clearly:
>
> * Give the hardware enough time, but not forever.
> * Oh, we don't need to wait anymore if it's done
>
> * After waiting, check the status and handle it properly.
>
> Isn’t it clearer to read the status after the wait, right before the
> error handling?
It's simpler to read the code the way you have it. You don't have to
consider if 'status' could be uninitialized. It makes the "timeout loop"
*COMPLETELY* independent from the rest of the function. It could be:
	foo()
	{
		spin_until_ready();

		status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
		...
	}
for example.
It's less likely to break, too. Let's say someone added a new break:
	for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++) {
+		if (some_other_condition())
+			break;
		msleep(1);

		status = readl(ss->mmio_base + MBOX_STATUS_OFFSET);
		if (status & MASK_MBOX_STATUS_READY)
			break;
	}
you'd have a bug on your hands.
I'm fine with the way you have it. I probably would have written it that
way too.
With the staging code in place, check for staging availability by
reading the following MSRs:
* IA32_ARCH_CAPABILITIES (bit 16) for the presence of
IA32_MCU_ENUMERATION
* IA32_MCU_ENUMERATION (bit 4) for the staging feature availability.
When available, enable it by setting the feature bit.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V2 -> V3: No change
V1 -> V2: Fold MSR definitions (Boris).
---
arch/x86/include/asm/msr-index.h | 7 +++++++
arch/x86/kernel/cpu/microcode/intel.c | 17 +++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index f5b05e4f1e27..4eb1001fc5ac 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -166,6 +166,10 @@
* Processor MMIO stale data
* vulnerabilities.
*/
+#define ARCH_CAP_MCU_ENUM BIT(16) /*
+ * Indicates the presence of microcode update
+ * feature enumeration and status information
+ */
#define ARCH_CAP_FB_CLEAR BIT(17) /*
* VERW clears CPU fill buffer
* even on MDS_NO CPUs.
@@ -897,6 +901,9 @@
#define MSR_IA32_UCODE_WRITE 0x00000079
#define MSR_IA32_UCODE_REV 0x0000008b
+#define MSR_IA32_MCU_ENUMERATION 0x0000007b
+#define MCU_STAGING BIT(4)
+
#define MSR_IA32_MCU_STAGING_MBOX_ADDR 0x000007a5
/* Intel SGX Launch Enclave Public Key Hash MSRs */
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 542b3abad8e3..89c17cdd4336 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -959,6 +959,18 @@ static __init void calc_llc_size_per_core(struct cpuinfo_x86 *c)
llc_size_per_core = (unsigned int)llc_size;
}
+static __init bool staging_available(void)
+{
+ u64 val;
+
+ val = x86_read_arch_cap_msr();
+ if (!(val & ARCH_CAP_MCU_ENUM))
+ return false;
+
+ rdmsrl(MSR_IA32_MCU_ENUMERATION, val);
+ return !!(val & MCU_STAGING);
+}
+
struct microcode_ops * __init init_intel_microcode(void)
{
struct cpuinfo_x86 *c = &boot_cpu_data;
@@ -969,6 +981,11 @@ struct microcode_ops * __init init_intel_microcode(void)
return NULL;
}
+ if (staging_available()) {
+ microcode_intel_ops.use_staging = true;
+ pr_info("Enabled staging feature.\n");
+ }
+
calc_llc_size_per_core(c);
return &microcode_intel_ops;
--
2.45.2