Users need to set "cpufreq=amd-cppc" in xen cmdline to enable
amd-cppc driver, which selects ACPI Collaborative Performance
and Power Control (CPPC) on supported AMD hardware to provide a
finer grained frequency control mechanism.
`verbose` option can also be included to support verbose print.
When users setting "cpufreq=amd-cppc", a new amd-cppc driver
shall be registered and used. All hooks for amd-cppc driver are transiently
missing and will be implemented in the ongoing commits.
New xen-pm internal flag XEN_PROCESSOR_PM_CPPC is introduced, to be
differentiated with legacy XEN_PROCESSOR_PM_PX. We define
XEN_PROCESSOR_PM_CPPC 0x100, as it is the next value to use after 8-bits wide
public xen-pm options. All xen-pm flag checking involving XEN_PROCESSOR_PM_PX
shall also be updated to consider XEN_PROCESSOR_PM_CPPC now.
Xen is not expected to support both or mixed mode (CPPC & legacy PSS, _PCT,
_PPC) operations, so only one cpufreq driver gets registerd, either amd-cppc
or legacy P-states driver, which is reflected and asserted by the incompatible
flags XEN_PROCESSOR_PM_PX and XEN_PROCESSOR_PM_CPPC.
Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
---
v1 -> v2:
- Obey to alphabetic sorting and also strict it with CONFIG_AMD
- Remove unnecessary empty comment line
- Use __initconst_cf_clobber for pre-filled structure cpufreq_driver
- Make new switch-case code apply to Hygon CPUs too
- Change ENOSYS with EOPNOTSUPP
- Blanks around binary operator
- Change all amd_/-pstate defined values to amd_/-cppc
---
v2 -> v3
- refactor too long lines
- Make sure XEN_PROCESSOR_PM_PX and XEN_PROCESSOR_PM_CPPC incompatible flags
after cpufreq register registrantion
---
v3 -> v4:
- introduce XEN_PROCESSOR_PM_CPPC in xen internal header
- complement "Hygon" in log message
- remove unnecessary if()
- grow cpufreq_xen_opts[] array
---
v4 -> v5:
- remove XEN_PROCESSOR_PM_xxx flag sanitization from individual driver
- prefer ! over "== 0" in purely boolean contexts
- Blank line between non-fall-through case blocks
- add build-time checking between internal and public XEN_PROCESSOR_PM_*
values
- define XEN_PROCESSOR_PM_CPPC with 0x100, as it is the next value to use
after public interface, and public mask SIF_PM_MASK is 8 bits wide.
- as Dom0 will send the CPPC/Px data whenever it could, the return value shall
be 0 instead of -ENOSYS/EOPNOTSUP when platform doesn't require these data.
---
 docs/misc/xen-command-line.pandoc         |  7 ++-
 xen/arch/x86/acpi/cpufreq/Makefile        |  1 +
 xen/arch/x86/acpi/cpufreq/amd-cppc.c      | 68 +++++++++++++++++++++++
 xen/arch/x86/acpi/cpufreq/cpufreq.c       | 63 ++++++++++++++++++++-
 xen/arch/x86/platform_hypercall.c         | 13 ++++-
 xen/drivers/acpi/pmstat.c                 |  3 +-
 xen/drivers/cpufreq/cpufreq.c             | 18 +++++-
 xen/include/acpi/cpufreq/cpufreq.h        |  6 +-
 xen/include/acpi/cpufreq/processor_perf.h |  3 +
 xen/include/public/sysctl.h               |  1 +
 10 files changed, 175 insertions(+), 8 deletions(-)
 create mode 100644 xen/arch/x86/acpi/cpufreq/amd-cppc.c
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 89db6e83be..9ef847a336 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -515,7 +515,7 @@ If set, force use of the performance counters for oprofile, rather than detectin
 available support.
 
 ### cpufreq
-> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]]`
+> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]] | amd-cppc[:[verbose]]`
 
 > Default: `xen`
 
@@ -526,7 +526,7 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels.
 * `<maxfreq>` and `<minfreq>` are integers which represent max and min processor frequencies
   respectively.
 * `verbose` option can be included as a string or also as `verbose=<integer>`
-  for `xen`.  It is a boolean for `hwp`.
+  for `xen`.  It is a boolean for `hwp` and `amd-cppc`.
 * `hwp` selects Hardware-Controlled Performance States (HWP) on supported Intel
   hardware.  HWP is a Skylake+ feature which provides better CPU power
   management.  The default is disabled.  If `hwp` is selected, but hardware
@@ -534,6 +534,9 @@ choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels.
 * `<hdc>` is a boolean to enable Hardware Duty Cycling (HDC).  HDC enables the
   processor to autonomously force physical package components into idle state.
   The default is enabled, but the option only applies when `hwp` is enabled.
+* `amd-cppc` selects ACPI Collaborative Performance and Power Control (CPPC)
+  on supported AMD hardware to provide finer grained frequency control
+  mechanism. The default is disabled.
 
 There is also support for `;`-separated fallback options:
 `cpufreq=hwp;xen,verbose`.  This first tries `hwp` and falls back to `xen` if
diff --git a/xen/arch/x86/acpi/cpufreq/Makefile b/xen/arch/x86/acpi/cpufreq/Makefile
index e7dbe434a8..a2ba34bda0 100644
--- a/xen/arch/x86/acpi/cpufreq/Makefile
+++ b/xen/arch/x86/acpi/cpufreq/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_INTEL) += acpi.o
+obj-$(CONFIG_AMD) += amd-cppc.o
 obj-y += cpufreq.o
 obj-$(CONFIG_INTEL) += hwp.o
 obj-$(CONFIG_AMD) += powernow.o
diff --git a/xen/arch/x86/acpi/cpufreq/amd-cppc.c b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
new file mode 100644
index 0000000000..9e64bf957a
--- /dev/null
+++ b/xen/arch/x86/acpi/cpufreq/amd-cppc.c
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * amd-cppc.c - AMD Processor CPPC Frequency Driver
+ *
+ * Copyright (C) 2025 Advanced Micro Devices, Inc. All Rights Reserved.
+ *
+ * Author: Penny Zheng <penny.zheng@amd.com>
+ *
+ * AMD CPPC cpufreq driver introduces a new CPU performance scaling design
+ * for AMD processors using the ACPI Collaborative Performance and Power
+ * Control (CPPC) feature which provides finer grained frequency control range.
+ */
+
+#include <xen/domain.h>
+#include <xen/init.h>
+#include <xen/param.h>
+#include <acpi/cpufreq/cpufreq.h>
+
+static bool __init amd_cppc_handle_option(const char *s, const char *end)
+{
+    int ret;
+
+    ret = parse_boolean("verbose", s, end);
+    if ( ret >= 0 )
+    {
+        cpufreq_verbose = ret;
+        return true;
+    }
+
+    return false;
+}
+
+int __init amd_cppc_cmdline_parse(const char *s, const char *e)
+{
+    do {
+        const char *end = strpbrk(s, ",;");
+
+        if ( !amd_cppc_handle_option(s, end) )
+        {
+            printk(XENLOG_WARNING
+                   "cpufreq/amd-cppc: option '%.*s' not recognized\n",
+                   (int)((end ?: e) - s), s);
+
+            return -EINVAL;
+        }
+
+        s = end ? end + 1 : NULL;
+    } while ( s && s < e );
+
+    return 0;
+}
+
+static const struct cpufreq_driver __initconst_cf_clobber
+amd_cppc_cpufreq_driver =
+{
+    .name   = XEN_AMD_CPPC_DRIVER_NAME,
+};
+
+int __init amd_cppc_register_driver(void)
+{
+    if ( !cpu_has_cppc )
+    {
+        xen_processor_pmbits &= ~XEN_PROCESSOR_PM_CPPC;
+        return -ENODEV;
+    }
+
+    return cpufreq_register_driver(&amd_cppc_cpufreq_driver);
+}
diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index 61e98b67bd..c40b610c86 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -148,6 +148,11 @@ static int __init cf_check cpufreq_driver_init(void)
                 case CPUFREQ_none:
                     ret = 0;
                     break;
+
+                default:
+                    printk(XENLOG_WARNING
+                           "Unsupported cpufreq driver for vendor Intel\n");
+                    break;
                 }
 
                 if ( ret != -ENODEV )
@@ -157,7 +162,63 @@ static int __init cf_check cpufreq_driver_init(void)
 
         case X86_VENDOR_AMD:
         case X86_VENDOR_HYGON:
-            ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -ENODEV;
+            unsigned int i;
+
+            if ( !IS_ENABLED(CONFIG_AMD) )
+            {
+                ret = -ENODEV;
+                break;
+            }
+            ret = -ENOENT;
+
+            for ( i = 0; i < cpufreq_xen_cnt; i++ )
+            {
+                switch ( cpufreq_xen_opts[i] )
+                {
+                case CPUFREQ_xen:
+                    ret = powernow_register_driver();
+                    break;
+
+                case CPUFREQ_amd_cppc:
+                    ret = amd_cppc_register_driver();
+                    break;
+
+                case CPUFREQ_none:
+                    ret = 0;
+                    break;
+
+                default:
+                    printk(XENLOG_WARNING
+                           "Unsupported cpufreq driver for vendor AMD or Hygon\n");
+                    break;
+                }
+
+                if ( ret != -ENODEV )
+                    break;
+            }
+
+            /*
+             * After cpufreq driver registeration, XEN_PROCESSOR_PM_CPPC
+             * and XEN_PROCESSOR_PM_PX shall become exclusive flags.
+             */
+            if ( !ret && i < cpufreq_xen_cnt )
+            {
+                switch ( cpufreq_xen_opts[i] )
+                {
+                case CPUFREQ_amd_cppc:
+                    xen_processor_pmbits &= ~XEN_PROCESSOR_PM_PX;
+                    break;
+
+                case CPUFREQ_xen:
+                    xen_processor_pmbits &= ~XEN_PROCESSOR_PM_CPPC;
+                    break;
+
+                case CPUFREQ_none:
+                default:
+                    break;
+                }
+            }
+
             break;
         }
     }
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 49717e9ca9..231912ed27 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -539,9 +539,12 @@ ret_t do_platform_op(
         case XEN_PM_PX:
             if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_PX) )
             {
-                ret = -ENOSYS;
+                ret = 0;
                 break;
             }
+            /* Xen doesn't support mixed mode */
+            ASSERT(!(xen_processor_pmbits & XEN_PROCESSOR_PM_CPPC));
+
             ret = set_px_pminfo(op->u.set_pminfo.id, &op->u.set_pminfo.u.perf);
             break;
  
@@ -573,6 +576,14 @@ ret_t do_platform_op(
         }
 
         case XEN_PM_CPPC:
+            if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_CPPC) )
+            {
+                ret = 0;
+                break;
+            }
+            /* Xen doesn't support mixed mode */
+            ASSERT(!(xen_processor_pmbits & XEN_PROCESSOR_PM_PX));
+
             ret = set_cppc_pminfo(op->u.set_pminfo.id,
                                   &op->u.set_pminfo.u.cppc_data);
             break;
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index f7351f219d..330bc3a1c2 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -464,7 +464,8 @@ int do_pm_op(struct xen_sysctl_pm_op *op)
     switch ( op->cmd & PM_PARA_CATEGORY_MASK )
     {
     case CPUFREQ_PARA:
-        if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_PX) )
+        if ( !(xen_processor_pmbits & (XEN_PROCESSOR_PM_PX |
+                                       XEN_PROCESSOR_PM_CPPC)) )
             return -ENODEV;
         if ( !pmpt || !(pmpt->init & (XEN_PX_INIT | XEN_CPPC_INIT)) )
             return -EINVAL;
diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index d1b51c8dd0..fe55720afd 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -65,14 +65,15 @@ LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
 /* set xen as default cpufreq */
 enum cpufreq_controller cpufreq_controller = FREQCTL_xen;
 
-enum cpufreq_xen_opt __initdata cpufreq_xen_opts[2] = { CPUFREQ_xen,
+enum cpufreq_xen_opt __initdata cpufreq_xen_opts[3] = { CPUFREQ_xen,
                                                         CPUFREQ_none };
 unsigned int __initdata cpufreq_xen_cnt = 1;
 
-static const char __initconst cpufreq_opts_str[][5] = {
+static const char __initconst cpufreq_opts_str[][9] = {
     [CPUFREQ_none] = "none",
     [CPUFREQ_xen] = "xen",
     [CPUFREQ_hwp] = "hwp",
+    [CPUFREQ_amd_cppc] = "amd-cppc",
 };
 
 static int __init cpufreq_cmdline_parse(const char *s, const char *e);
@@ -94,6 +95,8 @@ static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
 {
     int ret = 0;
 
+    /* Do not occupy bits reserved for public xen-pm */
+    BUILD_BUG_ON(MASK_INSR(XEN_PROCESSOR_PM_CPPC, SIF_PM_MASK));
     if ( cpufreq_opts_contain(option) )
     {
         printk(XENLOG_WARNING "Duplicate cpufreq driver option: %s\n",
@@ -105,6 +108,10 @@ static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
     cpufreq_xen_opts[cpufreq_xen_cnt++] = option;
     switch ( option )
     {
+    case CPUFREQ_amd_cppc:
+        xen_processor_pmbits |= XEN_PROCESSOR_PM_CPPC;
+        break;
+
     case CPUFREQ_hwp:
     case CPUFREQ_xen:
         xen_processor_pmbits |= XEN_PROCESSOR_PM_PX;
@@ -172,6 +179,13 @@ static int __init cf_check setup_cpufreq_option(const char *str)
             if ( arg[0] && arg[1] )
                 ret = hwp_cmdline_parse(arg + 1, end);
         }
+        else if ( IS_ENABLED(CONFIG_AMD) && choice < 0 &&
+                  !cmdline_strcmp(str, "amd-cppc") )
+        {
+            ret = handle_cpufreq_cmdline(CPUFREQ_amd_cppc);
+            if ( arg[0] && arg[1] )
+                ret = amd_cppc_cmdline_parse(arg + 1, end);
+        }
         else
             ret = -EINVAL;
 
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index a3c84143af..83050c58b2 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -28,8 +28,9 @@ enum cpufreq_xen_opt {
     CPUFREQ_none,
     CPUFREQ_xen,
     CPUFREQ_hwp,
+    CPUFREQ_amd_cppc,
 };
-extern enum cpufreq_xen_opt cpufreq_xen_opts[2];
+extern enum cpufreq_xen_opt cpufreq_xen_opts[3];
 extern unsigned int cpufreq_xen_cnt;
 struct cpufreq_governor;
 
@@ -277,4 +278,7 @@ int set_hwp_para(struct cpufreq_policy *policy,
 
 int acpi_cpufreq_register(void);
 
+int amd_cppc_cmdline_parse(const char *s, const char *e);
+int amd_cppc_register_driver(void);
+
 #endif /* __XEN_CPUFREQ_PM_H__ */
diff --git a/xen/include/acpi/cpufreq/processor_perf.h b/xen/include/acpi/cpufreq/processor_perf.h
index f1f4f3138d..9b6466e705 100644
--- a/xen/include/acpi/cpufreq/processor_perf.h
+++ b/xen/include/acpi/cpufreq/processor_perf.h
@@ -5,6 +5,9 @@
 #include <public/sysctl.h>
 #include <xen/acpi.h>
 
+/* Internal ability bits */
+#define XEN_PROCESSOR_PM_CPPC   0x100
+
 #define XEN_CPPC_INIT 0x40000000U
 #define XEN_PX_INIT   0x80000000U
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index b0fec271d3..42997252ef 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -423,6 +423,7 @@ struct xen_set_cppc_para {
     uint32_t activity_window;
 };
 
+#define XEN_AMD_CPPC_DRIVER_NAME "amd-cppc"
 #define XEN_HWP_DRIVER_NAME "hwp"
 
 /*
-- 
2.34.1On 27.05.2025 10:48, Penny Zheng wrote:
> Users need to set "cpufreq=amd-cppc" in xen cmdline to enable
> amd-cppc driver, which selects ACPI Collaborative Performance
> and Power Control (CPPC) on supported AMD hardware to provide a
> finer grained frequency control mechanism.
> `verbose` option can also be included to support verbose print.
> 
> When users setting "cpufreq=amd-cppc", a new amd-cppc driver
> shall be registered and used. All hooks for amd-cppc driver are transiently
> missing and will be implemented in the ongoing commits.
> 
> New xen-pm internal flag XEN_PROCESSOR_PM_CPPC is introduced, to be
> differentiated with legacy XEN_PROCESSOR_PM_PX. We define
> XEN_PROCESSOR_PM_CPPC 0x100, as it is the next value to use after 8-bits wide
> public xen-pm options. All xen-pm flag checking involving XEN_PROCESSOR_PM_PX
> shall also be updated to consider XEN_PROCESSOR_PM_CPPC now.
> 
> Xen is not expected to support both or mixed mode (CPPC & legacy PSS, _PCT,
> _PPC) operations, so only one cpufreq driver gets registerd, either amd-cppc
> or legacy P-states driver, which is reflected and asserted by the incompatible
> flags XEN_PROCESSOR_PM_PX and XEN_PROCESSOR_PM_CPPC.
> 
> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
> ---
> v1 -> v2:
> - Obey to alphabetic sorting and also strict it with CONFIG_AMD
> - Remove unnecessary empty comment line
> - Use __initconst_cf_clobber for pre-filled structure cpufreq_driver
> - Make new switch-case code apply to Hygon CPUs too
> - Change ENOSYS with EOPNOTSUPP
> - Blanks around binary operator
> - Change all amd_/-pstate defined values to amd_/-cppc
> ---
> v2 -> v3
> - refactor too long lines
> - Make sure XEN_PROCESSOR_PM_PX and XEN_PROCESSOR_PM_CPPC incompatible flags
> after cpufreq register registrantion
> ---
> v3 -> v4:
> - introduce XEN_PROCESSOR_PM_CPPC in xen internal header
> - complement "Hygon" in log message
> - remove unnecessary if()
> - grow cpufreq_xen_opts[] array
> ---
> v4 -> v5:
> - remove XEN_PROCESSOR_PM_xxx flag sanitization from individual driver
> - prefer ! over "== 0" in purely boolean contexts
> - Blank line between non-fall-through case blocks
> - add build-time checking between internal and public XEN_PROCESSOR_PM_*
> values
> - define XEN_PROCESSOR_PM_CPPC with 0x100, as it is the next value to use
> after public interface, and public mask SIF_PM_MASK is 8 bits wide.
> - as Dom0 will send the CPPC/Px data whenever it could, the return value shall
> be 0 instead of -ENOSYS/EOPNOTSUP when platform doesn't require these data.
> ---
>  docs/misc/xen-command-line.pandoc         |  7 ++-
>  xen/arch/x86/acpi/cpufreq/Makefile        |  1 +
>  xen/arch/x86/acpi/cpufreq/amd-cppc.c      | 68 +++++++++++++++++++++++
>  xen/arch/x86/acpi/cpufreq/cpufreq.c       | 63 ++++++++++++++++++++-
>  xen/arch/x86/platform_hypercall.c         | 13 ++++-
>  xen/drivers/acpi/pmstat.c                 |  3 +-
>  xen/drivers/cpufreq/cpufreq.c             | 18 +++++-
>  xen/include/acpi/cpufreq/cpufreq.h        |  6 +-
>  xen/include/acpi/cpufreq/processor_perf.h |  3 +
>  xen/include/public/sysctl.h               |  1 +
>  10 files changed, 175 insertions(+), 8 deletions(-)
>  create mode 100644 xen/arch/x86/acpi/cpufreq/amd-cppc.c
This patch does quite a bit more than what the subject suggests it means to
do; please consider adjusting.
> +int __init amd_cppc_register_driver(void)
> +{
> +    if ( !cpu_has_cppc )
> +    {
> +        xen_processor_pmbits &= ~XEN_PROCESSOR_PM_CPPC;
> +        return -ENODEV;
> +    }
> +
> +    return cpufreq_register_driver(&amd_cppc_cpufreq_driver);
Isn't this premature? I fear in particular that some of the hooks, which
are still all NULL, might have a way of getting invoked.
> @@ -157,7 +162,63 @@ static int __init cf_check cpufreq_driver_init(void)
>  
>          case X86_VENDOR_AMD:
>          case X86_VENDOR_HYGON:
> -            ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -ENODEV;
> +            unsigned int i;
> +
> +            if ( !IS_ENABLED(CONFIG_AMD) )
> +            {
> +                ret = -ENODEV;
> +                break;
> +            }
> +            ret = -ENOENT;
> +
> +            for ( i = 0; i < cpufreq_xen_cnt; i++ )
> +            {
> +                switch ( cpufreq_xen_opts[i] )
> +                {
> +                case CPUFREQ_xen:
> +                    ret = powernow_register_driver();
> +                    break;
> +
> +                case CPUFREQ_amd_cppc:
> +                    ret = amd_cppc_register_driver();
> +                    break;
> +
> +                case CPUFREQ_none:
> +                    ret = 0;
> +                    break;
> +
> +                default:
> +                    printk(XENLOG_WARNING
> +                           "Unsupported cpufreq driver for vendor AMD or Hygon\n");
> +                    break;
> +                }
> +
> +                if ( ret != -ENODEV )
> +                    break;
This, I think, needs some commenting. It's not quite clear why we shouldn't
try the next option if registration failed with other than -ENODEV.
> +            }
> +
> +            /*
> +             * After cpufreq driver registeration, XEN_PROCESSOR_PM_CPPC
> +             * and XEN_PROCESSOR_PM_PX shall become exclusive flags.
> +             */
> +            if ( !ret && i < cpufreq_xen_cnt )
> +            {
> +                switch ( cpufreq_xen_opts[i] )
> +                {
> +                case CPUFREQ_amd_cppc:
> +                    xen_processor_pmbits &= ~XEN_PROCESSOR_PM_PX;
> +                    break;
> +
> +                case CPUFREQ_xen:
> +                    xen_processor_pmbits &= ~XEN_PROCESSOR_PM_CPPC;
> +                    break;
> +
> +                case CPUFREQ_none:
> +                default:
What's the point of the separate "case" label here?
> +                    break;
> +                }
> +            }
Why does this pruning of xen_processor_pmbits sit in the AMD-only code path?
Iirc earlier on you had it in the *_register_driver() themselves, and my
request was to put it in a central, generic place. It's central now, but not
generic (and this way it could as well have remained in *_register_driver()).
> --- a/xen/arch/x86/platform_hypercall.c
> +++ b/xen/arch/x86/platform_hypercall.c
> @@ -539,9 +539,12 @@ ret_t do_platform_op(
>          case XEN_PM_PX:
>              if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_PX) )
>              {
> -                ret = -ENOSYS;
> +                ret = 0;
>                  break;
>              }
> +            /* Xen doesn't support mixed mode */
> +            ASSERT(!(xen_processor_pmbits & XEN_PROCESSOR_PM_CPPC));
> +
>              ret = set_px_pminfo(op->u.set_pminfo.id, &op->u.set_pminfo.u.perf);
>              break;
I don't see how this change relates to the purpose of the patch. From the
description (and in the absence of a code comment) it also doesn't become
clear at all why that change of return value would be needed (and would be
correct to do). From comments I gave on earlier versions of this series, I
think I can guess what this is about, but such shouldn't be slipped in
silently.
> @@ -573,6 +576,14 @@ ret_t do_platform_op(
>          }
>  
>          case XEN_PM_CPPC:
> +            if ( !(xen_processor_pmbits & XEN_PROCESSOR_PM_CPPC) )
> +            {
> +                ret = 0;
> +                break;
> +            }
As per above, this yielding success needs justifying.
> @@ -94,6 +95,8 @@ static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
>  {
>      int ret = 0;
>  
> +    /* Do not occupy bits reserved for public xen-pm */
> +    BUILD_BUG_ON(MASK_INSR(XEN_PROCESSOR_PM_CPPC, SIF_PM_MASK));
This looks like an abuse of MASK_INSR(). Why not simply
    BUILD_BUG_ON(XEN_PROCESSOR_PM_CPPC & SIF_PM_MASK);
?
Also please separate this by a blank line from what follows, or perhaps
even better ...
> @@ -105,6 +108,10 @@ static int __init handle_cpufreq_cmdline(enum cpufreq_xen_opt option)
>      cpufreq_xen_opts[cpufreq_xen_cnt++] = option;
>      switch ( option )
>      {
> +    case CPUFREQ_amd_cppc:
> +        xen_processor_pmbits |= XEN_PROCESSOR_PM_CPPC;
> +        break;
... move it here.
> @@ -172,6 +179,13 @@ static int __init cf_check setup_cpufreq_option(const char *str)
>              if ( arg[0] && arg[1] )
>                  ret = hwp_cmdline_parse(arg + 1, end);
>          }
> +        else if ( IS_ENABLED(CONFIG_AMD) && choice < 0 &&
> +                  !cmdline_strcmp(str, "amd-cppc") )
> +        {
> +            ret = handle_cpufreq_cmdline(CPUFREQ_amd_cppc);
> +            if ( arg[0] && arg[1] )
> +                ret = amd_cppc_cmdline_parse(arg + 1, end);
> +        }
See Jason's comment on the earlier patch.
> --- a/xen/include/acpi/cpufreq/processor_perf.h
> +++ b/xen/include/acpi/cpufreq/processor_perf.h
> @@ -5,6 +5,9 @@
>  #include <public/sysctl.h>
>  #include <xen/acpi.h>
>  
> +/* Internal ability bits */
> +#define XEN_PROCESSOR_PM_CPPC   0x100
The comment wants extending, to have a reference to the other XEN_PROCESSOR_PM_*
bits and perhaps also to SIF_PM_MASK. In fact the BUILD_BUG_ON() that I commented
on above could be replaced by
#if XEN_PROCESSOR_PM_CPPC & SIF_PM_MASK
# error "..."
#endif
right here.
Jan
                
            [Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, June 12, 2025 6:42 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Anthony PERARD <anthony.perard@vates.tech>;
> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v5 06/18] xen/x86: introduce "cpufreq=amd-cppc" xen
> cmdline
>
> On 27.05.2025 10:48, Penny Zheng wrote:
> > --- a/xen/arch/x86/platform_hypercall.c
> > +++ b/xen/arch/x86/platform_hypercall.c
> > @@ -94,6 +95,8 @@ static int __init handle_cpufreq_cmdline(enum
> > cpufreq_xen_opt option)  {
> >      int ret = 0;
> >
> > +    /* Do not occupy bits reserved for public xen-pm */
> > +    BUILD_BUG_ON(MASK_INSR(XEN_PROCESSOR_PM_CPPC,
> SIF_PM_MASK));
>
> This looks like an abuse of MASK_INSR(). Why not simply
>
>     BUILD_BUG_ON(XEN_PROCESSOR_PM_CPPC & SIF_PM_MASK);
>
> ?
Because in SIF_PM_MASK, it's bit 8 to 15 reserved for xen-pm options,
See "
#define SIF_PM_MASK       (0xFF<<8) /* reserve 1 byte for xen-pm options */
"
So I'm trying to use MASK_INSR() to do the necessary right shift (other than using 8 directly, in case SIF_PM_MASK changes in the future...)
>
> Jan
                
            On 19.06.2025 10:43, Penny, Zheng wrote:
> [Public]
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, June 12, 2025 6:42 PM
>> To: Penny, Zheng <penny.zheng@amd.com>
>> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
>> <andrew.cooper3@citrix.com>; Anthony PERARD <anthony.perard@vates.tech>;
>> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
>> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
>> devel@lists.xenproject.org
>> Subject: Re: [PATCH v5 06/18] xen/x86: introduce "cpufreq=amd-cppc" xen
>> cmdline
>>
>> On 27.05.2025 10:48, Penny Zheng wrote:
>>> --- a/xen/arch/x86/platform_hypercall.c
>>> +++ b/xen/arch/x86/platform_hypercall.c
>>> @@ -94,6 +95,8 @@ static int __init handle_cpufreq_cmdline(enum
>>> cpufreq_xen_opt option)  {
>>>      int ret = 0;
>>>
>>> +    /* Do not occupy bits reserved for public xen-pm */
>>> +    BUILD_BUG_ON(MASK_INSR(XEN_PROCESSOR_PM_CPPC,
>> SIF_PM_MASK));
>>
>> This looks like an abuse of MASK_INSR(). Why not simply
>>
>>     BUILD_BUG_ON(XEN_PROCESSOR_PM_CPPC & SIF_PM_MASK);
>>
>> ?
> 
> Because in SIF_PM_MASK, it's bit 8 to 15 reserved for xen-pm options,
> See "
> #define SIF_PM_MASK       (0xFF<<8) /* reserve 1 byte for xen-pm options */
> "
> So I'm trying to use MASK_INSR() to do the necessary right shift (other than using 8 directly, in case SIF_PM_MASK changes in the future...)
Oh, right, so my replacement suggestion was wrong. But XEN_PROCESSOR_PM_CPPC
isn't contained within the 8 bits. MASK_INSR() could conceivably have a
(debug) check that the passed value actually fits the mask.
     BUILD_BUG_ON(XEN_PROCESSOR_PM_CPPC & MASK_EXTR(~0, SIF_PM_MASK));
?
Jan
                
            [Public]
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, June 12, 2025 6:42 PM
> To: Penny, Zheng <penny.zheng@amd.com>
> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Anthony PERARD <anthony.perard@vates.tech>;
> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v5 06/18] xen/x86: introduce "cpufreq=amd-cppc" xen
> cmdline
>
> On 27.05.2025 10:48, Penny Zheng wrote:
> > Users need to set "cpufreq=amd-cppc" in xen cmdline to enable amd-cppc
> > driver, which selects ACPI Collaborative Performance and Power Control
> > (CPPC) on supported AMD hardware to provide a finer grained frequency
> > control mechanism.
> > `verbose` option can also be included to support verbose print.
> >
> > When users setting "cpufreq=amd-cppc", a new amd-cppc driver shall be
> > registered and used. All hooks for amd-cppc driver are transiently
> > missing and will be implemented in the ongoing commits.
> >
> > New xen-pm internal flag XEN_PROCESSOR_PM_CPPC is introduced, to be
> > differentiated with legacy XEN_PROCESSOR_PM_PX. We define
> > XEN_PROCESSOR_PM_CPPC 0x100, as it is the next value to use after
> > 8-bits wide public xen-pm options. All xen-pm flag checking involving
> > XEN_PROCESSOR_PM_PX shall also be updated to consider
> XEN_PROCESSOR_PM_CPPC now.
> >
> > Xen is not expected to support both or mixed mode (CPPC & legacy PSS,
> > _PCT,
> > _PPC) operations, so only one cpufreq driver gets registerd, either
> > amd-cppc or legacy P-states driver, which is reflected and asserted by
> > the incompatible flags XEN_PROCESSOR_PM_PX and
> XEN_PROCESSOR_PM_CPPC.
> >
> > Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
> > ---
> > v1 -> v2:
> > - Obey to alphabetic sorting and also strict it with CONFIG_AMD
> > - Remove unnecessary empty comment line
> > - Use __initconst_cf_clobber for pre-filled structure cpufreq_driver
> > - Make new switch-case code apply to Hygon CPUs too
> > - Change ENOSYS with EOPNOTSUPP
> > - Blanks around binary operator
> > - Change all amd_/-pstate defined values to amd_/-cppc
> > ---
> > v2 -> v3
> > - refactor too long lines
> > - Make sure XEN_PROCESSOR_PM_PX and XEN_PROCESSOR_PM_CPPC
> incompatible
> > flags after cpufreq register registrantion
> > ---
> > v3 -> v4:
> > - introduce XEN_PROCESSOR_PM_CPPC in xen internal header
> > - complement "Hygon" in log message
> > - remove unnecessary if()
> > - grow cpufreq_xen_opts[] array
> > ---
> > v4 -> v5:
> > - remove XEN_PROCESSOR_PM_xxx flag sanitization from individual driver
> > - prefer ! over "== 0" in purely boolean contexts
> > - Blank line between non-fall-through case blocks
> > - add build-time checking between internal and public
> > XEN_PROCESSOR_PM_* values
> > - define XEN_PROCESSOR_PM_CPPC with 0x100, as it is the next value to
> > use after public interface, and public mask SIF_PM_MASK is 8 bits wide.
> > - as Dom0 will send the CPPC/Px data whenever it could, the return
> > value shall be 0 instead of -ENOSYS/EOPNOTSUP when platform doesn't require
> these data.
> > ---
> >  docs/misc/xen-command-line.pandoc         |  7 ++-
> >  xen/arch/x86/acpi/cpufreq/Makefile        |  1 +
> >  xen/arch/x86/acpi/cpufreq/amd-cppc.c      | 68 +++++++++++++++++++++++
> >  xen/arch/x86/acpi/cpufreq/cpufreq.c       | 63 ++++++++++++++++++++-
> >  xen/arch/x86/platform_hypercall.c         | 13 ++++-
> >  xen/drivers/acpi/pmstat.c                 |  3 +-
> >  xen/drivers/cpufreq/cpufreq.c             | 18 +++++-
> >  xen/include/acpi/cpufreq/cpufreq.h        |  6 +-
> >  xen/include/acpi/cpufreq/processor_perf.h |  3 +
> >  xen/include/public/sysctl.h               |  1 +
> >  10 files changed, 175 insertions(+), 8 deletions(-)  create mode
> > 100644 xen/arch/x86/acpi/cpufreq/amd-cppc.c
>
> > @@ -157,7 +162,63 @@ static int __init cf_check
> > cpufreq_driver_init(void)
> >
> >          case X86_VENDOR_AMD:
> >          case X86_VENDOR_HYGON:
> > -            ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -
> ENODEV;
> > +            unsigned int i;
> > +
> > +            if ( !IS_ENABLED(CONFIG_AMD) )
> > +            {
> > +                ret = -ENODEV;
> > +                break;
> > +            }
> > +            ret = -ENOENT;
> > +
> > +            for ( i = 0; i < cpufreq_xen_cnt; i++ )
> > +            {
> > +                switch ( cpufreq_xen_opts[i] )
> > +                {
> > +                case CPUFREQ_xen:
> > +                    ret = powernow_register_driver();
> > +                    break;
> > +
> > +                case CPUFREQ_amd_cppc:
> > +                    ret = amd_cppc_register_driver();
> > +                    break;
> > +
> > +                case CPUFREQ_none:
> > +                    ret = 0;
> > +                    break;
> > +
> > +                default:
> > +                    printk(XENLOG_WARNING
> > +                           "Unsupported cpufreq driver for vendor AMD or Hygon\n");
> > +                    break;
> > +                }
> > +
> > +                if ( ret != -ENODEV )
> > +                    break;
>
> This, I think, needs some commenting. It's not quite clear why we shouldn't try the
> next option if registration failed with other than -ENODEV.
>
I followed the original logic.
Now, I'm trying to understand the reason. I read the related code, there are two code path erroring out other than -ENODEV
In cpufreq_register_driver(), either the driver itself is broken, like missing mandatory hooks, etc, yet in which case, IMO we shall try the fallback option,
or repeated registration, TBH, which seems unlikely to me. cpufreq_driver_init() is a presmp call, so repeated registration doesn't come from racing.
Then if we successfully registered a driver, we will immediately exit the loop. How come we will register twice?
Or am I missing something for this error path:
```
        if ( cpufreq_driver.init )
                return -EBUSY;
```
>
> > +                    break;
> > +                }
> > +            }
>
> Why does this pruning of xen_processor_pmbits sit in the AMD-only code path?
> Iirc earlier on you had it in the *_register_driver() themselves, and my request was
> to put it in a central, generic place. It's central now, but not generic (and this way it
> could as well have remained in *_register_driver()).
>
True, before, I was wrongly assumed that XEN_PROCESSOR_PM_CPPC could only be initialized in AMD.
We could also provide "cpufreq=amd-cppc" in INTEL platform, yet under which scenario, we shall set ret with -ENODEV errno (right now is -ENOENT, need to fix) to keep looping and try the next option...
I’ll move it out of the AMD-only patch, and put it after the whole switch-case
>
> Jan
                
            On 17.06.2025 09:15, Penny, Zheng wrote:
> [Public]
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, June 12, 2025 6:42 PM
>> To: Penny, Zheng <penny.zheng@amd.com>
>> Cc: Huang, Ray <Ray.Huang@amd.com>; Andrew Cooper
>> <andrew.cooper3@citrix.com>; Anthony PERARD <anthony.perard@vates.tech>;
>> Orzel, Michal <Michal.Orzel@amd.com>; Julien Grall <julien@xen.org>; Roger Pau
>> Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
>> devel@lists.xenproject.org
>> Subject: Re: [PATCH v5 06/18] xen/x86: introduce "cpufreq=amd-cppc" xen
>> cmdline
>>
>> On 27.05.2025 10:48, Penny Zheng wrote:
>>> Users need to set "cpufreq=amd-cppc" in xen cmdline to enable amd-cppc
>>> driver, which selects ACPI Collaborative Performance and Power Control
>>> (CPPC) on supported AMD hardware to provide a finer grained frequency
>>> control mechanism.
>>> `verbose` option can also be included to support verbose print.
>>>
>>> When users setting "cpufreq=amd-cppc", a new amd-cppc driver shall be
>>> registered and used. All hooks for amd-cppc driver are transiently
>>> missing and will be implemented in the ongoing commits.
>>>
>>> New xen-pm internal flag XEN_PROCESSOR_PM_CPPC is introduced, to be
>>> differentiated with legacy XEN_PROCESSOR_PM_PX. We define
>>> XEN_PROCESSOR_PM_CPPC 0x100, as it is the next value to use after
>>> 8-bits wide public xen-pm options. All xen-pm flag checking involving
>>> XEN_PROCESSOR_PM_PX shall also be updated to consider
>> XEN_PROCESSOR_PM_CPPC now.
>>>
>>> Xen is not expected to support both or mixed mode (CPPC & legacy PSS,
>>> _PCT,
>>> _PPC) operations, so only one cpufreq driver gets registerd, either
>>> amd-cppc or legacy P-states driver, which is reflected and asserted by
>>> the incompatible flags XEN_PROCESSOR_PM_PX and
>> XEN_PROCESSOR_PM_CPPC.
>>>
>>> Signed-off-by: Penny Zheng <Penny.Zheng@amd.com>
>>> ---
>>> v1 -> v2:
>>> - Obey to alphabetic sorting and also strict it with CONFIG_AMD
>>> - Remove unnecessary empty comment line
>>> - Use __initconst_cf_clobber for pre-filled structure cpufreq_driver
>>> - Make new switch-case code apply to Hygon CPUs too
>>> - Change ENOSYS with EOPNOTSUPP
>>> - Blanks around binary operator
>>> - Change all amd_/-pstate defined values to amd_/-cppc
>>> ---
>>> v2 -> v3
>>> - refactor too long lines
>>> - Make sure XEN_PROCESSOR_PM_PX and XEN_PROCESSOR_PM_CPPC
>> incompatible
>>> flags after cpufreq register registrantion
>>> ---
>>> v3 -> v4:
>>> - introduce XEN_PROCESSOR_PM_CPPC in xen internal header
>>> - complement "Hygon" in log message
>>> - remove unnecessary if()
>>> - grow cpufreq_xen_opts[] array
>>> ---
>>> v4 -> v5:
>>> - remove XEN_PROCESSOR_PM_xxx flag sanitization from individual driver
>>> - prefer ! over "== 0" in purely boolean contexts
>>> - Blank line between non-fall-through case blocks
>>> - add build-time checking between internal and public
>>> XEN_PROCESSOR_PM_* values
>>> - define XEN_PROCESSOR_PM_CPPC with 0x100, as it is the next value to
>>> use after public interface, and public mask SIF_PM_MASK is 8 bits wide.
>>> - as Dom0 will send the CPPC/Px data whenever it could, the return
>>> value shall be 0 instead of -ENOSYS/EOPNOTSUP when platform doesn't require
>> these data.
>>> ---
>>>  docs/misc/xen-command-line.pandoc         |  7 ++-
>>>  xen/arch/x86/acpi/cpufreq/Makefile        |  1 +
>>>  xen/arch/x86/acpi/cpufreq/amd-cppc.c      | 68 +++++++++++++++++++++++
>>>  xen/arch/x86/acpi/cpufreq/cpufreq.c       | 63 ++++++++++++++++++++-
>>>  xen/arch/x86/platform_hypercall.c         | 13 ++++-
>>>  xen/drivers/acpi/pmstat.c                 |  3 +-
>>>  xen/drivers/cpufreq/cpufreq.c             | 18 +++++-
>>>  xen/include/acpi/cpufreq/cpufreq.h        |  6 +-
>>>  xen/include/acpi/cpufreq/processor_perf.h |  3 +
>>>  xen/include/public/sysctl.h               |  1 +
>>>  10 files changed, 175 insertions(+), 8 deletions(-)  create mode
>>> 100644 xen/arch/x86/acpi/cpufreq/amd-cppc.c
>>
>>> @@ -157,7 +162,63 @@ static int __init cf_check
>>> cpufreq_driver_init(void)
>>>
>>>          case X86_VENDOR_AMD:
>>>          case X86_VENDOR_HYGON:
>>> -            ret = IS_ENABLED(CONFIG_AMD) ? powernow_register_driver() : -
>> ENODEV;
>>> +            unsigned int i;
>>> +
>>> +            if ( !IS_ENABLED(CONFIG_AMD) )
>>> +            {
>>> +                ret = -ENODEV;
>>> +                break;
>>> +            }
>>> +            ret = -ENOENT;
>>> +
>>> +            for ( i = 0; i < cpufreq_xen_cnt; i++ )
>>> +            {
>>> +                switch ( cpufreq_xen_opts[i] )
>>> +                {
>>> +                case CPUFREQ_xen:
>>> +                    ret = powernow_register_driver();
>>> +                    break;
>>> +
>>> +                case CPUFREQ_amd_cppc:
>>> +                    ret = amd_cppc_register_driver();
>>> +                    break;
>>> +
>>> +                case CPUFREQ_none:
>>> +                    ret = 0;
>>> +                    break;
>>> +
>>> +                default:
>>> +                    printk(XENLOG_WARNING
>>> +                           "Unsupported cpufreq driver for vendor AMD or Hygon\n");
>>> +                    break;
>>> +                }
>>> +
>>> +                if ( ret != -ENODEV )
>>> +                    break;
>>
>> This, I think, needs some commenting. It's not quite clear why we shouldn't try the
>> next option if registration failed with other than -ENODEV.
> 
> I followed the original logic.
Which may easily itself be partly bogus.
> Now, I'm trying to understand the reason. I read the related code, there are two code path erroring out other than -ENODEV
> In cpufreq_register_driver(), either the driver itself is broken, like missing mandatory hooks, etc, yet in which case, IMO we shall try the fallback option,
> or repeated registration, TBH, which seems unlikely to me. cpufreq_driver_init() is a presmp call, so repeated registration doesn't come from racing.
> Then if we successfully registered a driver, we will immediately exit the loop. How come we will register twice?
> Or am I missing something for this error path:
> ```
>         if ( cpufreq_driver.init )
>                 return -EBUSY;
> ```
Imo this error path is there "just in case".
Jan
                
            © 2016 - 2025 Red Hat, Inc.