[PATCH v4 0/7] platform/x86/amd/hsmp: Support Family 1Ah Model 50h-5Fh telemetry

Muralidhara M K posted 7 patches 1 week, 4 days ago
Hi,

This series adds HSMP protocol version 7 support to the AMD HSMP driver
for Family 1Ah Model 50h-5Fh systems, and addresses a small set of
related issues that surfaced while wiring up the new platform.

The bulk of the work is exposing the new firmware-defined metric table
layout to userspace. HSMP_PROTO_VER7 firmware delivers a per-IOD /
per-CCD blob (struct hsmp_metric_table_zen6, ~13 KB) that fits neither
the existing per-socket struct hsmp_metric_table nor the existing sysfs
metrics_bin transport, since binary sysfs attributes are bounded by
PAGE_SIZE. metrics_bin also has a long-standing tearing problem:
userspace can read it in multiple chunks and observe an inconsistent
snapshot if SMU refreshes the table between read() calls.
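
To make the tearing window concrete, here is a small userspace model
(not driver code). The names, TABLE_SIZE and CHUNK_SIZE are all
illustrative assumptions; fw_refresh() merely stands in for an SMU
refresh that stamps the whole table with one generation number.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8192u	/* illustrative; larger than one chunk */
#define CHUNK_SIZE 4096u	/* stands in for PAGE_SIZE */

static unsigned char fw_table[TABLE_SIZE];

/* Model of an SMU refresh: every byte carries the same generation. */
static void fw_refresh(unsigned char generation)
{
	memset(fw_table, generation, sizeof(fw_table));
}

/* Chunked read: the table is refreshed before each chunk is copied. */
static void read_chunked(unsigned char *dst)
{
	unsigned char generation = 1;
	size_t off;

	for (off = 0; off < TABLE_SIZE; off += CHUNK_SIZE) {
		fw_refresh(generation++);
		memcpy(dst + off, fw_table + off, CHUNK_SIZE);
	}
}

/* One-shot read: refresh once, then copy the whole table. */
static void read_one_shot(unsigned char *dst)
{
	fw_refresh(1);
	memcpy(dst, fw_table, TABLE_SIZE);
}

/* A snapshot is consistent when all bytes share one generation. */
static bool snapshot_is_consistent(const unsigned char *buf)
{
	size_t i;

	for (i = 1; i < TABLE_SIZE; i++)
		if (buf[i] != buf[0])
			return false;
	return true;
}
```

In this model the chunked reader ends up with two generations in one
buffer, while the one-shot reader always sees a single generation.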

To address both, the series:

  - Adds a stable UAPI description of the v7 metric-table layout
    (struct hsmp_metric_table_zen6{,_iod,_ccd}) that mirrors the
    firmware blob 1:1.

  - Stops hard-coding the metric-table region size in the driver and
    sources it from firmware (HSMP_GET_METRIC_TABLE_DRAM_ADDR args[2]),
    falling back to the old size on v6 firmware that leaves the field
    zero. The same size is used for ioremap() and for the ioctl
    bounds check.

  - Adds a new HSMP_IOCTL_GET_TELEMETRY_DATA on the existing HSMP
    character device that always copies the full firmware-reported
    table in one shot, removing both the PAGE_SIZE cap and the
    chunked-read tearing window. The trailing reserved field in the
    request is rejected if non-zero so future kernels can repurpose
    it without breaking deployed userspace, and user-supplied indices
    (sock_ind, msg_id) are clamped with array_index_nospec() to
    mitigate Spectre v1.

  - Wires the ACPI driver to populate the DRAM mapping for v7
    (currently gated on '== HSMP_PROTO_VER6'), so the new ioctl path
    is actually reachable on Family 1Ah Model 50h-5Fh hardware. The
    legacy metrics_bin attribute is preserved and continues to work
    on v6; on v7 it returns -EOPNOTSUPP pointing at the ioctl.
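
As a rough sketch of the size fallback and the request checks described
in the list above, the following userspace model may help. It is not
the code from the patches: struct telemetry_req_model, its field
layout, the helper names and the chosen errno values are hypothetical,
and rejecting a too-small buffer with -EINVAL is an assumption about
the bounds check.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request layout; the real UAPI struct comes from 5/7. */
struct telemetry_req_model {
	uint16_t sock_ind;	/* socket index supplied by userspace */
	uint16_t reserved;	/* must be zero */
	uint32_t buf_size;	/* size of the userspace buffer */
};

/*
 * Use the firmware-reported size when it is non-zero, otherwise fall
 * back to the legacy size (v6 firmware leaves the field zero).
 */
static uint32_t metric_table_size(uint32_t fw_reported, uint32_t legacy_size)
{
	return fw_reported ? fw_reported : legacy_size;
}

/*
 * Validate a request against the socket count and the table size.
 * In the kernel, the index would additionally be clamped with
 * array_index_nospec() after the range check; that step has no
 * userspace equivalent and is omitted here.
 */
static int validate_telemetry_req(const struct telemetry_req_model *req,
				  uint16_t num_sockets, uint32_t table_size)
{
	if (req->reserved)
		return -EINVAL;	/* keeps the field repurposable */
	if (req->sock_ind >= num_sockets)
		return -ENODEV;	/* assumed errno for a bad socket */
	if (req->buf_size < table_size)
		return -EINVAL;	/* assumed size-mismatch handling */
	return 0;
}
```

The same table_size value feeds both the mapping and the check, which
is the property the second bullet relies on.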

Around that, the series also:

  - Adds the new HSMP message IDs (0x29-0x2A, 0x33-0x3A) introduced by
    Family 1Ah Model 50h-5Fh firmware (PC6/CC6 control, CCD power /
    thermal monitoring, DIMM sideband access, floor- and SDPS-limit
    control, command-enable discovery) and converts three SET-only
    messages (HSMP_SET_XGMI_LINK_WIDTH, HSMP_SET_DF_PSTATE,
    HSMP_SET_PSTATE_MAX_MIN) to HSMP_SET_GET so userspace can read
    back the currently programmed value via bit[31] of args[0].

  - Relaxes validate_message() to an upper-bound check for every
    message type (HSMP_SET_GET already did this). Existing userspace
    that asks for fewer response words than firmware now provides is
    no longer rejected with -EINVAL.

  - Tightens metric-table read locking with a per-socket guard(mutex)
    around the SMU-side refresh + memcpy_fromio() sequence, and
    reorders devm_mutex_init() before devm_ioremap() so the
    "metric_tbl_addr != NULL implies metric_tbl_lock is initialized"
    invariant holds on every probe path.
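
The validate_message() relaxation can be illustrated with a simplified
userspace model. The types and helper names below are hypothetical,
and the "old" helper assumes the pre-series check for non-SET_GET
messages was an exact match on response_sz.

```c
#include <errno.h>
#include <stdint.h>

enum msg_type_model { MSG_SET, MSG_GET, MSG_SET_GET };

/* Simplified stand-in for one descriptor-table entry. */
struct msg_desc_model {
	uint16_t num_args;
	uint16_t response_sz;	/* words the firmware provides */
	enum msg_type_model type;
};

/* Assumed old rule: only SET_GET tolerated a smaller request. */
static int check_response_sz_old(const struct msg_desc_model *d,
				 uint16_t requested)
{
	if (d->type == MSG_SET_GET)
		return requested > d->response_sz ? -EINVAL : 0;
	return requested != d->response_sz ? -EINVAL : 0;
}

/* New rule: an upper bound for every message type. */
static int check_response_sz_new(const struct msg_desc_model *d,
				 uint16_t requested)
{
	return requested > d->response_sz ? -EINVAL : 0;
}
```

With a GET descriptor whose response_sz has grown from 1 to 2, a caller
still asking for 1 word fails the old check but passes the new one;
asking for more than the descriptor allows is rejected by both.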

Documentation/arch/x86/amd_hsmp.rst is updated to describe the split
between metrics_bin and HSMP_IOCTL_GET_TELEMETRY_DATA and the
protocol-version gating.

The series is bisect-safe: every commit was built individually and
checkpatch.pl --strict is clean across the series.

v3:
https://lore.kernel.org/platform-driver-x86/20260517151211.415627-1-muralidhara.mk@amd.com/T/#t

Changes since v3:

  - 1/7 (new HSMP messages): Tightened the inline bitfield comments
    on HSMP_SET_POWER_MODE / HSMP_PC6_ENABLE / HSMP_CC6_ENABLE /
    HSMP_DIMM_SB_RD / HSMP_DIMM_SB_WR / HSMP_FLOOR_LIMIT /
    HSMP_SDPS_LIMIT. Rewrote the commit log to spell out the
    backward-compatibility analysis for bit[31] of args[0] on
    the three SET->SET_GET conversions.

  - 2/7 (UAPI structs): Added per-field comments explaining the
    num_active_ccds placement (in the IOD block, matching the
    firmware-side aggregator) and the ccd[] sizing
    (HSMP_F1A_M50_M5F_MAX_CCDS = 8, hardware maximum, userspace
    iterates [0 .. num_active_ccds - 1]).

  - 3/7 (response_sz): Rewrote the commit log to motivate the change
    in terms of forward compatibility of older userspace against
    newer firmware/descriptor tables.

  - 4/7 (firmware-reported table size): Rewrote the commit log to
    document args[2] semantics and the 0-fallback path that keeps
    behaviour identical on v6 firmware.

  - 5/7 (HSMP_IOCTL_GET_TELEMETRY_DATA):
      * Added array_index_nospec() on req.sock_ind and msg.msg_id to
        mitigate Spectre v1 on the user-controlled ioctl indices.
      * Rejected non-zero req.reserved with -EINVAL so the field
        stays repurposable.
      * Tightened the UAPI struct layout reasoning under the
        surrounding #pragma pack(4) and the size-mismatch handling.
      * Trimmed the long implementation-detail paragraphs from the
        commit log.

  - 6/7 (enable v7 on the ACPI driver):
      * Restored the legacy sysfs metrics_bin attribute (it was
        dropped in v3) and gated reads by protocol version so v6
        userspace is unaffected and v7 userspace gets -EOPNOTSUPP
        pointing at HSMP_IOCTL_GET_TELEMETRY_DATA.
      * Removed the ABI-break paragraph from the commit log now that
        metrics_bin is preserved.

  - 7/7 (guard(mutex)):
      * Reordered devm_mutex_init() before devm_ioremap() so that
        sock->metric_tbl_addr is never published with an
        uninitialized metric_tbl_lock on the non-fatal init_acpi() /
        init_platform_device() error paths.
      * Rewrote the commit log to document that ordering invariant.

  - Documentation/arch/x86/amd_hsmp.rst: Documented the metrics_bin
    PAGE_SIZE / protocol-version constraints and the new
    HSMP_IOCTL_GET_TELEMETRY_DATA path, and cleaned up whitespace in
    the embedded struct snippet.
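
For the ccd[] sizing note under 2/7, a consumer-side iteration might
look like the sketch below. The struct names and the single "sample"
field are hypothetical placeholders, not the real
struct hsmp_metric_table_zen6 layout; only the maximum of 8 CCDs and
the [0 .. num_active_ccds - 1] iteration range come from the notes
above. The clamp is a defensive choice in this sketch.

```c
#include <stdint.h>

#define MAX_CCDS_MODEL 8	/* mirrors HSMP_F1A_M50_M5F_MAX_CCDS = 8 */

struct ccd_model {
	uint32_t sample;	/* hypothetical per-CCD metric */
};

struct iod_model {
	uint32_t num_active_ccds;	/* lives in the IOD block */
};

struct table_model {
	struct iod_model iod;
	struct ccd_model ccd[MAX_CCDS_MODEL];	/* sized for the maximum */
};

/* Sum the hypothetical metric over the active CCDs only. */
static uint64_t sum_active_ccds(const struct table_model *t)
{
	uint32_t n = t->iod.num_active_ccds;
	uint64_t sum = 0;
	uint32_t i;

	/* Never walk past the array, even on a bogus count. */
	if (n > MAX_CCDS_MODEL)
		n = MAX_CCDS_MODEL;

	for (i = 0; i < n; i++)
		sum += t->ccd[i].sample;

	return sum;
}
```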

Thanks,
Murali

Muralidhara M K (7):
  platform/x86/amd/hsmp: Add new HSMP messages for Family 1Ah, Model
    50h-5Fh
  platform/x86/amd/hsmp: Add UAPI structures for Family 1Ah Model
    50h-5Fh metrics table
  platform/x86/amd/hsmp: Unify response_sz validation to an upper-bound
    check
  platform/x86/amd/hsmp: Source metric-table size from firmware
  platform/x86/amd/hsmp: Add IOCTL_GET_TELEMETRY_DATA for metric table
    reads
  platform/x86/amd/hsmp: Enable HSMP_PROTO_VER7 metric tables on the
    ACPI driver via the IOCTL
  platform/x86/amd/hsmp: Make metric table read locking use guard(mutex)

 Documentation/arch/x86/amd_hsmp.rst  |  30 ++-
 arch/x86/include/uapi/asm/amd_hsmp.h | 294 +++++++++++++++++++++++++--
 drivers/platform/x86/amd/hsmp/acpi.c |  23 ++-
 drivers/platform/x86/amd/hsmp/hsmp.c | 169 +++++++++++++--
 drivers/platform/x86/amd/hsmp/hsmp.h |   6 +-
 5 files changed, 485 insertions(+), 37 deletions(-)

-- 
2.34.1