[RFC PATCH v2 0/5] x86/hygon: Add Family 0x18 node enumeration and SMN access

Lin Wang posted 5 patches 1 month, 3 weeks ago
== Background ==

The "AMD NB and SMN rework" series [1][2] restructured AMD northbridge
and SMN support into amd_nb.c and amd_node.c, providing a clean
framework for discovering and accessing AMD Data Fabric (DF) instances
via PCI config space and SMN.

Hygon Family 0x18 implements a Data Fabric topology that is structurally
comparable to AMD Zen systems at the register level, but diverges from
several platform conventions that amd_nb.c and amd_node.c rely on. This
series adds the missing Hygon DF support while keeping all Hygon
enablement in Hygon-specific files.

== Overview ==

This series adds kernel infrastructure for two closely related functions
on Hygon Family 0x18:

  1. DF node enumeration -- discovering and identifying each Data Fabric
     (DF) instance on the system through its PCI config registers, so
     that the NB and SMN subsystems can access them correctly.

  2. CPU-to-node mapping -- establishing a canonical logical node ID that
     bridges the DF register world (socket_id, DFID) and the CPU
     topology world (phys_node_id from CPUID), so that EDAC, MCE, and
     ATL consumers can translate a CPU reference to the correct DF node
     in O(1).

== Problem ==

AMD Zen-based systems enumerate DF nodes at fixed PCI slots 00:18.x
through 00:1f.x on segment 0, bus 0. The resulting identity, node ID =
PCI slot - 0x18, is a platform guarantee that amd_nb.c and amd_node.c
rely on throughout.

Hygon Family 0x18 processors expose DF instances at platform-assigned
PCI slots with no fixed relationship to node identity. As a result:

  - amd_nb.c cannot populate amd_northbridges[] for Hygon systems.
  - amd_smn_init() cannot assign SMN root devices to the correct nodes.
  - There is no kernel mechanism to map a CPU to its DF node. The
    phys_node_id values from CPUID 8000001Eh ECX[7:0] are globally
    unique but sparse (socket 0: 0..3, socket 1: 16..19, etc.) and
    cannot be used directly as array indices.

== Solution ==

v2 follows the architectural direction from v1 review: all Hygon
enablement goes into Hygon-specific files. Rather than adding Hygon
branches into amd_nb.c and amd_node.c, v2 introduces two small generic
registration interfaces -- amd_nb_set_cache() and smn_set_roots() -- so
that a vendor-specific module can build its data structures independently
and hand them to the shared infrastructure.

hygon_node.c is the Hygon-specific DF enumeration module. It reads
hardware identity directly from each DF instance, builds a correctly
ordered node array, provides the CPU-to-node mapping as an O(1) lookup,
and registers the results with the AMD NB and SMN layers through the
generic interfaces above.

The existing AMD code paths remain unchanged: amd_cache_northbridges()
and amd_smn_init() continue to run exactly as before on AMD hardware.
On Hygon systems, the AMD paths return early and Hygon's own
fs_initcalls handle all registration.

== Hardware Background ==

Each DF instance exposes identity through PCI config registers on its
function siblings:

  F1x200 (SystemCfg), all models:
    [30:28]  MySocketId  -- hardware socket ID
    [23:20]  MyDieId     -- die ID (equals DFID on most models)

  F5x180 (FabricId), Model 0x06-0x08 only:
    [19:16]  DFID        -- real Data Fabric ID (MyDieId != DFID here)

DFID classifies each DF instance:
  DFID >= 4: Compute Die (CDD) -- hosts CPU cores and UMC controllers.
             Each CDD has a platform-unique phys_node_id from
             CPUID leaf 8000001Eh ECX[7:0].
  DFID <  4: I/O Die (IOD)     -- interconnect and I/O; no CPUs, no UMC.
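
As a rough illustration of the field layout above, the following
userspace C sketch decodes socket ID and DFID from raw register values
and classifies the instance. The struct and helper names are
hypothetical and are not taken from hygon_node.c; the has_f5x180 flag
stands in for the Model 0x06-0x08 check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed above; macro names are illustrative only. */
#define F1X200_SOCKET_ID(v)  (((v) >> 28) & 0x7)  /* bits [30:28] */
#define F1X200_DIE_ID(v)     (((v) >> 20) & 0xf)  /* bits [23:20] */
#define F5X180_DFID(v)       (((v) >> 16) & 0xf)  /* bits [19:16] */

struct df_ident {
	uint8_t socket_id;
	uint8_t dfid;
	bool is_cdd;
};

/*
 * Decode one DF instance. has_f5x180 is assumed to be true only on
 * models where the real DFID lives in F5x180 rather than MyDieId.
 */
static struct df_ident df_decode(uint32_t f1x200, uint32_t f5x180,
				 bool has_f5x180)
{
	struct df_ident id;

	id.socket_id = F1X200_SOCKET_ID(f1x200);
	id.dfid = has_f5x180 ? F5X180_DFID(f5x180) : F1X200_DIE_ID(f1x200);
	id.is_cdd = id.dfid >= 4;	/* DFID >= 4: CDD, otherwise IOD */
	return id;
}
```

In the kernel the two raw values would come from PCI config reads on
the F1 and F5 siblings; the sketch takes them as plain arguments so the
decode logic can be checked in isolation.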

The phys_node_id is globally unique but sparse across sockets:

  4-socket, 4 CDD/socket (Model 0x04/0x05):
    Socket 0: DFID=4,5,6,7   phys_nid=0,1,2,3
    Socket 1: DFID=4,5,6,7   phys_nid=16,17,18,19
    ...

  2-socket, sparse DFID from F5x180 (Model 0x06):
    Socket 0: DFID=4,5,8,9   phys_nid=0,1,2,3
    Socket 1: DFID=4,5,8,9   phys_nid=16,17,18,19

== Design ==

=== DF Node Enumeration ===

hygon_build_cache() runs lazily on the first API call, under a mutex,
and publishes the completed cache by setting the ready flag with
smp_store_release().
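
The build-once-then-publish pattern can be sketched in userspace with
C11 atomics and a pthread mutex. This is only an analogue: the kernel
code uses a kernel mutex and smp_store_release(), and the
acquire-load on the reader side is an assumption here, as are all
names in the sketch.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t build_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool cache_ready;
static int cache_value;		/* stands in for the node array */
static int build_count;		/* how often the build actually ran */

static int cache_get(void)
{
	/* Fast path: the acquire load pairs with the release store below. */
	if (!atomic_load_explicit(&cache_ready, memory_order_acquire)) {
		pthread_mutex_lock(&build_lock);
		/* Re-check under the lock: another caller may have built it. */
		if (!atomic_load_explicit(&cache_ready, memory_order_relaxed)) {
			cache_value = 42;	/* placeholder for the real build */
			build_count++;
			/* Publish only after the cache is fully written. */
			atomic_store_explicit(&cache_ready, true,
					      memory_order_release);
		}
		pthread_mutex_unlock(&build_lock);
	}
	return cache_value;
}
```

Repeated calls to cache_get() run the build body once; later callers
take the lock-free fast path.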

Phase 1 -- Collect:
  Walk all DF misc (F3) devices matching hygon_nb_misc_ids[]. For each:
    - Find the F4 (link) sibling on the same PCI slot.
    - Read F1x200 via the F1 sibling: socket_id and die ID.
    - On Model 0x06-0x08: read F5x180 via the F5 sibling for the real
      DFID (MyDieId != DFID on these models).
    - A model-to-device-ID table (hygon_df_table[]) maps boot CPU model
      to the expected F1/F5 device IDs, avoiding per-model switch-case
      logic and making new-model support a one-row table addition.
  Validate that socket IDs are dense (0..N-1).
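
One possible way to express the density check, shown as a standalone
sketch. The function name and the -1 error convention are assumptions
made for illustration; the 3-bit limit follows from MySocketId being
bits [30:28].

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the socket count N if the IDs seen are exactly 0..N-1, else -1. */
static int count_dense_sockets(const uint8_t *socket_ids, size_t n)
{
	uint32_t seen = 0;
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (socket_ids[i] >= 8)		/* MySocketId is a 3-bit field */
			return -1;
		seen |= 1u << socket_ids[i];
	}

	/* Consume the contiguous run of set bits starting at bit 0. */
	while (seen & 1u) {
		seen >>= 1;
		count++;
	}

	/* Any bit still set means there was a gap below it. */
	return (seen == 0 && count > 0) ? count : -1;
}
```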

Phase 2 -- Sort and classify:
  Sort all collected entries by (is_cdd DESC, socket_id ASC, dfid ASC).
  Count num_cdd. Validate that num_cdd is nonzero, fits in a u8, is
  divisible by num_sockets, and that each socket contributes the same
  CDD count.

After sorting, the node array is partitioned as:

  nodes[]
  +----------------------------------------------------------+
  | CDD region: indices 0 .. num_cdd-1                       |
  | sorted by (socket_id ASC, dfid ASC)                      |
  | index = logical_node_id                                  |
  +----------------------------------------------------------+
  | IOD region: indices num_cdd .. num_nodes-1               |
  | socket_id used for SMN root assignment only              |
  +----------------------------------------------------------+
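
The sort key and the resulting partition can be illustrated with a
small userspace sketch using qsort(). The struct layout and function
names are assumptions for illustration; the kernel code would use its
own sort() helper and node structure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct df_node {
	uint8_t socket_id;
	uint8_t dfid;
	bool is_cdd;
};

/* Order: is_cdd DESC, then socket_id ASC, then dfid ASC. */
static int df_node_cmp(const void *a, const void *b)
{
	const struct df_node *x = a, *y = b;

	if (x->is_cdd != y->is_cdd)
		return x->is_cdd ? -1 : 1;	/* CDDs sort first */
	if (x->socket_id != y->socket_id)
		return x->socket_id < y->socket_id ? -1 : 1;
	if (x->dfid != y->dfid)
		return x->dfid < y->dfid ? -1 : 1;
	return 0;
}

/* Sort in place and return num_cdd, the length of the CDD region. */
static size_t df_sort_nodes(struct df_node *nodes, size_t n)
{
	size_t num_cdd = 0;

	qsort(nodes, n, sizeof(*nodes), df_node_cmp);
	while (num_cdd < n && nodes[num_cdd].is_cdd)
		num_cdd++;
	return num_cdd;
}
```

After the call, nodes[0..num_cdd-1] is the CDD region and the array
index there doubles as the logical node ID.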

=== CPU-to-Node Mapping ===

The array index in the CDD region serves as the logical_node_id.
Consumers such as EDAC, MCE, and ATL need to translate a CPU reference
to this index. The challenge is bridging two independent hardware worlds:

  DF world:   socket_id and DFID from PCI config registers (F1x200, F5x180)
  CPU world:  phys_node_id from CPUID 8000001Eh ECX[7:0] per core

Both are hardware-fixed values. Neither depends on software enumeration
order. A third hardware property connects them:

    Within a socket, CDDs with ascending DFID are assigned ascending
    phys_node_id values. Lower socket IDs always occupy lower phys_nid
    ranges.

Phase 3 of the cache build exploits this:
  - Collect unique phys_node_id values from online CPUs (one per CDD)
    via topology_amd_node_id(), using a bitmap for de-duplication.
  - Sort the collected values globally ascending.
  - Map: nid_to_logical[nids[i]] = i  for i in 0..num_cdd-1.

The mapping is stored as a 256-byte direct-mapped array for O(1) lookup.
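
A minimal sketch of the Phase 3 steps, assuming the per-CPU
phys_node_id values have already been gathered into an array. The
function name, the 0xff "unmapped" sentinel, and the -1 error return
are illustrative assumptions. Walking the de-duplication bitmap from
bit 0 upward yields the values in ascending order, so the sketch needs
no separate sort step.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NID_INVALID 0xff	/* assumed sentinel for unmapped entries */

/*
 * Build nid_to_logical[] from per-CPU phys_node_id values.
 * Returns 0 on success, -1 if the unique count differs from num_cdd.
 */
static int build_nid_map(const uint8_t *cpu_nids, size_t num_cpus,
			 unsigned int num_cdd, uint8_t nid_to_logical[256])
{
	uint8_t seen[256 / 8] = { 0 };	/* bitmap for de-duplication */
	unsigned int logical = 0;

	for (size_t i = 0; i < num_cpus; i++)
		seen[cpu_nids[i] / 8] |= (uint8_t)(1u << (cpu_nids[i] % 8));

	memset(nid_to_logical, NID_INVALID, 256);

	/* Ascending walk over the bitmap: nids[i] maps to logical ID i. */
	for (unsigned int nid = 0; nid < 256; nid++) {
		if (!(seen[nid / 8] & (1u << (nid % 8))))
			continue;
		if (logical >= num_cdd)
			return -1;	/* more unique nids than CDDs */
		nid_to_logical[nid] = (uint8_t)logical++;
	}
	return logical == num_cdd ? 0 : -1;
}
```

For the 2-socket example above (phys_nid 0..3 and 16..19), this maps
phys_nid 16 to logical node 4 and phys_nid 19 to logical node 7; a
lookup is then a single array read.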

If the hardware property is violated on a future platform, Phase 3
validation detects it (collected phys_nid count != num_cdd) and fails
loudly rather than producing a silently wrong mapping.

=== NB Cache and SMN Root Registration ===

Once enumeration is complete, two independent fs_initcalls perform
registration that plugs Hygon into the shared AMD infrastructure:

  1. hygon_nb_init() -- builds a struct amd_northbridge[] array from the
     enumerated nodes and registers it via amd_nb_set_cache(), so
     amd_nb consumers (EDAC, MCE decode, etc.) work transparently.

  2. hygon_smn_init() -- Hygon shares SMN root devices per socket (all
     nodes on the same socket use the same host-bridge root). This
     function discovers host-bridge roots by PCI class, reserves their
     config regions exclusively, expands the per-socket roots into a
     per-node array, and registers it via smn_set_roots(). On any
     failure it releases the reserved config regions and frees the
     allocated memory.
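
The per-socket to per-node expansion might look roughly like the
following. struct demo_root stands in for struct pci_dev, and the
function name and error convention are assumptions made for this
sketch.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_root { int id; };	/* placeholder for struct pci_dev */

/*
 * Fill node_roots[i] with the root device of node i's socket.
 * Returns 0, or -1 if a node names a socket that has no root.
 */
static int expand_roots(const uint8_t *node_socket, size_t num_nodes,
			struct demo_root *const *socket_roots,
			size_t num_sockets,
			struct demo_root **node_roots)
{
	for (size_t i = 0; i < num_nodes; i++) {
		uint8_t s = node_socket[i];

		if (s >= num_sockets || !socket_roots[s])
			return -1;
		/* All nodes on one socket share the same root. */
		node_roots[i] = socket_roots[s];
	}
	return 0;
}
```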

Both registration functions follow the same pattern: the caller builds
and owns the data, hands it to the generic layer, and the generic layer
checks for double-registration (-EBUSY).
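
The ownership hand-off with a double-registration check can be
sketched as below. demo_set_cache() is a hypothetical stand-in and does
not reproduce the actual signature of amd_nb_set_cache() or
smn_set_roots(); the -EINVAL argument check is likewise an assumption,
and a real implementation would also need to serialize concurrent
callers.

```c
#include <errno.h>
#include <stddef.h>

struct demo_nb { int node; };	/* placeholder for struct amd_northbridge */

static struct demo_nb *registered_nb;
static size_t registered_num;

/* Hypothetical set_cache-style interface: first caller wins. */
static int demo_set_cache(struct demo_nb *nb, size_t num)
{
	if (!nb || !num)
		return -EINVAL;
	if (registered_nb)
		return -EBUSY;		/* a cache is already registered */

	registered_nb = nb;		/* ownership passes to this layer */
	registered_num = num;
	return 0;
}
```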

== API ==

Exported (for loadable module consumers: EDAC, MCE, ATL):

  hygon_f18h_model()              model byte, 0 if not Hygon Family 0x18
  hygon_cdd_num()                 CDD count (EDAC instance sizing)
  hygon_get_dfid(misc, &dfid)     DFID for a DF misc device
  hygon_cpu_to_df_node(cpu)       DF node index (0..N-1), or -errno

Inline helper in the header:
  hygon_f18h_model_in_range(first, last)  -- true if model in [first, last]

asm/hygon/node.h provides static inline stubs returning 0/NULL/-ENODEV
for all functions when CONFIG_HYGON_NODE=n; no #ifdef guards are needed
in consumer code.
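
The stub pattern can be illustrated as follows. The function names
come from the list above, but the return and parameter types are
assumptions, since the cover letter does not give prototypes;
consumer_lookup() is a made-up caller. CONFIG_HYGON_NODE is left
undefined in the sketch so that the stub branch is the one compiled.

```c
#include <errno.h>
#include <stdbool.h>

#ifdef CONFIG_HYGON_NODE
/* Real implementations would live in hygon_node.c (types assumed). */
unsigned int hygon_f18h_model(void);
int hygon_cpu_to_df_node(unsigned int cpu);
#else
/* Stubs: report "not Hygon Family 0x18" and "no such device". */
static inline unsigned int hygon_f18h_model(void)
{
	return 0;
}

static inline int hygon_cpu_to_df_node(unsigned int cpu)
{
	(void)cpu;
	return -ENODEV;
}
#endif

static inline bool hygon_f18h_model_in_range(unsigned int first,
					     unsigned int last)
{
	unsigned int m = hygon_f18h_model();

	return m >= first && m <= last;
}

/* Hypothetical consumer: the call site needs no #ifdef. */
static int consumer_lookup(unsigned int cpu)
{
	if (!hygon_f18h_model())
		return -ENODEV;
	return hygon_cpu_to_df_node(cpu);
}
```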

== Series ==

  [1/5] pci_ids: Add Hygon Family 0x18 DF device IDs
        Hygon-specific PCI device IDs for DF misc (F3) and link (F4)
        functions across all supported Hygon Family 0x18 models.

  [2/5] x86/hygon: Add Family 0x18 node enumeration API header
        asm/hygon/node.h: API declarations, CONFIG_HYGON_NODE=n stubs,
        and inline model-check helpers.

  [3/5] x86/amd_nb: Add amd_nb_set_cache() for external NB cache
        registration
        Vendor-neutral data registration interface: the caller builds
        a struct amd_northbridge[] array, populates the mandatory misc
        and link device pointers, and transfers ownership to the NB
        layer. Guard amd_cache_northbridges() to AMD-only.

  [4/5] x86/amd_node: Add smn_set_roots() and smn_activate() for
        external SMN registration
        Vendor-neutral interface for registering a pre-built per-node
        SMN root array. Guard amd_smn_init() to AMD-only and remove the
        Hygon vendor check from get_next_root().

  [5/5] x86/hygon: Add Fam18h node enumeration, NB cache and SMN init
        Core implementation in hygon_node.c: CONFIG_HYGON_NODE Kconfig,
        three-phase cache build, hardware-anchored CPU-to-node mapping,
        NB cache registration via hygon_nb_init() and amd_nb_set_cache(),
        SMN root discovery and registration via hygon_smn_init() and
        smn_set_roots(). Both run as independent fs_initcalls. Also
        adds all exported API functions.

== Testing ==

Verified boot, SMN access, and CPU-to-node mapping correctness on:
  - Hygon Family 0x18 Model 0x04, 4-socket (16 CDD + 4 IOD)
  - Hygon Family 0x18 Model 0x05, 2-socket (8 CDD + 2 IOD)
  - Hygon Family 0x18 Model 0x06, 2-socket, sparse DFID (8 CDD + 2 IOD)

== Feedback Requested ==

  - Is the generic registration approach (amd_nb_set_cache / smn_set_roots)
    the right way for non-AMD vendors to plug into the existing NB/SMN
    infrastructure, or would a different integration point be preferred?

  - Is the three-phase enumeration approach the right structure for
    Hygon DF node discovery?

  - Is the hardware-anchored bijection (Phase 3) a sound way to
    establish the CPU-to-CDD mapping, or is there a cleaner mechanism
    that does not rely on the DFID-ASC <-> phys_nid-ASC property?

  - Does the exported API (four functions) provide the right abstraction
    for downstream consumers?

== Changes since v1 [5] ==

The main change addresses Boris's feedback [3] and Mario's suggestions [4]:
all Hygon enablement is now in Hygon-specific files, with no Hygon
branches added to AMD code paths.

Specifically:

  - v1 patches 4/5 added Hygon branches directly into amd_nb.c
    (amd_cache_northbridges) and amd_node.c (amd_smn_init).
    v2 replaces this with two generic registration interfaces:
    amd_nb_set_cache() and smn_set_roots(). The AMD code paths
    themselves are unchanged; Hygon builds and registers its own
    data from hygon_node.c.

  - v1 reused AMD-prefixed PCI device IDs for Hygon devices.
    v2 defines Hygon-specific IDs (PCI_DEVICE_ID_HYGON_18H_M04H_*)
    in the Hygon section of pci_ids.h, per Mario's suggestion.

  - v2 introduces CONFIG_HYGON_NODE (depends on CPU_SUP_HYGON, PCI,
    AMD_NB, AMD_NODE) to gate all Hygon-specific code, per Mario's
    suggestion.

  - v1 left the Hygon vendor ID check in amd_node.c get_next_root().
    v2 removes it so that Hygon host bridges are no longer enumerated
    by the AMD path; Hygon enumerates its own roots in hygon_node.c.

Link: https://lore.kernel.org/all/20241206161210.163701-1-yazen.ghannam@amd.com/ # [1]
Link: https://lore.kernel.org/all/20250107222847.3300430-1-yazen.ghannam@amd.com/ # [2]
Link: https://lore.kernel.org/all/20260402235036.GDac8AzDPVVq-tBeG-@fat_crate.local/ # [3]
Link: https://lore.kernel.org/all/ab6a8335-48ab-4800-be27-761a91264ad7@amd.com/ # [4]
Link: https://lore.kernel.org/all/20260402111515.1155505-1-wanglin@open-hieco.net/ # [5] RFC v1

Lin Wang (5):
  pci_ids: Add Hygon Family 0x18 DF device IDs
  x86/hygon: Add Family 0x18 node enumeration API header
  x86/amd_nb: Add amd_nb_set_cache() for external NB cache registration
  x86/amd_node: Add smn_set_roots() and smn_activate() for external SMN registration
  x86/hygon: Add Fam18h node enumeration, NB cache and SMN init

 MAINTAINERS                       |    3 +
 arch/x86/Kconfig                  |    4 +
 arch/x86/include/asm/amd/nb.h     |    7 +
 arch/x86/include/asm/amd/node.h   |    4 +
 arch/x86/include/asm/hygon/node.h |   85 +++
 arch/x86/kernel/Makefile          |    1 +
 arch/x86/kernel/amd_nb.c          |   22 +
 arch/x86/kernel/amd_node.c        |  106 ++-
 arch/x86/kernel/hygon_node.c      | 1008 +++++++++++++++++++++++++++++
 include/linux/pci_ids.h           |    6 +
 10 files changed, 1223 insertions(+), 23 deletions(-)
 create mode 100644 arch/x86/include/asm/hygon/node.h
 create mode 100644 arch/x86/kernel/hygon_node.c

-- 
2.43.0