Hi All,

TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices
to provide optimization hints for requests that target memory space. These
hints, in a format called steering tag (ST), are provided in the
requester's TLP headers and allow the system hardware, including the Root
Complex, to optimize the utilization of platform resources for the
requests.

Upcoming AMD hardware implements a new Cache Injection feature that
leverages TPH. Cache Injection allows PCIe endpoints to inject I/O
Coherent DMA writes directly into an L2 within the CCX (core complex)
closest to the CPU core that will consume it. This technology is aimed at
applications requiring high performance and low latency, such as
networking and storage applications.

This series introduces generic TPH support in Linux, allowing STs to be
retrieved from ACPI _DSM (as defined by ACPI) and used by PCIe endpoint
drivers as needed. As a demonstration, it includes an example usage in the
Broadcom BNXT driver. When running on Broadcom NICs with the appropriate
firmware, Cache Injection shows substantial memory bandwidth savings in
real-world benchmarks. This solution is vendor-neutral, as both TPH and
ACPI _DSM are industry standards.

V1->V2:
 * Rebase on top of pci.git/for-linus (6.10-rc1)
 * Address mismatched data types reported by Sparse (Sparse checking
   passed)
 * Add a new API, pcie_tph_intr_vec_supported(), for checking IRQ mode
   support
 * Skip bnxt affinity notifier registration if
   pcie_tph_intr_vec_supported()=false
 * Minor fixes in bnxt driver (i.e. warning messages)

Manoj Panicker (1):
  bnxt_en: Add TPH support in BNXT driver

Michael Chan (1):
  bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings

Wei Huang (8):
  PCI: Introduce PCIe TPH support framework
  PCI: Add TPH related register definition
  PCI/TPH: Implement a command line option to disable TPH
  PCI/TPH: Implement a command line option to force No ST Mode
  PCI/TPH: Introduce API functions to manage steering tags
  PCI/TPH: Retrieve steering tag from ACPI _DSM
  PCI/TPH: Add TPH documentation

 Documentation/PCI/index.rst                   |   1 +
 Documentation/PCI/tph.rst                     |  57 ++
 .../admin-guide/kernel-parameters.txt         |   2 +
 Documentation/driver-api/pci/pci.rst          |   3 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  62 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   4 +
 drivers/pci/pci-driver.c                      |  12 +-
 drivers/pci/pci.c                             |  24 +
 drivers/pci/pci.h                             |   6 +
 drivers/pci/pcie/Kconfig                      |  10 +
 drivers/pci/pcie/Makefile                     |   1 +
 drivers/pci/pcie/tph.c                        | 582 ++++++++++++++++++
 drivers/pci/probe.c                           |   1 +
 drivers/vfio/pci/vfio_pci_config.c            |   7 +-
 include/linux/pci-tph.h                       |  78 +++
 include/linux/pci.h                           |   6 +
 include/uapi/linux/pci_regs.h                 |  35 +-
 17 files changed, 881 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/PCI/tph.rst
 create mode 100644 drivers/pci/pcie/tph.c
 create mode 100644 include/linux/pci-tph.h

--
2.44.0
This patch implements the framework for PCIe TPH support. It introduces
the tph.c source file, along with CONFIG_PCIE_TPH, to the Linux PCIe
subsystem. A new member, named tph_cap, is also introduced in pci_dev to
cache the TPH capability offset.

Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
 drivers/pci/pci.h         |  6 ++++++
 drivers/pci/pcie/Kconfig  | 10 ++++++++++
 drivers/pci/pcie/Makefile |  1 +
 drivers/pci/pcie/tph.c    | 28 ++++++++++++++++++++++++++++
 drivers/pci/probe.c       |  1 +
 include/linux/pci.h       |  4 ++++
 6 files changed, 50 insertions(+)
 create mode 100644 drivers/pci/pcie/tph.c

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index XXXXXXX..XXXXXXX 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -XXX,XX +XXX,XX @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
 #endif /* CONFIG_PCI_IOV */
 
+#ifdef CONFIG_PCIE_TPH
+void pcie_tph_init(struct pci_dev *dev);
+#else
+static inline void pcie_tph_init(struct pci_dev *dev) {}
+#endif
+
 #ifdef CONFIG_PCIE_PTM
 void pci_ptm_init(struct pci_dev *dev);
 void pci_save_ptm_state(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -XXX,XX +XXX,XX @@ config PCIE_EDR
 	  the PCI Firmware Specification r3.2. Enable this if you want to
 	  support hybrid DPC model which uses both firmware and OS to
 	  implement DPC.
+
+config PCIE_TPH
+	bool "TLP Processing Hints"
+	default n
+	help
+	  This option adds support for PCIE TLP Processing Hints (TPH).
+	  TPH allows endpoint devices to provide optimization hints, such as
+	  desired caching behavior, for requests that target memory space.
+	  These hints, called steering tags, can empower the system hardware
+	  to optimize the utilization of platform resources.
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index XXXXXXX..XXXXXXX 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_PCIE_PME) += pme.o
 obj-$(CONFIG_PCIE_DPC) += dpc.o
 obj-$(CONFIG_PCIE_PTM) += ptm.o
 obj-$(CONFIG_PCIE_EDR) += edr.o
+obj-$(CONFIG_PCIE_TPH) += tph.o
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/drivers/pci/pcie/tph.c
@@ -XXX,XX +XXX,XX @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * TPH (TLP Processing Hints) support
+ *
+ * Copyright (C) 2024 Advanced Micro Devices, Inc.
+ *     Eric Van Tassell <Eric.VanTassell@amd.com>
+ *     Wei Huang <wei.huang2@amd.com>
+ */
+
+#define pr_fmt(fmt) "TPH: " fmt
+#define dev_fmt pr_fmt
+
+#include <linux/acpi.h>
+#include <uapi/linux/pci_regs.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/msi.h>
+#include <linux/pci.h>
+#include <linux/msi.h>
+#include <linux/pci-acpi.h>
+
+#include "../pci.h"
+
+void pcie_tph_init(struct pci_dev *dev)
+{
+	dev->tph_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_TPH);
+}
+
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index XXXXXXX..XXXXXXX 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -XXX,XX +XXX,XX @@ static void pci_init_capabilities(struct pci_dev *dev)
 	pci_dpc_init(dev);		/* Downstream Port Containment */
 	pci_rcec_init(dev);		/* Root Complex Event Collector */
 	pci_doe_init(dev);		/* Data Object Exchange */
+	pcie_tph_init(dev);		/* TLP Processing Hints */
 
 	pcie_report_downtraining(dev);
 	pci_init_reset_methods(dev);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index XXXXXXX..XXXXXXX 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -XXX,XX +XXX,XX @@ struct pci_dev {
 	/* These methods index pci_reset_fn_methods[] */
 	u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
+
+#ifdef CONFIG_PCIE_TPH
+	u16 tph_cap; /* TPH capability offset */
+#endif
 };
 
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
--
2.44.0
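For readers unfamiliar with what the cached tph_cap offset represents, here is a minimal userspace sketch of an extended-capability walk over a raw 4 KiB config-space image. It is not the kernel's pci_find_ext_capability() implementation; find_ext_cap() and the buffer-based interface are hypothetical, and the sketch assumes the standard extended capability header layout (ID in bits 15:0, next offset in bits 31:20) with the TPH Requester capability ID being 0x17.

```c
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE		256	/* extended capabilities start here */
#define CFG_SPACE_EXP_SIZE	4096	/* size of PCIe extended config space */
#define EXT_CAP_ID_TPH		0x17	/* TPH Requester extended capability ID */

/* Read a little-endian dword from a config-space image. */
uint32_t cfg_read32(const uint8_t *cfg, uint16_t pos)
{
	return (uint32_t)cfg[pos] |
	       ((uint32_t)cfg[pos + 1] << 8) |
	       ((uint32_t)cfg[pos + 2] << 16) |
	       ((uint32_t)cfg[pos + 3] << 24);
}

/*
 * Hypothetical helper: return the offset of the extended capability with
 * the given ID, or 0 if it is absent. The loop is bounded so a corrupted
 * "next" chain cannot spin forever.
 */
uint16_t find_ext_cap(const uint8_t *cfg, uint16_t cap_id)
{
	uint16_t pos = CFG_SPACE_SIZE;
	int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8;

	while (ttl-- > 0 && pos >= CFG_SPACE_SIZE &&
	       pos <= CFG_SPACE_EXP_SIZE - 4) {
		uint32_t header = cfg_read32(cfg, pos);

		if (header == 0)	/* empty header: no more capabilities */
			break;
		if ((header & 0xffff) == cap_id)
			return pos;

		pos = (header >> 20) & 0xffc;	/* next capability offset */
	}

	return 0;
}
```

A return value of 0 plays the same role as dev->tph_cap == 0 in the patch: the device has no TPH Requester capability, so later helpers can bail out early.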
Linux has some basic, but incomplete, definitions for the TPH Requester
capability registers. The control register definitions for the TPH
Requester and the TPH Completer are also missing. This patch adds all
required definitions to support TPH enablement.

Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
 drivers/vfio/pci/vfio_pci_config.c |  7 +++---
 include/uapi/linux/pci_regs.h      | 35 ++++++++++++++++++++++++++----
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index XXXXXXX..XXXXXXX 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -XXX,XX +XXX,XX @@ static int vfio_ext_cap_len(struct vfio_pci_core_device *vdev, u16 ecap, u16 epo
 		if (ret)
 			return pcibios_err_to_errno(ret);
 
-		if ((dword & PCI_TPH_CAP_LOC_MASK) == PCI_TPH_LOC_CAP) {
+		if (((dword & PCI_TPH_CAP_LOC_MASK) >> PCI_TPH_CAP_LOC_SHIFT)
+			== PCI_TPH_LOC_CAP) {
 			int sts;
 
 			sts = dword & PCI_TPH_CAP_ST_MASK;
 			sts >>= PCI_TPH_CAP_ST_SHIFT;
-			return PCI_TPH_BASE_SIZEOF + (sts * 2) + 2;
+			return PCI_TPH_ST_TABLE + (sts * 2) + 2;
 		}
-		return PCI_TPH_BASE_SIZEOF;
+		return PCI_TPH_ST_TABLE;
 	case PCI_EXT_CAP_ID_DVSEC:
 		ret = pci_read_config_dword(pdev, epos + PCI_DVSEC_HEADER1, &dword);
 		if (ret)
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index XXXXXXX..XXXXXXX 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -XXX,XX +XXX,XX @@
 #define PCI_EXP_DEVCAP2_ATOMIC_COMP64	0x00000100 /* 64b AtomicOp completion */
 #define PCI_EXP_DEVCAP2_ATOMIC_COMP128	0x00000200 /* 128b AtomicOp completion */
 #define PCI_EXP_DEVCAP2_LTR		0x00000800 /* Latency tolerance reporting */
+#define PCI_EXP_DEVCAP2_TPH_COMP	0x00003000 /* TPH completer support */
 #define PCI_EXP_DEVCAP2_OBFF_MASK	0x000c0000 /* OBFF support mechanism */
 #define PCI_EXP_DEVCAP2_OBFF_MSG	0x00040000 /* New message signaling */
 #define PCI_EXP_DEVCAP2_OBFF_WAKE	0x00080000 /* Re-use WAKE# for OBFF */
@@ -XXX,XX +XXX,XX @@
 #define PCI_DPA_CAP_SUBSTATE_MASK	0x1F	/* # substates - 1 */
 #define PCI_DPA_BASE_SIZEOF	16	/* size with 0 substates */
 
+/* TPH Completer Support */
+#define PCI_EXP_DEVCAP2_TPH_COMP_SHIFT		12
+#define PCI_EXP_DEVCAP2_TPH_COMP_NONE		0x0 /* None */
+#define PCI_EXP_DEVCAP2_TPH_COMP_TPH_ONLY	0x1 /* TPH only */
+#define PCI_EXP_DEVCAP2_TPH_COMP_TPH_AND_EXT	0x3 /* TPH and Extended TPH */
+
 /* TPH Requester */
 #define PCI_TPH_CAP		4	/* capability register */
+#define PCI_TPH_CAP_NO_ST	0x1	/* no ST mode supported */
+#define PCI_TPH_CAP_NO_ST_SHIFT	0x0	/* no ST mode supported shift */
+#define PCI_TPH_CAP_INT_VEC	0x2	/* interrupt vector mode supported */
+#define PCI_TPH_CAP_INT_VEC_SHIFT	0x1	/* interrupt vector mode supported shift */
+#define PCI_TPH_CAP_DS		0x4	/* device specific mode supported */
+#define PCI_TPH_CAP_DS_SHIFT	0x4	/* device specific mode supported shift */
 #define PCI_TPH_CAP_LOC_MASK	0x600	/* location mask */
-#define PCI_TPH_LOC_NONE	0x000	/* no location */
-#define PCI_TPH_LOC_CAP		0x200	/* in capability */
-#define PCI_TPH_LOC_MSIX	0x400	/* in MSI-X */
+#define PCI_TPH_CAP_LOC_SHIFT	9	/* location shift */
+#define PCI_TPH_LOC_NONE	0x0	/* no ST Table */
+#define PCI_TPH_LOC_CAP		0x1	/* ST Table in extended capability */
+#define PCI_TPH_LOC_MSIX	0x2	/* ST table in MSI-X table */
 #define PCI_TPH_CAP_ST_MASK	0x07FF0000	/* ST table mask */
 #define PCI_TPH_CAP_ST_SHIFT	16	/* ST table shift */
-#define PCI_TPH_BASE_SIZEOF	0xc	/* size with no ST table */
+
+#define PCI_TPH_CTRL		0x8	/* control register */
+#define PCI_TPH_CTRL_MODE_SEL_MASK	0x7	/* ST Model Select mask */
+#define PCI_TPH_CTRL_MODE_SEL_SHIFT	0x0	/* ST Model Select shift */
+#define PCI_TPH_NO_ST_MODE	0x0	/* No ST Mode */
+#define PCI_TPH_INT_VEC_MODE	0x1	/* Interrupt Vector Mode */
+#define PCI_TPH_DEV_SPEC_MODE	0x2	/* Device Specific Mode */
+#define PCI_TPH_CTRL_REQ_EN_MASK	0x300	/* TPH Requester mask */
+#define PCI_TPH_CTRL_REQ_EN_SHIFT	8	/* TPH Requester shift */
+#define PCI_TPH_REQ_DISABLE	0x0	/* No TPH request allowed */
+#define PCI_TPH_REQ_TPH_ONLY	0x1	/* 8-bit TPH tags allowed */
+#define PCI_TPH_REQ_EXT_TPH	0x3	/* 16-bit TPH tags allowed */
+
+#define PCI_TPH_ST_TABLE	0xc	/* base of ST table */
 
 /* Downstream Port Containment */
 #define PCI_EXP_DPC_CAP			0x04	/* DPC Capability */
--
2.44.0
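The switch from pre-shifted location values (0x200, 0x400) to a mask plus shift means callers now decode the field before comparing, as the vfio hunk does. A minimal userspace sketch of that decode is shown here; the constants are copied from the hunk above, while struct tph_cap_info and tph_decode_cap() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the pci_regs.h hunk in this patch */
#define PCI_TPH_CAP_NO_ST	0x1
#define PCI_TPH_CAP_INT_VEC	0x2
#define PCI_TPH_CAP_LOC_MASK	0x600
#define PCI_TPH_CAP_LOC_SHIFT	9
#define PCI_TPH_LOC_NONE	0x0
#define PCI_TPH_LOC_CAP		0x1
#define PCI_TPH_LOC_MSIX	0x2
#define PCI_TPH_CAP_ST_MASK	0x07FF0000
#define PCI_TPH_CAP_ST_SHIFT	16

/* Hypothetical container for the decoded capability register fields */
struct tph_cap_info {
	bool no_st_mode;	/* No ST Mode supported */
	bool int_vec_mode;	/* Interrupt Vector Mode supported */
	uint8_t st_table_loc;	/* one of PCI_TPH_LOC_* */
	uint16_t st_table_size;	/* raw ST Table Size field */
};

/* Hypothetical helper: split a raw TPH capability dword into fields. */
struct tph_cap_info tph_decode_cap(uint32_t cap)
{
	struct tph_cap_info info;

	info.no_st_mode = (cap & PCI_TPH_CAP_NO_ST) != 0;
	info.int_vec_mode = (cap & PCI_TPH_CAP_INT_VEC) != 0;
	info.st_table_loc = (cap & PCI_TPH_CAP_LOC_MASK) >> PCI_TPH_CAP_LOC_SHIFT;
	info.st_table_size = (cap & PCI_TPH_CAP_ST_MASK) >> PCI_TPH_CAP_ST_SHIFT;

	return info;
}
```

With this layout a comparison such as `info.st_table_loc == PCI_TPH_LOC_CAP` works on the shifted value, which is the reason the vfio code had to change along with the definitions.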
Provide a kernel option, with related helper functions, to completely
disable TPH so that no TPH headers are generated.

Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
 .../admin-guide/kernel-parameters.txt |  1 +
 drivers/pci/pci-driver.c              |  7 ++++-
 drivers/pci/pci.c                     | 12 ++++++++
 drivers/pci/pcie/tph.c                | 30 +++++++++++++++++++
 include/linux/pci-tph.h               | 19 ++++++++++++
 include/linux/pci.h                   |  1 +
 6 files changed, 69 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/pci-tph.h

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index XXXXXXX..XXXXXXX 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -XXX,XX +XXX,XX @@
 		nomio		[S390] Do not use MIO instructions.
 		norid		[S390] ignore the RID field and force use of
 				one PCI domain per PCI function
+		notph		[PCIE] Do not use PCIe TPH
 
 	pcie_aspm=	[PCIE] Forcibly enable or ignore PCIe Active State Power
 			Management.
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index XXXXXXX..XXXXXXX 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -XXX,XX +XXX,XX @@
 #include <linux/acpi.h>
 #include <linux/dma-map-ops.h>
 #include <linux/iommu.h>
+#include <linux/pci-tph.h>
 #include "pci.h"
 #include "pcie/portdrv.h"
 
@@ -XXX,XX +XXX,XX @@ static long local_pci_probe(void *_ddi)
 	pm_runtime_get_sync(dev);
 	pci_dev->driver = pci_drv;
 	rc = pci_drv->probe(pci_dev, ddi->id);
-	if (!rc)
+	if (!rc) {
+		if (pci_tph_disabled())
+			pcie_tph_disable(pci_dev);
+
 		return rc;
+	}
 	if (rc < 0) {
 		pci_dev->driver = NULL;
 		pm_runtime_put_sync(dev);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index XXXXXXX..XXXXXXX 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -XXX,XX +XXX,XX @@ static bool pcie_ari_disabled;
 /* If set, the PCIe ATS capability will not be used. */
 static bool pcie_ats_disabled;
 
+/* If set, the PCIe TPH capability will not be used. */
+static bool pcie_tph_disabled;
+
 /* If set, the PCI config space of each device is printed during boot. */
 bool pci_early_dump;
 
@@ -XXX,XX +XXX,XX @@ bool pci_ats_disabled(void)
 }
 EXPORT_SYMBOL_GPL(pci_ats_disabled);
 
+bool pci_tph_disabled(void)
+{
+	return pcie_tph_disabled;
+}
+EXPORT_SYMBOL_GPL(pci_tph_disabled);
+
 /* Disable bridge_d3 for all PCIe ports */
 static bool pci_bridge_d3_disable;
 /* Force bridge_d3 for all PCIe ports */
@@ -XXX,XX +XXX,XX @@ static int __init pci_setup(char *str)
 				pci_no_domains();
 			} else if (!strncmp(str, "noari", 5)) {
 				pcie_ari_disabled = true;
+			} else if (!strcmp(str, "notph")) {
+				pr_info("PCIe: TPH is disabled\n");
+				pcie_tph_disabled = true;
 			} else if (!strncmp(str, "cbiosize=", 9)) {
 				pci_cardbus_io_size = memparse(str + 9, &str);
 			} else if (!strncmp(str, "cbmemsize=", 10)) {
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index XXXXXXX..XXXXXXX 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -XXX,XX +XXX,XX @@
 #include <linux/errno.h>
 #include <linux/msi.h>
 #include <linux/pci.h>
+#include <linux/pci-tph.h>
 #include <linux/msi.h>
 #include <linux/pci-acpi.h>
 
 #include "../pci.h"
 
+static int tph_set_reg_field_u32(struct pci_dev *dev, u8 offset, u32 mask,
+				 u8 shift, u32 field)
+{
+	u32 reg_val;
+	int ret;
+
+	if (!dev->tph_cap)
+		return -EINVAL;
+
+	ret = pci_read_config_dword(dev, dev->tph_cap + offset, &reg_val);
+	if (ret)
+		return ret;
+
+	reg_val &= ~mask;
+	reg_val |= (field << shift) & mask;
+
+	ret = pci_write_config_dword(dev, dev->tph_cap + offset, reg_val);
+
+	return ret;
+}
+
+int pcie_tph_disable(struct pci_dev *dev)
+{
+	return tph_set_reg_field_u32(dev, PCI_TPH_CTRL,
+				     PCI_TPH_CTRL_REQ_EN_MASK,
+				     PCI_TPH_CTRL_REQ_EN_SHIFT,
+				     PCI_TPH_REQ_DISABLE);
+}
+
 void pcie_tph_init(struct pci_dev *dev)
 {
 	dev->tph_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_TPH);
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/linux/pci-tph.h
@@ -XXX,XX +XXX,XX @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * TPH (TLP Processing Hints)
+ *
+ * Copyright (C) 2024 Advanced Micro Devices, Inc.
+ *     Eric Van Tassell <Eric.VanTassell@amd.com>
+ *     Wei Huang <wei.huang2@amd.com>
+ */
+#ifndef LINUX_PCI_TPH_H
+#define LINUX_PCI_TPH_H
+
+#ifdef CONFIG_PCIE_TPH
+int pcie_tph_disable(struct pci_dev *dev);
+#else
+static inline int pcie_tph_disable(struct pci_dev *dev)
+{ return -EOPNOTSUPP; }
+#endif
+
+#endif /* LINUX_PCI_TPH_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index XXXXXXX..XXXXXXX 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -XXX,XX +XXX,XX @@ static inline bool pci_aer_available(void) { return false; }
 #endif
 
 bool pci_ats_disabled(void);
+bool pci_tph_disabled(void);
 
 #ifdef CONFIG_PCIE_PTM
 int pci_enable_ptm(struct pci_dev *dev, u8 *granularity);
--
2.44.0
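The read-modify-write done by tph_set_reg_field_u32() in this patch can be checked in isolation. The sketch below reproduces only the masking arithmetic on a plain 32-bit value, with no config-space access; tph_update_field() and tph_ctrl_disable() are hypothetical names, and the constants are taken from the register definitions earlier in the series.

```c
#include <stdint.h>

/* Values from the TPH control register definitions in this series */
#define PCI_TPH_CTRL_MODE_SEL_MASK	0x7
#define PCI_TPH_CTRL_MODE_SEL_SHIFT	0x0
#define PCI_TPH_CTRL_REQ_EN_MASK	0x300
#define PCI_TPH_CTRL_REQ_EN_SHIFT	8
#define PCI_TPH_REQ_DISABLE		0x0
#define PCI_TPH_REQ_TPH_ONLY		0x1

/*
 * Hypothetical pure function: clear the field selected by mask, then
 * insert the new field value. Bits outside the mask are preserved.
 */
uint32_t tph_update_field(uint32_t reg_val, uint32_t mask, uint8_t shift,
			  uint32_t field)
{
	reg_val &= ~mask;
	reg_val |= (field << shift) & mask;

	return reg_val;
}

/* Hypothetical wrapper: the control value pcie_tph_disable() would write. */
uint32_t tph_ctrl_disable(uint32_t ctrl)
{
	return tph_update_field(ctrl, PCI_TPH_CTRL_REQ_EN_MASK,
				PCI_TPH_CTRL_REQ_EN_SHIFT,
				PCI_TPH_REQ_DISABLE);
}
```

Note that disabling only clears the TPH Requester Enable bits; the ST Mode Select field in the low bits is left as it was.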
When "No ST mode" is enabled, end-point devices can generate TPH headers
but with all steering tags treated as zero. A steering tag of zero is
interpreted as "using the default policy" by the root complex. This is
essential to quantify the benefit of steering tags for some given
workloads.

Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
 .../admin-guide/kernel-parameters.txt |  1 +
 drivers/pci/pci-driver.c              |  7 ++++++-
 drivers/pci/pci.c                     | 12 +++++++++++
 drivers/pci/pcie/tph.c                | 21 +++++++++++++++++++
 include/linux/pci-tph.h               |  3 +++
 include/linux/pci.h                   |  1 +
 6 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index XXXXXXX..XXXXXXX 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -XXX,XX +XXX,XX @@
 		norid		[S390] ignore the RID field and force use of
 				one PCI domain per PCI function
 		notph		[PCIE] Do not use PCIe TPH
+		nostmode	[PCIE] Force TPH to use No ST Mode
 
 	pcie_aspm=	[PCIE] Forcibly enable or ignore PCIe Active State Power
 			Management.
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index XXXXXXX..XXXXXXX 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -XXX,XX +XXX,XX @@ static long local_pci_probe(void *_ddi)
 	pci_dev->driver = pci_drv;
 	rc = pci_drv->probe(pci_dev, ddi->id);
 	if (!rc) {
-		if (pci_tph_disabled())
+		if (pci_tph_disabled()) {
 			pcie_tph_disable(pci_dev);
+			return rc;
+		}
+
+		if (pci_tph_nostmode())
+			tph_set_dev_nostmode(pci_dev);
 
 		return rc;
 	}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index XXXXXXX..XXXXXXX 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -XXX,XX +XXX,XX @@ static bool pcie_ats_disabled;
 /* If set, the PCIe TPH capability will not be used. */
 static bool pcie_tph_disabled;
 
+/* If TPH is enabled, "No ST Mode" will be enforced. */
+static bool pcie_tph_nostmode;
+
 /* If set, the PCI config space of each device is printed during boot. */
 bool pci_early_dump;
 
@@ -XXX,XX +XXX,XX @@ bool pci_tph_disabled(void)
 }
 EXPORT_SYMBOL_GPL(pci_tph_disabled);
 
+bool pci_tph_nostmode(void)
+{
+	return pcie_tph_nostmode;
+}
+EXPORT_SYMBOL_GPL(pci_tph_nostmode);
+
 /* Disable bridge_d3 for all PCIe ports */
 static bool pci_bridge_d3_disable;
 /* Force bridge_d3 for all PCIe ports */
@@ -XXX,XX +XXX,XX @@ static int __init pci_setup(char *str)
 			} else if (!strcmp(str, "notph")) {
 				pr_info("PCIe: TPH is disabled\n");
 				pcie_tph_disabled = true;
+			} else if (!strcmp(str, "nostmode")) {
+				pr_info("PCIe: TPH No ST Mode is enabled\n");
+				pcie_tph_nostmode = true;
 			} else if (!strncmp(str, "cbiosize=", 9)) {
 				pci_cardbus_io_size = memparse(str + 9, &str);
 			} else if (!strncmp(str, "cbmemsize=", 10)) {
diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index XXXXXXX..XXXXXXX 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -XXX,XX +XXX,XX @@ static int tph_set_reg_field_u32(struct pci_dev *dev, u8 offset, u32 mask,
 	return ret;
 }
 
+int tph_set_dev_nostmode(struct pci_dev *dev)
+{
+	int ret;
+
+	/* set ST Mode Select to "No ST Mode" */
+	ret = tph_set_reg_field_u32(dev, PCI_TPH_CTRL,
+				    PCI_TPH_CTRL_MODE_SEL_MASK,
+				    PCI_TPH_CTRL_MODE_SEL_SHIFT,
+				    PCI_TPH_NO_ST_MODE);
+	if (ret)
+		return ret;
+
+	/* set "TPH Requester Enable" to "TPH only" */
+	ret = tph_set_reg_field_u32(dev, PCI_TPH_CTRL,
+				    PCI_TPH_CTRL_REQ_EN_MASK,
+				    PCI_TPH_CTRL_REQ_EN_SHIFT,
+				    PCI_TPH_REQ_TPH_ONLY);
+
+	return ret;
+}
+
 int pcie_tph_disable(struct pci_dev *dev)
 {
 	return tph_set_reg_field_u32(dev, PCI_TPH_CTRL,
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index XXXXXXX..XXXXXXX 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -XXX,XX +XXX,XX @@
 
 #ifdef CONFIG_PCIE_TPH
 int pcie_tph_disable(struct pci_dev *dev);
+int tph_set_dev_nostmode(struct pci_dev *dev);
 #else
 static inline int pcie_tph_disable(struct pci_dev *dev)
 { return -EOPNOTSUPP; }
+static inline int tph_set_dev_nostmode(struct pci_dev *dev)
+{ return -EOPNOTSUPP; }
 #endif
 
 #endif /* LINUX_PCI_TPH_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index XXXXXXX..XXXXXXX 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -XXX,XX +XXX,XX @@ static inline bool pci_aer_available(void) { return false; }
 
 bool pci_ats_disabled(void);
 bool pci_tph_disabled(void);
+bool pci_tph_nostmode(void);
 
 #ifdef CONFIG_PCIE_PTM
 int pci_enable_ptm(struct pci_dev *dev, u8 *granularity);
--
2.44.0
This patch introduces three API functions, pcie_tph_intr_vec_supported(),
pcie_tph_get_st() and pcie_tph_set_st(), for a driver to query, retrieve
or configure a device's steering tags. There are two possible locations
for the steering tag table, and the code automatically figures out the
right location to set the tags when pcie_tph_set_st() is called. Note that
the tag value is always zero currently and will be extended in the
follow-up patches.

Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
 drivers/pci/pcie/tph.c  | 402 ++++++++++++++++++++++++++++++++++++++++
 include/linux/pci-tph.h |  22 +++
 2 files changed, 424 insertions(+)

diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c
index XXXXXXX..XXXXXXX 100644
--- a/drivers/pci/pcie/tph.c
+++ b/drivers/pci/pcie/tph.c
@@ -XXX,XX +XXX,XX @@ static int tph_set_reg_field_u32(struct pci_dev *dev, u8 offset, u32 mask,
 	return ret;
 }
 
+static int tph_get_reg_field_u32(struct pci_dev *dev, u8 offset, u32 mask,
+				 u8 shift, u32 *field)
+{
+	u32 reg_val;
+	int ret;
+
+	if (!dev->tph_cap)
+		return -EINVAL;
+
+	ret = pci_read_config_dword(dev, dev->tph_cap + offset, &reg_val);
+	if (ret)
+		return ret;
+
+	*field = (reg_val & mask) >> shift;
+
+	return 0;
+}
+
+static int tph_get_table_size(struct pci_dev *dev, u16 *size_out)
+{
+	int ret;
+	u32 tmp;
+
+	ret = tph_get_reg_field_u32(dev, PCI_TPH_CAP,
+				    PCI_TPH_CAP_ST_MASK,
+				    PCI_TPH_CAP_ST_SHIFT, &tmp);
+
+	if (ret)
+		return ret;
+
+	*size_out = (u16)tmp;
+
+	return 0;
+}
+
+/*
+ * For a given device, return a pointer to the MSI table entry at msi_index.
+ */
+static void __iomem *tph_msix_table_entry(struct pci_dev *dev,
+					  u16 msi_index)
+{
+	void __iomem *entry;
+	u16 tbl_sz;
+	int ret;
+
+	ret = tph_get_table_size(dev, &tbl_sz);
+	if (ret || msi_index > tbl_sz)
+		return NULL;
+
+	entry = dev->msix_base + msi_index * PCI_MSIX_ENTRY_SIZE;
+
+	return entry;
+}
+
+/*
+ * For a given device, return a pointer to the vector control register at
+ * offset 0xc of MSI table entry at msi_index.
+ */
+static void __iomem *tph_msix_vector_control(struct pci_dev *dev,
+					     u16 msi_index)
+{
+	void __iomem *vec_ctrl_addr = tph_msix_table_entry(dev, msi_index);
+
+	if (vec_ctrl_addr)
+		vec_ctrl_addr += PCI_MSIX_ENTRY_VECTOR_CTRL;
+
+	return vec_ctrl_addr;
+}
+
+/*
+ * Translate from MSI-X interrupt index to struct msi_desc *
+ */
+static struct msi_desc *tph_msix_index_to_desc(struct pci_dev *dev, int index)
+{
+	struct msi_desc *entry;
+
+	msi_lock_descs(&dev->dev);
+	msi_for_each_desc(entry, &dev->dev, MSI_DESC_ASSOCIATED) {
+		if (entry->msi_index == index)
+			return entry;
+	}
+	msi_unlock_descs(&dev->dev);
+
+	return NULL;
+}
+
+static bool tph_int_vec_mode_supported(struct pci_dev *dev)
+{
+	u32 mode = 0;
+	int ret;
+
+	ret = tph_get_reg_field_u32(dev, PCI_TPH_CAP,
+				    PCI_TPH_CAP_INT_VEC,
+				    PCI_TPH_CAP_INT_VEC_SHIFT, &mode);
+	if (ret)
+		return false;
+
+	return !!mode;
+}
+
+static int tph_get_table_location(struct pci_dev *dev, u8 *loc_out)
+{
+	u32 loc;
+	int ret;
+
+	ret = tph_get_reg_field_u32(dev, PCI_TPH_CAP, PCI_TPH_CAP_LOC_MASK,
+				    PCI_TPH_CAP_LOC_SHIFT, &loc);
+	if (ret)
+		return ret;
+
+	*loc_out = (u8)loc;
+
+	return 0;
+}
+
+static bool msix_nr_in_bounds(struct pci_dev *dev, int msix_nr)
+{
+	u16 tbl_sz;
+
+	if (tph_get_table_size(dev, &tbl_sz))
+		return false;
+
+	return msix_nr <= tbl_sz;
+}
+
+/* Return root port capability - 0 means none */
+static int get_root_port_completer_cap(struct pci_dev *dev)
+{
+	struct pci_dev *rp;
+	int ret;
+	int val;
+
+	rp = pcie_find_root_port(dev);
+	if (!rp) {
+		pr_err("cannot find root port of %s\n", dev_name(&dev->dev));
+		return 0;
+	}
+
+	ret = pcie_capability_read_dword(rp, PCI_EXP_DEVCAP2, &val);
+	if (ret) {
+		pr_err("cannot read device capabilities 2 of %s\n",
+		       dev_name(&dev->dev));
+		return 0;
+	}
+
+	val &= PCI_EXP_DEVCAP2_TPH_COMP;
+
+	return val >> PCI_EXP_DEVCAP2_TPH_COMP_SHIFT;
+}
+
+/*
+ * TPH device needs to be below a rootport with the TPH Completer and
+ * the completer must offer a compatible level of completer support to that
+ * requested by the device driver.
+ */
+static bool completer_support_ok(struct pci_dev *dev, u8 req)
+{
+	int rp_cap;
+
+	rp_cap = get_root_port_completer_cap(dev);
+
+	if (req > rp_cap) {
+		pr_err("root port lacks proper TPH completer capability\n");
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * The PCI Specification version 5.0 requires the "No ST Mode" mode
+ * be supported by any compatible device.
+ */
+static bool no_st_mode_supported(struct pci_dev *dev)
+{
+	bool no_st;
+	int ret;
+	u32 tmp;
+
+	ret = tph_get_reg_field_u32(dev, PCI_TPH_CAP, PCI_TPH_CAP_NO_ST,
+				    PCI_TPH_CAP_NO_ST_SHIFT, &tmp);
+	if (ret)
+		return false;
+
+	no_st = !!tmp;
+
+	if (!no_st) {
+		pr_err("TPH devices must support no ST mode\n");
+		return false;
+	}
+
+	return true;
+}
+
+static int tph_write_ctrl_reg(struct pci_dev *dev, u32 value)
+{
+	int ret;
+
+	ret = tph_set_reg_field_u32(dev, PCI_TPH_CTRL, ~0L, 0, value);
+
+	if (ret)
+		goto err_out;
+
+	return 0;
+
+err_out:
+	/* minimizing possible harm by disabling TPH */
+	pcie_tph_disable(dev);
+	return ret;
+}
+
+/* Update the ST Mode Select field of the TPH Control Register */
+static int tph_set_ctrl_reg_mode_sel(struct pci_dev *dev, u8 st_mode)
+{
+	int ret;
+	u32 ctrl_reg;
+
+	ret = tph_get_reg_field_u32(dev, PCI_TPH_CTRL, ~0L, 0, &ctrl_reg);
+	if (ret)
+		return ret;
+
+	/* clear the mode select and enable fields */
+	ctrl_reg &= ~(PCI_TPH_CTRL_MODE_SEL_MASK);
+	ctrl_reg |= ((u32)(st_mode << PCI_TPH_CTRL_MODE_SEL_SHIFT) &
+		     PCI_TPH_CTRL_MODE_SEL_MASK);
+
+	ret = tph_write_ctrl_reg(dev, ctrl_reg);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+/* Write the steering tag to MSI-X vector control register */
+static void tph_write_tag_to_msix(struct pci_dev *dev, int msix_nr, u16 tag)
+{
+	u32 val;
+	void __iomem *vec_ctrl;
+	struct msi_desc *msi_desc;
+
+	msi_desc = tph_msix_index_to_desc(dev, msix_nr);
+	if (!msi_desc) {
+		pr_err("MSI-X descriptor for #%d not found\n", msix_nr);
+		return;
+	}
+
+	vec_ctrl = tph_msix_vector_control(dev, msi_desc->msi_index);
+
+	val = readl(vec_ctrl);
+	val &= 0xffff;
+	val |= (tag << 16);
+	writel(val, vec_ctrl);
+
+	/* read back to flush the update */
+	val = readl(vec_ctrl);
+	msi_unlock_descs(&dev->dev);
+}
+
+/* Update the TPH Requester Enable field of the TPH Control Register */
+static int tph_set_ctrl_reg_en(struct pci_dev *dev, u8 req_type)
+{
+	int ret;
+	u32 ctrl_reg;
+
+	ret = tph_get_reg_field_u32(dev, PCI_TPH_CTRL, ~0L, 0,
+				    &ctrl_reg);
+	if (ret)
+		return ret;
+
+	/* clear the mode select and enable fields and set new values*/
+	ctrl_reg &= ~(PCI_TPH_CTRL_REQ_EN_MASK);
+	ctrl_reg |= (((u32)req_type << PCI_TPH_CTRL_REQ_EN_SHIFT) &
+		     PCI_TPH_CTRL_REQ_EN_MASK);
+
+	ret = tph_write_ctrl_reg(dev, ctrl_reg);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static bool pcie_tph_write_st(struct pci_dev *dev, unsigned int msix_nr,
+			      u8 req_type, u16 tag)
+{
+	int offset;
+	u8 loc;
+	int ret;
+
+	/* setting ST isn't needed - not an error, just return true */
+	if (!dev->tph_cap || pci_tph_disabled() || pci_tph_nostmode() ||
+	    !dev->msix_enabled || !tph_int_vec_mode_supported(dev))
+		return true;
+
+	/* setting ST is incorrect in the following cases - return error */
+	if (!no_st_mode_supported(dev) || !msix_nr_in_bounds(dev, msix_nr) ||
+	    !completer_support_ok(dev, req_type))
+		return false;
+
+	/*
+	 * disable TPH before updating the tag to avoid potential instability
+	 * as cautioned about in the "ST Table Programming" of PCI-E spec
+	 */
+	pcie_tph_disable(dev);
+
+	ret = tph_get_table_location(dev, &loc);
+	if (ret)
+		return false;
+
+	switch (loc) {
+	case PCI_TPH_LOC_MSIX:
+		tph_write_tag_to_msix(dev, msix_nr, tag);
+		break;
+	case PCI_TPH_LOC_CAP:
+		offset = dev->tph_cap + PCI_TPH_ST_TABLE
+			  + msix_nr * sizeof(u16);
+		pci_write_config_word(dev, offset, tag);
+		break;
+	default:
+		pr_err("unable to write steering tag for device %s\n",
+		       dev_name(&dev->dev));
+		return false;
+	}
+
+	/* select interrupt vector mode */
+	tph_set_ctrl_reg_mode_sel(dev, PCI_TPH_INT_VEC_MODE);
+	tph_set_ctrl_reg_en(dev, req_type);
+
+	return true;
+}
+
 int tph_set_dev_nostmode(struct pci_dev *dev)
 {
 	int ret;
@@ -XXX,XX +XXX,XX @@ void pcie_tph_init(struct pci_dev *dev)
 	dev->tph_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_TPH);
 }
 
+/**
+ * pcie_tph_intr_vec_supported() - Check if interrupt vector mode supported for dev
+ * @dev: pci device
+ *
+ * Return:
+ *        true : intr vector mode supported
+ *        false: intr vector mode not supported
+ */
+bool pcie_tph_intr_vec_supported(struct pci_dev *dev)
+{
+	if (!dev->tph_cap || pci_tph_disabled() || !dev->msix_enabled ||
+	    !tph_int_vec_mode_supported(dev))
+		return false;
+
+	return true;
+}
+EXPORT_SYMBOL(pcie_tph_intr_vec_supported);
+
+/**
+ * pcie_tph_get_st() - Retrieve steering tag for a specific CPU
+ * @dev: pci device
+ * @cpu: the acpi cpu_uid.
+ * @mem_type: memory type (vram, nvram)
+ * @req_type: request type (disable, tph, extended tph)
+ * @tag: steering tag return value
+ *
+ * Return:
+ *        true : success
+ *        false: failed
+ */
+bool pcie_tph_get_st(struct pci_dev *dev, unsigned int cpu,
+		     enum tph_mem_type mem_type, u8 req_type,
+		     u16 *tag)
+{
+	*tag = 0;
+
+	return true;
+}
+EXPORT_SYMBOL(pcie_tph_get_st);
+
+/**
+ * pcie_tph_set_st() - Set steering tag in ST table entry
+ * @dev: pci device
+ * @msix_nr: ordinal number of msix interrupt.
+ * @cpu: the acpi cpu_uid.
+ * @mem_type: memory type (vram, nvram)
+ * @req_type: request type (disable, tph, extended tph)
+ *
+ * Return:
+ *        true : success
+ *        false: failed
+ */
+bool pcie_tph_set_st(struct pci_dev *dev, unsigned int msix_nr,
+		     unsigned int cpu, enum tph_mem_type mem_type,
+		     u8 req_type)
+{
+	u16 tag;
+	bool ret = true;
+
+	ret = pcie_tph_get_st(dev, cpu, mem_type, req_type, &tag);
+
+	if (!ret)
+		return false;
+
+	pr_debug("%s: writing tag %d for msi-x intr %d (cpu: %d)\n",
+		 __func__, tag, msix_nr, cpu);
+
+	ret = pcie_tph_write_st(dev, msix_nr, req_type, tag);
+
+	return ret;
+}
+EXPORT_SYMBOL(pcie_tph_set_st);
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
index XXXXXXX..XXXXXXX 100644
--- a/include/linux/pci-tph.h
+++ b/include/linux/pci-tph.h
@@ -XXX,XX +XXX,XX @@
 #ifndef LINUX_PCI_TPH_H
 #define LINUX_PCI_TPH_H
 
+enum tph_mem_type {
+	TPH_MEM_TYPE_VM,	/* volatile memory type */
+	TPH_MEM_TYPE_PM		/* persistent memory type */
+};
+
 #ifdef CONFIG_PCIE_TPH
 int pcie_tph_disable(struct pci_dev *dev);
 int tph_set_dev_nostmode(struct pci_dev *dev);
+bool pcie_tph_intr_vec_supported(struct pci_dev *dev);
+bool pcie_tph_get_st(struct pci_dev *dev, unsigned int cpu,
+		     enum tph_mem_type tag_type, u8 req_enable,
+		     u16 *tag);
+bool pcie_tph_set_st(struct pci_dev *dev, unsigned int msix_nr,
+		     unsigned int cpu, enum tph_mem_type tag_type,
+		     u8 req_enable);
 #else
 static inline int pcie_tph_disable(struct pci_dev *dev)
 { return -EOPNOTSUPP; }
 static inline int tph_set_dev_nostmode(struct pci_dev *dev)
 { return -EOPNOTSUPP; }
+static inline bool pcie_tph_intr_vec_supported(struct pci_dev *dev)
+{ return false; }
+static inline bool pcie_tph_get_st(struct pci_dev *dev, unsigned int cpu,
+				   enum tph_mem_type tag_type, u8 req_enable,
+				   u16 *tag)
+{ return false; }
+static inline bool pcie_tph_set_st(struct pci_dev *dev, unsigned int msix_nr,
+				   unsigned int cpu, enum tph_mem_type tag_type,
+				   u8 req_enable)
+{ return true; }
 #endif
 
 #endif /* LINUX_PCI_TPH_H */
--
2.44.0
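The two steering tag table locations handled by pcie_tph_write_st() in this patch differ only in where the 16-bit tag lands. As a rough userspace sketch of that arithmetic (no MMIO or config-space access), the helpers below compute the new MSI-X vector control value and the config-space offset of an ST table entry; both function names are hypothetical, and PCI_TPH_ST_TABLE is taken from the register definitions earlier in the series.

```c
#include <stdint.h>

#define PCI_TPH_ST_TABLE	0xc	/* base of ST table, from pci_regs.h */

/*
 * Hypothetical helper for the MSI-X location: the steering tag occupies
 * the upper 16 bits of the vector control dword, and the lower 16 bits
 * (including the mask bit) are preserved.
 */
uint32_t tph_msix_vec_ctrl_with_tag(uint32_t vec_ctrl, uint16_t tag)
{
	vec_ctrl &= 0xffff;
	vec_ctrl |= (uint32_t)tag << 16;

	return vec_ctrl;
}

/*
 * Hypothetical helper for the capability location: each ST table entry
 * is one 16-bit word following the TPH control register.
 */
uint16_t tph_cap_st_entry_offset(uint16_t tph_cap, unsigned int index)
{
	return (uint16_t)(tph_cap + PCI_TPH_ST_TABLE +
			  index * sizeof(uint16_t));
}
```

For example, with the capability at offset 0x100 the entry for vector 3 sits at 0x112, which matches the `dev->tph_cap + PCI_TPH_ST_TABLE + msix_nr * sizeof(u16)` expression in the PCI_TPH_LOC_CAP branch.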
According to PCI SIG ECN, calling the _DSM firmware method for a given CPU_UID returns the steering tags for different types of memory (volatile, non-volatile). These tags are supposed to be used in ST table entry for optimal results. Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> --- drivers/pci/pcie/tph.c | 103 +++++++++++++++++++++++++++++++++++++++- include/linux/pci-tph.h | 34 +++++++++++++ 2 files changed, 136 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/tph.c b/drivers/pci/pcie/tph.c index XXXXXXX..XXXXXXX 100644 --- a/drivers/pci/pcie/tph.c +++ b/drivers/pci/pcie/tph.c @@ -XXX,XX +XXX,XX @@ static int tph_get_table_location(struct pci_dev *dev, u8 *loc_out) return 0; } +static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type, + union st_info *st_tag) +{ + switch (req_type) { + case PCI_TPH_REQ_TPH_ONLY: /* 8 bit tags */ + switch (mem_type) { + case TPH_MEM_TYPE_VM: + if (st_tag->vm_st_valid) + return st_tag->vm_st; + break; + case TPH_MEM_TYPE_PM: + if (st_tag->pm_st_valid) + return st_tag->pm_st; + break; + } + break; + case PCI_TPH_REQ_EXT_TPH: /* 16 bit tags */ + switch (mem_type) { + case TPH_MEM_TYPE_VM: + if (st_tag->vm_xst_valid) + return st_tag->vm_xst; + break; + case TPH_MEM_TYPE_PM: + if (st_tag->pm_xst_valid) + return st_tag->pm_xst; + break; + } + break; + default: + pr_err("invalid steering tag in ACPI _DSM\n"); + return 0; + } + + return 0; +} + +#define MIN_ST_DSM_REV 7 +#define ST_DSM_FUNC_INDEX 0xf +static bool invoke_dsm(acpi_handle handle, u32 cpu_uid, u8 ph, + u8 target_type, bool cache_ref_valid, + u64 cache_ref, union st_info *st_out) +{ + union acpi_object in_obj, in_buf[3], *out_obj; + + in_buf[0].integer.type = 
ACPI_TYPE_INTEGER; + in_buf[0].integer.value = 0; /* 0 => processor cache steering tags */ + + in_buf[1].integer.type = ACPI_TYPE_INTEGER; + in_buf[1].integer.value = cpu_uid; + + in_buf[2].integer.type = ACPI_TYPE_INTEGER; + in_buf[2].integer.value = ph & 3; + in_buf[2].integer.value |= (target_type & 1) << 2; + in_buf[2].integer.value |= (cache_ref_valid & 1) << 3; + in_buf[2].integer.value |= (cache_ref << 32); + + in_obj.type = ACPI_TYPE_PACKAGE; + in_obj.package.count = ARRAY_SIZE(in_buf); + in_obj.package.elements = in_buf; + + out_obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, MIN_ST_DSM_REV, + ST_DSM_FUNC_INDEX, &in_obj); + + if (!out_obj) + return false; + + if (out_obj->type != ACPI_TYPE_BUFFER) { + pr_err("invalid return type %d from TPH _DSM\n", + out_obj->type); + ACPI_FREE(out_obj); + return false; + } + + st_out->value = *((u64 *)(out_obj->buffer.pointer)); + + ACPI_FREE(out_obj); + + return true; +} + +static acpi_handle root_complex_acpi_handle(struct pci_dev *dev) +{ + struct pci_dev *root_port; + + root_port = pcie_find_root_port(dev); + + if (!root_port || !root_port->bus || !root_port->bus->bridge) + return NULL; + + return ACPI_HANDLE(root_port->bus->bridge); +} + static bool msix_nr_in_bounds(struct pci_dev *dev, int msix_nr) { u16 tbl_sz; @@ -XXX,XX +XXX,XX @@ bool pcie_tph_get_st(struct pci_dev *dev, unsigned int cpu, enum tph_mem_type mem_type, u8 req_type, u16 *tag) { - *tag = 0; + union st_info info; + + if (!invoke_dsm(root_complex_acpi_handle(dev), cpu, 0, 0, false, 0, + &info)) { + *tag = 0; + return false; + } + + *tag = tph_extract_tag(mem_type, req_type, &info); + pr_debug("%s: cpu=%d tag=%d\n", __func__, cpu, *tag); return true; } diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h index XXXXXXX..XXXXXXX 100644 --- a/include/linux/pci-tph.h +++ b/include/linux/pci-tph.h @@ -XXX,XX +XXX,XX @@ enum tph_mem_type { TPH_MEM_TYPE_PM /* persistent memory type */ }; +/* + * The st_info struct defines the steering tag 
returned by the firmware _DSM
+ * method defined in PCI SIG ECN. The specification is available at:
+ * https://members.pcisig.com/wg/PCI-SIG/document/15470.
+ *
+ * @vm_st_valid:  8 bit tag for volatile memory is valid
+ * @vm_xst_valid: 16 bit tag for volatile memory is valid
+ * @vm_ph_ignore: 1 => ph was and will be ignored, 0 => ph should be supplied
+ * @vm_st:        8 bit steering tag for volatile mem
+ * @vm_xst:       16 bit steering tag for volatile mem
+ * @pm_st_valid:  8 bit tag for persistent memory is valid
+ * @pm_xst_valid: 16 bit tag for persistent memory is valid
+ * @pm_ph_ignore: 1 => ph was and will be ignored, 0 => ph should be supplied
+ * @pm_st:        8 bit steering tag for persistent mem
+ * @pm_xst:       16 bit steering tag for persistent mem
+ */
+union st_info {
+	struct {
+		u64 vm_st_valid:1,
+		vm_xst_valid:1,
+		vm_ph_ignore:1,
+		rsvd1:5,
+		vm_st:8,
+		vm_xst:16,
+		pm_st_valid:1,
+		pm_xst_valid:1,
+		pm_ph_ignore:1,
+		rsvd2:5,
+		pm_st:8,
+		pm_xst:16;
+	};
+	u64 value;
+};
+
 #ifdef CONFIG_PCIE_TPH
 int pcie_tph_disable(struct pci_dev *dev);
 int tph_set_dev_nostmode(struct pci_dev *dev);
-- 
2.44.0
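To make the _DSM data flow in this patch easier to follow, here is a minimal userspace sketch of the two bit-level steps: packing the third _DSM argument the way invoke_dsm() does, and pulling a tag out of the returned 64-bit value the way tph_extract_tag() does. It is an illustration only. The helper names are hypothetical, the request-type values (0x1 and 0x3) are assumed to match PCI_TPH_REQ_TPH_ONLY and PCI_TPH_REQ_EXT_TPH, and the decoder uses explicit shifts on the assumption that the compiler lays out the st_info bitfields starting from the least significant bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum tph_mem_type {
	TPH_MEM_TYPE_VM,	/* volatile memory */
	TPH_MEM_TYPE_PM		/* persistent memory */
};

/* Assumed to match PCI_TPH_REQ_TPH_ONLY / PCI_TPH_REQ_EXT_TPH */
#define REQ_TPH_ONLY	0x1	/* 8 bit tags */
#define REQ_EXT_TPH	0x3	/* 16 bit tags */

/* Hypothetical helper: build the third _DSM argument as invoke_dsm() does */
static uint64_t dsm_pack_arg2(uint8_t ph, uint8_t target_type,
			      bool cache_ref_valid, uint64_t cache_ref)
{
	uint64_t v = ph & 3;				/* bits 1:0  */

	v |= (uint64_t)(target_type & 1) << 2;		/* bit 2     */
	v |= (uint64_t)(cache_ref_valid ? 1 : 0) << 3;	/* bit 3     */
	v |= cache_ref << 32;				/* bits 63:32 */
	return v;
}

/*
 * Hypothetical helper: shift-based equivalent of tph_extract_tag().
 * Volatile-memory fields sit in the low 32 bits, persistent-memory fields
 * in the high 32 bits, with the same layout inside each half:
 * bit 0 = st valid, bit 1 = xst valid, bits 15:8 = st, bits 31:16 = xst.
 * Returns 0 when the requested tag is not marked valid.
 */
static uint16_t st_extract(uint64_t info, enum tph_mem_type mem,
			   uint8_t req_type)
{
	uint32_t half = (mem == TPH_MEM_TYPE_PM) ? (uint32_t)(info >> 32)
						 : (uint32_t)info;
	bool st_valid = half & 0x1;
	bool xst_valid = half & 0x2;

	if (req_type == REQ_TPH_ONLY && st_valid)
		return (half >> 8) & 0xff;
	if (req_type == REQ_EXT_TPH && xst_valid)
		return (half >> 16) & 0xffff;
	return 0;
}
```

Decoding with shifts instead of a bitfield union sidesteps compiler- and endianness-dependent bitfield layout, which may be worth considering for the in-kernel version as well.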
Provide a document for TPH feature, including the description of kernel options and driver API interface. Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> --- Documentation/PCI/index.rst | 1 + Documentation/PCI/tph.rst | 57 ++++++++++++++++++++++++++++ Documentation/driver-api/pci/pci.rst | 3 ++ 3 files changed, 61 insertions(+) create mode 100644 Documentation/PCI/tph.rst diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst index XXXXXXX..XXXXXXX 100644 --- a/Documentation/PCI/index.rst +++ b/Documentation/PCI/index.rst @@ -XXX,XX +XXX,XX @@ PCI Bus Subsystem pcieaer-howto endpoint/index boot-interrupts + tph diff --git a/Documentation/PCI/tph.rst b/Documentation/PCI/tph.rst new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/Documentation/PCI/tph.rst @@ -XXX,XX +XXX,XX @@ +.. SPDX-License-Identifier: GPL-2.0 + +=========== +TPH Support +=========== + + +:Copyright: 2024 Advanced Micro Devices, Inc. +:Authors: - Eric van Tassell <eric.vantassell@amd.com> + - Wei Huang <wei.huang2@amd.com> + +Overview +======== +TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices +to provide optimization hints, such as desired caching behavior, for +requests that target memory space. These hints, in a format called steering +tags, are provided in the requester's TLP headers and can empower the system +hardware, including the Root Complex, to optimize the utilization of platform +resources for the requests. + +User Guide +========== + +Kernel Options +-------------- +There are two kernel command line options available to control TPH feature + + * "notph": TPH will be disabled for all endpoint devices. 
+ * "nostmode": TPH will be enabled but the ST Mode will be forced to "No ST Mode". + +Device Driver API +----------------- +In brief, an endpoint device driver using the TPH interface to configure +Interrupt Vector Mode will call pcie_tph_set_st() when setting up MSI-X +interrupts as shown below: + +.. code-block:: c + + for (i = 0, j = 0; i < nr_rings; i++) { + ... + rc = request_irq(irq->vector, irq->handler, flags, irq->name, NULL); + ... + if (!pcie_tph_set_st(pdev, i, cpumask_first(irq->cpu_mask), + TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) + pr_err("Error in configuring steering tag\n"); + ... + } + +If a device only supports TPH vendor specific mode, its driver can call +pcie_tph_get_st() to retrieve the steering tag for a specific CPU and uses +the tag to control TPH behavior. + +.. kernel-doc:: drivers/pci/pcie/tph.c + :export: + +.. kernel-doc:: drivers/pci/pcie/tph.c + :identifiers: pcie_tph_set_st diff --git a/Documentation/driver-api/pci/pci.rst b/Documentation/driver-api/pci/pci.rst index XXXXXXX..XXXXXXX 100644 --- a/Documentation/driver-api/pci/pci.rst +++ b/Documentation/driver-api/pci/pci.rst @@ -XXX,XX +XXX,XX @@ PCI Support Library .. kernel-doc:: drivers/pci/pci-sysfs.c :internal: +.. kernel-doc:: drivers/pci/pcie/tph.c + :export: + PCI Hotplug Support Library --------------------------- -- 2.44.0
From: Manoj Panicker <manoj.panicker2@amd.com> As a usage example, this patch implements TPH support in Broadcom BNXT device driver by invoking pcie_tph_set_st() function when interrupt affinity is changed. Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com> Reviewed-by: Wei Huang <wei.huang2@amd.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 54 +++++++++++++++++++++++ drivers/net/ethernet/broadcom/bnxt/bnxt.h | 4 ++ 2 files changed, 58 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index XXXXXXX..XXXXXXX 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -XXX,XX +XXX,XX @@ #include <net/page_pool/helpers.h> #include <linux/align.h> #include <net/netdev_queues.h> +#include <linux/pci-tph.h> #include "bnxt_hsi.h" #include "bnxt.h" @@ -XXX,XX +XXX,XX @@ static void bnxt_free_irq(struct bnxt *bp) free_cpumask_var(irq->cpu_mask); irq->have_cpumask = 0; } + if (pcie_tph_intr_vec_supported(bp->pdev)) + irq_set_affinity_notifier(irq->vector, NULL); free_irq(irq->vector, bp->bnapi[i]); } @@ -XXX,XX +XXX,XX @@ static void bnxt_free_irq(struct bnxt *bp) } } +static void bnxt_rtnl_lock_sp(struct bnxt *bp); +static void bnxt_rtnl_unlock_sp(struct bnxt *bp); +static void __bnxt_irq_affinity_notify(struct irq_affinity_notify *notify, + const cpumask_t *mask) +{ + struct bnxt_irq *irq; + + irq = container_of(notify, struct bnxt_irq, affinity_notify); + cpumask_copy(irq->cpu_mask, mask); + + if (!pcie_tph_set_st(irq->bp->pdev, irq->msix_nr, + cpumask_first(irq->cpu_mask), + TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) + netdev_dbg(irq->bp->dev, "error in setting steering tag\n"); + + if (netif_running(irq->bp->dev)) { + rtnl_lock(); + bnxt_close_nic(irq->bp, false, false); + 
bnxt_open_nic(irq->bp, false, false); + rtnl_unlock(); + } +} + +static void __bnxt_irq_affinity_release(struct kref __always_unused *ref) +{ +} + +static inline void bnxt_register_affinity_notifier(struct bnxt_irq *irq) +{ + struct irq_affinity_notify *notify; + + notify = &irq->affinity_notify; + notify->irq = irq->vector; + notify->notify = __bnxt_irq_affinity_notify; + notify->release = __bnxt_irq_affinity_release; + + irq_set_affinity_notifier(irq->vector, notify); +} + static int bnxt_request_irq(struct bnxt *bp) { int i, j, rc = 0; @@ -XXX,XX +XXX,XX @@ static int bnxt_request_irq(struct bnxt *bp) int numa_node = dev_to_node(&bp->pdev->dev); irq->have_cpumask = 1; + irq->msix_nr = map_idx; cpumask_set_cpu(cpumask_local_spread(i, numa_node), irq->cpu_mask); rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask); @@ -XXX,XX +XXX,XX @@ static int bnxt_request_irq(struct bnxt *bp) irq->vector); break; } + + if (pcie_tph_intr_vec_supported(bp->pdev)) { + irq->bp = bp; + bnxt_register_affinity_notifier(irq); + + /* first setup */ + if (!pcie_tph_set_st(bp->pdev, i, + cpumask_first(irq->cpu_mask), + TPH_MEM_TYPE_VM, PCI_TPH_REQ_TPH_ONLY)) + netdev_dbg(bp->dev, "error in setting steering tag\n"); + } } } return rc; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index XXXXXXX..XXXXXXX 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -XXX,XX +XXX,XX @@ struct bnxt_irq { u8 have_cpumask:1; char name[IFNAMSIZ + 2]; cpumask_var_t cpu_mask; + + int msix_nr; + struct bnxt *bp; + struct irq_affinity_notify affinity_notify; }; #define HWRM_RING_ALLOC_TX 0x1 -- 2.44.0
From: Michael Chan <michael.chan@broadcom.com> Newer firmware can use the NQ ring ID associated with each RX/RX AGG ring to enable PCIe steering tag. Older firmware will just ignore the information. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index XXXXXXX..XXXXXXX 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -XXX,XX +XXX,XX @@ static int hwrm_ring_alloc_send_msg(struct bnxt *bp, /* Association of rx ring with stats context */ grp_info = &bp->grp_info[ring->grp_idx]; + req->nq_ring_id = cpu_to_le16(grp_info->cp_fw_ring_id); req->rx_buf_size = cpu_to_le16(bp->rx_buf_use_size); req->stat_ctx_id = cpu_to_le32(grp_info->fw_stats_ctx); req->enables |= cpu_to_le32( - RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID); + RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID | + RING_ALLOC_REQ_ENABLES_NQ_RING_ID_VALID); if (NET_IP_ALIGN == 2) flags = RING_ALLOC_REQ_FLAGS_RX_SOP_PAD; req->flags = cpu_to_le16(flags); @@ -XXX,XX +XXX,XX @@ static int hwrm_ring_alloc_send_msg(struct bnxt *bp, /* Association of agg ring with rx ring */ grp_info = &bp->grp_info[ring->grp_idx]; req->rx_ring_id = cpu_to_le16(grp_info->rx_fw_ring_id); + req->nq_ring_id = cpu_to_le16(grp_info->cp_fw_ring_id); req->rx_buf_size = cpu_to_le16(BNXT_RX_PAGE_SIZE); req->stat_ctx_id = cpu_to_le32(grp_info->fw_stats_ctx); req->enables |= cpu_to_le32( RING_ALLOC_REQ_ENABLES_RX_RING_ID_VALID | - RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID); + RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID | + RING_ALLOC_REQ_ENABLES_NQ_RING_ID_VALID); } else { req->ring_type = RING_ALLOC_REQ_RING_TYPE_RX; } -- 
2.44.0
Hi All,

TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices to provide optimization hints for requests that target memory space. These hints, in a format called Steering Tags (STs), are carried in the requester's TLP headers and allow the system hardware, including the Root Complex, to optimize the utilization of platform resources for the requests.

Upcoming AMD hardware implements a new Cache Injection feature that leverages TPH. Cache Injection allows PCIe endpoints to inject I/O Coherent DMA writes directly into an L2 within the CCX (core complex) closest to the CPU core that will consume the data. This technology is aimed at applications requiring high performance and low latency, such as networking and storage applications.

This series introduces generic TPH support in Linux, allowing STs to be retrieved and used by PCIe endpoint drivers as needed. As a demonstration, it includes an example usage in the Broadcom BNXT driver. When running on Broadcom NICs with the appropriate firmware, the series shows substantial memory bandwidth savings and better network bandwidth in real-world benchmarks. The solution is vendor-neutral and built on industry standards (PCIe Spec and PCI FW Spec).
V5->V6:
* Rebase on top of pci/main (tag: pci-v6.12-changes)
* Fix spelling and FIELD_PREP/bnxt.c compilation errors (Simon)
* Move tph.c to drivers/pci directory (Lukas)
* Remove CONFIG_ACPI dependency (Lukas)
* Slightly re-arrange save/restore sequence (Lukas)

V4->V5:
* Rebase on top of net-next/main tree (Broadcom)
* Remove TPH mode query and TPH enabled checking functions (Bjorn)
* Remove "nostmode" kernel parameter (Bjorn)
* Add "notph" kernel parameter support (Bjorn)
* Add back TPH documentation (Bjorn)
* Change TPH register namings (Bjorn)
* Squash TPH enable/disable/save/restore funcs as a single patch (Bjorn)
* Squash ST get_st/set_st funcs as a single patch (Bjorn)
* Replace nic_open/close with netdev_rx_queue_restart() (Jakub, Broadcom)

V3->V4:
* Rebase on top of the latest pci/next tree (tag: 6.11-rc1)
* Add new API functions to query/enable/disable TPH support
* Make pcie_tph_set_st() completely independent from pcie_tph_get_cpu_st()
* Rewrite bnxt.c based on new APIs
* Remove documentation for now due to constantly changing API
* Remove pci=notph, but keep pci=nostmode with better flow (Bjorn)
* Rewrite much of tph.c & pci-tph.h with a cleaner interface (Bjorn)
* Add TPH save/restore support (Paul Luse and Lukas Wunner)

V2->V3:
* Rebase on top of pci/next tree (tag: pci-v6.11-changes)
* Redefine PCI TPH registers (pci_regs.h) without breaking uapi
* Fix commit subjects/messages for kernel options (Jonathan and Bjorn)
* Break API functions into three individual patches for easy review
* Rewrite lots of code in tph.c/tph.h based on feedback (Jonathan and Bjorn)

V1->V2:
* Rebase on top of pci.git/for-linus (6.10-rc1)
* Address mismatched data types reported by Sparse (Sparse check passed)
* Add pcie_tph_intr_vec_supported() for checking IRQ mode support
* Skip bnxt affinity notifier registration if pcie_tph_intr_vec_supported()=false
* Minor fixes in bnxt driver (i.e.
warning messages) Manoj Panicker (1): bnxt_en: Add TPH support in BNXT driver Michael Chan (1): bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings Wei Huang (3): PCI: Add TLP Processing Hints (TPH) support PCI/TPH: Add Steering Tag support PCI/TPH: Add TPH documentation Documentation/PCI/index.rst | 1 + Documentation/PCI/tph.rst | 132 +++++ .../admin-guide/kernel-parameters.txt | 4 + Documentation/driver-api/pci/pci.rst | 3 + drivers/net/ethernet/broadcom/bnxt/bnxt.c | 91 ++- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 7 + drivers/pci/Kconfig | 10 + drivers/pci/Makefile | 1 + drivers/pci/pci.c | 4 + drivers/pci/pci.h | 12 + drivers/pci/probe.c | 1 + drivers/pci/tph.c | 544 ++++++++++++++++++ include/linux/pci-tph.h | 44 ++ include/linux/pci.h | 7 + include/uapi/linux/pci_regs.h | 38 +- net/core/netdev_rx_queue.c | 1 + 16 files changed, 890 insertions(+), 10 deletions(-) create mode 100644 Documentation/PCI/tph.rst create mode 100644 drivers/pci/tph.c create mode 100644 include/linux/pci-tph.h -- 2.46.0
Add support for PCIe TLP Processing Hints (TPH) (see PCIe r6.2, sec 6.17).

Add missing TPH register definitions in pci_regs.h, including the TPH Requester capability register, TPH Requester control register, TPH Completer capability, and the ST fields of the MSI-X entry.

Introduce pcie_enable_tph() and pcie_disable_tph(), enabling drivers to toggle TPH support and configure a specific ST mode as needed. Also add a new kernel parameter, "pci=notph", allowing users to disable TPH support across the entire system.

Co-developed-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com>
Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com>
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
---
 .../admin-guide/kernel-parameters.txt | 4 +
 drivers/pci/Kconfig | 10 +
 drivers/pci/Makefile | 1 +
 drivers/pci/pci.c | 4 +
 drivers/pci/pci.h | 12 ++
 drivers/pci/probe.c | 1 +
 drivers/pci/tph.c | 197 ++++++++++++++++++
 include/linux/pci-tph.h | 21 ++
 include/linux/pci.h | 7 +
 include/uapi/linux/pci_regs.h | 38 +++-
 10 files changed, 287 insertions(+), 8 deletions(-)
 create mode 100644 drivers/pci/tph.c
 create mode 100644 include/linux/pci-tph.h

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index XXXXXXX..XXXXXXX 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -XXX,XX +XXX,XX @@
 		nomio		[S390] Do not use MIO instructions.
norid [S390] ignore the RID field and force use of one PCI domain per PCI function + notph [PCIE] If the PCIE_TPH kernel config parameter + is enabled, this kernel boot option can be used + to disable PCIe TLP Processing Hints support + system-wide. pcie_aspm= [PCIE] Forcibly enable or ignore PCIe Active State Power Management. diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig index XXXXXXX..XXXXXXX 100644 --- a/drivers/pci/Kconfig +++ b/drivers/pci/Kconfig @@ -XXX,XX +XXX,XX @@ config PCI_PASID If unsure, say N. +config PCIE_TPH + bool "TLP Processing Hints" + default n + help + This option adds support for PCIe TLP Processing Hints (TPH). + TPH allows endpoint devices to provide optimization hints, such as + desired caching behavior, for requests that target memory space. + These hints, called Steering Tags, can empower the system hardware + to optimize the utilization of platform resources. + config PCI_P2PDMA bool "PCI peer-to-peer transfer support" depends on ZONE_DEVICE diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile index XXXXXXX..XXXXXXX 100644 --- a/drivers/pci/Makefile +++ b/drivers/pci/Makefile @@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_VGA_ARB) += vgaarb.o obj-$(CONFIG_PCI_DOE) += doe.o obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o obj-$(CONFIG_PCI_NPEM) += npem.o +obj-$(CONFIG_PCIE_TPH) += tph.o # Endpoint library must be initialized before its users obj-$(CONFIG_PCI_ENDPOINT) += endpoint/ diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index XXXXXXX..XXXXXXX 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -XXX,XX +XXX,XX @@ int pci_save_state(struct pci_dev *dev) pci_save_dpc_state(dev); pci_save_aer_state(dev); pci_save_ptm_state(dev); + pci_save_tph_state(dev); return pci_save_vc_state(dev); } EXPORT_SYMBOL(pci_save_state); @@ -XXX,XX +XXX,XX @@ void pci_restore_state(struct pci_dev *dev) pci_restore_rebar_state(dev); pci_restore_dpc_state(dev); pci_restore_ptm_state(dev); + pci_restore_tph_state(dev); 
pci_aer_clear_status(dev); pci_restore_aer_state(dev); @@ -XXX,XX +XXX,XX @@ static int __init pci_setup(char *str) pci_no_domains(); } else if (!strncmp(str, "noari", 5)) { pcie_ari_disabled = true; + } else if (!strncmp(str, "notph", 5)) { + pci_no_tph(); } else if (!strncmp(str, "cbiosize=", 9)) { pci_cardbus_io_size = memparse(str + 9, &str); } else if (!strncmp(str, "cbmemsize=", 10)) { diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index XXXXXXX..XXXXXXX 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -XXX,XX +XXX,XX @@ static inline int pci_iov_bus_range(struct pci_bus *bus) #endif /* CONFIG_PCI_IOV */ +#ifdef CONFIG_PCIE_TPH +void pci_restore_tph_state(struct pci_dev *dev); +void pci_save_tph_state(struct pci_dev *dev); +void pci_no_tph(void); +void pci_tph_init(struct pci_dev *dev); +#else +static inline void pci_restore_tph_state(struct pci_dev *dev) { } +static inline void pci_save_tph_state(struct pci_dev *dev) { } +static inline void pci_no_tph(void) { } +static inline void pci_tph_init(struct pci_dev *dev) { } +#endif + #ifdef CONFIG_PCIE_PTM void pci_ptm_init(struct pci_dev *dev); void pci_save_ptm_state(struct pci_dev *dev); diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index XXXXXXX..XXXXXXX 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -XXX,XX +XXX,XX @@ static void pci_init_capabilities(struct pci_dev *dev) pci_dpc_init(dev); /* Downstream Port Containment */ pci_rcec_init(dev); /* Root Complex Event Collector */ pci_doe_init(dev); /* Data Object Exchange */ + pci_tph_init(dev); /* TLP Processing Hints */ pcie_report_downtraining(dev); pci_init_reset_methods(dev); diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/drivers/pci/tph.c @@ -XXX,XX +XXX,XX @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * TPH (TLP Processing Hints) support + * + * Copyright (C) 2024 Advanced Micro Devices, Inc. 
+ *     Eric Van Tassell <Eric.VanTassell@amd.com>
+ *     Wei Huang <wei.huang2@amd.com>
+ */
+#include <linux/pci.h>
+#include <linux/bitfield.h>
+#include <linux/pci-tph.h>
+
+#include "pci.h"
+
+/* System-wide TPH disabled */
+static bool pci_tph_disabled;
+
+static u8 get_st_modes(struct pci_dev *pdev)
+{
+	u32 reg;
+
+	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
+	reg &= PCI_TPH_CAP_ST_NS | PCI_TPH_CAP_ST_IV | PCI_TPH_CAP_ST_DS;
+
+	return reg;
+}
+
+/* Return device's Root Port completer capability */
+static u8 get_rp_completer_type(struct pci_dev *pdev)
+{
+	struct pci_dev *rp;
+	u32 reg;
+	int ret;
+
+	rp = pcie_find_root_port(pdev);
+	if (!rp)
+		return 0;
+
+	ret = pcie_capability_read_dword(rp, PCI_EXP_DEVCAP2, &reg);
+	if (ret)
+		return 0;
+
+	return FIELD_GET(PCI_EXP_DEVCAP2_TPH_COMP_MASK, reg);
+}
+
+/**
+ * pcie_disable_tph - Turn off TPH support for device
+ * @pdev: PCI device
+ *
+ * Return: none
+ */
+void pcie_disable_tph(struct pci_dev *pdev)
+{
+	if (!pdev->tph_cap)
+		return;
+
+	if (!pdev->tph_enabled)
+		return;
+
+	pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, 0);
+
+	pdev->tph_mode = 0;
+	pdev->tph_req_type = 0;
+	pdev->tph_enabled = 0;
+}
+EXPORT_SYMBOL(pcie_disable_tph);
+
+/**
+ * pcie_enable_tph - Enable TPH support for device using a specific ST mode
+ * @pdev: PCI device
+ * @mode: ST mode to enable. Current supported modes include:
+ *
+ *   - PCI_TPH_ST_NS_MODE: NO ST Mode
+ *   - PCI_TPH_ST_IV_MODE: Interrupt Vector Mode
+ *   - PCI_TPH_ST_DS_MODE: Device Specific Mode
+ *
+ * Checks whether the mode is actually supported by the device before enabling
+ * and returns an error if not. Additionally determines what types of requests,
+ * TPH or extended TPH, can be issued by the device based on its TPH requester
+ * capability and the Root Port's completer capability.
+ *
+ * Return: 0 on success, otherwise negative value (-errno)
+ */
+int pcie_enable_tph(struct pci_dev *pdev, int mode)
+{
+	u32 reg;
+	u8 dev_modes;
+	u8 rp_req_type;
+
+	/* Honor "notph" kernel parameter */
+	if (pci_tph_disabled)
+		return -EINVAL;
+
+	if (!pdev->tph_cap)
+		return -EINVAL;
+
+	if (pdev->tph_enabled)
+		return -EBUSY;
+
+	/* Sanitize and check ST mode compatibility */
+	mode &= PCI_TPH_CTRL_MODE_SEL_MASK;
+	dev_modes = get_st_modes(pdev);
+	if (!((1 << mode) & dev_modes))
+		return -EINVAL;
+
+	pdev->tph_mode = mode;
+
+	/* Get req_type supported by device and its Root Port */
+	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg);
+	if (FIELD_GET(PCI_TPH_CAP_EXT_TPH, reg))
+		pdev->tph_req_type = PCI_TPH_REQ_EXT_TPH;
+	else
+		pdev->tph_req_type = PCI_TPH_REQ_TPH_ONLY;
+
+	rp_req_type = get_rp_completer_type(pdev);
+
+	/* Final req_type is the smallest value of two */
+	pdev->tph_req_type = min(pdev->tph_req_type, rp_req_type);
+
+	if (pdev->tph_req_type == PCI_TPH_REQ_DISABLE)
+		return -EINVAL;
+
+	/* Write them into TPH control register */
+	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, &reg);
+
+	reg &= ~PCI_TPH_CTRL_MODE_SEL_MASK;
+	reg |= FIELD_PREP(PCI_TPH_CTRL_MODE_SEL_MASK, pdev->tph_mode);
+
+	reg &= ~PCI_TPH_CTRL_REQ_EN_MASK;
+	reg |= FIELD_PREP(PCI_TPH_CTRL_REQ_EN_MASK, pdev->tph_req_type);
+
+	pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg);
+
+	pdev->tph_enabled = 1;
+
+	return 0;
+}
+EXPORT_SYMBOL(pcie_enable_tph);
+
+void pci_restore_tph_state(struct pci_dev *pdev)
+{
+	struct pci_cap_saved_state *save_state;
+	u32 *cap;
+
+	if (!pdev->tph_cap)
+		return;
+
+	if (!pdev->tph_enabled)
+		return;
+
+	save_state = pci_find_saved_ext_cap(pdev, PCI_EXT_CAP_ID_TPH);
+	if (!save_state)
+		return;
+
+	/* Restore control register and all ST entries */
+	cap = &save_state->cap.data[0];
+	pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, *cap++);
+}
+
+void pci_save_tph_state(struct pci_dev *pdev)
+{
+	struct pci_cap_saved_state *save_state;
+	u32 *cap;
+
+	if (!pdev->tph_cap)
+		return;
+
+	if (!pdev->tph_enabled)
+		return;
+
+	save_state = pci_find_saved_ext_cap(pdev, PCI_EXT_CAP_ID_TPH);
+	if (!save_state)
+		return;
+
+	/* Save control register */
+	cap = &save_state->cap.data[0];
+	pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, cap++);
+}
+
+void pci_no_tph(void)
+{
+	pci_tph_disabled = true;
+
+	pr_info("PCIe TPH is disabled\n");
+}
+
+void pci_tph_init(struct pci_dev *pdev)
+{
+	u32 save_size;
+
+	pdev->tph_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_TPH);
+	if (!pdev->tph_cap)
+		return;
+
+	save_size = sizeof(u32);
+	pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_TPH, save_size);
+}
diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/linux/pci-tph.h
@@ -XXX,XX +XXX,XX @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * TPH (TLP Processing Hints)
+ *
+ * Copyright (C) 2024 Advanced Micro Devices, Inc.
+ * Eric Van Tassell <Eric.VanTassell@amd.com> + * Wei Huang <wei.huang2@amd.com> + */ +#ifndef LINUX_PCI_TPH_H +#define LINUX_PCI_TPH_H + +#ifdef CONFIG_PCIE_TPH +void pcie_disable_tph(struct pci_dev *pdev); +int pcie_enable_tph(struct pci_dev *pdev, int mode); +#else +static inline void pcie_disable_tph(struct pci_dev *pdev) { } +static inline int pcie_enable_tph(struct pci_dev *pdev, int mode) +{ return -EINVAL; } +#endif + +#endif /* LINUX_PCI_TPH_H */ diff --git a/include/linux/pci.h b/include/linux/pci.h index XXXXXXX..XXXXXXX 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -XXX,XX +XXX,XX @@ struct pci_dev { unsigned int ats_enabled:1; /* Address Translation Svc */ unsigned int pasid_enabled:1; /* Process Address Space ID */ unsigned int pri_enabled:1; /* Page Request Interface */ + unsigned int tph_enabled:1; /* TLP Processing Hints */ unsigned int is_managed:1; /* Managed via devres */ unsigned int is_msi_managed:1; /* MSI release via devres installed */ unsigned int needs_freset:1; /* Requires fundamental reset */ @@ -XXX,XX +XXX,XX @@ struct pci_dev { /* These methods index pci_reset_fn_methods[] */ u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */ + +#ifdef CONFIG_PCIE_TPH + u16 tph_cap; /* TPH capability offset */ + u8 tph_mode; /* TPH mode */ + u8 tph_req_type; /* TPH requester type */ +#endif }; static inline struct pci_dev *pci_physfn(struct pci_dev *dev) diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index XXXXXXX..XXXXXXX 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -XXX,XX +XXX,XX @@ #define PCI_MSIX_ENTRY_UPPER_ADDR 0x4 /* Message Upper Address */ #define PCI_MSIX_ENTRY_DATA 0x8 /* Message Data */ #define PCI_MSIX_ENTRY_VECTOR_CTRL 0xc /* Vector Control */ -#define PCI_MSIX_ENTRY_CTRL_MASKBIT 0x00000001 +#define PCI_MSIX_ENTRY_CTRL_MASKBIT 0x00000001 /* Mask Bit */ +#define PCI_MSIX_ENTRY_CTRL_ST_LOWER 0x00ff0000 /* ST Lower */ +#define 
PCI_MSIX_ENTRY_CTRL_ST_UPPER 0xff000000 /* ST Upper */ /* CompactPCI Hotswap Register */ @@ -XXX,XX +XXX,XX @@ #define PCI_EXP_DEVCAP2_ATOMIC_COMP64 0x00000100 /* 64b AtomicOp completion */ #define PCI_EXP_DEVCAP2_ATOMIC_COMP128 0x00000200 /* 128b AtomicOp completion */ #define PCI_EXP_DEVCAP2_LTR 0x00000800 /* Latency tolerance reporting */ +#define PCI_EXP_DEVCAP2_TPH_COMP_MASK 0x00003000 /* TPH completer support */ #define PCI_EXP_DEVCAP2_OBFF_MASK 0x000c0000 /* OBFF support mechanism */ #define PCI_EXP_DEVCAP2_OBFF_MSG 0x00040000 /* New message signaling */ #define PCI_EXP_DEVCAP2_OBFF_WAKE 0x00080000 /* Re-use WAKE# for OBFF */ @@ -XXX,XX +XXX,XX @@ #define PCI_DPA_CAP_SUBSTATE_MASK 0x1F /* # substates - 1 */ #define PCI_DPA_BASE_SIZEOF 16 /* size with 0 substates */ +/* TPH Completer Support */ +#define PCI_EXP_DEVCAP2_TPH_COMP_NONE 0x0 /* None */ +#define PCI_EXP_DEVCAP2_TPH_COMP_TPH_ONLY 0x1 /* TPH only */ +#define PCI_EXP_DEVCAP2_TPH_COMP_EXT_TPH 0x3 /* TPH and Extended TPH */ + /* TPH Requester */ #define PCI_TPH_CAP 4 /* capability register */ -#define PCI_TPH_CAP_LOC_MASK 0x600 /* location mask */ -#define PCI_TPH_LOC_NONE 0x000 /* no location */ -#define PCI_TPH_LOC_CAP 0x200 /* in capability */ -#define PCI_TPH_LOC_MSIX 0x400 /* in MSI-X */ -#define PCI_TPH_CAP_ST_MASK 0x07FF0000 /* ST table mask */ -#define PCI_TPH_CAP_ST_SHIFT 16 /* ST table shift */ -#define PCI_TPH_BASE_SIZEOF 0xc /* size with no ST table */ +#define PCI_TPH_CAP_ST_NS 0x00000001 /* No ST Mode Supported */ +#define PCI_TPH_CAP_ST_IV 0x00000002 /* Interrupt Vector Mode Supported */ +#define PCI_TPH_CAP_ST_DS 0x00000004 /* Device Specific Mode Supported */ +#define PCI_TPH_CAP_EXT_TPH 0x00000100 /* Ext TPH Requester Supported */ +#define PCI_TPH_CAP_LOC_MASK 0x00000600 /* ST Table Location */ +#define PCI_TPH_LOC_NONE 0x00000000 /* Not present */ +#define PCI_TPH_LOC_CAP 0x00000200 /* In capability */ +#define PCI_TPH_LOC_MSIX 0x00000400 /* In MSI-X */ +#define PCI_TPH_CAP_ST_MASK 
0x07FF0000 /* ST Table Size */ +#define PCI_TPH_CAP_ST_SHIFT 16 /* ST Table Size shift */ +#define PCI_TPH_BASE_SIZEOF 0xc /* Size with no ST table */ + +#define PCI_TPH_CTRL 8 /* control register */ +#define PCI_TPH_CTRL_MODE_SEL_MASK 0x00000007 /* ST Mode Select */ +#define PCI_TPH_ST_NS_MODE 0x0 /* No ST Mode */ +#define PCI_TPH_ST_IV_MODE 0x1 /* Interrupt Vector Mode */ +#define PCI_TPH_ST_DS_MODE 0x2 /* Device Specific Mode */ +#define PCI_TPH_CTRL_REQ_EN_MASK 0x00000300 /* TPH Requester Enable */ +#define PCI_TPH_REQ_DISABLE 0x0 /* No TPH requests allowed */ +#define PCI_TPH_REQ_TPH_ONLY 0x1 /* TPH only requests allowed */ +#define PCI_TPH_REQ_EXT_TPH 0x3 /* Extended TPH requests allowed */ /* Downstream Port Containment */ #define PCI_EXP_DPC_CAP 0x04 /* DPC Capability */ -- 2.46.0
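For readers who want to check the field layout outside the kernel, here is a minimal user-space sketch of how the masks above decode a TPH Requester Capability value and compose a Control value. The mask values are copied from the hunk above. The helpers field_get(), field_prep(), tph_st_table_entries() and tph_ctrl_compose() are hypothetical names used only for illustration. They stand in for the kernel's FIELD_GET()/FIELD_PREP() and are not part of this series.

```c
#include <assert.h>
#include <stdint.h>

/* Mask and field values copied from the pci_regs.h hunk above */
#define PCI_TPH_CAP_LOC_MASK		0x00000600u /* ST Table Location */
#define PCI_TPH_LOC_CAP			0x00000200u /* In capability */
#define PCI_TPH_LOC_MSIX		0x00000400u /* In MSI-X */
#define PCI_TPH_CAP_ST_MASK		0x07FF0000u /* ST Table Size */
#define PCI_TPH_CTRL_MODE_SEL_MASK	0x00000007u /* ST Mode Select */
#define PCI_TPH_ST_IV_MODE		0x1u        /* Interrupt Vector Mode */
#define PCI_TPH_CTRL_REQ_EN_MASK	0x00000300u /* TPH Requester Enable */
#define PCI_TPH_REQ_TPH_ONLY		0x1u        /* TPH only requests allowed */

/* Hypothetical helper: bit position of the lowest set bit (mask != 0) */
static uint32_t field_shift(uint32_t mask)
{
	uint32_t shift = 0;

	while (!((mask >> shift) & 1u))
		shift++;
	return shift;
}

/* Hypothetical stand-ins for the kernel's FIELD_GET()/FIELD_PREP() */
static uint32_t field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> field_shift(mask);
}

static uint32_t field_prep(uint32_t mask, uint32_t val)
{
	return (val << field_shift(mask)) & mask;
}

/*
 * Number of ST entries held in the TPH capability itself. The size field
 * encodes "entries - 1"; return 0 when the table is located elsewhere.
 */
static uint32_t tph_st_table_entries(uint32_t cap)
{
	if ((cap & PCI_TPH_CAP_LOC_MASK) != PCI_TPH_LOC_CAP)
		return 0;
	return field_get(PCI_TPH_CAP_ST_MASK, cap) + 1;
}

/* Replace the ST Mode Select and Requester Enable fields, keep the rest */
static uint32_t tph_ctrl_compose(uint32_t ctrl, uint32_t mode, uint32_t req_en)
{
	ctrl &= ~(PCI_TPH_CTRL_MODE_SEL_MASK | PCI_TPH_CTRL_REQ_EN_MASK);
	ctrl |= field_prep(PCI_TPH_CTRL_MODE_SEL_MASK, mode);
	ctrl |= field_prep(PCI_TPH_CTRL_REQ_EN_MASK, req_en);
	return ctrl;
}
```

Note that ST Mode Select and TPH Requester Enable are separate fields with separate encodings, so the two values are passed independently here.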
pcie_tph_get_cpu_st() is added to allow a caller to retrieve Steering Tags for a target memory that is associated with a specific CPU. The ST tag is retrieved by invoking ACPI _DSM of the device's Root Port device. pcie_tph_set_st_entry() is added to support updating the device's Steering Tags. The tags will be written into the device's MSI-X table or the ST table located in the TPH Extended Capability space. Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> --- drivers/pci/tph.c | 349 +++++++++++++++++++++++++++++++++++++++- include/linux/pci-tph.h | 23 +++ 2 files changed, 371 insertions(+), 1 deletion(-) diff --git a/drivers/pci/tph.c b/drivers/pci/tph.c index XXXXXXX..XXXXXXX 100644 --- a/drivers/pci/tph.c +++ b/drivers/pci/tph.c @@ -XXX,XX +XXX,XX @@ * Wei Huang <wei.huang2@amd.com> */ #include <linux/pci.h> +#include <linux/pci-acpi.h> +#include <linux/msi.h> #include <linux/bitfield.h> #include <linux/pci-tph.h> @@ -XXX,XX +XXX,XX @@ /* System-wide TPH disabled */ static bool pci_tph_disabled; +#ifdef CONFIG_ACPI +/* + * The st_info struct defines the Steering Tag (ST) info returned by the + * firmware _DSM method defined in the approved ECN for PCI Firmware Spec, + * available at https://members.pcisig.com/wg/PCI-SIG/document/15470. 
+ * + * @vm_st_valid: 8-bit ST for volatile memory is valid + * @vm_xst_valid: 16-bit extended ST for volatile memory is valid + * @vm_ph_ignore: 1 => PH was and will be ignored, 0 => PH should be supplied + * @vm_st: 8-bit ST for volatile mem + * @vm_xst: 16-bit extended ST for volatile mem + * @pm_st_valid: 8-bit ST for persistent memory is valid + * @pm_xst_valid: 16-bit extended ST for persistent memory is valid + * @pm_ph_ignore: 1 => PH was and will be ignored, 0 => PH should be supplied + * @pm_st: 8-bit ST for persistent mem + * @pm_xst: 16-bit extended ST for persistent mem + */ +union st_info { + struct { + u64 vm_st_valid : 1; + u64 vm_xst_valid : 1; + u64 vm_ph_ignore : 1; + u64 rsvd1 : 5; + u64 vm_st : 8; + u64 vm_xst : 16; + u64 pm_st_valid : 1; + u64 pm_xst_valid : 1; + u64 pm_ph_ignore : 1; + u64 rsvd2 : 5; + u64 pm_st : 8; + u64 pm_xst : 16; + }; + u64 value; +}; + +static u16 tph_extract_tag(enum tph_mem_type mem_type, u8 req_type, + union st_info *info) +{ + switch (req_type) { + case PCI_TPH_REQ_TPH_ONLY: /* 8-bit tag */ + switch (mem_type) { + case TPH_MEM_TYPE_VM: + if (info->vm_st_valid) + return info->vm_st; + break; + case TPH_MEM_TYPE_PM: + if (info->pm_st_valid) + return info->pm_st; + break; + } + break; + case PCI_TPH_REQ_EXT_TPH: /* 16-bit tag */ + switch (mem_type) { + case TPH_MEM_TYPE_VM: + if (info->vm_xst_valid) + return info->vm_xst; + break; + case TPH_MEM_TYPE_PM: + if (info->pm_xst_valid) + return info->pm_xst; + break; + } + break; + default: + return 0; + } + + return 0; +} + +#define TPH_ST_DSM_FUNC_INDEX 0xF +static acpi_status tph_invoke_dsm(acpi_handle handle, u32 cpu_uid, + union st_info *st_out) +{ + union acpi_object arg3[3], in_obj, *out_obj; + + if (!acpi_check_dsm(handle, &pci_acpi_dsm_guid, 7, + BIT(TPH_ST_DSM_FUNC_INDEX))) + return AE_ERROR; + + /* DWORD: feature ID (0 for processor cache ST query) */ + arg3[0].integer.type = ACPI_TYPE_INTEGER; + arg3[0].integer.value = 0; + + /* DWORD: target UID */ + 
arg3[1].integer.type = ACPI_TYPE_INTEGER; + arg3[1].integer.value = cpu_uid; + + /* QWORD: properties, all 0's */ + arg3[2].integer.type = ACPI_TYPE_INTEGER; + arg3[2].integer.value = 0; + + in_obj.type = ACPI_TYPE_PACKAGE; + in_obj.package.count = ARRAY_SIZE(arg3); + in_obj.package.elements = arg3; + + out_obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, 7, + TPH_ST_DSM_FUNC_INDEX, &in_obj); + if (!out_obj) + return AE_ERROR; + + if (out_obj->type != ACPI_TYPE_BUFFER) { + ACPI_FREE(out_obj); + return AE_ERROR; + } + + st_out->value = *((u64 *)(out_obj->buffer.pointer)); + + ACPI_FREE(out_obj); + + return AE_OK; +} +#endif + +/* Update the TPH Requester Enable field of TPH Control Register */ +static void set_ctrl_reg_req_en(struct pci_dev *pdev, u8 req_type) +{ + u32 reg; + + pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, &reg); + + reg &= ~PCI_TPH_CTRL_REQ_EN_MASK; + reg |= FIELD_PREP(PCI_TPH_CTRL_REQ_EN_MASK, req_type); + + pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, reg); +} + static u8 get_st_modes(struct pci_dev *pdev) { u32 reg; @@ -XXX,XX +XXX,XX @@ static u8 get_st_modes(struct pci_dev *pdev) return reg; } +static u32 get_st_table_loc(struct pci_dev *pdev) +{ + u32 reg; + + pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg); + + return FIELD_GET(PCI_TPH_CAP_LOC_MASK, reg); +} + +/* + * Return the size of ST table. If ST table is not in TPH Requester Extended + * Capability space, return 0. Otherwise return the ST Table Size + 1.
+ */ +static u16 get_st_table_size(struct pci_dev *pdev) +{ + u32 reg; + u32 loc; + + /* Check ST table location first */ + loc = get_st_table_loc(pdev); + + /* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */ + loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc); + if (loc != PCI_TPH_LOC_CAP) + return 0; + + pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CAP, &reg); + + return FIELD_GET(PCI_TPH_CAP_ST_MASK, reg) + 1; +} + /* Return device's Root Port completer capability */ static u8 get_rp_completer_type(struct pci_dev *pdev) { @@ -XXX,XX +XXX,XX @@ static u8 get_rp_completer_type(struct pci_dev *pdev) return FIELD_GET(PCI_EXP_DEVCAP2_TPH_COMP_MASK, reg); } +/* Write ST to MSI-X vector control reg - Return 0 if OK, otherwise -errno */ +static int write_tag_to_msix(struct pci_dev *pdev, int msix_idx, u16 tag) +{ + struct msi_desc *msi_desc = NULL; + void __iomem *vec_ctrl; + u32 val, mask, st_val; + int err = 0; + + msi_lock_descs(&pdev->dev); + + /* Find the msi_desc entry with matching msix_idx */ + msi_for_each_desc(msi_desc, &pdev->dev, MSI_DESC_ASSOCIATED) { + if (msi_desc->msi_index == msix_idx) + break; + } + + if (!msi_desc) { + err = -ENXIO; + goto err_out; + } + + st_val = (u32)tag; + + /* Get the vector control register (offset 0xc) pointed by msix_idx */ + vec_ctrl = pdev->msix_base + msix_idx * PCI_MSIX_ENTRY_SIZE; + vec_ctrl += PCI_MSIX_ENTRY_VECTOR_CTRL; + + val = readl(vec_ctrl); + mask = PCI_MSIX_ENTRY_CTRL_ST_LOWER | PCI_MSIX_ENTRY_CTRL_ST_UPPER; + val &= ~mask; + val |= FIELD_PREP(mask, st_val); + writel(val, vec_ctrl); + + /* Read back to flush the update */ + val = readl(vec_ctrl); + +err_out: + msi_unlock_descs(&pdev->dev); + return err; +} + +/* Write tag to ST table - Return 0 if OK, otherwise -errno */ +static int write_tag_to_st_table(struct pci_dev *pdev, int index, u16 tag) +{ + int st_table_size; + int offset; + + /* Check if index is out of bound */ + st_table_size = get_st_table_size(pdev); + if (index >= st_table_size) +
return -ENXIO; + + offset = pdev->tph_cap + PCI_TPH_BASE_SIZEOF + index * sizeof(u16); + + return pci_write_config_word(pdev, offset, tag); +} + +/** + * pcie_tph_get_cpu_st() - Retrieve Steering Tag for a target memory associated + * with a specific CPU + * @pdev: PCI device + * @mem_type: target memory type (volatile or persistent RAM) + * @cpu_uid: associated CPU id + * @tag: Steering Tag to be returned + * + * This function returns the Steering Tag for a target memory that is + * associated with a specific CPU as indicated by cpu_uid. + * + * Returns: 0 if success, otherwise negative value (-errno) + */ +int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type mem_type, + unsigned int cpu_uid, u16 *tag) +{ +#ifdef CONFIG_ACPI + struct pci_dev *rp; + acpi_handle rp_acpi_handle; + union st_info info; + + rp = pcie_find_root_port(pdev); + if (!rp || !rp->bus || !rp->bus->bridge) + return -ENODEV; + + rp_acpi_handle = ACPI_HANDLE(rp->bus->bridge); + + if (tph_invoke_dsm(rp_acpi_handle, cpu_uid, &info) != AE_OK) { + *tag = 0; + return -EINVAL; + } + + *tag = tph_extract_tag(mem_type, pdev->tph_req_type, &info); + + pci_dbg(pdev, "get steering tag: mem_type=%s, cpu_uid=%d, tag=%#04x\n", + (mem_type == TPH_MEM_TYPE_VM) ? "volatile" : "persistent", + cpu_uid, *tag); + + return 0; +#else + return -ENODEV; +#endif +} +EXPORT_SYMBOL(pcie_tph_get_cpu_st); + +/** + * pcie_tph_set_st_entry() - Set Steering Tag in the ST table entry + * @pdev: PCI device + * @index: ST table entry index + * @tag: Steering Tag to be written + * + * This function will figure out the proper location of ST table, either in the + * MSI-X table or in the TPH Extended Capability space, and write the Steering + * Tag into the ST entry pointed by index. 
+ * + * Returns: 0 if success, otherwise negative value (-errno) + */ +int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index, u16 tag) +{ + u32 loc; + int err = 0; + + if (!pdev->tph_cap) + return -EINVAL; + + if (!pdev->tph_enabled) + return -EINVAL; + + /* No need to write tag if device is in "No ST Mode" */ + if (pdev->tph_mode == PCI_TPH_ST_NS_MODE) + return 0; + + /* Disable TPH before updating ST to avoid potential instability as + * cautioned in PCIe r6.2, sec 6.17.3, "ST Modes of Operation" + */ + set_ctrl_reg_req_en(pdev, PCI_TPH_REQ_DISABLE); + + loc = get_st_table_loc(pdev); + /* Convert loc to match with PCI_TPH_LOC_* defined in pci_regs.h */ + loc = FIELD_PREP(PCI_TPH_CAP_LOC_MASK, loc); + + switch (loc) { + case PCI_TPH_LOC_MSIX: + err = write_tag_to_msix(pdev, index, tag); + break; + case PCI_TPH_LOC_CAP: + err = write_tag_to_st_table(pdev, index, tag); + break; + default: + err = -EINVAL; + } + + if (err) { + pcie_disable_tph(pdev); + return err; + } + + /* Re-enable with the requester type; the ST mode uses a different encoding */ + set_ctrl_reg_req_en(pdev, pdev->tph_req_type); + + pci_dbg(pdev, "set steering tag: %s table, index=%d, tag=%#04x\n", + (loc == PCI_TPH_LOC_MSIX) ? 
"MSI-X" : "ST", index, tag); + + return 0; +} +EXPORT_SYMBOL(pcie_tph_set_st_entry); + /** * pcie_disable_tph - Turn off TPH support for device * @pdev: PCI device @@ -XXX,XX +XXX,XX @@ EXPORT_SYMBOL(pcie_enable_tph); void pci_restore_tph_state(struct pci_dev *pdev) { struct pci_cap_saved_state *save_state; + int num_entries, i, offset; + u16 *st_entry; u32 *cap; if (!pdev->tph_cap) @@ -XXX,XX +XXX,XX @@ void pci_restore_tph_state(struct pci_dev *pdev) /* Restore control register and all ST entries */ cap = &save_state->cap.data[0]; pci_write_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, *cap++); + st_entry = (u16 *)cap; + offset = PCI_TPH_BASE_SIZEOF; + num_entries = get_st_table_size(pdev); + for (i = 0; i < num_entries; i++) { + pci_write_config_word(pdev, pdev->tph_cap + offset, + *st_entry++); + offset += sizeof(u16); + } } void pci_save_tph_state(struct pci_dev *pdev) { struct pci_cap_saved_state *save_state; + int num_entries, i, offset; + u16 *st_entry; u32 *cap; if (!pdev->tph_cap) @@ -XXX,XX +XXX,XX @@ void pci_save_tph_state(struct pci_dev *pdev) /* Save control register */ cap = &save_state->cap.data[0]; pci_read_config_dword(pdev, pdev->tph_cap + PCI_TPH_CTRL, cap++); + + /* Save all ST entries in extended capability structure */ + st_entry = (u16 *)cap; + offset = PCI_TPH_BASE_SIZEOF; + num_entries = get_st_table_size(pdev); + for (i = 0; i < num_entries; i++) { + pci_read_config_word(pdev, pdev->tph_cap + offset, + st_entry++); + offset += sizeof(u16); + } } void pci_no_tph(void) @@ -XXX,XX +XXX,XX @@ void pci_no_tph(void) void pci_tph_init(struct pci_dev *pdev) { + int num_entries; u32 save_size; pdev->tph_cap = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_TPH); if (!pdev->tph_cap) return; - save_size = sizeof(u32); + num_entries = get_st_table_size(pdev); + save_size = sizeof(u32) + num_entries * sizeof(u16); pci_add_ext_cap_save_buffer(pdev, PCI_EXT_CAP_ID_TPH, save_size); } diff --git a/include/linux/pci-tph.h b/include/linux/pci-tph.h index 
XXXXXXX..XXXXXXX 100644 --- a/include/linux/pci-tph.h +++ b/include/linux/pci-tph.h @@ -XXX,XX +XXX,XX @@ #ifndef LINUX_PCI_TPH_H #define LINUX_PCI_TPH_H +/* + * According to the ECN for PCI Firmware Spec, the Steering Tag can differ + * depending on the memory type: Volatile Memory or Persistent Memory. When a + * caller queries a target's Steering Tag, it must provide the target's + * tph_mem_type. ECN link: https://members.pcisig.com/wg/PCI-SIG/document/15470. + */ +enum tph_mem_type { + TPH_MEM_TYPE_VM, /* volatile memory */ + TPH_MEM_TYPE_PM /* persistent memory */ +}; + #ifdef CONFIG_PCIE_TPH +int pcie_tph_set_st_entry(struct pci_dev *pdev, + unsigned int index, u16 tag); +int pcie_tph_get_cpu_st(struct pci_dev *dev, + enum tph_mem_type mem_type, + unsigned int cpu_uid, u16 *tag); void pcie_disable_tph(struct pci_dev *pdev); int pcie_enable_tph(struct pci_dev *pdev, int mode); #else +static inline int pcie_tph_set_st_entry(struct pci_dev *pdev, + unsigned int index, u16 tag) +{ return -EINVAL; } +static inline int pcie_tph_get_cpu_st(struct pci_dev *dev, + enum tph_mem_type mem_type, + unsigned int cpu_uid, u16 *tag) +{ return -EINVAL; } static inline void pcie_disable_tph(struct pci_dev *pdev) { } static inline int pcie_enable_tph(struct pci_dev *pdev, int mode) { return -EINVAL; } -- 2.46.0
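The tag selection done by tph_extract_tag() in the patch above can also be written with explicit shifts, which may help when checking a raw _DSM return value by hand. This is only a sketch. It assumes the bitfields of union st_info are allocated from the least significant bit upward (as GCC does on little-endian targets), so the persistent-memory fields mirror the volatile-memory fields 32 bits higher. The function name st_info_extract_tag() is hypothetical, and the enum and PCI_TPH_REQ_* values are copied from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from pci-tph.h and pci_regs.h in this series */
enum tph_mem_type {
	TPH_MEM_TYPE_VM,	/* volatile memory */
	TPH_MEM_TYPE_PM		/* persistent memory */
};
#define PCI_TPH_REQ_TPH_ONLY	0x1	/* 8-bit tags */
#define PCI_TPH_REQ_EXT_TPH	0x3	/* 16-bit tags */

/*
 * Hypothetical shift-based equivalent of tph_extract_tag().
 * Assumed layout of each 32-bit half (LSB-first bitfield allocation):
 *   bit 0      ST valid
 *   bit 1      extended ST valid
 *   bit 2      PH ignore
 *   bits 8-15  8-bit ST
 *   bits 16-31 16-bit extended ST
 */
static uint16_t st_info_extract_tag(enum tph_mem_type mem_type,
				    uint8_t req_type, uint64_t value)
{
	uint32_t half = (mem_type == TPH_MEM_TYPE_PM) ?
			(uint32_t)(value >> 32) : (uint32_t)value;
	int st_valid = half & 0x1;
	int xst_valid = (half >> 1) & 0x1;

	if (req_type == PCI_TPH_REQ_TPH_ONLY && st_valid)
		return (half >> 8) & 0xFF;
	if (req_type == PCI_TPH_REQ_EXT_TPH && xst_valid)
		return (half >> 16) & 0xFFFF;

	/* No valid tag for this combination; same fallback as the patch */
	return 0;
}
```

As in the patch, a return value of 0 cannot be told apart from a valid tag of 0, so a caller that needs the distinction would have to inspect the valid bits itself.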
Provide a document for TPH feature, including the description of "notph" kernel parameter and the API interface. Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> --- Documentation/PCI/index.rst | 1 + Documentation/PCI/tph.rst | 132 +++++++++++++++++++++++++++ Documentation/driver-api/pci/pci.rst | 3 + 3 files changed, 136 insertions(+) create mode 100644 Documentation/PCI/tph.rst diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst index XXXXXXX..XXXXXXX 100644 --- a/Documentation/PCI/index.rst +++ b/Documentation/PCI/index.rst @@ -XXX,XX +XXX,XX @@ PCI Bus Subsystem pcieaer-howto endpoint/index boot-interrupts + tph diff --git a/Documentation/PCI/tph.rst b/Documentation/PCI/tph.rst new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/Documentation/PCI/tph.rst @@ -XXX,XX +XXX,XX @@ +.. SPDX-License-Identifier: GPL-2.0 + + +=========== +TPH Support +=========== + +:Copyright: 2024 Advanced Micro Devices, Inc. +:Authors: - Eric van Tassell <eric.vantassell@amd.com> + - Wei Huang <wei.huang2@amd.com> + + +Overview +======== + +TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices +to provide optimization hints for requests that target memory space. +These hints, in a format called Steering Tags (STs), are embedded in the +requester's TLP headers, enabling the system hardware, such as the Root +Complex, to better manage platform resources for these requests. + +For example, on platforms with TPH-based direct data cache injection +support, an endpoint device can include appropriate STs in its DMA +traffic to specify which cache the data should be written to. 
This allows +the CPU core to have a higher probability of getting data from cache, +potentially improving performance and reducing latency in data +processing. + + +How to Use TPH +============== + +TPH is presented as an optional extended capability in PCIe. The Linux +kernel handles TPH discovery during boot, but it is up to the device +driver to request TPH enablement if it is to be utilized. Once enabled, +the driver uses the provided API to obtain the Steering Tag for the +target memory and to program the ST into the device's ST table. + +Enable TPH support in Linux +--------------------------- + +To support TPH, the kernel must be built with the CONFIG_PCIE_TPH option +enabled. + +Manage TPH +---------- + +To enable TPH for a device, use the following function:: + + int pcie_enable_tph(struct pci_dev *pdev, int mode); + +This function enables TPH support for a device with a specific ST mode. +Currently supported modes include: + + * PCI_TPH_ST_NS_MODE - No ST Mode + * PCI_TPH_ST_IV_MODE - Interrupt Vector Mode + * PCI_TPH_ST_DS_MODE - Device Specific Mode + +`pcie_enable_tph()` checks whether the requested mode is actually +supported by the device before enabling. The device driver can figure out +which TPH mode is supported and can be properly enabled based on the +return value of `pcie_enable_tph()`. + +To disable TPH, use the following function:: + + void pcie_disable_tph(struct pci_dev *pdev); + +Manage ST +--------- + +Steering Tags are platform specific. The PCIe specification does not say +where STs come from. Instead, the PCI Firmware Specification defines an +ACPI _DSM method (see the `Revised _DSM for Cache Locality TPH Features ECN +<https://members.pcisig.com/wg/PCI-SIG/document/15470>`_) for retrieving +STs for a target memory of various properties. This is the method +supported by this implementation.
+ +To retrieve a Steering Tag for a target memory associated with a specific +CPU, use the following function:: + + int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type type, + unsigned int cpu_uid, u16 *tag); + +The `type` argument is used to specify the memory type, either volatile +or persistent, of the target memory. The `cpu_uid` argument specifies the +CPU with which the memory is associated. + +After the ST value is retrieved, the device driver can use the following +function to write the ST into the device:: + + int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index, + u16 tag); + +The `index` argument is the index of the ST table entry into which the +tag will be written. `pcie_tph_set_st_entry()` will figure out the proper +location of the ST table, either in the MSI-X table or in the TPH Extended +Capability space, and write the Steering Tag into the ST entry pointed to +by the `index` argument. + +It is completely up to the driver to decide how to use these TPH +functions. For example, a network device driver can use the TPH APIs above +to update the Steering Tag when the interrupt affinity of an RX/TX queue +has been changed. Here is sample code for an IRQ affinity notifier: + +.. code-block:: c + + static void irq_affinity_notified(struct irq_affinity_notify *notify, + const cpumask_t *mask) + { + struct drv_irq *irq; + unsigned int cpu_id; + u16 tag; + + irq = container_of(notify, struct drv_irq, affinity_notify); + cpumask_copy(irq->cpu_mask, mask); + + /* Pick a suitable CPU as the target - this is just an example */ + cpu_id = cpumask_first(irq->cpu_mask); + + if (pcie_tph_get_cpu_st(irq->pdev, TPH_MEM_TYPE_VM, cpu_id, + &tag)) + return; + + if (pcie_tph_set_st_entry(irq->pdev, irq->msix_nr, tag)) + return; + } + +Disable TPH system-wide +----------------------- + +There is a kernel command line option available to control the TPH feature: + * "notph": TPH will be disabled for all endpoint devices.
diff --git a/Documentation/driver-api/pci/pci.rst b/Documentation/driver-api/pci/pci.rst index XXXXXXX..XXXXXXX 100644 --- a/Documentation/driver-api/pci/pci.rst +++ b/Documentation/driver-api/pci/pci.rst @@ -XXX,XX +XXX,XX @@ PCI Support Library .. kernel-doc:: drivers/pci/pci-sysfs.c :internal: +.. kernel-doc:: drivers/pci/tph.c + :export: + PCI Hotplug Support Library --------------------------- -- 2.46.0
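To make the two ST table locations mentioned in the document above more concrete, the following user-space sketch shows where a tag ends up in each case. In the MSI-X case the tag occupies the upper 16 bits of the Vector Control dword. In the capability case, entries are 16 bits wide and start right after the 0xc-byte base structure. The constants are copied from this series. The helpers msix_vec_ctrl_with_tag() and st_table_entry_offset() are hypothetical names for illustration and only model the arithmetic done by write_tag_to_msix() and write_tag_to_st_table().

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the pci_regs.h changes in this series */
#define PCI_MSIX_ENTRY_CTRL_ST_LOWER	0x00ff0000u /* ST Lower */
#define PCI_MSIX_ENTRY_CTRL_ST_UPPER	0xff000000u /* ST Upper */
#define PCI_TPH_BASE_SIZEOF		0xc         /* Size with no ST table */

/* Hypothetical: new Vector Control value with the ST field replaced by tag */
static uint32_t msix_vec_ctrl_with_tag(uint32_t vec_ctrl, uint16_t tag)
{
	uint32_t mask = PCI_MSIX_ENTRY_CTRL_ST_LOWER |
			PCI_MSIX_ENTRY_CTRL_ST_UPPER;

	vec_ctrl &= ~mask;			/* keep mask bit and reserved bits */
	vec_ctrl |= (uint32_t)tag << 16;	/* ST Lower is bits 16-23, Upper 24-31 */
	return vec_ctrl;
}

/*
 * Hypothetical: config-space offset of ST entry 'index' when the table is
 * in the TPH capability at 'tph_cap'. Returns -1 if index is out of range.
 */
static int st_table_entry_offset(uint16_t tph_cap, unsigned int index,
				 unsigned int table_size)
{
	if (index >= table_size)
		return -1;
	return tph_cap + PCI_TPH_BASE_SIZEOF + index * (int)sizeof(uint16_t);
}
```

In Interrupt Vector Mode with the table in MSI-X, the entry index is the MSI-X vector number, which is why the sample notifier in the document passes irq->msix_nr as the index.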
From: Manoj Panicker <manoj.panicker2@amd.com> Implement TPH support in Broadcom BNXT device driver. The driver uses TPH functions to retrieve and configure the device's Steering Tags when its interrupt affinity is being changed. With appropriate firmware, we see substantial memory bandwidth savings and other benefits using real network benchmarks. Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Co-developed-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 83 +++++++++++++++++++++++ drivers/net/ethernet/broadcom/bnxt/bnxt.h | 7 ++ net/core/netdev_rx_queue.c | 1 + 3 files changed, 91 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index XXXXXXX..XXXXXXX 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -XXX,XX +XXX,XX @@ #include <net/page_pool/helpers.h> #include <linux/align.h> #include <net/netdev_queues.h> +#include <net/netdev_rx_queue.h> +#include <linux/pci-tph.h> #include "bnxt_hsi.h" #include "bnxt.h" @@ -XXX,XX +XXX,XX @@ int bnxt_reserve_rings(struct bnxt *bp, bool irq_re_init) return 0; } +static void __bnxt_irq_affinity_notify(struct irq_affinity_notify *notify, + const cpumask_t *mask) +{ + struct bnxt_irq *irq; + u16 tag; + int err; + + irq = container_of(notify, struct bnxt_irq, affinity_notify); + cpumask_copy(irq->cpu_mask, mask); + + if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM, + cpumask_first(irq->cpu_mask), &tag)) + return; + + if (pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag)) + return; + + if (netif_running(irq->bp->dev)) { + rtnl_lock(); + err = netdev_rx_queue_restart(irq->bp->dev, 
irq->ring_nr); + if (err) + netdev_err(irq->bp->dev, + "rx queue restart failed: err=%d\n", err); + rtnl_unlock(); + } +} + +static void __bnxt_irq_affinity_release(struct kref __always_unused *ref) +{ +} + +static void bnxt_release_irq_notifier(struct bnxt_irq *irq) +{ + irq_set_affinity_notifier(irq->vector, NULL); +} + +static void bnxt_register_irq_notifier(struct bnxt *bp, struct bnxt_irq *irq) +{ + struct irq_affinity_notify *notify; + + irq->bp = bp; + + /* Nothing to do if TPH is not enabled */ + if (!bp->tph_mode) + return; + + /* Register IRQ affinity notifier */ + notify = &irq->affinity_notify; + notify->irq = irq->vector; + notify->notify = __bnxt_irq_affinity_notify; + notify->release = __bnxt_irq_affinity_release; + + irq_set_affinity_notifier(irq->vector, notify); +} + static void bnxt_free_irq(struct bnxt *bp) { struct bnxt_irq *irq; @@ -XXX,XX +XXX,XX @@ static void bnxt_free_irq(struct bnxt *bp) free_cpumask_var(irq->cpu_mask); irq->have_cpumask = 0; } + + bnxt_release_irq_notifier(irq); + free_irq(irq->vector, bp->bnapi[i]); } irq->requested = 0; } + + /* Disable TPH support */ + pcie_disable_tph(bp->pdev); + bp->tph_mode = 0; } static int bnxt_request_irq(struct bnxt *bp) @@ -XXX,XX +XXX,XX @@ static int bnxt_request_irq(struct bnxt *bp) #ifdef CONFIG_RFS_ACCEL rmap = bp->dev->rx_cpu_rmap; #endif + + /* Enable TPH support as part of IRQ request */ + rc = pcie_enable_tph(bp->pdev, PCI_TPH_ST_IV_MODE); + if (!rc) + bp->tph_mode = PCI_TPH_ST_IV_MODE; + for (i = 0, j = 0; i < bp->cp_nr_rings; i++) { int map_idx = bnxt_cp_num_to_irq_num(bp, i); struct bnxt_irq *irq = &bp->irq_tbl[map_idx]; @@ -XXX,XX +XXX,XX @@ static int bnxt_request_irq(struct bnxt *bp) if (zalloc_cpumask_var(&irq->cpu_mask, GFP_KERNEL)) { int numa_node = dev_to_node(&bp->pdev->dev); + u16 tag; irq->have_cpumask = 1; + irq->msix_nr = map_idx; + irq->ring_nr = i; cpumask_set_cpu(cpumask_local_spread(i, numa_node), irq->cpu_mask); rc = irq_set_affinity_hint(irq->vector, 
irq->cpu_mask); @@ -XXX,XX +XXX,XX @@ static int bnxt_request_irq(struct bnxt *bp) irq->vector); break; } + + bnxt_register_irq_notifier(bp, irq); + + /* Init ST table entry */ + if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM, + cpumask_first(irq->cpu_mask), + &tag)) + continue; + + pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag); } } return rc; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index XXXXXXX..XXXXXXX 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -XXX,XX +XXX,XX @@ struct bnxt_irq { u8 have_cpumask:1; char name[IFNAMSIZ + BNXT_IRQ_NAME_EXTRA]; cpumask_var_t cpu_mask; + + struct bnxt *bp; + int msix_nr; + int ring_nr; + struct irq_affinity_notify affinity_notify; }; #define HWRM_RING_ALLOC_TX 0x1 @@ -XXX,XX +XXX,XX @@ struct bnxt { struct net_device *dev; struct pci_dev *pdev; + u8 tph_mode; + atomic_t intr_sem; u32 flags; diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c index XXXXXXX..XXXXXXX 100644 --- a/net/core/netdev_rx_queue.c +++ b/net/core/netdev_rx_queue.c @@ -XXX,XX +XXX,XX @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx) return err; } +EXPORT_SYMBOL_GPL(netdev_rx_queue_restart); -- 2.46.0
From: Michael Chan <michael.chan@broadcom.com> Newer firmware can use the NQ ring ID associated with each RX/RX AGG ring to enable PCIe steering tag. Older firmware will just ignore the information. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index XXXXXXX..XXXXXXX 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -XXX,XX +XXX,XX @@ static int hwrm_ring_alloc_send_msg(struct bnxt *bp, /* Association of rx ring with stats context */ grp_info = &bp->grp_info[ring->grp_idx]; + req->nq_ring_id = cpu_to_le16(grp_info->cp_fw_ring_id); req->rx_buf_size = cpu_to_le16(bp->rx_buf_use_size); req->stat_ctx_id = cpu_to_le32(grp_info->fw_stats_ctx); req->enables |= cpu_to_le32( - RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID); + RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID | + RING_ALLOC_REQ_ENABLES_NQ_RING_ID_VALID); if (NET_IP_ALIGN == 2) flags = RING_ALLOC_REQ_FLAGS_RX_SOP_PAD; req->flags = cpu_to_le16(flags); @@ -XXX,XX +XXX,XX @@ static int hwrm_ring_alloc_send_msg(struct bnxt *bp, /* Association of agg ring with rx ring */ grp_info = &bp->grp_info[ring->grp_idx]; req->rx_ring_id = cpu_to_le16(grp_info->rx_fw_ring_id); + req->nq_ring_id = cpu_to_le16(grp_info->cp_fw_ring_id); req->rx_buf_size = cpu_to_le16(BNXT_RX_PAGE_SIZE); req->stat_ctx_id = cpu_to_le32(grp_info->fw_stats_ctx); req->enables |= cpu_to_le32( RING_ALLOC_REQ_ENABLES_RX_RING_ID_VALID | - RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID); + RING_ALLOC_REQ_ENABLES_RX_BUF_SIZE_VALID | + RING_ALLOC_REQ_ENABLES_NQ_RING_ID_VALID); } else { req->ring_type = RING_ALLOC_REQ_RING_TYPE_RX; } -- 
2.46.0