[PATCH v3 00/13] spi: cadence-quadspi: add PHY tuning support

Santhosh Kumar K posted 13 patches 1 week, 4 days ago
Posted by Santhosh Kumar K 1 week, 4 days ago
This series implements PHY tuning support for the Cadence QSPI controller
to enable reliable high-speed operations. Without PHY tuning, controllers
use conservative timing that limits performance. PHY tuning calibrates
RX/TX delay lines to find optimal data capture timing windows, enabling
operation up to the controller's maximum frequency.

Background:
High-speed SPI memory controllers require precise timing calibration for
reliable operation. At higher frequencies, board-to-board variations make
fixed timing parameters inadequate. The Cadence QSPI controller includes
a PHY interface with programmable delay lines (0-127 taps) for RX and TX
paths, but these require runtime calibration to find the valid timing
window.

Approach:
Add SDR/DDR PHY tuning algorithms for the Cadence controller:

SDR Mode Tuning (1D search):
 - Searches for two consecutive valid RX delay windows
 - Selects the larger window and uses its midpoint for maximum margin
 - TX delay fixed at maximum (127) as it's less critical in SDR
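For illustration only, and not the code from patch 8: a minimal sketch of
the 1D search described above, assuming a hypothetical probe callback
(tap_test_fn) that reports whether the tuning pattern reads back correctly
at a given RX tap. The names sdr_pick_rx_tap(), tap_window and
fake_eye_test() are made up for this example.

```c
#include <stdbool.h>

#define PHY_MAX_TAP 127	/* delay lines span taps 0..127 */

/* Hypothetical probe: true if the tuning pattern reads back at rx_tap. */
typedef bool (*tap_test_fn)(int rx_tap, void *ctx);

struct tap_window {
	int start;	/* first passing tap, -1 while unused */
	int end;	/* last passing tap, inclusive */
};

static int window_len(const struct tap_window *w)
{
	return w->start < 0 ? 0 : w->end - w->start + 1;
}

/*
 * Sweep the RX taps, record at most two passing windows and return the
 * midpoint of the larger one, or -1 if no tap passes.
 */
static int sdr_pick_rx_tap(tap_test_fn test, void *ctx)
{
	struct tap_window win[2] = { { -1, -1 }, { -1, -1 } };
	const struct tap_window *best;
	bool in_window = false;
	int n = 0;

	for (int tap = 0; tap <= PHY_MAX_TAP; tap++) {
		if (test(tap, ctx)) {
			if (!in_window) {
				if (n == 2)
					break;	/* both windows already recorded */
				win[n].start = tap;
				in_window = true;
			}
			win[n].end = tap;
		} else if (in_window) {
			in_window = false;
			n++;
		}
	}

	best = window_len(&win[1]) > window_len(&win[0]) ? &win[1] : &win[0];
	if (window_len(best) == 0)
		return -1;
	return best->start + (best->end - best->start) / 2;
}

/* Stand-in for real hardware: two passing ranges, for demonstration only. */
struct fake_eye {
	int lo0, hi0, lo1, hi1;
};

static bool fake_eye_test(int tap, void *ctx)
{
	const struct fake_eye *e = ctx;

	return (tap >= e->lo0 && tap <= e->hi0) ||
	       (tap >= e->lo1 && tap <= e->hi1);
}
```

With passing ranges 10..20 and 60..100, the sketch picks the second window
and returns tap 80. In this model the TX delay would simply stay at 127.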

DDR Mode Tuning (2D search):
 - Finds RX boundaries (rxlow/rxhigh) using TX window sweeps
 - Finds TX boundaries (txlow/txhigh) at fixed RX positions
 - Defines valid region corners and detects gaps via binary search
 - Applies temperature compensation for optimal point selection
 - Handles single or dual passing regions with different strategies
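For illustration only: the cover letter does not spell out how the binary
search is used, so the sketch below shows just one plausible building
block, namely bisecting along a line in the (RX, TX) plane between a
passing and a failing point to locate the edge of the passing region. It
assumes a single pass-to-fail transition on that line. phy_point,
point_test_fn, find_pass_edge() and sum_limit_test() are hypothetical
names, not taken from the driver.

```c
#include <stdbool.h>

struct phy_point {
	int rx;
	int tx;
};

/* Hypothetical probe: true if the pattern reads back at (rx, tx). */
typedef bool (*point_test_fn)(struct phy_point p, void *ctx);

/* Point at step i of n on the segment from a to b. */
static struct phy_point lerp_point(struct phy_point a, struct phy_point b,
				   int i, int n)
{
	struct phy_point p = {
		a.rx + (b.rx - a.rx) * i / n,
		a.tx + (b.tx - a.tx) * i / n,
	};

	return p;
}

/*
 * Bisect between a (assumed passing) and b (assumed failing) and return
 * the last passing point. Invariant: step lo passes, step hi fails.
 */
static struct phy_point find_pass_edge(struct phy_point a, struct phy_point b,
				       int steps, point_test_fn test, void *ctx)
{
	int lo = 0, hi = steps;

	while (hi - lo > 1) {
		int mid = lo + (hi - lo) / 2;

		if (test(lerp_point(a, b, mid, steps), ctx))
			lo = mid;
		else
			hi = mid;
	}
	return lerp_point(a, b, lo, steps);
}

/* Stand-in for real hardware: passes while rx + tx stays within a limit. */
static bool sum_limit_test(struct phy_point p, void *ctx)
{
	const int *limit = ctx;

	return p.rx + p.tx <= *limit;
}
```

Bisection needs about log2(steps) pattern reads per edge instead of a full
sweep, which is presumably why a binary search is attractive here.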

Patch description:
Infrastructure (1-5):
 - Patch 1:   Extend spi-max-frequency DT binding to accept an optional
              second value forming a [base-freq, max-freq] pair
 - Patch 2:   Add cadence-specific cdns,phy-pattern-partition phandle for
              NOR flash PHY tuning pattern location
 - Patch 3:   Parse two-element spi-max-frequency in spi.c; adds
              spi_device.base_speed_hz (0 when a single value is used,
              keeping all existing DT fully compatible)
 - Patch 4:   Add spi_mem_apply_base_freq_cap(), called from
              spi_mem_exec_op() to cap non-PHY ops to base_speed_hz;
              tuned ops bypass the cap because execute_tuning() marks
              them with op->max_freq = max_speed_hz
 - Patch 5:   Add execute_tuning callback to spi_controller_mem_ops and
              spi_mem_execute_tuning() wrapper in SPI-MEM core
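For illustration only: a userspace model of the behaviour described for
patches 3 and 4, not the kernel code. It assumes that an op frequency of 0
means "unset" and that the pair must satisfy base <= max; both are
assumptions of this sketch. freq_cfg, parse_max_frequency() and
effective_op_freq() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct freq_cfg {
	uint32_t base_speed_hz;	/* 0 when a single value is given */
	uint32_t max_speed_hz;
};

/* Accept either <max> or <base max>; reject anything else. */
static int parse_max_frequency(const uint32_t *vals, size_t n,
			       struct freq_cfg *cfg)
{
	if (n == 1) {
		cfg->base_speed_hz = 0;
		cfg->max_speed_hz = vals[0];
		return 0;
	}
	if (n == 2 && vals[0] <= vals[1]) {	/* ordering check is assumed */
		cfg->base_speed_hz = vals[0];
		cfg->max_speed_hz = vals[1];
		return 0;
	}
	return -1;
}

/*
 * Ops marked with max_freq == max_speed_hz are treated as tuned and
 * bypass the cap; every other op is limited to base_speed_hz when a
 * pair was given.
 */
static uint32_t effective_op_freq(const struct freq_cfg *cfg,
				  uint32_t op_max_freq)
{
	uint32_t limit = cfg->max_speed_hz;

	if (cfg->base_speed_hz && op_max_freq != cfg->max_speed_hz)
		limit = cfg->base_speed_hz;

	if (!op_max_freq || op_max_freq > limit)
		return limit;
	return op_max_freq;
}
```

With a pair of 25 MHz and 166 MHz, an op with no frequency set runs at
25 MHz, while an op marked with 166 MHz keeps 166 MHz. With a single value
the cap is inactive, which matches the backward-compatibility claim above.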

Cadence QSPI Implementation (6-10):
 - Patch 6:   Move cqspi_readdata_capture() earlier (preparatory)
 - Patch 7:   Add DQS bit to cqspi_readdata_capture() (preparatory)
 - Patch 8:   Add complete PHY tuning support: DLL management, pattern
              verification (NOR via cdns,phy-pattern-partition phandle,
              NAND via write-to-cache), SDR 1D and DDR 2D search
              algorithms with temperature compensation, AM654-specific
              execute_tuning entry point; base_speed_hz is cleared during
              the tuning loop and restored unconditionally on return
 - Patch 9:   Reject 2-byte-address DDR operations via a new
              CQSPI_NO_2BYTE_ADDR_PHY_DDR quirk flag to work around
              AM654 OSPI erratum i2383
 - Patch 10:  Enable PHY for direct memory-mapped reads (aligned body
              region only; unaligned head and tail run without PHY) and
              for indirect writes >= 1 KB
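For illustration only: a sketch of how a direct read could be split into
an unaligned head, an aligned body and an unaligned tail, with only the
body using the PHY. The alignment is a parameter because the cover letter
does not state it; the value 16 in the note below is purely an example.
read_split, split_read() and write_uses_phy() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct read_split {
	size_t head;	/* bytes before the first aligned address, no PHY */
	size_t body;	/* aligned bytes, read with PHY enabled */
	size_t tail;	/* bytes after the last aligned chunk, no PHY */
};

/* align must be non-zero. */
static struct read_split split_read(uint64_t from, size_t len, size_t align)
{
	struct read_split s = { 0, 0, 0 };
	size_t head = (size_t)((align - (from % align)) % align);

	if (head >= len) {
		/* Too short to reach an aligned address: no PHY at all. */
		s.head = len;
		return s;
	}
	s.head = head;
	s.body = ((len - head) / align) * align;
	s.tail = len - head - s.body;
	return s;
}

/* Indirect writes use the PHY only from 1 KB upwards. */
static bool write_uses_phy(size_t len)
{
	return len >= 1024;
}
```

For example, a 100-byte read starting at offset 5 with an assumed
alignment of 16 splits into an 11-byte head, an 80-byte body and a 9-byte
tail.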

MTD core (11-13):
 - Patch 11:  Integrate tuning in SPI-NAND probe; propagate the validated
              frequency to all plane dirmaps (primary and secondary op
              templates) and to the persistent write dirmap template
 - Patch 12:  Extract spi_nor_spimem_get_read_op() helper (preparatory)
 - Patch 13:  Integrate tuning in SPI-NOR probe; patch the dirmap op
              template with the validated frequency; store the result in
              nor->max_read_op so all subsequent reads (dirmap and direct)
              pick up the tuned speed automatically

Series dependency:
Merge after: https://lore.kernel.org/linux-spi/20260527173736.2243004-1-s-k6@ti.com/T/#u

Testing:
This series was tested on TI's
AM62Ax SK with OSPI NAND flash and
AM62Px SK with OSPI NOR flash:

Read throughput:
|-------------------------------------|
|           | without PHY | with PHY  |
|-------------------------------------|
| OSPI NOR  | 37.5 MB/s   | 216 MB/s  |
|-------------------------------------|
| OSPI NAND | 9.2 MB/s    | 35.1 MB/s |
|-------------------------------------|

Write throughput:
|-------------------------------------|
|           | without PHY | with PHY  |
|-------------------------------------|
| OSPI NAND | 6 MB/s      | 9.2 MB/s  |
|-------------------------------------|

Test log: https://gist.github.com/santhosh21/3434d062f31622c5877a375218cd49c7
Repo: https://github.com/santhosh21/linux/commits/phy_tuning_v3/

Changes in v3:
 - Drop spi-has-dqs DT property; DQS is now enabled automatically when
   the selected read operation uses DDR signalling (dtr flags in the op)
 - Extend spi-max-frequency to accept an optional second value forming a
   [base-freq, max-freq] pair; the presence of two values signals PHY
   tuning intent and encodes both the conservative base speed and the
   calibration target in one property
 - Add base_speed_hz to struct spi_device (spi.c/spi.h) and parse the
   two-element array there; single-value DT is fully backward-compatible
 - Move frequency enforcement from the cadence driver to core: new
   spi_mem_apply_base_freq_cap() called from spi_mem_exec_op() replaces
   the per-driver cqspi_op_matches_tuned() and non_phy_clk_rate field
 - Propagate the tuned max_freq to dirmap op templates after
   execute_tuning() succeeds; store persistent op templates in
   spi_nor.max_read_op and spinand.{max_read,max_write}_op so the
   frequency writeback survives across the probe call
 - Replace NOR pattern partition lookup by name with a
   cdns,phy-pattern-partition DT phandle pointing directly to the
   partition node
 - Add CQSPI_NO_2BYTE_ADDR_PHY_DDR quirk and reject 2-byte-address DDR
   ops in cqspi_supports_mem_op() to work around AM654 erratum i2383
 - Remove RFC tag
 - Rebase on v7.1-rc5
 - Collect tags from Miquel
 - Link to v2: https://lore.kernel.org/linux-spi/20260113141617.1905039-1-s-k6@ti.com/

Changes in v2:
 - Call .execute_tuning() from the spi-mem clients instead of mtdcore,
   passing the best read_op and, optionally, a write_op
 - Add a compatible-specific .execute_tuning() callback, which
   spi_mem_execute_tuning() calls if it exists
 - Let the controller, rather than the spi-mem clients, decide whether
   tuning is required
 - Write the phy_pattern to the cache if a relevant write_op is passed;
   otherwise look up the offset of the partition that contains the
   phy_pattern
 - Add tuning algorithm for DDR mode
 - Add support for DQS
 - Restrict PHY frequency to tuned operations
 - Link to v1: https://lore.kernel.org/linux-spi/20250811193219.731851-1-s-k6@ti.com/

Signed-off-by: Santhosh Kumar K <s-k6@ti.com>

Pratyush Yadav (1):
  mtd: spi-nor: extract read op template construction into helper

Santhosh Kumar K (12):
  spi: dt-bindings: allow spi-max-frequency to specify a frequency pair
  spi: dt-bindings: cdns,qspi-nor: add PHY tuning pattern partition
    property
  spi: parse two-element spi-max-frequency property
  spi: spi-mem: add spi_mem_apply_base_freq_cap()
  spi: spi-mem: add execute_tuning callback and spi_mem_execute_tuning()
  spi: cadence-quadspi: move cqspi_readdata_capture earlier
  spi: cadence-quadspi: add DQS support to read data capture
  spi: cadence-quadspi: add PHY tuning support
  spi: cadence-quadspi: reject 2-byte-address DDR ops on PHY-tunable
    hardware
  spi: cadence-quadspi: enable PHY for direct reads and indirect writes
  mtd: spinand: run PHY tuning after init and update dirmap frequencies
  mtd: spi-nor: run PHY tuning after init and update dirmap frequency

 .../spi/cdns,qspi-nor-peripheral-props.yaml   |    8 +
 .../bindings/spi/spi-peripheral-props.yaml    |   10 +-
 drivers/mtd/nand/spi/core.c                   |   35 +
 drivers/mtd/spi-nor/core.c                    |   85 +-
 drivers/spi/spi-cadence-quadspi.c             | 2267 +++++++++++++++--
 drivers/spi/spi-mem.c                         |   57 +-
 drivers/spi/spi.c                             |   17 +-
 include/linux/mtd/spi-nor.h                   |    3 +
 include/linux/mtd/spinand.h                   |    4 +
 include/linux/spi/spi-mem.h                   |   10 +
 include/linux/spi/spi.h                       |    2 +
 11 files changed, 2308 insertions(+), 190 deletions(-)

-- 
2.34.1
Re: [PATCH v3 00/13] spi: cadence-quadspi: add PHY tuning support
Posted by Miquel Raynal 1 week, 4 days ago
Hi Santhosh,

Very happy to see this v3! Looks pretty neat overall.

On 27/05/2026 at 23:25:14 +0530, Santhosh Kumar K <s-k6@ti.com> wrote:

> This series implements PHY tuning support for the Cadence QSPI controller
> to enable reliable high-speed operations. Without PHY tuning, controllers
> use conservative timing that limits performance. PHY tuning calibrates
> RX/TX delay lines to find optimal data capture timing windows, enabling
> operation up to the controller's maximum frequency.
>
> Background:
> High-speed SPI memory controllers require precise timing calibration for
> reliable operation. At higher frequencies, board-to-board variations make
> fixed timing parameters inadequate. The Cadence QSPI controller includes
> a PHY interface with programmable delay lines (0-127 taps) for RX and TX
> paths, but these require runtime calibration to find the valid timing
> window.
>
> Approach:
> Add SDR/DDR PHY tuning algorithms for the Cadence controller:
>
> SDR Mode Tuning (1D search):
>  - Searches for two consecutive valid RX delay windows
>  - Selects the larger window and uses its midpoint for maximum margin
>  - TX delay fixed at maximum (127) as it's less critical in SDR
>
> DDR Mode Tuning (2D search):
>  - Finds RX boundaries (rxlow/rxhigh) using TX window sweeps
>  - Finds TX boundaries (txlow/txhigh) at fixed RX positions
>  - Defines valid region corners and detects gaps via binary search
>  - Applies temperature compensation for optimal point selection
>  - Handles single or dual passing regions with different strategies
>
> Patch description:
> Infrastructure (1-5):
>  - Patch 1:   Extend spi-max-frequency DT binding to accept an optional
>               second value forming a [base-freq, max-freq] pair
>  - Patch 2:   Add cadence-specific cdns,phy-pattern-partition phandle for
>               NOR flash PHY tuning pattern location
>  - Patch 3:   Parse two-element spi-max-frequency in spi.c; adds
>               spi_device.base_speed_hz (0 when a single value is used,
>               keeping all existing DT fully compatible)
>  - Patch 4:   Add spi_mem_apply_base_freq_cap(), called from
>               spi_mem_exec_op() to cap non-PHY ops to base_speed_hz;
>               tuned ops bypass the cap because execute_tuning() marks
>               them with op->max_freq = max_speed_hz
>  - Patch 5:   Add execute_tuning callback to spi_controller_mem_ops and
>               spi_mem_execute_tuning() wrapper in SPI-MEM core
>
> Cadence QSPI Implementation (6-10):
>  - Patch 6:   Move cqspi_readdata_capture() earlier (preparatory)
>  - Patch 7:   Add DQS bit to cqspi_readdata_capture() (preparatory)
>  - Patch 8:   Add complete PHY tuning support: DLL management, pattern
>               verification (NOR via cdns,phy-pattern-partition phandle,
>               NAND via write-to-cache), SDR 1D and DDR 2D search
>               algorithms with temperature compensation, AM654-specific
>               execute_tuning entry point; base_speed_hz is cleared during
>               the tuning loop and restored unconditionally on return
>  - Patch 9:   Reject 2-byte-address DDR operations via a new
>               CQSPI_NO_2BYTE_ADDR_PHY_DDR quirk flag to work around
>               AM654 OSPI erratum i2383
>  - Patch 10:  Enable PHY for direct memory-mapped reads (aligned body
>               region only; unaligned head and tail run without PHY) and
>               for indirect writes >= 1 KB
>
> MTD core (11-13):
>  - Patch 11:  Integrate tuning in SPI-NAND probe; propagate the validated
>               frequency to all plane dirmaps (primary and secondary op
>               templates) and to the persistent write dirmap template
>  - Patch 12:  Extract spi_nor_spimem_get_read_op() helper (preparatory)
>  - Patch 13:  Integrate tuning in SPI-NOR probe; patch the dirmap op
>               template with the validated frequency; store the result in
>               nor->max_read_op so all subsequent reads (dirmap and direct)
>               pick up the tuned speed automatically
>
> Series dependency:
> Merge after:
> https://lore.kernel.org/linux-spi/20260527173736.2243004-1-s-k6@ti.com/T/#u

Isn't the DQS series a prerequisite as well? I sent it as an RFC, we can
definitely consider it for merge together with this series once
ready.

Link: https://lore.kernel.org/linux-mtd/20260205-winbond-nand-next-phy-tuning-v1-0-5e7d3976f0f1@bootlin.com/

Do you confirm that you have "[PATCH DO NOT MERGE RFC 4/4] spi: cadence-qspi: Retrieve
DQS capability using the core helper" in your branch for the PHY tuning
series to work?

> Testing:
> This series was tested on TI's
> AM62Ax SK with OSPI NAND flash and
> AM62Px SK with OSPI NOR flash:
>
> Read throughput:
> |-------------------------------------|
> |           | without PHY | with PHY  |
> |-------------------------------------|
> | OSPI NOR  | 37.5 MB/s   | 216 MB/s  |

I am impressed by the SPI NOR improvement o_O

> |-------------------------------------|
> | OSPI NAND | 9.2 MB/s    | 35.1 MB/s |
> |-------------------------------------|

Was this tested in 8D-8D-8D mode?

> Write throughput:
> |-------------------------------------|
> |           | without PHY | with PHY  |
> |-------------------------------------|
> | OSPI NAND | 6 MB/s      | 9.2 MB/s  |
> |-------------------------------------|

Thanks,
Miquèl
Re: [PATCH v3 00/13] spi: cadence-quadspi: add PHY tuning support
Posted by Santhosh Kumar K 1 week ago
Hello Miquel,

On 28/05/26 14:00, Miquel Raynal wrote:
> Hi Santhosh,
> 
> Very happy to see this v3! Looks pretty neat overall.
> 
> On 27/05/2026 at 23:25:14 +0530, Santhosh Kumar K <s-k6@ti.com> wrote:
> 
>> This series implements PHY tuning support for the Cadence QSPI controller
>> to enable reliable high-speed operations. Without PHY tuning, controllers
>> use conservative timing that limits performance. PHY tuning calibrates
>> RX/TX delay lines to find optimal data capture timing windows, enabling
>> operation up to the controller's maximum frequency.
>>
>> Background:
>> High-speed SPI memory controllers require precise timing calibration for
>> reliable operation. At higher frequencies, board-to-board variations make
>> fixed timing parameters inadequate. The Cadence QSPI controller includes
>> a PHY interface with programmable delay lines (0-127 taps) for RX and TX
>> paths, but these require runtime calibration to find the valid timing
>> window.
>>
>> Approach:
>> Add SDR/DDR PHY tuning algorithms for the Cadence controller:
>>
>> SDR Mode Tuning (1D search):
>>   - Searches for two consecutive valid RX delay windows
>>   - Selects the larger window and uses its midpoint for maximum margin
>>   - TX delay fixed at maximum (127) as it's less critical in SDR
>>
>> DDR Mode Tuning (2D search):
>>   - Finds RX boundaries (rxlow/rxhigh) using TX window sweeps
>>   - Finds TX boundaries (txlow/txhigh) at fixed RX positions
>>   - Defines valid region corners and detects gaps via binary search
>>   - Applies temperature compensation for optimal point selection
>>   - Handles single or dual passing regions with different strategies
>>
>> Patch description:
>> Infrastructure (1-5):
>>   - Patch 1:   Extend spi-max-frequency DT binding to accept an optional
>>                second value forming a [base-freq, max-freq] pair
>>   - Patch 2:   Add cadence-specific cdns,phy-pattern-partition phandle for
>>                NOR flash PHY tuning pattern location
>>   - Patch 3:   Parse two-element spi-max-frequency in spi.c; adds
>>                spi_device.base_speed_hz (0 when a single value is used,
>>                keeping all existing DT fully compatible)
>>   - Patch 4:   Add spi_mem_apply_base_freq_cap(), called from
>>                spi_mem_exec_op() to cap non-PHY ops to base_speed_hz;
>>                tuned ops bypass the cap because execute_tuning() marks
>>                them with op->max_freq = max_speed_hz
>>   - Patch 5:   Add execute_tuning callback to spi_controller_mem_ops and
>>                spi_mem_execute_tuning() wrapper in SPI-MEM core
>>
>> Cadence QSPI Implementation (6-10):
>>   - Patch 6:   Move cqspi_readdata_capture() earlier (preparatory)
>>   - Patch 7:   Add DQS bit to cqspi_readdata_capture() (preparatory)
>>   - Patch 8:   Add complete PHY tuning support: DLL management, pattern
>>                verification (NOR via cdns,phy-pattern-partition phandle,
>>                NAND via write-to-cache), SDR 1D and DDR 2D search
>>                algorithms with temperature compensation, AM654-specific
>>                execute_tuning entry point; base_speed_hz is cleared during
>>                the tuning loop and restored unconditionally on return
>>   - Patch 9:   Reject 2-byte-address DDR operations via a new
>>                CQSPI_NO_2BYTE_ADDR_PHY_DDR quirk flag to work around
>>                AM654 OSPI erratum i2383
>>   - Patch 10:  Enable PHY for direct memory-mapped reads (aligned body
>>                region only; unaligned head and tail run without PHY) and
>>                for indirect writes >= 1 KB
>>
>> MTD core (11-13):
>>   - Patch 11:  Integrate tuning in SPI-NAND probe; propagate the validated
>>                frequency to all plane dirmaps (primary and secondary op
>>                templates) and to the persistent write dirmap template
>>   - Patch 12:  Extract spi_nor_spimem_get_read_op() helper (preparatory)
>>   - Patch 13:  Integrate tuning in SPI-NOR probe; patch the dirmap op
>>                template with the validated frequency; store the result in
>>                nor->max_read_op so all subsequent reads (dirmap and direct)
>>                pick up the tuned speed automatically
>>
>> Series dependency:
>> Merge after:
>> https://lore.kernel.org/linux-spi/20260527173736.2243004-1-s-k6@ti.com/T/#u
> 
> Isn't the DQS series a prerequisite as well? I sent it as an RFC, we can
> definitely consider it for merge together with this series once
> ready.
> 
> Link: https://lore.kernel.org/linux-mtd/20260205-winbond-nand-next-phy-tuning-v1-0-5e7d3976f0f1@bootlin.com/
> 
> Do you confirm that you have "[PATCH DO NOT MERGE RFC 4/4] spi: cadence-qspi: Retrieve
> DQS capability using the core helper" in your branch for the PHY tuning
> series to work?

The DQS configuration is now derived from the selected read_op variant
(SDR vs DDR), which in turn selects the corresponding tuning algorithm.
The SDR and DDR tuning algorithms are designed such that SDR tuning runs
with DQS disabled, while DDR tuning runs with DQS enabled.

Because of this, the DQS support series is no longer a prerequisite for
the PHY tuning series. However, it can be a useful follow-up to refine
the implementation: once use_dqs is set, we can additionally check
has_dqs to ensure that the flash advertises DQS support before enabling
it.
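For illustration only: a sketch of the selection logic described above,
including one possible reading of the follow-up has_dqs check. How the
driver decides that a read op is DDR, and what it does when a DDR op meets
a flash without DQS, are not stated here, so both are simplified into
plain booleans. tuning_plan and plan_tuning() are hypothetical names.

```c
#include <stdbool.h>

enum tuning_algo {
	TUNE_SDR_1D,	/* 1D RX search, DQS disabled */
	TUNE_DDR_2D,	/* 2D RX/TX search, DQS enabled */
};

struct tuning_plan {
	enum tuning_algo algo;
	bool use_dqs;
};

static struct tuning_plan plan_tuning(bool read_op_is_ddr, bool flash_has_dqs)
{
	struct tuning_plan plan;

	/* The read op variant alone selects the algorithm. */
	plan.algo = read_op_is_ddr ? TUNE_DDR_2D : TUNE_SDR_1D;
	plan.use_dqs = read_op_is_ddr;

	/* Possible follow-up: honour what the flash advertises. */
	if (plan.use_dqs && !flash_has_dqs)
		plan.use_dqs = false;

	return plan;
}
```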

> 
>> Testing:
>> This series was tested on TI's
>> AM62Ax SK with OSPI NAND flash and
>> AM62Px SK with OSPI NOR flash:
>>
>> Read throughput:
>> |-------------------------------------|
>> |           | without PHY | with PHY  |
>> |-------------------------------------|
>> | OSPI NOR  | 37.5 MB/s   | 216 MB/s  |
> 
> I am impressed by the SPI NOR improvement o_O
> 
>> |-------------------------------------|
>> | OSPI NAND | 9.2 MB/s    | 35.1 MB/s |
>> |-------------------------------------|
> 
> Was this tested in 8D-8D-8D mode?

Tested in 8S-PHY mode. 8D-PHY mode is not supported with 2-byte
addressing due to erratum i2383. [0]

[0] https://www.ti.com/lit/er/sprz544c/sprz544c.pdf
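For illustration only: the check from patch 9 amounts to something like
the sketch below. The flag name comes from the series, but its bit
position and the helper ddr_op_allowed() are placeholders for this
example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag name from the series; the bit position is a placeholder. */
#define CQSPI_NO_2BYTE_ADDR_PHY_DDR	(1u << 0)

/* Reject DDR ops with a 2-byte address on controllers with the quirk. */
static bool ddr_op_allowed(uint32_t quirks, bool op_is_ddr,
			   unsigned int addr_nbytes)
{
	if ((quirks & CQSPI_NO_2BYTE_ADDR_PHY_DDR) && op_is_ddr &&
	    addr_nbytes == 2)
		return false;
	return true;
}
```

Under this model, a flash that uses 2-byte addresses falls back to an SDR
read op on affected controllers.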

Regards,
Santhosh.

> 
>> Write throughput:
>> |-------------------------------------|
>> |           | without PHY | with PHY  |
>> |-------------------------------------|
>> | OSPI NAND | 6 MB/s      | 9.2 MB/s  |
>> |-------------------------------------|
> 
> Thanks,
> Miquèl