From: Frieder Schrempf <frieder.schrempf@kontron.de>
With clock rates changing on a per-op basis, we need to make sure
that we meet the RX sampling point delay constraint of the underlying
SPI chip.
Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
---
drivers/spi/spi-mem.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
index a09371a075d2e..6b8bee7d6f5e3 100644
--- a/drivers/spi/spi-mem.c
+++ b/drivers/spi/spi-mem.c
@@ -589,6 +589,8 @@ void spi_mem_adjust_op_freq(struct spi_mem *mem, struct spi_mem_op *op)
{
if (!op->max_freq || op->max_freq > mem->spi->max_speed_hz)
op->max_freq = mem->spi->max_speed_hz;
+
+ op->max_freq = spi_set_rx_sampling_point(mem->spi, op->max_freq);
}
EXPORT_SYMBOL_GPL(spi_mem_adjust_op_freq);
--
2.53.0
Hi Frieder,
First, thanks for the series! I believe it will conflict with the SPI
tuning series as you mentioned, but we should be able to make them live
side by side.
On 03/03/2026 at 17:29:27 +01, Frieder Schrempf <frieder@fris.de> wrote:
> From: Frieder Schrempf <frieder.schrempf@kontron.de>
>
> With clock rates changing on a per-op basis, we need to make sure
> that we meet the RX sampling point delay constraint of the underlying
> SPI chip.
>
> Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
> ---
> drivers/spi/spi-mem.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
> index a09371a075d2e..6b8bee7d6f5e3 100644
> --- a/drivers/spi/spi-mem.c
> +++ b/drivers/spi/spi-mem.c
> @@ -589,6 +589,8 @@ void spi_mem_adjust_op_freq(struct spi_mem *mem, struct spi_mem_op *op)
> {
> if (!op->max_freq || op->max_freq > mem->spi->max_speed_hz)
> op->max_freq = mem->spi->max_speed_hz;
> +
> + op->max_freq = spi_set_rx_sampling_point(mem->spi,
> op->max_freq);
How can this work with the above 2 lines? Maybe there will be the need
to use a "min of the max" frequencies?
Thanks,
Miquèl
Hi Miquèl,
On 09.03.26 16:09, Miquel Raynal wrote:
> Hi Frieder,
>
> First, thanks for the series! I believe it will conflict with the SPI
> tuning series as you mentioned, but we should be able to make them live
> side by side.
Ok, good! Thanks for your feedback!
>
> On 03/03/2026 at 17:29:27 +01, Frieder Schrempf <frieder@fris.de> wrote:
>
>> From: Frieder Schrempf <frieder.schrempf@kontron.de>
>>
>> With clock rates changing on a per-op basis, we need to make sure
>> that we meet the RX sampling point delay constraint of the underlying
>> SPI chip.
>>
>> Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
>> ---
>> drivers/spi/spi-mem.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
>> index a09371a075d2e..6b8bee7d6f5e3 100644
>> --- a/drivers/spi/spi-mem.c
>> +++ b/drivers/spi/spi-mem.c
>> @@ -589,6 +589,8 @@ void spi_mem_adjust_op_freq(struct spi_mem *mem, struct spi_mem_op *op)
>> {
>> if (!op->max_freq || op->max_freq > mem->spi->max_speed_hz)
>> op->max_freq = mem->spi->max_speed_hz;
>> +
>> + op->max_freq = spi_set_rx_sampling_point(mem->spi,
>> op->max_freq);
>
> How can this work with the above 2 lines? Maybe there will be the need
> to use a "min of the max" frequencies?
The sampling point delay is coupled to the clock frequency. So if the
clock changes per-op, we also need to change the sampling delay per-op.
In general, if we want to avoid switching the parameters back and forth
in cases of alternating ops with different max frequencies, we should
maybe do some "min of the max" calculation and use the resulting
frequency for all ops.
Is that what you mean?
We could even set a threshold to decide between using a common "min of
the max" frequency or do the switching per-op.
One other possibility would be to somehow cache the per-op frequencies
and calculated sampling delay values so they can be reused when
switching without much overhead.
There is one more issue that is not yet covered by this series: Before
spinand_id_detect() we don't know yet what RX sampling delay value the
chip requires, but we already use read-status and read-id operations at
maximum chip clock.
For example on Winbond W25N04KV this leads to detection failures. So we
should maybe introduce some kind of reduced clock setpoint for the
initial detection, that is safe to be used without RX sampling delay
compensation.
Thanks
Frieder
Hi Frieder, > The sampling point delay is coupled to the clock frequency. So if the > clock changes per-op, we also need to change the sampling delay per-op. > > In general, if we want to avoid switching the parameters back and forth > in cases of alternating ops with different max frequencies, we should > maybe do some "min of the max" calculation and use the resulting > frequency for all ops. > > Is that what you mean? Not exactly, I am not afraid by the time it would take to switch, I am afraid by the likelihood that both the PHY tuning series and your series might want to force a different maximum frequency. For instance with TI designs, when entering PHY mode, the frequency is 166MHz, period. You cannot lower it because by design, you bypass the SPI divisors. > We could even set a threshold to decide between using a common "min of > the max" frequency or do the switching per-op. > > One other possibility would be to somehow cache the per-op frequencies > and calculated sampling delay values so they can be reused when > switching without much overhead. > > There is one more issue that is not yet covered by this series: Before > spinand_id_detect() we don't know yet what RX sampling delay value the > chip requires, but we already use read-status and read-id operations at > maximum chip clock. Not exactly "maximum chip clock" as in Santhosh's work, but indeed we run these at the frequency given in the DT as bein the maximum frequency. If that's not correct, you must lower it in the DT. That is why I am in favour of having two maximum frequencies: the standard one, that just works, which is the one for non optimized settings (the one we actually use today) and another one, when tuning the bus timings. > For example on Winbond W25N04KV this leads to detection failures. So we > should maybe introduce some kind of reduced clock setpoint for the > initial detection, that is safe to be used without RX sampling delay > compensation. That should be the DT max frequency, no? Thanks, Miquèl
Hi Miquèl, On 31.03.26 11:23, Miquel Raynal wrote: > Hi Frieder, > >> The sampling point delay is coupled to the clock frequency. So if the >> clock changes per-op, we also need to change the sampling delay per-op. >> >> In general, if we want to avoid switching the parameters back and forth >> in cases of alternating ops with different max frequencies, we should >> maybe do some "min of the max" calculation and use the resulting >> frequency for all ops. >> >> Is that what you mean? > > Not exactly, I am not afraid by the time it would take to switch, I am > afraid by the likelihood that both the PHY tuning series and your series > might want to force a different maximum frequency. For instance with TI > designs, when entering PHY mode, the frequency is 166MHz, period. You > cannot lower it because by design, you bypass the SPI divisors. Ah, ok. I didn't know the PHY mode bypasses the usual clock setup. > >> We could even set a threshold to decide between using a common "min of >> the max" frequency or do the switching per-op. >> >> One other possibility would be to somehow cache the per-op frequencies >> and calculated sampling delay values so they can be reused when >> switching without much overhead. >> >> There is one more issue that is not yet covered by this series: Before >> spinand_id_detect() we don't know yet what RX sampling delay value the >> chip requires, but we already use read-status and read-id operations at >> maximum chip clock. > > Not exactly "maximum chip clock" as in Santhosh's work, but indeed we > run these at the frequency given in the DT as bein the maximum > frequency. If that's not correct, you must lower it in the DT. That is > why I am in favour of having two maximum frequencies: the standard one, > that just works, which is the one for non optimized settings (the one we > actually use today) and another one, when tuning the bus timings. > >> For example on Winbond W25N04KV this leads to detection failures. So we >> should maybe introduce some kind of reduced clock setpoint for the >> initial detection, that is safe to be used without RX sampling delay >> compensation. > > That should be the DT max frequency, no? Hm, I've seen it the other way round. In my perspective the DT max frequency describes the maximum clock frequency supported by the chip as given in the datasheet (assuming there are no other limitations introduced by board design, etc.). The RX sampling delay is a timing parameter specified in the datasheet and it is relevant in order to interface the chip at maximum frequency within spec. Your approach would mean, that we need to manually calculate the maximum supported clock frequency without RX sampling delay compensation and use this value in the DT. Then let the driver increase the clock if RX sampling delay compensation is available. This would have the advantage, that even before initial detection we would use the correct clock by default. But it has the downside that the clock value in DT won't match the datasheet value. In my approach we assume the driver is able to use the maximum clock from the DT (same as in datasheet) by using RX sampling delay compensation and only if it turns out the compensation is not possible we will lower the clock dynamically. What if we do the following? 1. Add a parameter in the SPI NAND chip driver that contains the maximum supported clock frequency as given in the datasheet as this is clearly the best place to keep this device-specific parameter. 2. Allow to leave spi-max-frequency in DT unset for SPI NANDs in case there are no board-specific limitations and therefore the maximum frequency supported by the combination of chip and controller can be used. 3. By default limit the clock frequency for the READ_ID op to a safe value that works for all chips and controllers, even if RX sampling delay compensations is not available. 4. In PHY mode, check against the DT max frequency (board limitations) and chip max frequency before switching to this mode. I hope this makes at least a bit of sense. I'm really not yet familiar with all the different use-cases and limitations. Best regards Frieder
Hello, >>> For example on Winbond W25N04KV this leads to detection failures. So we >>> should maybe introduce some kind of reduced clock setpoint for the >>> initial detection, that is safe to be used without RX sampling delay >>> compensation. >> >> That should be the DT max frequency, no? > > Hm, I've seen it the other way round. In my perspective the DT max > frequency describes the maximum clock frequency supported by the chip as > given in the datasheet (assuming there are no other limitations > introduced by board design, etc.). I believe that what is in the datasheet should instead be statically described in the SPI NAND vendor file. The max SPI frequency DT property is here to tell at which speed to use the device, and this depends mostly on the routing. > The RX sampling delay is a timing parameter specified in the datasheet > and it is relevant in order to interface the chip at maximum frequency > within spec. If it's not design dependent, then we should handle it "statically". > Your approach would mean, that we need to manually calculate the maximum > supported clock frequency without RX sampling delay compensation and use > this value in the DT. Then let the driver increase the clock if RX > sampling delay compensation is available. > > This would have the advantage, that even before initial detection we > would use the correct clock by default. But it has the downside that the > clock value in DT won't match the datasheet value. It never does. Winbond chips can run at 166MHz. I know no design capable of that natively (ie. without finer tuning, like what Santhosh proposes). > In my approach we assume the driver is able to use the maximum clock > from the DT (same as in datasheet) by using RX sampling delay > compensation and only if it turns out the compensation is not possible > we will lower the clock dynamically. > > What if we do the following? > > 1. Add a parameter in the SPI NAND chip driver that contains the maximum > supported clock frequency as given in the datasheet as this is clearly > the best place to keep this device-specific parameter. I've so far avoided doing it, but yes, this is something we can do. It is going to be simple enough to implement as all the infrastructure is already there. You can provide variants with speed limitations (look into winbond.c). I was writing "0" in the fields which didn't need a limitation that is something else than the maximum speed the chip can sustain, as anyway the DT property *will* tell us what is this speed, if we are ever able to reach it. > 2. Allow to leave spi-max-frequency in DT unset for SPI NANDs in case > there are no board-specific limitations and therefore the maximum > frequency supported by the combination of chip and controller can be > used. In many cases, the limitations are board specific. Anyway, this is already the case, you may just not put any value in this property. > 3. By default limit the clock frequency for the READ_ID op to a safe > value that works for all chips and controllers, even if RX sampling > delay compensations is not available. I am sorry this is not going to work out. There is no such harmonized speed, differences can be an order (or two) of magnitude different, we do not want to pay the penalty of a very slow identification on all designs. We currently do the read several times with different parameters. What if we were trying one more time by cutting the speed by 2 (completely random proposal)? > 4. In PHY mode, check against the DT max frequency (board limitations) > and chip max frequency before switching to this mode. There is not such thing :-) the max frequency in DT currently would be set to the platform limitation. We need somehow another parameter indicating what is the maximum speed if we can fine tune the timings. > I hope this makes at least a bit of sense. I'm really not yet familiar > with all the different use-cases and limitations. If I get back to your own issue. Is there any chance we could avoid tweaking the DT for handling it? Would there be a configuration of your controller that would allow reading the ID of all the chips with enough delays? Currently, the spi core doesn't care much about other parameters than the frequency, but there is perhaps room for improvement. Thanks, Miquèl
On 31.03.26 17:26, Miquel Raynal wrote: > Hello, > >>>> For example on Winbond W25N04KV this leads to detection failures. So we >>>> should maybe introduce some kind of reduced clock setpoint for the >>>> initial detection, that is safe to be used without RX sampling delay >>>> compensation. >>> >>> That should be the DT max frequency, no? >> >> Hm, I've seen it the other way round. In my perspective the DT max >> frequency describes the maximum clock frequency supported by the chip as >> given in the datasheet (assuming there are no other limitations >> introduced by board design, etc.). > > I believe that what is in the datasheet should instead be statically > described in the SPI NAND vendor file. > > The max SPI frequency DT property is here to tell at which speed to use > the device, and this depends mostly on the routing. Ok, agree. > >> The RX sampling delay is a timing parameter specified in the datasheet >> and it is relevant in order to interface the chip at maximum frequency >> within spec. > > If it's not design dependent, then we should handle it "statically". The RX delay introduced by the chip is not design-dependent, only chip-dependent. There might be additional delays introduced by the board layout, but that's out of scope currently and I don't know if we even need this. > >> Your approach would mean, that we need to manually calculate the maximum >> supported clock frequency without RX sampling delay compensation and use >> this value in the DT. Then let the driver increase the clock if RX >> sampling delay compensation is available. >> >> This would have the advantage, that even before initial detection we >> would use the correct clock by default. But it has the downside that the >> clock value in DT won't match the datasheet value. > > It never does. Winbond chips can run at 166MHz. I know no design capable > of that natively (ie. without finer tuning, like what Santhosh > proposes). Ok, right. > >> In my approach we assume the driver is able to use the maximum clock >> from the DT (same as in datasheet) by using RX sampling delay >> compensation and only if it turns out the compensation is not possible >> we will lower the clock dynamically. >> >> What if we do the following? >> >> 1. Add a parameter in the SPI NAND chip driver that contains the maximum >> supported clock frequency as given in the datasheet as this is clearly >> the best place to keep this device-specific parameter. > > I've so far avoided doing it, but yes, this is something we can do. It > is going to be simple enough to implement as all the infrastructure is > already there. You can provide variants with speed limitations (look > into winbond.c). I was writing "0" in the fields which didn't need a > limitation that is something else than the maximum speed the chip can > sustain, as anyway the DT property *will* tell us what is this speed, if > we are ever able to reach it. Ok. > >> 2. Allow to leave spi-max-frequency in DT unset for SPI NANDs in case >> there are no board-specific limitations and therefore the maximum >> frequency supported by the combination of chip and controller can be >> used. > > In many cases, the limitations are board specific. Anyway, this is > already the case, you may just not put any value in this property. Ok. > >> 3. By default limit the clock frequency for the READ_ID op to a safe >> value that works for all chips and controllers, even if RX sampling >> delay compensations is not available. > > I am sorry this is not going to work out. There is no such harmonized > speed, differences can be an order (or two) of magnitude different, we > do not want to pay the penalty of a very slow identification on all > designs. We currently do the read several times with different > parameters. What if we were trying one more time by cutting the speed by > 2 (completely random proposal)? If we try with reduced clock as a fallback after the other attempts, there would be a slight chance of one of the earlier attempts with normal clock rate returning some invalid data that matches an existing chip ID, no? Isn't this a bit flaky? Let's say for a worst case scenario a chip has an RX delay of 20ns (the highest I've seen in datasheets so far is 8ns). In that case the maximum clock we could safely use for reading the ID would be 1/(2*20e-9) = 25MHz. Do you think it really makes much of a difference if we read the ID (only a handful of bytes) at 25MHz or full speed (e. g. 104 MHz)? I mean this should be fast enough either way, no? But maybe I'm misjudging this. > >> 4. In PHY mode, check against the DT max frequency (board limitations) >> and chip max frequency before switching to this mode. > > There is not such thing :-) the max frequency in DT currently would be > set to the platform limitation. We need somehow another parameter > indicating what is the maximum speed if we can fine tune the timings. So we can exceed the platform limitations using fine-tuned timings? I guess I need to read and understand that tuning RFC series. > >> I hope this makes at least a bit of sense. I'm really not yet familiar >> with all the different use-cases and limitations. > > If I get back to your own issue. Is there any chance we could avoid > tweaking the DT for handling it? Would there be a configuration of your > controller that would allow reading the ID of all the chips with enough > delays? Currently, the spi core doesn't care much about other parameters > than the frequency, but there is perhaps room for improvement. We could use RX delay compensation (one half clock cycle) in the controller by default (as currently done by the STM32 driver). But that would break if we use a very low clock (for whatever reason) because then the RX sampling delay gets neglectable compared to the clock period and sampling is triggered at the very end of the clock cycle when the data might already be invalid. At least that's what I suspect based on my test results. And it would also break if the chip actually requires more than one half clock cycle of RX sampling delay. The RX sampling delay setting must match the used clock frequency. In most cases there is a lot of room for tolerance before things start to fail, but it's not endless and especially at very high clock rates we need to take it into account. So if we don't cap the speed for reading the ID by default, we somehow need to know what value to use for the RX sampling delay or we need to try different values. And if the controller does not support or implement the RX sampling delay compensation, we need to cap the speed anyway.
Hi Frieder, > The RX delay introduced by the chip is not design-dependent, only > chip-dependent. There might be additional delays introduced by the board > layout, but that's out of scope currently and I don't know if we even > need this. We do, for very high speeds, and this is the purpose of the PHY tuning series from Santhosh. [...] >>> 3. By default limit the clock frequency for the READ_ID op to a safe >>> value that works for all chips and controllers, even if RX sampling >>> delay compensations is not available. >> >> I am sorry this is not going to work out. There is no such harmonized >> speed, differences can be an order (or two) of magnitude different, we >> do not want to pay the penalty of a very slow identification on all >> designs. We currently do the read several times with different >> parameters. What if we were trying one more time by cutting the speed by >> 2 (completely random proposal)? > > If we try with reduced clock as a fallback after the other attempts, > there would be a slight chance of one of the earlier attempts with > normal clock rate returning some invalid data that matches an existing > chip ID, no? Isn't this a bit flaky? Yeah, I understand your (and Mark's) concern. > Let's say for a worst case scenario a chip has an RX delay of 20ns (the > highest I've seen in datasheets so far is 8ns). In that case the maximum > clock we could safely use for reading the ID would be 1/(2*20e-9) = > 25MHz. Do you think it really makes much of a difference if we read the > ID (only a handful of bytes) at 25MHz or full speed (e. g. 104 MHz)? I > mean this should be fast enough either way, no? But maybe I'm misjudging > this. I am honestly not a big fan of the global penalty, but I am not totally opposed either, especially since you just said you only observed 8ns delays at most. This is 62.5MHz, which is already above what most designs use, so the penalty would be minimal. What about taking this approach and see if that fixes most of our use cases? >>> 4. In PHY mode, check against the DT max frequency (board limitations) >>> and chip max frequency before switching to this mode. >> >> There is not such thing :-) the max frequency in DT currently would be >> set to the platform limitation. We need somehow another parameter >> indicating what is the maximum speed if we can fine tune the timings. > > So we can exceed the platform limitations using fine-tuned timings? I > guess I need to read and understand that tuning RFC series. Well, this is still under discussion. As of today, the max speed in DT is the maximum speed. But it is in most cases the maximum speed *without* tuning, which means if we tune the PHY delays we can increase the speed further. I already proposed to turn this DT property into an array, in order to indicate what are the maximum speed*s*, without and with tuning, if ever possible. >>> I hope this makes at least a bit of sense. I'm really not yet familiar >>> with all the different use-cases and limitations. >> >> If I get back to your own issue. Is there any chance we could avoid >> tweaking the DT for handling it? Would there be a configuration of your >> controller that would allow reading the ID of all the chips with enough >> delays? Currently, the spi core doesn't care much about other parameters >> than the frequency, but there is perhaps room for improvement. > > We could use RX delay compensation (one half clock cycle) in the > controller by default (as currently done by the STM32 driver). But that > would break if we use a very low clock (for whatever reason) because > then the RX sampling delay gets neglectable compared to the clock period > and sampling is triggered at the very end of the clock cycle when the > data might already be invalid. At least that's what I suspect based on > my test results. > And it would also break if the chip actually requires more than one half > clock cycle of RX sampling delay. > > The RX sampling delay setting must match the used clock frequency. In > most cases there is a lot of room for tolerance before things start to > fail, but it's not endless and especially at very high clock rates we > need to take it into account. > > So if we don't cap the speed for reading the ID by default, we somehow > need to know what value to use for the RX sampling delay or we need to > try different values. And if the controller does not support or > implement the RX sampling delay compensation, we need to cap the speed > anyway. What about: *SPI controller drivers* - Enable RX delay compensation in the controller by default, disable it if the speed is very low, ie. compensation is dangerous *Core* - READID at max 65MHz (with the default RX delay compensation) *Vendor driver* - Indicate the internal RX delay per chip (maybe we need some heuristics depending on the speed as well?) *SPI controller drivers* - Hook to tune RX based on the fastest reachable speed. Thanks, Miquèl
On Wed Apr 1, 2026 at 10:32 AM CEST, Miquel Raynal wrote: >> Let's say for a worst case scenario a chip has an RX delay of 20ns (the >> highest I've seen in datasheets so far is 8ns). In that case the maximum >> clock we could safely use for reading the ID would be 1/(2*20e-9) = >> 25MHz. Do you think it really makes much of a difference if we read the >> ID (only a handful of bytes) at 25MHz or full speed (e. g. 104 MHz)? I >> mean this should be fast enough either way, no? But maybe I'm misjudging >> this. > > I am honestly not a big fan of the global penalty, but I am not totally > opposed either, especially since you just said you only observed 8ns > delays at most. This is 62.5MHz, which is already above what most > designs use, so the penalty would be minimal. What about taking this > approach and see if that fixes most of our use cases? What are the actual numbers we are talking about here? I mean, at least for SPI NOR, we only read the ID *once*. And it takes about 56 bits (command + id length of 6). That is about 2us at 25MHz. I'd guess the setup and the software handling takes far longer than that. -michael
On Tue, Mar 31, 2026 at 07:57:21PM +0200, Frieder Schrempf wrote: > On 31.03.26 17:26, Miquel Raynal wrote: > > I am sorry this is not going to work out. There is no such harmonized > > speed, differences can be an order (or two) of magnitude different, we > > do not want to pay the penalty of a very slow identification on all > > designs. We currently do the read several times with different > > parameters. What if we were trying one more time by cutting the speed by > > 2 (completely random proposal)? > If we try with reduced clock as a fallback after the other attempts, > there would be a slight chance of one of the earlier attempts with > normal clock rate returning some invalid data that matches an existing > chip ID, no? Isn't this a bit flaky? I'd worry about single bit errors causing confusion between related parts from a single manufacturer...
From: Frank Li (AI-BOT) <frank.li@nxp.com> Subject: Re: [PATCH RFC 6/7] spi: spi-mem: Call spi_set_rx_sampling_point() for each op > + op->max_freq = spi_set_rx_sampling_point(mem->spi, op->max_freq); Does spi_set_rx_sampling_point() return an error code? If so, you need to check for it. If it can fail, silently ignoring the return value may cause timing violations. Suggest adding error handling or documenting why failure is acceptable here. Also, the blank line before this statement is unnecessary. Remove it to keep the function compact. --- AI bot review and may be useless.
© 2016 - 2026 Red Hat, Inc.