RE: [PATCH v2 0/6] spi-mem: Allow specifying the byte order in DTR mode

David Laight posted 6 patches 4 years, 3 months ago
RE: [PATCH v2 0/6] spi-mem: Allow specifying the byte order in DTR mode
Posted by David Laight 4 years, 3 months ago
Thought...

Can you read the device in STR mode until you get a suitable
non-palindromic value, then read it in DTR mode and dynamically
determine the byte order?

Clearly this won't work if the device is erased to all 0xff.
But a check could be done on/after the first write.
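A minimal sketch of that comparison, assuming the caller has already read the same flash range once in STR mode and once in DTR mode, and assuming the only possible difference is a swap of the two bytes within each 16-bit word. The enum and function names are hypothetical and nothing here touches hardware:

```c
#include <stddef.h>
#include <stdint.h>

enum dtr_byte_order {
	DTR_ORDER_UNKNOWN,	/* only palindromic data (e.g. all 0xff), or a mismatch */
	DTR_ORDER_SAME,		/* DTR data matches STR data */
	DTR_ORDER_SWAPPED,	/* bytes swapped within each 16-bit word */
};

/*
 * Hypothetical helper: compare two copies of the same flash range,
 * one read in STR mode and one read in DTR mode.
 */
enum dtr_byte_order detect_dtr_byte_order(const uint8_t *str_buf,
					  const uint8_t *dtr_buf,
					  size_t len)
{
	int same = 1, swapped = 1, decisive = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		if (dtr_buf[i] != str_buf[i] || dtr_buf[i + 1] != str_buf[i + 1])
			same = 0;
		if (dtr_buf[i] != str_buf[i + 1] || dtr_buf[i + 1] != str_buf[i])
			swapped = 0;
		/* A pair such as ff ff reads the same either way round. */
		if (str_buf[i] != str_buf[i + 1])
			decisive = 1;
	}

	if (!decisive || same == swapped)
		return DTR_ORDER_UNKNOWN;
	return same ? DTR_ORDER_SAME : DTR_ORDER_SWAPPED;
}
```

An erased device returns DTR_ORDER_UNKNOWN here, which is the case where the check would have to be repeated after the first write.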

I suspect write times are actually dominated by the time spent
waiting for the write to complete?
(Never mind the earlier block erase time.)
So always writing in STR mode probably makes little difference?
Writes really ought to be uncommon as well.

Speeding up reads is a different matter - and probably useful.

Of course, if you've got hardware reading the spi memory in DTR
mode for config data you might need to byteswap it (compared
to the STR writes) - but that is probably a 2nd order problem.

I've got some bespoke logic on a PCIe fpga for accessing spi memory.
It uses address bits for the control signals and converts a 32bit
read/write into 8 nibble transfers to the chip.
(It uses byte enables - you don't want an odd number of clocks.)
It is mmap()ed to userspace for updating the 6MB fpga image.
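As an illustration only, the 32bit-to-nibble conversion could look like the sketch below. The function names are hypothetical, and sending the most significant nibble first is an assumption; the actual fpga logic may order the nibbles differently:

```c
#include <stdint.h>

/*
 * Split a 32-bit word into eight 4-bit values, most significant
 * nibble first (assumed ordering, not taken from the fpga design).
 */
void word_to_nibbles(uint32_t word, uint8_t nibbles[8])
{
	int i;

	for (i = 0; i < 8; i++)
		nibbles[i] = (word >> (28 - 4 * i)) & 0xf;
}

/* Reassemble eight nibbles (same assumed ordering) into a 32-bit word. */
uint32_t nibbles_to_word(const uint8_t nibbles[8])
{
	uint32_t word = 0;
	int i;

	for (i = 0; i < 8; i++)
		word = (word << 4) | (nibbles[i] & 0xf);
	return word;
}
```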

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Re: [PATCH v2 0/6] spi-mem: Allow specifying the byte order in DTR mode
Posted by Michael Walle 4 years, 3 months ago
Hi,

On 2022-03-16 14:55, David Laight wrote:
> Thought...

Thank you for your proposal.

> Can you read the device in STR mode until you get a suitable
> non-palindromic value, then read it in DTR mode and dynamically
> determine the byte order?

Our problem is not how to detect that we have to swap it, but
rather what we do when we have to do it.

If we have a controller which can swap the bytes for us on the
fly, we are lucky and can enable swapping if we need it. We are
also lucky when we don't have to swap the flash contents, obviously.

But what do we do when we need to swap it and the controller
doesn't support it? We could do it in software, which will slow
things down. So depending on the use case this might or might not
work. We can degrade it to a speed which doesn't have this issue;
which might be 1-1-1 in the worst case. We could also do just
nothing special; but this will lead to inconsistencies between
reading in 1-1-1 and 8d-8d-8d.
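For reference, the software fallback amounts to something like the following sketch. It assumes the swap is between the two bytes of each 16-bit word, and the function name is hypothetical. Every byte read from the flash has to pass through the CPU once more, which is where the slowdown comes from:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Swap the two bytes of every 16-bit word in place.
 * A trailing odd byte, if any, is left untouched.
 */
void swap16_inplace(uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		uint8_t tmp = buf[i];

		buf[i] = buf[i + 1];
		buf[i + 1] = tmp;
	}
}
```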

-michael
RE: [PATCH v2 0/6] spi-mem: Allow specifying the byte order in DTR mode
Posted by David Laight 4 years, 3 months ago
From: Michael Walle
> Sent: 17 March 2022 09:40
> 
> But what do we do when we need to swap it and the controller
> doesn't support it? We could do it in software, which will slow
> things down. So depending on the use case this might or might not
> work. We can degrade it to a speed which doesn't have this issue;
> which might be 1-1-1 in the worst case. We could also do just
> nothing special; but this will lead to inconsistencies between
> reading in 1-1-1 and 8d-8d-8d.

I really doubt you'll notice the effects of a software byteswap
compared to the actual time taken to do an spi read.

What's the maximum clock rate for spi memory?
Something like 50MHz ?
If the spi controller isn't doing dma then the cpu pio reads
to get the data are very likely to be even slower than that.
(Especially if they are PCIe reads.)

	David

Re: [PATCH v2 0/6] spi-mem: Allow specifying the byte order in DTR mode
Posted by Vignesh Raghavendra 4 years, 3 months ago

On 17/03/22 3:44 pm, David Laight wrote:
> From: Michael Walle
>> Sent: 17 March 2022 09:40
>>
>> But what do we do when we need to swap it and the controller
>> doesn't support it? We could do it in software, which will slow
>> things down. So depending on the use case this might or might not
>> work. We can degrade it to a speed which doesn't have this issue;
>> which might be 1-1-1 in the worst case. We could also do just
>> nothing special; but this will lead to inconsistencies between
>> reading in 1-1-1 and 8d-8d-8d.
> 
> I really doubt you'll notice the effects of a software byteswap
> compared to the actual time taken to do an spi read.
> 
> What's the maximum clock rate for spi memory?
> Something like 50MHz ?

We have Octal SPI flashes running at upwards of 200MHz clock (400MB/s)
so SW byteswap will add significant overhead.
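The 400MB/s figure follows from the bus width and the double data rate. A small sketch of the arithmetic (the function name is hypothetical, and this is the peak data-phase rate only, ignoring command, address and dummy cycles):

```c
#include <stdint.h>

/*
 * Peak data-phase throughput in bytes per second:
 * clock * bus width in bits * (2 edges for DTR, 1 for STR) / 8.
 */
uint64_t spi_peak_bytes_per_sec(uint64_t clock_hz, unsigned int bus_width_bits,
				int dtr)
{
	return clock_hz * bus_width_bits * (dtr ? 2u : 1u) / 8u;
}
```

With these inputs, an octal DTR flash at 200MHz gives 400,000,000 bytes/s, while a single-line STR device at 50MHz gives 6,250,000 bytes/s.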


> If the spi controller isn't doing dma then the cpu pio reads
> to get the data are very likely to be even slower than that.
> (Especially if they are PCIe reads.)
> 

Modern OSPI/QSPI flash controllers provide an MMIO interface to read from
flash, where DMA can pull data as though it were reading from on-chip RAM.

Regards
Vignesh
RE: [PATCH v2 0/6] spi-mem: Allow specifying the byte order in DTR mode
Posted by David Laight 4 years, 3 months ago
From: Vignesh Raghavendra
> Sent: 17 March 2022 10:24
...
> Modern OSPI/QSPI flash controllers provide an MMIO interface to read from
> flash, where DMA can pull data as though it were reading from on-chip RAM.

So the cpu does an MMIO read cycle to the controller which doesn't
complete until (for the nibble-mode spi device I have):
1) Chipselect is asserted.
2) The 8-bit command has been clocked out.
3) The 32bit address has been clocked out (8 clocks in nibbles).
4) A few (probably 4) extra delay clocks are added.
5) The data is read - 8 clocks for 32bits in nibble mode.
6) Chipselect is removed.

Now you can do long sequential reads without all the red tape.
But a random read in nibble mode is about 30 clocks.
16 bit mode saves 6 clocks for the data and maybe 6 for the address?
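A rough sketch of the clock count for one random read, following the steps listed above. The function name is hypothetical, and it is an assumption that the 8-bit command goes out on a single line (8 clocks); if the command were also sent in nibbles the total would be lower:

```c
/*
 * Clocks for one random read: command + address + dummy + data.
 * Address and data are sent 'lanes' bits per clock (lanes must be
 * non-zero); cmd_clocks and dummy_clocks are passed in directly
 * because they depend on the device and mode.
 */
unsigned int random_read_clocks(unsigned int cmd_clocks,
				unsigned int addr_bits,
				unsigned int dummy_clocks,
				unsigned int data_bits,
				unsigned int lanes)
{
	unsigned int addr_clocks = (addr_bits + lanes - 1) / lanes;
	unsigned int data_clocks = (data_bits + lanes - 1) / lanes;

	return cmd_clocks + addr_clocks + dummy_clocks + data_clocks;
}
```

With cmd_clocks = 8, a 32-bit address, 4 dummy clocks, 32 data bits and 4 lanes this gives 28 clocks, in line with the "about 30" estimate. Under the same assumptions, 28 clocks at 400MHz is 70ns for 4 bytes, or roughly 57MB/s for isolated random reads.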

The controller could do 'clever stuff' for sequential reads.
At a cost of slowing down random reads.

So even at 400MHz it isn't that fast.

If the MMIO interface to the flash controller is PCIe you can
add in a load of extra latency for the cpu read itself.

While PCIe allows multiple read requests to be outstanding,
the Intel cpus I've looked at serialise the reads from each
cpu core (each cpu always uses the same TLP tag).

Now longer read TLPs help a lot (IIRC max is 256 bytes).
But the x86 cpu will only generate read TLPs for register reads.
You need to use AVX512 registers (or cache line fetches) to
get better throughput!

The alternative is getting the flash controller to issue
the read/write TLP for memory transfers.

	David

Re: [PATCH v2 0/6] spi-mem: Allow specifying the byte order in DTR mode
Posted by Vignesh Raghavendra 4 years, 3 months ago

On 17/03/22 4:40 pm, David Laight wrote:
> From: Vignesh Raghavendra
>> Sent: 17 March 2022 10:24
> ...
>> Modern OSPI/QSPI flash controllers provide an MMIO interface to read from
>> flash, where DMA can pull data as though it were reading from on-chip RAM.
> 
> So the cpu does an MMIO read cycle to the controller which doesn't
> complete until (for the nibble-mode spi device I have):
> 1) Chipselect is asserted.
> 2) The 8-bit command has been clocked out.
> 3) The 32bit address has been clocked out (8 clocks in nibbles).
> 4) A few (probably 4) extra delay clocks are added.
> 5) The data is read - 8 clocks for 32bits in nibble mode.
> 6) Chipselect is removed.
> 
> Now you can do long sequential reads without all the red tape.
> But a random read in nibble mode is about 30 clocks.
> 16 bit mode saves 6 clocks for the data and maybe 6 for the address?
> 
> The controller could do 'clever stuff' for sequential reads.
> At a cost of slowing down random reads.
> 
> So even at 400MHz it isn't that fast.

Random CPU reads would be inherently slow; it's just how the HW is.

But there are cases like image load from flash and filesystem over
flash which would use DMA to maximize performance. Such cases would be
greatly affected if we do a SW byte swap.

> 
> If the MMIO interface to the flash controller is PCIe you can
> add in a load of extra latency for the cpu read itself.
> 
> While PCIe allows multiple read requests to be outstanding,
> the Intel cpus I've looked at serialise the reads from each
> cpu core (each cpu always uses the same TLP tag).
> 
> Now longer read TLPs help a lot (IIRC max is 256 bytes).
> But the x86 cpu will only generate read TLPs for register reads.
> You need to use AVX512 registers (or cache line fetches) to
> get better throughput!
> 

Direct CPU fetch from SPI would not be able to make use of the full
bandwidth of high-speed flashes, and it's not the only use case.

Regards
Vignesh