drivers/pci/bus.c | 9 +++++++++ 1 file changed, 9 insertions(+)
We have numerous platforms that permit assigning addresses from 0 to PCI
devices, both in the memory and the I/O bus space, and we happily do so
if there is no conflict, e.g.:
pci 0000:07:00.0: BAR 0: assigned [io 0x0000-0x0007]
pci 0000:07:00.1: BAR 0: assigned [io 0x0008-0x000f]
pci 0000:06:01.0: PCI bridge to [bus 07]
pci 0000:06:01.0: bridge window [io 0x0000-0x0fff]
(with the SiFive HiFive Unmatched RISC-V board and a dual serial port
option card based on the OxSemi OXPCIe952 device wired for the legacy
UART mode).
Address 0 is treated specially however in many places, for example in
`pci_iomap_range' and `pci_iomap_wc_range' we require that the start
address is non-zero, and even if we let such an address through, then
individual device drivers could reject a request to handle a device at
such an address, such as in `uart_configure_port'. Consequently given
devices configured as shown above only one is actually usable:
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
serial 0000:07:00.0: enabling device (0000 -> 0001)
serial: probe of 0000:07:00.0 failed with error -12
serial 0000:07:00.1: enabling device (0000 -> 0001)
serial 0000:07:00.1: detected caps 00000700 should be 00000500
0000:07:00.1: ttyS0 at I/O 0x8 (irq = 39, base_baud = 15625000) is a 16C950/954
Especially I/O space ranges are particularly valuable, because bridges
only decode bits from 12 up and consequently where 16-bit addressing is
in effect, as few as 16 separate ranges can be assigned to individual
buses only.
Therefore avoid handing out address 0, however rather than bumping the
lowest address available to PCI via PCIBIOS_MIN_IO and PCIBIOS_MIN_MEM,
or doing an equivalent arrangement in `__pci_assign_resource', let the
whole range assigned to a bus start from that address and instead only
avoid it for actual devices. Do it in `pci_bus_alloc_from_region' then
observing that bridge resources will have the IORESOURCE_STARTALIGN flag
set rather than IORESOURCE_SIZEALIGN and by making the least significant
bit decoded 1 according to the resource type, either memory or I/O.
With this in place the system in question we have:
pci 0000:07:00.0: BAR 0: assigned [io 0x0008-0x000f]
pci 0000:07:00.1: BAR 0: assigned [io 0x0010-0x0017]
pci 0000:06:01.0: PCI bridge to [bus 07]
pci 0000:06:01.0: bridge window [io 0x0000-0x0fff]
and then devices work correctly:
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
serial 0000:07:00.0: enabling device (0000 -> 0001)
serial 0000:07:00.0: detected caps 00000700 should be 00000500
0000:07:00.0: ttyS0 at I/O 0x8 (irq = 38, base_baud = 15625000) is a 16C950/954
serial 0000:07:00.1: enabling device (0000 -> 0001)
serial 0000:07:00.1: detected caps 00000700 should be 00000500
0000:07:00.1: ttyS1 at I/O 0x10 (irq = 39, base_baud = 15625000) is a 16C950/954
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
Hi,
NB I have an OxSemi OXPCIe952 based card that can be wired to either the
native or the legacy mode via a jumper block and I am so glad that I have
checked whether it works in the legacy mode as well. I guess there are so
few legacy-free platforms still for nobody else to notice this issue yet.
I think I've chosen the right solution, but I'll be happy to hear any
suggestions for an alternative one. Otherwise please apply.
Maciej
---
drivers/pci/bus.c | 9 +++++++++
1 file changed, 9 insertions(+)
linux-pci-bus-alloc-from-region-min.diff
Index: linux-macro/drivers/pci/bus.c
===================================================================
--- linux-macro.orig/drivers/pci/bus.c
+++ linux-macro/drivers/pci/bus.c
@@ -194,6 +194,15 @@ static int pci_bus_alloc_from_region(str
*/
if (avail.start)
min_used = avail.start;
+ /*
+ * For non-bridge resources avoid assigning address 0 as
+ * we assume that to mean no assignment in many places,
+ * starting from `pci_iomap_range'.
+ */
+ if (min_used == 0 && (res->flags & IORESOURCE_SIZEALIGN))
+ min_used = res->flags & IORESOURCE_IO ?
+ ~PCI_BASE_ADDRESS_IO_MASK + 1 :
+ ~PCI_BASE_ADDRESS_MEM_MASK + 1;
max = avail.end;
On Sat, Feb 26, 2022 at 10:47:10AM +0000, Maciej W. Rozycki wrote: > We have numerous platforms that permit assigning addresses from 0 to PCI > devices, both in the memory and the I/O bus space, and we happily do so > if there is no conflict, e.g.: > > pci 0000:07:00.0: BAR 0: assigned [io 0x0000-0x0007] > pci 0000:07:00.1: BAR 0: assigned [io 0x0008-0x000f] > pci 0000:06:01.0: PCI bridge to [bus 07] > pci 0000:06:01.0: bridge window [io 0x0000-0x0fff] > > (with the SiFive HiFive Unmatched RISC-V board and a dual serial port > option card based on the OxSemi OXPCIe952 device wired for the legacy > UART mode). > > Address 0 is treated specially however in many places, for example in > `pci_iomap_range' and `pci_iomap_wc_range' we require that the start > address is non-zero, and even if we let such an address through, then > individual device drivers could reject a request to handle a device at > such an address, such as in `uart_configure_port'. Consequently given > devices configured as shown above only one is actually usable: pci_iomap_range() tests the resource start, i.e., the CPU address. I guess the implication is that on RISC-V, the CPU-side port address is the same as the PCI bus port address? Is that actually a requirement? Maybe you could also avoid this by remapping the ports in CPU address space? Is the same true for PCI memory addresses? They are identical to CPU addresses, i.e., no translation is applied? On the PCI side, zero is a perfectly valid address, so it's a shame to throw it away if we don't have to, especially since throwing away even 16 bytes of MMIO space means a 4GB 32-bit BAR cannot be mapped at all. Bjorn
On Wed, 13 Apr 2022, Bjorn Helgaas wrote:
> > Address 0 is treated specially however in many places, for example in
> > `pci_iomap_range' and `pci_iomap_wc_range' we require that the start
> > address is non-zero, and even if we let such an address through, then
> > individual device drivers could reject a request to handle a device at
> > such an address, such as in `uart_configure_port'. Consequently given
> > devices configured as shown above only one is actually usable:
>
> pci_iomap_range() tests the resource start, i.e., the CPU address. I
> guess the implication is that on RISC-V, the CPU-side port address is
> the same as the PCI bus port address?
Umm, for all systems I came across except x86, which have native port I/O
access machine instructions, a port I/O resource records PCI bus addresses
of the device rather than its CPU addresses, which encode the location of
an MMIO window the PCI port I/O space is accessed through. E.g. my RISC-V
SiFive Unmatched has this:
# cat /proc/ioports
00000000-0000ffff : pcie@e00000000
00000000-00001fff : PCI Bus 0000:01
00000000-00001fff : PCI Bus 0000:02
00000000-00001fff : PCI Bus 0000:05
00000000-00001fff : PCI Bus 0000:06
00000000-00000fff : PCI Bus 0000:07
00000004-00000007 : 0000:07:00.0
00000004-00000006 : parport0
00000008-0000000f : 0000:07:00.0
00000008-0000000a : parport0
0000000b-0000000f : parport0
00001000-00001fff : PCI Bus 0000:08
00001000-00001fff : PCI Bus 0000:09
00001000-000010ff : 0000:09:02.0
00001100-0000117f : 0000:09:01.0
#
and my MIPS MTI Malta has this:
# cat /proc/ioports
00000000-0000001f : dma1
00000020-00000021 : pic1
00000040-0000005f : timer
00000060-0000006f : keyboard
00000070-00000077 : rtc0
00000080-0000008f : dma page reg
000000a0-000000a1 : pic2
000000c0-000000df : dma2
00000170-00000177 : ata_piix
000001f0-000001f7 : ata_piix
000002f8-000002ff : serial
00000376-00000376 : ata_piix
00000378-0000037a : parport0
0000037b-0000037f : parport0
000003f6-000003f6 : ata_piix
000003f8-000003ff : serial
00001000-001fffff : GT-64120 PCI I/O
00001000-0000103f : 0000:00:0a.3
00001040-0000105f : 0000:00:0a.2
00001060-0000107f : 0000:00:0b.0
00001060-0000107f : pcnet32_probe_pci
00001080-000010ff : 0000:00:14.0
00001100-0000110f : 0000:00:0a.3
00001400-000014ff : 0000:00:13.0
00001800-0000180f : 0000:00:0a.1
00001800-0000180f : ata_piix
#
(though this is not strictly correctly reported, because the legacy junk
is also behind the GT-64120). It is especially clear with the Malta that
PCI port I/O addresses have nothing to do with CPU addresses given this:
# cat /proc/iomem
00000000-0fffbfff : System RAM
00100000-0076e9bf : Kernel code
0076e9c0-0097665f : Kernel data
00ab0000-00ae6ccf : Kernel bss
10000000-17ffffff : GT-64120 PCI MEM
10000000-100fffff : 0000:00:0b.0
10100000-101fffff : PCI Bus 0000:01
10100000-10101fff : 0000:01:00.0
10100000-10101fff : xhci-hcd
10200000-1021ffff : 0000:00:13.0
10220000-1022ffff : 0000:00:0c.0
10230000-1023ffff : 0000:00:14.0
10240000-10240fff : 0000:00:0c.0
10241000-10241fff : 0000:00:13.0
10242000-1024207f : 0000:00:14.0
10242000-1024207f : defxx
10242080-1024209f : 0000:00:0b.0
1e000000-1e3fffff : 1e000000.flash flash@1e000000
1f000900-1f00093f : serial
#
where we have system RAM from CPU address 0 onwards (of course the Malta
has legacy PC/ISA junk in the southbridge, so it only allocates native PCI
port I/O resources from 0x1000 up and therefore it avoids the problem with
port I/O address 0).
Maybe there are systems that don't do that and use CPU addresses for port
I/O resources, but I haven't come across one.
> Is that actually a requirement? Maybe you could also avoid this by
> remapping the ports in CPU address space?
Sadly it's not recorded in /proc/iomem, but from my understanding of the
Unmatched DTS the CPU address of PCI I/O port 0 is 0x60080000, and for the
Malta likewise it's 0x18000000, so the remapping is already there.
> Is the same true for PCI memory addresses? They are identical to CPU
> addresses, i.e., no translation is applied?
For MMIO I guess this isn't a problem for the systems I know of, but it
would be if the PCI MMIO access window was decoded at 0 somewhere. For
the Unmatched and the Malta the windows are at 0x60090000 and 0x10000000
respectively.
> On the PCI side, zero is a perfectly valid address, so it's a shame to
> throw it away if we don't have to, especially since throwing away even
> 16 bytes of MMIO space means a 4GB 32-bit BAR cannot be mapped at all.
A problem with considering an address special, be it 0 or another value,
is that such a designated location is thrown away. Buses usually do not
treat any addresses specially, it's merely a software convention.
Maciej
On Thu, Apr 14, 2022 at 02:10:43AM +0100, Maciej W. Rozycki wrote: > On Wed, 13 Apr 2022, Bjorn Helgaas wrote: > > > > Address 0 is treated specially however in many places, for example in > > > `pci_iomap_range' and `pci_iomap_wc_range' we require that the start > > > address is non-zero, and even if we let such an address through, then > > > individual device drivers could reject a request to handle a device at > > > such an address, such as in `uart_configure_port'. Consequently given > > > devices configured as shown above only one is actually usable: > > > > pci_iomap_range() tests the resource start, i.e., the CPU address. I > > guess the implication is that on RISC-V, the CPU-side port address is > > the same as the PCI bus port address? > > Umm, for all systems I came across except x86, which have native port I/O > access machine instructions, a port I/O resource records PCI bus addresses > of the device rather than its CPU addresses, which encode the location of > an MMIO window the PCI port I/O space is accessed through. My point is only that it is not necessary for the PCI bus address and the struct resource address, i.e., the argument to inb(), to be the same. I tried to find the RISC-V definition of inb(), but it's obfuscated too much to be easily discoverable. Bjorn
On Thu, 14 Apr 2022, Bjorn Helgaas wrote: > > > > Address 0 is treated specially however in many places, for example in > > > > `pci_iomap_range' and `pci_iomap_wc_range' we require that the start > > > > address is non-zero, and even if we let such an address through, then > > > > individual device drivers could reject a request to handle a device at > > > > such an address, such as in `uart_configure_port'. Consequently given > > > > devices configured as shown above only one is actually usable: > > > > > > pci_iomap_range() tests the resource start, i.e., the CPU address. I > > > guess the implication is that on RISC-V, the CPU-side port address is > > > the same as the PCI bus port address? > > > > Umm, for all systems I came across except x86, which have native port I/O > > access machine instructions, a port I/O resource records PCI bus addresses > > of the device rather than its CPU addresses, which encode the location of > > an MMIO window the PCI port I/O space is accessed through. > > My point is only that it is not necessary for the PCI bus address and > the struct resource address, i.e., the argument to inb(), to be the > same. Sure, but I have yet to see a system where it is the case. Also in principle peer PCI buses could have their own port I/O address spaces each mapped via distinct MMIO windows in the CPU address space, but I haven't heard of such a system. That of course doesn't mean there's no such system in existence. > I tried to find the RISC-V definition of inb(), but it's obfuscated > too much to be easily discoverable. AFAICT the RISC-V port uses definitions from include/asm-generic/io.h. Palmer, did I get this right? Maciej
On Thu, 14 Apr 2022 13:22:42 PDT (-0700), macro@orcam.me.uk wrote:
> On Thu, 14 Apr 2022, Bjorn Helgaas wrote:
>
>> > > > Address 0 is treated specially however in many places, for example in
>> > > > `pci_iomap_range' and `pci_iomap_wc_range' we require that the start
>> > > > address is non-zero, and even if we let such an address through, then
>> > > > individual device drivers could reject a request to handle a device at
>> > > > such an address, such as in `uart_configure_port'. Consequently given
>> > > > devices configured as shown above only one is actually usable:
>> > >
>> > > pci_iomap_range() tests the resource start, i.e., the CPU address. I
>> > > guess the implication is that on RISC-V, the CPU-side port address is
>> > > the same as the PCI bus port address?
>> >
>> > Umm, for all systems I came across except x86, which have native port I/O
>> > access machine instructions, a port I/O resource records PCI bus addresses
>> > of the device rather than its CPU addresses, which encode the location of
>> > an MMIO window the PCI port I/O space is accessed through.
>>
>> My point is only that it is not necessary for the PCI bus address and
>> the struct resource address, i.e., the argument to inb(), to be the
>> same.
>
> Sure, but I have yet to see a system where it is the case.
>
> Also in principle peer PCI buses could have their own port I/O address
> spaces each mapped via distinct MMIO windows in the CPU address space, but
> I haven't heard of such a system. That of course doesn't mean there's no
> such system in existence.
>
>> I tried to find the RISC-V definition of inb(), but it's obfuscated
>> too much to be easily discoverable.
>
> AFAICT the RISC-V port uses definitions from include/asm-generic/io.h.
> Palmer, did I get this right?
I'd argue that asm-generic/io.h uses the definitions from RISC-V, but
the result is the same ;).
The general idea is that the IO itself is pretty generic for a handful
of ports, they just need to be decorated with some fences (or whatever
the ISA calls them) before/after the load/store. Those are the
__io_p{b,a}{r,w}() macros, which are in riscv's io.h. IIRC they stand
for something like Port{Before,After}{Read,Write}.
On Sat, 26 Feb 2022, Maciej W. Rozycki wrote: > We have numerous platforms that permit assigning addresses from 0 to PCI > devices, both in the memory and the I/O bus space, and we happily do so > if there is no conflict, e.g.: Ping for: <https://lore.kernel.org/lkml/alpine.DEB.2.21.2202260044180.25061@angie.orcam.me.uk/> Maciej
On Sat, 26 Feb 2022, Maciej W. Rozycki wrote: > We have numerous platforms that permit assigning addresses from 0 to PCI > devices, both in the memory and the I/O bus space, and we happily do so > if there is no conflict, e.g.: Ping for: <https://lore.kernel.org/lkml/alpine.DEB.2.21.2202260044180.25061@angie.orcam.me.uk/> Maciej
© 2016 - 2026 Red Hat, Inc.