[PATCH] PCI: Avoid handing out address 0 to devices

Maciej W. Rozycki posted 1 patch 4 years, 3 months ago
drivers/pci/bus.c |    9 +++++++++
1 file changed, 9 insertions(+)
[PATCH] PCI: Avoid handing out address 0 to devices
Posted by Maciej W. Rozycki 4 years, 3 months ago
We have numerous platforms that permit assigning addresses from 0 to PCI 
devices, both in the memory and the I/O bus space, and we happily do so 
if there is no conflict, e.g.:

pci 0000:07:00.0: BAR 0: assigned [io  0x0000-0x0007]
pci 0000:07:00.1: BAR 0: assigned [io  0x0008-0x000f]
pci 0000:06:01.0: PCI bridge to [bus 07]
pci 0000:06:01.0:   bridge window [io  0x0000-0x0fff]

(with the SiFive HiFive Unmatched RISC-V board and a dual serial port 
option card based on the OxSemi OXPCIe952 device wired for the legacy 
UART mode).

Address 0 is treated specially however in many places, for example in 
`pci_iomap_range' and `pci_iomap_wc_range' we require that the start 
address is non-zero, and even if we let such an address through, then 
individual device drivers could reject a request to handle a device at 
such an address, such as in `uart_configure_port'.  Consequently given
devices configured as shown above only one is actually usable:

Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
serial 0000:07:00.0: enabling device (0000 -> 0001)
serial: probe of 0000:07:00.0 failed with error -12
serial 0000:07:00.1: enabling device (0000 -> 0001)
serial 0000:07:00.1: detected caps 00000700 should be 00000500
0000:07:00.1: ttyS0 at I/O 0x8 (irq = 39, base_baud = 15625000) is a 16C950/954

Especially I/O space ranges are particularly valuable, because bridges 
only decode bits from 12 up and consequently where 16-bit addressing is 
in effect, as few as 16 separate ranges can be assigned to individual 
buses only.

Therefore avoid handing out address 0, however rather than bumping the 
lowest address available to PCI via PCIBIOS_MIN_IO and PCIBIOS_MIN_MEM, 
or doing an equivalent arrangement in `__pci_assign_resource', let the 
whole range assigned to a bus start from that address and instead only 
avoid it for actual devices.  Do it in `pci_bus_alloc_from_region' then 
observing that bridge resources will have the IORESOURCE_STARTALIGN flag 
set rather than IORESOURCE_SIZEALIGN and by making the least significant 
bit decoded 1 according to the resource type, either memory or I/O.

With this in place the system in question we have:

pci 0000:07:00.0: BAR 0: assigned [io  0x0008-0x000f]
pci 0000:07:00.1: BAR 0: assigned [io  0x0010-0x0017]
pci 0000:06:01.0: PCI bridge to [bus 07]
pci 0000:06:01.0:   bridge window [io  0x0000-0x0fff]

and then devices work correctly:

Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
serial 0000:07:00.0: enabling device (0000 -> 0001)
serial 0000:07:00.0: detected caps 00000700 should be 00000500
0000:07:00.0: ttyS0 at I/O 0x8 (irq = 38, base_baud = 15625000) is a 16C950/954
serial 0000:07:00.1: enabling device (0000 -> 0001)
serial 0000:07:00.1: detected caps 00000700 should be 00000500
0000:07:00.1: ttyS1 at I/O 0x10 (irq = 39, base_baud = 15625000) is a 16C950/954

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
Hi,

 NB I have an OxSemi OXPCIe952 based card that can be wired to either the 
native or the legacy mode via a jumper block and I am so glad that I have 
checked whether it works in the legacy mode as well.  I guess there are so 
few legacy-free platforms still for nobody else to notice this issue yet.

 I think I've chosen the right solution, but I'll be happy to hear any 
suggestions for an alternative one.  Otherwise please apply.

  Maciej
---
 drivers/pci/bus.c |    9 +++++++++
 1 file changed, 9 insertions(+)

linux-pci-bus-alloc-from-region-min.diff
Index: linux-macro/drivers/pci/bus.c
===================================================================
--- linux-macro.orig/drivers/pci/bus.c
+++ linux-macro/drivers/pci/bus.c
@@ -194,6 +194,15 @@ static int pci_bus_alloc_from_region(str
 		 */
 		if (avail.start)
 			min_used = avail.start;
+		/*
+		 * For non-bridge resources avoid assigning address 0 as
+		 * we assume that to mean no assignment in many places,
+		 * starting from `pci_iomap_range'.
+		 */
+		if (min_used == 0 && (res->flags & IORESOURCE_SIZEALIGN))
+			min_used = res->flags & IORESOURCE_IO ?
+				   ~PCI_BASE_ADDRESS_IO_MASK + 1 :
+				   ~PCI_BASE_ADDRESS_MEM_MASK + 1;
 
 		max = avail.end;
Re: [PATCH] PCI: Avoid handing out address 0 to devices
Posted by Bjorn Helgaas 4 years, 2 months ago
On Sat, Feb 26, 2022 at 10:47:10AM +0000, Maciej W. Rozycki wrote:
> We have numerous platforms that permit assigning addresses from 0 to PCI 
> devices, both in the memory and the I/O bus space, and we happily do so 
> if there is no conflict, e.g.:
> 
> pci 0000:07:00.0: BAR 0: assigned [io  0x0000-0x0007]
> pci 0000:07:00.1: BAR 0: assigned [io  0x0008-0x000f]
> pci 0000:06:01.0: PCI bridge to [bus 07]
> pci 0000:06:01.0:   bridge window [io  0x0000-0x0fff]
> 
> (with the SiFive HiFive Unmatched RISC-V board and a dual serial port 
> option card based on the OxSemi OXPCIe952 device wired for the legacy 
> UART mode).
> 
> Address 0 is treated specially however in many places, for example in 
> `pci_iomap_range' and `pci_iomap_wc_range' we require that the start 
> address is non-zero, and even if we let such an address through, then 
> individual device drivers could reject a request to handle a device at 
> such an address, such as in `uart_configure_port'.  Consequently given
> devices configured as shown above only one is actually usable:

pci_iomap_range() tests the resource start, i.e., the CPU address.  I
guess the implication is that on RISC-V, the CPU-side port address is
the same as the PCI bus port address?

Is that actually a requirement?  Maybe you could also avoid this by
remapping the ports in CPU address space?

Is the same true for PCI memory addresses?  They are identical to CPU
addresses, i.e., no translation is applied?

On the PCI side, zero is a perfectly valid address, so it's a shame to
throw it away if we don't have to, especially since throwing away even
16 bytes of MMIO space means a 4GB 32-bit BAR cannot be mapped at all.

Bjorn
Re: [PATCH] PCI: Avoid handing out address 0 to devices
Posted by Maciej W. Rozycki 4 years, 2 months ago
On Wed, 13 Apr 2022, Bjorn Helgaas wrote:

> > Address 0 is treated specially however in many places, for example in 
> > `pci_iomap_range' and `pci_iomap_wc_range' we require that the start 
> > address is non-zero, and even if we let such an address through, then 
> > individual device drivers could reject a request to handle a device at 
> > such an address, such as in `uart_configure_port'.  Consequently given
> > devices configured as shown above only one is actually usable:
> 
> pci_iomap_range() tests the resource start, i.e., the CPU address.  I
> guess the implication is that on RISC-V, the CPU-side port address is
> the same as the PCI bus port address?

 Umm, for all systems I came across except x86, which have native port I/O 
access machine instructions, a port I/O resource records PCI bus addresses 
of the device rather than its CPU addresses, which encode the location of 
an MMIO window the PCI port I/O space is accessed through.  E.g. my RISC-V 
SiFive Unmatched has this:

# cat /proc/ioports
00000000-0000ffff : pcie@e00000000
  00000000-00001fff : PCI Bus 0000:01
    00000000-00001fff : PCI Bus 0000:02
      00000000-00001fff : PCI Bus 0000:05
        00000000-00001fff : PCI Bus 0000:06
          00000000-00000fff : PCI Bus 0000:07
          00000004-00000007 : 0000:07:00.0
          00000004-00000006 : parport0
          00000008-0000000f : 0000:07:00.0
          00000008-0000000a : parport0
          0000000b-0000000f : parport0
          00001000-00001fff : PCI Bus 0000:08
          00001000-00001fff : PCI Bus 0000:09
          00001000-000010ff : 0000:09:02.0
          00001100-0000117f : 0000:09:01.0
# 

and my MIPS MTI Malta has this:

# cat /proc/ioports
00000000-0000001f : dma1
00000020-00000021 : pic1
00000040-0000005f : timer
00000060-0000006f : keyboard
00000070-00000077 : rtc0
00000080-0000008f : dma page reg
000000a0-000000a1 : pic2
000000c0-000000df : dma2
00000170-00000177 : ata_piix
000001f0-000001f7 : ata_piix
000002f8-000002ff : serial
00000376-00000376 : ata_piix
00000378-0000037a : parport0
0000037b-0000037f : parport0
000003f6-000003f6 : ata_piix
000003f8-000003ff : serial
00001000-001fffff : GT-64120 PCI I/O
  00001000-0000103f : 0000:00:0a.3
  00001040-0000105f : 0000:00:0a.2
  00001060-0000107f : 0000:00:0b.0
    00001060-0000107f : pcnet32_probe_pci
  00001080-000010ff : 0000:00:14.0
  00001100-0000110f : 0000:00:0a.3
  00001400-000014ff : 0000:00:13.0
  00001800-0000180f : 0000:00:0a.1
    00001800-0000180f : ata_piix
# 

(though this is not strictly correctly reported, because the legacy junk 
is also behind the GT-64120).  It is especially clear with the Malta that 
PCI port I/O addresses have nothing to do with CPU addresses given this:

# cat /proc/iomem
00000000-0fffbfff : System RAM
  00100000-0076e9bf : Kernel code
  0076e9c0-0097665f : Kernel data
  00ab0000-00ae6ccf : Kernel bss
10000000-17ffffff : GT-64120 PCI MEM
  10000000-100fffff : 0000:00:0b.0
  10100000-101fffff : PCI Bus 0000:01
    10100000-10101fff : 0000:01:00.0
      10100000-10101fff : xhci-hcd
  10200000-1021ffff : 0000:00:13.0
  10220000-1022ffff : 0000:00:0c.0
  10230000-1023ffff : 0000:00:14.0
  10240000-10240fff : 0000:00:0c.0
  10241000-10241fff : 0000:00:13.0
  10242000-1024207f : 0000:00:14.0
    10242000-1024207f : defxx
  10242080-1024209f : 0000:00:0b.0
1e000000-1e3fffff : 1e000000.flash flash@1e000000
1f000900-1f00093f : serial
# 

where we have system RAM from CPU address 0 onwards (of course the Malta 
has legacy PC/ISA junk in the southbridge, so it only allocates native PCI 
port I/O resources from 0x1000 up and therefore it avoids the problem with 
port I/O address 0).

 Maybe there are systems that don't do that and use CPU addresses for port 
I/O resources, but I haven't come across one.

> Is that actually a requirement?  Maybe you could also avoid this by
> remapping the ports in CPU address space?

 Sadly it's not recorded in /proc/iomem, but from my understanding of the 
Unmatched DTS the CPU address of PCI I/O port 0 is 0x60080000, and for the 
Malta likewise it's 0x18000000, so the remapping is already there.

> Is the same true for PCI memory addresses?  They are identical to CPU
> addresses, i.e., no translation is applied?

 For MMIO I guess this isn't a problem for the systems I know of, but it 
would be if the PCI MMIO access window was decoded at 0 somewhere.  For 
the Unmatched and the Malta the windows are at 0x60090000 and 0x10000000 
respectively.

> On the PCI side, zero is a perfectly valid address, so it's a shame to
> throw it away if we don't have to, especially since throwing away even
> 16 bytes of MMIO space means a 4GB 32-bit BAR cannot be mapped at all.

 A problem with considering an address special, be it 0 or another value, 
is that such a designated location is thrown away.  Buses usually do not 
treat any addresses specially, it's merely a software convention.

  Maciej
Re: [PATCH] PCI: Avoid handing out address 0 to devices
Posted by Bjorn Helgaas 4 years, 2 months ago
On Thu, Apr 14, 2022 at 02:10:43AM +0100, Maciej W. Rozycki wrote:
> On Wed, 13 Apr 2022, Bjorn Helgaas wrote:
> 
> > > Address 0 is treated specially however in many places, for example in 
> > > `pci_iomap_range' and `pci_iomap_wc_range' we require that the start 
> > > address is non-zero, and even if we let such an address through, then 
> > > individual device drivers could reject a request to handle a device at 
> > > such an address, such as in `uart_configure_port'.  Consequently given
> > > devices configured as shown above only one is actually usable:
> > 
> > pci_iomap_range() tests the resource start, i.e., the CPU address.  I
> > guess the implication is that on RISC-V, the CPU-side port address is
> > the same as the PCI bus port address?
> 
>  Umm, for all systems I came across except x86, which have native port I/O 
> access machine instructions, a port I/O resource records PCI bus addresses 
> of the device rather than its CPU addresses, which encode the location of 
> an MMIO window the PCI port I/O space is accessed through.

My point is only that it is not necessary for the PCI bus address and
the struct resource address, i.e., the argument to inb(), to be the
same.

I tried to find the RISC-V definition of inb(), but it's obfuscated
too much to be easily discoverable.

Bjorn
Re: [PATCH] PCI: Avoid handing out address 0 to devices
Posted by Maciej W. Rozycki 4 years, 2 months ago
On Thu, 14 Apr 2022, Bjorn Helgaas wrote:

> > > > Address 0 is treated specially however in many places, for example in 
> > > > `pci_iomap_range' and `pci_iomap_wc_range' we require that the start 
> > > > address is non-zero, and even if we let such an address through, then 
> > > > individual device drivers could reject a request to handle a device at 
> > > > such an address, such as in `uart_configure_port'.  Consequently given
> > > > devices configured as shown above only one is actually usable:
> > > 
> > > pci_iomap_range() tests the resource start, i.e., the CPU address.  I
> > > guess the implication is that on RISC-V, the CPU-side port address is
> > > the same as the PCI bus port address?
> > 
> >  Umm, for all systems I came across except x86, which have native port I/O 
> > access machine instructions, a port I/O resource records PCI bus addresses 
> > of the device rather than its CPU addresses, which encode the location of 
> > an MMIO window the PCI port I/O space is accessed through.
> 
> My point is only that it is not necessary for the PCI bus address and
> the struct resource address, i.e., the argument to inb(), to be the
> same.

 Sure, but I have yet to see a system where it is the case.

 Also in principle peer PCI buses could have their own port I/O address 
spaces each mapped via distinct MMIO windows in the CPU address space, but 
I haven't heard of such a system.  That of course doesn't mean there's no 
such system in existence.

> I tried to find the RISC-V definition of inb(), but it's obfuscated
> too much to be easily discoverable.

 AFAICT the RISC-V port uses definitions from include/asm-generic/io.h.  
Palmer, did I get this right?

  Maciej
Re: [PATCH] PCI: Avoid handing out address 0 to devices
Posted by Palmer Dabbelt 4 years, 2 months ago
On Thu, 14 Apr 2022 13:22:42 PDT (-0700), macro@orcam.me.uk wrote:
> On Thu, 14 Apr 2022, Bjorn Helgaas wrote:
>
>> > > > Address 0 is treated specially however in many places, for example in
>> > > > `pci_iomap_range' and `pci_iomap_wc_range' we require that the start
>> > > > address is non-zero, and even if we let such an address through, then
>> > > > individual device drivers could reject a request to handle a device at
>> > > > such an address, such as in `uart_configure_port'.  Consequently given
>> > > > devices configured as shown above only one is actually usable:
>> > >
>> > > pci_iomap_range() tests the resource start, i.e., the CPU address.  I
>> > > guess the implication is that on RISC-V, the CPU-side port address is
>> > > the same as the PCI bus port address?
>> >
>> >  Umm, for all systems I came across except x86, which have native port I/O
>> > access machine instructions, a port I/O resource records PCI bus addresses
>> > of the device rather than its CPU addresses, which encode the location of
>> > an MMIO window the PCI port I/O space is accessed through.
>>
>> My point is only that it is not necessary for the PCI bus address and
>> the struct resource address, i.e., the argument to inb(), to be the
>> same.
>
>  Sure, but I have yet to see a system where it is the case.
>
>  Also in principle peer PCI buses could have their own port I/O address
> spaces each mapped via distinct MMIO windows in the CPU address space, but
> I haven't heard of such a system.  That of course doesn't mean there's no
> such system in existence.
>
>> I tried to find the RISC-V definition of inb(), but it's obfuscated
>> too much to be easily discoverable.
>
>  AFAICT the RISC-V port uses definitions from include/asm-generic/io.h.
> Palmer, did I get this right?

I'd argue that asm-generic/io.h uses the definitions from RISC-V, but 
the result is the same ;).

The general idea is that the IO itself is pretty generic for a handful 
of ports, they just need to be decorated with some fences (or whatever 
the ISA calls them) before/after the load/store.  Those are the 
__io_p{b,a}{r,w}() macros, which are in riscv's io.h.  IIRC they stand 
for something like Port{Before,After}{Read,Write}.
[PING^2][PATCH] PCI: Avoid handing out address 0 to devices
Posted by Maciej W. Rozycki 4 years, 2 months ago
On Sat, 26 Feb 2022, Maciej W. Rozycki wrote:

> We have numerous platforms that permit assigning addresses from 0 to PCI 
> devices, both in the memory and the I/O bus space, and we happily do so 
> if there is no conflict, e.g.:

 Ping for:
<https://lore.kernel.org/lkml/alpine.DEB.2.21.2202260044180.25061@angie.orcam.me.uk/>

  Maciej
[PING][PATCH] PCI: Avoid handing out address 0 to devices
Posted by Maciej W. Rozycki 4 years, 2 months ago
On Sat, 26 Feb 2022, Maciej W. Rozycki wrote:

> We have numerous platforms that permit assigning addresses from 0 to PCI 
> devices, both in the memory and the I/O bus space, and we happily do so 
> if there is no conflict, e.g.:

 Ping for:
<https://lore.kernel.org/lkml/alpine.DEB.2.21.2202260044180.25061@angie.orcam.me.uk/>

  Maciej