[PATCH 00/24] PCI: Bridge window selection improvements

Ilpo Järvinen posted 24 patches 1 month, 1 week ago
There is a newer version of this series
arch/m68k/kernel/pcibios.c   |  39 +-
arch/mips/pci/pci-legacy.c   |  38 +-
arch/sparc/kernel/leon_pci.c |  27 --
arch/sparc/kernel/pci.c      |  27 --
arch/sparc/kernel/pcic.c     |  27 --
drivers/pci/bus.c            |   3 +
drivers/pci/pci-sysfs.c      |  27 +-
drivers/pci/pci.h            |   8 +-
drivers/pci/probe.c          |  35 +-
drivers/pci/setup-bus.c      | 798 ++++++++++++++++++-----------------
drivers/pci/setup-res.c      |  46 +-
include/linux/pci.h          |   5 +-
12 files changed, 504 insertions(+), 576 deletions(-)
[PATCH 00/24] PCI: Bridge window selection improvements
Posted by Ilpo Järvinen 1 month, 1 week ago
This series is based on top of the three resource fitting and
assignment algorithm fixes (v3).

PCI resource fitting and assignment code needs to find the bridge
window a resource belongs to in multiple places, yet, no common
function for that exists. Thus, each site has its own version of
the decision, each with their own corner cases, misbehaviors, and
some resulting in complex interfaces between internal functions.

This series tries to rectify the situation by adding two new functions
to select the bridge window. To support these functions, bridge windows
must always contain their type information in flags which requires
modifying the flags behavior for bridge window resources.

I've hit problems related to zeroed resource flags so many times by now
that I've already lost count which has highlighted over and over again
that clearing type information is not a good idea. As also proven by
some changes of this series, retaining the flags for bridge windows
ended up fixing existing issues (although kernel ended up recovering
from the worst problem graciously and the other just results in dormant
code).

This series only changes resource flags behavior for bridge windows.
The sensible direction is to make a similar change for the other
resources as well eventually but making that change involves more
uncertainty and is not strictly necessary yet. Driver code outside of
PCI core could have assumptions about the flags, whereas bridge windows
are mostly internal to PCI core code (or should be, sane endpoint
drivers shouldn't be messing with the bridge windows). Thus, limiting
the flags behavior changes to bridge windows for now is safer than
attempting full behavioral change in a single step.


I've tried to look out for any trouble that code under arch/ could
cause after the flags start to behave differently and therefore ended
up consolidating arch/ code to use pci_enable_resources(). My
impression is that strictly speaking only the MIPS code would break
similar to PCI core's copy of pci_enable_resources(), the others were
much more lax in checking so they'd likely keep working but
consolidation seemed still the best approach there as the enable checks
seemed diverging for no apparent reason.

Most sites are converted by this change. There are three known places
that are not yet converted:

  - fail_type based logic in __assign_resources_sorted():
    I'm expecting to cover this along with the resizable BAR
    changes as I need to change the fallback logic anyway (one
    of the motivators what got me started with this series,
    I need an easy way to acquire the bridge window during
    retries/fallbacks if maximum sized BARs do not fit, which
    is what this series provides).

  - Failure detection after BAR resize: Keeps using the type
    based heuristic for failure detection. It isn't very clear how
    to decide which assignment failures should be counted and which
    not. There could be pre-existing failures that keep happening
    that end up blocking BAR resize but that's no worse than behavior
    before this series. How to identify the relevant failures does
    not look straightforward given the current structures. This
    clearly needs more thought before coding any solution.

  - resource assignment itself: This is a very complex change
    due to bus and kernel resources abstractions and might not be
    realistic any time soon.

I'd have wanted to also get rid of pci_bridge_check_ranges() that
(re)adds type information which seemed now unnecessary. It turns out,
however, that root windows still require so it will have to wait for
now.

This change has been tested on a large number of machine I've access to
which come with heterogeneous PCI configurations. Some resources
retained their original addresses now also with pci=realloc because
this series fixed the unnecessary release(+assign) of those resources.
Other than that, nothing worth of note from that testing.


My test coverage is x86 centric unfortunately so I'd appreciate if
somebody with access to non-x86 archs takes the effort to test this
series.

Info for potential testers:

Usually, it's enough to gather lspci -vvv pre and post the series, and
use diff to see whether the resources remained the same and also check
that the same drivers are still bound to the devices to confirm that
devices got properly enabled (also shown by lspci -vvv). I normally
test both with and without pci=realloc. In case of a trouble, besides
lspci -vvv output, providing pre and post dmesg and /proc/iomem
contents would be helpful, please take the dmesg with dyndbg="file
drivers/pci/*.c +p" on the kernel cmdline.

Ilpo Järvinen (24):
  m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()
  sparc/PCI: Remove pcibios_enable_device() as they do nothing extra
  MIPS: PCI: Use pci_enable_resources()
  PCI: Move find_bus_resource_of_type() earlier
  PCI: Refactor find_bus_resource_of_type() logic checks
  PCI: Always claim bridge window before its setup
  PCI: Disable non-claimed bridge window
  PCI: Use pci_release_resource() instead of release_resource()
  PCI: Enable bridge even if bridge window fails to assign
  PCI: Preserve bridge window resource type flags
  PCI: Add defines for bridge window indexing
  PCI: Add bridge window selection functions
  PCI: Fix finding bridge window in pci_reassign_bridge_resources()
  PCI: Warn if bridge window cannot be released when resizing BAR
  PCI: Use pbus_select_window() during BAR resize
  PCI: Use pbus_select_window_for_type() during IO window sizing
  PCI: Rename resource variable from r to res
  PCI: Use pbus_select_window() in space available checker
  PCI: Use pbus_select_window_for_type() during mem window sizing
  PCI: Refactor distributing available memory to use loops
  PCI: Refactor remove_dev_resources() to use pbus_select_window()
  PCI: Add pci_setup_one_bridge_window()
  PCI: Pass bridge window to pci_bus_release_bridge_resources()
  PCI: Alter misleading recursion to pci_bus_release_bridge_resources()

 arch/m68k/kernel/pcibios.c   |  39 +-
 arch/mips/pci/pci-legacy.c   |  38 +-
 arch/sparc/kernel/leon_pci.c |  27 --
 arch/sparc/kernel/pci.c      |  27 --
 arch/sparc/kernel/pcic.c     |  27 --
 drivers/pci/bus.c            |   3 +
 drivers/pci/pci-sysfs.c      |  27 +-
 drivers/pci/pci.h            |   8 +-
 drivers/pci/probe.c          |  35 +-
 drivers/pci/setup-bus.c      | 798 ++++++++++++++++++-----------------
 drivers/pci/setup-res.c      |  46 +-
 include/linux/pci.h          |   5 +-
 12 files changed, 504 insertions(+), 576 deletions(-)


base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
prerequisite-patch-id: 801e8dd3aa9847d4945cb7d8958574a6006004ab
prerequisite-patch-id: 0233311f04e3ea013676b6cc00626410bbe11e41
prerequisite-patch-id: 9841faf37d56c1acf1167559613e862ef62e509d
-- 
2.39.5

Re: [PATCH 00/24] PCI: Bridge window selection improvements
Posted by Bjorn Helgaas 1 month ago
On Fri, Aug 22, 2025 at 05:55:41PM +0300, Ilpo Järvinen wrote:
> This series is based on top of the three resource fitting and
> assignment algorithm fixes (v3).
> 
> PCI resource fitting and assignment code needs to find the bridge
> window a resource belongs to in multiple places, yet, no common
> function for that exists. Thus, each site has its own version of
> the decision, each with their own corner cases, misbehaviors, and
> some resulting in complex interfaces between internal functions.
> ...

> I've tried to look out for any trouble that code under arch/ could
> cause after the flags start to behave differently and therefore ended
> up consolidating arch/ code to use pci_enable_resources(). My
> impression is that strictly speaking only the MIPS code would break
> similar to PCI core's copy of pci_enable_resources(), the others were
> much more lax in checking so they'd likely keep working but
> consolidation seemed still the best approach there as the enable checks
> seemed diverging for no apparent reason.
> ...

>   m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()
>   sparc/PCI: Remove pcibios_enable_device() as they do nothing extra
>   MIPS: PCI: Use pci_enable_resources()
> ...

>  arch/m68k/kernel/pcibios.c   |  39 +-
>  arch/mips/pci/pci-legacy.c   |  38 +-
>  arch/sparc/kernel/leon_pci.c |  27 --
>  arch/sparc/kernel/pci.c      |  27 --
>  arch/sparc/kernel/pcic.c     |  27 --
> ...

I love the fact that you're doing so much cleanup.  Thanks for all the
work in this!

Obviously all this code is quite sensitive, so I put it on
pci/resource to get more exposure in -next.  If it turns out that we
trip over things or just don't feel comfortable yet for v6.18, we can
always defer this part until the next cycle.

I will also watch for acks from the m68k, mips, and sparc maintainers
for those pieces.

Bjorn
Re: [PATCH 00/24] PCI: Bridge window selection improvements
Posted by Geert Uytterhoeven 1 month ago
CC Greg

On Thu, 28 Aug 2025 at 00:36, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Aug 22, 2025 at 05:55:41PM +0300, Ilpo Järvinen wrote:
> > This series is based on top of the three resource fitting and
> > assignment algorithm fixes (v3).
> >
> > PCI resource fitting and assignment code needs to find the bridge
> > window a resource belongs to in multiple places, yet, no common
> > function for that exists. Thus, each site has its own version of
> > the decision, each with their own corner cases, misbehaviors, and
> > some resulting in complex interfaces between internal functions.
> > ...
>
> > I've tried to look out for any trouble that code under arch/ could
> > cause after the flags start to behave differently and therefore ended
> > up consolidating arch/ code to use pci_enable_resources(). My
> > impression is that strictly speaking only the MIPS code would break
> > similar to PCI core's copy of pci_enable_resources(), the others were
> > much more lax in checking so they'd likely keep working but
> > consolidation seemed still the best approach there as the enable checks
> > seemed diverging for no apparent reason.
> > ...
>
> >   m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()
> >   sparc/PCI: Remove pcibios_enable_device() as they do nothing extra
> >   MIPS: PCI: Use pci_enable_resources()
> > ...
>
> >  arch/m68k/kernel/pcibios.c   |  39 +-
> >  arch/mips/pci/pci-legacy.c   |  38 +-
> >  arch/sparc/kernel/leon_pci.c |  27 --
> >  arch/sparc/kernel/pci.c      |  27 --
> >  arch/sparc/kernel/pcic.c     |  27 --
> > ...
>
> I love the fact that you're doing so much cleanup.  Thanks for all the
> work in this!
>
> Obviously all this code is quite sensitive, so I put it on
> pci/resource to get more exposure in -next.  If it turns out that we
> trip over things or just don't feel comfortable yet for v6.18, we can
> always defer this part until the next cycle.
>
> I will also watch for acks from the m68k, mips, and sparc maintainers
> for those pieces.
Re: [PATCH 00/24] PCI: Bridge window selection improvements
Posted by Ilpo Järvinen 1 month ago
On Wed, 27 Aug 2025, Bjorn Helgaas wrote:

> On Fri, Aug 22, 2025 at 05:55:41PM +0300, Ilpo Järvinen wrote:
> > This series is based on top of the three resource fitting and
> > assignment algorithm fixes (v3).
> > 
> > PCI resource fitting and assignment code needs to find the bridge
> > window a resource belongs to in multiple places, yet, no common
> > function for that exists. Thus, each site has its own version of
> > the decision, each with their own corner cases, misbehaviors, and
> > some resulting in complex interfaces between internal functions.
> > ...
> 
> > I've tried to look out for any trouble that code under arch/ could
> > cause after the flags start to behave differently and therefore ended
> > up consolidating arch/ code to use pci_enable_resources(). My
> > impression is that strictly speaking only the MIPS code would break
> > similar to PCI core's copy of pci_enable_resources(), the others were
> > much more lax in checking so they'd likely keep working but
> > consolidation seemed still the best approach there as the enable checks
> > seemed diverging for no apparent reason.
> > ...
> 
> >   m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()
> >   sparc/PCI: Remove pcibios_enable_device() as they do nothing extra
> >   MIPS: PCI: Use pci_enable_resources()
> > ...
> 
> >  arch/m68k/kernel/pcibios.c   |  39 +-
> >  arch/mips/pci/pci-legacy.c   |  38 +-
> >  arch/sparc/kernel/leon_pci.c |  27 --
> >  arch/sparc/kernel/pci.c      |  27 --
> >  arch/sparc/kernel/pcic.c     |  27 --
> > ...
> 
> I love the fact that you're doing so much cleanup.  Thanks for all the
> work in this!
>
> Obviously all this code is quite sensitive, so I put it on
> pci/resource to get more exposure in -next.

Thanks. I really appreciate the opportunity to have it in next for extra 
testing as my testing, while relatively extensive, still has its limits.

I'll need to do minor corrections into a few intermediate patches though 
to ensure bisectability, we really want to make this as bisectable as 
possible. In other words, I've found 2 relatively small issues in them 
which won't change the end result when the whole series is complete and 
fixed some small grammar errors in the changelogs.

I see you made some corrections so I'm not sure what's the best course of 
action here to update them. Should I just send v2 normally and you deal 
with your changes while replacing v1 with v2?

> If it turns out that we
> trip over things or just don't feel comfortable yet for v6.18, we can
> always defer this part until the next cycle.

I agree, and really please don't hesitate to postpone if it turns out 
necessary.

> I will also watch for acks from the m68k, mips, and sparc maintainers
> for those pieces.
> 
> Bjorn
> 

-- 
 i.
Re: [PATCH 00/24] PCI: Bridge window selection improvements
Posted by Bjorn Helgaas 1 month ago
On Thu, Aug 28, 2025 at 07:47:06PM +0300, Ilpo Järvinen wrote:
> On Wed, 27 Aug 2025, Bjorn Helgaas wrote:
> > On Fri, Aug 22, 2025 at 05:55:41PM +0300, Ilpo Järvinen wrote:
> > > This series is based on top of the three resource fitting and
> > > assignment algorithm fixes (v3).
> > > 
> > > PCI resource fitting and assignment code needs to find the bridge
> > > window a resource belongs to in multiple places, yet, no common
> > > function for that exists. Thus, each site has its own version of
> > > the decision, each with their own corner cases, misbehaviors, and
> > > some resulting in complex interfaces between internal functions.
> > > ...

> I'll need to do minor corrections into a few intermediate patches though 
> to ensure bisectability, we really want to make this as bisectable as 
> possible. In other words, I've found 2 relatively small issues in them 
> which won't change the end result when the whole series is complete and 
> fixed some small grammar errors in the changelogs.
> 
> I see you made some corrections so I'm not sure what's the best course of 
> action here to update them. Should I just send v2 normally and you deal 
> with your changes while replacing v1 with v2?

That would work for me.  Or if you picked the patches from
pci/resource and posted a v2 based on them, that would be even easier
for me.
Re: [PATCH 00/24] PCI: Bridge window selection improvements
Posted by Ilpo Järvinen 1 month, 1 week ago
On Fri, 22 Aug 2025, Ilpo Järvinen wrote:

> This series is based on top of the three resource fitting and
> assignment algorithm fixes (v3).

I realized I didn't include a link to those patches. It's this series: 

https://lore.kernel.org/linux-pci/20250822123359.16305-1-ilpo.jarvinen@linux.intel.com/

I'm sorry about the extra hassle.

-- 
 i.

> PCI resource fitting and assignment code needs to find the bridge
> window a resource belongs to in multiple places, yet, no common
> function for that exists. Thus, each site has its own version of
> the decision, each with their own corner cases, misbehaviors, and
> some resulting in complex interfaces between internal functions.
> 
> This series tries to rectify the situation by adding two new functions
> to select the bridge window. To support these functions, bridge windows
> must always contain their type information in flags which requires
> modifying the flags behavior for bridge window resources.
> 
> I've hit problems related to zeroed resource flags so many times by now
> that I've already lost count which has highlighted over and over again
> that clearing type information is not a good idea. As also proven by
> some changes of this series, retaining the flags for bridge windows
> ended up fixing existing issues (although kernel ended up recovering
> from the worst problem graciously and the other just results in dormant
> code).
> 
> This series only changes resource flags behavior for bridge windows.
> The sensible direction is to make a similar change for the other
> resources as well eventually but making that change involves more
> uncertainty and is not strictly necessary yet. Driver code outside of
> PCI core could have assumptions about the flags, whereas bridge windows
> are mostly internal to PCI core code (or should be, sane endpoint
> drivers shouldn't be messing with the bridge windows). Thus, limiting
> the flags behavior changes to bridge windows for now is safer than
> attempting full behavioral change in a single step.
> 
> 
> I've tried to look out for any trouble that code under arch/ could
> cause after the flags start to behave differently and therefore ended
> up consolidating arch/ code to use pci_enable_resources(). My
> impression is that strictly speaking only the MIPS code would break
> similar to PCI core's copy of pci_enable_resources(), the others were
> much more lax in checking so they'd likely keep working but
> consolidation seemed still the best approach there as the enable checks
> seemed diverging for no apparent reason.
> 
> Most sites are converted by this change. There are three known places
> that are not yet converted:
> 
>   - fail_type based logic in __assign_resources_sorted():
>     I'm expecting to cover this along with the resizable BAR
>     changes as I need to change the fallback logic anyway (one
>     of the motivators what got me started with this series,
>     I need an easy way to acquire the bridge window during
>     retries/fallbacks if maximum sized BARs do not fit, which
>     is what this series provides).
> 
>   - Failure detection after BAR resize: Keeps using the type
>     based heuristic for failure detection. It isn't very clear how
>     to decide which assignment failures should be counted and which
>     not. There could be pre-existing failures that keep happening
>     that end up blocking BAR resize but that's no worse than behavior
>     before this series. How to identify the relevant failures does
>     not look straightforward given the current structures. This
>     clearly needs more thought before coding any solution.
> 
>   - resource assignment itself: This is a very complex change
>     due to bus and kernel resources abstractions and might not be
>     realistic any time soon.
> 
> I'd have wanted to also get rid of pci_bridge_check_ranges() that
> (re)adds type information which seemed now unnecessary. It turns out,
> however, that root windows still require so it will have to wait for
> now.
> 
> This change has been tested on a large number of machine I've access to
> which come with heterogeneous PCI configurations. Some resources
> retained their original addresses now also with pci=realloc because
> this series fixed the unnecessary release(+assign) of those resources.
> Other than that, nothing worth of note from that testing.
> 
> 
> My test coverage is x86 centric unfortunately so I'd appreciate if
> somebody with access to non-x86 archs takes the effort to test this
> series.
> 
> Info for potential testers:
> 
> Usually, it's enough to gather lspci -vvv pre and post the series, and
> use diff to see whether the resources remained the same and also check
> that the same drivers are still bound to the devices to confirm that
> devices got properly enabled (also shown by lspci -vvv). I normally
> test both with and without pci=realloc. In case of a trouble, besides
> lspci -vvv output, providing pre and post dmesg and /proc/iomem
> contents would be helpful, please take the dmesg with dyndbg="file
> drivers/pci/*.c +p" on the kernel cmdline.
> 
> Ilpo Järvinen (24):
>   m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()
>   sparc/PCI: Remove pcibios_enable_device() as they do nothing extra
>   MIPS: PCI: Use pci_enable_resources()
>   PCI: Move find_bus_resource_of_type() earlier
>   PCI: Refactor find_bus_resource_of_type() logic checks
>   PCI: Always claim bridge window before its setup
>   PCI: Disable non-claimed bridge window
>   PCI: Use pci_release_resource() instead of release_resource()
>   PCI: Enable bridge even if bridge window fails to assign
>   PCI: Preserve bridge window resource type flags
>   PCI: Add defines for bridge window indexing
>   PCI: Add bridge window selection functions
>   PCI: Fix finding bridge window in pci_reassign_bridge_resources()
>   PCI: Warn if bridge window cannot be released when resizing BAR
>   PCI: Use pbus_select_window() during BAR resize
>   PCI: Use pbus_select_window_for_type() during IO window sizing
>   PCI: Rename resource variable from r to res
>   PCI: Use pbus_select_window() in space available checker
>   PCI: Use pbus_select_window_for_type() during mem window sizing
>   PCI: Refactor distributing available memory to use loops
>   PCI: Refactor remove_dev_resources() to use pbus_select_window()
>   PCI: Add pci_setup_one_bridge_window()
>   PCI: Pass bridge window to pci_bus_release_bridge_resources()
>   PCI: Alter misleading recursion to pci_bus_release_bridge_resources()
> 
>  arch/m68k/kernel/pcibios.c   |  39 +-
>  arch/mips/pci/pci-legacy.c   |  38 +-
>  arch/sparc/kernel/leon_pci.c |  27 --
>  arch/sparc/kernel/pci.c      |  27 --
>  arch/sparc/kernel/pcic.c     |  27 --
>  drivers/pci/bus.c            |   3 +
>  drivers/pci/pci-sysfs.c      |  27 +-
>  drivers/pci/pci.h            |   8 +-
>  drivers/pci/probe.c          |  35 +-
>  drivers/pci/setup-bus.c      | 798 ++++++++++++++++++-----------------
>  drivers/pci/setup-res.c      |  46 +-
>  include/linux/pci.h          |   5 +-
>  12 files changed, 504 insertions(+), 576 deletions(-)
> 
> 
> base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
> prerequisite-patch-id: 801e8dd3aa9847d4945cb7d8958574a6006004ab
> prerequisite-patch-id: 0233311f04e3ea013676b6cc00626410bbe11e41
> prerequisite-patch-id: 9841faf37d56c1acf1167559613e862ef62e509d
>