[PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling

Koichiro Den posted 10 patches 1 week, 4 days ago
drivers/ntb/hw/epf/ntb_hw_epf.c               |  89 ++++++++++-
drivers/pci/endpoint/functions/pci-epf-vntb.c | 147 +++++++++++++++---
2 files changed, 210 insertions(+), 26 deletions(-)
Posted by Koichiro Den 1 week, 4 days ago
This series fixes doorbell bit/vector handling for the EPF-based NTB
pair (ntb_hw_epf <-> pci-epf-*ntb). Its primary goal is to enable safe
per-db-vector handling in the NTB core and clients (e.g. ntb_transport),
without changing the on-the-wire doorbell mapping.


Background / problem
====================

ntb_hw_epf has historically applied an extra offset when ringing peer
doorbells: the link event uses the first interrupt slot, the second slot
is effectively unused, and doorbells start from the third slot.
pci-epf-vntb carries the matching offset on the EP side as well.

As long as db_vector_count()/db_vector_mask() are not implemented, this
mismatch is mostly masked. Doorbell events are effectively treated as
"can hit any QP" and the off-by-one vector numbering does not surface
clearly.

However, once per-vector handling is enabled, the current state becomes
problematic:

  - db_valid_mask exposes bits that do not correspond to real doorbells
    (link/unused slots leak into the mask).
  - ntb_db_event() is fed with 1-based/shifted vectors, while NTB core
    expects a 0-based db_vector for doorbells.
  - On pci-epf-vntb, .peer_db_set() may be called in atomic context, but
    it directly calls pci_epc_raise_irq(), which can sleep.
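
For illustration only, the slot layout and the 0-based numbering that the
NTB core expects can be modelled in plain userspace C roughly as follows.
This is a sketch under assumptions, not code from the patches: the enum
and the demo_* helpers are hypothetical names, and db_count stands for
the number of real doorbells.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the legacy interrupt slot layout. */
enum demo_slot {
	DEMO_SLOT_LINK   = 0,	/* first slot: link event */
	DEMO_SLOT_UNUSED = 1,	/* second slot: effectively unused */
	DEMO_SLOT_DB0    = 2,	/* third slot: first real doorbell */
};

/* Mask covering only real doorbell bits (bit n <=> doorbell n). */
static uint64_t demo_db_valid_mask(unsigned int db_count)
{
	assert(db_count <= 64);
	return db_count == 64 ? UINT64_MAX : (UINT64_C(1) << db_count) - 1;
}

/* Slot rung on the wire for a 0-based doorbell bit (legacy offset kept). */
static unsigned int demo_db_bit_to_slot(unsigned int db_bit)
{
	return db_bit + DEMO_SLOT_DB0;
}

/* 0-based doorbell vector for a slot, or -1 if the slot is not a doorbell. */
static int demo_slot_to_db_vector(unsigned int slot, unsigned int db_count)
{
	if (slot < DEMO_SLOT_DB0 || slot - DEMO_SLOT_DB0 >= db_count)
		return -1;
	return (int)(slot - DEMO_SLOT_DB0);
}
```

With four doorbells in this model, the valid mask is 0xf, doorbell bit 0
rings slot 2, and the link and unused slots never translate into a
doorbell vector.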


Why NOT fix the root offset?
============================

The natural "root" fix would be to remove the historical extra offset in
the peer_db_set() doorbell paths for ntb_hw_epf and pci-epf-vntb.
Unfortunately, this would break interoperability when mixing old and new
kernel versions (old/new peers): a new side would ring a different
interrupt slot than an old peer expects, leading to missed or misrouted
doorbells once db_vector_count()/db_vector_mask() are implemented.

Therefore this series intentionally keeps the legacy offset, and instead
fixes the surrounding pieces so the mapping is documented and handled
consistently in masks, vector numbering, and per-vector reporting.
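
A toy model of the mixed-peer problem, for illustration only. It assumes
that a hypothetical "root fix" would start doorbells at the second slot
(offset 1) instead of the third (offset 2); both macros and the demo_*
helpers are made up for this sketch and do not appear in either driver.

```c
#include <assert.h>

/* Hypothetical offsets: legacy doorbells start at the third slot. */
#define DEMO_LEGACY_DB_SLOT_OFFSET	2
/* Assumed "root fix" that would drop the unused second slot. */
#define DEMO_FIXED_DB_SLOT_OFFSET	1

/* Slot a sender rings for a 0-based doorbell bit. */
static unsigned int demo_ring_slot(unsigned int db_bit, unsigned int tx_offset)
{
	return db_bit + tx_offset;
}

/*
 * 0-based doorbell a receiver decodes from a slot, or -1 when the slot
 * falls below the receiver's first doorbell slot.
 */
static int demo_decode_slot(unsigned int slot, unsigned int rx_offset)
{
	assert(rx_offset >= 1);	/* slot 0 is always the link event */
	return slot >= rx_offset ? (int)(slot - rx_offset) : -1;
}
```

In this model, two legacy peers agree on doorbell 0. A sender using the
assumed fixed offset rings slot 1, which a legacy receiver does not treat
as a doorbell (missed), and a legacy sender rings slot 2, which a receiver
using the fixed offset decodes as doorbell 1 (misrouted).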


What this series does
=====================

- pci-epf-vntb:

  - Document the legacy offset.
  - Defer MSI doorbell raises to process context to avoid sleeping in
    atomic context. This becomes relevant once multiple doorbells are
    raised concurrently at a high rate.
  - Report doorbell vectors as 0-based to ntb_db_event().
  - Fix db_valid_mask and implement db_vector_count()/db_vector_mask().

- ntb_hw_epf:

  - Document the legacy offset in ntb_epf_peer_db_set().
  - Fix db_valid_mask to cover only real doorbell bits.
  - Report 0-based db_vector to ntb_db_event() (accounting for the
    unused slot).
  - Keep db_val as a bitmask and fix db_read/db_clear semantics
    accordingly.
  - Implement db_vector_count()/db_vector_mask().
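
As a rough userspace model of the bitmask semantics listed above (not the
driver code itself): pending doorbells are kept as one bit per 0-based
vector, reads are limited to the valid mask, and clears touch only the
requested bits. struct demo_ntb and the demo_* functions are hypothetical
stand-ins for the corresponding driver state and NTB ops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device state: pending doorbells kept as a bitmask. */
struct demo_ntb {
	unsigned int db_count;	/* number of real doorbells */
	uint64_t db_val;	/* bit n set => doorbell n pending */
};

static uint64_t demo_valid_mask(const struct demo_ntb *ndev)
{
	assert(ndev->db_count > 0 && ndev->db_count < 64);
	return (UINT64_C(1) << ndev->db_count) - 1;
}

/* Interrupt path: record the 0-based vector as a bit, not as a number. */
static void demo_db_event(struct demo_ntb *ndev, unsigned int db_vector)
{
	assert(db_vector < ndev->db_count);
	ndev->db_val |= UINT64_C(1) << db_vector;
}

/* Read returns only bits that correspond to real doorbells. */
static uint64_t demo_db_read(const struct demo_ntb *ndev)
{
	return ndev->db_val & demo_valid_mask(ndev);
}

/* Clear drops only the requested bits and leaves the rest pending. */
static void demo_db_clear(struct demo_ntb *ndev, uint64_t db_bits)
{
	ndev->db_val &= ~db_bits;
}

static int demo_db_vector_count(const struct demo_ntb *ndev)
{
	return (int)ndev->db_count;
}

/* One doorbell bit per vector; out-of-range vectors map to no bits. */
static uint64_t demo_db_vector_mask(const struct demo_ntb *ndev, int db_vector)
{
	if (db_vector < 0 || (unsigned int)db_vector >= ndev->db_count)
		return 0;
	return UINT64_C(1) << db_vector;
}
```

A client that handles vector 0 can then clear just
demo_db_vector_mask(ndev, 0) and leave, say, doorbell 3 pending for
whoever services that vector.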


Compatibility
=============

By keeping the legacy offset intact, this series aims to remain
compatible across mixed kernel versions. The observable changes are
limited to correct mask/vector reporting and safer execution context
handling.
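
The "safer execution context handling" can be pictured as recording
doorbell bits in the atomic path and raising the interrupts later from
process context. The following is a minimal userspace model of that
pattern using C11 atomics. It is an assumption about the general shape,
not the actual patch: struct demo_vntb, the demo_* functions and the
raised[] counters (a stand-in for pci_epc_raise_irq() calls) are all
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_DB_SLOT_OFFSET	2	/* hypothetical: legacy offset kept */

struct demo_vntb {
	_Atomic uint64_t pending;	/* doorbell bits waiting to be raised */
	unsigned int raised[64 + DEMO_DB_SLOT_OFFSET];	/* per-slot raise count */
};

/*
 * Atomic-context side: only record the bits. Returns nonzero when the
 * caller should schedule the deferred work (nothing was pending before).
 */
static int demo_peer_db_set(struct demo_vntb *v, uint64_t db_bits)
{
	uint64_t old = atomic_fetch_or(&v->pending, db_bits);

	return old == 0 && db_bits != 0;
}

/* Process-context side: drain pending bits and raise one IRQ per bit. */
static void demo_db_work(struct demo_vntb *v)
{
	uint64_t bits = atomic_exchange(&v->pending, 0);

	for (unsigned int bit = 0; bits; bit++, bits >>= 1) {
		if (bits & 1)
			v->raised[bit + DEMO_DB_SLOT_OFFSET]++;	/* may sleep in a real driver */
	}
}
```

Note that in this sketch a doorbell set twice before the work runs is
raised only once. Whether such coalescing is acceptable depends on the
client, so treat it as a property of the model rather than of the series.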

Patches 1-5 (PCI Endpoint) and 6-10 (NTB) are independent and can be
applied separately through the respective trees. They are sent together
in this v3 for convenience.

Once the remaining acks from NTB maintainers are collected, the plan is
to take the whole series through the PCI EP tree. See:
https://lore.kernel.org/linux-pci/rnzsnp5de4qf5w7smebkmqekpuaqckltx73rj6ha3q2nrby5yp@7hsgvdzvjkp6/

---
Changelog
=========

Changes since v2:
  - No functional changes.
  - Rebased onto current pci/endpoint
    e022f0c72c7f ("selftests: pci_endpoint: Skip reserved BARs").
    * Patch 2 needed a trivial context-only adjustment while rebasing, due
      to commit d799984233a5 ("PCI: endpoint: pci-epf-vntb: Stop
      cmd_handler work in epf_ntb_epc_cleanup").
  - Picked up additional Reviewed-by tags from Frank.
  - Fixed the incorrect v2 series title.

Changes since v1:
  - Addressed feedback from Dave (add a source code comment, introduce
    enum to eliminate magic numbers)
  - Updated source code comment in Patch 2.
  - No functional changes, so retained Reviewed-by tags from Frank and Dave.
    Thank you both for the review.

v2: https://lore.kernel.org/linux-pci/20260227084955.3184017-1-den@valinux.co.jp/
v1: https://lore.kernel.org/linux-pci/20260224133459.1741537-1-den@valinux.co.jp/


Best regards,
Koichiro


Koichiro Den (10):
  PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset
  PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic
    context
  PCI: endpoint: pci-epf-vntb: Report 0-based doorbell vector via
    ntb_db_event()
  PCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_mask
  PCI: endpoint: pci-epf-vntb: Implement db_vector_count/mask for
    doorbells
  NTB: epf: Document legacy doorbell slot offset in
    ntb_epf_peer_db_set()
  NTB: epf: Make db_valid_mask cover only real doorbell bits
  NTB: epf: Report 0-based doorbell vector via ntb_db_event()
  NTB: epf: Fix doorbell bitmask handling in db_read/db_clear
  NTB: epf: Implement db_vector_count/mask for doorbells

 drivers/ntb/hw/epf/ntb_hw_epf.c               |  89 ++++++++++-
 drivers/pci/endpoint/functions/pci-epf-vntb.c | 147 +++++++++++++++---
 2 files changed, 210 insertions(+), 26 deletions(-)

-- 
2.51.0
Re: [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling
Posted by Koichiro Den 1 week, 4 days ago
On Mon, Mar 23, 2026 at 12:15:34PM +0900, Koichiro Den wrote:
> Once the remaining acks from NTB maintainers are collected, the plan is

Hi Dave,

When you have a chance, I'd appreciate another look at patches 6-10 (which are
unchanged since v2). If you do not see any blockers, an Acked-by would be
greatly appreciated.


P.S. Regarding Sashiko's feedback [1], my understanding is that there are no
blockers, but there are a few points that would be better addressed separately
as orthogonal follow-ups:

- configfs knobs mutability and bounds checking, including (but not limited to)
  db_count.
  In my opinion, allowing updates after .bind() looks questionable, and
  returning -EBUSY once bound seems more appropriate. I'm leaning toward
  handling this as a separate hardening series.

- ntb_hw_epf IRQ unwind concern.
  This is what I was trying to address in [2], which I hope will land soon.

- Other lifecycle concerns.
  These are largely tied to the current vNTB implementation and were part of
  [3], for which I still plan to post a follow-up series that adds a .remove()
  implementation to vntb_pci_driver.

[1] https://sashiko.dev/#/patchset/20260323031544.2598111-1-den%40valinux.co.jp
[2] https://lore.kernel.org/ntb/20260304083028.1391068-1-den@valinux.co.jp/
[3] https://lore.kernel.org/all/20260226084142.2226875-1-den@valinux.co.jp/


Best regards,
Koichiro

> to take the whole series through the PCI EP tree. See:
> https://lore.kernel.org/linux-pci/rnzsnp5de4qf5w7smebkmqekpuaqckltx73rj6ha3q2nrby5yp@7hsgvdzvjkp6/
Re: [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling
Posted by Niklas Cassel 1 week, 2 days ago
Hello Koichiro,

On Tue, Mar 24, 2026 at 12:43:53AM +0900, Koichiro Den wrote:
> 
> - configfs knobs mutability and bounds checking, including (but not limited to)
>   db_count.
>   In my opinion, allowing updates after .bind() looks questionable, and
>   returning -EBUSY once bound seems more appropriate. I'm leaning toward
>   handling this as a separate hardening series.

This is in line with what we did for pci-epf-test, see commit:
ffcc4850a161 ("PCI: endpoint: pci-epf-test: Allow overriding default BAR sizes")

but we return EOPNOTSUPP instead of EBUSY.


Kind regards,
Niklas
Re: [PATCH v3 00/10] PCI: endpoint: pci-epf-vntb / NTB: epf: Enable per-doorbell bit handling
Posted by Koichiro Den 1 week, 2 days ago
On Wed, Mar 25, 2026 at 07:23:37AM +0100, Niklas Cassel wrote:
> Hello Koichiro,
> 
> On Tue, Mar 24, 2026 at 12:43:53AM +0900, Koichiro Den wrote:
> > 
> > - configfs knobs mutability and bounds checking, including (but not limited to)
> >   db_count.
> >   In my opinion, allowing updates after .bind() looks questionable, and
> >   returning -EBUSY once bound seems more appropriate. I'm leaning toward
> >   handling this as a separate hardening series.
> 
> This is in line with what we did for pci-epf-test, see commit:
> ffcc4850a161 ("PCI: endpoint: pci-epf-test: Allow overriding default BAR sizes")
> 
> but we return EOPNOTSUPP instead of EBUSY.

Yes, I remember you were discussing this with Mani in another thread. For
consistency, we should use -EOPNOTSUPP here as well. Thanks for the reminder.

Best regards,
Koichiro

> 
> 
> Kind regards,
> Niklas