drivers/net/ethernet/intel/igb/igb.h | 1 + drivers/net/ethernet/intel/igb/igb_main.c | 5 ++++- drivers/net/ethernet/intel/igb/igb_ptp.c | 22 ++++++++++++++++++++++ 3 files changed, 27 insertions(+), 1 deletion(-)
Retrieve Tx timestamp directly from interrupt handler for i210.
The current implementation uses schedule_work() which is executed by the
system work queue to retrieve Tx timestamps. This increases latency and can
lead to timeouts in case of heavy system load. i210 is often used in
industrial systems, where timestamp timeouts can be fatal.
Therefore, fetch the timestamp directly from the interrupt handler.
The work queue code stays for all other NICs supported by igb.
Tested on Intel i210 and i350.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
---
Changes in v3:
- Switch back to IRQ, but for i210 only
- Keep kworker for all other NICs like i350 (Miroslav)
- Link to v2: https://lore.kernel.org/r/20250822-igb_irq_ts-v2-1-1ac37078a7a4@linutronix.de
Changes in v2:
- Switch from IRQ to PTP aux worker due to NTP performance regression (Miroslav)
- Link to v1: https://lore.kernel.org/r/20250815-igb_irq_ts-v1-1-8c6fc0353422@linutronix.de
---
drivers/net/ethernet/intel/igb/igb.h | 1 +
drivers/net/ethernet/intel/igb/igb_main.c | 5 ++++-
drivers/net/ethernet/intel/igb/igb_ptp.c | 22 ++++++++++++++++++++++
3 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 0fff1df81b7b..1de29670784e 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -776,6 +776,7 @@ int igb_ptp_hwtstamp_get(struct net_device *netdev,
int igb_ptp_hwtstamp_set(struct net_device *netdev,
struct kernel_hwtstamp_config *config,
struct netlink_ext_ack *extack);
+void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter);
void igb_set_flag_queue_pairs(struct igb_adapter *, const u32);
unsigned int igb_get_max_rss_queues(struct igb_adapter *);
#ifdef CONFIG_IGB_HWMON
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index dbea37269d2c..d0d9245e6d72 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7078,7 +7078,10 @@ static void igb_tsync_interrupt(struct igb_adapter *adapter)
if (tsicr & E1000_TSICR_TXTS) {
/* retrieve hardware timestamp */
- schedule_work(&adapter->ptp_tx_work);
+ if (hw->mac.type == e1000_i210)
+ igb_ptp_tx_tstamp_event(adapter);
+ else
+ schedule_work(&adapter->ptp_tx_work);
}
if (tsicr & TSINTR_TT0)
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index bd85d02ecadd..8c8f2b8615f7 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct ptp_clock_info *ptp, unsigned int pin,
return 0;
}
+/**
+ * igb_ptp_tx_tstamp_event
+ * @adapter: pointer to igb adapter
+ *
+ * This function checks the TSYNCTXCTL valid bit and stores the Tx hardware
+ * timestamp at the current skb.
+ **/
+void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ u32 tsynctxctl;
+
+ if (!adapter->ptp_tx_skb)
+ return;
+
+ tsynctxctl = rd32(E1000_TSYNCTXCTL);
+ if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID)))
+ return;
+
+ igb_ptp_tx_hwtstamp(adapter);
+}
+
/**
* igb_ptp_tx_work
* @work: pointer to work struct
---
base-commit: e07d0d30939990da377672ef49ca09763b4fbc79
change-id: 20250813-igb_irq_ts-1aa77cc7b4cb
Best regards,
--
Kurt Kanzenbach <kurt@linutronix.de>
On 2026-02-05 08:54:34 [+0100], Kurt Kanzenbach wrote: > Retrieve Tx timestamp directly from interrupt handler for i210. > > The current implementation uses schedule_work() which is executed by the > system work queue to retrieve Tx timestamps. This increases latency and can > lead to timeouts in case of heavy system load. i210 is often used in > industrial systems, where timestamp timeouts can be fatal. > > Therefore, fetch the timestamp directly from the interrupt handler. > > The work queue code stays for all other NICs supported by igb. > > Tested on Intel i210 and i350. > > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> IMHO this is a compromise with Miroslav where he observed less PTP timestamps on the i350. While testing I did not get near Miroslav's difference but there was a small change. I don't understand *why* because the current workqueue usage reads the timestamp on the same CPU on which the interrupt occurred. Doing it directly just avoids the context switch. This feels beneficial. Sebastian
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Kurt Kanzenbach
> Sent: Thursday, February 5, 2026 8:55 AM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>
> Cc: Paul Menzel <pmenzel@molgen.mpg.de>; Vadim Fedorenko
> <vadim.fedorenko@linux.dev>; Gomes, Vinicius
> <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran
> <richardcochran@gmail.com>; Kurt Kanzenbach <kurt@linutronix.de>;
> linux-kernel@vger.kernel.org; Andrew Lunn <andrew+netdev@lunn.ch>;
> Eric Dumazet <edumazet@google.com>; intel-wired-lan@lists.osuosl.org;
> Keller, Jacob E <jacob.e.keller@intel.com>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David S. Miller
> <davem@davemloft.net>; Sebastian Andrzej Siewior
> <bigeasy@linutronix.de>
> Subject: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx
> timestamp directly from interrupt for i210
>
> Retrieve Tx timestamp directly from interrupt handler for i210.
>
> The current implementation uses schedule_work() which is executed by
> the system work queue to retrieve Tx timestamps. This increases
> latency and can lead to timeouts in case of heavy system load. i210 is
> often used in industrial systems, where timestamp timeouts can be
> fatal.
>
> Therefore, fetch the timestamp directly from the interrupt handler.
>
> The work queue code stays for all other NICs supported by igb.
>
> Tested on Intel i210 and i350.
>
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> ---
> Changes in v3:
> - Switch back to IRQ, but for i210 only
> - Keep kworker for all other NICs like i350 (Miroslav)
> - Link to v2: https://lore.kernel.org/r/20250822-igb_irq_ts-v2-1-
> 1ac37078a7a4@linutronix.de
>
> Changes in v2:
> - Switch from IRQ to PTP aux worker due to NTP performance regression
> (Miroslav)
> - Link to v1: https://lore.kernel.org/r/20250815-igb_irq_ts-v1-1-
> 8c6fc0353422@linutronix.de
> ---
> drivers/net/ethernet/intel/igb/igb.h | 1 +
> drivers/net/ethernet/intel/igb/igb_main.c | 5 ++++-
> drivers/net/ethernet/intel/igb/igb_ptp.c | 22 ++++++++++++++++++++++
> 3 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb.h
> b/drivers/net/ethernet/intel/igb/igb.h
> index 0fff1df81b7b..1de29670784e 100644
> --- a/drivers/net/ethernet/intel/igb/igb.h
> +++ b/drivers/net/ethernet/intel/igb/igb.h
> @@ -776,6 +776,7 @@ int igb_ptp_hwtstamp_get(struct net_device
> *netdev, int igb_ptp_hwtstamp_set(struct net_device *netdev,
> struct kernel_hwtstamp_config *config,
> struct netlink_ext_ack *extack);
> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter);
> void igb_set_flag_queue_pairs(struct igb_adapter *, const u32);
> unsigned int igb_get_max_rss_queues(struct igb_adapter *); #ifdef
> CONFIG_IGB_HWMON diff --git
> a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index dbea37269d2c..d0d9245e6d72 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -7078,7 +7078,10 @@ static void igb_tsync_interrupt(struct
> igb_adapter *adapter)
>
> if (tsicr & E1000_TSICR_TXTS) {
> /* retrieve hardware timestamp */
> - schedule_work(&adapter->ptp_tx_work);
> + if (hw->mac.type == e1000_i210)
> + igb_ptp_tx_tstamp_event(adapter); <-Called from IRQ!
> + else
> + schedule_work(&adapter->ptp_tx_work);
> }
>
> if (tsicr & TSINTR_TT0)
> diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c
> b/drivers/net/ethernet/intel/igb/igb_ptp.c
> index bd85d02ecadd..8c8f2b8615f7 100644
> --- a/drivers/net/ethernet/intel/igb/igb_ptp.c
> +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
> @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct
> ptp_clock_info *ptp, unsigned int pin,
> return 0;
> }
>
> +/**
> + * igb_ptp_tx_tstamp_event
> + * @adapter: pointer to igb adapter
> + *
> + * This function checks the TSYNCTXCTL valid bit and stores the Tx
> +hardware
> + * timestamp at the current skb.
> + **/
> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) {
> + struct e1000_hw *hw = &adapter->hw;
> + u32 tsynctxctl;
> +
> + if (!adapter->ptp_tx_skb)
> + return;
> +
> + tsynctxctl = rd32(E1000_TSYNCTXCTL);
> + if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID)))
> + return;
> +
> + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function designed for work queue!
skb_tstamp_tx() can sleep
Smells like sleep-in-atomic isn't it?
> +}
> +
> /**
> * igb_ptp_tx_work
> * @work: pointer to work struct
>
> ---
> base-commit: e07d0d30939990da377672ef49ca09763b4fbc79
> change-id: 20250813-igb_irq_ts-1aa77cc7b4cb
>
> Best regards,
> --
> Kurt Kanzenbach <kurt@linutronix.de>
On Thu Feb 05 2026, Loktionov, Aleksandr wrote:
>> +/**
>> + * igb_ptp_tx_tstamp_event
>> + * @adapter: pointer to igb adapter
>> + *
>> + * This function checks the TSYNCTXCTL valid bit and stores the Tx
>> +hardware
>> + * timestamp at the current skb.
>> + **/
>> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) {
>> + struct e1000_hw *hw = &adapter->hw;
>> + u32 tsynctxctl;
>> +
>> + if (!adapter->ptp_tx_skb)
>> + return;
>> +
>> + tsynctxctl = rd32(E1000_TSYNCTXCTL);
>> + if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID)))
>> + return;
>> +
>> + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function designed for work queue!
>
> skb_tstamp_tx() can sleep
> Smells like sleep-in-atomic isn't it?
AFAICS skb_tstamp_tx() is safe to call here.
> spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep
In case you're worried about PREEMPT_RT: On -RT the IRQ runs a dedicated
thread. BTW I've tested this with and without -RT and with
CONFIG_DEBUG_ATOMIC_SLEEP.
Thanks,
Kurt
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Kurt Kanzenbach
> Sent: Thursday, February 5, 2026 12:58 PM
> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; Nguyen,
> Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>
> Cc: Paul Menzel <pmenzel@molgen.mpg.de>; Vadim Fedorenko
> <vadim.fedorenko@linux.dev>; Gomes, Vinicius
> <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran
> <richardcochran@gmail.com>; linux-kernel@vger.kernel.org; Andrew Lunn
> <andrew+netdev@lunn.ch>; Eric Dumazet <edumazet@google.com>; intel-
> wired-lan@lists.osuosl.org; Keller, Jacob E
> <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>;
> Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx
> timestamp directly from interrupt for i210
>
> On Thu Feb 05 2026, Loktionov, Aleksandr wrote:
> >> +/**
> >> + * igb_ptp_tx_tstamp_event
> >> + * @adapter: pointer to igb adapter
> >> + *
> >> + * This function checks the TSYNCTXCTL valid bit and stores the Tx
> >> +hardware
> >> + * timestamp at the current skb.
> >> + **/
> >> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) {
> >> + struct e1000_hw *hw = &adapter->hw;
> >> + u32 tsynctxctl;
> >> +
> >> + if (!adapter->ptp_tx_skb)
> >> + return;
> >> +
> >> + tsynctxctl = rd32(E1000_TSYNCTXCTL);
> >> + if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID)))
> >> + return;
> >> +
> >> + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function
> designed for work queue!
> >
> > skb_tstamp_tx() can sleep
> > Smells like sleep-in-atomic isn't it?
>
> AFAICS skb_tstamp_tx() is safe to call here.
>
> > spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep
>
> In case you're worried about PREEMPT_RT: On -RT the IRQ runs a
> dedicated thread. BTW I've tested this with and without -RT and with
> CONFIG_DEBUG_ATOMIC_SLEEP.
>
> Thanks,
> Kurt
Thank you, Kurt for sharing your experience. I don't have so many experience with RT Linux.
For me calling a function, not designed to be called from IRQ context is a SUS.
So, I rose the question about sleeping.
But there is also a question about non-RT Kernels with Shared IRQ Vectors...
I.e. regular packet processing (NAPI poll in softirq) and PTP timestamp events (in hardirq).
I suspect we have potentially broken drivers with shared vectors (MSI-X should be safe I hope).
I'd like the comment to be added:
/* Safe to call from IRQ: dedicated misc vector + RT-safe on PREEMPT_RT */
igb_ptp_tx_hwtstamp(adapter);
But in long term perspective prefer to refactor moving to NAPI is the safer, more maintainable pattern.
What do you think?
Alex
On 2/5/2026 4:20 AM, Loktionov, Aleksandr wrote:
>
>
>> -----Original Message-----
>> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
>> Of Kurt Kanzenbach
>> Sent: Thursday, February 5, 2026 12:58 PM
>> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; Nguyen,
>> Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
>> <przemyslaw.kitszel@intel.com>
>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>; Vadim Fedorenko
>> <vadim.fedorenko@linux.dev>; Gomes, Vinicius
>> <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran
>> <richardcochran@gmail.com>; linux-kernel@vger.kernel.org; Andrew Lunn
>> <andrew+netdev@lunn.ch>; Eric Dumazet <edumazet@google.com>; intel-
>> wired-lan@lists.osuosl.org; Keller, Jacob E
>> <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
>> Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>;
>> Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx
>> timestamp directly from interrupt for i210
>>
>> On Thu Feb 05 2026, Loktionov, Aleksandr wrote:
>>>> +/**
>>>> + * igb_ptp_tx_tstamp_event
>>>> + * @adapter: pointer to igb adapter
>>>> + *
>>>> + * This function checks the TSYNCTXCTL valid bit and stores the Tx
>>>> +hardware
>>>> + * timestamp at the current skb.
>>>> + **/
>>>> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) {
>>>> + struct e1000_hw *hw = &adapter->hw;
>>>> + u32 tsynctxctl;
>>>> +
>>>> + if (!adapter->ptp_tx_skb)
>>>> + return;
>>>> +
>>>> + tsynctxctl = rd32(E1000_TSYNCTXCTL);
>>>> + if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID)))
>>>> + return;
>>>> +
>>>> + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function
>> designed for work queue!
>>>
>>> skb_tstamp_tx() can sleep
>>> Smells like sleep-in-atomic isn't it?
>>
>> AFAICS skb_tstamp_tx() is safe to call here.
>>
>>> spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep
>>
>> In case you're worried about PREEMPT_RT: On -RT the IRQ runs a
>> dedicated thread. BTW I've tested this with and without -RT and with
>> CONFIG_DEBUG_ATOMIC_SLEEP.
>>
>> Thanks,
>> Kurt
>
> Thank you, Kurt for sharing your experience. I don't have so many experience with RT Linux.
> For me calling a function, not designed to be called from IRQ context is a SUS.
> So, I rose the question about sleeping.
>
My understanding is that RT is only safe to convert such spinlock_t to
mutex *because* it also converts IRQs to threads.
On 2026-02-05 09:47:14 [+0000], Loktionov, Aleksandr wrote: … > > --- a/drivers/net/ethernet/intel/igb/igb_ptp.c > > +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c > > @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct … > > + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function designed for work queue! > > skb_tstamp_tx() can sleep > Smells like sleep-in-atomic isn't it? How or where can it sleep? Sebastian
> -----Original Message----- > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > Sent: Thursday, February 5, 2026 11:04 AM > To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com> > Cc: Kurt Kanzenbach <kurt@linutronix.de>; Nguyen, Anthony L > <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw > <przemyslaw.kitszel@intel.com>; Paul Menzel <pmenzel@molgen.mpg.de>; > Vadim Fedorenko <vadim.fedorenko@linux.dev>; Gomes, Vinicius > <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran > <richardcochran@gmail.com>; linux-kernel@vger.kernel.org; Andrew Lunn > <andrew+netdev@lunn.ch>; Eric Dumazet <edumazet@google.com>; intel- > wired-lan@lists.osuosl.org; Keller, Jacob E > <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo > Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net> > Subject: Re: RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve > Tx timestamp directly from interrupt for i210 > > On 2026-02-05 09:47:14 [+0000], Loktionov, Aleksandr wrote: > … > > > --- a/drivers/net/ethernet/intel/igb/igb_ptp.c > > > +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c > > > @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct > … > > > + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function > designed for work queue! > > > > skb_tstamp_tx() can sleep > > Smells like sleep-in-atomic isn't it? > > How or where can it sleep? > > Sebastian igb_ptp_tx_hwtstamp() -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_tstamp_tx -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/__skb_complete_tx_timestamp -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/sock_queue_err_skb -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_queue_tail -> https://elixir.bootlin.com/linux/v6.19-rc5/source/net/core/skbuff.c#L4075 spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep
On 05/02/2026 10:37, Loktionov, Aleksandr wrote: > > >> -----Original Message----- >> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> >> Sent: Thursday, February 5, 2026 11:04 AM >> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com> >> Cc: Kurt Kanzenbach <kurt@linutronix.de>; Nguyen, Anthony L >> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw >> <przemyslaw.kitszel@intel.com>; Paul Menzel <pmenzel@molgen.mpg.de>; >> Vadim Fedorenko <vadim.fedorenko@linux.dev>; Gomes, Vinicius >> <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran >> <richardcochran@gmail.com>; linux-kernel@vger.kernel.org; Andrew Lunn >> <andrew+netdev@lunn.ch>; Eric Dumazet <edumazet@google.com>; intel- >> wired-lan@lists.osuosl.org; Keller, Jacob E >> <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo >> Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net> >> Subject: Re: RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve >> Tx timestamp directly from interrupt for i210 >> >> On 2026-02-05 09:47:14 [+0000], Loktionov, Aleksandr wrote: >> … >>>> --- a/drivers/net/ethernet/intel/igb/igb_ptp.c >>>> +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c >>>> @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct >> … >>>> + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function >> designed for work queue! >>> >>> skb_tstamp_tx() can sleep >>> Smells like sleep-in-atomic isn't it? >> >> How or where can it sleep? >> >> Sebastian > > igb_ptp_tx_hwtstamp() -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_tstamp_tx -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/__skb_complete_tx_timestamp -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/sock_queue_err_skb -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_queue_tail -> https://elixir.bootlin.com/linux/v6.19-rc5/source/net/core/skbuff.c#L4075 > > spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep Hmm... that actually means we have some drivers broken for RT kernels if they are processing TX timestamps within a single irq vector: - hisilicon/hns3 - intel/i40e (and ice probably) - marvell/mvpp2 For igb/igc/i40e it's still OK to process TX timestamps directly in MSI-X configuration, as ring processing has separate vector, right? But in general skb_tstamp_tx should be moved to BH processing (NAPI poll callback).
On 2026-02-05 11:56:44 [+0000], Vadim Fedorenko wrote: > On 05/02/2026 10:37, Loktionov, Aleksandr wrote: > > spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep > > Hmm... that actually means we have some drivers broken for RT kernels if > they are processing TX timestamps within a single irq vector: > - hisilicon/hns3 > - intel/i40e (and ice probably) > - marvell/mvpp2 > > For igb/igc/i40e it's still OK to process TX timestamps directly in > MSI-X configuration, as ring processing has separate vector, right? The statement made above is not accurate. Each and every driver does request_irq() and here on PREEMPT_RT you can freely acquire spinlock_t. But !RT looks problematic… __skb_tstamp_tx() invokes skb_may_tx_timestamp() which should exit early most of the time due to the passed bool (which is true) or sysctl_tstamp_allow_data which is true. However, should both be false then it tries to read_lock_bh(&sk->sk_callback_lock); where lockdep will complain because this lock is now acquired with disabled interrupts. The function will attempt do free the fresh/ cloned skb in error case via kfree_skb(). Since it is fresh skb, sk_buff::destructor is NULL and the warning in skb_release_head_state() won't trigger. So the only thing that bothers me is the read_lock_bh() in skb_may_tx_timestamp() which deadlocks if the socket is write-locked on the same CPU. > But in general skb_tstamp_tx should be moved to BH processing (NAPI poll > callback). Sebastian
On 05/02/2026 14:51, Sebastian Andrzej Siewior wrote: > On 2026-02-05 11:56:44 [+0000], Vadim Fedorenko wrote: >> On 05/02/2026 10:37, Loktionov, Aleksandr wrote: >>> spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep >> >> Hmm... that actually means we have some drivers broken for RT kernels if >> they are processing TX timestamps within a single irq vector: >> - hisilicon/hns3 >> - intel/i40e (and ice probably) >> - marvell/mvpp2 >> >> For igb/igc/i40e it's still OK to process TX timestamps directly in >> MSI-X configuration, as ring processing has separate vector, right? > > The statement made above is not accurate. Each and every driver does > request_irq() and here on PREEMPT_RT you can freely acquire spinlock_t. > > But !RT looks problematic… > > __skb_tstamp_tx() invokes skb_may_tx_timestamp() which should exit early > most of the time due to the passed bool (which is true) or > sysctl_tstamp_allow_data which is true. However, should both be false > then it tries to > read_lock_bh(&sk->sk_callback_lock); > > where lockdep will complain because this lock is now acquired with > disabled interrupts. > > The function will attempt do free the fresh/ cloned skb in error case > via kfree_skb(). Since it is fresh skb, sk_buff::destructor is NULL and > the warning in skb_release_head_state() won't trigger. > > So the only thing that bothers me is the read_lock_bh() in > skb_may_tx_timestamp() which deadlocks if the socket is write-locked on > the same CPU. Alright. Now you make me think whether we should enforce OPT_TSONLY option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this check, and in case sysctl was flipped off - drop TX timestamps as it's done now?
On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote: > > So the only thing that bothers me is the read_lock_bh() in > > skb_may_tx_timestamp() which deadlocks if the socket is write-locked on > > the same CPU. > > Alright. Now you make me think whether we should enforce OPT_TSONLY > option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this > check, and in case sysctl was flipped off - drop TX timestamps as > it's done now? This would "fix" this problem for all users which do deliver the timestamp from their IRQ handler instead of napi. There are a few of those… This would be considered stable material, right? (despite the fact that we have it for quite some time and nobody complained so far). Sebastian
On 05/02/2026 16:43, Sebastian Andrzej Siewior wrote: > On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote: >>> So the only thing that bothers me is the read_lock_bh() in >>> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on >>> the same CPU. >> >> Alright. Now you make me think whether we should enforce OPT_TSONLY >> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this >> check, and in case sysctl was flipped off - drop TX timestamps as >> it's done now? > > This would "fix" this problem for all users which do deliver the > timestamp from their IRQ handler instead of napi. There are a few of > those… > This would be considered stable material, right? (despite the fact that > we have it for quite some time and nobody complained so far). cc: Willem as he is the author of the check introduced back in 2015. But it's more like a question to maintainers whether it is acceptable way of "fixing" drivers or it's no-go solution
Vadim Fedorenko wrote: > On 05/02/2026 16:43, Sebastian Andrzej Siewior wrote: > > On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote: > >>> So the only thing that bothers me is the read_lock_bh() in > >>> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on > >>> the same CPU. > >> > >> Alright. Now you make me think whether we should enforce OPT_TSONLY > >> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this > >> check, and in case sysctl was flipped off - drop TX timestamps as > >> it's done now? > > > > This would "fix" this problem for all users which do deliver the > > timestamp from their IRQ handler instead of napi. There are a few of > > those… > > This would be considered stable material, right? (despite the fact that > > we have it for quite some time and nobody complained so far). > > cc: Willem as he is the author of the check introduced back in 2015. > > But it's more like a question to maintainers whether it is acceptable > way of "fixing" drivers or it's no-go solution Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users.
On 05.02.2026 21:41, Willem de Bruijn wrote: > Vadim Fedorenko wrote: >> On 05/02/2026 16:43, Sebastian Andrzej Siewior wrote: >>> On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote: >>>>> So the only thing that bothers me is the read_lock_bh() in >>>>> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on >>>>> the same CPU. >>>> >>>> Alright. Now you make me think whether we should enforce OPT_TSONLY >>>> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this >>>> check, and in case sysctl was flipped off - drop TX timestamps as >>>> it's done now? >>> >>> This would "fix" this problem for all users which do deliver the >>> timestamp from their IRQ handler instead of napi. There are a few of >>> those… >>> This would be considered stable material, right? (despite the fact that >>> we have it for quite some time and nobody complained so far). >> >> cc: Willem as he is the author of the check introduced back in 2015. >> >> But it's more like a question to maintainers whether it is acceptable >> way of "fixing" drivers or it's no-go solution > > Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users. Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX timestamps are silently dropped. To receive these timestamps users have to get CAP_NET_RAW permission, and it will work with the updated logic as well...
Vadim Fedorenko wrote: > On 05.02.2026 21:41, Willem de Bruijn wrote: > > Vadim Fedorenko wrote: > >> On 05/02/2026 16:43, Sebastian Andrzej Siewior wrote: > >>> On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote: > >>>>> So the only thing that bothers me is the read_lock_bh() in > >>>>> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on > >>>>> the same CPU. > >>>> > >>>> Alright. Now you make me think whether we should enforce OPT_TSONLY > >>>> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this > >>>> check, and in case sysctl was flipped off - drop TX timestamps as > >>>> it's done now? > >>> > >>> This would "fix" this problem for all users which do deliver the > >>> timestamp from their IRQ handler instead of napi. There are a few of > >>> those… > >>> This would be considered stable material, right? (despite the fact that > >>> we have it for quite some time and nobody complained so far). > >> > >> cc: Willem as he is the author of the check introduced back in 2015. > >> > >> But it's more like a question to maintainers whether it is acceptable > >> way of "fixing" drivers or it's no-go solution > > > > Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users. > > Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX > timestamps are silently dropped. Are you referring to sysctl_tstamp_allow_data? That is enabled by default. > To receive these timestamps users have to get > CAP_NET_RAW permission, and it will work with the updated logic as well...
On 2026-02-05 16:41:03 [-0500], Willem de Bruijn wrote: > > Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users. okay. Can we move the check to sock_set_timestamping()/ setsockopt() time? On the plus side we could throw an error instead silently dropping packets. This might be a late win given that you describe the users as legacy users. I'm not sure if the "permission" can change over time and so get revoked while an application is running. Sebastian
On 2026-02-05 10:37:05 [+0000], Loktionov, Aleksandr wrote: > > > > How or where can it sleep? > > > > Sebastian > > igb_ptp_tx_hwtstamp() -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_tstamp_tx -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/__skb_complete_tx_timestamp -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/sock_queue_err_skb -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_queue_tail -> https://elixir.bootlin.com/linux/v6.19-rc5/source/net/core/skbuff.c#L4075 Would you please quote an actual call chain that can be looked up and not this where a line crosses 300 characters? > spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep Okay. So you are concerned about this spinlock_t, I see. igb_tsync_interrupt() also invokes ptp_clock_event() which acquires pps_event_time::tsevqs_lock. Why is this not a problem? Sebastian
© 2016 - 2026 Red Hat, Inc.