Hi,

This series by Carolina adds support in ptp and usage in mlx5 for
exposing the raw free-running cycle counter of PTP hardware clocks.
Find a detailed description by Carolina below [1].

Regards,
Tariq

[1]
This patch series introduces support for exposing the raw free-running
cycle counter of PTP hardware clocks. Some telemetry and low-level
logging use cycle counter timestamps rather than nanoseconds.
Currently, there is no generic interface to correlate these raw values
with system time.

To address this, the series introduces two new ioctl commands that
allow userspace to query the device's raw cycle counter together with
host time:

- PTP_SYS_OFFSET_PRECISE_CYCLES

- PTP_SYS_OFFSET_EXTENDED_CYCLES

These commands work like their existing counterparts but return the
device timestamp in cycle units instead of real-time nanoseconds.

This can also be useful in the XDP fast path: if a driver inserts the
raw cycle value into metadata instead of a real-time timestamp, it can
avoid the overhead of converting cycles to time in the kernel. Then
userspace can resolve the cycle-to-time mapping using this ioctl when
needed.

Adds the new PTP ioctls and integrates support in ptp_ioctl():
- ptp: Add ioctl commands to expose raw cycle counter values

Support for exposing raw cycles in mlx5:
- net/mlx5: Extract MTCTR register read logic into helper function
- net/mlx5: Support getcyclesx and getcrosscycles

Carolina Jubran (3):
  ptp: Add ioctl commands to expose raw cycle counter values
  net/mlx5: Extract MTCTR register read logic into helper function
  net/mlx5: Support getcyclesx and getcrosscycles

 .../ethernet/mellanox/mlx5/core/lib/clock.c | 113 ++++++++++++++++--
 drivers/ptp/ptp_chardev.c                   |  34 ++++--
 include/uapi/linux/ptp_clock.h              |   4 +
 3 files changed, 130 insertions(+), 21 deletions(-)

base-commit: 06baf9bfa6ca8db7d5f32e12e27d1dc1b7cb3a8a
-- 
2.31.1
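A minimal user-space sketch of turning one PTP_SYS_OFFSET_EXTENDED_CYCLES result into a single (cycle, host time) pair. Since the commands are described as working like their existing counterparts, it assumes each sample keeps the PTP_SYS_OFFSET_EXTENDED layout (ts[i][0] host time before the device read, ts[i][1] device value, ts[i][2] host time after), and that the cycle count is packed into the sec/nsec fields as if it were nanoseconds, the way the kernel's existing getcycles64 path for vclocks treats it. Both assumptions should be checked against the merged uAPI. The types struct cycle_sample and struct cycle_pair and the helpers below are hypothetical, and the ioctl call itself is left out.

```c
#include <stdint.h>

/* Assumption: the raw count arrives in the sec/nsec fields of
 * struct ptp_clock_time packed as if it were nanoseconds, so the
 * 64-bit value is recovered as sec * 1e9 + nsec. */
static uint64_t cycles_from_ptp_time(int64_t sec, uint32_t nsec)
{
	return (uint64_t)sec * 1000000000ull + nsec;
}

/* Hypothetical flattened sample: host time before the device read,
 * the decoded cycle value, and host time after the read. */
struct cycle_sample {
	int64_t sys_before_ns;
	uint64_t cycles;
	int64_t sys_after_ns;
};

/* Hypothetical correlation point with its host-time uncertainty. */
struct cycle_pair {
	uint64_t cycles;
	int64_t sys_ns;    /* host time attributed to the device read */
	int64_t window_ns; /* width of the before/after bracket */
};

/* Pick the sample with the narrowest before/after bracket (most likely
 * the one least disturbed by preemption or bus latency) and attribute
 * the device read to the bracket's midpoint. Samples with a negative
 * bracket are skipped. Returns 0 on success, -1 if none is usable. */
static int best_pair(const struct cycle_sample *s, unsigned int n,
		     struct cycle_pair *out)
{
	int64_t best_window = INT64_MAX;
	unsigned int i, best = 0;

	for (i = 0; i < n; i++) {
		int64_t w = s[i].sys_after_ns - s[i].sys_before_ns;

		if (w >= 0 && w < best_window) {
			best_window = w;
			best = i;
		}
	}
	if (best_window == INT64_MAX)
		return -1;

	out->cycles = s[best].cycles;
	out->sys_ns = s[best].sys_before_ns + best_window / 2;
	out->window_ns = best_window;
	return 0;
}
```

Keeping the bracket width alongside the pair lets a caller discard correlation points whose uncertainty is too large for its purpose; picking the narrowest bracket is a common heuristic, not something the series mandates.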
On Tue, 15 Jul 2025 08:15:30 +0300 Tariq Toukan wrote:
> This patch series introduces support for exposing the raw free-running
> cycle counter of PTP hardware clocks.

Could you say more about use cases? I realized when massaging the cover
letter to apply the series that all the use cases are vague and
hypothetical.

> Some telemetry and low-level logging use cycle counter timestamps
> rather than nanoseconds.

What is that "some telemetry"?

> Currently, there is no generic interface to
> correlate these raw values with system time.
>
> To address this, the series introduces two new ioctl commands that
> allow userspace to query the device's raw cycle counter together with
> host time:
>
> - PTP_SYS_OFFSET_PRECISE_CYCLES
>
> - PTP_SYS_OFFSET_EXTENDED_CYCLES
>
> These commands work like their existing counterparts but return the
> device timestamp in cycle units instead of real-time nanoseconds.
>
> This can also be useful in the XDP fast path: if a driver inserts the
> raw cycle value into metadata instead of a real-time timestamp, it can
> avoid the overhead of converting cycles to time in the kernel. Then
> userspace can resolve the cycle-to-time mapping using this ioctl when
> needed.

There is no API to achieve that today, right? The XDP access helpers
are supposed to return converted time. Are you planning to add new
callbacks?

If there are solid networking use cases for this I'd prefer we fully
iron them out before merging this uAPI. If there are RDMA use cases
please spell them out in more detail.
-- 
pw-bot: cr
On 22/07/2025 3:09, Jakub Kicinski wrote:
> On Tue, 15 Jul 2025 08:15:30 +0300 Tariq Toukan wrote:
>> This patch series introduces support for exposing the raw free-running
>> cycle counter of PTP hardware clocks.
>
> Could you say more about use cases? I realized when massaging the cover
> letter to apply the series that all the use cases are vague and
> hypothetical.
>
>> Some telemetry and low-level logging use cycle counter timestamps
>> rather than nanoseconds.
>
> What is that "some telemetry"?
>
>> Currently, there is no generic interface to
>> correlate these raw values with system time.
>>
>> To address this, the series introduces two new ioctl commands that
>> allow userspace to query the device's raw cycle counter together with
>> host time:
>>
>> - PTP_SYS_OFFSET_PRECISE_CYCLES
>>
>> - PTP_SYS_OFFSET_EXTENDED_CYCLES
>>
>> These commands work like their existing counterparts but return the
>> device timestamp in cycle units instead of real-time nanoseconds.
>>
>> This can also be useful in the XDP fast path: if a driver inserts the
>> raw cycle value into metadata instead of a real-time timestamp, it can
>> avoid the overhead of converting cycles to time in the kernel. Then
>> userspace can resolve the cycle-to-time mapping using this ioctl when
>> needed.
>
> There is no API to achieve that today, right? The XDP access helpers
> are supposed to return converted time. Are you planning to add new
> callbacks?
>
> If there are solid networking use cases for this I'd prefer we fully
> iron them out before merging this uAPI. If there are RDMA use cases
> please spell them out in more detail.

Hi Jakub,

Thanks for the feedback.

One concrete use case is monitoring the frequency stability of the
device clock in FreeRunning mode. User space can periodically sample the
(cycle, time) pairs returned by the new ioctl to estimate the clock’s
frequency and detect anomalies, for example, drift caused by temperature
changes. This is especially useful in holdover scenarios.

Another practical case is with DPDK. When the hardware is in FreeRunning
mode, the CQE contains raw cycle counter values. DPDK returns these
values directly to user space without converting them. The new ioctl
provides a generic and consistent way to translate those raw values to
host time.

As for XDP, you’re right that it doesn’t expose raw cycles today. The
point here is more future-looking: if drivers ever choose to emit raw
cycles into metadata for performance, this API gives user space a clean
way to interpret those timestamps.

Carolina
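The frequency-stability case above can be sketched as a least-squares fit over several (cycle, host time) points: the slope is the counter frequency as seen from host time, and comparing it with a nominal value gives the drift in ppm. This is an illustrative assumption, not part of the series: struct corr_point, estimate_hz() and drift_ppm() are hypothetical names, the nominal frequency has to be supplied by the caller (the new ioctls do not report it), and the estimate is only relative to the host clock, so it is only as good as the host's own time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (cycle, host time) point, e.g. one per periodic
 * PTP_SYS_OFFSET_EXTENDED_CYCLES query. Assumes a 64-bit counter that
 * does not wrap between the first and last point. */
struct corr_point {
	uint64_t cycles;
	int64_t sys_ns;
};

/* Least-squares slope of cycles over host nanoseconds, scaled to Hz.
 * Values are taken relative to the first point so the doubles stay
 * small. Returns 0.0 for fewer than two points or no host-time spread. */
static double estimate_hz(const struct corr_point *p, size_t n)
{
	double mx = 0.0, my = 0.0, sxx = 0.0, sxy = 0.0;
	size_t i;

	if (n < 2)
		return 0.0;
	for (i = 0; i < n; i++) {
		mx += (double)(p[i].sys_ns - p[0].sys_ns);
		my += (double)(p[i].cycles - p[0].cycles);
	}
	mx /= (double)n;
	my /= (double)n;
	for (i = 0; i < n; i++) {
		double dx = (double)(p[i].sys_ns - p[0].sys_ns) - mx;
		double dy = (double)(p[i].cycles - p[0].cycles) - my;

		sxx += dx * dx;
		sxy += dx * dy;
	}
	if (sxx == 0.0)
		return 0.0;
	return sxy / sxx * 1e9; /* cycles per ns -> cycles per second */
}

/* Deviation of the measured frequency from a caller-supplied nominal
 * frequency, in parts per million. */
static double drift_ppm(double measured_hz, double nominal_hz)
{
	return (measured_hz - nominal_hz) / nominal_hz * 1e6;
}
```

Tracking drift_ppm() over time, rather than a single value, is what would expose temperature-related wander during holdover; a fit over many points also smooths out the per-sample jitter of individual ioctl reads.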
On Tue, 29 Jul 2025 09:57:13 +0300 Carolina Jubran wrote:
> One concrete use case is monitoring the frequency stability of the
> device clock in FreeRunning mode. User space can periodically sample the
> (cycle, time) pairs returned by the new ioctl to estimate the clock’s
> frequency and detect anomalies, for example, drift caused by temperature
> changes. This is especially useful in holdover scenarios.

Because the servo running on the host doesn't know the stability?
Seems like your real use case is the one below.

> Another practical case is with DPDK. When the hardware is in FreeRunning
> mode, the CQE contains raw cycle counter values. DPDK returns these
> values directly to user space without converting them. The new ioctl
> provides a generic and consistent way to translate those raw values to
> host time.
>
> As for XDP, you’re right that it doesn’t expose raw cycles today. The
> point here is more future-looking: if drivers ever choose to emit raw
> cycles into metadata for performance, this API gives user space a clean
> way to interpret those timestamps.

Got it, I can see how DPDK / kernel bypass may need this.

Please include this justification in the commit message for v2
and let's see if anyone merges it.
On 30/07/2025 1:40, Jakub Kicinski wrote:
> On Tue, 29 Jul 2025 09:57:13 +0300 Carolina Jubran wrote:
>> One concrete use case is monitoring the frequency stability of the
>> device clock in FreeRunning mode. User space can periodically sample the
>> (cycle, time) pairs returned by the new ioctl to estimate the clock’s
>> frequency and detect anomalies, for example, drift caused by temperature
>> changes. This is especially useful in holdover scenarios.
>
> Because the servo running on the host doesn't know the stability?
> Seems like your real use case is the one below.
>
>> Another practical case is with DPDK. When the hardware is in FreeRunning
>> mode, the CQE contains raw cycle counter values. DPDK returns these
>> values directly to user space without converting them. The new ioctl
>> provides a generic and consistent way to translate those raw values to
>> host time.
>>
>> As for XDP, you’re right that it doesn’t expose raw cycles today. The
>> point here is more future-looking: if drivers ever choose to emit raw
>> cycles into metadata for performance, this API gives user space a clean
>> way to interpret those timestamps.
>
> Got it, I can see how DPDK / kernel bypass may need this.
>
> Please include this justification in the commit message for v2
> and let's see if anyone merges it.

Thanks, I’ll include the DPDK/kernel bypass justification clearly in the
v2 commit message and cover letter.

Additionally, I wanted to mention another relevant use case that wasn’t
brought up earlier: fwctl can expose event records tagged with raw cycle
counter timestamps. When the device is in free-running mode, correlating
those with host time becomes difficult unless user space has access to
both cycle and system time snapshots.
On Thu, 31 Jul 2025 22:03:02 +0300 Carolina Jubran wrote:
> Additionally, I wanted to mention another relevant use case that wasn’t
> brought up earlier: fwctl can expose event records tagged with raw cycle
> counter timestamps. When the device is in free-running mode, correlating
> those with host time becomes difficult unless user space has access to
> both cycle and system time snapshots.

Okay, so DPDK and DOCA, got it.
On Tue, 15 Jul 2025 08:15:30 +0300 Tariq Toukan wrote:
> This patch series introduces support for exposing the raw free-running
> cycle counter of PTP hardware clocks. Some telemetry and low-level
> logging use cycle counter timestamps rather than nanoseconds.
> Currently, there is no generic interface to correlate these raw values
> with system time.
>
> To address this, the series introduces two new ioctl commands that
> allow userspace to query the device's raw cycle counter together with
> host time:
>
> - PTP_SYS_OFFSET_PRECISE_CYCLES
>
> - PTP_SYS_OFFSET_EXTENDED_CYCLES
>
> These commands work like their existing counterparts but return the
> device timestamp in cycle units instead of real-time nanoseconds.
>
> This can also be useful in the XDP fast path: if a driver inserts the
> raw cycle value into metadata instead of a real-time timestamp, it can
> avoid the overhead of converting cycles to time in the kernel. Then
> userspace can resolve the cycle-to-time mapping using this ioctl when
> needed.
>
> Adds the new PTP ioctls and integrates support in ptp_ioctl():
> - ptp: Add ioctl commands to expose raw cycle counter values
>
> Support for exposing raw cycles in mlx5:
> - net/mlx5: Extract MTCTR register read logic into helper function
> - net/mlx5: Support getcyclesx and getcrosscycles

It'd be great to get an Ack from Thomas or Richard on this (or failing that
at least other vendors?) Seems like we have a number of parallel
efforts to extend the PTP uAPI, I'm not sure how they all square
against each other, TBH.

Full thread for folks I CCed in:
https://lore.kernel.org/all/1752556533-39218-1-git-send-email-tariqt@nvidia.com/
On 7/18/2025 4:29 PM, Jakub Kicinski wrote:
> On Tue, 15 Jul 2025 08:15:30 +0300 Tariq Toukan wrote:
>> This patch series introduces support for exposing the raw free-running
>> cycle counter of PTP hardware clocks. Some telemetry and low-level
>> logging use cycle counter timestamps rather than nanoseconds.
>> Currently, there is no generic interface to correlate these raw values
>> with system time.
>>
>> To address this, the series introduces two new ioctl commands that
>> allow userspace to query the device's raw cycle counter together with
>> host time:
>>
>> - PTP_SYS_OFFSET_PRECISE_CYCLES
>>
>> - PTP_SYS_OFFSET_EXTENDED_CYCLES
>>
>> These commands work like their existing counterparts but return the
>> device timestamp in cycle units instead of real-time nanoseconds.
>>
>> This can also be useful in the XDP fast path: if a driver inserts the
>> raw cycle value into metadata instead of a real-time timestamp, it can
>> avoid the overhead of converting cycles to time in the kernel. Then
>> userspace can resolve the cycle-to-time mapping using this ioctl when
>> needed.
>>
>> Adds the new PTP ioctls and integrates support in ptp_ioctl():
>> - ptp: Add ioctl commands to expose raw cycle counter values
>>
>> Support for exposing raw cycles in mlx5:
>> - net/mlx5: Extract MTCTR register read logic into helper function
>> - net/mlx5: Support getcyclesx and getcrosscycles
>
> It'd be great to get an Ack from Thomas or Richard on this (or failing that
> at least other vendors?) Seems like we have a number of parallel
> efforts to extend the PTP uAPI, I'm not sure how they all square
> against each other, TBH.
>
> Full thread for folks I CCed in:
> https://lore.kernel.org/all/1752556533-39218-1-git-send-email-tariqt@nvidia.com/
>

I agree with Jakub about the need to properly explain the use cases and
goals in the commit and cover letter. AFAIK there are no current public
APIs for reporting cycles to userspace, so this really only makes sense
with something like DPDK. Even the XDP related helpers expect nanosecond
units now. It's unclear if we will need other parts of the APIs to also
handle cycles, or if simple ability to get the current cycles is sufficient.

The API also doesn't directly provide a way to query the expected or
nominal relationship between cycles and clock time.

If you try to just use PTP_SYS_OFFSET_EXTENDED_CYCLES to compare a
cycles value to a clock value to adjust a timestamp, that requires that
some other process is keeping CLOCK_REALTIME and the PHC clock
synchronized. When handled within the driver, the software typically has
an assumption about the relationship based on expected frequencies.
Thus, a conversion from cycles to time uses this relationship.

You don't appear to expose that relationship through the API, which
means you can only infer it either by knowing the device, or by assuming
CLOCK_REALTIME is already synchronized with the PHC?

I guess userspace could also simply build its own equivalent of the
struct timecounter using this API.. hmm.
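The closing remark about user space building its own equivalent of struct timecounter can be sketched as follows, assuming one anchor (cycle, host ns) pair taken from the new ioctl and a frequency that is either known for the device or measured from several such pairs. struct user_tc, user_tc_init() and user_tc_cyc2ns() are hypothetical; they only borrow the kernel's mult/shift fixed-point idea and are not a kernel, mlx5 or DPDK API. A DPDK application could apply this kind of conversion to raw CQE cycle values taken after the anchor.

```c
#include <stdint.h>

/* Hypothetical user-space analogue of a kernel timecounter: an anchor
 * (cycle, host ns) pair plus a fixed-point cycles->ns factor. */
struct user_tc {
	uint64_t anchor_cycles;
	int64_t anchor_ns;
	uint64_t mask;  /* counter width mask, e.g. UINT64_MAX for 64 bits */
	uint32_t mult;  /* ns = (delta * mult) >> shift */
	uint32_t shift;
};

/* Set the anchor and derive mult for the given shift from a frequency
 * in Hz (rounded to the nearest integer). */
static void user_tc_init(struct user_tc *tc, uint64_t cycles, int64_t ns,
			 uint64_t mask, double hz, uint32_t shift)
{
	tc->anchor_cycles = cycles & mask;
	tc->anchor_ns = ns;
	tc->mask = mask;
	tc->shift = shift;
	tc->mult = (uint32_t)(1e9 * (double)(1ull << shift) / hz + 0.5);
}

/* Convert a raw cycle value read after the anchor to host ns. The
 * masked subtraction handles a counter wrap as long as fewer than
 * mask + 1 cycles have elapsed since the anchor. */
static int64_t user_tc_cyc2ns(const struct user_tc *tc, uint64_t cycles)
{
	uint64_t delta = (cycles - tc->anchor_cycles) & tc->mask;

	return tc->anchor_ns + (int64_t)((delta * tc->mult) >> tc->shift);
}
```

With shift = 24 and a 1 GHz counter, mult is 2^24, so delta * mult overflows after 2^40 cycles (about 18 minutes); the anchor therefore has to be refreshed from a new ioctl sample well before that, much as drivers periodically refresh their own timecounters. The mapping also inherits whatever the host clock is doing, which is the dependency on CLOCK_REALTIME raised above.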
On 30/07/2025 2:33, Jacob Keller wrote:
>
>
> On 7/18/2025 4:29 PM, Jakub Kicinski wrote:
>> On Tue, 15 Jul 2025 08:15:30 +0300 Tariq Toukan wrote:
>>> This patch series introduces support for exposing the raw free-running
>>> cycle counter of PTP hardware clocks. Some telemetry and low-level
>>> logging use cycle counter timestamps rather than nanoseconds.
>>> Currently, there is no generic interface to correlate these raw values
>>> with system time.
>>>
>>> To address this, the series introduces two new ioctl commands that
>>> allow userspace to query the device's raw cycle counter together with
>>> host time:
>>>
>>> - PTP_SYS_OFFSET_PRECISE_CYCLES
>>>
>>> - PTP_SYS_OFFSET_EXTENDED_CYCLES
>>>
>>> These commands work like their existing counterparts but return the
>>> device timestamp in cycle units instead of real-time nanoseconds.
>>>
>>> This can also be useful in the XDP fast path: if a driver inserts the
>>> raw cycle value into metadata instead of a real-time timestamp, it can
>>> avoid the overhead of converting cycles to time in the kernel. Then
>>> userspace can resolve the cycle-to-time mapping using this ioctl when
>>> needed.
>>>
>>> Adds the new PTP ioctls and integrates support in ptp_ioctl():
>>> - ptp: Add ioctl commands to expose raw cycle counter values
>>>
>>> Support for exposing raw cycles in mlx5:
>>> - net/mlx5: Extract MTCTR register read logic into helper function
>>> - net/mlx5: Support getcyclesx and getcrosscycles
>>
>> It'd be great to get an Ack from Thomas or Richard on this (or failing that
>> at least other vendors?) Seems like we have a number of parallel
>> efforts to extend the PTP uAPI, I'm not sure how they all square
>> against each other, TBH.
>>
>> Full thread for folks I CCed in:
>> https://lore.kernel.org/all/1752556533-39218-1-git-send-email-tariqt@nvidia.com/
>>
>
> I agree with Jakub about the need to properly explain the use cases and
> goals in the commit and cover letter. AFAIK there are no current public
> APIs for reporting cycles to userspace, so this really only makes sense
> with something like DPDK. Even the XDP related helpers expect nanosecond
> units now. It's unclear if we will need other parts of the APIs to also
> handle cycles, or if simple ability to get the current cycles is sufficient.
>
> The API also doesn't directly provide a way to query the expected or
> nominal relationship between cycles and clock time.
>
> If you try to just use PTP_SYS_OFFSET_EXTENDED_CYCLES to compare a
> cycles value to a clock value to adjust a timestamp, that requires that
> some other process is keeping CLOCK_REALTIME and the PHC clock
> synchronized. When handled within the driver, the software typically has
> an assumption about the relationship based on expected frequencies.
> Thus, a conversion from cycles to time uses this relationship.
>
> You don't appear to expose that relationship through the API, which
> means you can only infer it either by knowing the device, or by assuming
> CLOCK_REALTIME is already synchronized with the PHC?
>
> I guess userspace could also simply build its own equivalent of the
> struct timecounter using this API.. hmm.

Hi Jacob,

You’re right, I’m not trying to reason about the nominal frequency. The
goal is to collect (cycle, system time) pairs and use them to correlate
raw device timestamps with host time. This doesn’t require the PHC to be
synchronized to CLOCK_REALTIME, but it does assume the user can estimate
the drift or nominal frequency from the ioctl data. I’ll clarify this in
v2.

Thanks,
Carolina
On Fri, Jul 18 2025 at 16:29, Jakub Kicinski wrote:
> On Tue, 15 Jul 2025 08:15:30 +0300 Tariq Toukan wrote:
>> This patch series introduces support for exposing the raw free-running
>> cycle counter of PTP hardware clocks. Some telemetry and low-level
>> logging use cycle counter timestamps rather than nanoseconds.
>> Currently, there is no generic interface to correlate these raw values
>> with system time.
>>
>> To address this, the series introduces two new ioctl commands that
>> allow userspace to query the device's raw cycle counter together with
>> host time:
>>
>> - PTP_SYS_OFFSET_PRECISE_CYCLES
>>
>> - PTP_SYS_OFFSET_EXTENDED_CYCLES
>>
>> These commands work like their existing counterparts but return the
>> device timestamp in cycle units instead of real-time nanoseconds.
>>
>> This can also be useful in the XDP fast path: if a driver inserts the
>> raw cycle value into metadata instead of a real-time timestamp, it can
>> avoid the overhead of converting cycles to time in the kernel. Then
>> userspace can resolve the cycle-to-time mapping using this ioctl when
>> needed.
>>
>> Adds the new PTP ioctls and integrates support in ptp_ioctl():
>> - ptp: Add ioctl commands to expose raw cycle counter values
>>
>> Support for exposing raw cycles in mlx5:
>> - net/mlx5: Extract MTCTR register read logic into helper function
>> - net/mlx5: Support getcyclesx and getcrosscycles
>
> It'd be great to get an Ack from Thomas or Richard on this (or failing that
> at least other vendors?) Seems like we have a number of parallel
> efforts to extend the PTP uAPI, I'm not sure how they all square
> against each other, TBH.

I don't see a conflict vs. the aux clock support. These are orthogonal
issues and from a conceptual point it makes sense to me to expose the
raw cycles for the purposes Tariq described.

Thanks,

        tglx