[PATCH v3 0/7] Move memory listener register to vhost_vdpa_init

Jonah Palmer posted 7 patches 9 months, 1 week ago
Memory operations like pinning may take a long time at the destination.
Currently they are done after the source of the migration is stopped and
before the workload is resumed at the destination.  This is a period
where neither traffic can flow nor the VM workload can continue
(downtime).

We can do better, as we know the memory layout of the guest RAM at the
destination from the moment all devices are initialized.  Moving that
operation earlier allows QEMU to communicate the maps to the kernel
while the workload is still running on the source, so Linux can start
mapping them.
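
As a rough illustration of the ordering argument above, here is a toy timeline model (not QEMU code; both durations are hypothetical) showing why pinning performed during initialization no longer counts toward downtime:

```python
# Toy model of migration downtime, not QEMU code.
# Durations are illustrative assumptions, in seconds.

PIN_TIME = 0.2      # hypothetical cost of pinning guest RAM
OTHER_WORK = 2.05   # hypothetical remaining work in the stop window

def downtime(pin_during_stop: bool) -> float:
    """Downtime counts only work done while the source and destination
    workloads are both stopped."""
    total = OTHER_WORK
    if pin_during_stop:
        # Listener registered at device start: pinning is serialized
        # inside the stop window and adds to downtime.
        total += PIN_TIME
    return total

# Registering the memory listener at init moves pinning before the stop
# window, so it overlaps with the workload still running on the source.
print(downtime(True), downtime(False))
```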

As a small drawback, there is a period during initialization in which
QEMU cannot respond to QMP, etc.  In testing, this time is about
0.2 seconds.  It may be further reduced (or increased) depending on the
vDPA driver and the platform hardware, and it is dominated by the cost
of memory pinning.

This matches the time that we move out of the so-called downtime window.
The downtime is measured by checking the trace timestamps from the moment
the source suspends the device to the moment the destination starts the
eighth and last virtqueue pair.  For a 39G guest, it goes from ~2.2526
secs to ~2.0949 secs.
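
Using the figures quoted above, the improvement can be checked with a quick calculation (a sketch; only the two measured downtimes come from this cover letter):

```python
# Downtime figures quoted above for a 39G guest, in seconds, measured
# from the trace timestamp at source device suspend to the timestamp
# when the destination starts the eighth (last) virtqueue pair.
downtime_before = 2.2526  # pinning inside the downtime window
downtime_after = 2.0949   # pinning moved to device initialization

saved = downtime_before - downtime_after
print(f"downtime reduced by {saved:.4f} s")
```

The ~0.16 s saved is in the same ballpark as the ~0.2 s pinning time reported above.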

Future directions on top of this series may include moving more things ahead
of migration time, like setting DRIVER_OK or performing actual iterative
migration of virtio-net devices.

Comments are welcome.

This series is a different approach from series [1]. As the title no longer
reflects the changes, please refer to the previous series for its history.

This series is based on [2] and must be applied on top of it.

[Jonah Palmer]
This series was rebased after [3] was pulled in, as [3] was a prerequisite
fix for this series.

v3:
---
* Rebase

v2:
---
* Move the memory listener registration to vhost_vdpa_set_owner function.
* Move the iova_tree allocation to net_vhost_vdpa_init.

v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.

[1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-eperezma@redhat.com/
[2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
[3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.palmer@oracle.com/

Eugenio Pérez (7):
  vdpa: check for iova tree initialized at net_client_start
  vdpa: reorder vhost_vdpa_set_backend_cap
  vdpa: set backend capabilities at vhost_vdpa_init
  vdpa: add listener_registered
  vdpa: reorder listener assignment
  vdpa: move iova_tree allocation to net_vhost_vdpa_init
  vdpa: move memory listener register to vhost_vdpa_init

 hw/virtio/vhost-vdpa.c         | 98 ++++++++++++++++++++++------------
 include/hw/virtio/vhost-vdpa.h | 22 +++++++-
 net/vhost-vdpa.c               | 34 ++----------
 3 files changed, 88 insertions(+), 66 deletions(-)

-- 
2.43.5


Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
Posted by Lei Yang 9 months ago
Hi Jonah

I tested this series with a vhost-vdpa device based on a Mellanox
ConnectX-6 Dx NIC and hit a host kernel crash. The problem is easier
to reproduce in the device hotplug/unplug scenario.
For the core dump messages, please see the attachment.
FW version:
#  flint -d 0000:0d:00.0 q |grep Version
FW Version:            22.44.1036
Product Version:       22.44.1036

Best Regards
Lei

On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
> [...]
>
[  257.598060] openvswitch: Open vSwitch switching datapath
[  257.826760] mlx5_core 0000:0d:00.0: E-Switch: Enable: mode(LEGACY), nvfs(4), necvfs(0), active vports(5)
[  257.928288] pci 0000:0d:00.2: [15b3:101e] type 00 class 0x020000 PCIe Endpoint
[  257.928363] pci 0000:0d:00.2: enabling Extended Tags
[  257.931674] mlx5_core 0000:0d:00.2: enabling device (0000 -> 0002)
[  257.931747] mlx5_core 0000:0d:00.2: PTM is not supported by PCIe
[  257.931766] mlx5_core 0000:0d:00.2: firmware version: 22.44.1036
[  258.127431] mlx5_core 0000:0d:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  258.139089] mlx5_core 0000:0d:00.2: Assigned random MAC address 76:34:f6:43:68:d7
[  258.282267] mlx5_core 0000:0d:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[  258.286453] mlx5_core 0000:0d:00.2 ens2f0v0: renamed from eth0
[  258.317591] pci 0000:0d:00.3: [15b3:101e] type 00 class 0x020000 PCIe Endpoint
[  258.317669] pci 0000:0d:00.3: enabling Extended Tags
[  258.321033] mlx5_core 0000:0d:00.3: enabling device (0000 -> 0002)
[  258.321110] mlx5_core 0000:0d:00.3: PTM is not supported by PCIe
[  258.321128] mlx5_core 0000:0d:00.3: firmware version: 22.44.1036
[  258.442412] mlx5_core 0000:0d:00.2 ens2f0v0: Link up
[  258.521068] mlx5_core 0000:0d:00.3: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  258.532963] mlx5_core 0000:0d:00.3: Assigned random MAC address 0a:b5:ca:d1:b3:e8
[  258.674658] mlx5_core 0000:0d:00.3: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[  258.678230] mlx5_core 0000:0d:00.3 ens2f0v1: renamed from eth0
[  258.708606] pci 0000:0d:00.4: [15b3:101e] type 00 class 0x020000 PCIe Endpoint
[  258.708684] pci 0000:0d:00.4: enabling Extended Tags
[  258.711905] mlx5_core 0000:0d:00.4: enabling device (0000 -> 0002)
[  258.711974] mlx5_core 0000:0d:00.4: PTM is not supported by PCIe
[  258.711991] mlx5_core 0000:0d:00.4: firmware version: 22.44.1036
[  258.832721] mlx5_core 0000:0d:00.3 ens2f0v1: Link up
[  258.909375] mlx5_core 0000:0d:00.4: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  258.921060] mlx5_core 0000:0d:00.4: Assigned random MAC address ca:a1:82:3c:f5:0e
[  259.060439] mlx5_core 0000:0d:00.4: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[  259.063961] mlx5_core 0000:0d:00.4 ens2f0v2: renamed from eth0
[  259.094975] pci 0000:0d:00.5: [15b3:101e] type 00 class 0x020000 PCIe Endpoint
[  259.095052] pci 0000:0d:00.5: enabling Extended Tags
[  259.098268] mlx5_core 0000:0d:00.5: enabling device (0000 -> 0002)
[  259.098327] mlx5_core 0000:0d:00.5: PTM is not supported by PCIe
[  259.098344] mlx5_core 0000:0d:00.5: firmware version: 22.44.1036
[  259.217430] mlx5_core 0000:0d:00.4 ens2f0v2: Link up
[  259.294213] mlx5_core 0000:0d:00.5: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  259.305970] mlx5_core 0000:0d:00.5: Assigned random MAC address 42:22:11:90:2f:3b
[  259.446987] mlx5_core 0000:0d:00.5: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[  259.450475] mlx5_core 0000:0d:00.5 ens2f0v3: renamed from eth0
[  259.590880] mlx5_core 0000:0d:00.5 ens2f0v3: Link up
[  262.238175] mlx5_core 0000:0d:00.0: E-Switch: Disable: mode(LEGACY), nvfs(4), necvfs(0), active vports(5)
[  263.576242] mlx5_core 0000:0d:00.0: E-Switch: Supported tc chains and prios offload
[  264.016364] mlx5_core 0000:0d:00.0 ens2f0np0: Link up
[  264.017221] mlx5_core 0000:0d:00.0 ens2f0np0: Dropping C-tag vlan stripping offload due to S-tag vlan
[  264.017223] mlx5_core 0000:0d:00.0 ens2f0np0: Disabling HW_VLAN CTAG FILTERING, not supported in switchdev mode
[  264.136922] mlx5_core 0000:0d:00.0 ens2f0npf0vf0: renamed from eth0
[  264.151402] debugfs: Directory 'nic' with parent '0000:0d:00.0' already present!
[  264.214572] mlx5_core 0000:0d:00.0 ens2f0npf0vf1: renamed from eth0
[  264.226278] debugfs: Directory 'nic' with parent '0000:0d:00.0' already present!
[  264.292884] mlx5_core 0000:0d:00.0 ens2f0npf0vf2: renamed from eth0
[  264.303506] debugfs: Directory 'nic' with parent '0000:0d:00.0' already present!
[  264.377461] mlx5_core 0000:0d:00.0: E-Switch: Enable: mode(OFFLOADS), nvfs(4), necvfs(0), active vports(4)
[  264.380931] mlx5_core 0000:0d:00.0 ens2f0npf0vf3: renamed from eth0
[  269.425503] mlx5_core 0000:0d:00.2: enabling device (0000 -> 0002)
[  269.425569] mlx5_core 0000:0d:00.2: PTM is not supported by PCIe
[  269.425588] mlx5_core 0000:0d:00.2: firmware version: 22.44.1036
[  269.620194] mlx5_core 0000:0d:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  269.631907] mlx5_core 0000:0d:00.2: Assigned random MAC address b2:0a:5e:e7:f9:eb
[  269.772359] mlx5_core 0000:0d:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[  269.776165] mlx5_core 0000:0d:00.2 ens2f0v0: renamed from eth0
[  269.915517] mlx5_core 0000:0d:00.2 ens2f0v0: Link up
[  272.845232] mlx5_core 0000:0d:00.3: enabling device (0000 -> 0002)
[  272.845299] mlx5_core 0000:0d:00.3: PTM is not supported by PCIe
[  272.845319] mlx5_core 0000:0d:00.3: firmware version: 22.44.1036
[  273.039306] mlx5_core 0000:0d:00.3: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  273.051151] mlx5_core 0000:0d:00.3: Assigned random MAC address 6a:6f:da:6b:50:70
[  273.193736] mlx5_core 0000:0d:00.3: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[  273.197343] mlx5_core 0000:0d:00.3 ens2f0v1: renamed from eth0
[  273.338087] mlx5_core 0000:0d:00.3 ens2f0v1: Link up
[  276.254430] mlx5_core 0000:0d:00.4: enabling device (0000 -> 0002)
[  276.254497] mlx5_core 0000:0d:00.4: PTM is not supported by PCIe
[  276.254515] mlx5_core 0000:0d:00.4: firmware version: 22.44.1036
[  276.448550] mlx5_core 0000:0d:00.4: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  276.460439] mlx5_core 0000:0d:00.4: Assigned random MAC address f6:c5:69:e8:30:10
[  276.601405] mlx5_core 0000:0d:00.4: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[  276.604867] mlx5_core 0000:0d:00.4 ens2f0v2: renamed from eth0
[  276.752492] mlx5_core 0000:0d:00.4 ens2f0v2: Link up
[  279.669193] mlx5_core 0000:0d:00.5: enabling device (0000 -> 0002)
[  279.669265] mlx5_core 0000:0d:00.5: PTM is not supported by PCIe
[  279.669282] mlx5_core 0000:0d:00.5: firmware version: 22.44.1036
[  279.863461] mlx5_core 0000:0d:00.5: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[  279.875360] mlx5_core 0000:0d:00.5: Assigned random MAC address be:56:90:1d:b5:e0
[  280.017839] mlx5_core 0000:0d:00.5: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[  280.021391] mlx5_core 0000:0d:00.5 ens2f0v3: renamed from eth0
[  280.162052] mlx5_core 0000:0d:00.5 ens2f0v3: Link up
[  283.258060] ovs-system: entered promiscuous mode
[  283.298028] GACT probability on
[  283.302732] Timeout policy base is empty
[  283.357058] ens2f0np0_br: entered promiscuous mode
[  283.369803] mlx5_core 0000:0d:00.0 ens2f0np0: entered promiscuous mode
[  283.406604] mlx5_core 0000:0d:00.0 ens2f0npf0vf0: entered promiscuous mode
[  283.441464] mlx5_core 0000:0d:00.0 ens2f0npf0vf1: entered promiscuous mode
[  283.478269] mlx5_core 0000:0d:00.0 ens2f0npf0vf2: entered promiscuous mode
[  283.517622] mlx5_core 0000:0d:00.0 ens2f0npf0vf3: entered promiscuous mode
[  283.533791] Mirror/redirect action on
[  345.179996] FS-Cache: Loaded
[  345.267362] Key type dns_resolver registered
[  345.435186] NFS: Registering the id_resolver key type
[  345.435195] Key type id_resolver registered
[  345.435196] Key type id_legacy registered
[  345.576167] systemd-rc-local-generator[3537]: /etc/rc.d/rc.local is not marked executable, skipping.
[  376.131613] Bluetooth: Core ver 2.22
[  376.131636] NET: Registered PF_BLUETOOTH protocol family
[  376.131638] Bluetooth: HCI device and connection manager initialized
[  376.131642] Bluetooth: HCI socket layer initialized
[  376.131643] Bluetooth: L2CAP socket layer initialized
[  376.131647] Bluetooth: SCO socket layer initialized
[  423.869555] No such timeout policy "ovs_test_tp"
[  429.662364] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[  429.663567] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[  430.945688] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[  437.634039] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4416): performing device reset
[  445.937186] mlx5_core 0000:0d:00.0: poll_health:801:(pid 0): device's health compromised - reached miss count
[  445.937212] mlx5_core 0000:0d:00.0: print_health_info:431:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[  445.937221] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[0] 0x0521945b
[  445.937228] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[1] 0x00000000
[  445.937234] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[2] 0x00000000
[  445.937240] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[3] 0x00000000
[  445.937247] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[4] 0x00000000
[  445.937253] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[5] 0x00000000
[  445.937259] mlx5_core 0000:0d:00.0: print_health_info:438:(pid 0): assert_exit_ptr 0x21492f38
[  445.937265] mlx5_core 0000:0d:00.0: print_health_info:439:(pid 0): assert_callra 0x2102d5f0
[  445.937280] mlx5_core 0000:0d:00.0: print_health_info:440:(pid 0): fw_ver 22.44.1036
[  445.937286] mlx5_core 0000:0d:00.0: print_health_info:442:(pid 0): time 1742220438
[  445.937294] mlx5_core 0000:0d:00.0: print_health_info:443:(pid 0): hw_id 0x00000212
[  445.937296] mlx5_core 0000:0d:00.0: print_health_info:444:(pid 0): rfr 0
[  445.937297] mlx5_core 0000:0d:00.0: print_health_info:445:(pid 0): severity 3 (ERROR)
[  445.937303] mlx5_core 0000:0d:00.0: print_health_info:446:(pid 0): irisc_index 3
[  445.937314] mlx5_core 0000:0d:00.0: print_health_info:447:(pid 0): synd 0x1: firmware internal error
[  445.937320] mlx5_core 0000:0d:00.0: print_health_info:449:(pid 0): ext_synd 0x8f7a
[  445.937327] mlx5_core 0000:0d:00.0: print_health_info:450:(pid 0): raw fw_ver 0x162c040c
[  446.257192] mlx5_core 0000:0d:00.2: poll_health:801:(pid 0): device's health compromised - reached miss count
[  446.513190] mlx5_core 0000:0d:00.3: poll_health:801:(pid 0): device's health compromised - reached miss count
[  446.577190] mlx5_core 0000:0d:00.4: poll_health:801:(pid 0): device's health compromised - reached miss count
[  447.473192] mlx5_core 0000:0d:00.1: poll_health:801:(pid 0): device's health compromised - reached miss count
[  447.473215] mlx5_core 0000:0d:00.1: print_health_info:431:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[  447.473221] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[0] 0x0521945b
[  447.473228] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[1] 0x00000000
[  447.473234] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[2] 0x00000000
[  447.473240] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[3] 0x00000000
[  447.473246] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[4] 0x00000000
[  447.473252] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[5] 0x00000000
[  447.473259] mlx5_core 0000:0d:00.1: print_health_info:438:(pid 0): assert_exit_ptr 0x21492f38
[  447.473265] mlx5_core 0000:0d:00.1: print_health_info:439:(pid 0): assert_callra 0x2102d5f0
[  447.473279] mlx5_core 0000:0d:00.1: print_health_info:440:(pid 0): fw_ver 22.44.1036
[  447.473286] mlx5_core 0000:0d:00.1: print_health_info:442:(pid 0): time 1742220438
[  447.473292] mlx5_core 0000:0d:00.1: print_health_info:443:(pid 0): hw_id 0x00000212
[  447.473293] mlx5_core 0000:0d:00.1: print_health_info:444:(pid 0): rfr 0
[  447.473295] mlx5_core 0000:0d:00.1: print_health_info:445:(pid 0): severity 3 (ERROR)
[  447.473300] mlx5_core 0000:0d:00.1: print_health_info:446:(pid 0): irisc_index 3
[  447.473311] mlx5_core 0000:0d:00.1: print_health_info:447:(pid 0): synd 0x1: firmware internal error
[  447.473317] mlx5_core 0000:0d:00.1: print_health_info:449:(pid 0): ext_synd 0x8f7a
[  447.473323] mlx5_core 0000:0d:00.1: print_health_info:450:(pid 0): raw fw_ver 0x162c040c
[  447.729198] mlx5_core 0000:0d:00.5: poll_health:801:(pid 0): device's health compromised - reached miss count
[  456.169156] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 4416): suspending device
[  456.171590] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4416): performing device reset
[  456.305726] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4416): performing device reset
[  457.305137] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4428): performing device reset
[  495.742404] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 4410): suspending device
[  495.843610] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[  495.991695] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[  496.020164] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[  496.035602] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[  496.049646] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4417): performing device reset
[  832.265070] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 4410): suspending device
[  832.372650] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[  853.399862] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 4417): suspending device
[  853.529509] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4417): performing device reset
[  853.673868] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4417): performing device reset
[  854.395224] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[  884.925816] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 4410): suspending device
[  885.042204] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[  885.244260] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[  896.300680] No such timeout policy "ovs_test_tp"
[  900.357954] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[  900.359110] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[  902.100249] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[  910.628854] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[  937.895823] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6064): suspending device
[  937.899325] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[  938.085016] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[  939.055817] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6077): performing device reset
[  976.568586] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[  976.662853] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[  976.810406] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[  976.834590] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[  976.850151] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[  976.864017] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1311.108310] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1311.220778] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1332.256657] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6065): suspending device
[ 1332.386466] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1332.529703] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1333.241475] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1356.795129] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1356.917443] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1357.064267] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1357.087461] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1357.102482] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1357.118749] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1413.013745] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1413.129036] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1434.138735] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6065): suspending device
[ 1434.269223] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1434.412250] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1435.137768] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1458.692556] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1458.804336] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1458.950704] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1458.973833] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1458.988729] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1459.004483] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1514.899786] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1515.005398] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1536.017815] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6067): suspending device
[ 1536.147266] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1536.290797] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1537.013159] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1560.562567] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1560.672042] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1560.822567] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1560.845755] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1560.860899] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1560.874076] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1616.775191] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1616.882382] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1637.909014] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6067): suspending device
[ 1638.038584] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1638.182394] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1638.899863] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1662.455170] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1662.571479] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1662.718829] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1662.742060] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1662.757116] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1662.770550] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1718.669137] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1718.778560] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1739.797035] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6067): suspending device
[ 1739.928191] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1740.071945] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1740.793808] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1744.056362] ------------[ cut here ]------------
[ 1744.056364] WARNING: CPU: 46 PID: 0 at kernel/time/timer.c:1685 __run_timers.part.0+0x253/0x280
[ 1744.056375] Modules linked in: act_skbedit bluetooth nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs act_mirred cls_matchall nfnetlink_cttimeout nfnetlink act_gact cls_flower sch_ingress openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 mlx5_vdpa vringh vhost_vdpa vhost vhost_iotlb vdpa bridge stp llc qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp mlx5_ib kvm_intel ib_uverbs cdc_ether kvm macsec usbnet ib_core mii iTCO_wdt pmt_telemetry dell_smbios pmt_class rapl iTCO_vendor_support dcdbas wmi_bmof dell_wmi_descriptor ipmi_ssif acpi_power_meter joydev acpi_ipmi intel_sdsi isst_if_mmio isst_if_mbox_pci isst_if_common intel_vsec intel_cstate i2c_i801 mei_me mei ipmi_si i2c_smbus intel_uncore ipmi_devintf pcspkr i2c_ismt ipmi_msghandler xfs libcrc32c sd_mod mgag200 i2c_algo_bit sg drm_shmem_helper mlx5_core
[ 1744.056420]  drm_kms_helper mlxfw ahci psample libahci crct10dif_pclmul iaa_crypto crc32_pclmul drm bnxt_en libata megaraid_sas crc32c_intel idxd tls ghash_clmulni_intel tg3 idxd_bus pci_hyperv_intf wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse
[ 1744.056432] CPU: 46 PID: 0 Comm: swapper/46 Kdump: loaded Not tainted 5.14.0-570.1.1.el9_6.x86_64 #1
[ 1744.056435] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS 1.3.2 03/28/2023
[ 1744.056436] RIP: 0010:__run_timers.part.0+0x253/0x280
[ 1744.056439] Code: 6e ff ff ff 0f 1f 44 00 00 e9 64 ff ff ff 49 8b 44 24 10 83 6c 24 04 01 8b 4c 24 04 83 f9 ff 0f 85 f8 fe ff ff e9 9a fe ff ff <0f> 0b e9 22 ff ff ff 41 80 7c 24 26 00 0f 84 75 fe ff ff 0f 0b e9
[ 1744.056440] RSP: 0018:ff367707c7118ef0 EFLAGS: 00010046
[ 1744.056442] RAX: 0000000000000000 RBX: 0000000100160000 RCX: 0000000000000200
[ 1744.056443] RDX: ff367707c7118f00 RSI: ff119e88e01e1540 RDI: ff119e88e01e1568
[ 1744.056444] RBP: ff119e818706f658 R08: 0000000000000009 R09: 0000000000000001
[ 1744.056445] R10: ffffffff9e8060c0 R11: ff367707c7110082 R12: ff119e88e01e1540
[ 1744.056445] R13: dead000000000122 R14: 0000000000000000 R15: ff367707c7118f00
[ 1744.056446] FS:  0000000000000000(0000) GS:ff119e88e01c0000(0000) knlGS:0000000000000000
[ 1744.056448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1744.056448] CR2: 00007f37a222ffe8 CR3: 000000089a8f2001 CR4: 0000000000773ef0
[ 1744.056449] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1744.056450] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 1744.056451] PKRU: 55555554
[ 1744.056452] Call Trace:
[ 1744.056453]  <IRQ>
[ 1744.056455]  ? show_trace_log_lvl+0x1c4/0x2df
[ 1744.056460]  ? show_trace_log_lvl+0x1c4/0x2df
[ 1744.056461]  ? run_timer_softirq+0x26/0x50
[ 1744.056463]  ? __run_timers.part.0+0x253/0x280
[ 1744.056465]  ? __warn+0x7e/0xd0
[ 1744.056470]  ? __run_timers.part.0+0x253/0x280
[ 1744.056471]  ? report_bug+0x100/0x140
[ 1744.056475]  ? handle_bug+0x3c/0x70
[ 1744.056478]  ? exc_invalid_op+0x14/0x70
[ 1744.056480]  ? asm_exc_invalid_op+0x16/0x20
[ 1744.056486]  ? __run_timers.part.0+0x253/0x280
[ 1744.056488]  ? tick_nohz_highres_handler+0x6d/0x90
[ 1744.056491]  ? __hrtimer_run_queues+0x121/0x2b0
[ 1744.056494]  ? sched_clock+0xc/0x30
[ 1744.056498]  run_timer_softirq+0x26/0x50
[ 1744.056500]  handle_softirqs+0xce/0x270
[ 1744.056504]  __irq_exit_rcu+0xa3/0xc0
[ 1744.056507]  sysvec_apic_timer_interrupt+0x72/0x90
[ 1744.056510]  </IRQ>
[ 1744.056511]  <TASK>
[ 1744.056511]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 1744.056513] RIP: 0010:cpuidle_enter_state+0xbc/0x420
[ 1744.056515] Code: e6 01 00 00 e8 75 52 46 ff e8 90 ed ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 91 1c 45 ff 45 84 ff 0f 85 3f 01 00 00 fb 45 85 f6 <0f> 88 a0 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52 48 8d 04 82 49
[ 1744.056516] RSP: 0018:ff367707c67afe80 EFLAGS: 00000202
[ 1744.056517] RAX: ff119e88e01f38c0 RBX: 0000000000000002 RCX: 000000000000001f
[ 1744.056518] RDX: 0000000000000000 RSI: 0000000040000000 RDI: 0000000000000000
[ 1744.056519] RBP: ff119e88e01ff1f0 R08: 0000019611dc062f R09: 0000000000000001
[ 1744.056520] R10: 000000000000afc8 R11: ff119e88e01f1b24 R12: ffffffff9ecd0a80
[ 1744.056520] R13: 0000019611dc062f R14: 0000000000000002 R15: 0000000000000000
[ 1744.056523]  cpuidle_enter+0x29/0x40
[ 1744.056527]  cpuidle_idle_call+0xfa/0x160
[ 1744.056532]  do_idle+0x7b/0xe0
[ 1744.056533]  cpu_startup_entry+0x26/0x30
[ 1744.056535]  start_secondary+0x115/0x140
[ 1744.056539]  secondary_startup_64_no_verify+0x187/0x18b
[ 1744.056544]  </TASK>
[ 1744.056545] ---[ end trace 0000000000000000 ]---
[ 1764.350097] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1764.461299] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1764.610195] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1764.633164] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1764.651522] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1764.666709] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6074): performing device reset
[ 1820.564757] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1820.669447] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1841.689173] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6074): suspending device
[ 1841.820848] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6074): performing device reset
[ 1841.964521] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6074): performing device reset
[ 1842.686383] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1866.244482] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1866.356684] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1866.504013] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1866.527119] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1866.542271] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1866.557025] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6069): performing device reset
[ 1922.453849] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1922.566288] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1943.577231] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6069): suspending device
[ 1943.705482] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6069): performing device reset
[ 1943.849078] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6069): performing device reset
[ 1944.575623] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1968.131649] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1968.244956] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1968.393767] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1968.416840] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1968.432204] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1968.445394] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6066): performing device reset
[ 2023.748063] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2023.859118] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2044.773289] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6066): suspending device
[ 2044.902992] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6066): performing device reset
[ 2045.046642] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6066): performing device reset
[ 2045.765646] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2069.321650] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2069.432046] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2069.579077] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2069.602194] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2069.617295] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2069.630408] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6066): performing device reset
[ 2125.529218] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2125.642232] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2146.660349] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6068): suspending device
[ 2146.789723] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2146.934212] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2147.651131] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2171.206931] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2171.317257] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2171.463890] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2171.486887] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2171.502178] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2171.515624] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2227.925133] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2228.039644] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2249.064863] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6068): suspending device
[ 2249.196469] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2249.342621] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2250.062035] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2273.618035] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2273.728952] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2273.875403] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2273.903320] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2273.918752] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2273.935470] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2329.733715] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2329.845466] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2350.757001] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6068): suspending device
[ 2350.888854] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2351.032363] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2351.754748] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2375.311251] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2375.423505] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2375.575649] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2375.598606] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2375.613576] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2375.626555] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2431.431256] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2431.548474] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2452.563063] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6082): suspending device
[ 2452.692186] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6082): performing device reset
[ 2452.835890] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6082): performing device reset
[ 2453.558500] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2477.107406] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2477.222543] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2477.369468] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2477.392708] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2477.408000] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2477.421005] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6082): performing device reset
[ 2532.820704] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2532.929522] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2553.944287] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6083): suspending device
[ 2554.073386] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6083): performing device reset
[ 2554.217339] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6083): performing device reset
[ 2554.939721] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2578.496858] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2578.608906] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2578.755717] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2578.778646] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2578.793624] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2578.806434] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6083): performing device reset
[ 2635.208023] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2635.319509] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2656.349622] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6064): suspending device
[ 2656.481365] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 2656.626034] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 2657.344870] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2680.900938] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2681.012904] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2681.162658] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2681.190490] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2681.209509] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2681.222689] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 2736.520697] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2736.629074] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2757.549668] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6064): suspending device
[ 2757.677681] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 2757.821586] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 2758.545571] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2782.101880] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2782.212478] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2782.362330] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2782.391840] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2782.410782] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2782.424201] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 2837.821875] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2837.934527] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2858.857829] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6065): suspending device
[ 2858.987470] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 2859.131164] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 2859.856901] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2883.416052] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2883.529299] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2883.678733] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2883.702133] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2883.720613] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2883.736648] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 2939.534333] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2939.650840] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2960.666945] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6065): suspending device
[ 2960.795641] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 2960.939717] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 2961.662744] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2985.218288] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2985.330631] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2985.478083] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2985.501059] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2985.516039] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2985.532442] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3051.941145] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 3052.052588] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3061.926205] mlx5_core 0000:0d:00.0: mlx5_fw_tracer_handle_traces:741:(pid 12): FWTracer: Events were lost
[ 3073.011616] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6073): suspending device
[ 3073.145816] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3073.292354] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3074.007125] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3097.566078] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 3097.681695] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3097.841924] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3097.865284] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3097.880624] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3097.895416] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3153.793814] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 3153.904742] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3174.840368] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6073): suspending device
[ 3174.976699] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3175.120717] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3175.831356] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3199.386363] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 3199.505991] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3199.653190] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3199.676267] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3199.691369] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3199.709429] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3255.502913] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 3255.626444] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3256.256680] general protection fault, probably for non-canonical address 0x16919e70800880c0: 0000 [#1] PREEMPT SMP NOPTI
[ 3256.256684] CPU: 38 PID: 0 Comm: swapper/38 Kdump: loaded Tainted: G        W         -------  ---  5.14.0-570.1.1.el9_6.x86_64 #1
[ 3256.256687] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS 1.3.2 03/28/2023
[ 3256.256687] RIP: 0010:__build_skb_around+0x8c/0xf0
[ 3256.256695] Code: 24 d0 00 00 00 66 41 89 94 24 ba 00 00 00 66 41 89 8c 24 b6 00 00 00 65 8b 15 1c 09 bd 62 66 41 89 94 24 a0 00 00 00 48 01 d8 <48> c7 00 00 00 00 00 48 c7 40 08 00 00 00 00 48 c7 40 10 00 00 00
[ 3256.256696] RSP: 0018:ff367707c6f78cf0 EFLAGS: 00010206
[ 3256.256698] RAX: 16919e70800880c0 RBX: 16919e7080088000 RCX: 00000000ffffffff
[ 3256.256699] RDX: 0000000000000026 RSI: 16919e7080088000 RDI: ff119e81b3397000
[ 3256.256700] RBP: 0000000000000300 R08: ff367707c6f78cc0 R09: 0000000000000040
[ 3256.256701] R10: 000000000000005a R11: 16919e7080088040 R12: ff119e81b3397000
[ 3256.256702] R13: 0000000000000200 R14: 0000000000000000 R15: 0000000000000040
[ 3256.256703] FS:  0000000000000000(0000) GS:ff119e88e00c0000(0000) knlGS:0000000000000000
[ 3256.256704] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3256.256705] CR2: 00007f420c0205b8 CR3: 000000089a8f2006 CR4: 0000000000773ef0
[ 3256.256706] PKRU: 55555554
[ 3256.256707] Call Trace:
[ 3256.256708]  <IRQ>
[ 3256.256709]  ? show_trace_log_lvl+0x1c4/0x2df
[ 3256.256714]  ? show_trace_log_lvl+0x1c4/0x2df
[ 3256.256715]  ? __build_skb+0x4a/0x60
[ 3256.256719]  ? __die_body.cold+0x8/0xd
[ 3256.256720]  ? die_addr+0x39/0x60
[ 3256.256725]  ? exc_general_protection+0x1ec/0x420
[ 3256.256729]  ? asm_exc_general_protection+0x22/0x30
[ 3256.256736]  ? __build_skb_around+0x8c/0xf0
[ 3256.256738]  __build_skb+0x4a/0x60
[ 3256.256740]  build_skb+0x11/0xa0
[ 3256.256743]  mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
[ 3256.256872]  mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
[ 3256.256964]  mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
[ 3256.257053]  mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
[ 3256.257139]  mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
[ 3256.257226]  __napi_poll+0x29/0x170
[ 3256.257229]  net_rx_action+0x29c/0x370
[ 3256.257231]  handle_softirqs+0xce/0x270
[ 3256.257236]  __irq_exit_rcu+0xa3/0xc0
[ 3256.257238]  common_interrupt+0x80/0xa0
[ 3256.257241]  </IRQ>
[ 3256.257241]  <TASK>
[ 3256.257242]  asm_common_interrupt+0x22/0x40
[ 3256.257244] RIP: 0010:cpuidle_enter_state+0xbc/0x420
[ 3256.257246] Code: e6 01 00 00 e8 75 52 46 ff e8 90 ed ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 91 1c 45 ff 45 84 ff 0f 85 3f 01 00 00 fb 45 85 f6 <0f> 88 a0 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52 48 8d 04 82 49
[ 3256.257247] RSP: 0018:ff367707c676fe80 EFLAGS: 00000202
[ 3256.257249] RAX: ff119e88e00f38c0 RBX: 0000000000000002 RCX: 000000000000001f
[ 3256.257250] RDX: 0000000000000000 RSI: 0000000040000000 RDI: 0000000000000000
[ 3256.257251] RBP: ff119e88e00ff1f0 R08: 000002f62805a71f R09: 0000000000000000
[ 3256.257251] R10: 00000000000003e1 R11: ff119e88e00f1b24 R12: ffffffff9ecd0a80
[ 3256.257252] R13: 000002f62805a71f R14: 0000000000000002 R15: 0000000000000000
[ 3256.257254]  cpuidle_enter+0x29/0x40
[ 3256.257259]  cpuidle_idle_call+0xfa/0x160
[ 3256.257262]  do_idle+0x7b/0xe0
[ 3256.257264]  cpu_startup_entry+0x26/0x30
[ 3256.257266]  start_secondary+0x115/0x140
[ 3256.257270]  secondary_startup_64_no_verify+0x187/0x18b
[ 3256.257274]  </TASK>
[ 3256.257275] Modules linked in: act_skbedit bluetooth nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs act_mirred cls_matchall nfnetlink_cttimeout nfnetlink act_gact cls_flower sch_ingress openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 mlx5_vdpa vringh vhost_vdpa vhost vhost_iotlb vdpa bridge stp llc qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp mlx5_ib kvm_intel ib_uverbs cdc_ether kvm macsec usbnet ib_core mii iTCO_wdt pmt_telemetry dell_smbios pmt_class rapl iTCO_vendor_support dcdbas wmi_bmof dell_wmi_descriptor ipmi_ssif acpi_power_meter joydev acpi_ipmi intel_sdsi isst_if_mmio isst_if_mbox_pci isst_if_common intel_vsec intel_cstate i2c_i801 mei_me mei ipmi_si i2c_smbus intel_uncore ipmi_devintf pcspkr i2c_ismt ipmi_msghandler xfs libcrc32c sd_mod mgag200 i2c_algo_bit sg drm_shmem_helper mlx5_core
[ 3256.257321]  drm_kms_helper mlxfw ahci psample libahci crct10dif_pclmul iaa_crypto crc32_pclmul drm bnxt_en libata megaraid_sas crc32c_intel idxd tls ghash_clmulni_intel tg3 idxd_bus pci_hyperv_intf wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse
Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
Posted by Jason Wang 9 months ago
On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiyang@redhat.com> wrote:
>
> Hi Jonah
>
> I tested this series with the vhost_vdpa device based on a Mellanox
> ConnectX-6 Dx NIC and hit a host kernel crash. This problem is
> easier to reproduce in the hotplug/unplug device scenario.
> For the core dump messages, please see the attachment.
> FW version:
> #  flint -d 0000:0d:00.0 q |grep Version
> FW Version:            22.44.1036
> Product Version:       22.44.1036

The trace looks more like an mlx5e driver bug rather than vDPA?

[ 3256.256707] Call Trace:
[ 3256.256708]  <IRQ>
[ 3256.256709]  ? show_trace_log_lvl+0x1c4/0x2df
[ 3256.256714]  ? show_trace_log_lvl+0x1c4/0x2df
[ 3256.256715]  ? __build_skb+0x4a/0x60
[ 3256.256719]  ? __die_body.cold+0x8/0xd
[ 3256.256720]  ? die_addr+0x39/0x60
[ 3256.256725]  ? exc_general_protection+0x1ec/0x420
[ 3256.256729]  ? asm_exc_general_protection+0x22/0x30
[ 3256.256736]  ? __build_skb_around+0x8c/0xf0
[ 3256.256738]  __build_skb+0x4a/0x60
[ 3256.256740]  build_skb+0x11/0xa0
[ 3256.256743]  mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
[ 3256.256872]  mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
[ 3256.256964]  mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
[ 3256.257053]  mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
[ 3256.257139]  mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
[ 3256.257226]  __napi_poll+0x29/0x170
[ 3256.257229]  net_rx_action+0x29c/0x370
[ 3256.257231]  handle_softirqs+0xce/0x270
[ 3256.257236]  __irq_exit_rcu+0xa3/0xc0
[ 3256.257238]  common_interrupt+0x80/0xa0

Which kernel tree did you use? Can you please try net.git?

Thanks

>
> Best Regards
> Lei
>
> On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >
> > Current memory operations like pinning may take a lot of time at the
> > destination.  Currently they are done after the source of the migration is
> > stopped, and before the workload is resumed at the destination.  This is a
> > period where neither traffic can flow nor the VM workload can continue
> > (downtime).
> >
> > We can do better: we know the memory layout of the guest RAM at the
> > destination from the moment all devices are initialized.  Moving
> > that operation earlier allows QEMU to communicate the maps to the kernel
> > while the workload is still running on the source, so Linux can start
> > mapping them.
> >
> > As a small drawback, there is a time in the initialization where QEMU
> > cannot respond to QMP etc.  In testing, this time is about
> > 0.2 seconds.  It may be further reduced (or increased) depending on the
> > vdpa driver and the platform hardware, and it is dominated by the cost
> > of memory pinning.
> >
> > This matches the time that we move out of the so-called downtime window.
> > The downtime is measured by checking the trace timestamps from the moment
> > the source suspends the device to the moment the destination starts the
> > eighth and last virtqueue pair.  For a 39G guest, it goes from ~2.2526
> > secs to ~2.0949 secs.
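
[Editorial note: a quick back-of-the-envelope check of the figures quoted
above, not part of the series. The ~0.2 s pinning cost roughly matches the
downtime reduction reported for the 39G guest.]

```python
# Downtime figures quoted in the cover letter (39G guest).
downtime_before = 2.2526  # seconds: source suspend -> last vq pair started
downtime_after = 2.0949

# Time moved out of the downtime window by registering the listener early.
saved = downtime_before - downtime_after
print(f"downtime reduced by ~{saved:.3f} s")  # roughly the ~0.2 s pinning cost
```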
> >
> > Future directions on top of this series may include moving more things
> > ahead of migration time, like setting DRIVER_OK or performing actual
> > iterative migration of virtio-net devices.
> >
> > Comments are welcome.
> >
> > This series is a different approach to series [1]. As the title no longer
> > reflects the changes, please refer to the previous one for the
> > series history.
> >
> > This series is based on [2], it must be applied after it.
> >
> > [Jonah Palmer]
> > This series was rebased after [3] was pulled in, as [3] was a prerequisite
> > fix for this series.
> >
> > v3:
> > ---
> > * Rebase
> >
> > v2:
> > ---
> > * Move the memory listener registration to vhost_vdpa_set_owner function.
> > * Move the iova_tree allocation to net_vhost_vdpa_init.
> >
> > v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
> >
> > [1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-eperezma@redhat.com/
> > [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
> > [3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.palmer@oracle.com/
> >
> > Eugenio Pérez (7):
> >   vdpa: check for iova tree initialized at net_client_start
> >   vdpa: reorder vhost_vdpa_set_backend_cap
> >   vdpa: set backend capabilities at vhost_vdpa_init
> >   vdpa: add listener_registered
> >   vdpa: reorder listener assignment
> >   vdpa: move iova_tree allocation to net_vhost_vdpa_init
> >   vdpa: move memory listener register to vhost_vdpa_init
> >
> >  hw/virtio/vhost-vdpa.c         | 98 ++++++++++++++++++++++------------
> >  include/hw/virtio/vhost-vdpa.h | 22 +++++++-
> >  net/vhost-vdpa.c               | 34 ++----------
> >  3 files changed, 88 insertions(+), 66 deletions(-)
> >
> > --
> > 2.43.5
> >
> >
Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
Posted by Lei Yang 9 months ago
On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiyang@redhat.com> wrote:
> >
> > Hi Jonah
> >
> > I tested this series with the vhost_vdpa device based on a Mellanox
> > ConnectX-6 Dx NIC and hit a host kernel crash. This problem is
> > easier to reproduce in the hotplug/unplug device scenario.
> > For the core dump messages, please see the attachment.
> > FW version:
> > #  flint -d 0000:0d:00.0 q |grep Version
> > FW Version:            22.44.1036
> > Product Version:       22.44.1036
>
> The trace looks more like an mlx5e driver bug rather than vDPA?
>
> [ 3256.256707] Call Trace:
> [ 3256.256708]  <IRQ>
> [ 3256.256709]  ? show_trace_log_lvl+0x1c4/0x2df
> [ 3256.256714]  ? show_trace_log_lvl+0x1c4/0x2df
> [ 3256.256715]  ? __build_skb+0x4a/0x60
> [ 3256.256719]  ? __die_body.cold+0x8/0xd
> [ 3256.256720]  ? die_addr+0x39/0x60
> [ 3256.256725]  ? exc_general_protection+0x1ec/0x420
> [ 3256.256729]  ? asm_exc_general_protection+0x22/0x30
> [ 3256.256736]  ? __build_skb_around+0x8c/0xf0
> [ 3256.256738]  __build_skb+0x4a/0x60
> [ 3256.256740]  build_skb+0x11/0xa0
> [ 3256.256743]  mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
> [ 3256.256872]  mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
> [ 3256.256964]  mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
> [ 3256.257053]  mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
> [ 3256.257139]  mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
> [ 3256.257226]  __napi_poll+0x29/0x170
> [ 3256.257229]  net_rx_action+0x29c/0x370
> [ 3256.257231]  handle_softirqs+0xce/0x270
> [ 3256.257236]  __irq_exit_rcu+0xa3/0xc0
> [ 3256.257238]  common_interrupt+0x80/0xa0
>
Hi Jason

> Which kernel tree did you use? Can you please try net.git?

I used the latest 9.6 downstream kernel and upstream QEMU (with this
series of patches applied) to test this scenario.
First, based on my test results, this bug is related to this series of
patches. The conclusion follows from the results below (all of them
use the above-mentioned NIC driver):
Case 1: downstream kernel + downstream qemu-kvm  -  pass
Case 2: downstream kernel + upstream qemu (without this series of
patches)  -  pass
Case 3: downstream kernel + upstream qemu (with this series of
patches)  -  failed, reproduce ratio 100%

Then I also tried to test with the net.git tree, but the host hits a
kernel panic when rebooting after compiling it. For the call trace
info, please see the following messages:
[    9.902851] No filesystem could mount root, tried:
[    9.902851]
[    9.909248] Kernel panic - not syncing: VFS: Unable to mount root
fs on "/dev/mapper/rhel_dell--per760--12-root" or unknown-block(0,0)
[    9.921335] CPU: 16 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6+ #3
[    9.928398] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS
1.3.2 03/28/2023
[    9.935876] Call Trace:
[    9.938332]  <TASK>
[    9.940436]  panic+0x356/0x380
[    9.943513]  mount_root_generic+0x2e7/0x300
[    9.947717]  prepare_namespace+0x65/0x270
[    9.951731]  kernel_init_freeable+0x2e2/0x310
[    9.956105]  ? __pfx_kernel_init+0x10/0x10
[    9.960221]  kernel_init+0x16/0x1d0
[    9.963715]  ret_from_fork+0x2d/0x50
[    9.967303]  ? __pfx_kernel_init+0x10/0x10
[    9.971404]  ret_from_fork_asm+0x1a/0x30
[    9.975348]  </TASK>
[    9.977555] Kernel Offset: 0xc00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   10.101881] ---[ end Kernel panic - not syncing: VFS: Unable to
mount root fs on "/dev/mapper/rhel_dell--per760--12-root" or
unknown-block(0,0) ]---

# git log -1
commit 4003c9e78778e93188a09d6043a74f7154449d43 (HEAD -> main,
origin/main, origin/HEAD)
Merge: 8f7617f45009 2409fa66e29a
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Mar 13 07:58:48 2025 -1000

    Merge tag 'net-6.14-rc7' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net


Thanks

Lei
>
> Thanks
>
> >
> > Best Regards
> > Lei
> >
> > On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> > >
> > > Current memory operations like pinning may take a lot of time at the
> > > destination.  Currently they are done after the source of the migration is
> > > stopped, and before the workload is resumed at the destination.  This is a
> > > period where neither traffic can flow nor the VM workload can continue
> > > (downtime).
> > >
> > > We can do better, as we know the memory layout of the guest RAM at the
> > > destination from the moment that all devices are initialized.  Moving
> > > that operation earlier allows QEMU to communicate the maps to the kernel
> > > while the workload is still running on the source, so Linux can start
> > > mapping them.
> > >
> > > As a small drawback, there is a period during initialization where QEMU
> > > cannot respond to QMP etc.  By some testing, this time is about
> > > 0.2 seconds.  This may be further reduced (or increased) depending on the
> > > vdpa driver and the platform hardware, and it is dominated by the cost
> > > of memory pinning.
> > >
> > > This matches the time that we move out of the so-called downtime window.
> > > The downtime is measured by checking the trace timestamps from the moment
> > > the source suspends the device to the moment the destination starts the
> > > eighth and last virtqueue pair.  For a 39G guest, it goes from ~2.2526
> > > seconds to ~2.0949 seconds.
> > >
> > > Future directions on top of this series may include moving more things
> > > ahead of migration time, like setting DRIVER_OK or performing actual
> > > iterative migration of virtio-net devices.
> > >
> > > Comments are welcome.
> > >
> > > This series is a different approach from series [1]. As the title no longer
> > > reflects the changes, please refer to the previous one for the
> > > series history.
> > >
> > > This series is based on [2], it must be applied after it.
> > >
> > > [Jonah Palmer]
> > > This series was rebased after [3] was pulled in, as [3] was a prerequisite
> > > fix for this series.
> > >
> > > v3:
> > > ---
> > > * Rebase
> > >
> > > v2:
> > > ---
> > > * Move the memory listener registration to vhost_vdpa_set_owner function.
> > > * Move the iova_tree allocation to net_vhost_vdpa_init.
> > >
> > > v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
> > >
> > > [1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-eperezma@redhat.com/
> > > [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
> > > [3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.palmer@oracle.com/
> > >
> > > Eugenio Pérez (7):
> > >   vdpa: check for iova tree initialized at net_client_start
> > >   vdpa: reorder vhost_vdpa_set_backend_cap
> > >   vdpa: set backend capabilities at vhost_vdpa_init
> > >   vdpa: add listener_registered
> > >   vdpa: reorder listener assignment
> > >   vdpa: move iova_tree allocation to net_vhost_vdpa_init
> > >   vdpa: move memory listener register to vhost_vdpa_init
> > >
> > >  hw/virtio/vhost-vdpa.c         | 98 ++++++++++++++++++++++------------
> > >  include/hw/virtio/vhost-vdpa.h | 22 +++++++-
> > >  net/vhost-vdpa.c               | 34 ++----------
> > >  3 files changed, 88 insertions(+), 66 deletions(-)
> > >
> > > --
> > > 2.43.5
> > >
> > >
>
Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
Posted by Si-Wei Liu 9 months ago
Hi Lei,

On 3/18/2025 7:06 AM, Lei Yang wrote:
> On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiyang@redhat.com> wrote:
>>> Hi Jonah
>>>
>>> I tested this series with the vhost_vdpa device based on mellanox
>>> ConnectX-6 DX nic and hit the host kernel crash. This problem can be
>>> easier to reproduce under the hotplug/unplug device scenario.
>>> For the core dump messages please review the attachment.
>>> FW version:
>>> #  flint -d 0000:0d:00.0 q |grep Version
>>> FW Version:            22.44.1036
>>> Product Version:       22.44.1036
>> The trace looks more like a mlx5e driver bug other than vDPA?
>>
>> [ 3256.256707] Call Trace:
>> [ 3256.256708]  <IRQ>
>> [ 3256.256709]  ? show_trace_log_lvl+0x1c4/0x2df
>> [ 3256.256714]  ? show_trace_log_lvl+0x1c4/0x2df
>> [ 3256.256715]  ? __build_skb+0x4a/0x60
>> [ 3256.256719]  ? __die_body.cold+0x8/0xd
>> [ 3256.256720]  ? die_addr+0x39/0x60
>> [ 3256.256725]  ? exc_general_protection+0x1ec/0x420
>> [ 3256.256729]  ? asm_exc_general_protection+0x22/0x30
>> [ 3256.256736]  ? __build_skb_around+0x8c/0xf0
>> [ 3256.256738]  __build_skb+0x4a/0x60
>> [ 3256.256740]  build_skb+0x11/0xa0
>> [ 3256.256743]  mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
>> [ 3256.256872]  mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
>> [ 3256.256964]  mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
>> [ 3256.257053]  mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
>> [ 3256.257139]  mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
>> [ 3256.257226]  __napi_poll+0x29/0x170
>> [ 3256.257229]  net_rx_action+0x29c/0x370
>> [ 3256.257231]  handle_softirqs+0xce/0x270
>> [ 3256.257236]  __irq_exit_rcu+0xa3/0xc0
>> [ 3256.257238]  common_interrupt+0x80/0xa0
>>
> Hi Jason
>
>> Which kernel tree did you use? Can you please try net.git?
> I used the latest 9.6 downstream kernel and upstream qemu (applied
> this series of patches) to test this scenario.
> First based on my test result this bug is related to this series of
> patches, the conclusions are based on the following test results(All
> test results are based on the above mentioned nic driver):
> Case 1: downstream kernel + downstream qemu-kvm  -  pass
> Case 2: downstream kernel + upstream qemu (doesn't included this
> series of patches)  -  pass
> Case 3: downstream kernel + upstream qemu (included this series of
> patches)  - failed, reproduce ratio 100%
Just as Dragos replied earlier, the firmware was already in a bogus 
state before the panic; I also suspect it has something to do with 
various bugs in the downstream kernel. You have to apply the 3 patches 
to the downstream kernel before you kick off the relevant tests again. 
Please pay special attention to which specific command or step triggers 
the unhealthy report from the firmware, and let us know if you still 
run into any of them.

In addition, you seem to be testing the device hotplug and unplug use 
cases, for which the latest QEMU should have the related fixes 
below [1][2]. In case they were somehow missed, that might also leave 
the firmware in a bad state to some extent. Just FYI.

[1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
[2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
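The presence of those two fixes in a given QEMU checkout can be verified with a quick sketch like the following (commit ids taken from [1][2] above; run inside the QEMU source tree):

```shell
# Check whether each fix commit is an ancestor of the current HEAD.
# Also reports "missing" when not run inside a git tree.
for c in db0d4017f9b9 e7891c575fb2; do
  if git merge-base --is-ancestor "$c" HEAD 2>/dev/null; then
    echo "$c present"
  else
    echo "$c missing (or not a QEMU git tree)"
  fi
done
```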

Thanks,
-Siwei
> [remainder of quoted message snipped]


Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
Posted by Lei Yang 9 months ago
Hi Dragos, Si-Wei

1. I applied [0] [1] [2] to the downstream kernel and then tested
hotplug/unplug; this bug still exists.

[0] 35025963326e ("vdpa/mlx5: Fix suboptimal range on iotlb iteration")
[1] 29ce8b8a4fa7 ("vdpa/mlx5: Fix PA offset with unaligned starting iotlb map")
[2] a6097e0a54a5 ("vdpa/mlx5: Fix oversized null mkey longer than 32bit")

2. Si-Wei mentioned that the two patches [1] [2] have already been
merged into the QEMU master branch, so based on the test results they
do not help fix this bug.
[1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
[2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")

3. I found that the trigger for the unhealthy report from the firmware
is simply booting up the guest with the patched QEMU. The host dmesg
prints the unhealthy info immediately after the guest boots.

Thanks
Lei


On Wed, Mar 19, 2025 at 8:14 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Hi Lei,
>
> [remainder of quoted message snipped]
Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
Posted by Dragos Tatulea 9 months ago
Hi Lei,

On 03/20, Lei Yang wrote:
> Hi Dragos, Si-Wei
> 
> 1.  I applied [0] [1] [2] to the downstream kernel then tested
> hotplug/unplug, this bug still exists.
> 
> [0] 35025963326e ("vdpa/mlx5: Fix suboptimal range on iotlb iteration")
> [1] 29ce8b8a4fa7 ("vdpa/mlx5: Fix PA offset with unaligned starting iotlb map")
> [2] a6097e0a54a5 ("vdpa/mlx5: Fix oversized null mkey longer than 32bit")
> 
> 2. Si-Wei mentioned two patches [1] [2] have been merged into qemu
> master branch, so based on the test result it can not help fix this
> bug.
> [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
> 
> 3. I found triggers for the unhealthy report from firmware step is
> just boot up guest when using the current patches qemu. The host dmesg
> will print  unhealthy info immediately after booting up the guest.
> 
Did you set the locked memory limit to unlimited beforehand
(ulimit -l unlimited)? This could also be the cause of the FW issue.
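For reference, a minimal sketch of checking and raising the locked-memory limit in the shell that will launch QEMU (raising the limit may require root or CAP_SYS_RESOURCE; the guest must be started from the same shell so QEMU inherits it):

```shell
# Show the current max locked memory (in KiB, or "unlimited").
# vDPA guest RAM pinning needs this to cover the whole guest memory.
ulimit -l

# Raise it for the current shell; may fail without sufficient privileges.
ulimit -l unlimited 2>/dev/null || echo "could not raise limit (needs privileges)"

# Verify before launching the guest from this shell.
ulimit -l
```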

Thanks,
Dragos

> [remainder of quoted message snipped]

Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
Posted by Lei Yang 9 months ago
On Thu, Mar 20, 2025 at 11:48 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Hi Lei,
>
> On 03/20, Lei Yang wrote:
> > Hi Dragos, Si-Wei
> >
> > 1.  I applied [0] [1] [2] to the downstream kernel then tested
> > hotplug/unplug, this bug still exists.
> >
> > [0] 35025963326e ("vdpa/mlx5: Fix suboptimal range on iotlb iteration")
> > [1] 29ce8b8a4fa7 ("vdpa/mlx5: Fix PA offset with unaligned starting iotlb map")
> > [2] a6097e0a54a5 ("vdpa/mlx5: Fix oversized null mkey longer than 32bit")
> >
> > 2. Si-Wei mentioned two patches [1] [2] have been merged into qemu
> > master branch, so based on the test result it can not help fix this
> > bug.
> > [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> > [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
> >
> > 3. I found that the trigger for the unhealthy report from firmware is
> > simply booting up the guest with the current patched qemu. The host
> > dmesg prints the unhealthy info immediately after the guest boots.
> >

Hi Dragos

> Did you set the locked memory limit to unlimited before (ulimit -l unlimited)?
> This could also be the cause of the FW issue.

Yes, I did. I executed it (ulimit -l unlimited) before booting up the guest.
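For reference, the locked-memory limit can also be checked programmatically instead of via the shell. A Linux-only sketch using Python's stdlib resource module:

```python
# Check the RLIMIT_MEMLOCK soft limit that governs how much memory the
# vhost-vdpa backend may pin; if it is too low, pinning fails with ENOMEM.
# Run this from the same session that will launch QEMU, since the limit is
# inherited per-process.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
if soft == resource.RLIM_INFINITY:
    print("memlock: unlimited")
else:
    print(f"memlock soft limit: {soft} bytes "
          f"(raise with 'ulimit -l unlimited')")
```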

Thanks
Lei
>
> Thanks,
> Dragos
>
> > Thanks
> > Lei
> >
> >
> > On Wed, Mar 19, 2025 at 8:14 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> > >
> > > Hi Lei,
> > >
> > > On 3/18/2025 7:06 AM, Lei Yang wrote:
> > > > On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >> On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiyang@redhat.com> wrote:
> > > >>> Hi Jonah
> > > >>>
> > > >>> I tested this series with the vhost_vdpa device based on a Mellanox
> > > >>> ConnectX-6 DX NIC and hit a host kernel crash. This problem is
> > > >>> easier to reproduce under the hotplug/unplug device scenario.
> > > >>> For the core dump messages, please review the attachment.
> > > >>> FW version:
> > > >>> #  flint -d 0000:0d:00.0 q |grep Version
> > > >>> FW Version:            22.44.1036
> > > >>> Product Version:       22.44.1036
> > > >> The trace looks more like an mlx5e driver bug than a vDPA one?
> > > >>
> > > >> [ 3256.256707] Call Trace:
> > > >> [ 3256.256708]  <IRQ>
> > > >> [ 3256.256709]  ? show_trace_log_lvl+0x1c4/0x2df
> > > >> [ 3256.256714]  ? show_trace_log_lvl+0x1c4/0x2df
> > > >> [ 3256.256715]  ? __build_skb+0x4a/0x60
> > > >> [ 3256.256719]  ? __die_body.cold+0x8/0xd
> > > >> [ 3256.256720]  ? die_addr+0x39/0x60
> > > >> [ 3256.256725]  ? exc_general_protection+0x1ec/0x420
> > > >> [ 3256.256729]  ? asm_exc_general_protection+0x22/0x30
> > > >> [ 3256.256736]  ? __build_skb_around+0x8c/0xf0
> > > >> [ 3256.256738]  __build_skb+0x4a/0x60
> > > >> [ 3256.256740]  build_skb+0x11/0xa0
> > > >> [ 3256.256743]  mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
> > > >> [ 3256.256872]  mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
> > > >> [ 3256.256964]  mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
> > > >> [ 3256.257053]  mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
> > > >> [ 3256.257139]  mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
> > > >> [ 3256.257226]  __napi_poll+0x29/0x170
> > > >> [ 3256.257229]  net_rx_action+0x29c/0x370
> > > >> [ 3256.257231]  handle_softirqs+0xce/0x270
> > > >> [ 3256.257236]  __irq_exit_rcu+0xa3/0xc0
> > > >> [ 3256.257238]  common_interrupt+0x80/0xa0
> > > >>
> > > > Hi Jason
> > > >
> > > >> Which kernel tree did you use? Can you please try net.git?
> > > > I used the latest 9.6 downstream kernel and upstream qemu (with this
> > > > series of patches applied) to test this scenario.
> > > > First, based on my test results, this bug is related to this series of
> > > > patches. The conclusions are based on the following test results (all
> > > > of which use the above-mentioned NIC driver):
> > > > Case 1: downstream kernel + downstream qemu-kvm  -  pass
> > > > Case 2: downstream kernel + upstream qemu (doesn't included this
> > > > series of patches)  -  pass
> > > > Case 3: downstream kernel + upstream qemu (included this series of
> > > > patches)  - failed, reproduce ratio 100%
> > > Just as Dragos replied earlier, the firmware was already in a bogus
> > > state before the panic, which I suspect has something to do with
> > > various bugs in the downstream kernel. You have to apply the 3 patches
> > > to the downstream kernel before you kick off the relevant tests
> > > again. Please pay special attention to which specific command or step
> > > triggers the unhealthy report from firmware, and let us know if you
> > > still run into any of them.
> > >
> > > In addition, as you seem to be testing the device hot plug and unplug
> > > use cases, the latest qemu should have the related fixes
> > > below [1][2]; but in case they were missed somehow, that might also end
> > > up with a bad firmware state to some extent. Just FYI.
> > >
> > > [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> > > [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
> > >
> > > Thanks,
> > > -Siwei
> > > >
> > > > Then I also tried to test it with the net.git tree, but it hits a
> > > > host kernel panic when rebooting the host after compiling. For the
> > > > call trace info, please review the following messages:
> > > > [    9.902851] No filesystem could mount root, tried:
> > > > [    9.902851]
> > > > [    9.909248] Kernel panic - not syncing: VFS: Unable to mount root
> > > > fs on "/dev/mapper/rhel_dell--per760--12-root" or unknown-block(0,0)
> > > > [    9.921335] CPU: 16 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6+ #3
> > > > [    9.928398] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS
> > > > 1.3.2 03/28/2023
> > > > [    9.935876] Call Trace:
> > > > [    9.938332]  <TASK>
> > > > [    9.940436]  panic+0x356/0x380
> > > > [    9.943513]  mount_root_generic+0x2e7/0x300
> > > > [    9.947717]  prepare_namespace+0x65/0x270
> > > > [    9.951731]  kernel_init_freeable+0x2e2/0x310
> > > > [    9.956105]  ? __pfx_kernel_init+0x10/0x10
> > > > [    9.960221]  kernel_init+0x16/0x1d0
> > > > [    9.963715]  ret_from_fork+0x2d/0x50
> > > > [    9.967303]  ? __pfx_kernel_init+0x10/0x10
> > > > [    9.971404]  ret_from_fork_asm+0x1a/0x30
> > > > [    9.975348]  </TASK>
> > > > [    9.977555] Kernel Offset: 0xc00000 from 0xffffffff81000000
> > > > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > [   10.101881] ---[ end Kernel panic - not syncing: VFS: Unable to
> > > > mount root fs on "/dev/mapper/rhel_dell--per760--12-root" or
> > > > unknown-block(0,0) ]---
> > > >
> > > > # git log -1
> > > > commit 4003c9e78778e93188a09d6043a74f7154449d43 (HEAD -> main,
> > > > origin/main, origin/HEAD)
> > > > Merge: 8f7617f45009 2409fa66e29a
> > > > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > > > Date:   Thu Mar 13 07:58:48 2025 -1000
> > > >
> > > >      Merge tag 'net-6.14-rc7' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> > > >
> > > >
> > > > Thanks
> > > >
> > > > Lei
> > > >> Thanks
> > > >>
> > > >>> Best Regards
> > > >>> Lei
> > > >>>
> > > >>> On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> > > >>>> Current memory operations like pinning may take a lot of time at the
> > > >>>> destination.  Currently they are done after the source of the migration is
> > > >>>> stopped, and before the workload is resumed at the destination.  This is a
> > > >>>> period where neither traffic can flow nor the VM workload can continue
> > > >>>> (downtime).
> > > >>>>
> > > >>>> We can do better, as we know the memory layout of the guest RAM at the
> > > >>>> destination from the moment that all devices are initialized.  So
> > > >>>> moving that operation allows QEMU to communicate the maps to the kernel
> > > >>>> while the workload is still running in the source, so Linux can start
> > > >>>> mapping them.
> > > >>>>
> > > >>>> As a small drawback, there is a time in the initialization where QEMU
> > > >>>> cannot respond to QMP etc.  By some testing, this time is about
> > > >>>> 0.2 seconds.  This may be further reduced (or increased) depending on the
> > > >>>> vdpa driver and the platform hardware, and it is dominated by the cost
> > > >>>> of memory pinning.
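As a toy illustration of why pinning dominates (this is not QEMU code): the kernel must fault in and lock every page of guest RAM, so the cost scales with memory size. Touching one byte per page of an anonymous mapping exhibits the same per-page population cost:

```python
# Toy illustration of per-page population cost (not QEMU code): touch one
# byte per 4 KiB page of an anonymous mapping and time it. Real pinning
# (mlock / get_user_pages of guest RAM) scales the same way with RAM size.
import mmap
import time

SIZE = 64 * 1024 * 1024   # 64 MiB stand-in for guest RAM
PAGE = 4096

buf = mmap.mmap(-1, SIZE)  # anonymous, initially unpopulated mapping
start = time.monotonic()
for off in range(0, SIZE, PAGE):
    buf[off] = 1           # write fault populates this page
elapsed = time.monotonic() - start
print(f"populated {SIZE >> 20} MiB in {elapsed * 1000:.1f} ms")
buf.close()
```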
> > > >>>>
> > > >>>> This matches the time that we move out of the so-called downtime window.
> > > >>>> The downtime is measured by checking the trace timestamps from the moment
> > > >>>> the source suspends the device to the moment the destination starts the
> > > >>>> eighth and last virtqueue pair.  For a 39G guest, it goes from ~2.2526
> > > >>>> secs to ~2.0949 secs.
> > > >>>>
> > > >>>> Future directions on top of this series may include moving more things ahead
> > > >>>> of the migration time, like setting DRIVER_OK or performing actual iterative
> > > >>>> migration of virtio-net devices.
> > > >>>>
> > > >>>> Comments are welcome.
> > > >>>>
> > > >>>> This series is a different approach from series [1]. As the title no longer
> > > >>>> reflects the changes, please refer to the previous one to know the
> > > >>>> series history.
> > > >>>>
> > > >>>> This series is based on [2] and must be applied after it.
> > > >>>>
> > > >>>> [Jonah Palmer]
> > > >>>> This series was rebased after [3] was pulled in, as [3] was a prerequisite
> > > >>>> fix for this series.
> > > >>>>
> > > >>>> v3:
> > > >>>> ---
> > > >>>> * Rebase
> > > >>>>
> > > >>>> v2:
> > > >>>> ---
> > > >>>> * Move the memory listener registration to vhost_vdpa_set_owner function.
> > > >>>> * Move the iova_tree allocation to net_vhost_vdpa_init.
> > > >>>>
> > > >>>> v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
> > > >>>>
> > > >>>> [1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-eperezma@redhat.com/
> > > >>>> [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
> > > >>>> [3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.palmer@oracle.com/
> > > >>>>
> > > >>>> Eugenio Pérez (7):
> > > >>>>    vdpa: check for iova tree initialized at net_client_start
> > > >>>>    vdpa: reorder vhost_vdpa_set_backend_cap
> > > >>>>    vdpa: set backend capabilities at vhost_vdpa_init
> > > >>>>    vdpa: add listener_registered
> > > >>>>    vdpa: reorder listener assignment
> > > >>>>    vdpa: move iova_tree allocation to net_vhost_vdpa_init
> > > >>>>    vdpa: move memory listener register to vhost_vdpa_init
> > > >>>>
> > > >>>>   hw/virtio/vhost-vdpa.c         | 98 ++++++++++++++++++++++------------
> > > >>>>   include/hw/virtio/vhost-vdpa.h | 22 +++++++-
> > > >>>>   net/vhost-vdpa.c               | 34 ++----------
> > > >>>>   3 files changed, 88 insertions(+), 66 deletions(-)
> > > >>>>
> > > >>>> --
> > > >>>> 2.43.5
> > > >>>>
> > > >>>>
> > >
> >
>
Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
Posted by Dragos Tatulea 9 months ago
Hi,

On 03/18, Lei Yang wrote:
> On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiyang@redhat.com> wrote:
> > >
> > > Hi Jonah
> > >
> > > I tested this series with the vhost_vdpa device based on a Mellanox
> > > ConnectX-6 DX NIC and hit a host kernel crash. This problem is
> > > easier to reproduce under the hotplug/unplug device scenario.
> > > For the core dump messages, please review the attachment.
> > > FW version:
> > > #  flint -d 0000:0d:00.0 q |grep Version
> > > FW Version:            22.44.1036
> > > Product Version:       22.44.1036
> >
> > The trace looks more like an mlx5e driver bug than a vDPA one?
> >
> > [ 3256.256707] Call Trace:
> > [ 3256.256708]  <IRQ>
> > [ 3256.256709]  ? show_trace_log_lvl+0x1c4/0x2df
> > [ 3256.256714]  ? show_trace_log_lvl+0x1c4/0x2df
> > [ 3256.256715]  ? __build_skb+0x4a/0x60
> > [ 3256.256719]  ? __die_body.cold+0x8/0xd
> > [ 3256.256720]  ? die_addr+0x39/0x60
> > [ 3256.256725]  ? exc_general_protection+0x1ec/0x420
> > [ 3256.256729]  ? asm_exc_general_protection+0x22/0x30
> > [ 3256.256736]  ? __build_skb_around+0x8c/0xf0
> > [ 3256.256738]  __build_skb+0x4a/0x60
> > [ 3256.256740]  build_skb+0x11/0xa0
> > [ 3256.256743]  mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
> > [ 3256.256872]  mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
> > [ 3256.256964]  mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
> > [ 3256.257053]  mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
> > [ 3256.257139]  mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
> > [ 3256.257226]  __napi_poll+0x29/0x170
> > [ 3256.257229]  net_rx_action+0x29c/0x370
> > [ 3256.257231]  handle_softirqs+0xce/0x270
> > [ 3256.257236]  __irq_exit_rcu+0xa3/0xc0
> > [ 3256.257238]  common_interrupt+0x80/0xa0
> >
The logs indicate that the mlx5_vdpa device is already in a bad FW state
before this crash:

[  445.937186] mlx5_core 0000:0d:00.0: poll_health:801:(pid 0): device's health compromised - reached miss count
[  445.937212] mlx5_core 0000:0d:00.0: print_health_info:431:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[  445.937221] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[0] 0x0521945b
[  445.937228] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[1] 0x00000000
[  445.937234] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[2] 0x00000000
[  445.937240] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[3] 0x00000000
[  445.937247] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[4] 0x00000000
[  445.937253] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[5] 0x00000000
[  445.937259] mlx5_core 0000:0d:00.0: print_health_info:438:(pid 0): assert_exit_ptr 0x21492f38
[  445.937265] mlx5_core 0000:0d:00.0: print_health_info:439:(pid 0): assert_callra 0x2102d5f0
[  445.937280] mlx5_core 0000:0d:00.0: print_health_info:440:(pid 0): fw_ver 22.44.1036
[  445.937286] mlx5_core 0000:0d:00.0: print_health_info:442:(pid 0): time 1742220438
[  445.937294] mlx5_core 0000:0d:00.0: print_health_info:443:(pid 0): hw_id 0x00000212
[  445.937296] mlx5_core 0000:0d:00.0: print_health_info:444:(pid 0): rfr 0
[  445.937297] mlx5_core 0000:0d:00.0: print_health_info:445:(pid 0): severity 3 (ERROR)
[  445.937303] mlx5_core 0000:0d:00.0: print_health_info:446:(pid 0): irisc_index 3
[  445.937314] mlx5_core 0000:0d:00.0: print_health_info:447:(pid 0): synd 0x1: firmware internal error
[  445.937320] mlx5_core 0000:0d:00.0: print_health_info:449:(pid 0): ext_synd 0x8f7a
[  445.937327] mlx5_core 0000:0d:00.0: print_health_info:450:(pid 0): raw fw_ver 0x162c040c
[  446.257192] mlx5_core 0000:0d:00.2: poll_health:801:(pid 0): device's health compromised - reached miss count
[  446.513190] mlx5_core 0000:0d:00.3: poll_health:801:(pid 0): device's health compromised - reached miss count
[  446.577190] mlx5_core 0000:0d:00.4: poll_health:801:(pid 0): device's health compromised - reached miss count
[  447.473192] mlx5_core 0000:0d:00.1: poll_health:801:(pid 0): device's health compromised - reached miss count
[  447.473215] mlx5_core 0000:0d:00.1: print_health_info:431:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[  447.473221] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[0] 0x0521945b
[  447.473228] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[1] 0x00000000
[  447.473234] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[2] 0x00000000
[  447.473240] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[3] 0x00000000
[  447.473246] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[4] 0x00000000
[  447.473252] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[5] 0x00000000
[  447.473259] mlx5_core 0000:0d:00.1: print_health_info:438:(pid 0): assert_exit_ptr 0x21492f38
[  447.473265] mlx5_core 0000:0d:00.1: print_health_info:439:(pid 0): assert_callra 0x2102d5f0
[  447.473279] mlx5_core 0000:0d:00.1: print_health_info:440:(pid 0): fw_ver 22.44.1036
[  447.473286] mlx5_core 0000:0d:00.1: print_health_info:442:(pid 0): time 1742220438
[  447.473292] mlx5_core 0000:0d:00.1: print_health_info:443:(pid 0): hw_id 0x00000212
[  447.473293] mlx5_core 0000:0d:00.1: print_health_info:444:(pid 0): rfr 0
[  447.473295] mlx5_core 0000:0d:00.1: print_health_info:445:(pid 0): severity 3 (ERROR)
[  447.473300] mlx5_core 0000:0d:00.1: print_health_info:446:(pid 0): irisc_index 3
[  447.473311] mlx5_core 0000:0d:00.1: print_health_info:447:(pid 0): synd 0x1: firmware internal error
[  447.473317] mlx5_core 0000:0d:00.1: print_health_info:449:(pid 0): ext_synd 0x8f7a
[  447.473323] mlx5_core 0000:0d:00.1: print_health_info:450:(pid 0): raw fw_ver 0x162c040c
[  447.729198] mlx5_core 0000:0d:00.5: poll_health:801:(pid 0): device's health compromised - reached miss count

This is related to a ring translation error on the FW side.

Si-Wei has some relevant fixes in the latest kernel [0][1], and there is
an upcoming fix [2] that is pending merge. These might help. Either
that, or there is something off with the mapping.

[0] 35025963326e ("vdpa/mlx5: Fix suboptimal range on iotlb iteration")
[1] 29ce8b8a4fa7 ("vdpa/mlx5: Fix PA offset with unaligned starting iotlb map")
[2] a6097e0a54a5 ("vdpa/mlx5: Fix oversized null mkey longer than 32bit")

Thanks,
Dragos