In order to reduce the needlessly high setup and teardown cost of iotlb
mapping during live migration, it's crucial to decouple the vhost-vdpa
iotlb abstraction from the virtio device life cycle, i.e. iotlb mappings
should be left intact across virtio device reset [1]. For this to work,
the on-chip IOMMU parent device could implement a separate .reset_map()
operation callback to restore the 1:1 DMA mapping without having to
resort to the .reset() callback, the latter of which is mainly used to
reset virtio device state. This new .reset_map() callback will be
invoked only before the vhost-vdpa driver is removed and detached from
the vdpa bus, such that other vdpa bus drivers, e.g. virtio-vdpa, can
start with a 1:1 DMA mapping when they are attached. For context, those
on-chip IOMMU parent devices create the 1:1 DMA mapping at vdpa device
creation, and they implicitly destroy the 1:1 mapping when the first
.set_map or .dma_map callback is invoked.

This patchset is rebased on top of the latest vhost tree.

[1] Reducing vdpa migration downtime because of memory pin / maps
https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html

---
v4:
- Rework compatibility using new .compat_reset driver op

v3:
- add .reset_map support to vdpa_sim
- introduce module parameter to provide bug-for-bug compatibility with
  older userspace

v2:
- improved commit message to clarify the intended scope of the
  .reset_map API
- improved commit messages to clarify no breakage on older userspace

v1:
- rewrote commit messages to include more detailed description and
  background
- reworded "on-chip IOMMU" to "vendor specific IOMMU implementation"
- include parent device backend features in the persistent iotlb
  precondition
- reimplement mlx5_vdpa patch on top of descriptor group series

RFC v3:
- fix missing return due to merge error in patch #4

RFC v2:
- rebased on top of the "[PATCH RFC v2 0/3] vdpa: dedicated descriptor
  table group" series:
  https://lore.kernel.org/virtualization/1694248959-13369-1-git-send-email-si-wei.liu@oracle.com/

---

Si-Wei Liu (7):
  vdpa: introduce .reset_map operation callback
  vhost-vdpa: reset vendor specific mapping to initial state in .release
  vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
  vdpa: introduce .compat_reset operation callback
  vhost-vdpa: clean iotlb map during reset for older userspace
  vdpa/mlx5: implement .reset_map driver op
  vdpa_sim: implement .reset_map support

 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
 drivers/vdpa/mlx5/core/mr.c        | 17 ++++++++++
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 27 ++++++++++++++--
 drivers/vdpa/vdpa_sim/vdpa_sim.c   | 52 ++++++++++++++++++++++++------
 drivers/vhost/vdpa.c               | 49 +++++++++++++++++++++++++---
 drivers/virtio/virtio_vdpa.c       |  2 +-
 include/linux/vdpa.h               | 30 +++++++++++++++--
 include/uapi/linux/vhost_types.h   |  2 ++
 8 files changed, 161 insertions(+), 19 deletions(-)

--
2.39.3
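For orientation, the heart of the series is two new optional callbacks on
the vdpa bus ops. The sketch below paraphrases them from the cover letter
and diffstat; the exact signatures, kerneldoc and flag values live in the
include/linux/vdpa.h hunks of the patches themselves, so treat the names
here (notably VDPA_RESET_F_CLEAN_MAP) as assumptions rather than the
authoritative interface.

/* Excerpt-style sketch of the two new vdpa_config_ops callbacks added by
 * this series (paraphrased; see include/linux/vdpa.h in the patches for
 * the authoritative definitions).
 */

/* Assumed flag name: requests that a compatibility reset also wipe the
 * IOTLB/DMA mappings, mimicking the historical .reset behavior.
 */
#define VDPA_RESET_F_CLEAN_MAP 1

struct vdpa_config_ops {
	/* ... existing ops (.set_map, .dma_map, .reset, ...) ... */

	/* reset_map: restore the address space to its default/initial
	 * mapping (e.g. the 1:1 DMA mapping set up at device creation)
	 * without resetting virtio device state. vhost-vdpa calls this
	 * on .release so the next bus driver (e.g. virtio-vdpa) starts
	 * from a working DMA setup.
	 */
	int (*reset_map)(struct vdpa_device *vdev, unsigned int asid);

	/* compat_reset: device reset plus compatibility flags, so
	 * vhost-vdpa can ask a parent driver to emulate the old
	 * "reset also clears the iotlb" behavior for userspace that
	 * has not acknowledged the IOTLB_PERSIST backend feature.
	 */
	int (*compat_reset)(struct vdpa_device *vdev, u32 flags);
};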
Hi Si-Wei:

On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> In order to reduce needlessly high setup and teardown cost
> of iotlb mapping during live migration, it's crucial to
> decouple the vhost-vdpa iotlb abstraction from the virtio
> device life cycle, i.e. iotlb mappings should be left
> intact across virtio device reset [1]. For it to work, the
> on-chip IOMMU parent device could implement a separate
> .reset_map() operation callback to restore 1:1 DMA mapping
> without having to resort to the .reset() callback, the
> latter of which is mainly used to reset virtio device state.
> This new .reset_map() callback will be invoked only before
> the vhost-vdpa driver is to be removed and detached from
> the vdpa bus, such that other vdpa bus drivers, e.g.
> virtio-vdpa, can start with 1:1 DMA mapping when they
> are attached. For the context, those on-chip IOMMU parent
> devices, create the 1:1 DMA mapping at vdpa device creation,
> and they would implicitly destroy the 1:1 mapping when
> the first .set_map or .dma_map callback is invoked.
>
> This patchset is rebased on top of the latest vhost tree.
>
> [1] Reducing vdpa migration downtime because of memory pin / maps
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
>
> ---
> v4:
> - Rework compatibility using new .compat_reset driver op

I still think having a set_backend_feature() or reset_map(clean=true)
might be better, as it tries hard to not introduce new stuff on the bus.
But we can listen to others for sure.

Thanks
On 10/22/2023 8:51 PM, Jason Wang wrote:
> Hi Si-Wei:
>
> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> In order to reduce needlessly high setup and teardown cost
>> of iotlb mapping during live migration, it's crucial to
>> decouple the vhost-vdpa iotlb abstraction from the virtio
>> device life cycle, i.e. iotlb mappings should be left
>> intact across virtio device reset [1]. For it to work, the
>> on-chip IOMMU parent device could implement a separate
>> .reset_map() operation callback to restore 1:1 DMA mapping
>> without having to resort to the .reset() callback, the
>> latter of which is mainly used to reset virtio device state.
>> This new .reset_map() callback will be invoked only before
>> the vhost-vdpa driver is to be removed and detached from
>> the vdpa bus, such that other vdpa bus drivers, e.g.
>> virtio-vdpa, can start with 1:1 DMA mapping when they
>> are attached. For the context, those on-chip IOMMU parent
>> devices, create the 1:1 DMA mapping at vdpa device creation,
>> and they would implicitly destroy the 1:1 mapping when
>> the first .set_map or .dma_map callback is invoked.
>>
>> This patchset is rebased on top of the latest vhost tree.
>>
>> [1] Reducing vdpa migration downtime because of memory pin / maps
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
>>
>> ---
>> v4:
>> - Rework compatibility using new .compat_reset driver op
> I still think having a set_backend_feature()
This will overload backend features with the role of carrying over
compatibility quirks, which is what I tried to avoid. While the
.compat_reset in the v4 code does work together with the backend
features acknowledgement (and possibly other inputs) to make that
determination, it is not directly tied to backend features itself.
The two have different implications in terms of requirement, scope and
maintenance/deprecation, so it's better to handle compat quirks in an
explicit, driver-visible way.

> or reset_map(clean=true) might be better.
An explicit op might be marginally better from the driver writer's
point of view. A compliant driver doesn't have to assert that clean_map
is never true, so its code never has to deal with that case, as
explained in the commit log for patch 5, "vhost-vdpa: clean iotlb map
during reset for older userspace":

"
The separation of .compat_reset from the regular .reset allows
vhost-vdpa to know which driver had broken behavior before, so it
can apply the corresponding compatibility quirk to the individual
driver whenever needed. Compared to overloading the existing .reset
with flags, .compat_reset won't cause any extra burden to the
implementation of every compliant driver.
"

> As it tries hard to not introduce new stuff on the bus.
Honestly I don't see a substantial difference between these other than
the color; there's no single best solution that stands out among the
three. And as I assume you already noticed, all three approaches have
to go through backend features negotiation, so the first vdpa reset
issued before backend feature negotiation will use the compliant
version of .reset that doesn't clean up the map. I don't think this
nuance matters much to existing older userspace apps, as the maps
should already have been cleaned earlier by vhost_vdpa_cleanup(), but
if bug-for-bug behavioral compatibility is what you want, a module
parameter would be the single best answer.

Regards,
-Siwei

> But we can listen to others for sure.
>
> Thanks
>
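To make the trade-off above concrete, here is a rough sketch, not taken
from the patches, of how the vhost-vdpa side can dispatch between the two
reset flavors depending on whether userspace has acknowledged the new
backend feature. The helper and flag names (VHOST_BACKEND_F_IOTLB_PERSIST,
VDPA_RESET_F_CLEAN_MAP) follow the series' naming but are assumptions
here; drivers/vhost/vdpa.c in the series is authoritative.

/* Sketch only: how vhost-vdpa might pick between .compat_reset and the
 * plain .reset on the ioctl reset path, assuming the flag/feature names
 * used in this series.
 */
static int vhost_vdpa_do_reset(struct vhost_vdpa *v)
{
	struct vdpa_device *vdpa = v->vdpa;
	const struct vdpa_config_ops *ops = vdpa->config;
	u32 flags = 0;

	/* Userspace that never acked IOTLB_PERSIST expects reset to also
	 * drop the mappings; ask the parent driver to do that explicitly.
	 */
	if (!vhost_backend_has_feature(v->vdev.vqs[0],
				       VHOST_BACKEND_F_IOTLB_PERSIST))
		flags |= VDPA_RESET_F_CLEAN_MAP;

	if (ops->compat_reset && flags)
		return ops->compat_reset(vdpa, flags);

	/* Compliant path: mappings are left intact across the reset. */
	return ops->reset(vdpa);
}

The point being made above is that only vhost-vdpa ever takes the quirk
branch; a compliant parent driver implements .reset_map/.reset and never
has to reason about the clean-map case at all.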
QE tested this series v4 with regression testing on real nic, there is no new regression bug. Tested-by: Lei Yang <leiyang@redhat.com> On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote: > > > > On 10/22/2023 8:51 PM, Jason Wang wrote: > > Hi Si-Wei: > > > > On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote: > >> In order to reduce needlessly high setup and teardown cost > >> of iotlb mapping during live migration, it's crucial to > >> decouple the vhost-vdpa iotlb abstraction from the virtio > >> device life cycle, i.e. iotlb mappings should be left > >> intact across virtio device reset [1]. For it to work, the > >> on-chip IOMMU parent device could implement a separate > >> .reset_map() operation callback to restore 1:1 DMA mapping > >> without having to resort to the .reset() callback, the > >> latter of which is mainly used to reset virtio device state. > >> This new .reset_map() callback will be invoked only before > >> the vhost-vdpa driver is to be removed and detached from > >> the vdpa bus, such that other vdpa bus drivers, e.g. > >> virtio-vdpa, can start with 1:1 DMA mapping when they > >> are attached. For the context, those on-chip IOMMU parent > >> devices, create the 1:1 DMA mapping at vdpa device creation, > >> and they would implicitly destroy the 1:1 mapping when > >> the first .set_map or .dma_map callback is invoked. > >> > >> This patchset is rebased on top of the latest vhost tree. > >> > >> [1] Reducing vdpa migration downtime because of memory pin / maps > >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html > >> > >> --- > >> v4: > >> - Rework compatibility using new .compat_reset driver op > > I still think having a set_backend_feature() > This will overload backend features with the role of carrying over > compatibility quirks, which I tried to avoid from. While I think the > .compat_reset from the v4 code just works with the backend features > acknowledgement (and maybe others as well) to determine, but not > directly tie it to backend features itself. These two have different > implications in terms of requirement, scope and maintaining/deprecation, > better to cope with compat quirks in explicit and driver visible way. > > > or reset_map(clean=true) might be better. > An explicit op might be marginally better in driver writer's point of > view. Compliant driver doesn't have to bother asserting clean_map never > be true so their code would never bother dealing with this case, as > explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map > during reset for older userspace": > > " > The separation of .compat_reset from the regular .reset allows > vhost-vdpa able to know which driver had broken behavior before, so it > can apply the corresponding compatibility quirk to the individual > driver > whenever needed. Compared to overloading the existing .reset with > flags, .compat_reset won't cause any extra burden to the implementation > of every compliant driver. > " > > > As it tries hard to not introduce new stuff on the bus. > Honestly I don't see substantial difference between these other than the > color. There's no single best solution that stands out among the 3. And > I assume you already noticed it from all the above 3 approaches will > have to go with backend features negotiation, that the 1st vdpa reset > before backend feature negotiation will use the compliant version of > .reset that doesn't clean up the map. 
While I don't think this nuance > matters much to existing older userspace apps, as the maps should > already get cleaned by previous process in vhost_vdpa_cleanup(), but if > bug-for-bug behavioral compatibility is what you want, module parameter > will be the single best answer. > > Regards, > -Siwei > > > But we can listen to others for sure. > > > > Thanks > > >
Thanks a lot for testing! Please be aware that there's a follow-up fix for a potential oops in this v4 series: https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/ Would be nice to have it applied for any tests. Thanks, -Siwei On 10/23/2023 11:51 PM, Lei Yang wrote: > QE tested this series v4 with regression testing on real nic, there is > no new regression bug. > > Tested-by: Lei Yang <leiyang@redhat.com> > > On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote: >> >> >> On 10/22/2023 8:51 PM, Jason Wang wrote: >>> Hi Si-Wei: >>> >>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote: >>>> In order to reduce needlessly high setup and teardown cost >>>> of iotlb mapping during live migration, it's crucial to >>>> decouple the vhost-vdpa iotlb abstraction from the virtio >>>> device life cycle, i.e. iotlb mappings should be left >>>> intact across virtio device reset [1]. For it to work, the >>>> on-chip IOMMU parent device could implement a separate >>>> .reset_map() operation callback to restore 1:1 DMA mapping >>>> without having to resort to the .reset() callback, the >>>> latter of which is mainly used to reset virtio device state. >>>> This new .reset_map() callback will be invoked only before >>>> the vhost-vdpa driver is to be removed and detached from >>>> the vdpa bus, such that other vdpa bus drivers, e.g. >>>> virtio-vdpa, can start with 1:1 DMA mapping when they >>>> are attached. For the context, those on-chip IOMMU parent >>>> devices, create the 1:1 DMA mapping at vdpa device creation, >>>> and they would implicitly destroy the 1:1 mapping when >>>> the first .set_map or .dma_map callback is invoked. >>>> >>>> This patchset is rebased on top of the latest vhost tree. >>>> >>>> [1] Reducing vdpa migration downtime because of memory pin / maps >>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html >>>> >>>> --- >>>> v4: >>>> - Rework compatibility using new .compat_reset driver op >>> I still think having a set_backend_feature() >> This will overload backend features with the role of carrying over >> compatibility quirks, which I tried to avoid from. While I think the >> .compat_reset from the v4 code just works with the backend features >> acknowledgement (and maybe others as well) to determine, but not >> directly tie it to backend features itself. These two have different >> implications in terms of requirement, scope and maintaining/deprecation, >> better to cope with compat quirks in explicit and driver visible way. >> >>> or reset_map(clean=true) might be better. >> An explicit op might be marginally better in driver writer's point of >> view. Compliant driver doesn't have to bother asserting clean_map never >> be true so their code would never bother dealing with this case, as >> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map >> during reset for older userspace": >> >> " >> The separation of .compat_reset from the regular .reset allows >> vhost-vdpa able to know which driver had broken behavior before, so it >> can apply the corresponding compatibility quirk to the individual >> driver >> whenever needed. Compared to overloading the existing .reset with >> flags, .compat_reset won't cause any extra burden to the implementation >> of every compliant driver. >> " >> >>> As it tries hard to not introduce new stuff on the bus. >> Honestly I don't see substantial difference between these other than the >> color. 
There's no single best solution that stands out among the 3. And >> I assume you already noticed it from all the above 3 approaches will >> have to go with backend features negotiation, that the 1st vdpa reset >> before backend feature negotiation will use the compliant version of >> .reset that doesn't clean up the map. While I don't think this nuance >> matters much to existing older userspace apps, as the maps should >> already get cleaned by previous process in vhost_vdpa_cleanup(), but if >> bug-for-bug behavioral compatibility is what you want, module parameter >> will be the single best answer. >> >> Regards, >> -Siwei >> >>> But we can listen to others for sure. >>> >>> Thanks >>>
On Wed, Oct 25, 2023 at 1:27 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
Hello Si-Wei
> Thanks a lot for testing! Please be aware that there's a follow-up fix
> for a potential oops in this v4 series:
>
First, without this patch [1] applied I also hit the problem it
mentions; after applying it, that problem no longer occurs. But I hit
another issue; please see the attached file for the error messages.

[1] https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/

My test steps:

git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux/
b4 am 1697880319-4937-1-git-send-email-si-wei.liu@oracle.com
b4 am 20231018171456.1624030-2-dtatulea@nvidia.com
b4 am 1698102863-21122-1-git-send-email-si-wei.liu@oracle.com
git am ./v4_20231018_dtatulea_vdpa_add_support_for_vq_descriptor_mappings.mbx
git am ./v4_20231021_si_wei_liu_vdpa_decouple_reset_of_iotlb_mapping_from_device_reset.mbx
git am ./20231023_si_wei_liu_vhost_vdpa_fix_null_pointer_deref_in__compat_vdpa_reset.mbx
cp /boot/config-5.14.0-377.el9.x86_64 .config
make -j 32
make modules_install
make install

Thanks
Lei

> https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/
>
> Would be nice to have it applied for any tests.
>
> Thanks,
> -Siwei
>
> On 10/23/2023 11:51 PM, Lei Yang wrote:
> > QE tested this series v4 with regression testing on real nic, there is
> > no new regression bug.
> >
> > Tested-by: Lei Yang <leiyang@redhat.com>
> >
> > On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 10/22/2023 8:51 PM, Jason Wang wrote:
> >>> Hi Si-Wei:
> >>>
> >>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>> In order to reduce needlessly high setup and teardown cost
> >>>> of iotlb mapping during live migration, it's crucial to
> >>>> decouple the vhost-vdpa iotlb abstraction from the virtio
> >>>> device life cycle, i.e. iotlb mappings should be left
> >>>> intact across virtio device reset [1]. For it to work, the
> >>>> on-chip IOMMU parent device could implement a separate
> >>>> .reset_map() operation callback to restore 1:1 DMA mapping
> >>>> without having to resort to the .reset() callback, the
> >>>> latter of which is mainly used to reset virtio device state.
> >>>> This new .reset_map() callback will be invoked only before
> >>>> the vhost-vdpa driver is to be removed and detached from
> >>>> the vdpa bus, such that other vdpa bus drivers, e.g.
> >>>> virtio-vdpa, can start with 1:1 DMA mapping when they
> >>>> are attached. For the context, those on-chip IOMMU parent
> >>>> devices, create the 1:1 DMA mapping at vdpa device creation,
> >>>> and they would implicitly destroy the 1:1 mapping when
> >>>> the first .set_map or .dma_map callback is invoked.
> >>>>
> >>>> This patchset is rebased on top of the latest vhost tree.
> >>>>
> >>>> [1] Reducing vdpa migration downtime because of memory pin / maps
> >>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
> >>>>
> >>>> ---
> >>>> v4:
> >>>> - Rework compatibility using new .compat_reset driver op
> >>> I still think having a set_backend_feature()
> >> This will overload backend features with the role of carrying over
> >> compatibility quirks, which I tried to avoid from.
While I think the > >> .compat_reset from the v4 code just works with the backend features > >> acknowledgement (and maybe others as well) to determine, but not > >> directly tie it to backend features itself. These two have different > >> implications in terms of requirement, scope and maintaining/deprecation, > >> better to cope with compat quirks in explicit and driver visible way. > >> > >>> or reset_map(clean=true) might be better. > >> An explicit op might be marginally better in driver writer's point of > >> view. Compliant driver doesn't have to bother asserting clean_map never > >> be true so their code would never bother dealing with this case, as > >> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map > >> during reset for older userspace": > >> > >> " > >> The separation of .compat_reset from the regular .reset allows > >> vhost-vdpa able to know which driver had broken behavior before, so it > >> can apply the corresponding compatibility quirk to the individual > >> driver > >> whenever needed. Compared to overloading the existing .reset with > >> flags, .compat_reset won't cause any extra burden to the implementation > >> of every compliant driver. > >> " > >> > >>> As it tries hard to not introduce new stuff on the bus. > >> Honestly I don't see substantial difference between these other than the > >> color. There's no single best solution that stands out among the 3. And > >> I assume you already noticed it from all the above 3 approaches will > >> have to go with backend features negotiation, that the 1st vdpa reset > >> before backend feature negotiation will use the compliant version of > >> .reset that doesn't clean up the map. While I don't think this nuance > >> matters much to existing older userspace apps, as the maps should > >> already get cleaned by previous process in vhost_vdpa_cleanup(), but if > >> bug-for-bug behavioral compatibility is what you want, module parameter > >> will be the single best answer. > >> > >> Regards, > >> -Siwei > >> > >>> But we can listen to others for sure. > >>> > >>> Thanks > >>> > [ 6325.462426] BUG: unable to handle page fault for address: 00000001005b4af4 [ 6325.469301] #PF: supervisor read access in kernel mode [ 6325.474440] #PF: error_code(0x0000) - not-present page [ 6325.479577] PGD 16a80a067 P4D 0 [ 6325.482811] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 6325.487169] CPU: 4 PID: 40387 Comm: qemu-kvm Not tainted 6.6.0-rc7+ #3 [ 6325.493695] Hardware name: Dell Inc. 
PowerEdge R750/0PJ80M, BIOS 1.8.2 09/14/2022 [ 6325.501175] RIP: 0010:_compat_vdpa_reset.isra.0+0x27/0xb0 [vhost_vdpa] [ 6325.507708] Code: 90 90 90 0f 1f 44 00 00 41 55 4c 8d ae 08 03 00 00 41 54 55 48 89 f5 53 4c 8b a6 00 03 00 00 48 85 ff 74 49 48 8b 07 4c 89 ef <48> 8b 80 88 45 00 00 48 c1 e8 08 48 83 f0 01 89 c3 e8 73 5e 9b dc [ 6325.526455] RSP: 0018:ff73a85762073ba0 EFLAGS: 00010286 [ 6325.531681] RAX: 00000001005b056c RBX: ff32b13ca6994c68 RCX: 0000000000000002 [ 6325.538813] RDX: 0000000000000001 RSI: ff32b13c07559000 RDI: ff32b13c07559308 [ 6325.545947] RBP: ff32b13c07559000 R08: 0000000000000000 R09: ff32b12ca497c0f0 [ 6325.553079] R10: ff73a85762073c58 R11: 0000000c106f9de3 R12: ff32b12c95b1d050 [ 6325.560212] R13: ff32b13c07559308 R14: ff32b12d0ddc5100 R15: 0000000000008002 [ 6325.567346] FS: 00007fec5b8cbf80(0000) GS:ff32b13bbfc80000(0000) knlGS:0000000000000000 [ 6325.575432] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6325.581177] CR2: 00000001005b4af4 CR3: 000000015644a003 CR4: 0000000000773ee0 [ 6325.588309] PKRU: 55555554 [ 6325.591022] Call Trace: [ 6325.593477] <TASK> [ 6325.595582] ? __die+0x20/0x70 [ 6325.598650] ? page_fault_oops+0x76/0x170 [ 6325.602662] ? exc_page_fault+0x65/0x150 [ 6325.606588] ? asm_exc_page_fault+0x22/0x30 [ 6325.610775] ? _compat_vdpa_reset.isra.0+0x27/0xb0 [vhost_vdpa] [ 6325.616692] vhost_vdpa_open+0x57/0x280 [vhost_vdpa] [ 6325.621660] ? __pfx_chrdev_open+0x10/0x10 [ 6325.625759] chrdev_open+0xc6/0x260 [ 6325.629250] ? __pfx_chrdev_open+0x10/0x10 [ 6325.633349] do_dentry_open+0x16e/0x530 [ 6325.637189] do_open+0x21c/0x400 [ 6325.640421] path_openat+0x111/0x290 [ 6325.644000] do_filp_open+0xb2/0x160 [ 6325.647582] ? __check_object_size.part.0+0x5e/0x140 [ 6325.652548] do_sys_openat2+0x96/0xd0 [ 6325.656212] __x64_sys_openat+0x53/0xa0 [ 6325.660051] do_syscall_64+0x59/0x90 [ 6325.663631] ? syscall_exit_to_user_mode+0x22/0x40 [ 6325.668423] ? do_syscall_64+0x69/0x90 [ 6325.672174] ? syscall_exit_to_user_mode+0x22/0x40 [ 6325.676970] ? do_syscall_64+0x69/0x90 [ 6325.680721] ? do_syscall_64+0x69/0x90 [ 6325.684473] ? syscall_exit_to_user_mode+0x22/0x40 [ 6325.689268] ? do_syscall_64+0x69/0x90 [ 6325.693020] ? 
exc_page_fault+0x65/0x150 [ 6325.696945] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 6325.701998] RIP: 0033:0x7fec5c33e654 [ 6325.705576] Code: 24 20 eb 8f 66 90 44 89 54 24 0c e8 b6 d5 f5 ff 44 8b 54 24 0c 44 89 e2 48 89 ee 41 89 c0 bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 89 44 24 0c e8 08 d6 f5 ff 8b 44 [ 6325.724322] RSP: 002b:00007ffebbe28fa0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 [ 6325.731890] RAX: ffffffffffffffda RBX: 00007fea10018560 RCX: 00007fec5c33e654 [ 6325.739021] RDX: 0000000000080002 RSI: 00007fea10018560 RDI: 00000000ffffff9c [ 6325.746156] RBP: 00007fea10018560 R08: 0000000000000000 R09: 0000000000000000 [ 6325.753287] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000080002 [ 6325.760419] R13: 00007fec5b328e70 R14: 00007fec5b328e80 R15: 0000000000000002 [ 6325.767552] </TASK> [ 6325.769744] Modules linked in: act_skbedit act_mirred mlx5_vdpa vringh vhost_vdpa vhost vhost_iotlb vdpa cls_matchall nfnetlink_cttimeout nfnetlink act_gact cls_flower sch_ingress openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs bridge stp llc qrtr intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_ib kvm dell_wmi ledtrig_audio iTCO_wdt ib_uverbs isst_if_mmio sparse_keymap ib_core iTCO_vendor_support irqbypass isst_if_mbox_pci intel_vsec acpi_ipmi isst_if_common i2c_i801 rfkill rapl ipmi_ssif video dell_smbios joydev dax_hmem cxl_acpi intel_cstate mei_me ipmi_si dell_wmi_descriptor mei wmi_bmof dcdbas intel_pch_thermal ipmi_devintf intel_uncore ipmi_msghandler cxl_core i2c_smbus pcspkr acpi_power_meter xfs libcrc32c sd_mod mgag200 nvme_tcp sg i2c_algo_bit nvme_fabrics drm_shmem_helper nvme_core [ 6325.769800] drm_kms_helper nvme_common ahci crct10dif_pclmul mlx5_core t10_pi libahci crc32_pclmul drm mlxfw crc32c_intel libata psample megaraid_sas tg3 ghash_clmulni_intel pci_hyperv_intf wmi dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse [ 6325.894242] CR2: 00000001005b4af4 [ 6325.897560] ---[ end trace 0000000000000000 ]--- [ 6325.967430] pstore: backend (erst) writing error (-28) [ 6325.972570] RIP: 0010:_compat_vdpa_reset.isra.0+0x27/0xb0 [vhost_vdpa] [ 6325.979094] Code: 90 90 90 0f 1f 44 00 00 41 55 4c 8d ae 08 03 00 00 41 54 55 48 89 f5 53 4c 8b a6 00 03 00 00 48 85 ff 74 49 48 8b 07 4c 89 ef <48> 8b 80 88 45 00 00 48 c1 e8 08 48 83 f0 01 89 c3 e8 73 5e 9b dc [ 6325.997840] RSP: 0018:ff73a85762073ba0 EFLAGS: 00010286 [ 6326.003067] RAX: 00000001005b056c RBX: ff32b13ca6994c68 RCX: 0000000000000002 [ 6326.010200] RDX: 0000000000000001 RSI: ff32b13c07559000 RDI: ff32b13c07559308 [ 6326.017332] RBP: ff32b13c07559000 R08: 0000000000000000 R09: ff32b12ca497c0f0 [ 6326.024464] R10: ff73a85762073c58 R11: 0000000c106f9de3 R12: ff32b12c95b1d050 [ 6326.031599] R13: ff32b13c07559308 R14: ff32b12d0ddc5100 R15: 0000000000008002 [ 6326.038731] FS: 00007fec5b8cbf80(0000) GS:ff32b13bbfc80000(0000) knlGS:0000000000000000 [ 6326.046816] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6326.052563] CR2: 00000001005b4af4 CR3: 000000015644a003 CR4: 0000000000773ee0 [ 6326.059695] PKRU: 55555554 [ 6326.062407] Kernel panic - not syncing: Fatal exception [ 6326.067651] Kernel Offset: 0x1c800000 from 
0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 6326.142894] ---[ end Kernel panic - not syncing: Fatal exception ]---
Hi Yang Lei, Thanks for testing my patches and reporting! As for the issue, could you please try what I posted in: https://lore.kernel.org/virtualization/1698275594-19204-1-git-send-email-si-wei.liu@oracle.com/ and let me know how it goes? Thank you very much! Thanks, -Siwei On 10/25/2023 2:41 AM, Lei Yang wrote: > On Wed, Oct 25, 2023 at 1:27 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote: > Hello Si-Wei >> Thanks a lot for testing! Please be aware that there's a follow-up fix >> for a potential oops in this v4 series: >> > The first, when I did not apply this patch [1], I will also hit this > patch mentioned problem. After I applied this patch, this problem will > no longer to hit again. But I hit another issues, about the error > messages please review the attached file. > [1] https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/ > > My test steps: > git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > cd linux/ > b4 am 1697880319-4937-1-git-send-email-si-wei.liu@oracle.com > b4 am 20231018171456.1624030-2-dtatulea@nvidia.com > b4 am 1698102863-21122-1-git-send-email-si-wei.liu@oracle.com > git am ./v4_20231018_dtatulea_vdpa_add_support_for_vq_descriptor_mappings.mbx > git am ./v4_20231021_si_wei_liu_vdpa_decouple_reset_of_iotlb_mapping_from_device_reset.mbx > git am ./20231023_si_wei_liu_vhost_vdpa_fix_null_pointer_deref_in__compat_vdpa_reset.mbx > cp /boot/config-5.14.0-377.el9.x86_64 .config > make -j 32 > make modules_install > make install > > Thanks > > Lei >> https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/ >> >> Would be nice to have it applied for any tests. >> >> Thanks, >> -Siwei >> >> On 10/23/2023 11:51 PM, Lei Yang wrote: >>> QE tested this series v4 with regression testing on real nic, there is >>> no new regression bug. >>> >>> Tested-by: Lei Yang <leiyang@redhat.com> >>> >>> On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote: >>>> >>>> On 10/22/2023 8:51 PM, Jason Wang wrote: >>>>> Hi Si-Wei: >>>>> >>>>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote: >>>>>> In order to reduce needlessly high setup and teardown cost >>>>>> of iotlb mapping during live migration, it's crucial to >>>>>> decouple the vhost-vdpa iotlb abstraction from the virtio >>>>>> device life cycle, i.e. iotlb mappings should be left >>>>>> intact across virtio device reset [1]. For it to work, the >>>>>> on-chip IOMMU parent device could implement a separate >>>>>> .reset_map() operation callback to restore 1:1 DMA mapping >>>>>> without having to resort to the .reset() callback, the >>>>>> latter of which is mainly used to reset virtio device state. >>>>>> This new .reset_map() callback will be invoked only before >>>>>> the vhost-vdpa driver is to be removed and detached from >>>>>> the vdpa bus, such that other vdpa bus drivers, e.g. >>>>>> virtio-vdpa, can start with 1:1 DMA mapping when they >>>>>> are attached. For the context, those on-chip IOMMU parent >>>>>> devices, create the 1:1 DMA mapping at vdpa device creation, >>>>>> and they would implicitly destroy the 1:1 mapping when >>>>>> the first .set_map or .dma_map callback is invoked. >>>>>> >>>>>> This patchset is rebased on top of the latest vhost tree. 
>>>>>> >>>>>> [1] Reducing vdpa migration downtime because of memory pin / maps >>>>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html >>>>>> >>>>>> --- >>>>>> v4: >>>>>> - Rework compatibility using new .compat_reset driver op >>>>> I still think having a set_backend_feature() >>>> This will overload backend features with the role of carrying over >>>> compatibility quirks, which I tried to avoid from. While I think the >>>> .compat_reset from the v4 code just works with the backend features >>>> acknowledgement (and maybe others as well) to determine, but not >>>> directly tie it to backend features itself. These two have different >>>> implications in terms of requirement, scope and maintaining/deprecation, >>>> better to cope with compat quirks in explicit and driver visible way. >>>> >>>>> or reset_map(clean=true) might be better. >>>> An explicit op might be marginally better in driver writer's point of >>>> view. Compliant driver doesn't have to bother asserting clean_map never >>>> be true so their code would never bother dealing with this case, as >>>> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map >>>> during reset for older userspace": >>>> >>>> " >>>> The separation of .compat_reset from the regular .reset allows >>>> vhost-vdpa able to know which driver had broken behavior before, so it >>>> can apply the corresponding compatibility quirk to the individual >>>> driver >>>> whenever needed. Compared to overloading the existing .reset with >>>> flags, .compat_reset won't cause any extra burden to the implementation >>>> of every compliant driver. >>>> " >>>> >>>>> As it tries hard to not introduce new stuff on the bus. >>>> Honestly I don't see substantial difference between these other than the >>>> color. There's no single best solution that stands out among the 3. And >>>> I assume you already noticed it from all the above 3 approaches will >>>> have to go with backend features negotiation, that the 1st vdpa reset >>>> before backend feature negotiation will use the compliant version of >>>> .reset that doesn't clean up the map. While I don't think this nuance >>>> matters much to existing older userspace apps, as the maps should >>>> already get cleaned by previous process in vhost_vdpa_cleanup(), but if >>>> bug-for-bug behavioral compatibility is what you want, module parameter >>>> will be the single best answer. >>>> >>>> Regards, >>>> -Siwei >>>> >>>>> But we can listen to others for sure. >>>>> >>>>> Thanks >>>>>
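For readers following the oops earlier in the thread: the fault is inside
_compat_vdpa_reset() on the vhost_vdpa_open() path, i.e. before
vhost_dev_init() has populated the virtqueue array that the
backend-feature check dereferences. The follow-up patch linked above
addresses this; as a purely illustrative, hypothetical sketch (not the
actual fix), such a guard might look like:

/* Hypothetical sketch only -- the linked follow-up patch is the actual fix. */
static int _compat_vdpa_reset(struct vhost_vdpa *v)
{
	u32 flags = 0;

	/* On the open path the vhost device has no vqs yet, so there is
	 * nothing to query; the previous .release already restored the
	 * default mapping, so no clean-map quirk is needed here.
	 */
	if (v->vdev.vqs &&
	    !vhost_backend_has_feature(v->vdev.vqs[0],
				       VHOST_BACKEND_F_IOTLB_PERSIST))
		flags |= VDPA_RESET_F_CLEAN_MAP;

	return vdpa_reset(v->vdpa, flags);
}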
On Thu, Oct 26, 2023 at 7:32 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote: > > Hi Yang Lei, > > Thanks for testing my patches and reporting! As for the issue, could you > please try what I posted in: > > https://lore.kernel.org/virtualization/1698275594-19204-1-git-send-email-si-wei.liu@oracle.com/ > HI Si-Wei > and let me know how it goes? Thank you very much! This problem has gone after applying this patch [1]. [1] https://lore.kernel.org/virtualization/1698275594-19204-1-git-send-email-si-wei.liu@oracle.com/ Thanks Lei > > Thanks, > -Siwei > > On 10/25/2023 2:41 AM, Lei Yang wrote: > > On Wed, Oct 25, 2023 at 1:27 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote: > > Hello Si-Wei > >> Thanks a lot for testing! Please be aware that there's a follow-up fix > >> for a potential oops in this v4 series: > >> > > The first, when I did not apply this patch [1], I will also hit this > > patch mentioned problem. After I applied this patch, this problem will > > no longer to hit again. But I hit another issues, about the error > > messages please review the attached file. > > [1] https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/ > > > > My test steps: > > git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > > cd linux/ > > b4 am 1697880319-4937-1-git-send-email-si-wei.liu@oracle.com > > b4 am 20231018171456.1624030-2-dtatulea@nvidia.com > > b4 am 1698102863-21122-1-git-send-email-si-wei.liu@oracle.com > > git am ./v4_20231018_dtatulea_vdpa_add_support_for_vq_descriptor_mappings.mbx > > git am ./v4_20231021_si_wei_liu_vdpa_decouple_reset_of_iotlb_mapping_from_device_reset.mbx > > git am ./20231023_si_wei_liu_vhost_vdpa_fix_null_pointer_deref_in__compat_vdpa_reset.mbx > > cp /boot/config-5.14.0-377.el9.x86_64 .config > > make -j 32 > > make modules_install > > make install > > > > Thanks > > > > Lei > >> https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/ > >> > >> Would be nice to have it applied for any tests. > >> > >> Thanks, > >> -Siwei > >> > >> On 10/23/2023 11:51 PM, Lei Yang wrote: > >>> QE tested this series v4 with regression testing on real nic, there is > >>> no new regression bug. > >>> > >>> Tested-by: Lei Yang <leiyang@redhat.com> > >>> > >>> On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote: > >>>> > >>>> On 10/22/2023 8:51 PM, Jason Wang wrote: > >>>>> Hi Si-Wei: > >>>>> > >>>>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote: > >>>>>> In order to reduce needlessly high setup and teardown cost > >>>>>> of iotlb mapping during live migration, it's crucial to > >>>>>> decouple the vhost-vdpa iotlb abstraction from the virtio > >>>>>> device life cycle, i.e. iotlb mappings should be left > >>>>>> intact across virtio device reset [1]. For it to work, the > >>>>>> on-chip IOMMU parent device could implement a separate > >>>>>> .reset_map() operation callback to restore 1:1 DMA mapping > >>>>>> without having to resort to the .reset() callback, the > >>>>>> latter of which is mainly used to reset virtio device state. > >>>>>> This new .reset_map() callback will be invoked only before > >>>>>> the vhost-vdpa driver is to be removed and detached from > >>>>>> the vdpa bus, such that other vdpa bus drivers, e.g. > >>>>>> virtio-vdpa, can start with 1:1 DMA mapping when they > >>>>>> are attached. 
For the context, those on-chip IOMMU parent > >>>>>> devices, create the 1:1 DMA mapping at vdpa device creation, > >>>>>> and they would implicitly destroy the 1:1 mapping when > >>>>>> the first .set_map or .dma_map callback is invoked. > >>>>>> > >>>>>> This patchset is rebased on top of the latest vhost tree. > >>>>>> > >>>>>> [1] Reducing vdpa migration downtime because of memory pin / maps > >>>>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html > >>>>>> > >>>>>> --- > >>>>>> v4: > >>>>>> - Rework compatibility using new .compat_reset driver op > >>>>> I still think having a set_backend_feature() > >>>> This will overload backend features with the role of carrying over > >>>> compatibility quirks, which I tried to avoid from. While I think the > >>>> .compat_reset from the v4 code just works with the backend features > >>>> acknowledgement (and maybe others as well) to determine, but not > >>>> directly tie it to backend features itself. These two have different > >>>> implications in terms of requirement, scope and maintaining/deprecation, > >>>> better to cope with compat quirks in explicit and driver visible way. > >>>> > >>>>> or reset_map(clean=true) might be better. > >>>> An explicit op might be marginally better in driver writer's point of > >>>> view. Compliant driver doesn't have to bother asserting clean_map never > >>>> be true so their code would never bother dealing with this case, as > >>>> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map > >>>> during reset for older userspace": > >>>> > >>>> " > >>>> The separation of .compat_reset from the regular .reset allows > >>>> vhost-vdpa able to know which driver had broken behavior before, so it > >>>> can apply the corresponding compatibility quirk to the individual > >>>> driver > >>>> whenever needed. Compared to overloading the existing .reset with > >>>> flags, .compat_reset won't cause any extra burden to the implementation > >>>> of every compliant driver. > >>>> " > >>>> > >>>>> As it tries hard to not introduce new stuff on the bus. > >>>> Honestly I don't see substantial difference between these other than the > >>>> color. There's no single best solution that stands out among the 3. And > >>>> I assume you already noticed it from all the above 3 approaches will > >>>> have to go with backend features negotiation, that the 1st vdpa reset > >>>> before backend feature negotiation will use the compliant version of > >>>> .reset that doesn't clean up the map. While I don't think this nuance > >>>> matters much to existing older userspace apps, as the maps should > >>>> already get cleaned by previous process in vhost_vdpa_cleanup(), but if > >>>> bug-for-bug behavioral compatibility is what you want, module parameter > >>>> will be the single best answer. > >>>> > >>>> Regards, > >>>> -Siwei > >>>> > >>>>> But we can listen to others for sure. > >>>>> > >>>>> Thanks > >>>>> >