[PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset

Si-Wei Liu posted 7 patches 2 years, 1 month ago
There is a newer version of this series
drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
drivers/vdpa/mlx5/core/mr.c        | 17 ++++++++++
drivers/vdpa/mlx5/net/mlx5_vnet.c  | 27 ++++++++++++++--
drivers/vdpa/vdpa_sim/vdpa_sim.c   | 52 ++++++++++++++++++++++++------
drivers/vhost/vdpa.c               | 49 +++++++++++++++++++++++++---
drivers/virtio/virtio_vdpa.c       |  2 +-
include/linux/vdpa.h               | 30 +++++++++++++++--
include/uapi/linux/vhost_types.h   |  2 ++
8 files changed, 161 insertions(+), 19 deletions(-)
[PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset
Posted by Si-Wei Liu 2 years, 1 month ago
In order to reduce needlessly high setup and teardown cost
of iotlb mapping during live migration, it's crucial to
decouple the vhost-vdpa iotlb abstraction from the virtio
device life cycle, i.e. iotlb mappings should be left
intact across virtio device reset [1]. For it to work, the
on-chip IOMMU parent device could implement a separate
.reset_map() operation callback to restore 1:1 DMA mapping
without having to resort to the .reset() callback, the
latter of which is mainly used to reset virtio device state.
This new .reset_map() callback will be invoked only before
the vhost-vdpa driver is to be removed and detached from
the vdpa bus, such that other vdpa bus drivers, e.g. 
virtio-vdpa, can start with 1:1 DMA mapping when they
are attached. For the context, those on-chip IOMMU parent
devices, create the 1:1 DMA mapping at vdpa device creation,
and they would implicitly destroy the 1:1 mapping when
the first .set_map or .dma_map callback is invoked.

This patchset is rebased on top of the latest vhost tree.

[1] Reducing vdpa migration downtime because of memory pin / maps
https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html

---
v4:
- Rework compatibility using new .compat_reset driver op

v3:
- add .reset_map support to vdpa_sim
- introduce module parameter to provide bug-for-bug compatibility with older
  userspace 

v2:
- improved commit message to clarify the intended csope of .reset_map API
- improved commit messages to clarify no breakage on older userspace

v1:
- rewrote commit messages to include more detailed description and background
- reword to vendor specific IOMMU implementation from on-chip IOMMU
- include parent device backend feautres to persistent iotlb precondition
- reimplement mlx5_vdpa patch on top of descriptor group series

RFC v3:
- fix missing return due to merge error in patch #4

RFC v2:
- rebased on top of the "[PATCH RFC v2 0/3] vdpa: dedicated descriptor table group" series:
  https://lore.kernel.org/virtualization/1694248959-13369-1-git-send-email-si-wei.liu@oracle.com/

---

Si-Wei Liu (7):
  vdpa: introduce .reset_map operation callback
  vhost-vdpa: reset vendor specific mapping to initial state in .release
  vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
  vdpa: introduce .compat_reset operation callback
  vhost-vdpa: clean iotlb map during reset for older userspace
  vdpa/mlx5: implement .reset_map driver op
  vdpa_sim: implement .reset_map support

 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
 drivers/vdpa/mlx5/core/mr.c        | 17 ++++++++++
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 27 ++++++++++++++--
 drivers/vdpa/vdpa_sim/vdpa_sim.c   | 52 ++++++++++++++++++++++++------
 drivers/vhost/vdpa.c               | 49 +++++++++++++++++++++++++---
 drivers/virtio/virtio_vdpa.c       |  2 +-
 include/linux/vdpa.h               | 30 +++++++++++++++--
 include/uapi/linux/vhost_types.h   |  2 ++
 8 files changed, 161 insertions(+), 19 deletions(-)

-- 
2.39.3
Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset
Posted by Jason Wang 2 years, 1 month ago
Hi Si-Wei:

On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> In order to reduce needlessly high setup and teardown cost
> of iotlb mapping during live migration, it's crucial to
> decouple the vhost-vdpa iotlb abstraction from the virtio
> device life cycle, i.e. iotlb mappings should be left
> intact across virtio device reset [1]. For it to work, the
> on-chip IOMMU parent device could implement a separate
> .reset_map() operation callback to restore 1:1 DMA mapping
> without having to resort to the .reset() callback, the
> latter of which is mainly used to reset virtio device state.
> This new .reset_map() callback will be invoked only before
> the vhost-vdpa driver is to be removed and detached from
> the vdpa bus, such that other vdpa bus drivers, e.g.
> virtio-vdpa, can start with 1:1 DMA mapping when they
> are attached. For the context, those on-chip IOMMU parent
> devices, create the 1:1 DMA mapping at vdpa device creation,
> and they would implicitly destroy the 1:1 mapping when
> the first .set_map or .dma_map callback is invoked.
>
> This patchset is rebased on top of the latest vhost tree.
>
> [1] Reducing vdpa migration downtime because of memory pin / maps
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
>
> ---
> v4:
> - Rework compatibility using new .compat_reset driver op

I still think having a set_backend_feature() or reset_map(clean=true)
might be better. As it tries hard to not introduce new stuff on the
bus.

But we can listen to others for sure.

Thanks
Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset
Posted by Si-Wei Liu 2 years, 1 month ago

On 10/22/2023 8:51 PM, Jason Wang wrote:
> Hi Si-Wei:
>
> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> In order to reduce needlessly high setup and teardown cost
>> of iotlb mapping during live migration, it's crucial to
>> decouple the vhost-vdpa iotlb abstraction from the virtio
>> device life cycle, i.e. iotlb mappings should be left
>> intact across virtio device reset [1]. For it to work, the
>> on-chip IOMMU parent device could implement a separate
>> .reset_map() operation callback to restore 1:1 DMA mapping
>> without having to resort to the .reset() callback, the
>> latter of which is mainly used to reset virtio device state.
>> This new .reset_map() callback will be invoked only before
>> the vhost-vdpa driver is to be removed and detached from
>> the vdpa bus, such that other vdpa bus drivers, e.g.
>> virtio-vdpa, can start with 1:1 DMA mapping when they
>> are attached. For the context, those on-chip IOMMU parent
>> devices, create the 1:1 DMA mapping at vdpa device creation,
>> and they would implicitly destroy the 1:1 mapping when
>> the first .set_map or .dma_map callback is invoked.
>>
>> This patchset is rebased on top of the latest vhost tree.
>>
>> [1] Reducing vdpa migration downtime because of memory pin / maps
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
>>
>> ---
>> v4:
>> - Rework compatibility using new .compat_reset driver op
> I still think having a set_backend_feature()
This will overload backend features with the role of carrying over 
compatibility quirks, which I tried to avoid from. While I think the 
.compat_reset from the v4 code just works with the backend features 
acknowledgement (and maybe others as well) to determine, but not 
directly tie it to backend features itself. These two have different 
implications in terms of requirement, scope and maintaining/deprecation, 
better to cope with compat quirks in explicit and driver visible way.

>   or reset_map(clean=true) might be better.
An explicit op might be marginally better in driver writer's point of 
view. Compliant driver doesn't have to bother asserting clean_map never 
be true so their code would never bother dealing with this case, as 
explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map 
during reset for older userspace":

"
     The separation of .compat_reset from the regular .reset allows
     vhost-vdpa able to know which driver had broken behavior before, so it
     can apply the corresponding compatibility quirk to the individual 
driver
     whenever needed.  Compared to overloading the existing .reset with
     flags, .compat_reset won't cause any extra burden to the implementation
     of every compliant driver.
"

>   As it tries hard to not introduce new stuff on the bus.
Honestly I don't see substantial difference between these other than the 
color. There's no single best solution that stands out among the 3. And 
I assume you already noticed it from all the above 3 approaches will 
have to go with backend features negotiation, that the 1st vdpa reset 
before backend feature negotiation will use the compliant version of 
.reset that doesn't clean up the map. While I don't think this nuance 
matters much to existing older userspace apps, as the maps should 
already get cleaned by previous process in vhost_vdpa_cleanup(), but if 
bug-for-bug behavioral compatibility is what you want, module parameter 
will be the single best answer.

Regards,
-Siwei

> But we can listen to others for sure.
>
> Thanks
>

Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset
Posted by Lei Yang 2 years, 1 month ago
QE tested this series v4 with regression testing on real nic, there is
no new regression bug.

Tested-by: Lei Yang <leiyang@redhat.com>

On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 10/22/2023 8:51 PM, Jason Wang wrote:
> > Hi Si-Wei:
> >
> > On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >> In order to reduce needlessly high setup and teardown cost
> >> of iotlb mapping during live migration, it's crucial to
> >> decouple the vhost-vdpa iotlb abstraction from the virtio
> >> device life cycle, i.e. iotlb mappings should be left
> >> intact across virtio device reset [1]. For it to work, the
> >> on-chip IOMMU parent device could implement a separate
> >> .reset_map() operation callback to restore 1:1 DMA mapping
> >> without having to resort to the .reset() callback, the
> >> latter of which is mainly used to reset virtio device state.
> >> This new .reset_map() callback will be invoked only before
> >> the vhost-vdpa driver is to be removed and detached from
> >> the vdpa bus, such that other vdpa bus drivers, e.g.
> >> virtio-vdpa, can start with 1:1 DMA mapping when they
> >> are attached. For the context, those on-chip IOMMU parent
> >> devices, create the 1:1 DMA mapping at vdpa device creation,
> >> and they would implicitly destroy the 1:1 mapping when
> >> the first .set_map or .dma_map callback is invoked.
> >>
> >> This patchset is rebased on top of the latest vhost tree.
> >>
> >> [1] Reducing vdpa migration downtime because of memory pin / maps
> >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
> >>
> >> ---
> >> v4:
> >> - Rework compatibility using new .compat_reset driver op
> > I still think having a set_backend_feature()
> This will overload backend features with the role of carrying over
> compatibility quirks, which I tried to avoid from. While I think the
> .compat_reset from the v4 code just works with the backend features
> acknowledgement (and maybe others as well) to determine, but not
> directly tie it to backend features itself. These two have different
> implications in terms of requirement, scope and maintaining/deprecation,
> better to cope with compat quirks in explicit and driver visible way.
>
> >   or reset_map(clean=true) might be better.
> An explicit op might be marginally better in driver writer's point of
> view. Compliant driver doesn't have to bother asserting clean_map never
> be true so their code would never bother dealing with this case, as
> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
> during reset for older userspace":
>
> "
>      The separation of .compat_reset from the regular .reset allows
>      vhost-vdpa able to know which driver had broken behavior before, so it
>      can apply the corresponding compatibility quirk to the individual
> driver
>      whenever needed.  Compared to overloading the existing .reset with
>      flags, .compat_reset won't cause any extra burden to the implementation
>      of every compliant driver.
> "
>
> >   As it tries hard to not introduce new stuff on the bus.
> Honestly I don't see substantial difference between these other than the
> color. There's no single best solution that stands out among the 3. And
> I assume you already noticed it from all the above 3 approaches will
> have to go with backend features negotiation, that the 1st vdpa reset
> before backend feature negotiation will use the compliant version of
> .reset that doesn't clean up the map. While I don't think this nuance
> matters much to existing older userspace apps, as the maps should
> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
> bug-for-bug behavioral compatibility is what you want, module parameter
> will be the single best answer.
>
> Regards,
> -Siwei
>
> > But we can listen to others for sure.
> >
> > Thanks
> >
>
Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset
Posted by Si-Wei Liu 2 years, 1 month ago
Thanks a lot for testing! Please be aware that there's a follow-up fix 
for a potential oops in this v4 series:

https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/

Would be nice to have it applied for any tests.

Thanks,
-Siwei

On 10/23/2023 11:51 PM, Lei Yang wrote:
> QE tested this series v4 with regression testing on real nic, there is
> no new regression bug.
>
> Tested-by: Lei Yang <leiyang@redhat.com>
>
> On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 10/22/2023 8:51 PM, Jason Wang wrote:
>>> Hi Si-Wei:
>>>
>>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>> In order to reduce needlessly high setup and teardown cost
>>>> of iotlb mapping during live migration, it's crucial to
>>>> decouple the vhost-vdpa iotlb abstraction from the virtio
>>>> device life cycle, i.e. iotlb mappings should be left
>>>> intact across virtio device reset [1]. For it to work, the
>>>> on-chip IOMMU parent device could implement a separate
>>>> .reset_map() operation callback to restore 1:1 DMA mapping
>>>> without having to resort to the .reset() callback, the
>>>> latter of which is mainly used to reset virtio device state.
>>>> This new .reset_map() callback will be invoked only before
>>>> the vhost-vdpa driver is to be removed and detached from
>>>> the vdpa bus, such that other vdpa bus drivers, e.g.
>>>> virtio-vdpa, can start with 1:1 DMA mapping when they
>>>> are attached. For the context, those on-chip IOMMU parent
>>>> devices, create the 1:1 DMA mapping at vdpa device creation,
>>>> and they would implicitly destroy the 1:1 mapping when
>>>> the first .set_map or .dma_map callback is invoked.
>>>>
>>>> This patchset is rebased on top of the latest vhost tree.
>>>>
>>>> [1] Reducing vdpa migration downtime because of memory pin / maps
>>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
>>>>
>>>> ---
>>>> v4:
>>>> - Rework compatibility using new .compat_reset driver op
>>> I still think having a set_backend_feature()
>> This will overload backend features with the role of carrying over
>> compatibility quirks, which I tried to avoid from. While I think the
>> .compat_reset from the v4 code just works with the backend features
>> acknowledgement (and maybe others as well) to determine, but not
>> directly tie it to backend features itself. These two have different
>> implications in terms of requirement, scope and maintaining/deprecation,
>> better to cope with compat quirks in explicit and driver visible way.
>>
>>>    or reset_map(clean=true) might be better.
>> An explicit op might be marginally better in driver writer's point of
>> view. Compliant driver doesn't have to bother asserting clean_map never
>> be true so their code would never bother dealing with this case, as
>> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
>> during reset for older userspace":
>>
>> "
>>       The separation of .compat_reset from the regular .reset allows
>>       vhost-vdpa able to know which driver had broken behavior before, so it
>>       can apply the corresponding compatibility quirk to the individual
>> driver
>>       whenever needed.  Compared to overloading the existing .reset with
>>       flags, .compat_reset won't cause any extra burden to the implementation
>>       of every compliant driver.
>> "
>>
>>>    As it tries hard to not introduce new stuff on the bus.
>> Honestly I don't see substantial difference between these other than the
>> color. There's no single best solution that stands out among the 3. And
>> I assume you already noticed it from all the above 3 approaches will
>> have to go with backend features negotiation, that the 1st vdpa reset
>> before backend feature negotiation will use the compliant version of
>> .reset that doesn't clean up the map. While I don't think this nuance
>> matters much to existing older userspace apps, as the maps should
>> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
>> bug-for-bug behavioral compatibility is what you want, module parameter
>> will be the single best answer.
>>
>> Regards,
>> -Siwei
>>
>>> But we can listen to others for sure.
>>>
>>> Thanks
>>>

Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset
Posted by Lei Yang 2 years, 1 month ago
On Wed, Oct 25, 2023 at 1:27 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
Hello Si-Wei
> Thanks a lot for testing! Please be aware that there's a follow-up fix
> for a potential oops in this v4 series:
>
The first, when I did not apply this patch [1], I will also hit this
patch mentioned problem. After I applied this patch, this problem will
no longer to hit again. But I hit another issues, about the error
messages please review the attached file.
[1] https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/

My test steps:
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd  linux/
b4 am 1697880319-4937-1-git-send-email-si-wei.liu@oracle.com
b4 am 20231018171456.1624030-2-dtatulea@nvidia.com
b4 am 1698102863-21122-1-git-send-email-si-wei.liu@oracle.com
git am ./v4_20231018_dtatulea_vdpa_add_support_for_vq_descriptor_mappings.mbx
git am ./v4_20231021_si_wei_liu_vdpa_decouple_reset_of_iotlb_mapping_from_device_reset.mbx
git am ./20231023_si_wei_liu_vhost_vdpa_fix_null_pointer_deref_in__compat_vdpa_reset.mbx
cp /boot/config-5.14.0-377.el9.x86_64 .config
make -j 32
make modules_install
make install

Thanks

Lei
> https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/
>
> Would be nice to have it applied for any tests.
>
> Thanks,
> -Siwei
>
> On 10/23/2023 11:51 PM, Lei Yang wrote:
> > QE tested this series v4 with regression testing on real nic, there is
> > no new regression bug.
> >
> > Tested-by: Lei Yang <leiyang@redhat.com>
> >
> > On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 10/22/2023 8:51 PM, Jason Wang wrote:
> >>> Hi Si-Wei:
> >>>
> >>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>> In order to reduce needlessly high setup and teardown cost
> >>>> of iotlb mapping during live migration, it's crucial to
> >>>> decouple the vhost-vdpa iotlb abstraction from the virtio
> >>>> device life cycle, i.e. iotlb mappings should be left
> >>>> intact across virtio device reset [1]. For it to work, the
> >>>> on-chip IOMMU parent device could implement a separate
> >>>> .reset_map() operation callback to restore 1:1 DMA mapping
> >>>> without having to resort to the .reset() callback, the
> >>>> latter of which is mainly used to reset virtio device state.
> >>>> This new .reset_map() callback will be invoked only before
> >>>> the vhost-vdpa driver is to be removed and detached from
> >>>> the vdpa bus, such that other vdpa bus drivers, e.g.
> >>>> virtio-vdpa, can start with 1:1 DMA mapping when they
> >>>> are attached. For the context, those on-chip IOMMU parent
> >>>> devices, create the 1:1 DMA mapping at vdpa device creation,
> >>>> and they would implicitly destroy the 1:1 mapping when
> >>>> the first .set_map or .dma_map callback is invoked.
> >>>>
> >>>> This patchset is rebased on top of the latest vhost tree.
> >>>>
> >>>> [1] Reducing vdpa migration downtime because of memory pin / maps
> >>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
> >>>>
> >>>> ---
> >>>> v4:
> >>>> - Rework compatibility using new .compat_reset driver op
> >>> I still think having a set_backend_feature()
> >> This will overload backend features with the role of carrying over
> >> compatibility quirks, which I tried to avoid from. While I think the
> >> .compat_reset from the v4 code just works with the backend features
> >> acknowledgement (and maybe others as well) to determine, but not
> >> directly tie it to backend features itself. These two have different
> >> implications in terms of requirement, scope and maintaining/deprecation,
> >> better to cope with compat quirks in explicit and driver visible way.
> >>
> >>>    or reset_map(clean=true) might be better.
> >> An explicit op might be marginally better in driver writer's point of
> >> view. Compliant driver doesn't have to bother asserting clean_map never
> >> be true so their code would never bother dealing with this case, as
> >> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
> >> during reset for older userspace":
> >>
> >> "
> >>       The separation of .compat_reset from the regular .reset allows
> >>       vhost-vdpa able to know which driver had broken behavior before, so it
> >>       can apply the corresponding compatibility quirk to the individual
> >> driver
> >>       whenever needed.  Compared to overloading the existing .reset with
> >>       flags, .compat_reset won't cause any extra burden to the implementation
> >>       of every compliant driver.
> >> "
> >>
> >>>    As it tries hard to not introduce new stuff on the bus.
> >> Honestly I don't see substantial difference between these other than the
> >> color. There's no single best solution that stands out among the 3. And
> >> I assume you already noticed it from all the above 3 approaches will
> >> have to go with backend features negotiation, that the 1st vdpa reset
> >> before backend feature negotiation will use the compliant version of
> >> .reset that doesn't clean up the map. While I don't think this nuance
> >> matters much to existing older userspace apps, as the maps should
> >> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
> >> bug-for-bug behavioral compatibility is what you want, module parameter
> >> will be the single best answer.
> >>
> >> Regards,
> >> -Siwei
> >>
> >>> But we can listen to others for sure.
> >>>
> >>> Thanks
> >>>
>
[ 6325.462426] BUG: unable to handle page fault for address: 00000001005b4af4
[ 6325.469301] #PF: supervisor read access in kernel mode
[ 6325.474440] #PF: error_code(0x0000) - not-present page
[ 6325.479577] PGD 16a80a067 P4D 0 
[ 6325.482811] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 6325.487169] CPU: 4 PID: 40387 Comm: qemu-kvm Not tainted 6.6.0-rc7+ #3
[ 6325.493695] Hardware name: Dell Inc. PowerEdge R750/0PJ80M, BIOS 1.8.2 09/14/2022
[ 6325.501175] RIP: 0010:_compat_vdpa_reset.isra.0+0x27/0xb0 [vhost_vdpa]
[ 6325.507708] Code: 90 90 90 0f 1f 44 00 00 41 55 4c 8d ae 08 03 00 00 41 54 55 48 89 f5 53 4c 8b a6 00 03 00 00 48 85 ff 74 49 48 8b 07 4c 89 ef <48> 8b 80 88 45 00 00 48 c1 e8 08 48 83 f0 01 89 c3 e8 73 5e 9b dc
[ 6325.526455] RSP: 0018:ff73a85762073ba0 EFLAGS: 00010286
[ 6325.531681] RAX: 00000001005b056c RBX: ff32b13ca6994c68 RCX: 0000000000000002
[ 6325.538813] RDX: 0000000000000001 RSI: ff32b13c07559000 RDI: ff32b13c07559308
[ 6325.545947] RBP: ff32b13c07559000 R08: 0000000000000000 R09: ff32b12ca497c0f0
[ 6325.553079] R10: ff73a85762073c58 R11: 0000000c106f9de3 R12: ff32b12c95b1d050
[ 6325.560212] R13: ff32b13c07559308 R14: ff32b12d0ddc5100 R15: 0000000000008002
[ 6325.567346] FS:  00007fec5b8cbf80(0000) GS:ff32b13bbfc80000(0000) knlGS:0000000000000000
[ 6325.575432] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6325.581177] CR2: 00000001005b4af4 CR3: 000000015644a003 CR4: 0000000000773ee0
[ 6325.588309] PKRU: 55555554
[ 6325.591022] Call Trace:
[ 6325.593477]  <TASK>
[ 6325.595582]  ? __die+0x20/0x70
[ 6325.598650]  ? page_fault_oops+0x76/0x170
[ 6325.602662]  ? exc_page_fault+0x65/0x150
[ 6325.606588]  ? asm_exc_page_fault+0x22/0x30
[ 6325.610775]  ? _compat_vdpa_reset.isra.0+0x27/0xb0 [vhost_vdpa]
[ 6325.616692]  vhost_vdpa_open+0x57/0x280 [vhost_vdpa]
[ 6325.621660]  ? __pfx_chrdev_open+0x10/0x10
[ 6325.625759]  chrdev_open+0xc6/0x260
[ 6325.629250]  ? __pfx_chrdev_open+0x10/0x10
[ 6325.633349]  do_dentry_open+0x16e/0x530
[ 6325.637189]  do_open+0x21c/0x400
[ 6325.640421]  path_openat+0x111/0x290
[ 6325.644000]  do_filp_open+0xb2/0x160
[ 6325.647582]  ? __check_object_size.part.0+0x5e/0x140
[ 6325.652548]  do_sys_openat2+0x96/0xd0
[ 6325.656212]  __x64_sys_openat+0x53/0xa0
[ 6325.660051]  do_syscall_64+0x59/0x90
[ 6325.663631]  ? syscall_exit_to_user_mode+0x22/0x40
[ 6325.668423]  ? do_syscall_64+0x69/0x90
[ 6325.672174]  ? syscall_exit_to_user_mode+0x22/0x40
[ 6325.676970]  ? do_syscall_64+0x69/0x90
[ 6325.680721]  ? do_syscall_64+0x69/0x90
[ 6325.684473]  ? syscall_exit_to_user_mode+0x22/0x40
[ 6325.689268]  ? do_syscall_64+0x69/0x90
[ 6325.693020]  ? exc_page_fault+0x65/0x150
[ 6325.696945]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 6325.701998] RIP: 0033:0x7fec5c33e654
[ 6325.705576] Code: 24 20 eb 8f 66 90 44 89 54 24 0c e8 b6 d5 f5 ff 44 8b 54 24 0c 44 89 e2 48 89 ee 41 89 c0 bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 89 44 24 0c e8 08 d6 f5 ff 8b 44
[ 6325.724322] RSP: 002b:00007ffebbe28fa0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[ 6325.731890] RAX: ffffffffffffffda RBX: 00007fea10018560 RCX: 00007fec5c33e654
[ 6325.739021] RDX: 0000000000080002 RSI: 00007fea10018560 RDI: 00000000ffffff9c
[ 6325.746156] RBP: 00007fea10018560 R08: 0000000000000000 R09: 0000000000000000
[ 6325.753287] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000080002
[ 6325.760419] R13: 00007fec5b328e70 R14: 00007fec5b328e80 R15: 0000000000000002
[ 6325.767552]  </TASK>
[ 6325.769744] Modules linked in: act_skbedit act_mirred mlx5_vdpa vringh vhost_vdpa vhost vhost_iotlb vdpa cls_matchall nfnetlink_cttimeout nfnetlink act_gact cls_flower sch_ingress openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs bridge stp llc qrtr intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlx5_ib kvm dell_wmi ledtrig_audio iTCO_wdt ib_uverbs isst_if_mmio sparse_keymap ib_core iTCO_vendor_support irqbypass isst_if_mbox_pci intel_vsec acpi_ipmi isst_if_common i2c_i801 rfkill rapl ipmi_ssif video dell_smbios joydev dax_hmem cxl_acpi intel_cstate mei_me ipmi_si dell_wmi_descriptor mei wmi_bmof dcdbas intel_pch_thermal ipmi_devintf intel_uncore ipmi_msghandler cxl_core i2c_smbus pcspkr acpi_power_meter xfs libcrc32c sd_mod mgag200 nvme_tcp sg i2c_algo_bit nvme_fabrics drm_shmem_helper nvme_core
[ 6325.769800]  drm_kms_helper nvme_common ahci crct10dif_pclmul mlx5_core t10_pi libahci crc32_pclmul drm mlxfw crc32c_intel libata psample megaraid_sas tg3 ghash_clmulni_intel pci_hyperv_intf wmi dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse
[ 6325.894242] CR2: 00000001005b4af4
[ 6325.897560] ---[ end trace 0000000000000000 ]---
[ 6325.967430] pstore: backend (erst) writing error (-28)
[ 6325.972570] RIP: 0010:_compat_vdpa_reset.isra.0+0x27/0xb0 [vhost_vdpa]
[ 6325.979094] Code: 90 90 90 0f 1f 44 00 00 41 55 4c 8d ae 08 03 00 00 41 54 55 48 89 f5 53 4c 8b a6 00 03 00 00 48 85 ff 74 49 48 8b 07 4c 89 ef <48> 8b 80 88 45 00 00 48 c1 e8 08 48 83 f0 01 89 c3 e8 73 5e 9b dc
[ 6325.997840] RSP: 0018:ff73a85762073ba0 EFLAGS: 00010286
[ 6326.003067] RAX: 00000001005b056c RBX: ff32b13ca6994c68 RCX: 0000000000000002
[ 6326.010200] RDX: 0000000000000001 RSI: ff32b13c07559000 RDI: ff32b13c07559308
[ 6326.017332] RBP: ff32b13c07559000 R08: 0000000000000000 R09: ff32b12ca497c0f0
[ 6326.024464] R10: ff73a85762073c58 R11: 0000000c106f9de3 R12: ff32b12c95b1d050
[ 6326.031599] R13: ff32b13c07559308 R14: ff32b12d0ddc5100 R15: 0000000000008002
[ 6326.038731] FS:  00007fec5b8cbf80(0000) GS:ff32b13bbfc80000(0000) knlGS:0000000000000000
[ 6326.046816] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6326.052563] CR2: 00000001005b4af4 CR3: 000000015644a003 CR4: 0000000000773ee0
[ 6326.059695] PKRU: 55555554
[ 6326.062407] Kernel panic - not syncing: Fatal exception
[ 6326.067651] Kernel Offset: 0x1c800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 6326.142894] ---[ end Kernel panic - not syncing: Fatal exception ]---
Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset
Posted by Si-Wei Liu 2 years, 1 month ago
Hi Yang Lei,

Thanks for testing my patches and reporting! As for the issue, could you 
please try what I posted in:

https://lore.kernel.org/virtualization/1698275594-19204-1-git-send-email-si-wei.liu@oracle.com/

and let me know how it goes? Thank you very much!

Thanks,
-Siwei

On 10/25/2023 2:41 AM, Lei Yang wrote:
> On Wed, Oct 25, 2023 at 1:27 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> Hello Si-Wei
>> Thanks a lot for testing! Please be aware that there's a follow-up fix
>> for a potential oops in this v4 series:
>>
> The first, when I did not apply this patch [1], I will also hit this
> patch mentioned problem. After I applied this patch, this problem will
> no longer to hit again. But I hit another issues, about the error
> messages please review the attached file.
> [1] https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/
>
> My test steps:
> git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> cd  linux/
> b4 am 1697880319-4937-1-git-send-email-si-wei.liu@oracle.com
> b4 am 20231018171456.1624030-2-dtatulea@nvidia.com
> b4 am 1698102863-21122-1-git-send-email-si-wei.liu@oracle.com
> git am ./v4_20231018_dtatulea_vdpa_add_support_for_vq_descriptor_mappings.mbx
> git am ./v4_20231021_si_wei_liu_vdpa_decouple_reset_of_iotlb_mapping_from_device_reset.mbx
> git am ./20231023_si_wei_liu_vhost_vdpa_fix_null_pointer_deref_in__compat_vdpa_reset.mbx
> cp /boot/config-5.14.0-377.el9.x86_64 .config
> make -j 32
> make modules_install
> make install
>
> Thanks
>
> Lei
>> https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/
>>
>> Would be nice to have it applied for any tests.
>>
>> Thanks,
>> -Siwei
>>
>> On 10/23/2023 11:51 PM, Lei Yang wrote:
>>> QE tested this series v4 with regression testing on real nic, there is
>>> no new regression bug.
>>>
>>> Tested-by: Lei Yang <leiyang@redhat.com>
>>>
>>> On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>> On 10/22/2023 8:51 PM, Jason Wang wrote:
>>>>> Hi Si-Wei:
>>>>>
>>>>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>>> In order to reduce needlessly high setup and teardown cost
>>>>>> of iotlb mapping during live migration, it's crucial to
>>>>>> decouple the vhost-vdpa iotlb abstraction from the virtio
>>>>>> device life cycle, i.e. iotlb mappings should be left
>>>>>> intact across virtio device reset [1]. For it to work, the
>>>>>> on-chip IOMMU parent device could implement a separate
>>>>>> .reset_map() operation callback to restore 1:1 DMA mapping
>>>>>> without having to resort to the .reset() callback, the
>>>>>> latter of which is mainly used to reset virtio device state.
>>>>>> This new .reset_map() callback will be invoked only before
>>>>>> the vhost-vdpa driver is to be removed and detached from
>>>>>> the vdpa bus, such that other vdpa bus drivers, e.g.
>>>>>> virtio-vdpa, can start with 1:1 DMA mapping when they
>>>>>> are attached. For the context, those on-chip IOMMU parent
>>>>>> devices, create the 1:1 DMA mapping at vdpa device creation,
>>>>>> and they would implicitly destroy the 1:1 mapping when
>>>>>> the first .set_map or .dma_map callback is invoked.
>>>>>>
>>>>>> This patchset is rebased on top of the latest vhost tree.
>>>>>>
>>>>>> [1] Reducing vdpa migration downtime because of memory pin / maps
>>>>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
>>>>>>
>>>>>> ---
>>>>>> v4:
>>>>>> - Rework compatibility using new .compat_reset driver op
>>>>> I still think having a set_backend_feature()
>>>> This will overload backend features with the role of carrying over
>>>> compatibility quirks, which I tried to avoid from. While I think the
>>>> .compat_reset from the v4 code just works with the backend features
>>>> acknowledgement (and maybe others as well) to determine, but not
>>>> directly tie it to backend features itself. These two have different
>>>> implications in terms of requirement, scope and maintaining/deprecation,
>>>> better to cope with compat quirks in explicit and driver visible way.
>>>>
>>>>>     or reset_map(clean=true) might be better.
>>>> An explicit op might be marginally better in driver writer's point of
>>>> view. Compliant driver doesn't have to bother asserting clean_map never
>>>> be true so their code would never bother dealing with this case, as
>>>> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
>>>> during reset for older userspace":
>>>>
>>>> "
>>>>        The separation of .compat_reset from the regular .reset allows
>>>>        vhost-vdpa able to know which driver had broken behavior before, so it
>>>>        can apply the corresponding compatibility quirk to the individual
>>>> driver
>>>>        whenever needed.  Compared to overloading the existing .reset with
>>>>        flags, .compat_reset won't cause any extra burden to the implementation
>>>>        of every compliant driver.
>>>> "
>>>>
>>>>>     As it tries hard to not introduce new stuff on the bus.
>>>> Honestly I don't see substantial difference between these other than the
>>>> color. There's no single best solution that stands out among the 3. And
>>>> I assume you already noticed it from all the above 3 approaches will
>>>> have to go with backend features negotiation, that the 1st vdpa reset
>>>> before backend feature negotiation will use the compliant version of
>>>> .reset that doesn't clean up the map. While I don't think this nuance
>>>> matters much to existing older userspace apps, as the maps should
>>>> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
>>>> bug-for-bug behavioral compatibility is what you want, module parameter
>>>> will be the single best answer.
>>>>
>>>> Regards,
>>>> -Siwei
>>>>
>>>>> But we can listen to others for sure.
>>>>>
>>>>> Thanks
>>>>>

Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset
Posted by Lei Yang 2 years, 1 month ago
On Thu, Oct 26, 2023 at 7:32 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Hi Yang Lei,
>
> Thanks for testing my patches and reporting! As for the issue, could you
> please try what I posted in:
>
> https://lore.kernel.org/virtualization/1698275594-19204-1-git-send-email-si-wei.liu@oracle.com/
>
HI Si-Wei
> and let me know how it goes? Thank you very much!

This problem has gone after applying this patch [1].
[1] https://lore.kernel.org/virtualization/1698275594-19204-1-git-send-email-si-wei.liu@oracle.com/

Thanks
Lei
>
> Thanks,
> -Siwei
>
> On 10/25/2023 2:41 AM, Lei Yang wrote:
> > On Wed, Oct 25, 2023 at 1:27 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> > Hello Si-Wei
> >> Thanks a lot for testing! Please be aware that there's a follow-up fix
> >> for a potential oops in this v4 series:
> >>
> > The first, when I did not apply this patch [1], I will also hit this
> > patch mentioned problem. After I applied this patch, this problem will
> > no longer to hit again. But I hit another issues, about the error
> > messages please review the attached file.
> > [1] https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/
> >
> > My test steps:
> > git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > cd  linux/
> > b4 am 1697880319-4937-1-git-send-email-si-wei.liu@oracle.com
> > b4 am 20231018171456.1624030-2-dtatulea@nvidia.com
> > b4 am 1698102863-21122-1-git-send-email-si-wei.liu@oracle.com
> > git am ./v4_20231018_dtatulea_vdpa_add_support_for_vq_descriptor_mappings.mbx
> > git am ./v4_20231021_si_wei_liu_vdpa_decouple_reset_of_iotlb_mapping_from_device_reset.mbx
> > git am ./20231023_si_wei_liu_vhost_vdpa_fix_null_pointer_deref_in__compat_vdpa_reset.mbx
> > cp /boot/config-5.14.0-377.el9.x86_64 .config
> > make -j 32
> > make modules_install
> > make install
> >
> > Thanks
> >
> > Lei
> >> https://lore.kernel.org/virtualization/1698102863-21122-1-git-send-email-si-wei.liu@oracle.com/
> >>
> >> Would be nice to have it applied for any tests.
> >>
> >> Thanks,
> >> -Siwei
> >>
> >> On 10/23/2023 11:51 PM, Lei Yang wrote:
> >>> QE tested this series v4 with regression testing on real nic, there is
> >>> no new regression bug.
> >>>
> >>> Tested-by: Lei Yang <leiyang@redhat.com>
> >>>
> >>> On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>>
> >>>> On 10/22/2023 8:51 PM, Jason Wang wrote:
> >>>>> Hi Si-Wei:
> >>>>>
> >>>>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>>>> In order to reduce needlessly high setup and teardown cost
> >>>>>> of iotlb mapping during live migration, it's crucial to
> >>>>>> decouple the vhost-vdpa iotlb abstraction from the virtio
> >>>>>> device life cycle, i.e. iotlb mappings should be left
> >>>>>> intact across virtio device reset [1]. For it to work, the
> >>>>>> on-chip IOMMU parent device could implement a separate
> >>>>>> .reset_map() operation callback to restore 1:1 DMA mapping
> >>>>>> without having to resort to the .reset() callback, the
> >>>>>> latter of which is mainly used to reset virtio device state.
> >>>>>> This new .reset_map() callback will be invoked only before
> >>>>>> the vhost-vdpa driver is to be removed and detached from
> >>>>>> the vdpa bus, such that other vdpa bus drivers, e.g.
> >>>>>> virtio-vdpa, can start with 1:1 DMA mapping when they
> >>>>>> are attached. For the context, those on-chip IOMMU parent
> >>>>>> devices, create the 1:1 DMA mapping at vdpa device creation,
> >>>>>> and they would implicitly destroy the 1:1 mapping when
> >>>>>> the first .set_map or .dma_map callback is invoked.
> >>>>>>
> >>>>>> This patchset is rebased on top of the latest vhost tree.
> >>>>>>
> >>>>>> [1] Reducing vdpa migration downtime because of memory pin / maps
> >>>>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg953755.html
> >>>>>>
> >>>>>> ---
> >>>>>> v4:
> >>>>>> - Rework compatibility using new .compat_reset driver op
> >>>>> I still think having a set_backend_feature()
> >>>> This will overload backend features with the role of carrying over
> >>>> compatibility quirks, which I tried to avoid from. While I think the
> >>>> .compat_reset from the v4 code just works with the backend features
> >>>> acknowledgement (and maybe others as well) to determine, but not
> >>>> directly tie it to backend features itself. These two have different
> >>>> implications in terms of requirement, scope and maintaining/deprecation,
> >>>> better to cope with compat quirks in explicit and driver visible way.
> >>>>
> >>>>>     or reset_map(clean=true) might be better.
> >>>> An explicit op might be marginally better in driver writer's point of
> >>>> view. Compliant driver doesn't have to bother asserting clean_map never
> >>>> be true so their code would never bother dealing with this case, as
> >>>> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
> >>>> during reset for older userspace":
> >>>>
> >>>> "
> >>>>        The separation of .compat_reset from the regular .reset allows
> >>>>        vhost-vdpa able to know which driver had broken behavior before, so it
> >>>>        can apply the corresponding compatibility quirk to the individual
> >>>> driver
> >>>>        whenever needed.  Compared to overloading the existing .reset with
> >>>>        flags, .compat_reset won't cause any extra burden to the implementation
> >>>>        of every compliant driver.
> >>>> "
> >>>>
> >>>>>     As it tries hard to not introduce new stuff on the bus.
> >>>> Honestly I don't see substantial difference between these other than the
> >>>> color. There's no single best solution that stands out among the 3. And
> >>>> I assume you already noticed it from all the above 3 approaches will
> >>>> have to go with backend features negotiation, that the 1st vdpa reset
> >>>> before backend feature negotiation will use the compliant version of
> >>>> .reset that doesn't clean up the map. While I don't think this nuance
> >>>> matters much to existing older userspace apps, as the maps should
> >>>> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
> >>>> bug-for-bug behavioral compatibility is what you want, module parameter
> >>>> will be the single best answer.
> >>>>
> >>>> Regards,
> >>>> -Siwei
> >>>>
> >>>>> But we can listen to others for sure.
> >>>>>
> >>>>> Thanks
> >>>>>
>