A kernel crash on the destination VM after live migration was
reported by Yihuang Yu. The issue is only reproducible on NVIDIA's
Grace Hopper, where the TLBI RANGE feature is available. The kernel crash
is caused by an incomplete TLB flush and a missed dirty page. For the
root cause and analysis, please refer to PATCH[v3 1/3]'s commit log.

Thanks to Marc Zyngier who proposed all the code changes.
PATCH[1] fixes the kernel crash by extending __TLBI_RANGE_NUM() so that
         the TLBI RANGE on the area with MAX_TLBI_RANGE_PAGES pages can
         be supported
PATCH[2] improves __TLBI_VADDR_RANGE() with masks and FIELD_PREP()
PATCH[3] allows TLBI RANGE operation on the area with MAX_TLBI_RANGE_PAGES
         pages in __flush_tlb_range_nosync()
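
To illustrate why an area of exactly MAX_TLBI_RANGE_PAGES pages is a corner
case, here is a minimal user-space sketch. It is not the code from the
patches. The helper names are hypothetical, the page-count formula
(num + 1) << (5 * scale + 1) with a 5-bit NUM field and SCALE in 0..3 is
assumed from the arm64 range TLBI operand layout, and the "masked" variant
is only an assumption about how a mask-based computation can lose the top
value.

```c
#include <assert.h>

/* Hypothetical stand-in: NUM is assumed to be a 5-bit field (0..31). */
#define RANGE_NUM_MASK 0x1fUL

/* Pages covered by one range operation for a given (num, scale) pair. */
static unsigned long range_pages(unsigned long num, unsigned long scale)
{
	return (num + 1) << (5 * scale + 1);
}

/*
 * Assumed mask-based computation of NUM. When pages equals the largest
 * range for this scale, the shifted value is 32, the mask turns it into
 * 0, and the result becomes -1, so no range operation is issued.
 */
static int range_num_masked(unsigned long pages, unsigned long scale)
{
	return (int)((pages >> (5 * scale + 1)) & RANGE_NUM_MASK) - 1;
}

/*
 * Clamp pages to the largest range for this scale instead of masking,
 * so the largest range yields NUM == 31 rather than -1.
 */
static int range_num_clamped(unsigned long pages, unsigned long scale)
{
	unsigned long max = range_pages(31, scale);

	if (pages > max)
		pages = max;
	return (int)(pages >> (5 * scale + 1)) - 1;
}

/* Small self-check of the arithmetic described above. */
static void check_examples(void)
{
	unsigned long max_pages = range_pages(31, 3);	/* 32 << 16 */

	assert(max_pages == 2097152UL);
	assert(range_num_masked(max_pages, 3) == -1);
	assert(range_num_clamped(max_pages, 3) == 31);
}
```

Under these assumptions, the masked form returns -1 for the largest area at
scale 3, while the clamped form returns 31, which covers the whole area in
one operation.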
v2: https://lists.infradead.org/pipermail/linux-arm-kernel/2024-April/917432.html
v1: https://lists.infradead.org/pipermail/linux-arm-kernel/2024-April/916972.html
Changelog
=========
v3:
  Improve __TLBI_RANGE_NUM() and its comments. Added patches
  to improve __TLBI_VADDR_RANGE() and __flush_tlb_range_nosync() (Marc)
v2:
  Improve __TLBI_RANGE_NUM() (Marc)

Gavin Shan (3):
  arm64: tlb: Fix TLBI RANGE operand
  arm64: tlb: Improve __TLBI_VADDR_RANGE()
  arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES

 arch/arm64/include/asm/tlbflush.h | 53 ++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 22 deletions(-)
--
2.44.0
On Fri, 05 Apr 2024 13:58:49 +1000, Gavin Shan wrote:
> A kernel crash on the destination VM after the live migration was
> reported by Yihuang Yu. The issue is only reproducible on NVidia's
> grace-hopper where TLBI RANGE feature is available. The kernel crash
> is caused by incomplete TLB flush and missed dirty page. For the
> root cause and analysis, please refer to PATCH[v3 1/3]'s commit log.
>
> Thanks to Marc Zyngier who proposed all the code changes.
>
> [...]
Applied to arm64 (for-next/fixes), thanks!
[1/3] arm64: tlb: Fix TLBI RANGE operand
(no commit info)
[2/3] arm64: tlb: Improve __TLBI_VADDR_RANGE()
https://git.kernel.org/arm64/c/e07255d69702
[3/3] arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES
https://git.kernel.org/arm64/c/73301e464a72
Cheers,
--
Will
https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
On 4/5/24 11:58, Gavin Shan wrote:
> A kernel crash on the destination VM after the live migration was
> reported by Yihuang Yu. The issue is only reproducible on NVidia's
> grace-hopper where TLBI RANGE feature is available. The kernel crash
> is caused by incomplete TLB flush and missed dirty page. For the
> root cause and analysis, please refer to PATCH[v3 1/3]'s commit log.
>
> Thanks to Marc Zyngier who proposed all the code changes.
>
> PATCH[1] fixes the kernel crash by extending __TLBI_RANGE_NUM() so that
>          the TLBI RANGE on the area with MAX_TLBI_RANGE_PAGES pages can
>          be supported
> PATCH[2] improves __TLBI_VADDR_RANGE() with masks and FIELD_PREP()
> PATCH[3] allows TLBI RANGE operation on the area with MAX_TLBI_RANGE_PAGES
>          pages in __flush_tlb_range_nosync()
>
> v2: https://lists.infradead.org/pipermail/linux-arm-kernel/2024-April/917432.html
> v1: https://lists.infradead.org/pipermail/linux-arm-kernel/2024-April/916972.html
>
> Changelog
> =========
> v3:
>   Improve __TLBI_RANGE_NUM() and its comments. Added patches
>   to improve __TLBI_VADDR_RANGE() and __flush_tlb_range_nosync() (Marc)
> v2:
>   Improve __TLBI_RANGE_NUM() (Marc)
>
> Gavin Shan (3):
>   arm64: tlb: Fix TLBI RANGE operand
>   arm64: tlb: Improve __TLBI_VADDR_RANGE()
>   arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES
>
>  arch/arm64/include/asm/tlbflush.h | 53 ++++++++++++++++++-------------
>  1 file changed, 31 insertions(+), 22 deletions(-)
>

For the series.

Reviewed-by: Shaoqin Huang <shahuang@redhat.com>

--
Shaoqin
On Fri, 05 Apr 2024 13:58:49 +1000, Gavin Shan wrote:
> A kernel crash on the destination VM after the live migration was
> reported by Yihuang Yu. The issue is only reproducible on NVidia's
> grace-hopper where TLBI RANGE feature is available. The kernel crash
> is caused by incomplete TLB flush and missed dirty page. For the
> root cause and analysis, please refer to PATCH[v3 1/3]'s commit log.
>
> Thanks to Marc Zyngier who proposed all the code changes.
>
> [...]
Applied to arm64 (for-next/fixes), thanks!
[1/3] arm64: tlb: Fix TLBI RANGE operand
https://git.kernel.org/arm64/c/e3ba51ab24fd
--
Catalin