[PATCH v3 0/3] arm64: tlb: Fix TLBI RANGE operand

Gavin Shan posted 3 patches 1 year, 10 months ago
[PATCH v3 0/3] arm64: tlb: Fix TLBI RANGE operand
Posted by Gavin Shan 1 year, 10 months ago
A kernel crash on the destination VM after live migration was
reported by Yihuang Yu. The issue is only reproducible on NVIDIA's
Grace Hopper, where the TLBI RANGE feature is available. The kernel
crash is caused by an incomplete TLB flush and missed dirty pages.
For the root cause and analysis, please refer to PATCH[v3 1/3]'s
commit log.

Thanks to Marc Zyngier who proposed all the code changes.

PATCH[1] fixes the kernel crash by extending __TLBI_RANGE_NUM() so that
         a TLBI RANGE operation on an area of MAX_TLBI_RANGE_PAGES pages
         is supported
PATCH[2] improves __TLBI_VADDR_RANGE() with masks and FIELD_PREP()
PATCH[3] allows a TLBI RANGE operation on an area of MAX_TLBI_RANGE_PAGES
         pages in __flush_tlb_range_nosync()
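
For readers without PATCH[1]'s commit log at hand, here is a minimal
user-space sketch of the NUM computation that PATCH[1] touches. It
assumes that a range operation covers (NUM + 1) * 2^(5 * SCALE + 1)
pages with NUM in 0..31 and SCALE in 0..3, and that the pre-fix
computation masked the shifted page count to 5 bits before subtracting
1. The function names are hypothetical stand-ins, not the kernel
macros, and the clamping variant only approximates the idea of the fix.

```c
/* Hypothetical user-space stand-ins; these are NOT the kernel macros. */

/* Pages covered by one range operation (assumed encoding). */
static unsigned long range_pages(int num, int scale)
{
	return (unsigned long)(num + 1) << (5 * scale + 1);
}

/* Largest single range under that assumption: NUM = 31, SCALE = 3. */
static unsigned long max_range_pages(void)
{
	return range_pages(31, 3);
}

/*
 * Assumed pre-fix behaviour: mask the shifted page count to 5 bits,
 * then subtract 1. A shifted count of exactly 32 wraps to 0, so the
 * result is -1 and the largest range cannot be expressed.
 */
static int range_num_masked(unsigned long pages, int scale)
{
	return (int)((pages >> (5 * scale + 1)) & 0x1f) - 1;
}

/*
 * Sketch of the fix's idea: clamp the page count to the largest range
 * for this scale before shifting, so the result stays within -1..31.
 */
static int range_num_clamped(unsigned long pages, int scale)
{
	unsigned long limit = range_pages(31, scale);
	unsigned long clamped = pages < limit ? pages : limit;

	return (int)(clamped >> (5 * scale + 1)) - 1;
}
```

With these stand-ins, range_num_masked(max_range_pages(), 3) evaluates
to -1, whereas range_num_clamped(max_range_pages(), 3) evaluates to 31.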

v2: https://lists.infradead.org/pipermail/linux-arm-kernel/2024-April/917432.html
v1: https://lists.infradead.org/pipermail/linux-arm-kernel/2024-April/916972.html

Changelog
=========
v3:
  Improve __TLBI_RANGE_NUM() and its comments. Added patches
  to improve __TLBI_VADDR_RANGE() and __flush_tlb_range_nosync() (Marc) 
v2:
  Improve __TLBI_RANGE_NUM()                                     (Marc)

Gavin Shan (3):
  arm64: tlb: Fix TLBI RANGE operand
  arm64: tlb: Improve __TLBI_VADDR_RANGE()
  arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES

 arch/arm64/include/asm/tlbflush.h | 53 ++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 22 deletions(-)

-- 
2.44.0
Re: [PATCH v3 0/3] arm64: tlb: Fix TLBI RANGE operand
Posted by Will Deacon 1 year, 10 months ago
On Fri, 05 Apr 2024 13:58:49 +1000, Gavin Shan wrote:
> A kernel crash on the destination VM after live migration was
> reported by Yihuang Yu. The issue is only reproducible on NVIDIA's
> Grace Hopper, where the TLBI RANGE feature is available. The kernel
> crash is caused by an incomplete TLB flush and missed dirty pages.
> For the root cause and analysis, please refer to PATCH[v3 1/3]'s
> commit log.
> 
> Thanks to Marc Zyngier who proposed all the code changes.
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/3] arm64: tlb: Fix TLBI RANGE operand
      (no commit info)
[2/3] arm64: tlb: Improve __TLBI_VADDR_RANGE()
      https://git.kernel.org/arm64/c/e07255d69702
[3/3] arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES
      https://git.kernel.org/arm64/c/73301e464a72

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
Re: [PATCH v3 0/3] arm64: tlb: Fix TLBI RANGE operand
Posted by Shaoqin Huang 1 year, 10 months ago

On 4/5/24 11:58, Gavin Shan wrote:
> A kernel crash on the destination VM after live migration was
> reported by Yihuang Yu. The issue is only reproducible on NVIDIA's
> Grace Hopper, where the TLBI RANGE feature is available. The kernel
> crash is caused by an incomplete TLB flush and missed dirty pages.
> For the root cause and analysis, please refer to PATCH[v3 1/3]'s
> commit log.
> 
> Thanks to Marc Zyngier who proposed all the code changes.
> 
> PATCH[1] fixes the kernel crash by extending __TLBI_RANGE_NUM() so that
>           a TLBI RANGE operation on an area of MAX_TLBI_RANGE_PAGES pages
>           is supported
> PATCH[2] improves __TLBI_VADDR_RANGE() with masks and FIELD_PREP()
> PATCH[3] allows a TLBI RANGE operation on an area of MAX_TLBI_RANGE_PAGES
>           pages in __flush_tlb_range_nosync()
> 
> v2: https://lists.infradead.org/pipermail/linux-arm-kernel/2024-April/917432.html
> v1: https://lists.infradead.org/pipermail/linux-arm-kernel/2024-April/916972.html
> 
> Changelog
> =========
> v3:
>    Improve __TLBI_RANGE_NUM() and its comments. Added patches
>    to improve __TLBI_VADDR_RANGE() and __flush_tlb_range_nosync() (Marc)
> v2:
>    Improve __TLBI_RANGE_NUM()                                     (Marc)
> 
> Gavin Shan (3):
>    arm64: tlb: Fix TLBI RANGE operand
>    arm64: tlb: Improve __TLBI_VADDR_RANGE()
>    arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES
> 
>   arch/arm64/include/asm/tlbflush.h | 53 ++++++++++++++++++-------------
>   1 file changed, 31 insertions(+), 22 deletions(-)
> 

For the series.

Reviewed-by: Shaoqin Huang <shahuang@redhat.com>

-- 
Shaoqin
Re: (subset) [PATCH v3 0/3] arm64: tlb: Fix TLBI RANGE operand
Posted by Catalin Marinas 1 year, 10 months ago
On Fri, 05 Apr 2024 13:58:49 +1000, Gavin Shan wrote:
> A kernel crash on the destination VM after live migration was
> reported by Yihuang Yu. The issue is only reproducible on NVIDIA's
> Grace Hopper, where the TLBI RANGE feature is available. The kernel
> crash is caused by an incomplete TLB flush and missed dirty pages.
> For the root cause and analysis, please refer to PATCH[v3 1/3]'s
> commit log.
> 
> Thanks to Marc Zyngier who proposed all the code changes.
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/3] arm64: tlb: Fix TLBI RANGE operand
      https://git.kernel.org/arm64/c/e3ba51ab24fd

-- 
Catalin