[PATCH v3] riscv: mm: Avoid spurious fault after hotplugging vmemmap

Vivian Wang posted 1 patch 3 days, 1 hour ago
arch/riscv/include/asm/pgtable.h | 4 ++++
arch/riscv/mm/init.c             | 6 ++++++
mm/sparse-vmemmap.c              | 8 ++++++++
3 files changed, 18 insertions(+)
[PATCH v3] riscv: mm: Avoid spurious fault after hotplugging vmemmap
Posted by Vivian Wang 3 days, 1 hour ago
section_activate() does not flush TLB after populating new vmemmap
pages. On most architectures, this is okay. However it is a problem on
RISC-V since there the TLB caching non-present entries is permitted,
which causes spurious faults on some hardwares.

This seems to be most easily reproduced with DEBUG_VM=y and
PAGE_POISONING=y, which causes these newly mapped struct pages to be
poisoned i.e. written to immediately after mapping.

Add a hook vmemmap_populate_finalize() in __populate_section_memmap()
after population, to allow architectures to handle such situations as
needed. Then implement it on RISC-V to arrange for the existing
exception handler code to deal with these faults if they happen.

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
---
Depends on my earlier kfence fixes for mark_new_valid_map() [1].

Found while testing AMD_HSA/ZONE_DEVICE on SpacemiT K3. Using
ZONE_DEVICE requires another fix [2].

[1]: https://lore.kernel.org/linux-riscv/20260303-handle-kfence-protect-spurious-fault-v2-0-f80d8354d79d@iscas.ac.cn
[2]: https://lore.kernel.org/linux-riscv/20260309-riscv-sparsemem-vmemmap-limits-v1-2-f40efe18e3cd@iscas.ac.cn
---
Changes in v3:
- Merged back into one patch (Mike)
- (No code changes otherwise.)
- Link to v2: https://patch.msgid.link/20260604-mark-after-vmemmap-populate-v2-0-ab6a7d03b434@iscas.ac.cn

Changes in v2:
- Split patch in two, hook point and riscv hook 
- Explain hook necessity in patch 1 message (Mike)
- Make hook #define based (Mike)
- Call finalize hook only on populate success
- Link to v1: https://patch.msgid.link/20260525-mark-after-vmemmap-populate-v1-1-e698d859ba16@iscas.ac.cn
---
 arch/riscv/include/asm/pgtable.h | 4 ++++
 arch/riscv/mm/init.c             | 6 ++++++
 mm/sparse-vmemmap.c              | 8 ++++++++
 3 files changed, 18 insertions(+)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index a1a7c6520a09..aa0f50e3d534 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -1243,6 +1243,10 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 #define TASK_SIZE	FIXADDR_START
 #endif
 
+/* Needed on SPARSEMEM_VMEMMAP */
+#define vmemmap_populate_finalize vmemmap_populate_finalize
+void __meminit vmemmap_populate_finalize(void);
+
 #else /* CONFIG_MMU */
 
 #define PAGE_SHARED		__pgprot(0)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 706f43523935..cf9ae4099f82 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1360,6 +1360,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	 */
 	return vmemmap_populate_hugepages(start, end, node, altmap);
 }
+
+void __meminit vmemmap_populate_finalize(void)
+{
+	/* Avoid faults on cached non-present TLB entries. */
+	mark_new_valid_map();
+}
 #endif
 
 #if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 6eadb9d116e4..2a8b923fabe8 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -544,6 +544,12 @@ static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
 
 #endif
 
+#ifndef vmemmap_populate_finalize
+static void __meminit vmemmap_populate_finalize(void)
+{
+}
+#endif
+
 struct page * __meminit __populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap)
@@ -564,6 +570,8 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
 	if (r < 0)
 		return NULL;
 
+	vmemmap_populate_finalize();
+
 	return pfn_to_page(pfn);
 }
 

---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260525-mark-after-vmemmap-populate-68bd790839c9
prerequisite-message-id: <20260303-handle-kfence-protect-spurious-fault-v2-0-f80d8354d79d@iscas.ac.cn>
prerequisite-patch-id: fdc42f2647e21d111f44a6532887a6705cd470a9
prerequisite-patch-id: 096fa339c84c36643ae4311fd8362dc63e23d950
prerequisite-patch-id: 305c876a5f4a23a840a8142aea79b796ed297545
prerequisite-patch-id: d78cb55d6a616b1170f06a401c8fd44acd11e5d5
prerequisite-patch-id: b02b4a76e94f3e2821291d4c23b46f6e5ecf5203

Best regards,
--  
Vivian Wang <wangruikang@iscas.ac.cn>
Re: [PATCH v3] riscv: mm: Avoid spurious fault after hotplugging vmemmap
Posted by David Hildenbrand (Arm) 2 days, 21 hours ago
On 6/5/26 07:23, Vivian Wang wrote:
> section_activate() does not flush TLB after populating new vmemmap
> pages. On most architectures, this is okay. However it is a problem on
> RISC-V since there the TLB caching non-present entries is permitted,
> which causes spurious faults on some hardwares.
> 
> This seems to be most easily reproduced with DEBUG_VM=y and
> PAGE_POISONING=y, which causes these newly mapped struct pages to be
> poisoned i.e. written to immediately after mapping.
> 
> Add a hook vmemmap_populate_finalize() in __populate_section_memmap()
> after population, to allow architectures to handle such situations as
> needed. Then implement it on RISC-V to arrange for the existing
> exception handler code to deal with these faults if they happen.
> 
> Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
> ---
> Depends on my earlier kfence fixes for mark_new_valid_map() [1].
> 
> Found while testing AMD_HSA/ZONE_DEVICE on SpacemiT K3. Using
> ZONE_DEVICE requires another fix [2].
> 
> [1]: https://lore.kernel.org/linux-riscv/20260303-handle-kfence-protect-spurious-fault-v2-0-f80d8354d79d@iscas.ac.cn
> [2]: https://lore.kernel.org/linux-riscv/20260309-riscv-sparsemem-vmemmap-limits-v1-2-f40efe18e3cd@iscas.ac.cn
> ---
> Changes in v3:
> - Merged back into one patch (Mike)
> - (No code changes otherwise.)
> - Link to v2: https://patch.msgid.link/20260604-mark-after-vmemmap-populate-v2-0-ab6a7d03b434@iscas.ac.cn
> 
> Changes in v2:
> - Split patch in two, hook point and riscv hook 
> - Explain hook necessity in patch 1 message (Mike)
> - Make hook #define based (Mike)
> - Call finalize hook only on populate success
> - Link to v1: https://patch.msgid.link/20260525-mark-after-vmemmap-populate-v1-1-e698d859ba16@iscas.ac.cn
> ---
>  arch/riscv/include/asm/pgtable.h | 4 ++++
>  arch/riscv/mm/init.c             | 6 ++++++
>  mm/sparse-vmemmap.c              | 8 ++++++++
>  3 files changed, 18 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index a1a7c6520a09..aa0f50e3d534 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -1243,6 +1243,10 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>  #define TASK_SIZE	FIXADDR_START
>  #endif
>  
> +/* Needed on SPARSEMEM_VMEMMAP */
> +#define vmemmap_populate_finalize vmemmap_populate_finalize
> +void __meminit vmemmap_populate_finalize(void);
> +
>  #else /* CONFIG_MMU */
>  
>  #define PAGE_SHARED		__pgprot(0)
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 706f43523935..cf9ae4099f82 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -1360,6 +1360,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  	 */
>  	return vmemmap_populate_hugepages(start, end, node, altmap);
>  }
> +
> +void __meminit vmemmap_populate_finalize(void)
> +{
> +	/* Avoid faults on cached non-present TLB entries. */
> +	mark_new_valid_map();
> +}
>  #endif

How does mark_new_valid_map() handle concurrent access?

There is some comment in there, but I am not sure if that actually implies that
mark_new_valid_map() can be called concurrently from arbitrary context?


-- 
Cheers,

David
Re: [PATCH v3] riscv: mm: Avoid spurious fault after hotplugging vmemmap
Posted by Vivian Wang 4 hours ago
On 6/5/26 17:16, David Hildenbrand (Arm) wrote:
> On 6/5/26 07:23, Vivian Wang wrote:
>> section_activate() does not flush TLB after populating new vmemmap
>> pages. On most architectures, this is okay. However it is a problem on
>> RISC-V since there the TLB caching non-present entries is permitted,
>> which causes spurious faults on some hardwares.
>>
>> This seems to be most easily reproduced with DEBUG_VM=y and
>> PAGE_POISONING=y, which causes these newly mapped struct pages to be
>> poisoned i.e. written to immediately after mapping.
>>
>> Add a hook vmemmap_populate_finalize() in __populate_section_memmap()
>> after population, to allow architectures to handle such situations as
>> needed. Then implement it on RISC-V to arrange for the existing
>> exception handler code to deal with these faults if they happen.
>>
>> Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
>> ---
>> Depends on my earlier kfence fixes for mark_new_valid_map() [1].
>>
>> Found while testing AMD_HSA/ZONE_DEVICE on SpacemiT K3. Using
>> ZONE_DEVICE requires another fix [2].
>>
>> [1]: https://lore.kernel.org/linux-riscv/20260303-handle-kfence-protect-spurious-fault-v2-0-f80d8354d79d@iscas.ac.cn
>> [2]: https://lore.kernel.org/linux-riscv/20260309-riscv-sparsemem-vmemmap-limits-v1-2-f40efe18e3cd@iscas.ac.cn
>> ---
>> Changes in v3:
>> - Merged back into one patch (Mike)
>> - (No code changes otherwise.)
>> - Link to v2: https://patch.msgid.link/20260604-mark-after-vmemmap-populate-v2-0-ab6a7d03b434@iscas.ac.cn
>>
>> Changes in v2:
>> - Split patch in two, hook point and riscv hook 
>> - Explain hook necessity in patch 1 message (Mike)
>> - Make hook #define based (Mike)
>> - Call finalize hook only on populate success
>> - Link to v1: https://patch.msgid.link/20260525-mark-after-vmemmap-populate-v1-1-e698d859ba16@iscas.ac.cn
>> ---
>>  arch/riscv/include/asm/pgtable.h | 4 ++++
>>  arch/riscv/mm/init.c             | 6 ++++++
>>  mm/sparse-vmemmap.c              | 8 ++++++++
>>  3 files changed, 18 insertions(+)
>>
>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>> index a1a7c6520a09..aa0f50e3d534 100644
>> --- a/arch/riscv/include/asm/pgtable.h
>> +++ b/arch/riscv/include/asm/pgtable.h
>> @@ -1243,6 +1243,10 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>>  #define TASK_SIZE	FIXADDR_START
>>  #endif
>>  
>> +/* Needed on SPARSEMEM_VMEMMAP */
>> +#define vmemmap_populate_finalize vmemmap_populate_finalize
>> +void __meminit vmemmap_populate_finalize(void);
>> +
>>  #else /* CONFIG_MMU */
>>  
>>  #define PAGE_SHARED		__pgprot(0)
>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>> index 706f43523935..cf9ae4099f82 100644
>> --- a/arch/riscv/mm/init.c
>> +++ b/arch/riscv/mm/init.c
>> @@ -1360,6 +1360,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>  	 */
>>  	return vmemmap_populate_hugepages(start, end, node, altmap);
>>  }
>> +
>> +void __meminit vmemmap_populate_finalize(void)
>> +{
>> +	/* Avoid faults on cached non-present TLB entries. */
>> +	mark_new_valid_map();
>> +}
>>  #endif
> How does mark_new_valid_map() handle concurrent access?
>
> There is some comment in there, but I am not sure if that actually implies that
> mark_new_valid_map() can be called concurrently from arbitrary context?

AFAICT, yes, it is intended to be used concurrently from arbitrary
context, and this is why I used them.

I had not originally written the code, so some riscv maintainer can
maybe chime in on this, but as I understand it the idea is that we only
ever set bits in the bitmap in mark_new_valid_map() while "allocating",
whereas the assembly code in handle_exception in
arch/riscv/kernel/entry.S only ever clears individual bits. So at least
the access to the bitmap itself should be fine.

This was added in commit 503638e0babf ("riscv: Stop emitting preventive
sfence.vma for new vmalloc mappings").

Do you think the mark_new_valid_map() function needs extra
synchronization like smp_wmb()?

Regards,
Vivian "dramforever" Wang