[PATCH v2 00/18] riscv: Memory type control for platforms with physical memory aliases

Samuel Holland posted 18 patches 2 months, 1 week ago
There is a newer version of this series
.../bindings/riscv/physical-memory.yaml       |  91 +++++++
arch/riscv/Kconfig                            |  16 ++
arch/riscv/Kconfig.errata                     |  19 --
arch/riscv/Kconfig.socs                       |   4 +
arch/riscv/boot/dts/eswin/eic7700.dtsi        |   5 +
.../boot/dts/starfive/jh7100-common.dtsi      |  24 --
arch/riscv/boot/dts/starfive/jh7100.dtsi      |   4 +
arch/riscv/include/asm/alternative-macros.h   |  45 +++-
arch/riscv/include/asm/errata_list.h          |  45 ----
arch/riscv/include/asm/hwcap.h                |   1 +
arch/riscv/include/asm/pgtable-32.h           |  17 +-
arch/riscv/include/asm/pgtable-64.h           | 228 +++++++++++++-----
arch/riscv/include/asm/pgtable-bits.h         |  43 +++-
arch/riscv/include/asm/pgtable.h              |  67 ++---
arch/riscv/kernel/alternative.c               |   4 +-
arch/riscv/kernel/cpufeature.c                |   6 +
arch/riscv/kernel/hibernate.c                 |  18 +-
arch/riscv/kernel/setup.c                     |   1 +
arch/riscv/kvm/gstage.c                       |   6 +-
arch/riscv/mm/Makefile                        |   1 +
arch/riscv/mm/init.c                          |  68 +++---
arch/riscv/mm/memory-alias.S                  | 123 ++++++++++
arch/riscv/mm/pgtable.c                       | 114 +++++++--
arch/riscv/mm/ptdump.c                        |  16 +-
fs/dax.c                                      |   4 +-
fs/proc/task_mmu.c                            |  27 ++-
fs/userfaultfd.c                              |   6 +-
include/dt-bindings/riscv/physical-memory.h   |  44 ++++
include/linux/huge_mm.h                       |   8 +-
include/linux/mm.h                            |  14 +-
include/linux/pgtable.h                       | 112 ++++-----
kernel/events/core.c                          |   8 +-
mm/debug_vm_pgtable.c                         |   4 +-
mm/filemap.c                                  |   6 +-
mm/gup.c                                      |  37 +--
mm/hmm.c                                      |   2 +-
mm/huge_memory.c                              |  90 +++----
mm/hugetlb.c                                  |  10 +-
mm/hugetlb_vmemmap.c                          |   4 +-
mm/kasan/init.c                               |  39 +--
mm/kasan/shadow.c                             |  12 +-
mm/khugepaged.c                               |  10 +-
mm/ksm.c                                      |   2 +-
mm/madvise.c                                  |   8 +-
mm/mapping_dirty_helpers.c                    |   2 +-
mm/memory-failure.c                           |  14 +-
mm/memory.c                                   |  78 +++---
mm/mempolicy.c                                |   4 +-
mm/migrate.c                                  |   4 +-
mm/migrate_device.c                           |  10 +-
mm/mlock.c                                    |   6 +-
mm/mprotect.c                                 |   4 +-
mm/mremap.c                                   |  30 +--
mm/page_table_check.c                         |   7 +-
mm/page_vma_mapped.c                          |   6 +-
mm/pagewalk.c                                 |  14 +-
mm/percpu.c                                   |   8 +-
mm/pgalloc-track.h                            |   8 +-
mm/pgtable-generic.c                          |  25 +-
mm/ptdump.c                                   |  10 +-
mm/rmap.c                                     |   8 +-
mm/sparse-vmemmap.c                           |  10 +-
mm/userfaultfd.c                              |  10 +-
mm/vmalloc.c                                  |  49 ++--
mm/vmscan.c                                   |  16 +-
65 files changed, 1110 insertions(+), 626 deletions(-)
create mode 100644 Documentation/devicetree/bindings/riscv/physical-memory.yaml
create mode 100644 arch/riscv/mm/memory-alias.S
create mode 100644 include/dt-bindings/riscv/physical-memory.h
[PATCH v2 00/18] riscv: Memory type control for platforms with physical memory aliases
Posted by Samuel Holland 2 months, 1 week ago
On some RISC-V platforms, including StarFive JH7100 and ESWIN EIC7700,
DRAM is mapped to multiple physical address ranges, with each alias
having a different set of statically-determined Physical Memory
Attributes (PMAs), such as cacheability. Software can alter the PMAs for
a page by selecting a PFN from the corresponding physical address range.
On these platforms, this is the only way to allocate noncached memory
for use with noncoherent DMA.
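To make the alias mechanism concrete, here is a minimal sketch of selecting a page's attributes by choosing its PFN. It assumes a hypothetical layout in which 4 GiB of DRAM appears once at a cacheable base and again at a noncached base. The addresses, `enum pma`, `struct dram_alias` and `pfn_with_attr()` are illustrative names, not the JH7100 or EIC7700 memory maps or any kernel API. The page offset is kept and only the base of the range changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical attribute set for one alias of DRAM. */
enum pma { PMA_CACHEABLE, PMA_NONCACHED };

struct dram_alias {
	uint64_t base;  /* physical base address of this alias */
	enum pma attr;  /* fixed by the hardware, not by page tables */
};

/* Illustrative layout only: 4 GiB of DRAM seen through two ranges. */
#define DRAM_SIZE (4ULL << 30)
static const struct dram_alias aliases[] = {
	{ 0x0080000000ULL, PMA_CACHEABLE },  /* primary alias, seen by MM code */
	{ 0x1000000000ULL, PMA_NONCACHED },
};

/*
 * Return the PFN that reaches the same DRAM page as @pfn (a PFN in the
 * primary alias) but with the attributes @attr.
 */
static uint64_t pfn_with_attr(uint64_t pfn, enum pma attr)
{
	assert(pfn << PAGE_SHIFT >= aliases[0].base);

	uint64_t offset = (pfn << PAGE_SHIFT) - aliases[0].base;

	assert(offset < DRAM_SIZE);
	for (size_t i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
		if (aliases[i].attr == attr)
			return (aliases[i].base + offset) >> PAGE_SHIFT;
	return pfn;  /* attribute not available: keep the primary alias */
}
```

For example, the PFN five pages into the cacheable range maps to the PFN five pages into the noncached range. Asking for `PMA_CACHEABLE` returns the input unchanged.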

These physical memory aliases are only visible to architecture code.
Generic MM code only ever sees the primary (cacheable) alias. The major
change from v1 of this series is that I was asked to move the hooks from
pfn_pXX()/pXX_pfn() to set_pXX()/pXXp_get().
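The following sketch shows the hooks in set_pXX()/pXXp_get(), written in plain C. The series implements its riscv_fixup/unfix_memory_alias() helpers in assembly (memory-alias.S), so the function names, the alias base PFNs and the single Svpbmt NC case handled here are simplifying assumptions. The sketch shows a write that moves an NC mapping onto the noncached alias and clears the memory-type bits, and a read that reverses the transformation. Non-present (e.g. swap) entries are left alone, matching the v2 changelog.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t pte; } pte_t;  /* same wrapper shape as the kernel's */

#define _PAGE_PRESENT    (1ULL << 0)
#define _PAGE_PFN_SHIFT  10
#define _PAGE_PFN_MASK   (((1ULL << 44) - 1) << _PAGE_PFN_SHIFT)  /* bits 53:10 */
#define _PAGE_MT_MASK    (3ULL << 61)  /* Svpbmt PBMT field */
#define _PAGE_MT_NC      (1ULL << 61)  /* Svpbmt NC encoding */

/* Hypothetical alias layout, in PFNs; not taken from any real SoC. */
#define CACHED_BASE_PFN     0x80000ULL
#define NONCACHED_BASE_PFN  0x1000000ULL
#define DRAM_PAGES          0x100000ULL

static uint64_t pfn_of(uint64_t v)
{
	return (v & _PAGE_PFN_MASK) >> _PAGE_PFN_SHIFT;
}

static uint64_t with_pfn(uint64_t v, uint64_t pfn)
{
	return (v & ~_PAGE_PFN_MASK) | (pfn << _PAGE_PFN_SHIFT);
}

static int in_range(uint64_t pfn, uint64_t base)
{
	return pfn >= base && pfn - base < DRAM_PAGES;
}

/* Write side: Svpbmt NC + primary PFN -> alias PFN with MT bits cleared. */
static uint64_t fixup_memory_alias(uint64_t v)
{
	uint64_t pfn = pfn_of(v);

	/* Non-present (e.g. swap) entries and normal memory pass through. */
	if (!(v & _PAGE_PRESENT) || (v & _PAGE_MT_MASK) != _PAGE_MT_NC ||
	    !in_range(pfn, CACHED_BASE_PFN))
		return v;
	return with_pfn(v & ~_PAGE_MT_MASK,
			pfn - CACHED_BASE_PFN + NONCACHED_BASE_PFN);
}

/* Read side: the inverse, so generic MM code only sees the primary alias. */
static uint64_t unfix_memory_alias(uint64_t v)
{
	uint64_t pfn = pfn_of(v);

	if (!(v & _PAGE_PRESENT) || !in_range(pfn, NONCACHED_BASE_PFN))
		return v;
	return with_pfn(v, pfn - NONCACHED_BASE_PFN + CACHED_BASE_PFN) |
	       _PAGE_MT_NC;
}

static void set_pte(pte_t *ptep, pte_t pte)
{
	*(volatile uint64_t *)&ptep->pte = fixup_memory_alias(pte.pte);
}

static pte_t ptep_get(pte_t *ptep)
{
	return (pte_t){ unfix_memory_alias(*(volatile uint64_t *)&ptep->pte) };
}
```

In this sketch, a plain `*ptep` on the slot returns the alias PFN without the NC bits. This is why every read has to go through ptep_get() and be paired with a set_pte() write.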

 - Patches 1-7 ensure that architecture-specific code that hooks page
   table reads and writes is always called, and the calls are balanced.
 - Patches 8-11 refactor existing platform-specific memory type support
   to be modeled as variants on top of the standard Svpbmt extension,
   and apply the memory type transformation during PTE reads/writes.
 - Patches 12-16 add a new DT binding to describe physical memory
   regions, and implement a new memory type variant that transforms the
   PFN to use the desired alias when reading/writing page tables.
 - Patches 17-18 enable this new memory type variant on StarFive JH7100
   and ESWIN EIC7700.
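As an illustration of patches 8-11, here is a sketch of how a vendor memory-type scheme can be modeled as a variant on top of Svpbmt. Generic code builds PTEs with the standard Svpbmt encodings, and the write/read hooks translate to and from XTheadMae (the D1 case above). The XTheadMae bit values are assumed to match the kernel's `_PAGE_*_THEAD` definitions and should be checked against the tree. The helper names are hypothetical. The sketch also assumes that bits 63 and 60:59 are clear in the generic PTE.

```c
#include <assert.h>
#include <stdint.h>

/* Standard Svpbmt PBMT field, PTE bits 62:61. */
#define MT_MASK_SVPBMT  (3ULL << 61)
#define MT_NC_SVPBMT    (1ULL << 61)
#define MT_IO_SVPBMT    (2ULL << 61)

/* XTheadMae encodings (assumed to match the kernel's _PAGE_*_THEAD values). */
#define MT_PMA_THEAD    ((1ULL << 62) | (1ULL << 61) | (1ULL << 60))
#define MT_NC_THEAD     ((1ULL << 61) | (1ULL << 60))
#define MT_IO_THEAD     ((1ULL << 63) | (1ULL << 60))
#define MT_MASK_THEAD   (MT_PMA_THEAD | MT_IO_THEAD | (1ULL << 59))

/* Write side: Svpbmt encoding from generic code -> vendor encoding. */
static uint64_t svpbmt_to_thead(uint64_t pte)
{
	uint64_t mt = pte & MT_MASK_SVPBMT;
	uint64_t out = pte & ~MT_MASK_SVPBMT;

	if (mt == MT_NC_SVPBMT)
		return out | MT_NC_THEAD;
	if (mt == MT_IO_SVPBMT)
		return out | MT_IO_THEAD;
	return out | MT_PMA_THEAD;  /* anything else: normal (PMA) memory */
}

/* Read side: translate back so generic code never sees vendor bits. */
static uint64_t thead_to_svpbmt(uint64_t pte)
{
	uint64_t mt = pte & MT_MASK_THEAD;
	uint64_t out = pte & ~MT_MASK_THEAD;

	if (mt == MT_NC_THEAD)
		return out | MT_NC_SVPBMT;
	if (mt == MT_IO_THEAD)
		return out | MT_IO_SVPBMT;
	return out;
}
```

Because the two functions are exact inverses on well-formed PTEs, generic code can keep comparing and modifying PTEs using only the Svpbmt encoding.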

I have boot-tested this series and tested DMA on SoCs with each of the
four ways to select a memory type: SiFive FU740 (none), SiFive
P470-based SoC (Svpbmt), Allwinner D1 (XTheadMae), and ESWIN EIC7700
(aliases).

Here are some basic `perf bench` results comparing relative performance
between v6.17 and either the preparatory changes (patches 1-10) or the
whole series:

 Test            | Scenario     |  FU740 |   P470 |     D1 | EIC7700
 ===================================================================
 syscall basic   | patches 1-10 | +3.17% | +0.89% | +2.60% |  +0.68%
                 | series       | -2.52% | -1.41% | +1.37% |  -0.64%
 -------------------------------------------------------------------
 syscall fork    | patches 1-10 | +0.17% | -0.57% | +2.79% |  -1.12%
                 | series       | -1.31% | -5.91% | -1.50% |  -2.73%
 -------------------------------------------------------------------
 syscall execve  | patches 1-10 | -0.24% | -0.30% | +2.76% |  +1.32%
                 | series       | -1.65% | -4.82% | -1.38% |  -0.66%
 -------------------------------------------------------------------
 sched messaging | patches 1-10 | +1.54% | -5.76% | -5.09% |  -1.04%
                 | series       | +0.66% | +2.00% | +1.40% |  +1.97%
 -------------------------------------------------------------------

The benchmark results are stable on each machine, and the same binary
was used on all machines. I would have expected the preparatory changes
(patches 1-10) to hurt performance somewhat, because
READ_ONCE()/WRITE_ONCE() generate additional loads and stores, but
surprisingly, performance improved in some cases. The variation across
machines in response to the entire series is expected, as each of these
machines runs a different variant of the alternative block.

Changes in v2:
 - Keep Kconfig options for each PBMT variant separate/non-overlapping
 - Move fixup code sequences to set_pXX() and pXXp_get()
 - Only define ALT_UNFIX_MT in configurations that need it
 - Improve inline documentation of ALT_FIXUP_MT/ALT_UNFIX_MT
 - Fix erroneously-escaped newline in assembly ALTERNATIVE_CFG_3 macro
 - Remove references to Physical Address Width (no longer part of Smmpt)
 - Remove special first entry from the list of physical memory regions
 - Fix compatible string in example
 - Put new code behind a new Kconfig option RISCV_ISA_XLINUXMEMALIAS
 - Document the calling convention of riscv_fixup/unfix_memory_alias()
 - Do not transform !pte_present() (e.g. swap) PTEs
 - Export riscv_fixup/unfix_memory_alias() to fix module compilation
 - Move the JH7100 DT changes from jh7100-common.dtsi to jh7100.dtsi
 - Keep RISCV_DMA_NONCOHERENT and RISCV_NONSTANDARD_CACHE_OPS selected

Anshuman Khandual (1):
  mm/ptdump: Replace READ_ONCE() with standard page table accessors

Samuel Holland (17):
  perf/core: Replace READ_ONCE() with standard page table accessors
  mm: Move the fallback definitions of pXXp_get()
  mm: Always use page table accessor functions
  mm: Allow page table accessors to be non-idempotent
  riscv: hibernate: Replace open-coded pXXp_get()
  riscv: mm: Always use page table accessor functions
  riscv: mm: Simplify set_p4d() and set_pgd()
  riscv: mm: Deduplicate _PAGE_CHG_MASK definition
  riscv: ptdump: Only show N and MT bits when enabled in the kernel
  riscv: mm: Fix up memory types when writing page tables
  riscv: mm: Expose all page table bits to assembly code
  riscv: alternative: Add an ALTERNATIVE_3 macro
  riscv: alternative: Allow calls with alternate link registers
  dt-bindings: riscv: Describe physical memory regions
  riscv: mm: Use physical memory aliases to apply PMAs
  riscv: dts: starfive: jh7100: Use physical memory ranges for DMA
  riscv: dts: eswin: eic7700: Use physical memory ranges for DMA

-- 
2.47.2
Re: [PATCH v2 00/18] riscv: Memory type control for platforms with physical memory aliases
Posted by Andrew Morton 2 months, 1 week ago
On Wed,  8 Oct 2025 18:57:36 -0700 Samuel Holland <samuel.holland@sifive.com> wrote:

> On some RISC-V platforms, including StarFive JH7100 and ESWIN EIC7700,
> DRAM is mapped to multiple physical address ranges, with each alias
> having a different set of statically-determined Physical Memory
> Attributes (PMAs), such as cacheability. Software can alter the PMAs for
> a page by selecting a PFN from the corresponding physical address range.
> On these platforms, this is the only way to allocate noncached memory
> for use with noncoherent DMA.

Well that's weird.

> --- a/mm/ptdump.c
> +++ b/mm/ptdump.c
> @@ -31,7 +31,7 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
>  			    unsigned long next, struct mm_walk *walk)
>  {
>  	struct ptdump_state *st = walk->private;
> -	pgd_t val = READ_ONCE(*pgd);
> +	pgd_t val = pgdp_get(pgd);
>  
>  #if CONFIG_PGTABLE_LEVELS > 4 && \
>  		(defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS))

OK, but how are we to maintain this?  Will someone be running
grep/coccinelle/whatever on each kernel release?

Please give some thought to finding a way to break the build if someone
uses a plain dereference or a READ_ONCE().  Or add a checkpatch rule. 
Or something.  Let's not rely upon the whole world knowing about this.
Re: [PATCH v2 00/18] riscv: Memory type control for platforms with physical memory aliases
Posted by Samuel Holland 2 months, 1 week ago
On 2025-10-09 8:15 PM, Andrew Morton wrote:
> On Wed,  8 Oct 2025 18:57:36 -0700 Samuel Holland <samuel.holland@sifive.com> wrote:
> 
>> On some RISC-V platforms, including StarFive JH7100 and ESWIN EIC7700,
>> DRAM is mapped to multiple physical address ranges, with each alias
>> having a different set of statically-determined Physical Memory
>> Attributes (PMAs), such as cacheability. Software can alter the PMAs for
>> a page by selecting a PFN from the corresponding physical address range.
>> On these platforms, this is the only way to allocate noncached memory
>> for use with noncoherent DMA.
> 
> Well that's weird.
> 
>> --- a/mm/ptdump.c
>> +++ b/mm/ptdump.c
>> @@ -31,7 +31,7 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
>>  			    unsigned long next, struct mm_walk *walk)
>>  {
>>  	struct ptdump_state *st = walk->private;
>> -	pgd_t val = READ_ONCE(*pgd);
>> +	pgd_t val = pgdp_get(pgd);
>>  
>>  #if CONFIG_PGTABLE_LEVELS > 4 && \
>>  		(defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS))
> 
> OK, but how are we to maintain this?  Will someone be running
> grep/coccinelle/whatever on each kernel release?
> 
> Please give some thought to finding a way to break the build if someone
> uses a plain dereference or a READ_ONCE().  Or add a checkpatch rule. 
> Or something.  Let's not rely upon the whole world knowing about this.

My initial plan was to add a script to scripts/coccinelle so that
`make coccicheck` would catch any new instances. This would require some
way to avoid false positives in the few places where these pointers are
safe to dereference (like the ptentp and pmdvalp mentioned in the commit
message), such as a separate typedef or a naming convention.

I had also explored using sparse to annotate pte_t and friends as noderef. This
would require changes to the sparse tool to allow noderef to work with a
non-pointer type (and be inherited by any pointers to that type), or else each
pointer parameter/variable would need to be annotated in the source code (as is
done with __user). Neither seems ideal.
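For comparison, here is a sketch of the per-pointer annotation approach. `__pgtable` is a hypothetical name. `noderef` and `force` are the sparse attributes already used for `__user`/`__iomem`, but whether sparse accepts noderef here without an accompanying address_space is an assumption. The point is that the annotation has to appear on every pointer in every caller, and the accessor has to cast it away.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical annotation: under sparse (__CHECKER__) a noderef pointer
 * may not be dereferenced directly; for the compiler it expands to nothing.
 */
#ifdef __CHECKER__
# define __pgtable __attribute__((noderef))
# define __force   __attribute__((force))
#else
# define __pgtable
# define __force
#endif

typedef struct { uint64_t pte; } pte_t;

/* Only the accessor casts the annotation away to perform the load. */
static pte_t ptep_get(pte_t __pgtable *ptep)
{
	return *(volatile pte_t __force *)ptep;
}

/* Every caller's pointer needs the annotation too, as with __user. */
static int pte_is_present(pte_t __pgtable *ptep)
{
	/* "return ptep->pte & 1;" would be flagged by sparse. */
	return ptep_get(ptep).pte & 1;
}
```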

I hadn't considered a checkpatch rule. That's probably the most straightforward
solution: warn on any instance of "\*(vmf(\.|->))?(pte|p[mu4g]d)p?", along
with a coccinelle script that could be run occasionally.
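A checkpatch rule would be written in Perl. Purely to try out the proposed pattern, here is a sketch in C using POSIX `<regex.h>`. `flags_line()` is a hypothetical helper, and the sample lines are made up. The bare pattern matches declarations such as `pte_t *ptep` as well as real dereferences, so an actual rule would likely need extra context to keep false positives down.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* The proposed pattern, as a POSIX extended regular expression. */
static const char *const deref_pattern = "\\*(vmf(\\.|->))?(pte|p[mu4g]d)p?";

/* Return true if @line would be flagged by a rule using the pattern. */
static bool flags_line(const char *line)
{
	regex_t re;
	bool hit;

	if (regcomp(&re, deref_pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	hit = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```

With this pattern, `READ_ONCE(*pgd)` and `*vmf->pte` are flagged, while `pgdp_get(pgd)` is not. The declaration `pte_t *ptep = ...` is also flagged, which is the false-positive case described above.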

Regards,
Samuel