riscv: mm: Backport of mmap hint address fixes

[PATCH 6.6.y 0/2] riscv: mm: Backport of mmap hint address fixes

Posted by Vivian Wang 4 months ago

Backport of the two riscv mmap patches from master. In effect, these two
patches remove arch_get_mmap_{base,end} for riscv.

Guo Ren: Please take a look. Patch 1 has a slightly non-trivial conflict
with your commit 97b7ac69be2e ("riscv: mm: Fixup compat
arch_get_mmap_end"), which changed STACK_TOP_MAX from TASK_SIZE_64 to
TASK_SIZE when CONFIG_64BIT=y. This shouldn't be a problem, but, well,
just to be safe.

---
Charlie Jenkins (2):
      riscv: mm: Use hint address in mmap if available
      riscv: mm: Do not restrict mmap address based on hint

 arch/riscv/include/asm/processor.h | 33 +++++----------------------------
 1 file changed, 5 insertions(+), 28 deletions(-)
---
base-commit: 60a9e718726fa7019ae00916e4b1c52498da5b60
change-id: 20250917-riscv-mmap-addr-space-6-6-15e7db6b5db6

Best regards,
-- 
Vivian "dramforever" Wang

Re: [PATCH 6.6.y 0/2] riscv: mm: Backport of mmap hint address fixes

Posted by Greg KH 4 months ago

On Wed, Oct 08, 2025 at 03:50:15PM +0800, Vivian Wang wrote:
> Backport of the two riscv mmap patches from master. In effect, these two
> patches remove arch_get_mmap_{base,end} for riscv.

Why is this needed?  What bug does this fix?

thanks,

greg k-h

Re: [PATCH 6.6.y 0/2] riscv: mm: Backport of mmap hint address fixes

Posted by Vivian Wang 4 months ago

On 10/8/25 18:20, Greg KH wrote:
> On Wed, Oct 08, 2025 at 03:50:15PM +0800, Vivian Wang wrote:
>> Backport of the two riscv mmap patches from master. In effect, these two
>> patches remove arch_get_mmap_{base,end} for riscv.
> Why is this needed?  What bug does this fix?

The behavior of the mmap hint address in current 6.6.y is broken when more
than 39 bits of virtual address space are available (i.e. Sv48 or Sv57,
with 48 and 57 bits of VA, respectively). The man-pages mmap(2) page
states, for the hint address [1]:

       If addr is NULL, then the kernel chooses the (page-aligned)
       address at which to create the mapping; this is the most portable
       method of creating a new mapping.  If addr is not NULL, then the
       kernel takes it as a hint about where to place the mapping; on
       Linux, the kernel will pick a nearby page boundary (but always
       above or equal to the value specified by
       /proc/sys/vm/mmap_min_addr) and attempt to create the mapping
       there.  If another mapping already exists there, the kernel picks
       a new address that may or may not depend on the hint.  The address
       of the new mapping is returned as the result of the call.

Therefore, if a userspace program specifies a large hint address of e.g.
1<<50, and both the kernel and the hardware support it, it should be
used even if MAP_FIXED is not specified. This is also the behavior
implemented in x86_64, arm64, and, on a recent enough (> 6.10) kernel,
riscv64.

However, current 6.6.y for riscv64 implements a bizarre behavior, where
the hint address is treated as an upper bound instead. Therefore,
passing 1<<50 would actually return a VA in 48-bit space.

To reproduce, call mmap with arguments like:

       mmap(hint, 4096, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

Comparison:

        hint = 0x4000000000000 i.e. 1 << 50

                    6.6.106             6.6.106 + patch
            sv48    0x7fff90223000      0x7fff93b4e000
            sv57    0x7fffb7d49000      0x4000000000000

When the hint is not used, the exact address is of course random, which
is expected. However, since the address 1<<50 is supported under Sv57,
it should be usable by mmap, but with current 6.6.y behavior it is not
used, and some other address from 48-bit space is used instead.

There's not yet real riscv64 hardware with Sv57, but an analogous
problem arises on Sv48 with an address like 1<<40.

One real userspace program that runs into this is the Go programming
language runtime with TSAN enabled. Excerpt from a test log [2], which
was run on an Eswin EIC7700x, which supports Sv48:

fatal error: too many address space collisions for -race mode
runtime stack:
runtime.throw({0x257eaa?, 0x4000000?})
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/panic.go:1246 +0x38 fp=0x7ffff84af758 sp=0x7ffff84af730 pc=0xc9310
runtime.(*mheap).sysAlloc(0x3e3c20, 0x81cc8?, 0x3f3e28, 0x3f3e50)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/malloc.go:799 +0x56c fp=0x7ffff84af7f8 sp=0x7ffff84af758 pc=0x67944
runtime.(*mheap).grow(0x3e3c20, 0x7fffb69fee00?)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mheap.go:1568 +0x9c fp=0x7ffff84af870 sp=0x7ffff84af7f8 pc=0x824c4
runtime.(*mheap).allocSpan(0x3e3c20, 0x1, 0x0, 0x10)
[...]
FAIL    runtime/race    0.285s

With TSAN enabled, the Go runtime allocates a lot of virtual address
space. As the message suggests, if the return value of mmap is not equal
to a non-zero hint, the runtime assumes that mmap is failing to allocate
the address because some other mapping is already there (in other words,
it assumes the man-pages documented behavior), and unmaps it and tries a
different address, until it tries too many times and gives up. This
means Go with TSAN fails to initialize on Sv48 and current 6.6.y.

(cc Meng Zhuo, in case of any questions about the Go runtime here.)

Patch 1 here addresses the above issue, but introduced regressions (see
replies in "Link"). Patch 2 addresses those regressions.

Thanks,
Vivian "dramforever" Wang

[1]: https://man7.org/linux/man-pages/man2/mmap.2.html
[2]: https://logs.chromium.org/logs/golang/buildbucket/cr-buildbucket/8708301310656989281/+/u/step/22/log/2

Re: [PATCH 6.6.y 0/2] riscv: mm: Backport of mmap hint address fixes

Posted by Greg KH 4 months ago

On Thu, Oct 09, 2025 at 12:19:46PM +0800, Vivian Wang wrote:
> 
> On 10/8/25 18:20, Greg KH wrote:
> > On Wed, Oct 08, 2025 at 03:50:15PM +0800, Vivian Wang wrote:
> >> Backport of the two riscv mmap patches from master. In effect, these two
> >> patches remove arch_get_mmap_{base,end} for riscv.
> > Why is this needed?  What bug does this fix?
> 
> The behavior of the mmap hint address in current 6.6.y is broken when more
> than 39 bits of virtual address space are available (i.e. Sv48 or Sv57,
> with 48 and 57 bits of VA, respectively). The man-pages mmap(2) page
> states, for the hint address [1]:
> 
>        If addr is NULL, then the kernel chooses the (page-aligned)
>        address at which to create the mapping; this is the most portable
>        method of creating a new mapping.  If addr is not NULL, then the
>        kernel takes it as a hint about where to place the mapping; on
>        Linux, the kernel will pick a nearby page boundary (but always
>        above or equal to the value specified by
>        /proc/sys/vm/mmap_min_addr) and attempt to create the mapping
>        there.  If another mapping already exists there, the kernel picks
>        a new address that may or may not depend on the hint.  The address
>        of the new mapping is returned as the result of the call.
> 
> Therefore, if a userspace program specifies a large hint address of e.g.
> 1<<50, and both the kernel and the hardware support it, it should be
> used even if MAP_FIXED is not specified. This is also the behavior
> implemented in x86_64, arm64, and, on a recent enough (> 6.10) kernel,
> riscv64.
> 
> However, current 6.6.y for riscv64 implements a bizarre behavior, where
> the hint address is treated as an upper bound instead. Therefore,
> passing 1<<50 would actually return a VA in 48-bit space.
> 
> To reproduce, call mmap with arguments like:
> 
>        mmap(hint, 4096, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> 
> Comparison:
> 
>         hint = 0x4000000000000 i.e. 1 << 50
> 
>                     6.6.106             6.6.106 + patch
>             sv48    0x7fff90223000      0x7fff93b4e000
>             sv57    0x7fffb7d49000      0x4000000000000
> 
> When the hint is not used, the exact address is of course random, which
> is expected. However, since the address 1<<50 is supported under Sv57,
> it should be usable by mmap, but with current 6.6.y behavior it is not
> used, and some other address from 48-bit space is used instead.
> 
> There's not yet real riscv64 hardware with Sv57, but an analogous
> problem arises on Sv48 with an address like 1<<40.

As this issue has been fixed for many years now, why is it just showing
up now?  Shouldn't you be using 6.12.y for new hardware?

> One real userspace program that runs into this is the Go programming
> language runtime with TSAN enabled. Excerpt from a test log [2], which
> was run on an Eswin EIC7700x, which supports Sv48:
> 
> fatal error: too many address space collisions for -race mode
> runtime stack:
> runtime.throw({0x257eaa?, 0x4000000?})
>     /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/panic.go:1246 +0x38 fp=0x7ffff84af758 sp=0x7ffff84af730 pc=0xc9310
> runtime.(*mheap).sysAlloc(0x3e3c20, 0x81cc8?, 0x3f3e28, 0x3f3e50)
>     /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/malloc.go:799 +0x56c fp=0x7ffff84af7f8 sp=0x7ffff84af758 pc=0x67944
> runtime.(*mheap).grow(0x3e3c20, 0x7fffb69fee00?)
>     /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mheap.go:1568 +0x9c fp=0x7ffff84af870 sp=0x7ffff84af7f8 pc=0x824c4
> runtime.(*mheap).allocSpan(0x3e3c20, 0x1, 0x0, 0x10)
> [...]
> FAIL    runtime/race    0.285s
> 
> With TSAN enabled, the Go runtime allocates a lot of virtual address
> space. As the message suggests, if the return value of mmap is not equal
> to a non-zero hint, the runtime assumes that mmap is failing to allocate
> the address because some other mapping is already there (in other words,
> it assumes the man-pages documented behavior), and unmaps it and tries a
> different address, until it tries too many times and gives up. This
> means Go with TSAN fails to initialize on Sv48 and current 6.6.y.
> 
> (cc Meng Zhuo, in case of any questions about the Go runtime here.)
> 
> Patch 1 here addresses the above issue, but introduced regressions (see
> replies in "Link"). Patch 2 addresses those regressions.

Ok, that makes a bit more sense, but again, why is this just showing up
now?  What changed to cause this to be noticed at and needed to be fixed
at this moment in time and not before?

thanks,

greg k-h

Re: [PATCH 6.6.y 0/2] riscv: mm: Backport of mmap hint address fixes

Posted by Vivian Wang 4 months ago

On 10/9/25 13:00, Greg KH wrote:
> On Thu, Oct 09, 2025 at 12:19:46PM +0800, Vivian Wang wrote:
>> [...]
> Ok, that makes a bit more sense, but again, why is this just showing up
> now?  What changed to cause this to be noticed at and needed to be fixed
> at this moment in time and not before?

As for why this came quite late in the lifetime of the 6.6.y branch, I
believe it's a combination of two factors.

Firstly, actual Sv48-capable RISC-V hardware came fairly late. Milk-V
Megrez (with Eswin EIC7700X), on which the Go TSAN thing ran, was
shipped only early this year. The DC ROMA II laptop (EIC7702X) and
Framework mainboard with the same SoC have not even shipped yet, or maybe
only shipped to developers - I'm not so certain. Most other RISC-V
machines only have Sv39.

Secondly, there is interest among some Chinese software vendors to ship
Linux distros based on a 6.6.y LTS kernel. The "RISC-V Common Kernel"
(RVCK) project [1], with support from openEuler and various HW vendors,
maintains backports on top of a 6.6.y kernel. "RockOS" [2] is a distro
maintained by PLCT Lab, ISCAS, for EIC770{0,2}X-based boards, and it has
a 6.6.y kernel branch. Both have cherry-picked the mmap patches for now.

We operate with the understanding that the official stable kernel will
not be accepting new major features and drivers, but fixes do belong in
stable, and at least from the perspective of PLCT Lab we generally try
to send patches instead of hoarding them. Hence, the earlier backport
request and this backport series.

I hope this explanation is acceptable.

Thanks,
Vivian "dramforever" Wang

PS: This 6.6 kernel thing isn't just a RISC-V thing, by the way. KylinOS
V11 shipped in August with a 6.6 kernel. Deepin and UOS will be
shipping with 6.6, with UOS "25" shipping maybe late this year or early
2026.

[1]: https://github.com/RVCK-Project/rvck
[2]: https://docs.rockos.dev/

Re: [PATCH 6.6.y 0/2] riscv: mm: Backport of mmap hint address fixes

Posted by Greg KH 4 months ago

On Thu, Oct 09, 2025 at 01:50:11PM +0800, Vivian Wang wrote:
> 
> On 10/9/25 13:00, Greg KH wrote:
> > On Thu, Oct 09, 2025 at 12:19:46PM +0800, Vivian Wang wrote:
> >> [...]
> > Ok, that makes a bit more sense, but again, why is this just showing up
> > now?  What changed to cause this to be noticed at and needed to be fixed
> > at this moment in time and not before?
> 
> As for why this came quite late in the lifetime of the 6.6.y branch, I
> believe it's a combination of two factors.
> 
> Firstly, actual Sv48-capable RISC-V hardware came fairly late. Milk-V
> Megrez (with Eswin EIC7700X), on which the Go TSAN thing ran, was
> shipped only early this year. The DC ROMA II laptop (EIC7702X) and
> Framework mainboard with the same SoC have not even shipped yet, or maybe
> only shipped to developers - I'm not so certain. Most other RISC-V
> machines only have Sv39.
> 
> Secondly, there is interest among some Chinese software vendors to ship
> Linux distros based on a 6.6.y LTS kernel. The "RISC-V Common Kernel"
> (RVCK) project [1], with support from openEuler and various HW vendors,
> maintains backports on top of a 6.6.y kernel. "RockOS" [2] is a distro
> maintained by PLCT Lab, ISCAS, for EIC770{0,2}X-based boards, and it has
> a 6.6.y kernel branch. Both have cherry-picked the mmap patches for now.
> 
> We operate with the understanding that the official stable kernel will
> not be accepting new major features and drivers, but fixes do belong in
> stable, and at least from the perspective of PLCT Lab we generally try
> to send patches instead of hoarding them. Hence, the earlier backport
> request and this backport series.
> 
> I hope this explanation is acceptable.

Thanks for the detailed explanation.  I've queued these up now.

But wow, shipping new products on a 2 year old kernel feels very risky
to me, but hey, what do I know?  :)

> PS: This 6.6 kernel thing isn't just a RISC-V thing, by the way. KylinOS
> V11 shipped in August with a 6.6 kernel. Deepin and UOS will be
> shipping with 6.6, with UOS "25" shipping maybe late this year or early
> 2026.

That too is crazy.  They should know better.

Just to give a bit of context for this, for the latest 6.6.y release,
6.6.110, there are currently over 300 documented unfixed CVE items in
that branch.  Feels rough to be doing a new release based on that...

good luck!

greg k-h