RISC-V uses xRET instructions on return from interrupt and to go back
to user-space; the xRET instruction is not core serializing.
Use FENCE.I for providing core serialization as follows:
- by calling sync_core_before_usermode() on return from interrupt (cf.
ipi_sync_core()),
- via switch_mm() and sync_core_before_usermode() (respectively, for
uthread->uthread and kthread->uthread transitions) to go back to
user-space.
On RISC-V, the serialization in switch_mm() is activated by resetting
the icache_stale_mask of the mm at prepare_sync_core_cmd().
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Suggested-by: Palmer Dabbelt <palmer@dabbelt.com>
---
.../membarrier-sync-core/arch-support.txt | 2 +-
arch/riscv/Kconfig | 3 ++
arch/riscv/include/asm/sync_core.h | 29 +++++++++++++++++++
3 files changed, 33 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/include/asm/sync_core.h
diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
index d96b778b87ed8..f6f0bd871a578 100644
--- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
+++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
@@ -43,7 +43,7 @@
| openrisc: | TODO |
| parisc: | TODO |
| powerpc: | ok |
- | riscv: | TODO |
+ | riscv: | ok |
| s390: | ok |
| sh: | TODO |
| sparc: | TODO |
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 95a2a06acc6a6..3e2734a3f2957 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -27,14 +27,17 @@ config RISCV
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
+ select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MMIOWB
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PMEM_API
+ select ARCH_HAS_PREPARE_SYNC_CORE_CMD
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SET_DIRECT_MAP if MMU
select ARCH_HAS_SET_MEMORY if MMU
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
+ select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
select ARCH_HAS_SYSCALL_WRAPPER
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/riscv/include/asm/sync_core.h b/arch/riscv/include/asm/sync_core.h
new file mode 100644
index 0000000000000..9153016da8f14
--- /dev/null
+++ b/arch/riscv/include/asm/sync_core.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_SYNC_CORE_H
+#define _ASM_RISCV_SYNC_CORE_H
+
+/*
+ * RISC-V implements return to user-space through an xRET instruction,
+ * which is not core serializing.
+ */
+static inline void sync_core_before_usermode(void)
+{
+ asm volatile ("fence.i" ::: "memory");
+}
+
+#ifdef CONFIG_SMP
+/*
+ * Ensure the next switch_mm() on every CPU issues a core serializing
+ * instruction for the given @mm.
+ */
+static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+{
+ cpumask_setall(&mm->context.icache_stale_mask);
+}
+#else
+static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+{
+}
+#endif /* CONFIG_SMP */
+
+#endif /* _ASM_RISCV_SYNC_CORE_H */
--
2.34.1
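For context, the icache_stale_mask set by prepare_sync_core_cmd() above is
consumed on the switch_mm() path by flush_icache_deferred() in
arch/riscv/mm/context.c. A simplified sketch of that pre-existing consumer
follows (exact signature and details may vary across kernel versions):

static inline void flush_icache_deferred(struct mm_struct *mm,
					 unsigned int cpu)
{
#ifdef CONFIG_SMP
	cpumask_t *mask = &mm->context.icache_stale_mask;

	if (cpumask_test_cpu(cpu, mask)) {
		cpumask_clear_cpu(cpu, mask);
		/*
		 * Ensure the remote hart's writes are visible to this
		 * hart. This pairs with a barrier in flush_icache_mm().
		 */
		smp_mb();
		local_flush_icache_all();
	}
#endif
}

With all bits set by prepare_sync_core_cmd(), every CPU executes FENCE.I
(via local_flush_icache_all()) on its next switch to this mm, which is what
provides the deferred core serialization.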
On 2023-11-27 05:32, Andrea Parri wrote:
> RISC-V uses xRET instructions on return from interrupt and to go back
> to user-space; the xRET instruction is not core serializing.
>
> Use FENCE.I for providing core serialization as follows:
>
> - by calling sync_core_before_usermode() on return from interrupt (cf.
> ipi_sync_core()),
>
> - via switch_mm() and sync_core_before_usermode() (respectively, for
> uthread->uthread and kthread->uthread transitions) to go back to
> user-space.
>
> On RISC-V, the serialization in switch_mm() is activated by resetting
> the icache_stale_mask of the mm at prepare_sync_core_cmd().
>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> Suggested-by: Palmer Dabbelt <palmer@dabbelt.com>
> ---
[...]
> +
> +#ifdef CONFIG_SMP
> +/*
> + * Ensure the next switch_mm() on every CPU issues a core serializing
> + * instruction for the given @mm.
> + */
> +static inline void prepare_sync_core_cmd(struct mm_struct *mm)
> +{
> + cpumask_setall(&mm->context.icache_stale_mask);
I am concerned about the possibility that this change lacks two barriers in the
following scenario:
On a transition from uthread -> uthread on [CPU 0], from a thread belonging to
another mm to a thread belonging to the mm [!mm -> mm] for which a concurrent
membarrier sync-core is done on [CPU 1]:
- [CPU 1] sets all bits in the mm icache_stale_mask [A]. There are no barriers
associated with these stores.
- [CPU 0] store to rq->curr [B] (by the scheduler) vs [CPU 1] loads rq->curr [C]
within membarrier to decide if the IPI should be skipped. Let's say CPU 1 observes
cpu_rq(0)->curr->mm != mm, so it skips the IPI.
- This means membarrier relies on switch_mm() to issue the sync-core.
- [CPU 0] switch_mm() loads [D] the icache_stale_mask. If the bit is zero, switch_mm()
may incorrectly skip the sync-core.
AFAIU, [C] can be reordered before [A] because there is no barrier between those
operations within membarrier. I suspect it can cause the switch_mm() code to skip
a needed sync-core.
AFAIU, [D] can be reordered before [B] because there is no documented barrier
between those operations within the scheduler, which can also cause switch_mm()
to skip a needed sync-core.
We possibly have a similar scenario for uthread->uthread when the scheduler
switches between mm -> !mm.
One way to fix this would be to add the following barriers:
- A smp_mb() between [A] and [C], possibly just after cpumask_setall() in
prepare_sync_core_cmd(), with comments detailing the ordering it guarantees,
- A smp_mb() between [B] and [D], possibly just before cpumask_test_cpu() in
flush_icache_deferred(), with appropriate comments.
Am I missing something ?
Thanks,
Mathieu
> +}
> +#else
> +static inline void prepare_sync_core_cmd(struct mm_struct *mm)
> +{
> +}
> +#endif /* CONFIG_SMP */
> +
> +#endif /* _ASM_RISCV_SYNC_CORE_H */
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
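The interleaving at issue, condensed into a two-CPU sketch (illustrative
only; the [A]-[D] labels match the analysis above):

	CPU 0 (scheduler, !mm -> mm)       CPU 1 (membarrier SYNC_CORE on mm)
	----------------------------       ----------------------------------
	                                   [A] cpumask_setall(
	                                         &mm->context.icache_stale_mask)
	[B] rq->curr = next                [C] load cpu_rq(0)->curr; ->mm != mm,
	[D] load icache_stale_mask             so skip the IPI
	    in switch_mm(); if the bit
	    reads clear, skip FENCE.I

Correctness requires [A] to be ordered before [C] and [B] before [D];
without barriers providing those orderings, both the IPI and the deferred
FENCE.I can be skipped.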
> I am concerned about the possibility that this change lacks two barriers in the
> following scenario:
>
> On a transition from uthread -> uthread on [CPU 0], from a thread belonging to
> another mm to a thread belonging to the mm [!mm -> mm] for which a concurrent
> membarrier sync-core is done on [CPU 1]:
>
> - [CPU 1] sets all bits in the mm icache_stale_mask [A]. There are no barriers
>   associated with these stores.
>
> - [CPU 0] store to rq->curr [B] (by the scheduler) vs [CPU 1] loads rq->curr [C]
>   within membarrier to decide if the IPI should be skipped. Let's say CPU 1 observes
>   cpu_rq(0)->curr->mm != mm, so it skips the IPI.
>
> - This means membarrier relies on switch_mm() to issue the sync-core.
>
> - [CPU 0] switch_mm() loads [D] the icache_stale_mask. If the bit is zero, switch_mm()
>   may incorrectly skip the sync-core.
>
> AFAIU, [C] can be reordered before [A] because there is no barrier between those
> operations within membarrier. I suspect it can cause the switch_mm() code to skip
> a needed sync-core.
>
> AFAIU, [D] can be reordered before [B] because there is no documented barrier
> between those operations within the scheduler, which can also cause switch_mm()
> to skip a needed sync-core.
>
> We possibly have a similar scenario for uthread->uthread when the scheduler
> switches between mm -> !mm.
>
> One way to fix this would be to add the following barriers:
>
> - A smp_mb() between [A] and [C], possibly just after cpumask_setall() in
>   prepare_sync_core_cmd(), with comments detailing the ordering it guarantees,
> - A smp_mb() between [B] and [D], possibly just before cpumask_test_cpu() in
>   flush_icache_deferred(), with appropriate comments.
>
> Am I missing something ?

Thanks for the detailed analysis.

AFAIU, the following barrier (in membarrier_private_expedited())

	/*
	 * Matches memory barriers around rq->curr modification in
	 * scheduler.
	 */
	smp_mb();	/* system call entry is not a mb. */

can serve the purpose of ordering [A] before [C] (to be documented in v2).

But I agree that [B] and [D] are unordered/missing suitable synchronization.
Worse, RISC-V has currently no full barrier after [B] and before returning to
user-space: I'm thinking (inspired by the PowerPC implementation),

diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 217fd4de61342..f63222513076d 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	if (unlikely(prev == next))
 		return;

+#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
+	/*
+	 * The membarrier system call requires a full memory barrier
+	 * after storing to rq->curr, before going back to user-space.
+	 *
+	 * Only need the full barrier when switching between processes:
+	 * barrier when switching from kernel to userspace is not
+	 * required here, given that it is implied by mmdrop(); barrier
+	 * when switching from userspace to kernel is not needed after
+	 * store to rq->curr.
+	 */
+	if (unlikely(atomic_read(&next->membarrier_state) &
+		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
+		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
+		smp_mb();
+#endif
+
 	/*
 	 * Mark the current MM context as inactive, and the next as
 	 * active.  This is at least used by the icache flushing

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a708d225c28e8..a1c749fddd095 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6670,8 +6670,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	 *
 	 * Here are the schemes providing that barrier on the
 	 * various architectures:
-	 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
-	 *   switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
+	 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
+	 *   RISC-V. switch_mm() relies on membarrier_arch_switch_mm()
+	 *   on PowerPC.
 	 * - finish_lock_switch() for weakly-ordered
 	 *   architectures where spin_unlock is a full barrier,
 	 * - switch_to() for arm64 (weakly-ordered, spin_unlock

The silver lining is that similar changes (probably as a separate/preliminary
patch) also restore the desired order between [B] and [D] AFAIU; so with them,
2/2 would just need additions to document the above SYNC_CORE scenario.

Thoughts?

  Andrea
On 2023-11-28 10:13, Andrea Parri wrote:
>> I am concerned about the possibility that this change lacks two barriers in the
>> following scenario:
>>
>> On a transition from uthread -> uthread on [CPU 0], from a thread belonging to
>> another mm to a thread belonging to the mm [!mm -> mm] for which a concurrent
>> membarrier sync-core is done on [CPU 1]:
>>
>> - [CPU 1] sets all bits in the mm icache_stale_mask [A]. There are no barriers
>>   associated with these stores.
>>
>> - [CPU 0] store to rq->curr [B] (by the scheduler) vs [CPU 1] loads rq->curr [C]
>>   within membarrier to decide if the IPI should be skipped. Let's say CPU 1 observes
>>   cpu_rq(0)->curr->mm != mm, so it skips the IPI.
>>
>> - This means membarrier relies on switch_mm() to issue the sync-core.
>>
>> - [CPU 0] switch_mm() loads [D] the icache_stale_mask. If the bit is zero, switch_mm()
>>   may incorrectly skip the sync-core.
>>
>> AFAIU, [C] can be reordered before [A] because there is no barrier between those
>> operations within membarrier. I suspect it can cause the switch_mm() code to skip
>> a needed sync-core.
>>
>> AFAIU, [D] can be reordered before [B] because there is no documented barrier
>> between those operations within the scheduler, which can also cause switch_mm()
>> to skip a needed sync-core.
>>
>> We possibly have a similar scenario for uthread->uthread when the scheduler
>> switches between mm -> !mm.
>>
>> One way to fix this would be to add the following barriers:
>>
>> - A smp_mb() between [A] and [C], possibly just after cpumask_setall() in
>>   prepare_sync_core_cmd(), with comments detailing the ordering it guarantees,
>> - A smp_mb() between [B] and [D], possibly just before cpumask_test_cpu() in
>>   flush_icache_deferred(), with appropriate comments.
>>
>> Am I missing something ?
>
> Thanks for the detailed analysis.
>
> AFAIU, the following barrier (in membarrier_private_expedited())
>
> 	/*
> 	 * Matches memory barriers around rq->curr modification in
> 	 * scheduler.
> 	 */
> 	smp_mb();	/* system call entry is not a mb. */
>
> can serve the purpose of ordering [A] before [C] (to be documented in v2).

Agreed. Yes it should be documented.

>
> But I agree that [B] and [D] are unordered/missing suitable synchronization.
> Worse, RISC-V has currently no full barrier after [B] and before returning to
> user-space: I'm thinking (inspired by the PowerPC implementation),

If RISC-V currently supports the membarrier private cmd and lacks the
appropriate smp_mb() in switch_mm(), then it's a bug. This initial patch
should be a "Fix" and fast-tracked as such.

Indeed, looking at how ASID is used to implement switch_mm, it appears to
not require a full smp_mb() at all as long as there are no ASID rollovers.

>
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index 217fd4de61342..f63222513076d 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>  	if (unlikely(prev == next))
>  		return;
>
> +#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
> +	/*
> +	 * The membarrier system call requires a full memory barrier
> +	 * after storing to rq->curr, before going back to user-space.
> +	 *
> +	 * Only need the full barrier when switching between processes:
> +	 * barrier when switching from kernel to userspace is not
> +	 * required here, given that it is implied by mmdrop(); barrier
> +	 * when switching from userspace to kernel is not needed after
> +	 * store to rq->curr.
> +	 */
> +	if (unlikely(atomic_read(&next->membarrier_state) &
> +		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
> +		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
> +		smp_mb();
> +#endif

The approach looks good. Please implement it within a separate
membarrier_arch_switch_mm() as done on powerpc.

> +
>  	/*
>  	 * Mark the current MM context as inactive, and the next as
>  	 * active.  This is at least used by the icache flushing
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a708d225c28e8..a1c749fddd095 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6670,8 +6670,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
>  	 *
>  	 * Here are the schemes providing that barrier on the
>  	 * various architectures:
> -	 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
> -	 *   switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
> +	 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
> +	 *   RISC-V. switch_mm() relies on membarrier_arch_switch_mm()
> +	 *   on PowerPC.
>  	 * - finish_lock_switch() for weakly-ordered
>  	 *   architectures where spin_unlock is a full barrier,
>  	 * - switch_to() for arm64 (weakly-ordered, spin_unlock
>
> The silver lining is that similar changes (probably as a separate/preliminary
> patch) also restore the desired order between [B] and [D] AFAIU; so with them,
> 2/2 would just need additions to document the above SYNC_CORE scenario.

Exactly.

> Thoughts?

I think we should be OK with the changes you suggest.

Thanks!

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
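For reference, a sketch of how the hunk above might be factored into such a
helper, modeled on powerpc's membarrier_arch_switch_mm() (an illustration of
the suggestion under that assumption, not the actual v2 code):

/* Hypothetical arch/riscv/include/asm/membarrier.h, following powerpc. */
static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
					     struct mm_struct *next,
					     struct task_struct *tsk)
{
	/*
	 * Only need the full barrier when switching between processes:
	 * the kernel-to-user case is implied by mmdrop(), and no barrier
	 * is needed after the store to rq->curr when entering the kernel.
	 */
	if (likely(!(atomic_read(&next->membarrier_state) &
		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
		return;

	/*
	 * The membarrier system call requires a full memory barrier
	 * after storing to rq->curr, before going back to user-space.
	 */
	smp_mb();
}

switch_mm() would then invoke this helper in place of the open-coded hunk,
keeping the barrier logic in one arch header as powerpc does.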
> > diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> > index 217fd4de61342..f63222513076d 100644
> > --- a/arch/riscv/mm/context.c
> > +++ b/arch/riscv/mm/context.c
> > @@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> > if (unlikely(prev == next))
> > return;
> > +#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
> > + /*
> > + * The membarrier system call requires a full memory barrier
> > + * after storing to rq->curr, before going back to user-space.
> > + *
> > + * Only need the full barrier when switching between processes:
> > + * barrier when switching from kernel to userspace is not
> > + * required here, given that it is implied by mmdrop(); barrier
> > + * when switching from userspace to kernel is not needed after
> > + * store to rq->curr.
> > + */
> > + if (unlikely(atomic_read(&next->membarrier_state) &
> > + (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
> > + MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
> > + smp_mb();
> > +#endif
>
> The approach looks good. Please implement it within a separate
> membarrier_arch_switch_mm() as done on powerpc.
Will do. Thanks.
As regards the Fixes: tag, I guess it boils down to what we want or we
need to take for commit "riscv: Support membarrier private cmd". :-)
FWIW, a quick git-log search confirmed that MEMBARRIER has been around
for quite some time in the RISC-V world (though I'm not familiar with
any of its mainstream uses): commit 1464d00b27b2 says (at least) since
93917ad50972 ("RISC-V: Add support for restartable sequence"). I am
currently inclined to pick the latter commit (and check it w/ Palmer),
but other suggestions are welcome.
Andrea
On 2023-11-29 13:29, Andrea Parri wrote:
>>> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
>>> index 217fd4de61342..f63222513076d 100644
>>> --- a/arch/riscv/mm/context.c
>>> +++ b/arch/riscv/mm/context.c
>>> @@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>>> if (unlikely(prev == next))
>>> return;
>>> +#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
>>> + /*
>>> + * The membarrier system call requires a full memory barrier
>>> + * after storing to rq->curr, before going back to user-space.
>>> + *
>>> + * Only need the full barrier when switching between processes:
>>> + * barrier when switching from kernel to userspace is not
>>> + * required here, given that it is implied by mmdrop(); barrier
>>> + * when switching from userspace to kernel is not needed after
>>> + * store to rq->curr.
>>> + */
>>> + if (unlikely(atomic_read(&next->membarrier_state) &
>>> + (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
>>> + MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
>>> + smp_mb();
>>> +#endif
>>
>> The approach looks good. Please implement it within a separate
>> membarrier_arch_switch_mm() as done on powerpc.
>
> Will do. Thanks.
>
> As regards the Fixes: tag, I guess it boils down to what we want or we
> need to take for commit "riscv: Support membarrier private cmd". :-)
I'm not seeing this commit in the Linux master branch, am I missing
something ?
> FWIW, a quick git-log search confirmed that MEMBARRIER has been around
> for quite some time in the RISC-V world (though I'm not familiar with
> any of its mainstream uses): commit 1464d00b27b2 says (at least) since
> 93917ad50972 ("RISC-V: Add support for restartable sequence"). I am
> currently inclined to pick the latter commit (and check it w/ Palmer),
> but other suggestions are welcome.
Supporting membarrier private expedited is not optional since Linux 4.14:
see kernel/sched/core.c:
membarrier_switch_mm(rq, prev->active_mm, next->mm);
/*
* sys_membarrier() requires an smp_mb() between setting
* rq->curr / membarrier_switch_mm() and returning to userspace.
*
* The below provides this either through switch_mm(), or in
* case 'prev->active_mm == next->mm' through
* finish_task_switch()'s mmdrop().
*/
switch_mm_irqs_off(prev->active_mm, next->mm, next);
Failure to provide the required barrier is a bug in the architecture's
switch_mm implementation when CONFIG_MEMBARRIER=y.
We should probably introduce a new
Documentation/features/sched/membarrier/arch-support.txt
to clarify this requirement.
Userspace code such as liburcu [1] heavily relies on membarrier private
expedited (when available) to speed up RCU read-side critical sections.
Various DNS servers, including BIND 9, use liburcu.
Thanks,
Mathieu
[1] https://liburcu.org
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
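To illustrate that userspace usage, a minimal example of registering and
issuing a private expedited command (here the sync-core variant this series
enables; a hypothetical standalone program, using the raw syscall since
glibc provides no membarrier() wrapper):

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

static int membarrier(int cmd, unsigned int flags, int cpu_id)
{
	return syscall(__NR_membarrier, cmd, flags, cpu_id);
}

int main(void)
{
	/*
	 * A process must register its intent before using the command;
	 * this fails with EINVAL on architectures lacking
	 * ARCH_HAS_MEMBARRIER_SYNC_CORE.
	 */
	if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE,
		       0, 0)) {
		perror("membarrier register");
		return 1;
	}
	/*
	 * Issue a core-serializing barrier on every CPU running a thread
	 * of this process, e.g. after modifying code in a JIT.
	 */
	if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0)) {
		perror("membarrier sync-core");
		return 1;
	}
	return 0;
}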
> > As regards the Fixes: tag, I guess it boils down to what we want or we
> > need to take for commit "riscv: Support membarrier private cmd". :-)
>
> I'm not seeing this commit in the Linux master branch, am I missing
> something ?
I don't think you're missing something: I was wondering "what/where is
this commit"? Sorry for the confusion.
> > FWIW, a quick git-log search confirmed that MEMBARRIER has been around
> > for quite some time in the RISC-V world (though I'm not familiar with
> > any of its mainstream uses): commit 1464d00b27b2 says (at least) since
> > 93917ad50972 ("RISC-V: Add support for restartable sequence"). I am
> > currently inclined to pick the latter commit (and check it w/ Palmer),
> > but other suggestions are welcome.
>
> Supporting membarrier private expedited is not optional since Linux 4.14:
>
> see kernel/sched/core.c:
>
> membarrier_switch_mm(rq, prev->active_mm, next->mm);
> /*
> * sys_membarrier() requires an smp_mb() between setting
> * rq->curr / membarrier_switch_mm() and returning to userspace.
> *
> * The below provides this either through switch_mm(), or in
> * case 'prev->active_mm == next->mm' through
> * finish_task_switch()'s mmdrop().
> */
> switch_mm_irqs_off(prev->active_mm, next->mm, next);
>
> Failure to provide the required barrier is a bug in the architecture's
> switch_mm implementation when CONFIG_MEMBARRIER=y.
>
> We should probably introduce a new
> Documentation/features/sched/membarrier/arch-support.txt
> to clarify this requirement.
>
> Userspace code such as liburcu [1] heavily relies on membarrier private
> expedited (when available) to speed up RCU read-side critical sections.
> Various DNS servers, including BIND 9, use liburcu.
Thanks for the information.
So I should probably stick to 93917ad50972, which apparently selected
CONFIG_MEMBARRIER on RISC-V, for the Fixes: tag in question.
I'll look into adding the membarrier feature you mention (as a final/
follow-up patch), unless you or someone else want to take care of it.
Andrea
On 2023-11-29 16:25, Andrea Parri wrote:
>>> As regards the Fixes: tag, I guess it boils down to what we want or we
>>> need to take for commit "riscv: Support membarrier private cmd". :-)
>>
>> I'm not seeing this commit in the Linux master branch, am I missing
>> something ?
>
> I don't think you're missing something: I was wondering "what/where is
> this commit"? Sorry for the confusion.
>
>
>>> FWIW, a quick git-log search confirmed that MEMBARRIER has been around
>>> for quite some time in the RISC-V world (though I'm not familiar with
>>> any of its mainstream uses): commit 1464d00b27b2 says (at least) since
>>> 93917ad50972 ("RISC-V: Add support for restartable sequence"). I am
>>> currently inclined to pick the latter commit (and check it w/ Palmer),
>>> but other suggestions are welcome.
>>
>> Supporting membarrier private expedited is not optional since Linux 4.14:
>>
>> see kernel/sched/core.c:
>>
>> membarrier_switch_mm(rq, prev->active_mm, next->mm);
>> /*
>> * sys_membarrier() requires an smp_mb() between setting
>> * rq->curr / membarrier_switch_mm() and returning to userspace.
>> *
>> * The below provides this either through switch_mm(), or in
>> * case 'prev->active_mm == next->mm' through
>> * finish_task_switch()'s mmdrop().
>> */
>> switch_mm_irqs_off(prev->active_mm, next->mm, next);
>>
>> Failure to provide the required barrier is a bug in the architecture's
>> switch_mm implementation when CONFIG_MEMBARRIER=y.
>>
>> We should probably introduce a new
>> Documentation/features/sched/membarrier/arch-support.txt
>> to clarify this requirement.
>>
>> Userspace code such as liburcu [1] heavily relies on membarrier private
>> expedited (when available) to speed up RCU read-side critical sections.
>> Various DNS servers, including BIND 9, use liburcu.
>
> Thanks for the information.
>
> So I should probably stick to 93917ad50972, which apparently selected
> CONFIG_MEMBARRIER on RISC-V, for the Fixes: tag in question.
I think it goes further than that, because you can explicitly set
CONFIG_MEMBARRIER=y, see init/Kconfig:
config MEMBARRIER
bool "Enable membarrier() system call" if EXPERT
default y
help
Enable the membarrier() system call that allows issuing memory
barriers across all running threads, which can be used to distribute
the cost of user-space memory barriers asymmetrically by transforming
pairs of memory barriers into pairs consisting of membarrier() and a
compiler barrier.
If unsure, say Y.
Before 1464d00b27b2, riscv just happened to set it to =n in the defconfig.
I suspect the initial port of riscv merged after v4.14 was already broken.
> I'll look into adding the membarrier feature you mention (as a final/
> follow-up patch), unless you or someone else want to take care of it.
I'll be happy to review it :)
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
> > So I should probably stick to 93917ad50972, which apparently selected
> > CONFIG_MEMBARRIER on RISC-V, for the Fixes: tag in question.
>
> I think it goes further than that, because you can explicitly set
> CONFIG_MEMBARRIER=y, see init/Kconfig:
>
> config MEMBARRIER
> 	bool "Enable membarrier() system call" if EXPERT
> 	default y
> 	help
> 	  Enable the membarrier() system call that allows issuing memory
> 	  barriers across all running threads, which can be used to distribute
> 	  the cost of user-space memory barriers asymmetrically by transforming
> 	  pairs of memory barriers into pairs consisting of membarrier() and a
> 	  compiler barrier.
>
> 	  If unsure, say Y.
>
> Before 1464d00b27b2, riscv just happened to set it to =n in the defconfig.
>
> I suspect the initial port of riscv merged after v4.14 was already broken.

I see. Oh well, guess I'll have to leave this up to the maintainers then
(I believe I've never managed to build riscv that far), Palmer?

> > I'll look into adding the membarrier feature you mention (as a final/
> > follow-up patch), unless you or someone else want to take care of it.
>
> I'll be happy to review it :)

Sweet! :-)

  Andrea
On Wed, 29 Nov 2023 14:43:34 PST (-0800), parri.andrea@gmail.com wrote:
>> > So I should probably stick to 93917ad50972, which apparently selected
>> > CONFIG_MEMBARRIER on RISC-V, for the Fixes: tag in question.
>>
>> I think it goes further than that, because you can explicitly
>> CONFIG_MEMBARRIER=y, see init/Kconfig:
>>
>> config MEMBARRIER
>> bool "Enable membarrier() system call" if EXPERT
>> default y
>> help
>> Enable the membarrier() system call that allows issuing memory
>> barriers across all running threads, which can be used to distribute
>> the cost of user-space memory barriers asymmetrically by transforming
>> pairs of memory barriers into pairs consisting of membarrier() and a
>> compiler barrier.
>>
>> If unsure, say Y.
>>
>> Before 1464d00b27b2, riscv just happened to set it to =n in the defconfig.
>>
>> I suspect the initial port of riscv merged after v4.14 was already broken.
>
> I see. Oh well, guess I'll have to leave this up to the maintainers then
> (I believe I've never managed to build riscv that far), Palmer?
I see
$ git grep "config MEMBARRIER" fab957c11efe2f405e08b9f0d080524bc2631428
fab957c11efe2f405e08b9f0d080524bc2631428:init/Kconfig:config MEMBARRIER
so IMO this is just one of those forever bugs. So I'd lean towards
Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code")
(or anything in that original patch set). It's not that big of a
backport, so I think it's safe enough?
>> > I'll look into adding the membarrier feature you mention (as a final/
>> > follow-up patch), unless you or someone else want to take care of it.
>>
>> I'll be happy to review it :)
>
> Sweet! :-)
>
> Andrea
> I see
>
> $ git grep "config MEMBARRIER" fab957c11efe2f405e08b9f0d080524bc2631428
> fab957c11efe2f405e08b9f0d080524bc2631428:init/Kconfig:config MEMBARRIER
>
> so IMO this is just one of those forever bugs. So I'd lean towards
>
> Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code")
Works for me, will apply in v2.
> (or anything in that original patch set). It's not that big of a backport,
> so I think it's safe enough?
Indeed, I think so.
The final version of this fix will likely depend on some machinery/code
introduced by 3ccfebedd8cf54 ("powerpc, membarrier: Skip memory barrier
in switch_mm()"); but, yes, nothing we can't safely adjust I think.
Thanks,
Andrea
On Wed, 06 Dec 2023 06:11:07 PST (-0800), parri.andrea@gmail.com wrote:
>> I see
>>
>> $ git grep "config MEMBARRIER" fab957c11efe2f405e08b9f0d080524bc2631428
>> fab957c11efe2f405e08b9f0d080524bc2631428:init/Kconfig:config MEMBARRIER
>>
>> so IMO this is just one of those forever bugs. So I'd lean towards
>>
>> Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code")
>
> Works for me, will apply in v2.
>
>
>> (or anything in that original patch set). It's not that big of a backport,
>> so I think it's safe enough?
>
> Indeed, I think so.
>
> The final version of this fix will likely depend on some machinery/code
> introduced by 3ccfebedd8cf54 ("powerpc, membarrier: Skip memory barrier
> in switch_mm()"); but, yes, nothing we can't safely adjust I think.
Ya, I guess we'll have to look to know for sure but hopefully it's
manageable.
> Thanks,
> Andrea
> > The final version of this fix will likely depend on some machinery/code
> > introduced by 3ccfebedd8cf54 ("powerpc, membarrier: Skip memory barrier
> > in switch_mm()"); but, yes, nothing we can't safely adjust I think.
>
> Ya, I guess we'll have to look to know for sure but hopefully it's
> manageable.
Absolutely. One approach would be to follow what PowerPC did: AFAIU, before
3ccfebedd8cf54 membarrier/powerpc used to hard code the required barrier
in finish_task_switch(), "masking" it as an smp_mb__after_unlock_lock(); riscv
could use a similar approach (though with a different/new mask function).
Alternatively, we could maybe keep the barrier in switch_mm().
But let me complete and send out v2 with the fix at stake... this should give
us a more concrete basis to discuss about these matters.
Andrea
On Wed, 06 Dec 2023 09:42:44 PST (-0800), parri.andrea@gmail.com wrote:
>> > The final version of this fix will likely depend on some machinery/code
>> > introduced by 3ccfebedd8cf54 ("powerpc, membarrier: Skip memory barrier
>> > in switch_mm()"); but, yes, nothing we can't safely adjust I think.
>>
>> Ya, I guess we'll have to look to know for sure but hopefully it's
>> manageable.
>
> Absolutely. One approach would be to follow what PowerPC did: AFAIU, before
> 3ccfebedd8cf54 membarrier/powerpc used to hard code the required barrier
> in finish_task_switch(), "masking" it as an smp_mb__after_unlock_lock(); riscv
> could use a similar approach (though with a different/new mask function).
IIRC we patterned our MMIOWB/after-spinlock barriers after PPC, so
that's probably a good place to start here as well.
> Alternatively, we could maybe keep the barrier in switch_mm().
>
> But let me complete and send out v2 with the fix at stake... this should give
> us a more concrete basis to discuss about these matters.
>
> Andrea