[PATCH -fixes] dma-mapping: add default implementation to arch_dma_{set|clear}_uncached

Yangyu Chen posted 1 patch 1 year, 5 months ago
include/linux/dma-map-ops.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
[PATCH -fixes] dma-mapping: add default implementation to arch_dma_{set|clear}_uncached
Posted by Yangyu Chen 1 year, 5 months ago
Currently, we have some code in kernel/dma/direct.c which references
arch_dma_set_uncached and arch_dma_clear_uncached. However, many
architectures do not provide these symbols, and the code currently
relies on compiler optimization to cut the unnecessary code. When the
compiler fails to optimize it, the code will reference the symbol and
cause a link error. I found this bug when developing some new extensions
for RISC-V on LLVM. The error message is shown below:

```
  LD      .tmp_vmlinux.kallsyms1
ld.lld: error: undefined symbol: arch_dma_set_uncached
>>> referenced by direct.c
>>>               kernel/dma/direct.o:(dma_direct_alloc) in archive
vmlinux.a
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
```

This patch adds default implementations for arch_dma_set_uncached and
arch_dma_clear_uncached in include/linux/dma-map-ops.h. So that the
code in kernel/dma/direct.c can always reference these symbols to
avoid link errors.

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
---
 include/linux/dma-map-ops.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 02a1c825896b..92a713557859 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -420,8 +420,22 @@ static inline void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
 }
 #endif /* ARCH_HAS_DMA_MARK_CLEAN */
 
+#ifdef CONFIG_ARCH_HAS_DMA_SET_UNCACHED
 void *arch_dma_set_uncached(void *addr, size_t size);
+#else
+static inline void *arch_dma_set_uncached(void *addr, size_t size)
+{
+	return ERR_PTR(-EINVAL);
+}
+#endif /* CONFIG_ARCH_HAS_DMA_SET_UNCACHED */
+
+#ifdef CONFIG_ARCH_HAS_DMA_CLEAR_UNCACHED
 void arch_dma_clear_uncached(void *addr, size_t size);
+#else
+static inline void arch_dma_clear_uncached(void *addr, size_t size)
+{
+}
+#endif /* CONFIG_ARCH_HAS_DMA_CLEAR_UNCACHED */
 
 #ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
 bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
-- 
2.45.2
Re: [PATCH -fixes] dma-mapping: add default implementation to arch_dma_{set|clear}_uncached
Posted by Christoph Hellwig 1 year, 5 months ago
On Tue, Jul 09, 2024 at 05:25:29PM +0800, Yangyu Chen wrote:
> Currently, we have some code in kernel/dma/direct.c which references
> arch_dma_set_uncached and arch_dma_clear_uncached. However, many
> architectures do not provide these symbols, and the code currently
> relies on compiler optimization to cut the unnecessary code. When the
> compiler fails to optimize it, the code will reference the symbol and
> cause a link error. I found this bug when developing some new extensions
> for RISC-V on LLVM. The error message is shown below:

Same comment as for the last one.  I think your compiler misbehaves,
and the typical reason for that would be if you disable all
optimizations.
Re: [PATCH -fixes] dma-mapping: add default implementation to arch_dma_{set|clear}_uncached
Posted by Yangyu Chen 1 year, 5 months ago

> On Jul 9, 2024, at 19:19, Christoph Hellwig <hch@lst.de> wrote:
> 
> On Tue, Jul 09, 2024 at 05:25:29PM +0800, Yangyu Chen wrote:
>> Currently, we have some code in kernel/dma/direct.c which references
>> arch_dma_set_uncached and arch_dma_clear_uncached. However, many
>> architectures do not provide these symbols, and the code currently
>> relies on compiler optimization to cut the unnecessary code. When the
>> compiler fails to optimize it, the code will reference the symbol and
>> cause a link error. I found this bug when developing some new extensions
>> for RISC-V on LLVM. The error message is shown below:
> 
> Same comment as for the last one.  I think your compiler misbehaves,
> and the typical reason for that would be if you disable all
> optimizations.
> 

The reason is that some optimizations failed to apply after adding
some passes. I will fix the compiler later. Whatever, we should not
rely on this optimization to get the code being successfully compiled.
Re: [PATCH -fixes] dma-mapping: add default implementation to arch_dma_{set|clear}_uncached
Posted by Robin Murphy 1 year, 5 months ago
On 09/07/2024 12:39 pm, Yangyu Chen wrote:
> 
> 
>> On Jul 9, 2024, at 19:19, Christoph Hellwig <hch@lst.de> wrote:
>>
>> On Tue, Jul 09, 2024 at 05:25:29PM +0800, Yangyu Chen wrote:
>>> Currently, we have some code in kernel/dma/direct.c which references
>>> arch_dma_set_uncached and arch_dma_clear_uncached. However, many
>>> architectures do not provide these symbols, and the code currently
>>> relies on compiler optimization to cut the unnecessary code. When the
>>> compiler fails to optimize it, the code will reference the symbol and
>>> cause a link error. I found this bug when developing some new extensions
>>> for RISC-V on LLVM. The error message is shown below:
>>
>> Same comment as for the last one.  I think your compiler misbehaves,
>> and the typical reason for that would be if you disable all
>> optimizations.
>>
> 
> The reason is that some optimizations failed to apply after adding
> some passes. I will fix the compiler later. Whatever, we should not
> rely on this optimization to get the code being successfully compiled.

We rely on unreachable code being optimised out *all over* the place - 
it's kind of the main point of constructs like IS_ENABLED(), and why the 
kernel as a whole does not and will not support being compiled with -O0. 
Frankly I'm quite surprised that this one would be the only case you hit 
if your compiler is not doing that properly.

Thanks,
Robin.
Re: [PATCH -fixes] dma-mapping: add default implementation to arch_dma_{set|clear}_uncached
Posted by Christoph Hellwig 1 year, 5 months ago
On Tue, Jul 09, 2024 at 07:39:29PM +0800, Yangyu Chen wrote:
> The reason is that some optimizations failed to apply after adding
> some passes. I will fix the compiler later. Whatever, we should not
> rely on this optimization to get the code being successfully compiled.

The Linux kernel relies on constant propagation and dead code
eliminiation a lot to make code simpler and more readable.
Re: [PATCH -fixes] dma-mapping: add default implementation to arch_dma_{set|clear}_uncached
Posted by Yangyu Chen 1 year, 5 months ago

> On Jul 9, 2024, at 19:46, Christoph Hellwig <hch@lst.de> wrote:
> 
> On Tue, Jul 09, 2024 at 07:39:29PM +0800, Yangyu Chen wrote:
>> The reason is that some optimizations failed to apply after adding
>> some passes. I will fix the compiler later. Whatever, we should not
>> rely on this optimization to get the code being successfully compiled.
> 
> The Linux kernel relies on constant propagation and dead code
> eliminiation a lot to make code simpler and more readable.

Actually, the compiler is patched LLVM with -O2 optimization. I didn’t
turn off the optimization.

You can see what we did for the compiler here[1] and compile the
kernel with `-march=rv64imac_zicond_zicldst` added to KBUILD_CFLAGS.
I added conditional load/store pass as Intel did for the x86 APX
extension, which appeared last year (called hoist load/store in
LLVM if you want to search the PR), and then LLVM failed to optimize
this.

The only failed symbol on the kernel with `ARCH=riscv defconfig`
is `arch_dma_set_uncached` since the compiler requires all possible
values to be known. I think a pattern like in kernel/dma/direct.c:349
for symbol `arch_dma_clear_uncached`, which uses `if
(IS_ENABLE(CONFIG_ARCH_HAS_xxx)) xxx` is acceptable. But for the
symbol `arch_dma_set_uncached`, a complex analysis is needed for a
value set in a block of branches. I think we should not rely on such
compiler optimization in such a complex pattern.

In addition, patching this way can also make this symbol safer to use.

[1] https://github.com/cyyself/llvm-project/tree/zicldst-support-bugless-v3

Thanks,
Yangyu Chen
Re: [PATCH -fixes] dma-mapping: add default implementation to arch_dma_{set|clear}_uncached
Posted by Robin Murphy 1 year, 5 months ago
On 09/07/2024 1:22 pm, Yangyu Chen wrote:
> 
> 
>> On Jul 9, 2024, at 19:46, Christoph Hellwig <hch@lst.de> wrote:
>>
>> On Tue, Jul 09, 2024 at 07:39:29PM +0800, Yangyu Chen wrote:
>>> The reason is that some optimizations failed to apply after adding
>>> some passes. I will fix the compiler later. Whatever, we should not
>>> rely on this optimization to get the code being successfully compiled.
>>
>> The Linux kernel relies on constant propagation and dead code
>> eliminiation a lot to make code simpler and more readable.
> 
> Actually, the compiler is patched LLVM with -O2 optimization. I didn’t
> turn off the optimization.

Well, sorry, you did do that, by patching the compiler in a way which 
makes it no longer happen as before. If LLVM is otherwise able to make 
this optimisation as expected then to me that sounds like your patch 
effectively causes a codegen regression in LLVM, thus it should hardly 
be the concern of Linux, nor every other project which may get worse 
codegen because of it, to work around it.

Regardless of whether people think it's reasonable to *depend* on 
compiler optimisation or not, emitting dead code which was not 
previously emitted at the same optimisation level seems like a clear 
step backwards.

> You can see what we did for the compiler here[1] and compile the
> kernel with `-march=rv64imac_zicond_zicldst` added to KBUILD_CFLAGS.
> I added conditional load/store pass as Intel did for the x86 APX
> extension, which appeared last year (called hoist load/store in
> LLVM if you want to search the PR), and then LLVM failed to optimize
> this.
> 
> The only failed symbol on the kernel with `ARCH=riscv defconfig`
> is `arch_dma_set_uncached` since the compiler requires all possible
> values to be known. I think a pattern like in kernel/dma/direct.c:349
> for symbol `arch_dma_clear_uncached`, which uses `if
> (IS_ENABLE(CONFIG_ARCH_HAS_xxx)) xxx` is acceptable. But for the
> symbol `arch_dma_set_uncached`, a complex analysis is needed for a
> value set in a block of branches. I think we should not rely on such
> compiler optimization in such a complex pattern.

I'm not a compiler guy, but is it really that complex when the variable 
is only ever written with the same compile-time-constant value that it's 
already initialised with? If the optimisation pass is so focused on 
being able to use a conditional store instruction that it would rather 
emit one which has no effect either way than elide it entirely, that 
doesn't strike me as a particularly good optimisation :/

Thanks,
Robin.

> 
> In addition, patching this way can also make this symbol safer to use.
> 
> [1] https://github.com/cyyself/llvm-project/tree/zicldst-support-bugless-v3
> 
> Thanks,
> Yangyu Chen
Re: [PATCH -fixes] dma-mapping: add default implementation to arch_dma_{set|clear}_uncached
Posted by Yangyu Chen 1 year, 5 months ago

> On Jul 9, 2024, at 23:52, Robin Murphy <robin.murphy@arm.com> wrote:
> On 09/07/2024 1:22 pm, Yangyu Chen wrote:
>> The only failed symbol on the kernel with `ARCH=riscv defconfig`
>> is `arch_dma_set_uncached` since the compiler requires all possible
>> values to be known. I think a pattern like in kernel/dma/direct.c:349
>> for symbol `arch_dma_clear_uncached`, which uses `if
>> (IS_ENABLE(CONFIG_ARCH_HAS_xxx)) xxx` is acceptable. But for the
>> symbol `arch_dma_set_uncached`, a complex analysis is needed for a
>> value set in a block of branches. I think we should not rely on such
>> compiler optimization in such a complex pattern.
> I'm not a compiler guy, but is it really that complex when the variable is only ever written with the same compile-time-constant value that it's already initialised with? If the optimisation pass is so focused on being able to use a conditional store instruction that it would rather emit one which has no effect either way than elide it entirely, that doesn't strike me as a particularly good optimisation :/

I confirm this is a regression for the compiler. However, I think
both the compiler and the kernel should be fixed, as I mentioned
the complexity of this pattern in the last email. Also, here is the
only failed symbol in the kernel with RISC-V defconfig.

Thanks,
Yangyu Chen