drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c | 4 ++++ 1 file changed, 4 insertions(+)
dml21_map_dc_state_into_dml_display_cfg calls (the call is usually
inlined by the compiler) populate_dml21_surface_config_from_plane_state
and populate_dml21_plane_config_from_plane_state which may use FPU. In
a x86-64 build:
$ objdump --disassemble=dml21_map_dc_state_into_dml_display_cfg \
> drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.o |
> grep %xmm -c
63
Thus it needs to be guarded with DC_FP_START. But we must note that the
current code quality of the in-kernel FPU use in AMD dml2 is very much
problematic: we are actually calling DC_FP_START in dml21_wrapper.c
here, and this translation unit is built with CC_FLAGS_FPU. Strictly
speaking this does not make any sense: with CC_FLAGS_FPU the compiler is
allowed to generate FPU uses anywhere in the translated code, perhaps
out of the DC_FP_START guard. This problematic pattern also occurs in
at least dml2_wrapper.c, dcn35_fpu.c, and dcn351_fpu.c. Thus we really
need a careful audit and refactor for the in-kernel FPU uses, and this
patch is simply whacking a mole. However per the reporter, whacking
this mole is enough to make a 9060XT "just work."
Reported-by: Asiacn <710187964@qq.com>
Link: https://github.com/loongson-community/discussions/issues/102
Tested-by: Asiacn <710187964@qq.com>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
---
drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c
index 03de3cf06ae5..059ede6ff256 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c
@@ -224,7 +224,9 @@ static bool dml21_mode_check_and_programming(const struct dc *in_dc, struct dc_s
dml_ctx->config.svp_pstate.callbacks.release_phantom_streams_and_planes(in_dc, context);
/* Populate stream, plane mappings and other fields in display config. */
+ DC_FP_START();
result = dml21_map_dc_state_into_dml_display_cfg(in_dc, context, dml_ctx);
+ DC_FP_END();
if (!result)
return false;
@@ -279,7 +281,9 @@ static bool dml21_check_mode_support(const struct dc *in_dc, struct dc_state *co
dml_ctx->config.svp_pstate.callbacks.release_phantom_streams_and_planes(in_dc, context);
mode_support->dml2_instance = dml_init->dml2_instance;
+ DC_FP_START();
dml21_map_dc_state_into_dml_display_cfg(in_dc, context, dml_ctx);
+ DC_FP_END();
dml_ctx->v21.mode_programming.dml2_instance->scratch.build_mode_programming_locals.mode_programming_params.programming = dml_ctx->v21.mode_programming.programming;
DC_FP_START();
is_supported = dml2_check_mode_supported(mode_support);
--
2.51.0
No regression found in DC promotion test. Reviewed-by: Alex Hung <alex.hung@amd.com> On 8/25/25 02:52, Xi Ruoyao wrote: > dml21_map_dc_state_into_dml_display_cfg calls (the call is usually > inlined by the compiler) populate_dml21_surface_config_from_plane_state > and populate_dml21_plane_config_from_plane_state which may use FPU. In > a x86-64 build: > > $ objdump --disassemble=dml21_map_dc_state_into_dml_display_cfg \ > > drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.o | > > grep %xmm -c > 63 > > Thus it needs to be guarded with DC_FP_START. But we must note that the > current code quality of the in-kernel FPU use in AMD dml2 is very much > problematic: we are actually calling DC_FP_START in dml21_wrapper.c > here, and this translation unit is built with CC_FLAGS_FPU. Strictly > speaking this does not make any sense: with CC_FLAGS_FPU the compiler is > allowed to generate FPU uses anywhere in the translated code, perhaps > out of the DC_FP_START guard. This problematic pattern also occurs in > at least dml2_wrapper.c, dcn35_fpu.c, and dcn351_fpu.c. Thus we really > need a careful audit and refactor for the in-kernel FPU uses, and this > patch is simply whacking a mole. However per the reporter, whacking > this mole is enough to make a 9060XT "just work." > > Reported-by: Asiacn <710187964@qq.com> > Link: https://github.com/loongson-community/discussions/issues/102 > Tested-by: Asiacn <710187964@qq.com> > Signed-off-by: Xi Ruoyao <xry111@xry111.site> > --- > drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c > index 03de3cf06ae5..059ede6ff256 100644 > --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c > +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c > @@ -224,7 +224,9 @@ static bool dml21_mode_check_and_programming(const struct dc *in_dc, struct dc_s > dml_ctx->config.svp_pstate.callbacks.release_phantom_streams_and_planes(in_dc, context); > > /* Populate stream, plane mappings and other fields in display config. */ > + DC_FP_START(); > result = dml21_map_dc_state_into_dml_display_cfg(in_dc, context, dml_ctx); > + DC_FP_END(); > if (!result) > return false; > > @@ -279,7 +281,9 @@ static bool dml21_check_mode_support(const struct dc *in_dc, struct dc_state *co > dml_ctx->config.svp_pstate.callbacks.release_phantom_streams_and_planes(in_dc, context); > > mode_support->dml2_instance = dml_init->dml2_instance; > + DC_FP_START(); > dml21_map_dc_state_into_dml_display_cfg(in_dc, context, dml_ctx); > + DC_FP_END(); > dml_ctx->v21.mode_programming.dml2_instance->scratch.build_mode_programming_locals.mode_programming_params.programming = dml_ctx->v21.mode_programming.programming; > DC_FP_START(); > is_supported = dml2_check_mode_supported(mode_support);
On Mon, Aug 25, 2025 at 4:54 PM Xi Ruoyao <xry111@xry111.site> wrote: > > dml21_map_dc_state_into_dml_display_cfg calls (the call is usually > inlined by the compiler) populate_dml21_surface_config_from_plane_state > and populate_dml21_plane_config_from_plane_state which may use FPU. In > a x86-64 build: > > $ objdump --disassemble=dml21_map_dc_state_into_dml_display_cfg \ > > drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.o | > > grep %xmm -c > 63 > > Thus it needs to be guarded with DC_FP_START. But we must note that the > current code quality of the in-kernel FPU use in AMD dml2 is very much > problematic: we are actually calling DC_FP_START in dml21_wrapper.c > here, and this translation unit is built with CC_FLAGS_FPU. Strictly > speaking this does not make any sense: with CC_FLAGS_FPU the compiler is > allowed to generate FPU uses anywhere in the translated code, perhaps > out of the DC_FP_START guard. This problematic pattern also occurs in > at least dml2_wrapper.c, dcn35_fpu.c, and dcn351_fpu.c. Thus we really > need a careful audit and refactor for the in-kernel FPU uses, and this > patch is simply whacking a mole. However per the reporter, whacking > this mole is enough to make a 9060XT "just work." > > Reported-by: Asiacn <710187964@qq.com> > Link: https://github.com/loongson-community/discussions/issues/102 > Tested-by: Asiacn <710187964@qq.com> > Signed-off-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> > --- > drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c > index 03de3cf06ae5..059ede6ff256 100644 > --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c > +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c > @@ -224,7 +224,9 @@ static bool dml21_mode_check_and_programming(const struct dc *in_dc, struct dc_s > dml_ctx->config.svp_pstate.callbacks.release_phantom_streams_and_planes(in_dc, context); > > /* Populate stream, plane mappings and other fields in display config. */ > + DC_FP_START(); > result = dml21_map_dc_state_into_dml_display_cfg(in_dc, context, dml_ctx); > + DC_FP_END(); > if (!result) > return false; > > @@ -279,7 +281,9 @@ static bool dml21_check_mode_support(const struct dc *in_dc, struct dc_state *co > dml_ctx->config.svp_pstate.callbacks.release_phantom_streams_and_planes(in_dc, context); > > mode_support->dml2_instance = dml_init->dml2_instance; > + DC_FP_START(); > dml21_map_dc_state_into_dml_display_cfg(in_dc, context, dml_ctx); > + DC_FP_END(); > dml_ctx->v21.mode_programming.dml2_instance->scratch.build_mode_programming_locals.mode_programming_params.programming = dml_ctx->v21.mode_programming.programming; > DC_FP_START(); > is_supported = dml2_check_mode_supported(mode_support); > -- > 2.51.0 > >
On 8/25/25 02:52, Xi Ruoyao wrote: > dml21_map_dc_state_into_dml_display_cfg calls (the call is usually > inlined by the compiler) populate_dml21_surface_config_from_plane_state > and populate_dml21_plane_config_from_plane_state which may use FPU. In > a x86-64 build: > > $ objdump --disassemble=dml21_map_dc_state_into_dml_display_cfg \ > > drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.o | > > grep %xmm -c > 63 > > Thus it needs to be guarded with DC_FP_START. But we must note that the > current code quality of the in-kernel FPU use in AMD dml2 is very much > problematic: we are actually calling DC_FP_START in dml21_wrapper.c > here, and this translation unit is built with CC_FLAGS_FPU. Strictly > speaking this does not make any sense: with CC_FLAGS_FPU the compiler is > allowed to generate FPU uses anywhere in the translated code, perhaps > out of the DC_FP_START guard. This problematic pattern also occurs in > at least dml2_wrapper.c, dcn35_fpu.c, and dcn351_fpu.c. Thus we really Let me share Austin's comments below: " Both CC_FLAGS_FPU and DC_FP_START are required for FPU usage. CC_FLAGS_FPU allows the compiler to generate FPU code whereas DC_FP_START ensures that the FPU registers don't get tainted during runtime. The change itself looks fine to me. " > need a careful audit and refactor for the in-kernel FPU uses, and this > patch is simply whacking a mole. However per the reporter, whacking > this mole is enough to make a 9060XT "just work." > > Reported-by: Asiacn <710187964@qq.com> > Link: https://github.com/loongson-community/discussions/issues/102 "Link" to "BugLink"? The link is in Chinese and it would be helpful if you can spell out the problem in commit messages to something like like "this fixes 9060XT fails to output to HDMI on xa61200" or other meaningful descriptions. > Tested-by: Asiacn <710187964@qq.com> > Signed-off-by: Xi Ruoyao <xry111@xry111.site> > --- > drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c > index 03de3cf06ae5..059ede6ff256 100644 > --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c > +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_wrapper.c > @@ -224,7 +224,9 @@ static bool dml21_mode_check_and_programming(const struct dc *in_dc, struct dc_s > dml_ctx->config.svp_pstate.callbacks.release_phantom_streams_and_planes(in_dc, context); > > /* Populate stream, plane mappings and other fields in display config. */ > + DC_FP_START(); > result = dml21_map_dc_state_into_dml_display_cfg(in_dc, context, dml_ctx); > + DC_FP_END(); > if (!result) > return false; > > @@ -279,7 +281,9 @@ static bool dml21_check_mode_support(const struct dc *in_dc, struct dc_state *co > dml_ctx->config.svp_pstate.callbacks.release_phantom_streams_and_planes(in_dc, context); > > mode_support->dml2_instance = dml_init->dml2_instance; > + DC_FP_START(); > dml21_map_dc_state_into_dml_display_cfg(in_dc, context, dml_ctx); > + DC_FP_END(); > dml_ctx->v21.mode_programming.dml2_instance->scratch.build_mode_programming_locals.mode_programming_params.programming = dml_ctx->v21.mode_programming.programming; > DC_FP_START(); > is_supported = dml2_check_mode_supported(mode_support); I will send this patch to promotion test.
On Mon, 2025-09-08 at 14:18 -0600, Alex Hung wrote: > > > On 8/25/25 02:52, Xi Ruoyao wrote: > > dml21_map_dc_state_into_dml_display_cfg calls (the call is usually > > inlined by the compiler) populate_dml21_surface_config_from_plane_state > > and populate_dml21_plane_config_from_plane_state which may use FPU. In > > a x86-64 build: > > > > $ objdump --disassemble=dml21_map_dc_state_into_dml_display_cfg \ > > > drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.o | > > > grep %xmm -c > > 63 > > > > Thus it needs to be guarded with DC_FP_START. But we must note that the > > current code quality of the in-kernel FPU use in AMD dml2 is very much > > problematic: we are actually calling DC_FP_START in dml21_wrapper.c > > here, and this translation unit is built with CC_FLAGS_FPU. Strictly > > speaking this does not make any sense: with CC_FLAGS_FPU the compiler is > > allowed to generate FPU uses anywhere in the translated code, perhaps > > out of the DC_FP_START guard. This problematic pattern also occurs in > > at least dml2_wrapper.c, dcn35_fpu.c, and dcn351_fpu.c. Thus we really > > Let me share Austin's comments below: > > " > Both CC_FLAGS_FPU and DC_FP_START are required for FPU usage. > > CC_FLAGS_FPU allows the compiler to generate FPU code whereas > DC_FP_START ensures that the FPU registers don't get tainted during runtime. But the correct pattern would be: isolating the code using FPU into its own translation unit, say xxx_fp.c, then call DC_FP_START in another translation unit (for which CC_FLAGS_FPU is NOT used) before calling the routines in xxx_fp.c. It's because there's generally no way to guarantee the compiler only generate FP code in the range of DC_FP_START ... DC_FP_END if CC_FLAGS_FPU is used. The compiler can generate FPU code in situations we don't intend to use (and don't anticipate) FPU, for example if a "counting number of ones in a word" pattern is detected when the code is built for LoongArch, the compiler can invoke FPU to use a popcount instruction with CC_FLAGS_FPU. (That doesn't really happen now but it may happen in the future.) And the correct pattern is already used by, for example dcn10_resource.c: DC_FP_START(); voltage_supported = dcn_validate_bandwidth(dc, context, validate_mode); DC_FP_END(); Here dcn_validate_bandwidth is in dcn_calcs.c where CC_FLAGS_FPU is in- effect, but dcn10_resource.c itself is not built with CC_FLAGS_FPU. -- Xi Ruoyao <xry111@xry111.site>
On Tue, 2025-09-09 at 09:47 +0800, Xi Ruoyao wrote: > On Mon, 2025-09-08 at 14:18 -0600, Alex Hung wrote: > > > > > > On 8/25/25 02:52, Xi Ruoyao wrote: > > > dml21_map_dc_state_into_dml_display_cfg calls (the call is usually > > > inlined by the compiler) populate_dml21_surface_config_from_plane_state > > > and populate_dml21_plane_config_from_plane_state which may use FPU. In > > > a x86-64 build: > > > > > > $ objdump --disassemble=dml21_map_dc_state_into_dml_display_cfg \ > > > > drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.o | > > > > grep %xmm -c > > > 63 > > > > > > Thus it needs to be guarded with DC_FP_START. But we must note that the > > > current code quality of the in-kernel FPU use in AMD dml2 is very much > > > problematic: we are actually calling DC_FP_START in dml21_wrapper.c > > > here, and this translation unit is built with CC_FLAGS_FPU. Strictly > > > speaking this does not make any sense: with CC_FLAGS_FPU the compiler is > > > allowed to generate FPU uses anywhere in the translated code, perhaps > > > out of the DC_FP_START guard. This problematic pattern also occurs in > > > at least dml2_wrapper.c, dcn35_fpu.c, and dcn351_fpu.c. Thus we really > > > > Let me share Austin's comments below: > > > > " > > Both CC_FLAGS_FPU and DC_FP_START are required for FPU usage. > > > > CC_FLAGS_FPU allows the compiler to generate FPU code whereas > > DC_FP_START ensures that the FPU registers don't get tainted during runtime. > > But the correct pattern would be: isolating the code using FPU into its > own translation unit, say xxx_fp.c, then call DC_FP_START in another > translation unit (for which CC_FLAGS_FPU is NOT used) before calling the > routines in xxx_fp.c. > > It's because there's generally no way to guarantee the compiler only > generate FP code in the range of DC_FP_START ... DC_FP_END if > CC_FLAGS_FPU is used. The compiler can generate FPU code in situations > we don't intend to use (and don't anticipate) FPU, for example if a > "counting number of ones in a word" pattern is detected when the code is > built for LoongArch, the compiler can invoke FPU to use a popcount > instruction with CC_FLAGS_FPU. (That doesn't really happen now but it > may happen in the future.) > > And the correct pattern is already used by, for example > dcn10_resource.c: > > DC_FP_START(); > voltage_supported = dcn_validate_bandwidth(dc, context, validate_mode); > DC_FP_END(); > > Here dcn_validate_bandwidth is in dcn_calcs.c where CC_FLAGS_FPU is in- > effect, but dcn10_resource.c itself is not built with CC_FLAGS_FPU. And the correct pattern is already documented and explained in dcn20_fpu.c: * 4. Developers **must not** use DC_FP_START/END in this file, but they need * to ensure that the caller invokes it before access any function available * in this file. For this reason, public functions in this file must invoke * dc_assert_fp_enabled(); * * Let's expand a little bit more the idea in the code pattern. To fully * isolate FPU operations in a single place, we must avoid situations where * compilers spill FP values to registers due to FP enable in a specific C * file. Note that even if we isolate all FPU functions in a single file and * call its interface from other files, the compiler might enable the use of * FPU before we call DC_FP_START. Nevertheless, it is the programmer's * responsibility to invoke DC_FP_START/END in the correct place. To highlight * situations where developers forgot to use the FP protection before calling * the DC FPU interface functions, we introduce a helper that checks if the * function is invoked under FP protection. If not, it will trigger a kernel * warning. -- Xi Ruoyao <xry111@xry111.site>
© 2016 - 2025 Red Hat, Inc.