Use ioremap_wc() instead of ioremap() to map framebuffer during driver
probing phase.
Using ioremap() results in a VA being mapped with PAT=UC-. Additionally,
on x86 architectures, ioremap() invokes memtype_reserve() to reserve the
memory type as UC- for the physical range. This reservation can cause
subsequent calls to ioremap_wc() to fail to map the VA with PAT=WC to the
same physical range for the framebuffer in ttm_kmap_iter_linear_io_init().
Consequently, the operation drm_gem_vram_bo_driver_move() ->
ttm_bo_move_memcpy() -> ttm_move_memcpy() becomes significantly slow on
platforms where UC memory access is slow.
Here's the performance data measured in a guest on the physical machine
"Sapphire Rapids XCC".
When host KVM honors guest PAT memory types, the effective memory type
for this framebuffer range is
- WC when ioremap_wc() is used in driver probing phase
- UC- when ioremap() is used.
The data presented is an average from 10 execution runs.
The memcpy range for the data is
mem->bus.offset=0xfd000000, mem->size=0x3e8000.
--------------------------------------------------------------
                              |      in bochs_hw_init()       |
                              |   ioremap()    | ioremap_wc() |
------------------------------|----------------|--------------|
cycles of                     |    2227.4M     |    17.8M     |
drm_gem_vram_bo_driver_move() |                |              |
------------------------------|----------------|--------------|
time of                       |     1.24s      |    0.01s     |
drm_gem_vram_bo_driver_move() |                |              |
--------------------------------------------------------------
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Closes: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com/#t
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
drivers/gpu/drm/tiny/bochs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c
index 31fc5d839e10..6414f0a72f6a 100644
--- a/drivers/gpu/drm/tiny/bochs.c
+++ b/drivers/gpu/drm/tiny/bochs.c
@@ -261,7 +261,7 @@ static int bochs_hw_init(struct drm_device *dev)
 	if (pci_request_region(pdev, 0, "bochs-drm") != 0)
 		DRM_WARN("Cannot request framebuffer, boot fb still active?\n");
 
-	bochs->fb_map = ioremap(addr, size);
+	bochs->fb_map = ioremap_wc(addr, size);
 	if (bochs->fb_map == NULL) {
 		DRM_ERROR("Cannot map framebuffer\n");
 		return -ENOMEM;
--
2.43.2
Hi
On 09.09.24 at 07:15, Yan Zhao wrote:
> Use ioremap_wc() instead of ioremap() to map framebuffer during driver
> probing phase.
>
> Using ioremap() results in a VA being mapped with PAT=UC-. Additionally,
> on x86 architectures, ioremap() invokes memtype_reserve() to reserve the
> memory type as UC- for the physical range. This reservation can cause
> subsequent calls to ioremap_wc() to fail to map the VA with PAT=WC to the
> same physical range for the framebuffer in ttm_kmap_iter_linear_io_init().
> Consequently, the operation drm_gem_vram_bo_driver_move() ->
> ttm_bo_move_memcpy() -> ttm_move_memcpy() becomes significantly slow on
> platforms where UC memory access is slow.
I've noticed this too and pushed a major update that replaces the entire
memory management. [1]
The patch is still welcome, I think, but you may want to rebase onto the
latest drm-misc-next branch. [2]
Best regards
Thomas
[1] https://patchwork.freedesktop.org/series/138086/
[2] https://gitlab.freedesktop.org/drm/misc/kernel/-/tree/drm-misc-next
>
> Here's the performance data measured in a guest on the physical machine
> "Sapphire Rapids XCC".
> When host KVM honors guest PAT memory types, the effective memory type
> for this framebuffer range is
> - WC when ioremap_wc() is used in driver probing phase
> - UC- when ioremap() is used.
>
> The data presented is an average from 10 execution runs.
> The memcpy range for the data is
> mem->bus.offset=0xfd000000, mem->size=0x3e8000.
>
> --------------------------------------------------------------
>                               |      in bochs_hw_init()       |
>                               |   ioremap()    | ioremap_wc() |
> ------------------------------|----------------|--------------|
> cycles of                     |    2227.4M     |    17.8M     |
> drm_gem_vram_bo_driver_move() |                |              |
> ------------------------------|----------------|--------------|
> time of                       |     1.24s      |    0.01s     |
> drm_gem_vram_bo_driver_move() |                |              |
> --------------------------------------------------------------
>
> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Closes: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com/#t
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
> drivers/gpu/drm/tiny/bochs.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c
> index 31fc5d839e10..6414f0a72f6a 100644
> --- a/drivers/gpu/drm/tiny/bochs.c
> +++ b/drivers/gpu/drm/tiny/bochs.c
> @@ -261,7 +261,7 @@ static int bochs_hw_init(struct drm_device *dev)
>  	if (pci_request_region(pdev, 0, "bochs-drm") != 0)
>  		DRM_WARN("Cannot request framebuffer, boot fb still active?\n");
>
> -	bochs->fb_map = ioremap(addr, size);
> +	bochs->fb_map = ioremap_wc(addr, size);
>  	if (bochs->fb_map == NULL) {
>  		DRM_ERROR("Cannot map framebuffer\n");
>  		return -ENOMEM;
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
On Mon, Sep 09, 2024 at 08:40:30AM +0200, Thomas Zimmermann wrote:
> Hi
>
> On 09.09.24 at 07:15, Yan Zhao wrote:
> > Use ioremap_wc() instead of ioremap() to map framebuffer during driver
> > probing phase.
> >
> > Using ioremap() results in a VA being mapped with PAT=UC-. Additionally,
> > on x86 architectures, ioremap() invokes memtype_reserve() to reserve the
> > memory type as UC- for the physical range. This reservation can cause
> > subsequent calls to ioremap_wc() to fail to map the VA with PAT=WC to the
> > same physical range for the framebuffer in ttm_kmap_iter_linear_io_init().
> > Consequently, the operation drm_gem_vram_bo_driver_move() ->
> > ttm_bo_move_memcpy() -> ttm_move_memcpy() becomes significantly slow on
> > platforms where UC memory access is slow.
>
> I've noticed this too and pushed a major update that replaces the entire
> memory management. [1]
>
> The patch is still welcome, I think, but you may want to rebase onto the
> latest drm-misc-next branch. [2]
>
> Best regards
> Thomas
>
> [1] https://patchwork.freedesktop.org/series/138086/
> [2] https://gitlab.freedesktop.org/drm/misc/kernel/-/tree/drm-misc-next

Thanks!
The updated version is at
https://lore.kernel.org/all/20240909131643.28915-1-yan.y.zhao@intel.com