[PATCH v2] drm/bochs: use devm_ioremap_wc() to map framebuffer

Yan Zhao posted 1 patch 2 months, 3 weeks ago
drivers/gpu/drm/tiny/bochs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH v2] drm/bochs: use devm_ioremap_wc() to map framebuffer
Posted by Yan Zhao 2 months, 3 weeks ago
Opt for devm_ioremap_wc() over devm_ioremap() when mapping the framebuffer.

Using devm_ioremap() results in the VA being mapped with PAT=UC-, which
considerably slows down drm_fb_memcpy(). In contrast, devm_ioremap_wc()
maps the VA with PAT set to WC, leading to better performance on platforms
where access to UC memory is much slower than WC memory.

Here's the performance data measured in a guest on the physical machine
"Sapphire Rapids XCC".
With host KVM honors guest PAT memory types, the effective memory type
for this framebuffer range is
- WC when devm_ioremap_wc() is used
- UC- when devm_ioremap() is used.

The data presented is an average from 10 execution runs.

Cycles: Avg cycles of executed bochs_primary_plane_helper_atomic_update()
        from VM boot to GDM show up
Cnt:    Avg cnt of executed bochs_primary_plane_helper_atomic_update()
        from VM boot to GDM show up
T:      Avg time of each bochs_primary_plane_helper_atomic_update().

 -------------------------------------------------
|            | devm_ioremap() | devm_ioremap_wc() |
|------------|----------------|-------------------|
|  Cycles    |    211.545M    |   0.157M          |
|------------|----------------|-------------------|
|  Cnt       |     142        |   1917            |
|------------|----------------|-------------------|
|  T         |    0.1748s     |   0.0004s         |
 -------------------------------------------------

Note:
Following the rebase to [3], the previously reported GDM failure on the
VGA device [1] can no longer be reproduced, thanks to the memory management
improvements made in [2]. Despite this, I have proceeded to submit this
patch because of the noticeable performance improvements it provides.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Closes: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com/#t
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com/#t [1]
Link: https://patchwork.freedesktop.org/series/138086 [2]
Link: https://gitlab.freedesktop.org/drm/misc/kernel/-/tree/drm-misc-next [3]
---
v2:
- Rebased to the latest drm-misc-next branch. [2]
- Updated patch log to match the base code.

v1: https://lore.kernel.org/all/20240909051529.26776-1-yan.y.zhao@intel.com
---
 drivers/gpu/drm/tiny/bochs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c
index 69c5f65e9853..9055b1dd66df 100644
--- a/drivers/gpu/drm/tiny/bochs.c
+++ b/drivers/gpu/drm/tiny/bochs.c
@@ -268,7 +268,7 @@ static int bochs_hw_init(struct bochs_device *bochs)
 	if (!devm_request_mem_region(&pdev->dev, addr, size, "bochs-drm"))
 		DRM_WARN("Cannot request framebuffer, boot fb still active?\n");
 
-	bochs->fb_map = devm_ioremap(&pdev->dev, addr, size);
+	bochs->fb_map = devm_ioremap_wc(&pdev->dev, addr, size);
 	if (bochs->fb_map == NULL) {
 		DRM_ERROR("Cannot map framebuffer\n");
 		return -ENOMEM;
-- 
2.43.2
Re: [PATCH v2] drm/bochs: use devm_ioremap_wc() to map framebuffer
Posted by Vitaly Kuznetsov 2 months, 2 weeks ago
Yan Zhao <yan.y.zhao@intel.com> writes:

> Opt for devm_ioremap_wc() over devm_ioremap() when mapping the framebuffer.
>
> Using devm_ioremap() results in the VA being mapped with PAT=UC-, which
> considerably slows down drm_fb_memcpy(). In contrast, devm_ioremap_wc()
> maps the VA with PAT set to WC, leading to better performance on platforms
> where access to UC memory is much slower than WC memory.
>
> Here's the performance data measured in a guest on the physical machine
> "Sapphire Rapids XCC".
> With host KVM honors guest PAT memory types, the effective memory type
> for this framebuffer range is
> - WC when devm_ioremap_wc() is used
> - UC- when devm_ioremap() is used.
>
> The data presented is an average from 10 execution runs.
>
> Cycles: Avg cycles of executed bochs_primary_plane_helper_atomic_update()
>         from VM boot to GDM show up
> Cnt:    Avg cnt of executed bochs_primary_plane_helper_atomic_update()
>         from VM boot to GDM show up
> T:      Avg time of each bochs_primary_plane_helper_atomic_update().
>
>  -------------------------------------------------
> |            | devm_ioremap() | devm_ioremap_wc() |
> |------------|----------------|-------------------|
> |  Cycles    |    211.545M    |   0.157M          |
> |------------|----------------|-------------------|
> |  Cnt       |     142        |   1917            |
> |------------|----------------|-------------------|
> |  T         |    0.1748s     |   0.0004s         |
>  -------------------------------------------------
>
> Note:
> Following the rebase to [3], the previously reported GDM failure on the
> VGA device [1] can no longer be reproduced, thanks to the memory management
> improvements made in [2]. Despite this, I have proceeded to submit this
> patch because of the noticeable performance improvements it provides.
>
> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>

FWIW, this patch (alone) resolves the observed issue, thanks!

Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>

I, however, share Paolo's concern around existing VMs which KVM's change
is effectively breaking.

> Closes: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com/#t
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Link: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com/#t [1]
> Link: https://patchwork.freedesktop.org/series/138086 [2]
> Link: https://gitlab.freedesktop.org/drm/misc/kernel/-/tree/drm-misc-next [3]
> ---
> v2:
> - Rebased to the latest drm-misc-next branch. [2]
> - Updated patch log to match the base code.
>
> v1: https://lore.kernel.org/all/20240909051529.26776-1-yan.y.zhao@intel.com
> ---
>  drivers/gpu/drm/tiny/bochs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c
> index 69c5f65e9853..9055b1dd66df 100644
> --- a/drivers/gpu/drm/tiny/bochs.c
> +++ b/drivers/gpu/drm/tiny/bochs.c
> @@ -268,7 +268,7 @@ static int bochs_hw_init(struct bochs_device *bochs)
>  	if (!devm_request_mem_region(&pdev->dev, addr, size, "bochs-drm"))
>  		DRM_WARN("Cannot request framebuffer, boot fb still active?\n");
>  
> -	bochs->fb_map = devm_ioremap(&pdev->dev, addr, size);
> +	bochs->fb_map = devm_ioremap_wc(&pdev->dev, addr, size);
>  	if (bochs->fb_map == NULL) {
>  		DRM_ERROR("Cannot map framebuffer\n");
>  		return -ENOMEM;

-- 
Vitaly
Re: [PATCH v2] drm/bochs: use devm_ioremap_wc() to map framebuffer
Posted by Thomas Zimmermann 2 months, 3 weeks ago
Hi

Am 09.09.24 um 15:16 schrieb Yan Zhao:
> Opt for devm_ioremap_wc() over devm_ioremap() when mapping the framebuffer.
>
> Using devm_ioremap() results in the VA being mapped with PAT=UC-, which
> considerably slows down drm_fb_memcpy(). In contrast, devm_ioremap_wc()
> maps the VA with PAT set to WC, leading to better performance on platforms
> where access to UC memory is much slower than WC memory.
>
> Here's the performance data measured in a guest on the physical machine
> "Sapphire Rapids XCC".
> With host KVM honors guest PAT memory types, the effective memory type
> for this framebuffer range is
> - WC when devm_ioremap_wc() is used
> - UC- when devm_ioremap() is used.
>
> The data presented is an average from 10 execution runs.
>
> Cycles: Avg cycles of executed bochs_primary_plane_helper_atomic_update()
>          from VM boot to GDM show up
> Cnt:    Avg cnt of executed bochs_primary_plane_helper_atomic_update()
>          from VM boot to GDM show up
> T:      Avg time of each bochs_primary_plane_helper_atomic_update().
>
>   -------------------------------------------------
> |            | devm_ioremap() | devm_ioremap_wc() |
> |------------|----------------|-------------------|
> |  Cycles    |    211.545M    |   0.157M          |
> |------------|----------------|-------------------|
> |  Cnt       |     142        |   1917            |
> |------------|----------------|-------------------|
> |  T         |    0.1748s     |   0.0004s         |
>   -------------------------------------------------

Very nice. Thank you so much.

> Note:
> Following the rebase to [3], the previously reported GDM failure on the
> VGA device [1] can no longer be reproduced, thanks to the memory management
> improvements made in [2]. Despite this, I have proceeded to submit this
> patch because of the noticeable performance improvements it provides.
>
> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Closes: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com/#t
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Link: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com/#t [1]
> Link: https://patchwork.freedesktop.org/series/138086 [2]
> Link: https://gitlab.freedesktop.org/drm/misc/kernel/-/tree/drm-misc-next [3]

Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Texted-by: Thomas Zimmermann <tzimmermann@suse.de>

If no other reviews come in, I'll merge the patch by the end of the week.

Best regards
Thomas

> ---
> v2:
> - Rebased to the latest drm-misc-next branch. [2]
> - Updated patch log to match the base code.
>
> v1: https://lore.kernel.org/all/20240909051529.26776-1-yan.y.zhao@intel.com
> ---
>   drivers/gpu/drm/tiny/bochs.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c
> index 69c5f65e9853..9055b1dd66df 100644
> --- a/drivers/gpu/drm/tiny/bochs.c
> +++ b/drivers/gpu/drm/tiny/bochs.c
> @@ -268,7 +268,7 @@ static int bochs_hw_init(struct bochs_device *bochs)
>   	if (!devm_request_mem_region(&pdev->dev, addr, size, "bochs-drm"))
>   		DRM_WARN("Cannot request framebuffer, boot fb still active?\n");
>   
> -	bochs->fb_map = devm_ioremap(&pdev->dev, addr, size);
> +	bochs->fb_map = devm_ioremap_wc(&pdev->dev, addr, size);
>   	if (bochs->fb_map == NULL) {
>   		DRM_ERROR("Cannot map framebuffer\n");
>   		return -ENOMEM;

-- 
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
Re: [PATCH v2] drm/bochs: use devm_ioremap_wc() to map framebuffer
Posted by Yan Zhao 2 months, 3 weeks ago
On Tue, Sep 10, 2024 at 09:24:22AM +0200, Thomas Zimmermann wrote:
> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
> Texted-by: Thomas Zimmermann <tzimmermann@suse.de>
Thank you!
I suppose this is "Tested-by", right?
Re: [PATCH v2] drm/bochs: use devm_ioremap_wc() to map framebuffer
Posted by Thomas Zimmermann 2 months, 2 weeks ago

Am 10.09.24 um 10:05 schrieb Yan Zhao:
> On Tue, Sep 10, 2024 at 09:24:22AM +0200, Thomas Zimmermann wrote:
>> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
>> Texted-by: Thomas Zimmermann <tzimmermann@suse.de>
> Thank you!
> I suppose this is "Tested-by", right?

Yes

Tested-by: Thomas Zimmermann <tzimmermann@suse.de>

-- 
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)