[PATCH 2/2] drm/amdgpu: Bypass resizing bars for PVH dom0

Jiqian Chen posted 2 patches 2 weeks, 5 days ago
Posted by Jiqian Chen 2 weeks, 5 days ago
Xen's vPCI does not support resizable BARs. When a discrete GPU is used in
a PVH dom0, which relies on vPCI, amdgpu fails to probe. Bypass BAR
resizing for PVH dom0 to avoid that.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Huang Rui <Ray.Huang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b3fb92bbd9e2..012feb3790dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1619,6 +1619,10 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
 	if (!IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
 		return 0;
 
+	/* Bypass for PVH dom0 which doesn't support resizable bar */
+	if (xen_initial_domain() && xen_pvh_domain())
+		return 0;
+
 	/* Bypass for VF */
 	if (amdgpu_sriov_vf(adev))
 		return 0;
-- 
2.34.1
Re: [PATCH 2/2] drm/amdgpu: Bypass resizing bars for PVH dom0
Posted by kernel test robot 2 weeks, 4 days ago
Hi Jiqian,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.12-rc6 next-20241105]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jiqian-Chen/drm-amdgpu-set-passthrough-mode-for-xen-pvh-hvm/20241105-141716
base:   linus/master
patch link:    https://lore.kernel.org/r/20241105060531.3503788-3-Jiqian.Chen%40amd.com
patch subject: [PATCH 2/2] drm/amdgpu: Bypass resizing bars for PVH dom0
config: arm64-allmodconfig (https://download.01.org/0day-ci/archive/20241106/202411060019.p34zs7ce-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 639a7ac648f1e50ccd2556e17d401c04f9cce625)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241106/202411060019.p34zs7ce-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411060019.p34zs7ce-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:33:
   In file included from include/linux/iommu.h:10:
   In file included from include/linux/scatterlist.h:8:
   In file included from include/linux/mm.h:2213:
   include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     504 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     505 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     511 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     512 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     524 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     525 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1542:6: error: call to undeclared function 'xen_initial_domain'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    1542 |         if (xen_initial_domain() && xen_pvh_domain())
         |             ^
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1542:30: error: call to undeclared function 'xen_pvh_domain'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    1542 |         if (xen_initial_domain() && xen_pvh_domain())
         |                                     ^
   4 warnings and 2 errors generated.
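
Both errors share one root cause: amdgpu_device.c calls xen_initial_domain() and xen_pvh_domain() with no declaration in scope. In mainline these helpers are macros provided by include/xen/xen.h, which also supplies constant-false fallbacks when the relevant Xen options are configured out, so adding `#include <xen/xen.h>` to amdgpu_device.c would likely resolve the build failure. The block below is only a user-space sketch of that fallback pattern, not kernel code: MOCK_CONFIG_XEN_DOM0, MOCK_CONFIG_XEN_PVH, mock_is_dom0, mock_is_pvh and skip_rebar_for_pvh_dom0() are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/*
 * User-space model of the fallback pattern used by include/xen/xen.h.
 * MOCK_CONFIG_XEN_DOM0 and MOCK_CONFIG_XEN_PVH are hypothetical stand-ins
 * for the kernel's CONFIG_XEN_DOM0 and CONFIG_XEN_PVH. Neither is defined
 * here, which models a build without Xen dom0/PVH support.
 */
#ifdef MOCK_CONFIG_XEN_DOM0
extern bool mock_is_dom0;
#define xen_initial_domain()	(mock_is_dom0)
#else
#define xen_initial_domain()	(0)
#endif

#ifdef MOCK_CONFIG_XEN_PVH
extern bool mock_is_pvh;
#define xen_pvh_domain()	(mock_is_pvh)
#else
#define xen_pvh_domain()	(0)
#endif

/*
 * Hypothetical helper mirroring the guard added by the patch.
 * Because the helpers are always defined (real check or constant 0),
 * this compiles on every configuration without an #ifdef at the call site.
 */
bool skip_rebar_for_pvh_dom0(void)
{
	return xen_initial_domain() && xen_pvh_domain();
}
```

With both mock options undefined the guard folds to a constant false, so the resize path would run as before on non-Xen builds.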


vim +/xen_initial_domain +1542 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

  1519	
  1520	/**
  1521	 * amdgpu_device_resize_fb_bar - try to resize FB BAR
  1522	 *
  1523	 * @adev: amdgpu_device pointer
  1524	 *
  1525	 * Try to resize FB BAR to make all VRAM CPU accessible. We try very hard not
  1526	 * to fail, but if any of the BARs is not accessible after the size we abort
  1527	 * driver loading by returning -ENODEV.
  1528	 */
  1529	int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
  1530	{
  1531		int rbar_size = pci_rebar_bytes_to_size(adev->gmc.real_vram_size);
  1532		struct pci_bus *root;
  1533		struct resource *res;
  1534		unsigned int i;
  1535		u16 cmd;
  1536		int r;
  1537	
  1538		if (!IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
  1539			return 0;
  1540	
  1541		/* Bypass for PVH dom0 which doesn't support resizable bar */
> 1542		if (xen_initial_domain() && xen_pvh_domain())
  1543			return 0;
  1544	
  1545		/* Bypass for VF */
  1546		if (amdgpu_sriov_vf(adev))
  1547			return 0;
  1548	
  1549		/* PCI_EXT_CAP_ID_VNDR extended capability is located at 0x100 */
  1550		if (!pci_find_ext_capability(adev->pdev, PCI_EXT_CAP_ID_VNDR))
  1551			DRM_WARN("System can't access extended configuration space, please check!!\n");
  1552	
  1553		/* skip if the bios has already enabled large BAR */
  1554		if (adev->gmc.real_vram_size &&
  1555		    (pci_resource_len(adev->pdev, 0) >= adev->gmc.real_vram_size))
  1556			return 0;
  1557	
  1558		/* Check if the root BUS has 64bit memory resources */
  1559		root = adev->pdev->bus;
  1560		while (root->parent)
  1561			root = root->parent;
  1562	
  1563		pci_bus_for_each_resource(root, res, i) {
  1564			if (res && res->flags & (IORESOURCE_MEM | IORESOURCE_MEM_64) &&
  1565			    res->start > 0x100000000ull)
  1566				break;
  1567		}
  1568	
  1569		/* Trying to resize is pointless without a root hub window above 4GB */
  1570		if (!res)
  1571			return 0;
  1572	
  1573		/* Limit the BAR size to what is available */
  1574		rbar_size = min(fls(pci_rebar_get_possible_sizes(adev->pdev, 0)) - 1,
  1575				rbar_size);
  1576	
  1577		/* Disable memory decoding while we change the BAR addresses and size */
  1578		pci_read_config_word(adev->pdev, PCI_COMMAND, &cmd);
  1579		pci_write_config_word(adev->pdev, PCI_COMMAND,
  1580				      cmd & ~PCI_COMMAND_MEMORY);
  1581	
  1582		/* Free the VRAM and doorbell BAR, we most likely need to move both. */
  1583		amdgpu_doorbell_fini(adev);
  1584		if (adev->asic_type >= CHIP_BONAIRE)
  1585			pci_release_resource(adev->pdev, 2);
  1586	
  1587		pci_release_resource(adev->pdev, 0);
  1588	
  1589		r = pci_resize_resource(adev->pdev, 0, rbar_size);
  1590		if (r == -ENOSPC)
  1591			DRM_INFO("Not enough PCI address space for a large BAR.");
  1592		else if (r && r != -ENOTSUPP)
  1593			DRM_ERROR("Problem resizing BAR0 (%d).", r);
  1594	
  1595		pci_assign_unassigned_bus_resources(adev->pdev->bus);
  1596	
  1597		/* When the doorbell or fb BAR isn't available we have no chance of
  1598		 * using the device.
  1599		 */
  1600		r = amdgpu_doorbell_init(adev);
  1601		if (r || (pci_resource_flags(adev->pdev, 0) & IORESOURCE_UNSET))
  1602			return -ENODEV;
  1603	
  1604		pci_write_config_word(adev->pdev, PCI_COMMAND, cmd);
  1605	
  1606		return 0;
  1607	}
  1608	
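
The listing above derives the requested BAR size from the VRAM size and then clamps it to what the device advertises. As a rough user-space illustration of that arithmetic, assuming the resizable BAR encoding in which size index n stands for 2^(n+20) bytes (index 0 is 1 MiB) and a bitmask in which bit n marks size n as supported: rebar_bytes_to_size(), fls32() and rebar_clamp_size() below are hypothetical re-implementations written for this sketch, not the kernel's own helpers.

```c
#include <stdint.h>

/* Smallest n such that 2^n >= bytes (returns 0 for bytes <= 1). */
static int ceil_log2_u64(uint64_t bytes)
{
	int n = 0;

	while (n < 63 && ((uint64_t)1 << n) < bytes)
		n++;
	return n;
}

/*
 * Encode a byte count as a resizable BAR size index:
 * round up to a power of two, never go below 1 MiB (2^20),
 * then subtract 20 so that index 0 means 1 MiB.
 */
int rebar_bytes_to_size(uint64_t bytes)
{
	int order = ceil_log2_u64(bytes);

	return (order > 20 ? order : 20) - 20;
}

/* Position of the highest set bit, counting from 1; 0 when x == 0. */
static int fls32(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/*
 * Clamp the wanted size index to the largest size advertised in the
 * possible_sizes bitmask, like the min(fls(...) - 1, rbar_size) step.
 * An empty bitmask yields -1, as the same expression would.
 */
int rebar_clamp_size(uint32_t possible_sizes, int wanted)
{
	int largest = fls32(possible_sizes) - 1;

	return largest < wanted ? largest : wanted;
}
```

For example, 8 GiB of VRAM maps to index 13, and a device advertising only indices 0 through 8 (up to 256 MiB) clamps that request to 8.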

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH 2/2] drm/amdgpu: Bypass resizing bars for PVH dom0
Posted by Christian König 2 weeks, 5 days ago
On 05.11.24 at 07:05, Jiqian Chen wrote:
> Xen's vPCI does not support resizable BARs. When a discrete GPU is used in
> a PVH dom0, which relies on vPCI, amdgpu fails to probe. Bypass BAR
> resizing for PVH dom0 to avoid that.

What do you mean by vPCI not supporting resizable BARs?

This must be supported; otherwise general PCI resource assignment
won't work either.

In other words, if this doesn't work, you can't hotplug anything either.

Regards,
Christian.

>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> Reviewed-by: Huang Rui <Ray.Huang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index b3fb92bbd9e2..012feb3790dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1619,6 +1619,10 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
>   	if (!IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
>   		return 0;
>   
> +	/* Bypass for PVH dom0 which doesn't support resizable bar */
> +	if (xen_initial_domain() && xen_pvh_domain())
> +		return 0;
> +
>   	/* Bypass for VF */
>   	if (amdgpu_sriov_vf(adev))
>   		return 0;
Re: [PATCH 2/2] drm/amdgpu: Bypass resizing bars for PVH dom0
Posted by kernel test robot 2 weeks, 5 days ago
Hi Jiqian,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.12-rc6 next-20241105]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jiqian-Chen/drm-amdgpu-set-passthrough-mode-for-xen-pvh-hvm/20241105-141716
base:   linus/master
patch link:    https://lore.kernel.org/r/20241105060531.3503788-3-Jiqian.Chen%40amd.com
patch subject: [PATCH 2/2] drm/amdgpu: Bypass resizing bars for PVH dom0
config: arc-randconfig-002-20241105 (https://download.01.org/0day-ci/archive/20241105/202411051924.dZP9MxDH-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241105/202411051924.dZP9MxDH-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411051924.dZP9MxDH-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/dev_printk.h:14,
                    from include/linux/device.h:15,
                    from include/linux/power_supply.h:15,
                    from drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:28:
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: In function 'amdgpu_device_resize_fb_bar':
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1542:13: error: implicit declaration of function 'xen_initial_domain' [-Werror=implicit-function-declaration]
    1542 |         if (xen_initial_domain() && xen_pvh_domain())
         |             ^~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:57:52: note: in definition of macro '__trace_if_var'
      57 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
         |                                                    ^~~~
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1542:9: note: in expansion of macro 'if'
    1542 |         if (xen_initial_domain() && xen_pvh_domain())
         |         ^~
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1542:37: error: implicit declaration of function 'xen_pvh_domain' [-Werror=implicit-function-declaration]
    1542 |         if (xen_initial_domain() && xen_pvh_domain())
         |                                     ^~~~~~~~~~~~~~
   include/linux/compiler.h:57:52: note: in definition of macro '__trace_if_var'
      57 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
         |                                                    ^~~~
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1542:9: note: in expansion of macro 'if'
    1542 |         if (xen_initial_domain() && xen_pvh_domain())
         |         ^~
   cc1: some warnings being treated as errors


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki