[Qemu RFC 0/7] Early enabling of DCD emulation in Qemu

Fan Ni posted 7 patches 12 months ago
[Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Fan Ni 12 months ago
Since an early draft of DCD support in the kernel is out
(https://lore.kernel.org/linux-cxl/20230417164126.GA1904906@bgt-140510-bm03/T/#t),
this patch series provides DCD emulation in QEMU so that people who are interested
can give it an early try. Note that the series may need to be updated if the
kernel-side implementation changes.

To support DCD emulation, the series adds the DCD-related mailbox commands
(CXL spec 3.0: 8.2.9.8.9) and extends the CXL type3 memory device with dynamic
capacity extent and region representatives.
To support reads and writes to the dynamic capacity of the device, a host backend
is provided and a check mechanism is added to ensure that the dynamic capacity
being accessed is backed by active dc extents.
The FM-related mailbox commands (CXL spec 3.0: 7.6.7.6) are not supported yet,
but two QMP interfaces are added for adding/releasing dynamic capacity extents.
Support for multiple hosts sharing the same DCD is also still missing.
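As a rough illustration of what that check means, here is a minimal sketch in C
(the type and function names are hypothetical, not the ones used in the patches):
an access to the DC address range is only honored if its DPA range is fully
covered by one active extent.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record; the series tracks these per DC region. */
typedef struct DCExtent {
    uint64_t start_dpa;   /* device physical address where the extent starts */
    uint64_t len;         /* extent length in bytes */
} DCExtent;

/* Return true only if [dpa, dpa + size) lies inside one active extent. */
static bool dc_access_is_backed(const DCExtent *extents, size_t nr,
                                uint64_t dpa, uint64_t size)
{
    for (size_t i = 0; i < nr; i++) {
        if (dpa >= extents[i].start_dpa &&
            dpa + size <= extents[i].start_dpa + extents[i].len) {
            return true;   /* forward the access to the dc host backend */
        }
    }
    return false;          /* reject the read/write */
}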

Things that can be tried with the patch series together with the kernel dcd code:
1. Create DC regions to cover the address range of the dynamic capacity
regions.
2. Add/release dynamic capacity extents to/from the device and notify the
kernel.
3. Test the kernel-side code that accepts added dc extents and creates dax
devices, and that releases dc extents and notifies the device.
4. Online the memory range backed by dc extents and let applications use
it.

The patch series is based on Jonathan's local QEMU branch:
https://gitlab.com/jic23/qemu/-/tree/cxl-2023-02-28

Simple tests performed with the patch series:
1 Install cxl modules:

modprobe -a cxl_acpi cxl_core cxl_pci cxl_port cxl_mem
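If the modules are already loaded or built in, modprobe is a no-op; loading can
be confirmed with, for example:

lsmod | grep -E 'cxl_(acpi|core|pci|port|mem)'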

2 Create dc regions:

region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region > /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
echo "dc" > /sys/bus/cxl/devices/decoder2.0/mode
echo 0x10000000 > /sys/bus/cxl/devices/decoder2.0/dpa_size
echo 0x10000000 > /sys/bus/cxl/devices/$region/size
echo "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind

/home/fan/cxl/tools-and-scripts# cxl list
[
  {
    "memdevs":[
      {
        "memdev":"mem0",
        "pmem_size":536870912,
        "ram_size":0,
        "serial":0,
        "host":"0000:0d:00.0"
      }
    ]
  },
  {
    "regions":[
      {
        "region":"region0",
        "resource":45365592064,
        "size":268435456,
        "interleave_ways":1,
        "interleave_granularity":256,
        "decode_state":"commit"
      }
    ]
  }
]

3 Add two dc extents (128MB each) through the QMP interface

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity-event",
	"arguments": {
		 "path": "/machine/peripheral/cxl-pmem0",
		"region-id" : 0,
		 "num-extent": 2,
		"dpa":0,
		"extent-len": 128
	}
}

/home/fan/cxl/tools-and-scripts# lsmem
RANGE                                  SIZE   STATE REMOVABLE   BLOCK
0x0000000000000000-0x000000007fffffff    2G  online       yes    0-15
0x0000000100000000-0x000000027fffffff    6G  online       yes   32-79
0x0000000a90000000-0x0000000a9fffffff  256M offline           338-339

Memory block size:       128M
Total online memory:       8G
Total offline memory:    256M


4 Online the memory with 'daxctl online-memory dax0.0'

/home/fan/cxl/ndctl# ./build/daxctl/daxctl online-memory dax0.0
[  230.730553] Fallback order for Node 0: 0 1
[  230.730825] Fallback order for Node 1: 1 0
[  230.730953] Built 2 zonelists, mobility grouping on.  Total pages: 2042541
[  230.731110] Policy zone: Normal
onlined memory for 1 device

root@bgt-140510-bm03:/home/fan/cxl/ndctl# lsmem
RANGE                                  SIZE   STATE REMOVABLE BLOCK
0x0000000000000000-0x000000007fffffff    2G  online       yes  0-15
0x0000000100000000-0x000000027fffffff    6G  online       yes 32-79
0x0000000a90000000-0x0000000a97ffffff  128M  online       yes   338
0x0000000a98000000-0x0000000a9fffffff  128M offline             339

Memory block size:       128M
Total online memory:     8.1G
Total offline memory:    128M

5 Use the dc extents as regular memory

/home/fan/cxl/ndctl# numactl --membind=1 ls
CONTRIBUTING.md  README.md  clean_config.sh  cscope.out   git-version-gen
ndctl	       scripts	test.h      version.h.in COPYING		 acpi.h
config.h.meson   cxl	  make-git-snapshot.sh	ndctl.spec.in  sles	tools
Documentation	 build	    contrib	     daxctl	  meson.build		rhel
tags	topology.png LICENSES	 ccan	    cscope.files
git-version  meson_options.txt	rpmbuild.sh    test	util
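When the test is done, the extents added in step 3 can be handed back through
the corresponding release QMP interface. A hypothetical invocation mirroring the
add call above (the exact command and argument names are defined by the QMP
patch in this series and may differ):

{ "execute": "cxl-release-dynamic-capacity-event",
	"arguments": {
		"path": "/machine/peripheral/cxl-pmem0",
		"region-id": 0,
		"num-extent": 2,
		"dpa": 0,
		"extent-len": 128
	}
}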


QEMU command-line CXL configuration:

RP1="-object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=512M \
-object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=512M \
-object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=512M \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,dc-memdev=cxl-mem2,id=cxl-pmem0,num-dc-regions=1 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k"
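RP1 above is only the CXL-specific fragment. A rough sketch of a full invocation
(kernel, rootfs, memory sizing and console options are placeholders, not the
exact setup used here; recent QEMU merges the repeated -machine/-M arguments,
otherwise fold cxl=on into the -M option inside RP1):

qemu-system-x86_64 \
    -machine q35,cxl=on \
    -m 4G,maxmem=8G,slots=8 -smp 4 \
    -kernel /path/to/bzImage \
    -append "console=ttyS0 root=/dev/vda rw" \
    -drive if=virtio,file=/path/to/rootfs.img,format=raw \
    -nographic \
    $RP1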


Kernel DCD support used to test the changes

The code was tested against the posted kernel DCD support:
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.5/dcd-preview

commit: f425bc34c600e2a3721d6560202962ec41622815

To make the test work, the following changes were applied on top of the above
kernel commit:

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 5f04bbc18af5..5f421d3c5cef 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -68,6 +68,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
 	CXL_CMD(SCAN_MEDIA, 0x11, 0, 0),
 	CXL_CMD(GET_SCAN_MEDIA, 0, CXL_VARIABLE_PAYLOAD, 0),
 	CXL_CMD(GET_DC_EXTENT_LIST, 0x8, CXL_VARIABLE_PAYLOAD, 0),
+	CXL_CMD(GET_DC_CONFIG, 0x2, CXL_VARIABLE_PAYLOAD, 0),
 };
 
 /*
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 291c716abd49..ae10e3cf43a1 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -194,7 +194,7 @@ static int cxl_region_manage_dc(struct cxl_region *cxlr)
 		}
 		cxlds->dc_list_gen_num = extent_gen_num;
 		dev_dbg(cxlds->dev, "No of preallocated extents :%d\n", rc);
-		enable_irq(cxlds->cxl_irq[CXL_EVENT_TYPE_DCD]);
+		/*enable_irq(cxlds->cxl_irq[CXL_EVENT_TYPE_DCD]);*/
 	}
 	return 0;
 err:
@@ -2810,7 +2810,8 @@ int cxl_add_dc_extent(struct cxl_dev_state *cxlds, struct resource *alloc_dpa_re
 				dev_dax->align, memremap_compat_align()))) {
 		rc = alloc_dev_dax_range(dev_dax, hpa,
 					resource_size(alloc_dpa_res));
-		return rc;
+		if (rc)
+			return rc;
 	}
 
 	rc = xa_insert(&cxlr_dc->dax_dev_list, hpa, dev_dax, GFP_KERNEL);
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 9e45b1056022..653bec203838 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -659,7 +659,7 @@ static int cxl_event_irqsetup(struct cxl_dev_state *cxlds)
 
 	/* Driver enables DCD interrupt after creating the dc cxl_region */
 	rc = cxl_event_req_irq(cxlds, policy.dyncap_settings, CXL_EVENT_TYPE_DCD,
-					IRQF_SHARED | IRQF_ONESHOT | IRQF_NO_AUTOEN);
+					IRQF_SHARED | IRQF_ONESHOT);
 	if (rc) {
 		dev_err(cxlds->dev, "Failed to get interrupt for event dc log\n");
 		return rc;
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index 6ca85861750c..910a48259239 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -47,6 +47,7 @@
 	___C(SCAN_MEDIA, "Scan Media"),                                   \
 	___C(GET_SCAN_MEDIA, "Get Scan Media Results"),                   \
 	___C(GET_DC_EXTENT_LIST, "Get dynamic capacity extents"),         \
+	___C(GET_DC_CONFIG, "Get dynamic capacity configuration"),         \
 	___C(MAX, "invalid / last command")
 
 #define ___C(a, b) CXL_MEM_COMMAND_ID_##a



Fan Ni (7):
  hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
    payload of identify memory device command
  hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
    and mailbox command support
  hw/mem/cxl_type3: Add a parameter to pass number of DC regions the
    device supports in qemu command line
  hw/mem/cxl_type3: Add DC extent representative to cxl type3 device
  hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
    dynamic capacity response
  Add qmp interfaces to add/release dynamic capacity extents
  hw/mem/cxl_type3: add read/write support to dynamic capacity

 hw/cxl/cxl-mailbox-utils.c  | 389 +++++++++++++++++++++++++++-
 hw/mem/cxl_type3.c          | 492 +++++++++++++++++++++++++++++++-----
 include/hw/cxl/cxl_device.h |  50 +++-
 include/hw/cxl/cxl_events.h |  16 ++
 qapi/cxl.json               |  44 ++++
 5 files changed, 924 insertions(+), 67 deletions(-)

-- 
2.25.1
Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Fan Ni 9 months, 2 weeks ago
On Thu, May 11, 2023 at 05:56:40PM +0000, Fan Ni wrote:

FYI.

I have updated the patch series and sent it out again.

I suggest that anyone interested in DCD and using this patch series switch to
the new series; quite a few things have been fixed.

https://lore.kernel.org/linux-cxl/20230724162313.34196-1-fan.ni@samsung.com/T/#t

Also, if you want to use the code repo directly, you can try

https://github.com/moking/qemu-dcd-preview-latest/tree/dcd-dev

Fan


> [snip original cover letter]
Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Ira Weiny 9 months, 2 weeks ago
Fan Ni wrote:
> On Thu, May 11, 2023 at 05:56:40PM +0000, Fan Ni wrote:
> 
> FYI.
> 
> I have updated the patch series and sent out again.
> 
> I suggested anyone who are interested in DCD and using this patch series to
> use the new series. Quite a few things has been fixed.
> 
> https://lore.kernel.org/linux-cxl/20230724162313.34196-1-fan.ni@samsung.com/T/#t
> 
> Also, if you want to use the code repo directly, you can try
> 
> https://github.com/moking/qemu-dcd-preview-latest/tree/dcd-dev

Thanks for the branch!

I took a quick look and I don't see a resolution to the problem I
mentioned with non-DCD devices being supported. [1]

[1] https://lore.kernel.org/all/6483946e8152f_f1132294a2@iweiny-mobl.notmuch/

Did you fix this in a different way?  If I don't add DC to my mem devices they
don't get probed properly.  I'm still looking into this with your new branch,
but I don't think DC commands should be in the CEL if the device does not
support it.

Also, I get a build warning on this branch that I had to fix [3], since my
build treats warnings as errors. [2]

I don't think this fix is technically necessary, as 'list' should never be NULL
as far as I can see.  But it might be nice to check, or just use my fix.

I'll try and get to a review once I get the DCD stuff out on the list again.

Ira


[2]
../hw/mem/cxl_type3.c: In function ‘qmp_cxl_process_dynamic_capacity_event.constprop’:
../hw/mem/cxl_type3.c:2063:28: error: ‘rid’ may be used uninitialized [-Werror=maybe-uninitialized]
 2063 |     dCap.updated_region_id = rid;
      |     ~~~~~~~~~~~~~~~~~~~~~~~^~~~~
../hw/mem/cxl_type3.c:1987:13: note: ‘rid’ was declared here
 1987 |     uint8_t rid;
      |             ^~~
cc1: all warnings being treated as errors

[3]

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index e67328780407..d25e6064f6c9 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1984,7 +1984,7 @@ static void qmp_cxl_process_dynamic_capacity_event(const char *path,
     CXLDCExtentRecordList *list = records;
     CXLDCExtent_raw *extents;
     uint64_t dpa, len;
-    uint8_t rid;
+    uint8_t rid = 0;
     int i;
 
     if (!obj) {

Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Fan Ni 9 months, 2 weeks ago
On Tue, Jul 25, 2023 at 08:18:08AM -0700, Ira Weiny wrote:

> Fan Ni wrote:
> > On Thu, May 11, 2023 at 05:56:40PM +0000, Fan Ni wrote:
> > 
> > FYI.
> > 
> > I have updated the patch series and sent out again.
> > 
> > I suggested anyone who are interested in DCD and using this patch series to
> > use the new series. Quite a few things has been fixed.
> > 
> > https://lore.kernel.org/linux-cxl/20230724162313.34196-1-fan.ni@samsung.com/T/#t
> > 
> > Also, if you want to use the code repo directly, you can try
> > 
> > https://github.com/moking/qemu-dcd-preview-latest/tree/dcd-dev
> 
> Thanks for the branch!
> 
> I took a quick look and I don't see a resolution to the problem I
> mentioned with non DCD devices being supported.[1]
> 
> [1] https://lore.kernel.org/all/6483946e8152f_f1132294a2@iweiny-mobl.notmuch/
> 
> Did you fix this in a different way?  If I don't add DC to my mem devices they
> don't get probed properly.  I'm still looking into this with your new branch,
> but I don't think DC commands should be in the CEL if the device does not
> support it.
> 
> Also I get a build warning on this branch I had to fix[3] as my build is
> treating warnings as errors.[2]
> 
> I don't think this fix is technically necessary as 'list' should never be NULL
> that I can see.  But might be nice to check or just use my fix.
> 
> I'll try and get to a review once I get the DCD stuff out on the list again.
> 
> Ira
> 

Oh, I missed your previous comments; let me look into it, fix things
accordingly, and send out a new version.

Btw, when I ran the DCD test with the latest DCD kernel code, I found an
issue there.

When I add a DCD extent for the first time, it is recognized as system RAM
automatically and shows up in lsmem.

However, when I release it and try to re-add the same extent, the add seems
to work and the device shows up under /dev/ as dax0.X, but it does not show
up in lsmem; I have to use the daxctl reconfigure command to turn it into
system RAM before it appears in lsmem. I would expect the behavior of the
first and second add to be the same.
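For reference, the reconfigure step mentioned above (device name taken from
this test) is:

daxctl reconfigure-device --mode=system-ram dax0.0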

Fan.


> [snip]
Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Ira Weiny 11 months ago
Fan Ni wrote:
> Since the early draft of DCD support in kernel is out
> (https://lore.kernel.org/linux-cxl/20230417164126.GA1904906@bgt-140510-bm03/T/#t),
> this patch series provide dcd emulation in qemu so people who are interested
> can have an early try. It is noted that the patch series may need to be updated
> accordingly if the kernel side implementation changes.

Fan,

Do you have a git tree we can pull this from which is updated to a more
recent CXL branch from Jonathan?

Thanks,
Ira

> 
> [snip remainder of cover letter]
Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Fan Ni 11 months ago
On Mon, Jun 05, 2023 at 10:35:48AM -0700, Ira Weiny wrote:
> Fan Ni wrote:
> > Since the early draft of DCD support in kernel is out
> > (https://lore.kernel.org/linux-cxl/20230417164126.GA1904906@bgt-140510-bm03/T/#t),
> > this patch series provide dcd emulation in qemu so people who are interested
> > can have an early try. It is noted that the patch series may need to be updated
> > accordingly if the kernel side implementation changes.
> 
> Fan,
> 
> Do you have a git tree we can pull this from which is updated to a more
> recent CXL branch from Jonathan?
> 
> Thanks,
> Ira

Hi Ira,

I have a git tree of the patch series based on Jonathan's branch
cxl-2023-02-28: https://github.com/moking/qemu-dev/tree/dcd-rfe.

That may not be new enough to include some of the recent patches, but I can
rebase it onto a newer branch if you tell me which branch you want to use.

Thanks,
Fan

> 
> > 
> > [snip remainder of cover letter]
> 
> 
Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Shesha Bhushan Sreenivasamurthy 11 months ago
Hi Fan,
   I am implementing DCD FMAPI commands and planning to start pushing changes to the branch below, which requires the contributions you have made. Can your changes be pushed to that branch?

https://gitlab.com/jic23/qemu/-/tree/cxl-2023-05-25


From: Fan Ni <fan.ni@samsung.com>
Sent: Monday, June 5, 2023 10:51 AM
To: Ira Weiny <ira.weiny@intel.com>
Cc: qemu-devel@nongnu.org <qemu-devel@nongnu.org>; jonathan.cameron@huawei.com <jonathan.cameron@huawei.com>; linux-cxl@vger.kernel.org <linux-cxl@vger.kernel.org>; gregory.price@memverge.com <gregory.price@memverge.com>; hchkuo@avery-design.com.tw <hchkuo@avery-design.com.tw>; cbrowy@avery-design.com <cbrowy@avery-design.com>; dan.j.williams@intel.com <dan.j.williams@intel.com>; Adam Manzanares <a.manzanares@samsung.com>; dave@stgolabs.net <dave@stgolabs.net>; nmtadam.samsung@gmail.com <nmtadam.samsung@gmail.com>; nifan@outlook.com <nifan@outlook.com>
Subject: Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu 
 
On Mon, Jun 05, 2023 at 10:35:48AM -0700, Ira Weiny wrote:
> Fan Ni wrote:
> > Since the early draft of DCD support in kernel is out
> > (https://lore.kernel.org/linux-cxl/20230417164126.GA1904906@bgt-140510-bm03/T/#t),
> > this patch series provide dcd emulation in qemu so people who are interested
> > can have an early try. It is noted that the patch series may need to be updated
> > accordingly if the kernel side implementation changes.
> 
> Fan,
> 
> Do you have a git tree we can pull this from which is updated to a more
> recent CXL branch from Jonathan?
> 
> Thanks,
> Ira

Hi Ira,

I have a git tree of the patch series based on Jonathan's branch
cxl-2023-02-28: https://github.com/moking/qemu-dev/tree/dcd-rfe.

That may be not new enough to include some of the recent patches, but I can
rebase it to a newer branch if you can tell me which branch you want to use.

Thanks,
Fan

> 
> > 
> > [snip remainder of cover letter]
> 
> 
Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Ira Weiny 11 months ago
Shesha Bhushan Sreenivasamurthy wrote:
> Hi Fan,
>    I am implementing DCD FMAPI commands and planning to start pushing changes to the below branch. That requires the contributions you have made. Can your changes be pushed to the below branch?
> 
> https://gitlab.com/jic23/qemu/-/tree/cxl-2023-05-25

This is the branch I'm trying to use as well.

Thanks,
Ira
Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by nifan@outlook.com 11 months ago
The 06/08/2023 08:43, Ira Weiny wrote:
> Shesha Bhushan Sreenivasamurthy wrote:
> > Hi Fan,
> >    I am implementing DCD FMAPI commands and planning to start pushing changes to the below branch. That requires the contributions you have made. Can your changes be pushed to the below branch ?
> > 
> > https://gitlab.com/jic23/qemu/-/tree/cxl-2023-05-25
> 
> This is the branch I'm trying to use as well.
> 
> Thanks,
> Ira

Hi Ira & Shesha,
FYI. I rebased my patch series on top of the above branch and created a new
branch here:

https://github.com/moking/qemu-dcd-preview-latest/tree/dcd-preview

It passes the same tests as shown here:
https://lore.kernel.org/linux-cxl/6481f70fca5c2_c82be29440@iweiny-mobl.notmuch/T/#m76f6e85ce3d7292b1982960eb22086ee03922166

-- 
Fan Ni <nifan@outlook.com>
Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Ira Weiny 11 months ago
nifan@outlook.com wrote:
> The 06/08/2023 08:43, Ira Weiny wrote:
> > Shesha Bhushan Sreenivasamurthy wrote:

[snip]

> 
> Hi Ira & Shesha,
> FYI. I rebased my patch series on top of the above branch and created a new
> branch here:
> 
> https://github.com/moking/qemu-dcd-preview-latest/tree/dcd-preview

Thanks!

> 
> It passes the same tests as shown here:
> https://lore.kernel.org/linux-cxl/6481f70fca5c2_c82be29440@iweiny-mobl.notmuch/T/#m76f6e85ce3d7292b1982960eb22086ee03922166

I've not gotten very far with this testing.  But I did find that regular
type 3 devices don't work with this change.  I used the patch below to get
this working.  Was there something I was missing to configure a non-DCD
device?
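
For illustration, a plain (non-DCD) type 3 configuration here would presumably
just be the cover letter's command line with the DC options dropped (a sketch
only, not something from the original posting):

RP1="-object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=512M \
-object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=512M \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k"

i.e. no dc-memdev and no num-dc-regions, which should leave dc.num_regions at 0,
the condition the patch below keys off.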

I don't particularly like adding another bool to this call stack.  Seems
like this calls for a flags field but I want to move on to DCD work so I
hacked this in.
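
To sketch the flags idea (purely illustrative; the enum and parameter names
below are invented for this example and are not anything in the tree), the two
bools could collapse into one bitmask:

/* Hypothetical flags replacing the switch_cci/is_dcd bools; names made up. */
typedef enum CXLMailboxInitFlags {
    CXL_MBOX_INIT_SWITCH_CCI = 1 << 0, /* use the switch CCI command set   */
    CXL_MBOX_INIT_DCD        = 1 << 1, /* also expose DCD_CONFIG commands  */
} CXLMailboxInitFlags;

void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate, unsigned int flags)
{
    if (flags & CXL_MBOX_INIT_SWITCH_CCI) {
        cxl_dstate->cxl_cmd_set = cxl_cmd_set_sw;
    } else {
        cxl_dstate->cxl_cmd_set = cxl_cmd_set;
    }
    for (int set = 0; set < 256; set++) {
        /* Same skip as the is_dcd check, expressed as a flag test */
        if (set == DCD_CONFIG && !(flags & CXL_MBOX_INIT_DCD)) {
            continue;
        }
        for (int cmd = 0; cmd < 256; cmd++) {
            /* ... command registration loop unchanged ... */
        }
    }
}

A type 3 reset would then pass something like
(ct3d->dc.num_regions ? CXL_MBOX_INIT_DCD : 0) instead of threading a second
bool through cxl_device_register_init_common().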

Ira

commit ed27935044dcbd2c6ba71f8411b218621f3f4167
Author: Ira Weiny <ira.weiny@intel.com>
Date:   Fri Jun 9 13:56:33 2023 -0700

    hw/mem/cxl_type3: Exclude DCD from CEL when type3 is not DCD
    
    Per CXL 3.0 9.13.3 Dynamic Capacity Device (DCD) when the type 3 memory
    device does not have DCD support the CEL should not include DCD
    configuration commands.
    
    If the number of DC regions supported is 0 skip the DCD commands in the
    CEL.
    
    Applies on top of Fan Ni's work here:
    https://github.com/moking/qemu-dcd-preview-latest/tree/dcd-preview
    
    Not-yet-Signed-off-by: Ira Weiny <ira.weiny@intel.com>

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index a4a2c6a80004..262e35935563 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -288,7 +288,7 @@ static void mailbox_reg_init_common(CXLDeviceState *cxl_dstate)
 
 static void memdev_reg_init_common(CXLDeviceState *cxl_dstate) { }
 
-void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
+void cxl_device_register_init_common(CXLDeviceState *cxl_dstate, bool is_dcd)
 {
     uint64_t *cap_hdrs = cxl_dstate->caps_reg_state64;
     const int cap_count = 3;
@@ -307,7 +307,7 @@ void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
     cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000, 1);
     memdev_reg_init_common(cxl_dstate);
 
-    cxl_initialize_mailbox(cxl_dstate, false);
+    cxl_initialize_mailbox(cxl_dstate, false, is_dcd);
 }
 
 void cxl_device_register_init_swcci(CXLDeviceState *cxl_dstate)
@@ -329,7 +329,7 @@ void cxl_device_register_init_swcci(CXLDeviceState *cxl_dstate)
     cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000, 1);
     memdev_reg_init_common(cxl_dstate);
 
-    cxl_initialize_mailbox(cxl_dstate, true);
+    cxl_initialize_mailbox(cxl_dstate, true, false);
 }
 
 uint64_t cxl_device_get_timestamp(CXLDeviceState *cxl_dstate)
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 93b26e717c94..80e9cb9a8f04 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1526,7 +1526,8 @@ static void bg_timercb(void *opaque)
     }
 }
 
-void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate, bool switch_cci)
+void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate, bool switch_cci,
+                            bool is_dcd)
 {
     if (!switch_cci) {
         cxl_dstate->cxl_cmd_set = cxl_cmd_set;
@@ -1534,6 +1535,9 @@ void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate, bool switch_cci)
         cxl_dstate->cxl_cmd_set = cxl_cmd_set_sw;
     }
     for (int set = 0; set < 256; set++) {
+        if (!is_dcd && set == DCD_CONFIG) {
+            continue;
+        }
         for (int cmd = 0; cmd < 256; cmd++) {
             if (cxl_dstate->cxl_cmd_set[set][cmd].handler) {
                 struct cxl_cmd *c = &cxl_dstate->cxl_cmd_set[set][cmd];
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 329e8b5915b3..e6e6e125990c 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1276,9 +1276,11 @@ static void ct3d_reset(DeviceState *dev)
     CXLType3Dev *ct3d = CXL_TYPE3(dev);
     uint32_t *reg_state = ct3d->cxl_cstate.crb.cache_mem_registers;
     uint32_t *write_msk = ct3d->cxl_cstate.crb.cache_mem_regs_write_mask;
+    bool is_dcd;
 
     cxl_component_register_init_common(reg_state, write_msk, CXL2_TYPE3_DEVICE);
-    cxl_device_register_init_common(&ct3d->cxl_dstate);
+    is_dcd = (ct3d->dc.num_regions != 0);
+    cxl_device_register_init_common(&ct3d->cxl_dstate, is_dcd);
 }
 
 static Property ct3_props[] = {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 1ccddcca7d0d..4621bba4f533 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -233,7 +233,7 @@ typedef struct cxl_device_state {
 void cxl_device_register_block_init(Object *obj, CXLDeviceState *dev);
 
 /* Set up default values for the register block */
-void cxl_device_register_init_common(CXLDeviceState *dev);
+void cxl_device_register_init_common(CXLDeviceState *dev, bool is_dcd);
 void cxl_device_register_init_swcci(CXLDeviceState *dev);
 
 /*
@@ -280,7 +280,7 @@ CXL_DEVICE_CAPABILITY_HEADER_REGISTER(MEMORY_DEVICE,
                                       CXL_DEVICE_CAP_HDR1_OFFSET +
                                           CXL_DEVICE_CAP_REG_SIZE * 2)
 
-void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate, bool switch_cci);
+void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate, bool switch_cci, bool is_dcd);
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate);
 
 #define cxl_device_cap_init(dstate, reg, cap_id, ver)                      \
Re: [EXT] Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Shesha Bhushan Sreenivasamurthy 11 months ago


From: Ira Weiny <ira.weiny@intel.com>
Sent: Friday, June 9, 2023 2:06 PM
To: nifan@outlook.com <nifan@outlook.com>; Ira Weiny <ira.weiny@intel.com>; Shesha Bhushan Sreenivasamurthy <sheshas@marvell.com>
Cc: Shesha Bhushan Sreenivasamurthy <sheshas@marvell.com>; Fan Ni <fan.ni@samsung.com>; Jonathan Cameron <Jonathan.Cameron@huawei.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; linux-cxl@vger.kernel.org <linux-cxl@vger.kernel.org>; gregory.price@memverge.com <gregory.price@memverge.com>; hchkuo@avery-design.com.tw <hchkuo@avery-design.com.tw>; cbrowy@avery-design.com <cbrowy@avery-design.com>; dan.j.williams@intel.com <dan.j.williams@intel.com>; Adam Manzanares <a.manzanares@samsung.com>; dave@stgolabs.net <dave@stgolabs.net>; nmtadam.samsung@gmail.com <nmtadam.samsung@gmail.com>
Subject: [EXT] Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu 
 
nifan@outlook.com wrote:
> The 06/08/2023 08:43, Ira Weiny wrote:
> > Shesha Bhushan Sreenivasamurthy wrote:

[snip]

> 
> Hi Ira & Shesha,
> FYI. I rebased my patch series on top of the above branch and created a new
> branch here:
> 
> https://github.com/moking/qemu-dcd-preview-latest/tree/dcd-preview

Thanks!

> 
> It passes the same tests as shown here:
> https://lore.kernel.org/linux-cxl/6481f70fca5c2_c82be29440@iweiny-mobl.notmuch/T/#m76f6e85ce3d7292b1982960eb22086ee03922166

I've not gotten very far with this testing.  But I did find that regular
type 3 devices don't work with this change.  I used the patch below to get
this working.  Was there something I was missing to configure a non-DCD
device?

I don't particularly like adding another bool to this call stack.  Seems
like this calls for a flags field but I want to move on to DCD work so I
hacked this in.

I am working on the DCD FM-API commands here -
https://gitlab.com/sheshas/qemu-fmapi/-/tree/cxl-2023-05-25
-Shesha

Ira

commit ed27935044dcbd2c6ba71f8411b218621f3f4167
Author: Ira Weiny <ira.weiny@intel.com>
Date:   Fri Jun 9 13:56:33 2023 -0700

    hw/mem/cxl_type3: Exclude DCD from CEL when type3 is not DCD
    
    Per CXL 3.0 9.13.3 Dynamic Capacity Device (DCD) when the type 3 memory
    device does not have DCD support the CEL should not include DCD
    configuration commands.
    
    If the number of DC regions supported is 0 skip the DCD commands in the
    CEL.
    
    Applies on top of Fan Ni's work here:
    https://github.com/moking/qemu-dcd-preview-latest/tree/dcd-preview
    
    Not-yet-Signed-off-by: Ira Weiny <ira.weiny@intel.com>

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index a4a2c6a80004..262e35935563 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -288,7 +288,7 @@ static void mailbox_reg_init_common(CXLDeviceState *cxl_dstate)
 
 static void memdev_reg_init_common(CXLDeviceState *cxl_dstate) { }
 
-void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
+void cxl_device_register_init_common(CXLDeviceState *cxl_dstate, bool is_dcd)
 {
     uint64_t *cap_hdrs = cxl_dstate->caps_reg_state64;
     const int cap_count = 3;
@@ -307,7 +307,7 @@ void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
     cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000, 1);
     memdev_reg_init_common(cxl_dstate);
 
-    cxl_initialize_mailbox(cxl_dstate, false);
+    cxl_initialize_mailbox(cxl_dstate, false, is_dcd);
 }
 
 void cxl_device_register_init_swcci(CXLDeviceState *cxl_dstate)
@@ -329,7 +329,7 @@ void cxl_device_register_init_swcci(CXLDeviceState *cxl_dstate)
     cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000, 1);
     memdev_reg_init_common(cxl_dstate);
 
-    cxl_initialize_mailbox(cxl_dstate, true);
+    cxl_initialize_mailbox(cxl_dstate, true, false);
 }
 
 uint64_t cxl_device_get_timestamp(CXLDeviceState *cxl_dstate)
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 93b26e717c94..80e9cb9a8f04 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1526,7 +1526,8 @@ static void bg_timercb(void *opaque)
     }
 }
 
-void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate, bool switch_cci)
+void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate, bool switch_cci,
+                            bool is_dcd)
 {
     if (!switch_cci) {
         cxl_dstate->cxl_cmd_set = cxl_cmd_set;
@@ -1534,6 +1535,9 @@ void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate, bool switch_cci)
         cxl_dstate->cxl_cmd_set = cxl_cmd_set_sw;
     }
     for (int set = 0; set < 256; set++) {
+        if (!is_dcd && set == DCD_CONFIG) {
+            continue;
+        }
         for (int cmd = 0; cmd < 256; cmd++) {
             if (cxl_dstate->cxl_cmd_set[set][cmd].handler) {
                 struct cxl_cmd *c = &cxl_dstate->cxl_cmd_set[set][cmd];
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 329e8b5915b3..e6e6e125990c 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1276,9 +1276,11 @@ static void ct3d_reset(DeviceState *dev)
     CXLType3Dev *ct3d = CXL_TYPE3(dev);
     uint32_t *reg_state = ct3d->cxl_cstate.crb.cache_mem_registers;
     uint32_t *write_msk = ct3d->cxl_cstate.crb.cache_mem_regs_write_mask;
+    bool is_dcd;
 
     cxl_component_register_init_common(reg_state, write_msk, CXL2_TYPE3_DEVICE);
-    cxl_device_register_init_common(&ct3d->cxl_dstate);
+    is_dcd = (ct3d->dc.num_regions != 0);
+    cxl_device_register_init_common(&ct3d->cxl_dstate, is_dcd);
 }
 
 static Property ct3_props[] = {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 1ccddcca7d0d..4621bba4f533 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -233,7 +233,7 @@ typedef struct cxl_device_state {
 void cxl_device_register_block_init(Object *obj, CXLDeviceState *dev);
 
 /* Set up default values for the register block */
-void cxl_device_register_init_common(CXLDeviceState *dev);
+void cxl_device_register_init_common(CXLDeviceState *dev, bool is_dcd);
 void cxl_device_register_init_swcci(CXLDeviceState *dev);
 
 /*
@@ -280,7 +280,7 @@ CXL_DEVICE_CAPABILITY_HEADER_REGISTER(MEMORY_DEVICE,
                                       CXL_DEVICE_CAP_HDR1_OFFSET +
                                           CXL_DEVICE_CAP_REG_SIZE * 2)
 
-void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate, bool switch_cci);
+void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate, bool switch_cci, bool is_dcd);
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate);
 
 #define cxl_device_cap_init(dstate, reg, cap_id, ver)                      \
Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Fan Ni 11 months ago
On Wed, Jun 07, 2023 at 06:13:01PM +0000, Shesha Bhushan Sreenivasamurthy wrote:
> Hi Fan,
>    I am implementing DCD FMAPI commands and planning to start pushing changes to the below branch. That requires the contributions you have made. Can your changes be pushed to the below branch ?
> 
> https://gitlab.com/jic23/qemu/-/tree/cxl-2023-05-25

Can you push changes to the branch directly? I think it is Jonathan's private
branch. However, I can fork the branch, rebase my patch series on top of it, and
share the new repo with you if that helps you move your work forward.
Let me know your thoughts.

Fan

> 
> 
> From: Fan Ni <fan.ni@samsung.com>
> Sent: Monday, June 5, 2023 10:51 AM
> To: Ira Weiny <ira.weiny@intel.com>
> Cc: qemu-devel@nongnu.org <qemu-devel@nongnu.org>; jonathan.cameron@huawei.com <jonathan.cameron@huawei.com>; linux-cxl@vger.kernel.org <linux-cxl@vger.kernel.org>; gregory.price@memverge.com <gregory.price@memverge.com>; hchkuo@avery-design.com.tw <hchkuo@avery-design.com.tw>; cbrowy@avery-design.com <cbrowy@avery-design.com>; dan.j.williams@intel.com <dan.j.williams@intel.com>; Adam Manzanares <a.manzanares@samsung.com>; dave@stgolabs.net <dave@stgolabs.net>; nmtadam.samsung@gmail.com <nmtadam.samsung@gmail.com>; nifan@outlook.com <nifan@outlook.com>
> Subject: Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu 
>  
> On Mon, Jun 05, 2023 at 10:35:48AM -0700, Ira Weiny wrote:
> > Fan Ni wrote:
> > > Since the early draft of DCD support in kernel is out
> > > (https://lore.kernel.org/linux-cxl/20230417164126.GA1904906@bgt-140510-bm03/T/#t),
> > > this patch series provide dcd emulation in qemu so people who are interested
> > > can have an early try. It is noted that the patch series may need to be updated
> > > accordingly if the kernel side implementation changes.
> > 
> > Fan,
> > 
> > Do you have a git tree we can pull this from which is updated to a more
> > recent CXL branch from Jonathan?
> > 
> > Thanks,
> > Ira
> 
> Hi Ira,
> 
> I have a git tree of the patch series based on Jonathan's branch
> cxl-2023-02-28: https://github.com/moking/qemu-dev/tree/dcd-rfe.
> 
> That may not be new enough to include some of the recent patches, but I can
> rebase it to a newer branch if you can tell me which branch you want to use.
> 
> Thanks,
> Fan
> 
> > 
> > > 
> > > To support DCD emulation, the patch series add DCD related mailbox command
> > > support (CXL Spec 3.0: 8.2.9.8.9), and extend the cxl type3 memory device
> > > with dynamic capacity extent and region representative.
> > > To support read/write to the dynamic capacity of the device, a host backend
> > > is provided and necessary check mechanism is added to ensure the dynamic
> > > capacity accessed is backed with active dc extents.
> > > Currently FM related mailbox commands (cxl spec 3.0: 7.6.7.6) is not supported
> > > , but we add two qmp interfaces for adding/releasing dynamic capacity extents.
> > > Also, the support for multiple hosts sharing the same DCD case is missing.
> > > 
> > > Things we can try with the patch series together with kernel dcd code:
> > > 1. Create DC regions to cover the address range of the dynamic capacity
> > > regions.
> > > 2. Add/release dynamic capacity extents to the device and notify the
> > > kernel.
> > > 3. Test kernel side code to accept added dc extents and create dax devices,
> > > and release dc extents and notify the device
> > > 4. Online the memory range backed with dc extents and let application use
> > > them.
> > > 
> > > The patch series is based on Jonathan's local qemu branch:
> > > https://gitlab.com/jic23/qemu/-/tree/cxl-2023-02-28
> > > 
> > > Simple tests performed with the patch series:
> > > 1 Install cxl modules:
> > > 
> > > modprobe -a cxl_acpi cxl_core cxl_pci cxl_port cxl_mem
> > > 
> > > 2 Create dc regions:
> > > 
> > > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> > > echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
> > > echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> > > echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> > > echo "dc" >/sys/bus/cxl/devices/decoder2.0/mode
> > > echo 0x10000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
> > > echo 0x10000000 > /sys/bus/cxl/devices/$region/size
> > > echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> > > echo 1 > /sys/bus/cxl/devices/$region/commit
> > > echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> > > 
> > > /home/fan/cxl/tools-and-scripts# cxl list
> > > [
> > >   {
> > >     "memdevs":[
> > >       {
> > >         "memdev":"mem0",
> > >         "pmem_size":536870912,
> > >         "ram_size":0,
> > >         "serial":0,
> > >         "host":"0000:0d:00.0"
> > >       }
> > >     ]
> > >   },
> > >   {
> > >     "regions":[
> > >       {
> > >         "region":"region0",
> > >         "resource":45365592064,
> > >         "size":268435456,
> > >         "interleave_ways":1,
> > >         "interleave_granularity":256,
> > >         "decode_state":"commit"
> > >       }
> > >     ]
> > >   }
> > > ]
> > > 
> > > 3 Add two dc extents (128MB each) through qmp interface
> > > 
> > > { "execute": "qmp_capabilities" }
> > > 
> > > { "execute": "cxl-add-dynamic-capacity-event",
> > >      "arguments": {
> > >               "path": "/machine/peripheral/cxl-pmem0",
> > >              "region-id" : 0,
> > >               "num-extent": 2,
> > >              "dpa":0,
> > >              "extent-len": 128
> > >      }
> > > }
> > > 
> > > /home/fan/cxl/tools-and-scripts# lsmem
> > > RANGE                                  SIZE   STATE REMOVABLE   BLOCK
> > > 0x0000000000000000-0x000000007fffffff    2G  online       yes    0-15
> > > 0x0000000100000000-0x000000027fffffff    6G  online       yes   32-79
> > > 0x0000000a90000000-0x0000000a9fffffff  256M offline           338-339
> > > 
> > > Memory block size:       128M
> > > Total online memory:       8G
> > > Total offline memory:    256M
> > > 
> > > 
> > > 4. Online the memory with 'daxctl online-memory dax0.0'
> > > 
> > > /home/fan/cxl/ndctl# ./build/daxctl/daxctl online-memory dax0.0
> > > [  230.730553] Fallback order for Node 0: 0 1
> > > [  230.730825] Fallback order for Node 1: 1 0
> > > [  230.730953] Built 2 zonelists, mobility grouping on.  Total pages: 2042541
> > > [  230.731110] Policy zone: Normal
> > > onlined memory for 1 device
> > > 
> > > root@bgt-140510-bm03:/home/fan/cxl/ndctl# lsmem
> > > RANGE                                  SIZE   STATE REMOVABLE BLOCK
> > > 0x0000000000000000-0x000000007fffffff    2G  online       yes  0-15
> > > 0x0000000100000000-0x000000027fffffff    6G  online       yes 32-79
> > > 0x0000000a90000000-0x0000000a97ffffff  128M  online       yes   338
> > > 0x0000000a98000000-0x0000000a9fffffff  128M offline             339
> > > 
> > > Memory block size:       128M
> > > Total online memory:     8.1G
> > > Total offline memory:    128M
> > > 
> > > 5 using dc extents as regular memory
> > > 
> > > /home/fan/cxl/ndctl# numactl --membind=1 ls
> > > CONTRIBUTING.md  README.md  clean_config.sh  cscope.out   git-version-gen
> > > ndctl              scripts   test.h      version.h.in COPYING                 acpi.h
> > > config.h.meson   cxl          make-git-snapshot.sh   ndctl.spec.in  sles     tools
> > > Documentation        build       contrib           daxctl        meson.build            rhel
> > > tags        topology.png LICENSES    ccan        cscope.files
> > > git-version  meson_options.txt      rpmbuild.sh    test     util
> > > 
> > > 
> > > QEMU command line cxl configuration:
> > > 
> > > RP1="-object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=512M \
> > > -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=512M \
> > > -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=512M \
> > > -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> > > -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> > > -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,dc-memdev=cxl-mem2,id=cxl-pmem0,num-dc-regions=1\
> > > -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k"
> > > 
> > > 
> > > Kernel DCD support used to test the changes
> > > 
> > > The code is tested with the posted kernel dcd support:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.5/dcd-preview
> > > 
> > > commit: f425bc34c600e2a3721d6560202962ec41622815
> > > 
> > > To make the test work, we have made the following changes to the above kernel commit:
> > > 
> > > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > > index 5f04bbc18af5..5f421d3c5cef 100644
> > > --- a/drivers/cxl/core/mbox.c
> > > +++ b/drivers/cxl/core/mbox.c
> > > @@ -68,6 +68,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> > >      CXL_CMD(SCAN_MEDIA, 0x11, 0, 0),
> > >      CXL_CMD(GET_SCAN_MEDIA, 0, CXL_VARIABLE_PAYLOAD, 0),
> > >      CXL_CMD(GET_DC_EXTENT_LIST, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> > > +   CXL_CMD(GET_DC_CONFIG, 0x2, CXL_VARIABLE_PAYLOAD, 0),
> > >  };
> > >  
> > >  /*
> > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > > index 291c716abd49..ae10e3cf43a1 100644
> > > --- a/drivers/cxl/core/region.c
> > > +++ b/drivers/cxl/core/region.c
> > > @@ -194,7 +194,7 @@ static int cxl_region_manage_dc(struct cxl_region *cxlr)
> > >              }
> > >              cxlds->dc_list_gen_num = extent_gen_num;
> > >              dev_dbg(cxlds->dev, "No of preallocated extents :%d\n", rc);
> > > -           enable_irq(cxlds->cxl_irq[CXL_EVENT_TYPE_DCD]);
> > > +           /*enable_irq(cxlds->cxl_irq[CXL_EVENT_TYPE_DCD]);*/
> > >      }
> > >      return 0;
> > >  err:
> > > @@ -2810,7 +2810,8 @@ int cxl_add_dc_extent(struct cxl_dev_state *cxlds, struct resource *alloc_dpa_re
> > >                              dev_dax->align, memremap_compat_align()))) {
> > >              rc = alloc_dev_dax_range(dev_dax, hpa,
> > >                                      resource_size(alloc_dpa_res));
> > > -           return rc;
> > > +           if (rc)
> > > +                   return rc;
> > >      }
> > >  
> > >      rc = xa_insert(&cxlr_dc->dax_dev_list, hpa, dev_dax, GFP_KERNEL);
> > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > index 9e45b1056022..653bec203838 100644
> > > --- a/drivers/cxl/pci.c
> > > +++ b/drivers/cxl/pci.c
> > > @@ -659,7 +659,7 @@ static int cxl_event_irqsetup(struct cxl_dev_state *cxlds)
> > >  
> > >      /* Driver enables DCD interrupt after creating the dc cxl_region */
> > >      rc = cxl_event_req_irq(cxlds, policy.dyncap_settings, CXL_EVENT_TYPE_DCD,
> > > -                                   IRQF_SHARED | IRQF_ONESHOT | IRQF_NO_AUTOEN);
> > > +                                   IRQF_SHARED | IRQF_ONESHOT);
> > >      if (rc) {
> > >              dev_err(cxlds->dev, "Failed to get interrupt for event dc log\n");
> > >              return rc;
> > > diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> > > index 6ca85861750c..910a48259239 100644
> > > --- a/include/uapi/linux/cxl_mem.h
> > > +++ b/include/uapi/linux/cxl_mem.h
> > > @@ -47,6 +47,7 @@
> > >      ___C(SCAN_MEDIA, "Scan Media"),                                   \
> > >      ___C(GET_SCAN_MEDIA, "Get Scan Media Results"),                   \
> > >      ___C(GET_DC_EXTENT_LIST, "Get dynamic capacity extents"),         \
> > > +   ___C(GET_DC_CONFIG, "Get dynamic capacity configuration"),         \
> > >      ___C(MAX, "invalid / last command")
> > >  
> > >  #define ___C(a, b) CXL_MEM_COMMAND_ID_##a
> > > 
> > > 
> > > 
> > > Fan Ni (7):
> > >   hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
> > >     payload of identify memory device command
> > >   hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
> > >     and mailbox command support
> > >   hw/mem/cxl_type3: Add a parameter to pass number of DC regions the
> > >     device supports in qemu command line
> > >   hw/mem/cxl_type3: Add DC extent representative to cxl type3 device
> > >   hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
> > >     dynamic capacity response
> > >   Add qmp interfaces to add/release dynamic capacity extents
> > >   hw/mem/cxl_type3: add read/write support to dynamic capacity
> > > 
> > >  hw/cxl/cxl-mailbox-utils.c  | 389 +++++++++++++++++++++++++++-
> > >  hw/mem/cxl_type3.c          | 492 +++++++++++++++++++++++++++++++-----
> > >  include/hw/cxl/cxl_device.h |  50 +++-
> > >  include/hw/cxl/cxl_events.h |  16 ++
> > >  qapi/cxl.json               |  44 ++++
> > >  5 files changed, 924 insertions(+), 67 deletions(-)
> > > 
> > > -- 
> > > 2.25.1
> > 
> > 
Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Jonathan Cameron via 11 months, 3 weeks ago
On Thu, 11 May 2023 17:56:40 +0000
Fan Ni <fan.ni@samsung.com> wrote:

> Since the early draft of DCD support in kernel is out
> (https://lore.kernel.org/linux-cxl/20230417164126.GA1904906@bgt-140510-bm03/T/#t),
> this patch series provide dcd emulation in qemu so people who are interested
> can have an early try. It is noted that the patch series may need to be updated
> accordingly if the kernel side implementation changes.
> 
> To support DCD emulation, the patch series add DCD related mailbox command
> support (CXL Spec 3.0: 8.2.9.8.9), and extend the cxl type3 memory device
> with dynamic capacity extent and region representative.
> To support read/write to the dynamic capacity of the device, a host backend
> is provided and necessary check mechanism is added to ensure the dynamic
> capacity accessed is backed with active dc extents.
> Currently FM related mailbox commands (cxl spec 3.0: 7.6.7.6) is not supported
> , but we add two qmp interfaces for adding/releasing dynamic capacity extents.
> Also, the support for multiple hosts sharing the same DCD case is missing.
> 
> Things we can try with the patch series together with kernel dcd code:
> 1. Create DC regions to cover the address range of the dynamic capacity
> regions.
> 2. Add/release dynamic capacity extents to the device and notify the
> kernel.
> 3. Test kernel side code to accept added dc extents and create dax devices,
> and release dc extents and notify the device
> 4. Online the memory range backed with dc extents and let application use
> them.
> 
> The patch series is based on Jonathan's local qemu branch:
> https://gitlab.com/jic23/qemu/-/tree/cxl-2023-02-28
> 
> Simple tests performed with the patch series:
> 1 Install cxl modules:
> 
> modprobe -a cxl_acpi cxl_core cxl_pci cxl_port cxl_mem
> 
> 2 Create dc regions:
> 
> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
> echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> echo "dc" >/sys/bus/cxl/devices/decoder2.0/mode
> echo 0x10000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
> echo 0x10000000 > /sys/bus/cxl/devices/$region/size
> echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> echo 1 > /sys/bus/cxl/devices/$region/commit
> echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> 
> /home/fan/cxl/tools-and-scripts# cxl list
> [
>   {
>     "memdevs":[
>       {
>         "memdev":"mem0",
>         "pmem_size":536870912,
>         "ram_size":0,
>         "serial":0,
>         "host":"0000:0d:00.0"
>       }
>     ]
>   },
>   {
>     "regions":[
>       {
>         "region":"region0",
>         "resource":45365592064,
>         "size":268435456,
>         "interleave_ways":1,
>         "interleave_granularity":256,
>         "decode_state":"commit"
>       }
>     ]
>   }
> ]
> 
> 3 Add two dc extents (128MB each) through qmp interface
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity-event",
> 	"arguments": {
> 		 "path": "/machine/peripheral/cxl-pmem0",
> 		"region-id" : 0,
> 		 "num-extent": 2,
> 		"dpa":0,
> 		"extent-len": 128
> 	}
> }
> 
> /home/fan/cxl/tools-and-scripts# lsmem
> RANGE                                  SIZE   STATE REMOVABLE   BLOCK
> 0x0000000000000000-0x000000007fffffff    2G  online       yes    0-15
> 0x0000000100000000-0x000000027fffffff    6G  online       yes   32-79
> 0x0000000a90000000-0x0000000a9fffffff  256M offline           338-339
> 
> Memory block size:       128M
> Total online memory:       8G
> Total offline memory:    256M
> 
> 
> 4. Online the memory with 'daxctl online-memory dax0.0'
> 
> /home/fan/cxl/ndctl# ./build/daxctl/daxctl online-memory dax0.0
> [  230.730553] Fallback order for Node 0: 0 1
> [  230.730825] Fallback order for Node 1: 1 0
> [  230.730953] Built 2 zonelists, mobility grouping on.  Total pages: 2042541
> [  230.731110] Policy zone: Normal
> onlined memory for 1 device
> 
> root@bgt-140510-bm03:/home/fan/cxl/ndctl# lsmem
> RANGE                                  SIZE   STATE REMOVABLE BLOCK
> 0x0000000000000000-0x000000007fffffff    2G  online       yes  0-15
> 0x0000000100000000-0x000000027fffffff    6G  online       yes 32-79
> 0x0000000a90000000-0x0000000a97ffffff  128M  online       yes   338
> 0x0000000a98000000-0x0000000a9fffffff  128M offline             339
> 
> Memory block size:       128M
> Total online memory:     8.1G
> Total offline memory:    128M
> 
> 5 using dc extents as regular memory
> 
> /home/fan/cxl/ndctl# numactl --membind=1 ls
> CONTRIBUTING.md  README.md  clean_config.sh  cscope.out   git-version-gen
> ndctl	       scripts	test.h      version.h.in COPYING		 acpi.h
> config.h.meson   cxl	  make-git-snapshot.sh	ndctl.spec.in  sles	tools
> Documentation	 build	    contrib	     daxctl	  meson.build		rhel
> tags	topology.png LICENSES	 ccan	    cscope.files
> git-version  meson_options.txt	rpmbuild.sh    test	util
> 
> 
> QEMU command line cxl configuration:
> 
> RP1="-object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=512M \
> -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=512M \
> -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=512M \
> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,dc-memdev=cxl-mem2,id=cxl-pmem0,num-dc-regions=1\
> -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k"
> 
> 
> Kernel DCD support used to test the changes
> 
> The code is tested with the posted kernel dcd support:
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.5/dcd-preview
> 

Very nice!  +CC Navneet who may want to comment on the below (and the
emulation as well)

I've not had a chance to look at the code on the kernel side yet.


> commit: f425bc34c600e2a3721d6560202962ec41622815
> 
> To make the test work, we have made the following changes to the above kernel commit:
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 5f04bbc18af5..5f421d3c5cef 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -68,6 +68,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>  	CXL_CMD(SCAN_MEDIA, 0x11, 0, 0),
>  	CXL_CMD(GET_SCAN_MEDIA, 0, CXL_VARIABLE_PAYLOAD, 0),
>  	CXL_CMD(GET_DC_EXTENT_LIST, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> +	CXL_CMD(GET_DC_CONFIG, 0x2, CXL_VARIABLE_PAYLOAD, 0),
>  };
>  
>  /*
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 291c716abd49..ae10e3cf43a1 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -194,7 +194,7 @@ static int cxl_region_manage_dc(struct cxl_region *cxlr)
>  		}
>  		cxlds->dc_list_gen_num = extent_gen_num;
>  		dev_dbg(cxlds->dev, "No of preallocated extents :%d\n", rc);
> -		enable_irq(cxlds->cxl_irq[CXL_EVENT_TYPE_DCD]);
> +		/*enable_irq(cxlds->cxl_irq[CXL_EVENT_TYPE_DCD]);*/

Some race condition that means we need to enable the DCD event earlier?

>  	}
>  	return 0;
>  err:
> @@ -2810,7 +2810,8 @@ int cxl_add_dc_extent(struct cxl_dev_state *cxlds, struct resource *alloc_dpa_re
>  				dev_dax->align, memremap_compat_align()))) {
>  		rc = alloc_dev_dax_range(dev_dax, hpa,
>  					resource_size(alloc_dpa_res));
> -		return rc;
> +		if (rc)
> +			return rc;

No idea on this one as it's in the code I haven't looked at yet!

>  	}
>  
>  	rc = xa_insert(&cxlr_dc->dax_dev_list, hpa, dev_dax, GFP_KERNEL);
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 9e45b1056022..653bec203838 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -659,7 +659,7 @@ static int cxl_event_irqsetup(struct cxl_dev_state *cxlds)
>  
>  	/* Driver enables DCD interrupt after creating the dc cxl_region */
>  	rc = cxl_event_req_irq(cxlds, policy.dyncap_settings, CXL_EVENT_TYPE_DCD,
> -					IRQF_SHARED | IRQF_ONESHOT | IRQF_NO_AUTOEN);
> +					IRQF_SHARED | IRQF_ONESHOT);

This will be the other side of the removal of the enable above.
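
(For anyone not familiar with the flag: IRQF_NO_AUTOEN keeps the line disabled
after the request until an explicit enable_irq(), while dropping it arms the
interrupt immediately. A rough sketch with the generic kernel IRQ API follows;
irq, cxl_event_thread and dev_id are placeholders, not the actual
cxl_event_req_irq() internals.)

	/* Original intent: request now but leave the DCD interrupt disabled,
	 * then enable it once the dc cxl_region has been created. */
	rc = request_threaded_irq(irq, NULL, cxl_event_thread,
				  IRQF_SHARED | IRQF_ONESHOT | IRQF_NO_AUTOEN,
				  "cxl-dcd", dev_id);
	/* ... later, after cxl_region_manage_dc() ... */
	enable_irq(irq);

	/* Workaround used for this test: request with the line enabled right
	 * away, so no later enable_irq() is needed. */
	rc = request_threaded_irq(irq, NULL, cxl_event_thread,
				  IRQF_SHARED | IRQF_ONESHOT,
				  "cxl-dcd", dev_id);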

>  	if (rc) {
>  		dev_err(cxlds->dev, "Failed to get interrupt for event dc log\n");
>  		return rc;
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index 6ca85861750c..910a48259239 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -47,6 +47,7 @@
>  	___C(SCAN_MEDIA, "Scan Media"),                                   \
>  	___C(GET_SCAN_MEDIA, "Get Scan Media Results"),                   \
>  	___C(GET_DC_EXTENT_LIST, "Get dynamic capacity extents"),         \
> +	___C(GET_DC_CONFIG, "Get dynamic capacity configuration"),         \
>  	___C(MAX, "invalid / last command")
>  
>  #define ___C(a, b) CXL_MEM_COMMAND_ID_##a
> 
> 
> 
> Fan Ni (7):
>   hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
>     payload of identify memory device command
>   hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
>     and mailbox command support
>   hw/mem/cxl_type3: Add a parameter to pass number of DC regions the
>     device supports in qemu command line
>   hw/mem/cxl_type3: Add DC extent representative to cxl type3 device
>   hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
>     dynamic capacity response
>   Add qmp interfaces to add/release dynamic capacity extents
>   hw/mem/cxl_type3: add read/write support to dynamic capacity
> 
>  hw/cxl/cxl-mailbox-utils.c  | 389 +++++++++++++++++++++++++++-
>  hw/mem/cxl_type3.c          | 492 +++++++++++++++++++++++++++++++-----
>  include/hw/cxl/cxl_device.h |  50 +++-
>  include/hw/cxl/cxl_events.h |  16 ++
>  qapi/cxl.json               |  44 ++++
>  5 files changed, 924 insertions(+), 67 deletions(-)
>
Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by nifan@outlook.com 10 months, 1 week ago
The 05/15/2023 14:00, Jonathan Cameron wrote:
> On Thu, 11 May 2023 17:56:40 +0000
> Fan Ni <fan.ni@samsung.com> wrote:
> 
> > Since the early draft of DCD support in kernel is out
> > (https://lore.kernel.org/linux-cxl/20230417164126.GA1904906@bgt-140510-bm03/T/#t),
> > this patch series provide dcd emulation in qemu so people who are interested
> > can have an early try. It is noted that the patch series may need to be updated
> > accordingly if the kernel side implementation changes.
> > 
> > To support DCD emulation, the patch series add DCD related mailbox command
> > support (CXL Spec 3.0: 8.2.9.8.9), and extend the cxl type3 memory device
> > with dynamic capacity extent and region representative.
> > To support read/write to the dynamic capacity of the device, a host backend
> > is provided and necessary check mechanism is added to ensure the dynamic
> > capacity accessed is backed with active dc extents.
> > Currently FM related mailbox commands (cxl spec 3.0: 7.6.7.6) is not supported
> > , but we add two qmp interfaces for adding/releasing dynamic capacity extents.
> > Also, the support for multiple hosts sharing the same DCD case is missing.
> > 
> > Things we can try with the patch series together with kernel dcd code:
> > 1. Create DC regions to cover the address range of the dynamic capacity
> > regions.
> > 2. Add/release dynamic capacity extents to the device and notify the
> > kernel.
> > 3. Test kernel side code to accept added dc extents and create dax devices,
> > and release dc extents and notify the device
> > 4. Online the memory range backed with dc extents and let application use
> > them.
> > 
> > The patch series is based on Jonathan's local qemu branch:
> > https://gitlab.com/jic23/qemu/-/tree/cxl-2023-02-28
> > 
> > Simple tests performed with the patch series:
> > 1 Install cxl modules:
> > 
> > modprobe -a cxl_acpi cxl_core cxl_pci cxl_port cxl_mem
> > 
> > 2 Create dc regions:
> > 
> > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> > echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
> > echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> > echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> > echo "dc" >/sys/bus/cxl/devices/decoder2.0/mode
> > echo 0x10000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
> > echo 0x10000000 > /sys/bus/cxl/devices/$region/size
> > echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> > echo 1 > /sys/bus/cxl/devices/$region/commit
> > echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> > 
> > /home/fan/cxl/tools-and-scripts# cxl list
> > [
> >   {
> >     "memdevs":[
> >       {
> >         "memdev":"mem0",
> >         "pmem_size":536870912,
> >         "ram_size":0,
> >         "serial":0,
> >         "host":"0000:0d:00.0"
> >       }
> >     ]
> >   },
> >   {
> >     "regions":[
> >       {
> >         "region":"region0",
> >         "resource":45365592064,
> >         "size":268435456,
> >         "interleave_ways":1,
> >         "interleave_granularity":256,
> >         "decode_state":"commit"
> >       }
> >     ]
> >   }
> > ]
> > 
> > 3 Add two dc extents (128MB each) through qmp interface
> > 
> > { "execute": "qmp_capabilities" }
> > 
> > { "execute": "cxl-add-dynamic-capacity-event",
> > 	"arguments": {
> > 		 "path": "/machine/peripheral/cxl-pmem0",
> > 		"region-id" : 0,
> > 		 "num-extent": 2,
> > 		"dpa":0,
> > 		"extent-len": 128
> > 	}
> > }
> > 
> > /home/fan/cxl/tools-and-scripts# lsmem
> > RANGE                                  SIZE   STATE REMOVABLE   BLOCK
> > 0x0000000000000000-0x000000007fffffff    2G  online       yes    0-15
> > 0x0000000100000000-0x000000027fffffff    6G  online       yes   32-79
> > 0x0000000a90000000-0x0000000a9fffffff  256M offline           338-339
> > 
> > Memory block size:       128M
> > Total online memory:       8G
> > Total offline memory:    256M
> > 
> > 
> > 4. Online the memory with 'daxctl online-memory dax0.0'
> > 
> > /home/fan/cxl/ndctl# ./build/daxctl/daxctl online-memory dax0.0
> > [  230.730553] Fallback order for Node 0: 0 1
> > [  230.730825] Fallback order for Node 1: 1 0
> > [  230.730953] Built 2 zonelists, mobility grouping on.  Total pages: 2042541
> > [  230.731110] Policy zone: Normal
> > onlined memory for 1 device
> > 
> > root@bgt-140510-bm03:/home/fan/cxl/ndctl# lsmem
> > RANGE                                  SIZE   STATE REMOVABLE BLOCK
> > 0x0000000000000000-0x000000007fffffff    2G  online       yes  0-15
> > 0x0000000100000000-0x000000027fffffff    6G  online       yes 32-79
> > 0x0000000a90000000-0x0000000a97ffffff  128M  online       yes   338
> > 0x0000000a98000000-0x0000000a9fffffff  128M offline             339
> > 
> > Memory block size:       128M
> > Total online memory:     8.1G
> > Total offline memory:    128M
> > 
> > 5 using dc extents as regular memory
> > 
> > /home/fan/cxl/ndctl# numactl --membind=1 ls
> > CONTRIBUTING.md  README.md  clean_config.sh  cscope.out   git-version-gen
> > ndctl	       scripts	test.h      version.h.in COPYING		 acpi.h
> > config.h.meson   cxl	  make-git-snapshot.sh	ndctl.spec.in  sles	tools
> > Documentation	 build	    contrib	     daxctl	  meson.build		rhel
> > tags	topology.png LICENSES	 ccan	    cscope.files
> > git-version  meson_options.txt	rpmbuild.sh    test	util
> > 
> > 
> > QEMU command line cxl configuration:
> > 
> > RP1="-object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=512M \
> > -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=512M \
> > -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=512M \
> > -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> > -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> > -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,dc-memdev=cxl-mem2,id=cxl-pmem0,num-dc-regions=1\
> > -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k"
> > 
> > 
> > Kernel DCD support used to test the changes
> > 
> > The code is tested with the posted kernel dcd support:
> > https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.5/dcd-preview
> > 
> 
> Very nice!  +CC Navneet who may want to comment on the below (and the
> emulation as well)
> 
> I've not had a chance to look at the code on the kernel side yet.

Thanks, Jonathan, for all the comments on the series; I will reflect them
in the next version.

Fan

> 
> 
> > commit: f425bc34c600e2a3721d6560202962ec41622815
> > 
> > To make the test work, we have made the following changes to the above kernel commit:
> > 
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 5f04bbc18af5..5f421d3c5cef 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -68,6 +68,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> >  	CXL_CMD(SCAN_MEDIA, 0x11, 0, 0),
> >  	CXL_CMD(GET_SCAN_MEDIA, 0, CXL_VARIABLE_PAYLOAD, 0),
> >  	CXL_CMD(GET_DC_EXTENT_LIST, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> > +	CXL_CMD(GET_DC_CONFIG, 0x2, CXL_VARIABLE_PAYLOAD, 0),
> >  };
> >  
> >  /*
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 291c716abd49..ae10e3cf43a1 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -194,7 +194,7 @@ static int cxl_region_manage_dc(struct cxl_region *cxlr)
> >  		}
> >  		cxlds->dc_list_gen_num = extent_gen_num;
> >  		dev_dbg(cxlds->dev, "No of preallocated extents :%d\n", rc);
> > -		enable_irq(cxlds->cxl_irq[CXL_EVENT_TYPE_DCD]);
> > +		/*enable_irq(cxlds->cxl_irq[CXL_EVENT_TYPE_DCD]);*/
> 
> Some race condition that means we need to enable the DCD event earlier?
> 
> >  	}
> >  	return 0;
> >  err:
> > @@ -2810,7 +2810,8 @@ int cxl_add_dc_extent(struct cxl_dev_state *cxlds, struct resource *alloc_dpa_re
> >  				dev_dax->align, memremap_compat_align()))) {
> >  		rc = alloc_dev_dax_range(dev_dax, hpa,
> >  					resource_size(alloc_dpa_res));
> > -		return rc;
> > +		if (rc)
> > +			return rc;
> 
> No idea on this one as it's in the code I haven't looked at yet!
> 
> >  	}
> >  
> >  	rc = xa_insert(&cxlr_dc->dax_dev_list, hpa, dev_dax, GFP_KERNEL);
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 9e45b1056022..653bec203838 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -659,7 +659,7 @@ static int cxl_event_irqsetup(struct cxl_dev_state *cxlds)
> >  
> >  	/* Driver enables DCD interrupt after creating the dc cxl_region */
> >  	rc = cxl_event_req_irq(cxlds, policy.dyncap_settings, CXL_EVENT_TYPE_DCD,
> > -					IRQF_SHARED | IRQF_ONESHOT | IRQF_NO_AUTOEN);
> > +					IRQF_SHARED | IRQF_ONESHOT);
> 
> This will be the other side of the removal of the enable above.
> 
> >  	if (rc) {
> >  		dev_err(cxlds->dev, "Failed to get interrupt for event dc log\n");
> >  		return rc;
> > diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> > index 6ca85861750c..910a48259239 100644
> > --- a/include/uapi/linux/cxl_mem.h
> > +++ b/include/uapi/linux/cxl_mem.h
> > @@ -47,6 +47,7 @@
> >  	___C(SCAN_MEDIA, "Scan Media"),                                   \
> >  	___C(GET_SCAN_MEDIA, "Get Scan Media Results"),                   \
> >  	___C(GET_DC_EXTENT_LIST, "Get dynamic capacity extents"),         \
> > +	___C(GET_DC_CONFIG, "Get dynamic capacity configuration"),         \
> >  	___C(MAX, "invalid / last command")
> >  
> >  #define ___C(a, b) CXL_MEM_COMMAND_ID_##a
> > 
> > 
> > 
> > Fan Ni (7):
> >   hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
> >     payload of identify memory device command
> >   hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
> >     and mailbox command support
> >   hw/mem/cxl_type3: Add a parameter to pass number of DC regions the
> >     device supports in qemu command line
> >   hw/mem/cxl_type3: Add DC extent representative to cxl type3 device
> >   hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
> >     dynamic capacity response
> >   Add qmp interfaces to add/release dynamic capacity extents
> >   hw/mem/cxl_type3: add read/write support to dynamic capacity
> > 
> >  hw/cxl/cxl-mailbox-utils.c  | 389 +++++++++++++++++++++++++++-
> >  hw/mem/cxl_type3.c          | 492 +++++++++++++++++++++++++++++++-----
> >  include/hw/cxl/cxl_device.h |  50 +++-
> >  include/hw/cxl/cxl_events.h |  16 ++
> >  qapi/cxl.json               |  44 ++++
> >  5 files changed, 924 insertions(+), 67 deletions(-)
> > 
> 

-- 
Fan Ni <nifan@outlook.com>
RE: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu
Posted by Singh, Navneet 11 months, 3 weeks ago

-----Original Message-----
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com> 
Sent: Monday, May 15, 2023 6:31 PM
To: Fan Ni <fan.ni@samsung.com>
Cc: qemu-devel@nongnu.org; linux-cxl@vger.kernel.org; gregory.price@memverge.com; hchkuo@avery-design.com.tw; Browy, Christopher <cbrowy@avery-design.com>; Weiny, Ira <ira.weiny@intel.com>; Williams, Dan J <dan.j.williams@intel.com>; Adam Manzanares <a.manzanares@samsung.com>; dave@stgolabs.net; nmtadam.samsung@gmail.com; nifan@outlook.com; Singh, Navneet <navneet.singh@intel.com>
Subject: Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu

On Thu, 11 May 2023 17:56:40 +0000
Fan Ni <fan.ni@samsung.com> wrote:

> Since the early draft of DCD support in kernel is out 
> (https://lore.kernel.org/linux-cxl/20230417164126.GA1904906@bgt-140510-bm03/T/#t), this patch series provide dcd emulation in qemu so people 
> who are interested can have an early try. It is noted that the patch 
> series may need to be updated accordingly if the kernel side 
> implementation changes.
> 
> To support DCD emulation, the patch series add DCD related mailbox 
> command support (CXL Spec 3.0: 8.2.9.8.9), and extend the cxl type3 
> memory device with dynamic capacity extent and region representative.
> To support read/write to the dynamic capacity of the device, a host 
> backend is provided and necessary check mechanism is added to ensure 
> the dynamic capacity accessed is backed with active dc extents.
> Currently FM related mailbox commands (cxl spec 3.0: 7.6.7.6) is not 
> supported , but we add two qmp interfaces for adding/releasing dynamic capacity extents.
> Also, the support for multiple hosts sharing the same DCD case is missing.
> 
> Things we can try with the patch series together with kernel dcd code:
> 1. Create DC regions to cover the address range of the dynamic 
> capacity regions.
> 2. Add/release dynamic capacity extents to the device and notify the 
> kernel.
> 3. Test kernel side code to accept added dc extents and create dax 
> devices, and release dc extents and notify the device
> 4. Online the memory range backed with dc extents and let application use them.
> 
> The patch series is based on Jonathan's local qemu branch:
> https://gitlab.com/jic23/qemu/-/tree/cxl-2023-02-28
> 
> Simple tests performed with the patch series:
> 1 Install cxl modules:
> 
> modprobe -a cxl_acpi cxl_core cxl_pci cxl_port cxl_mem
> 
> 2 Create dc regions:
> 
> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
> echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> echo "dc" >/sys/bus/cxl/devices/decoder2.0/mode
> echo 0x10000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
> echo 0x10000000 > /sys/bus/cxl/devices/$region/size
> echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> echo 1 > /sys/bus/cxl/devices/$region/commit
> echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> 
> /home/fan/cxl/tools-and-scripts# cxl list
> [
>   {
>     "memdevs":[
>       {
>         "memdev":"mem0",
>         "pmem_size":536870912,
>         "ram_size":0,
>         "serial":0,
>         "host":"0000:0d:00.0"
>       }
>     ]
>   },
>   {
>     "regions":[
>       {
>         "region":"region0",
>         "resource":45365592064,
>         "size":268435456,
>         "interleave_ways":1,
>         "interleave_granularity":256,
>         "decode_state":"commit"
>       }
>     ]
>   }
> ]
> 
> 3 Add two dc extents (128MB each) through qmp interface
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity-event",
> 	"arguments": {
> 		 "path": "/machine/peripheral/cxl-pmem0",
> 		"region-id" : 0,
> 		 "num-extent": 2,
> 		"dpa":0,
> 		"extent-len": 128
> 	}
> }
> 
> /home/fan/cxl/tools-and-scripts# lsmem
> RANGE                                  SIZE   STATE REMOVABLE   BLOCK
> 0x0000000000000000-0x000000007fffffff    2G  online       yes    0-15
> 0x0000000100000000-0x000000027fffffff    6G  online       yes   32-79
> 0x0000000a90000000-0x0000000a9fffffff  256M offline           338-339
> 
> Memory block size:       128M
> Total online memory:       8G
> Total offline memory:    256M
> 
> 
> 4. Online the memory with 'daxctl online-memory dax0.0'
> 
> /home/fan/cxl/ndctl# ./build/daxctl/daxctl online-memory dax0.0
> [  230.730553] Fallback order for Node 0: 0 1
> [  230.730825] Fallback order for Node 1: 1 0
> [  230.730953] Built 2 zonelists, mobility grouping on.  Total pages: 2042541
> [  230.731110] Policy zone: Normal
> onlined memory for 1 device
> 
> root@bgt-140510-bm03:/home/fan/cxl/ndctl# lsmem
> RANGE                                  SIZE   STATE REMOVABLE BLOCK
> 0x0000000000000000-0x000000007fffffff    2G  online       yes  0-15
> 0x0000000100000000-0x000000027fffffff    6G  online       yes 32-79
> 0x0000000a90000000-0x0000000a97ffffff  128M  online       yes   338
> 0x0000000a98000000-0x0000000a9fffffff  128M offline             339
> 
> Memory block size:       128M
> Total online memory:     8.1G
> Total offline memory:    128M
> 
> 5 using dc extents as regular memory
> 
> /home/fan/cxl/ndctl# numactl --membind=1 ls
> CONTRIBUTING.md  README.md  clean_config.sh  cscope.out   git-version-gen
> ndctl	       scripts	test.h      version.h.in COPYING		 acpi.h
> config.h.meson   cxl	  make-git-snapshot.sh	ndctl.spec.in  sles	tools
> Documentation	 build	    contrib	     daxctl	  meson.build		rhel
> tags	topology.png LICENSES	 ccan	    cscope.files
> git-version  meson_options.txt	rpmbuild.sh    test	util
> 
> 
> QEMU command line cxl configuration:
> 
> RP1="-object 
> memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,siz
> e=512M \ -object 
> memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,si
> ze=512M \ -object 
> memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=51
> 2M \ -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ -device 
> cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \ -device 
> cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,dc-memdev=cxl-m
> em2,id=cxl-pmem0,num-dc-regions=1\
> -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k"
> 
> 
> Kernel DCD support used to test the changes
> 
> The code is tested with the posted kernel dcd support:
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.5/dcd-preview
> 

Very nice!  +CC Navneet who may want to comment on the below (and the emulation as well)

I've not had a chance to look at the code on the kernel side yet.


> commit: f425bc34c600e2a3721d6560202962ec41622815
> 
> To make the test work, we have made the following changes to the above kernel commit:
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 5f04bbc18af5..5f421d3c5cef 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -68,6 +68,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>  	CXL_CMD(SCAN_MEDIA, 0x11, 0, 0),
>  	CXL_CMD(GET_SCAN_MEDIA, 0, CXL_VARIABLE_PAYLOAD, 0),
>  	CXL_CMD(GET_DC_EXTENT_LIST, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> +	CXL_CMD(GET_DC_CONFIG, 0x2, CXL_VARIABLE_PAYLOAD, 0),
>  };
>  
>  /*
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c 
> index 291c716abd49..ae10e3cf43a1 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -194,7 +194,7 @@ static int cxl_region_manage_dc(struct cxl_region *cxlr)
>  		}
>  		cxlds->dc_list_gen_num = extent_gen_num;
>  		dev_dbg(cxlds->dev, "No of preallocated extents :%d\n", rc);
> -		enable_irq(cxlds->cxl_irq[CXL_EVENT_TYPE_DCD]);
> +		/*enable_irq(cxlds->cxl_irq[CXL_EVENT_TYPE_DCD]);*/

Some race condition that means we need to enable the DCD event earlier?
Navneet - I have been working on the DCD feature for the last few weeks; this has been removed and will be handled like other events.
>  	}
>  	return 0;
>  err:
> @@ -2810,7 +2810,8 @@ int cxl_add_dc_extent(struct cxl_dev_state *cxlds, struct resource *alloc_dpa_re
>  				dev_dax->align, memremap_compat_align()))) {
>  		rc = alloc_dev_dax_range(dev_dax, hpa,
>  					resource_size(alloc_dpa_res));
> -		return rc;
> +		if (rc)
> +			return rc;

No idea on this one as it's in the code I haven't looked at yet!
Navneet - This is also fixed; the bug was introduced in some last-minute changes.

>  	}
>  
>  	rc = xa_insert(&cxlr_dc->dax_dev_list, hpa, dev_dax, GFP_KERNEL); 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 9e45b1056022..653bec203838 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -659,7 +659,7 @@ static int cxl_event_irqsetup(struct cxl_dev_state *cxlds)
>  
>  	/* Driver enables DCD interrupt after creating the dc cxl_region */
>  	rc = cxl_event_req_irq(cxlds, policy.dyncap_settings, CXL_EVENT_TYPE_DCD,
> -					IRQF_SHARED | IRQF_ONESHOT | IRQF_NO_AUTOEN);
> +					IRQF_SHARED | IRQF_ONESHOT);

This will be the other side of the removal of the enable above.

>  	if (rc) {
>  		dev_err(cxlds->dev, "Failed to get interrupt for event dc log\n");
>  		return rc;
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index 6ca85861750c..910a48259239 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -47,6 +47,7 @@
>  	___C(SCAN_MEDIA, "Scan Media"),                                   \
>  	___C(GET_SCAN_MEDIA, "Get Scan Media Results"),                   \
>  	___C(GET_DC_EXTENT_LIST, "Get dynamic capacity extents"),         \
> +	___C(GET_DC_CONFIG, "Get dynamic capacity configuration"),         \
>  	___C(MAX, "invalid / last command")
>  
>  #define ___C(a, b) CXL_MEM_COMMAND_ID_##a
> 
> 
> 
> Fan Ni (7):
>   hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
>     payload of identify memory device command
>   hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
>     and mailbox command support
>   hw/mem/cxl_type3: Add a parameter to pass number of DC regions the
>     device supports in qemu command line
>   hw/mem/cxl_type3: Add DC extent representative to cxl type3 device
>   hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
>     dynamic capacity response
>   Add qmp interfaces to add/release dynamic capacity extents
>   hw/mem/cxl_type3: add read/write support to dynamic capacity
> 
>  hw/cxl/cxl-mailbox-utils.c  | 389 +++++++++++++++++++++++++++-
>  hw/mem/cxl_type3.c          | 492 +++++++++++++++++++++++++++++++-----
>  include/hw/cxl/cxl_device.h |  50 +++-
>  include/hw/cxl/cxl_events.h |  16 ++
>  qapi/cxl.json               |  44 ++++
>  5 files changed, 924 insertions(+), 67 deletions(-)
> 
[RFC 1/7] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command
Posted by Fan Ni 12 months ago
From: Fan Ni <nifan@outlook.com>

Based on CXL spec 3.0 Table 8-94 (Identify Memory Device Output
Payload), dynamic capacity event log size should be part of
output of the Identify command.
Add dc_event_log_size to the output payload for the host to get the info.

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 9f8e6722d7..7ff4fbdf22 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -21,6 +21,8 @@
 #include "sysemu/hostmem.h"
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
+/* Experimental value: dynamic capacity event log size */
+#define CXL_DC_EVENT_LOG_SIZE 8
 
 /*
  * How to add a new command, example. The command set FOO, with cmd BAR.
@@ -519,8 +521,9 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd *cmd,
         uint16_t inject_poison_limit;
         uint8_t poison_caps;
         uint8_t qos_telemetry_caps;
+		uint16_t dc_event_log_size;
     } QEMU_PACKED *id;
-    QEMU_BUILD_BUG_ON(sizeof(*id) != 0x43);
+    QEMU_BUILD_BUG_ON(sizeof(*id) != 0x45);
 
     CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
     CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
@@ -543,6 +546,7 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd *cmd,
     st24_le_p(id->poison_list_max_mer, 256);
     /* No limit - so limited by main poison record limit */
     stw_le_p(&id->inject_poison_limit, 0);
+	stw_le_p(&id->dc_event_log_size, CXL_DC_EVENT_LOG_SIZE);
 
     *len = sizeof(*id);
     return CXL_MBOX_SUCCESS;
-- 
2.25.1
Re: [RFC 1/7] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command
Posted by Jonathan Cameron via 11 months, 3 weeks ago
On Thu, 11 May 2023 17:56:40 +0000
Fan Ni <fan.ni@samsung.com> wrote:

> From: Fan Ni <nifan@outlook.com>
> 
> Based on CXL spec 3.0 Table 8-94 (Identify Memory Device Output
> Payload), dynamic capacity event log size should be part of
> output of the Identify command.
> Add dc_event_log_size to the output payload for the host to get the info.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

Trivial formatting aside, looks good to me.

Jonathan


> ---
>  hw/cxl/cxl-mailbox-utils.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 9f8e6722d7..7ff4fbdf22 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -21,6 +21,8 @@
>  #include "sysemu/hostmem.h"
>  
>  #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
> +/* Experimental value: dynamic capacity event log size */
> +#define CXL_DC_EVENT_LOG_SIZE 8
>  
>  /*
>   * How to add a new command, example. The command set FOO, with cmd BAR.
> @@ -519,8 +521,9 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd *cmd,
>          uint16_t inject_poison_limit;
>          uint8_t poison_caps;
>          uint8_t qos_telemetry_caps;
> +		uint16_t dc_event_log_size;
QEMU uses 4-space indentation, not tabs.

>      } QEMU_PACKED *id;
> -    QEMU_BUILD_BUG_ON(sizeof(*id) != 0x43);
> +    QEMU_BUILD_BUG_ON(sizeof(*id) != 0x45);
>  
>      CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
>      CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
> @@ -543,6 +546,7 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd *cmd,
>      st24_le_p(id->poison_list_max_mer, 256);
>      /* No limit - so limited by main poison record limit */
>      stw_le_p(&id->inject_poison_limit, 0);
> +	stw_le_p(&id->dc_event_log_size, CXL_DC_EVENT_LOG_SIZE);
>  
>      *len = sizeof(*id);
>      return CXL_MBOX_SUCCESS;
[RFC 2/7] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
Posted by Fan Ni 12 months ago
From: Fan Ni <nifan@outlook.com>

Per cxl spec 3.0, add dynamic capacity region representative based on
Table 8-126 and extend the cxl type3 device definition to include dc region
information. Also, based on info in 8.2.9.8.9.1, add 'Get Dynamic Capacity
Configuration' mailbox support.

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 68 +++++++++++++++++++++++++++++++++++++
 include/hw/cxl/cxl_device.h | 16 +++++++++
 2 files changed, 84 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 7ff4fbdf22..61c77e52d8 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -81,6 +81,8 @@ enum {
         #define GET_POISON_LIST        0x0
         #define INJECT_POISON          0x1
         #define CLEAR_POISON           0x2
+	DCD_CONFIG = 0x48, /*8.2.9.8.9*/
+		#define GET_DC_REGION_CONFIG   0x0
     PHYSICAL_SWITCH = 0x51
         #define IDENTIFY_SWITCH_DEVICE      0x0
 };
@@ -935,6 +937,70 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * cxl spec 3.0: 8.2.9.8.9.2
+ * Get Dynamic Capacity Configuration
+ **/
+static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
+		CXLDeviceState *cxl_dstate,
+		uint16_t *len)
+{
+	struct get_dyn_cap_config_in_pl {
+		uint8_t region_cnt;
+		uint8_t start_region_id;
+	} QEMU_PACKED;
+
+    struct get_dyn_cap_config_out_pl {
+		uint8_t num_regions;
+		uint8_t rsvd1[7];
+		struct {
+			uint64_t base;
+			uint64_t decode_len;
+			uint64_t region_len;
+			uint64_t block_size;
+			uint32_t dsmadhandle;
+			uint8_t flags;
+			uint8_t rsvd2[3];
+		} QEMU_PACKED records[];
+	} QEMU_PACKED;
+
+	struct get_dyn_cap_config_in_pl *in = (void *)cmd->payload;
+	struct get_dyn_cap_config_out_pl *out = (void *)cmd->payload;
+	struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+	uint16_t record_count = 0, i = 0;
+	uint16_t out_pl_len;
+
+	if (in->start_region_id >= ct3d->dc.num_regions)
+		record_count = 0;
+	else if (ct3d->dc.num_regions - in->start_region_id < in->region_cnt)
+		record_count = ct3d->dc.num_regions - in->start_region_id;
+	else
+		record_count = in->region_cnt;
+
+	out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+	assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
+
+	memset(out, 0, out_pl_len);
+	out->num_regions = record_count;
+	for (; i < record_count; i++) {
+		stq_le_p(&out->records[i].base,
+			ct3d->dc.regions[in->start_region_id+i].base);
+		stq_le_p(&out->records[i].decode_len,
+			ct3d->dc.regions[in->start_region_id+i].decode_len);
+		stq_le_p(&out->records[i].region_len,
+			ct3d->dc.regions[in->start_region_id+i].len);
+		stq_le_p(&out->records[i].block_size,
+			ct3d->dc.regions[in->start_region_id+i].block_size);
+		stl_le_p(&out->records[i].dsmadhandle,
+			ct3d->dc.regions[in->start_region_id+i].dsmadhandle);
+		out->records[i].flags
+			= ct3d->dc.regions[in->start_region_id+i].flags;
+	}
+
+	*len = out_pl_len;
+	return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -973,6 +1039,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
         cmd_media_inject_poison, 8, 0 },
     [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
         cmd_media_clear_poison, 72, 0 },
+	[DCD_CONFIG][GET_DC_REGION_CONFIG] = { "DCD_GET_DC_REGION_CONFIG",
+		cmd_dcd_get_dyn_cap_config, 2, 0 },
 };
 
 static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index e285369693..8a04e53e90 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -383,6 +383,17 @@ typedef struct CXLPoison {
 typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
 #define CXL_POISON_LIST_LIMIT 256
 
+#define DCD_MAX_REGION_NUM 8
+
+typedef struct CXLDCD_Region {
+	uint64_t base;
+	uint64_t decode_len; /* in multiples of 256MB */
+	uint64_t len;
+	uint64_t block_size;
+	uint32_t dsmadhandle;
+	uint8_t flags;
+} CXLDCD_Region;
+
 struct CXLType3Dev {
     /* Private */
     PCIDevice parent_obj;
@@ -414,6 +425,11 @@ struct CXLType3Dev {
     unsigned int poison_list_cnt;
     bool poison_list_overflowed;
     uint64_t poison_list_overflow_ts;
+
+	struct dynamic_capacity {
+		uint8_t num_regions; // 1-8
+		struct CXLDCD_Region regions[DCD_MAX_REGION_NUM];
+	} dc;
 };
 
 #define TYPE_CXL_TYPE3 "cxl-type3"
-- 
2.25.1
Re: [RFC 2/7] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
Posted by Jonathan Cameron via 11 months, 3 weeks ago
On Thu, 11 May 2023 17:56:40 +0000
Fan Ni <fan.ni@samsung.com> wrote:

> From: Fan Ni <nifan@outlook.com>
> 
> Per cxl spec 3.0, add dynamic capacity region representative based on
> Table 8-126 and extend the cxl type3 device definition to include dc region
> information. Also, based on info in 8.2.9.8.9.1, add 'Get Dynamic Capacity
> Configuration' mailbox support.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 68 +++++++++++++++++++++++++++++++++++++
>  include/hw/cxl/cxl_device.h | 16 +++++++++
>  2 files changed, 84 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 7ff4fbdf22..61c77e52d8 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -81,6 +81,8 @@ enum {
>          #define GET_POISON_LIST        0x0
>          #define INJECT_POISON          0x1
>          #define CLEAR_POISON           0x2
> +	DCD_CONFIG = 0x48, /*8.2.9.8.9*/

Always include which spec version in references.  Stuff keeps moving.

> +		#define GET_DC_REGION_CONFIG   0x0

Called simply Dynamic Capacity Configuration in spec.  Sure it's
all regions today, but who knows in future so we should match
naming.  GET_DC_CONFIG should do.

>      PHYSICAL_SWITCH = 0x51
>          #define IDENTIFY_SWITCH_DEVICE      0x0
>  };
> @@ -935,6 +937,70 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
>      return CXL_MBOX_SUCCESS;
>  }
>  
> +/*
> + * cxl spec 3.0: 8.2.9.8.9.2
> + * Get Dynamic Capacity Configuration
> + **/
> +static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
> +		CXLDeviceState *cxl_dstate,
> +		uint16_t *len)
> +{
> +	struct get_dyn_cap_config_in_pl {
> +		uint8_t region_cnt;
> +		uint8_t start_region_id;
> +	} QEMU_PACKED;
> +
> +    struct get_dyn_cap_config_out_pl {
> +		uint8_t num_regions;
> +		uint8_t rsvd1[7];
> +		struct {
> +			uint64_t base;
> +			uint64_t decode_len;
> +			uint64_t region_len;
> +			uint64_t block_size;
> +			uint32_t dsmadhandle;
> +			uint8_t flags;
> +			uint8_t rsvd2[3];
> +		} QEMU_PACKED records[];
> +	} QEMU_PACKED;
> +
> +	struct get_dyn_cap_config_in_pl *in = (void *)cmd->payload;
> +	struct get_dyn_cap_config_out_pl *out = (void *)cmd->payload;
> +	struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> +	uint16_t record_count = 0, i = 0;
> +	uint16_t out_pl_len;
> +
> +	if (in->start_region_id >= ct3d->dc.num_regions)
> +		record_count = 0;

Probably an error return rather than 0 records. Invalid input seems most appropriate.
My expectation, though the text doesn't call it out, is that the first issued command is
for 0 records just to get the Number of Available Regions, and then the rest of the entries
will be the right size.  A similar case for Get Supported Features calls out
"The device shall return Invalid Input if Starting Feature Index is greater than the
Device Supported Features value"

> +	else if (ct3d->dc.num_regions - in->start_region_id < in->region_cnt)
> +		record_count = ct3d->dc.num_regions - in->start_region_id;
> +	else
> +		record_count = in->region_cnt;

Do the last two conditions with min() ?
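
A minimal sketch of both points, assuming the existing CXL_MBOX_INVALID_INPUT
return code and QEMU's MIN() macro:

    /* Sketch only: reject an out-of-range starting region id, then
     * clamp the record count with MIN() instead of the if/else chain. */
    if (in->start_region_id >= ct3d->dc.num_regions) {
        return CXL_MBOX_INVALID_INPUT;
    }
    record_count = MIN(in->region_cnt,
                       ct3d->dc.num_regions - in->start_region_id);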

> +
> +	out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> +	assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> +
> +	memset(out, 0, out_pl_len);
> +	out->num_regions = record_count;
> +	for (; i < record_count; i++) {

i = 0 here makes more sense to me.

> +		stq_le_p(&out->records[i].base,
> +			ct3d->dc.regions[in->start_region_id+i].base);

Spaces around +

> +		stq_le_p(&out->records[i].decode_len,
> +			ct3d->dc.regions[in->start_region_id+i].decode_len);
> +		stq_le_p(&out->records[i].region_len,
> +			ct3d->dc.regions[in->start_region_id+i].len);
> +		stq_le_p(&out->records[i].block_size,
> +			ct3d->dc.regions[in->start_region_id+i].block_size);
> +		stl_le_p(&out->records[i].dsmadhandle,
> +			ct3d->dc.regions[in->start_region_id+i].dsmadhandle);
> +		out->records[i].flags
> +			= ct3d->dc.regions[in->start_region_id+i].flags;
> +	}
> +
> +	*len = out_pl_len;
> +	return CXL_MBOX_SUCCESS;
> +}
> +
>  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>  #define IMMEDIATE_DATA_CHANGE (1 << 2)
>  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -973,6 +1039,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
>          cmd_media_inject_poison, 8, 0 },
>      [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
>          cmd_media_clear_poison, 72, 0 },
> +	[DCD_CONFIG][GET_DC_REGION_CONFIG] = { "DCD_GET_DC_REGION_CONFIG",
> +		cmd_dcd_get_dyn_cap_config, 2, 0 },
>  };
>  
>  static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index e285369693..8a04e53e90 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -383,6 +383,17 @@ typedef struct CXLPoison {
>  typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
>  #define CXL_POISON_LIST_LIMIT 256
>  
> +#define DCD_MAX_REGION_NUM 8
> +
> +typedef struct CXLDCD_Region {
> +	uint64_t base;
> +	uint64_t decode_len; /* in multiples of 256MB */
> +	uint64_t len;
> +	uint64_t block_size;
> +	uint32_t dsmadhandle;
> +	uint8_t flags;
> +} CXLDCD_Region;
> +
>  struct CXLType3Dev {
>      /* Private */
>      PCIDevice parent_obj;
> @@ -414,6 +425,11 @@ struct CXLType3Dev {
>      unsigned int poison_list_cnt;
>      bool poison_list_overflowed;
>      uint64_t poison_list_overflow_ts;
> +
> +	struct dynamic_capacity {
> +		uint8_t num_regions; // 1-8
Or none if it's not present :)

> +		struct CXLDCD_Region regions[DCD_MAX_REGION_NUM];
> +	} dc;
>  };
>  
>  #define TYPE_CXL_TYPE3 "cxl-type3"
Re: [RFC 2/7] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
Posted by Nathan Fontenot 11 months, 4 weeks ago
On 5/11/23 12:56, Fan Ni wrote:
> From: Fan Ni <nifan@outlook.com>
> 
> Per cxl spec 3.0, add dynamic capacity region representative based on
> Table 8-126 and extend the cxl type3 device definition to include dc region
> information. Also, based on info in 8.2.9.8.9.1, add 'Get Dynamic Capacity
> Configuration' mailbox support.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 68 +++++++++++++++++++++++++++++++++++++
>  include/hw/cxl/cxl_device.h | 16 +++++++++
>  2 files changed, 84 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 7ff4fbdf22..61c77e52d8 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -81,6 +81,8 @@ enum {
>          #define GET_POISON_LIST        0x0
>          #define INJECT_POISON          0x1
>          #define CLEAR_POISON           0x2
> +	DCD_CONFIG = 0x48, /*8.2.9.8.9*/
> +		#define GET_DC_REGION_CONFIG   0x0
>      PHYSICAL_SWITCH = 0x51
>          #define IDENTIFY_SWITCH_DEVICE      0x0
>  };
> @@ -935,6 +937,70 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
>      return CXL_MBOX_SUCCESS;
>  }
>  
> +/*
> + * cxl spec 3.0: 8.2.9.8.9.2
> + * Get Dynamic Capacity Configuration
> + **/
> +static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
> +		CXLDeviceState *cxl_dstate,
> +		uint16_t *len)
> +{
> +	struct get_dyn_cap_config_in_pl {
> +		uint8_t region_cnt;
> +		uint8_t start_region_id;
> +	} QEMU_PACKED;
> +
> +    struct get_dyn_cap_config_out_pl {
> +		uint8_t num_regions;
> +		uint8_t rsvd1[7];
> +		struct {
> +			uint64_t base;
> +			uint64_t decode_len;
> +			uint64_t region_len;
> +			uint64_t block_size;
> +			uint32_t dsmadhandle;
> +			uint8_t flags;
> +			uint8_t rsvd2[3];
> +		} QEMU_PACKED records[];

Could you declare CXLDCD_Region as QEMU_PACKED and use it here instead of
re-defining the region structure?

> +	} QEMU_PACKED;
> +
> +	struct get_dyn_cap_config_in_pl *in = (void *)cmd->payload;
> +	struct get_dyn_cap_config_out_pl *out = (void *)cmd->payload;
> +	struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> +	uint16_t record_count = 0, i = 0;
> +	uint16_t out_pl_len;
> +
> +	if (in->start_region_id >= ct3d->dc.num_regions)
> +		record_count = 0;
> +	else if (ct3d->dc.num_regions - in->start_region_id < in->region_cnt)
> +		record_count = ct3d->dc.num_regions - in->start_region_id;
> +	else
> +		record_count = in->region_cnt;
> +
> +	out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> +	assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> +
> +	memset(out, 0, out_pl_len);
> +	out->num_regions = record_count;
> +	for (; i < record_count; i++) {
> +		stq_le_p(&out->records[i].base,
> +			ct3d->dc.regions[in->start_region_id+i].base);
> +		stq_le_p(&out->records[i].decode_len,
> +			ct3d->dc.regions[in->start_region_id+i].decode_len);
> +		stq_le_p(&out->records[i].region_len,
> +			ct3d->dc.regions[in->start_region_id+i].len);
> +		stq_le_p(&out->records[i].block_size,
> +			ct3d->dc.regions[in->start_region_id+i].block_size);
> +		stl_le_p(&out->records[i].dsmadhandle,
> +			ct3d->dc.regions[in->start_region_id+i].dsmadhandle);
> +		out->records[i].flags
> +			= ct3d->dc.regions[in->start_region_id+i].flags;

In this loop you're reading from 'in' and writing to 'out' where in and out both
point to the same payload buffer. It works because of the structure layouts but
feels like a bug waiting to happen. Perhaps saving start_region to a local variable
and using that for the loop?

-Nathan
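
A minimal sketch of that suggestion, with 'start_region_id' as a hypothetical
local captured before the memset() that reuses the shared payload buffer:

    /* Sketch only: snapshot the input field before memset() zeroes the
     * payload buffer that both 'in' and 'out' point at. */
    uint8_t start_region_id = in->start_region_id;

    memset(out, 0, out_pl_len);
    out->num_regions = record_count;
    for (i = 0; i < record_count; i++) {
        stq_le_p(&out->records[i].base,
                 ct3d->dc.regions[start_region_id + i].base);
        /* ...and likewise for decode_len, region_len, block_size,
         * dsmadhandle and flags. */
    }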

> +	}
> +
> +	*len = out_pl_len;
> +	return CXL_MBOX_SUCCESS;
> +}
> +
>  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>  #define IMMEDIATE_DATA_CHANGE (1 << 2)
>  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -973,6 +1039,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
>          cmd_media_inject_poison, 8, 0 },
>      [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
>          cmd_media_clear_poison, 72, 0 },
> +	[DCD_CONFIG][GET_DC_REGION_CONFIG] = { "DCD_GET_DC_REGION_CONFIG",
> +		cmd_dcd_get_dyn_cap_config, 2, 0 },
>  };
>  
>  static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index e285369693..8a04e53e90 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -383,6 +383,17 @@ typedef struct CXLPoison {
>  typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
>  #define CXL_POISON_LIST_LIMIT 256
>  
> +#define DCD_MAX_REGION_NUM 8
> +
> +typedef struct CXLDCD_Region {
> +	uint64_t base;
> +	uint64_t decode_len; /* in multiples of 256MB */
> +	uint64_t len;
> +	uint64_t block_size;
> +	uint32_t dsmadhandle;
> +	uint8_t flags;
> +} CXLDCD_Region;
> +
>  struct CXLType3Dev {
>      /* Private */
>      PCIDevice parent_obj;
> @@ -414,6 +425,11 @@ struct CXLType3Dev {
>      unsigned int poison_list_cnt;
>      bool poison_list_overflowed;
>      uint64_t poison_list_overflow_ts;
> +
> +	struct dynamic_capacity {
> +		uint8_t num_regions; // 1-8
> +		struct CXLDCD_Region regions[DCD_MAX_REGION_NUM];
> +	} dc;
>  };
>  
>  #define TYPE_CXL_TYPE3 "cxl-type3"
Re: [RFC 2/7] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
Posted by Jonathan Cameron via 11 months, 3 weeks ago
On Thu, 11 May 2023 16:53:23 -0500
Nathan Fontenot <nafonten@amd.com> wrote:

> On 5/11/23 12:56, Fan Ni wrote:
> > From: Fan Ni <nifan@outlook.com>
> > 
> > Per cxl spec 3.0, add dynamic capacity region representative based on
> > Table 8-126 and extend the cxl type3 device definition to include dc region
> > information. Also, based on info in 8.2.9.8.9.1, add 'Get Dynamic Capacity
> > Configuration' mailbox support.
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  | 68 +++++++++++++++++++++++++++++++++++++
> >  include/hw/cxl/cxl_device.h | 16 +++++++++
> >  2 files changed, 84 insertions(+)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index 7ff4fbdf22..61c77e52d8 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -81,6 +81,8 @@ enum {
> >          #define GET_POISON_LIST        0x0
> >          #define INJECT_POISON          0x1
> >          #define CLEAR_POISON           0x2
> > +	DCD_CONFIG = 0x48, /*8.2.9.8.9*/
> > +		#define GET_DC_REGION_CONFIG   0x0
> >      PHYSICAL_SWITCH = 0x51
> >          #define IDENTIFY_SWITCH_DEVICE      0x0
> >  };
> > @@ -935,6 +937,70 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
> >      return CXL_MBOX_SUCCESS;
> >  }
> >  
> > +/*
> > + * cxl spec 3.0: 8.2.9.8.9.2
> > + * Get Dynamic Capacity Configuration
> > + **/
> > +static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
> > +		CXLDeviceState *cxl_dstate,
> > +		uint16_t *len)
> > +{
> > +	struct get_dyn_cap_config_in_pl {
> > +		uint8_t region_cnt;
> > +		uint8_t start_region_id;
> > +	} QEMU_PACKED;
> > +
> > +    struct get_dyn_cap_config_out_pl {
> > +		uint8_t num_regions;
> > +		uint8_t rsvd1[7];
> > +		struct {
> > +			uint64_t base;
> > +			uint64_t decode_len;
> > +			uint64_t region_len;
> > +			uint64_t block_size;
> > +			uint32_t dsmadhandle;
> > +			uint8_t flags;
> > +			uint8_t rsvd2[3];
> > +		} QEMU_PACKED records[];  
> 
> Could you declare CXLDCD_Region as QEMU_PACKED and use it here instead of
> re-defining the region structure?

Could be done, but care needed on the endian conversions.  I wouldn't
mind seeing this always held as little endian state though. We'd have
done that anyway if it was a memory mapped command set rather than read
via the mailbox, so there is plenty of precedent.

Jonathan
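
For illustration, a minimal sketch of that shape (the rsvd[3] padding is an
assumption added so the shared struct matches the 40-byte on-wire record;
endian handling of the stored state is left as discussed above):

/* Sketch only: one packed region record reused for both the device
 * state and the mailbox output payload's flexible array. */
typedef struct CXLDCD_Region {
    uint64_t base;
    uint64_t decode_len; /* in multiples of 256MB */
    uint64_t len;
    uint64_t block_size;
    uint32_t dsmadhandle;
    uint8_t flags;
    uint8_t rsvd[3];
} QEMU_PACKED CXLDCD_Region;

struct get_dyn_cap_config_out_pl {
    uint8_t num_regions;
    uint8_t rsvd1[7];
    CXLDCD_Region records[];
} QEMU_PACKED;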

> 
> > +	} QEMU_PACKED;
> > +
> > +	struct get_dyn_cap_config_in_pl *in = (void *)cmd->payload;
> > +	struct get_dyn_cap_config_out_pl *out = (void *)cmd->payload;
> > +	struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> > +	uint16_t record_count = 0, i = 0;
> > +	uint16_t out_pl_len;
> > +
> > +	if (in->start_region_id >= ct3d->dc.num_regions)
> > +		record_count = 0;
> > +	else if (ct3d->dc.num_regions - in->start_region_id < in->region_cnt)
> > +		record_count = ct3d->dc.num_regions - in->start_region_id;
> > +	else
> > +		record_count = in->region_cnt;
> > +
> > +	out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> > +	assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> > +
> > +	memset(out, 0, out_pl_len);
> > +	out->num_regions = record_count;
> > +	for (; i < record_count; i++) {
> > +		stq_le_p(&out->records[i].base,
> > +			ct3d->dc.regions[in->start_region_id+i].base);
> > +		stq_le_p(&out->records[i].decode_len,
> > +			ct3d->dc.regions[in->start_region_id+i].decode_len);
> > +		stq_le_p(&out->records[i].region_len,
> > +			ct3d->dc.regions[in->start_region_id+i].len);
> > +		stq_le_p(&out->records[i].block_size,
> > +			ct3d->dc.regions[in->start_region_id+i].block_size);
> > +		stl_le_p(&out->records[i].dsmadhandle,
> > +			ct3d->dc.regions[in->start_region_id+i].dsmadhandle);
> > +		out->records[i].flags
> > +			= ct3d->dc.regions[in->start_region_id+i].flags;  
> 
> In this loop your reading from 'in' and writing to 'out' where in and out both
> point to the same payload buffer. It works because of the structure layouts but
> feels like a bug waiting to happen. Perhaps saving start_region to a local variable
> and using that for the loop?

Does it work?  There is a memset of out above.
Definitely need a local copy of start_region_id before that.
This might only be working because of good fortune / compilers being 'clever'.

Jonathan


> 
> -Nathan
> 
> > +	}
> > +
> > +	*len = out_pl_len;
> > +	return CXL_MBOX_SUCCESS;
> > +}
> > +
> >  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
> >  #define IMMEDIATE_DATA_CHANGE (1 << 2)
> >  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> > @@ -973,6 +1039,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
> >          cmd_media_inject_poison, 8, 0 },
> >      [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
> >          cmd_media_clear_poison, 72, 0 },
> > +	[DCD_CONFIG][GET_DC_REGION_CONFIG] = { "DCD_GET_DC_REGION_CONFIG",
> > +		cmd_dcd_get_dyn_cap_config, 2, 0 },
> >  };
> >  
> >  static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index e285369693..8a04e53e90 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -383,6 +383,17 @@ typedef struct CXLPoison {
> >  typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
> >  #define CXL_POISON_LIST_LIMIT 256
> >  
> > +#define DCD_MAX_REGION_NUM 8
> > +
> > +typedef struct CXLDCD_Region {
> > +	uint64_t base;
> > +	uint64_t decode_len; /* in multiples of 256MB */
> > +	uint64_t len;
> > +	uint64_t block_size;
> > +	uint32_t dsmadhandle;
> > +	uint8_t flags;
> > +} CXLDCD_Region;
> > +
> >  struct CXLType3Dev {
> >      /* Private */
> >      PCIDevice parent_obj;
> > @@ -414,6 +425,11 @@ struct CXLType3Dev {
> >      unsigned int poison_list_cnt;
> >      bool poison_list_overflowed;
> >      uint64_t poison_list_overflow_ts;
> > +
> > +	struct dynamic_capacity {
> > +		uint8_t num_regions; // 1-8
> > +		struct CXLDCD_Region regions[DCD_MAX_REGION_NUM];
> > +	} dc;
> >  };
> >  
> >  #define TYPE_CXL_TYPE3 "cxl-type3"
Re: [RFC 2/7] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
Posted by nifan@outlook.com 10 months, 1 week ago
The 05/15/2023 14:58, Jonathan Cameron wrote:
> On Thu, 11 May 2023 16:53:23 -0500
> Nathan Fontenot <nafonten@amd.com> wrote:
> 
> > On 5/11/23 12:56, Fan Ni wrote:
> > > From: Fan Ni <nifan@outlook.com>
> > > 
> > > Per cxl spec 3.0, add dynamic capacity region representative based on
> > > Table 8-126 and extend the cxl type3 device definition to include dc region
> > > information. Also, based on info in 8.2.9.8.9.1, add 'Get Dynamic Capacity
> > > Configuration' mailbox support.
> > > 
> > > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > > ---
> > >  hw/cxl/cxl-mailbox-utils.c  | 68 +++++++++++++++++++++++++++++++++++++
> > >  include/hw/cxl/cxl_device.h | 16 +++++++++
> > >  2 files changed, 84 insertions(+)
> > > 
> > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > index 7ff4fbdf22..61c77e52d8 100644
> > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > @@ -81,6 +81,8 @@ enum {
> > >          #define GET_POISON_LIST        0x0
> > >          #define INJECT_POISON          0x1
> > >          #define CLEAR_POISON           0x2
> > > +	DCD_CONFIG = 0x48, /*8.2.9.8.9*/
> > > +		#define GET_DC_REGION_CONFIG   0x0
> > >      PHYSICAL_SWITCH = 0x51
> > >          #define IDENTIFY_SWITCH_DEVICE      0x0
> > >  };
> > > @@ -935,6 +937,70 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
> > >      return CXL_MBOX_SUCCESS;
> > >  }
> > >  
> > > +/*
> > > + * cxl spec 3.0: 8.2.9.8.9.2
> > > + * Get Dynamic Capacity Configuration
> > > + **/
> > > +static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
> > > +		CXLDeviceState *cxl_dstate,
> > > +		uint16_t *len)
> > > +{
> > > +	struct get_dyn_cap_config_in_pl {
> > > +		uint8_t region_cnt;
> > > +		uint8_t start_region_id;
> > > +	} QEMU_PACKED;
> > > +
> > > +    struct get_dyn_cap_config_out_pl {
> > > +		uint8_t num_regions;
> > > +		uint8_t rsvd1[7];
> > > +		struct {
> > > +			uint64_t base;
> > > +			uint64_t decode_len;
> > > +			uint64_t region_len;
> > > +			uint64_t block_size;
> > > +			uint32_t dsmadhandle;
> > > +			uint8_t flags;
> > > +			uint8_t rsvd2[3];
> > > +		} QEMU_PACKED records[];  
> > 
> > Could you declare CXLDCD_Region as QEMU_PACKED and use it here instead of
> > re-defining the region structure?
> 
> Could be done, but care needed on the endian conversions.  I wouldn't
> mind seeing this always held as little endian state though. We'd have
> done that anyway if it was a memory mapped command set rather than read
> via the mailbox so there is plenty of precedence.
> 
> Jonathan

I will leave it as it is for now.
> 
> > 
> > > +	} QEMU_PACKED;
> > > +
> > > +	struct get_dyn_cap_config_in_pl *in = (void *)cmd->payload;
> > > +	struct get_dyn_cap_config_out_pl *out = (void *)cmd->payload;
> > > +	struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> > > +	uint16_t record_count = 0, i = 0;
> > > +	uint16_t out_pl_len;
> > > +
> > > +	if (in->start_region_id >= ct3d->dc.num_regions)
> > > +		record_count = 0;
> > > +	else if (ct3d->dc.num_regions - in->start_region_id < in->region_cnt)
> > > +		record_count = ct3d->dc.num_regions - in->start_region_id;
> > > +	else
> > > +		record_count = in->region_cnt;
> > > +
> > > +	out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> > > +	assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> > > +
> > > +	memset(out, 0, out_pl_len);
> > > +	out->num_regions = record_count;
> > > +	for (; i < record_count; i++) {
> > > +		stq_le_p(&out->records[i].base,
> > > +			ct3d->dc.regions[in->start_region_id+i].base);
> > > +		stq_le_p(&out->records[i].decode_len,
> > > +			ct3d->dc.regions[in->start_region_id+i].decode_len);
> > > +		stq_le_p(&out->records[i].region_len,
> > > +			ct3d->dc.regions[in->start_region_id+i].len);
> > > +		stq_le_p(&out->records[i].block_size,
> > > +			ct3d->dc.regions[in->start_region_id+i].block_size);
> > > +		stl_le_p(&out->records[i].dsmadhandle,
> > > +			ct3d->dc.regions[in->start_region_id+i].dsmadhandle);
> > > +		out->records[i].flags
> > > +			= ct3d->dc.regions[in->start_region_id+i].flags;  
> > 
> > In this loop your reading from 'in' and writing to 'out' where in and out both
> > point to the same payload buffer. It works because of the structure layouts but
> > feels like a bug waiting to happen. Perhaps saving start_region to a local variable
> > and using that for the loop?
> 
> Does it work?  There is a memset of out above.
> Definitely need a local copy of start_region_id before that.
> This might only be working because of good fortune / compilers being 'clever'.
> 
> Jonathan

Yes. We need a local variable to record the start_region_id.

Thanks Nathan for catching the issue.

Fan
> 
> 
> > 
> > -Nathan
> > 
> > > +	}
> > > +
> > > +	*len = out_pl_len;
> > > +	return CXL_MBOX_SUCCESS;
> > > +}
> > > +
> > >  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
> > >  #define IMMEDIATE_DATA_CHANGE (1 << 2)
> > >  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> > > @@ -973,6 +1039,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
> > >          cmd_media_inject_poison, 8, 0 },
> > >      [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
> > >          cmd_media_clear_poison, 72, 0 },
> > > +	[DCD_CONFIG][GET_DC_REGION_CONFIG] = { "DCD_GET_DC_REGION_CONFIG",
> > > +		cmd_dcd_get_dyn_cap_config, 2, 0 },
> > >  };
> > >  
> > >  static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> > > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > > index e285369693..8a04e53e90 100644
> > > --- a/include/hw/cxl/cxl_device.h
> > > +++ b/include/hw/cxl/cxl_device.h
> > > @@ -383,6 +383,17 @@ typedef struct CXLPoison {
> > >  typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
> > >  #define CXL_POISON_LIST_LIMIT 256
> > >  
> > > +#define DCD_MAX_REGION_NUM 8
> > > +
> > > +typedef struct CXLDCD_Region {
> > > +	uint64_t base;
> > > +	uint64_t decode_len; /* in multiples of 256MB */
> > > +	uint64_t len;
> > > +	uint64_t block_size;
> > > +	uint32_t dsmadhandle;
> > > +	uint8_t flags;
> > > +} CXLDCD_Region;
> > > +
> > >  struct CXLType3Dev {
> > >      /* Private */
> > >      PCIDevice parent_obj;
> > > @@ -414,6 +425,11 @@ struct CXLType3Dev {
> > >      unsigned int poison_list_cnt;
> > >      bool poison_list_overflowed;
> > >      uint64_t poison_list_overflow_ts;
> > > +
> > > +	struct dynamic_capacity {
> > > +		uint8_t num_regions; // 1-8
> > > +		struct CXLDCD_Region regions[DCD_MAX_REGION_NUM];
> > > +	} dc;
> > >  };
> > >  
> > >  #define TYPE_CXL_TYPE3 "cxl-type3"  
> 

-- 
Fan Ni <nifan@outlook.com>
[RFC 3/7] hw/mem/cxl_type3: Add a parameter to pass number of DC regions the device supports in qemu command line
Posted by Fan Ni 12 months ago
From: Fan Ni <nifan@outlook.com>

Add a property 'num-dc-regions' to ct3_props to allow users to create DC
regions.
With the change, users can control the number of DC regions the device
supports.
To make it easier, other parameters of the region like region base, length,
and block size are hard coded. If desired, these parameters
can be added easily.

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/mem/cxl_type3.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 2b483d3d8e..b9c375d9b4 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -684,6 +684,34 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
     }
 }
 
+/*
+ * Create a dc region to test "Get Dynamic Capacity Configuration" command.
+ */
+static int cxl_create_toy_regions(CXLType3Dev *ct3d)
+{
+	int i;
+	uint64_t region_base = ct3d->hostvmem?ct3d->hostvmem->size
+		+ ct3d->hostpmem->size:ct3d->hostpmem->size;
+	uint64_t region_len = 1024*1024*1024;
+	uint64_t decode_len = 4; /* 4*256MB */
+	uint64_t blk_size = 2*1024*1024;
+	struct CXLDCD_Region *region;
+
+	for (i = 0; i < ct3d->dc.num_regions; i++) {
+		region = &ct3d->dc.regions[i];
+		region->base = region_base;
+		region->decode_len = decode_len;
+		region->len = region_len;
+		region->block_size = blk_size;
+		/* dsmad_handle is set when creating cdat table entries */
+		region->flags = 0;
+
+		region_base += region->len;
+	}
+
+	return 0;
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
@@ -752,6 +780,9 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         g_free(p_name);
     }
 
+	if (cxl_create_toy_regions(ct3d))
+		return false;
+
     return true;
 }
 
@@ -1036,6 +1067,7 @@ static Property ct3_props[] = {
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
     DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
     DEFINE_PROP_UINT16("spdm", CXLType3Dev, spdm_port, 0),
+	DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.25.1
Re: [RFC 3/7] hw/mem/cxl_type3: Add a parameter to pass number of DC regions the device supports in qemu command line
Posted by Jonathan Cameron via 11 months, 3 weeks ago
On Thu, 11 May 2023 17:56:40 +0000
Fan Ni <fan.ni@samsung.com> wrote:

> From: Fan Ni <nifan@outlook.com>
> 
> Add a property 'num-dc-regions' to ct3_props to allow users to create DC
> regions.
> With the change, users can control the number of DC regions the device
> supports.
> To make it easier, other parameters of the region like region base, length,
> and block size are hard coded. If desired, these parameters
> can be added easily.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

Ok. This is fine for initial testing, but we need to figure out how to actually
handle DCD regions and how to back them with memory.
Probably a 3rd memory backend to cover all the DCD regions?
Default perhaps to an even spread of a few regions (no real point in doing
more than 2 for initial support, fall back to 1 region if size is too small).
We will want to be able to mess with regions from the FM-API but lots more to
do there before that matters and we can still have default config for any
regions we define now.

Jonathan

> ---
>  hw/mem/cxl_type3.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 2b483d3d8e..b9c375d9b4 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -684,6 +684,34 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
>      }
>  }
>  
> +/*
> + * Create a dc region to test "Get Dynamic Capacity Configuration" command.
> + */
> +static int cxl_create_toy_regions(CXLType3Dev *ct3d)
> +{
> +	int i;
> +	uint64_t region_base = ct3d->hostvmem?ct3d->hostvmem->size
> +		+ ct3d->hostpmem->size:ct3d->hostpmem->size;
> +	uint64_t region_len = 1024*1024*1024;
> +	uint64_t decode_len = 4; /* 4*256MB */
> +	uint64_t blk_size = 2*1024*1024;
> +	struct CXLDCD_Region *region;
> +
> +	for (i = 0; i < ct3d->dc.num_regions; i++) {
> +		region = &ct3d->dc.regions[i];
> +		region->base = region_base;
> +		region->decode_len = decode_len;
> +		region->len = region_len;
> +		region->block_size = blk_size;
> +		/* dsmad_handle is set when creating cdat table entries */
> +		region->flags = 0;
> +
> +		region_base += region->len;
> +	}
> +
> +	return 0;
> +}
> +
>  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>  {
>      DeviceState *ds = DEVICE(ct3d);
> @@ -752,6 +780,9 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>          g_free(p_name);
>      }
>  
> +	if (cxl_create_toy_regions(ct3d))
> +		return false;
> +
>      return true;
>  }
>  
> @@ -1036,6 +1067,7 @@ static Property ct3_props[] = {
>      DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
>      DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
>      DEFINE_PROP_UINT16("spdm", CXLType3Dev, spdm_port, 0),
> +	DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>
[RFC 4/7] hw/mem/cxl_type3: Add DC extent representative to cxl type3 device
Posted by Fan Ni 12 months ago
From: Fan Ni <nifan@outlook.com>

Add dynamic capacity extent information to the definition of
CXLType3Dev and add get DC extent list mailbox command based on
CXL.spec.3.0:.8.2.9.8.9.2.

With this command, we can create dc regions as below:

region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways

echo "dc" >/sys/bus/cxl/devices/decoder2.0/mode
echo 0x30000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size

echo 0x30000000 > /sys/bus/cxl/devices/$region/size
echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 73 ++++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3.c          |  1 +
 include/hw/cxl/cxl_device.h | 23 ++++++++++++
 3 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 61c77e52d8..ed2ac154cb 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -83,6 +83,7 @@ enum {
         #define CLEAR_POISON           0x2
 	DCD_CONFIG = 0x48, /*8.2.9.8.9*/
 		#define GET_DC_REGION_CONFIG   0x0
+		#define GET_DYN_CAP_EXT_LIST   0x1
     PHYSICAL_SWITCH = 0x51
         #define IDENTIFY_SWITCH_DEVICE      0x0
 };
@@ -938,7 +939,7 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
 }
 
 /*
- * cxl spec 3.0: 8.2.9.8.9.2
+ * cxl spec 3.0: 8.2.9.8.9.1
  * Get Dynamic Capacity Configuration
  **/
 static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
@@ -1001,6 +1002,73 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
 	return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * cxl spec 3.0: 8.2.9.8.9.2
+ * Get Dynamic Capacity Extent List (Opcode 4810h)
+ **/
+static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(struct cxl_cmd *cmd,
+		CXLDeviceState *cxl_dstate,
+		uint16_t *len)
+{
+	struct get_dyn_cap_ext_list_in_pl {
+		uint32_t extent_cnt;
+		uint32_t start_extent_id;
+	} QEMU_PACKED;
+
+	struct get_dyn_cap_ext_list_out_pl {
+		uint32_t count;
+		uint32_t total_extents;
+		uint32_t generation_num;
+		uint8_t rsvd[4];
+		struct {
+			uint64_t start_dpa;
+			uint64_t len;
+			uint8_t tag[0x10];
+			uint16_t shared_seq;
+			uint8_t rsvd[6];
+		} QEMU_PACKED records[];
+	} QEMU_PACKED;
+
+	struct get_dyn_cap_ext_list_in_pl *in = (void *)cmd->payload;
+	struct get_dyn_cap_ext_list_out_pl *out = (void *)cmd->payload;
+	struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+	uint16_t record_count = 0, i = 0, record_done = 0;
+	CXLDCDExtentList *extent_list = &ct3d->dc.extents;
+	CXLDCD_Extent *ent;
+	uint16_t out_pl_len;
+
+	if (in->start_extent_id > ct3d->dc.total_extent_count)
+		return CXL_MBOX_INVALID_INPUT;
+
+	if (ct3d->dc.total_extent_count - in->start_extent_id < in->extent_cnt)
+		record_count = ct3d->dc.total_extent_count - in->start_extent_id;
+	else
+		record_count = in->extent_cnt;
+
+	out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+	assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
+
+	memset(out, 0, out_pl_len);
+	stl_le_p(&out->count, record_count);
+	stl_le_p(&out->total_extents, ct3d->dc.total_extent_count);
+	stl_le_p(&out->generation_num, ct3d->dc.ext_list_gen_seq);
+
+	QTAILQ_FOREACH(ent, extent_list, node) {
+		if (i++ < in->start_extent_id)
+			continue;
+		stq_le_p(&out->records[i].start_dpa, ent->start_dpa);
+		stq_le_p(&out->records[i].len, ent->len);
+		memcpy(&out->records[i].tag, ent->tag, 0x10);
+		stw_le_p(&out->records[i].shared_seq, ent->shared_seq);
+		record_done++;
+		if (record_done == record_count)
+			break;
+	}
+
+	*len = out_pl_len;
+	return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1041,6 +1109,9 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
         cmd_media_clear_poison, 72, 0 },
 	[DCD_CONFIG][GET_DC_REGION_CONFIG] = { "DCD_GET_DC_REGION_CONFIG",
 		cmd_dcd_get_dyn_cap_config, 2, 0 },
+	[DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
+		"DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
+		8, 0 },
 };
 
 static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index b9c375d9b4..23954711b5 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -708,6 +708,7 @@ static int cxl_create_toy_regions(CXLType3Dev *ct3d)
 
 		region_base += region->len;
 	}
+	QTAILQ_INIT(&ct3d->dc.extents);
 
 	return 0;
 }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 8a04e53e90..20ad5e7411 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -385,6 +385,25 @@ typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
 
 #define DCD_MAX_REGION_NUM 8
 
+typedef struct CXLDCD_Extent_raw {
+	uint64_t start_dpa;
+	uint64_t len;
+	uint8_t tag[0x10];
+	uint16_t shared_seq;
+	uint8_t rsvd[0x6];
+} QEMU_PACKED CXLDCExtent_raw;
+
+typedef struct CXLDCD_Extent {
+	uint64_t start_dpa;
+	uint64_t len;
+	uint8_t tag[0x10];
+	uint16_t shared_seq;
+	uint8_t rsvd[0x6];
+
+	QTAILQ_ENTRY(CXLDCD_Extent) node;
+} CXLDCD_Extent;
+typedef QTAILQ_HEAD(, CXLDCD_Extent) CXLDCDExtentList;
+
 typedef struct CXLDCD_Region {
 	uint64_t base;
 	uint64_t decode_len; /* in multiples of 256MB */
@@ -429,6 +448,10 @@ struct CXLType3Dev {
 	struct dynamic_capacity {
 		uint8_t num_regions; // 1-8
 		struct CXLDCD_Region regions[DCD_MAX_REGION_NUM];
+		CXLDCDExtentList extents;
+
+		uint32_t total_extent_count;
+		uint32_t ext_list_gen_seq;
 	} dc;
 };
 
-- 
2.25.1
Re: [RFC 4/7] hw/mem/cxl_type3: Add DC extent representative to cxl type3 device
Posted by Jonathan Cameron via 11 months, 3 weeks ago
On Thu, 11 May 2023 17:56:40 +0000
Fan Ni <fan.ni@samsung.com> wrote:

> From: Fan Ni <nifan@outlook.com>
> 
> Add dynamic capacity extent information to the definition of
> CXLType3Dev and add get DC extent list mailbox command based on
> CXL.spec.3.0:.8.2.9.8.9.2.
> 
> With this command, we can create dc regions as below:
> 
> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
> echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> 
> echo "dc" >/sys/bus/cxl/devices/decoder2.0/mode
> echo 0x30000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
> 
> echo 0x30000000 > /sys/bus/cxl/devices/$region/size
> echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> echo 1 > /sys/bus/cxl/devices/$region/commit
> echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
Hi Fan,

A few comments inline,

Thanks,

Jonathan

> ---
>  hw/cxl/cxl-mailbox-utils.c  | 73 ++++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3.c          |  1 +
>  include/hw/cxl/cxl_device.h | 23 ++++++++++++
>  3 files changed, 96 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 61c77e52d8..ed2ac154cb 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -83,6 +83,7 @@ enum {
>          #define CLEAR_POISON           0x2
>  	DCD_CONFIG = 0x48, /*8.2.9.8.9*/
>  		#define GET_DC_REGION_CONFIG   0x0
> +		#define GET_DYN_CAP_EXT_LIST   0x1
>      PHYSICAL_SWITCH = 0x51
>          #define IDENTIFY_SWITCH_DEVICE      0x0
>  };
> @@ -938,7 +939,7 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
>  }
>  
>  /*
> - * cxl spec 3.0: 8.2.9.8.9.2
> + * cxl spec 3.0: 8.2.9.8.9.1

Push that back to earlier patch.

>   * Get Dynamic Capacity Configuration
>   **/
>  static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
> @@ -1001,6 +1002,73 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
>  	return CXL_MBOX_SUCCESS;
>  }
>  
> +/*
> + * cxl spec 3.0: 8.2.9.8.9.2
> + * Get Dynamic Capacity Extent List (Opcode 4810h)
> + **/
> +static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(struct cxl_cmd *cmd,
> +		CXLDeviceState *cxl_dstate,
> +		uint16_t *len)
> +{
> +	struct get_dyn_cap_ext_list_in_pl {
> +		uint32_t extent_cnt;
> +		uint32_t start_extent_id;
> +	} QEMU_PACKED;
> +
> +	struct get_dyn_cap_ext_list_out_pl {
> +		uint32_t count;
> +		uint32_t total_extents;
> +		uint32_t generation_num;
> +		uint8_t rsvd[4];
> +		struct {
> +			uint64_t start_dpa;
> +			uint64_t len;
> +			uint8_t tag[0x10];
> +			uint16_t shared_seq;
> +			uint8_t rsvd[6];
> +		} QEMU_PACKED records[];
> +	} QEMU_PACKED;
> +
> +	struct get_dyn_cap_ext_list_in_pl *in = (void *)cmd->payload;
> +	struct get_dyn_cap_ext_list_out_pl *out = (void *)cmd->payload;
> +	struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> +	uint16_t record_count = 0, i = 0, record_done = 0;
> +	CXLDCDExtentList *extent_list = &ct3d->dc.extents;
> +	CXLDCD_Extent *ent;
> +	uint16_t out_pl_len;
> +
> +	if (in->start_extent_id > ct3d->dc.total_extent_count)
> +		return CXL_MBOX_INVALID_INPUT;
> +
> +	if (ct3d->dc.total_extent_count - in->start_extent_id < in->extent_cnt)
> +		record_count = ct3d->dc.total_extent_count - in->start_extent_id;
> +	else
> +		record_count = in->extent_cnt;
> +
> +	out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> +	assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> +
> +	memset(out, 0, out_pl_len);
> +	stl_le_p(&out->count, record_count);
> +	stl_le_p(&out->total_extents, ct3d->dc.total_extent_count);
> +	stl_le_p(&out->generation_num, ct3d->dc.ext_list_gen_seq);
> +
> +	QTAILQ_FOREACH(ent, extent_list, node) {
> +		if (i++ < in->start_extent_id)
> +			continue;
> +		stq_le_p(&out->records[i].start_dpa, ent->start_dpa);

'i' has been incrementing for the records skipped.  By now it may be well off
the end of records.  You need a separate index for the ones you are filling
that is incremented only when you write one.
record_done for example.
	out->records[record_done].len etc
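
A minimal sketch of the loop along those lines, with 'start_extent_id' as a
hypothetical local captured before the memset() (since 'in' and 'out' share
the payload buffer here too):

    /* Sketch only: 'i' walks the whole extent list, 'record_done'
     * indexes only the records actually written to the output. */
    QTAILQ_FOREACH(ent, extent_list, node) {
        if (i++ < start_extent_id) {
            continue;
        }
        stq_le_p(&out->records[record_done].start_dpa, ent->start_dpa);
        stq_le_p(&out->records[record_done].len, ent->len);
        memcpy(&out->records[record_done].tag, ent->tag, 0x10);
        stw_le_p(&out->records[record_done].shared_seq, ent->shared_seq);
        record_done++;
        if (record_done == record_count) {
            break;
        }
    }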


> +		stq_le_p(&out->records[i].len, ent->len);
> +		memcpy(&out->records[i].tag, ent->tag, 0x10);
> +		stw_le_p(&out->records[i].shared_seq, ent->shared_seq);
> +		record_done++;
> +		if (record_done == record_count)
> +			break;
> +	}
> +
> +	*len = out_pl_len;
> +	return CXL_MBOX_SUCCESS;
> +}
> +
>  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>  #define IMMEDIATE_DATA_CHANGE (1 << 2)
>  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -1041,6 +1109,9 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
>          cmd_media_clear_poison, 72, 0 },
>  	[DCD_CONFIG][GET_DC_REGION_CONFIG] = { "DCD_GET_DC_REGION_CONFIG",
>  		cmd_dcd_get_dyn_cap_config, 2, 0 },
> +	[DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
> +		"DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
> +		8, 0 },
>  };
>  
>  static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index b9c375d9b4..23954711b5 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -708,6 +708,7 @@ static int cxl_create_toy_regions(CXLType3Dev *ct3d)
>  
>  		region_base += region->len;
>  	}
> +	QTAILQ_INIT(&ct3d->dc.extents);
>  
>  	return 0;
>  }
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 8a04e53e90..20ad5e7411 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -385,6 +385,25 @@ typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
>  
>  #define DCD_MAX_REGION_NUM 8
>  
> +typedef struct CXLDCD_Extent_raw {
> +	uint64_t start_dpa;
> +	uint64_t len;
> +	uint8_t tag[0x10];
> +	uint16_t shared_seq;
> +	uint8_t rsvd[0x6];
> +} QEMU_PACKED CXLDCExtent_raw;

What's this for?

> +
> +typedef struct CXLDCD_Extent {
> +	uint64_t start_dpa;
> +	uint64_t len;
> +	uint8_t tag[0x10];
> +	uint16_t shared_seq;
> +	uint8_t rsvd[0x6];
> +
> +	QTAILQ_ENTRY(CXLDCD_Extent) node;
> +} CXLDCD_Extent;
> +typedef QTAILQ_HEAD(, CXLDCD_Extent) CXLDCDExtentList;
> +
>  typedef struct CXLDCD_Region {
>  	uint64_t base;
>  	uint64_t decode_len; /* in multiples of 256MB */
> @@ -429,6 +448,10 @@ struct CXLType3Dev {
>  	struct dynamic_capacity {
>  		uint8_t num_regions; // 1-8
>  		struct CXLDCD_Region regions[DCD_MAX_REGION_NUM];
> +		CXLDCDExtentList extents;
> +
> +		uint32_t total_extent_count;
> +		uint32_t ext_list_gen_seq;
>  	} dc;
>  };
>
Re: [RFC 4/7] hw/mem/cxl_type3: Add DC extent representative to cxl type3 device
Posted by Nathan Fontenot 11 months, 4 weeks ago
On 5/11/23 12:56, Fan Ni wrote:
> From: Fan Ni <nifan@outlook.com>
> 
> Add dynamic capacity extent information to the definition of
> CXLType3Dev and add get DC extent list mailbox command based on
> CXL.spec.3.0:.8.2.9.8.9.2.
> 
> With this command, we can create dc regions as below:
> 
> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
> echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> 
> echo "dc" >/sys/bus/cxl/devices/decoder2.0/mode
> echo 0x30000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
> 
> echo 0x30000000 > /sys/bus/cxl/devices/$region/size
> echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> echo 1 > /sys/bus/cxl/devices/$region/commit
> echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 73 ++++++++++++++++++++++++++++++++++++-
>  hw/mem/cxl_type3.c          |  1 +
>  include/hw/cxl/cxl_device.h | 23 ++++++++++++
>  3 files changed, 96 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 61c77e52d8..ed2ac154cb 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -83,6 +83,7 @@ enum {
>          #define CLEAR_POISON           0x2
>  	DCD_CONFIG = 0x48, /*8.2.9.8.9*/
>  		#define GET_DC_REGION_CONFIG   0x0
> +		#define GET_DYN_CAP_EXT_LIST   0x1
>      PHYSICAL_SWITCH = 0x51
>          #define IDENTIFY_SWITCH_DEVICE      0x0
>  };
> @@ -938,7 +939,7 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
>  }
>  
>  /*
> - * cxl spec 3.0: 8.2.9.8.9.2
> + * cxl spec 3.0: 8.2.9.8.9.1
>   * Get Dynamic Capacity Configuration
>   **/
>  static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
> @@ -1001,6 +1002,73 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(struct cxl_cmd *cmd,
>  	return CXL_MBOX_SUCCESS;
>  }
>  
> +/*
> + * cxl spec 3.0: 8.2.9.8.9.2
> + * Get Dynamic Capacity Extent List (Opcode 4810h)
> + **/
> +static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(struct cxl_cmd *cmd,
> +		CXLDeviceState *cxl_dstate,
> +		uint16_t *len)
> +{
> +	struct get_dyn_cap_ext_list_in_pl {
> +		uint32_t extent_cnt;
> +		uint32_t start_extent_id;
> +	} QEMU_PACKED;
> +
> +	struct get_dyn_cap_ext_list_out_pl {
> +		uint32_t count;
> +		uint32_t total_extents;
> +		uint32_t generation_num;
> +		uint8_t rsvd[4];
> +		struct {
> +			uint64_t start_dpa;
> +			uint64_t len;
> +			uint8_t tag[0x10];
> +			uint16_t shared_seq;
> +			uint8_t rsvd[6];
> +		} QEMU_PACKED records[];

Similar to a previous note, could this be CXLDCExtent_raw instead of re-defining
the structure?
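
Untested, but if CXLDCExtent_raw is visible at this point the out payload
could presumably just embed it, e.g.:

    struct get_dyn_cap_ext_list_out_pl {
        uint32_t count;
        uint32_t total_extents;
        uint32_t generation_num;
        uint8_t rsvd[4];
        CXLDCExtent_raw records[];
    } QEMU_PACKED;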


> +	} QEMU_PACKED;
> +
> +	struct get_dyn_cap_ext_list_in_pl *in = (void *)cmd->payload;
> +	struct get_dyn_cap_ext_list_out_pl *out = (void *)cmd->payload;
> +	struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> +	uint16_t record_count = 0, i = 0, record_done = 0;
> +	CXLDCDExtentList *extent_list = &ct3d->dc.extents;
> +	CXLDCD_Extent *ent;
> +	uint16_t out_pl_len;
> +
> +	if (in->start_extent_id > ct3d->dc.total_extent_count)
> +		return CXL_MBOX_INVALID_INPUT;
> +
> +	if (ct3d->dc.total_extent_count - in->start_extent_id < in->extent_cnt)
> +		record_count = ct3d->dc.total_extent_count - in->start_extent_id;
> +	else
> +		record_count = in->extent_cnt;
> +
> +	out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> +	assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);

Perhaps it would be nicer to return a failure here instead of assert().
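
Something like this, maybe (untested; CXL_MBOX_INVALID_INPUT looks like the
closest fit for a return code):

    if (out_pl_len > CXL_MAILBOX_MAX_PAYLOAD_SIZE) {
        return CXL_MBOX_INVALID_INPUT;
    }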

-Nathan

> +
> +	memset(out, 0, out_pl_len);
> +	stl_le_p(&out->count, record_count);
> +	stl_le_p(&out->total_extents, ct3d->dc.total_extent_count);
> +	stl_le_p(&out->generation_num, ct3d->dc.ext_list_gen_seq);
> +
> +	QTAILQ_FOREACH(ent, extent_list, node) {
> +		if (i++ < in->start_extent_id)
> +			continue;
> +		stq_le_p(&out->records[i].start_dpa, ent->start_dpa);
> +		stq_le_p(&out->records[i].len, ent->len);
> +		memcpy(&out->records[i].tag, ent->tag, 0x10);
> +		stw_le_p(&out->records[i].shared_seq, ent->shared_seq);
> +		record_done++;
> +		if (record_done == record_count)
> +			break;
> +	}
> +
> +	*len = out_pl_len;
> +	return CXL_MBOX_SUCCESS;
> +}
> +
>  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>  #define IMMEDIATE_DATA_CHANGE (1 << 2)
>  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -1041,6 +1109,9 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
>          cmd_media_clear_poison, 72, 0 },
>  	[DCD_CONFIG][GET_DC_REGION_CONFIG] = { "DCD_GET_DC_REGION_CONFIG",
>  		cmd_dcd_get_dyn_cap_config, 2, 0 },
> +	[DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
> +		"DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
> +		8, 0 },
>  };
>  
>  static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index b9c375d9b4..23954711b5 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -708,6 +708,7 @@ static int cxl_create_toy_regions(CXLType3Dev *ct3d)
>  
>  		region_base += region->len;
>  	}
> +	QTAILQ_INIT(&ct3d->dc.extents);
>  
>  	return 0;
>  }
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 8a04e53e90..20ad5e7411 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -385,6 +385,25 @@ typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
>  
>  #define DCD_MAX_REGION_NUM 8
>  
> +typedef struct CXLDCD_Extent_raw {
> +	uint64_t start_dpa;
> +	uint64_t len;
> +	uint8_t tag[0x10];
> +	uint16_t shared_seq;
> +	uint8_t rsvd[0x6];
> +} QEMU_PACKED CXLDCExtent_raw;
> +
> +typedef struct CXLDCD_Extent {
> +	uint64_t start_dpa;
> +	uint64_t len;
> +	uint8_t tag[0x10];
> +	uint16_t shared_seq;
> +	uint8_t rsvd[0x6];
> +
> +	QTAILQ_ENTRY(CXLDCD_Extent) node;
> +} CXLDCD_Extent;
> +typedef QTAILQ_HEAD(, CXLDCD_Extent) CXLDCDExtentList;
> +
>  typedef struct CXLDCD_Region {
>  	uint64_t base;
>  	uint64_t decode_len; /* in multiples of 256MB */
> @@ -429,6 +448,10 @@ struct CXLType3Dev {
>  	struct dynamic_capacity {
>  		uint8_t num_regions; // 1-8
>  		struct CXLDCD_Region regions[DCD_MAX_REGION_NUM];
> +		CXLDCDExtentList extents;
> +
> +		uint32_t total_extent_count;
> +		uint32_t ext_list_gen_seq;
>  	} dc;
>  };
>
[RFC 5/7] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
Posted by Fan Ni 12 months ago
From: Fan Ni <nifan@outlook.com>

Per CXL spec 3.0, we implemented the two mailbox commands:
Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.8.9.3, and
Release Dynamic Capacity Response (Opcode 4803h) 8.2.9.8.9.4.

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  | 223 ++++++++++++++++++++++++++++++++++++
 include/hw/cxl/cxl_device.h |   3 +-
 2 files changed, 225 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index ed2ac154cb..7212934627 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -84,6 +84,8 @@ enum {
 	DCD_CONFIG = 0x48, /*8.2.9.8.9*/
 		#define GET_DC_REGION_CONFIG   0x0
 		#define GET_DYN_CAP_EXT_LIST   0x1
+		#define ADD_DYN_CAP_RSP        0x2
+		#define RELEASE_DYN_CAP        0x3
     PHYSICAL_SWITCH = 0x51
         #define IDENTIFY_SWITCH_DEVICE      0x0
 };
@@ -1069,6 +1071,221 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(struct cxl_cmd *cmd,
 	return CXL_MBOX_SUCCESS;
 }
 
+static inline int test_bits(const unsigned long *addr, int nr, int size)
+{
+	unsigned long res = find_next_zero_bit(addr, size + nr, nr);
+
+	if (res >= nr + size)
+		return 1;
+	else
+		return 0;
+}
+
+static uint8_t find_region_id(struct CXLType3Dev *dev, uint64_t dpa
+		, uint64_t len)
+{
+	int8_t i = dev->dc.num_regions-1;
+
+	while (i > 0 && dpa < dev->dc.regions[i].base)
+		i--;
+
+	if (dpa < dev->dc.regions[i].base
+			|| dpa + len > dev->dc.regions[i].base + dev->dc.regions[i].len)
+		return dev->dc.num_regions;
+
+	return i;
+}
+
+static CXLRetCode detect_malformed_extent_list(CXLType3Dev *dev, void *data)
+{
+	struct updated_dc_extent_list_in_pl {
+		uint32_t num_entries_updated;
+		uint8_t rsvd[4];
+		struct {
+			uint64_t start_dpa;
+			uint64_t len;
+			uint8_t rsvd[8];
+		} QEMU_PACKED updated_entries[];
+	} QEMU_PACKED;
+
+	struct updated_dc_extent_list_in_pl *in = data;
+	unsigned long *blk_bitmap;
+	uint64_t min_block_size = dev->dc.regions[0].block_size;
+	struct CXLDCD_Region *region = &dev->dc.regions[0];
+	uint32_t i;
+	uint64_t dpa, len;
+	uint8_t rid;
+
+	for (i = 1; i < dev->dc.num_regions; i++) {
+		region = &dev->dc.regions[i];
+		if (min_block_size > region->block_size)
+			min_block_size = region->block_size;
+	}
+	blk_bitmap = bitmap_new((region->len + region->base
+				- dev->dc.regions[0].base) / min_block_size);
+	g_assert(blk_bitmap);
+	bitmap_zero(blk_bitmap, (region->len + region->base
+				- dev->dc.regions[0].base) / min_block_size);
+
+	for (i = 0; i < in->num_entries_updated; i++) {
+		dpa = in->updated_entries[i].start_dpa;
+		len = in->updated_entries[i].len;
+
+		rid = find_region_id(dev, dpa, len);
+		if (rid == dev->dc.num_regions) {
+			g_free(blk_bitmap);
+			return CXL_MBOX_INVALID_PA;
+		}
+		region = &dev->dc.regions[rid];
+		if (dpa % region->block_size || len % region->block_size) {
+			g_free(blk_bitmap);
+			return CXL_MBOX_INVALID_EXTENT_LIST;
+		}
+		/* the dpa range already covered by some other extents in the list */
+		if (test_bits(blk_bitmap, dpa/min_block_size, len/min_block_size)) {
+			g_free(blk_bitmap);
+			return CXL_MBOX_INVALID_EXTENT_LIST;
+		}
+		bitmap_set(blk_bitmap, dpa/min_block_size, len/min_block_size);
+	}
+
+	g_free(blk_bitmap);
+	return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * cxl spec 3.0: 8.2.9.8.9.3
+ * Add Dynamic Capacity Response (opcode 4802h)
+ * Assuming extent list is updated when a extent is added, when receiving
+ * the response, verify and ensure the extent is utilized by the host, and
+ * update extent list  as needed.
+ **/
+static CXLRetCode cmd_dcd_add_dyn_cap_rsp(struct cxl_cmd *cmd,
+		CXLDeviceState *cxl_dstate,
+		uint16_t *len_unused)
+{
+	struct add_dyn_cap_ext_list_in_pl {
+		uint32_t num_entries_updated;
+		uint8_t rsvd[4];
+		struct {
+			uint64_t start_dpa;
+			uint64_t len;
+			uint8_t rsvd[8];
+		} QEMU_PACKED updated_entries[];
+	} QEMU_PACKED;
+
+	struct add_dyn_cap_ext_list_in_pl *in = (void *)cmd->payload;
+	struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+	CXLDCDExtentList *extent_list = &ct3d->dc.extents;
+	CXLDCD_Extent *ent;
+	uint32_t i;
+	uint64_t dpa, len;
+	CXLRetCode rs;
+
+	if (in->num_entries_updated == 0)
+		return CXL_MBOX_SUCCESS;
+
+	rs = detect_malformed_extent_list(ct3d, in);
+	if (rs != CXL_MBOX_SUCCESS)
+		return rs;
+
+	for (i = 0; i < in->num_entries_updated; i++) {
+		dpa = in->updated_entries[i].start_dpa;
+		len = in->updated_entries[i].len;
+
+		/* Todo: check following
+		 * One or more of the updated extent lists contain Starting DPA
+		 * or Lengths that are out of range of a current extent list
+		 * maintained by the device.
+		 **/
+
+		QTAILQ_FOREACH(ent, extent_list, node) {
+			if (ent->start_dpa == dpa && ent->len == len)
+				return CXL_MBOX_INVALID_PA;
+			if (ent->start_dpa <= dpa
+				&& dpa + len <= ent->start_dpa + ent->len) {
+				ent->start_dpa = dpa;
+				ent->len = len;
+				break;
+			} else if ((dpa < ent->start_dpa + ent->len
+				&& dpa + len > ent->start_dpa + ent->len)
+				|| (dpa < ent->start_dpa && dpa + len > ent->start_dpa))
+				return CXL_MBOX_INVALID_EXTENT_LIST;
+		}
+		// a new extent added
+		if (!ent) {
+			ent = g_new0(CXLDCD_Extent, 1);
+			assert(ent);
+			ent->start_dpa = dpa;
+			ent->len = len;
+			memset(ent->tag, 0, 0x10);
+			ent->shared_seq = 0;
+			QTAILQ_INSERT_TAIL(extent_list, ent, node);
+		}
+	}
+
+	return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * Spec 3.0: 8.2.9.8.9.4
+ * Release Dynamic Capacity (opcode 4803h)
+ **/
+static CXLRetCode cmd_dcd_release_dcd_capacity(struct cxl_cmd *cmd,
+		CXLDeviceState *cxl_dstate,
+		uint16_t *len_unused)
+{
+	struct release_dcd_cap_in_pl {
+		uint32_t num_entries_updated;
+		uint8_t rsvd[4];
+		struct {
+			uint64_t start_dpa;
+			uint64_t len;
+			uint8_t rsvd[8];
+		} QEMU_PACKED updated_entries[];
+	} QEMU_PACKED;
+
+	struct release_dcd_cap_in_pl *in = (void *)cmd->payload;
+	struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+	CXLDCDExtentList *extent_list = &ct3d->dc.extents;
+	CXLDCD_Extent *ent;
+	uint32_t i;
+	uint64_t dpa, len;
+	CXLRetCode rs;
+
+	if (in->num_entries_updated == 0)
+		return CXL_MBOX_INVALID_INPUT;
+
+	rs = detect_malformed_extent_list(ct3d, in);
+	if (rs != CXL_MBOX_SUCCESS)
+		return rs;
+
+		/* Todo: check following
+		 * One or more of the updated extent lists contain Starting DPA
+		 * or Lengths that are out of range of a current extent list
+		 * maintained by the device.
+		 **/
+
+	for (i = 0; i < in->num_entries_updated; i++) {
+		dpa = in->updated_entries[i].start_dpa;
+		len = in->updated_entries[i].len;
+
+		QTAILQ_FOREACH(ent, extent_list, node) {
+			if (ent->start_dpa == dpa && ent->len == len)
+				break;
+			else if ((dpa < ent->start_dpa + ent->len
+				&& dpa + len > ent->start_dpa + ent->len)
+				|| (dpa < ent->start_dpa && dpa + len > ent->start_dpa))
+				return CXL_MBOX_INVALID_EXTENT_LIST;
+		}
+		/* found the entry, release it */
+		if (ent)
+			QTAILQ_REMOVE(extent_list, ent, node);
+	}
+
+	return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1112,6 +1329,12 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 	[DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
 		"DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
 		8, 0 },
+	[DCD_CONFIG][ADD_DYN_CAP_RSP] = {
+		"ADD_DCD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
+		~0, IMMEDIATE_DATA_CHANGE },
+	[DCD_CONFIG][RELEASE_DYN_CAP] = {
+		"RELEASE_DCD_DYNAMIC_CAPACITY", cmd_dcd_release_dcd_capacity,
+		~0, IMMEDIATE_DATA_CHANGE },
 };
 
 static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 20ad5e7411..c0c8fcc24b 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -131,7 +131,8 @@ typedef enum {
     CXL_MBOX_INCORRECT_PASSPHRASE = 0x14,
     CXL_MBOX_UNSUPPORTED_MAILBOX = 0x15,
     CXL_MBOX_INVALID_PAYLOAD_LENGTH = 0x16,
-    CXL_MBOX_MAX = 0x17
+	CXL_MBOX_INVALID_EXTENT_LIST = 0x17,
+	CXL_MBOX_MAX = 0x18
 } CXLRetCode;
 
 struct cxl_cmd;
-- 
2.25.1
Re: [RFC 5/7] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
Posted by Jonathan Cameron via 11 months, 3 weeks ago
On Thu, 11 May 2023 17:56:40 +0000
Fan Ni <fan.ni@samsung.com> wrote:

> From: Fan Ni <nifan@outlook.com>
> 
> Per CXL spec 3.0, we implemented the two mailbox commands:
> Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.8.9.3, and
> Release Dynamic Capacity Response (Opcode 4803h) 8.2.9.8.9.4.
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 223 ++++++++++++++++++++++++++++++++++++
>  include/hw/cxl/cxl_device.h |   3 +-
>  2 files changed, 225 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index ed2ac154cb..7212934627 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -84,6 +84,8 @@ enum {
>  	DCD_CONFIG = 0x48, /*8.2.9.8.9*/
>  		#define GET_DC_REGION_CONFIG   0x0
>  		#define GET_DYN_CAP_EXT_LIST   0x1
> +		#define ADD_DYN_CAP_RSP        0x2
> +		#define RELEASE_DYN_CAP        0x3
>      PHYSICAL_SWITCH = 0x51
>          #define IDENTIFY_SWITCH_DEVICE      0x0
>  };
> @@ -1069,6 +1071,221 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(struct cxl_cmd *cmd,
>  	return CXL_MBOX_SUCCESS;
>  }
>  
> +static inline int test_bits(const unsigned long *addr, int nr, int size)

Not obvious what this does from the name.  Please add some docs.
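
Maybe something along these lines (wording is only a suggestion):

    /*
     * Check whether the range [nr, nr + size) is fully set in the
     * bitmap at @addr: returns 1 if every bit in the range is set,
     * 0 otherwise.
     */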

> +{
> +	unsigned long res = find_next_zero_bit(addr, size + nr, nr);
> +
> +	if (res >= nr + size)
> +		return 1;
> +	else
> +		return 0;
> +}
> +
> +static uint8_t find_region_id(struct CXLType3Dev *dev, uint64_t dpa

Operates only on things in dev->dc, so perhaps pass that instead.

> +		, uint64_t len)

comma on previous line.

> +{
> +	int8_t i = dev->dc.num_regions-1;
> +
> +	while (i > 0 && dpa < dev->dc.regions[i].base)
> +		i--;
> +
> +	if (dpa < dev->dc.regions[i].base
> +			|| dpa + len > dev->dc.regions[i].base + dev->dc.regions[i].len)
> +		return dev->dc.num_regions;
> +
> +	return i;
> +}
> +
> +static CXLRetCode detect_malformed_extent_list(CXLType3Dev *dev, void *data)
> +{
> +	struct updated_dc_extent_list_in_pl {
This is same as used in next function.  Pull it out of the functions and
use that type rather than mapping via a void *data pointer.  
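
e.g. something like this (name purely illustrative):

    typedef struct CXLUpdateDCExtentListInPl {
        uint32_t num_entries_updated;
        uint8_t rsvd[4];
        struct {
            uint64_t start_dpa;
            uint64_t len;
            uint8_t rsvd[8];
        } QEMU_PACKED updated_entries[];
    } QEMU_PACKED CXLUpdateDCExtentListInPl;

    static CXLRetCode detect_malformed_extent_list(CXLType3Dev *dev,
                                                   const CXLUpdateDCExtentListInPl *in)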

> +		uint32_t num_entries_updated;
> +		uint8_t rsvd[4];
> +		struct {
> +			uint64_t start_dpa;
> +			uint64_t len;
> +			uint8_t rsvd[8];
> +		} QEMU_PACKED updated_entries[];
> +	} QEMU_PACKED;
> +
> +	struct updated_dc_extent_list_in_pl *in = data;
> +	unsigned long *blk_bitmap;
> +	uint64_t min_block_size = dev->dc.regions[0].block_size;
> +	struct CXLDCD_Region *region = &dev->dc.regions[0];
> +	uint32_t i;
> +	uint64_t dpa, len;
> +	uint8_t rid;
> +
> +	for (i = 1; i < dev->dc.num_regions; i++) {
> +		region = &dev->dc.regions[i];
> +		if (min_block_size > region->block_size)
> +			min_block_size = region->block_size;
> +	}
> +	blk_bitmap = bitmap_new((region->len + region->base
> +				- dev->dc.regions[0].base) / min_block_size);
> +	g_assert(blk_bitmap);

Abort in bitmap_new() anyway so no need for this.  Most qemu allocations
don't need to be checked as they fail hard anyway so we never get to the checks.

> +	bitmap_zero(blk_bitmap, (region->len + region->base
> +				- dev->dc.regions[0].base) / min_block_size);

bitmap_new() seems to use a g_malloc0 internally so no need to zero again here
I think.

> +
> +	for (i = 0; i < in->num_entries_updated; i++) {
> +		dpa = in->updated_entries[i].start_dpa;
> +		len = in->updated_entries[i].len;
> +
> +		rid = find_region_id(dev, dpa, len);
> +		if (rid == dev->dc.num_regions) {

Use a goto and a single cleanup path, having set ret (or similar) to
the particular error code.
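
i.e. roughly (untested, reusing the locals already in the function):

    CXLRetCode ret = CXL_MBOX_SUCCESS;

    for (i = 0; i < in->num_entries_updated; i++) {
        dpa = in->updated_entries[i].start_dpa;
        len = in->updated_entries[i].len;

        rid = find_region_id(dev, dpa, len);
        if (rid == dev->dc.num_regions) {
            ret = CXL_MBOX_INVALID_PA;
            goto out;
        }
        region = &dev->dc.regions[rid];
        if (dpa % region->block_size || len % region->block_size) {
            ret = CXL_MBOX_INVALID_EXTENT_LIST;
            goto out;
        }
        /* the dpa range already covered by some other extents in the list */
        if (test_bits(blk_bitmap, dpa / min_block_size, len / min_block_size)) {
            ret = CXL_MBOX_INVALID_EXTENT_LIST;
            goto out;
        }
        bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
    }

out:
    g_free(blk_bitmap);
    return ret;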

> +			g_free(blk_bitmap);
> +			return CXL_MBOX_INVALID_PA;
> +		}
> +		region = &dev->dc.regions[rid];
> +		if (dpa % region->block_size || len % region->block_size) {
> +			g_free(blk_bitmap);
> +			return CXL_MBOX_INVALID_EXTENT_LIST;
goto from here as well.. etc.

> +		}
> +		/* the dpa range already covered by some other extents in the list */
> +		if (test_bits(blk_bitmap, dpa/min_block_size, len/min_block_size)) {
> +			g_free(blk_bitmap);
> +			return CXL_MBOX_INVALID_EXTENT_LIST;
> +		}
> +		bitmap_set(blk_bitmap, dpa/min_block_size, len/min_block_size);
> +	}
> +
> +	g_free(blk_bitmap);
> +	return CXL_MBOX_SUCCESS;
> +}
> +
> +/*
> + * cxl spec 3.0: 8.2.9.8.9.3
> + * Add Dynamic Capacity Response (opcode 4802h)
> + * Assuming extent list is updated when a extent is added, when receiving
> + * the response, verify and ensure the extent is utilized by the host, and
> + * update extent list  as needed.

Double space in middle of sentence

> + **/
> +static CXLRetCode cmd_dcd_add_dyn_cap_rsp(struct cxl_cmd *cmd,
> +		CXLDeviceState *cxl_dstate,
> +		uint16_t *len_unused)
> +{
> +	struct add_dyn_cap_ext_list_in_pl {
> +		uint32_t num_entries_updated;
> +		uint8_t rsvd[4];
> +		struct {
> +			uint64_t start_dpa;
> +			uint64_t len;
> +			uint8_t rsvd[8];
> +		} QEMU_PACKED updated_entries[];

These extent list entries keep turning up in the code. Pull that out of here
to be a general 'Extent list element' or similar.

> +	} QEMU_PACKED;
> +
> +	struct add_dyn_cap_ext_list_in_pl *in = (void *)cmd->payload;
> +	struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> +	CXLDCDExtentList *extent_list = &ct3d->dc.extents;
> +	CXLDCD_Extent *ent;
> +	uint32_t i;
> +	uint64_t dpa, len;
> +	CXLRetCode rs;
> +
> +	if (in->num_entries_updated == 0)
> +		return CXL_MBOX_SUCCESS;
> +
> +	rs = detect_malformed_extent_list(ct3d, in);
> +	if (rs != CXL_MBOX_SUCCESS)
> +		return rs;
> +
> +	for (i = 0; i < in->num_entries_updated; i++) {
> +		dpa = in->updated_entries[i].start_dpa;
> +		len = in->updated_entries[i].len;
> +
> +		/* Todo: check following
> +		 * One or more of the updated extent lists contain Starting DPA
> +		 * or Lengths that are out of range of a current extent list
> +		 * maintained by the device.
> +		 **/
> +
> +		QTAILQ_FOREACH(ent, extent_list, node) {

Add some comments here.  Is this a repeated entry test?

> +			if (ent->start_dpa == dpa && ent->len == len)
> +				return CXL_MBOX_INVALID_PA;
> +			if (ent->start_dpa <= dpa
> +				&& dpa + len <= ent->start_dpa + ent->len) {

Comment needed on this one. Why is it shrinking an existing entry?

> +				ent->start_dpa = dpa;
> +				ent->len = len;
> +				break;

If you break here I think it will only result in return CXL_MBOX_SUCCESS so
might as well do that here.

> +			} else if ((dpa < ent->start_dpa + ent->len
> +				&& dpa + len > ent->start_dpa + ent->len)

The above is the case where the new entry would contain an existing one.

> +				|| (dpa < ent->start_dpa && dpa + len > ent->start_dpa))
> +				return CXL_MBOX_INVALID_EXTENT_LIST;
> +		}
> +		// a new extent added

Why?  This function generally needs more documentation.

> +		if (!ent) {
> +			ent = g_new0(CXLDCD_Extent, 1);
> +			assert(ent);

No need.  g_new0 will already have blown up before you get here.

> +			ent->start_dpa = dpa;
> +			ent->len = len;
> +			memset(ent->tag, 0, 0x10);
Allocated empty just above, still empty so no need to set it again.
> +			ent->shared_seq = 0;
> +			QTAILQ_INSERT_TAIL(extent_list, ent, node);
> +		}
> +	}
> +
> +	return CXL_MBOX_SUCCESS;
> +}
> +
> +/*
> + * Spec 3.0: 8.2.9.8.9.4
> + * Release Dynamic Capacity (opcode 4803h)
> + **/
> +static CXLRetCode cmd_dcd_release_dcd_capacity(struct cxl_cmd *cmd,
> +		CXLDeviceState *cxl_dstate,
> +		uint16_t *len_unused)
> +{
> +	struct release_dcd_cap_in_pl {
> +		uint32_t num_entries_updated;
> +		uint8_t rsvd[4];
> +		struct {
> +			uint64_t start_dpa;
> +			uint64_t len;
> +			uint8_t rsvd[8];
> +		} QEMU_PACKED updated_entries[];
> +	} QEMU_PACKED;
> +
> +	struct release_dcd_cap_in_pl *in = (void *)cmd->payload;
> +	struct CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> +	CXLDCDExtentList *extent_list = &ct3d->dc.extents;
> +	CXLDCD_Extent *ent;
> +	uint32_t i;
> +	uint64_t dpa, len;
> +	CXLRetCode rs;
> +
> +	if (in->num_entries_updated == 0)
> +		return CXL_MBOX_INVALID_INPUT;
> +
> +	rs = detect_malformed_extent_list(ct3d, in);
> +	if (rs != CXL_MBOX_SUCCESS)
> +		return rs;
> +
> +		/* Todo: check following
> +		 * One or more of the updated extent lists contain Starting DPA
> +		 * or Lengths that are out of range of a current extent list
> +		 * maintained by the device.
> +		 **/
> +
> +	for (i = 0; i < in->num_entries_updated; i++) {
> +		dpa = in->updated_entries[i].start_dpa;
> +		len = in->updated_entries[i].len;
> +
> +		QTAILQ_FOREACH(ent, extent_list, node) {
> +			if (ent->start_dpa == dpa && ent->len == len)

Do the remove here, and then the 'found the entry' comment below isn't needed.
Note I think you can release partial extents so that will need handling at some point.
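
Something like (untested, ignoring the partial release case for now):

    QTAILQ_FOREACH(ent, extent_list, node) {
        if (ent->start_dpa == dpa && ent->len == len) {
            QTAILQ_REMOVE(extent_list, ent, node);
            g_free(ent);
            break;
        }
        /* overlap checks as in the original loop */
    }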

> +				break;
> +			else if ((dpa < ent->start_dpa + ent->len
> +				&& dpa + len > ent->start_dpa + ent->len)
> +				|| (dpa < ent->start_dpa && dpa + len > ent->start_dpa))
Comment on this condition and why it's a problem.

> +				return CXL_MBOX_INVALID_EXTENT_LIST;
> +		}
> +		/* found the entry, release it */
> +		if (ent)
> +			QTAILQ_REMOVE(extent_list, ent, node);
> +	}
> +
> +	return CXL_MBOX_SUCCESS;
> +}
> +
>  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>  #define IMMEDIATE_DATA_CHANGE (1 << 2)
>  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -1112,6 +1329,12 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
>  	[DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
>  		"DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
>  		8, 0 },
> +	[DCD_CONFIG][ADD_DYN_CAP_RSP] = {
> +		"ADD_DCD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
> +		~0, IMMEDIATE_DATA_CHANGE },
> +	[DCD_CONFIG][RELEASE_DYN_CAP] = {
> +		"RELEASE_DCD_DYNAMIC_CAPACITY", cmd_dcd_release_dcd_capacity,
> +		~0, IMMEDIATE_DATA_CHANGE },
>  };
>  
>  static struct cxl_cmd cxl_cmd_set_sw[256][256] = {
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 20ad5e7411..c0c8fcc24b 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -131,7 +131,8 @@ typedef enum {
>      CXL_MBOX_INCORRECT_PASSPHRASE = 0x14,
>      CXL_MBOX_UNSUPPORTED_MAILBOX = 0x15,
>      CXL_MBOX_INVALID_PAYLOAD_LENGTH = 0x16,
> -    CXL_MBOX_MAX = 0x17
> +	CXL_MBOX_INVALID_EXTENT_LIST = 0x17,

0x1e in the spec.

> +	CXL_MBOX_MAX = 0x18
>  } CXLRetCode;
>  
>  struct cxl_cmd;
Re: [RFC 5/7] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
Posted by nifan@outlook.com 10 months, 1 week ago
The 05/15/2023 15:37, Jonathan Cameron wrote:
> On Thu, 11 May 2023 17:56:40 +0000
> Fan Ni <fan.ni@samsung.com> wrote:
> 
> > From: Fan Ni <nifan@outlook.com>
> > 
> > Per CXL spec 3.0, we implemented the two mailbox commands:
> > Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.8.9.3, and
> > Release Dynamic Capacity Response (Opcode 4803h) 8.2.9.8.9.4.
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  | 223 ++++++++++++++++++++++++++++++++++++
> >  include/hw/cxl/cxl_device.h |   3 +-
> >  2 files changed, 225 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index ed2ac154cb..7212934627 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -84,6 +84,8 @@ enum {
> >  	DCD_CONFIG = 0x48, /*8.2.9.8.9*/
> >  		#define GET_DC_REGION_CONFIG   0x0
> >  		#define GET_DYN_CAP_EXT_LIST   0x1
> > +		#define ADD_DYN_CAP_RSP        0x2
> > +		#define RELEASE_DYN_CAP        0x3
> >      PHYSICAL_SWITCH = 0x51
> >          #define IDENTIFY_SWITCH_DEVICE      0x0
> >  };
> > @@ -1069,6 +1071,221 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(struct cxl_cmd *cmd,
> >  	return CXL_MBOX_SUCCESS;
> >  }
> >  
> > +static inline int test_bits(const unsigned long *addr, int nr, int size)
> 
> Not obvious what this does from the name.  Please add some docs.
> 
> > +{
> > +	unsigned long res = find_next_zero_bit(addr, size + nr, nr);
> > +
> > +	if (res >= nr + size)
> > +		return 1;
> > +	else
> > +		return 0;
> > +}
> > +
> > +static uint8_t find_region_id(struct CXLType3Dev *dev, uint64_t dpa
> 
> Operates only on things in dev->dc, so perhaps pass that instead.
dc is a struct defined inside the CXLType3Dev struct, so I will leave it as
it is for now.

Fan
-- 
Fan Ni <nifan@outlook.com>
[RFC 6/7] Add qmp interfaces to add/release dynamic capacity extents
Posted by Fan Ni 12 months ago
From: Fan Ni <nifan@outlook.com>

Since fabric manager emulation is not supported yet, the change implements
the functions to add/release dynamic capacity extents as QMP interfaces.

1. Add dynamic capacity extents:

For example, the command to add two continuous extents (each is 128MB
long) to region 0 (starting at dpa offset 0) looks like below:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity-event",
    "arguments": {
	"path": "/machine/peripheral/cxl-pmem0",
	"region-id" : 0,
	"num-extent": 2,
	"dpa":0,
	"extent-len": 128
	}
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MB from region
0 (starting at dpa offset 0) looks like below:

{ "execute": "cxl-release-dynamic-capacity-event",
	"arguments": {
		 "path": "/machine/peripheral/cxl-pmem0",
		"region-id" : 0,
		 "num-extent": 1 ,
		"dpa":0,
		"extent-len": 128
	}
}

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/mem/cxl_type3.c          | 127 ++++++++++++++++++++++++++++++++++++
 include/hw/cxl/cxl_events.h |  16 +++++
 qapi/cxl.json               |  44 +++++++++++++
 3 files changed, 187 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 23954711b5..70d47d43b9 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1651,6 +1651,133 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+static const QemuUUID dynamic_capacity_uuid = {
+	.data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+			0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+static void qmp_cxl_process_dynamic_capacity_event(const char *path, CxlEventLog log,
+		uint8_t flags, uint8_t type, uint16_t hid, uint8_t rid, uint32_t extent_cnt,
+		CXLDCExtent_raw *extents, Error **errp)
+{
+	Object *obj = object_resolve_path(path, NULL);
+	CXLEventDynamicCapacity dCap;
+	CXLEventRecordHdr *hdr = &dCap.hdr;
+	CXLDeviceState *cxlds;
+	CXLType3Dev *dcd;
+	int i;
+
+	if (!obj) {
+		error_setg(errp, "Unable to resolve path");
+		return;
+	}
+	if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
+		error_setg(errp, "Path not point to a valid CXL type3 device");
+		return;
+	}
+
+	dcd = CXL_TYPE3(obj);
+	cxlds = &dcd->cxl_dstate;
+	memset(&dCap, 0, sizeof(dCap));
+
+	if (!dcd->dc.num_regions) {
+		error_setg(errp, "No dynamic capacity support from the device");
+		return;
+	}
+
+	/*
+	 * 8.2.9.1.5
+	 * All Dynamic Capacity event records shall set the Event Record
+	 * Severity field in the Common Event Record Format to Informational
+	 * Event. All Dynamic Capacity related events shall be logged in the
+	 * Dynamic Capacity Event Log.
+	 */
+	assert(flags & (1<<CXL_EVENT_TYPE_INFO));
+	cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap));
+
+	/*
+	 * 00h: add capacity
+	 * 01h: release capacity
+	 * 02h: forced capacity release
+	 * 03h: region configuration updated
+	 * 04h: Add capacity response
+	 * 05h: capacity released
+	 **/
+	dCap.type = type;
+	stw_le_p(&dCap.host_id, hid);
+	dCap.updated_region_id = rid;
+	for (i = 0; i < extent_cnt; i++) {
+		extents[i].start_dpa += dcd->dc.regions[rid].base;
+		memcpy(&dCap.dynamic_capacity_extent, &extents[i]
+				, sizeof(CXLDCExtent_raw));
+
+		if (cxl_event_insert(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP,
+					(CXLEventRecordRaw *)&dCap)) {
+			;
+		}
+		cxl_event_irq_assert(dcd);
+	}
+}
+
+#define MEM_BLK_SIZE_MB 128
+void qmp_cxl_add_dynamic_capacity_event(const char *path, uint8_t region_id,
+		uint32_t num_exent, uint64_t dpa, uint64_t extent_len_MB, Error **errp)
+{
+	uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+	CXLDCExtent_raw *extents;
+	int i;
+
+	if (extent_len_MB < MEM_BLK_SIZE_MB) {
+		error_setg(errp,
+			"extent size cannot be smaller than memory block size (%dMB)",
+			MEM_BLK_SIZE_MB);
+		return;
+	}
+
+	extents = g_new0(CXLDCExtent_raw, num_exent);
+	for (i = 0; i < num_exent; i++) {
+		extents[i].start_dpa = dpa;
+		extents[i].len = extent_len_MB*1024*1024;
+		memset(extents[i].tag, 0, 0x10);
+		extents[i].shared_seq = 0;
+		dpa += extents[i].len;
+	}
+
+	qmp_cxl_process_dynamic_capacity_event(path, CXL_EVENT_LOG_INFORMATIONAL,
+			flags, 0x0, 0, region_id, num_exent, extents, errp);
+
+	g_free(extents);
+}
+
+void qmp_cxl_release_dynamic_capacity_event(const char *path, uint8_t region_id,
+		uint32_t num_exent, uint64_t dpa, uint64_t extent_len_MB, Error **errp)
+{
+	uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+	CXLDCExtent_raw *extents;
+	int i;
+
+	if (extent_len_MB < MEM_BLK_SIZE_MB) {
+		error_setg(errp,
+			"extent size cannot be smaller than memory block size (%dMB)",
+			MEM_BLK_SIZE_MB);
+		return;
+	}
+
+	extents = g_new0(CXLDCExtent_raw, num_exent);
+	for (i = 0; i < num_exent; i++) {
+		extents[i].start_dpa = dpa;
+		extents[i].len = extent_len_MB*1024*1024;
+		memset(extents[i].tag, 0, 0x10);
+		extents[i].shared_seq = 0;
+		dpa += extents[i].len;
+	}
+
+	qmp_cxl_process_dynamic_capacity_event(path, CXL_EVENT_LOG_INFORMATIONAL,
+			flags, 0x1, 0, region_id, num_exent, extents, errp);
+
+	g_free(extents);
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 089ba2091f..dd00458d1d 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -165,4 +165,20 @@ typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * Dynamic Capacity Event Record
+ * CXL Rev 3.0 Section 8.2.9.2.1.5: Table 8-47
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+	CXLEventRecordHdr hdr;
+	uint8_t type;
+	uint8_t reserved1;
+	uint16_t host_id;
+	uint8_t updated_region_id;
+	uint8_t reserved2[3];
+	uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+	uint8_t reserved[0x20];
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/qapi/cxl.json b/qapi/cxl.json
index 8b3d30cd71..c9a9a45ce4 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -264,3 +264,47 @@
             'type': 'CxlCorErrorType'
   }
 }
+
+##
+# @cxl-add-dynamic-capacity-event:
+#
+# Command to add dynamic capacity extent event
+#
+# @path: CXL DCD canonical QOM path
+# @region-id: region id
+# @num-extent: number of extents to add, test only
+# @dpa: start dpa for the operation
+# @extent-len: extent size in MB
+#
+# Since: 8.0
+##
+{ 'command': 'cxl-add-dynamic-capacity-event',
+  'data': { 'path': 'str',
+           'region-id': 'uint8',
+           'num-extent': 'uint32',
+           'dpa':'uint64',
+           'extent-len': 'uint64'
+  }
+}
+
+##
+# @cxl-release-dynamic-capacity-event:
+#
+# Command to add dynamic capacity extent event
+#
+# @path: CXL DCD canonical QOM path
+# @region-id: region id
+# @num-extent: number of extents to add, test only
+# @dpa: start dpa for the operation
+# @extent-len: extent size in MB
+#
+# Since: 8.0
+##
+{ 'command': 'cxl-release-dynamic-capacity-event',
+  'data': { 'path': 'str',
+           'region-id': 'uint8',
+           'num-extent': 'uint32',
+           'dpa':'uint64',
+           'extent-len': 'uint64'
+  }
+}
-- 
2.25.1
Re: [RFC 6/7] Add qmp interfaces to add/release dynamic capacity extents
Posted by Jonathan Cameron via 11 months, 3 weeks ago
On Thu, 11 May 2023 17:56:40 +0000
Fan Ni <fan.ni@samsung.com> wrote:

> From: Fan Ni <nifan@outlook.com>
> 
> Since fabric manager emulation is not supported yet, the change implements
> the functions to add/release dynamic capacity extents as QMP interfaces.

This makes sense at least as a stop gap.

> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two continuous extents (each is 128MB
> long) to region 0 (starting at dpa offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity-event",
>     "arguments": {
> 	"path": "/machine/peripheral/cxl-pmem0",
> 	"region-id" : 0,
> 	"num-extent": 2,
What does num-extent mean? 
A multiple entry injection mechanism makes sense but this
doesn't seem to be one.  Look at the error injection stuff done
to ensure we could inject multiple of those as one atomic operation
to trigger the various multi error handling paths.

> 	"dpa":0,
> 	"extent-len": 128
> 	}
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MB from region
> 0 (starting at dpa offset 0) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity-event",
> 	"arguments": {
> 		 "path": "/machine/peripheral/cxl-pmem0",
> 		"region-id" : 0,
> 		 "num-extent": 1 ,
> 		"dpa":0,
> 		"extent-len": 128
> 	}
> }
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/mem/cxl_type3.c          | 127 ++++++++++++++++++++++++++++++++++++
>  include/hw/cxl/cxl_events.h |  16 +++++
>  qapi/cxl.json               |  44 +++++++++++++
>  3 files changed, 187 insertions(+)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 23954711b5..70d47d43b9 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -1651,6 +1651,133 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
>      }
>  }
>  
> +static const QemuUUID dynamic_capacity_uuid = {
> +	.data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
> +			0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
> +};
> +
> +static void qmp_cxl_process_dynamic_capacity_event(const char *path, CxlEventLog log,
> +		uint8_t flags, uint8_t type, uint16_t hid, uint8_t rid, uint32_t extent_cnt,
> +		CXLDCExtent_raw *extents, Error **errp)
> +{
> +	Object *obj = object_resolve_path(path, NULL);
> +	CXLEventDynamicCapacity dCap;
> +	CXLEventRecordHdr *hdr = &dCap.hdr;
> +	CXLDeviceState *cxlds;
> +	CXLType3Dev *dcd;
> +	int i;
> +
> +	if (!obj) {
> +		error_setg(errp, "Unable to resolve path");
> +		return;
> +	}
> +	if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
> +		error_setg(errp, "Path not point to a valid CXL type3 device");
> +		return;
> +	}
> +
> +	dcd = CXL_TYPE3(obj);
> +	cxlds = &dcd->cxl_dstate;
> +	memset(&dCap, 0, sizeof(dCap));
> +
> +	if (!dcd->dc.num_regions) {
> +		error_setg(errp, "No dynamic capacity support from the device");
> +		return;
> +	}
> +
> +	/*
> +	 * 8.2.9.1.5
> +	 * All Dynamic Capacity event records shall set the Event Record
> +	 * Severity field in the Common Event Record Format to Informational
> +	 * Event. All Dynamic Capacity related events shall be logged in the
> +	 * Dynamic Capacity Event Log.
> +	 */
> +	assert(flags & (1<<CXL_EVENT_TYPE_INFO));

Given this requirement, why pass in those flags at all? Just set it in here
instead, thus ensuring it's always right.

> +	cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap));
> +
> +	/*
> +	 * 00h: add capacity
> +	 * 01h: release capacity

Enum for these so the input is typed.
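
e.g. (names illustrative, values matching the list in this comment):

    typedef enum CXLDCEventType {
        DC_EVENT_ADD_CAPACITY = 0x0,
        DC_EVENT_RELEASE_CAPACITY = 0x1,
        DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
        DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
        DC_EVENT_ADD_CAPACITY_RSP = 0x4,
        DC_EVENT_CAPACITY_RELEASED = 0x5,
    } CXLDCEventType;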

> +	 * 02h: forced capacity release
> +	 * 03h: region configuration updated
> +	 * 04h: Add capacity response
> +	 * 05h: capacity released
> +	 **/
> +	dCap.type = type;
> +	stw_le_p(&dCap.host_id, hid);
> +	dCap.updated_region_id = rid;
> +	for (i = 0; i < extent_cnt; i++) {
> +		extents[i].start_dpa += dcd->dc.regions[rid].base;

Mixture of handling endian conversion and not.  Whilst we still have
a bunch of cleanup to do around this, new code should handle endian
conversions always.  If touching code with problems, a precursor patch
to fix that code up before adding new stuff would be great as well.
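
i.e. if the raw extent fields are treated as little endian throughout,
something like (sketch):

    uint64_t start_dpa = ldq_le_p(&extents[i].start_dpa);

    stq_le_p(&extents[i].start_dpa,
             start_dpa + dcd->dc.regions[rid].base);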

> +		memcpy(&dCap.dynamic_capacity_extent, &extents[i]
> +				, sizeof(CXLDCExtent_raw));

comma on previous line.

> +
> +		if (cxl_event_insert(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP,
> +					(CXLEventRecordRaw *)&dCap)) {
> +			;

?  Failure here indicates a bug or an overflow of the event log.
Both want handling.
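
The other event injectors in this file seem to gate the irq assert on the
return value, so presumably something like (sketch):

    if (cxl_event_insert(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP,
                         (CXLEventRecordRaw *)&dCap)) {
        cxl_event_irq_assert(dcd);
    }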

> +		}
> +		cxl_event_irq_assert(dcd);
> +	}
> +}
> +
> +#define MEM_BLK_SIZE_MB 128
> +void qmp_cxl_add_dynamic_capacity_event(const char *path, uint8_t region_id,
> +		uint32_t num_exent, uint64_t dpa, uint64_t extent_len_MB, Error **errp)
> +{
> +	uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;

As above, no point in handling flags out here if they always have the same value.
Push them to where it matters.

> +	CXLDCExtent_raw *extents;
> +	int i;
> +
> +	if (extent_len_MB < MEM_BLK_SIZE_MB) {
> +		error_setg(errp,
> +			"extent size cannot be smaller than memory block size (%dMB)",
> +			MEM_BLK_SIZE_MB);
> +		return;
> +	}
> +
> +	extents = g_new0(CXLDCExtent_raw, num_exent);

Ah. Raw extents used in here. Either combine the different definitions or bring
that one forwards to this patch.

> +	for (i = 0; i < num_exent; i++) {
> +		extents[i].start_dpa = dpa;
> +		extents[i].len = extent_len_MB*1024*1024;
> +		memset(extents[i].tag, 0, 0x10);
> +		extents[i].shared_seq = 0;
> +		dpa += extents[i].len;
> +	}
> +
> +	qmp_cxl_process_dynamic_capacity_event(path, CXL_EVENT_LOG_INFORMATIONAL,
> +			flags, 0x0, 0, region_id, num_exent, extents, errp);
> +
> +	g_free(extents);
> +}
> +
> +void qmp_cxl_release_dynamic_capacity_event(const char *path, uint8_t region_id,
> +		uint32_t num_exent, uint64_t dpa, uint64_t extent_len_MB, Error **errp)
> +{
> +	uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
> +	CXLDCExtent_raw *extents;
> +	int i;
> +
> +	if (extent_len_MB < MEM_BLK_SIZE_MB) {
> +		error_setg(errp,
> +			"extent size cannot be smaller than memory block size (%dMB)",
> +			MEM_BLK_SIZE_MB);
> +		return;
> +	}
> +
> +	extents = g_new0(CXLDCExtent_raw, num_exent);
> +	for (i = 0; i < num_exent; i++) {
> +		extents[i].start_dpa = dpa;
> +		extents[i].len = extent_len_MB*1024*1024;
> +		memset(extents[i].tag, 0, 0x10);
> +		extents[i].shared_seq = 0;
> +		dpa += extents[i].len;
> +	}
> +
> +	qmp_cxl_process_dynamic_capacity_event(path, CXL_EVENT_LOG_INFORMATIONAL,
> +			flags, 0x1, 0, region_id, num_exent, extents, errp);
> +
> +	g_free(extents);
> +}
> +
>  static void ct3_class_init(ObjectClass *oc, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(oc);
> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> index 089ba2091f..dd00458d1d 100644
> --- a/include/hw/cxl/cxl_events.h
> +++ b/include/hw/cxl/cxl_events.h
> @@ -165,4 +165,20 @@ typedef struct CXLEventMemoryModule {
>      uint8_t reserved[0x3d];
>  } QEMU_PACKED CXLEventMemoryModule;
>  
> +/*
> + * Dynamic Capacity Event Record
> + * CXL Rev 3.0 Section 8.2.9.2.1.5: Table 8-47
> + * All fields little endian.
> + */
> +typedef struct CXLEventDynamicCapacity {
> +	CXLEventRecordHdr hdr;
> +	uint8_t type;
> +	uint8_t reserved1;
> +	uint16_t host_id;
> +	uint8_t updated_region_id;
> +	uint8_t reserved2[3];
> +	uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
> +	uint8_t reserved[0x20];
> +} QEMU_PACKED CXLEventDynamicCapacity;
> +
>  #endif /* CXL_EVENTS_H */
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 8b3d30cd71..c9a9a45ce4 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -264,3 +264,47 @@
>              'type': 'CxlCorErrorType'
>    }
>  }
> +
> +##
> +# @cxl-add-dynamic-capacity-event:
> +#
> +# Command to add dynamic capacity extent event
> +#
> +# @path: CXL DCD canonical QOM path
> +# @region-id: region id
> +# @num-extent: number of extents to add, test only
Moving towards 
> +# @dpa: start dpa for the operation
> +# @extent-len: extent size in MB
> +#
> +# Since: 8.0
> +##
> +{ 'command': 'cxl-add-dynamic-capacity-event',
> +  'data': { 'path': 'str',
> +           'region-id': 'uint8',
> +           'num-extent': 'uint32',

Look at how cxl-inject-uncorrectable-errors is done
as that handles a set of records all in one command and
we want similar here - so that we generate what it would
look like if the fm-api was used.

> +           'dpa':'uint64',
> +           'extent-len': 'uint64'
> +  }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity-event:
> +#
> +# Command to add dynamic capacity extent event
> +#
> +# @path: CXL DCD canonical QOM path
> +# @region-id: region id
> +# @num-extent: number of extents to add, test only
> +# @dpa: start dpa for the operation
> +# @extent-len: extent size in MB
> +#
> +# Since: 8.0
> +##
> +{ 'command': 'cxl-release-dynamic-capacity-event',
> +  'data': { 'path': 'str',
> +           'region-id': 'uint8',
> +           'num-extent': 'uint32',
> +           'dpa':'uint64',
> +           'extent-len': 'uint64'
> +  }
> +}
[RFC 7/7] hw/mem/cxl_type3: add read/write support to dynamic capacity
Posted by Fan Ni 12 months ago
From: Fan Ni <nifan@outlook.com>

Before the change, reads from or writes to the dynamic capacity of the memory
device are not supported as 1) no host-backed file/memory is provided for
it; 2) no address space is created for the dynamic capacity.

With the change, add code to support the following:
1. add a new property "dc-memdev" to the type3 device to point to the
   host memory backend for dynamic capacity;
2. add a bitmap for each region to track whether a block is host backed,
   which will be used for address checks when reading/writing dynamic capacity;
3. add a namespace for dynamic capacity for read/write support;
4. create cdat entries for each dynamic capacity region;

Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 hw/cxl/cxl-mailbox-utils.c  |  21 ++-
 hw/mem/cxl_type3.c          | 336 +++++++++++++++++++++++++++++-------
 include/hw/cxl/cxl_device.h |   8 +-
 3 files changed, 298 insertions(+), 67 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 7212934627..efe61e67fb 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -391,9 +391,11 @@ static CXLRetCode cmd_firmware_update_get_info(struct cxl_cmd *cmd,
         char fw_rev4[0x10];
     } QEMU_PACKED *fw_info;
     QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
+	CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
 
     if ((cxl_dstate->vmem_size < CXL_CAPACITY_MULTIPLIER) ||
-        (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER)) {
+			(cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) ||
+		(ct3d->dc.total_dynamic_capicity < CXL_CAPACITY_MULTIPLIER)) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -534,7 +536,9 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd *cmd,
     CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
 
     if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
-        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+		(!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+		(!QEMU_IS_ALIGNED(ct3d->dc.total_dynamic_capicity,
+						CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -543,7 +547,8 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd *cmd,
 
     snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 
-    stq_le_p(&id->total_capacity, cxl_dstate->mem_size / CXL_CAPACITY_MULTIPLIER);
+	stq_le_p(&id->total_capacity,
+			cxl_dstate->static_mem_size / CXL_CAPACITY_MULTIPLIER);
     stq_le_p(&id->persistent_capacity, cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER);
     stq_le_p(&id->volatile_capacity, cxl_dstate->vmem_size / CXL_CAPACITY_MULTIPLIER);
     stl_le_p(&id->lsa_size, cvc->get_lsa_size(ct3d));
@@ -568,9 +573,12 @@ static CXLRetCode cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
         uint64_t next_pmem;
     } QEMU_PACKED *part_info = (void *)cmd->payload;
     QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
+	CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
 
     if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
-        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+		(!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+		(!QEMU_IS_ALIGNED(ct3d->dc.total_dynamic_capicity,
+						CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -881,9 +889,8 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
     struct clear_poison_pl *in = (void *)cmd->payload;
 
     dpa = ldq_le_p(&in->dpa);
-    if (dpa + 64 > cxl_dstate->mem_size) {
-        return CXL_MBOX_INVALID_PA;
-    }
+	if (dpa + 64 > cxl_dstate->static_mem_size && ct3d->dc.num_regions == 0)
+		return CXL_MBOX_INVALID_PA;
 
     QLIST_FOREACH(ent, poison_list, node) {
         /*
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 70d47d43b9..334660bd0f 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -33,8 +33,8 @@ enum {
 };
 
 static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
-                                         int dsmad_handle, MemoryRegion *mr,
-                                         bool is_pmem, uint64_t dpa_base)
+		int dsmad_handle, uint8_t flags,
+		uint64_t dpa_base, uint64_t size)
 {
     g_autofree CDATDsmas *dsmas = NULL;
     g_autofree CDATDslbis *dslbis0 = NULL;
@@ -53,9 +53,9 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
             .length = sizeof(*dsmas),
         },
         .DSMADhandle = dsmad_handle,
-        .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
+		.flags = flags,
         .DPA_base = dpa_base,
-        .DPA_length = int128_get64(mr->size),
+		.DPA_length = size,
     };
 
     /* For now, no memory side cache, plausiblish numbers */
@@ -137,9 +137,9 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
          * NV: Reserved - the non volatile from DSMAS matters
          * V: EFI_MEMORY_SP
          */
-        .EFI_memory_type_attr = is_pmem ? 2 : 1,
+		.EFI_memory_type_attr = flags ? 2 : 1,
         .DPA_offset = 0,
-        .DPA_length = int128_get64(mr->size),
+		.DPA_length = size,
     };
 
     /* Header always at start of structure */
@@ -158,14 +158,15 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
     g_autofree CDATSubHeader **table = NULL;
     CXLType3Dev *ct3d = priv;
     MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
+	MemoryRegion *dc_mr = NULL;
     int dsmad_handle = 0;
     int cur_ent = 0;
     int len = 0;
     int rc, i;
+	uint64_t vmr_size = 0, pmr_size = 0;
 
-    if (!ct3d->hostpmem && !ct3d->hostvmem) {
-        return 0;
-    }
+	if (!ct3d->hostpmem && !ct3d->hostvmem && !ct3d->dc.num_regions)
+		return 0;
 
     if (ct3d->hostvmem) {
         volatile_mr = host_memory_backend_get_memory(ct3d->hostvmem);
@@ -173,6 +174,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
             return -EINVAL;
         }
         len += CT3_CDAT_NUM_ENTRIES;
+		vmr_size = volatile_mr->size;
     }
 
     if (ct3d->hostpmem) {
@@ -181,7 +183,19 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
             return -EINVAL;
         }
         len += CT3_CDAT_NUM_ENTRIES;
-    }
+		pmr_size = nonvolatile_mr->size;
+	}
+
+	if (ct3d->dc.num_regions) {
+		if (ct3d->dc.host_dc) {
+			dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+			if (!dc_mr)
+				return -EINVAL;
+			len += CT3_CDAT_NUM_ENTRIES * ct3d->dc.num_regions;
+		} else {
+			return -EINVAL;
+		}
+	}
 
     table = g_malloc0(len * sizeof(*table));
     if (!table) {
@@ -189,23 +203,45 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
     }
 
     /* Now fill them in */
-    if (volatile_mr) {
-        rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
-                                           false, 0);
-        if (rc < 0) {
-            return rc;
-        }
-        cur_ent = CT3_CDAT_NUM_ENTRIES;
-    }
+	if (volatile_mr) {
+		rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++,
+				0, 0, vmr_size);
+		if (rc < 0)
+			return rc;
+		cur_ent = CT3_CDAT_NUM_ENTRIES;
+	}
+
+	if (nonvolatile_mr) {
+		rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
+				CDAT_DSMAS_FLAG_NV, vmr_size, pmr_size);
+		if (rc < 0)
+			goto error_cleanup;
+		cur_ent += CT3_CDAT_NUM_ENTRIES;
+	}
+
+	if (dc_mr) {
+		uint64_t region_base = vmr_size + pmr_size;
+
+		/*
+		 * Currently we create cdat entries for each region, should we only
+		 * create dsmas table instead??
+		 * We assume all dc regions are non-volatile for now.
+		 *
+		 */
+		for (i = 0; i < ct3d->dc.num_regions; i++) {
+			rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent])
+					, dsmad_handle++
+					, CDAT_DSMAS_FLAG_NV|CDAT_DSMAS_FLAG_DYNAMIC_CAP
+					, region_base, ct3d->dc.regions[i].len);
+			if (rc < 0)
+				goto error_cleanup;
+			ct3d->dc.regions[i].dsmadhandle = dsmad_handle-1;
+
+			cur_ent += CT3_CDAT_NUM_ENTRIES;
+			region_base += ct3d->dc.regions[i].len;
+		}
+	}
 
-    if (nonvolatile_mr) {
-        rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
-                nonvolatile_mr, true, (volatile_mr ? volatile_mr->size : 0));
-        if (rc < 0) {
-            goto error_cleanup;
-        }
-        cur_ent += CT3_CDAT_NUM_ENTRIES;
-    }
     assert(len == cur_ent);
 
     *cdat_table = g_steal_pointer(&table);
@@ -706,6 +742,11 @@ static int cxl_create_toy_regions(CXLType3Dev *ct3d)
 		/* dsmad_handle is set when creating cdat table entries */
 		region->flags = 0;
 
+		region->blk_bitmap = bitmap_new(region->len / region->block_size);
+		if (!region->blk_bitmap)
+			return -1;
+		bitmap_zero(region->blk_bitmap, region->len / region->block_size);
+
 		region_base += region->len;
 	}
 	QTAILQ_INIT(&ct3d->dc.extents);
@@ -713,11 +754,24 @@ static int cxl_create_toy_regions(CXLType3Dev *ct3d)
 	return 0;
 }
 
+static void cxl_destroy_toy_regions(CXLType3Dev *ct3d)
+{
+	int i;
+	struct CXLDCD_Region *region;
+
+	for (i = 0; i < ct3d->dc.num_regions; i++) {
+		region = &ct3d->dc.regions[i];
+		if (region->blk_bitmap)
+			g_free(region->blk_bitmap);
+	}
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
 
-    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem) {
+	if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem
+			&& !ct3d->dc.num_regions) {
         error_setg(errp, "at least one memdev property must be set");
         return false;
     } else if (ct3d->hostmem && ct3d->hostpmem) {
@@ -754,7 +808,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         }
         address_space_init(&ct3d->hostvmem_as, vmr, v_name);
         ct3d->cxl_dstate.vmem_size = vmr->size;
-        ct3d->cxl_dstate.mem_size += vmr->size;
+		ct3d->cxl_dstate.static_mem_size += vmr->size;
         g_free(v_name);
     }
 
@@ -777,12 +831,47 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         }
         address_space_init(&ct3d->hostpmem_as, pmr, p_name);
         ct3d->cxl_dstate.pmem_size = pmr->size;
-        ct3d->cxl_dstate.mem_size += pmr->size;
+		ct3d->cxl_dstate.static_mem_size += pmr->size;
         g_free(p_name);
     }
 
-	if (cxl_create_toy_regions(ct3d))
-		return false;
+	ct3d->dc.total_dynamic_capicity = 0;
+	if (ct3d->dc.host_dc) {
+		MemoryRegion *dc_mr;
+		char *dc_name;
+		uint64_t total_region_size = 0;
+		int i;
+
+		dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+		if (!dc_mr) {
+			error_setg(errp, "dynamic capacity must have backing device");
+			return false;
+		}
+		/* FIXME: set dc as nonvolatile for now */
+		memory_region_set_nonvolatile(dc_mr, true);
+		memory_region_set_enabled(dc_mr, true);
+		host_memory_backend_set_mapped(ct3d->dc.host_dc, true);
+		if (ds->id) {
+			dc_name = g_strdup_printf("cxl-dcd-dpa-dc-space:%s", ds->id);
+		} else {
+			dc_name = g_strdup("cxl-dcd-dpa-dc-space");
+		}
+		address_space_init(&ct3d->dc.host_dc_as, dc_mr, dc_name);
+
+		if (cxl_create_toy_regions(ct3d)) {
+			return false;
+		}
+
+		for (i = 0; i < ct3d->dc.num_regions; i++) {
+			total_region_size += ct3d->dc.regions[i].len;
+		}
+		/* Make sure the host backend is large enough to cover all dc range */
+		assert(total_region_size <= dc_mr->size);
+		assert(dc_mr->size % (256*1024*1024) == 0);
+
+		ct3d->dc.total_dynamic_capicity = total_region_size;
+		g_free(dc_name);
+	}
 
     return true;
 }
@@ -890,6 +979,10 @@ err_release_cdat:
 err_free_special_ops:
     g_free(regs->special_ops);
 err_address_space_free:
+	if (ct3d->dc.host_dc) {
+		cxl_destroy_toy_regions(ct3d);
+		address_space_destroy(&ct3d->dc.host_dc_as);
+	}
     if (ct3d->hostpmem) {
         address_space_destroy(&ct3d->hostpmem_as);
     }
@@ -909,6 +1002,10 @@ static void ct3_exit(PCIDevice *pci_dev)
     cxl_doe_cdat_release(cxl_cstate);
     spdm_sock_fini(ct3d->doe_spdm.socket);
     g_free(regs->special_ops);
+	if (ct3d->dc.host_dc) {
+		cxl_destroy_toy_regions(ct3d);
+		address_space_destroy(&ct3d->dc.host_dc_as);
+	}
     if (ct3d->hostpmem) {
         address_space_destroy(&ct3d->hostpmem_as);
     }
@@ -917,6 +1014,100 @@ static void ct3_exit(PCIDevice *pci_dev)
     }
 }
 
+static void set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+		uint64_t len)
+{
+	int i;
+	CXLDCD_Region *region = NULL;
+
+	if (dpa < ct3d->dc.regions[0].base
+		|| dpa >= ct3d->dc.regions[0].base + ct3d->dc.total_dynamic_capicity)
+		return;
+
+	/*
+	 * spec 3.0 9.13.3: Regions are used in increasing-DPA order, with
+	 * Region 0 being used for the lowest DPA of Dynamic Capacity and
+	 * Region 7 for the highest DPA.
+	 * So we check from the last region to find where the dpa belongs.
+	 * access across multiple regions is not allowed.
+	 **/
+	for (i = ct3d->dc.num_regions-1; i >= 0; i--) {
+		region = &ct3d->dc.regions[i];
+		if (dpa >= region->base)
+			break;
+	}
+
+	bitmap_set(region->blk_bitmap, (dpa-region->base)/region->block_size,
+			len/region->block_size);
+}
+
+static bool test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+		uint64_t len)
+{
+	int i;
+	CXLDCD_Region *region = NULL;
+	uint64_t nbits;
+	long nr;
+
+	if (dpa < ct3d->dc.regions[0].base
+		   || dpa >= ct3d->dc.regions[0].base + ct3d->dc.total_dynamic_capicity)
+		return false;
+
+	/*
+	 * spec 3.0 9.13.3: Regions are used in increasing-DPA order, with
+	 * Region 0 being used for the lowest DPA of Dynamic Capacity and
+	 * Region 7 for the highest DPA.
+	 * So we check from the last region to find where the dpa belongs.
+	 * access across multiple regions is not allowed.
+	 **/
+	for (i = ct3d->dc.num_regions-1; i >= 0; i--) {
+		region = &ct3d->dc.regions[i];
+		if (dpa >= region->base)
+			break;
+	}
+
+	nr = (dpa-region->base)/region->block_size;
+	nbits = (len + region->block_size-1)/region->block_size;
+	if (find_next_zero_bit(region->blk_bitmap, nr+nbits, nr)
+			>= nr+nbits)
+		return true;
+
+	return false;
+}
+
+static void clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+		uint64_t len)
+{
+	int i;
+	CXLDCD_Region *region = NULL;
+	uint64_t nbits;
+	long nr;
+
+	if (dpa < ct3d->dc.regions[0].base
+		|| dpa >= ct3d->dc.regions[0].base + ct3d->dc.total_dynamic_capicity)
+		return;
+
+	/*
+	 * spec 3.0 9.13.3: Regions are used in increasing-DPA order, with
+	 * Region 0 being used for the lowest DPA of Dynamic Capacity and
+	 * Region 7 for the highest DPA.
+	 * So we check from the last region to find where the dpa belongs.
+	 * access across multiple regions is not allowed.
+	 **/
+	for (i = ct3d->dc.num_regions-1; i >= 0; i--) {
+		region = &ct3d->dc.regions[i];
+		if (dpa >= region->base)
+			break;
+	}
+
+	nr = (dpa-region->base) / region->block_size;
+	nbits = (len + region->block_size-1) / region->block_size;
+	for (i = 0; i < nbits; i++) {
+		clear_bit(nr, region->blk_bitmap);
+		nr++;
+	}
+}
+
 static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa)
 {
     uint32_t *cache_mem = ct3d->cxl_cstate.crb.cache_mem_registers;
@@ -973,16 +1164,24 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
                                        AddressSpace **as,
                                        uint64_t *dpa_offset)
 {
-    MemoryRegion *vmr = NULL, *pmr = NULL;
+	MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
+	uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
 
     if (ct3d->hostvmem) {
         vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+		vmr_size = int128_get64(vmr->size);
     }
     if (ct3d->hostpmem) {
         pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+		pmr_size = int128_get64(pmr->size);
     }
+	if (ct3d->dc.host_dc) {
+		dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+		/* Do we want dc_size to be dc_mr->size or not?? */
+		dc_size = ct3d->dc.total_dynamic_capicity;
+	}
 
-    if (!vmr && !pmr) {
+	if (!vmr && !pmr && !dc_mr) {
         return -ENODEV;
     }
 
@@ -990,20 +1189,22 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         return -EINVAL;
     }
 
-    if (*dpa_offset > int128_get64(ct3d->cxl_dstate.mem_size)) {
+    if (*dpa_offset >= vmr_size + pmr_size + dc_size ||
+       (*dpa_offset >= vmr_size + pmr_size && ct3d->dc.num_regions == 0)) {
         return -EINVAL;
     }
 
-    if (vmr) {
-        if (*dpa_offset < int128_get64(vmr->size)) {
-            *as = &ct3d->hostvmem_as;
-        } else {
-            *as = &ct3d->hostpmem_as;
-            *dpa_offset -= vmr->size;
-        }
-    } else {
-        *as = &ct3d->hostpmem_as;
-    }
+	if (*dpa_offset < vmr_size)
+		*as = &ct3d->hostvmem_as;
+	else if (*dpa_offset < vmr_size + pmr_size) {
+		*as = &ct3d->hostpmem_as;
+		*dpa_offset -= vmr_size;
+	} else {
+		if (!test_region_block_backed(ct3d, *dpa_offset, size))
+			return -ENODEV;
+		*as = &ct3d->dc.host_dc_as;
+		*dpa_offset -= (vmr_size + pmr_size);
+	}
 
     return 0;
 }
@@ -1069,6 +1270,8 @@ static Property ct3_props[] = {
     DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
     DEFINE_PROP_UINT16("spdm", CXLType3Dev, spdm_port, 0),
 	DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
+	DEFINE_PROP_LINK("dc-memdev", CXLType3Dev, dc.host_dc,
+					TYPE_MEMORY_BACKEND, HostMemoryBackend *),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1135,34 +1338,41 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
 
 static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
 {
-    MemoryRegion *vmr = NULL, *pmr = NULL;
+	MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
     AddressSpace *as;
+	uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
 
     if (ct3d->hostvmem) {
         vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+		vmr_size = int128_get64(vmr->size);
     }
     if (ct3d->hostpmem) {
         pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+		pmr_size = int128_get64(pmr->size);
     }
+	if (ct3d->dc.host_dc) {
+		dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+		dc_size = ct3d->dc.total_dynamic_capicity;
+	}
 
-    if (!vmr && !pmr) {
+    if (!vmr && !pmr && !dc_mr) {
         return false;
     }
 
-    if (dpa_offset + 64 > int128_get64(ct3d->cxl_dstate.mem_size)) {
-        return false;
-    }
+	if (dpa_offset >= vmr_size + pmr_size + dc_size)
+		return false;
+	if (dpa_offset + 64 >= vmr_size + pmr_size && ct3d->dc.num_regions == 0)
+		return false;
 
-    if (vmr) {
-        if (dpa_offset <= int128_get64(vmr->size)) {
-            as = &ct3d->hostvmem_as;
-        } else {
-            as = &ct3d->hostpmem_as;
-            dpa_offset -= vmr->size;
-        }
-    } else {
-        as = &ct3d->hostpmem_as;
-    }
+	if (dpa_offset < vmr_size) {
+		as = &ct3d->hostvmem_as;
+	} else if (dpa_offset < vmr_size + pmr_size) {
+		as = &ct3d->hostpmem_as;
+		dpa_offset -= vmr->size;
+	} else {
+		as = &ct3d->dc.host_dc_as;
+		dpa_offset -= (vmr_size + pmr_size);
+	}
 
     address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data, 64);
     return true;
@@ -1711,6 +1921,14 @@ static void qmp_cxl_process_dynamic_capacity_event(const char *path, CxlEventLog
 		memcpy(&dCap.dynamic_capacity_extent, &extents[i]
 				, sizeof(CXLDCExtent_raw));
 
+		if (dCap.type == 0x0)
+			set_region_block_backed(dcd, extents[i].start_dpa, extents[i].len);
+		else if (dCap.type == 0x1)
+			clear_region_block_backed(dcd, extents[i].start_dpa,
+					extents[i].len);
+		else
+			error_setg(errp, "DC event not support yet, no bitmap op");
+
 		if (cxl_event_insert(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP,
 					(CXLEventRecordRaw *)&dCap)) {
 			;
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index c0c8fcc24b..d9b6776e2c 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -211,7 +211,7 @@ typedef struct cxl_device_state {
     } timestamp;
 
     /* memory region size, HDM */
-    uint64_t mem_size;
+	uint64_t static_mem_size;
     uint64_t pmem_size;
     uint64_t vmem_size;
 
@@ -412,6 +412,7 @@ typedef struct CXLDCD_Region {
 	uint64_t block_size;
 	uint32_t dsmadhandle;
 	uint8_t flags;
+	unsigned long *blk_bitmap;
 } CXLDCD_Region;
 
 struct CXLType3Dev {
@@ -447,12 +448,17 @@ struct CXLType3Dev {
     uint64_t poison_list_overflow_ts;
 
 	struct dynamic_capacity {
+		HostMemoryBackend *host_dc;
+		AddressSpace host_dc_as;
+
+		uint8_t num_hosts; //Table 7-55
 		uint8_t num_regions; // 1-8
 		struct CXLDCD_Region regions[DCD_MAX_REGION_NUM];
 		CXLDCDExtentList extents;
 
 		uint32_t total_extent_count;
 		uint32_t ext_list_gen_seq;
+		uint64_t total_dynamic_capicity; // 256M aligned
 	} dc;
 };
 
-- 
2.25.1
Re: [RFC 7/7] hw/mem/cxl_type3: add read/write support to dynamic capacity
Posted by Jonathan Cameron via 11 months, 3 weeks ago
On Thu, 11 May 2023 17:56:40 +0000
Fan Ni <fan.ni@samsung.com> wrote:

> From: Fan Ni <nifan@outlook.com>
> 
> Before the change, read from or write to dynamic capacity of the memory
> device is not supported as 1) no host backed file/memory is provided for
> it; 2) no address space is created for the dynamic capacity.

Ah nice. I should have read ahead.  Probably makes sense to reorder things
so that when we present DCD region it will work.

> 
> With the change, add code to support following:
> 1. add a new property to type3 device "dc-memdev" to point to host
>    memory backend for dynamic capacity;
> 2. add a bitmap for each region to track whether a block is host backed,
> which will be used for address check when read/write dynamic capacity;
> 3. add namespace for dynamic capacity for read/write support;
> 4. create cdat entries for each dynamic capacity region;
> 
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  21 ++-
>  hw/mem/cxl_type3.c          | 336 +++++++++++++++++++++++++++++-------
>  include/hw/cxl/cxl_device.h |   8 +-
>  3 files changed, 298 insertions(+), 67 deletions(-)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 7212934627..efe61e67fb 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -391,9 +391,11 @@ static CXLRetCode cmd_firmware_update_get_info(struct cxl_cmd *cmd,
>          char fw_rev4[0x10];
>      } QEMU_PACKED *fw_info;
>      QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
> +	CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
>  
>      if ((cxl_dstate->vmem_size < CXL_CAPACITY_MULTIPLIER) ||
> -        (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER)) {
> +			(cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) ||

Keep old alignment

> +		(ct3d->dc.total_dynamic_capicity < CXL_CAPACITY_MULTIPLIER)) {

We should think about the separation between what goes in cxl_dstate and directly
in ct3d. That boundary has been blurring for a while and getting some review
comments.

>          return CXL_MBOX_INTERNAL_ERROR;
>      }
>  
> @@ -534,7 +536,9 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd *cmd,
>      CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
>  
>      if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
> -        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
> +		(!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
> +		(!QEMU_IS_ALIGNED(ct3d->dc.total_dynamic_capicity,
> +						CXL_CAPACITY_MULTIPLIER))) {
>          return CXL_MBOX_INTERNAL_ERROR;
>      }
>  
> @@ -543,7 +547,8 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd *cmd,
>  
>      snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
>  
> -    stq_le_p(&id->total_capacity, cxl_dstate->mem_size / CXL_CAPACITY_MULTIPLIER);
> +	stq_le_p(&id->total_capacity,
> +			cxl_dstate->static_mem_size / CXL_CAPACITY_MULTIPLIER);

Pull the rename out as a precursor patch.

>      stq_le_p(&id->persistent_capacity, cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER);
>      stq_le_p(&id->volatile_capacity, cxl_dstate->vmem_size / CXL_CAPACITY_MULTIPLIER);
>      stl_le_p(&id->lsa_size, cvc->get_lsa_size(ct3d));
> @@ -568,9 +573,12 @@ static CXLRetCode cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
>          uint64_t next_pmem;
>      } QEMU_PACKED *part_info = (void *)cmd->payload;
>      QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
> +	CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
>  
>      if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
> -        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
> +		(!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
> +		(!QEMU_IS_ALIGNED(ct3d->dc.total_dynamic_capicity,
> +						CXL_CAPACITY_MULTIPLIER))) {
>          return CXL_MBOX_INTERNAL_ERROR;
>      }
>  
> @@ -881,9 +889,8 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
>      struct clear_poison_pl *in = (void *)cmd->payload;
>  
>      dpa = ldq_le_p(&in->dpa);
> -    if (dpa + 64 > cxl_dstate->mem_size) {
> -        return CXL_MBOX_INVALID_PA;
> -    }
> +	if (dpa + 64 > cxl_dstate->static_mem_size && ct3d->dc.num_regions == 0)

This test will need expanding to include DPAs in DC regions.
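
One possible shape for that (rough, untested sketch, reusing the regions[]
fields added in this series):

	if (dpa + 64 > cxl_dstate->static_mem_size) {
		bool in_dc = false;
		int i;

		/* Accept the DPA if it falls entirely within one DC region */
		for (i = 0; i < ct3d->dc.num_regions; i++) {
			CXLDCD_Region *r = &ct3d->dc.regions[i];

			if (dpa >= r->base && dpa + 64 <= r->base + r->len) {
				in_dc = true;
				break;
			}
		}
		if (!in_dc) {
			return CXL_MBOX_INVALID_PA;
		}
	}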

> +		return CXL_MBOX_INVALID_PA;
>  
>      QLIST_FOREACH(ent, poison_list, node) {
>          /*
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 70d47d43b9..334660bd0f 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -33,8 +33,8 @@ enum {
>  };
>  
>  static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> -                                         int dsmad_handle, MemoryRegion *mr,
> -                                         bool is_pmem, uint64_t dpa_base)
> +		int dsmad_handle, uint8_t flags,
> +		uint64_t dpa_base, uint64_t size)
>  {
>      g_autofree CDATDsmas *dsmas = NULL;
>      g_autofree CDATDslbis *dslbis0 = NULL;
> @@ -53,9 +53,9 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
>              .length = sizeof(*dsmas),
>          },
>          .DSMADhandle = dsmad_handle,
> -        .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
> +		.flags = flags,
>          .DPA_base = dpa_base,
> -        .DPA_length = int128_get64(mr->size),
> +		.DPA_length = size,
>      };
>  
>      /* For now, no memory side cache, plausiblish numbers */
> @@ -137,9 +137,9 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
>           * NV: Reserved - the non volatile from DSMAS matters
>           * V: EFI_MEMORY_SP
>           */
> -        .EFI_memory_type_attr = is_pmem ? 2 : 1,
> +		.EFI_memory_type_attr = flags ? 2 : 1,

Fix all these alignment changes (spaces vs tabs)

>          .DPA_offset = 0,
> -        .DPA_length = int128_get64(mr->size),
> +		.DPA_length = size,
>      };
>  
>      /* Header always at start of structure */
> @@ -158,14 +158,15 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
>      g_autofree CDATSubHeader **table = NULL;
>      CXLType3Dev *ct3d = priv;
>      MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
> +	MemoryRegion *dc_mr = NULL;
>      int dsmad_handle = 0;
>      int cur_ent = 0;
>      int len = 0;
>      int rc, i;
> +	uint64_t vmr_size = 0, pmr_size = 0;
>  
> -    if (!ct3d->hostpmem && !ct3d->hostvmem) {
> -        return 0;
> -    }
> +	if (!ct3d->hostpmem && !ct3d->hostvmem && !ct3d->dc.num_regions)
> +		return 0;
>  
>      if (ct3d->hostvmem) {
>          volatile_mr = host_memory_backend_get_memory(ct3d->hostvmem);
> @@ -173,6 +174,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
>              return -EINVAL;
>          }
>          len += CT3_CDAT_NUM_ENTRIES;
> +		vmr_size = volatile_mr->size;
>      }
>  
>      if (ct3d->hostpmem) {
> @@ -181,7 +183,19 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
>              return -EINVAL;
>          }
>          len += CT3_CDAT_NUM_ENTRIES;
> -    }
> +		pmr_size = nonvolatile_mr->size;
> +	}
> +
> +	if (ct3d->dc.num_regions) {
> +		if (ct3d->dc.host_dc) {
> +			dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> +			if (!dc_mr)
> +				return -EINVAL;
> +			len += CT3_CDAT_NUM_ENTRIES * ct3d->dc.num_regions;
> +		} else {
> +			return -EINVAL;
> +		}
> +	}
>  
>      table = g_malloc0(len * sizeof(*table));
>      if (!table) {
> @@ -189,23 +203,45 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
>      }
>  
>      /* Now fill them in */
> -    if (volatile_mr) {
> -        rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
> -                                           false, 0);
> -        if (rc < 0) {
> -            return rc;
> -        }
> -        cur_ent = CT3_CDAT_NUM_ENTRIES;
> -    }
> +	if (volatile_mr) {
> +		rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++,
> +				0, 0, vmr_size);
> +		if (rc < 0)
> +			return rc;
Without the accidental tabs/spaces conversion this diff should look a lot
clearer.
> +		cur_ent = CT3_CDAT_NUM_ENTRIES;
> +	}
> +
> +	if (nonvolatile_mr) {
> +		rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
> +				CDAT_DSMAS_FLAG_NV, vmr_size, pmr_size);
> +		if (rc < 0)
> +			goto error_cleanup;
> +		cur_ent += CT3_CDAT_NUM_ENTRIES;
> +	}
> +
> +	if (dc_mr) {
> +		uint64_t region_base = vmr_size + pmr_size;
> +
> +		/*
> +		 * Currently we create cdat entries for each region, should we only
> +		 * create dsmas table instead??

I don't think it does any harm to have a lot of similar entries. We may want to reconsider
this in the longer term to make sure that more complex code paths are handled where
things are shared.  What combinations does the spec allow?
One entry for all regions with them all sharing a single dsmad handle?


> +		 * We assume all dc regions are non-volatile for now.
> +		 *
> +		 */
> +		for (i = 0; i < ct3d->dc.num_regions; i++) {
> +			rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent])
> +					, dsmad_handle++
> +					, CDAT_DSMAS_FLAG_NV|CDAT_DSMAS_FLAG_DYNAMIC_CAP
> +					, region_base, ct3d->dc.regions[i].len);
> +			if (rc < 0)
> +				goto error_cleanup;
> +			ct3d->dc.regions[i].dsmadhandle = dsmad_handle-1;
> +
> +			cur_ent += CT3_CDAT_NUM_ENTRIES;
> +			region_base += ct3d->dc.regions[i].len;
> +		}
> +	}
>  
> -    if (nonvolatile_mr) {
> -        rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
> -                nonvolatile_mr, true, (volatile_mr ? volatile_mr->size : 0));
> -        if (rc < 0) {
> -            goto error_cleanup;
> -        }
> -        cur_ent += CT3_CDAT_NUM_ENTRIES;
> -    }
>      assert(len == cur_ent);
>  
>      *cdat_table = g_steal_pointer(&table);
> @@ -706,6 +742,11 @@ static int cxl_create_toy_regions(CXLType3Dev *ct3d)
>  		/* dsmad_handle is set when creating cdat table entries */
>  		region->flags = 0;
>  
> +		region->blk_bitmap = bitmap_new(region->len / region->block_size);
> +		if (!region->blk_bitmap)
> +			return -1;
> +		bitmap_zero(region->blk_bitmap, region->len / region->block_size);
> +
>  		region_base += region->len;
>  	}
>  	QTAILQ_INIT(&ct3d->dc.extents);
> @@ -713,11 +754,24 @@ static int cxl_create_toy_regions(CXLType3Dev *ct3d)
>  	return 0;
>  }
>  
> +static void cxl_destroy_toy_regions(CXLType3Dev *ct3d)

Why toy?  They work after this so no longer toys ;)

> +{
> +	int i;
> +	struct CXLDCD_Region *region;
> +
> +	for (i = 0; i < ct3d->dc.num_regions; i++) {
> +		region = &ct3d->dc.regions[i];
> +		if (region->blk_bitmap)
> +			g_free(region->blk_bitmap);
Why is check needed? Is there a path where we call this function
without the bitmap having been allocated successfully?
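
If there is no such path, the check can probably just go away: g_free(NULL)
is a no-op, so the whole helper could be as simple as (sketch, with a
non-"toy" name as suggested above):

static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
{
	int i;

	for (i = 0; i < ct3d->dc.num_regions; i++) {
		g_free(ct3d->dc.regions[i].blk_bitmap);
		ct3d->dc.regions[i].blk_bitmap = NULL;
	}
}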

> +	}
> +}
> +
>  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>  {
>      DeviceState *ds = DEVICE(ct3d);
>  
> -    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem) {
> +	if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem
> +			&& !ct3d->dc.num_regions) {
>          error_setg(errp, "at least one memdev property must be set");
>          return false;
>      } else if (ct3d->hostmem && ct3d->hostpmem) {
> @@ -754,7 +808,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>          }
>          address_space_init(&ct3d->hostvmem_as, vmr, v_name);
>          ct3d->cxl_dstate.vmem_size = vmr->size;
> -        ct3d->cxl_dstate.mem_size += vmr->size;
> +		ct3d->cxl_dstate.static_mem_size += vmr->size;
>          g_free(v_name);
>      }
>  
> @@ -777,12 +831,47 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
>          }
>          address_space_init(&ct3d->hostpmem_as, pmr, p_name);
>          ct3d->cxl_dstate.pmem_size = pmr->size;
> -        ct3d->cxl_dstate.mem_size += pmr->size;
> +		ct3d->cxl_dstate.static_mem_size += pmr->size;
>          g_free(p_name);
>      }
>  
> -	if (cxl_create_toy_regions(ct3d))
> -		return false;
> +	ct3d->dc.total_dynamic_capicity = 0;
> +	if (ct3d->dc.host_dc) {
> +		MemoryRegion *dc_mr;
> +		char *dc_name;
> +		uint64_t total_region_size = 0;
> +		int i;
> +
> +		dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> +		if (!dc_mr) {
> +			error_setg(errp, "dynamic capacity must have backing device");
> +			return false;

> +		}
> +		/* FIXME: set dc as nonvolatile for now */
That's fine. I think to do anything else we'll want multiple backends anyway.
Perhaps rename the parameter to reflect that it's volatile for now though otherwise
we'll end up deprecating another memory region command line parameter and people will
begin to get grumpy ;)

> +		memory_region_set_nonvolatile(dc_mr, true);
> +		memory_region_set_enabled(dc_mr, true);
> +		host_memory_backend_set_mapped(ct3d->dc.host_dc, true);
> +		if (ds->id) {
> +			dc_name = g_strdup_printf("cxl-dcd-dpa-dc-space:%s", ds->id);
> +		} else {
> +			dc_name = g_strdup("cxl-dcd-dpa-dc-space");
> +		}
> +		address_space_init(&ct3d->dc.host_dc_as, dc_mr, dc_name);
> +
> +		if (cxl_create_toy_regions(ct3d)) {
> +			return false;
> +		}
> +
> +		for (i = 0; i < ct3d->dc.num_regions; i++) {
> +			total_region_size += ct3d->dc.regions[i].len;
> +		}
> +		/* Make sure the host backend is large enough to cover all dc range */
> +		assert(total_region_size <= dc_mr->size);
> +		assert(dc_mr->size % (256*1024*1024) == 0);
> +
> +		ct3d->dc.total_dynamic_capicity = total_region_size;
> +		g_free(dc_name);
> +	}
>  
>      return true;
>  }
> @@ -890,6 +979,10 @@ err_release_cdat:
>  err_free_special_ops:
>      g_free(regs->special_ops);
>  err_address_space_free:
> +	if (ct3d->dc.host_dc) {
> +		cxl_destroy_toy_regions(ct3d);
> +		address_space_destroy(&ct3d->dc.host_dc_as);
> +	}
>      if (ct3d->hostpmem) {
>          address_space_destroy(&ct3d->hostpmem_as);
>      }
> @@ -909,6 +1002,10 @@ static void ct3_exit(PCIDevice *pci_dev)
>      cxl_doe_cdat_release(cxl_cstate);
>      spdm_sock_fini(ct3d->doe_spdm.socket);
>      g_free(regs->special_ops);
> +	if (ct3d->dc.host_dc) {
> +		cxl_destroy_toy_regions(ct3d);
> +		address_space_destroy(&ct3d->dc.host_dc_as);
> +	}
>      if (ct3d->hostpmem) {
>          address_space_destroy(&ct3d->hostpmem_as);
>      }
> @@ -917,6 +1014,100 @@ static void ct3_exit(PCIDevice *pci_dev)
>      }
>  }
>  
> +static void set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
> +		uint64_t len)
> +{
> +	int i;
> +	CXLDCD_Region *region = NULL;
> +
> +	if (dpa < ct3d->dc.regions[0].base
> +		|| dpa >= ct3d->dc.regions[0].base + ct3d->dc.total_dynamic_capicity)
> +		return;
> +
> +	/*
> +	 * spec 3.0 9.13.3: Regions are used in increasing-DPA order, with
> +	 * Region 0 being used for the lowest DPA of Dynamic Capacity and
> +	 * Region 7 for the highest DPA.
> +	 * So we check from the last region to find where the dpa belongs.
> +	 * access across multiple regions is not allowed.
> +	 **/
> +	for (i = ct3d->dc.num_regions-1; i >= 0; i--) {
> +		region = &ct3d->dc.regions[i];
> +		if (dpa >= region->base)
> +			break;
> +	}
> +
> +	bitmap_set(region->blk_bitmap, (dpa-region->base)/region->block_size,
> +			len/region->block_size);
> +}
> +
> +static bool test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
> +		uint64_t len)
> +{
> +	int i;
> +	CXLDCD_Region *region = NULL;
> +	uint64_t nbits;
> +	long nr;
> +
> +	if (dpa < ct3d->dc.regions[0].base
> +		   || dpa >= ct3d->dc.regions[0].base + ct3d->dc.total_dynamic_capicity)
> +		return false;
> +
> +	/*
> +	 * spec 3.0 9.13.3: Regions are used in increasing-DPA order, with
> +	 * Region 0 being used for the lowest DPA of Dynamic Capacity and
> +	 * Region 7 for the highest DPA.
> +	 * So we check from the last region to find where the dpa belongs.
> +	 * access across multiple regions is not allowed.
> +	 **/
> +	for (i = ct3d->dc.num_regions-1; i >= 0; i--) {
> +		region = &ct3d->dc.regions[i];
> +		if (dpa >= region->base)
> +			break;
> +	}
> +
> +	nr = (dpa-region->base)/region->block_size;
> +	nbits = (len + region->block_size-1)/region->block_size;
> +	if (find_next_zero_bit(region->blk_bitmap, nr+nbits, nr)
> +			>= nr+nbits)
> +		return true;
> +
> +	return false;

return find_next_zero_bit(...) >= nr + nbits;
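
i.e. with the nr/nbits computation kept, the tail of the function collapses
to something like (sketch):

	nr = (dpa - region->base) / region->block_size;
	nbits = DIV_ROUND_UP(len, region->block_size);
	return find_next_zero_bit(region->blk_bitmap, nr + nbits, nr) >=
	       nr + nbits;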

> +}
> +
> +static void clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
> +		uint64_t len)
> +{
> +	int i;
> +	CXLDCD_Region *region = NULL;
> +	uint64_t nbits;
> +	long nr;
> +
> +	if (dpa < ct3d->dc.regions[0].base
> +		|| dpa >= ct3d->dc.regions[0].base + ct3d->dc.total_dynamic_capicity)
> +		return;
> +
> +	/*
> +	 * spec 3.0 9.13.3: Regions are used in increasing-DPA order, with
> +	 * Region 0 being used for the lowest DPA of Dynamic Capacity and
> +	 * Region 7 for the highest DPA.
> +	 * So we check from the last region to find where the dpa belongs.
> +	 * access across multiple regions is not allowed.
> +	 **/
> +	for (i = ct3d->dc.num_regions-1; i >= 0; i--) {
> +		region = &ct3d->dc.regions[i];
> +		if (dpa >= region->base)
> +			break;
> +	}
> +
> +	nr = (dpa-region->base) / region->block_size;
> +	nbits = (len + region->block_size-1) / region->block_size;

Why handle non precise multiple?  

> +	for (i = 0; i < nbits; i++) {
> +		clear_bit(nr, region->blk_bitmap);
> +		nr++;
> +	}

bitmap_clear()?
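
i.e. the open-coded loop could become a single call, keeping the existing
nr/nbits computation (untested sketch):

	bitmap_clear(region->blk_bitmap, nr, nbits);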



> +
>  static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa)
>  {
>      uint32_t *cache_mem = ct3d->cxl_cstate.crb.cache_mem_registers;
> @@ -973,16 +1164,24 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
>                                         AddressSpace **as,
>                                         uint64_t *dpa_offset)
>  {
> -    MemoryRegion *vmr = NULL, *pmr = NULL;
> +	MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
> +	uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
>  
>      if (ct3d->hostvmem) {
>          vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> +		vmr_size = int128_get64(vmr->size);
>      }
>      if (ct3d->hostpmem) {
>          pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> +		pmr_size = int128_get64(pmr->size);
>      }
> +	if (ct3d->dc.host_dc) {
> +		dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> +		/* Do we want dc_size to be dc_mr->size or not?? */
> +		dc_size = ct3d->dc.total_dynamic_capicity;
> +	}
>  
> -    if (!vmr && !pmr) {
> +	if (!vmr && !pmr && !dc_mr) {
>          return -ENODEV;
>      }
>  
> @@ -990,20 +1189,22 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
>          return -EINVAL;
>      }
>  
> -    if (*dpa_offset > int128_get64(ct3d->cxl_dstate.mem_size)) {
> +    if (*dpa_offset >= vmr_size + pmr_size + dc_size ||
> +       (*dpa_offset >= vmr_size + pmr_size && ct3d->dc.num_regions == 0)) {
>          return -EINVAL;
>      }
>  
> -    if (vmr) {
> -        if (*dpa_offset < int128_get64(vmr->size)) {
> -            *as = &ct3d->hostvmem_as;
> -        } else {
> -            *as = &ct3d->hostpmem_as;
> -            *dpa_offset -= vmr->size;
> -        }
> -    } else {
> -        *as = &ct3d->hostpmem_as;
> -    }
> +	if (*dpa_offset < vmr_size)
> +		*as = &ct3d->hostvmem_as;
> +	else if (*dpa_offset < vmr_size + pmr_size) {
> +		*as = &ct3d->hostpmem_as;
> +		*dpa_offset -= vmr_size;
> +	} else {
> +		if (!test_region_block_backed(ct3d, *dpa_offset, size))
> +			return -ENODEV;
> +		*as = &ct3d->dc.host_dc_as;
> +		*dpa_offset -= (vmr_size + pmr_size);
> +	}
>  
>      return 0;
>  }
> @@ -1069,6 +1270,8 @@ static Property ct3_props[] = {
>      DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
>      DEFINE_PROP_UINT16("spdm", CXLType3Dev, spdm_port, 0),
>  	DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
> +	DEFINE_PROP_LINK("dc-memdev", CXLType3Dev, dc.host_dc,
> +					TYPE_MEMORY_BACKEND, HostMemoryBackend *),

Perhaps volatile-dc-memdev?  Leaves us space for a persistent one in future.
If anyone ever cares, that is ;)
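
i.e. something along these lines (sketch only, exact property name up for
debate):

	DEFINE_PROP_LINK("volatile-dc-memdev", CXLType3Dev, dc.host_dc,
	                 TYPE_MEMORY_BACKEND, HostMemoryBackend *),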

>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -1135,34 +1338,41 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
>  
>  static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
>  {
> -    MemoryRegion *vmr = NULL, *pmr = NULL;
> +	MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
>      AddressSpace *as;
> +	uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
>  
>      if (ct3d->hostvmem) {
>          vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> +		vmr_size = int128_get64(vmr->size);
>      }
>      if (ct3d->hostpmem) {
>          pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> +		pmr_size = int128_get64(pmr->size);
>      }
> +	if (ct3d->dc.host_dc) {
> +		dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> +		dc_size = ct3d->dc.total_dynamic_capicity;
> +	}
>  
> -    if (!vmr && !pmr) {
> +    if (!vmr && !pmr && !dc_mr) {
>          return false;
>      }
>  
> -    if (dpa_offset + 64 > int128_get64(ct3d->cxl_dstate.mem_size)) {
> -        return false;
> -    }
> +	if (dpa_offset >= vmr_size + pmr_size + dc_size)
> +		return false;
> +	if (dpa_offset + 64 >= vmr_size + pmr_size && ct3d->dc.num_regions == 0)
> +		return false;
>  
> -    if (vmr) {
> -        if (dpa_offset <= int128_get64(vmr->size)) {
> -            as = &ct3d->hostvmem_as;
> -        } else {
> -            as = &ct3d->hostpmem_as;
> -            dpa_offset -= vmr->size;
> -        }
> -    } else {
> -        as = &ct3d->hostpmem_as;
> -    }
> +	if (dpa_offset < vmr_size) {
> +		as = &ct3d->hostvmem_as;
> +	} else if (dpa_offset < vmr_size + pmr_size) {
> +		as = &ct3d->hostpmem_as;
> +		dpa_offset -= vmr->size;
> +	} else {
> +		as = &ct3d->dc.host_dc_as;
> +		dpa_offset -= (vmr_size + pmr_size);
> +	}
>  
>      address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data, 64);
>      return true;
> @@ -1711,6 +1921,14 @@ static void qmp_cxl_process_dynamic_capacity_event(const char *path, CxlEventLog
>  		memcpy(&dCap.dynamic_capacity_extent, &extents[i]
>  				, sizeof(CXLDCExtent_raw));
>  
> +		if (dCap.type == 0x0)

Enum values as suggested in earlier patch.
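
e.g. a switch on named values rather than magic numbers (sketch; the enum
names here are only illustrative and should match whatever the earlier
patch defines):

		switch (dCap.type) {
		case DC_EVENT_ADD_CAPACITY:
			set_region_block_backed(dcd, extents[i].start_dpa,
						extents[i].len);
			break;
		case DC_EVENT_RELEASE_CAPACITY:
			clear_region_block_backed(dcd, extents[i].start_dpa,
						  extents[i].len);
			break;
		default:
			error_setg(errp, "DC event type not supported yet, no bitmap op");
			break;
		}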

> +			set_region_block_backed(dcd, extents[i].start_dpa, extents[i].len);
> +		else if (dCap.type == 0x1)
> +			clear_region_block_backed(dcd, extents[i].start_dpa,
> +					extents[i].len);
> +		else
> +			error_setg(errp, "DC event not support yet, no bitmap op");
> +
>  		if (cxl_event_insert(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP,
>  					(CXLEventRecordRaw *)&dCap)) {
>  			;
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index c0c8fcc24b..d9b6776e2c 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -211,7 +211,7 @@ typedef struct cxl_device_state {
>      } timestamp;
>  
>      /* memory region size, HDM */
> -    uint64_t mem_size;
> +	uint64_t static_mem_size;
>      uint64_t pmem_size;
>      uint64_t vmem_size;
>  
> @@ -412,6 +412,7 @@ typedef struct CXLDCD_Region {
>  	uint64_t block_size;
>  	uint32_t dsmadhandle;
>  	uint8_t flags;
> +	unsigned long *blk_bitmap;
>  } CXLDCD_Region;
>  
>  struct CXLType3Dev {
> @@ -447,12 +448,17 @@ struct CXLType3Dev {
>      uint64_t poison_list_overflow_ts;
>  
>  	struct dynamic_capacity {
> +		HostMemoryBackend *host_dc;
> +		AddressSpace host_dc_as;
> +
> +		uint8_t num_hosts; //Table 7-55

Not visible here as far as I can see. So leave it for now.

>  		uint8_t num_regions; // 1-8
>  		struct CXLDCD_Region regions[DCD_MAX_REGION_NUM];
>  		CXLDCDExtentList extents;
>  
>  		uint32_t total_extent_count;
>  		uint32_t ext_list_gen_seq;
> +		uint64_t total_dynamic_capicity; // 256M aligned
capacity

>  	} dc;
>  };
>
Re: [RFC 7/7] hw/mem/cxl_type3: add read/write support to dynamic capacity
Posted by nifan@outlook.com 10 months, 1 week ago
The 05/15/2023 16:22, Jonathan Cameron wrote:
> On Thu, 11 May 2023 17:56:40 +0000
> Fan Ni <fan.ni@samsung.com> wrote:
> 
> > From: Fan Ni <nifan@outlook.com>
> > 
> > Before the change, read from or write to dynamic capacity of the memory
> > device is not supported as 1) no host backed file/memory is provided for
> > it; 2) no address space is created for the dynamic capacity.
> 
> Ah nice. I should have read ahead.  Probably makes sense to reorder things
> so that when we present DCD region it will work.

We can back dynamic capacity with host memory/file and create address
spaces for dc regions, but until extents can be added we should not expect
any read/write to happen to the dynamic capacity, right?

Fan
> 
> > 
> > With the change, add code to support following:
> > 1. add a new property to type3 device "dc-memdev" to point to host
> >    memory backend for dynamic capacity;
> > 2. add a bitmap for each region to track whether a block is host backed,
> > which will be used for address check when read/write dynamic capacity;
> > 3. add namespace for dynamic capacity for read/write support;
> > 4. create cdat entries for each dynamic capacity region;
> > 
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  |  21 ++-
> >  hw/mem/cxl_type3.c          | 336 +++++++++++++++++++++++++++++-------
> >  include/hw/cxl/cxl_device.h |   8 +-
> >  3 files changed, 298 insertions(+), 67 deletions(-)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index 7212934627..efe61e67fb 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -391,9 +391,11 @@ static CXLRetCode cmd_firmware_update_get_info(struct cxl_cmd *cmd,
> >          char fw_rev4[0x10];
> >      } QEMU_PACKED *fw_info;
> >      QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
> > +	CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> >  
> >      if ((cxl_dstate->vmem_size < CXL_CAPACITY_MULTIPLIER) ||
> > -        (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER)) {
> > +			(cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) ||
> 
> Keep old alignment
> 
> > +		(ct3d->dc.total_dynamic_capicity < CXL_CAPACITY_MULTIPLIER)) {
> 
> We should think about the separation between what goes in cxl_dstate and directly
> in ct3d. That boundary has been blurring for a while and getting some review
> comments.
> 
> >          return CXL_MBOX_INTERNAL_ERROR;
> >      }
> >  
> > @@ -534,7 +536,9 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd *cmd,
> >      CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
> >  
> >      if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
> > -        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
> > +		(!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
> > +		(!QEMU_IS_ALIGNED(ct3d->dc.total_dynamic_capicity,
> > +						CXL_CAPACITY_MULTIPLIER))) {
> >          return CXL_MBOX_INTERNAL_ERROR;
> >      }
> >  
> > @@ -543,7 +547,8 @@ static CXLRetCode cmd_identify_memory_device(struct cxl_cmd *cmd,
> >  
> >      snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
> >  
> > -    stq_le_p(&id->total_capacity, cxl_dstate->mem_size / CXL_CAPACITY_MULTIPLIER);
> > +	stq_le_p(&id->total_capacity,
> > +			cxl_dstate->static_mem_size / CXL_CAPACITY_MULTIPLIER);
> 
> Pull the rename out as a precursor patch.
> 
> >      stq_le_p(&id->persistent_capacity, cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER);
> >      stq_le_p(&id->volatile_capacity, cxl_dstate->vmem_size / CXL_CAPACITY_MULTIPLIER);
> >      stl_le_p(&id->lsa_size, cvc->get_lsa_size(ct3d));
> > @@ -568,9 +573,12 @@ static CXLRetCode cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
> >          uint64_t next_pmem;
> >      } QEMU_PACKED *part_info = (void *)cmd->payload;
> >      QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
> > +	CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> >  
> >      if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
> > -        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
> > +		(!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
> > +		(!QEMU_IS_ALIGNED(ct3d->dc.total_dynamic_capicity,
> > +						CXL_CAPACITY_MULTIPLIER))) {
> >          return CXL_MBOX_INTERNAL_ERROR;
> >      }
> >  
> > @@ -881,9 +889,8 @@ static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
> >      struct clear_poison_pl *in = (void *)cmd->payload;
> >  
> >      dpa = ldq_le_p(&in->dpa);
> > -    if (dpa + 64 > cxl_dstate->mem_size) {
> > -        return CXL_MBOX_INVALID_PA;
> > -    }
> > +	if (dpa + 64 > cxl_dstate->static_mem_size && ct3d->dc.num_regions == 0)
> 
> This test will need expanding to include DPAs in DC regions.
> 
> > +		return CXL_MBOX_INVALID_PA;
> >  
> >      QLIST_FOREACH(ent, poison_list, node) {
> >          /*
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 70d47d43b9..334660bd0f 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -33,8 +33,8 @@ enum {
> >  };
> >  
> >  static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> > -                                         int dsmad_handle, MemoryRegion *mr,
> > -                                         bool is_pmem, uint64_t dpa_base)
> > +		int dsmad_handle, uint8_t flags,
> > +		uint64_t dpa_base, uint64_t size)
> >  {
> >      g_autofree CDATDsmas *dsmas = NULL;
> >      g_autofree CDATDslbis *dslbis0 = NULL;
> > @@ -53,9 +53,9 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> >              .length = sizeof(*dsmas),
> >          },
> >          .DSMADhandle = dsmad_handle,
> > -        .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
> > +		.flags = flags,
> >          .DPA_base = dpa_base,
> > -        .DPA_length = int128_get64(mr->size),
> > +		.DPA_length = size,
> >      };
> >  
> >      /* For now, no memory side cache, plausiblish numbers */
> > @@ -137,9 +137,9 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
> >           * NV: Reserved - the non volatile from DSMAS matters
> >           * V: EFI_MEMORY_SP
> >           */
> > -        .EFI_memory_type_attr = is_pmem ? 2 : 1,
> > +		.EFI_memory_type_attr = flags ? 2 : 1,
> 
> Fix all these alignment changes (spaces vs tabs)
> 
> >          .DPA_offset = 0,
> > -        .DPA_length = int128_get64(mr->size),
> > +		.DPA_length = size,
> >      };
> >  
> >      /* Header always at start of structure */
> > @@ -158,14 +158,15 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
> >      g_autofree CDATSubHeader **table = NULL;
> >      CXLType3Dev *ct3d = priv;
> >      MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
> > +	MemoryRegion *dc_mr = NULL;
> >      int dsmad_handle = 0;
> >      int cur_ent = 0;
> >      int len = 0;
> >      int rc, i;
> > +	uint64_t vmr_size = 0, pmr_size = 0;
> >  
> > -    if (!ct3d->hostpmem && !ct3d->hostvmem) {
> > -        return 0;
> > -    }
> > +	if (!ct3d->hostpmem && !ct3d->hostvmem && !ct3d->dc.num_regions)
> > +		return 0;
> >  
> >      if (ct3d->hostvmem) {
> >          volatile_mr = host_memory_backend_get_memory(ct3d->hostvmem);
> > @@ -173,6 +174,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
> >              return -EINVAL;
> >          }
> >          len += CT3_CDAT_NUM_ENTRIES;
> > +		vmr_size = volatile_mr->size;
> >      }
> >  
> >      if (ct3d->hostpmem) {
> > @@ -181,7 +183,19 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
> >              return -EINVAL;
> >          }
> >          len += CT3_CDAT_NUM_ENTRIES;
> > -    }
> > +		pmr_size = nonvolatile_mr->size;
> > +	}
> > +
> > +	if (ct3d->dc.num_regions) {
> > +		if (ct3d->dc.host_dc) {
> > +			dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > +			if (!dc_mr)
> > +				return -EINVAL;
> > +			len += CT3_CDAT_NUM_ENTRIES * ct3d->dc.num_regions;
> > +		} else {
> > +			return -EINVAL;
> > +		}
> > +	}
> >  
> >      table = g_malloc0(len * sizeof(*table));
> >      if (!table) {
> > @@ -189,23 +203,45 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
> >      }
> >  
> >      /* Now fill them in */
> > -    if (volatile_mr) {
> > -        rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
> > -                                           false, 0);
> > -        if (rc < 0) {
> > -            return rc;
> > -        }
> > -        cur_ent = CT3_CDAT_NUM_ENTRIES;
> > -    }
> > +	if (volatile_mr) {
> > +		rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++,
> > +				0, 0, vmr_size);
> > +		if (rc < 0)
> > +			return rc;
> Without the accidental tabs/spaces conversion this diff should look a lot
> clearer.
> > +		cur_ent = CT3_CDAT_NUM_ENTRIES;
> > +	}
> > +
> > +	if (nonvolatile_mr) {
> > +		rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
> > +				CDAT_DSMAS_FLAG_NV, vmr_size, pmr_size);
> > +		if (rc < 0)
> > +			goto error_cleanup;
> > +		cur_ent += CT3_CDAT_NUM_ENTRIES;
> > +	}
> > +
> > +	if (dc_mr) {
> > +		uint64_t region_base = vmr_size + pmr_size;
> > +
> > +		/*
> > +		 * Currently we create cdat entries for each region, should we only
> > +		 * create dsmas table instead??
> 
> I don't think it does any harm to have a lot of similar entries. We may want to reconsider
> this in the longer term to make sure that more complex code paths are handled where
> things are shared.  What combinations does the spec allow?
> One entry for all regions with them all sharing a single dsmad handle?
> 
> 
> > +		 * We assume all dc regions are non-volatile for now.
> > +		 *
> > +		 */
> > +		for (i = 0; i < ct3d->dc.num_regions; i++) {
> > +			rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent])
> > +					, dsmad_handle++
> > +					, CDAT_DSMAS_FLAG_NV|CDAT_DSMAS_FLAG_DYNAMIC_CAP
> > +					, region_base, ct3d->dc.regions[i].len);
> > +			if (rc < 0)
> > +				goto error_cleanup;
> > +			ct3d->dc.regions[i].dsmadhandle = dsmad_handle-1;
> > +
> > +			cur_ent += CT3_CDAT_NUM_ENTRIES;
> > +			region_base += ct3d->dc.regions[i].len;
> > +		}
> > +	}
> >  
> > -    if (nonvolatile_mr) {
> > -        rc = ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
> > -                nonvolatile_mr, true, (volatile_mr ? volatile_mr->size : 0));
> > -        if (rc < 0) {
> > -            goto error_cleanup;
> > -        }
> > -        cur_ent += CT3_CDAT_NUM_ENTRIES;
> > -    }
> >      assert(len == cur_ent);
> >  
> >      *cdat_table = g_steal_pointer(&table);
> > @@ -706,6 +742,11 @@ static int cxl_create_toy_regions(CXLType3Dev *ct3d)
> >  		/* dsmad_handle is set when creating cdat table entries */
> >  		region->flags = 0;
> >  
> > +		region->blk_bitmap = bitmap_new(region->len / region->block_size);
> > +		if (!region->blk_bitmap)
> > +			return -1;
> > +		bitmap_zero(region->blk_bitmap, region->len / region->block_size);
> > +
> >  		region_base += region->len;
> >  	}
> >  	QTAILQ_INIT(&ct3d->dc.extents);
> > @@ -713,11 +754,24 @@ static int cxl_create_toy_regions(CXLType3Dev *ct3d)
> >  	return 0;
> >  }
> >  
> > +static void cxl_destroy_toy_regions(CXLType3Dev *ct3d)
> 
> Why toy?  They work after this so no longer toys ;)
> 
> > +{
> > +	int i;
> > +	struct CXLDCD_Region *region;
> > +
> > +	for (i = 0; i < ct3d->dc.num_regions; i++) {
> > +		region = &ct3d->dc.regions[i];
> > +		if (region->blk_bitmap)
> > +			g_free(region->blk_bitmap);
> Why is check needed? Is there a path where we call this function
> without the bitmap having been allocated successfully?
> 
> > +	}
> > +}
> > +
> >  static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >  {
> >      DeviceState *ds = DEVICE(ct3d);
> >  
> > -    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem) {
> > +	if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem
> > +			&& !ct3d->dc.num_regions) {
> >          error_setg(errp, "at least one memdev property must be set");
> >          return false;
> >      } else if (ct3d->hostmem && ct3d->hostpmem) {
> > @@ -754,7 +808,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >          }
> >          address_space_init(&ct3d->hostvmem_as, vmr, v_name);
> >          ct3d->cxl_dstate.vmem_size = vmr->size;
> > -        ct3d->cxl_dstate.mem_size += vmr->size;
> > +		ct3d->cxl_dstate.static_mem_size += vmr->size;
> >          g_free(v_name);
> >      }
> >  
> > @@ -777,12 +831,47 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >          }
> >          address_space_init(&ct3d->hostpmem_as, pmr, p_name);
> >          ct3d->cxl_dstate.pmem_size = pmr->size;
> > -        ct3d->cxl_dstate.mem_size += pmr->size;
> > +		ct3d->cxl_dstate.static_mem_size += pmr->size;
> >          g_free(p_name);
> >      }
> >  
> > -	if (cxl_create_toy_regions(ct3d))
> > -		return false;
> > +	ct3d->dc.total_dynamic_capicity = 0;
> > +	if (ct3d->dc.host_dc) {
> > +		MemoryRegion *dc_mr;
> > +		char *dc_name;
> > +		uint64_t total_region_size = 0;
> > +		int i;
> > +
> > +		dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > +		if (!dc_mr) {
> > +			error_setg(errp, "dynamic capacity must have backing device");
> > +			return false;
> 
> > +		}
> > +		/* FIXME: set dc as nonvolatile for now */
> That's fine. I think to do anything else we'll want multiple backends anyway.
> Perhaps rename the parameter to reflect that it's volatile for now though otherwise
> we'll end up deprecating another memory region command line parameter and people will
> begin to get grumpy ;)
> 
> > +		memory_region_set_nonvolatile(dc_mr, true);
> > +		memory_region_set_enabled(dc_mr, true);
> > +		host_memory_backend_set_mapped(ct3d->dc.host_dc, true);
> > +		if (ds->id) {
> > +			dc_name = g_strdup_printf("cxl-dcd-dpa-dc-space:%s", ds->id);
> > +		} else {
> > +			dc_name = g_strdup("cxl-dcd-dpa-dc-space");
> > +		}
> > +		address_space_init(&ct3d->dc.host_dc_as, dc_mr, dc_name);
> > +
> > +		if (cxl_create_toy_regions(ct3d)) {
> > +			return false;
> > +		}
> > +
> > +		for (i = 0; i < ct3d->dc.num_regions; i++) {
> > +			total_region_size += ct3d->dc.regions[i].len;
> > +		}
> > +		/* Make sure the host backend is large enough to cover all dc range */
> > +		assert(total_region_size <= dc_mr->size);
> > +		assert(dc_mr->size % (256*1024*1024) == 0);
> > +
> > +		ct3d->dc.total_dynamic_capicity = total_region_size;
> > +		g_free(dc_name);
> > +	}
> >  
> >      return true;
> >  }
> > @@ -890,6 +979,10 @@ err_release_cdat:
> >  err_free_special_ops:
> >      g_free(regs->special_ops);
> >  err_address_space_free:
> > +	if (ct3d->dc.host_dc) {
> > +		cxl_destroy_toy_regions(ct3d);
> > +		address_space_destroy(&ct3d->dc.host_dc_as);
> > +	}
> >      if (ct3d->hostpmem) {
> >          address_space_destroy(&ct3d->hostpmem_as);
> >      }
> > @@ -909,6 +1002,10 @@ static void ct3_exit(PCIDevice *pci_dev)
> >      cxl_doe_cdat_release(cxl_cstate);
> >      spdm_sock_fini(ct3d->doe_spdm.socket);
> >      g_free(regs->special_ops);
> > +	if (ct3d->dc.host_dc) {
> > +		cxl_destroy_toy_regions(ct3d);
> > +		address_space_destroy(&ct3d->dc.host_dc_as);
> > +	}
> >      if (ct3d->hostpmem) {
> >          address_space_destroy(&ct3d->hostpmem_as);
> >      }
> > @@ -917,6 +1014,100 @@ static void ct3_exit(PCIDevice *pci_dev)
> >      }
> >  }
> >  
> > +static void set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
> > +		uint64_t len)
> > +{
> > +	int i;
> > +	CXLDCD_Region *region = NULL;
> > +
> > +	if (dpa < ct3d->dc.regions[0].base
> > +		|| dpa >= ct3d->dc.regions[0].base + ct3d->dc.total_dynamic_capicity)
> > +		return;
> > +
> > +	/*
> > +	 * spec 3.0 9.13.3: Regions are used in increasing-DPA order, with
> > +	 * Region 0 being used for the lowest DPA of Dynamic Capacity and
> > +	 * Region 7 for the highest DPA.
> > +	 * So we check from the last region to find where the dpa belongs.
> > +	 * access across multiple regions is not allowed.
> > +	 **/
> > +	for (i = ct3d->dc.num_regions-1; i >= 0; i--) {
> > +		region = &ct3d->dc.regions[i];
> > +		if (dpa >= region->base)
> > +			break;
> > +	}
> > +
> > +	bitmap_set(region->blk_bitmap, (dpa-region->base)/region->block_size,
> > +			len/region->block_size);
> > +}
> > +
> > +static bool test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
> > +		uint64_t len)
> > +{
> > +	int i;
> > +	CXLDCD_Region *region = NULL;
> > +	uint64_t nbits;
> > +	long nr;
> > +
> > +	if (dpa < ct3d->dc.regions[0].base
> > +		   || dpa >= ct3d->dc.regions[0].base + ct3d->dc.total_dynamic_capicity)
> > +		return false;
> > +
> > +	/*
> > +	 * spec 3.0 9.13.3: Regions are used in increasing-DPA order, with
> > +	 * Region 0 being used for the lowest DPA of Dynamic Capacity and
> > +	 * Region 7 for the highest DPA.
> > +	 * So we check from the last region to find where the dpa belongs.
> > +	 * access across multiple regions is not allowed.
> > +	 **/
> > +	for (i = ct3d->dc.num_regions-1; i >= 0; i--) {
> > +		region = &ct3d->dc.regions[i];
> > +		if (dpa >= region->base)
> > +			break;
> > +	}
> > +
> > +	nr = (dpa-region->base)/region->block_size;
> > +	nbits = (len + region->block_size-1)/region->block_size;
> > +	if (find_next_zero_bit(region->blk_bitmap, nr+nbits, nr)
> > +			>= nr+nbits)
> > +		return true;
> > +
> > +	return false;
> 
> return find_next_zero_bit(...) >= nr + nbits;
> 
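For illustration, with that simplification the tail of test_region_block_backed()
could collapse to something like this (untested sketch):

	nr = (dpa - region->base) / region->block_size;
	nbits = (len + region->block_size - 1) / region->block_size;
	return find_next_zero_bit(region->blk_bitmap, nr + nbits, nr) >= nr + nbits;
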
> > +}
> > +
> > +static void clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
> > +		uint64_t len)
> > +{
> > +	int i;
> > +	CXLDCD_Region *region = NULL;
> > +	uint64_t nbits;
> > +	long nr;
> > +
> > +	if (dpa < ct3d->dc.regions[0].base
> > +		|| dpa >= ct3d->dc.regions[0].base + ct3d->dc.total_dynamic_capicity)
> > +		return;
> > +
> > +	/*
> > +	 * spec 3.0 9.13.3: Regions are used in increasing-DPA order, with
> > +	 * Region 0 being used for the lowest DPA of Dynamic Capacity and
> > +	 * Region 7 for the highest DPA.
> > +	 * So we check from the last region to find where the dpa belongs.
> > +	 * access across multiple regions is not allowed.
> > +	 **/
> > +	for (i = ct3d->dc.num_regions-1; i >= 0; i--) {
> > +		region = &ct3d->dc.regions[i];
> > +		if (dpa >= region->base)
> > +			break;
> > +	}
> > +
> > +	nr = (dpa-region->base) / region->block_size;
> > +	nbits = (len + region->block_size-1) / region->block_size;
> 
> Why handle a length that is not an exact multiple of the block size?
> 
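If only exact multiples of the block size are expected here, the stricter
variant might look roughly like this (an assumption, not part of the posted patch):

	assert(len % region->block_size == 0);
	nbits = len / region->block_size;
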
> > +	for (i = 0; i < nbits; i++) {
> > +		clear_bit(nr, region->blk_bitmap);
> > +		nr++;
> > +	}
> 
> bitmap_clear()?
> 
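i.e. the open-coded loop could become a single call, assuming the usual
qemu/bitmap.h helper (sketch):

	bitmap_clear(region->blk_bitmap, nr, nbits);
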
> 
> 
> > +
> >  static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa)
> >  {
> >      uint32_t *cache_mem = ct3d->cxl_cstate.crb.cache_mem_registers;
> > @@ -973,16 +1164,24 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
> >                                         AddressSpace **as,
> >                                         uint64_t *dpa_offset)
> >  {
> > -    MemoryRegion *vmr = NULL, *pmr = NULL;
> > +	MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
> > +	uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
> >  
> >      if (ct3d->hostvmem) {
> >          vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> > +		vmr_size = int128_get64(vmr->size);
> >      }
> >      if (ct3d->hostpmem) {
> >          pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> > +		pmr_size = int128_get64(pmr->size);
> >      }
> > +	if (ct3d->dc.host_dc) {
> > +		dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > +		/* Do we want dc_size to be dc_mr->size or not?? */
> > +		dc_size = ct3d->dc.total_dynamic_capicity;
> > +	}
> >  
> > -    if (!vmr && !pmr) {
> > +	if (!vmr && !pmr && !dc_mr) {
> >          return -ENODEV;
> >      }
> >  
> > @@ -990,20 +1189,22 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
> >          return -EINVAL;
> >      }
> >  
> > -    if (*dpa_offset > int128_get64(ct3d->cxl_dstate.mem_size)) {
> > +    if (*dpa_offset >= vmr_size + pmr_size + dc_size ||
> > +       (*dpa_offset >= vmr_size + pmr_size && ct3d->dc.num_regions == 0)) {
> >          return -EINVAL;
> >      }
> >  
> > -    if (vmr) {
> > -        if (*dpa_offset < int128_get64(vmr->size)) {
> > -            *as = &ct3d->hostvmem_as;
> > -        } else {
> > -            *as = &ct3d->hostpmem_as;
> > -            *dpa_offset -= vmr->size;
> > -        }
> > -    } else {
> > -        *as = &ct3d->hostpmem_as;
> > -    }
> > +	if (*dpa_offset < vmr_size)
> > +		*as = &ct3d->hostvmem_as;
> > +	else if (*dpa_offset < vmr_size + pmr_size) {
> > +		*as = &ct3d->hostpmem_as;
> > +		*dpa_offset -= vmr_size;
> > +	} else {
> > +		if (!test_region_block_backed(ct3d, *dpa_offset, size))
> > +			return -ENODEV;
> > +		*as = &ct3d->dc.host_dc_as;
> > +		*dpa_offset -= (vmr_size + pmr_size);
> > +	}
> >  
> >      return 0;
> >  }
> > @@ -1069,6 +1270,8 @@ static Property ct3_props[] = {
> >      DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
> >      DEFINE_PROP_UINT16("spdm", CXLType3Dev, spdm_port, 0),
> >  	DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
> > +	DEFINE_PROP_LINK("dc-memdev", CXLType3Dev, dc.host_dc,
> > +					TYPE_MEMORY_BACKEND, HostMemoryBackend *),
> 
> Perhaps volatile-dc-memdev?  That leaves us space for a persistent one in future.
> If anyone ever cares, that is ;)
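If renamed, the property line and a matching command-line fragment might look
roughly like this (illustrative only; surrounding device options elided):

	DEFINE_PROP_LINK("volatile-dc-memdev", CXLType3Dev, dc.host_dc,
	                 TYPE_MEMORY_BACKEND, HostMemoryBackend *),

	-object memory-backend-ram,id=dc-backend,size=512M \
	-device cxl-type3,...,volatile-dc-memdev=dc-backend,num-dc-regions=1,...
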
> 
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > @@ -1135,34 +1338,41 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
> >  
> >  static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
> >  {
> > -    MemoryRegion *vmr = NULL, *pmr = NULL;
> > +	MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
> >      AddressSpace *as;
> > +	uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
> >  
> >      if (ct3d->hostvmem) {
> >          vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> > +		vmr_size = int128_get64(vmr->size);
> >      }
> >      if (ct3d->hostpmem) {
> >          pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> > +		pmr_size = int128_get64(pmr->size);
> >      }
> > +	if (ct3d->dc.host_dc) {
> > +		dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
> > +		dc_size = ct3d->dc.total_dynamic_capicity;
> > +	}
> >  
> > -    if (!vmr && !pmr) {
> > +    if (!vmr && !pmr && !dc_mr) {
> >          return false;
> >      }
> >  
> > -    if (dpa_offset + 64 > int128_get64(ct3d->cxl_dstate.mem_size)) {
> > -        return false;
> > -    }
> > +	if (dpa_offset >= vmr_size + pmr_size + dc_size)
> > +		return false;
> > +	if (dpa_offset + 64 >= vmr_size + pmr_size && ct3d->dc.num_regions == 0)
> > +		return false;
> >  
> > -    if (vmr) {
> > -        if (dpa_offset <= int128_get64(vmr->size)) {
> > -            as = &ct3d->hostvmem_as;
> > -        } else {
> > -            as = &ct3d->hostpmem_as;
> > -            dpa_offset -= vmr->size;
> > -        }
> > -    } else {
> > -        as = &ct3d->hostpmem_as;
> > -    }
> > +	if (dpa_offset < vmr_size) {
> > +		as = &ct3d->hostvmem_as;
> > +	} else if (dpa_offset < vmr_size + pmr_size) {
> > +		as = &ct3d->hostpmem_as;
> > +		dpa_offset -= vmr->size;
> > +	} else {
> > +		as = &ct3d->dc.host_dc_as;
> > +		dpa_offset -= (vmr_size + pmr_size);
> > +	}
> >  
> >      address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data, 64);
> >      return true;
> > @@ -1711,6 +1921,14 @@ static void qmp_cxl_process_dynamic_capacity_event(const char *path, CxlEventLog
> >  		memcpy(&dCap.dynamic_capacity_extent, &extents[i]
> >  				, sizeof(CXLDCExtent_raw));
> >  
> > +		if (dCap.type == 0x0)
> 
> Enum values as suggested in earlier patch.
> 
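For illustration, with named values the check could read roughly as below; the
identifiers here are placeholders, with the real definitions coming from the
earlier patch in the series:

	if (dCap.type == DC_EVENT_ADD_CAPACITY) {
		set_region_block_backed(dcd, extents[i].start_dpa, extents[i].len);
	} else if (dCap.type == DC_EVENT_RELEASE_CAPACITY) {
		clear_region_block_backed(dcd, extents[i].start_dpa, extents[i].len);
	} else {
		error_setg(errp, "DC event type not supported yet, no bitmap op");
	}
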
> > +			set_region_block_backed(dcd, extents[i].start_dpa, extents[i].len);
> > +		else if (dCap.type == 0x1)
> > +			clear_region_block_backed(dcd, extents[i].start_dpa,
> > +					extents[i].len);
> > +		else
> > +			error_setg(errp, "DC event not support yet, no bitmap op");
> > +
> >  		if (cxl_event_insert(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP,
> >  					(CXLEventRecordRaw *)&dCap)) {
> >  			;
> > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > index c0c8fcc24b..d9b6776e2c 100644
> > --- a/include/hw/cxl/cxl_device.h
> > +++ b/include/hw/cxl/cxl_device.h
> > @@ -211,7 +211,7 @@ typedef struct cxl_device_state {
> >      } timestamp;
> >  
> >      /* memory region size, HDM */
> > -    uint64_t mem_size;
> > +	uint64_t static_mem_size;
> >      uint64_t pmem_size;
> >      uint64_t vmem_size;
> >  
> > @@ -412,6 +412,7 @@ typedef struct CXLDCD_Region {
> >  	uint64_t block_size;
> >  	uint32_t dsmadhandle;
> >  	uint8_t flags;
> > +	unsigned long *blk_bitmap;
> >  } CXLDCD_Region;
> >  
> >  struct CXLType3Dev {
> > @@ -447,12 +448,17 @@ struct CXLType3Dev {
> >      uint64_t poison_list_overflow_ts;
> >  
> >  	struct dynamic_capacity {
> > +		HostMemoryBackend *host_dc;
> > +		AddressSpace host_dc_as;
> > +
> > +		uint8_t num_hosts; //Table 7-55
> 
> Not visible here as far as I can see. So leave it for now.
> 
> >  		uint8_t num_regions; // 1-8
> >  		struct CXLDCD_Region regions[DCD_MAX_REGION_NUM];
> >  		CXLDCDExtentList extents;
> >  
> >  		uint32_t total_extent_count;
> >  		uint32_t ext_list_gen_seq;
> > +		uint64_t total_dynamic_capicity; // 256M aligned
> capacity
> 
> >  	} dc;
> >  };
> >  
> 

-- 
Fan Ni <nifan@outlook.com>
Re: [RFC 7/7] hw/mem/cxl_type3: add read/write support to dynamic capacity
Posted by Jonathan Cameron via 10 months, 1 week ago
On Wed, 28 Jun 2023 10:09:47 -0700
"nifan@outlook.com" <nifan@outlook.com> wrote:

> The 05/15/2023 16:22, Jonathan Cameron wrote:
> > On Thu, 11 May 2023 17:56:40 +0000
> > Fan Ni <fan.ni@samsung.com> wrote:
> >   
> > > From: Fan Ni <nifan@outlook.com>
> > > 
> > > Before the change, reads from or writes to the dynamic capacity of the memory
> > > device are not supported as 1) no host-backed file/memory is provided for
> > > it; 2) no address space is created for the dynamic capacity.  
> > 
> > Ah nice. I should have read ahead.  Probably makes sense to reorder things
> > so that when we present DCD region it will work.  
> 
> We can back the dynamic capacity with host memory or a file and create an address
> space for the dc regions, but until extents have been added we should not expect
> any read/write to the dynamic capacity to happen, right?

True.  It seems logically 'unusual', though, to set up the routing etc. but
not plumb in the actual memory access until later.  I guess it all comes
together in the end, and doing it this way lets you handle the extent mapping
later.  So it's fine to leave it as you have it.

Jonathan