[PATCH v4 00/42] CXL 2.0 emulation support
Posted by Jonathan Cameron via 2 years, 3 months ago
Previous version was RFC v3: CXL 2.0 Support.
No longer an RFC as I would consider the vast majority of this
to be ready for detailed review. There are still questions called
out in some patches however.

Looking in particular for:
* Review of the PCI interactions
* x86 and ARM machine interactions (particularly the memory maps)
* Review of the interleaving approach - is the basic idea
  acceptable?
* Review of the command line interface.
* CXL related review welcome but much of that got reviewed
  in earlier versions and hasn't changed substantially.

Main changes:
* The CXL fixed memory windows are now instantiated via a
  -cxl-fixed-memory-window command line option.  As they are host level
  entities, not associated with a particular hardware entity, a
  top-level parameter seems the most natural way to describe them.
  This is also much closer to how it works on a real host than the
  previous assignment of a physical address window to all components
  along the CXL path.
* Dynamic host memory physical address space allocation both for
  the CXL host bridge MMIO space and the CFMWS windows.
* Interleaving support (based loosely on Philippe Mathieu-Daudé's
  earlier work on an interleaved memory device).  Note this is rudimentary
  and low performance but it may be sufficient for test purposes.
* Additional PCI and memory related utility functions needed for the
  interleaving.
* Various minor cleanup and increase in scope of tests.
* For now dropped the support for presenting CXL type 3 devices
  as memory devices in various QEMU interfaces.
* Dropped the patch letting UID be different from bus_nr.  Whilst
  it may be a useful thing to have, we don't need it for this series
  and so it should be handled separately.

I've called out patches with major changes by marking them as
co-developed or introducing them as new patches. The original
memory window code has been dropped.

After discussions at plumbers and more recently on the mailing list
it was clear that there was interest in getting emulation for CXL 2.0
upstream in QEMU.  This version resolves many of the outstanding issues
and enables the following features:

* Support on both x86/pc and ARM/virt with relevant ACPI tables
  generated in QEMU.
* Host bridge based on the existing PCI Expander Bridge PXB.
* CXL fixed memory windows, allowing the host to describe interleaving
  across multiple CXL host bridges.
* pxb-cxl CXL host bridge support including MMIO region for control
  and HDM (Host-managed Device Memory - basically interleaving / routing)
  decoder configuration.
* Basic CXL Root port support.
* CXL Type 3 device support with persistent memory regions (backed by
  hostmem backend).
* Pulled MAINTAINERS entry out to a separate patch and add myself as
  a co-maintainer at Ben's suggestion.

Big TODOs:

* Volatile memory devices (easy but it's more code so left for now).
* Switch support.
* Hotplug?  May not need much but it's not tested yet!
* More tests and tighter verification that values written to hardware
  are actually valid - stuff that real hardware would check.
* Main host bridge support (not a priority for me...)
* Testing, testing and more testing.  I have been running a basic
  set of ARM and x86 tests on this, but there is always room for
  more tests and greater automation.

Why do we want QEMU emulation of CXL?

As Ben stated in V3, QEMU support has been critical to getting OS
software written given lack of availability of hardware supporting the
latest CXL features (coupled with very high demand for support being
ready in a timely fashion). What has become clear since Ben's v3
is that the situation is an ongoing one.  Whilst we can't talk about
them yet, CXL 3.0 features and OS support have been prototyped on
top of this support and a lot of the ongoing kernel work is being
tested against these patches.

Other features on the qemu-list that build on these include PCI-DOE
/CDAT support from the Avery Design team, further showing how this
code is useful.  Whilst not directly related, this is also the test
platform for work on PCI IDE/CMA + related DMTF SPDM as CXL both
utilizes and extends those technologies and is likely to be an early
adopter.
Refs:
CMA Kernel: https://lore.kernel.org/all/20210804161839.3492053-1-Jonathan.Cameron@huawei.com/
CMA Qemu: https://lore.kernel.org/qemu-devel/1624665723-5169-1-git-send-email-cbrowy@avery-design.com/
DOE Qemu: https://lore.kernel.org/qemu-devel/1623329999-15662-1-git-send-email-cbrowy@avery-design.com/


As can be seen there is non-trivial interaction with other areas of
QEMU, particularly PCI, and keeping this set up to date is proving
a burden we'd rather do without :)

Ben mentioned a few other good reasons in v3:
https://lore.kernel.org/qemu-devel/20210202005948.241655-1-ben.widawsky@intel.com/

The evolution of this series perhaps leaves it in a less than
entirely obvious order, and that may get tidied up in future postings.
I'm also open to this being considered in bite-sized chunks.  What
we have here is about what you need for it to be useful for testing
current kernel code.

All comments welcome.

Ben - I lifted one patch from your git tree that didn't have a
Sign-off.   hw/cxl/component Add a dumb HDM decoder handler
Could you confirm you are happy for one to be added?

Example of new command line (with virt ITS patches ;)

qemu-system-aarch64 -M virt,gic-version=3,cxl=on \
 -m 4g,maxmem=8G,slots=8 \
 ...
 -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=256M,align=256M \
 -object memory-backend-file,id=tt,share=on,mem-path=/tmp/tt.raw,size=1g \
 -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
 -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
 -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
 -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0,size=256M \
 -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
 -device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1,size=256M \
 -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
 -device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2,size=256M \
 -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
 -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3,size=256M \
 -cxl-fixed-memory-window targets=cxl.1,size=4G,interleave-granularity=8k \
 -cxl-fixed-memory-window targets=cxl.1,targets=cxl.2,size=4G,interleave-granularity=8k

The first CFMWS is suitable for 2-way interleave, the second for 4-way
(2-way at the host level and 2-way at the host bridge).
targets=<range of pxb-cxl uids>; multiple entries if the range is disjoint.
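
For reference, the routing within a fixed memory window is just modulo
arithmetic on the interleave stripe.  A minimal sketch of the decode
maths (illustrative names only, not the code from this series),
assuming power-of-2 granularity and target count:

 #include <stdint.h>

 struct cfmw {
     uint64_t base;       /* window base HPA */
     unsigned gran_bits;  /* log2(interleave granularity); 13 for 8k */
     unsigned ways;       /* number of host bridge targets, power of 2 */
 };

 /* Index of the host bridge target that services hpa. */
 static unsigned cfmw_target(const struct cfmw *w, uint64_t hpa)
 {
     return ((hpa - w->base) >> w->gran_bits) % w->ways;
 }

 /* Address presented to the selected target: strip the stripe bits. */
 static uint64_t cfmw_target_addr(const struct cfmw *w, uint64_t hpa)
 {
     uint64_t offset = hpa - w->base;
     uint64_t stripe = offset >> w->gran_bits;
     uint64_t within = offset & ((1ULL << w->gran_bits) - 1);
     return ((stripe / w->ways) << w->gran_bits) | within;
 }

So for the second window above (two targets, 8k granularity), offset
0x2000 is stripe 1, routing to cxl.2 at target address 0x0.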

With Ben's CXL region patches (v3 shortly) plus fixes as discussed on list,
the Linux commands to bring up a 4-way interleave are:

 cd /sys/bus/cxl/devices/
 region=$(cat decoder0.1/create_region)
 echo $region  > decoder0.1/create_region
 ls -lh
 
 # Note the order of devices and adjust the following to make sure they
 # are in order across the 4 root ports.  Easy to do in a tool, but
 # not easy to paste in a cover letter.

 cd region0.1\:0
 echo 4 > interleave_ways
 echo mem2 > target0
 echo mem3 > target1
 echo mem0 > target2
 echo mem1 > target3
 echo $((1024<<20)) > size
 echo 4096 > interleave_granularity
 echo region0.1:0 > /sys/bus/cxl/drivers/cxl_region/bind
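
(For reference, size here is $((1024<<20)) bytes = 1 GiB, i.e. the four
256 MiB devices combined into a single 4-way region.)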

Tested with devmem2 and files with known content.
Kernel tree was based on previous version of the region patches
from Ben with various fixes. As Dan just posted an updated version,
the next job on my list is to test that.

Thanks to Shameer for his help with reviewing the new stuff before
posting.

I'll post a git tree shortly for any who prefer that to lots
of emails ;)

Thanks,

Jonathan

Ben Widawsky (26):
  hw/pci/cxl: Add a CXL component type (interface)
  hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)
  hw/cxl/device: Introduce a CXL device (8.2.8)
  hw/cxl/device: Implement the CAP array (8.2.8.1-2)
  hw/cxl/device: Implement basic mailbox (8.2.8.4)
  hw/cxl/device: Add memory device utilities
  hw/cxl/device: Add cheap EVENTS implementation (8.2.9.1)
  hw/cxl/device: Timestamp implementation (8.2.9.3)
  hw/cxl/device: Add log commands (8.2.9.4) + CEL
  hw/pxb: Use a type for realizing expanders
  hw/pci/cxl: Create a CXL bus type
  hw/pxb: Allow creation of a CXL PXB (host bridge)
  acpi/pci: Consolidate host bridge setup
  hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)
  hw/cxl/rp: Add a root port
  hw/cxl/device: Add a memory device (8.2.8.5)
  hw/cxl/device: Implement MMIO HDM decoding (8.2.5.12)
  acpi/cxl: Add _OSC implementation (9.14.2)
  tests/acpi: allow CEDT table addition
  acpi/cxl: Create the CEDT (9.14.1)
  hw/cxl/device: Add some trivial commands
  hw/cxl/device: Plumb real Label Storage Area (LSA) sizing
  hw/cxl/device: Implement get/set Label Storage Area (LSA)
  acpi/cxl: Introduce CFMWS structures in CEDT
  hw/cxl/component Add a dumb HDM decoder handler
  qtest/cxl: Add very basic sanity tests

Jonathan Cameron (16):
  MAINTAINERS: Add entry for Compute Express Link Emulation
  tests/acpi: allow DSDT.viot table changes.
  tests/acpi: Add update DSDT.viot
  cxl: Machine level control on whether CXL support is enabled
  hw/cxl/component: Add utils for interleave parameter encoding/decoding
  hw/cxl/host: Add support for CXL Fixed Memory Windows.
  hw/pci-host/gpex-acpi: Add support for dsdt construction for pxb-cxl
  pci/pcie_port: Add pci_find_port_by_pn()
  CXL/cxl_component: Add cxl_get_hb_cstate()
  mem/cxl_type3: Add read and write functions for associated hostmem.
  cxl/cxl-host: Add memops for CFMWS region.
  arm/virt: Allow virt/CEDT creation
  hw/arm/virt: Basic CXL enablement on pci_expander_bridge instances
    pxb-cxl
  RFC: softmmu/memory: Add ops to memory_region_ram_init_from_file
  i386/pc: Enable CXL fixed memory windows
  qtest/acpi: Add reference CEDT tables.

 MAINTAINERS                         |   7 +
 hw/Kconfig                          |   1 +
 hw/acpi/Kconfig                     |   5 +
 hw/acpi/cxl.c                       | 232 +++++++++++++
 hw/acpi/meson.build                 |   1 +
 hw/arm/Kconfig                      |   1 +
 hw/arm/virt-acpi-build.c            |  30 ++
 hw/arm/virt.c                       |  40 ++-
 hw/core/machine.c                   |  26 ++
 hw/cxl/Kconfig                      |   3 +
 hw/cxl/cxl-component-utils.c        | 277 +++++++++++++++
 hw/cxl/cxl-device-utils.c           | 268 +++++++++++++++
 hw/cxl/cxl-host-stubs.c             |  22 ++
 hw/cxl/cxl-host.c                   | 263 ++++++++++++++
 hw/cxl/cxl-mailbox-utils.c          | 509 ++++++++++++++++++++++++++++
 hw/cxl/meson.build                  |   9 +
 hw/i386/acpi-build.c                |  97 +++++-
 hw/i386/microvm.c                   |   1 +
 hw/i386/pc.c                        |  57 +++-
 hw/mem/Kconfig                      |   5 +
 hw/mem/cxl_type3.c                  | 353 +++++++++++++++++++
 hw/mem/meson.build                  |   1 +
 hw/meson.build                      |   1 +
 hw/pci-bridge/Kconfig               |   5 +
 hw/pci-bridge/cxl_root_port.c       | 231 +++++++++++++
 hw/pci-bridge/meson.build           |   1 +
 hw/pci-bridge/pci_expander_bridge.c | 179 +++++++++-
 hw/pci-bridge/pcie_root_port.c      |   6 +-
 hw/pci-host/gpex-acpi.c             |  22 +-
 hw/pci/pci.c                        |  21 +-
 hw/pci/pcie_port.c                  |  25 ++
 hw/ppc/spapr.c                      |   1 +
 include/hw/acpi/cxl.h               |  28 ++
 include/hw/arm/virt.h               |   1 +
 include/hw/boards.h                 |   2 +
 include/hw/cxl/cxl.h                |  51 +++
 include/hw/cxl/cxl_component.h      | 206 +++++++++++
 include/hw/cxl/cxl_device.h         | 266 +++++++++++++++
 include/hw/cxl/cxl_pci.h            | 160 +++++++++
 include/hw/pci/pci.h                |  14 +
 include/hw/pci/pci_bridge.h         |  20 ++
 include/hw/pci/pci_bus.h            |   7 +
 include/hw/pci/pci_ids.h            |   1 +
 include/hw/pci/pcie_port.h          |   2 +
 qapi/machine.json                   |  15 +
 qemu-options.hx                     |  37 ++
 softmmu/memory.c                    |   9 +
 softmmu/vl.c                        |  11 +
 tests/data/acpi/pc/CEDT             | Bin 0 -> 36 bytes
 tests/data/acpi/q35/CEDT            | Bin 0 -> 36 bytes
 tests/data/acpi/q35/DSDT.viot       | Bin 9398 -> 9416 bytes
 tests/data/acpi/virt/CEDT           | Bin 0 -> 36 bytes
 tests/qtest/cxl-test.c              | 151 +++++++++
 tests/qtest/meson.build             |   4 +
 54 files changed, 3645 insertions(+), 40 deletions(-)
 create mode 100644 hw/acpi/cxl.c
 create mode 100644 hw/cxl/Kconfig
 create mode 100644 hw/cxl/cxl-component-utils.c
 create mode 100644 hw/cxl/cxl-device-utils.c
 create mode 100644 hw/cxl/cxl-host-stubs.c
 create mode 100644 hw/cxl/cxl-host.c
 create mode 100644 hw/cxl/cxl-mailbox-utils.c
 create mode 100644 hw/cxl/meson.build
 create mode 100644 hw/mem/cxl_type3.c
 create mode 100644 hw/pci-bridge/cxl_root_port.c
 create mode 100644 include/hw/acpi/cxl.h
 create mode 100644 include/hw/cxl/cxl.h
 create mode 100644 include/hw/cxl/cxl_component.h
 create mode 100644 include/hw/cxl/cxl_device.h
 create mode 100644 include/hw/cxl/cxl_pci.h
 create mode 100644 tests/data/acpi/pc/CEDT
 create mode 100644 tests/data/acpi/q35/CEDT
 create mode 100644 tests/data/acpi/virt/CEDT
 create mode 100644 tests/qtest/cxl-test.c

-- 
2.32.0


Re: [PATCH v4 00/42] CXL 2.0 emulation support
Posted by Jonathan Cameron via 2 years, 3 months ago
On Mon, 24 Jan 2022 17:16:23 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> 
> I'll post a git tree shortly for any who prefer that to lots
> of emails ;)


https://github.com/hisilicon/qemu.git  cxl-v4

I've included the pci swizzle fix on the tree as it
avoids some really annoying waits for interrupts to get
masked when testing on ARM.

I've also put the basic DOE patch set on top (CDAT + compliance)
and the ARM GIC ITS support patch for virt, as who wouldn't
want that?

Thanks,

Jonathan
 
> 
> Thanks,
> 
> Jonathan
> 
> Ben Widawsky (26):
>   hw/pci/cxl: Add a CXL component type (interface)
>   hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)
>   hw/cxl/device: Introduce a CXL device (8.2.8)
>   hw/cxl/device: Implement the CAP array (8.2.8.1-2)
>   hw/cxl/device: Implement basic mailbox (8.2.8.4)
>   hw/cxl/device: Add memory device utilities
>   hw/cxl/device: Add cheap EVENTS implementation (8.2.9.1)
>   hw/cxl/device: Timestamp implementation (8.2.9.3)
>   hw/cxl/device: Add log commands (8.2.9.4) + CEL
>   hw/pxb: Use a type for realizing expanders
>   hw/pci/cxl: Create a CXL bus type
>   hw/pxb: Allow creation of a CXL PXB (host bridge)
>   acpi/pci: Consolidate host bridge setup
>   hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)
>   hw/cxl/rp: Add a root port
>   hw/cxl/device: Add a memory device (8.2.8.5)
>   hw/cxl/device: Implement MMIO HDM decoding (8.2.5.12)
>   acpi/cxl: Add _OSC implementation (9.14.2)
>   tests/acpi: allow CEDT table addition
>   acpi/cxl: Create the CEDT (9.14.1)
>   hw/cxl/device: Add some trivial commands
>   hw/cxl/device: Plumb real Label Storage Area (LSA) sizing
>   hw/cxl/device: Implement get/set Label Storage Area (LSA)
>   acpi/cxl: Introduce CFMWS structures in CEDT
>   hw/cxl/component Add a dumb HDM decoder handler
>   qtest/cxl: Add very basic sanity tests
> 
> Jonathan Cameron (16):
>   MAINTAINERS: Add entry for Compute Express Link Emulation
>   tests/acpi: allow DSDT.viot table changes.
>   tests/acpi: Add update DSDT.viot
>   cxl: Machine level control on whether CXL support is enabled
>   hw/cxl/component: Add utils for interleave parameter encoding/decoding
>   hw/cxl/host: Add support for CXL Fixed Memory Windows.
>   hw/pci-host/gpex-acpi: Add support for dsdt construction for pxb-cxl
>   pci/pcie_port: Add pci_find_port_by_pn()
>   CXL/cxl_component: Add cxl_get_hb_cstate()
>   mem/cxl_type3: Add read and write functions for associated hostmem.
>   cxl/cxl-host: Add memops for CFMWS region.
>   arm/virt: Allow virt/CEDT creation
>   hw/arm/virt: Basic CXL enablement on pci_expander_bridge instances
>     pxb-cxl
>   RFC: softmmu/memory: Add ops to memory_region_ram_init_from_file
>   i386/pc: Enable CXL fixed memory windows
>   qtest/acpi: Add reference CEDT tables.
> 
>  MAINTAINERS                         |   7 +
>  hw/Kconfig                          |   1 +
>  hw/acpi/Kconfig                     |   5 +
>  hw/acpi/cxl.c                       | 232 +++++++++++++
>  hw/acpi/meson.build                 |   1 +
>  hw/arm/Kconfig                      |   1 +
>  hw/arm/virt-acpi-build.c            |  30 ++
>  hw/arm/virt.c                       |  40 ++-
>  hw/core/machine.c                   |  26 ++
>  hw/cxl/Kconfig                      |   3 +
>  hw/cxl/cxl-component-utils.c        | 277 +++++++++++++++
>  hw/cxl/cxl-device-utils.c           | 268 +++++++++++++++
>  hw/cxl/cxl-host-stubs.c             |  22 ++
>  hw/cxl/cxl-host.c                   | 263 ++++++++++++++
>  hw/cxl/cxl-mailbox-utils.c          | 509 ++++++++++++++++++++++++++++
>  hw/cxl/meson.build                  |   9 +
>  hw/i386/acpi-build.c                |  97 +++++-
>  hw/i386/microvm.c                   |   1 +
>  hw/i386/pc.c                        |  57 +++-
>  hw/mem/Kconfig                      |   5 +
>  hw/mem/cxl_type3.c                  | 353 +++++++++++++++++++
>  hw/mem/meson.build                  |   1 +
>  hw/meson.build                      |   1 +
>  hw/pci-bridge/Kconfig               |   5 +
>  hw/pci-bridge/cxl_root_port.c       | 231 +++++++++++++
>  hw/pci-bridge/meson.build           |   1 +
>  hw/pci-bridge/pci_expander_bridge.c | 179 +++++++++-
>  hw/pci-bridge/pcie_root_port.c      |   6 +-
>  hw/pci-host/gpex-acpi.c             |  22 +-
>  hw/pci/pci.c                        |  21 +-
>  hw/pci/pcie_port.c                  |  25 ++
>  hw/ppc/spapr.c                      |   1 +
>  include/hw/acpi/cxl.h               |  28 ++
>  include/hw/arm/virt.h               |   1 +
>  include/hw/boards.h                 |   2 +
>  include/hw/cxl/cxl.h                |  51 +++
>  include/hw/cxl/cxl_component.h      | 206 +++++++++++
>  include/hw/cxl/cxl_device.h         | 266 +++++++++++++++
>  include/hw/cxl/cxl_pci.h            | 160 +++++++++
>  include/hw/pci/pci.h                |  14 +
>  include/hw/pci/pci_bridge.h         |  20 ++
>  include/hw/pci/pci_bus.h            |   7 +
>  include/hw/pci/pci_ids.h            |   1 +
>  include/hw/pci/pcie_port.h          |   2 +
>  qapi/machine.json                   |  15 +
>  qemu-options.hx                     |  37 ++
>  softmmu/memory.c                    |   9 +
>  softmmu/vl.c                        |  11 +
>  tests/data/acpi/pc/CEDT             | Bin 0 -> 36 bytes
>  tests/data/acpi/q35/CEDT            | Bin 0 -> 36 bytes
>  tests/data/acpi/q35/DSDT.viot       | Bin 9398 -> 9416 bytes
>  tests/data/acpi/virt/CEDT           | Bin 0 -> 36 bytes
>  tests/qtest/cxl-test.c              | 151 +++++++++
>  tests/qtest/meson.build             |   4 +
>  54 files changed, 3645 insertions(+), 40 deletions(-)
>  create mode 100644 hw/acpi/cxl.c
>  create mode 100644 hw/cxl/Kconfig
>  create mode 100644 hw/cxl/cxl-component-utils.c
>  create mode 100644 hw/cxl/cxl-device-utils.c
>  create mode 100644 hw/cxl/cxl-host-stubs.c
>  create mode 100644 hw/cxl/cxl-host.c
>  create mode 100644 hw/cxl/cxl-mailbox-utils.c
>  create mode 100644 hw/cxl/meson.build
>  create mode 100644 hw/mem/cxl_type3.c
>  create mode 100644 hw/pci-bridge/cxl_root_port.c
>  create mode 100644 include/hw/acpi/cxl.h
>  create mode 100644 include/hw/cxl/cxl.h
>  create mode 100644 include/hw/cxl/cxl_component.h
>  create mode 100644 include/hw/cxl/cxl_device.h
>  create mode 100644 include/hw/cxl/cxl_pci.h
>  create mode 100644 tests/data/acpi/pc/CEDT
>  create mode 100644 tests/data/acpi/q35/CEDT
>  create mode 100644 tests/data/acpi/virt/CEDT
>  create mode 100644 tests/qtest/cxl-test.c
> 


Re: [PATCH v4 00/42] CXL 2.0 emulation support
Posted by Alex Bennée 2 years, 2 months ago
Jonathan Cameron <Jonathan.Cameron@huawei.com> writes:

> Previous version was RFC v3: CXL 2.0 Support.
> No longer an RFC as I would consider the vast majority of this
> to be ready for detailed review. There are still questions called
> out in some patches however.

I've been through and added comments on the first half of the
patches. I'll see if I can get to the second half next week; however, if
you beat me to it with a re-rev I expect some ripples from the requested
changes.

Aside from ensuring the rest of the builds work:

  https://gitlab.com/stsquad/qemu/-/pipelines/456700583/failures
  
it looks pretty good to me. I await the next version ;-)

-- 
Alex Bennée

Re: [PATCH v4 00/42] CXL 2.0 emulation support
Posted by Jonathan Cameron via 2 years, 2 months ago
On Thu, 27 Jan 2022 14:22:52 +0000
Alex Bennée <alex.bennee@linaro.org> wrote:

> Jonathan Cameron <Jonathan.Cameron@huawei.com> writes:
> 
> > Previous version was RFC v3: CXL 2.0 Support.
> > No longer an RFC as I would consider the vast majority of this
> > to be ready for detailed review. There are still questions called
> > out in some patches however.  
> 
> I've been through and added comments through the first half of the
> patches. I'll see if I can get to the second half next week however if
> you beat me to it with a re-rev I expect some ripples from the requested
> changes.
> 
> Aside from ensuring the rest of the builds work:
> 
>   https://gitlab.com/stsquad/qemu/-/pipelines/456700583/failures
>   
> it looks pretty good to me. I await the next version ;-)
> 

Thanks for ploughing through them - it's a great help.
Hopefully I'll get a new version out before you get back to them.

The CI certainly threw up some unexpected issues alongside the
bugs, wrong assumptions, and build issues you pointed out:

* can't have a field called ERROR in a register on some archs
* doesn't work if you don't push the tags on the tree... (win builds)

but it should be clean in the next version.

Thanks,

Jonathan


Re: [PATCH v4 00/42] CXL 2.0 emulation support
Posted by Ben Widawsky 2 years, 2 months ago
Really awesome work Jonathan. Dan and I are wrapping up some of the kernel bits,
so all I'll do for now is try to run this, but I hope to be able to review the
parts I'm familiar with at least.

On 22-01-24 17:16:23, Jonathan Cameron wrote:
> Previous version was RFC v3: CXL 2.0 Support.
> No longer an RFC as I would consider the vast majority of this
> to be ready for detailed review. There are still questions called
> out in some patches however.
> 
> Looking in particular for:
> * Review of the PCI interactions
> * x86 and ARM machine interactions (particularly the memory maps)
> * Review of the interleaving approach - is the basic idea
>   acceptable?
> * Review of the command line interface.
> * CXL related review welcome but much of that got reviewed
>   in earlier versions and hasn't changed substantially.
> 
> Main changes:
> * The CXL fixed memory windows are now instantiated via a
>   -cxl-fixed-memory-window command line option.  As they are host level
>   entities, not associated with a particular hardware entity, a
>   top-level parameter seems the most natural way to describe them.
>   This is also much closer to how it works on a real host than the
>   previous assignment of a physical address window to all components
>   along the CXL path.

Excellent.

> * Dynamic host memory physical address space allocation both for
>   the CXL host bridge MMIO space and the CFMWS windows.

I thought I had done the host bridge MMIO, but perhaps I was mistaken. Either
way, this is an important step to support all platforms more generally.

> * Interleaving support (based loosely on Philippe Mathieu-Daudé's
>   earlier work on an interleaved memory device).  Note this is rudimentary
>   and low performance but it may be sufficient for test purposes.

I'll have to look at this further. I had some thoughts about how we might make
this fast, but it would be more of a fake interleave. How low is "low"?

> * Additional PCI and memory related utility functions needed for the
>   interleaving.
> * Various minor cleanup and increase in scope of tests.
> * For now dropped the support for presenting CXL type 3 devices
>   as memory devices in various QEMU interfaces.

What are the downsides to this? I only used the memory interface originally
because it seemed like a natural fit, but looking back I'm not sure we gain
much (though my memory is very lossy).

> * Dropped the patch letting UID be different from bus_nr.  Whilst
>   it may be a useful thing to have, we don't need it for this series
>   and so it should be handled separately.
> 
> I've called out patches with major changes by marking them as
> co-developed or introducing them as new patches. The original
> memory window code has been dropped.
> 
> After discussions at plumbers and more recently on the mailing list
> it was clear that there was interest in getting emulation for CXL 2.0
> upstream in QEMU.  This version resolves many of the outstanding issues
> and enables the following features:
> 
> * Support on both x86/pc and ARM/virt with relevant ACPI tables
>   generated in QEMU.
> * Host bridge based on the existing PCI Expander Bridge PXB.
> * CXL fixed memory windows, allowing the host to describe interleaving
>   across multiple CXL host bridges.
> * pxb-cxl CXL host bridge support including MMIO region for control
>   and HDM (Host-managed Device Memory - basically interleaving / routing)
>   decoder configuration.
> * Basic CXL Root port support.
> * CXL Type 3 device support with persistent memory regions (backed by
>   hostmem backend).
> * Pulled MAINTAINERS entry out to a separate patch and add myself as
>   a co-maintainer at Ben's suggestion.
> 
> Big TODOs:
> 
> * Volatile memory devices (easy but it's more code so left for now).
> * Switch support.
> * Hotplug?  May not need much but it's not tested yet!
> * More tests and tighter verification that values written to hardware
>   are actually valid - stuff that real hardware would check.
> * Main host bridge support (not a priority for me...)

I originally cared about this for the sake of making a system more realistic. I
now believe we should drop this entirely.

> * Testing, testing and more testing.  I have been running a basic
>   set of ARM and x86 tests on this, but there is always room for
>   more tests and greater automation.
> 
> Why do we want QEMU emulation of CXL?
> 
> As Ben stated in V3, QEMU support has been critical to getting OS
> software written given lack of availability of hardware supporting the
> latest CXL features (coupled with very high demand for support being
> ready in a timely fashion). What has become clear since Ben's v3
> is that the situation is an ongoing one.  Whilst we can't talk about
> them yet, CXL 3.0 features and OS support have been prototyped on
> top of this support and a lot of the ongoing kernel work is being
> tested against these patches.
> 
> Other features on the qemu-list that build on these include PCI-DOE
> /CDAT support from the Avery Design team, further showing how this
> code is useful.  Whilst not directly related, this is also the test
> platform for work on PCI IDE/CMA + related DMTF SPDM as CXL both
> utilizes and extends those technologies and is likely to be an early
> adopter.
> Refs:
> CMA Kernel: https://lore.kernel.org/all/20210804161839.3492053-1-Jonathan.Cameron@huawei.com/
> CMA Qemu: https://lore.kernel.org/qemu-devel/1624665723-5169-1-git-send-email-cbrowy@avery-design.com/
> DOE Qemu: https://lore.kernel.org/qemu-devel/1623329999-15662-1-git-send-email-cbrowy@avery-design.com/
> 
> 
> As can be seen there is non-trivial interaction with other areas of
> QEMU, particularly PCI, and keeping this set up to date is proving
> a burden we'd rather do without :)
> 
> Ben mentioned a few other good reasons in v3:
> https://lore.kernel.org/qemu-devel/20210202005948.241655-1-ben.widawsky@intel.com/
> 
> The evolution of this series perhaps leaves it in a less than
> entirely obvious order, and that may get tidied up in future postings.
> I'm also open to this being considered in bite-sized chunks.  What
> we have here is about what you need for it to be useful for testing
> current kernel code.
> 
> All comments welcome.
> 
> Ben - I lifted one patch from your git tree that didn't have a
> Sign-off.   hw/cxl/component Add a dumb HDM decoder handler
> Could you confirm you are happy for one to be added?

Sure.

> 
> Example of new command line (with virt ITS patches ;)
> 
> qemu-system-aarch64 -M virt,gic-version=3,cxl=on \
>  -m 4g,maxmem=8G,slots=8 \
>  ...
>  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=256M,align=256M \

Is align actually necessary here?

>  -object memory-backend-file,id=tt,share=on,mem-path=/tmp/tt.raw,size=1g \

Did you mean to put this in there? Is it somehow used internally?

>  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
>  -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
>  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
>  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0,size=256M \
>  -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
>  -device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1,size=256M \
>  -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
>  -device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2,size=256M \
>  -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
>  -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3,size=256M \
>  -cxl-fixed-memory-window targets=cxl.1,size=4G,interleave-granularity=8k \
>  -cxl-fixed-memory-window targets=cxl.1,targets=cxl.2,size=4G,interleave-granularity=8k

I assume interleave-ways is based on the number of targets. For testing purposes
it might be nice to add the flags as well (perhaps it's there).

> 
> The first CFMWS is suitable for 2-way interleave, the second for 4-way
> (2-way at the host level and 2-way at the host bridge).
> targets=<range of pxb-cxl uids>; multiple entries if the range is disjoint.
> 
> With Ben's CXL region patches (v3 shortly) plus fixes as discussed on list,
> the Linux commands to bring up a 4-way interleave are:
> 
>  cd /sys/bus/cxl/devices/
>  region=$(cat decoder0.1/create_region)
>  echo $region  > decoder0.1/create_region
>  ls -lh
>  
>  # Note the order of devices and adjust the following to make sure they
>  # are in order across the 4 root ports.  Easy to do in a tool, but
>  # not easy to paste in a cover letter.
> 
>  cd region0.1\:0
>  echo 4 > interleave_ways
>  echo mem2 > target0
>  echo mem3 > target1
>  echo mem0 > target2
>  echo mem1 > target3
>  echo $((1024<<20)) > size
>  echo 4096 > interleave_granularity
>  echo region0.1:0 > /sys/bus/cxl/drivers/cxl_region/bind
> 
> Tested with devmem2 and files with known content.
> Kernel tree was based on previous version of the region patches
> from Ben with various fixes. As Dan just posted an updated version,
> the next job on my list is to test that.
> 
> Thanks to Shameer for his help with reviewing the new stuff before
> posting.
> 
> I'll post a git tree shortly for any who prefer that to lots
> of emails ;)
> 
> Thanks,
> 
> Jonathan

Thanks again!
Ben

[snip]



Re: [PATCH v4 00/42] CXL 2.0 emulation support
Posted by Ben Widawsky 2 years, 2 months ago
On 22-01-25 11:18:08, Ben Widawsky wrote:
> Really awesome work Jonathan. Dan and I are wrapping up some of the kernel bits,
> so all I'll do for now is try to run this, but I hope to be able to review the
> parts I'm familiar with at least.
> 
> On 22-01-24 17:16:23, Jonathan Cameron wrote:
> > Previous version was RFC v3: CXL 2.0 Support.
> > No longer an RFC as I would consider the vast majority of this
> > to be ready for detailed review. There are still questions called
> > out in some patches however.
> > 
> > Looking in particular for:
> > * Review of the PCI interactions
> > * x86 and ARM machine interactions (particularly the memory maps)
> > * Review of the interleaving approach - is the basic idea
> >   acceptable?
> > * Review of the command line interface.
> > * CXL related review welcome but much of that got reviewed
> >   in earlier versions and hasn't changed substantially.
> > 
> > Main changes:
> > * The CXL fixed memory windows are now instantiated via a
> >   -cxl-fixed-memory-window command line option.  As they are host level
> >   entities, not associated with a particular hardware entity, a
> >   top-level parameter seems the most natural way to describe them.
> >   This is also much closer to how it works on a real host than the
> >   previous assignment of a physical address window to all components
> >   along the CXL path.
> 
> Excellent.
> 
> > * Dynamic host memory physical address space allocation both for
> >   the CXL host bridge MMIO space and the CFMWS windows.
> 
> I thought I had done the host bridge MMIO, but perhaps I was mistaken. Either
> way, this is an important step to support all platforms more generally.
> 
> > * Interleaving support (based loosely on Philippe Mathieu-Daudé's
> >   earlier work on an interleaved memory device).  Note this is rudimentary
> >   and low performance but it may be sufficient for test purposes.
> 
> I'll have to look at this further. I had some thoughts about how we might make
> this fast, but it would be more of a fake interleave. How low is "low"?
> 
> > * Additional PCI and memory related utility functions needed for the
> >   interleaving.
> > * Various minor cleanup and increase in scope of tests.
> > * For now dropped the support for presenting CXL type 3 devices
> >   as memory devices in various QEMU interfaces.
> 
> What are the downsides to this? I only used the memory interface originally
> because it seemed like a natural fit, but looking back I'm not sure we gain
> much (though my memory is very lossy).
> 
> > * Dropped the patch letting UID be different from bus_nr.  Whilst
> >   it may be a useful thing to have, we don't need it for this series
> >   and so it should be handled separately.
> > 
> > I've called out patches with major changes by marking them as
> > co-developed or introducing them as new patches. The original
> > memory window code has been dropped.
> > 
> > After discussions at plumbers and more recently on the mailing list
> > it was clear that there was interest in getting emulation for CXL 2.0
> > upstream in QEMU.  This version resolves many of the outstanding issues
> > and enables the following features:
> > 
> > * Support on both x86/pc and ARM/virt with relevant ACPI tables
> >   generated in QEMU.
> > * Host bridge based on the existing PCI Expander Bridge PXB.
> > * CXL fixed memory windows, allowing the host to describe interleaving
> >   across multiple CXL host bridges.
> > * pxb-cxl CXL host bridge support including MMIO region for control
> >   and HDM (Host-managed Device Memory - basically interleaving / routing)
> >   decoder configuration.
> > * Basic CXL Root port support.
> > * CXL Type 3 device support with persistent memory regions (backed by
> >   hostmem backend).
> > * Pulled MAINTAINERS entry out to a separate patch and add myself as
> >   a co-maintainer at Ben's suggestion.
> > 
> > Big TODOs:
> > 
> > * Volatile memory devices (easy but it's more code so left for now).
> > * Switch support.
> > * Hotplug?  May not need much but it's not tested yet!
> > * More tests and tighter verification that values written to hardware
> >   are actually valid - stuff that real hardware would check.
> > * Main host bridge support (not a priority for me...)
> 
> I originally cared about this for the sake of making a system more realistic. I
> now believe we should drop this entirely.
> 
> > * Testing, testing and more testing.  I have been running a basic
> >   set of ARM and x86 tests on this, but there is always room for
> >   more tests and greater automation.
> > 
> > Why do we want QEMU emulation of CXL?
> > 
> > As Ben stated in V3, QEMU support has been critical to getting OS
> > software written given lack of availability of hardware supporting the
> > latest CXL features (coupled with very high demand for support being
> > ready in a timely fashion). What has become clear since Ben's v3
> > is that the situation is an ongoing one.  Whilst we can't talk about
> > them yet, CXL 3.0 features and OS support have been prototyped on
> > top of this support and a lot of the ongoing kernel work is being
> > tested against these patches.
> > 
> > Other features on the qemu-list that build on these include PCI-DOE
> > /CDAT support from the Avery Design team, further showing how this
> > code is useful.  Whilst not directly related, this is also the test
> > platform for work on PCI IDE/CMA + related DMTF SPDM as CXL both
> > utilizes and extends those technologies and is likely to be an early
> > adopter.
> > Refs:
> > CMA Kernel: https://lore.kernel.org/all/20210804161839.3492053-1-Jonathan.Cameron@huawei.com/
> > CMA Qemu: https://lore.kernel.org/qemu-devel/1624665723-5169-1-git-send-email-cbrowy@avery-design.com/
> > DOE Qemu: https://lore.kernel.org/qemu-devel/1623329999-15662-1-git-send-email-cbrowy@avery-design.com/
> > 
> > 
> > As can be seen there is non-trivial interaction with other areas of
> > QEMU, particularly PCI, and keeping this set up to date is proving
> > a burden we'd rather do without :)
> > 
> > Ben mentioned a few other good reasons in v3:
> > https://lore.kernel.org/qemu-devel/20210202005948.241655-1-ben.widawsky@intel.com/
> > 
> > The evolution of this series perhaps leaves it in a less than
> > entirely obvious order, and that may get tidied up in future postings.
> > I'm also open to this being considered in bite-sized chunks.  What
> > we have here is about what you need for it to be useful for testing
> > current kernel code.
> > 
> > All comments welcome.
> > 
> > Ben - I lifted one patch from your git tree that didn't have a
> > Sign-off.   hw/cxl/component Add a dumb HDM decoder handler
> > Could you confirm you are happy for one to be added?
> 
> Sure.
> 
> > 
> > Example of new command line (with virt ITS patches ;)
> > 
> > qemu-system-aarch64 -M virt,gic-version=3,cxl=on \
> >  -m 4g,maxmem=8G,slots=8 \
> >  ...
> >  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=256M,align=256M \
> 
> Is align actually necessary here?
> 
> >  -object memory-backend-file,id=tt,share=on,mem-path=/tmp/tt.raw,size=1g \
> 
> Did you mean to put this in there? Is it somehow used internally?
> 
> >  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> >  -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
> >  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> >  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0,size=256M \
> >  -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
> >  -device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1,size=256M \
> >  -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
> >  -device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2,size=256M \
> >  -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
> >  -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3,size=256M \
> >  -cxl-fixed-memory-window targets=cxl.1,size=4G,interleave-granularity=8k \
> >  -cxl-fixed-memory-window targets=cxl.1,targets=cxl.2,size=4G,interleave-granularity=8k
> 
> I assume interleave-ways is based on the number of targets. For testing purposes
> it might be nice to add the flags as well (perhaps it's there).
> 

This requires cxl=on machine arg now btw.

> > 
> > The first CFMWS is suitable for 2-way interleave, the second for 4-way
> > (2-way at the host level and 2-way at the host bridge).
> > targets=<range of pxb-cxl uids>; multiple entries if the range is disjoint.
> > 
> > With Ben's CXL region patches (v3 shortly) plus fixes as discussed on list,
> > the Linux commands to bring up a 4-way interleave are:
> > 
> >  cd /sys/bus/cxl/devices/
> >  region=$(cat decoder0.1/create_region)
> >  echo $region  > decoder0.1/create_region
> >  ls -lh
> >  
> >  # Note the order of devices and adjust the following to make sure they
> >  # are in order across the 4 root ports.  Easy to do in a tool, but
> >  # not easy to paste in a cover letter.
> > 
> >  cd region0.1\:0
> >  echo 4 > interleave_ways
> >  echo mem2 > target0
> >  echo mem3 > target1
> >  echo mem0 > target2
> >  echo mem1 > target3
> >  echo $((1024<<20)) > size
> >  echo 4096 > interleave_granularity
> >  echo region0.1:0 > /sys/bus/cxl/drivers/cxl_region/bind
> > 
> > Tested with devmem2 and files with known content.
> > Kernel tree was based on previous version of the region patches
> > from Ben with various fixes. As Dan just posted an updated version,
> > the next job on my list is to test that.
> > 
> > Thanks to Shameer for his help with reviewing the new stuff before
> > posting.
> > 
> > I'll post a git tree shortly for any who prefer that to lots
> > of emails ;)
> > 
> > Thanks,
> > 
> > Jonathan
> 
> Thanks again!
> Ben
> 
> [snip]
> 
> 

Re: [PATCH v4 00/42] CXL 2.0 emulation support
Posted by Jonathan Cameron via 2 years, 2 months ago
On Tue, 25 Jan 2022 15:55:03 -0800
Ben Widawsky <ben.widawsky@intel.com> wrote:

> On 22-01-25 11:18:08, Ben Widawsky wrote:
> > Really awesome work Jonathan. Dan and I are wrapping up some of the kernel bits,
> > so all I'll do for now is try to run this, but I hope to be able to review the
> > parts I'm familiar with at least.
> > 
> > On 22-01-24 17:16:23, Jonathan Cameron wrote:  
> > > Previous version was RFC v3: CXL 2.0 Support.
> > > No longer an RFC as I would consider the vast majority of this
> > > to be ready for detailed review. There are still questions called
> > > out in some patches however.
> > > 
> > > Looking in particular for:
> > > * Review of the PCI interactions
> > > * x86 and ARM machine interactions (particularly the memory maps)
> > > * Review of the interleaving approach - is the basic idea
> > >   acceptable?
> > > * Review of the command line interface.
> > > * CXL related review welcome but much of that got reviewed
> > >   in earlier versions and hasn't changed substantially.
> > > 
> > > Main changes:
> > > * The CXL fixed memory windows are now instantiated via a
> > >   -cxl-fixed-memory-window command line option.  As they are host level
> > >   entities, not associated with a particular hardware entity, a
> > >   top-level parameter seems the most natural way to describe them.
> > >   This is also much closer to how it works on a real host than the
> > >   previous assignment of a physical address window to all components
> > >   along the CXL path.  
> > 
> > Excellent.
> >   
> > > * Dynamic host memory physical address space allocation both for
> > >   the CXL host bridge MMIO space and the CFMWS windows.  
> > 
> > I thought I had done the host bridge MMIO, but perhaps I was mistaken. Either
> > way, this is an important step to support all platforms more generally.

That comment is probably more general than it needs to be ;)  I can't remember how
much of this was done using fixed addresses, but it all got rewritten
anyway.  As you've probably noticed I got lazy on change logs, because lots
of changes had a minor influence on a large set of patches, making them fiddly
to document.

> >   
> > > * Interleaving support (based loosely on Philippe Mathieu-Daudé's
> > >   earlier work on an interleaved memory device).  Note this is rudimentary
> > >   and low performance but it may be sufficient for test purposes.  
> > 
> > I'll have to look at this further. I had some thoughts about how we might make
> > this fast, but it would be more of a fake interleave. How low is "low"?

The question becomes what the purpose is.  We aren't doing this emulation
to provide a realistic system, but rather to just have something we can
test the kernel and related tooling against.

I'm not yet in a position to do perf tests, but given it's walking the
decoders every time, it is never going to be great.  We could look at
caching the walks, but then a bunch of locking comes into play.
Mind you, right now I suspect we'll have all sorts of nasty issues
if devices are hot removed whilst transactions are in flight.
Tidying that up was one of the TODOs I forgot to list.
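
For what it's worth, the sort of caching I have in mind (purely a
sketch, nothing like this is in the series) is generation-validated,
so the fast path can skip both the walk and the locking:

 #include <stdatomic.h>
 #include <stdbool.h>
 #include <stdint.h>

 static atomic_uint_fast64_t decoder_generation;

 struct xlat_cache {
     uint64_t gen;       /* generation this entry was filled at */
     uint64_t hpa_page;  /* cached input page */
     uint64_t dpa_page;  /* cached translation */
     bool valid;
 };

 /* Bump on any HDM decoder (un)commit: invalidates every cached
  * translation in one go without touching the readers. */
 static void decoders_changed(void)
 {
     atomic_fetch_add(&decoder_generation, 1);
 }

 static bool xlat_cache_lookup(struct xlat_cache *c, uint64_t hpa_page,
                               uint64_t *dpa_page)
 {
     if (!c->valid || c->hpa_page != hpa_page ||
         c->gen != atomic_load(&decoder_generation)) {
         return false;  /* fall back to the full decoder walk */
     }
     *dpa_page = c->dpa_page;
     return true;
 }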

> >   
> > > * Additional PCI and memory related utility functions needed for the
> > >   interleaving.
> > > * Various minor cleanup and increase in scope of tests.
> > > * For now dropped the support for presenting CXL type 3 devices
> > >   as memory devices in various QEMU interfaces.  
> > 
> > What are the downsides to this? I only used the memory interface originally
> > because it seemed like a natural fit, but looking back I'm not sure we gain
> > much (though my memory is very lossy).

The main downside is simply that people might expect to see all their memory
devices in some of the info commands, and right now the CXL ones don't show
up at all.  That doesn't necessarily mean we need to use the existing
framework, but it might make sense to extend the tools a little to include
the CXL-attached memories.

> >   
> > > * Dropped the patch letting UID be different from bus_nr.  Whilst
> > >   it may be a useful thing to have, we don't need it for this series
> > >   and so it should be handled separately.
> > > 
> > > I've called out patches with major changes by marking them as
> > > co-developed or introducing them as new patches. The original
> > > memory window code has been dropped.
> > > 
> > > After discussions at plumbers and more recently on the mailing list
> > > it was clear that there was interest in getting emulation for CXL 2.0
> > > upstream in QEMU.  This version resolves many of the outstanding issues
> > > and enables the following features:
> > > 
> > > * Support on both x86/pc and ARM/virt with relevant ACPI tables
> > >   generated in QEMU.
> > > * Host bridge based on the existing PCI Expander Bridge PXB.
> > > * CXL fixed memory windows, allowing the host to describe interleaving
> > >   across multiple CXL host bridges.
> > > * pxb-cxl CXL host bridge support including MMIO region for control
> > >   and HDM (Host-managed Device Memory - basically interleaving / routing)
> > >   decoder configuration.
> > > * Basic CXL Root port support.
> > > * CXL Type 3 device support with persistent memory regions (backed by
> > >   hostmem backend).
> > > * Pulled MAINTAINERS entry out to a separate patch and add myself as
> > >   a co-maintainer at Ben's suggestion.
> > > 
> > > Big TODOs:
> > > 
> > > * Volatile memory devices (easy but it's more code so left for now).
> > > * Switch support.
> > > * Hotplug?  May not need much but it's not tested yet!
> > > * More tests and tighter verification that values written to hardware
> > >   are actually valid - stuff that real hardware would check.
> > > * Main host bridge support (not a priority for me...)  
> > 
> > I originally cared about this for the sake of making a system more realistic. I
> > now believe we should drop this entirely.
Cool. That avoids some mess around the type of a CXL host bridge that
I hadn't figured out a clean way around (short of checking against all
implemented possibilities).

> >   
> > > * Testing, testing and more testing.  I have been running a basic
> > >   set of ARM and x86 tests on this, but there is always room for
> > >   more tests and greater automation.
> > > 
> > > Why do we want QEMU emulation of CXL?
> > > 
> > > As Ben stated in V3, QEMU support has been critical to getting OS
> > > software written given lack of availability of hardware supporting the
> > > latest CXL features (coupled with very high demand for support being
> > > ready in a timely fashion). What has become clear since Ben's v3
> > > is that the situation is an ongoing one.  Whilst we can't talk about
> > > them yet, CXL 3.0 features and OS support have been prototyped on
> > > top of this support and a lot of the ongoing kernel work is being
> > > tested against these patches.
> > > 
> > > Other features on the qemu-list that build on these include PCI-DOE
> > > /CDAT support from the Avery Design team, further showing how this
> > > code is useful.  Whilst not directly related, this is also the test
> > > platform for work on PCI IDE/CMA + related DMTF SPDM as CXL both
> > > utilizes and extends those technologies and is likely to be an early
> > > adopter.
> > > Refs:
> > > CMA Kernel: https://lore.kernel.org/all/20210804161839.3492053-1-Jonathan.Cameron@huawei.com/
> > > CMA Qemu: https://lore.kernel.org/qemu-devel/1624665723-5169-1-git-send-email-cbrowy@avery-design.com/
> > > DOE Qemu: https://lore.kernel.org/qemu-devel/1623329999-15662-1-git-send-email-cbrowy@avery-design.com/
> > > 
> > > 
> > > As can be seen there is non-trivial interaction with other areas of
> > > QEMU, particularly PCI, and keeping this set up to date is proving
> > > a burden we'd rather do without :)
> > > 
> > > Ben mentioned a few other good reasons in v3:
> > > https://lore.kernel.org/qemu-devel/20210202005948.241655-1-ben.widawsky@intel.com/
> > > 
> > > The evolution of this series perhaps leaves it in a less than
> > > entirely obvious order, and that may get tidied up in future postings.
> > > I'm also open to this being considered in bite-sized chunks.  What
> > > we have here is about what you need for it to be useful for testing
> > > current kernel code.
> > > 
> > > All comments welcome.
> > > 
> > > Ben - I lifted one patch from your git tree that didn't have a
> > > Sign-off.   hw/cxl/component Add a dumb HDM decoder handler
> > > Could you confirm you are happy for one to be added?  
> > 
> > Sure.

Cool. I'll put that in for v5.

> >   
> > > 
> > > Example of new command line (with virt ITS patches ;)
> > > 
> > > qemu-system-aarch64 -M virt,gic-version=3,cxl=on \
> > >  -m 4g,maxmem=8G,slots=8 \
> > >  ...
> > >  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M,align=256M \
> > >  -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M,align=256M \
> > >  -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M,align=256M \
> > >  -object memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M,align=256M \
> > >  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M,align=256M \
> > >  -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M,align=256M \
> > >  -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M,align=256M \
> > >  -object memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=256M,align=256M \  
> > 
> > Is align actually necessary here?

Err.  That's been in my config a long time.  IIRC I ran into problems with your
earlier versions when I didn't provide it, but it might not be necessary any
more. Good spot.
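
If so, the backend lines would reduce to something like (untested, and
assuming no alignment constraint remains):

 -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \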

> >   
> > >  -object memory-backend-file,id=tt,share=on,mem-path=/tmp/tt.raw,size=1g \  
> > 
> > Did you mean to put this in there? Is it somehow used internally?

Oops. Nope - that is bad editing on my part - it was part of an nvdimm test I
was running to make sure I didn't accidentally break anything in the normal
file-backed hostmem paths.

> >   
> > >  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> > >  -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
> > >  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> > >  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0,size=256M \
> > >  -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
> > >  -device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1,size=256M \
> > >  -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
> > >  -device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2,size=256M \
> > >  -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
> > >  -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3,size=256M \
> > >  -cxl-fixed-memory-window targets=cxl.1,size=4G,interleave-granularity=8k \
> > >  -cxl-fixed-memory-window targets=cxl.1,targets=cxl.2,size=4G,interleave-granularity=8k  
> > 
> > I assume interleave-ways is based on the number of targets. For testing purposes
> > it might be nice to add the flags as well (perhaps it's there).

Good point, though not implemented yet.  Easy thing to add as a later step.

> >   
> 
> This requires cxl=on machine arg now btw.

Absolutely.  Though I wonder if that is worth bothering with.  We could just
always reserve the memory for the host bridge MMIO and run through building an
empty CEDT (or add sanity checks in the CEDT build code: no HB and no CFMWS ==
no CEDT).

I'd be interested in what various machine maintainers think about this.

Longer term it would be nice to generally clean up the machine PA space
code to use some sort of allocator, because we just added another layer
of if/else to some already deep trees based on all the optional parts.

For ARM there is one suitable for the MMIO regions (given the hack of just
allocating space for 16), but the CFMWs / device memory etc. are just as
nasty to deal with as on i386/pc.

> 
> > > 
> > > First CFMWS suitable for 2 way interleave, the second for 4 way (2 way
> > > at host level and 2 way at the host bridge).
> > > targets=<range of pxb-cxl uids>, multiple entries if the range is disjoint.
> > > 
> > > With Ben's CXL region patches (v3 shortly) plus fixes as discussed on list,
> > > Linux commands to bring up a 4 way interleave is:
> > > 
> > >  cd /sys/bus/cxl/devices/
> > >  region=$(cat decoder0.1/create_region)
> > >  echo $region  > decoder0.1/create_region
> > >  ls -lh
> > >  
> > >  //Note the order of devices and adjust the following to make sure they
> > >  //are in order across the 4 root ports.  Easy to do in a tool, but
> > >  //not easy to paste in a cover letter.
> > > 
> > >  cd region0.1\:0
> > >  echo 4 > interleave_ways
> > >  echo mem2 > target0
> > >  echo mem3 > target1
> > >  echo mem0 > target2
> > >  echo mem1 > target3
> > >  echo $((1024<<20)) > size
> > >  echo 4096 > interleave_granularity
> > >  echo region0.1:0 > /sys/bus/cxl/drivers/cxl_region/bind
> > > 
> > > Tested with devmem2 and files with known content.
> > > The kernel tree was based on a previous version of the region patches
> > > from Ben with various fixes. As Dan just posted an updated version,
> > > the next job on my list is to test that.
> > > 
> > > Thanks to Shameer for his help with reviewing the new stuff before
> > > posting.
> > > 
> > > I'll post a git tree shortly for any who prefer that to lots
> > > of emails ;)
> > > 
> > > Thanks,
> > > 
> > > Jonathan  
> > 
> > Thanks again!
> > Ben
You are welcome.

Been an interesting learning curve as all my past QEMU work
was rather more superficial than this.

Jonathan

> > 
> > [snip]
> > 
> >   


Re: [PATCH v4 00/42] CXl 2.0 emulation Support
Posted by Alex Bennée 2 years, 2 months ago
Jonathan Cameron <Jonathan.Cameron@huawei.com> writes:

> Previous version was RFC v3: CXL 2.0 Support.
> No longer an RFC as I would consider the vast majority of this
> to be ready for detailed review. There are still questions called
> out in some patches however.
>
> Looking in particular for:
> * Review of the PCI interactions
> * x86 and ARM machine interactions (particularly the memory maps)
> * Review of the interleaving approach - is the basic idea
>   acceptable?
> * Review of the command line interface.
> * CXL related review welcome but much of that got reviewed
>   in earlier versions and hasn't changed substantially.
>
<snip>
>
> Why do we want QEMU emulation of CXL?
>
> As Ben stated in V3, QEMU support has been critical to getting OS
> software written given the lack of availability of hardware supporting
> the latest CXL features (coupled with very high demand for support being
> ready in a timely fashion). What has become clear since Ben's v3
> is that the situation is an ongoing one.  Whilst we can't talk about
> them yet, CXL 3.0 features and OS support have been prototyped on
> top of this support and a lot of the ongoing kernel work is being
> tested against these patches.

Is the core CXL support already in the upstream kernel or do you need a
patched one?

> Other features on the qemu-list that build on these include PCI-DOE
> /CDAT support from the Avery Design team, further showing how this
> code is useful.  Whilst not directly related, this is also the test
> platform for work on PCI IDE/CMA + related DMTF SPDM, as CXL both
> utilizes and extends those technologies and is likely to be an early
> adopter.
> Refs:
> CMA Kernel: https://lore.kernel.org/all/20210804161839.3492053-1-Jonathan.Cameron@huawei.com/
> CMA Qemu: https://lore.kernel.org/qemu-devel/1624665723-5169-1-git-send-email-cbrowy@avery-design.com/
> DOE Qemu: https://lore.kernel.org/qemu-devel/1623329999-15662-1-git-send-email-cbrowy@avery-design.com/
>
>
> As can be seen there is non-trivial interaction with other areas of
> QEMU, particularly PCI, and keeping this set up to date is proving
> a burden we'd rather do without :)
>
> Ben mentioned a few other good reasons in v3:
> https://lore.kernel.org/qemu-devel/20210202005948.241655-1-ben.widawsky@intel.com/
>
> The evolution of this series perhaps leaves it in a less than
> entirely obvious order and that may get tidied up in future postings.
> I'm also open to this being considered in bite-sized chunks.  What
> we have here is about what you need for it to be useful for testing
> current kernel code.

Ah right...

> All comments welcome.
>
> Ben - I lifted one patch from your git tree that didn't have a
> Sign-off.   hw/cxl/component Add a dumb HDM decoder handler
> Could you confirm you are happy for one to be added?
>
> Example of new command line (with virt ITS patches ;)

One thing I think is missing in this series is some documentation. We've
been historically bad at adding it for new devices but given the
complexity of CXL I think we should certainly try to improve. I think a
reasonable stab could be made from the commit messages in the series. I
would suggest:

  docs/system/devices/cxl.rst

And include:

  - a brief overview of CXL
  - kernel config options

and some example command lines, like below:

>
> qemu-system-aarch64 -M virt,gic-version=3,cxl=on \
>  -m 4g,maxmem=8G,slots=8 \
>  ...
>  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M,align=256M \
>  -object memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=256M,align=256M \
>  -object memory-backend-file,id=tt,share=on,mem-path=/tmp/tt.raw,size=1g \
>  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
>  -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
>  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
>  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0,size=256M \
>  -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
>  -device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1,size=256M \
>  -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
>  -device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2,size=256M \
>  -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
>  -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3,size=256M \
>  -cxl-fixed-memory-window targets=cxl.1,size=4G,interleave-granularity=8k \
>  -cxl-fixed-memory-window targets=cxl.1,targets=cxl.2,size=4G,interleave-granularity=8k

So AIUI the above creates some CXL pmem devices that are part of the CXL
root bus which itself is on the PCIe bus? Is the intention that
reads/writes into the pmem by the guest end up visible in various forms
in the memory backend files? Are memory backends required or can the
address space be treated as volatile RAM that doesn't persist beyond a
reset/reboot?

Maybe a simple diagram will help make things clearer?

>
> First CFMWS suitable for 2 way interleave, the second for 4 way (2 way
> at host level and 2 way at the host bridge).
> targets=<range of pxb-cxl uids>, multiple entries if the range is disjoint.
>
<snip>

-- 
Alex Bennée

Re: [PATCH v4 00/42] CXl 2.0 emulation Support
Posted by Jonathan Cameron via 2 years, 2 months ago
On Tue, 25 Jan 2022 13:55:29 +0000
Alex Bennée <alex.bennee@linaro.org> wrote:

Hi Alex,

Thanks for taking a look so quickly!

> Jonathan Cameron <Jonathan.Cameron@huawei.com> writes:
> 
> > Previous version was RFC v3: CXL 2.0 Support.
> > No longer an RFC as I would consider the vast majority of this
> > to be ready for detailed review. There are still questions called
> > out in some patches however.
> >
> > Looking in particular for:
> > * Review of the PCI interactions
> > * x86 and ARM machine interactions (particularly the memory maps)
> > * Review of the interleaving approach - is the basic idea
> >   acceptable?
> > * Review of the command line interface.
> > * CXL related review welcome but much of that got reviewed
> >   in earlier versions and hasn't changed substantially.
> >  
> <snip>
> >
> > Why do we want QEMU emulation of CXL?
> >
> > As Ben stated in V3, QEMU support has been critical to getting OS
> > software written given the lack of availability of hardware supporting
> > the latest CXL features (coupled with very high demand for support being
> > ready in a timely fashion). What has become clear since Ben's v3
> > is that the situation is an ongoing one.  Whilst we can't talk about
> > them yet, CXL 3.0 features and OS support have been prototyped on
> > top of this support and a lot of the ongoing kernel work is being
> > tested against these patches.
> 
> Is the core CXL support already in the upstream kernel or do you need a
> patched one?

Most of the support is upstream for the features we are emulating so far,
but a few elements are still work in progress.

The interleave feature has had a couple of revisions on list and
Dan Williams posted a new version of that yesterday.

https://lore.kernel.org/linux-cxl/164298411792.3018233.7493009997525360044.stgit@dwillia2-desk3.amr.corp.intel.com/T/#t

I haven't tested that version yet but will get to it shortly;
this series was tested against the previous version on list.
I would expect this feature to go in this kernel cycle.
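
For anyone else wanting to test against it, that series should be
fetchable from lore in the usual way (assuming the b4 tool is
installed):

 b4 am 164298411792.3018233.7493009997525360044.stgit@dwillia2-desk3.amr.corp.intel.com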
 
> 
> > Other features on the qemu-list that build on these include PCI-DOE
> > /CDAT support from the Avery Design team, further showing how this
> > code is useful.  Whilst not directly related, this is also the test
> > platform for work on PCI IDE/CMA + related DMTF SPDM, as CXL both
> > utilizes and extends those technologies and is likely to be an early
> > adopter.
> > Refs:
> > CMA Kernel: https://lore.kernel.org/all/20210804161839.3492053-1-Jonathan.Cameron@huawei.com/
> > CMA Qemu: https://lore.kernel.org/qemu-devel/1624665723-5169-1-git-send-email-cbrowy@avery-design.com/
> > DOE Qemu: https://lore.kernel.org/qemu-devel/1623329999-15662-1-git-send-email-cbrowy@avery-design.com/
> >
> >
> > As can be seen there is non-trivial interaction with other areas of
> > QEMU, particularly PCI, and keeping this set up to date is proving
> > a burden we'd rather do without :)
> >
> > Ben mentioned a few other good reasons in v3:
> > https://lore.kernel.org/qemu-devel/20210202005948.241655-1-ben.widawsky@intel.com/
> >
> > The evolution of this series perhaps leaves it in a less than
> > entirely obvious order and that may get tidied up in future postings.
> > I'm also open to this being considered in bite-sized chunks.  What
> > we have here is about what you need for it to be useful for testing
> > current kernel code.
> 
> Ah right...
> 
> > All comments welcome.
> >
> > Ben - I lifted one patch from your git tree that didn't have a
> > Sign-off.   hw/cxl/component Add a dumb HDM decoder handler
> > Could you confirm you are happy for one to be added?
> >
> > Example of new command line (with virt ITS patches ;)  
> 
> One thing I think is missing in this series is some documentation. We've
> been historically bad at adding it for new devices but given the
> complexity of CXL I think we should certainly try to improve. I think a
> reasonable stab could be made from the commit messages in the series. I
> would suggest:
> 
>   docs/system/devices/cxl.rst
> 
> And include:
> 
> >   - a brief overview of CXL
>   - kernel config options

Sure. Good idea, I'll write something up.

> 
> and some example command lines, like below:
> 
> >
> > qemu-system-aarch64 -M virt,gic-version=3,cxl=on \
> >  -m 4g,maxmem=8G,slots=8 \
> >  ...
> >  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=256M,align=256M \
> >  -object memory-backend-file,id=tt,share=on,mem-path=/tmp/tt.raw,size=1g \
> >  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> >  -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
> >  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> >  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0,size=256M \
> >  -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
> >  -device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1,size=256M \
> >  -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
> >  -device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2,size=256M \
> >  -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
> >  -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3,size=256M \
> >  -cxl-fixed-memory-window targets=cxl.1,size=4G,interleave-granularity=8k \
> >  -cxl-fixed-memory-window targets=cxl.1,targets=cxl.2,size=4G,interleave-granularity=8k
> 
> So AIUI the above creates some CXL pmem devices that are part of the CXL
> root bus which itself is on the PCIe bus? 

That is possibly because of the 'hack' that pxb (pci-expander-bridge)
does of pretending to be a root bus "on" the PCI bus (which I'm fairly
sure you can't actually do in real PCI).  In reality that is just a
convenience for QEMU rather than anything you'd see on a real system.
It's just easier to use PXB for this as it works on various
architectures.  From an OS point of view there isn't a driver associated
with the PXB device; instead it's just seen via the ACPI description,
just like any other root bus.

The CXL root bus, in the sense of the one below which you can
conceive of CXL host bridges sitting, is host-specific and not visible on
the PCI bus.  It's effectively part of the system interconnect, routing
CXL memory reads/writes to the CXL root bridges.  That configuration is
considered static by the time any generic software sees it (early boot
firmware may do the actual setup, in a similar fashion to the system
address map routing for multi-socket systems, which is configured very
early in boot and isn't something we'd want to emulate).  The fixed memory
windows (CFMWs) provide a static description of a particular region of
physical address space which will do interleaving across a predefined set
of host bridges with a particular interleave granularity.  They can also
have QoS values, but so far I've skipped that in the emulation so they
are all in QoS group 0.  On real hardware you'd likely have quite a lot
of CFMWs, covering the combinations the OS might want to use - spanning a
huge part of the physical address space.
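
To tie that back to the example command line: a window declared as

 -cxl-fixed-memory-window targets=cxl.1,targets=cxl.2,size=4G,interleave-granularity=8k

gives a 4G region of host PA space in which successive 8k chunks
alternate between host bridges cxl.1 and cxl.2 (2 way at the host
level); any further interleave then happens below those host bridges.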

Those CXL root bridges have spec-defined controls over some features
(such as the interleave across the root ports below a particular root
bridge) and an existence in ACPI that is an extension of what is done
for PCI root bridges.

The CXL root ports are visible in the PCI topology, as are the CXL devices
below them, including switches (which this patch set doesn't currently
support).

From a Linux point of view we end up with two parallel topologies for
CXL and PCI with cross points where the two line up (there end up being
quite a few elements in CXL that don't exist in the PCI topology
representation).
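
The two views can be seen side by side from within the guest, e.g.
(illustrative - the exact contents depend on configuration and kernel
version):

 ls /sys/bus/pci/devices/   # PCI view: root ports and type 3 endpoints
 ls /sys/bus/cxl/devices/   # CXL view: ports, decoders and memX devices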

> Is the intention that
> reads/writes into the pmem by the guest end up visible in various forms
> in the memory backend files? 

Yes.  That's how I've been testing it so far. It's very nice to be
able to prefill the files and hence know you are reading the location
you expect.
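
As a sketch of that flow (illustrative only - the pattern, offset and
guest address are made up):

 # On the host, stamp a recognisable pattern at the start of a backend file.
 truncate -s 256M /tmp/cxltest.raw
 printf 'CXLTEST!' | dd of=/tmp/cxltest.raw bs=8 count=1 conv=notrunc
 # In the guest, once a region over that device is bound, read it back
 # with devmem2 at the region's base PA (take the real address from the
 # region/decoder sysfs entries).
 devmem2 0x4c0000000 w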

> Are memory backends required or can the
> address space be treated as volatile RAM that doesn't persist beyond a
> reset/reboot?

We could potentially do that, though it would limit testing somewhat,
particularly when we come to label storage area (LSA) based setup, which
will "describe" the topology of a previous boot. It's hard to test
something that is pretending to be persistent memory without being able
to have the contents persist across boots.

> 
> Maybe a simple diagram will help make things clearer?

Sure - I'll give it a go though it won't be particularly simple!

Comments welcome as I would expect this will end up as part of
the documentation.

Memory address map for CXL elements.  Note that where exactly these
regions appear is architecture and platform dependent.

  Base somewhere far up in the Host PA map.
_______________________________
|                              |
| CXL Host Bridge 0 Registers  | 
| CXL Host Bridge 1 Registers  |
|       ...                    |  This bit is normal MMIO register space,
| CXL Host bridge N registers  |  including programmable interleave decoders
|______________________________|  for interleave across root ports.
|                              |
|             ....             |
|                              |
|______________________________|
|                              |
|   CFMW 0,                    |  Note that there can be multiple regions
|   Interleave 2 way, targets  |  of memory within this 1TB which can be
|   Hostbridge 0, Hostbridge 1 |  interleaved differently: in the host bridges
|   Granularity 16KiB, 1TB     |  across root ports or in switches below the
|______________________________|  root ports.
|                              |
|   CFMW 1,                    |
|   Interleave 1 way, target   |
|   Hostbridge 0, 512GiB       | 
|______________________________|
etc for all interleave combinations
configured, or built in to the
system before any generic software
sees it.

System topology, considering CFMW 0 only to keep this simple.
x marks the match in each decoder level.
Switches have more interleave decoders and other features
that we haven't implemented yet in QEMU.

                Address Read to CFMW0 base + N
              _________________|________________
             |                                  |
             |  Host interconnect               |  
             |  Configured to route CFM         |
             |  memory access to particular HB  |
             |_____x____________________________|
                   |                     |
             Interleave Decoder          |
             Matches this HB             |  
                   |                     |
            _______|__________      _____|____________
           |                  |    |                  |
           | CXL HB 0         |    | CXL HB 1         | Only exist in PCI (mostly)
           | HB IntLv Decoder |    | HB IntLv Decoder | via ACPI description
           |  PCI Root Bus 0c |    | PCI Root Bus 0d  |
           |x_________________|    |__________________| In CXL have MMIO
            |                |       |               |  at location given in CEDT
            |                |       |               |  CHBS entry (ACPI)
____________|___   __________|__   __|_________   ___|_________ 
|  Root Port 0  | | Root Port 1 | | Root Port 2| | Root Port 3 |
|  Appears in   | | Appears in  | | Appears in | | Appears in  |
|  PCI topology | | PCI Topology| | PCI Topo   | | PCI Topo    |
|  as 0c:00.0   | | as 0c:01.0  | | as de:00.0 | | as de:01.0  |
|_______________| |_____________| |____________| |_____________|
      |                  |               |              |
      |                  |               |              |
 _____|_________   ______|______   ______|_____   ______|_______
|     x         | |             | |            | |              |
| CXL Type3 0   | | CXL Type3 1 | | CXL Type3 2| | CXL Type3 3  |
|               | |             | |            | |              |
| PMEM0(Vol LSA)| | PMEM1 (...) | | PMEM2 (...)| | PMEM3 (...)  |
| Decoder to go | |             | |            | |              |
| from host PA  | | PCI 0e:00.0 | | PCI df:00.0| | PCI e0:00.0  |
| to device PA  | |             | |            | |              | 
| PCI as 0d:00.0| |             | |            | |              |
|_______________| |_____________| |____________| |______________|

   Backed by        Backed by       Backed by       Backed by
    file 0           file 1           file 2          file 3

LSA backed by additional files for each device (not yet supported)

So currently we have decoders as follows for each interleaved access:
1) CFMW decoder - fixed config, so it forms part of the QEMU command line.
2) Host bridge decoders - programmable decoders that the system
   software will program either based on a user command or based
   on info from the Label Storage Area (not yet emulated).
3) Type 3 device decoders.  Down to here the address used is the
   Host PA.  These decoders convert to the local device PA (in the
   simple case, by dropping some bits in the middle of the address;
   see the worked example below).
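
As a worked example (illustrative numbers only): for a 2 way, 8k
granularity decode, bit 13 of the offset into the window selects the
target, and that bit is then dropped on the way down to the device PA:

 offset=0x6000                   # 4th 8k chunk in the window (index 3)
 echo $(( (offset >> 13) & 1 ))  # prints 1: routed via the second target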

Future patches will add decoders in switch upstream ports, adding
another layer to the above diagram between the root ports and
the memory devices.

Note, we've focused for now on persistent memory devices as they are seen
as an early and important use case (and are the most complex one).
But it should be straightforward to add volatile memory
support, and indeed that would be backed by RAM.
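
When that happens I'd expect the backend side to be as simple as
(hypothetical - volatile type 3 devices aren't supported by this
series yet):

 -object memory-backend-ram,id=cxl-vmem0,size=256M \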

lspci -tv for above shows

-+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host Bridge (this is the cxl PXB)
 |           \-OTHER STUFF
 +-[0000:0c]-+-00.0-[0d]----00.0  Intel Corporation Device 0d93
 |           \-01.0-[0e]----00.0  Intel Corporation Device 0d93
 \-[0000:de]-+-00.0-[df]----00.0  Intel Corporation Device 0d93
             \-01.0-[e0]----00.0  Intel Corporation Device 0d93

Where those Intel parts are the type 3 devices.

So everything should now be as clear as mud.

Thanks,

Jonathan


> 
> >
> > First CFMWS suitable for 2 way interleave, the second for 4 way (2 way
> > at host level and 2 way at the host bridge).
> > targets=<range of pxb-cxl uids>, multiple entries if the range is disjoint.
> >  
> <snip>
>