Shared caches in multi-core CPU architectures represent a problem for
predictability of memory access latency. This jeopardizes applicability
of many Arm platforms in real-time critical and mixed-criticality
scenarios. We introduce support for cache partitioning with page
coloring, a transparent software technique that enables isolation
between domains and Xen, and thus avoids cache interference.

When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
the user to define assignments of cache partition IDs, called colors,
where assigning different colors guarantees that no mutual eviction on
cache will ever happen. This instructs the Xen memory allocator to provide
the i-th color assignee only with pages that map to color i, i.e. that
are indexed in the i-th cache partition.

The proposed implementation supports the dom0less feature.
The solution has been tested in several scenarios, including Xilinx Zynq
MPSoCs.

Overview of implementation and commits structure
------------------------------------------------

- [1-3] Coloring initialization, cache layout auto-probing and coloring
  data for domains are added.
- [4-5] xl and Device Tree support for coloring is added.
- [6-7] A new page allocator for domain memory that implements the cache
  coloring mechanism is introduced.
- [8-12] Coloring support is added for Xen .text region.

Changes in v2
-------------

A lot changed between the two versions; mainly I tried to follow
all the comments left by the maintainers after the previous version review.
Here is a brief list of the major points (even if, imho, it's easier to
repeat all the review process):
- One of the easiest changes to spot is the reduced number of patches in the
  series. Many problems of bad splitting of commits were present
  before (documentation only in last commits, functionalities first
  introduced and later used in other commits, etc).
- Definition of LLC (Last Level Cache) as the place where coloring applies
  should be more consistent throughout all the series (documentation and
  cache layout auto-probing code).
- Kconfig option to configure the maximum number of cache colors.
- Only one kind of syntax to specify color configurations.
- Only arrays to store colors (no more need for bitmaps).
- No more limitations on the max number of colors (previously, because of
  a static assert failure, it was limited to 64).
- Kconfig option to configure the buddy allocator reserved size.
- Removed the duplicated version of setup_pagetables.
- No more need to expose vm_alloc function as non-static.

Open points and possible problems
---------------------------------

- The way xl passes user space memory to Xen is adapted from various
  points of the xl code itself (e.g. xc_domain_node_setaffinity) and it
  works, but it really needs attention from expert maintainers since
  I'm not completely sure this is the correct way of doing things.
- We still need to bring back the relocation feature (part of) in order
  to move Xen memory to a colored space where the hypervisor could be
  isolated from VMs interference (see the revert commit #10 and the
  get_xen_paddr function in #12).
- Revert commits #8 and #9 are needed because coloring has the command
  line parsing as a prerequisite for its initialization and
  setup_pagetables must be called after it in order to color the Xen
  mapping. The DTB mapping is then added to the boot page tables instead
  of the Xen ones. Probably the way this is done is a bit simplistic.
  Looking forward to comments on the subject.
- A temporary mapping of the old Xen code (old here means non-colored)
  is used to reach variables in the old physical space so that secondary
  CPUs can boot. There were some comments in the previous version on that
  because the mapping is available for all the CPUs while only CPU0 is
  the one supposed to access it. I'm not sure how to temporarily map
  things only for the master CPU.
- A lot of #ifdef for cache coloring are introduced because I prefer to
  define functions only if they are actually needed. Let me know if you
  prefer a different approach.
- Julien posted an RFC to address a problem with the switch_ttbr function.
  For the moment I haven't considered it since it's still a work in progress.

Acknowledgements
----------------

This work is sponsored by Xilinx Inc., and supported by University of
Modena and Reggio Emilia and Minerva Systems.

Carlo Nonato (10):
  xen/arm: add cache coloring initialization
  xen/arm: add cache coloring initialization for domains
  xen/arm: dump cache colors in domain info debug-key
  tools/xl: add support for cache coloring configuration
  xen/arm: add support for cache coloring configuration via device-tree
  xen/common: add cache coloring allocator for domains
  xen/common: add colored heap info debug-key
  Revert "xen/arm: Remove unused BOOT_RELOC_VIRT_START"
  xen/arm: add Xen cache colors command line parameter
  xen/arm: add cache coloring support for Xen

Luca Miccio (2):
  Revert "xen/arm: setup: Add Xen as boot module before printing all
    boot modules"
  Revert "xen/arm: mm: Initialize page-tables earlier"

 docs/man/xl.cfg.5.pod.in              |  10 +
 docs/misc/arm/cache-coloring.rst      | 201 ++++++++++++++
 docs/misc/arm/device-tree/booting.txt |   4 +
 docs/misc/xen-command-line.pandoc     |  45 ++++
 tools/libs/light/libxl_create.c       |  12 +
 tools/libs/light/libxl_types.idl      |   1 +
 tools/xl/xl_parse.c                   |  52 +++-
 xen/arch/arm/Kconfig                  |  28 ++
 xen/arch/arm/Makefile                 |   1 +
 xen/arch/arm/alternative.c            |   5 +
 xen/arch/arm/coloring.c               | 367 ++++++++++++++++++++++++++
 xen/arch/arm/domain.c                 |  14 +
 xen/arch/arm/domain_build.c           |  22 +-
 xen/arch/arm/include/asm/coloring.h   |  60 +++++
 xen/arch/arm/include/asm/config.h     |   4 +-
 xen/arch/arm/include/asm/domain.h     |   4 +
 xen/arch/arm/include/asm/mm.h         |  22 +-
 xen/arch/arm/include/asm/processor.h  |  16 ++
 xen/arch/arm/mm.c                     | 144 ++++++++--
 xen/arch/arm/psci.c                   |   4 +-
 xen/arch/arm/setup.c                  |  90 ++++++-
 xen/arch/arm/smpboot.c                |   3 +-
 xen/arch/arm/xen.lds.S                |   2 +-
 xen/common/page_alloc.c               | 237 ++++++++++++++++-
 xen/common/vmap.c                     |  25 ++
 xen/include/public/arch-arm.h         |   8 +
 xen/include/xen/vmap.h                |   4 +
 27 files changed, 1333 insertions(+), 52 deletions(-)
 create mode 100644 docs/misc/arm/cache-coloring.rst
 create mode 100644 xen/arch/arm/coloring.c
 create mode 100644 xen/arch/arm/include/asm/coloring.h

-- 
2.34.1
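
For readers new to page coloring, the mapping from a page to a color and the `0-3` style range syntax described in the cover letter can be sketched as follows. This is an illustrative sketch only, not code from the series: `struct llc_layout`, `llc_nr_colors`, `addr_to_color` and `parse_color_range` are hypothetical names, 4 KiB pages are assumed, and the color of a page is assumed to be its frame number modulo the number of colors (way size divided by page size), which is the usual definition for a physically indexed cache.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical description of the last-level cache geometry. */
struct llc_layout {
    uint64_t size;     /* total LLC size in bytes */
    unsigned int ways; /* associativity, must be non-zero */
};

/* Number of colors = way size / page size (assumed to be at least 1). */
static unsigned int llc_nr_colors(const struct llc_layout *llc)
{
    uint64_t way_size = llc->size / llc->ways;

    return (unsigned int)(way_size >> PAGE_SHIFT);
}

/* Color of a physical address: page frame number modulo number of colors. */
static unsigned int addr_to_color(const struct llc_layout *llc, uint64_t paddr)
{
    return (unsigned int)((paddr >> PAGE_SHIFT) % llc_nr_colors(llc));
}

/*
 * Parse a single "LO-HI" range (e.g. "4-11") into an array of colors.
 * 'colors' must have room for max_colors entries.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_color_range(const char *s, unsigned int max_colors,
                             unsigned int *colors, unsigned int *count)
{
    char *end;
    unsigned long lo, hi;

    if (!isdigit((unsigned char)*s))
        return -1;
    lo = strtoul(s, &end, 10);
    if (*end != '-' || !isdigit((unsigned char)end[1]))
        return -1;
    hi = strtoul(end + 1, &end, 10);
    if (*end != '\0' || lo > hi || hi >= max_colors)
        return -1;

    *count = 0;
    for (unsigned long c = lo; c <= hi; c++)
        colors[(*count)++] = (unsigned int)c;

    return 0;
}
```

With a 1 MiB, 16-way LLC this gives 16 colors, so a domain configured with `4-11` would own half of the cache partitions under these assumptions.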
Hi Carlo,

On 26/08/2022 13:50, Carlo Nonato wrote:
> - The way xl passes user space memory to Xen it's adapted from various
>   points of the xl code itself (e.g. xc_domain_node_setaffinity) and it
>   works, but it really needs attention from expert maintainers since
>   I'm not completely sure this is the correct way of doing things.
> - We still need to bring back the relocation feature (part of) in order
>   to move Xen memory to a colored space where the hypervisor could be
>   isolated from VMs interference (see the revert commit #10 and the
>   get_xen_paddr function in #12).
> - Revert commits #8 and #9 are needed because coloring has the command
>   line parsing as a prerequisite for its initialization and
>   setup_pagetables must be called after it in order to color the Xen
>   mapping. The DTB mapping is then added to the boot page tables instead
>   of the Xen ones. Probably the way this is done is a bit simplistic.
>   Looking forward for comments on the subject.
> - A temporary mapping of the old Xen code (old here means non-colored)
>   is used to reach variables in the old physical space so that secondary
>   CPUs can boot. There were some comments in the previous version on that
>   because the mapping is available for all the CPUs while only CPU0 is
>   the one supposed to access it. I'm not sure how to temporarily mapping
>   things only for the master CPU.

On Arm64, Xen will only use one set of page-tables for all the CPUs. So
it will not be possible to have a temporary mapping for a single CPU.
But what you can do is map the region and unmap it when you are
done.

That said, I would prefer that we get rid of the old copy of
Xen. This would mean secondary CPUs will directly jump to the new Xen.

> - A lot of #ifdef for cache coloring are introduced because I prefer to
>   define functions only if they are actually needed. Let me know if you
>   prefer a different approach.

The preferred approach in Xen is to provide stub helpers in the #else part.

> - Julien posted an RFC to address a problem with the switch_ttbr function.
>   For the moment I haven't considered it since it's still a work in progress.

I have posted a new version for this:

https://lore.kernel.org/xen-devel/20221022150422.17707-1-julien@xen.org/

There are a couple of open questions about the interaction with cache
coloring. Please have a look there.

Cheers,

-- 
Julien Grall
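
The stub-helper convention mentioned above (stubs in the #else part, so that callers carry no #ifdef) can be sketched roughly as follows. The names `CONFIG_CACHE_COLORING`, `domain_is_colored`, `domain_coloring_init` and `arch_domain_setup` are hypothetical placeholders chosen for illustration and are not taken from the series.

```c
#include <stdbool.h>
#include <stddef.h>

struct domain; /* opaque here; stand-in for Xen's struct domain */

#ifdef CONFIG_CACHE_COLORING /* hypothetical Kconfig symbol */

/* Real implementations would live in the coloring code. */
bool domain_is_colored(const struct domain *d);
int domain_coloring_init(struct domain *d);

#else /* !CONFIG_CACHE_COLORING */

/* Stubs: same signatures, trivial bodies the compiler can optimise away. */
static inline bool domain_is_colored(const struct domain *d)
{
    (void)d;
    return false;
}

static inline int domain_coloring_init(struct domain *d)
{
    (void)d;
    return 0;
}

#endif /* CONFIG_CACHE_COLORING */

/* The caller is written once and contains no #ifdef. */
static inline int arch_domain_setup(struct domain *d)
{
    int rc = domain_coloring_init(d);

    if (rc)
        return rc;

    /* ... rest of the (hypothetical) domain setup ... */
    return 0;
}
```

The benefit of this layout is that both configurations compile the same call sites, so a build break with the feature disabled is more likely to be caught by the compiler than hidden behind a conditional.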
On 26.08.2022 14:50, Carlo Nonato wrote:
> Shared caches in multi-core CPU architectures represent a problem for
> predictability of memory access latency. This jeopardizes applicability
> of many Arm platform in real-time critical and mixed-criticality
> scenarios. We introduce support for cache partitioning with page
> coloring, a transparent software technique that enables isolation
> between domains and Xen, and thus avoids cache interference.
>
> When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
> the user to define assignments of cache partitions ids, called colors,
> where assigning different colors guarantees no mutual eviction on cache
> will ever happen. This instructs the Xen memory allocator to provide
> the i-th color assignee only with pages that maps to color i, i.e. that
> are indexed in the i-th cache partition.
>
> The proposed implementation supports the dom0less feature.
> The solution has been tested in several scenarios, including Xilinx Zynq
> MPSoCs.

Having looked at the non-Arm-specific parts of this I have one basic
question: Wouldn't it be possible to avoid the addition of entirely
new logic by treating the current model as just using a single color,
therefore merely becoming a special case of what you want?

Plus an advanced question: In how far does this interoperate with
static allocation, which again is (for now) an Arm-only feature?
Your reference to dom0less above doesn't cover this afaict.

Jan
On Thu, 15 Sep 2022, Jan Beulich wrote:
> Plus an advanced question: In how far does this interoperate with
> static allocation, which again is (for now) an Arm-only feature?
> Your reference to dom0less above doesn't cover this afaict.

I take it you are referring to static-mem, the static memory ranges for
dom0less domUs described in docs/misc/arm/device-tree/booting.txt.

static-mem doesn't interoperate with cache coloring: each static range
would span multiple colors. You have to choose one feature or the
other; using both at the same time doesn't make sense.

Cheers,

Stefano
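
To illustrate why a physically contiguous static range spans multiple colors, here is a small sketch. It assumes 4 KiB pages and that consecutive page frames cycle through the colors (color = frame number modulo number of colors); `colors_in_range` is a hypothetical helper, not part of Xen.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Number of distinct colors touched by the physical range
 * [start, start + size), with size > 0. Because consecutive frames cycle
 * through the colors, a range of N pages touches min(N, nr_colors) colors.
 */
static unsigned int colors_in_range(uint64_t start, uint64_t size,
                                    unsigned int nr_colors)
{
    uint64_t first = start >> PAGE_SHIFT;
    uint64_t last = (start + size - 1) >> PAGE_SHIFT;
    uint64_t pages = last - first + 1;

    return pages >= nr_colors ? nr_colors : (unsigned int)pages;
}
```

Under these assumptions, any range of at least nr_colors pages touches every color, so a static range of realistic size cannot be confined to the subset of colors assigned to one domain.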
Hi Jan,

On Thu, Sep 15, 2022 at 03:29:08PM +0200, Jan Beulich wrote:
> On 26.08.2022 14:50, Carlo Nonato wrote:
> > Shared caches in multi-core CPU architectures represent a problem for
> > predictability of memory access latency. This jeopardizes applicability
> > of many Arm platform in real-time critical and mixed-criticality
> > scenarios. We introduce support for cache partitioning with page
> > coloring, a transparent software technique that enables isolation
> > between domains and Xen, and thus avoids cache interference.
> >
> > When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
> > the user to define assignments of cache partitions ids, called colors,
> > where assigning different colors guarantees no mutual eviction on cache
> > will ever happen. This instructs the Xen memory allocator to provide
> > the i-th color assignee only with pages that maps to color i, i.e. that
> > are indexed in the i-th cache partition.
> >
> > The proposed implementation supports the dom0less feature.
> > The solution has been tested in several scenarios, including Xilinx Zynq
> > MPSoCs.
>
> Having looked at the non-Arm-specific parts of this I have one basic
> question: Wouldn't it be possible to avoid the addition of entirely
> new logic by treating the current model as just using a single color,
> therefore merely becoming a special case of what you want?

Nice question. Thanks!

In principle, you are quite right: monochrome is just a degenerate
choice of colouring: the colouring implementation with a single colour
allows assigning all the available pages, exactly as it happens with the
ordinary allocator. The difference lies in the allocation algorithm.

In practice, that would be quite inefficient. This is because the
allocation logic used by the coloured allocator is much simpler, since
it operates with lists instead of binary trees.

Now, upgrading the logic of the coloured allocator would be overkill,
because lowering the complexity of insertion/removal operations from
linear to logarithmic does not change much, since in the real world,
the longest sequence of physically contiguous pages that may be assigned
is max_colours - 1.

Cheers.

-- 
Marco Solieri, Ph.D.
CEO & Founder

Tel: +39-059-205-5182 -- Mobile: +39-349-678-66-65 -- OpenPGP: 0x75822E7E

Minerva Systems SRL -- https://www.minervasys.tech
Via Campi 213/B, 41125, Modena, Italy -- PIVA/CF 03996890368

~~> Discover how to easily optimise complex embedded solutions
    for high-performance, safety and predictability. Together.

> Plus an advanced question: In how far does this interoperate with
> static allocation, which again is (for now) an Arm-only feature?
> Your reference to dom0less above doesn't cover this afaict.
>
> Jan
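
A list-based coloured allocator of the kind described above can be sketched as one free list per colour. This is a deliberately simplified illustration, not the allocator from the series: `NR_COLORS`, `struct page`, `colored_free_page` and `colored_alloc_page` are hypothetical, the colour of a page is assumed to be its frame number modulo `NR_COLORS`, and the lists here are unsorted LIFO stacks, whereas the message above implies lists with linear-time insertion.

```c
#include <stddef.h>

#define NR_COLORS 4 /* assumption: small number of colours for illustration */

/* Minimal stand-in for a page descriptor. */
struct page {
    unsigned long pfn; /* page frame number */
    struct page *next; /* link in the per-colour free list */
};

/* One singly linked free list per colour. */
static struct page *free_list[NR_COLORS];

/* Return a page to the free list of its colour. */
static void colored_free_page(struct page *pg)
{
    unsigned int color = (unsigned int)(pg->pfn % NR_COLORS);

    pg->next = free_list[color];
    free_list[color] = pg;
}

/*
 * Allocate one page from the first non-empty list among the colours
 * assigned to the caller. Returns NULL if none of them has a free page.
 */
static struct page *colored_alloc_page(const unsigned int *colors,
                                       unsigned int nr)
{
    for (unsigned int i = 0; i < nr; i++) {
        struct page *pg;

        if (colors[i] >= NR_COLORS)
            continue; /* ignore invalid colour ids */

        pg = free_list[colors[i]];
        if (pg) {
            free_list[colors[i]] = pg->next;
            pg->next = NULL;
            return pg;
        }
    }

    return NULL;
}
```

Note that every allocation is a single page: physically adjacent frames have different colours under this assumption, which is why higher-order (buddy-style) allocations are of limited use to a coloured domain.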
Hi Carlo,

On 26/08/2022 13:50, Carlo Nonato wrote:
> Shared caches in multi-core CPU architectures represent a problem for
> predictability of memory access latency. This jeopardizes applicability
> of many Arm platform in real-time critical and mixed-criticality
> scenarios. We introduce support for cache partitioning with page
> coloring, a transparent software technique that enables isolation
> between domains and Xen, and thus avoids cache interference.
>
> When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
> the user to define assignments of cache partitions ids, called colors,
> where assigning different colors guarantees no mutual eviction on cache
> will ever happen. This instructs the Xen memory allocator to provide
> the i-th color assignee only with pages that maps to color i, i.e. that
> are indexed in the i-th cache partition.
>
> The proposed implementation supports the dom0less feature.
> The solution has been tested in several scenarios, including Xilinx Zynq
> MPSoCs.
>
> Overview of implementation and commits structure
> ------------------------------------------------
>
> - [1-3] Coloring initialization, cache layout auto-probing and coloring
>   data for domains are added.
> - [4-5] xl and Device Tree support for coloring is addedd.
> - [6-7] A new page allocator for domain memory that implement the cache
>   coloring mechanism is introduced.
> - [8-12] Coloring support is added for Xen .text region.
>
> Changes in v2
> -------------
>
> Lot of things changed between the two versions, mainly I tried to follow
> all the comments left by the maintainers after the previous version review.
> Here is a brief list of the major points (even if, imho, it's easier to
> repeat all the review process):

The series doesn't build on Arm64 without cache coloring. Please make
sure to compile and check that Xen still boots on a system after your
series with cache coloring disabled.

Cheers,

-- 
Julien Grall
Hi Julien,

On Sat, Sep 10, 2022 at 5:12 PM Julien Grall <julien@xen.org> wrote:
>
> Hi Carlo,
>
> On 26/08/2022 13:50, Carlo Nonato wrote:
> > Shared caches in multi-core CPU architectures represent a problem for
> > predictability of memory access latency. This jeopardizes applicability
> > of many Arm platform in real-time critical and mixed-criticality
> > scenarios. We introduce support for cache partitioning with page
> > coloring, a transparent software technique that enables isolation
> > between domains and Xen, and thus avoids cache interference.
> >
> > When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
> > the user to define assignments of cache partitions ids, called colors,
> > where assigning different colors guarantees no mutual eviction on cache
> > will ever happen. This instructs the Xen memory allocator to provide
> > the i-th color assignee only with pages that maps to color i, i.e. that
> > are indexed in the i-th cache partition.
> >
> > The proposed implementation supports the dom0less feature.
> > The solution has been tested in several scenarios, including Xilinx Zynq
> > MPSoCs.
> >
> > Overview of implementation and commits structure
> > ------------------------------------------------
> >
> > - [1-3] Coloring initialization, cache layout auto-probing and coloring
> >   data for domains are added.
> > - [4-5] xl and Device Tree support for coloring is addedd.
> > - [6-7] A new page allocator for domain memory that implement the cache
> >   coloring mechanism is introduced.
> > - [8-12] Coloring support is added for Xen .text region.
> >
> > Changes in v2
> > -------------
> >
> > Lot of things changed between the two versions, mainly I tried to follow
> > all the comments left by the maintainers after the previous version review.
> > Here is a brief list of the major points (even if, imho, it's easier to
> > repeat all the review process):
>
> The series doesn't build on Arm64 without cache coloring. Please make
> sure to compile and check that Xen still boot on system after your
> series with cache coloring disabled.

I'm sorry for that. I tested multiple times, but probably missed it after
some last-minute change. The following diff fixes it.

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 00351ee014..6abe2fdef7 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -411,7 +411,7 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
 #else
 #define virt_boot_xen(virt) virt
 #define set_value_for_secondary(var, val) \
-    var = val;
+    var = val; \
     clean_dcache(var);
 #endif

>
> Cheers,
>
> --
> Julien Grall

Thanks.

- Carlo Nonato
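
The build failure fixed above comes from a missing line continuation: without the trailing backslash, `clean_dcache(var);` is no longer part of the macro body and ends up as a stray statement at file scope. As a side note, a common C idiom for multi-statement macros is to wrap the body in `do { ... } while (0)`, which also keeps the macro safe inside an unbraced if/else. The sketch below shows that idiom; it is not what the patch does, and the `clean_dcache` defined here is only a counting stand-in for Xen's real cache-maintenance macro.

```c
/* Stand-in for Xen's clean_dcache(): only counts calls (illustration). */
static unsigned int clean_count;
#define clean_dcache(var) ((void)&(var), clean_count++)

/*
 * Multi-statement macro wrapped in do { } while (0): every line of the
 * body except the last ends with a backslash, and the whole expansion
 * behaves as a single statement.
 */
#define set_value_for_secondary(var, val) \
    do {                                  \
        (var) = (val);                    \
        clean_dcache(var);                \
    } while (0)
```

With this form, `if (cond) set_value_for_secondary(x, 5); else ...` compiles and both statements of the body execute under the condition, whereas the two-statement form from the diff would not compile in that position.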