[PATCH v6 00/17] Device tree based NUMA support for Arm
Posted by Henry Wang 1 year, 11 months ago
The preparation work to support NUMA on Arm has been merged
and can be found at [1] and [2]. The initial discussions of
the Arm NUMA support can be found at [3].

- Background of this series:

Xen's memory allocation and scheduler modules are NUMA aware.
However, only x86 implements the architecture APIs needed to
support NUMA. Arm has been providing a set of fake architecture
APIs to stay compatible with the NUMA-aware memory allocator
and scheduler.

Arm systems worked well as single-node NUMA systems with these
fake APIs, because there were no multi-node NUMA systems on Arm.
In recent years, however, more and more Arm devices have shipped
with multiple NUMA nodes.

This creates a new problem: when Xen runs on these Arm devices,
it still treats them as single-node SMP systems. The NUMA
affinity capability of the Xen memory allocator and scheduler
becomes meaningless, because they rely on input data that does
not reflect the real NUMA layout.

Xen still assumes the access time to all memory is the same for
all CPUs. However, Xen may allocate memory to a VM from different
NUMA nodes with different access speeds. This difference can be
amplified by workloads inside the VM, causing performance
instability and timeouts.

In this patch series, we implement a set of NUMA APIs that use
the device tree to describe the NUMA layout. We reuse most of
the x86 NUMA code to create and maintain the memory-to-CPU
mapping and the distance matrix between any two NUMA nodes.
Except for ACPI and some x86-specific code, the rest has been
moved to common code. In a later stage, when we implement
ACPI-based NUMA for Arm64, we may move the ACPI NUMA code to
common code too, but for now we keep it x86-only.
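
As an illustration (this sketch is not taken from the series or
from any real board), a minimal two-node layout described with the
standard device tree NUMA binding tags each cpu and memory node
with a numa-node-id:

    cpus {
        #address-cells = <2>;
        #size-cells = <0>;

        /* One CPU on node 0 and one on node 1 (illustrative) */
        cpu@0 {
            device_type = "cpu";
            compatible = "arm,armv8";
            reg = <0x0 0x0>;
            numa-node-id = <0>;
        };

        cpu@100 {
            device_type = "cpu";
            compatible = "arm,armv8";
            reg = <0x0 0x100>;
            numa-node-id = <1>;
        };
    };

    /* One memory bank per node (illustrative addresses/sizes) */
    memory@80000000 {
        device_type = "memory";
        reg = <0x0 0x80000000 0x0 0x80000000>;
        numa-node-id = <0>;
    };

    memory@880000000 {
        device_type = "memory";
        reg = <0x8 0x80000000 0x0 0x80000000>;
        numa-node-id = <1>;
    };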

This patch series has been tested and boots well on an FVP in
Arm64 mode with NUMA configuration in the device tree, and on
an HPE x86 NUMA machine.

[1] https://lists.xenproject.org/archives/html/xen-devel/2022-06/msg00499.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2022-11/msg01043.html
[3] https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg01903.html

Henry Wang (1):
  xen/arm: Set correct per-cpu cpu_core_mask

Wei Chen (16):
  xen/arm: use NR_MEM_BANKS to override default NR_NODE_MEMBLKS
  xen/arm: implement helpers to get and update NUMA status
  xen/arm: implement node distance helpers for Arm
  xen/arm: use arch_get_ram_range to get memory ranges from bootinfo
  xen/arm: build NUMA cpu_to_node map in dt_smp_init_cpus
  xen/arm: Add boot and secondary CPU to NUMA system
  xen/arm: introduce a helper to parse device tree processor node
  xen/arm: introduce a helper to parse device tree memory node
  xen/arm: introduce a helper to parse device tree NUMA distance map
  xen/arm: unified entry to parse all NUMA data from device tree
  xen/arm: keep guest still be NUMA unware
  xen/arm: enable device tree based NUMA in system init
  xen/arm: implement numa_node_to_arch_nid for device tree NUMA
  xen/arm: use CONFIG_NUMA to gate node_online_map in smpboot
  xen/arm: Provide Kconfig options for Arm to enable NUMA
  docs: update numa command line to support Arm

 CHANGELOG.md                      |   1 +
 SUPPORT.md                        |   1 +
 docs/misc/xen-command-line.pandoc |   2 +-
 xen/arch/arm/Kconfig              |  11 ++
 xen/arch/arm/Makefile             |   2 +
 xen/arch/arm/domain_build.c       |   6 +
 xen/arch/arm/include/asm/numa.h   |  91 ++++++++-
 xen/arch/arm/numa-dt.c            | 299 ++++++++++++++++++++++++++++++
 xen/arch/arm/numa.c               | 184 ++++++++++++++++++
 xen/arch/arm/setup.c              |  17 ++
 xen/arch/arm/smpboot.c            |  38 ++++
 xen/arch/x86/include/asm/numa.h   |   1 -
 xen/arch/x86/srat.c               |   2 +-
 xen/include/xen/numa.h            |  10 +
 14 files changed, 661 insertions(+), 4 deletions(-)
 create mode 100644 xen/arch/arm/numa-dt.c
 create mode 100644 xen/arch/arm/numa.c

-- 
2.25.1
Re: [PATCH v6 00/17] Device tree based NUMA support for Arm
Posted by Stefano Stabellini 1 year, 11 months ago
Hi Henry,

Thank you for all your work on this series and in general upstreaming
other patches too!

How do we test it? Is there a way to test this series with QEMU and some
special device tree? I am asking because during review it would make
things easier if we could exercise the new code somehow.

I haven't had a chance to review this properly but I noticed that a few
code additions are not protected by CONFIG_NUMA. Maybe they should?

Also given that this is not a small series, I wanted to check with you
if this is a good time to do a full review of the series (in the sense
that you are OK with handling review feedback now) or whether you would
rather have us do the review at another time.

Cheers,

Stefano


On Mon, 20 Nov 2023, Henry Wang wrote:
> The preparation work to support NUMA on Arm has been merged
> and can be found at [1] and [2]. The initial discussions of
> the Arm NUMA support can be found at [3].
> 
> - Background of this series:
> 
> Xen's memory allocation and scheduler modules are NUMA aware.
> However, only x86 implements the architecture APIs needed to
> support NUMA. Arm has been providing a set of fake architecture
> APIs to stay compatible with the NUMA-aware memory allocator
> and scheduler.
>
> Arm systems worked well as single-node NUMA systems with these
> fake APIs, because there were no multi-node NUMA systems on Arm.
> In recent years, however, more and more Arm devices have shipped
> with multiple NUMA nodes.
>
> This creates a new problem: when Xen runs on these Arm devices,
> it still treats them as single-node SMP systems. The NUMA
> affinity capability of the Xen memory allocator and scheduler
> becomes meaningless, because they rely on input data that does
> not reflect the real NUMA layout.
>
> Xen still assumes the access time to all memory is the same for
> all CPUs. However, Xen may allocate memory to a VM from different
> NUMA nodes with different access speeds. This difference can be
> amplified by workloads inside the VM, causing performance
> instability and timeouts.
>
> In this patch series, we implement a set of NUMA APIs that use
> the device tree to describe the NUMA layout. We reuse most of
> the x86 NUMA code to create and maintain the memory-to-CPU
> mapping and the distance matrix between any two NUMA nodes.
> Except for ACPI and some x86-specific code, the rest has been
> moved to common code. In a later stage, when we implement
> ACPI-based NUMA for Arm64, we may move the ACPI NUMA code to
> common code too, but for now we keep it x86-only.
>
> This patch series has been tested and boots well on an FVP in
> Arm64 mode with NUMA configuration in the device tree, and on
> an HPE x86 NUMA machine.
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2022-06/msg00499.html
> [2] https://lists.xenproject.org/archives/html/xen-devel/2022-11/msg01043.html
> [3] https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg01903.html
> 
> Henry Wang (1):
>   xen/arm: Set correct per-cpu cpu_core_mask
> 
> Wei Chen (16):
>   xen/arm: use NR_MEM_BANKS to override default NR_NODE_MEMBLKS
>   xen/arm: implement helpers to get and update NUMA status
>   xen/arm: implement node distance helpers for Arm
>   xen/arm: use arch_get_ram_range to get memory ranges from bootinfo
>   xen/arm: build NUMA cpu_to_node map in dt_smp_init_cpus
>   xen/arm: Add boot and secondary CPU to NUMA system
>   xen/arm: introduce a helper to parse device tree processor node
>   xen/arm: introduce a helper to parse device tree memory node
>   xen/arm: introduce a helper to parse device tree NUMA distance map
>   xen/arm: unified entry to parse all NUMA data from device tree
>   xen/arm: keep guest still be NUMA unware
>   xen/arm: enable device tree based NUMA in system init
>   xen/arm: implement numa_node_to_arch_nid for device tree NUMA
>   xen/arm: use CONFIG_NUMA to gate node_online_map in smpboot
>   xen/arm: Provide Kconfig options for Arm to enable NUMA
>   docs: update numa command line to support Arm
> 
>  CHANGELOG.md                      |   1 +
>  SUPPORT.md                        |   1 +
>  docs/misc/xen-command-line.pandoc |   2 +-
>  xen/arch/arm/Kconfig              |  11 ++
>  xen/arch/arm/Makefile             |   2 +
>  xen/arch/arm/domain_build.c       |   6 +
>  xen/arch/arm/include/asm/numa.h   |  91 ++++++++-
>  xen/arch/arm/numa-dt.c            | 299 ++++++++++++++++++++++++++++++
>  xen/arch/arm/numa.c               | 184 ++++++++++++++++++
>  xen/arch/arm/setup.c              |  17 ++
>  xen/arch/arm/smpboot.c            |  38 ++++
>  xen/arch/x86/include/asm/numa.h   |   1 -
>  xen/arch/x86/srat.c               |   2 +-
>  xen/include/xen/numa.h            |  10 +
>  14 files changed, 661 insertions(+), 4 deletions(-)
>  create mode 100644 xen/arch/arm/numa-dt.c
>  create mode 100644 xen/arch/arm/numa.c
> 
> -- 
> 2.25.1
>
Re: [PATCH v6 00/17] Device tree based NUMA support for Arm
Posted by Henry Wang 1 year, 11 months ago
Hi Stefano,

Thanks for the email!

> On Nov 30, 2023, at 09:06, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> Hi Henry,
> 
> Thank you for all your work on this series and in general upstreaming
> other patches too!

It’s our pleasure.

> How do we test it? Is there a way to test this series with QEMU and some
> special device tree? I am asking because during review it would make
> things easier if we could exercise the new code somehow.

Let me add some details. I tested the code using the FVP. From my
understanding, you only need to tweak the memory and cpus nodes in the
device tree and add a "distance-map" node following the device tree binding [1].
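
For example, here is a rough sketch of the extra node for a two-node
system, following the binding in [1] (the distance values below are
made up for illustration):

    distance-map {
        compatible = "numa-distance-map-v1";
        /* Entries are <node-a node-b distance>: local 10, remote 20 */
        distance-matrix = <0 0 10>,
                          <0 1 20>,
                          <1 0 20>,
                          <1 1 10>;
    };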

But I understand the pain of not having an FVP to play with the code (same
for the R82 support). For this, we have checked internally with legal experts
about the possibility of including the FVP in the upstream GitLab CI containers
without triggering redistribution issues. It seems that all the FVPs
referenced on [2], including both “FVP_Base_RevC-2xAEMvA” and
“FVP_BaseR_AEMv8R”, are now licensed under the lightweight Ecosystem EULA,
which places no restrictions on redistribution. Therefore I think we are now safe to
pack them into the upstream containers and use them for GitLab CI testing.

I think we can add a GitLab CI job to test NUMA (and future Arm features); we
just need to design an extensible framework for it. I can work on that
in the next version of the NUMA series. What do you think?

> I haven't had a chance to review this properly but I noticed that a few
> code additions are not protected by CONFIG_NUMA. Maybe they should?

I think you are referring to the code in xen/arch/arm/setup.c, correct? These
parts of the code are designed to work for both NUMA and non-NUMA setups
(see the in-code comments on top of them). Also, the function
numa_set_cpu_core_mask() should not be protected by CONFIG_NUMA,
as it fixes an actual "bug" described in the commit message of patch #15.

> Also given that this is not a small series, I wanted to check with you
> if this is a good time to do a full review of the series (in the sense
> that you are OK with handling review feedback now) or whether you would
> rather have us do the review at another time.

Thanks for being considerate! I have planned a task and reserved bandwidth
internally to work on upstreaming this series. Personally, I also prioritize
upstream comments/feedback in my working schedule, so I think I will handle
the review feedback in time, don’t worry :)

[1] https://www.kernel.org/doc/Documentation/devicetree/bindings/numa.txt
[2] https://developer.arm.com/Tools%20and%20Software/Fixed%20Virtual%20Platforms

Kind regards,
Henry

> 
> Cheers,
> 
> Stefano
> 
> 
> On Mon, 20 Nov 2023, Henry Wang wrote:
>> The preparation work to support NUMA on Arm has been merged
>> and can be found at [1] and [2]. The initial discussions of
>> the Arm NUMA support can be found at [3].
>> 
>> - Background of this series:
>> 
>> Xen's memory allocation and scheduler modules are NUMA aware.
>> However, only x86 implements the architecture APIs needed to
>> support NUMA. Arm has been providing a set of fake architecture
>> APIs to stay compatible with the NUMA-aware memory allocator
>> and scheduler.
>>
>> Arm systems worked well as single-node NUMA systems with these
>> fake APIs, because there were no multi-node NUMA systems on Arm.
>> In recent years, however, more and more Arm devices have shipped
>> with multiple NUMA nodes.
>>
>> This creates a new problem: when Xen runs on these Arm devices,
>> it still treats them as single-node SMP systems. The NUMA
>> affinity capability of the Xen memory allocator and scheduler
>> becomes meaningless, because they rely on input data that does
>> not reflect the real NUMA layout.
>>
>> Xen still assumes the access time to all memory is the same for
>> all CPUs. However, Xen may allocate memory to a VM from different
>> NUMA nodes with different access speeds. This difference can be
>> amplified by workloads inside the VM, causing performance
>> instability and timeouts.
>>
>> In this patch series, we implement a set of NUMA APIs that use
>> the device tree to describe the NUMA layout. We reuse most of
>> the x86 NUMA code to create and maintain the memory-to-CPU
>> mapping and the distance matrix between any two NUMA nodes.
>> Except for ACPI and some x86-specific code, the rest has been
>> moved to common code. In a later stage, when we implement
>> ACPI-based NUMA for Arm64, we may move the ACPI NUMA code to
>> common code too, but for now we keep it x86-only.
>>
>> This patch series has been tested and boots well on an FVP in
>> Arm64 mode with NUMA configuration in the device tree, and on
>> an HPE x86 NUMA machine.
>> 
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2022-06/msg00499.html
>> [2] https://lists.xenproject.org/archives/html/xen-devel/2022-11/msg01043.html
>> [3] https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg01903.html
>> 
>> Henry Wang (1):
>>  xen/arm: Set correct per-cpu cpu_core_mask
>> 
>> Wei Chen (16):
>>  xen/arm: use NR_MEM_BANKS to override default NR_NODE_MEMBLKS
>>  xen/arm: implement helpers to get and update NUMA status
>>  xen/arm: implement node distance helpers for Arm
>>  xen/arm: use arch_get_ram_range to get memory ranges from bootinfo
>>  xen/arm: build NUMA cpu_to_node map in dt_smp_init_cpus
>>  xen/arm: Add boot and secondary CPU to NUMA system
>>  xen/arm: introduce a helper to parse device tree processor node
>>  xen/arm: introduce a helper to parse device tree memory node
>>  xen/arm: introduce a helper to parse device tree NUMA distance map
>>  xen/arm: unified entry to parse all NUMA data from device tree
>>  xen/arm: keep guest still be NUMA unware
>>  xen/arm: enable device tree based NUMA in system init
>>  xen/arm: implement numa_node_to_arch_nid for device tree NUMA
>>  xen/arm: use CONFIG_NUMA to gate node_online_map in smpboot
>>  xen/arm: Provide Kconfig options for Arm to enable NUMA
>>  docs: update numa command line to support Arm
>> 
>> CHANGELOG.md                      |   1 +
>> SUPPORT.md                        |   1 +
>> docs/misc/xen-command-line.pandoc |   2 +-
>> xen/arch/arm/Kconfig              |  11 ++
>> xen/arch/arm/Makefile             |   2 +
>> xen/arch/arm/domain_build.c       |   6 +
>> xen/arch/arm/include/asm/numa.h   |  91 ++++++++-
>> xen/arch/arm/numa-dt.c            | 299 ++++++++++++++++++++++++++++++
>> xen/arch/arm/numa.c               | 184 ++++++++++++++++++
>> xen/arch/arm/setup.c              |  17 ++
>> xen/arch/arm/smpboot.c            |  38 ++++
>> xen/arch/x86/include/asm/numa.h   |   1 -
>> xen/arch/x86/srat.c               |   2 +-
>> xen/include/xen/numa.h            |  10 +
>> 14 files changed, 661 insertions(+), 4 deletions(-)
>> create mode 100644 xen/arch/arm/numa-dt.c
>> create mode 100644 xen/arch/arm/numa.c
>> 
>> -- 
>> 2.25.1
>> 

Re: [PATCH v6 00/17] Device tree based NUMA support for Arm
Posted by Julien Grall 1 year, 10 months ago
Hi Henry,

On 20/11/2023 02:54, Henry Wang wrote:
> The preparation work to support NUMA on Arm has been merged
> and can be found at [1] and [2]. The initial discussions of
> the Arm NUMA support can be found at [3].
> 
> - Background of this series:
> 
> Xen's memory allocation and scheduler modules are NUMA aware.
> However, only x86 implements the architecture APIs needed to
> support NUMA. Arm has been providing a set of fake architecture
> APIs to stay compatible with the NUMA-aware memory allocator
> and scheduler.
>
> Arm systems worked well as single-node NUMA systems with these
> fake APIs, because there were no multi-node NUMA systems on Arm.
> In recent years, however, more and more Arm devices have shipped
> with multiple NUMA nodes.
>
> This creates a new problem: when Xen runs on these Arm devices,
> it still treats them as single-node SMP systems. The NUMA
> affinity capability of the Xen memory allocator and scheduler
> becomes meaningless, because they rely on input data that does
> not reflect the real NUMA layout.
>
> Xen still assumes the access time to all memory is the same for
> all CPUs. However, Xen may allocate memory to a VM from different
> NUMA nodes with different access speeds. This difference can be
> amplified by workloads inside the VM, causing performance
> instability and timeouts.
>
> In this patch series, we implement a set of NUMA APIs that use
> the device tree to describe the NUMA layout. We reuse most of
> the x86 NUMA code to create and maintain the memory-to-CPU
> mapping and the distance matrix between any two NUMA nodes.
> Except for ACPI and some x86-specific code, the rest has been
> moved to common code. In a later stage, when we implement
> ACPI-based NUMA for Arm64, we may move the ACPI NUMA code to
> common code too, but for now we keep it x86-only.
>
> This patch series has been tested and boots well on an FVP in
> Arm64 mode with NUMA configuration in the device tree, and on
> an HPE x86 NUMA machine.

This series doesn't seem to apply cleanly to staging. Do you have a tree 
with the series applied? This would help to check some of the final 
behavior.

Cheers,

-- 
Julien Grall