From: Ravi Shankar <ravis.opensrc@micron.com>

Hello,

The current interleave policy operates by interleaving page requests
among nodes defined in the memory policy. To accommodate the
introduction of memory tiers for various memory types (e.g., DDR, CXL,
HBM, PMEM, etc.), a mechanism is needed for interleaving page requests
across these memory types or tiers.

This can be achieved by implementing an interleaving method that
considers the tier weights. The tier weight determines the proportion
of pages allocated to nodes selected from those specified in the
memory policy. A tier weight can be assigned to each memory type
within the system.

Hasan Al Maruf had put forth a proposal for interleaving between two
tiers, namely the top tier and the low tier. However, that patch was
not adopted due to constraints on the number of available tiers.

https://lore.kernel.org/linux-mm/YqD0%2FtzFwXvJ1gK6@cmpxchg.org/T/

New proposed changes:

1. Introduce a sysfs entry to allow setting the interleave weight for
   each memory tier.
2. Give each tier a default weight of 1, indicating a standard 1:1
   proportion.
3. Distribute each tier's weight uniformly across all of its nodes.
4. Modify the existing interleaving algorithm to support multi-tier
   interleaving based on tier weights.

This is in line with Huang, Ying's presentation at LPC22, slide 16 in
https://lpc.events/event/16/contributions/1209/attachments/1042/1995/\
Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

We observed a significant increase (165%) in bandwidth utilization
with the newly proposed multi-tier interleaving compared to the
traditional 1:1 interleaving approach between DDR and CXL tier nodes,
where 85% of the pages are allocated to the DDR tier and 15% to the
CXL tier, measured with the MLC -w2 option.

Usage example:

1. Set weights for the DDR (tier4) and CXL (tier22) tiers.
   echo 85 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
   echo 15 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight

2. Interleave between DDR (tier4, node 0) and CXL (tier22, node 1)
   using numactl:

   numactl -i0,1 mlc --loaded_latency W2

Srinivasulu Thanneeru (2):
  memory tier: Introduce sysfs for tier interleave weights.
  mm: mempolicy: Interleave policy for tiered memory nodes

 include/linux/memory-tiers.h |  27 ++++++++-
 include/linux/sched.h        |   2 +
 mm/memory-tiers.c            |  67 +++++++++++++-------
 mm/mempolicy.c               | 107 +++++++++++++++++++++++++++++++++--
 4 files changed, 174 insertions(+), 29 deletions(-)

--
2.39.3
Hi, Ravi,

Thanks for the patch!

Ravi Jonnalagadda <ravis.opensrc@micron.com> writes:

> From: Ravi Shankar <ravis.opensrc@micron.com>
>
> Hello,
>
> The current interleave policy operates by interleaving page requests
> among nodes defined in the memory policy. To accommodate the
> introduction of memory tiers for various memory types (e.g., DDR, CXL,
> HBM, PMEM, etc.), a mechanism is needed for interleaving page requests
> across these memory types or tiers.

Why do we need interleaving page allocation among memory tiers? I
think that you need to make it more explicit. I guess that it's to
increase the maximal memory bandwidth for workloads?

> This can be achieved by implementing an interleaving method that
> considers the tier weights.
> The tier weight will determine the proportion of nodes to select from
> those specified in the memory policy.
> A tier weight can be assigned to each memory type within the system.

What is the problem with the original interleaving? I think you need
to make that explicit too.

> Hasan Al Maruf had put forth a proposal for interleaving between two
> tiers, namely the top tier and the low tier. However, that patch was
> not adopted due to constraints on the number of available tiers.
>
> https://lore.kernel.org/linux-mm/YqD0%2FtzFwXvJ1gK6@cmpxchg.org/T/
>
> New proposed changes:
>
> 1. Introduce a sysfs entry to allow setting the interleave weight for
>    each memory tier.
> 2. Give each tier a default weight of 1, indicating a standard 1:1
>    proportion.
> 3. Distribute each tier's weight uniformly across all of its nodes.
> 4. Modify the existing interleaving algorithm to support multi-tier
>    interleaving based on tier weights.
>
> This is in line with Huang, Ying's presentation at LPC22, slide 16 in
> https://lpc.events/event/16/contributions/1209/attachments/1042/1995/\
> Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

Thanks for referring to the original work on this.
> We observed a significant increase (165%) in bandwidth utilization
> with the newly proposed multi-tier interleaving compared to the
> traditional 1:1 interleaving approach between DDR and CXL tier nodes,
> where 85% of the pages are allocated to the DDR tier and 15% to the
> CXL tier, measured with the MLC -w2 option.

It appears that "mlc" isn't open source software. Better to use open
source software to test. And, even better, to use a more practical
workload instead of a memory bandwidth/latency measurement tool.

> Usage example:
>
> 1. Set weights for the DDR (tier4) and CXL (tier22) tiers.
>
>    echo 85 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
>    echo 15 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
>
> 2. Interleave between DDR (tier4, node 0) and CXL (tier22, node 1)
>    using numactl:
>
>    numactl -i0,1 mlc --loaded_latency W2
>
> Srinivasulu Thanneeru (2):
>   memory tier: Introduce sysfs for tier interleave weights.
>   mm: mempolicy: Interleave policy for tiered memory nodes
>
>  include/linux/memory-tiers.h |  27 ++++++++-
>  include/linux/sched.h        |   2 +
>  mm/memory-tiers.c            |  67 +++++++++++++-------
>  mm/mempolicy.c               | 107 +++++++++++++++++++++++++++++++++--
>  4 files changed, 174 insertions(+), 29 deletions(-)

--
Best Regards,
Huang, Ying
Hi Huang,

Thank you for your comments; these suggestions will be incorporated in
the next version.

Regards,
Srini

________________________________________
From: Huang, Ying <ying.huang@intel.com>
Sent: Thursday, September 28, 2023 11:44 AM
To: Ravis OpenSrc
Cc: linux-mm@vger.kernel.org; linux-cxl@vger.kernel.org;
linux-kernel@vger.kernel.org; linux-arch@vger.kernel.org;
linux-api@vger.kernel.org; luto@kernel.org; tglx@linutronix.de;
mingo@redhat.com; bp@alien8.de; dietmar.eggemann@arm.com;
vincent.guittot@linaro.org; dave.hansen@linux.intel.com;
hpa@zytor.com; arnd@arndb.de; akpm@linux-foundation.org;
x86@kernel.org; aneesh.kumar@linux.ibm.com;
gregory.price@memverge.com; John Groves; Srinivasulu Thanneeru;
Eishan Mirakhur; Vishal Tanna
Subject: [EXT] Re: [RFC PATCH 0/2] mm: mempolicy: Multi-tier interleaving

Hi, Ravi,

Thanks for the patch!

Ravi Jonnalagadda <ravis.opensrc@micron.com> writes:

> From: Ravi Shankar <ravis.opensrc@micron.com>
>
> Hello,
>
> The current interleave policy operates by interleaving page requests
> among nodes defined in the memory policy. To accommodate the
> introduction of memory tiers for various memory types (e.g., DDR, CXL,
> HBM, PMEM, etc.), a mechanism is needed for interleaving page requests
> across these memory types or tiers.

Why do we need interleaving page allocation among memory tiers? I
think that you need to make it more explicit. I guess that it's to
increase the maximal memory bandwidth for workloads?

Yes, it is to increase the maximal memory bandwidth.

> This can be achieved by implementing an interleaving method that
> considers the tier weights.
> The tier weight will determine the proportion of nodes to select from
> those specified in the memory policy.
> A tier weight can be assigned to each memory type within the system.
What is the problem with the original interleaving? I think you need
to make that explicit too.

With the original approach, the page distribution is fixed at 1:1 and
cannot be changed by the user/admin as required. The need for
different ratios has become evident with the introduction of new
memory tiers covering a wide range of memory types. With default
interleaving we observed lower memory bandwidth utilization compared
to the proposed approach with an 85:15 split when interleaving between
DDR and CXL. We will capture this information in the next series.

> Hasan Al Maruf had put forth a proposal for interleaving between two
> tiers, namely the top tier and the low tier. However, that patch was
> not adopted due to constraints on the number of available tiers.
>
> https://lore.kernel.org/linux-mm/YqD0%2FtzFwXvJ1gK6@cmpxchg.org/T/
>
> New proposed changes:
>
> 1. Introduce a sysfs entry to allow setting the interleave weight for
>    each memory tier.
> 2. Give each tier a default weight of 1, indicating a standard 1:1
>    proportion.
> 3. Distribute each tier's weight uniformly across all of its nodes.
> 4. Modify the existing interleaving algorithm to support multi-tier
>    interleaving based on tier weights.
>
> This is in line with Huang, Ying's presentation at LPC22, slide 16 in
> https://lpc.events/event/16/contributions/1209/attachments/1042/1995/\
> Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

Thanks for referring to the original work on this.

> We observed a significant increase (165%) in bandwidth utilization
> with the newly proposed multi-tier interleaving compared to the
> traditional 1:1 interleaving approach between DDR and CXL tier nodes,
> where 85% of the pages are allocated to the DDR tier and 15% to the
> CXL tier, measured with the MLC -w2 option.

It appears that "mlc" isn't open source software. Better to use open
source software to test. And, even better, to use a more practical
workload instead of a memory bandwidth/latency measurement tool.
Sure, will try it.

> Usage example:
>
> 1. Set weights for the DDR (tier4) and CXL (tier22) tiers.
>
>    echo 85 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
>    echo 15 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
>
> 2. Interleave between DDR (tier4, node 0) and CXL (tier22, node 1)
>    using numactl:
>
>    numactl -i0,1 mlc --loaded_latency W2
>
> Srinivasulu Thanneeru (2):
>   memory tier: Introduce sysfs for tier interleave weights.
>   mm: mempolicy: Interleave policy for tiered memory nodes
>
>  include/linux/memory-tiers.h |  27 ++++++++-
>  include/linux/sched.h        |   2 +
>  mm/memory-tiers.c            |  67 +++++++++++++-------
>  mm/mempolicy.c               | 107 +++++++++++++++++++++++++++++++++--
>  4 files changed, 174 insertions(+), 29 deletions(-)

--
Best Regards,
Huang, Ying