[PATCH v3 0/3] Implement numa node notifier

Oscar Salvador posted 3 patches 7 months, 3 weeks ago
There is a newer version of this series
drivers/acpi/numa/hmat.c  |   6 +-
drivers/base/node.c       |  21 +++++++
drivers/cxl/core/region.c |  14 ++---
drivers/cxl/cxl.h         |   4 +-
include/linux/memory.h    |  38 ++++++++++++-
kernel/cgroup/cpuset.c    |   2 +-
mm/memory-tiers.c         |   8 +--
mm/memory_hotplug.c       | 117 +++++++++++++++++++++-----------------
mm/mempolicy.c            |   8 +--
mm/page_ext.c             |  12 +---
mm/slub.c                 |  45 +++------------
11 files changed, 153 insertions(+), 122 deletions(-)
[PATCH v3 0/3] Implement numa node notifier
Posted by Oscar Salvador 7 months, 3 weeks ago
v2 -> v3:
  - Add Suggested-by (David)
  - Replace last N_NORMAL_MEMORY mention in slub (David)
  - Replace the notifier for autoweitght-mempolicy
  - Fix build on !CONFIG_MEMORY_HOTPLUG

v1 -> v2:
  - Remove status_change_nid_normal and the code that
    deals with it (David & Vlastimil)
  - Remove slab_mem_offline_callback (David & Vlastimil)
  - Change the order of canceling the notifiers
    in {online,offline}_pages (Vlastimil)
  - Fix up a couple of whitespaces (Jonathan Cameron)
  - Add RBs-by

Memory notifier is a tool that allow consumers to get notified whenever
memory gets onlined or offlined in the system.
Currently, there are 10 consumers of that, but 5 out of those 10 consumers
are only interested in getting notifications when a numa node changes its
memory state.
That means going from memoryless to memory-aware of vice versa.

Which means that for every {online,offline}_pages operation they get
notified even though the numa node might not have changed its state.

While we are doing this, remove status_change_nid_normal, as the only
current user (slub) does not really need it.
This allows us to further simplify and clean up the code.

The first patch gets rid of status_change_nid_normal in slub.
The second patch implements a numa node notifier that does just that, and have
those consumers register in there, so they get notified only when they are
interested.

The third patch replaces 'status_change_nid{_normal}' fields within
memory_notify with a 'nid', as that is only what we need for memory
notifer and update the only user of it (page_ext).

Consumers that are only interested in numa node states change are:

 - memory-tier
 - slub
 - cpuset
 - hmat
 - cxl
 - autoweight-mempolicy

Oscar Salvador (3):
  mm,slub: Do not special case N_NORMAL nodes for slab_nodes
  mm,memory_hotplug: Implement numa node notifier
  mm,memory_hotplug: Rename status_change_nid parameter in memory_notify

 drivers/acpi/numa/hmat.c  |   6 +-
 drivers/base/node.c       |  21 +++++++
 drivers/cxl/core/region.c |  14 ++---
 drivers/cxl/cxl.h         |   4 +-
 include/linux/memory.h    |  38 ++++++++++++-
 kernel/cgroup/cpuset.c    |   2 +-
 mm/memory-tiers.c         |   8 +--
 mm/memory_hotplug.c       | 117 +++++++++++++++++++++-----------------
 mm/mempolicy.c            |   8 +--
 mm/page_ext.c             |  12 +---
 mm/slub.c                 |  45 +++------------
 11 files changed, 153 insertions(+), 122 deletions(-)

-- 
2.49.0
Re: [PATCH v3 0/3] Implement numa node notifier
Posted by Andrew Morton 7 months, 2 weeks ago
On Fri,  2 May 2025 10:36:21 +0200 Oscar Salvador <osalvador@suse.de> wrote:

> Memory notifier is a tool that allow consumers to get notified whenever
> memory gets onlined or offlined in the system.
> Currently, there are 10 consumers of that, but 5 out of those 10 consumers
> are only interested in getting notifications when a numa node changes its
> memory state.
> That means going from memoryless to memory-aware of vice versa.
> 
> Which means that for every {online,offline}_pages operation they get
> notified even though the numa node might not have changed its state.

Why is this a problem?  Is there some bug?  Are these notifications so
frequent that there are significant inefficiencies here?

Further down-thread, Gregory tells us that Dan's patch "seems to fix
the underlying problem", but nobody (including Dan) told us about any
"problem" at all.
Re: [PATCH v3 0/3] Implement numa node notifier
Posted by Oscar Salvador 7 months, 2 weeks ago
On Sat, May 03, 2025 at 08:03:34PM -0700, Andrew Morton wrote:
> Why is this a problem?  Is there some bug?  Are these notifications so
> frequent that there are significant inefficiencies here?

hi Andrew,

There is no bug, it is just suboptimal.

That the numa node state changes were tied to the memory notifier was
something hacky and that have us bugged for a while now.
Were mean to tidy that up but just never got around it.

Actually, first time I brought that up was when I reviewed the first implementation
of memory demotion (~ca 3-4 years ago now?).

With the addition of yet another consumer (auto-weitght mempolicy) that was only
interested in get notified on numa node changes, it became more clear that we
really want to split those up.

> Further down-thread, Gregory tells us that Dan's patch "seems to fix
> the underlying problem", but nobody (including Dan) told us about any
> "problem" at all.

That is related to auto-weight mempolicy patches, not to this one.
I _think_ Gregory means that I take it in as part of the series.

-- 
Oscar Salvador
SUSE Labs
Re: [PATCH v3 0/3] Implement numa node notifier
Posted by Gregory Price 7 months, 2 weeks ago
On Sun, May 04, 2025 at 07:44:40AM +0200, Oscar Salvador wrote:
> > Further down-thread, Gregory tells us that Dan's patch "seems to fix
> > the underlying problem", but nobody (including Dan) told us about any
> > "problem" at all.
> 
> That is related to auto-weight mempolicy patches, not to this one.
> I _think_ Gregory means that I take it in as part of the series.
> 

Yes, sorry for the imprecise language.  The two patches address similar
issues, but one without the other leaves us in an odd state.

1) Returning an error without this patch is an un-desired behavior
   because the current behavior can produce duplicate online
   notifications - which is expected but odd.  So instead of returning
   an error, we should just continue with the callback stack.

2) This patch makes it so that there won't be duplicates - and so
   returning an error if probably appropriate, as something is actually
   wrong if that happens.

So it makes sense to pull these changes together.

Thanks,
~Gregory
Re: [PATCH v3 0/3] Implement numa node notifier
Posted by Andrew Morton 7 months, 2 weeks ago
On Sun, 4 May 2025 07:44:40 +0200 Oscar Salvador <osalvador@suse.de> wrote:

> On Sat, May 03, 2025 at 08:03:34PM -0700, Andrew Morton wrote:
> > Why is this a problem?  Is there some bug?  Are these notifications so
> > frequent that there are significant inefficiencies here?
> 
> hi Andrew,
> 
> There is no bug, it is just suboptimal.
> 
> That the numa node state changes were tied to the memory notifier was
> something hacky and that have us bugged for a while now.
> Were mean to tidy that up but just never got around it.
> 
> Actually, first time I brought that up was when I reviewed the first implementation
> of memory demotion (~ca 3-4 years ago now?).
> 
> With the addition of yet another consumer (auto-weitght mempolicy) that was only
> interested in get notified on numa node changes, it became more clear that we
> really want to split those up.

OK, please add to the [0/N].

> > Further down-thread, Gregory tells us that Dan's patch "seems to fix
> > the underlying problem", but nobody (including Dan) told us about any
> > "problem" at all.
> 
> That is related to auto-weight mempolicy patches, not to this one.
> I _think_ Gregory means that I take it in as part of the series.

Ah, I'm glad to have company in my confusion ;)