[v2] PM: EM: Add netlink support for the energy model.

[PATCH v2 00/10] PM: EM: Add netlink support for the energy model.

Posted by Changwoo Min 8 months ago

There is a need to access the energy model from userspace. One such
example is sched_ext schedulers [1]: the userspace part of a sched_ext
scheduler could feed (post-processed) energy-model information to the
BPF part of the scheduler.

Currently, debugfs is the only way to read the energy model from
userspace; however, it provides no notification mechanism when a
performance domain or its associated energy model changes.

This patch set introduces a generic netlink family for the energy
model, as discussed in [2]. It allows a userspace program to read the
performance domains and their energy models. Through a multicast
interface, it also notifies userspace programs when a performance
domain is created or deleted, or when its energy model is updated.

Specifically, it supports two commands:
  - EM_CMD_GET_PDS: Get information about all performance domains.
  - EM_CMD_GET_PD_TABLE: Get the energy model table of a performance
    domain.

It also supports three notification events:
  - EM_CMD_PD_CREATED: When a performance domain is created.
  - EM_CMD_PD_DELETED: When a performance domain is deleted.
  - EM_CMD_PD_UPDATED: When the energy model table of a performance domain
    is updated.
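A userspace consumer, such as the userspace half of a sched_ext
scheduler, would typically keep a local copy of the energy model and
refresh it on each event instead of re-reading debugfs. The Python
sketch below shows that pattern. The event shape (a dict with a "name"
of pd-created/pd-deleted/pd-updated and a "msg" carrying "pd-id" and a
"table") and the class name are assumptions for illustration only; the
real attribute layout is whatever Documentation/netlink/specs/em.yaml
in this series defines.

```python
from typing import Any, Dict


class EmCache:
    """Userspace copy of the energy model, keyed by performance-domain ID.

    Hypothetical message shape (not taken from em.yaml):
        {"name": "pd-updated", "msg": {"pd-id": 0, "table": [...]}}
    """

    def __init__(self) -> None:
        self.pds: Dict[int, Any] = {}

    def handle(self, event: Dict[str, Any]) -> None:
        name = event["name"]
        msg = event["msg"]
        pd_id = msg["pd-id"]
        if name in ("pd-created", "pd-updated"):
            # Assumes both events carry the full (new) table; if they do
            # not, re-issue get-pd-table for pd_id here instead.
            self.pds[pd_id] = msg.get("table", [])
        elif name == "pd-deleted":
            # Only the ID is needed; ignore IDs that were never cached.
            self.pds.pop(pd_id, None)
        else:
            raise ValueError(f"unexpected event: {name}")
```

Feeding the events in arrival order keeps cache.pds in sync with the
kernel; merging "created" and "updated" into one path keeps the handler
small, since both simply replace the cached table.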

This can be tested with tools/net/ynl/pyynl/cli.py, for example, using
the following commands:

  $> tools/net/ynl/pyynl/cli.py \
     --spec Documentation/netlink/specs/em.yaml \
     --do get-pds
  $> tools/net/ynl/pyynl/cli.py \
     --spec Documentation/netlink/specs/em.yaml \
     --do get-pd-table --json '{"pd-id": 0}'
  $> tools/net/ynl/pyynl/cli.py \
     --spec Documentation/netlink/specs/em.yaml \
     --subscribe event --sleep 10
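When scripting these checks, for example to fetch every table after
listing the domains, the same argument lists can be built
programmatically. The Python sketch below mirrors the three commands
above; the helper names are hypothetical, and it assumes it is run from
the top of a kernel tree on a kernel with this series applied. Passing
argv as a list to subprocess avoids shell quoting of the JSON argument.

```python
import json
import subprocess
from typing import List

CLI = "tools/net/ynl/pyynl/cli.py"
SPEC = "Documentation/netlink/specs/em.yaml"


def get_pds_cmd() -> List[str]:
    """argv for listing all performance domains (EM_CMD_GET_PDS)."""
    return [CLI, "--spec", SPEC, "--do", "get-pds"]


def get_pd_table_cmd(pd_id: int) -> List[str]:
    """argv for fetching one domain's table (EM_CMD_GET_PD_TABLE)."""
    return [CLI, "--spec", SPEC, "--do", "get-pd-table",
            "--json", json.dumps({"pd-id": pd_id})]


def subscribe_cmd(seconds: int = 10) -> List[str]:
    """argv for listening on the 'event' multicast group."""
    return [CLI, "--spec", SPEC, "--subscribe", "event",
            "--sleep", str(seconds)]


def run(argv: List[str]) -> str:
    """Run one command and return its stdout (raises on failure)."""
    return subprocess.run(argv, check=True, capture_output=True,
                          text=True).stdout
```

For instance, run(get_pd_table_cmd(0)) is equivalent to the second
command above.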

[1] https://lwn.net/Articles/922405/
[2] https://lore.kernel.org/lkml/a82423bc-8c38-4d57-93da-c4f20011cc92@arm.com/

ChangeLog v1 -> v2:
  - Use YNL to generate boilerplate code. Overhaul the naming conventions
    (command, event, notification, attribute) to follow the typical
    conventions of other YNL-based netlink implementations.
  - Calculate the exact message size instead of using NLMSG_GOODSIZE
    when allocating a message (genlmsg_new()). This avoids reallocating
    the message.
  - Remove an unnecessary function, em_netlink_exit(), and initialize
    netlink (em_netlink_init()) in em_netlink.c without touching
    energy_model.c.
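For reference, exact sizing for genlmsg_new() boils down to summing the
aligned size of every attribute in the reply. The Python sketch below
mirrors the arithmetic of the kernel's nla_total_size() and
nla_total_size_64bit(): a 4-byte attribute header, 4-byte alignment,
and an extra pad attribute for 64-bit values on architectures without
efficient unaligned access. The reply layout in pd_table_payload() (a
u32 pd-id followed by one nest per performance state holding five u64
values) is hypothetical, not the layout from em.yaml.

```python
NLA_ALIGNTO = 4
NLA_HDRLEN = 4  # sizeof(struct nlattr): u16 nla_len + u16 nla_type


def nla_align(n: int) -> int:
    """Round n up to the netlink attribute alignment (4 bytes)."""
    return (n + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def nla_total_size(payload: int) -> int:
    """Aligned size of one attribute carrying `payload` bytes."""
    return nla_align(NLA_HDRLEN + payload)


def nla_total_size_64bit(payload: int, needs_pad: bool = True) -> int:
    """Like nla_total_size(), plus room for a pad attribute when the
    architecture lacks efficient unaligned access."""
    size = nla_total_size(payload)
    if needs_pad:
        size += nla_total_size(0)
    return size


def pd_table_payload(nr_states: int, u64_per_state: int = 5,
                     needs_pad: bool = True) -> int:
    """Payload size of a hypothetical PD-table reply: one u32 pd-id,
    then one nest per performance state containing u64 fields."""
    size = nla_total_size(4)       # pd-id (u32)
    per_state = nla_total_size(0)  # nest header
    per_state += u64_per_state * nla_total_size_64bit(8, needs_pad)
    return size + nr_states * per_state
```

With this layout, a 3-state domain needs pd_table_payload(3) == 260
bytes (200 without padding). Because the size is known up front, the
buffer is allocated once at the right size instead of relying on
NLMSG_GOODSIZE being large enough.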

CC: Lukasz Luba <lukasz.luba@arm.com>
CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: Tejun Heo <tj@kernel.org>
Signed-off-by: Changwoo Min <changwoo@igalia.com>

Changwoo Min (10):
  PM: EM: Add em.yaml and autogen files.
  PM: EM: Add a skeleton code for netlink notification.
  PM: EM: Assign a unique ID when creating a performance domain.
  PM: EM: Expose the ID of a performance domain via debugfs.
  PM: EM: Add an iterator and accessor for the performance domain.
  PM: EM: Implement em_nl_get_pds_doit().
  PM: EM: Implement em_nl_get_pd_table_doit().
  PM: EM: Implement em_notify_pd_deleted().
  PM: EM: Implement em_notify_pd_created/updated().
  PM: EM: Notify an event when the performance domain changes.

 Documentation/netlink/specs/em.yaml | 113 ++++++++++
 MAINTAINERS                         |   3 +
 include/linux/energy_model.h        |  19 ++
 include/uapi/linux/energy_model.h   |  62 ++++++
 kernel/power/Makefile               |   5 +-
 kernel/power/em_netlink.c           | 311 ++++++++++++++++++++++++++++
 kernel/power/em_netlink.h           |  34 +++
 kernel/power/em_netlink_autogen.c   |  48 +++++
 kernel/power/em_netlink_autogen.h   |  23 ++
 kernel/power/energy_model.c         |  83 +++++++-
 10 files changed, 699 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/netlink/specs/em.yaml
 create mode 100644 include/uapi/linux/energy_model.h
 create mode 100644 kernel/power/em_netlink.c
 create mode 100644 kernel/power/em_netlink.h
 create mode 100644 kernel/power/em_netlink_autogen.c
 create mode 100644 kernel/power/em_netlink_autogen.h

-- 
2.49.0

Re: [PATCH v2 00/10] PM: EM: Add netlink support for the energy model.

Posted by Changwoo Min 7 months, 2 weeks ago

Gentle ping, as it has been two weeks.

@Lukasz, @Rafael -- I have a more general question about the energy
model. As far as I understand, the energy model describes the
performance vs. energy-consumption tradeoff when a single CPU in a
performance domain is running. In reality, however, SoCs may have
thermal limits that impose additional constraints; for example, it may
not be possible to run all CPUs at the highest frequency at the same
time. My question is: does the kernel maintain and use such (thermal?)
constraints?

Regards,
Changwoo Min

On 6/13/25 18:44, Changwoo Min wrote:
> [...]

Re: [PATCH v2 00/10] PM: EM: Add netlink support for the energy model.

Posted by Lukasz Luba 7 months, 1 week ago

Hi Changwoo,

On 6/27/25 04:37, Changwoo Min wrote:
> Gentle ping, as it has been two weeks.

My apologies for the delay on that topic.

Let me have a look at this...

> 
> @Lukasz, @Rafael -- I have a more general question about the energy
> model. As far as I understand, the energy model describes the
> performance vs. energy-consumption tradeoff when a single CPU in a
> performance domain is running. In reality, however, SoCs may have
> thermal limits that impose additional constraints; for example, it may
> not be possible to run all CPUs at the highest frequency at the same
> time. My question is: does the kernel maintain and use such (thermal?)
> constraints?

That's true in real scenarios: on mobile SoCs, running all CPUs at max
frequency is likely possible only for a short period...

The Energy Model itself doesn't handle such a situation. The thermal
framework and the Energy Aware Scheduler (EAS) have code that handles
it and knows which top OPPs cannot be used.
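To make that division of labour concrete: the EM table itself stays
unchanged, and the cap is applied where the table is used. The Python
sketch below is a simplified illustration loosely modeled on an
EAS-style estimate (pick the lowest performance state that covers the
utilization, then scale that state's cost by the summed utilization).
The `cap` parameter is a hypothetical stand-in for the thermal limit,
the table values are made up, and this is not the kernel's
em_cpu_energy() code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PerfState:
    performance: int  # capacity-like value; top state = max CPU capacity
    cost: int         # relative energy cost of running at this state


def estimate_energy(table: List[PerfState], max_util: int, sum_util: int,
                    cap: int) -> int:
    """Simplified EAS-style energy estimate under a thermal cap.

    `table` is sorted by ascending performance. States above `cap`
    (a hypothetical thermal ceiling) are treated as unusable.
    """
    usable = [ps for ps in table if ps.performance <= cap]
    if not usable:
        raise ValueError("thermal cap excludes every performance state")
    # Utilization cannot be served above the highest usable state.
    max_util = min(max_util, usable[-1].performance)
    # Lowest usable state whose performance covers the utilization.
    ps = next(s for s in usable if s.performance >= max_util)
    scale_cpu = table[-1].performance
    return ps.cost * sum_util // scale_cpu
```

With table = [PerfState(256, 100), PerfState(512, 150),
PerfState(1024, 400)], estimate_energy(table, 700, 800, 1024) picks the
top OPP and returns 312, while a cap of 512 excludes it and the same
call returns 117: the top OPP's EM entry is still there, it is simply
not considered.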

That said, the EM is likely to be adjusted in such a situation, because
the SoC temperature reaches high values. Especially if that heat was
generated by the GPU rather than the CPUs themselves, the extra leakage
will be accounted for and the EM data modified at runtime.
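Such a runtime update amounts to rewriting the power values in a
domain's table and recomputing the per-state cost derived from them;
with this series, that is the kind of change EM_CMD_PD_UPDATED reports.
The Python sketch below assumes a hypothetical additive leakage term
applied to every state, computes cost as power (scaled by a
hypothetical fixed-point factor) divided by performance, and flags a
state as inefficient when some higher state is no more costly. This is
similar in spirit to the kernel's cost and inefficiency handling, not
its exact code.

```python
from typing import List, Tuple

COST_RES = 1000  # hypothetical fixed-point resolution for cost


def recompute(table: List[Tuple[int, int]], leakage_mw: int = 0
              ) -> List[Tuple[int, int, int, bool]]:
    """Return (performance, power, cost, inefficient) per state.

    `table` holds (performance, power_mw) pairs sorted by ascending
    performance. `leakage_mw` is a hypothetical extra static power
    added to every state, e.g. because the GPU heated the SoC.
    """
    out = []
    best_cost_above = None  # lowest cost seen among higher states
    for perf, power in reversed(table):
        power += leakage_mw
        cost = power * COST_RES // perf
        # Inefficient: a higher state is no more costly per unit of
        # performance, so there is no energy reason to run slower.
        inefficient = best_cost_above is not None and cost >= best_cost_above
        if not inefficient:
            best_cost_above = cost
        out.append((perf, power, cost, inefficient))
    out.reverse()
    return out
```

For base = [(256, 80), (512, 200), (1024, 600)], the costs are 312, 390
and 585, all efficient. Adding 150 mW of leakage to every state gives
898, 683 and 732: the static power now dominates the lowest OPP, which
becomes inefficient. That is the sort of change a userspace listener
would want to learn about from an update event.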

Another case where the EM might be updated is when middleware
recognizes a known 'scenario', e.g. a long video conference with the
camera in use (and thus the Image Signal Processor, which, like the
GPU, can also heat the SoC). Or a 'preferred profile' for a lightweight
application using some HW decoding, e.g. video playback, where EAS
should prefer some CPUs (the EM might gently change the energy
efficiency of such CPUs).

Regards,
Lukasz

Re: [PATCH v2 00/10] PM: EM: Add netlink support for the energy model.

Posted by Changwoo Min 7 months, 1 week ago

Hi Lukasz,

On 6/30/25 19:07, Lukasz Luba wrote:
> [...]

Thank you for the explanation! Besides this, do you see anything in
the code that needs to be addressed? I expect there is something, of
course. The issue flagged in the kernel test report is an obvious one,
so I will address it together with your and others' feedback.

Regards,
Changwoo Min