[RFC PATCH 0/2] Add a new acquire resource to query vcpu statistics

[RFC PATCH 0/2] Add a new acquire resource to query vcpu statistics
Posted by Matias Ezequiel Vara Larsen 1 year, 11 months ago
Hello all,

The purpose of this RFC is to get feedback about a new acquire resource that
exposes vcpu statistics for a given domain. The current mechanism to get those
statistics is to query the hypervisor. That mechanism relies on a hypercall and
holds the domctl spinlock during its execution. When a pv tool like xcp-rrdd
periodically samples these counters, it ends up affecting other paths that share
that spinlock. With an acquire resource, the pv tool only needs a few hypercalls
to set up the shared memory region, and samples are then read without issuing
any further hypercalls. The original idea was suggested by Andrew Cooper, with
whom I have been discussing how to implement the current PoC. You can find the
RFC patch series at [1]. The series is rebased on top of stable-4.15.

I am currently a bit blocked on 1) what to expose and 2) how to expose it. For
1), I decided to expose what xcp-rrdd is querying, i.e., what
XEN_DOMCTL_getvcpuinfo returns. More precisely, xcp-rrdd reads
runstate.time[RUNSTATE_running], which is a uint64_t counter. However, the time
spent in the other states may be interesting too. Regarding 2), I am not sure
whether simply using an array of uint64_t is enough or whether a different
interface should be exposed. The remaining question is when to update the
values. For the moment, I am updating this counter during
vcpu_runstate_change().
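To make that concrete, the hook could look roughly like the sketch below. The
d->stats_table field, its one-counter-per-vcpu layout and the helper name are
assumptions of the current PoC, not a settled interface:

static void stats_update_runstate(struct vcpu *v)
{
    /*
     * Rough sketch, not the actual patch: mirror the value that
     * XEN_DOMCTL_getvcpuinfo reports today into the shared stats area
     * whenever vcpu_runstate_change() runs.
     */
    uint64_t *stats = v->domain->stats_table;   /* hypothetical field */

    if ( !stats )
        return;

    stats[v->vcpu_id] = v->runstate.time[RUNSTATE_running];
}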

The current series includes a simple pv tool that shows how this new interface is
used. This tool maps the counter and periodically samples it.
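A rough sketch of what the tool boils down to is below; the resource type name
(XENMEM_resource_stats_table), the domid/vcpu values and the single-counter
layout are assumptions of this PoC, see xen-stats.c in patch 2 for the real
code:

#include <err.h>
#include <inttypes.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <xenforeignmemory.h>

int main(void)
{
    xenforeignmemory_handle *fh = xenforeignmemory_open(NULL, 0);
    xenforeignmemory_resource_handle *res;
    volatile uint64_t *counter;
    uint32_t domid = 1, vcpu = 0;                /* illustrative values */
    int i;

    if ( !fh )
        err(1, "xenforeignmemory_open");

    /* Map one frame of the (assumed) per-vcpu stats resource, read-only. */
    res = xenforeignmemory_map_resource(fh, domid,
                                        XENMEM_resource_stats_table, vcpu,
                                        0 /* frame */, 1 /* nr_frames */,
                                        (void **)&counter, PROT_READ, 0);
    if ( !res )
        err(1, "xenforeignmemory_map_resource");

    /* Sample the shared counter periodically without further hypercalls. */
    for ( i = 0; i < 10; i++ )
    {
        printf("runstate_running time: %" PRIu64 " ns\n", *counter);
        sleep(1);
    }

    xenforeignmemory_unmap_resource(fh, res);
    xenforeignmemory_close(fh);
    return 0;
}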

Any feedback/help would be appreciated.

Thanks, Matias.

[1] https://github.com/MatiasVara/xen/tree/feature_stats

Matias Ezequiel Vara Larsen (2):
  xen/memory : Add stats_table resource type
  tools/misc: Add xen-stats tool

 tools/misc/Makefile         |  5 +++
 tools/misc/xen-stats.c      | 83 +++++++++++++++++++++++++++++++++++++
 xen/common/domain.c         | 42 +++++++++++++++++++
 xen/common/memory.c         | 29 +++++++++++++
 xen/common/sched/core.c     |  5 +++
 xen/include/public/memory.h |  1 +
 xen/include/xen/sched.h     |  5 +++
 7 files changed, 170 insertions(+)
 create mode 100644 tools/misc/xen-stats.c

-- 
2.25.1
Re: [RFC PATCH 0/2] Add a new acquire resource to query vcpu statistics
Posted by George Dunlap 1 year, 10 months ago

> On 17 May 2022, at 15:33, Matias Ezequiel Vara Larsen <matiasevara@gmail.com> wrote:
> 
> Hello all,
> 
> The purpose of this RFC is to get feedback about a new acquire resource that
> exposes vcpu statistics for a given domain. The current mechanism to get those
> statistics is to query the hypervisor. That mechanism relies on a hypercall
> and holds the domctl spinlock during its execution. When a pv tool like
> xcp-rrdd periodically samples these counters, it ends up affecting other paths
> that share that spinlock. With an acquire resource, the pv tool only needs a
> few hypercalls to set up the shared memory region, and samples are then read
> without issuing any further hypercalls. The original idea was suggested by
> Andrew Cooper, with whom I have been discussing how to implement the current
> PoC. You can find the RFC patch series at [1]. The series is rebased on top of
> stable-4.15.
> 
> I am currently a bit blocked on 1) what to expose and 2) how to expose it. For
> 1), I decided to expose what xcp-rrdd is querying, i.e., what
> XEN_DOMCTL_getvcpuinfo returns. More precisely, xcp-rrdd reads
> runstate.time[RUNSTATE_running], which is a uint64_t counter. However, the
> time spent in the other states may be interesting too. Regarding 2), I am not
> sure whether simply using an array of uint64_t is enough or whether a
> different interface should be exposed. The remaining question is when to
> update the values. For the moment, I am updating this counter during
> vcpu_runstate_change().
> 
> The current series includes a simple pv tool that shows how this new interface is
> used. This tool maps the counter and periodically samples it.
> 
> Any feedback/help would be appreciated.

Hey Matias,

Sorry it’s taken so long to get back to you.  My day-to-day job has shifted away from technical things to community management; this has been on my radar but I never made time to dig into it.

There are some missing details I’ve had to try to piece together about the situation, so let me sketch things out a bit further and see if I understand the situation:

* xcp-rrdd currently wants (at minimum) to record runstate.time[RUNSTATE_running] for each vcpu.  Currently that means calling XEN_DOMCTL_getvcpuinfo, which has to hold a single global domctl_lock (!) for the entire hypercall, and which of course must be issued for every vcpu in the system on every update.

* VCPUOP_get_runstate_info copies out a vcpu_runstate_info struct, which contains information on the other runstates.  Additionally, VCPUOP_register_runstate_memory_area already does something similar to what you want: the guest passes a virtual address to Xen, which Xen maps and copies runstate information into (in update_runstate_area()); a rough sketch of that registration follows this list.

* However, the above assumes a domain of “current->domain”: that is, a domain can call VCPUOP_get_runstate_info on one of its own vcpus, but dom0 cannot call it to get information about the vcpus of other domains.

* Additionally, VCPUOP_register_runstate_memory_area registers by *virtual address*; this is actually problematic even for guest kernels looking at their own vcpus; but would be completely inappropriate for a dom0 userspace application, which is what you’re looking at.
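For reference, the guest-side registration mentioned above looks roughly like
this (Linux-style sketch; exact helper names vary by OS), which makes clear why
it cannot serve a dom0 userspace consumer:

/* Rough sketch of the existing guest-side registration; helpers vary by OS. */
static struct vcpu_runstate_info runstate_area;

static void register_runstate(unsigned int vcpu)
{
    struct vcpu_register_runstate_memory_area area = {
        /* The area is identified by a guest *virtual* address. */
        .addr.v = &runstate_area,
    };

    if ( HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area, vcpu, &area) )
        BUG();
}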

Your solution is to expose things via the xenforeignmemory interface instead, modelled after the vmtrace_buf functionality.

Does that all sound right?

I think at a high level that’s probably the right way to go.

As you say, my default would be to expose similar information as VCPUOP_get_runstate_info.  I’d even consider just using vcpu_runstate_info_t.

The other option would be to try to make the page a more general “foreign vcpu info” page, which we could expand with more information as we find it useful.

In this patch, you’re allocating 4k *per vcpu on the entire system* to hold a single 64-bit value; even if you decide to use vcpu_runstate_info_t, there’s still quite a large wastage.  Would it make sense rather to have this pass back an array of MFNs designed to be mapped contiguously, with the vcpus listed as an array? This seems to be what XENMEM_resource_ioreq_server does.

The advantage of making the structure extensible is that we wouldn’t need to add another interface, and potentially another full page, if we wanted to add more functionality that we wanted to export.  On the other hand, every new functionality that we add may require adding code to copy things into it; making it so that such code is added bit by bit as it’s requested might be better.

I have some more comments I’ll give on the 1/2 patch.

 -George






> 
> Thanks, Matias.
> 
> [1] https://github.com/MatiasVara/xen/tree/feature_stats
> 
> Matias Ezequiel Vara Larsen (2):
>  xen/memory : Add stats_table resource type
>  tools/misc: Add xen-stats tool
> 
> tools/misc/Makefile         |  5 +++
> tools/misc/xen-stats.c      | 83 +++++++++++++++++++++++++++++++++++++
> xen/common/domain.c         | 42 +++++++++++++++++++
> xen/common/memory.c         | 29 +++++++++++++
> xen/common/sched/core.c     |  5 +++
> xen/include/public/memory.h |  1 +
> xen/include/xen/sched.h     |  5 +++
> 7 files changed, 170 insertions(+)
> create mode 100644 tools/misc/xen-stats.c
> 
> --
> 2.25.1
> 

Re: [RFC PATCH 0/2] Add a new acquire resource to query vcpu statistics
Posted by Matias Ezequiel Vara Larsen 1 year, 10 months ago
Hello George, and thanks for the review! You will find my comments below.

On Fri, Jun 17, 2022 at 07:54:32PM +0000, George Dunlap wrote:
> 
> 
> > On 17 May 2022, at 15:33, Matias Ezequiel Vara Larsen <matiasevara@gmail.com> wrote:
> > 
> > Hello all,
> > 
> > The purpose of this RFC is to get feedback about a new acquire resource that
> > exposes vcpu statistics for a given domain. The current mechanism to get
> > those statistics is to query the hypervisor. That mechanism relies on a
> > hypercall and holds the domctl spinlock during its execution. When a pv tool
> > like xcp-rrdd periodically samples these counters, it ends up affecting
> > other paths that share that spinlock. With an acquire resource, the pv tool
> > only needs a few hypercalls to set up the shared memory region, and samples
> > are then read without issuing any further hypercalls. The original idea was
> > suggested by Andrew Cooper, with whom I have been discussing how to
> > implement the current PoC. You can find the RFC patch series at [1]. The
> > series is rebased on top of stable-4.15.
> > 
> > I am currently a bit blocked on 1) what to expose and 2) how to expose it.
> > For 1), I decided to expose what xcp-rrdd is querying, i.e., what
> > XEN_DOMCTL_getvcpuinfo returns. More precisely, xcp-rrdd reads
> > runstate.time[RUNSTATE_running], which is a uint64_t counter. However, the
> > time spent in the other states may be interesting too. Regarding 2), I am
> > not sure whether simply using an array of uint64_t is enough or whether a
> > different interface should be exposed. The remaining question is when to
> > update the values. For the moment, I am updating this counter during
> > vcpu_runstate_change().
> > 
> > The current series includes a simple pv tool that shows how this new interface is
> > used. This tool maps the counter and periodically samples it.
> > 
> > Any feedback/help would be appreciated.
> 
> Hey Matias,
> 
> Sorry it’s taken so long to get back to you.  My day-to-day job has shifted away from technical things to community management; this has been on my radar but I never made time to dig into it.
> 
> There are some missing details I’ve had to try to piece together about the situation, so let me sketch things out a bit further and see if I understand the situation:
> 
> * xcp-rrdd currently wants (at minimum) to record runstate.time[RUNSTATE_running] for each vcpu.  Currently that means calling XEN_DOMCTL_getvcpuinfo, which has to hold a single global domctl_lock (!) for the entire hypercall, and which of course must be issued for every vcpu in the system on every update.
> 

For example, in xcp-ng, xcp-rrdd samples all the vcpus of the system every 5
seconds. xcp-rrdd also queries other information through hypercalls like
XEN_DOMCTL_getdomaininfo.

Out of curiosity, do you know of any benchmark to measure the impact of this
querying? My guess is that the cost of domctl-based operations grows with the
number of vcpus: querying all of them takes longer, so the domctl_lock is held
longer. However, this would only be observable on a large enough system.

> * VCPUOP_get_runstate_info copies out a vcpu_runstate_info struct, which contains information on the other runstates.  Additionally, VCPUOP_register_runstate_memory_area already does something similar to what you want: the guest passes a virtual address to Xen, which Xen maps and copies runstate information into (in update_runstate_area()).
> 
> * However, the above assumes a domain of “current->domain”: that is, a domain can call VCPUOP_get_runstate_info on one of its own vcpus, but dom0 cannot call it to get information about the vcpus of other domains.
> 
> * Additionally, VCPUOP_register_runstate_memory_area registers by *virtual address*; this is actually problematic even for guest kernels looking at their own vcpus; but would be completely inappropriate for a dom0 userspace application, which is what you’re looking at.
> 

I just learned about VCPUOP_register_runstate_memory_area a few days ago. I did
not know that it is only for current->domain. Thanks for pointing it out.

> Your solution is to expose things via the xenforeignmemory interface instead, modelled after the vmtrace_buf functionality.
> 
> Does that all sound right?
> 

That's correct. I used the vmtrace_buf functionality as inspiration.

> I think at a high level that’s probably the right way to go.
> 
> As you say, my default would be to expose similar information as VCPUOP_get_runstate_info.  I’d even consider just using vcpu_runstate_info_t.
> 
> The other option would be to try to make the page a more general “foreign vcpu info” page, which we could expand with more information as we find it useful.
> 
> In this patch, you’re allocating 4k *per vcpu on the entire system* to hold a single 64-bit value; even if you decide to use vcpu_runstate_info_t, there’s still quite a large wastage.  Would it make sense rather to have this pass back an array of MFNs designed to be mapped contiguously, with the vcpus listed as an array? This seems to be what XENMEM_resource_ioreq_server does.
> 
> The advantage of making the structure extensible is that we wouldn’t need to add another interface, and potentially another full page, if we wanted to add more functionality that we wanted to export.  On the other hand, every new functionality that we add may require adding code to copy things into it; making it so that such code is added bit by bit as it’s requested might be better.
> 

The current PoC is indeed wasteful of memory in two ways:
1) the data structures are not well chosen, and
2) memory is allocated unconditionally.

For 1), you propose to use an extensible structure on top of an array of MFNs.
I checked xen.git/xen/include/public/hvm/ioreq.h, which defines the structure:

struct shared_iopage {
    struct ioreq vcpu_ioreq[1];
};

And then, it accesses it as:

p->vcpu_ioreq[v->vcpu_id];

I could have similar structures; let me sketch them here and then write down a
design document. The extensible structures could look like:

struct vcpu_stats {
    uint64_t runstate_running_time;
    // potentially other runstate-time counters
};

struct shared_statspages {
    // potentially other counters, e.g., domain info
    struct vcpu_stats vcpu_info[1];
};

The shared_statspages structure would be mapped on top of an array of contiguous
MFNs, with the vcpus listed as an array. I think this structure could be
extended to record per-domain counters by defining them just before vcpu_info[1].
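As a quick illustration of the consumer side (a sketch only: it assumes the
layout above has been mapped read-only over the contiguous frames returned by
the acquire call, and read_running_time is just an illustrative helper):

static uint64_t read_running_time(const struct shared_statspages *stats,
                                  unsigned int vcpu_id)
{
    /* vcpu_info[] is indexed by vcpu id, one entry per vcpu. */
    return stats->vcpu_info[vcpu_id].runstate_running_time;
}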

What do you think?

For 2), you propose a domctl flag on domain creation to enable/disable the
allocation and use of these buffers. I think that is the right way to go for the
moment.

There is also a 3rd point, regarding Jan's suggestion about how to ensure that
the consumed data is consistent. I do not have an answer for that yet; I will
think about it.

I will address these points and submit v1 in the next few weeks.   

Thanks, Matias.

> I have some more comments I’ll give on the 1/2 patch.
> 
>  -George
> 
> 
> 
> 
> 
> 
> > 
> > Thanks, Matias.
> > 
> > [1] https://github.com/MatiasVara/xen/tree/feature_stats
> > 
> > Matias Ezequiel Vara Larsen (2):
> >  xen/memory : Add stats_table resource type
> >  tools/misc: Add xen-stats tool
> > 
> > tools/misc/Makefile         |  5 +++
> > tools/misc/xen-stats.c      | 83 +++++++++++++++++++++++++++++++++++++
> > xen/common/domain.c         | 42 +++++++++++++++++++
> > xen/common/memory.c         | 29 +++++++++++++
> > xen/common/sched/core.c     |  5 +++
> > xen/include/public/memory.h |  1 +
> > xen/include/xen/sched.h     |  5 +++
> > 7 files changed, 170 insertions(+)
> > create mode 100644 tools/misc/xen-stats.c
> > 
> > --
> > 2.25.1
> > 
> 



RE: [RFC PATCH 0/2] Add a new acquire resource to query vcpu statistics
Posted by Henry Wang 1 year, 10 months ago
Hi,

It seems that this series has been stale for more than a month, with no comment
from the maintainers on patch #1 [1] and actions needed from the author on
patch #2 [2], so I am sending this email as a gentle reminder. Thanks!

[1] https://patchwork.kernel.org/project/xen-devel/patch/d0afb6657b1e78df4857ad7bcc875982e9c022b4.1652797713.git.matias.vara@vates.fr/
[2] https://patchwork.kernel.org/project/xen-devel/patch/e233c4f60c6fe97b93b3adf9affeb0404c554130.1652797713.git.matias.vara@vates.fr/

Kind regards,
Henry

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of
> Matias Ezequiel Vara Larsen
> Subject: [RFC PATCH 0/2] Add a new acquire resource to query vcpu
> statistics
> 
> Hello all,
> 
> The purpose of this RFC is to get feedback about a new acquire resource that
> exposes vcpu statistics for a given domain. The current mechanism to get those
> statistics is to query the hypervisor. That mechanism relies on a hypercall
> and holds the domctl spinlock during its execution. When a pv tool like
> xcp-rrdd periodically samples these counters, it ends up affecting other paths
> that share that spinlock. With an acquire resource, the pv tool only needs a
> few hypercalls to set up the shared memory region, and samples are then read
> without issuing any further hypercalls. The original idea was suggested by
> Andrew Cooper, with whom I have been discussing how to implement the current
> PoC. You can find the RFC patch series at [1]. The series is rebased on top of
> stable-4.15.
> 
> I am currently a bit blocked on 1) what to expose and 2) how to expose it. For
> 1), I decided to expose what xcp-rrdd is querying, i.e., what
> XEN_DOMCTL_getvcpuinfo returns. More precisely, xcp-rrdd reads
> runstate.time[RUNSTATE_running], which is a uint64_t counter. However, the
> time spent in the other states may be interesting too. Regarding 2), I am not
> sure whether simply using an array of uint64_t is enough or whether a
> different interface should be exposed. The remaining question is when to
> update the values. For the moment, I am updating this counter during
> vcpu_runstate_change().
> 
> The current series includes a simple pv tool that shows how this new interface
> is used. This tool maps the counter and periodically samples it.
> 
> Any feedback/help would be appreciated.
> 
> Thanks, Matias.
> 
> [1] https://github.com/MatiasVara/xen/tree/feature_stats
> 
> Matias Ezequiel Vara Larsen (2):
>   xen/memory : Add stats_table resource type
>   tools/misc: Add xen-stats tool
> 
>  tools/misc/Makefile         |  5 +++
>  tools/misc/xen-stats.c      | 83 +++++++++++++++++++++++++++++++++++++
>  xen/common/domain.c         | 42 +++++++++++++++++++
>  xen/common/memory.c         | 29 +++++++++++++
>  xen/common/sched/core.c     |  5 +++
>  xen/include/public/memory.h |  1 +
>  xen/include/xen/sched.h     |  5 +++
>  7 files changed, 170 insertions(+)
>  create mode 100644 tools/misc/xen-stats.c
> 
> --
> 2.25.1
>