[libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring Technology (CMT) support

Posted by Wang Huaqiang 19 weeks ago
This is an RFC for supporting the CPU Cache Monitoring Technology (CMT) feature in libvirt. Memory Bandwidth Monitoring (MBM) is another feature very close to CMT; for simplicity we only discuss CMT here. MBM is a follow-up that will be implemented after CMT.
For background on CMT, refer to the Intel x86 SDM, Volume 3, Section 17.18 (https://software.intel.com/en-us/articles/intel-sdm).

## About the '_virResctrlMon' interface

Cache allocation technology (CAT) support has already been implemented in util/virresctrl.*, which interacts with the Linux kernel resctrl file system. Very similar to CAT, the CMT object is represented by 'struct _virResctrlMon', which is

```
struct _virResctrlMon {
    virObject parent;

    /* pairedalloc: pointer to the resctrl allocation it is paired
     * with; NULL for a resctrl monitoring group not associated with
     * any allocation. */
    virResctrlAllocPtr pairedalloc;
    /* The identifier (any unique string for now) */
    char *id;
    /* libvirt-generated path; identical to the allocation path when
     * paired with an allocation, otherwise independent */
    char *path;
};
```

Following largely the same logic as '_virResctrlAlloc' (implemented mainly in 'virresctrl.c'), a group of APIs has been designed to manipulate '_virResctrlMon'. '_virResctrlMon' shares a lot in common with '_virResctrlAlloc', except for the field 'pairedalloc'.
'pairedalloc' stores a pointer to the paired resctrl allocation object. With the current libvirt resctrl implementation, when a '_virResctrlAlloc' object is created, the CMT hardware is enabled automatically and shares the same folder under the resctrl fs. I call the CMT '_virResctrlMon' object that shares this folder a 'paired' _virResctrlMon; one '_virResctrlMon' and one '_virResctrlAlloc' thus form a pair. In '_virResctrlMon', the paired '_virResctrlAlloc' is tracked through 'pairedalloc'. A paired mon group cannot be dynamically enabled or disabled at runtime.
'pairedalloc' may be set to NULL, which creates a non-paired mon group object. This is necessary because CMT can work independently, monitoring the utilization of critical CPU resources (cache or memory bandwidth) without any dedicated cache or memory bandwidth allocation. A non-paired mon group object represents an independently working CMT instance, and it can be enabled or disabled at runtime.
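The paired/non-paired distinction essentially determines which resctrl directory a mon group uses. Here is a rough Python sketch of that path logic (illustrative only — the actual implementation is C code in virresctrl.c, and the 'mon_groups' layout assumed below is the conventional resctrl one):

```python
import os

RESCTRL_ROOT = "/sys/fs/resctrl"  # conventional resctrl mount point (assumption)

def mon_group_path(mon_id, paired_alloc_path=None):
    """Return the resctrl directory a monitoring group would use.

    A 'paired' mon group reuses the directory of its allocation, so it
    cannot be enabled or disabled independently of the allocation; a
    non-paired group gets its own directory under mon_groups/ and can
    be created or removed at runtime.
    """
    if paired_alloc_path is not None:
        return paired_alloc_path          # share the allocation's folder
    return os.path.join(RESCTRL_ROOT, "mon_groups", mon_id)

# paired: follows the allocation's folder (path here is hypothetical)
print(mon_group_path("vm2", "/sys/fs/resctrl/qemu-2-vm2"))
# non-paired: independent directory, removable at runtime
print(mon_group_path("vm3"))
```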

## About the virsh command 'resctrl'

To get or set the state of the resctrl mon group (hardware CMT), a virsh command 'resctrl' is introduced. Here are the common usages:
```
[root@dl-c200 david]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     vm3                            running
 3     vm2                            running
 -     vm1                            shut off
```

### Test on a running domain vm3
To get RDT monitoring status, type 'virsh resctrl <domain>'
```
    [root@dl-c200 david]# virsh resctrl vm3
    RDT Monitoring Status: Enabled
```

To enable RDT monitoring, type 'virsh resctrl <domain> --enable'
```
    [root@dl-c200 david]# virsh resctrl vm3 --enable
    RDT Monitoring Status: Enabled
```

To disable RDT monitoring, type 'virsh resctrl <domain> --disable'
```
    [root@dl-c200 david]# virsh resctrl vm3 --disable
    RDT Monitoring Status: Disabled

    [root@dl-c200 david]# virsh resctrl vm3
    RDT Monitoring Status: Disabled
```

### Test on a non-running domain vm1
If the domain is not active, attempts to set the RDT monitoring status fail, and the reported state is 'Disabled':
```
    [root@dl-c200 david]# virsh resctrl vm1
    RDT Monitoring Status: Disabled

    [root@dl-c200 david]# virsh resctrl vm1 --enable
    error: Requested operation is not valid: domain is not running

    [root@dl-c200 david]# virsh resctrl vm1 --disable
    error: Requested operation is not valid: domain is not running
```

### Test on domain vm2
Domain vm2 is active and the CAT functionality is enabled through 'cachetune' (configured in the 'cputune/cachetune' section), so its resctrl mon group is a 'paired' one. For a 'paired' mon group, RDT monitoring cannot be disabled: allowing that would require destroying the resctrl allocation folder, which is not supported by the current cache allocation design.
```
    [root@dl-c200 libvirt]# virsh resctrl vm2 --enable
    RDT Monitoring Status: Enabled (forced by cachetune)

    [root@dl-c200 libvirt]# virsh resctrl vm2 --disable
    RDT Monitoring Status: Enabled (forced by cachetune)

    [root@dl-c200 libvirt]# virsh resctrl vm2
    RDT Monitoring Status: Enabled (forced by cachetune)
```

## About showing RDT utilization information

A domstats field has been added to show the utilization of RDT resources; the command looks like this:
```
    [root@dl-c200 libvirt]# virsh domstats --resctrl
    Domain: 'vm1'
      resctrl.cmt=0

    Domain: 'vm3'
      resctrl.cmt=180224

    Domain: 'vm2'
      resctrl.cmt=2613248
```
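For reference, a hypothetical sketch of how such a CMT value can be obtained from the resctrl fs: each monitoring group exposes per-L3-domain 'llc_occupancy' files under 'mon_data', and a domain-wide figure is their sum. The demo below runs against a faked directory layout; the real files live under /sys/fs/resctrl.

```python
import glob
import os
import tempfile

def read_cmt_occupancy(group_dir):
    """Sum llc_occupancy (bytes) over all cache domains of a mon group.

    Under resctrl, per-domain counters live in files like
    <group>/mon_data/mon_L3_00/llc_occupancy; a domain-wide value is
    the sum over all L3 cache domains.
    """
    total = 0
    for f in glob.glob(os.path.join(group_dir, "mon_data",
                                    "mon_L3_*", "llc_occupancy")):
        with open(f) as fp:
            total += int(fp.read())
    return total

# demo on a faked layout with two L3 domains
root = tempfile.mkdtemp()
for dom, val in (("00", 180224), ("01", 0)):
    d = os.path.join(root, "mon_data", "mon_L3_" + dom)
    os.makedirs(d)
    with open(os.path.join(d, "llc_occupancy"), "w") as fp:
        fp.write(str(val))
print(read_cmt_occupancy(root))  # → 180224
```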


Wang Huaqiang (3):
  util: add Intel x86 RDT/CMT support
  tools: virsh: add command for controlling/monitoring resctrl
  tools: virsh domstats: show RDT CMT resource utilization information

 include/libvirt/libvirt-domain.h    |  10 ++
 src/conf/domain_conf.c              |  28 ++++
 src/conf/domain_conf.h              |   3 +
 src/driver-hypervisor.h             |   8 +
 src/libvirt-domain.c                |  92 +++++++++++
 src/libvirt_private.syms            |   9 +
 src/libvirt_public.syms             |   6 +
 src/qemu/qemu_driver.c              | 189 +++++++++++++++++++++
 src/qemu/qemu_process.c             |  65 +++++++-
 src/remote/remote_daemon_dispatch.c |  45 +++++
 src/remote/remote_driver.c          |   2 +
 src/remote/remote_protocol.x        |  28 +++-
 src/remote_protocol-structs         |  12 ++
 src/util/virresctrl.c               | 316 +++++++++++++++++++++++++++++++++++-
 src/util/virresctrl.h               |  44 +++++
 tools/virsh-domain-monitor.c        |   7 +
 tools/virsh-domain.c                |  74 +++++++++
 17 files changed, 933 insertions(+), 5 deletions(-)

-- 
2.7.4

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring Technology (CMT) support

Posted by Martin Kletzander 18 weeks ago
[It would be nice if you wrapped the long lines]

On Fri, Jun 08, 2018 at 05:02:16PM +0800, Wang Huaqiang wrote:
>This is an RFC request for supporting CPU Cache Monitoring Technology (CMT) feature in libvirt. Since MBM is also another feature which is very close to CMT, for simplicity we only discuss CMT here. MBM is the followup that will be implemented after CMT.
>About CMT please refer to Intel x86 SDM section 17.18 of volume 3 (link:https://software.intel.com/en-us/articles/intel-sdm).
>

Can you elaborate on how this is different from the CMT perf event that is already
in libvirt and can be monitored through the domstats API?

  https://libvirt.org/formatdomain.html#elementsPerf

>## About '_virResctrlMon' interface
>
>The cache allocation technology (CAT) has already been implemented in util/virresctrl.* which interacts with Linux kernel resctrl file system. Very simlimar to CAT, the CMT object is represented by 'struct _virResctrlMon', which is
>
>```
>struct _virResctrlMon {
>    virObject parent;
>
>    /* pairedalloc: pointer to a resctrl allocaion it paried with.
>     * NULL for a resctrl monitoring group not associated with
>     * any allocation. */
>    virResctrlAllocPtr pairedalloc;
>    /* The identifier (any unique string for now) */
>    char *id;
>    /* libvirt-generated path, may be identical to alloction path
>     * may not if allocation is ready */
>    char *path;
>};
>```
>
>Almost following the same logic behind '_virResctrlAlloc' which is mainly presented in file 'virresctrl.c', a group of APIs has been designed to manipulate '_virResctrlMon'. The '_virResctrlMon' shares a lot in common with '_virResctrlAlloc' except field 'pairedalloc'.
>'pairedalloc' stores the pointer of paired resctrl allocation object. With current libvirt resctrl implementation, if a resctrl '_virResctrlAlloc' object is created, the CMT hardware is enabled automatically and shares the same folder under same resctrlfs, I call the CMT '_virResctrlMon' object that shares the same folder under resctrlfs as 'paired' _virResctrlMon, further, one '_virResctrlMon' and one '_virResctrlAlloc' are a pare.  In '_virResctrlMon' the paired '_virResctrlAlloc' is tracked through pairedalloc. paired mon group could not be dynamically enabled or disabled during runtime.
>'pairedalloc' could be set to NULL, which creates a non-paired mon group object. Which is necessory because CMT could work independently to monitor the utilization of critical CPU resouces (cache or memory bandwidth) without allocating any dedicated cache or memory bandwidth. A non-paired mon group object represents an independent working CMT. Non-paired mon group could be enabled or disabled during runtime.
>
>## About virsh command 'resctrl'
>
>To set or get the resctrl mon group (hardware CMT), a virsh command 'resctrl' is created. here are the common usages:

The command does make sense for people who know how the stuff works on the
inside or have seen the code in libvirt.  For other users the name 'resctrl' is
going to feel very much arbitrary.  We're trying to abstract the details for
users, so I don't see why it should be named 'resctrl' when it handles "RDT
Monitoring Status".

>```
>[root@dl-c200 david]# virsh list --all
> Id    Name                           State
>----------------------------------------------------
> 1     vm3                            running
> 3     vm2                            running
> -     vm1                            shut off
>```
>
>### Test on a running domain vm3
>To get RDT monitoring status, type 'virsh resctrl <domain>'
>```
>    [root@dl-c200 david]# virsh resctrl vm3
>    RDT Monitoring Status: Enabled
>```
>
>To enable RDT monitoring, type 'virsh resctrl <domain> --enable'
>```
>    [root@dl-c200 david]# virsh resctrl vm3 --enable
>    RDT Monitoring Status: Enabled
>```
>
>To diable RDT monitoring, type 'virsh resctrl <domain> --disable'
>```
>    [root@dl-c200 david]# virsh resctrl vm3 --disable
>    RDT Monitoring Status: Disabled
>
>    [root@dl-c200 david]# virsh resctrl vm3
>    RDT Monitoring Status: Disabled
>```
>
>### test on domain not running vm1
>if domain is not active, it will fail to set RDT monitoring status, and also get the state of 'disabled'
>```
>    [root@dl-c200 david]# virsh resctrl vm1
>    RDT Monitoring Status: Disabled
>
>    [root@dl-c200 david]# virsh resctrl vm1 --enable
>    error: Requested operation is not valid: domain is not running
>
>    [root@dl-c200 david]# virsh resctrl vm1 --disable
>    error: Requested operation is not valid: domain is not running
>```
>

Can't these commands enable it in the XML?  It would be nice if the XML part was
shown here in the explanation.

>### test on domain vm2
>domain vm2 is active and the CAT functionality is enabled through 'cachetune' (configured in 'cputune/cachetune' section). So the resctrl mon group is a 'paried' one, for 'pared' mon group, the RDT monitoring could not be disabled. If it is allowed to disable 'paire' mon group, we have to destroy resctrl allocation folders which is not supported by current cache allocation design.

What if you have multiple cachetunes?  What if the cachetune is only set for one
vcpu and you want to monitor the others as well?  I guess I have to see the
patches to understand why you have so much information stored for something that
looks like a boolean (enable/disable).

>```
>    [root@dl-c200 libvirt]# virsh resctrl vm2 --enable
>    RDT Monitoring Status: Enabled (forced by cachetune)
>
>    [root@dl-c200 libvirt]# virsh resctrl vm2 --disable
>    RDT Monitoring Status: Enabled (forced by cachetune)
>
>    [root@dl-c200 libvirt]# virsh resctrl vm2
>    RDT Monitoring Status: Enabled (forced by cachetune)
>```
>
>## About showing the utilization information of RDT
>
>A domstats field has been created to show the utilization of RDT resources, the command is like this:
>```
>    [root@dl-c200 libvirt]# virsh domstats --resctrl
>    Domain: 'vm1'
>      resctrl.cmt=0
>
>    Domain: 'vm3'
>      resctrl.cmt=180224
>
>    Domain: 'vm2'
>      resctrl.cmt=2613248
>```
>

Re: [libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring Technology (CMT) support

Posted by Wang, Huaqiang 18 weeks ago
Hi Martin,

Thanks for your comments, please see my update inline below.

> -----Original Message-----
> From: Martin Kletzander [mailto:mkletzan@redhat.com]
> Sent: Monday, June 11, 2018 4:30 PM
> To: Wang, Huaqiang <huaqiang.wang@intel.com>
> Cc: libvir-list@redhat.com; Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing
> <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui
> <rui.zang@intel.com>
> Subject: Re: [libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring
> Technology (CMT) support
> 
> [It would be nice if you wrapped the long lines]
I'll pay attention to these long lines. Thanks for the advice.
> 
> On Fri, Jun 08, 2018 at 05:02:16PM +0800, Wang Huaqiang wrote:
> >This is an RFC request for supporting CPU Cache Monitoring Technology (CMT)
> feature in libvirt. Since MBM is also another feature which is very close to CMT,
> for simplicity we only discuss CMT here. MBM is the followup that will be
> implemented after CMT.
> >About CMT please refer to Intel x86 SDM section 17.18 of volume 3
> (link:https://software.intel.com/en-us/articles/intel-sdm).
> >
> 
> Can you elaborate on how is this different to the CMT perf event that is already
> in libvirt and can be monitored through domstats API?

Due to the kernel's removal of the perf events 'cmt, mbmt, mbml', libvirt's
perf-based CMT will no longer work with the latest kernels. Please see the
following commit for details:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=c39a0e2c8850f08249383f2425dbd8dbe4baad69

This series tries to provide similar functionality to that missing part for
reporting cmt, mbmt and mbml information. Initially we focus only on cmt.
Compared with the CMT perf event already in libvirt, I am trying to produce
almost the same output as 'perf.cmt' in the 'domstats' output, but under
another name, such as 'resctrl.cmt' or 'rdt.cmt' (or something else).
Another difference is that the underlying implementation is done through the
kernel resctrl fs.

This series also attempts to provide a command interface for enabling and
disabling the cmt feature for a whole domain, just as the original
perf-event-based cmt can be enabled or disabled by specifying '--enable cmt'
or '--disable cmt' with the command 'virsh perf <domain>'.
Our version is 'virsh resctrl <domain> --enable', with the difference that
there is no 'cmt' suffix. The 'cmt' is omitted because the CMT and MBM
functions are both enabled whenever a valid resctrl fs sub-folder is created;
there is no way to disable one while enabling the other, e.g. enabling CMT
while disabling MBML at the same time.

This series tries to stick to the interfaces exposed by the perf-event-based
CMT/MBM and to provide a substitute for them. For example, the perf-based CMT
provides cache occupancy information only for the whole domain. We are also
considering providing cache occupancy information for groups of vcpus, which
may be specified in the XML file.
For example, with the following configuration:
<cputune>
	<vcpupin vcpu='0' cpuset='1'/>
	<vcpupin vcpu='1' cpuset='3-4'/>
	<vcpupin vcpu='2' cpuset='4-5'/>
	<vcpupin vcpu='3' cpuset='6-7'/>
	<cachetune vcpus='0'>
  		<cache id='0' level='3' type='both' size='2816' unit='KiB'/>
  		<cache id='1' level='3' type='both' size='2816' unit='KiB'/>
	</cachetune>
	<cachetune vcpus='1-2'>
  		<cache id='0' level='3' type='both' size='2816' unit='KiB'/>
  		<cache id='1' level='3' type='both' size='2816' unit='KiB'/>
	</cachetune>
	<rdt-monitoring vcpu='0' enable='yes'/>
	<rdt-monitoring vcpu='1-2' enable='yes'/>
	<rdt-monitoring vcpu='3' enable='yes'/>
</cputune>

The 'domstats' output would then include the following cmt information:
	 [root@dl-c200 libvirt]# virsh domstats vm1 --resctrl 
		Domain: 'vm1'
			rdt.cmt.total=645562
			rdt.cmt.vcpu0=104331
			rdt.cmt.vcpu1_2=203200
			rdt.cmt.vcpu3=340129
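The per-group aggregation could be sketched like this (illustrative Python only; 'cmt_domstats_fields' and the exact field naming are my assumptions, not part of the patches):

```python
def cmt_domstats_fields(groups):
    """Format per-vcpu-group cache occupancy as domstats-style fields.

    'groups' maps a vcpu-list string (e.g. '1-2') to its llc_occupancy
    reading. The field names emitted here ('rdt.cmt.vcpuX') are only
    illustrative; the final naming is still under discussion.
    """
    fields = {"rdt.cmt.total": sum(groups.values())}
    for vcpus, val in groups.items():
        fields["rdt.cmt.vcpu" + vcpus.replace("-", "_")] = val
    return fields

stats = cmt_domstats_fields({"0": 104331, "1-2": 203200, "3": 340129})
for key in sorted(stats):
    print("%s=%d" % (key, stats[key]))
```

Note that with this scheme the total is the sum of the per-group readings; whether the hardware counters of separate groups are in fact additive is an assumption here.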

I hope these updates address your comment, "Can you elaborate on how this is
different from the CMT perf event that is already in libvirt and can be
monitored through the domstats API?" Any input is welcome.

> 
>   https://libvirt.org/formatdomain.html#elementsPerf
> 
> >## About '_virResctrlMon' interface
> >
> >The cache allocation technology (CAT) has already been implemented in
> >util/virresctrl.* which interacts with Linux kernel resctrl file
> >system. Very simlimar to CAT, the CMT object is represented by 'struct
> >_virResctrlMon', which is
> >
> >```
> >struct _virResctrlMon {
> >    virObject parent;
> >
> >    /* pairedalloc: pointer to a resctrl allocaion it paried with.
> >     * NULL for a resctrl monitoring group not associated with
> >     * any allocation. */
> >    virResctrlAllocPtr pairedalloc;
> >    /* The identifier (any unique string for now) */
> >    char *id;
> >    /* libvirt-generated path, may be identical to alloction path
> >     * may not if allocation is ready */
> >    char *path;
> >};
> >```
> >
> >Almost following the same logic behind '_virResctrlAlloc' which is mainly
> presented in file 'virresctrl.c', a group of APIs has been designed to manipulate
> '_virResctrlMon'. The '_virResctrlMon' shares a lot in common with
> '_virResctrlAlloc' except field 'pairedalloc'.
> >'pairedalloc' stores the pointer of paired resctrl allocation object. With current
> libvirt resctrl implementation, if a resctrl '_virResctrlAlloc' object is created, the
> CMT hardware is enabled automatically and shares the same folder under same
> resctrlfs, I call the CMT '_virResctrlMon' object that shares the same folder
> under resctrlfs as 'paired' _virResctrlMon, further, one '_virResctrlMon' and one
> '_virResctrlAlloc' are a pare.  In '_virResctrlMon' the paired '_virResctrlAlloc' is
> tracked through pairedalloc. paired mon group could not be dynamically enabled
> or disabled during runtime.
> >'pairedalloc' could be set to NULL, which creates a non-paired mon group
> object. Which is necessory because CMT could work independently to monitor
> the utilization of critical CPU resouces (cache or memory bandwidth) without
> allocating any dedicated cache or memory bandwidth. A non-paired mon group
> object represents an independent working CMT. Non-paired mon group could be
> enabled or disabled during runtime.
> >
> >## About virsh command 'resctrl'
> >
> >To set or get the resctrl mon group (hardware CMT), a virsh command 'resctrl'
> is created. here are the common usages:
> 
> The command does make sense for people who know how the stuff works on
> the inside or have seen the code in libvirt.  For other users the name 'resctrl' is
> going to feel very much arbitrary.  We re trying to abstract the details for users,
> so I don't see why it should be named 'resctrl' when it handles "RDT Monitoring
> Status".

Agreed. 'resctrl' does cause a lot of confusion for end users.
The underlying kernel interface combines the CAT and MBM features: the files
'llc_occupancy', 'mbm_local_bytes' and 'mbm_total_bytes', which report cache,
local memory bandwidth and total memory bandwidth information respectively,
are created automatically and simultaneously for each resctrl group, and
there is no way to enable one while disabling another. So for a command that
affects both cache and memory bandwidth, I would like to use the word 'rdt'
as the key command word; both cache monitoring (CMT) and memory bandwidth
monitoring (MBM) belong to the scope of RDT monitoring.
So, to replace the confusing word 'resctrl', I'd like to use 'rdtmon' as the
command name: 'virsh resctrl <domain>' would become 'virsh rdtmon <domain>'.
As always, suggestions from the community are welcome.

> 
> >```
> >[root@dl-c200 david]# virsh list --all
> > Id    Name                           State
> >----------------------------------------------------
> > 1     vm3                            running
> > 3     vm2                            running
> > -     vm1                            shut off
> >```
> >
> >### Test on a running domain vm3
> >To get RDT monitoring status, type 'virsh resctrl <domain>'
> >```
> >    [root@dl-c200 david]# virsh resctrl vm3
> >    RDT Monitoring Status: Enabled
> >```
> >
> >To enable RDT monitoring, type 'virsh resctrl <domain> --enable'
> >```
> >    [root@dl-c200 david]# virsh resctrl vm3 --enable
> >    RDT Monitoring Status: Enabled
> >```
> >
> >To diable RDT monitoring, type 'virsh resctrl <domain> --disable'
> >```
> >    [root@dl-c200 david]# virsh resctrl vm3 --disable
> >    RDT Monitoring Status: Disabled
> >
> >    [root@dl-c200 david]# virsh resctrl vm3
> >    RDT Monitoring Status: Disabled
> >```
> >
> >### test on domain not running vm1
> >if domain is not active, it will fail to set RDT monitoring status, and also get the
> state of 'disabled'
> >```
> >    [root@dl-c200 david]# virsh resctrl vm1
> >    RDT Monitoring Status: Disabled
> >
> >    [root@dl-c200 david]# virsh resctrl vm1 --enable
> >    error: Requested operation is not valid: domain is not running
> >
> >    [root@dl-c200 david]# virsh resctrl vm1 --disable
> >    error: Requested operation is not valid: domain is not running ```
> >
> 
> Can't these commands enable it in the XML?  It would be nice if the XML part
> was
> shown here in the explanation.

In the POC code of this first version there are no XML changes, so RDT
monitoring cannot be enabled/disabled through the XML file.

Let's discuss adding this function. How about the following configuration?
<cputune>
	<cachetune vcpus='1-2'>
  		<cache id='0' level='3' type='both' size='2816' unit='KiB'/>
  		<cache id='1' level='3' type='both' size='2816' unit='KiB'/>
	</cachetune>
	<rdt-monitoring vcpu='0' enable='no'/>
	<rdt-monitoring vcpu='1-2' enable='yes'/>
	<rdt-monitoring vcpu='3' enable='yes'/>
</cputune> 
With the above setting:
- Two rdt monitoring groups will be created along with the launch of the VM.
- <rdt-monitoring vcpu='1-2' enable='yes'> is created automatically due to the
<cachetune> setting. Under the resctrl fs, resctrl allocations and rdt
monitoring groups are both represented as sub-folders, and we cannot create
two sub-folders under the resctrl fs for one process; so a resctrl allocation
creates an rdt monitoring group as well. This rdt monitoring group cannot be
disabled at runtime, because there is no way to disable a resctrl allocation
(CAT) at runtime.
- <rdt-monitoring vcpu='3' enable='yes'> creates another rdt monitoring group,
enabled by default; the task id (the pid associated with vcpu3) will be put
into the 'tasks' file. This rdt monitoring of vcpu 3 can be enabled or
disabled at runtime through a command such as 'virsh rdtmon --enable vcpu3'.
The MBM feature will also be enabled or disabled by this command.
- <rdt-monitoring vcpu='0' enable='no'> specifies the default monitoring state
for vcpu0 of the domain, which is disabled after launch and can be changed at
runtime.
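The rules above could be summarized by a small illustrative helper (hypothetical Python, not part of the patches) that derives the monitoring groups implied by a <cputune> configuration:

```python
def plan_mon_groups(cachetunes, rdt_monitoring):
    """Derive the rdt monitoring groups implied by a <cputune> config.

    cachetunes: list of vcpu-set strings that have a <cachetune> element.
    rdt_monitoring: dict mapping a vcpu-set string to its enable flag.
    A vcpu set covered by a cachetune gets a 'paired' group (created with
    the allocation, not toggleable at runtime); other enabled sets get
    independent groups that can be toggled at runtime.
    """
    groups = []
    for vcpus, enabled in rdt_monitoring.items():
        paired = vcpus in cachetunes
        if paired or enabled:
            groups.append({"vcpus": vcpus,
                           "paired": paired,
                           "runtime_toggle": not paired})
    return groups

# mirrors the XML above: vcpu0 disabled, vcpus 1-2 paired with a cachetune,
# vcpu3 enabled and independent -> two groups created at launch
plan = plan_mon_groups(["1-2"], {"0": False, "1-2": True, "3": True})
for g in plan:
    print(g)
```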


> >### test on domain vm2
> >domain vm2 is active and the CAT functionality is enabled through 'cachetune'
> (configured in 'cputune/cachetune' section). So the resctrl mon group is a
> 'paried' one, for 'pared' mon group, the RDT monitoring could not be disabled. If
> it is allowed to disable 'paire' mon group, we have to destroy resctrl allocation
> folders which is not supported by current cache allocation design.
> 
> What if you have multiple cachetunes?  What if the cachetune is only set for one
> vcpu and you want to monitor the others as well?  I guess I have to see the
> patches to understand why you have so much information stored for something
> that
> looks like a boolean (enable/disable).

At the time I raised this RFC, there was no design for reporting rdt
monitoring information at the granularity of cachetune groups; cache/memory
bandwidth information was reported only for the whole domain.
But now I'd like to discuss the design listed above, reporting rdt monitoring
information based on the rdt-monitoring (cachetune) groups. Your comments are
welcome.


Re: [libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring Technology (CMT) support

Posted by Martin Kletzander 18 weeks ago
On Tue, Jun 12, 2018 at 10:11:30AM +0000, Wang, Huaqiang wrote:
>Hi Martin,
>
>Thanks for your comments, please see my update inline below.
>
>> -----Original Message-----
>> From: Martin Kletzander [mailto:mkletzan@redhat.com]
>> Sent: Monday, June 11, 2018 4:30 PM
>> To: Wang, Huaqiang <huaqiang.wang@intel.com>
>> Cc: libvir-list@redhat.com; Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing
>> <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui
>> <rui.zang@intel.com>
>> Subject: Re: [libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring
>> Technology (CMT) support
>>
>> [It would be nice if you wrapped the long lines]
>I'll pay attention to these long lines. Thanks for advices.

No need to, most email clients can do that automatically.  Doing stuff like this
manually is very unproductive :).

>>
>> On Fri, Jun 08, 2018 at 05:02:16PM +0800, Wang Huaqiang wrote:
>> >This is an RFC request for supporting CPU Cache Monitoring Technology (CMT)
>> feature in libvirt. Since MBM is also another feature which is very close to CMT,
>> for simplicity we only discuss CMT here. MBM is the followup that will be
>> implemented after CMT.
>> >About CMT please refer to Intel x86 SDM section 17.18 of volume 3
>> (link:https://software.intel.com/en-us/articles/intel-sdm).
>> >
>>
>> Can you elaborate on how is this different to the CMT perf event that is already
>> in libvirt and can be monitored through domstats API?
>
>Due to kernel interface removal of the perf events 'cmt,mbmt,mbml', the libvirt will no
>longer work with latest kernel. Please examine following link for details.
>https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=c39a0e2c8850f08249383f2425dbd8dbe4baad69,
>
>This serials is trying to provide the similar functions of this missing part for reporting
>cmt, mbmt and mbml information. First we only focus on cmt.
>Comparing with 'CMT perf event already in libvirt', I am trying to implement almost
>the same output as 'perf.cmt' in the output message of 'domstats', but with another
>name , such as 'resctrl.cmt' or 'rdt.cmt' (or some others).
>Another difference is that the underlying implementation is done through the
>kernel resctrl fs.
>
>This series also attempts to provide a command interface for enabling and disabling
>the cmt feature for a whole domain, just as the original perf-event-based cmt could be
>controlled, enabled or disabled, by specifying '--enable cmt' or '--disable cmt'
>when invoking the command 'virsh perf <domain>'.
>Our version looks like 'virsh resctrl <domain> --enable', with the difference that
>there is no 'cmt' suffix.  The 'cmt' is omitted because the CMT and MBM functions are
>both enabled whenever a valid resctrl fs sub-folder is created; there is no way to
>disable one while enabling another, such as enabling CMT while disabling MBML.
>
>This series tries to stick to the interfaces exposed by the perf-event-based CMT/MBM
>and provide a substitute for them; for example, the perf-based CMT provides the
>cache occupancy information for the whole domain only. We are also considering
>providing cache occupancy information based on vcpu groups, which may be
>specified in the XML file.
>For example, if we have following configuration:
><cputune>
>	<vcpupin vcpu='0' cpuset='1'/>
>	<vcpupin vcpu='1' cpuset='3-4'/>
>	<vcpupin vcpu='2' cpuset='4-5'/>
>	<vcpupin vcpu='3' cpuset='6-7'/>
>	<cachetune vcpus='0'>
>  		<cache id='0' level='3' type='both' size='2816' unit='KiB'/>
>  		<cache id='1' level='3' type='both' size='2816' unit='KiB'/>
>	</cachetune>
>	<cachetune vcpus='1-2'>
>  		<cache id='0' level='3' type='both' size='2816' unit='KiB'/>
>  		<cache id='1' level='3' type='both' size='2816' unit='KiB'/>
>	</cachetune>
>	<rdt-monitoring vcpu='0' enable='yes'>
>  	<rdt-monitoring vcpu='1-2' enable='yes'>
>	<rdt-monitoring vcpu='3' enable='yes'>
></cputune>
>
>The 'domstats' will output following information regarding cmt
>	 [root@dl-c200 libvirt]# virsh domstats vm1 --resctrl
>		Domain: 'vm1'
>			rdt.cmt.total=645562
>			rdt.cmt.vcpu0=104331
>			rdt.cmt.vcpu1_2=203200
>			rdt.cmt.vcpu3=340129
>

Beware that 1-4 is something else than 1,4, so you need to differentiate them.  Or,
to make it easier to parse for consumers of that API, just list each vcpu on its
own line (but then you need to say which are counted together). Or group them:

rdt.cmt.total=645562
rdt.cmt.group0.value=104331
rdt.cmt.group0.vcpus=0
rdt.cmt.group1.value=203200
rdt.cmt.group1.vcpus=1-2
rdt.cmt.group2.value=340129
rdt.cmt.group2.vcpus=3

Honestly, I don't care that much how it is going to look, but it needs to be
easy to parse and understand.
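Since a range spec and an explicit list look similar but mean different things, a
consumer of such an API has to expand them carefully. A minimal sketch of that
expansion (illustrative only, not part of any libvirt API):

```python
def expand_vcpus(spec):
    """Expand a vcpus spec such as '0', '1-2' or '1,4' into a sorted list.

    '1-4' denotes a contiguous range (1,2,3,4) while '1,4' lists exactly
    two vcpus, so the two forms must be parsed differently.
    """
    vcpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            vcpus.update(range(int(lo), int(hi) + 1))
        else:
            vcpus.add(int(part))
    return sorted(vcpus)

print(expand_vcpus("1-4"))  # [1, 2, 3, 4]
print(expand_vcpus("1,4"))  # [1, 4]
```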

>Those updates address your comment for " Can you elaborate on how is
> this different to the CMT perf event that is already in libvirt and can be
>monitored through domstats API?", any input is welcome.
>

Great, this information (or rather a brief summary) should be part of the patch
series.  Not necessarily the commit messages (some of the things would fit
there), but at least the cover letter.  Otherwise you might get the same
question next time and will have to provide the same answer to the next reviewer
and so on.

>>
>>   https://libvirt.org/formatdomain.html#elementsPerf
>>
>> >## About '_virResctrlMon' interface
>> >
>> >The cache allocation technology (CAT) has already been implemented in
>> >util/virresctrl.* which interacts with Linux kernel resctrl file
>> system. Very similar to CAT, the CMT object is represented by 'struct
>> >_virResctrlMon', which is
>> >
>> >```
>> >struct _virResctrlMon {
>> >    virObject parent;
>> >
>> >    /* pairedalloc: pointer to a resctrl allocation it is paired with.
>> >     * NULL for a resctrl monitoring group not associated with
>> >     * any allocation. */
>> >    virResctrlAllocPtr pairedalloc;
>> >    /* The identifier (any unique string for now) */
>> >    char *id;
>> >    /* libvirt-generated path, may be identical to allocation path
>> >     * may not if allocation is ready */
>> >    char *path;
>> >};
>> >```
>> >
>> >Almost following the same logic behind '_virResctrlAlloc' which is mainly
>> presented in file 'virresctrl.c', a group of APIs has been designed to manipulate
>> '_virResctrlMon'. The '_virResctrlMon' shares a lot in common with
>> '_virResctrlAlloc' except field 'pairedalloc'.
>> >'pairedalloc' stores the pointer to the paired resctrl allocation object. With the
>> current libvirt resctrl implementation, if a resctrl '_virResctrlAlloc' object is
>> created, the CMT hardware is enabled automatically and shares the same folder under
>> the resctrl fs. I call the CMT '_virResctrlMon' object that shares the same folder
>> under the resctrl fs a 'paired' _virResctrlMon; one '_virResctrlMon' and one
>> '_virResctrlAlloc' form a pair.  In '_virResctrlMon' the paired '_virResctrlAlloc'
>> is tracked through 'pairedalloc'. A paired mon group cannot be dynamically enabled
>> or disabled at runtime.
>> >'pairedalloc' could be set to NULL, which creates a non-paired mon group
>> object. This is necessary because CMT can work independently, monitoring
>> the utilization of critical CPU resources (cache or memory bandwidth) without
>> allocating any dedicated cache or memory bandwidth. A non-paired mon group
>> object represents an independently working CMT. A non-paired mon group can be
>> enabled or disabled at runtime.
>> >
>> >## About virsh command 'resctrl'
>> >
>> >To set or get the resctrl mon group (hardware CMT), a virsh command 'resctrl'
>> is created. Here are the common usages:
>>
>> The command does make sense for people who know how the stuff works on
>> the inside or have seen the code in libvirt.  For other users the name 'resctrl' is
>> going to feel very arbitrary.  We are trying to abstract the details for users,
>> so I don't see why it should be named 'resctrl' when it handles "RDT Monitoring
>> Status".
>
>Agree. 'resctrl' does cause a lot of confusion for end users.
>The underlying kernel interface combines the CAT and MBM features together:
>what I mean is, the files 'llc_occupancy', 'mbm_local_bytes' and 'mbm_total_bytes',
>which represent the information for cache, local memory bandwidth, and total
>memory bandwidth respectively, are created automatically and simultaneously for
>each resctrl group; there is no way to enable one and disable another. So for
>a command which affects both cache and memory bandwidth, I would like to use
>the word 'rdt' as the key command word. Both cache monitoring (CMT) and memory
>bandwidth monitoring (MBM) belong to the scope of RDT monitoring.
>So to replace the confusing word 'resctrl', I'd like to use 'rdtmon' as the command
>name; 'virsh resctrl <domain>' would be changed to 'virsh rdtmon <domain>'.
>Again, any suggestions from the community are welcome.
>

Libvirt tries to abstract various vendor-specific things.  For example AMD's SEV
is abstracted under the name `launch-security` IIRC so that if there are more in
the future not all the code needs to be duplicated.  In the same sense Intel's
RDT could be named in a more generic sense.  Resource Control and Monitoring
seems to reflect what it does, but it's kind of a mouthful.  Maybe others will
have better ideas.  I'm bad at naming.

>>
>> >```
>> >[root@dl-c200 david]# virsh list --all
>> > Id    Name                           State
>> >----------------------------------------------------
>> > 1     vm3                            running
>> > 3     vm2                            running
>> > -     vm1                            shut off
>> >```
>> >
>> >### Test on a running domain vm3
>> >To get RDT monitoring status, type 'virsh resctrl <domain>'
>> >```
>> >    [root@dl-c200 david]# virsh resctrl vm3
>> >    RDT Monitoring Status: Enabled
>> >```
>> >
>> >To enable RDT monitoring, type 'virsh resctrl <domain> --enable'
>> >```
>> >    [root@dl-c200 david]# virsh resctrl vm3 --enable
>> >    RDT Monitoring Status: Enabled
>> >```
>> >
>> >To disable RDT monitoring, type 'virsh resctrl <domain> --disable'
>> >```
>> >    [root@dl-c200 david]# virsh resctrl vm3 --disable
>> >    RDT Monitoring Status: Disabled
>> >
>> >    [root@dl-c200 david]# virsh resctrl vm3
>> >    RDT Monitoring Status: Disabled
>> >```
>> >
>> >### Test on a domain that is not running: vm1
>> >If the domain is not active, setting the RDT monitoring status will fail, and
>> the state reported will be 'disabled'
>> >```
>> >    [root@dl-c200 david]# virsh resctrl vm1
>> >    RDT Monitoring Status: Disabled
>> >
>> >    [root@dl-c200 david]# virsh resctrl vm1 --enable
>> >    error: Requested operation is not valid: domain is not running
>> >
>> >    [root@dl-c200 david]# virsh resctrl vm1 --disable
>> >    error: Requested operation is not valid: domain is not running
>> >    ```
>> >
>>
>> Can't these commands enable it in the XML?  It would be nice if the XML part
>> was
>> shown here in the explanation.
>
>In the POC code of the first version there are no XML changes, and it could not be
>enabled/disabled through the XML file.
>
>Let's have a discussion and add this function; how about this configuration:
><cputune>
>	<cachetune vcpus='1-2'>
>  		<cache id='0' level='3' type='both' size='2816' unit='KiB'/>
>  		<cache id='1' level='3' type='both' size='2816' unit='KiB'/>
>	</cachetune>
>	<rdt-monitoring vcpu='0' enable='no'>
>  	<rdt-monitoring vcpu='1-2' enable='yes'>
>	<rdt-monitoring vcpu='3' enable='yes'>
></cputune>

Just so we are on the same note, it doesn't have to have an option to be
enabled/disabled in the XML.  However, you probably still need to keep the state
of that information somewhere across libvirtd restarts.  If there is any, I
haven't gone through the code.

>With the above settings,
>- Two rdt monitoring groups will be created along with the launch of the vm.
>- <rdt-monitoring vcpu='1-2' enable='yes'> is created automatically due to the
>setting of <cachetune>. Under the resctrl fs, resctrl allocations and rdt monitoring
>groups are presented as sub-folders, and we cannot create two sub-folders
>under the resctrl fs for one process, so a resctrl allocation creates an rdt
>monitoring group as well. This rdt monitoring group cannot be disabled at
>runtime because there is no way to disable a resctrl allocation (CAT) at runtime.
>- <rdt-monitoring vcpu='3' enable='yes'> creates another rdt monitoring group,
>enabled by default, and the task id (the pid associated with vcpu3) will be put
>into the 'tasks' file. This rdt monitoring over vcpu 3 could be enabled or disabled
>at runtime through a command such as 'virsh rdtmon --enable vcpu3'.
>The MBM feature will also be enabled or disabled with this command.
>- <rdt-monitoring vcpu='0' enable='no'> specifies the default CMT state for vcpu0
>of the domain, which is disabled after launch and could be changed at runtime.
>

There are many places where stuff can be created.  I started going down the
rabbit hole again (like last time when I was implementing CAT) and again, the
kernel interface is horrible.  Inconsistent naming, poor documentation (or maybe
I'm just a very bad reader).  I hope someone will join this review because I
can't sensibly map the kernel interface to whatever libvirt might do/expose.  I
already wasted so much time on CAT and I don't want to go back to that again.

Let's not do any XML changes unless we find out they are actually needed.

>
>> >### Test on domain vm2
>> >domain vm2 is active and the CAT functionality is enabled through 'cachetune'
>> (configured in the 'cputune/cachetune' section). So the resctrl mon group is a
>> 'paired' one, and for a 'paired' mon group, RDT monitoring cannot be disabled. If
>> we allowed disabling a 'paired' mon group, we would have to destroy the resctrl
>> allocation folders, which is not supported by the current cache allocation design.
>>
>> What if you have multiple cachetunes?  What if the cachetune is only set for one
>> vcpu and you want to monitor the others as well?  I guess I have to see the
>> patches to understand why you have so much information stored for something
>> that
>> looks like a boolean (enable/disable).
>
>At the time I raised this RFC, there was no design for reporting rdt monitoring
>information at the granularity of cachetune, only reporting cache/memory bandwidth
>information for the whole domain.
>But now I'd like to discuss the design that I listed above, reporting rdt monitoring
>information based on the rdt-monitoring (cachetune) group settings. I need your
>comments.
>

I just wanted to know what is the preferred approach.  If we're creating
mon_groups/domain_name_vcpus_X/ or just new resctrl group (there is not much of
a difference in that).  Does it take hot-(un)plug of vcpus into consideration?
How about emulator threads and iothreads?  I know libvirt doesn't support them
yet for CAT, but that'd be a good way to start adding features to libvirt IMHO.

Or live changes to cachetunes.  If we have that, then maybe the addition of
monitoring will make more sense and it will fit more nicely (since we'll have a
more complete picture).

>>
>> >```
>> >    [root@dl-c200 libvirt]# virsh resctrl vm2 --enable
>> >    RDT Monitoring Status: Enabled (forced by cachetune)
>> >
>> >    [root@dl-c200 libvirt]# virsh resctrl vm2 --disable
>> >    RDT Monitoring Status: Enabled (forced by cachetune)
>> >
>> >    [root@dl-c200 libvirt]# virsh resctrl vm2
>> >    RDT Monitoring Status: Enabled (forced by cachetune)
>> >```
>> >
>> >## About showing the utilization information of RDT
>> >
>> >A domstats field has been created to show the utilization of RDT resources, the
>> command is like this:
>> >```
>> >    [root@dl-c200 libvirt]# virsh domstats --resctrl
>> >    Domain: 'vm1'
>> >      resctrl.cmt=0
>> >
>> >    Domain: 'vm3'
>> >      resctrl.cmt=180224
>> >
>> >    Domain: 'vm2'
>> >      resctrl.cmt=2613248
>> >```
>> >
>> >
>> >Wang Huaqiang (3):
>> >  util: add Intel x86 RDT/CMT support
>> >  tools: virsh: add command for controling/monitoring resctrl
>> >  tools: virsh domstats: show RDT CMT resource utilization information
>> >
>> > include/libvirt/libvirt-domain.h    |  10 ++
>> > src/conf/domain_conf.c              |  28 ++++
>> > src/conf/domain_conf.h              |   3 +
>> > src/driver-hypervisor.h             |   8 +
>> > src/libvirt-domain.c                |  92 +++++++++++
>> > src/libvirt_private.syms            |   9 +
>> > src/libvirt_public.syms             |   6 +
>> > src/qemu/qemu_driver.c              | 189 +++++++++++++++++++++
>> > src/qemu/qemu_process.c             |  65 +++++++-
>> > src/remote/remote_daemon_dispatch.c |  45 +++++
>> > src/remote/remote_driver.c          |   2 +
>> > src/remote/remote_protocol.x        |  28 +++-
>> > src/remote_protocol-structs         |  12 ++
>> > src/util/virresctrl.c               | 316 +++++++++++++++++++++++++++++++++++-
>> > src/util/virresctrl.h               |  44 +++++
>> > tools/virsh-domain-monitor.c        |   7 +
>> > tools/virsh-domain.c                |  74 +++++++++
>> > 17 files changed, 933 insertions(+), 5 deletions(-)
>> >
>> >--
>> >2.7.4
>> >
>> >--
>> >libvir-list mailing list
>> >libvir-list@redhat.com
>> >https://www.redhat.com/mailman/listinfo/libvir-list
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring Technology (CMT) support

Posted by Wang, Huaqiang 16 weeks ago
Please see my inline reply.

> -----Original Message-----
> From: Martin Kletzander [mailto:mkletzan@redhat.com]
> Sent: Thursday, June 14, 2018 3:54 PM
> To: Wang, Huaqiang <huaqiang.wang@intel.com>
> Cc: libvir-list@redhat.com; Feng, Shaohe <shaohe.feng@intel.com>; Niu, Bing
> <bing.niu@intel.com>; Ding, Jian-feng <jian-feng.ding@intel.com>; Zang, Rui
> <rui.zang@intel.com>
> Subject: Re: [libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache Monitoring
> Technology (CMT) support
> 
> On Tue, Jun 12, 2018 at 10:11:30AM +0000, Wang, Huaqiang wrote:
> >Hi Martin,
> >
> >Thanks for your comments, please see my update inline below.
> >
> >> -----Original Message-----
> >> From: Martin Kletzander [mailto:mkletzan@redhat.com]
> >> Sent: Monday, June 11, 2018 4:30 PM
> >> To: Wang, Huaqiang <huaqiang.wang@intel.com>
> >> Cc: libvir-list@redhat.com; Feng, Shaohe <shaohe.feng@intel.com>;
> >> Niu, Bing <bing.niu@intel.com>; Ding, Jian-feng
> >> <jian-feng.ding@intel.com>; Zang, Rui <rui.zang@intel.com>
> >> Subject: Re: [libvirt] [RFC PATCH 0/3] RFC for X86 RDT Cache
> >> Monitoring Technology (CMT) support
> >>
> >> [It would be nice if you wrapped the long lines]
> >I'll pay attention to these long lines. Thanks for advices.
> 
> No need to, most email clients can do that automatically.  Doing stuff like this
> manually is very unproductive :).
> 
> >>
> >> On Fri, Jun 08, 2018 at 05:02:16PM +0800, Wang Huaqiang wrote:
> >> >This is an RFC request for supporting CPU Cache Monitoring
> >> >Technology (CMT)
> >> feature in libvirt. Since MBM is also another feature which is very
> >> close to CMT, for simplicity we only discuss CMT here. MBM is the
> >> followup that will be implemented after CMT.
> >> >About CMT please refer to Intel x86 SDM section 17.18 of volume 3
> >> (link:https://software.intel.com/en-us/articles/intel-sdm).
> >> >
> >>
> >> Can you elaborate on how is this different to the CMT perf event that
> >> is already in libvirt and can be monitored through domstats API?
> >
> >Due to the removal of the perf events 'cmt', 'mbmt' and 'mbml' from the kernel
> >interface, libvirt will no longer work with the latest kernel. Please see the
> following link for details:
> >https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> >/commit/?id=c39a0e2c8850f08249383f2425dbd8dbe4baad69,
> >
> >This series tries to provide similar functions to this missing
> >part for reporting cmt, mbmt and mbml information. First we only focus on cmt.
> >Comparing with 'CMT perf event already in libvirt', I am trying to
> >implement almost the same output as 'perf.cmt' in the output message of
> >'domstats', but with another name , such as 'resctrl.cmt' or 'rdt.cmt' (or some
> others).
> >Another difference is that the underlying implementation is done
> >through the kernel resctrl fs.
> >
> >This series also attempts to provide a command interface for enabling
> >and disabling the cmt feature for a whole domain as the original perf
> >event based cmt could be controlled, enabled or disabled, through specifying '-
> -enable cmt' or '--disable cmt'
> >while invoking command 'virsh perf <domain>'.
> >Our version is like 'virsh resctrl <domain> --enable' with a difference
> >of no suffix of 'cmt'.  The 'cmt' is omitted because the CMT and MBM
> >function are both enabled whenever a valid resctrl fs sub-folder
> >created, there is no way to disable one while enable another one, such as
> enabling CMT while disabling MBML at the same time.
> >
> >This series tries to stick to the interfaces exposed by perf event
> >based CMT/MBM and provide an interface substitution for perf event
> >based CMT/MBM, such as the perf based CMT only provides the cache
> >occupancy information for whole domain only. We are also in thinking
> >providing the capability to provide the cache occupancy information
> >based on vcpus groups which may be specified in XML file.
> >For example, if we have following configuration:
> ><cputune>
> >	<vcpupin vcpu='0' cpuset='1'/>
> >	<vcpupin vcpu='1' cpuset='3-4'/>
> >	<vcpupin vcpu='2' cpuset='4-5'/>
> >	<vcpupin vcpu='3' cpuset='6-7'/>
> >	<cachetune vcpus='0'>
> >  		<cache id='0' level='3' type='both' size='2816' unit='KiB'/>
> >  		<cache id='1' level='3' type='both' size='2816' unit='KiB'/>
> >	</cachetune>
> >	<cachetune vcpus='1-2'>
> >  		<cache id='0' level='3' type='both' size='2816' unit='KiB'/>
> >  		<cache id='1' level='3' type='both' size='2816' unit='KiB'/>
> >	</cachetune>
> >	<rdt-monitoring vcpu='0' enable='yes'>
> >  	<rdt-monitoring vcpu='1-2' enable='yes'>
> >	<rdt-monitoring vcpu='3' enable='yes'> </cputune>
> >
> >The 'domstats' will output following information regarding cmt
> >	 [root@dl-c200 libvirt]# virsh domstats vm1 --resctrl
> >		Domain: 'vm1'
> >			rdt.cmt.total=645562
> >			rdt.cmt.vcpu0=104331
> >			rdt.cmt.vcpu1_2=203200
> >			rdt.cmt.vcpu3=340129
> >
> 
> beware as 1-4 is something else than 1,4 so you need to differentiate that.  Or
> to make it easier to parse for consumers of that API just list each vcpu on its
> own line (but then you need to say which are counted together). Or group them:
> 
> rdt.cmt.total=645562
> rdt.cmt.group0.value=104331
> rdt.cmt.group0.vcpus=0
> rdt.cmt.group1.value=203200
> rdt.cmt.group1.vcpus=1-2
> rdt.cmt.group2.value=340129
> rdt.cmt.group2.vcpus=3
> 
> Honestly, I don't care that much how it is going to look, but it needs to be easy
> to parse and understand.

Your arrangement, separating the group vcpus and the group resource value, is much
better than my version; thanks for the suggestion.
By the way, I may omit the output of 'rdt.cmt.total'. The reason is that if not all of
a domain's vcpus are covered by the resctrl monitoring groups, 'rdt.cmt.total' may be
confusing: it could mean either the whole domain's resource utilization or the sum of
the created groups' resource utilization.
If users want a sum of the resources for the currently enabled CMT monitoring groups,
they can add them up themselves. If users want the whole domain's number, they can
create groups covering all vcpus.
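Summing the per-group values client-side is straightforward. A rough sketch, assuming
the grouped key format discussed earlier in this thread (the key names are
illustrative, not a stable libvirt API):

```python
import re

def parse_cmt_groups(lines):
    """Collect rdt.cmt.group<N>.* keys from domstats-style output."""
    groups = {}
    for line in lines:
        m = re.match(r"rdt\.cmt\.group(\d+)\.(value|vcpus)=(.+)", line.strip())
        if m:
            idx, field, val = m.groups()
            groups.setdefault(int(idx), {})[field] = val
    return groups

stats = [
    "rdt.cmt.group0.value=104331", "rdt.cmt.group0.vcpus=0",
    "rdt.cmt.group1.value=203200", "rdt.cmt.group1.vcpus=1-2",
    "rdt.cmt.group2.value=340129", "rdt.cmt.group2.vcpus=3",
]
groups = parse_cmt_groups(stats)
total = sum(int(g["value"]) for g in groups.values())
print(total)  # 647660
```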


> >Those updates address your comment for " Can you elaborate on how is
> >this different to the CMT perf event that is already in libvirt and can
> >be monitored through domstats API?", any input is welcome.
> >
> 
> Great, this information (or rather a brief summary) should be part of the patch
> series.  Not necessarily the commit messages (some of the things would fit
> there), but at least the cover letter.  Otherwise you might get the same question
> next time and will have to provide the same answer to the next reviewer and so
> on.

OK. I'll update this part of the discussion in my next RFC version as well as in the
cover letter of the POC code.

> 
> >>
> >>   https://libvirt.org/formatdomain.html#elementsPerf
> >>
> >> >## About '_virResctrlMon' interface
> >> >
> >> >The cache allocation technology (CAT) has already been implemented
> >> >in
> >> >util/virresctrl.* which interacts with Linux kernel resctrl file
> >> >system. Very similar to CAT, the CMT object is represented by
> >> >'struct _virResctrlMon', which is
> >> >
> >> >```
> >> >struct _virResctrlMon {
> >> >    virObject parent;
> >> >
> >> >    /* pairedalloc: pointer to a resctrl allocation it is paired with.
> >> >     * NULL for a resctrl monitoring group not associated with
> >> >     * any allocation. */
> >> >    virResctrlAllocPtr pairedalloc;
> >> >    /* The identifier (any unique string for now) */
> >> >    char *id;
> >> >    /* libvirt-generated path, may be identical to allocation path
> >> >     * may not if allocation is ready */
> >> >    char *path;
> >> >};
> >> >```
> >> >
> >> >Almost following the same logic behind '_virResctrlAlloc' which is
> >> >mainly
> >> presented in file 'virresctrl.c', a group of APIs has been designed
> >> to manipulate '_virResctrlMon'. The '_virResctrlMon' shares a lot in
> >> common with '_virResctrlAlloc' except field 'pairedalloc'.
> >> >'pairedalloc' stores the pointer of paired resctrl allocation
> >> >object. With current
> >> libvirt resctrl implementation, if a resctrl '_virResctrlAlloc'
> >> object is created, the CMT hardware is enabled automatically and
> >> shares the same folder under same resctrlfs, I call the CMT
> >> '_virResctrlMon' object that shares the same folder under resctrlfs
> >> as 'paired' _virResctrlMon, further, one '_virResctrlMon' and one
> >> '_virResctrlAlloc' form a pair.  In '_virResctrlMon' the paired
> >> '_virResctrlAlloc' is tracked through 'pairedalloc'. A paired mon group cannot
> be dynamically enabled or disabled at runtime.
> >> >'pairedalloc' could be set to NULL, which creates a non-paired mon
> >> >group
> >> object. This is necessary because CMT can work independently to
> >> monitor the utilization of critical CPU resources (cache or memory
> >> bandwidth) without allocating any dedicated cache or memory
> >> bandwidth. A non-paired mon group object represents an independent
> >> working CMT. Non-paired mon group could be enabled or disabled during
> runtime.
> >> >
> >> >## About virsh command 'resctrl'
> >> >
> >> >To set or get the resctrl mon group (hardware CMT), a virsh command
> 'resctrl'
> >> is created. here are the common usages:
> >>
> >> The command does make sense for people who know how the stuff works
> >> on the inside or have seen the code in libvirt.  For other users the
> >> name 'resctrl' is going to feel very much arbitrary.  We re trying to
> >> abstract the details for users, so I don't see why it should be named
> >> 'resctrl' when it handles "RDT Monitoring Status".
> >
> >Agree. 'resctrl' does cause a lot of confusion for end users.
> >Since the underlying kernel interface combines CAT and MBM features
> >together, what I mean is , the files ' llc_occupancy', ' mbm_local_bytes' and '
> mbm_total_bytes'
> >that represent the information of cache, local memory bandwidth, and
> >total memory bandwidth respectively are created automatically and
> >simultaneously for  each resctrl group, there is no way to enable one
> >and disable another one. So for a command which affects both cache and
> >memory bandwidth, I would like to use the word 'rdt' as the key command
> >word. Both cache monitoring(CMT) and memory bandwidth monitoring(MBM)
> are belong to the scope of RDT monitoring.
> >So to replace the confusing word 'resctrl', I'd like to use 'rdtmon' as
> >command name, the command 'virsh resctrl <domain>' would be changed to
> 'virsh rdtmon <domain>'.
> >Also, here welcoming any suggestions from community.
> >
> 
> Libvirt tries to abstract various vendor-specific things.  For example AMD's SEV is
> abstracted under the name `launch-security` IIRC so that if there are more in the
> future not all the code needs to be duplicated.  In the same sense Intel's RDT
> could be named in a more generic sense.  Resource Control and Monitoring
> seems to reflect what it does, but it's kind of a mouthful.  Maybe others will
> have better ideas.  I'm bad at naming.
> 

I am bad at naming too :)
I agree that 'RDT' and 'resctrl' are pretty confusing names for system administrators.
But 'Resource Control' or 'Monitoring' is not a good choice either, in my opinion.
These two phrases have too broad a scope, covering everything from network resources
to memory (DRAM) resources as well as some other resources.
Here we only focus on CPU resources, currently the last level cache and memory
bandwidth, so I would like to use 'cpu-resource' or 'cpures' as the name for general
RDT feature enabling. How about the interfaces shown below:
1. A virsh command 'cpu-resource' for checking the status of the resctrl resource
groups associated with a domain and for creating/setting resctrl monitoring groups at
the granularity of vcpus. The command may look like this:
virsh cpu-resource --create <resource type> --destroy <resource type> --vcpulist <vcpulist>
   --group-name <resctrl group name>
*. Use '--create' and '--destroy' to substitute the '--enable' and '--disable' that I
proposed in my last update. 'create' and 'destroy' are more accurate, since the
operations actually set up and delete resource groups.
*. For <resource type>, 'monitoring' will be specified here to cover both the CMT and
MBM features. CAT and MBA could also be supported here if a function similar to
'cachetune' or 'membwtune' is planned; this parameter is also extensible for future
CPU resources.
*. <vcpulist> specifies the vcpu list associated with a cpu resource group.
*. <group-name> specifies the resource group name; when creating a monitoring group
for a specific vcpu list, an empty string is expected here to match the
virResctrlAllocPtr->id string.
This argument is also extensible to support some other features, e.g. creating a
monitoring group for emulator threads with a specific group name, such as 'emulator'.
A resource monitoring group for iothreads could be created by leveraging the
'group-name' argument in a similar way.

2. An update for the virsh command 'domstats'.
Following the suggestion you provided in the discussion above, the rdt-related output
would look like this:

cpu-resource.cache-occupancy.group0.value=104331
cpu-resource.cache-occupancy.group0.vcpus=0
cpu-resource.cache-occupancy.group1.value=203200
cpu-resource.cache-occupancy.group1.vcpus=1-2
cpu-resource.cache-occupancy.group2.value=340129
cpu-resource.cache-occupancy.group2.vcpus=3

Later, for MBM, the output would be:
cpu-resource.memory-bandwidth.group0.value=10331
cpu-resource.memory-bandwidth.group0.vcpus=0
cpu-resource.memory-bandwidth.group1.value=2000
cpu-resource.memory-bandwidth.group1.vcpus=1-2

> >>
> >> >```
> >> >[root@dl-c200 david]# virsh list --all
> >> > Id    Name                           State
> >> >----------------------------------------------------
> >> > 1     vm3                            running
> >> > 3     vm2                            running
> >> > -     vm1                            shut off
> >> >```
> >> >
> >> >### Test on a running domain vm3
> >> >To get RDT monitoring status, type 'virsh resctrl <domain>'
> >> >```
> >> >    [root@dl-c200 david]# virsh resctrl vm3
> >> >    RDT Monitoring Status: Enabled
> >> >```
> >> >
> >> >To enable RDT monitoring, type 'virsh resctrl <domain> --enable'
> >> >```
> >> >    [root@dl-c200 david]# virsh resctrl vm3 --enable
> >> >    RDT Monitoring Status: Enabled
> >> >```
> >> >
> >> >To disable RDT monitoring, type 'virsh resctrl <domain> --disable'
> >> >```
> >> >    [root@dl-c200 david]# virsh resctrl vm3 --disable
> >> >    RDT Monitoring Status: Disabled
> >> >
> >> >    [root@dl-c200 david]# virsh resctrl vm3
> >> >    RDT Monitoring Status: Disabled
> >> >```
> >> >
> >> >### test on domain not running vm1
> >> >if domain is not active, it will fail to set RDT monitoring status,
> >> >and also get the
> >> state of 'disabled'
> >> >```
> >> >    [root@dl-c200 david]# virsh resctrl vm1
> >> >    RDT Monitoring Status: Disabled
> >> >
> >> >    [root@dl-c200 david]# virsh resctrl vm1 --enable
> >> >    error: Requested operation is not valid: domain is not running
> >> >
> >> >    [root@dl-c200 david]# virsh resctrl vm1 --disable
> >> >    error: Requested operation is not valid: domain is not running
> >> > ```
> >> >
> >>
> >> Can't these commands enable it in the XML?  It would be nice if the
> >> XML part was shown here in the explanation.
> >
> >In the POC code of the first version there is no XML changes, and could
> >not be enabled/disabled through XML file.
> >
> >Let's have a discuss and add this function, how about this
> >configuration <cputune>
> >	<cachetune vcpus='1-2'>
> >  		<cache id='0' level='3' type='both' size='2816' unit='KiB'/>
> >  		<cache id='1' level='3' type='both' size='2816' unit='KiB'/>
> >	</cachetune>
> >	<rdt-monitoring vcpu='0' enable='no'>
> >  	<rdt-monitoring vcpu='1-2' enable='yes'>
> >	<rdt-monitoring vcpu='3' enable='yes'> </cputune>
> 

To avoid confusing users, 'rdt-monitoring' is changed here to 'monitoring'. Since
'monitoring' is a sub-node of 'cputune', it obviously means CPU(tune)-related
'monitoring'.
The 'enable' attribute is also removed.
The XML configuration would be:
            <cputune>
              <cachetune vcpus='1-2'>
  		<cache id='0' level='3' type='both' size='2816' unit='KiB'/>
  		<cache id='1' level='3' type='both' size='2816' unit='KiB'/>
              </cachetune>

              <monitoring vcpu='0'>
              <monitoring vcpu='1-2'>
              <monitoring vcpu='3'>
            </cputune>
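To sketch how the <monitoring> elements could map onto the resctrl fs: each element
would turn into one directory under mon_groups/, and the kernel then instantiates the
monitoring files inside it automatically. The helper and its naming scheme below are
hypothetical; a real implementation would use /sys/fs/resctrl as the root:

```python
import os
import tempfile

def create_mon_groups(resctrl_root, domain, vcpu_specs):
    """Create one monitoring group directory per <monitoring vcpu='...'/> element.

    On a real host, a mkdir under mon_groups/ makes the kernel create the
    mon_data/ counter files for the new group. The group name format used
    here is made up for illustration.
    """
    created = []
    for spec in vcpu_specs:
        name = "%s-vcpus_%s" % (domain, spec)  # hypothetical naming scheme
        path = os.path.join(resctrl_root, "mon_groups", name)
        os.makedirs(path)
        created.append(path)
    return created

# Demo against a throwaway root instead of /sys/fs/resctrl.
root = tempfile.mkdtemp()
paths = create_mon_groups(root, "vm1", ["0", "1-2", "3"])
print([os.path.basename(p) for p in paths])
# ['vm1-vcpus_0', 'vm1-vcpus_1-2', 'vm1-vcpus_3']
```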


> Just so we are on the same note, it doesn't have to have an option to be
> enabled/disabled in the XML.  However, you probably still need to keep the state
> of that information somewhere across libvirtd restarts.  If there is any, I haven't
> gone through the code.
> 

Similar to def->ncachetunes and def->cachetunes, def->nmongroups and 
def->mongroups are created to preserve monitoring group settings.

> >With the above setting,
> >- Two rdt monitoring groups will be created along with the launch of the vm.
> >- <rdt-monitoring vcpu='1-2' enable='yes'> is created automatically due
> >to the setting of <cachetune>. Under the resctrl fs, resctrl allocations
> >and rdt monitoring groups are presented as sub-folders, and since we
> >cannot create two sub-folders under the resctrl fs folder for one process,
> >a resctrl allocation creates an rdt monitoring group as well.
> >This rdt monitoring group cannot be disabled at runtime because there is
> >no way to disable a resctrl allocation (CAT) at runtime.
> >- <rdt-monitoring vcpu='3' enable='yes'> creates another rdt monitoring
> >group, enabled by default, and the task ids (pids associated with vcpu3)
> >will be put into its 'tasks' file. This rdt monitoring of vcpu 3
> >can be enabled or disabled at runtime through a command such as
> >'virsh rdtmon --enable vcpu3'.
> >The MBM feature will also be enabled or disabled with this command.
> >- <rdt-monitoring vcpu='0' enable='no'> specifies the default monitoring
> >state for vcpu0 of the domain, which is disabled after launch and can be
> >changed at runtime.
> >
> 
> There are many places where stuff can be created.  I started going down the
> rabbit hole again (like last time when I was implementing CAT) and again, the
> kernel interface is horrible.  Inconsistent naming, poor documentation (or
> maybe I'm just a very bad reader).  I hope someone will join this review because
> I can't sensibly map the kernel interface to whatever libvirt might do/expose.  I
> already wasted so much time on CAT and I don't want to go back to that again.
> 
> Let's not do any XML changes unless we find out they are actually needed.
> 

I don't understand the point about the XML changes. If we want the feature to
save and create resource groups at domain startup, the XML is the place for
keeping that configuration. Do you still want my XML changes removed now that
I have dropped the 'enable' attribute?

> >
> >> >### test on domain vm2
> >> >domain vm2 is active and the CAT functionality is enabled through
> >> >'cachetune' (configured in the 'cputune/cachetune' section). So the
> >> >resctrl mon group is a 'paired' one, and for a 'paired' mon group the
> >> >RDT monitoring cannot be disabled. If disabling a 'paired' mon group
> >> >were allowed, we would have to destroy the resctrl allocation folders,
> >> >which is not supported by the current cache allocation design.
> >>
> >> What if you have multiple cachetunes?  What if the cachetune is only
> >> set for one vcpu and you want to monitor the others as well?  I guess
> >> I have to see the patches to understand why you have so much
> >> information stored for something that looks like a boolean
> >> (enable/disable).
> >
> >At the time I raised this RFC, there was no design for reporting rdt
> >monitoring information at the granularity of cachetune, only reporting
> >cache/memory bandwidth information for the whole domain.
> >But now I'd like to discuss the design that I listed above, reporting rdt
> >monitoring information based on the setting of the
> >rdt-monitoring (cachetune) groups. I need your comments.
> >
> 
> I just wanted to know what is the preferred approach.  If we're creating
> mon_groups/domain_name_vcpus_X/ or just new resctrl group (there is not
> much of a difference in that).  Does it take hot-(un)plug of vcpus into
> consideration?
> How about emulator threads and iothreads?  I know libvirt doesn't support them
> yet for CAT, but that'd be a good way to start adding features to libvirt IMHO.
> 
> Or live changes to cachetunes.  If we have that, then maybe the addition of
> monitoring will make more sense and it will fit more nicely (since we'll have a
> more complete picture).
> 

I haven't taken vcpu hotplug into consideration. It may cause trouble for the
libvirt RDT function, both the resource monitoring and the allocation part,
because a vcpu thread may be destroyed after a 'setvcpus' command. If the
resource control interface is not aware of that, a mismatch can occur, e.g. a
resource group sub-directory still exists but no longer works because its vcpu
thread has disappeared.
If a live-change interface for resource groups (CMT and CAT) exists, we could
ask the user to destroy the resource groups first. Or we define the rule that
if you change the vcpu count at runtime, your resource groups, both allocation
and monitoring groups, will disappear.

I have no special consideration for emulator threads and iothreads. Reading
the CAT source code, I don't find any special handling for emulator and
iothreads; am I missing some part of it? In any case, do we need this feature
right now?

You proposed a command, 'cachetune', for live changes of cache allocations;
that should be fine to implement. I will include some code to implement live
changes of monitoring groups in my next POC.
Maybe I could submit a patch for 'cachetune' after this RDT monitoring feature.
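To make the runtime disable concrete: on the kernel's resctrl filesystem a monitoring group is disabled by removing its directory, which frees the RMID and moves the group's tasks back to the parent group; re-enabling is re-creating the directory and re-populating 'tasks'. A sketch of the disable side (the helper name and configurable root are mine, for illustration):

```python
import os

def disable_mon_group(resctrl_root, name):
    """Disable a monitoring group by removing its directory.

    On a real resctrl fs, rmdir(2) on a group under mon_groups/ makes
    the kernel move its tasks back to the parent group and free the
    RMID, so no counters are reported for the group afterwards.
    """
    os.rmdir(os.path.join(resctrl_root, "mon_groups", name))
```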


> >>
> >> >```
> >> >    [root@dl-c200 libvirt]# virsh resctrl vm2 --enable
> >> >    RDT Monitoring Status: Enabled (forced by cachetune)
> >> >
> >> >    [root@dl-c200 libvirt]# virsh resctrl vm2 --disable
> >> >    RDT Monitoring Status: Enabled (forced by cachetune)
> >> >
> >> >    [root@dl-c200 libvirt]# virsh resctrl vm2
> >> >    RDT Monitoring Status: Enabled (forced by cachetune)
> >> >```
> >> >
> >> >## About showing the utilization information of RDT
> >> >
> >> >A domstats field has been created to show the utilization of RDT
> >> >resources; the command is like this:
> >> >```
> >> >    [root@dl-c200 libvirt]# virsh domstats --resctrl
> >> >    Domain: 'vm1'
> >> >      resctrl.cmt=0
> >> >
> >> >    Domain: 'vm3'
> >> >      resctrl.cmt=180224
> >> >
> >> >    Domain: 'vm2'
> >> >      resctrl.cmt=2613248
> >> >```
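For reference, the resctrl.cmt figures above are byte counts of last-level-cache occupancy: the kernel exposes one llc_occupancy file per L3 cache domain under the group's mon_data directory, and the per-domain values are summed to get the whole-domain figure. A sketch of that aggregation (runnable against any directory tree laid out like the resctrl documentation describes; not the actual libvirt code):

```python
import glob
import os

def read_llc_occupancy(group_dir):
    """Sum the llc_occupancy counters of one resctrl monitoring group.

    The kernel reports one value (in bytes) per L3 cache domain in
    mon_data/mon_L3_<id>/llc_occupancy; adding them up gives the
    domain-wide occupancy shown as 'resctrl.cmt'.
    """
    total = 0
    pattern = os.path.join(group_dir, "mon_data", "mon_L3_*", "llc_occupancy")
    for path in glob.glob(pattern):
        with open(path) as f:
            total += int(f.read().strip())
    return total
```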
> >> >
> >> >
> >> >Wang Huaqiang (3):
> >> >  util: add Intel x86 RDT/CMT support
> >> >  tools: virsh: add command for controlling/monitoring resctrl
> >> >  tools: virsh domstats: show RDT CMT resource utilization information
> >> >
> >> > include/libvirt/libvirt-domain.h    |  10 ++
> >> > src/conf/domain_conf.c              |  28 ++++
> >> > src/conf/domain_conf.h              |   3 +
> >> > src/driver-hypervisor.h             |   8 +
> >> > src/libvirt-domain.c                |  92 +++++++++++
> >> > src/libvirt_private.syms            |   9 +
> >> > src/libvirt_public.syms             |   6 +
> >> > src/qemu/qemu_driver.c              | 189 +++++++++++++++++++++
> >> > src/qemu/qemu_process.c             |  65 +++++++-
> >> > src/remote/remote_daemon_dispatch.c |  45 +++++
> >> > src/remote/remote_driver.c          |   2 +
> >> > src/remote/remote_protocol.x        |  28 +++-
> >> > src/remote_protocol-structs         |  12 ++
> >> > src/util/virresctrl.c               | 316 +++++++++++++++++++++++++++++++++++-
> >> > src/util/virresctrl.h               |  44 +++++
> >> > tools/virsh-domain-monitor.c        |   7 +
> >> > tools/virsh-domain.c                |  74 +++++++++
> >> > 17 files changed, 933 insertions(+), 5 deletions(-)
> >> >
> >> >--
> >> >2.7.4
> >> >
> >> >--
> >> >libvir-list mailing list
> >> >libvir-list@redhat.com
> >> >https://www.redhat.com/mailman/listinfo/libvir-list
