From: Anthony Harivel
To: qemu-devel@nongnu.org, pbonzini@redhat.com, mtosatti@redhat.com
Subject: [RFC PATCH] Add support for RAPL MSRs in KVM/Qemu
Date: Wed, 17 May 2023 15:07:30 +0200
Message-Id: <20230517130730.85469-1-aharivel@redhat.com>

Starting with the "Sandy Bridge" generation, Intel CPUs provide a RAPL
interface (Running Average Power Limit) for advertising the accumulated
energy consumption of various power domains (e.g. CPU packages, DRAM,
etc.).

The consumption is reported via MSRs (model specific registers) like
MSR_PKG_ENERGY_STATUS for the CPU package power domain. These are 64-bit
MSRs whose lower 32 bits hold the accumulated energy consumption,
expressed in the energy status units advertised by MSR_RAPL_POWER_UNIT
(for example 1/2^16 J, about 15.3 micro-Joules) rather than directly in
micro-Joules. The 32-bit counter eventually wraps around and is updated
by microcode approximately every 1 ms.

For now, KVM always returns 0 when the guest requests the value of
these MSRs. Use the KVM MSR filtering mechanism to allow QEMU to handle
these MSRs dynamically in userspace.

To limit the number of system calls on every MSR access, create a new
thread in QEMU that updates the "virtual" MSR values asynchronously.

Each vCPU has its own vMSR to reflect the independence of vCPUs. The
thread updates each vMSR value with a share of the energy consumed by
the whole physical CPU package the vCPU thread runs on, in proportion
to the thread's utime and stime values during the sampling interval.

All other non-vCPU threads are also taken into account.
Their energy consumption is evenly distributed among all vCPU threads
running on the same physical CPU package.

This feature is activated with -accel kvm,rapl=true.

Current limitations:
- Works only on Intel host CPUs because AMD CPUs use different MSR
  addresses.
- Only the package power plane (MSR_PKG_ENERGY_STATUS) is reported at
  the moment.
- Since each vCPU has an independent vMSR value, the vCPU topology must
  be changed to match that reality. There must be a single vCPU per
  virtual socket (e.g. -smp 4,sockets=4). Accessing pkg-0 energy will
  give vCPU 0 energy, pkg-1 will give vCPU 1 energy, etc.

Signed-off-by: Anthony Harivel
---

Notes:
    Earlier this year, I proposed a patch to Linux KVM [1] in order to
    bring energy awareness to VMs.

    Based on the feedback, I've worked on another solution that requires
    only a QEMU patch and makes use of the MSR filtering mechanism.

    This patch is proposed as an RFC at the moment in order to validate
    the approach and see whether the current limitations could be
    addressed in a second phase.

    Regards,
    Anthony

    [1]: https://lore.kernel.org/kvm/20230118142123.461247-1-aharivel@redhat.com/

 accel/kvm/kvm-all.c           |  13 ++
 include/sysemu/kvm_int.h      |  11 ++
 target/i386/cpu.h             |   8 +
 target/i386/kvm/kvm.c         | 273 ++++++++++++++++++++++++++++++++++
 target/i386/kvm/meson.build   |   1 +
 target/i386/kvm/vmsr_energy.c | 132 ++++++++++++++++
 target/i386/kvm/vmsr_energy.h |  80 ++++++++++
 7 files changed, 518 insertions(+)
 create mode 100644 target/i386/kvm/vmsr_energy.c
 create mode 100644 target/i386/kvm/vmsr_energy.h

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index cf3a88d90e92..13bb2a523c5d 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3699,6 +3699,12 @@ static void kvm_set_dirty_ring_size(Object *obj, Visitor *v,
     s->kvm_dirty_ring_size = value;
 }
 
+static void kvm_set_kvm_rapl(Object *obj, bool value, Error **errp)
+{
+    KVMState *s = KVM_STATE(obj);
+    s->msr_energy.enable = value;
+}
+
 static void kvm_accel_instance_init(Object *obj)
 {
     KVMState *s = KVM_STATE(obj);
@@ -3715,6 +3721,7 @@ static void kvm_accel_instance_init(Object *obj)
     s->xen_version = 0;
     s->xen_gnttab_max_frames = 64;
     s->xen_evtchn_max_pirq = 256;
+    s->msr_energy.enable = false;
 }
 
 /**
@@ -3755,6 +3762,12 @@ static void kvm_accel_class_init(ObjectClass *oc, void *data)
     object_class_property_set_description(oc, "dirty-ring-size",
         "Size of KVM dirty page ring buffer (default: 0, i.e. use bitmap)");
 
+    object_class_property_add_bool(oc, "rapl",
+                                   NULL,
+                                   kvm_set_kvm_rapl);
+    object_class_property_set_description(oc, "rapl",
+        "Allow energy related MSRs for RAPL interface in Guest");
+
     kvm_arch_accel_class_init(oc);
 }
 
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index a641c974ea54..cf3a01f498d7 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -47,6 +47,16 @@ typedef struct KVMMemoryListener {
 
 #define KVM_MSI_HASHTAB_SIZE    256
 
+struct KVMMsrEnergy {
+    bool enable;
+    QemuThread msr_thr;
+    int cpus;
+    uint64_t *msr_value;
+    uint64_t msr_unit;
+    uint64_t msr_limit;
+    uint64_t msr_info;
+};
+
 enum KVMDirtyRingReaperState {
     KVM_DIRTY_RING_REAPER_NONE = 0,
     /* The reaper is sleeping */
@@ -116,6 +126,7 @@ struct KVMState
     uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring */
     uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring */
     struct KVMDirtyRingReaper reaper;
+    struct KVMMsrEnergy msr_energy;
     NotifyVmexitOption notify_vmexit;
     uint32_t notify_window;
     uint32_t xen_version;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 8504aaac6807..14f9c2901680 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -396,6 +396,10 @@ typedef enum X86Seg {
 #define MSR_IA32_TSX_CTRL               0x122
 #define MSR_IA32_TSCDEADLINE            0x6e0
 #define MSR_IA32_PKRS                   0x6e1
+#define MSR_RAPL_POWER_UNIT             0x00000606
+#define MSR_PKG_POWER_LIMIT             0x00000610
+#define MSR_PKG_ENERGY_STATUS           0x00000611
+#define MSR_PKG_POWER_INFO              0x00000614
 #define MSR_ARCH_LBR_CTL                0x000014ce
 #define MSR_ARCH_LBR_DEPTH              0x000014cf
 #define MSR_ARCH_LBR_FROM_0             0x00001500
@@ -1757,6 +1761,10 @@ typedef struct CPUArchState {
 
     uintptr_t retaddr;
 
+    /* RAPL MSR */
+    uint64_t msr_rapl_power_unit;
+    uint64_t msr_pkg_energy_status;
+
     /* Fields up to this point are cleared by a CPU reset */
     struct {} end_reset_fields;
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index de531842f6b1..c79d6b811109 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -16,11 +16,16 @@
 #include "qapi/qapi-events-run-state.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
+#include
+#include
 #include
 #include
 #include
+#include
+#include
 
 #include
+#include
 #include "standard-headers/asm-x86/kvm_para.h"
 #include "hw/xen/interface/arch-x86/cpuid.h"
 
@@ -35,6 +40,7 @@
 #include "xen-emu.h"
 #include "hyperv.h"
 #include "hyperv-proto.h"
+#include "vmsr_energy.h"
 
 #include "exec/gdbstub.h"
 #include "qemu/host-utils.h"
@@ -2518,6 +2524,49 @@ static bool kvm_rdmsr_core_thread_count(X86CPU *cpu, uint32_t msr,
     return true;
 }
 
+static bool kvm_rdmsr_rapl_power_unit(X86CPU *cpu, uint32_t msr,
+                                      uint64_t *val)
+{
+
+    CPUState *cs = CPU(cpu);
+
+    *val = cs->kvm_state->msr_energy.msr_unit;
+
+    return true;
+}
+
+static bool kvm_rdmsr_pkg_power_limit(X86CPU *cpu, uint32_t msr,
+                                      uint64_t *val)
+{
+
+    CPUState *cs = CPU(cpu);
+
+    *val = cs->kvm_state->msr_energy.msr_limit;
+
+    return true;
+}
+
+static bool kvm_rdmsr_pkg_power_info(X86CPU *cpu, uint32_t msr,
+                                     uint64_t *val)
+{
+
+    CPUState *cs = CPU(cpu);
+
+    *val = cs->kvm_state->msr_energy.msr_info;
+
+    return true;
+}
+
+static bool kvm_rdmsr_pkg_energy_status(X86CPU *cpu, uint32_t msr,
+                                        uint64_t *val)
+{
+
+    CPUState *cs = CPU(cpu);
+    *val = cs->kvm_state->msr_energy.msr_value[cs->cpu_index];
+
+    return true;
+}
+
 static Notifier smram_machine_done;
 static KVMMemoryListener smram_listener;
 static AddressSpace smram_address_space;
@@ -2552,6 +2601,190 @@ static void register_smram_listener(Notifier *n, void *unused)
                                  &smram_address_space, 1, "kvm-smram");
 }
 
+static void *kvm_msr_energy_thread(void *data)
+{
+    KVMState *s = data;
+    struct KVMMsrEnergy *vmsr = &s->msr_energy;
+    unsigned int maxpkgs, maxcpus, maxticks;
+    package_energy_stat *pkg_stat;
+    int num_threads;
+    thread_stat *thd_stat;
+    CPUState *cpu;
+    pid_t pid;
+
+    rcu_register_thread();
+
+    /* Get QEMU PID */
+    pid = getpid();
+
+    /* Assuming those values are the same across physical system/packages */
+    maxcpus = get_maxcpus(0); /* Number of CPUs per package */
+    maxpkgs = numa_max_node(); /* Highest package (NUMA node) index */
+    /* These MSR values are not expected to change either */
+    vmsr->msr_unit = read_msr(MSR_RAPL_POWER_UNIT, 0);
+    vmsr->msr_limit = read_msr(MSR_PKG_POWER_LIMIT, 0);
+    vmsr->msr_info = read_msr(MSR_PKG_POWER_INFO, 0);
+
+    /* Allocate memory for each package energy status */
+    pkg_stat = (package_energy_stat *) calloc(maxpkgs + 1,
+                                              sizeof(package_energy_stat));
+
+    /*
+     * Maximum number of ticks per package:
+     * time in seconds * ticks per second * number of CPUs per package
+     * e.g. 100 ticks/second/CPU with 12 CPUs per package gives 1200 ticks
+     */
+    maxticks = (MSR_ENERGY_THREAD_SLEEP_US / 1000000)
+                    * sysconf(_SC_CLK_TCK) * maxcpus;
+
+    while (true) {
+        /* Reset the per-package statistics, then get all QEMU thread IDs */
+        memset(pkg_stat, 0, (maxpkgs + 1) * sizeof(*pkg_stat));
+        pid_t *thread_ids = get_thread_ids(pid, &num_threads);
+
+        if (thread_ids == NULL) {
+            return NULL;
+        }
+
+        /* Allocate the statistics of each thread */
+        thd_stat = (thread_stat *) calloc(num_threads, sizeof(thread_stat));
+
+        /* Populate all the thread stats */
+        for (int i = 0; i < num_threads; i++) {
+            thd_stat[i].thread_id = thread_ids[i];
+            thd_stat[i].utime = calloc(2, sizeof(unsigned long long));
+            thd_stat[i].stime = calloc(2, sizeof(unsigned long long));
+            read_thread_stat(&thd_stat[i], pid, 0);
+            thd_stat[i].numa_node_id = numa_node_of_cpu(thd_stat[i].cpu_id);
+        }
+
+        /* Read the power plane energy counter of every package */
+        for (int i = 0; i <= maxpkgs; i++) {
+            for (int j = 0; j < num_threads; j++) {
+                /*
+                 * Read the package counter on the CPU this thread ran on
+                 */
+                if (thd_stat[j].numa_node_id == i) {
+                    pkg_stat[i].e_start =
+                        read_msr(MSR_PKG_ENERGY_STATUS, thd_stat[j].cpu_id);
+                    break;
+                }
+            }
+        }
+
+        /* Sleep a short period while the other threads are working */
+        usleep(MSR_ENERGY_THREAD_SLEEP_US);
+
+        /*
+         * Read the energy counter of every package again
+         * and compute the delta for each package
+         */
+        for (int i = 0; i <= maxpkgs; i++) {
+            for (int j = 0; j < num_threads; j++) {
+                /*
+                 * Use the first thread we found that ran on a CPU
+                 * of the package to read the package energy counter
+                 */
+                if (thd_stat[j].numa_node_id == i) {
+                    pkg_stat[i].e_end =
+                        read_msr(MSR_PKG_ENERGY_STATUS, thd_stat[j].cpu_id);
+                    pkg_stat[i].e_delta =
+                        pkg_stat[i].e_end - pkg_stat[i].e_start;
+                    break;
+                }
+            }
+        }
+
+        /* Delta of ticks spent by each thread between the two samples */
+        for (int i = 0; i < num_threads; i++) {
+            if (read_thread_stat(&thd_stat[i], pid, 1) != 0) {
+                /*
+                 * We don't count dead threads,
+                 * i.e. threads that existed before the sleep
+                 * but no longer exist
+                 */
+                thd_stat[i].delta_ticks = 0;
+            } else {
+                delta_ticks(thd_stat, i);
+            }
+        }
+
+        /*
+         * Identify the vCPU threads
+         * Calculate the number of vCPUs per package
+         */
+        CPU_FOREACH(cpu) {
+            for (int i = 0; i < num_threads; i++) {
+                if (cpu->thread_id == thd_stat[i].thread_id) {
+                    thd_stat[i].is_vcpu = true;
+                    thd_stat[i].vcpu_id = cpu->cpu_index;
+                    pkg_stat[thd_stat[i].numa_node_id].nb_vcpu++;
+                    break;
+                }
+            }
+        }
+
+        /* Calculate the total energy of all non-vCPU threads */
+        for (int i = 0; i < num_threads; i++) {
+            double temp;
+            if ((thd_stat[i].is_vcpu != true) &&
+                (thd_stat[i].delta_ticks > 0)) {
+                temp = get_ratio(pkg_stat, thd_stat, maxticks, i);
+                pkg_stat[thd_stat[i].numa_node_id].e_ratio
+                        += (uint64_t)lround(temp);
+            }
+        }
+
+        /* Split the non-vCPU energy of each package evenly among its vCPUs */
+        for (int i = 0; i <= maxpkgs; i++) {
+            if (pkg_stat[i].nb_vcpu > 0) {
+                pkg_stat[i].e_ratio = pkg_stat[i].e_ratio / pkg_stat[i].nb_vcpu;
+            }
+        }
+
+        /* Calculate the energy for each vCPU thread */
+        for (int i = 0; i < num_threads; i++) {
+            double temp;
+
+            if ((thd_stat[i].is_vcpu == true) &&
+                (thd_stat[i].delta_ticks > 0)) {
+                temp = get_ratio(pkg_stat, thd_stat, maxticks, i);
+                vmsr->msr_value[thd_stat[i].vcpu_id] += (uint64_t)lround(temp);
+                vmsr->msr_value[thd_stat[i].vcpu_id]
+                    += pkg_stat[thd_stat[i].numa_node_id].e_ratio;
+            }
+        }
+
+        /* Free all memory allocated in this round */
+        for (int i = 0; i < num_threads; i++) {
+            free(thd_stat[i].utime);
+            free(thd_stat[i].stime);
+        }
+        free(thd_stat);
+        free(thread_ids);
+    }
+
+    rcu_unregister_thread();
+    return NULL;
+}
+
+static int kvm_msr_energy_thread_init(KVMState *s, MachineState *ms)
+{
+    struct KVMMsrEnergy *r = &s->msr_energy;
+
+    /* Retrieve the number of vCPUs */
+    r->cpus = ms->smp.cpus;
+
+    /* Allocate register memory (MSR_PKG_ENERGY_STATUS) for each vCPU */
+    r->msr_value = calloc(r->cpus, sizeof(*r->msr_value));
+
+    qemu_thread_create(&r->msr_thr, "kvm-msr",
+                       kvm_msr_energy_thread,
+                       s, QEMU_THREAD_JOINABLE);
+
+    return 0;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     uint64_t identity_base = 0xfffbc000;
@@ -2765,6 +2998,46 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
                          strerror(-ret));
             exit(1);
         }
+
+        if (s->msr_energy.enable == true) {
+
+            r = kvm_filter_msr(s, MSR_RAPL_POWER_UNIT,
+                               kvm_rdmsr_rapl_power_unit, NULL);
+            if (!r) {
+                error_report("Could not install MSR_RAPL_POWER_UNIT "
+                             "handler: %s",
+                             strerror(-ret));
+                exit(1);
+            }
+
+            r = kvm_filter_msr(s, MSR_PKG_POWER_LIMIT,
+                               kvm_rdmsr_pkg_power_limit, NULL);
+            if (!r) {
+                error_report("Could not install MSR_PKG_POWER_LIMIT "
+                             "handler: %s",
+                             strerror(-ret));
+                exit(1);
+            }
+
+            r = kvm_filter_msr(s, MSR_PKG_POWER_INFO,
+                               kvm_rdmsr_pkg_power_info, NULL);
+            if (!r) {
+                error_report("Could not install MSR_PKG_POWER_INFO "
+                             "handler: %s",
+                             strerror(-ret));
+                exit(1);
+            }
+            r = kvm_filter_msr(s, MSR_PKG_ENERGY_STATUS,
+                               kvm_rdmsr_pkg_energy_status, NULL);
+            if (!r) {
+                error_report("Could not install MSR_PKG_ENERGY_STATUS "
+                             "handler: %s",
+                             strerror(-ret));
+                exit(1);
+            } else {
+                kvm_msr_energy_thread_init(s, ms);
+            }
+        }
     }
 
     return 0;
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index 322272091bce..9cdc93c6c439 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -5,6 +5,7 @@ i386_softmmu_kvm_ss = ss.source_set()
 i386_softmmu_kvm_ss.add(files(
   'kvm.c',
   'kvm-cpu.c',
+  'vmsr_energy.c',
 ))
 
 i386_softmmu_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen-emu.c'))
diff --git a/target/i386/kvm/vmsr_energy.c b/target/i386/kvm/vmsr_energy.c
new file mode 100644
index 000000000000..8bd86b32becf
--- /dev/null
+++ b/target/i386/kvm/vmsr_energy.c
@@ -0,0 +1,132 @@
+/*
+ * QEMU KVM support -- x86 virtual energy-related MSR.
+ *
+ * Copyright 2023 Red Hat, Inc. 2023
+ *
+ *  Author:
+ *      Anthony Harivel
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "vmsr_energy.h"
+
+#define MAX_PATH_LEN 50
+#define MAX_LINE_LEN 500
+
+uint64_t read_msr(uint32_t reg, unsigned int cpu_id)
+{
+    int fd;
+    uint64_t data;
+
+    char path[MAX_PATH_LEN];
+    snprintf(path, MAX_PATH_LEN, "/dev/cpu/%u/msr", cpu_id);
+
+    fd = open(path, O_RDONLY);
+    if (fd < 0) {
+        return 0;
+    }
+    if (pread(fd, &data, sizeof data, reg) != sizeof data) {
+        data = 0;
+    }
+
+    close(fd);
+    return data;
+}
+
+/* Retrieve the number of physical CPUs on the package */
+unsigned int get_maxcpus(unsigned int package_num)
+{
+    int k, ncpus;
+    unsigned int maxcpus;
+    struct bitmask *cpus;
+
+    cpus = numa_allocate_cpumask();
+    ncpus = cpus->size;
+
+    if (numa_node_to_cpus(package_num, cpus) < 0) {
+        printf("node %u failed to convert\n", package_num);
+    }
+
+    maxcpus = 0;
+    for (k = 0; k < ncpus; k++) {
+        if (numa_bitmask_isbitset(cpus, k)) {
+            maxcpus++;
+        }
+    }
+
+    return maxcpus;
+}
+
+int read_thread_stat(struct thread_stat *thread, int pid, int index)
+{
+    char path[MAX_PATH_LEN];
+    snprintf(path, MAX_PATH_LEN, "/proc/%u/task/%d/stat", pid,
+             thread->thread_id);
+
+    FILE *file = fopen(path, "r");
+    if (file == NULL) {
+        return -1;
+    }
+
+    int ret = fscanf(file, "%*d (%*[^)]) %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u"
+        " %llu %llu %*d %*d %*d %*d %*d %*d %*u %*u %*d %*u %*u"
+        " %*u %*u %*u %*u %*u %*u %*u %*u %*u %*d %*u %*u %u",
+        &thread->utime[index], &thread->stime[index], &thread->cpu_id);
+    fclose(file);
+
+    /* utime, stime and the last CPU (processor) must all be parsed */
+    return (ret == 3) ? 0 : -1;
+}
+
+/* Read the QEMU /proc task directory to retrieve all QEMU thread IDs */
+pid_t *get_thread_ids(pid_t pid, int *num_threads)
+{
+    char path[100];
+    sprintf(path, "/proc/%d/task", pid);
+
+    DIR *dir = opendir(path);
+    if (dir == NULL) {
+        perror("opendir");
+        return NULL;
+    }
+
+    pid_t *thread_ids = NULL;
+    int thread_count = 0;
+
+    struct dirent *ent;
+    while ((ent = readdir(dir)) != NULL) {
+        if (ent->d_name[0] == '.') {
+            continue;
+        }
+        pid_t tid = atoi(ent->d_name);
+        if (pid != tid) {
+            thread_ids = realloc(thread_ids,
+                                 (thread_count + 1) * sizeof(pid_t));
+            thread_ids[thread_count] = tid;
+            thread_count++;
+        }
+    }
+
+    closedir(dir);
+
+    *num_threads = thread_count;
+    return thread_ids;
+}
+
+void delta_ticks(thread_stat *thd_stat, int i)
+{
+    thd_stat[i].delta_ticks = (thd_stat[i].utime[1] + thd_stat[i].stime[1])
+                            - (thd_stat[i].utime[0] + thd_stat[i].stime[0]);
+}
+
+double get_ratio(package_energy_stat *pkg_stat,
+                 thread_stat *thd_stat,
+                 int maxticks, int i)
+{
+    return (pkg_stat[thd_stat[i].numa_node_id].e_delta / 100.0)
+           * ((100.0 / maxticks) * thd_stat[i].delta_ticks);
+}
+
diff --git a/target/i386/kvm/vmsr_energy.h b/target/i386/kvm/vmsr_energy.h
new file mode 100644
index 000000000000..5f79d2cbe00d
--- /dev/null
+++ b/target/i386/kvm/vmsr_energy.h
@@ -0,0 +1,80 @@
+/*
+ * QEMU KVM support -- x86 virtual energy-related MSR.
+ *
+ * Copyright 2023 Red Hat, Inc. 2023
+ *
+ *  Author:
+ *      Anthony Harivel
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef VMSR_ENERGY_H
+#define VMSR_ENERGY_H
+
+#include "qemu/osdep.h"
+
+#include
+
+/*
+ * Define the interval time in microseconds between 2 samples of
+ * energy related MSRs
+ */
+#define MSR_ENERGY_THREAD_SLEEP_US 1000000.0
+
+/*
+ * Thread statistic
+ * @ thread_id: TID (thread ID)
+ * @ is_vcpu: true if the thread is a vCPU thread
+ * @ cpu_id: CPU number last executed on
+ * @ vcpu_id: vCPU ID
+ * @ numa_node_id: NUMA node number of the CPU
+ * @ utime: amount of clock ticks the thread
+ *          has been scheduled in user mode
+ * @ stime: amount of clock ticks the thread
+ *          has been scheduled in system mode
+ * @ delta_ticks: delta of utime+stime between
+ *          the two samples (before/after sleep)
+ */
+struct thread_stat {
+    unsigned int thread_id;
+    bool is_vcpu;
+    unsigned int cpu_id;
+    unsigned int vcpu_id;
+    unsigned int numa_node_id;
+    unsigned long long *utime;
+    unsigned long long *stime;
+    unsigned long long delta_ticks;
+};
+
+/*
+ * Package statistic
+ * @ e_start: package energy counter before the sleep
+ * @ e_end: package energy counter after the sleep
+ * @ e_delta: delta of package energy counter
+ * @ e_ratio: energy of the non-vCPU threads, then its share per vCPU
+ * @ nb_vcpu: number of vCPUs running on this package
+ */
+struct packge_energy_stat {
+    uint64_t e_start;
+    uint64_t e_end;
+    uint64_t e_delta;
+    uint64_t e_ratio;
+    unsigned int nb_vcpu;
+};
+
+typedef struct thread_stat thread_stat;
+typedef struct packge_energy_stat package_energy_stat;
+
+uint64_t read_msr(uint32_t reg, unsigned int cpu_id);
+void delta_ticks(thread_stat *thd_stat, int i);
+unsigned int get_maxcpus(unsigned int package_num);
+int read_thread_stat(struct thread_stat *thread, int pid, int index);
+pid_t *get_thread_ids(pid_t pid, int *num_threads);
+double get_ratio(package_energy_stat *pkg_stat,
+                 thread_stat *thd_stat,
+                 int maxticks, int i);
+
+#endif /* VMSR_ENERGY_H */
-- 
2.40.1