From nobody Mon Feb 9 16:54:11 2026 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=reject dis=none) header.from=oracle.com ARC-Seal: i=1; a=rsa-sha256; t=1730713414; cv=none; d=zohomail.com; s=zohoarc; b=oGzVZ3AWHZD3Yoqi4+DdHkFYczIkMYs4kwdik2UE9r6RNyOyqZLg+1y/pdH/C0GiietzKXHPKU0S7vKt6xGPcytbPx3i1VFUOBkSOqmLKQGv75BHLAEe+wm2qUWpVkBXT1diqHB3bwSmRSouWqnYYbbbRltrMzyrxYZvAgg8GIA= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1730713414; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=Iclc0zYVet4nx97YIS/BP9FtQpieaSXD0nnk1nSLYTE=; b=Djzg60b/7HvszCgDrY4TpTU820thi09TT0pAfe/n7X9De+cctGOhYWRzXENStwLigKUwU1wQJBpPz3nQuAqaoflEjkdYT4x5+TEC2XfuXgd4mvp0egNzSay23n3oE0lcquQ4WTGCSCLV41VjgSLwoXNoMnMCVq/Ncl8MhS7Cm08= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=reject dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 173071341440538.61600410367976; Mon, 4 Nov 2024 01:43:34 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1t7tbM-0004hB-SN; Mon, 04 Nov 2024 04:42:25 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1t7tbG-0004gD-Iq for qemu-devel@nongnu.org; Mon, 04 Nov 2024 04:42:18 -0500 Received: from mx0b-00069f02.pphosted.com ([205.220.177.32]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1t7tbE-0000mo-JH for qemu-devel@nongnu.org; Mon, 04 Nov 2024 04:42:18 -0500 Received: from pps.filterd (m0333520.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 4A48fcC0018200; Mon, 4 Nov 2024 09:42:03 GMT Received: from iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta03.appoci.oracle.com [130.35.103.27]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 42nc4bt9m6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 04 Nov 2024 09:42:02 +0000 (GMT) Received: from pps.filterd (iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (8.18.1.2/8.18.1.2) with ESMTP id 4A47aUTJ009324; Mon, 4 Nov 2024 09:42:02 GMT Received: from pps.reinject (localhost [127.0.0.1]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTPS id 42nahbt0d1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 04 Nov 2024 09:42:02 +0000 Received: from iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 4A49fqfm018519; Mon, 4 Nov 2024 09:42:01 GMT Received: from localhost.localdomain (ca-dev80.us.oracle.com [10.211.9.80]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTP id 42nahbt06k-6; Mon, 04 Nov 2024 09:42:01 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=corp-2023-11-20; bh=Iclc0 zYVet4nx97YIS/BP9FtQpieaSXD0nnk1nSLYTE=; b=mhhDRL664Gemhe8XLOnwp y3kTWMYkGSJI+MiKWwFekrw1xofWdDq4XhuB7kEYIUGkSMUMIdjo5tgYsAzhU/1V J43KiAjRc5aUn3o/WRWbyyI2ESAEf2OSDunXQruP1FapEfKsbANLbrTMKx1Jr9L7 ZOkm0mFpT1GPwk7rJnm3k6jS3XtFbHaLdDTaJUUZBzKTX5YcwOP7Ki9NPGL8CSl+ sksldJHzNXmSn5jLRsaKWw7cY95kkLyavvUJDdQe9zGO0Q+Elsb4v0h8GnIaV9KL s/t3QqBTlr0LwEdeXJd8GqVG1oXD+CdipWzkreEIVB3Qy6Zot0jpl9p3eZHoZJNX g== From: Dongli Zhang To: qemu-devel@nongnu.org, kvm@vger.kernel.org Cc: pbonzini@redhat.com, mtosatti@redhat.com, sandipan.das@amd.com, babu.moger@amd.com, zhao1.liu@intel.com, likexu@tencent.com, like.xu.linux@gmail.com, zhenyuw@linux.intel.com, groug@kaod.org, lyan@digitalocean.com, khorenko@virtuozzo.com, alexander.ivanov@virtuozzo.com, den@virtuozzo.com, joe.jin@oracle.com, davydov-max@yandex-team.ru Subject: [PATCH 5/7] target/i386/kvm: reset AMD PMU registers during VM reset Date: Mon, 4 Nov 2024 01:40:20 -0800 Message-ID: <20241104094119.4131-6-dongli.zhang@oracle.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241104094119.4131-1-dongli.zhang@oracle.com> References: <20241104094119.4131-1-dongli.zhang@oracle.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1051,Hydra:6.0.680,FMLib:17.12.62.30 definitions=2024-11-04_07,2024-11-01_01,2024-09-30_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 phishscore=0 mlxlogscore=999 mlxscore=0 suspectscore=0 bulkscore=0 malwarescore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2409260000 definitions=main-2411040085 X-Proofpoint-ORIG-GUID: LGLvVXHmiIkrbiWy_YSBI4nhmM2P3gqK X-Proofpoint-GUID: LGLvVXHmiIkrbiWy_YSBI4nhmM2P3gqK Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=205.220.177.32; envelope-from=dongli.zhang@oracle.com; helo=mx0b-00069f02.pphosted.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @oracle.com) X-ZM-MESSAGEID: 1730713416559116600 Content-Type: text/plain; charset="utf-8" QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM and kvm_put_msrs() to restore them to KVM. However, there is no support for AMD PMU registers. Currently, has_pmu_version and num_pmu_gp_counters are initialized based on cpuid(0xa), which does not apply to AMD processors. For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers is determined based on the CPU version. To address this issue, we need to add support for AMD PMU registers. Without this support, the following problems can arise: 1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while running "perf top", the PMU registers are not disabled properly. 2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs() does not handle AMD PMU registers, causing some PMU events to remain enabled in KVM. 3. The KVM kvm_pmc_speculative_in_use() function consistently returns true, preventing the reclamation of these events. Consequently, the kvm_pmc->perf_event remains active. 4. After a reboot, the VM kernel may report the following error: [ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detect= ed, complain to your hardware vendor. [ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR= c0010200 is 530076) 5. In the worst case, the active kvm_pmc->perf_event may inject unknown NMIs randomly into the VM kernel: [...] Uhhuh. NMI received for unknown reason 30 on CPU 0. To resolve these issues, we propose resetting AMD PMU registers during the VM reset process. Signed-off-by: Dongli Zhang --- target/i386/cpu.h | 8 +++ target/i386/kvm/kvm.c | 156 +++++++++++++++++++++++++++++++++++++++++- 2 files changed, 161 insertions(+), 3 deletions(-) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 59959b8b7a..0505eb3b08 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -488,6 +488,14 @@ typedef enum X86Seg { #define MSR_CORE_PERF_GLOBAL_CTRL 0x38f #define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390 =20 +#define MSR_K7_EVNTSEL0 0xc0010000 +#define MSR_K7_PERFCTR0 0xc0010004 +#define MSR_F15H_PERF_CTL0 0xc0010200 +#define MSR_F15H_PERF_CTR0 0xc0010201 + +#define AMD64_NUM_COUNTERS 4 +#define AMD64_NUM_COUNTERS_CORE 6 + #define MSR_MC0_CTL 0x400 #define MSR_MC0_STATUS 0x401 #define MSR_MC0_ADDR 0x402 diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index ca2b644e2c..83ec85a9b9 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -2035,7 +2035,7 @@ full: abort(); } =20 -static void kvm_init_pmu_info(CPUX86State *env) +static void kvm_init_pmu_info_intel(CPUX86State *env) { uint32_t eax, edx; uint32_t unused; @@ -2072,6 +2072,80 @@ static void kvm_init_pmu_info(CPUX86State *env) } } =20 +static void kvm_init_pmu_info_amd(CPUX86State *env) +{ + int64_t family; + + has_pmu_version =3D 0; + + /* + * To determine the CPU family, the following code is derived from + * x86_cpuid_version_get_family(). + */ + family =3D (env->cpuid_version >> 8) & 0xf; + if (family =3D=3D 0xf) { + family +=3D (env->cpuid_version >> 20) & 0xff; + } + + /* + * Performance-monitoring supported from K7 and later. + */ + if (family < 6) { + return; + } + + has_pmu_version =3D 1; + + if (!(env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_PERFCORE)) { + num_pmu_gp_counters =3D AMD64_NUM_COUNTERS; + return; + } + + num_pmu_gp_counters =3D AMD64_NUM_COUNTERS_CORE; +} + +static bool is_same_vendor(CPUX86State *env) +{ + static uint32_t host_cpuid_vendor1; + static uint32_t host_cpuid_vendor2; + static uint32_t host_cpuid_vendor3; + + host_cpuid(0x0, 0, NULL, &host_cpuid_vendor1, &host_cpuid_vendor3, + &host_cpuid_vendor2); + + return env->cpuid_vendor1 =3D=3D host_cpuid_vendor1 && + env->cpuid_vendor2 =3D=3D host_cpuid_vendor2 && + env->cpuid_vendor3 =3D=3D host_cpuid_vendor3; +} + +static void kvm_init_pmu_info(CPUX86State *env) +{ + /* + * It is not supported to virtualize AMD PMU registers on Intel + * processors, nor to virtualize Intel PMU registers on AMD processors. + */ + if (!is_same_vendor(env)) { + return; + } + + /* + * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to + * disable the AMD pmu virtualization. + * + * If KVM_CAP_PMU_CAPABILITY is supported, kvm_state->pmu_cap_disabled + * indicates the KVM has already disabled the pmu virtualization. + */ + if (kvm_state->pmu_cap_disabled) { + return; + } + + if (IS_INTEL_CPU(env)) { + kvm_init_pmu_info_intel(env); + } else if (IS_AMD_CPU(env)) { + kvm_init_pmu_info_amd(env); + } +} + int kvm_arch_init_vcpu(CPUState *cs) { struct { @@ -4027,7 +4101,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level) kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control= _msr); } =20 - if (has_pmu_version > 0) { + if (IS_INTEL_CPU(env) && has_pmu_version > 0) { if (has_pmu_version > 1) { /* Stop the counter. */ kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0); @@ -4058,6 +4132,38 @@ static int kvm_put_msrs(X86CPU *cpu, int level) env->msr_global_ctrl); } } + + if (IS_AMD_CPU(env) && has_pmu_version > 0) { + uint32_t sel_base =3D MSR_K7_EVNTSEL0; + uint32_t ctr_base =3D MSR_K7_PERFCTR0; + /* + * The address of the next selector or counter register is + * obtained by incrementing the address of the current selector + * or counter register by one. + */ + uint32_t step =3D 1; + + /* + * When PERFCORE is enabled, AMD PMU uses a separate set of + * addresses for the selector and counter registers. + * Additionally, the address of the next selector or counter + * register is determined by incrementing the address of the + * current register by two. + */ + if (num_pmu_gp_counters =3D=3D AMD64_NUM_COUNTERS_CORE) { + sel_base =3D MSR_F15H_PERF_CTL0; + ctr_base =3D MSR_F15H_PERF_CTR0; + step =3D 2; + } + + for (i =3D 0; i < num_pmu_gp_counters; i++) { + kvm_msr_entry_add(cpu, ctr_base + i * step, + env->msr_gp_counters[i]); + kvm_msr_entry_add(cpu, sel_base + i * step, + env->msr_gp_evtsel[i]); + } + } + /* * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-= add, * only sync them to KVM on the first cpu @@ -4503,7 +4609,8 @@ static int kvm_get_msrs(X86CPU *cpu) if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_POLL_CONTROL)) { kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1); } - if (has_pmu_version > 0) { + + if (IS_INTEL_CPU(env) && has_pmu_version > 0) { if (has_pmu_version > 1) { kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0); kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0); @@ -4519,6 +4626,35 @@ static int kvm_get_msrs(X86CPU *cpu) } } =20 + if (IS_AMD_CPU(env) && has_pmu_version > 0) { + uint32_t sel_base =3D MSR_K7_EVNTSEL0; + uint32_t ctr_base =3D MSR_K7_PERFCTR0; + /* + * The address of the next selector or counter register is + * obtained by incrementing the address of the current selector + * or counter register by one. + */ + uint32_t step =3D 1; + + /* + * When PERFCORE is enabled, AMD PMU uses a separate set of + * addresses for the selector and counter registers. + * Additionally, the address of the next selector or counter + * register is determined by incrementing the address of the + * current register by two. + */ + if (num_pmu_gp_counters =3D=3D AMD64_NUM_COUNTERS_CORE) { + sel_base =3D MSR_F15H_PERF_CTL0; + ctr_base =3D MSR_F15H_PERF_CTR0; + step =3D 2; + } + + for (i =3D 0; i < num_pmu_gp_counters; i++) { + kvm_msr_entry_add(cpu, ctr_base + i * step, 0); + kvm_msr_entry_add(cpu, sel_base + i * step, 0); + } + } + if (env->mcg_cap) { kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0); kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0); @@ -4830,6 +4966,20 @@ static int kvm_get_msrs(X86CPU *cpu) case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1: env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] =3D msrs[i].data; break; + case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + 3: + env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] =3D msrs[i].data; + break; + case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + 3: + env->msr_gp_counters[index - MSR_K7_PERFCTR0] =3D msrs[i].data; + break; + case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTL0 + 0xb: + index =3D index - MSR_F15H_PERF_CTL0; + if (index & 0x1) { + env->msr_gp_counters[index] =3D msrs[i].data; + } else { + env->msr_gp_evtsel[index] =3D msrs[i].data; + } + break; case HV_X64_MSR_HYPERCALL: env->msr_hv_hypercall =3D msrs[i].data; break; --=20 2.39.3