From: Fernand Sieber
CC: Jan H. Schönherr
Subject: [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
Date: Mon, 1 Dec 2025 16:23:57 +0200
Message-ID: <20251201142359.344741-1-sieberf@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Jan H. Schönherr

It is possible to degrade host performance by manipulating performance counters from a VM and tricking the host hypervisor into enabling branch tracing. When the guest programs a CPU to track branch instructions and deliver an interrupt after exactly one branch instruction, the host KVM/perf subsystems incorrectly treat the value one as a special value that enables the branch trace store (BTS) subsystem. It should not be possible to enable BTS from a guest. Enabling BTS degrades performance for both the VMs and the host.

Perf considers the combination of PERF_COUNT_HW_BRANCH_INSTRUCTIONS with a sample_period of 1 a special case and handles it as a BTS event (see intel_pmu_has_bts_period()). This deviates from the usual semantics, where sample_period is the number of branch instructions to encounter before the overflow handler is invoked.

Nothing prevents a guest from programming its vPMU with the above settings (count branches, interrupt after one branch). This causes KVM to erroneously instruct perf, within pmc_reprogram_counter(), to create a BTS event, which does not have the desired semantics.

The guest could also do something more benign and request an interrupt after a more reasonable number of branch instructions via its vPMU. In that case, counting works initially. However, KVM occasionally pauses and resumes the created performance counters. If the remaining number of branch instructions until the interrupt has reached exactly 1, pmc_resume_counter() fails to resume the counter and a BTS event is created instead, with its incorrect semantics.

Fix this behavior by not passing the special value "1" as sample_period to perf.
Instead, perform the same quirk that happens later in x86_perf_event_set_period() anyway, when the performance counter is transferred to the actual PMU: bump the sample_period to 2.

Testing:

From the guest:

`./wrmsr -p 12 0x186 0x1100c4`
`./wrmsr -p 12 0xc1 0xffffffffffff`
`./wrmsr -p 12 0x186 0x5100c4`

This sequence does three things:

- It programs branch instruction counting with the counter still disabled.
- It initializes the counter so that it overflows after one event (0xffffffffffff).
- It then enables the counter (bit 22).

./wrmsr -p 12 0x186 0x1100c4
  Writes to IA32_PERFEVTSEL0 (0x186)
  Value 0x1100c4 breaks down as:
    Event = 0xC4 (Branch instructions)
    Bit 16: 1 (USR, count in user mode; bit 17, OS, is 0)
    Bit 20: 1 (INT, interrupt on overflow)
    Bit 22: 0 (EN, counter not yet enabled)

./wrmsr -p 12 0xc1 0xffffffffffff
  Writes to IA32_PMC0 (0xC1)
  Sets the counter to 0xffffffffffff (all ones for a 48-bit counter)
  This sets up the counter to overflow on the next branch

./wrmsr -p 12 0x186 0x5100c4
  Updates IA32_PERFEVTSEL0 again
  Same as the first command, but adds bit 22 (0x1 to 0x5 in bits 20-23)
  Enables the counter (bit 22, EN)

KVM traps these MSR writes and forwards them to the perf subsystem, which creates the corresponding monitoring events.

The problem can also be reproduced in a more realistic guest scenario:

`perf record -e branches:u -c 2 -a &`
`perf record -e branches:u -c 2 -a &`

This presumably triggers the issue because KVM pauses and resumes the performance counter at the wrong moment, when its value is about to overflow.

Signed-off-by: Jan H. Schönherr
Signed-off-by: Fernand Sieber
Reviewed-by: David Woodhouse
Reviewed-by: Hendrik Borghorst
Link: https://lore.kernel.org/r/20251124100220.238177-1-sieberf@amazon.com
---
 arch/x86/kvm/pmu.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 487ad19a236e..547512028e24 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -225,6 +225,19 @@ static u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
 {
 	u64 sample_period = (-counter_value) & pmc_bitmask(pmc);
 
+	/*
+	 * A sample_period of 1 might get mistaken by perf for a BTS event, see
+	 * intel_pmu_has_bts_period(). This would prevent re-arming the counter
+	 * via pmc_resume_counter(), followed by the accidental creation of an
+	 * actual BTS event, which we do not want.
+	 *
+	 * Avoid this by bumping the sampling period. Note, that we do not lose
+	 * any precision, because the same quirk happens later anyway (for
+	 * different reasons) in x86_perf_event_set_period().
+	 */
+	if (sample_period == 1)
+		sample_period = 2;
+
 	if (!sample_period)
 		sample_period = pmc_bitmask(pmc) + 1;
 	return sample_period;
-- 
2.43.0