From nobody Sun Feb 8 00:47:17 2026
From: Sean Christopherson
Date: Fri, 23 Sep 2022 00:13:52 +0000
Subject: [PATCH 1/4] KVM: x86/pmu: Force reprogramming of all counters on PMU filter change
Message-ID: <20220923001355.3741194-2-seanjc@google.com>
In-Reply-To: <20220923001355.3741194-1-seanjc@google.com>
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Aaron Lewis, Like Xu, Wanpeng Li

Force vCPUs to reprogram all counters on a PMU filter change to provide
a sane ABI for userspace.  Use the existing KVM_REQ_PMU to do the
programming, and take advantage of the fact that the reprogram_pmi
bitmap fits in a u64 to set all bits in a single atomic update.  Note,
setting the bitmap and making the request needs to be done _after_ the
SRCU synchronization to ensure that vCPUs will reprogram using the new
filter.

KVM's current "lazy" approach is confusing and non-deterministic.
It's confusing because, from a developer perspective, the code is buggy
as it makes zero sense to let userspace modify the filter but then not
actually enforce the new filter.  The lazy approach is non-deterministic
because KVM enforces the filter whenever a counter is reprogrammed, not
just on guest WRMSRs, i.e. a guest might gain/lose access to an event at
random times depending on what is going on in the host.

Note, the resulting behavior is still non-deterministic while the filter
is in flux.  If userspace wants to guarantee deterministic behavior, all
vCPUs should be paused during the filter update.

Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter")
Cc: Aaron Lewis
Cc: Jim Mattson
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h | 11 ++++++++++-
 arch/x86/kvm/pmu.c              | 15 +++++++++++++--
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b3ce723efb43..462f041ede9f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -519,7 +519,16 @@ struct kvm_pmu {
 	struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
 	struct kvm_pmc fixed_counters[KVM_PMC_MAX_FIXED];
 	struct irq_work irq_work;
-	DECLARE_BITMAP(reprogram_pmi, X86_PMC_IDX_MAX);
+
+	/*
+	 * Overlay the bitmap with a 64-bit atomic so that all bits can be
+	 * set in a single access, e.g. to reprogram all counters when the PMU
+	 * filter changes.
+	 */
+	union {
+		DECLARE_BITMAP(reprogram_pmi, X86_PMC_IDX_MAX);
+		atomic64_t __reprogram_pmi;
+	};
 	DECLARE_BITMAP(all_valid_pmc_idx, X86_PMC_IDX_MAX);
 	DECLARE_BITMAP(pmc_in_use, X86_PMC_IDX_MAX);
 
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index d9b9a0f0db17..4504987cbbe2 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -577,6 +577,8 @@ EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_pmu_event_filter tmp, *filter;
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
 	size_t size;
 	int r;
 
@@ -613,9 +615,18 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 	mutex_lock(&kvm->lock);
 	filter = rcu_replace_pointer(kvm->arch.pmu_event_filter, filter,
 				     mutex_is_locked(&kvm->lock));
-	mutex_unlock(&kvm->lock);
-
 	synchronize_srcu_expedited(&kvm->srcu);
+
+	BUILD_BUG_ON(sizeof(((struct kvm_pmu *)0)->reprogram_pmi) >
+		     sizeof(((struct kvm_pmu *)0)->__reprogram_pmi));
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		atomic64_set(&vcpu_to_pmu(vcpu)->__reprogram_pmi, -1ull);
+
+	kvm_make_all_cpus_request(kvm, KVM_REQ_PMU);
+
+	mutex_unlock(&kvm->lock);
+
 	r = 0;
cleanup:
 	kfree(filter);
-- 
2.37.3.998.g577e59143f-goog
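[Editor's note: the union overlay above can be illustrated outside the kernel. The following is a hypothetical user-space sketch, not KVM code: plain stores stand in for atomic64_set(), and the bitmap is fixed at 64 bits rather than X86_PMC_IDX_MAX. It shows how one 64-bit write marks every counter for reprogramming instead of one set_bit() per counter.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the kvm_pmu change: the per-counter "reprogram"
 * bitmap is overlaid with a single 64-bit word so that all bits can be
 * set in one store.  Atomicity is elided in this sketch.
 */
union reprogram_bits {
	unsigned long bitmap[64 / (8 * sizeof(unsigned long))];
	uint64_t word;	/* stands in for atomic64_t __reprogram_pmi */
};

/* Mirrors atomic64_set(&pmu->__reprogram_pmi, -1ull) on filter change. */
static void set_all(union reprogram_bits *b)
{
	b->word = ~0ULL;
}

/* Helper for checking the result: the word after set_all(). */
static uint64_t all_set_value(void)
{
	union reprogram_bits b = { .word = 0 };

	set_all(&b);
	return b.word;
}
```

The design point is that a single aligned 64-bit store is atomic with respect to per-bit readers, so no per-counter loop is needed on the filter-change path.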
From nobody Sun Feb 8 00:47:17 2026
From: Sean Christopherson
Date: Fri, 23 Sep 2022 00:13:53 +0000
Subject: [PATCH 2/4] KVM: x86/pmu: Clear "reprogram" bit if counter is disabled or disallowed
Message-ID: <20220923001355.3741194-3-seanjc@google.com>
In-Reply-To: <20220923001355.3741194-1-seanjc@google.com>
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Aaron Lewis, Like Xu, Wanpeng Li

When reprogramming a counter, clear the counter's "reprogram pending"
bit if the counter is disabled (by the guest) or is disallowed (by the
userspace filter).  In both cases, there's no need to re-attempt
programming on the next coincident KVM_REQ_PMU as enabling the counter
by either method will trigger reprogramming.
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/pmu.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 4504987cbbe2..4cd99320019b 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -150,9 +150,9 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
 	__kvm_perf_overflow(pmc, true);
 }
 
-static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
-				  u64 config, bool exclude_user,
-				  bool exclude_kernel, bool intr)
+static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
+				 bool exclude_user, bool exclude_kernel,
+				 bool intr)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 	struct perf_event *event;
@@ -204,14 +204,14 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 	if (IS_ERR(event)) {
 		pr_debug_ratelimited("kvm_pmu: event creation failed %ld for pmc->idx = %d\n",
 				     PTR_ERR(event), pmc->idx);
-		return;
+		return PTR_ERR(event);
 	}
 
 	pmc->perf_event = event;
 	pmc_to_pmu(pmc)->event_count++;
-	clear_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi);
 	pmc->is_paused = false;
 	pmc->intr = intr || pebs;
+	return 0;
 }
 
 static void pmc_pause_counter(struct kvm_pmc *pmc)
@@ -245,7 +245,6 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
 	perf_event_enable(pmc->perf_event);
 	pmc->is_paused = false;
 
-	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
 	return true;
 }
 
@@ -303,10 +302,10 @@ void reprogram_counter(struct kvm_pmc *pmc)
 	pmc_pause_counter(pmc);
 
 	if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
-		return;
+		goto reprogram_complete;
 
 	if (!check_pmu_event_filter(pmc))
-		return;
+		goto reprogram_complete;
 
 	if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
 		printk_once("kvm pmu: pin control bit is ignored\n");
@@ -324,16 +323,27 @@ void reprogram_counter(struct kvm_pmc *pmc)
 	}
 
 	if (pmc->current_config == new_config && pmc_resume_counter(pmc))
-		return;
+		goto reprogram_complete;
 
 	pmc_release_perf_event(pmc);
 
 	pmc->current_config = new_config;
-	pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
-			      (eventsel & pmu->raw_event_mask),
-			      !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
-			      !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
-			      eventsel & ARCH_PERFMON_EVENTSEL_INT);
+
+	/*
+	 * If reprogramming fails, e.g. due to contention, leave the counter's
+	 * reprogram bit set, i.e. opportunistically try again on the next PMU
+	 * refresh.  Don't make a new request as doing so can stall the guest
+	 * if reprogramming repeatedly fails.
+	 */
+	if (pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
+				  (eventsel & pmu->raw_event_mask),
+				  !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
+				  !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
+				  eventsel & ARCH_PERFMON_EVENTSEL_INT))
+		return;
+
+reprogram_complete:
+	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
 }
 EXPORT_SYMBOL_GPL(reprogram_counter);
 
-- 
2.37.3.998.g577e59143f-goog
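[Editor's note: the control flow this patch establishes can be modeled compactly. Below is a hypothetical stand-alone C sketch; the toy_* names are invented and none of this is KVM code. It captures the rule: the pending bit is cleared on every *completed* path (counter disabled, counter disallowed by the filter, or programming succeeded) and left set only when programming fails, so the next KVM_REQ_PMU retries opportunistically.]

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the reworked reprogram_counter() exit paths. */
struct toy_pmc {
	int idx;
	int enabled;		/* pmc_speculative_in_use() && pmc_is_enabled() */
	int allowed;		/* check_pmu_event_filter() */
	int program_ok;		/* pmc_reprogram_counter() outcome */
	uint64_t *pending;	/* shared reprogram_pmi word */
};

static int toy_program(const struct toy_pmc *pmc)
{
	return pmc->program_ok ? 0 : -1;	/* 0 on success, like the patch */
}

static void toy_reprogram(struct toy_pmc *pmc)
{
	if (!pmc->enabled)
		goto reprogram_complete;
	if (!pmc->allowed)
		goto reprogram_complete;
	if (toy_program(pmc))
		return;		/* failure: leave the bit set, retry later */
reprogram_complete:
	*pmc->pending &= ~(1ULL << pmc->idx);
}

/* Returns the pending word after one reprogram attempt on counter 0. */
static uint64_t toy_run(int enabled, int allowed, int program_ok)
{
	uint64_t pending = 1;	/* bit 0 set, as if requested */
	struct toy_pmc pmc = { 0, enabled, allowed, program_ok, &pending };

	toy_reprogram(&pmc);
	return pending;
}
```

In the failure case the bit survives so a later KVM_REQ_PMU can retry; in all other cases it is cleared exactly once, on the single exit label.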
From nobody Sun Feb 8 00:47:17 2026
From: Sean Christopherson
Date: Fri, 23 Sep 2022 00:13:54 +0000
Message-ID: <20220923001355.3741194-4-seanjc@google.com>
In-Reply-To: <20220923001355.3741194-1-seanjc@google.com>
Subject: [PATCH 3/4] KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Aaron Lewis, Like Xu, Wanpeng Li

From: Like Xu

Batch reprogramming PMU counters by setting KVM_REQ_PMU and thus
deferring reprogramming to kvm_pmu_handle_event() to avoid reprogramming
a counter multiple times during a single VM-Exit.

Deferring programming will also allow KVM to fix a bug where immediately
reprogramming a counter can result in sleeping (taking a mutex) while
interrupts are disabled in the VM-Exit fastpath.

Introduce kvm_pmu_request_counter_reprogam() to make it obvious that KVM
is _requesting_ a reprogram and not actually doing the reprogram.

Opportunistically refine related comments to avoid misunderstandings.

Signed-off-by: Like Xu
Link: https://lore.kernel.org/r/20220831085328.45489-5-likexu@tencent.com
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/pmu.c              | 17 ++++++++++++-----
 arch/x86/kvm/pmu.h              |  6 +++++-
 arch/x86/kvm/svm/pmu.c          |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c    |  6 +++---
 5 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 462f041ede9f..12dcfc9330e7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -493,6 +493,7 @@ struct kvm_pmc {
 	struct perf_event *perf_event;
 	struct kvm_vcpu *vcpu;
 	/*
+	 * only for creating or reusing perf_event,
 	 * eventsel value for general purpose counters,
 	 * ctrl value for fixed counters.
 	 */
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 4cd99320019b..d8330e6064ab 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -101,7 +101,11 @@ static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 	bool skip_pmi = false;
 
-	/* Ignore counters that have been reprogrammed already. */
+	/*
+	 * Ignore overflow events for counters that are scheduled to be
+	 * reprogrammed, e.g. if a PMI for the previous event races with KVM's
+	 * handling of a related guest WRMSR.
+	 */
 	if (test_and_set_bit(pmc->idx, pmu->reprogram_pmi))
 		return;
 
@@ -292,7 +296,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 	return allow_event;
 }
 
-void reprogram_counter(struct kvm_pmc *pmc)
+static void reprogram_counter(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 	u64 eventsel = pmc->eventsel;
@@ -345,7 +349,6 @@ void reprogram_counter(struct kvm_pmc *pmc)
reprogram_complete:
 	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
 }
-EXPORT_SYMBOL_GPL(reprogram_counter);
 
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
 {
@@ -355,10 +358,11 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
 	for_each_set_bit(bit, pmu->reprogram_pmi, X86_PMC_IDX_MAX) {
 		struct kvm_pmc *pmc = static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, bit);
 
-		if (unlikely(!pmc || !pmc->perf_event)) {
+		if (unlikely(!pmc)) {
 			clear_bit(bit, pmu->reprogram_pmi);
 			continue;
 		}
+
 		reprogram_counter(pmc);
 	}
 
@@ -552,12 +556,15 @@ static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
 static inline bool cpl_is_matched(struct kvm_pmc *pmc)
 {
 	bool select_os, select_user;
-	u64 config = pmc->current_config;
+	u64 config;
 
 	if (pmc_is_gp(pmc)) {
+		config = pmc->eventsel;
 		select_os = config & ARCH_PERFMON_EVENTSEL_OS;
 		select_user = config & ARCH_PERFMON_EVENTSEL_USR;
 	} else {
+		config = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl,
+					  pmc->idx - INTEL_PMC_IDX_FIXED);
 		select_os = config & 0x1;
 		select_user = config & 0x2;
 	}
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 5cc5721f260b..85ff3c0588ba 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -183,7 +183,11 @@ static inline void kvm_init_pmu_capability(void)
 					     KVM_PMC_MAX_FIXED);
 }
 
-void reprogram_counter(struct kvm_pmc *pmc);
+static inline void kvm_pmu_request_counter_reprogam(struct kvm_pmc *pmc)
+{
+	set_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi);
+	kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
+}
 
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b68956299fa8..041aa898e1bc 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -159,7 +159,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		data &= ~pmu->reserved_bits;
 		if (data != pmc->eventsel) {
 			pmc->eventsel = data;
-			reprogram_counter(pmc);
+			kvm_pmu_request_counter_reprogam(pmc);
 		}
 		return 0;
 	}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 25b70a85bef5..e38518afc265 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -52,7 +52,7 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 		pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i);
 
 		__set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
-		reprogram_counter(pmc);
+		kvm_pmu_request_counter_reprogam(pmc);
 	}
 }
 
@@ -76,7 +76,7 @@ static void reprogram_counters(struct kvm_pmu *pmu, u64 diff)
 	for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX) {
 		pmc = intel_pmc_idx_to_pmc(pmu, bit);
 		if (pmc)
-			reprogram_counter(pmc);
+			kvm_pmu_request_counter_reprogam(pmc);
 	}
 }
 
@@ -477,7 +477,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			reserved_bits ^= HSW_IN_TX_CHECKPOINTED;
 		if (!(data & reserved_bits)) {
 			pmc->eventsel = data;
-			reprogram_counter(pmc);
+			kvm_pmu_request_counter_reprogam(pmc);
 			return 0;
 		}
 	} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, false))
-- 
2.37.3.998.g577e59143f-goog
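[Editor's note: the set-bit-plus-request batching that this patch introduces can be sketched as a hypothetical stand-alone C model; the toy_* names are invented, not KVM code. Writers only mark the counter and raise the request, and the expensive reprogram runs once per "exit", however many MSR writes happened in between.]

```c
#include <assert.h>
#include <stdint.h>

struct toy_vcpu {
	uint64_t reprogram_pmi;	/* one bit per counter */
	int req_pmu;		/* stand-in for KVM_REQ_PMU */
	int reprograms;		/* how many real reprograms ran */
};

/* Mirrors kvm_pmu_request_counter_reprogam(): cheap, callable anywhere. */
static void request_reprogram(struct toy_vcpu *v, int idx)
{
	v->reprogram_pmi |= 1ULL << idx;
	v->req_pmu = 1;
}

/* Mirrors kvm_pmu_handle_event(): runs once before guest re-entry. */
static void handle_event(struct toy_vcpu *v)
{
	int i;

	if (!v->req_pmu)
		return;
	v->req_pmu = 0;
	for (i = 0; i < 64; i++) {
		if (v->reprogram_pmi & (1ULL << i)) {
			v->reprogram_pmi &= ~(1ULL << i);
			v->reprograms++;	/* the expensive part */
		}
	}
}

static int demo(void)
{
	struct toy_vcpu v = { 0, 0, 0 };

	/* Three writes to the same counter's eventsel before re-entry... */
	request_reprogram(&v, 3);
	request_reprogram(&v, 3);
	request_reprogram(&v, 3);
	handle_event(&v);
	return v.reprograms;	/* ...cost a single reprogram */
}
```

The bit acts as a natural dedup: repeated requests for the same counter collapse into one unit of work at the next handling point.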
From nobody Sun Feb 8 00:47:17 2026
From: Sean Christopherson
Date: Fri, 23 Sep 2022 00:13:55 +0000
Subject: [PATCH 4/4] KVM: x86/pmu: Defer counter emulated overflow via pmc->prev_counter
Message-ID: <20220923001355.3741194-5-seanjc@google.com>
In-Reply-To: <20220923001355.3741194-1-seanjc@google.com>
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Aaron Lewis, Like Xu, Wanpeng Li

From: Like Xu

Defer reprogramming counters and handling overflow via KVM_REQ_PMU when
incrementing counters.  KVM skips emulated WRMSR in the VM-Exit
fastpath, the fastpath runs with IRQs disabled, skipping instructions
can increment and reprogram counters, reprogramming counters can sleep,
and sleeping is disallowed while IRQs are disabled.
 [*] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
 [*] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2981888, name: CPU 15/KVM
 [*] preempt_count: 1, expected: 0
 [*] RCU nest depth: 0, expected: 0
 [*] INFO: lockdep is turned off.
 [*] irq event stamp: 0
 [*] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
 [*] hardirqs last disabled at (0): [] copy_process+0x146a/0x62d0
 [*] softirqs last  enabled at (0): [] copy_process+0x14a9/0x62d0
 [*] softirqs last disabled at (0): [<0000000000000000>] 0x0
 [*] Preemption disabled at:
 [*] [] vcpu_enter_guest+0x1001/0x3dc0 [kvm]
 [*] CPU: 17 PID: 2981888 Comm: CPU 15/KVM Kdump: 5.19.0-rc1-g239111db364c-dirty #2
 [*] Call Trace:
 [*]
 [*]  dump_stack_lvl+0x6c/0x9b
 [*]  __might_resched.cold+0x22e/0x297
 [*]  __mutex_lock+0xc0/0x23b0
 [*]  perf_event_ctx_lock_nested+0x18f/0x340
 [*]  perf_event_pause+0x1a/0x110
 [*]  reprogram_counter+0x2af/0x1490 [kvm]
 [*]  kvm_pmu_trigger_event+0x429/0x950 [kvm]
 [*]  kvm_skip_emulated_instruction+0x48/0x90 [kvm]
 [*]  handle_fastpath_set_msr_irqoff+0x349/0x3b0 [kvm]
 [*]  vmx_vcpu_run+0x268e/0x3b80 [kvm_intel]
 [*]  vcpu_enter_guest+0x1d22/0x3dc0 [kvm]

Add a field to kvm_pmc to track the previous counter value in order to
defer overflow detection to kvm_pmu_handle_event() (the counter must be
paused before handling overflow, and that may increment the counter).

Opportunistically shrink sizeof(struct kvm_pmc) a bit.
Suggested-by: Wanpeng Li
Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
Signed-off-by: Like Xu
Link: https://lore.kernel.org/r/20220831085328.45489-6-likexu@tencent.com
[sean: avoid re-triggering KVM_REQ_PMU on overflow, tweak changelog]
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  5 +++--
 arch/x86/kvm/pmu.c              | 32 ++++++++++++++++----------------
 arch/x86/kvm/svm/pmu.c          |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c    |  4 ++--
 4 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 12dcfc9330e7..9639404f2856 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -488,7 +488,10 @@ enum pmc_type {
 struct kvm_pmc {
 	enum pmc_type type;
 	u8 idx;
+	bool is_paused;
+	bool intr;
 	u64 counter;
+	u64 prev_counter;
 	u64 eventsel;
 	struct perf_event *perf_event;
 	struct kvm_vcpu *vcpu;
@@ -498,8 +501,6 @@ struct kvm_pmc {
 	 * ctrl value for fixed counters.
 	 */
 	u64 current_config;
-	bool is_paused;
-	bool intr;
 };
 
 #define KVM_PMC_MAX_FIXED	3
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index d8330e6064ab..935c9d80ab50 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -101,14 +101,6 @@ static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 	bool skip_pmi = false;
 
-	/*
-	 * Ignore overflow events for counters that are scheduled to be
-	 * reprogrammed, e.g. if a PMI for the previous event races with KVM's
-	 * handling of a related guest WRMSR.
-	 */
-	if (test_and_set_bit(pmc->idx, pmu->reprogram_pmi))
-		return;
-
 	if (pmc->perf_event && pmc->perf_event->attr.precise_ip) {
 		if (!in_pmi) {
 			/*
@@ -126,7 +118,6 @@ static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 	} else {
 		__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
 	}
-	kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 
 	if (!pmc->intr || skip_pmi)
 		return;
@@ -151,7 +142,17 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
 {
 	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
 
+	/*
+	 * Ignore overflow events for counters that are scheduled to be
+	 * reprogrammed, e.g. if a PMI for the previous event races with KVM's
+	 * handling of a related guest WRMSR.
+	 */
+	if (test_and_set_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi))
+		return;
+
 	__kvm_perf_overflow(pmc, true);
+
+	kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 }
 
 static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
@@ -311,6 +312,9 @@ static void reprogram_counter(struct kvm_pmc *pmc)
 	if (!check_pmu_event_filter(pmc))
 		goto reprogram_complete;
 
+	if (pmc->counter < pmc->prev_counter)
+		__kvm_perf_overflow(pmc, false);
+
 	if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
 		printk_once("kvm pmu: pin control bit is ignored\n");
 
@@ -348,6 +352,7 @@ static void reprogram_counter(struct kvm_pmc *pmc)
 
reprogram_complete:
 	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
+	pmc->prev_counter = 0;
 }
 
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
@@ -536,14 +541,9 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
 
 static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
 {
-	u64 prev_count;
-
-	prev_count = pmc->counter;
+	pmc->prev_counter = pmc->counter;
 	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
-
-	reprogram_counter(pmc);
-	if (pmc->counter < prev_count)
-		__kvm_perf_overflow(pmc, false);
+	kvm_pmu_request_counter_reprogam(pmc);
 }
 
 static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 041aa898e1bc..2ec420b85d6a 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -211,7 +211,7 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
 		struct kvm_pmc *pmc = &pmu->gp_counters[i];
 
 		pmc_stop_counter(pmc);
-		pmc->counter = pmc->eventsel = 0;
+		pmc->counter = pmc->prev_counter = pmc->eventsel = 0;
 	}
 }
 
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index e38518afc265..1bf5d4b00296 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -647,14 +647,14 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
 		pmc = &pmu->gp_counters[i];
 
 		pmc_stop_counter(pmc);
-		pmc->counter = pmc->eventsel = 0;
+		pmc->counter = pmc->prev_counter = pmc->eventsel = 0;
 	}
 
 	for (i = 0; i < KVM_PMC_MAX_FIXED; i++) {
 		pmc = &pmu->fixed_counters[i];
 
 		pmc_stop_counter(pmc);
-		pmc->counter = 0;
+		pmc->counter = pmc->prev_counter = 0;
 	}
 
 	pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status = 0;
-- 
2.37.3.998.g577e59143f-goog
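[Editor's note: the prev_counter mechanism above can be illustrated with a hypothetical stand-alone C model; the toy_* names are invented and the counter width is fixed at 48 bits for the example. The increment path only records state and would raise KVM_REQ_PMU; overflow is detected later by comparing counter against prev_counter, which is exactly what lets the fastpath avoid sleeping.]

```c
#include <assert.h>
#include <stdint.h>

struct toy_pmc {
	uint64_t counter;
	uint64_t prev_counter;
	uint64_t mask;		/* stands in for pmc_bitmask(): counter width */
	int overflowed;
};

/* Mirrors kvm_pmu_incr_counter(): safe with IRQs disabled, no sleeping. */
static void toy_incr_counter(struct toy_pmc *pmc)
{
	pmc->prev_counter = pmc->counter;
	pmc->counter = (pmc->counter + 1) & pmc->mask;
	/* kvm_pmu_request_counter_reprogam() would be called here */
}

/* Mirrors the deferred check in reprogram_counter(). */
static void toy_handle_overflow(struct toy_pmc *pmc)
{
	if (pmc->counter < pmc->prev_counter)
		pmc->overflowed = 1;	/* __kvm_perf_overflow(pmc, false) */
	pmc->prev_counter = 0;
}

static int overflows_at_wrap(void)
{
	struct toy_pmc pmc = {
		.counter = 0xffffffffffffULL,	/* 48-bit counter at max */
		.mask = 0xffffffffffffULL,
	};

	toy_incr_counter(&pmc);		/* wraps to 0 */
	toy_handle_overflow(&pmc);	/* deferred, e.g. at next PMU event */
	return pmc.overflowed;
}
```

Because a wrapped counter is strictly smaller than its previous value, the single comparison is sufficient, and resetting prev_counter to 0 on completion prevents a stale value from faking an overflow on the next pass.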