From nobody Fri Nov 29 00:53:51 2024 Date: Thu, 26 Sep 2024 01:34:49 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-2-jthoughton@google.com> Subject: [PATCH v7 01/18] KVM: Remove kvm_handle_hva_range helper functions From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" kvm_handle_hva_range is only used by the young notifiers. In a later patch, it will be even further tied to the young notifiers. Instead of renaming kvm_handle_hva_range to something like kvm_handle_hva_range_young, simply remove kvm_handle_hva_range. This seems slightly more readable, though there is slightly more code duplication. Finally, rename __kvm_handle_hva_range to kvm_handle_hva_range, now that the name is available. 
Suggested-by: David Matlack Signed-off-by: James Houghton --- virt/kvm/kvm_main.c | 81 +++++++++++++++++++++------------------------ 1 file changed, 37 insertions(+), 44 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d51357fd28d7..090e79e4304f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -589,8 +589,8 @@ static void kvm_null_fn(void) node; \ node =3D interval_tree_iter_next(node, start, last)) \ =20 -static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm, - const struct kvm_mmu_notifier_range *range) +static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm, + const struct kvm_mmu_notifier_range *range) { struct kvm_mmu_notifier_return r =3D { .ret =3D false, @@ -666,42 +666,6 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_r= ange(struct kvm *kvm, return r; } =20 -static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn, - unsigned long start, - unsigned long end, - gfn_handler_t handler) -{ - struct kvm *kvm =3D mmu_notifier_to_kvm(mn); - const struct kvm_mmu_notifier_range range =3D { - .start =3D start, - .end =3D end, - .handler =3D handler, - .on_lock =3D (void *)kvm_null_fn, - .flush_on_ret =3D true, - .may_block =3D false, - }; - - return __kvm_handle_hva_range(kvm, &range).ret; -} - -static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifi= er *mn, - unsigned long start, - unsigned long end, - gfn_handler_t handler) -{ - struct kvm *kvm =3D mmu_notifier_to_kvm(mn); - const struct kvm_mmu_notifier_range range =3D { - .start =3D start, - .end =3D end, - .handler =3D handler, - .on_lock =3D (void *)kvm_null_fn, - .flush_on_ret =3D false, - .may_block =3D false, - }; - - return __kvm_handle_hva_range(kvm, &range).ret; -} - void kvm_mmu_invalidate_begin(struct kvm *kvm) { lockdep_assert_held_write(&kvm->mmu_lock); @@ -794,7 +758,7 @@ static int kvm_mmu_notifier_invalidate_range_start(stru= ct mmu_notifier *mn, * that guest memory has been 
reclaimed. This needs to be done *after* * dropping mmu_lock, as x86's reclaim path is slooooow. */ - if (__kvm_handle_hva_range(kvm, &hva_range).found_memslot) + if (kvm_handle_hva_range(kvm, &hva_range).found_memslot) kvm_arch_guest_memory_reclaimed(kvm); =20 return 0; @@ -840,7 +804,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struc= t mmu_notifier *mn, }; bool wake; =20 - __kvm_handle_hva_range(kvm, &hva_range); + kvm_handle_hva_range(kvm, &hva_range); =20 /* Pairs with the increment in range_start(). */ spin_lock(&kvm->mn_invalidate_lock); @@ -862,9 +826,19 @@ static int kvm_mmu_notifier_clear_flush_young(struct m= mu_notifier *mn, unsigned long start, unsigned long end) { + struct kvm *kvm =3D mmu_notifier_to_kvm(mn); + const struct kvm_mmu_notifier_range range =3D { + .start =3D start, + .end =3D end, + .handler =3D kvm_age_gfn, + .on_lock =3D (void *)kvm_null_fn, + .flush_on_ret =3D true, + .may_block =3D false, + }; + trace_kvm_age_hva(start, end); =20 - return kvm_handle_hva_range(mn, start, end, kvm_age_gfn); + return kvm_handle_hva_range(kvm, &range).ret; } =20 static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, @@ -872,6 +846,16 @@ static int kvm_mmu_notifier_clear_young(struct mmu_not= ifier *mn, unsigned long start, unsigned long end) { + struct kvm *kvm =3D mmu_notifier_to_kvm(mn); + const struct kvm_mmu_notifier_range range =3D { + .start =3D start, + .end =3D end, + .handler =3D kvm_age_gfn, + .on_lock =3D (void *)kvm_null_fn, + .flush_on_ret =3D false, + .may_block =3D false, + }; + trace_kvm_age_hva(start, end); =20 /* @@ -887,17 +871,26 @@ static int kvm_mmu_notifier_clear_young(struct mmu_no= tifier *mn, * cadence. If we find this inaccurate, we might come up with a * more sophisticated heuristic later. 
*/ - return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn); + return kvm_handle_hva_range(kvm, &range).ret; } =20 static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, struct mm_struct *mm, unsigned long address) { + struct kvm *kvm =3D mmu_notifier_to_kvm(mn); + const struct kvm_mmu_notifier_range range =3D { + .start =3D address, + .end =3D address + 1, + .handler =3D kvm_test_age_gfn, + .on_lock =3D (void *)kvm_null_fn, + .flush_on_ret =3D false, + .may_block =3D false, + }; + trace_kvm_test_age_hva(address); =20 - return kvm_handle_hva_range_no_flush(mn, address, address + 1, - kvm_test_age_gfn); + return kvm_handle_hva_range(kvm, &range).ret; } =20 static void kvm_mmu_notifier_release(struct mmu_notifier *mn, --=20 2.46.0.792.g87dc391469-goog
From nobody Fri Nov 29 00:53:51 2024 Date: Thu, 26 Sep 2024 01:34:50 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-3-jthoughton@google.com> Subject: [PATCH v7 02/18] KVM: Add lockless memslot walk to KVM From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"
Provide flexibility to the architecture to synchronize as optimally as they can instead of always taking the MMU lock for writing. Architectures that do their own locking must select CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS. The immediate application is to allow architectures to implement the test/clear_young MMU notifiers more cheaply.
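For illustration only (not part of this patch): a minimal userspace sketch of the conditional locking described above. All names here (struct walker, struct walk_range, walk(), is_odd()) are hypothetical, and a C11 atomic_flag spinlock merely stands in for mmu_lock. When range->lockless is set, the walker skips the lock and trusts the handler to synchronize on its own.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kvm; the flag plays the role of mmu_lock. */
struct walker {
	atomic_flag lock;
	unsigned int lock_acquisitions;	/* counts how often walk() took the lock */
};

/* Hypothetical stand-in for struct kvm_mmu_notifier_range. */
struct walk_range {
	unsigned long start;
	unsigned long end;
	bool (*handler)(unsigned long addr);
	bool lockless;	/* true: the handler does its own synchronization */
};

/* Run the handler on every address in [start, end), locking only if asked. */
static bool walk(struct walker *w, const struct walk_range *range)
{
	bool ret = false;
	unsigned long addr;

	if (!range->lockless) {
		while (atomic_flag_test_and_set_explicit(&w->lock,
							 memory_order_acquire))
			;	/* spin until the lock is free */
		w->lock_acquisitions++;
	}

	for (addr = range->start; addr < range->end; addr++)
		ret |= range->handler(addr);

	if (!range->lockless)
		atomic_flag_clear_explicit(&w->lock, memory_order_release);

	return ret;
}

/* Example handler: reports whether the address is odd. */
static bool is_odd(unsigned long addr)
{
	return addr & 1;
}
```

In this sketch a caller that sets .lockless = true never touches the lock in walk(), which mirrors how the young notifiers behave once CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS is selected; every other caller takes the lock exactly as before.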
Suggested-by: Yu Zhao Signed-off-by: James Houghton --- include/linux/kvm_host.h | 1 + virt/kvm/Kconfig | 3 +++ virt/kvm/kvm_main.c | 28 +++++++++++++++++++++------- 3 files changed, 25 insertions(+), 7 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index b23c6d48392f..98a987e88578 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -266,6 +266,7 @@ struct kvm_gfn_range { gfn_t end; union kvm_mmu_notifier_arg arg; bool may_block; + bool lockless; }; bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range); diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index fd6a3010afa8..58d896b2f4ed 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -100,6 +100,9 @@ config KVM_GENERIC_MMU_NOTIFIER select MMU_NOTIFIER bool =20 +config KVM_MMU_NOTIFIER_YOUNG_LOCKLESS + bool + config KVM_GENERIC_MEMORY_ATTRIBUTES depends on KVM_GENERIC_MMU_NOTIFIER bool diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 090e79e4304f..7d5b35cfc1ed 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -555,6 +555,7 @@ struct kvm_mmu_notifier_range { on_lock_fn_t on_lock; bool flush_on_ret; bool may_block; + bool lockless; }; =20 /* @@ -609,6 +610,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_ran= ge(struct kvm *kvm, IS_KVM_NULL_FN(range->handler))) return r; =20 + /* on_lock will never be called for lockless walks */ + if (WARN_ON_ONCE(range->lockless && !IS_KVM_NULL_FN(range->on_lock))) + return r; + idx =3D srcu_read_lock(&kvm->srcu); =20 for (i =3D 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) { @@ -640,15 +645,18 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_ra= nge(struct kvm *kvm, gfn_range.start =3D hva_to_gfn_memslot(hva_start, slot); gfn_range.end =3D hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot); gfn_range.slot =3D slot; + gfn_range.lockless =3D range->lockless; =20 if (!r.found_memslot) { r.found_memslot =3D true; - 
KVM_MMU_LOCK(kvm); - if (!IS_KVM_NULL_FN(range->on_lock)) - range->on_lock(kvm); - - if (IS_KVM_NULL_FN(range->handler)) - goto mmu_unlock; + if (!range->lockless) { + KVM_MMU_LOCK(kvm); + if (!IS_KVM_NULL_FN(range->on_lock)) + range->on_lock(kvm); + + if (IS_KVM_NULL_FN(range->handler)) + goto mmu_unlock; + } } r.ret |=3D range->handler(kvm, &gfn_range); } @@ -658,7 +666,7 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_rang= e(struct kvm *kvm, kvm_flush_remote_tlbs(kvm); =20 mmu_unlock: - if (r.found_memslot) + if (r.found_memslot && !range->lockless) KVM_MMU_UNLOCK(kvm); =20 srcu_read_unlock(&kvm->srcu, idx); @@ -834,6 +842,8 @@ static int kvm_mmu_notifier_clear_flush_young(struct mm= u_notifier *mn, .on_lock =3D (void *)kvm_null_fn, .flush_on_ret =3D true, .may_block =3D false, + .lockless =3D + IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS), }; =20 trace_kvm_age_hva(start, end); @@ -854,6 +864,8 @@ static int kvm_mmu_notifier_clear_young(struct mmu_noti= fier *mn, .on_lock =3D (void *)kvm_null_fn, .flush_on_ret =3D false, .may_block =3D false, + .lockless =3D + IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS), }; =20 trace_kvm_age_hva(start, end); @@ -886,6 +898,8 @@ static int kvm_mmu_notifier_test_young(struct mmu_notif= ier *mn, .on_lock =3D (void *)kvm_null_fn, .flush_on_ret =3D false, .may_block =3D false, + .lockless =3D + IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS), }; =20 trace_kvm_test_age_hva(address); --=20 2.46.0.792.g87dc391469-goog
From nobody Fri Nov 29 00:53:51 2024 Date: Thu, 26 Sep 2024 01:34:51 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-4-jthoughton@google.com> Subject: [PATCH v7 03/18] KVM: x86/mmu: Factor out spte atomic bit clearing routine From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes ,
James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This new function, tdp_mmu_clear_spte_bits_atomic(), will be used in a follow-up patch to enable lockless Accessed and R/W/X bit clearing. Signed-off-by: James Houghton --- arch/x86/kvm/mmu/tdp_iter.h | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index 2880fd392e0c..ec171568487c 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -25,6 +25,13 @@ static inline u64 kvm_tdp_mmu_write_spte_atomic(tdp_ptep= _t sptep, u64 new_spte) return xchg(rcu_dereference(sptep), new_spte); } =20 +static inline u64 tdp_mmu_clear_spte_bits_atomic(tdp_ptep_t sptep, u64 mas= k) +{ + atomic64_t *sptep_atomic =3D (atomic64_t *)rcu_dereference(sptep); + + return (u64)atomic64_fetch_and(~mask, sptep_atomic); +} + static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte) { KVM_MMU_WARN_ON(is_ept_ve_possible(new_spte)); @@ -65,10 +72,8 @@ static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t spt= ep, u64 old_spte, { atomic64_t *sptep_atomic; =20 - if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) { - sptep_atomic =3D (atomic64_t *)rcu_dereference(sptep); - return (u64)atomic64_fetch_and(~mask, sptep_atomic); - } + if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) + return tdp_mmu_clear_spte_bits_atomic(sptep, mask); =20 __kvm_tdp_mmu_write_spte(sptep, old_spte & ~mask); return old_spte; --=20 2.46.0.792.g87dc391469-goog
From nobody Fri Nov 29 00:53:51 2024 Date: Thu, 26 Sep 2024 01:34:52 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email
2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-5-jthoughton@google.com> Subject: [PATCH v7 04/18] KVM: x86/mmu: Relax locking for kvm_test_age_gfn and kvm_age_gfn From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Walk the TDP MMU in an RCU read-side critical section without holding mmu_lock when harvesting and potentially updating age information on sptes. This requires a way to do RCU-safe walking of the tdp_mmu_roots; do this with a new macro. The PTE modifications are now done atomically, and kvm_tdp_mmu_spte_need_atomic_write() has been updated to account for the fact that kvm_age_gfn can now locklessly update the Accessed bit and the W/R/X bits. If the cmpxchg for marking the spte for access tracking fails, leave it as is and treat it as young: if the spte is being actively modified, it is most likely young. Harvesting age information from the shadow MMU is still done while holding the MMU write lock. Suggested-by: Yu Zhao Signed-off-by: James Houghton --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/mmu/mmu.c | 10 ++++-- arch/x86/kvm/mmu/tdp_iter.h | 14 ++++---- arch/x86/kvm/mmu/tdp_mmu.c | 57 ++++++++++++++++++++++++--------- 5 files changed, 58 insertions(+), 25 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 46e0a466d7fb..adc814bad4bb 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1454,6 +1454,7 @@ struct kvm_arch { * tdp_mmu_page set.
* * For reads, this list is protected by: + * RCU alone or * the MMU lock in read mode + RCU or * the MMU lock in write mode * diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index faed96e33e38..3928e9b2d84a 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -23,6 +23,7 @@ config KVM depends on X86_LOCAL_APIC select KVM_COMMON select KVM_GENERIC_MMU_NOTIFIER + select KVM_MMU_NOTIFIER_YOUNG_LOCKLESS select HAVE_KVM_IRQCHIP select HAVE_KVM_PFNCACHE select HAVE_KVM_DIRTY_RING_TSO diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 0d94354bb2f8..355a66c26517 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1649,8 +1649,11 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_ran= ge *range) { bool young =3D false; =20 - if (kvm_memslots_have_rmaps(kvm)) + if (kvm_memslots_have_rmaps(kvm)) { + write_lock(&kvm->mmu_lock); young =3D kvm_rmap_age_gfn_range(kvm, range, false); + write_unlock(&kvm->mmu_lock); + } =20 if (tdp_mmu_enabled) young |=3D kvm_tdp_mmu_age_gfn_range(kvm, range); @@ -1662,8 +1665,11 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gf= n_range *range) { bool young =3D false; =20 - if (kvm_memslots_have_rmaps(kvm)) + if (kvm_memslots_have_rmaps(kvm)) { + write_lock(&kvm->mmu_lock); young =3D kvm_rmap_age_gfn_range(kvm, range, true); + write_unlock(&kvm->mmu_lock); + } =20 if (tdp_mmu_enabled) young |=3D kvm_tdp_mmu_test_age_gfn(kvm, range); diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index ec171568487c..510936a8455a 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -39,10 +39,11 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t = sptep, u64 new_spte) } =20 /* - * SPTEs must be modified atomically if they are shadow-present, leaf - * SPTEs, and have volatile bits, i.e. has bits that can be set outside - * of mmu_lock. The Writable bit can be set by KVM's fast page fault - * handler, and Accessed and Dirty bits can be set by the CPU. 
+ * SPTEs must be modified atomically if they have bits that can be set out= side + * of the mmu_lock. This can happen for any shadow-present leaf SPTEs, as = the + * Writable bit can be set by KVM's fast page fault handler, the Accessed = and + * Dirty bits can be set by the CPU, and the Accessed and R/X bits can be + * cleared by age_gfn_range. * * Note, non-leaf SPTEs do have Accessed bits and those bits are * technically volatile, but KVM doesn't consume the Accessed bit of @@ -53,8 +54,7 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sp= tep, u64 new_spte) static inline bool kvm_tdp_mmu_spte_need_atomic_write(u64 old_spte, int le= vel) { return is_shadow_present_pte(old_spte) && - is_last_spte(old_spte, level) && - spte_has_volatile_bits(old_spte); + is_last_spte(old_spte, level); } =20 static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte, @@ -70,8 +70,6 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep= , u64 old_spte, static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte, u64 mask, int level) { - atomic64_t *sptep_atomic; - if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) return tdp_mmu_clear_spte_bits_atomic(sptep, mask); =20 diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 3b996c1fdaab..4477201c2d53 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -178,6 +178,15 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct k= vm *kvm, ((_only_valid) && (_root)->role.invalid))) { \ } else =20 +/* + * Iterate over all TDP MMU roots in an RCU read-side critical section. 
+ */ +#define for_each_valid_tdp_mmu_root_rcu(_kvm, _root, _as_id) \ + list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link) \ + if ((_as_id >=3D 0 && kvm_mmu_page_as_id(_root) !=3D _as_id) || \ + (_root)->role.invalid) { \ + } else + #define for_each_tdp_mmu_root(_kvm, _root, _as_id) \ __for_each_tdp_mmu_root(_kvm, _root, _as_id, false) =20 @@ -1222,6 +1231,26 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(s= truct kvm *kvm, return ret; } =20 +static __always_inline bool kvm_tdp_mmu_handle_gfn_lockless(struct kvm *kv= m, + struct kvm_gfn_range *range, + tdp_handler_t handler) +{ + struct kvm_mmu_page *root; + struct tdp_iter iter; + bool ret =3D false; + + rcu_read_lock(); + + for_each_valid_tdp_mmu_root_rcu(kvm, root, range->slot->as_id) { + tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) + ret |=3D handler(kvm, &iter, range); + } + + rcu_read_unlock(); + + return ret; +} + /* * Mark the SPTEs range of GFNs [start, end) unaccessed and return non-zero * if any of the GFNs in the range have been accessed. @@ -1240,23 +1269,21 @@ static bool age_gfn_range(struct kvm *kvm, struct t= dp_iter *iter, return false; =20 if (spte_ad_enabled(iter->old_spte)) { - iter->old_spte =3D tdp_mmu_clear_spte_bits(iter->sptep, - iter->old_spte, - shadow_accessed_mask, - iter->level); + iter->old_spte =3D tdp_mmu_clear_spte_bits_atomic(iter->sptep, + shadow_accessed_mask); new_spte =3D iter->old_spte & ~shadow_accessed_mask; } else { - /* - * Capture the dirty status of the page, so that it doesn't get - * lost when the SPTE is marked for access tracking. - */ + new_spte =3D mark_spte_for_access_track(iter->old_spte); + if (__tdp_mmu_set_spte_atomic(iter, new_spte)) + /* + * The cmpxchg failed. Even if we had cleared the + * Accessed bit, it likely would have been set again, + * so this spte is probably young. 
+ */ + return true; + if (is_writable_pte(iter->old_spte)) kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte)); - - new_spte =3D mark_spte_for_access_track(iter->old_spte); - iter->old_spte =3D kvm_tdp_mmu_write_spte(iter->sptep, - iter->old_spte, new_spte, - iter->level); } =20 trace_kvm_tdp_mmu_spte_changed(iter->as_id, iter->gfn, iter->level, @@ -1266,7 +1293,7 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp= _iter *iter, =20 bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *rang= e) { - return kvm_tdp_mmu_handle_gfn(kvm, range, age_gfn_range); + return kvm_tdp_mmu_handle_gfn_lockless(kvm, range, age_gfn_range); } =20 static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter, @@ -1277,7 +1304,7 @@ static bool test_age_gfn(struct kvm *kvm, struct tdp_= iter *iter, =20 bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { - return kvm_tdp_mmu_handle_gfn(kvm, range, test_age_gfn); + return kvm_tdp_mmu_handle_gfn_lockless(kvm, range, test_age_gfn); } =20 /* --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C9DA418133C for ; Thu, 26 Sep 2024 01:35:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314523; cv=none; b=Gjd5kHaMipquF/U76m+ejUJh8S9X1nk8vyaburjnCEp/axspagsSD2oabowO45DbP6v6lVIY705egeK0uBud118ntnR3QNHZWe5+c+hJ6ge0GDvVY+ZPvwB/RByRH1zu/BL9u2Cw/UE1f8+smEmxCi0p1JzSbb7b7N/42lcoHgU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314523; c=relaxed/simple; bh=MSP16v22spRDZgdygkuPyo99YM6MsaYwZUFeeUwYgi0=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
b=alGgkjuovq/EZhG+RqwtQ6MlKxbD/qUtemZmcZuEejdYcO13I2+plJ9LW38zMejBMoE30wMUgfuhgUAXe15+JAmNOLvzZiyPo2f0OLjYNgPDBTdOXGDLqqQm5fCxIhneFEipX6KPZEMD6ohkr8KOoeV19XrKWuZJXvfbAa4O2ck= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=jdNQFy2N; arc=none smtp.client-ip=209.85.219.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="jdNQFy2N" Received: by mail-yb1-f201.google.com with SMTP id 3f1490d57ef6-e0353b731b8so688954276.2 for ; Wed, 25 Sep 2024 18:35:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314521; x=1727919321; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=dAogBQT7i8/Ut4GrJolXC+/DUS73zZR+MbirDcTLTRg=; b=jdNQFy2NwTQ/OEl/hYfrxySWsR8l5S0fLcHy36u1gHEPeSSqeOIURi7gCyXgLwvZpc AQO1EgdhVH3Wyclw+tAf6dHCbiRyIfIkv8PSQUMww8F7cTGmOPQrYag7f3ktOutu/fG9 Ka5UoTl5Ll9GXwft750H3YchpS0SzknkFW/yX71291unvDA/ifaiGSfi03T7dFhYlg7l 6ydSFpeOKNqgE6lsec/GLG7e9oKL2XPyGQ+vPRpZErDtqYk6GWnjSAEUVgoCnnejuHm6 D3Y9FWxHPs9CFnd7czVolKBHWenxIu/YR/fBx+Nn8dSyp8HrqhX34PgGc8fcyvzLphHZ L/dQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314521; x=1727919321; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=dAogBQT7i8/Ut4GrJolXC+/DUS73zZR+MbirDcTLTRg=; b=RhWTcGDbPFsS1Rw2HwvVsKQOQR8jp8jCJzKRcpih8nqFdOxUzWBJ+0eMi5XGV2fAt9 
AAwWrzCccdNsEesi8QMDC4JdlL57H4lGPF6YgdBoE1L8H2JriQ1JipiPeu1rLUekyssJ HJkv4zAikskKUWrahR6hapTV1XkAl7X6LpGcFZGAxkVIvUkfcb9oyOEoVFvqCXyotg80 QC6wzFvBJL2JBGAqJyW5MWsWlhcAj96b0kBNjWcN4hIqXId5lnzuuQcGJ3fzax7TjTpy LX28+HdPhLyWlraWZD9ZLwaLvehi5rigd1gfrLWjLD132PAVvlgudSFVVvu43oWbtqHG GwPQ== X-Forwarded-Encrypted: i=1; AJvYcCWad55uH4CS4UGT0PLu59S0ctwsBbBb0G3cXvX8LVl1FTW1fAcaGVw3khmfXH2oGH05oTGAN8VepdnZoFE=@vger.kernel.org X-Gm-Message-State: AOJu0YyxBF5DUpzVFxabhv3TzFXIBzZnsuV6OSbyoddczRXR/ovzxm4K DuYqh9z+TGnptEgdQytG7amEhNmOGX1eoQyrzX1O9goz+54yNugpFhdyFS1FmABSCw5/73AwW6l SzKAzD6jV8HxzZ1zkOg== X-Google-Smtp-Source: AGHT+IFQs3DNBWoE9LT9ciS2x6yC/841ISG2URP+wJo+4FsamoM+oSoqTqpMkGycanRDnHNwxgVEwB1O11ztgMfN X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a05:6902:1782:b0:e11:7a38:8883 with SMTP id 3f1490d57ef6-e24d9613a90mr3400276.7.1727314520739; Wed, 25 Sep 2024 18:35:20 -0700 (PDT) Date: Thu, 26 Sep 2024 01:34:53 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-6-jthoughton@google.com> Subject: [PATCH v7 05/18] KVM: x86/mmu: Rearrange kvm_{test_,}age_gfn From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Reorder the TDP MMU check to be first for both kvm_test_age_gfn and kvm_age_gfn. 
For kvm_test_age_gfn, this allows us to completely avoid needing to grab the MMU lock when the TDP MMU reports that the page is young. Do the same for kvm_age_gfn merely for consistency. Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 355a66c26517..03df592284ac 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1649,15 +1649,15 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_ra= nge *range) { bool young =3D false; =20 + if (tdp_mmu_enabled) + young =3D kvm_tdp_mmu_age_gfn_range(kvm, range); + if (kvm_memslots_have_rmaps(kvm)) { write_lock(&kvm->mmu_lock); - young =3D kvm_rmap_age_gfn_range(kvm, range, false); + young |=3D kvm_rmap_age_gfn_range(kvm, range, false); write_unlock(&kvm->mmu_lock); } =20 - if (tdp_mmu_enabled) - young |=3D kvm_tdp_mmu_age_gfn_range(kvm, range); - return young; } =20 @@ -1665,15 +1665,15 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_g= fn_range *range) { bool young =3D false; =20 - if (kvm_memslots_have_rmaps(kvm)) { + if (tdp_mmu_enabled) + young =3D kvm_tdp_mmu_test_age_gfn(kvm, range); + + if (!young && kvm_memslots_have_rmaps(kvm)) { write_lock(&kvm->mmu_lock); - young =3D kvm_rmap_age_gfn_range(kvm, range, true); + young |=3D kvm_rmap_age_gfn_range(kvm, range, true); write_unlock(&kvm->mmu_lock); } =20 - if (tdp_mmu_enabled) - young |=3D kvm_tdp_mmu_test_age_gfn(kvm, range); - return young; } =20 --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-qt1-f201.google.com (mail-qt1-f201.google.com [209.85.160.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E4AB0187FF9 for ; Thu, 26 Sep 2024 01:35:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.201 ARC-Seal: 
i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314524; cv=none; b=K9ZAnwEKT2L1gJSGaO1/ElF2+qYGhQLdQU8oLYOhiFnEEOxKq0Jd53sqB4FraxvKziHnF4K3+RtPtzbAxFVukrNp2e33iPRmhBXi4AfXHlJqPlpQT3yEFXtHDhlwHY77MbXxV+Ff3cJJYy3/xBgan3pUWtLUbSeFXqJ2OPY9MjU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314524; c=relaxed/simple; bh=hkxJqdv7VrfnIe/UVtmf3L4Q0b/v0J4ZLYo2Ee8Rysg=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=GXZ8teoZ6D9YXdb5eiGY3wbDeaMwq2BVygdxj8mJqdypIsyw02P9JHFvqg96PDMgE6sWhgu/OF81vWoypiBdMpSA/PSvFEWRb38KJKsPl/uswB4GvM2qdCU5pJZoXAgmgYRf1rpciu9gP8NRgk7GZgGyaWJFpPTi3MUeVu6LEic= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=jNXTG1Op; arc=none smtp.client-ip=209.85.160.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="jNXTG1Op" Received: by mail-qt1-f201.google.com with SMTP id d75a77b69052e-4582a8b6376so9736931cf.1 for ; Wed, 25 Sep 2024 18:35:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314522; x=1727919322; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ZiB+e++yi01uCPO64xcLvipIH38DTgOpYlfSBrXkWpM=; b=jNXTG1OpLevsGbcR+MqpfVMZwgPXRmUJxePMjPFbYIVYVnzjIa+2vHGoF1izPxKCyt 9SVTXm1FZnY3ncNWznXj4uBLnlXbdn2GeYtWZWpSXKkcI5H3SRsxERO9Xp+TVqY0h7S+ AK30Cd/4CwnroUusiuT3bDUODwiI7Aiujsa4He4egeauG62IKDi7Z8XpvvvANOBOstGE 
ab8DFXXdhao4wNIwzqzYKiYMkXuvTcCXsOegUdstL1z6VfFEkPiNlo8fthIxpWJKkQWD +P/bZkinHQZmCIy1Tla4re1ec/smNsh3DEB6tzVDDAm5dWHrZMZaRzhmuv5d7cQ9ZxIL pklQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314522; x=1727919322; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ZiB+e++yi01uCPO64xcLvipIH38DTgOpYlfSBrXkWpM=; b=TbLtyccSdV1xomfXnhr331l4AWLlEbnYsrSKDN3HCR5sL25wO2Kunz2hZAVPTxq2LZ cJqynARN/qRaykbZ8pyISxvTC1U6R+8527TVOIUZdnGESO09j8yacYeWP1cIzlr07uXE jPGMvf5VptZWAtJIUiCI+bob8us+Xd0ifSedXFVlYFh0F/qnU5paI9qQ9dxjUNSe+X9l hZQDzipA5tETpKEBzfAy6oH8VpSIPubEFbDKkLoS6QapZ4hxCEZ7bOqFOrEmsavkgOgE YOoQuAH23BhQ14Z8ZYip+iyDL0R6ctXGik6sK37cxgnxyW1CgXvV2F0Hyc7A4fqx+1Ko H0Yg== X-Forwarded-Encrypted: i=1; AJvYcCVgsSjRs2WCOghHlE9DdnhX9ETUJmBgLg1+xXkzrwQeFcfEGi4bYN1/tq3LW+i2gvxM+jHen3oLuXjMh8k=@vger.kernel.org X-Gm-Message-State: AOJu0Yy861TkPndJH59VIi+jdl8mwVZiC/+CZxK//+h2iPmrABUWbN79 2pJwOWlR/YCPCT5co1Y3MfPLKutmK/k2CvdJMjBoi1/2A7tavp4ItS0zii9vPZu25j/pYANWdgP qdqQlHAvO2dnhJVCmfw== X-Google-Smtp-Source: AGHT+IEij2jZTZ7Qfz4w3meuEhq8eeslauQ2yZtvTlXKLk5ZsUhbavG3fPazfw188lkIpQjxnlBFPCvmtl1c33sQ X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:ac8:43cf:0:b0:458:4a61:2020 with SMTP id d75a77b69052e-45c9493ec3dmr13541cf.2.1727314521861; Wed, 25 Sep 2024 18:35:21 -0700 (PDT) Date: Thu, 26 Sep 2024 01:34:54 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-7-jthoughton@google.com> Subject: [PATCH v7 06/18] KVM: x86/mmu: Only check gfn age in shadow MMU if indirect_shadow_pages > 0 From: James Houghton To: Sean 
Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Optimize the interaction of both kvm_age_gfn and kvm_test_age_gfn with the shadow MMU: rather than checking whether the memslots have rmaps, check whether there are any indirect_shadow_pages at all. Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 03df592284ac..b4e543bdf3f0 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1645,6 +1645,11 @@ static bool kvm_rmap_age_gfn_range(struct kvm *kvm, return young; } =20 +static bool kvm_has_shadow_mmu_sptes(struct kvm *kvm) +{ + return !tdp_mmu_enabled || READ_ONCE(kvm->arch.indirect_shadow_pages); +} + bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range) { bool young =3D false; @@ -1652,7 +1657,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_rang= e *range) if (tdp_mmu_enabled) young =3D kvm_tdp_mmu_age_gfn_range(kvm, range); =20 - if (kvm_memslots_have_rmaps(kvm)) { + if (kvm_has_shadow_mmu_sptes(kvm)) { write_lock(&kvm->mmu_lock); young |=3D kvm_rmap_age_gfn_range(kvm, range, false); write_unlock(&kvm->mmu_lock); @@ -1668,7 +1673,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn= _range *range) if (tdp_mmu_enabled) young =3D kvm_tdp_mmu_test_age_gfn(kvm, range); =20 - if (!young && kvm_memslots_have_rmaps(kvm)) { + if (!young && kvm_has_shadow_mmu_sptes(kvm)) { write_lock(&kvm->mmu_lock); young |=3D kvm_rmap_age_gfn_range(kvm, range, true); write_unlock(&kvm->mmu_lock); --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from
mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C4E6818B499 for ; Thu, 26 Sep 2024 01:35:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314526; cv=none; b=hBuhPuB68W/dy5rEl40762tXAhUK71QSEd02r0J1lywdNlZuYp/JT93QwJCQDiW2rM9HwjoFG47FjMWp69Epq39yfIYOrlk4kWcb2gAEuqH1tmTB8jKzyzi/0ALuYSw/DX7rMUcfmGMJ/zpDlkKwcoMSp81wYBcxrqRMxq0Mt84= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314526; c=relaxed/simple; bh=i7ngJ/q5kYIHv8bPc7pXv+SigYhoKYSU3nQq3Jajqd0=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=pakfqzTvQBbO+eRWjHpGFCqA+YlzncV5kiIhYutyhs91mrEZkbxycj3FdgOJiQq1/aMOwRh+h3KaAJlSPxpp7iqk/v4YBU1gF+8TaDNfHSpPFqxH8tGkoc0qZSm5yxH+u95sQNYdlbKxWONqSw7wbGuP41mFxG+EPvnSbtbTbPA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=MlGjxCjp; arc=none smtp.client-ip=209.85.219.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="MlGjxCjp" Received: by mail-yb1-f201.google.com with SMTP id 3f1490d57ef6-e2019b73d66so998757276.3 for ; Wed, 25 Sep 2024 18:35:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314524; x=1727919324; darn=vger.kernel.org; 
h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=MAX0OZs2fthT88FLek8b9DqcvP6EmNxdVeFSdhP6yzs=; b=MlGjxCjpMWcL04NToxLQANTbtyPP/2IaWbryTpoTSqvdMTDNpLq23u9qmd3cj/55HM xidhNhzxb9HwSOdOCkiPbHBpbv+T0Ax0s+awYfnt4brcZ4olgm70YwhpHDtC9D5GDjQt 88UjhmatowSNEuq9LN1I5NLo1kpERexAc4dBIQ/DB3m3jE2bN61E7aRADdsPlhJnYEVM u3ThGTkz0sm7PA7UyoGTzuV5vsE8QOrP2J246ZcG06s8Z875DZD8oV3SfK2KW0lPMjJn ijOeuUGDv0+xbQ7o6RSAE9X2Mlt8gxhZEHGIhHDLYacpuJA9iT0iXjMhTBq4h3KSauBg QUBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314524; x=1727919324; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=MAX0OZs2fthT88FLek8b9DqcvP6EmNxdVeFSdhP6yzs=; b=Dnh3yZeYpk9hgpKOSmRu3tr4s2Vgjx71/CYjKJ+BLNeTmLsdPfMXhaB3o7p5ScyJ27 Q94M/VIRt7bLJRj6ps1Sox01Nat0d6v5t6cFmTFZJz3USSDlPllCwfA91WqAEPZ77MQj fMqg+D4m/F5PWAuyEADQwu+8W3jyRSaAput2cCDKTrWzbfG0kki4FuRZ2sG/gGtmVo8G I2Ah1tsukP7/sU9rTWKNtLoC+8U/kh+4O6+nS7bvYQuuAAyZgrIlsG5dkJjAhd4ztaWA 7hBOd7PKpbsOYYEm+j6ePLQ0up+9Qseh7SNOlOC14syJBY/5BtFKHA4AxVzfcARDNFgv Z4cg== X-Forwarded-Encrypted: i=1; AJvYcCXNXutiT7x1Zzu4u1SrWpWjd0Um1YUVXH3mW54Q3H0ZfWYFH/kBUyJ5Q66gOj8rRfAS5z00MUL1Jd8g8DA=@vger.kernel.org X-Gm-Message-State: AOJu0YxwXjLkEJCdFhLg1t2pEdQa/NOCo84Kf2+/ZuieCE9lyooRs42d E7vrmvnDmB7bs0elwEW0/pCFsxzAQI0MsIvJwv8dPWYzEWvXDNJuYteqnpdnaO1DVuXqyld5Lq7 lAdEwTge8xJHZeGVctg== X-Google-Smtp-Source: AGHT+IGaRNcJ0pUCHmWao5UQiFUuGRcX2zXYa1voIITtc2canA4FkWeSC7N08ZTT/PlEa6ZAuN210z4HqmS+Jzo6 X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a25:d608:0:b0:e11:6a73:b0d with SMTP id 3f1490d57ef6-e24d9dc640bmr3172276.6.1727314523065; Wed, 25 Sep 2024 18:35:23 -0700 (PDT) Date: Thu, 26 Sep 2024 01:34:55 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: 
linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-8-jthoughton@google.com> Subject: [PATCH v7 07/18] KVM: x86/mmu: Refactor low level rmap helpers to prep for walking w/o mmu_lock From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Refactor the pte_list and rmap code to always read and write rmap_head->val exactly once, e.g. by collecting changes in a local variable and then propagating those changes back to rmap_head->val as appropriate. This will allow implementing a per-rmap rwlock (of sorts) by adding a LOCKED bit into the rmap value alongside the MANY bit. 
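The read-once/write-once shape described above can be illustrated outside the kernel. The following is only a userspace sketch under stated assumptions: every name in it (sketch_rmap_head, sketch_desc, sketch_rmap_add, SKETCH_RMAP_MANY, SKETCH_DESC_CAP) is hypothetical, the descriptor is a fixed-size array rather than a chained pte_list_desc, and uintptr_t stands in for the kernel's unsigned long. It is not the code from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define SKETCH_RMAP_MANY ((uintptr_t)1)
#define SKETCH_DESC_CAP  4

struct sketch_desc {
	uint64_t *sptes[SKETCH_DESC_CAP];
	int count;
};

struct sketch_rmap_head {
	uintptr_t val;
};

/*
 * Add an entry while touching head->val exactly twice: one read into
 * old_val at the top and one write of new_val at the bottom. All
 * decisions in between are made on the local copy.
 * Returns 0 on success, -1 if the (fixed-size) descriptor is full.
 */
static int sketch_rmap_add(struct sketch_rmap_head *head, uint64_t *spte,
			   struct sketch_desc *spare)
{
	uintptr_t old_val = head->val;	/* the only read */
	uintptr_t new_val;
	struct sketch_desc *desc;

	if (!old_val) {
		/* Empty: store the single pointer directly. */
		new_val = (uintptr_t)spte;
	} else if (!(old_val & SKETCH_RMAP_MANY)) {
		/* One entry: move it into a descriptor and tag the value. */
		spare->sptes[0] = (uint64_t *)old_val;
		spare->sptes[1] = spte;
		spare->count = 2;
		new_val = (uintptr_t)spare | SKETCH_RMAP_MANY;
	} else {
		/* Already a descriptor: append, head value is unchanged. */
		desc = (struct sketch_desc *)(old_val & ~SKETCH_RMAP_MANY);
		if (desc->count == SKETCH_DESC_CAP)
			return -1;
		desc->sptes[desc->count++] = spte;
		new_val = old_val;
	}

	head->val = new_val;		/* the only write */
	return 0;
}
```

Funnelling every update through a single final store is what would later let that store double as the point where a lock bit is dropped, which is the motivation given above.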
Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 83 +++++++++++++++++++++++++----------------- 1 file changed, 50 insertions(+), 33 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index b4e543bdf3f0..17de470f542c 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -920,21 +920,24 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_b= itmap(struct kvm_vcpu *vcpu static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, struct kvm_rmap_head *rmap_head) { + unsigned long old_val, new_val; struct pte_list_desc *desc; int count =3D 0; =20 - if (!rmap_head->val) { - rmap_head->val =3D (unsigned long)spte; - } else if (!(rmap_head->val & KVM_RMAP_MANY)) { + old_val =3D rmap_head->val; + + if (!old_val) { + new_val =3D (unsigned long)spte; + } else if (!(old_val & KVM_RMAP_MANY)) { desc =3D kvm_mmu_memory_cache_alloc(cache); - desc->sptes[0] =3D (u64 *)rmap_head->val; + desc->sptes[0] =3D (u64 *)old_val; desc->sptes[1] =3D spte; desc->spte_count =3D 2; desc->tail_count =3D 0; - rmap_head->val =3D (unsigned long)desc | KVM_RMAP_MANY; + new_val =3D (unsigned long)desc | KVM_RMAP_MANY; ++count; } else { - desc =3D (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc =3D (struct pte_list_desc *)(old_val & ~KVM_RMAP_MANY); count =3D desc->tail_count + desc->spte_count; =20 /* @@ -943,21 +946,25 @@ static int pte_list_add(struct kvm_mmu_memory_cache *= cache, u64 *spte, */ if (desc->spte_count =3D=3D PTE_LIST_EXT) { desc =3D kvm_mmu_memory_cache_alloc(cache); - desc->more =3D (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY= ); + desc->more =3D (struct pte_list_desc *)(old_val & ~KVM_RMAP_MANY); desc->spte_count =3D 0; desc->tail_count =3D count; - rmap_head->val =3D (unsigned long)desc | KVM_RMAP_MANY; + new_val =3D (unsigned long)desc | KVM_RMAP_MANY; + } else { + new_val =3D old_val; } desc->sptes[desc->spte_count++] =3D spte; } + + rmap_head->val =3D new_val; + return count; } =20 
-static void pte_list_desc_remove_entry(struct kvm *kvm, - struct kvm_rmap_head *rmap_head, +static void pte_list_desc_remove_entry(struct kvm *kvm, unsigned long *rma= p_val, struct pte_list_desc *desc, int i) { - struct pte_list_desc *head_desc =3D (struct pte_list_desc *)(rmap_head->v= al & ~KVM_RMAP_MANY); + struct pte_list_desc *head_desc =3D (struct pte_list_desc *)(*rmap_val & = ~KVM_RMAP_MANY); int j =3D head_desc->spte_count - 1; =20 /* @@ -984,9 +991,9 @@ static void pte_list_desc_remove_entry(struct kvm *kvm, * head at the next descriptor, i.e. the new head. */ if (!head_desc->more) - rmap_head->val =3D 0; + *rmap_val =3D 0; else - rmap_head->val =3D (unsigned long)head_desc->more | KVM_RMAP_MANY; + *rmap_val =3D (unsigned long)head_desc->more | KVM_RMAP_MANY; mmu_free_pte_list_desc(head_desc); } =20 @@ -994,24 +1001,26 @@ static void pte_list_remove(struct kvm *kvm, u64 *sp= te, struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc; + unsigned long rmap_val; int i; =20 - if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_head->val, kvm)) - return; + rmap_val =3D rmap_head->val; + if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm)) + goto out; =20 - if (!(rmap_head->val & KVM_RMAP_MANY)) { - if (KVM_BUG_ON_DATA_CORRUPTION((u64 *)rmap_head->val !=3D spte, kvm)) - return; + if (!(rmap_val & KVM_RMAP_MANY)) { + if (KVM_BUG_ON_DATA_CORRUPTION((u64 *)rmap_val !=3D spte, kvm)) + goto out; =20 - rmap_head->val =3D 0; + rmap_val =3D 0; } else { - desc =3D (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc =3D (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); while (desc) { for (i =3D 0; i < desc->spte_count; ++i) { if (desc->sptes[i] =3D=3D spte) { - pte_list_desc_remove_entry(kvm, rmap_head, + pte_list_desc_remove_entry(kvm, &rmap_val, desc, i); - return; + goto out; } } desc =3D desc->more; @@ -1019,6 +1028,9 @@ static void pte_list_remove(struct kvm *kvm, u64 *spt= e, =20 KVM_BUG_ON_DATA_CORRUPTION(true, kvm); } + +out: + rmap_head->val =3D rmap_val; 
} =20 static void kvm_zap_one_rmap_spte(struct kvm *kvm, @@ -1033,17 +1045,19 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc, *next; + unsigned long rmap_val; int i; =20 - if (!rmap_head->val) + rmap_val =3D rmap_head->val; + if (!rmap_val) return false; =20 - if (!(rmap_head->val & KVM_RMAP_MANY)) { - mmu_spte_clear_track_bits(kvm, (u64 *)rmap_head->val); + if (!(rmap_val & KVM_RMAP_MANY)) { + mmu_spte_clear_track_bits(kvm, (u64 *)rmap_val); goto out; } =20 - desc =3D (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc =3D (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); =20 for (; desc; desc =3D next) { for (i =3D 0; i < desc->spte_count; i++) @@ -1059,14 +1073,15 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, =20 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head) { + unsigned long rmap_val =3D rmap_head->val; struct pte_list_desc *desc; =20 - if (!rmap_head->val) + if (!rmap_val) return 0; - else if (!(rmap_head->val & KVM_RMAP_MANY)) + else if (!(rmap_val & KVM_RMAP_MANY)) return 1; =20 - desc =3D (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc =3D (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); return desc->tail_count + desc->spte_count; } =20 @@ -1109,6 +1124,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) */ struct rmap_iterator { /* private fields */ + struct rmap_head *head; struct pte_list_desc *desc; /* holds the sptep if not NULL */ int pos; /* index of the sptep */ }; @@ -1123,18 +1139,19 @@ struct rmap_iterator { static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, struct rmap_iterator *iter) { + unsigned long rmap_val =3D rmap_head->val; u64 *sptep; =20 - if (!rmap_head->val) + if (!rmap_val) return NULL; =20 - if (!(rmap_head->val & KVM_RMAP_MANY)) { + if (!(rmap_val & KVM_RMAP_MANY)) { iter->desc =3D NULL; - sptep =3D (u64 *)rmap_head->val; + sptep =3D (u64 *)rmap_val; goto out; } =20 - iter->desc =3D 
(struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + iter->desc =3D (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); iter->pos =3D 0; sptep =3D iter->desc->sptes[iter->pos]; out: --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D58C618CC04 for ; Thu, 26 Sep 2024 01:35:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314527; cv=none; b=j3FpBq7UnhIHP6DABrKs+pRVSNsh0xB49tPmeTMAGE6eoKHur5HpwIFaQ2lZNBo8nptHAeXW0TPe1ffuGnyrYLZi8aBaV7ln/haPUZExQKHmXqYzt8vHrYFZH/9AneSSAjYrjITuFf9YXVYfRRN/koqCohKZrIN2GlSDPthq3jM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314527; c=relaxed/simple; bh=kDtNUQWNt+SQcHHta/bZAK9WcZs+ACOnkP4YoyybjWg=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=iA9Uw6Y8mSrRCDvXlBd9xFo0mxjht7wBSCJ8hMDQiB36/AsIL7DtcapYvZQZFThugM442VTGCfOHA+2HURXw4Ck23JOKWF1Zmc27Ocavn4Bv9FqTaH8B7puqWaZpPHUTEZ9cCM1TCoPmlPzGwrXu94arwxmZRMdpykqSnasVnEk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=pTYd9M30; arc=none smtp.client-ip=209.85.219.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="pTYd9M30" Received: by 
mail-yb1-f202.google.com with SMTP id 3f1490d57ef6-e163641feb9so1193562276.0 for ; Wed, 25 Sep 2024 18:35:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314525; x=1727919325; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=PEKVE3jXTsgIQ3oSBpZASsaX7oLzK9gVxgGb+Q0WSSI=; b=pTYd9M30sW29imMHTGG3YgJ2MSDnA1SjUjerKMt1O6uAudzZLN4ieODN0kRgBbflMv 2LTxiHnhMYUTGCPXpxKHd28OEV5NleG9+9G7s/XKhBtIEvEIUUKNp/uDuU20Yl8zSVKV Hz6m4ZqKbKDdiCVx9gKcoTmJ1UEEaFRj+atWYxSSeOSexLuptxX2T/HsPdgKAgPajIJu UiQWOCvLnsoNoJIfyG7xGmatO254+BQKNzZAHa7w8IQeW1QHG8c9TPYRPf80Eb0DfErQ k/0JkByZhMottH49kI8sgl1GYcdgVRvxSApforIY2ZDlhX4JCNee9m5i0uiHCMF9tJhw lhxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314525; x=1727919325; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=PEKVE3jXTsgIQ3oSBpZASsaX7oLzK9gVxgGb+Q0WSSI=; b=ngMEZObQsZaYiSrkx4DsMuuwxl49/5wTjKyV5iXjdyQKTCsvPelo0K4cG0QNRlwjoO Ria/3LfL1G6M762EkRumv5KqPDmTWeLE+SYMhxplb5MOIGd691zjM8SlfkKY4xF1fPYQ btw0G7YeCDkADtrZ7SK4i7gUgcu8snpY3YgrOpB6e0va4dtQ9srfX7XIX1c2F6i7woeY mbxPhHF+ac1L8pqJYD4veEzmWOLMkhlURlVZTceppMQ+cyMkhqcdzMaLUx4vy7ei9KUo HTjUXaN4+gNBSKnvoog05s7aAKTZzMVDkXNzyWEiP+C/dkht1oLO2ioa/h3T5uREJnYz hLHg== X-Forwarded-Encrypted: i=1; AJvYcCVLZCb+WnJUxxIEbW/o86sJKKaZQTaLR77KbL1ZcP0bUgnUj4zcePxNOJLfG03zKeOD9lMv2l0ixtfPviY=@vger.kernel.org X-Gm-Message-State: AOJu0YxL6hV9p6ztClSsTknGw/RZ2EQ3Un1sBZdgPdaNjYbiClWneEnG lefU8MYFXGQoRBDewRqUOKXXUdUGVfgPEcQAuNykPvi0+r9FauAkzJ8NhuCjbU+ctQ5ffUYTQ0X oaEkf9HEQsZWnHQSgNA== X-Google-Smtp-Source: AGHT+IF3/FYwkPT4bz8o9cvma+KKRWpatKrEO97ngoYUR5+iC7Jt4hs+IFCE9RrvcEVoXK4mDoAHACDMu6yNCXzE X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a5b:704:0:b0:e20:2da6:ed77 with 
SMTP id 3f1490d57ef6-e25ca95c803mr24214276.5.1727314524854; Wed, 25 Sep 2024 18:35:24 -0700 (PDT) Date: Thu, 26 Sep 2024 01:34:56 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-9-jthoughton@google.com> Subject: [PATCH v7 08/18] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Steal another bit from rmap entries (which are word aligned pointers, i.e. have 2 free bits on 32-bit KVM, and 3 free bits on 64-bit KVM), and use the bit to implement a *very* rudimentary per-rmap spinlock. The only anticipated usage of the lock outside of mmu_lock is for aging gfns, and collisions between aging and other MMU rmap operations are quite rare, e.g. unless userspace is being silly and aging a tiny range over and over in a tight loop, time between contention when aging an actively running VM is O(seconds). In short, a more sophisticated locking scheme shouldn't be necessary. 
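For readers unfamiliar with bit-stealing locks, the idea can be sketched in userspace C11. This is only an illustration, not the KVM code: the names slot_head, slot_lock(), slot_unlock(), SLOT_MANY and SLOT_LOCKED are hypothetical, and stdatomic.h stands in for the kernel's atomic_long_t helpers. It assumes the stored pointers are at least 4-byte aligned, so that bits 0 and 1 are free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag names; they mirror the idea, not KVM's definitions. */
#define SLOT_MANY   ((uintptr_t)1 << 0)	/* bit already used by the encoding */
#define SLOT_LOCKED ((uintptr_t)1 << 1)	/* the "stolen" lock bit */

struct slot_head {
	_Atomic uintptr_t val;	/* aligned pointer plus flag bits */
};

/*
 * Acquire the lock bit.  Returns the value without SLOT_LOCKED set.
 * A return of 0 means the slot was empty and nothing was locked.
 */
static uintptr_t slot_lock(struct slot_head *head)
{
	uintptr_t old = atomic_load_explicit(&head->val, memory_order_relaxed);

	for (;;) {
		/* Spin with plain loads while another thread holds the lock. */
		while (old & SLOT_LOCKED)
			old = atomic_load_explicit(&head->val,
						   memory_order_relaxed);

		/* Empty slot (possibly emptied by the previous holder). */
		if (!old)
			return 0;

		/* Acquire ordering keeps the critical section after the lock. */
		if (atomic_compare_exchange_weak_explicit(&head->val, &old,
							  old | SLOT_LOCKED,
							  memory_order_acquire,
							  memory_order_relaxed))
			return old;
	}
}

/* Publish new_val and drop the lock bit in a single release store. */
static void slot_unlock(struct slot_head *head, uintptr_t new_val)
{
	assert(!(new_val & SLOT_LOCKED));
	atomic_store_explicit(&head->val, new_val, memory_order_release);
}
```

A caller takes the lock, works on the returned value, and hands the (possibly updated) value to slot_unlock(), which clears the lock bit and publishes the new value in one store.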
Note, the lock only protects the rmap structure itself, SPTEs that are pointed at by a locked rmap can still be modified and zapped by another task (KVM drops/zaps SPTEs before deleting the rmap entries) Signed-off-by: Sean Christopherson Co-developed-by: James Houghton Signed-off-by: James Houghton --- arch/x86/include/asm/kvm_host.h | 3 +- arch/x86/kvm/mmu/mmu.c | 129 +++++++++++++++++++++++++++++--- 2 files changed, 120 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index adc814bad4bb..d1164ca3e840 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -26,6 +26,7 @@ #include #include #include +#include =20 #include #include @@ -401,7 +402,7 @@ union kvm_cpu_role { }; =20 struct kvm_rmap_head { - unsigned long val; + atomic_long_t val; }; =20 struct kvm_pio_request { diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 17de470f542c..79676798ba77 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -909,11 +909,117 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_= bitmap(struct kvm_vcpu *vcpu * About rmap_head encoding: * * If the bit zero of rmap_head->val is clear, then it points to the only = spte - * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct + * in this rmap chain. Otherwise, (rmap_head->val & ~3) points to a struct * pte_list_desc containing more mappings. */ #define KVM_RMAP_MANY BIT(0) =20 +/* + * rmaps and PTE lists are mostly protected by mmu_lock (the shadow MMU al= ways + * operates with mmu_lock held for write), but rmaps can be walked without + * holding mmu_lock so long as the caller can tolerate SPTEs in the rmap c= hain + * being zapped/dropped _while the rmap is locked_. + * + * Other than the KVM_RMAP_LOCKED flag, modifications to rmap entries must= be + * done while holding mmu_lock for write. 
This allows a task walking rmaps + * without holding mmu_lock to concurrently walk the same entries as a task + * that is holding mmu_lock but _not_ the rmap lock. Neither task will mo= dify + * the rmaps, thus the walks are stable. + * + * As alluded to above, SPTEs in rmaps are _not_ protected by KVM_RMAP_LOC= KED, + * only the rmap chains themselves are protected. E.g. holding an rmap's = lock + * ensures all "struct pte_list_desc" fields are stable. + */ +#define KVM_RMAP_LOCKED BIT(1) + +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) +{ + unsigned long old_val, new_val; + + /* + * Elide the lock if the rmap is empty, as lockless walkers (read-only + * mode) don't need to (and can't) walk an empty rmap, nor can they add + * entries to the rmap. I.e. the only paths that process empty rmaps + * do so while holding mmu_lock for write, and are mutually exclusive. + */ + old_val =3D atomic_long_read(&rmap_head->val); + if (!old_val) + return 0; + + do { + /* + * If the rmap is locked, wait for it to be unlocked before + * trying to acquire the lock, e.g. to avoid bouncing the cache line. + */ + while (old_val & KVM_RMAP_LOCKED) { + old_val =3D atomic_long_read(&rmap_head->val); + cpu_relax(); + } + + /* + * Recheck for an empty rmap, it may have been purged by the + * task that held the lock. + */ + if (!old_val) + return 0; + + new_val =3D old_val | KVM_RMAP_LOCKED; + /* + * Use try_cmpxchg_acquire to prevent reads and writes to the rmap + * from being reordered outside of the critical section created by + * __kvm_rmap_lock. + * + * Pairs with smp_store_release in kvm_rmap_unlock. + * + * For the !old_val case, no ordering is needed, as there is no rmap + * to walk. + */ + } while (!atomic_long_try_cmpxchg_acquire(&rmap_head->val, &old_val, new_= val)); + + /* Return the old value, i.e. _without_ the LOCKED bit set.
*/ + return old_val; +} + +static void kvm_rmap_unlock(struct kvm_rmap_head *rmap_head, + unsigned long new_val) +{ + WARN_ON_ONCE(new_val & KVM_RMAP_LOCKED); + /* + * Ensure that all accesses to the rmap have completed + * before we actually unlock the rmap. + * + * Pairs with the atomic_long_try_cmpxchg_acquire in __kvm_rmap_lock. + */ + atomic_long_set_release(&rmap_head->val, new_val); +} + +static unsigned long kvm_rmap_get(struct kvm_rmap_head *rmap_head) +{ + return atomic_long_read(&rmap_head->val) & ~KVM_RMAP_LOCKED; +} + +/* + * If mmu_lock isn't held, rmaps can only be locked in read-only mode. The a= ctual + * locking is the same, but the caller is disallowed from modifying the rm= ap, + * and so the unlock flow is a nop if the rmap is/was empty. + */ +__maybe_unused +static unsigned long kvm_rmap_lock_readonly(struct kvm_rmap_head *rmap_hea= d) +{ + return __kvm_rmap_lock(rmap_head); +} + +__maybe_unused +static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head, + unsigned long old_val) +{ + if (!old_val) + return; + + KVM_MMU_WARN_ON(old_val !=3D kvm_rmap_get(rmap_head)); + atomic_long_set(&rmap_head->val, old_val); +} + /* * Returns the number of pointers in the rmap chain, not counting the new = one.
*/ @@ -924,7 +1030,7 @@ static int pte_list_add(struct kvm_mmu_memory_cache *c= ache, u64 *spte, struct pte_list_desc *desc; int count =3D 0; =20 - old_val =3D rmap_head->val; + old_val =3D kvm_rmap_lock(rmap_head); =20 if (!old_val) { new_val =3D (unsigned long)spte; @@ -956,7 +1062,7 @@ static int pte_list_add(struct kvm_mmu_memory_cache *c= ache, u64 *spte, desc->sptes[desc->spte_count++] =3D spte; } =20 - rmap_head->val =3D new_val; + kvm_rmap_unlock(rmap_head, new_val); =20 return count; } @@ -1004,7 +1110,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spt= e, unsigned long rmap_val; int i; =20 - rmap_val =3D rmap_head->val; + rmap_val =3D kvm_rmap_lock(rmap_head); if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm)) goto out; =20 @@ -1030,7 +1136,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spt= e, } =20 out: - rmap_head->val =3D rmap_val; + kvm_rmap_unlock(rmap_head, rmap_val); } =20 static void kvm_zap_one_rmap_spte(struct kvm *kvm, @@ -1048,7 +1154,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, unsigned long rmap_val; int i; =20 - rmap_val =3D rmap_head->val; + rmap_val =3D kvm_rmap_lock(rmap_head); if (!rmap_val) return false; =20 @@ -1067,13 +1173,13 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, } out: /* rmap_head is meaningless now, remember to reset it */ - rmap_head->val =3D 0; + kvm_rmap_unlock(rmap_head, 0); return true; } =20 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head) { - unsigned long rmap_val =3D rmap_head->val; + unsigned long rmap_val =3D kvm_rmap_get(rmap_head); struct pte_list_desc *desc; =20 if (!rmap_val) @@ -1139,7 +1245,7 @@ struct rmap_iterator { static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, struct rmap_iterator *iter) { - unsigned long rmap_val =3D rmap_head->val; + unsigned long rmap_val =3D kvm_rmap_get(rmap_head); u64 *sptep; =20 if (!rmap_val) @@ -1483,7 +1589,7 @@ static void slot_rmap_walk_next(struct slot_rmap_walk= _iterator *iterator) while (++iterator->rmap <=3D 
iterator->end_rmap) { iterator->gfn +=3D KVM_PAGES_PER_HPAGE(iterator->level); =20 - if (iterator->rmap->val) + if (atomic_long_read(&iterator->rmap->val)) return; } =20 @@ -2513,7 +2619,8 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct k= vm_mmu_page *sp, * avoids retaining a large number of stale nested SPs. */ if (tdp_enabled && invalid_list && - child->role.guest_mode && !child->parent_ptes.val) + child->role.guest_mode && + !atomic_long_read(&child->parent_ptes.val)) return kvm_mmu_prepare_zap_page(kvm, child, invalid_list); } --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2F80818E743 for ; Thu, 26 Sep 2024 01:35:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314529; cv=none; b=uP42Hqa9hez33S0fHt6KiiyrIpBn80R9DOxVQVQQyFvKyOoMprSKKkoA01nvddi0UwehJ6xjcQbFrN1Nz+XM0zTZTzzzoZcc6SrrgeG8Z20eSXVw7vWngz7BXE+iF0aGt0zg0RzDX8ZuB/HCoSvd4x5aYBqXbePLpB2BK6BeVEo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314529; c=relaxed/simple; bh=0raPSAp7HnmbBOxblzsQepM7TqKA7NY1yCc61VEgCKk=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=lm+OjA0xuROBgiOK81n69CPBn/jbNXTlY+z+5ma6V5wxcojjeObbWID/T6xYxkS96/7zgdKJopQ8w64jPECuVvN28x2l1P/1aLce6cyLVhNGApPmTDpXI4pcHBQRBY4pvQQChx8pIKORtEbSlUY2Ubi5mtDb5cpOHhiJZcYIYqU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=m/VTRhi4; arc=none smtp.client-ip=209.85.128.202 
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="m/VTRhi4" Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-6e230808388so6923347b3.0 for ; Wed, 25 Sep 2024 18:35:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314526; x=1727919326; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=MfYThc4u3zoEwhsYqTZzHZoRr9LypPNnm95GFpy1spA=; b=m/VTRhi4Vb+l3sQ9+TsKzd34SVYj5ofUYIRw29d46kdZg/s++sbiH2HxV6TUSI+WVe qrvHnx2MdK5vGGg7epJE12uX2HJOeFDslmOQvTk+RW6WJqC/GIttuF5sNSWF2IDe6r84 Oyl3FwIKr2RwMmIGCk06foCowlPWYirhsg5auSAa1k2dFNi/fcTElYCMWW0g9JrQCR8Q RnZw2JSWLo2sbZpZx2YFITuhzBYe2UAHX3CfkY8Ofe7gULoecF/Wfp1c+cyBdt5dJB8M Z9MCwdscIWgwu8MHVeq7SYR0pGIa7TpFAEsABLL5UCGNWygeUXj11bZIfoczL/UeldZR DvPw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314526; x=1727919326; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=MfYThc4u3zoEwhsYqTZzHZoRr9LypPNnm95GFpy1spA=; b=XikkU4OrGjozlhwjuKRNll0RPPrZqrvmyyweImpgQtgzEYswK0fVFrGCKxkRP6UHhK w8s8s3cCzZxPslPkvarZoFUFz81HFc4UYbKK3aN7oHF9mgiwDjIPokzwwOlsatbSj7j8 Lz6wGQcahhSYtRXwaxHLP7f+iugIODFS1T1qBq9Zq2Px7KycRjlHpFI2ndRDRtq+NDLK a9mM0tg7RQ0Yyx2SGgK99GQGaoV50noLeYXELMCWZkM+grus1udB50qvl4kyz0zKgddS ZkUebcnjsM3ItZC6BYvGiLg1VkjiDdIXZx10aBUDEq6ajEWIDys7QiUo8COPlxPRbhKr Eoaw== X-Forwarded-Encrypted: i=1; AJvYcCVCUmut3bWzNZmXk86Qh/rjhwRSnTI3HMuoSDGANCPyjm3FJcAFvrRbpdv+GE0IfvSbnlk6EHnyZN1p4IU=@vger.kernel.org X-Gm-Message-State: AOJu0Yx6pHwGRftxO2tyhewPkBiyGjuKT8CnEYSEVyVoqC4wVPQyKNUz 
6S6zANMXly71y2iLhIdbxqWluZskJGNs2SndHwtUIQDbsr1VpJLncmyA7uf34t7hfTKVm4Cl90+ yoouNF5iV3P54a+FYdw== X-Google-Smtp-Source: AGHT+IFc2BTb/n/6/Pa4YiRnYJohGJI1GOQLPmGDGx2avUwnGUEvHFpkm5PUE1m++wXUG9sRJDYpg95gpRMXrHPb X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a05:690c:5086:b0:6e2:1b8c:39bf with SMTP id 00721157ae682-6e21d835b06mr289447b3.2.1727314525948; Wed, 25 Sep 2024 18:35:25 -0700 (PDT) Date: Thu, 26 Sep 2024 01:34:57 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-10-jthoughton@google.com> Subject: [PATCH v7 09/18] KVM: x86/mmu: Add support for lockless walks of rmap SPTEs From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Add a lockless version of for_each_rmap_spte(), which is pretty much the same as the normal version, except that it doesn't BUG() the host if a non-present SPTE is encountered. When mmu_lock is held, it should be impossible for a different task to zap a SPTE, _and_ zapped SPTEs must be removed from their rmap chain prior to dropping mmu_lock. Thus, the normal walker BUG()s if a non-present SPTE is encountered as something is wildly broken. 
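The two walkers differ only in how they treat a non-present entry. A minimal userspace sketch of the lockless flavor follows, assuming hypothetical names (ENTRY_PRESENT, count_present_lockless()) and C11 atomics in place of KVM's SPTE accessors. Each entry is read exactly once into a local snapshot, and an entry that is not present is skipped rather than treated as a bug.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "present" bit; real SPTE layouts are more involved. */
#define ENTRY_PRESENT ((uint64_t)1 << 0)

/*
 * Count present entries in a chain of pointers.  The entries themselves
 * may be cleared concurrently, so take one snapshot per entry and make
 * every decision on that snapshot instead of re-reading the entry.
 */
static size_t count_present_lockless(_Atomic uint64_t *const *chain, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t snapshot = atomic_load_explicit(chain[i],
							 memory_order_relaxed);

		/* A concurrently zapped entry is expected, not an error. */
		if (!(snapshot & ENTRY_PRESENT))
			continue;

		count++;
	}

	return count;
}
```

A walker that runs with the outer lock held could instead assert on a non-present entry, since nothing else is allowed to clear entries underneath it.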
When walking rmaps without holding mmu_lock, the SPTEs pointed at by the rmap chain can be zapped/dropped, and so a lockless walk can observe a non-present SPTE if it runs concurrently with a different operation that is zapping SPTEs. Signed-off-by: Sean Christopherson [jthoughton: Added lockdep assertion for kvm_rmap_lock, synchronization fix= up] Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 75 +++++++++++++++++++++++------------------- 1 file changed, 42 insertions(+), 33 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 79676798ba77..72c682fa207a 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -932,7 +932,7 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_bit= map(struct kvm_vcpu *vcpu */ #define KVM_RMAP_LOCKED BIT(1) =20 -static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) +static unsigned long __kvm_rmap_lock(struct kvm_rmap_head *rmap_head) { unsigned long old_val, new_val; =20 @@ -976,14 +976,25 @@ static unsigned long kvm_rmap_lock(struct kvm_rmap_he= ad *rmap_head) */ } while (!atomic_long_try_cmpxchg_acquire(&rmap_head->val, &old_val, new_= val)); =20 - /* Return the old value, i.e. _without_ the LOCKED bit set. */ + /* + * Return the old value, i.e. _without_ the LOCKED bit set. It's + * impossible for the return value to be 0 (see above), i.e. the read- + * only unlock flow can't get a false positive and fail to unlock. + */ return old_val; } =20 +static unsigned long kvm_rmap_lock(struct kvm *kvm, + struct kvm_rmap_head *rmap_head) +{ + lockdep_assert_held_write(&kvm->mmu_lock); + return __kvm_rmap_lock(rmap_head); +} + static void kvm_rmap_unlock(struct kvm_rmap_head *rmap_head, unsigned long new_val) { - WARN_ON_ONCE(new_val & KVM_RMAP_LOCKED); + KVM_MMU_WARN_ON(new_val & KVM_RMAP_LOCKED); /* * Ensure that all accesses to the rmap have completed * before we actually unlock the rmap. 
@@ -1023,14 +1034,14 @@ static void kvm_rmap_unlock_readonly(struct kvm_rma= p_head *rmap_head, /* * Returns the number of pointers in the rmap chain, not counting the new = one. */ -static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, - struct kvm_rmap_head *rmap_head) +static int pte_list_add(struct kvm *kvm, struct kvm_mmu_memory_cache *cach= e, + u64 *spte, struct kvm_rmap_head *rmap_head) { unsigned long old_val, new_val; struct pte_list_desc *desc; int count =3D 0; =20 - old_val =3D kvm_rmap_lock(rmap_head); + old_val =3D kvm_rmap_lock(kvm, rmap_head); =20 if (!old_val) { new_val =3D (unsigned long)spte; @@ -1110,7 +1121,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spt= e, unsigned long rmap_val; int i; =20 - rmap_val =3D kvm_rmap_lock(rmap_head); + rmap_val =3D kvm_rmap_lock(kvm, rmap_head); if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm)) goto out; =20 @@ -1154,7 +1165,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, unsigned long rmap_val; int i; =20 - rmap_val =3D kvm_rmap_lock(rmap_head); + rmap_val =3D kvm_rmap_lock(kvm, rmap_head); if (!rmap_val) return false; =20 @@ -1246,23 +1257,18 @@ static u64 *rmap_get_first(struct kvm_rmap_head *rm= ap_head, struct rmap_iterator *iter) { unsigned long rmap_val =3D kvm_rmap_get(rmap_head); - u64 *sptep; =20 if (!rmap_val) return NULL; =20 if (!(rmap_val & KVM_RMAP_MANY)) { iter->desc =3D NULL; - sptep =3D (u64 *)rmap_val; - goto out; + return (u64 *)rmap_val; } =20 iter->desc =3D (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); iter->pos =3D 0; - sptep =3D iter->desc->sptes[iter->pos]; -out: - BUG_ON(!is_shadow_present_pte(*sptep)); - return sptep; + return iter->desc->sptes[iter->pos]; } =20 /* @@ -1272,14 +1278,11 @@ static u64 *rmap_get_first(struct kvm_rmap_head *rm= ap_head, */ static u64 *rmap_get_next(struct rmap_iterator *iter) { - u64 *sptep; - if (iter->desc) { if (iter->pos < PTE_LIST_EXT - 1) { ++iter->pos; - sptep =3D iter->desc->sptes[iter->pos]; - if (sptep) - 
goto out; + if (iter->desc->sptes[iter->pos]) + return iter->desc->sptes[iter->pos]; } =20 iter->desc =3D iter->desc->more; @@ -1287,20 +1290,24 @@ static u64 *rmap_get_next(struct rmap_iterator *ite= r) if (iter->desc) { iter->pos =3D 0; /* desc->sptes[0] cannot be NULL */ - sptep =3D iter->desc->sptes[iter->pos]; - goto out; + return iter->desc->sptes[iter->pos]; } } =20 return NULL; -out: - BUG_ON(!is_shadow_present_pte(*sptep)); - return sptep; } =20 -#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_) \ - for (_spte_ =3D rmap_get_first(_rmap_head_, _iter_); \ - _spte_; _spte_ =3D rmap_get_next(_iter_)) +#define __for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + for (_sptep_ =3D rmap_get_first(_rmap_head_, _iter_); \ + _sptep_; _sptep_ =3D rmap_get_next(_iter_)) + +#define for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + __for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + if (!WARN_ON_ONCE(!is_shadow_present_pte(*(_sptep_)))) \ + +#define for_each_rmap_spte_lockless(_rmap_head_, _iter_, _sptep_, _spte_) \ + __for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + if (is_shadow_present_pte(_spte_ =3D mmu_spte_get_lockless(_sptep_))) =20 static void drop_spte(struct kvm *kvm, u64 *sptep) { @@ -1396,11 +1403,12 @@ static bool __rmap_clear_dirty(struct kvm *kvm, str= uct kvm_rmap_head *rmap_head, struct rmap_iterator iter; bool flush =3D false; =20 - for_each_rmap_spte(rmap_head, &iter, sptep) + for_each_rmap_spte(rmap_head, &iter, sptep) { if (spte_ad_need_write_protect(*sptep)) flush |=3D spte_wrprot_for_clear_dirty(sptep); else flush |=3D spte_clear_dirty(sptep); + } =20 return flush; } @@ -1710,7 +1718,7 @@ static void __rmap_add(struct kvm *kvm, kvm_update_page_stats(kvm, sp->role.level, 1); =20 rmap_head =3D gfn_to_rmap(gfn, sp->role.level, slot); - rmap_count =3D pte_list_add(cache, spte, rmap_head); + rmap_count =3D pte_list_add(kvm, cache, spte, rmap_head); =20 if (rmap_count > kvm->stat.max_mmu_rmap_size) kvm->stat.max_mmu_rmap_size =3D
rmap_count; @@ -1859,13 +1867,14 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn) return hash_64(gfn, KVM_MMU_HASH_SHIFT); } =20 -static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache, +static void mmu_page_add_parent_pte(struct kvm *kvm, + struct kvm_mmu_memory_cache *cache, struct kvm_mmu_page *sp, u64 *parent_pte) { if (!parent_pte) return; =20 - pte_list_add(cache, parent_pte, &sp->parent_ptes); + pte_list_add(kvm, cache, parent_pte, &sp->parent_ptes); } =20 static void mmu_page_remove_parent_pte(struct kvm *kvm, struct kvm_mmu_pag= e *sp, @@ -2555,7 +2564,7 @@ static void __link_shadow_page(struct kvm *kvm, =20 mmu_spte_set(sptep, spte); =20 - mmu_page_add_parent_pte(cache, sp, sptep); + mmu_page_add_parent_pte(kvm, cache, sp, sptep); =20 /* * The non-direct sub-pagetable must be updated before linking. For --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3722317ADFF for ; Thu, 26 Sep 2024 01:35:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314529; cv=none; b=fScx4L9zFGhuKlxP4SnQQJkYx3aeMtpdqBP+A0nTAsSq73QJuh9YsGkya0ruUMXtiWY456rmFclifz0jRVomzaKN0T+50CxEztBxi/0MvtluPaoDfF1n0Vj/7wXrkoyk7cXHno5dgVvuQZv7pvnkgiYkHQqA79Echd/EHk53IYM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314529; c=relaxed/simple; bh=E9AQ6b+xH0X6/PhVwSlcvCw/vsOlfjn0RvQXxr2pYnI=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=qZ8SRY+nP8y95TC0Kk1YgFSE8C85DvCm5BD9mhA3pPndA/XFX8p5sY9ZvzJgqCK55lRiIbV3WR05kueSRJUztO2FxENAPGNMzBLivktyeVO/R65dGcmBIW/cvUJc03A5nSYwZFa0pokZ2bSiie72xtvROK0Nex2n0z283GAMJiU= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=0izLu6WE; arc=none smtp.client-ip=209.85.128.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="0izLu6WE" Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-6db791c42e3so8792987b3.1 for ; Wed, 25 Sep 2024 18:35:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314527; x=1727919327; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=/OnVwOE5c+4OTD3Iw61tcspb9ex9OyShqju36BWGlvg=; b=0izLu6WEdFEPHrK/MbRSe9pZuZy4p0cKedOL1sv1PuE2bbREDrC481Ox9J+4niq0xq v49/Ld2Loax18tzQHKvCS94xKcy4PFvbee6ZzAQWTKhPoTiMlO9T0a+eqZlOUkY1gdje bTf6yOM5DVXF/J1yDnLgXRLGjaiUBKgl3O7OoN9WQ4XiOo/SHkcUSA2DGauqtOn80CjQ dbIHMeZ8pBQsySmgS6ekGHDwf+i06pv/G4Iee79JfNhMuCBqrXeVwjA6ZkMg9NnKA7lN FD6dMdhIkWNg+yjV4tPG305pRHCjliH3d+iGG6LJcdn2X0JSG6DFZl0qslgSyMoVaaey D0dw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314527; x=1727919327; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=/OnVwOE5c+4OTD3Iw61tcspb9ex9OyShqju36BWGlvg=; b=bwkwPMK9wKEmO4uUYIxZUhG9prsOjlcDthjuT94UIRk8Z5ZN2aGO7Z2vZ3Ni7EqEm6 azQ7tJEiNA5p+VjPa1rh6Y5nx4PeJHnH6iJi9X2X5JV9MNZt7Jt2S4GzK9ZyZzBd9QQL UAqc0wGBjm5tuEj9+tPK6wlkdDOLLxXuQF8Lxm85cMxsa4yWenGNALD8KdwB4hLGFen9 
OQ1WJwlsyg5NBbvb7eFEye+Tsu+RRVhP/hK2Ot2aLTGhR2yhkFNc56fxDMyvIR5myCV1 Ozp2h1jnSnTZT5DA3Jt/Gml86cmPZaF/hGUwK41CdO5Cm5d00UpjfqynMYrFhBOcCtHK biug== X-Forwarded-Encrypted: i=1; AJvYcCWLkQVgFo7NEqkMmetIICa86U4rBzhsfsZi2oTvwgafW3SthzPbIGJehTuCWdNiSgaw94Jtix7N5pwjzok=@vger.kernel.org X-Gm-Message-State: AOJu0Yw6rOfcaF9gIG+R57bXOCY/tR329B+mrkbZ1pgurUfpuz6vLCEg g7MtYZ3HY3z1kjJ/SxSsQ8Qlvkq5R1PHzj+KTDMku9xp3I2LEpPDVrmgwrpR0XBARctiQVM2hZA mm+8S0zwOjt1DfFqyyg== X-Google-Smtp-Source: AGHT+IFP6wbu5d5xFxibAPH1VEJb74N56TVg2vvH80n5yLYaLmdn9VnaHIq1SmzVQiXoW0zSRxLWmsPWaG4W/Pg1 X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a25:74cc:0:b0:e25:cced:3e3f with SMTP id 3f1490d57ef6-e25cced3f41mr11837276.4.1727314527042; Wed, 25 Sep 2024 18:35:27 -0700 (PDT) Date: Thu, 26 Sep 2024 01:34:58 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-11-jthoughton@google.com> Subject: [PATCH v7 10/18] KVM: x86/mmu: Support rmap walks without holding mmu_lock when aging gfns From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Because an L1 KVM can disable A/D bits for its L2, even if kvm_ad_enabled() in L0, we cannot always locklessly age, as aging requires marking non-A/D sptes for access tracking, which is not supported locklessly yet. 
We can always gather age information locklessly though. Signed-off-by: Sean Christopherson [jthoughton: Added changelog, adjusted conditional] Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 66 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 72c682fa207a..a63497bbcc61 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1014,13 +1014,11 @@ static unsigned long kvm_rmap_get(struct kvm_rmap_h= ead *rmap_head) * locking is the same, but the caller is disallowed from modifying the rm= ap, * and so the unlock flow is a nop if the rmap is/was empty. */ -__maybe_unused static unsigned long kvm_rmap_lock_readonly(struct kvm_rmap_head *rmap_hea= d) { return __kvm_rmap_lock(rmap_head); } =20 -__maybe_unused static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head, unsigned long old_val) { @@ -1736,8 +1734,53 @@ static void rmap_add(struct kvm_vcpu *vcpu, const st= ruct kvm_memory_slot *slot, __rmap_add(vcpu->kvm, cache, slot, spte, gfn, access); } =20 -static bool kvm_rmap_age_gfn_range(struct kvm *kvm, - struct kvm_gfn_range *range, bool test_only) +static bool kvm_rmap_age_gfn_range_lockless(struct kvm *kvm, + struct kvm_gfn_range *range, + bool test_only) +{ + struct kvm_rmap_head *rmap_head; + struct rmap_iterator iter; + unsigned long rmap_val; + bool young =3D false; + u64 *sptep; + gfn_t gfn; + int level; + u64 spte; + + for (level =3D PG_LEVEL_4K; level <=3D KVM_MAX_HUGEPAGE_LEVEL; level++) { + for (gfn =3D range->start; gfn < range->end; + gfn +=3D KVM_PAGES_PER_HPAGE(level)) { + rmap_head =3D gfn_to_rmap(gfn, level, range->slot); + rmap_val =3D kvm_rmap_lock_readonly(rmap_head); + + for_each_rmap_spte_lockless(rmap_head, &iter, sptep, spte) { + if (!is_accessed_spte(spte)) + continue; + + if (test_only) { + kvm_rmap_unlock_readonly(rmap_head, rmap_val); + return true; + } + + /* + * Marking SPTEs for access tracking 
outside of + * mmu_lock is unsupported. Report the page as + * young, but otherwise leave it as-is. + */ + if (spte_ad_enabled(spte)) + clear_bit((ffs(shadow_accessed_mask) - 1), + (unsigned long *)sptep); + young =3D true; + } + + kvm_rmap_unlock_readonly(rmap_head, rmap_val); + } + } + return young; +} + +static bool __kvm_rmap_age_gfn_range(struct kvm *kvm, + struct kvm_gfn_range *range, bool test_only) { struct slot_rmap_walk_iterator iterator; struct rmap_iterator iter; @@ -1776,6 +1819,21 @@ static bool kvm_rmap_age_gfn_range(struct kvm *kvm, return young; } =20 +static bool kvm_rmap_age_gfn_range(struct kvm *kvm, + struct kvm_gfn_range *range, bool test_only) +{ + /* + * We can always locklessly test if an spte is young. Because marking + * non-A/D sptes for access tracking without holding the mmu_lock is + * not currently supported, we cannot always locklessly clear. + */ + if (test_only) + return kvm_rmap_age_gfn_range_lockless(kvm, range, test_only); + + lockdep_assert_held_write(&kvm->mmu_lock); + return __kvm_rmap_age_gfn_range(kvm, range, test_only); +} + static bool kvm_has_shadow_mmu_sptes(struct kvm *kvm) { return !tdp_mmu_enabled || READ_ONCE(kvm->arch.indirect_shadow_pages); --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BFCD2197A9A for ; Thu, 26 Sep 2024 01:35:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314531; cv=none; b=Z3FKmuOyfa5Bo/cwcSrBg544bk0TMzp1gnqnHEMHZiUw1h6Q3005szebQpBb0IUZ6C0bDmTzTMa+vrvnyzSBQn7nEeU9ssf9umU3Q/5puTmlMZpw22UwpQfWtSWQLT7m6djwqPsvLS/2n7IRCDIPxhOgeIn91IYDlfbz1m/TrWg= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1727314531; c=relaxed/simple; bh=sw/0Zxhq6kIY8BLZP2OCu04cT4fuVwkynKL63RK8D/o=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=Ap4bTlum/Fec2qrM4gPQnW/dNTLhRhXio2B7QS32MX2nkpNtnL1u6nTfFdcorPbV00Om2xF1Nt5XZFiB3swCC4zquD9a0Cd1skCyv1oIoXQ7SucvaZLjrzhCpGVvtjY/G+aZiaqnoiUiEOvlkZxi93g+xG4FHBR/mICEqr1RLyE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=UJ56MGYn; arc=none smtp.client-ip=209.85.128.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="UJ56MGYn" Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-6d54ab222fcso13066177b3.1 for ; Wed, 25 Sep 2024 18:35:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314528; x=1727919328; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=V1Z3PBXE4Aegel3/riahgcLFAyLXvOzS5uEE7fXt5qw=; b=UJ56MGYnqItl4G3t75gQ3/ESpNwGEzgsGmQZs66+3XPTKOOSJVP1N2jGfvKBnKT9ZZ ViejWh4kls3lr8AfcVf/PIpODN7eqJc8flIHxeOdRr1XnCPs7u2X4XFaXea/Ipwd+CWX vBpI+qeqF/q+tSnrH5b82TnXOCGy280ULCB8Vsj3mRlU1F4ljVYKf8BkTRHC9U9FVx4I 3YCPfTvvAPUvlXqOSWvt5/A3z5Ij0V+v2pEOQ2NxJFrBrMjKHWpHNLB+rSTxIR1jboc3 5VgDGAfASlijmnV7AAhOdC14IJ/xOLwt6S9YD9n9/9EIrzITXJeZP6J2/hlUZ4DYnHwu i3hg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314528; x=1727919328; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to 
:date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=V1Z3PBXE4Aegel3/riahgcLFAyLXvOzS5uEE7fXt5qw=; b=OrBEjfG7aRoO47YY3LAVNzXpB8vvOSGNsSyDyBQSD7y8Y/WDwpIy7z8ECJGMYVaCGM NyDT2m/wyMZl+8OxXkdE0PAlBzagxCzAthLVk+Jhe/lEk0l8bS8IpPUDiNgoKtSQTWpF TqRq6sUItPISzGZjP2zlCZlWGuJWKQAaG62Nj/Kv+W2W4j9loQ3StraoOjY7Sjislw7m ZWQQLw0o+XM1NWBlsii73V0XfnzsrIiCIRxZFapS3CA2AwC/zlCaCWNF0qkLd4V1fqbg 3U4lUnKFefO0Co76H+oM7wkhB1dZCL2en2Ee7kBCGAHfiDtcVJ9r9mFo1ZsOQjvE9cvx bsSg== X-Forwarded-Encrypted: i=1; AJvYcCXqZJ8UOFkUTuSzbNKA5IHKmsUIdb/Fx3IzfWDoU/OG7oJnURlfD6x98Jt8zj37M9PNWbQkvVf+s+LZJUQ=@vger.kernel.org X-Gm-Message-State: AOJu0YwSyKRh/Hy03gqIoD35eufIOFtZBBlArkHF3QcfMRLK7XjJyrnh wEFF1u5kyHJWmuqAvH0U01UGKK70jyTk24xWAxIYaKY7Q5a9VndXTkiz+K9z7G6gkNW8txb2hD4 H0QJxpjUOjMhRj+SJuA== X-Google-Smtp-Source: AGHT+IEpNqlcby3XCzPxmMXEa3yAsp88MrqHgIHgIiYwCpactoczoPP9Ca8XqrYS/AN0Db4KS1YJcToucGPf0cLH X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a05:690c:2892:b0:648:fc8a:cd23 with SMTP id 00721157ae682-6e21d6e1f34mr309987b3.2.1727314528595; Wed, 25 Sep 2024 18:35:28 -0700 (PDT) Date: Thu, 26 Sep 2024 01:34:59 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-12-jthoughton@google.com> Subject: [PATCH v7 11/18] mm: Add missing mmu_notifier_clear_young for !MMU_NOTIFIER From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Jason Gunthorpe , David Hildenbrand 
Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Remove the now unnecessary ifdef in mm/damon/vaddr.c as well. Signed-off-by: James Houghton Reviewed-by: Jason Gunthorpe Acked-by: David Hildenbrand --- include/linux/mmu_notifier.h | 7 +++++++ mm/damon/vaddr.c | 2 -- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index d39ebb10caeb..e2dd57ca368b 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -606,6 +606,13 @@ static inline int mmu_notifier_clear_flush_young(struc= t mm_struct *mm, return 0; } =20 +static inline int mmu_notifier_clear_young(struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + return 0; +} + static inline int mmu_notifier_test_young(struct mm_struct *mm, unsigned long address) { diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 58829baf8b5d..2d5b53253bc2 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -351,11 +351,9 @@ static void damon_hugetlb_mkold(pte_t *pte, struct mm_= struct *mm, set_huge_pte_at(mm, addr, pte, entry, psize); } =20 -#ifdef CONFIG_MMU_NOTIFIER if (mmu_notifier_clear_young(mm, addr, addr + huge_page_size(hstate_vma(vma)))) referenced =3D true; -#endif /* CONFIG_MMU_NOTIFIER */ =20 if (referenced) folio_set_young(folio); --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-ua1-f74.google.com (mail-ua1-f74.google.com [209.85.222.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7C03D1A4AA6 for ; Thu, 26 Sep 2024 01:35:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314533; cv=none; 
b=ADqjfu+y/43UpQLFHANWbKTV6dK8d9RDh/3kV0HRdDPTn9wzkYV61B6k2JxEo3Pu8DVud7AaodkZY59RGFxhZPvojUFu/SIIbkm0nhf1QKph4wMwj4arN4UIr8GY/ReI9H5UxKyMN9dNIdIHlqsRFgXsHF534IzlWg3v909PfO0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314533; c=relaxed/simple; bh=TwJSAlmKBARZz/CI6deRJoTQbU3R7Kx2125IS9QJ4rA=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=e9FVYzK7glioitimZquezBrVKoiXlCN6qJ9r923172zEkMpt53GQFNcInJcb6F5YSSZigiXbNE7P1fYma2Hl43bAr9komtxWuAK9MD7hrxgYrsWWDwsaL4y15cPbPeKfOn6uPvNHZZMBlaAYE1KmtYDh/vzmungi2tN2T6nVynQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=EadYPUZf; arc=none smtp.client-ip=209.85.222.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="EadYPUZf" Received: by mail-ua1-f74.google.com with SMTP id a1e0cc1a2514c-848917ba13aso142235241.3 for ; Wed, 25 Sep 2024 18:35:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314530; x=1727919330; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=jwY5XxMaAH9vrNcOxKBlxaXWmYulPCq+P3CKqgHldo8=; b=EadYPUZfKnrSS3YLeCYgelEvWLhlKB3uT+jGwO7o8otaPiDZTDs6vPTCDMAMuzexXX rgRhn2L+UXm2VO1w7KD5z82x5ev47oSHETuz4a3gvs8Q41oV8LiCE8lC8A2xYqGpfCym bKFnQW8z08kxjWRqH3ZzYMRVHaILlsZt5DmsPauiZBobmhkcL4Y7NH+AzJjKnQNDY4j4 d+w6z5kZA/FThfofQCbd+hqTzOEhqE98fu11+kWfWU3TdyA7s8XrNYOWWyy8alx2rc3F 
kH0vCw6jMeMd383Tr9P45gzfhqAcxTt0J1j5sRhHoQdjQFBz5i/4cFaeNbeq6uoWUxFO zxaw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314530; x=1727919330; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=jwY5XxMaAH9vrNcOxKBlxaXWmYulPCq+P3CKqgHldo8=; b=l0DS0pPNj8LUEXKJY33rBwCcuEWrJRevV5bklS+zCoF3rLq5qu1NG8fzGBhVTSpFiD dM02OSeXWQ0AGeMBd1Q2cAQ9lc+ajKBWsvjFkzfhsW4gWSW4ky4Ddw2GZklblD25VkzJ Ha81xwCYm3gU1dlVoZzUUKZvLubXCFs9KzGxaMiVpbvDEABcIWhIPP1Rrfw3kYWVy9Ng IfG6WE1TmtALRkDYummM5PKXJbS50bLZaj1AWoXv+rxzychmkMhjLKDv+jRKyUbRjMt1 XS7fZhXLN9AJgV0dvK7CL/+gbS7O1dFHfUn5qLZ4hEW0FKQ7S85RhsLWidtwEc6VEKu1 mlQQ== X-Forwarded-Encrypted: i=1; AJvYcCUc5ysD85dZlGHNi0IcrExqjg2Vurgv6Lgc/jTCAOevWGOdcoQwTlB1FyIwXt2LZTOrhelrhecJQ/B9paw=@vger.kernel.org X-Gm-Message-State: AOJu0YxKBxJr89U0ib3XZNSi7cjB3j7BSu93quD3Y3HtUaDuVUusLIwH YeifHTZJ4rA0gpAA3+vqliFXxOXtTYT26Kn0rmBf4xMJTIa7x2hEhrhK1JMhDB19oUKRif2wfjR vPtWomxxEJKAOX0fwhA== X-Google-Smtp-Source: AGHT+IFR+2qF3anIZYWl2APj1JzmMbbMXKjDESaaFvEMKsw92aYBHlJKDywpCEXIkyk1bOavtsxa/wF5vSbBnBSJ X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:ab0:380d:0:b0:846:d5f9:2186 with SMTP id a1e0cc1a2514c-84e83c0a7fdmr10563241.2.1727314529698; Wed, 25 Sep 2024 18:35:29 -0700 (PDT) Date: Thu, 26 Sep 2024 01:35:00 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-13-jthoughton@google.com> Subject: [PATCH v7 12/18] mm: Add has_fast_aging to struct mmu_notifier From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James 
Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" has_fast_aging should be set by subscribers that non-trivially implement fast-only versions of both test_young() and clear_young(). Fast aging must be opt-in. For a subscriber that has not been enlightened with "fast aging", the test/clear_young() will behave identically whether or not fast_only is given. Have KVM always opt out for now; specific architectures can opt-in later. Given that KVM is the only test/clear_young() implementer, we could instead add an equivalent check in KVM, but doing so would incur an indirect function call every time, even if the notifier ends up being a no-op. Add mm_has_fast_young_notifiers() in case a caller wants to know if it should skip many calls to the mmu notifiers that may not be necessary (like MGLRU look-around). 
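For illustration only, here is a minimal user-space sketch of the opt-in check described above. It is not the kernel code: struct subscriber, has_fast_young_subscribers() and count_notifier_calls() are hypothetical stand-ins for struct mmu_notifier, mm_has_fast_young_notifiers() and a caller such as a look-around scan, and a plain linked list replaces the SRCU-protected hlist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mmu_notifier; only the new flag is modeled. */
struct subscriber {
	bool has_fast_aging;		/* set only by subscribers that opted in */
	struct subscriber *next;	/* plain list instead of the SRCU hlist */
};

/* Models mm_has_fast_young_notifiers(): true if any subscriber opted in. */
static bool has_fast_young_subscribers(const struct subscriber *head)
{
	for (; head; head = head->next)
		if (head->has_fast_aging)
			return true;
	return false;
}

/*
 * Models a caller that would otherwise issue one notifier call per page.
 * Returns how many calls it would actually make.
 */
static int count_notifier_calls(const struct subscriber *head, int npages)
{
	assert(npages >= 0);
	if (!has_fast_young_subscribers(head))
		return 0;	/* nobody opted in: skip every per-page call */
	return npages;
}
```

The flag is checked once per scan rather than once per page, which is the saving the commit message has in mind: a caller can avoid many notifier calls that would all turn out to be no-ops.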
Signed-off-by: James Houghton --- include/linux/mmu_notifier.h | 14 ++++++++++++++ mm/mmu_notifier.c | 20 ++++++++++++++++++++ virt/kvm/kvm_main.c | 1 + 3 files changed, 35 insertions(+) diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index e2dd57ca368b..37643fa43687 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -231,6 +231,7 @@ struct mmu_notifier { struct mm_struct *mm; struct rcu_head rcu; unsigned int users; + bool has_fast_aging; }; =20 /** @@ -383,6 +384,7 @@ extern int __mmu_notifier_clear_young(struct mm_struct = *mm, unsigned long end); extern int __mmu_notifier_test_young(struct mm_struct *mm, unsigned long address); +extern bool __mm_has_fast_young_notifiers(struct mm_struct *mm); extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range= *r); extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range = *r); extern void __mmu_notifier_arch_invalidate_secondary_tlbs(struct mm_struct= *mm, @@ -428,6 +430,13 @@ static inline int mmu_notifier_test_young(struct mm_st= ruct *mm, return 0; } =20 +static inline bool mm_has_fast_young_notifiers(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + return __mm_has_fast_young_notifiers(mm); + return 0; +} + static inline void mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range) { @@ -619,6 +628,11 @@ static inline int mmu_notifier_test_young(struct mm_st= ruct *mm, return 0; } =20 +static inline bool mm_has_fast_young_notifiers(struct mm_struct *mm) +{ + return 0; +} + static inline void mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range) { diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index 8982e6139d07..c405e5b072cf 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -382,6 +382,26 @@ int __mmu_notifier_clear_flush_young(struct mm_struct = *mm, return young; } =20 +bool __mm_has_fast_young_notifiers(struct mm_struct *mm) +{ + struct mmu_notifier *subscription; + bool 
has_fast_aging =3D false; + int id; + + id =3D srcu_read_lock(&srcu); + hlist_for_each_entry_rcu(subscription, + &mm->notifier_subscriptions->list, hlist, + srcu_read_lock_held(&srcu)) { + if (subscription->has_fast_aging) { + has_fast_aging =3D true; + break; + } + } + srcu_read_unlock(&srcu, id); + + return has_fast_aging; +} + int __mmu_notifier_clear_young(struct mm_struct *mm, unsigned long start, unsigned long end) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 7d5b35cfc1ed..f6c369eccd2a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -930,6 +930,7 @@ static const struct mmu_notifier_ops kvm_mmu_notifier_o= ps =3D { static int kvm_init_mmu_notifier(struct kvm *kvm) { kvm->mmu_notifier.ops =3D &kvm_mmu_notifier_ops; + kvm->mmu_notifier.has_fast_aging =3D false; return mmu_notifier_register(&kvm->mmu_notifier, current->mm); } =20 --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 981351A4AB4 for ; Thu, 26 Sep 2024 01:35:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314533; cv=none; b=Fz3tt7XCbNKA6Pi3gWptYFkSWKfLRpl6kQqj27ziL967X1bCvDiiUxpOiTbhfGUbqvahz8ClfgfoFYkhbn/L8wUJPlr+b0CuKsoWHSdVB/4b26VcayaM2PJc4q624vS14tkWsFpXsTdoUfj/TyjVXDyTXRaMGJcpDuex5WslZQA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314533; c=relaxed/simple; bh=iynxZ4Ry9WaN4XO7OQhzo2JcvREd4yVbmFHSIphWh40=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
b=n8HQbBb6WpSEqZXH+ODsmUqpad/R5G/whHhCj59E34vLLo7RIHkk5zcst0IzyzS74H/D3Pv+3DdWvlbM19HsfEBcBxrJC1sdGj5JwymHhnjvVKWH6q9yY5QLR+XBjsLvwovG8nIOeTZA1x/xEq6OkEcvSi0cp3xi34b34g4ZDZ0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=VWyL8IMW; arc=none smtp.client-ip=209.85.128.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="VWyL8IMW" Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-6e230808455so5355257b3.0 for ; Wed, 25 Sep 2024 18:35:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314531; x=1727919331; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ogisB+p0t7E+MEmwUL2EdeyMfss1r+UivrCLQXciZ0o=; b=VWyL8IMWaAQGG04phvq0Z82R4n3aqoS8tnjLNrvV31bkzDKYLafBHC6bvUn+CxBWhA wgUjKspczotuLzqQkZA08e3t2qzXpHzyFxXGXL3oAqCpuveC4EQXmyq9ndvserDhpQCU YxO64grDH4l7l7CgnhizfmPdsfkHeV/O3t746cdXyk4q9froEqnH4gGhox54CF69qst/ JGnKQ3s8K78MpE8Vo/EggETvPbj44jRIF4iuqglsTJXRS1dzVVnILFkWJ4GtouowoPqD Ez6nRSiklK1H9zgRKk0QeNTmf2xNHpMc9zpK4F3u+4TahGNcWTUoYo+W4sQIRWsRJlFw srRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314531; x=1727919331; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ogisB+p0t7E+MEmwUL2EdeyMfss1r+UivrCLQXciZ0o=; b=pBgb7T6jrlzU2t+1/8MzqxRva+QSnlpVLWe/1+av3GHyUFmDAYWlIH0ReVZWOXeA5/ 
eaVRPgLbGudJVJ2sHSCiJpJUPfQsbkouFt7r5crE8lfYNH8hpel5HF/gfkTw3YbmLUmF EDVIs3SmEd2ObeU6T2n4IeAZ+xoIuBOnIJOZf56eg+6k5tRW5/VXUGA7fn2ZI4w5FFmr xqHD5rjR2/afIZpuEL/6J2sXm8oMRx/EUhUfsWpic87t9ucICt+ybJwTUdOxMlVMPMp2 jikea9W7n27MPX+rBZ8EZMLrvrp41iT08ouYk32CmpUSWtGktoWN6PW5p00tgI8ilJph aUlw== X-Forwarded-Encrypted: i=1; AJvYcCVo4VSnmoP76IHTbjaj0C8icXSzmEKwJJF38OjpCIa/P6kgiyujQ6F2SHpaHijRgB4Gkj+CBHGDXUM9vHo=@vger.kernel.org X-Gm-Message-State: AOJu0Yy9P8iyeCDEPhiQWwoeqfLdSSUhoEL/gyOtJWpw61heKX8By95Z YY7BomEfG9hOr+XMqh2ZtU8U0jN5/IeRsOc4ZW3ZC8kERAPYwEjcFhsFb9xLDoH+MXG4YAVxdI6 XNQZS+gXXX01yr5yFXg== X-Google-Smtp-Source: AGHT+IEFdzw0O48kZ1pdx1YoauWRvV0zu2R4nMdxsgoP5v/WIUZXOoAYIDJzZnEXzkuO0wTdBSaqUczWpwalNBYg X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a05:690c:5292:b0:62c:f976:a763 with SMTP id 00721157ae682-6e21d6c27c9mr136077b3.1.1727314530709; Wed, 25 Sep 2024 18:35:30 -0700 (PDT) Date: Thu, 26 Sep 2024 01:35:01 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-14-jthoughton@google.com> Subject: [PATCH v7 13/18] mm: Add fast_only bool to test_young and clear_young MMU notifiers From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" For implementers, the fast_only bool indicates that the age information needs to be harvested such that we do not slow down other MMU 
operations, and ideally that we are not ourselves slowed down by other MMU operations. Usually this means that the implementation should be lockless. Also add mmu_notifier_test_young_fast_only() and mmu_notifier_clear_young_fast_only() helpers to set fast_only for these notifiers. Signed-off-by: James Houghton --- include/linux/mmu_notifier.h | 61 ++++++++++++++++++++++++++++++++---- include/trace/events/kvm.h | 19 ++++++----- mm/mmu_notifier.c | 18 ++++++++--- virt/kvm/kvm_main.c | 12 ++++--- 4 files changed, 88 insertions(+), 22 deletions(-) diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 37643fa43687..7c17e2871c66 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -106,21 +106,38 @@ struct mmu_notifier_ops { * clear_young is a lightweight version of clear_flush_young. Like the * latter, it is supposed to test-and-clear the young/accessed bitflag * in the secondary pte, but it may omit flushing the secondary tlb. + * + * The fast_only parameter indicates that this call should not block, + * and this function should not cause other MMU notifier calls to + * block. Usually this means that the implementation should be + * lockless. + * + * When called with fast_only, this notifier will be a no-op (and + * return that the range is NOT young), unless has_fast_aging is set + * on the struct mmu_notifier. + * + * When fast_only is true, if the implementer cannot determine that a + * range is young without blocking, it should return 0 (i.e., that + * the range is NOT young). */ int (*clear_young)(struct mmu_notifier *subscription, struct mm_struct *mm, unsigned long start, - unsigned long end); + unsigned long end, + bool fast_only); =20 /* * test_young is called to check the young/accessed bitflag in * the secondary pte. This is used to know if the page is * frequently used without actually clearing the flag or tearing * down the secondary mapping on the page. 
+ * + * The fast_only parameter has the same meaning as with clear_young. */ int (*test_young)(struct mmu_notifier *subscription, struct mm_struct *mm, - unsigned long address); + unsigned long address, + bool fast_only); =20 /* * invalidate_range_start() and invalidate_range_end() must be @@ -381,9 +398,11 @@ extern int __mmu_notifier_clear_flush_young(struct mm_= struct *mm, unsigned long end); extern int __mmu_notifier_clear_young(struct mm_struct *mm, unsigned long start, - unsigned long end); + unsigned long end, + bool fast_only); extern int __mmu_notifier_test_young(struct mm_struct *mm, - unsigned long address); + unsigned long address, + bool fast_only); extern bool __mm_has_fast_young_notifiers(struct mm_struct *mm); extern int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range= *r); extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range = *r); @@ -418,7 +437,16 @@ static inline int mmu_notifier_clear_young(struct mm_s= truct *mm, unsigned long end) { if (mm_has_notifiers(mm)) - return __mmu_notifier_clear_young(mm, start, end); + return __mmu_notifier_clear_young(mm, start, end, false); + return 0; +} + +static inline int mmu_notifier_clear_young_fast_only(struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_clear_young(mm, start, end, true); return 0; } =20 @@ -426,7 +454,15 @@ static inline int mmu_notifier_test_young(struct mm_st= ruct *mm, unsigned long address) { if (mm_has_notifiers(mm)) - return __mmu_notifier_test_young(mm, address); + return __mmu_notifier_test_young(mm, address, false); + return 0; +} + +static inline int mmu_notifier_test_young_fast_only(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_test_young(mm, address, true); return 0; } =20 @@ -622,12 +658,25 @@ static inline int mmu_notifier_clear_young(struct mm_= struct *mm, return 0; } =20 +static inline int 
mmu_notifier_clear_young_fast_only(struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + return 0; +} + static inline int mmu_notifier_test_young(struct mm_struct *mm, unsigned long address) { return 0; } =20 +static inline int mmu_notifier_test_young_fast_only(struct mm_struct *mm, + unsigned long address) +{ + return 0; +} + static inline bool mm_has_fast_young_notifiers(struct mm_struct *mm) { return 0; diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h index 74e40d5d4af4..6d9485cf3e51 100644 --- a/include/trace/events/kvm.h +++ b/include/trace/events/kvm.h @@ -457,36 +457,41 @@ TRACE_EVENT(kvm_unmap_hva_range, ); =20 TRACE_EVENT(kvm_age_hva, - TP_PROTO(unsigned long start, unsigned long end), - TP_ARGS(start, end), + TP_PROTO(unsigned long start, unsigned long end, bool fast_only), + TP_ARGS(start, end, fast_only), =20 TP_STRUCT__entry( __field( unsigned long, start ) __field( unsigned long, end ) + __field( bool, fast_only ) ), =20 TP_fast_assign( __entry->start =3D start; __entry->end =3D end; + __entry->fast_only =3D fast_only; ), =20 - TP_printk("mmu notifier age hva: %#016lx -- %#016lx", - __entry->start, __entry->end) + TP_printk("mmu notifier age hva: %#016lx -- %#016lx fast_only: %d", + __entry->start, __entry->end, __entry->fast_only) ); =20 TRACE_EVENT(kvm_test_age_hva, - TP_PROTO(unsigned long hva), - TP_ARGS(hva), + TP_PROTO(unsigned long hva, bool fast_only), + TP_ARGS(hva, fast_only), =20 TP_STRUCT__entry( __field( unsigned long, hva ) + __field( bool, fast_only ) ), =20 TP_fast_assign( __entry->hva =3D hva; + __entry->fast_only =3D fast_only; ), =20 - TP_printk("mmu notifier test age hva: %#016lx", __entry->hva) + TP_printk("mmu notifier test age hva: %#016lx fast_only: %d", + __entry->hva, __entry->fast_only) ); =20 #endif /* _TRACE_KVM_MAIN_H */ diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c index c405e5b072cf..f9ec810c8a1b 100644 --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -404,7 +404,8 @@ bool 
__mm_has_fast_young_notifiers(struct mm_struct *mm) =20 int __mmu_notifier_clear_young(struct mm_struct *mm, unsigned long start, - unsigned long end) + unsigned long end, + bool fast_only) { struct mmu_notifier *subscription; int young =3D 0, id; @@ -413,9 +414,13 @@ int __mmu_notifier_clear_young(struct mm_struct *mm, hlist_for_each_entry_rcu(subscription, &mm->notifier_subscriptions->list, hlist, srcu_read_lock_held(&srcu)) { + if (fast_only && !subscription->has_fast_aging) + continue; + if (subscription->ops->clear_young) young |=3D subscription->ops->clear_young(subscription, - mm, start, end); + mm, start, end, + fast_only); } srcu_read_unlock(&srcu, id); =20 @@ -423,7 +428,8 @@ int __mmu_notifier_clear_young(struct mm_struct *mm, } =20 int __mmu_notifier_test_young(struct mm_struct *mm, - unsigned long address) + unsigned long address, + bool fast_only) { struct mmu_notifier *subscription; int young =3D 0, id; @@ -432,9 +438,13 @@ int __mmu_notifier_test_young(struct mm_struct *mm, hlist_for_each_entry_rcu(subscription, &mm->notifier_subscriptions->list, hlist, srcu_read_lock_held(&srcu)) { + if (fast_only && !subscription->has_fast_aging) + continue; + if (subscription->ops->test_young) { young =3D subscription->ops->test_young(subscription, mm, - address); + address, + fast_only); if (young) break; } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f6c369eccd2a..ec07caaed6b6 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -846,7 +846,7 @@ static int kvm_mmu_notifier_clear_flush_young(struct mm= u_notifier *mn, IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS), }; =20 - trace_kvm_age_hva(start, end); + trace_kvm_age_hva(start, end, false); =20 return kvm_handle_hva_range(kvm, &range).ret; } @@ -854,7 +854,8 @@ static int kvm_mmu_notifier_clear_flush_young(struct mm= u_notifier *mn, static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn, struct mm_struct *mm, unsigned long start, - unsigned long end) + unsigned long end, 
+ bool fast_only) { struct kvm *kvm =3D mmu_notifier_to_kvm(mn); const struct kvm_mmu_notifier_range range =3D { @@ -868,7 +869,7 @@ static int kvm_mmu_notifier_clear_young(struct mmu_noti= fier *mn, IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS), }; =20 - trace_kvm_age_hva(start, end); + trace_kvm_age_hva(start, end, fast_only); =20 /* * Even though we do not flush TLB, this will still adversely @@ -888,7 +889,8 @@ static int kvm_mmu_notifier_clear_young(struct mmu_noti= fier *mn, =20 static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn, struct mm_struct *mm, - unsigned long address) + unsigned long address, + bool fast_only) { struct kvm *kvm =3D mmu_notifier_to_kvm(mn); const struct kvm_mmu_notifier_range range =3D { @@ -902,7 +904,7 @@ static int kvm_mmu_notifier_test_young(struct mmu_notif= ier *mn, IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS), }; =20 - trace_kvm_test_age_hva(address); + trace_kvm_test_age_hva(address, fast_only); =20 return kvm_handle_hva_range(kvm, &range).ret; } --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 54A771A4E8E for ; Thu, 26 Sep 2024 01:35:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314535; cv=none; b=tfHnZgsrxoBe9RJ+FoSOU+ad5RSLFdc9YVCcL385LwSOoHXID+RnWJSxdFJwMYnZAqHi+feHRhWGHPNsHiShL/YQK3qVkQb+vLlvOyXb+l9azl+qqWIpDXF3hk70IK9WRB1LVhZnwONcQ9qZV8JFEI7LxeNaS3QSrvGUqPl1nzs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314535; c=relaxed/simple; bh=qtVup+apKklmYfzUxasBpTxc2jJ2e+fh2noX2r78kUo=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
b=J8et7sTZUQQcoKLeZSzxE0+zG22kY+E9YivDQDP4wqmZ/tz8dS/skJiL8tH5cdhWDqr9g8IfRvopEMALRf1LUhgLIxZwyDBbEDZPGScqFaAasf/xuDTTXOcR6R3hxSbLYDKpmlBAW4wGPWBEid1vjSUOSWGkTkVz7QbEHJT/8PE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=sCHiSt3L; arc=none smtp.client-ip=209.85.219.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="sCHiSt3L" Received: by mail-yb1-f201.google.com with SMTP id 3f1490d57ef6-e25cd76fb92so344219276.3 for ; Wed, 25 Sep 2024 18:35:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314532; x=1727919332; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=W5MO1wGLHyIriiPw0D1mbZgOsdCV5vF/ShV42LiObvw=; b=sCHiSt3LZN2UzQCJPFbc7CeOoNT+9acvoIqDw7mwDfu7OGO0WKk3NCdHCEuXz+KoYN V+nkjWkTD4HW3juqt4poaMt95uIF3zBj5nptG5raVrWhZoXgp3H/ZINIYqteoi6m5I3L 3MpOdZgsmLfc2wqs0eB/lD26MkTWfvYpJR/15BAr7K4f3Ui3B+aEcaIsRZoOC465h+7z gQqvEAtggvZoBU4+KNOuwwu7ZKMnXgzSLAVP+G7l5M08N7XQkakFf1FAAsVXUrchVQ0s Mq+f8WTXO1tLaPtgRLq/uch08v/5AhOGq2nhhlRtlhXYNQKNbt0cUuZehdKFkARAf5Np g43A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314532; x=1727919332; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=W5MO1wGLHyIriiPw0D1mbZgOsdCV5vF/ShV42LiObvw=; b=H5I1YeeDKtIRo40uamx9pf+aPUX9ZsXfH0V9gUrjQNdJndSZBeClDai4i+HbUG0x7C 
1C2iq42GLzfg+R9BVKBfpzD+wNoWenfkAdv8c0Vx6IOw5G5p4TPFYfKR/61pFKMvsal5 T45h0Js/2RtJ7LTHPiqV3DKspwbQn23QOdL3Y5OZ6JegfBJgcY7Gvo4vT0t/ls8pX4Fo elHVdxDrU1xUNAFqKtqBTr4/hQ2lmIVgkWIzhkKUBkum6y7KnFibkhgOrGUpFn0SijEU j46aO7UHPNauFoz4m8W2Is59GiUij1hvg0+zq/Yt9eDHoVXR1SLZgeHchYZ/EWoZx0ah hVzg== X-Forwarded-Encrypted: i=1; AJvYcCWJxSfVQRvFfyXVk+JiHIxieKMLFlmZYAJN628DcIwNmj6zVzCdr0CvfqLevKeSW/QaBdos2DgnybVgCCY=@vger.kernel.org X-Gm-Message-State: AOJu0YwzfYyipfJOSi6mx76woHoDaaWrFsC5j0Fg9xKYWSu5NV1FAyMg Qoh4qVBlKidEfFYu8ztpeRE+k0HoAvQ8uZ9gVp3aXkqaKHba3lsAeRZb0NATi/YamOA4qOysw84 JndwIU6ujaExt6Z9sGA== X-Google-Smtp-Source: AGHT+IHu6+UOTvIBD0ygLiAF0PUJIN5o89y1zOQ3DCuJr2JFhL/uFLj8FE3ghAKMMweGgziAQ1oo3bEK7MWZtMkm X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a25:bb48:0:b0:e03:53a4:1a7 with SMTP id 3f1490d57ef6-e24da1a380bmr81561276.10.1727314532143; Wed, 25 Sep 2024 18:35:32 -0700 (PDT) Date: Thu, 26 Sep 2024 01:35:02 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-15-jthoughton@google.com> Subject: [PATCH v7 14/18] KVM: Pass fast_only to kvm_{test_,}age_gfn From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Provide the basics for architectures to implement a fast-only version of kvm_{test_,}age_gfn. 
Signed-off-by: James Houghton --- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 98a987e88578..55861db556e2 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -258,6 +258,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER union kvm_mmu_notifier_arg { unsigned long attributes; + bool fast_only; }; =20 struct kvm_gfn_range { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index ec07caaed6b6..8630dfc82d61 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -867,6 +867,7 @@ static int kvm_mmu_notifier_clear_young(struct mmu_noti= fier *mn, .may_block =3D false, .lockless =3D IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS), + .arg.fast_only =3D fast_only, }; =20 trace_kvm_age_hva(start, end, fast_only); @@ -902,6 +903,7 @@ static int kvm_mmu_notifier_test_young(struct mmu_notif= ier *mn, .may_block =3D false, .lockless =3D IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS), + .arg.fast_only =3D fast_only, }; =20 trace_kvm_test_age_hva(address, fast_only); --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 576C71A76C1 for ; Thu, 26 Sep 2024 01:35:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314535; cv=none; b=AzgWmRrnFakwq0+HS530OY9QA7x2iykSPzV1iwTBTK8oImlTmUd9AJ4CpSsdtYpkJ8clusbLN/nHCAa5bv3Jgs/2naALuvMrh7k6xebK6e1PvdQKYDYkobJcOlNdmrZoUC4xerWQZmTxdnNRju1k9Lt4MWfgit0INvcZopvG/BI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314535; 
c=relaxed/simple; bh=P/XmdxbT2r5VTwTMBboA5fmY/qNQ13tNl6QPu1wIL4U=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=uW58q9REER7vSQkYqAoIk4g0J6v7R61eFVQNwZhsAb3LQF4mNbNmxDHRF2a+vkdl0OWVV9M0wn6zxRMtmFmmn1F9Vhf53y3m+dnRGUXZMuA/iOlgphT37oY5E3/ztnZColRBWmYm4nXaouK/bMio8EUl82aXUHE77w+rG5fwddE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=nBg7Tszw; arc=none smtp.client-ip=209.85.219.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="nBg7Tszw" Received: by mail-yb1-f201.google.com with SMTP id 3f1490d57ef6-e25cf4e97aaso147234276.3 for ; Wed, 25 Sep 2024 18:35:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314533; x=1727919333; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=/c4iHe6I2dqFJ8L4xEL/wcuXYs+QQdS3mQ0rZKnPlRM=; b=nBg7TszwOPouVBSmY7QUbGo6r19+GJG19iNTMli3HKbhLUDccOLZ1O4NEKzu88DJRw lKhZPzCcEjcQEQ6EYSwuDE71xUFdpcfQjoebuXEqWRRdvy1klsfZ+8RbiAfxoFnCAgon 4gyukPJD2Jnk9xCyY7aAOixgV3wQRUTjQzUmKQIHUx8ns3aLEcdi01JPOEDFp16gm1MI RCOawb2ET08dOP9t/7jSjm/pkVBy16E4bqZxQOcWQWdsLLlC+2xNQkKH42CLkO+0uT4O Jnx21JkyDW3ob7GQQRs8N7vxGz1c9iOkw/oSEQVUAy0tER3u39OVHlAvoWmUpQ+rS87/ nJwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314533; x=1727919333; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to 
:date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=/c4iHe6I2dqFJ8L4xEL/wcuXYs+QQdS3mQ0rZKnPlRM=; b=bxbpGBw6KprQMFeMBelVY6IegS0uf1SUmATyDppu/8KvhfE9uhBxCBOcfkf94THf++ MMSWSHYUlfuLfRnYxeUhThY3gHOW/CMADKqqI7h7PdHKtcOU359nB5yiW0F+aYwz7KuZ mVhkYa6ClM6cXWaTIzyThOa7SViRcaIBjoUc71MMqCpDI1JAo0PvhROeVNwt7TX6C3Gt Zac8TGmUzraZFjcbkyhN+A52OQNNIzpENZZHXdLaPQRMRcHZ2OcgcGKy/F0alDq+IvDh iF4bJlgyMZabak7FanXGj9XoslGc4oGtewiWFpmCIIsrASmZQ2aCOJOUR9EgUuEvSZPP 11Ww== X-Forwarded-Encrypted: i=1; AJvYcCVSYsUHzg8MeppglcCboBhRwwrA8DDmnKw1VAKNvBqzQOKZNF3jbDuOy/4mWDZO+AL9cDMYK7XF4yVhAsQ=@vger.kernel.org X-Gm-Message-State: AOJu0YzZZfr1X5Ue0O2rYp+8bIBuGu44QDBCw3sXuPlLcJRbjBTn4Pcd d5vUsap5gYwNAmUKc5gUyOOvcvrjsv2Eyhqr0SLKA9ssatp/6iVwG5UWJLGxFoBbu2NbrahSWZA s9wc/cBTh1rfgqA95Ow== X-Google-Smtp-Source: AGHT+IGK4m8FSya1x5h1yKPn0SlYZqL+0YnNSUJKqrhSqJhacVQdW7OQ9a4614HCOKTuoSQjtwwz7rbSxAQZQC1c X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a5b:a:0:b0:e25:cf7f:a065 with SMTP id 3f1490d57ef6-e25cf7fa247mr2793276.8.1727314533354; Wed, 25 Sep 2024 18:35:33 -0700 (PDT) Date: Thu, 26 Sep 2024 01:35:03 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-16-jthoughton@google.com> Subject: [PATCH v7 15/18] KVM: x86/mmu: Locklessly harvest access information from shadow MMU From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: 
quoted-printable Content-Type: text/plain; charset="utf-8" Move where the lock is taken for the shadow MMU case to only take the lock when !range->arg.fast_only (i.e., for the non-fast_only aging MMU notifiers). Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index a63497bbcc61..f47bd88b55e3 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1822,16 +1822,24 @@ static bool __kvm_rmap_age_gfn_range(struct kvm *kv= m, static bool kvm_rmap_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range, bool test_only) { + bool young; + /* * We can always locklessly test if an spte is young. Because marking * non-A/D sptes for access tracking without holding the mmu_lock is * not currently supported, we cannot always locklessly clear. + * + * For fast_only, we must not take the mmu_lock, so locklessly age in + * that case even though we will not be able to clear the age for + * non-A/D sptes. 
*/ - if (test_only) + if (test_only || range->arg.fast_only) return kvm_rmap_age_gfn_range_lockless(kvm, range, test_only); =20 - lockdep_assert_held_write(&kvm->mmu_lock); - return __kvm_rmap_age_gfn_range(kvm, range, test_only); + write_lock(&kvm->mmu_lock); + young =3D __kvm_rmap_age_gfn_range(kvm, range, test_only); + write_unlock(&kvm->mmu_lock); + return young; } =20 static bool kvm_has_shadow_mmu_sptes(struct kvm *kvm) @@ -1846,11 +1854,8 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_ran= ge *range) if (tdp_mmu_enabled) young =3D kvm_tdp_mmu_age_gfn_range(kvm, range); =20 - if (kvm_has_shadow_mmu_sptes(kvm)) { - write_lock(&kvm->mmu_lock); + if (kvm_has_shadow_mmu_sptes(kvm)) young |=3D kvm_rmap_age_gfn_range(kvm, range, false); - write_unlock(&kvm->mmu_lock); - } =20 return young; } @@ -1862,11 +1867,11 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_g= fn_range *range) if (tdp_mmu_enabled) young =3D kvm_tdp_mmu_test_age_gfn(kvm, range); =20 - if (!young && kvm_has_shadow_mmu_sptes(kvm)) { - write_lock(&kvm->mmu_lock); + if (young) + return young; + + if (kvm_has_shadow_mmu_sptes(kvm)) young |=3D kvm_rmap_age_gfn_range(kvm, range, true); - write_unlock(&kvm->mmu_lock); - } =20 return young; } --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-ua1-f74.google.com (mail-ua1-f74.google.com [209.85.222.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 12CD41AB6F9 for ; Thu, 26 Sep 2024 01:35:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314537; cv=none; b=omQLYgM+LQmgPkXFXHYPzXwp8w/eCxBjkBeMGbb02H5Y6SbbznDtMBDrufBP+KxrVQYc6V/AvZ0ZXoAL7vFmHCXVw4S4QVt/7ADKV0GNBaIP2H92wFPm4eiY2XeQMQLh9rbKZH+8C7P1Dquh99LTTcPZUVoWhlE4VzgurDHuzXQ= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1727314537; c=relaxed/simple; bh=b3hOj3Hs+s10OWkM1ZyjJYJwKJI4XiaxkkuOL7OmSoE=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=rkX/lusjNVDzOBhMpLp1225NPQFVkYe2IgkoA9E/4YKyNFMjrrCCsCcgGAY+RDfGN17/cC8axVH0ZifKQiBpAnnf6jk6cp44hiONaX4UvwY5pBUuhUEvTeQsYpEfc1MJjyMmvWO6IKMYY/PpUwbJj8ortPleBp0XFoL0QnRJHXg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=hNHeWhmg; arc=none smtp.client-ip=209.85.222.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="hNHeWhmg" Received: by mail-ua1-f74.google.com with SMTP id a1e0cc1a2514c-8485e720c52so328918241.1 for ; Wed, 25 Sep 2024 18:35:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314535; x=1727919335; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=mEOlPh0OL8PB5uPzly3STECymuKs02gkKhKmsw1ZgQM=; b=hNHeWhmgLFRQ2/Uc9GTeVXn78ModPOXmrvuJYjSfRKHvFK93P5qrFHPFtd58NC3F4N WX6woI2oNEuf213Ih3lQiDxYftnApJVnOFx8g0SOTNNKEHeO6Y6S08EhwKKkucmVIMkv ZwqSLx8o5DxCrd1OZbNDh6KKCSEDKpMgS8M3Vu0Vj+Nc33B/JKfgJNgJxtJ6Jssr1Mt7 nhDzcMRqCj80o+ZKs5ZQgLCrbWPYNz6Q9RfmKjV8aJq7W/u0p4h5JKALmRFtEO3ZY+8Q 4o/ymmrg5fKH78RE4R2kk32bBvjGDQVr/Tz21wgeF3iVvRQi+1UXnF5aCCDBRyVpZTaZ hvsw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314535; x=1727919335; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to 
:date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=mEOlPh0OL8PB5uPzly3STECymuKs02gkKhKmsw1ZgQM=; b=mybMi5VN6t6mKy7v9vQlanA1I3FCqeqgR+EJw+T/a4z0R5xeoY2R3l+6qlIxPwjoMG E4baRrO6P3zXNAr/ChFKg2jKz26XV7hYv/iZAK5ooPfj6a+MGhqE2MkfbgaY31iolm+M 5miM34FKGjXb+zjkCBc7gbLPuQubUVlV28KZW5P0exDgqOhb5qCeEqWMjWp//xd2J1z0 O/hgO2jyGwcyyG3sb4gKC6UlWJQSNJ63m8hr6phZFnDZbMBpTA73ne9O7N4asRSIPG+y ONgMoLPzJC6TPr0z/LjTSRbAejzN0dAzPCLw4QUZ0P29AuJm04RNpPZ5ezZPlayhrIZw p41Q== X-Forwarded-Encrypted: i=1; AJvYcCW8E5c1MzyWns1A/hV1+xixhLdIaLgjJc8y0uQbn6v/2qcWAOEsqnbUL1IGQeRi5TyMWycMxyzFPZw9b9o=@vger.kernel.org X-Gm-Message-State: AOJu0YyKzxZJD4ouT9SOUzjjk8Q7O0Fq6nInkIsuggPdSyZngeXNrk2q yiV+aPynAzBbBf+YTKQpNCxO65d+mg1fbQjoUJMEum/t1k25MlBOQBmBkZmjp6A1z8U1cX7kM6V H2VE6Nt6ySJol3FG15w== X-Google-Smtp-Source: AGHT+IEMx7yb7oZJxjUcxiV+ubu09F3MgF2iO9wA9qRkGH/snKIzoGYYivc2BPhRyOg04he4NRTTUUTncXdmo6QX X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:ab0:71d3:0:b0:84e:89c5:3bdf with SMTP id a1e0cc1a2514c-84e9940fb1bmr4272241.0.1727314534796; Wed, 25 Sep 2024 18:35:34 -0700 (PDT) Date: Thu, 26 Sep 2024 01:35:04 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-17-jthoughton@google.com> Subject: [PATCH v7 16/18] KVM: x86/mmu: Enable has_fast_aging From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: 
text/plain; charset="utf-8" Because the x86 MMU locklessly implements fast_only versions of kvm_test_age_gfn and kvm_age_gfn, we can advertise support for has_fast_aging to allow MGLRU to quickly find better eviction candidates. There is one case where the MMU is not 100% accurate: for the shadow MMU, when A/D bits are not in use, young sptes will never be aged with the fast_only kvm_age_gfn. In this case, such pages will consistently appear young, so they will be the least likely eviction candidates. Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f47bd88b55e3..1798e3853d27 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -7708,6 +7708,8 @@ int kvm_mmu_post_init_vm(struct kvm *kvm) { int err; =20 + kvm->mmu_notifier.has_fast_aging =3D true; + if (nx_hugepage_mitigation_hard_disabled) return 0; =20 --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-vs1-f74.google.com (mail-vs1-f74.google.com [209.85.217.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 97C991AC88E for ; Thu, 26 Sep 2024 01:35:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.217.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314539; cv=none; b=QLezvC7ZuksciJS5mHl88JUQYqvmGD2OFuMTeVp7yvC0OOhkJ0l4B+pLRrOcIHS/mb5RQhJ5Sh4GMaXdhpaelfNP98FfBq/dDCqrZuHPSQ5BLxMLSD93r6FnPJ1iXyahh2eGs+/vawvhCpsQmp8EgJ9JQ5jhOXGUuuoRSZsR7Fg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314539; c=relaxed/simple; bh=3z196vrc/7EnpS5uiU8pv6T9dS6BXm+Q9maqS8EiBdY=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
b=FaR5iQWR1bkoug4jguovY3gGMcUaJGvH3gwzjLP9uFs+XSrbG3FR6c65X+5KQUWsuIbfWuyZonaTu4kuwWXnJB24E0lP5GOgn4zPdNlqG9lSjYwYkV5hFCVFxvz+TPtMPQtRklJ4tUmoly0wz/lN49ZIZkqX6aQcB+ThBtjqP/I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=elxGYgQc; arc=none smtp.client-ip=209.85.217.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="elxGYgQc" Received: by mail-vs1-f74.google.com with SMTP id ada2fe7eead31-49bc7d7b31fso191944137.1 for ; Wed, 25 Sep 2024 18:35:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314536; x=1727919336; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=MWj+EtmYN6BoWav4C6XYFCP/YJSRpIEJzj9cj+E6cQI=; b=elxGYgQce8WduOX//vnq2j+DAh6vsg0p8094Crk013athi1Kpiw3LuT/nmFInIPCki MAubiI2+M/cSOc9mORkX7apBDGSGLKGO5sRwkDRSExpsarAvgVtVNWDT0f8mCqi0DysL wYhAlN21qJE65gQRDcEvfPzemX5CwsOUovpJ8YCaKozC+zGVRQnndvuqNquwCqXYxpSu pCWmJ0qlvkOv3toTQo52WfsJX93ztcUfZKQS8szkXse0HHen5uBVD+OrOQYy6gY9GaSO bI5fvazPI0MFilQPQOXaD1NvSeALYbEwmtDMTuZ3r4/Kne3mrQi2adw67HiwGdgpGWkX mzGg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314536; x=1727919336; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=MWj+EtmYN6BoWav4C6XYFCP/YJSRpIEJzj9cj+E6cQI=; b=t1N8cRPdIZ1nmKJtQpER7PQanrFYzU4hwKlONn+a0so2riC6WgHv7KA4cAD1IESz7N 
EzoVHNrs4TwVni3QIHPnNmyIik6debG9fjdFicwtTgwrRMlaxsH0GoTXvoYkI9Zah1GO S5LH0OVnxs6Ygosd/pbr+1Rk2raCK43H8G+BY3Tp2Ne7sY43S8SlaF2dV8iWWw27YgJi fxQrJc3AO8KrhPt0Wn/7B3WzfZ326JridS36b1IxxaDVXcS4/BsJNJjo9O9+TOSd+PhU //XIO1GaDCWnK+V5jERL5KJERWL90SLWBtt37QdEmLUSTRGSC+2KZYTU3SCn9AfMj47K 0ycQ== X-Forwarded-Encrypted: i=1; AJvYcCUD5k4++x9t5cRQ9pe+ymWTCyJ0kyCY411b1matCdhGOUEy65Kt87z9x4K9R5GAOS3IAvH+j4n7F6TadLQ=@vger.kernel.org X-Gm-Message-State: AOJu0YwZToOzebU9gMmCooZ4D6yNkNtTK5g1ArzVs10ySbBehaE5NU/t Xwyvv7iGwTnBGgoSBS6yGeadz2/je6Tw2xQ5/jloBfGFLxsfIF8UqYgDSMnX1QsxmpG/6Weo607 fwtsT++vqleJTq6MeVQ== X-Google-Smtp-Source: AGHT+IEWbFq5+mKO2q49QysNNPvZDCXSWbklfJ2UiM9KvwQART8aUu+/JLlyrv3E0SSfh27Nwb4MbN5v9Rl58JjT X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:13d:fb22:ac12:a84b]) (user=jthoughton job=sendgmr) by 2002:a05:6102:5e97:b0:48d:9c9b:f391 with SMTP id ada2fe7eead31-4a15dd4beb2mr104946137.5.1727314536465; Wed, 25 Sep 2024 18:35:36 -0700 (PDT) Date: Thu, 26 Sep 2024 01:35:05 +0000 In-Reply-To: <20240926013506.860253-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240926013506.860253-1-jthoughton@google.com> X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog Message-ID: <20240926013506.860253-18-jthoughton@google.com> Subject: [PATCH v7 17/18] mm: multi-gen LRU: Have secondary MMUs participate in aging From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: Andrew Morton , David Matlack , David Rientjes , James Houghton , Jason Gunthorpe , Jonathan Corbet , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Secondary MMUs are currently consulted for access/age information at eviction time, but before then, we don't get accurate age information. 
That is, pages that are mostly accessed through a secondary MMU (like guest memory, used by KVM) will always just proceed down to the oldest generation, and then at eviction time, if KVM reports the page to be young, the page will be activated/promoted back to the youngest generation. The added feature bit (0x8), if disabled, will make MGLRU behave as if there are no secondary MMUs subscribed to MMU notifiers except at eviction time. Implement aging with the new mmu_notifier_clear_young_fast_only() notifier. For architectures that do not support this notifier, this becomes a no-op. For architectures that do implement it, it should be fast enough to make aging worth it (usually the case if the notifier is implemented locklessly). Suggested-by: Yu Zhao Signed-off-by: James Houghton --- Documentation/admin-guide/mm/multigen_lru.rst | 6 +- include/linux/mmzone.h | 6 +- mm/rmap.c | 9 +- mm/vmscan.c | 148 ++++++++++++++---- 4 files changed, 127 insertions(+), 42 deletions(-) diff --git a/Documentation/admin-guide/mm/multigen_lru.rst b/Documentation/= admin-guide/mm/multigen_lru.rst index 33e068830497..e1862407652c 100644 --- a/Documentation/admin-guide/mm/multigen_lru.rst +++ b/Documentation/admin-guide/mm/multigen_lru.rst @@ -48,6 +48,10 @@ Values Components verified on x86 varieties other than Intel and AMD. If it is disabled, the multi-gen LRU will suffer a negligible performance degradation. +0x0008 Clear the accessed bit in secondary MMU page tables when aging + instead of waiting until eviction time. This results in accurate + page age information for pages that are mainly used by a + secondary MMU. [yYnN] Apply to all the components above. 
=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 @@ -56,7 +60,7 @@ E.g., =20 echo y >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled - 0x0007 + 0x000f echo 5 >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled 0x0005 diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 1dc6248feb83..dbfb868c3708 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -400,6 +400,7 @@ enum { LRU_GEN_CORE, LRU_GEN_MM_WALK, LRU_GEN_NONLEAF_YOUNG, + LRU_GEN_SECONDARY_MMU_WALK, NR_LRU_GEN_CAPS }; =20 @@ -557,7 +558,7 @@ struct lru_gen_memcg { =20 void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); =20 void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -576,8 +577,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *l= ruvec) { } =20 -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } =20 static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index 2490e727e2dc..51bbda3bae60 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -870,13 +870,10 @@ static bool folio_referenced_one(struct folio *folio, continue; } =20 - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index cfa839284b92..6ab87dd1c6d9 100644 
--- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include #include #include +#include =20 #include #include @@ -2594,6 +2595,11 @@ static bool should_clear_pmd_young(void) return arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG); } =20 +static bool should_walk_secondary_mmu(void) +{ + return get_cap(LRU_GEN_SECONDARY_MMU_WALK); +} + /*************************************************************************= ***** * shorthand helpers *************************************************************************= *****/ @@ -3291,7 +3297,8 @@ static bool get_next_vma(unsigned long mask, unsigned= long size, struct mm_walk return false; } =20 -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, un= signed long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, un= signed long addr, + struct pglist_data *pgdat) { unsigned long pfn =3D pte_pfn(pte); =20 @@ -3306,10 +3313,15 @@ static unsigned long get_pte_pfn(pte_t pte, struct = vm_area_struct *vma, unsigned if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; =20 + /* try to avoid unnecessary memory loads */ + if (pfn < pgdat->node_start_pfn || pfn >=3D pgdat_end_pfn(pgdat)) + return -1; + return pfn; } =20 -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, un= signed long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, un= signed long addr, + struct pglist_data *pgdat) { unsigned long pfn =3D pmd_pfn(pmd); =20 @@ -3324,6 +3336,10 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct v= m_area_struct *vma, unsigned if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; =20 + /* try to avoid unnecessary memory loads */ + if (pfn < pgdat->node_start_pfn || pfn >=3D pgdat_end_pfn(pgdat)) + return -1; + return pfn; } =20 @@ -3332,10 +3348,6 @@ static struct folio *get_pfn_folio(unsigned long pfn= , struct mem_cgroup *memcg, { struct folio *folio; =20 - /* try to avoid unnecessary memory loads */ - if (pfn < 
pgdat->node_start_pfn || pfn >=3D pgdat_end_pfn(pgdat)) - return NULL; - folio =3D pfn_folio(pfn); if (folio_nid(folio) !=3D pgdat->node_id) return NULL; @@ -3358,6 +3370,26 @@ static bool suitable_to_scan(int total, int young) return young * n >=3D total; } =20 +static bool lru_gen_notifier_clear_young(struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + return should_walk_secondary_mmu() && + mmu_notifier_clear_young_fast_only(mm, start, end); +} + +static bool lru_gen_pmdp_test_and_clear_young(struct vm_area_struct *vma, + unsigned long addr, + pmd_t *pmd) +{ + bool young =3D pmdp_test_and_clear_young(vma, addr, pmd); + + if (lru_gen_notifier_clear_young(vma->vm_mm, addr, addr + PMD_SIZE)) + young =3D true; + + return young; +} + static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long = end, struct mm_walk *args) { @@ -3372,8 +3404,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long = start, unsigned long end, struct pglist_data *pgdat =3D lruvec_pgdat(walk->lruvec); DEFINE_MAX_SEQ(walk->lruvec); int old_gen, new_gen =3D lru_gen_from_seq(max_seq); + struct mm_struct *mm =3D args->mm; =20 - pte =3D pte_offset_map_nolock(args->mm, pmd, start & PMD_MASK, &ptl); + pte =3D pte_offset_map_nolock(mm, pmd, start & PMD_MASK, &ptl); if (!pte) return false; if (!spin_trylock(ptl)) { @@ -3391,11 +3424,11 @@ static bool walk_pte_range(pmd_t *pmd, unsigned lon= g start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; =20 - pfn =3D get_pte_pfn(ptent, args->vma, addr); + pfn =3D get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn =3D=3D -1) continue; =20 - if (!pte_young(ptent)) { + if (!pte_young(ptent) && !mm_has_notifiers(mm)) { walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3404,8 +3437,14 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long= start, unsigned long end, if (!folio) continue; =20 - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!lru_gen_notifier_clear_young(mm, 
addr, addr + PAGE_SIZE) && + !pte_young(ptent)) { + walk->mm_stats[MM_LEAF_OLD]++; + continue; + } + + if (pte_young(ptent)) + ptep_test_and_clear_young(args->vma, addr, pte + i); =20 young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3471,22 +3510,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsig= ned long addr, struct vm_area /* don't round down the first address */ addr =3D i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; =20 - pfn =3D get_pmd_pfn(pmd[i], vma, addr); - if (pfn =3D=3D -1) - goto next; - - if (!pmd_trans_huge(pmd[i])) { - if (should_clear_pmd_young()) + if (pmd_present(pmd[i]) && !pmd_trans_huge(pmd[i])) { + if (should_clear_pmd_young() && + !should_walk_secondary_mmu()) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } =20 + pfn =3D get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn =3D=3D -1) + goto next; + folio =3D get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; =20 - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!lru_gen_pmdp_test_and_clear_young(vma, addr, pmd + i)) { + walk->mm_stats[MM_LEAF_OLD]++; goto next; + } =20 walk->mm_stats[MM_LEAF_YOUNG]++; =20 @@ -3543,19 +3585,18 @@ static void walk_pmd_range(pud_t *pud, unsigned lon= g start, unsigned long end, } =20 if (pmd_trans_huge(val)) { - unsigned long pfn =3D pmd_pfn(val); struct pglist_data *pgdat =3D lruvec_pgdat(walk->lruvec); + unsigned long pfn =3D get_pmd_pfn(val, vma, addr, pgdat); =20 walk->mm_stats[MM_LEAF_TOTAL]++; =20 - if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; + if (pfn =3D=3D -1) continue; - } =20 - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >=3D pgdat_end_pfn(pgdat)) + if (!pmd_young(val) && !mm_has_notifiers(args->mm)) { + walk->mm_stats[MM_LEAF_OLD]++; continue; + } =20 walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; @@ -3563,7 +3604,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long = start, unsigned long end, =20 
walk->mm_stats[MM_NONLEAF_TOTAL]++; =20 - if (should_clear_pmd_young()) { + if (should_clear_pmd_young() && !should_walk_secondary_mmu()) { if (!pmd_young(val)) continue; =20 @@ -4030,6 +4071,31 @@ static void lru_gen_age_node(struct pglist_data *pgd= at, struct scan_control *sc) * rmap/PT walk feedback *************************************************************************= *****/ =20 +static bool should_look_around(struct vm_area_struct *vma, unsigned long a= ddr, + pte_t *pte, int *young) +{ + int secondary_young =3D mmu_notifier_clear_young( + vma->vm_mm, addr, addr + PAGE_SIZE); + + /* + * Look around if (1) the PTE is young or (2) the secondary PTE was + * young and one of the "fast" MMUs of one of the secondary MMUs + * reported that the page was young. + */ + if (pte_young(ptep_get(pte))) { + ptep_test_and_clear_young(vma, addr, pte); + *young =3D true; + return true; + } + + if (secondary_young) { + *young =3D true; + return mm_has_fast_young_notifiers(vma->vm_mm); + } + + return false; +} + /* * This function exploits spatial locality when shrink_folio_list() walks = the * rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages.= If @@ -4037,7 +4103,7 @@ static void lru_gen_age_node(struct pglist_data *pgda= t, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between t= he * eviction and the aging. 
*/ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; @@ -4055,16 +4121,20 @@ void lru_gen_look_around(struct page_vma_mapped_wal= k *pvmw) struct lru_gen_mm_state *mm_state =3D get_mm_state(lruvec); DEFINE_MAX_SEQ(lruvec); int old_gen, new_gen =3D lru_gen_from_seq(max_seq); + struct mm_struct *mm =3D pvmw->vma->vm_mm; =20 lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); =20 + if (!should_look_around(vma, addr, pte, &young)) + return young; + if (spin_is_contended(pvmw->ptl)) - return; + return young; =20 /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return young; =20 /* avoid taking the LRU lock under the PTL when possible */ walk =3D current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4072,6 +4142,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk = *pvmw) start =3D max(addr & PMD_MASK, vma->vm_start); end =3D min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; =20 + if (end - start =3D=3D PAGE_SIZE) + return young; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end =3D start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4085,7 +4158,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk = *pvmw) =20 /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return young; =20 arch_enter_lazy_mmu_mode(); =20 @@ -4095,19 +4168,23 @@ void lru_gen_look_around(struct page_vma_mapped_wal= k *pvmw) unsigned long pfn; pte_t ptent =3D ptep_get(pte + i); =20 - pfn =3D get_pte_pfn(ptent, vma, addr); + pfn =3D get_pte_pfn(ptent, vma, addr, pgdat); if (pfn =3D=3D -1) continue; =20 - if (!pte_young(ptent)) + if (!pte_young(ptent) && !mm_has_notifiers(mm)) continue; =20 folio =3D get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; =20 - if (!ptep_test_and_clear_young(vma, 
addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!lru_gen_notifier_clear_young(mm, addr, addr + PAGE_SIZE) && + !pte_young(ptent)) + continue; + + if (pte_young(ptent)) + ptep_test_and_clear_young(vma, addr, pte + i); =20 young++; =20 @@ -4137,6 +4214,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk = *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return young; } =20 /*************************************************************************= ***** @@ -5140,6 +5219,9 @@ static ssize_t enabled_show(struct kobject *kobj, str= uct kobj_attribute *attr, c if (should_clear_pmd_young()) caps |=3D BIT(LRU_GEN_NONLEAF_YOUNG); =20 + if (should_walk_secondary_mmu()) + caps |=3D BIT(LRU_GEN_SECONDARY_MMU_WALK); + return sysfs_emit(buf, "0x%04x\n", caps); } =20 --=20 2.46.0.792.g87dc391469-goog From nobody Fri Nov 29 00:53:51 2024 Received: from mail-vk1-f201.google.com (mail-vk1-f201.google.com [209.85.221.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 53A061ACDFD for ; Thu, 26 Sep 2024 01:35:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314542; cv=none; b=uOnNch0x3T7RA6JFo4SE8XL5yp5fZUWeuPHsTrVV+Ma4VHS37jP3rzLzeLXizj7D8C29EqJVVduwWEXExMzqyvDl/txUwo2QCm2xxZLitHZsZAGIZHpojXhIotnrLBXEEFOibizHDyAEPBVVGdSfNi168SqMSt/KCswSuCtkte8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727314542; c=relaxed/simple; bh=ULluF8Lj2dBffsCUmmqYJdOp6vNt1MER/lIJru+i8Zs=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
b=coqqwvSUuEMwJ2bTvXX+R3aqSClvx1phMPtOAMtzQa/oFFnsMzEkbz6QmXVc6DJCqolOU0vL1LEyxLuYYs8zVU+Cdkkfy1mLQB1SoumSZopp11fEHf6PzLV94Au12vRaL3S0d8HNGw/oOzpDCOoq13K8H6rtvI+XyFrtckbTV8Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=e6OzDfjL; arc=none smtp.client-ip=209.85.221.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="e6OzDfjL" Received: by mail-vk1-f201.google.com with SMTP id 71dfb90a1353d-50124ddd2d5so163913e0c.2 for ; Wed, 25 Sep 2024 18:35:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727314538; x=1727919338; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=1HitpDOpIhCd6Znj3amk60TrYFnYLRb+txcGZehcXFE=; b=e6OzDfjLR0oMdNDbiPVDipbsNcaAzKRYB1ugZmHI3ar8jmnKVk6j0U7jc2B2mhPoRj BUI53TuTPscGnH8tT8RxJEkdwqjQlfdq9Dk6QEQ2QfMg18CqVuwEB03VlL+O6Ng5TZBo HUK26R3zs5jt1SEAeC7+8LqhHjKFU+MaFwLcHZOHpX/RjQeycq0G2Bx6/GLVKV413TK4 KKOwzw5MmRKkP1wUhhALn1cQGSixKeFnuEUC/i3/a4fuozdYN+pEN2AZmnV3ePM08o3L 4K07fXN/51ISMr1+TT+eWcisDf8rnvpcpKD9k1SHqvgmQy+3LIo02KLecMVf+PX0RIOS KiMA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727314538; x=1727919338; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=1HitpDOpIhCd6Znj3amk60TrYFnYLRb+txcGZehcXFE=; b=ZCH2KEYGzKNz4BsN64gNdQb1LSrsXs54c1U/FIue3fkAC+cw00qfNY6NRixAe24lUl 
Date: Thu, 26 Sep 2024 01:35:06 +0000
In-Reply-To: <20240926013506.860253-1-jthoughton@google.com>
References: <20240926013506.860253-1-jthoughton@google.com>
Message-ID: <20240926013506.860253-19-jthoughton@google.com>
Subject: [PATCH v7 18/18] KVM: selftests: Add multi-gen LRU aging to access_tracking_perf_test
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: Andrew Morton, David Matlack, David Rientjes, James Houghton, Jason Gunthorpe, Jonathan Corbet, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org

This test now has two modes of operation:
1. (default) To check how much vCPU performance is affected by access
   tracking (this mode already existed and now also supports MGLRU aging).
2. (-p) To also benchmark how fast MGLRU can do aging while vCPUs are
   faulting in memory.

Mode (1) also serves as a way to verify that aging is working properly
for pages only accessed by KVM. It will fail if the kernel does not
report the 0x8 lru_gen feature bit (LRU_GEN_SECONDARY_MMU_WALK).

To support MGLRU, the test creates a memory cgroup, moves itself into
it, then uses the lru_gen debugfs output to track memory in that cgroup.
The logic to parse the lru_gen debugfs output has been put into
selftests/kvm/lib/lru_gen_util.c.

Co-developed-by: Axel Rasmussen
Signed-off-by: Axel Rasmussen
Signed-off-by: James Houghton
---
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/access_tracking_perf_test.c | 369 +++++++++++++++--
 .../selftests/kvm/include/lru_gen_util.h      |  55 +++
 .../testing/selftests/kvm/lib/lru_gen_util.c  | 391 ++++++++++++++++++
 4 files changed, 786 insertions(+), 30 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/lru_gen_util.h
 create mode 100644 tools/testing/selftests/kvm/lib/lru_gen_util.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 4b3a58d3d473..4b89ab5aff43 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -22,6 +22,7 @@ LIBKVM += lib/elf.c
 LIBKVM += lib/guest_modes.c
 LIBKVM += lib/io.c
 LIBKVM += lib/kvm_util.c
+LIBKVM += lib/lru_gen_util.c
 LIBKVM += lib/memstress.c
 LIBKVM += lib/guest_sprintf.c
 LIBKVM += lib/rbtree.c
diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c
index 3c7defd34f56..6ff64ac349a9 100644
--- a/tools/testing/selftests/kvm/access_tracking_perf_test.c
+++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c
@@ -38,6 +38,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -47,6 +48,20 @@
 #include "memstress.h"
 #include "guest_modes.h"
 #include "processor.h"
+#include "lru_gen_util.h"
+
+static const char *TEST_MEMCG_NAME = "access_tracking_perf_test";
+static const int LRU_GEN_ENABLED = 0x1;
+static const int LRU_GEN_MM_WALK = 0x2;
+static const int LRU_GEN_SECONDARY_MMU_WALK = 0x8;
+static const char *CGROUP_PROCS = "cgroup.procs";
+/*
+ * If using MGLRU, this test assumes a cgroup v2 or cgroup v1 memory hierarchy
+ * is mounted at cgroup_root.
+ *
+ * Can be changed with -r.
+ */
+static const char *cgroup_root = "/sys/fs/cgroup";
 
 /* Global variable used to synchronize all of the vCPU threads. */
 static int iteration;
@@ -62,6 +77,9 @@ static enum {
 /* The iteration that was last completed by each vCPU. */
 static int vcpu_last_completed_iteration[KVM_MAX_VCPUS];
 
+/* The time at which the last iteration was completed */
+static struct timespec vcpu_last_completed_time[KVM_MAX_VCPUS];
+
 /* Whether to overlap the regions of memory vCPUs access. */
 static bool overlap_memory_access;
 
@@ -74,6 +92,12 @@ struct test_params {
 
 	/* The number of vCPUs to create in the VM. */
 	int nr_vcpus;
+
+	/* Whether to use lru_gen aging instead of idle page tracking. */
+	bool lru_gen;
+
+	/* Whether to test the performance of aging itself. */
+	bool benchmark_lru_gen;
 };
 
 static uint64_t pread_uint64(int fd, const char *filename, uint64_t index)
@@ -89,6 +113,50 @@ static uint64_t pread_uint64(int fd, const char *filename, uint64_t index)
 
 }
 
+static void write_file_long(const char *path, long v)
+{
+	FILE *f;
+
+	f = fopen(path, "w");
+	TEST_ASSERT(f, "fopen(%s) failed", path);
+	TEST_ASSERT(fprintf(f, "%ld\n", v) > 0,
+		    "fprintf to %s failed", path);
+	TEST_ASSERT(!fclose(f), "fclose(%s) failed", path);
+}
+
+static char *path_join(const char *parent, const char *child)
+{
+	char *out = NULL;
+
+	return asprintf(&out, "%s/%s", parent, child) >= 0 ? out : NULL;
+}
+
+static char *memcg_path(const char *memcg)
+{
+	return path_join(cgroup_root, memcg);
+}
+
+static char *memcg_file_path(const char *memcg, const char *file)
+{
+	char *mp = memcg_path(memcg);
+	char *fp;
+
+	if (!mp)
+		return NULL;
+	fp = path_join(mp, file);
+	free(mp);
+	return fp;
+}
+
+static void move_to_memcg(const char *memcg, pid_t pid)
+{
+	char *procs = memcg_file_path(memcg, CGROUP_PROCS);
+
+	TEST_ASSERT(procs, "Failed to construct cgroup.procs path");
+	write_file_long(procs, pid);
+	free(procs);
+}
+
 #define PAGEMAP_PRESENT (1ULL << 63)
 #define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)
 
@@ -242,6 +310,8 @@ static void vcpu_thread_main(struct memstress_vcpu_args *vcpu_args)
 		};
 
 		vcpu_last_completed_iteration[vcpu_idx] = current_iteration;
+		clock_gettime(CLOCK_MONOTONIC,
+			      &vcpu_last_completed_time[vcpu_idx]);
 	}
 }
 
@@ -253,38 +323,68 @@ static void spin_wait_for_vcpu(int vcpu_idx, int target_iteration)
 	}
 }
 
+static bool all_vcpus_done(int target_iteration, int nr_vcpus)
+{
+	for (int i = 0; i < nr_vcpus; ++i)
+		if (READ_ONCE(vcpu_last_completed_iteration[i]) !=
+		    target_iteration)
+			return false;
+
+	return true;
+}
+
 /* The type of memory accesses to perform in the VM. */
 enum access_type {
 	ACCESS_READ,
 	ACCESS_WRITE,
 };
 
-static void run_iteration(struct kvm_vm *vm, int nr_vcpus, const char *description)
+static void run_iteration(struct kvm_vm *vm, int nr_vcpus, const char *description,
+			  bool wait)
 {
-	struct timespec ts_start;
-	struct timespec ts_elapsed;
 	int next_iteration, i;
 
 	/* Kick off the vCPUs by incrementing iteration. */
 	next_iteration = ++iteration;
 
-	clock_gettime(CLOCK_MONOTONIC, &ts_start);
-
 	/* Wait for all vCPUs to finish the iteration. */
-	for (i = 0; i < nr_vcpus; i++)
-		spin_wait_for_vcpu(i, next_iteration);
+	if (wait) {
+		struct timespec ts_start;
+		struct timespec ts_elapsed;
+
+		clock_gettime(CLOCK_MONOTONIC, &ts_start);
 
-	ts_elapsed = timespec_elapsed(ts_start);
-	pr_info("%-30s: %ld.%09lds\n",
-		description, ts_elapsed.tv_sec, ts_elapsed.tv_nsec);
+		for (i = 0; i < nr_vcpus; i++)
+			spin_wait_for_vcpu(i, next_iteration);
+
+		ts_elapsed = timespec_elapsed(ts_start);
+
+		pr_info("%-30s: %ld.%09lds\n",
+			description, ts_elapsed.tv_sec, ts_elapsed.tv_nsec);
+	} else
+		pr_info("%-30s\n", description);
 }
 
-static void access_memory(struct kvm_vm *vm, int nr_vcpus,
-			  enum access_type access, const char *description)
+static void _access_memory(struct kvm_vm *vm, int nr_vcpus,
+			   enum access_type access, const char *description,
+			   bool wait)
 {
 	memstress_set_write_percent(vm, (access == ACCESS_READ) ? 0 : 100);
 	iteration_work = ITERATION_ACCESS_MEMORY;
-	run_iteration(vm, nr_vcpus, description);
+	run_iteration(vm, nr_vcpus, description, wait);
+}
+
+static void access_memory(struct kvm_vm *vm, int nr_vcpus,
+			  enum access_type access, const char *description)
+{
+	return _access_memory(vm, nr_vcpus, access, description, true);
+}
+
+static void access_memory_async(struct kvm_vm *vm, int nr_vcpus,
+				enum access_type access,
+				const char *description)
+{
+	return _access_memory(vm, nr_vcpus, access, description, false);
 }
 
 static void mark_memory_idle(struct kvm_vm *vm, int nr_vcpus)
@@ -297,19 +397,115 @@ static void mark_memory_idle(struct kvm_vm *vm, int nr_vcpus)
 	 */
 	pr_debug("Marking VM memory idle (slow)...\n");
 	iteration_work = ITERATION_MARK_IDLE;
-	run_iteration(vm, nr_vcpus, "Mark memory idle");
+	run_iteration(vm, nr_vcpus, "Mark memory idle", true);
 }
 
-static void run_test(enum vm_guest_mode mode, void *arg)
+static void create_memcg(const char *memcg)
+{
+	const char *full_memcg_path = memcg_path(memcg);
+	int ret;
+
+	TEST_ASSERT(full_memcg_path, "Failed to construct full memcg path");
+retry:
+	ret = mkdir(full_memcg_path, 0755);
+	if (ret && errno == EEXIST) {
+		TEST_ASSERT(!rmdir(full_memcg_path),
+			    "Found existing memcg at %s, but rmdir failed",
+			    full_memcg_path);
+		goto retry;
+	}
+	TEST_ASSERT(!ret, "Creating the memcg failed: mkdir(%s) failed",
+		    full_memcg_path);
+
+	pr_info("Created memcg at %s\n", full_memcg_path);
+}
+
+/*
+ * Test lru_gen aging speed while vCPUs are faulting memory in.
+ *
+ * This test will run lru_gen aging until the vCPUs have finished all of
+ * the faulting work, reporting:
+ *  - vcpu wall time (wall time for slowest vCPU)
+ *  - average aging pass duration
+ *  - total number of aging passes
+ *  - total time spent aging
+ *
+ * This test produces the most useful results when the vcpu wall time and the
+ * total time spent aging are similar (i.e., we want to avoid timing aging
+ * while the vCPUs aren't doing any work).
+ */
+static void run_benchmark(enum vm_guest_mode mode, struct kvm_vm *vm,
+			  struct test_params *params)
 {
-	struct test_params *params = arg;
-	struct kvm_vm *vm;
 	int nr_vcpus = params->nr_vcpus;
+	struct memcg_stats stats;
+	struct timespec ts_start, ts_max, ts_vcpus_elapsed,
+			ts_aging_elapsed, ts_aging_elapsed_avg;
+	int num_passes = 0;
 
-	vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1,
-				 params->backing_src, !overlap_memory_access);
+	printf("Running lru_gen benchmark...\n");
 
-	memstress_start_vcpu_threads(nr_vcpus, vcpu_thread_main);
+	clock_gettime(CLOCK_MONOTONIC, &ts_start);
+	access_memory_async(vm, nr_vcpus, ACCESS_WRITE,
+			    "Populating memory (async)");
+	while (!all_vcpus_done(iteration, nr_vcpus)) {
+		lru_gen_do_aging_quiet(&stats, TEST_MEMCG_NAME);
+		++num_passes;
+	}
+
+	ts_aging_elapsed = timespec_elapsed(ts_start);
+	ts_aging_elapsed_avg = timespec_div(ts_aging_elapsed, num_passes);
+
+	/* Find out when the slowest vCPU finished. */
+	ts_max = ts_start;
+	for (int i = 0; i < nr_vcpus; ++i) {
+		struct timespec *vcpu_ts = &vcpu_last_completed_time[i];
+
+		if (ts_max.tv_sec < vcpu_ts->tv_sec ||
+		    (ts_max.tv_sec == vcpu_ts->tv_sec &&
+		     ts_max.tv_nsec < vcpu_ts->tv_nsec))
+			ts_max = *vcpu_ts;
+	}
+
+	ts_vcpus_elapsed = timespec_sub(ts_max, ts_start);
+
+	pr_info("%-30s: %ld.%09lds\n", "vcpu wall time",
+		ts_vcpus_elapsed.tv_sec, ts_vcpus_elapsed.tv_nsec);
+
+	pr_info("%-30s: %ld.%09lds, (passes:%d, total:%ld.%09lds)\n",
+		"lru_gen avg pass duration",
+		ts_aging_elapsed_avg.tv_sec,
+		ts_aging_elapsed_avg.tv_nsec,
+		num_passes,
+		ts_aging_elapsed.tv_sec,
+		ts_aging_elapsed.tv_nsec);
+}
+
+/*
+ * Test how much access tracking affects vCPU performance.
+ *
+ * Supports two modes of access tracking:
+ * - idle page tracking
+ * - lru_gen aging
+ *
+ * When using lru_gen, this test additionally verifies that the pages are in
+ * fact getting younger and older, otherwise the performance data would be
+ * invalid.
+ *
+ * The forced lru_gen aging can race with aging that occurs naturally.
+ */
+static void run_test(enum vm_guest_mode mode, struct kvm_vm *vm,
+		     struct test_params *params)
+{
+	int nr_vcpus = params->nr_vcpus;
+	bool lru_gen = params->lru_gen;
+	struct memcg_stats stats;
+	// If guest_page_size is larger than the host's page size, the
+	// guest (memstress) will only fault in a subset of the host's pages.
+	long total_pages = nr_vcpus * params->vcpu_memory_bytes /
+			   max(memstress_args.guest_page_size,
+			       (uint64_t)getpagesize());
+	int found_gens[5];
 
 	pr_info("\n");
 	access_memory(vm, nr_vcpus, ACCESS_WRITE, "Populating memory");
@@ -319,11 +515,78 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	access_memory(vm, nr_vcpus, ACCESS_READ, "Reading from populated memory");
 
 	/* Repeat on memory that has been marked as idle. */
-	mark_memory_idle(vm, nr_vcpus);
+	if (lru_gen) {
+		/* Do an initial page table scan */
+		lru_gen_do_aging(&stats, TEST_MEMCG_NAME);
+		TEST_ASSERT(sum_memcg_stats(&stats) >= total_pages,
+		  "Not all pages tracked in lru_gen stats.\n"
+		  "Is lru_gen enabled? Did the memcg get created properly?");
+
+		/* Find the generation we're currently in (probably youngest) */
+		found_gens[0] = lru_gen_find_generation(&stats, total_pages);
+
+		/* Do an aging pass now */
+		lru_gen_do_aging(&stats, TEST_MEMCG_NAME);
+
+		/* Same generation, but a newer generation has been made */
+		found_gens[1] = lru_gen_find_generation(&stats, total_pages);
+		TEST_ASSERT(found_gens[1] == found_gens[0],
+			    "unexpected gen change: %d vs. %d",
+			    found_gens[1], found_gens[0]);
+	} else
+		mark_memory_idle(vm, nr_vcpus);
+
 	access_memory(vm, nr_vcpus, ACCESS_WRITE, "Writing to idle memory");
-	mark_memory_idle(vm, nr_vcpus);
+
+	if (lru_gen) {
+		/* Scan the page tables again */
+		lru_gen_do_aging(&stats, TEST_MEMCG_NAME);
+
+		/* The pages should now be young again, so in a newer generation */
+		found_gens[2] = lru_gen_find_generation(&stats, total_pages);
+		TEST_ASSERT(found_gens[2] > found_gens[1],
+			    "pages did not get younger");
+
+		/* Do another aging pass */
+		lru_gen_do_aging(&stats, TEST_MEMCG_NAME);
+
+		/* Same generation; new generation has been made */
+		found_gens[3] = lru_gen_find_generation(&stats, total_pages);
+		TEST_ASSERT(found_gens[3] == found_gens[2],
+			    "unexpected gen change: %d vs. %d",
+			    found_gens[3], found_gens[2]);
+	} else
+		mark_memory_idle(vm, nr_vcpus);
+
 	access_memory(vm, nr_vcpus, ACCESS_READ, "Reading from idle memory");
 
+	if (lru_gen) {
+		/* Scan the page tables again */
+		lru_gen_do_aging(&stats, TEST_MEMCG_NAME);
+
+		/* The pages should now be young again, so in a newer generation */
+		found_gens[4] = lru_gen_find_generation(&stats, total_pages);
+		TEST_ASSERT(found_gens[4] > found_gens[3],
+			    "pages did not get younger");
+	}
+}
+
+static void setup_vm_and_run(enum vm_guest_mode mode, void *arg)
+{
+	struct test_params *params = arg;
+	int nr_vcpus = params->nr_vcpus;
+	struct kvm_vm *vm;
+
+	vm = memstress_create_vm(mode, nr_vcpus, params->vcpu_memory_bytes, 1,
+				 params->backing_src, !overlap_memory_access);
+
+	memstress_start_vcpu_threads(nr_vcpus, vcpu_thread_main);
+
+	if (params->benchmark_lru_gen)
+		run_benchmark(mode, vm, params);
+	else
+		run_test(mode, vm, params);
+
 	memstress_join_vcpu_threads(nr_vcpus);
 	memstress_destroy_vm(vm);
 }
@@ -331,8 +594,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 static void help(char *name)
 {
 	puts("");
-	printf("usage: %s [-h] [-m mode] [-b vcpu_bytes] [-v vcpus] [-o] [-s mem_type]\n",
-	       name);
+	printf("usage: %s [-h] [-m mode] [-b vcpu_bytes] [-v vcpus] [-o]"
+	       " [-s mem_type] [-l] [-p] [-r memcg_root]\n", name);
 	puts("");
 	printf(" -h: Display this help message.");
 	guest_modes_help();
@@ -342,6 +605,9 @@ static void help(char *name)
 	printf(" -v: specify the number of vCPUs to run.\n");
 	printf(" -o: Overlap guest memory accesses instead of partitioning\n"
 	       "     them into a separate region of memory for each vCPU.\n");
+	printf(" -l: Use MGLRU aging instead of idle page tracking\n");
+	printf(" -p: Benchmark MGLRU aging while faulting memory in\n");
+	printf(" -r: The memory cgroup hierarchy root to use (when -l is given)\n");
 	backing_src_help("-s");
 	puts("");
 	exit(0);
@@ -353,13 +619,15 @@ int main(int argc, char *argv[])
 		.backing_src = DEFAULT_VM_MEM_SRC,
 		.vcpu_memory_bytes = DEFAULT_PER_VCPU_MEM_SIZE,
 		.nr_vcpus = 1,
+		.lru_gen = false,
+		.benchmark_lru_gen = false,
 	};
 	int page_idle_fd;
 	int opt;
 
 	guest_modes_append_default();
 
-	while ((opt = getopt(argc, argv, "hm:b:v:os:")) != -1) {
+	while ((opt = getopt(argc, argv, "hm:b:v:os:lr:p")) != -1) {
 		switch (opt) {
 		case 'm':
 			guest_modes_cmdline(optarg);
@@ -376,6 +644,15 @@ int main(int argc, char *argv[])
 		case 's':
 			params.backing_src = parse_backing_src_type(optarg);
 			break;
+		case 'l':
+			params.lru_gen = true;
+			break;
+		case 'p':
+			params.benchmark_lru_gen = true;
+			break;
+		case 'r':
+			cgroup_root = strdup(optarg);
+			break;
 		case 'h':
 		default:
 			help(argv[0]);
@@ -383,12 +660,44 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	page_idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR);
-	__TEST_REQUIRE(page_idle_fd >= 0,
-		       "CONFIG_IDLE_PAGE_TRACKING is not enabled");
-	close(page_idle_fd);
+	if (!params.lru_gen) {
+		page_idle_fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR);
+		__TEST_REQUIRE(page_idle_fd >= 0,
+			       "CONFIG_IDLE_PAGE_TRACKING is not enabled");
+		close(page_idle_fd);
+	} else {
+		int lru_gen_fd, lru_gen_debug_fd;
+		long mglru_features;
+		char mglru_feature_str[8] = {};
+
+		lru_gen_fd = open("/sys/kernel/mm/lru_gen/enabled", O_RDONLY);
+		__TEST_REQUIRE(lru_gen_fd >= 0,
+			       "CONFIG_LRU_GEN is not enabled");
+		TEST_ASSERT(read(lru_gen_fd, &mglru_feature_str, 7) > 0,
+				 "couldn't read lru_gen features");
+		mglru_features = strtol(mglru_feature_str, NULL, 16);
+		__TEST_REQUIRE(mglru_features & LRU_GEN_ENABLED,
+			       "lru_gen is not enabled");
+		__TEST_REQUIRE(mglru_features & LRU_GEN_MM_WALK,
+			       "lru_gen does not support MM_WALK");
+		__TEST_REQUIRE(mglru_features & LRU_GEN_SECONDARY_MMU_WALK,
+			       "lru_gen does not support SECONDARY_MMU_WALK");
+
+		lru_gen_debug_fd = open(DEBUGFS_LRU_GEN, O_RDWR);
+		__TEST_REQUIRE(lru_gen_debug_fd >= 0,
+				"Cannot access %s", DEBUGFS_LRU_GEN);
+		close(lru_gen_debug_fd);
+	}
+
+	TEST_ASSERT(!params.benchmark_lru_gen || params.lru_gen,
+		    "-p specified without -l");
+
+	if (params.lru_gen) {
+		create_memcg(TEST_MEMCG_NAME);
+		move_to_memcg(TEST_MEMCG_NAME, getpid());
+	}
 
-	for_each_guest_mode(run_test, &params);
+	for_each_guest_mode(setup_vm_and_run, &params);
 
 	return 0;
 }
diff --git a/tools/testing/selftests/kvm/include/lru_gen_util.h b/tools/testing/selftests/kvm/include/lru_gen_util.h
new file mode 100644
index 000000000000..4eef8085a3cb
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/lru_gen_util.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Tools for integrating with lru_gen, like parsing the lru_gen debugfs output.
+ *
+ * Copyright (C) 2024, Google LLC.
+ */
+#ifndef SELFTEST_KVM_LRU_GEN_UTIL_H
+#define SELFTEST_KVM_LRU_GEN_UTIL_H
+
+#include
+#include
+#include
+
+#include "test_util.h"
+
+#define MAX_NR_GENS 16 /* MAX_NR_GENS in include/linux/mmzone.h */
+#define MAX_NR_NODES 4 /* Maximum number of nodes we support */
+
+static const char *DEBUGFS_LRU_GEN = "/sys/kernel/debug/lru_gen";
+
+struct generation_stats {
+	int gen;
+	long age_ms;
+	long nr_anon;
+	long nr_file;
+};
+
+struct node_stats {
+	int node;
+	int nr_gens; /* Number of populated gens entries. */
+	struct generation_stats gens[MAX_NR_GENS];
+};
+
+struct memcg_stats {
+	unsigned long memcg_id;
+	int nr_nodes; /* Number of populated nodes entries. */
+	struct node_stats nodes[MAX_NR_NODES];
+};
+
+void print_memcg_stats(const struct memcg_stats *stats, const char *name);
+
+void read_memcg_stats(struct memcg_stats *stats, const char *memcg);
+
+void read_print_memcg_stats(struct memcg_stats *stats, const char *memcg);
+
+long sum_memcg_stats(const struct memcg_stats *stats);
+
+void lru_gen_do_aging(struct memcg_stats *stats, const char *memcg);
+
+void lru_gen_do_aging_quiet(struct memcg_stats *stats, const char *memcg);
+
+int lru_gen_find_generation(const struct memcg_stats *stats,
+			    unsigned long total_pages);
+
+#endif /* SELFTEST_KVM_LRU_GEN_UTIL_H */
diff --git a/tools/testing/selftests/kvm/lib/lru_gen_util.c b/tools/testing/selftests/kvm/lib/lru_gen_util.c
new file mode 100644
index 000000000000..3c02a635a9f7
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/lru_gen_util.c
@@ -0,0 +1,391 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024, Google LLC.
+ */
+
+#include
+
+#include "lru_gen_util.h"
+
+/*
+ * Tracks state while we parse memcg lru_gen stats. The file we're parsing is
+ * structured like this (some extra whitespace elided):
+ *
+ * memcg (id) (path)
+ *  node (id)
+ *   (gen_nr) (age_in_ms) (nr_anon_pages) (nr_file_pages)
+ */
+struct memcg_stats_parse_context {
+	bool consumed; /* Whether or not this line was consumed */
+	/* Next parse handler to invoke */
+	void (*next_handler)(struct memcg_stats *,
+			     struct memcg_stats_parse_context *, char *);
+	int current_node_idx; /* Current index in nodes array */
+	const char *name; /* The name of the memcg we're looking for */
+};
+
+static void memcg_stats_handle_searching(struct memcg_stats *stats,
+					 struct memcg_stats_parse_context *ctx,
+					 char *line);
+static void memcg_stats_handle_in_memcg(struct memcg_stats *stats,
+					struct memcg_stats_parse_context *ctx,
+					char *line);
+static void memcg_stats_handle_in_node(struct memcg_stats *stats,
+				       struct memcg_stats_parse_context *ctx,
+				       char *line);
+
+struct split_iterator {
+	char *str;
+	char *save;
+};
+
+static char *split_next(struct split_iterator *it)
+{
+	char *ret = strtok_r(it->str, " \t\n\r", &it->save);
+
+	it->str = NULL;
+	return ret;
+}
+
+static void memcg_stats_handle_searching(struct memcg_stats *stats,
+					 struct memcg_stats_parse_context *ctx,
+					 char *line)
+{
+	struct split_iterator it = { .str = line };
+	char *prefix = split_next(&it);
+	char *memcg_id = split_next(&it);
+	char *memcg_name = split_next(&it);
+	char *end;
+
+	ctx->consumed = true;
+
+	if (!prefix || strcmp("memcg", prefix))
+		return; /* Not a memcg line (maybe empty), skip */
+
+	TEST_ASSERT(memcg_id && memcg_name,
+		    "malformed memcg line; no memcg id or memcg_name");
+
+	if (strcmp(memcg_name + 1, ctx->name))
+		return; /* Wrong memcg, skip */
+
+	/* Found it! */
+
+	stats->memcg_id = strtoul(memcg_id, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed memcg id '%s'", memcg_id);
+	if (!stats->memcg_id)
+		return; /* Removed memcg? */
+
+	ctx->next_handler = memcg_stats_handle_in_memcg;
+}
+
+static void memcg_stats_handle_in_memcg(struct memcg_stats *stats,
+					struct memcg_stats_parse_context *ctx,
+					char *line)
+{
+	struct split_iterator it = { .str = line };
+	char *prefix = split_next(&it);
+	char *id = split_next(&it);
+	long found_node_id;
+	char *end;
+
+	ctx->consumed = true;
+	ctx->current_node_idx = -1;
+
+	if (!prefix)
+		return; /* Skip empty lines */
+
+	if (!strcmp("memcg", prefix)) {
+		/* Memcg done, found next one; stop. */
+		ctx->next_handler = NULL;
+		return;
+	} else if (strcmp("node", prefix))
+		TEST_ASSERT(false, "found malformed line after 'memcg ...', "
+				   "token: '%s'", prefix);
+
+	/* At this point we know we have a node line. Parse the ID. */
+
+	TEST_ASSERT(id, "malformed node line; no node id");
+
+	found_node_id = strtol(id, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed node id '%s'", id);
+
+	ctx->current_node_idx = stats->nr_nodes++;
+	TEST_ASSERT(ctx->current_node_idx < MAX_NR_NODES,
+		    "memcg has stats for too many nodes, max is %d",
+		    MAX_NR_NODES);
+	stats->nodes[ctx->current_node_idx].node = found_node_id;
+
+	ctx->next_handler = memcg_stats_handle_in_node;
+}
+
+static void memcg_stats_handle_in_node(struct memcg_stats *stats,
+				       struct memcg_stats_parse_context *ctx,
+				       char *line)
+{
+	/* Have to copy since we might not consume */
+	char *my_line = strdup(line);
+	struct split_iterator it = { .str = my_line };
+	char *gen, *age, *nr_anon, *nr_file;
+	struct node_stats *node_stats;
+	struct generation_stats *gen_stats;
+	char *end;
+
+	TEST_ASSERT(it.str, "failed to copy input line");
+
+	gen = split_next(&it);
+
+	/* Skip empty lines */
+	if (!gen)
+		goto out_consume;
+
+	if (!strcmp("memcg", gen) || !strcmp("node", gen)) {
+		/*
+		 * Reached next memcg or node section. Don't consume, let the
+		 * other handler deal with this.
+		 */
+		ctx->next_handler = memcg_stats_handle_in_memcg;
+		goto out;
+	}
+
+	node_stats = &stats->nodes[ctx->current_node_idx];
+	TEST_ASSERT(node_stats->nr_gens < MAX_NR_GENS,
+		    "found too many generation lines; max is %d",
+		    MAX_NR_GENS);
+	gen_stats = &node_stats->gens[node_stats->nr_gens++];
+
+	age = split_next(&it);
+	nr_anon = split_next(&it);
+	nr_file = split_next(&it);
+
+	TEST_ASSERT(age && nr_anon && nr_file,
+		    "malformed generation line; not enough tokens");
+
+	gen_stats->gen = (int)strtol(gen, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed generation number '%s'", gen);
+
+	gen_stats->age_ms = strtol(age, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed generation age '%s'", age);
+
+	gen_stats->nr_anon = strtol(nr_anon, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed anonymous page count '%s'",
+		    nr_anon);
+
+	gen_stats->nr_file = strtol(nr_file, &end, 10);
+	TEST_ASSERT(*end == '\0', "malformed file page count '%s'", nr_file);
+
+out_consume:
+	ctx->consumed = true;
+out:
+	free(my_line);
+}
+
+/* Pretty-print lru_gen @stats. */
+void print_memcg_stats(const struct memcg_stats *stats, const char *name)
+{
+	int node, gen;
+
+	fprintf(stderr, "stats for memcg %s (id %lu):\n",
+			name, stats->memcg_id);
+	for (node = 0; node < stats->nr_nodes; ++node) {
+		fprintf(stderr, "\tnode %d\n", stats->nodes[node].node);
+		for (gen = 0; gen < stats->nodes[node].nr_gens; ++gen) {
+			const struct generation_stats *gstats =
+				&stats->nodes[node].gens[gen];
+
+			fprintf(stderr,
+					"\t\tgen %d\tage_ms %ld"
+					"\tnr_anon %ld\tnr_file %ld\n",
+					gstats->gen, gstats->age_ms, gstats->nr_anon,
+					gstats->nr_file);
+		}
+	}
+}
+
+/* Re-read lru_gen debugfs information for @memcg into @stats. */
+void read_memcg_stats(struct memcg_stats *stats, const char *memcg)
+{
+	FILE *f;
+	ssize_t read = 0;
+	char *line = NULL;
+	size_t bufsz;
+	struct memcg_stats_parse_context ctx = {
+		.next_handler = memcg_stats_handle_searching,
+		.name = memcg,
+	};
+
+	memset(stats, 0, sizeof(struct memcg_stats));
+
+	f = fopen(DEBUGFS_LRU_GEN, "r");
+	TEST_ASSERT(f, "fopen(%s) failed", DEBUGFS_LRU_GEN);
+
+	while (ctx.next_handler && (read = getline(&line, &bufsz, f)) > 0) {
+		ctx.consumed = false;
+
+		do {
+			ctx.next_handler(stats, &ctx, line);
+			if (!ctx.next_handler)
+				break;
+		} while (!ctx.consumed);
+	}
+
+	if (read < 0 && !feof(f))
+		TEST_ASSERT(false, "getline(%s) failed", DEBUGFS_LRU_GEN);
+
+	TEST_ASSERT(stats->memcg_id > 0, "Couldn't find memcg: %s\n"
+		    "Did the memcg get created in the proper mount?",
+		    memcg);
+	if (line)
+		free(line);
+	TEST_ASSERT(!fclose(f), "fclose(%s) failed", DEBUGFS_LRU_GEN);
+}
+
+/*
+ * Find all pages tracked by lru_gen for this memcg in generation @target_gen.
+ *
+ * If @target_gen is negative, look for all generations.
+ */
+static long sum_memcg_stats_for_gen(int target_gen,
+				    const struct memcg_stats *stats)
+{
+	int node, gen;
+	long total_nr = 0;
+
+	for (node = 0; node < stats->nr_nodes; ++node) {
+		const struct node_stats *node_stats = &stats->nodes[node];
+
+		for (gen = 0; gen < node_stats->nr_gens; ++gen) {
+			const struct generation_stats *gen_stats =
+				&node_stats->gens[gen];
+
+			if (target_gen >= 0 && gen_stats->gen != target_gen)
+				continue;
+
+			total_nr += gen_stats->nr_anon + gen_stats->nr_file;
+		}
+	}
+
+	return total_nr;
+}
+
+/* Find all pages tracked by lru_gen for this memcg. */
+long sum_memcg_stats(const struct memcg_stats *stats)
+{
+	return sum_memcg_stats_for_gen(-1, stats);
+}
+
+/* Read the memcg stats and optionally print if this is a debug build. */
+void read_print_memcg_stats(struct memcg_stats *stats, const char *memcg)
+{
+	read_memcg_stats(stats, memcg);
+#ifdef DEBUG
+	print_memcg_stats(stats, memcg);
+#endif
+}
+
+/*
+ * If lru_gen aging should force page table scanning.
+ *
+ * If you want to set this to false, you will need to do eviction
+ * before doing extra aging passes.
+ */
+static const bool force_scan = true;
+
+static void run_aging_impl(unsigned long memcg_id, int node_id, int max_gen)
+{
+	FILE *f = fopen(DEBUGFS_LRU_GEN, "w");
+	char *command;
+	int sz; /* asprintf() returns int; -1 on failure */
+
+	TEST_ASSERT(f, "fopen(%s) failed", DEBUGFS_LRU_GEN);
+	sz = asprintf(&command, "+ %lu %d %d 1 %d\n",
+		      memcg_id, node_id, max_gen, force_scan);
+	TEST_ASSERT(sz > 0, "creating aging command failed");
+
+	pr_debug("Running aging command: %s", command);
+	if (fwrite(command, sizeof(char), sz, f) < (size_t)sz) {
+		TEST_ASSERT(false, "writing aging command %s to %s failed",
+			    command, DEBUGFS_LRU_GEN);
+	}
+
+	TEST_ASSERT(!fclose(f), "fclose(%s) failed", DEBUGFS_LRU_GEN);
+}
+
+static void _lru_gen_do_aging(struct memcg_stats *stats, const char *memcg,
+			      bool verbose)
+{
+	int node, gen;
+	struct timespec ts_start;
+	struct timespec ts_elapsed;
+
+	pr_debug("lru_gen: invoking aging...\n");
+
+	/* Must read memcg stats to construct the proper aging command. */
+	read_print_memcg_stats(stats, memcg);
+
+	if (verbose)
+		clock_gettime(CLOCK_MONOTONIC, &ts_start);
+
+	for (node = 0; node < stats->nr_nodes; ++node) {
+		int max_gen = 0;
+
+		for (gen = 0; gen < stats->nodes[node].nr_gens; ++gen) {
+			int this_gen = stats->nodes[node].gens[gen].gen;
+
+			max_gen = max_gen > this_gen ? max_gen : this_gen;
+		}
+
+		run_aging_impl(stats->memcg_id, stats->nodes[node].node,
+			       max_gen);
+	}
+
+	if (verbose) {
+		ts_elapsed = timespec_elapsed(ts_start);
+		pr_info("%-30s: %ld.%09lds\n", "lru_gen: Aging",
+			ts_elapsed.tv_sec, ts_elapsed.tv_nsec);
+	}
+
+	/* Re-read so callers get updated information */
+	read_print_memcg_stats(stats, memcg);
+}
+
+/* Do aging, and print how long it took. */
+void lru_gen_do_aging(struct memcg_stats *stats, const char *memcg)
+{
+	return _lru_gen_do_aging(stats, memcg, true);
+}
+
+/* Do aging, don't print anything. */
+void lru_gen_do_aging_quiet(struct memcg_stats *stats, const char *memcg)
+{
+	return _lru_gen_do_aging(stats, memcg, false);
+}
+
+/*
+ * Find which generation contains more than half of @total_pages, assuming that
+ * such a generation exists.
+ */
+int lru_gen_find_generation(const struct memcg_stats *stats,
+			    unsigned long total_pages)
+{
+	int node, gen, gen_idx, min_gen = INT_MAX, max_gen = -1;
+
+	for (node = 0; node < stats->nr_nodes; ++node)
+		for (gen_idx = 0; gen_idx < stats->nodes[node].nr_gens;
+		     ++gen_idx) {
+			gen = stats->nodes[node].gens[gen_idx].gen;
+			max_gen = gen > max_gen ? gen : max_gen;
+			min_gen = gen < min_gen ? gen : min_gen;
+		}
+
+	for (gen = min_gen; gen <= max_gen; ++gen)
+		/* See if the most pages are in this generation. */
+		if (sum_memcg_stats_for_gen(gen, stats) >
+				total_pages / 2)
+			return gen;
+
+	TEST_ASSERT(false, "No generation includes majority of %lu pages.",
+		    total_pages);
+
+	/* unreachable, but make the compiler happy */
+	return -1;
+}
-- 
2.46.0.792.g87dc391469-goog
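
For readers who want to try the parsing flow outside the selftest harness, below is a minimal standalone sketch. It is not part of the patch. It assumes the debugfs layout described in the lib/lru_gen_util.c comment ("memcg (id) (path)", then "node (id)", then one "(gen_nr) (age_in_ms) (nr_anon_pages) (nr_file_pages)" line per generation) and reads from an in-memory string instead of /sys/kernel/debug/lru_gen. All names prefixed with sketch_ or SKETCH_ are hypothetical, and it uses sscanf() where the patch uses a strtok_r()-based state machine. The majority search uses an inclusive upper bound so that the newest generation is also considered.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_GENS 16 /* hypothetical capacity for this sketch */

/* One generation line, with anon and file pages already summed. */
struct sketch_gen {
	int gen;
	long nr_pages;
};

/*
 * Collect the generation lines that belong to the memcg whose path is
 * "/<memcg_name>". Entries from all nodes of that memcg are appended to
 * @out. Returns the number of entries stored (at most @max_out).
 */
int sketch_parse_gens(const char *text, const char *memcg_name,
		      struct sketch_gen *out, int max_out)
{
	char *copy = strdup(text); /* strtok_r() modifies its input */
	char *save = NULL;
	char *line;
	int in_memcg = 0;
	int n = 0;

	assert(copy);

	for (line = strtok_r(copy, "\n", &save); line;
	     line = strtok_r(NULL, "\n", &save)) {
		char path[256];
		unsigned long id;
		long age_ms, nr_anon, nr_file;
		int gen;

		if (sscanf(line, " memcg %lu %255s", &id, path) == 2) {
			if (in_memcg)
				break; /* Next memcg starts; we are done. */
			in_memcg = path[0] == '/' &&
				   !strcmp(path + 1, memcg_name);
			continue;
		}

		if (!in_memcg)
			continue;

		/* "node (id)" lines do not match and are skipped. */
		if (sscanf(line, " %d %ld %ld %ld",
			   &gen, &age_ms, &nr_anon, &nr_file) == 4 &&
		    n < max_out) {
			out[n].gen = gen;
			out[n].nr_pages = nr_anon + nr_file;
			n++;
		}
	}

	free(copy);
	return n;
}

/*
 * Return the generation holding more than half of @total_pages, or -1 if
 * there is none. Both ends of [min_gen, max_gen] are checked.
 */
int sketch_majority_gen(const struct sketch_gen *gens, int n, long total_pages)
{
	int min_gen = INT_MAX, max_gen = -1;
	int i, gen;

	for (i = 0; i < n; i++) {
		if (gens[i].gen > max_gen)
			max_gen = gens[i].gen;
		if (gens[i].gen < min_gen)
			min_gen = gens[i].gen;
	}

	for (gen = min_gen; gen <= max_gen; gen++) {
		long sum = 0;

		for (i = 0; i < n; i++)
			if (gens[i].gen == gen)
				sum += gens[i].nr_pages;

		if (sum > total_pages / 2)
			return gen;
	}

	return -1;
}
```

A caller would pass the text read from the lru_gen debugfs file together with the memcg name (without the leading slash), then hand the resulting array and the expected page count to sketch_majority_gen(). Returning -1 instead of asserting is a choice made for this sketch only.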